
Yoji




ryancahildebrandt/yoji

Python


Wisdom in 4 Characters Or Less


Training a neural network to generate 四字熟語 (as best it can!)


This project contains 0% LLM-generated content

Purpose

A project to generate 四字熟語 (yoji-jukugo, four-character Japanese idioms) using a sequential TensorFlow model.


Dataset
#

The dataset for this project was scraped or pulled from the following sources:

  • Yojijukugo for idioms and meanings/readings
  • Jamdict for kanji readings, meanings, and other information
  • Kanji Database for kanji classification, grade level, and miscellaneous characteristics

Outputs

  • The main report, compiled with Datapane and also available in HTML format
  • The full yoji_df dataframe describing the idioms, their constituent kanji, and all additional characteristics from the data linked above
  • List of generated idioms, sans definitions and readings
  • The same list, expanded out to a dataframe including readings and meanings of constituent characters and bigrams

Update!

  • After sharing the initial project with some coworkers, it was suggested (by @DC & @JZ) that I retrain the model on bigrams within each idiom, as this more closely aligns with how yoji-jukugo are semantically divided and understood. I’ve updated the report linked above with some additional thoughts on the new model and its results!
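
The bigram split described above can be sketched in a few lines. This is only an illustration, assuming each idiom is a plain four-character string that divides into a leading and a trailing two-character unit. The function name `split_bigrams` and the sample idioms are hypothetical and are not taken from the project's code or data files.

```python
from __future__ import annotations


def split_bigrams(idiom: str) -> tuple[str, str]:
    """Split a four-character idiom into its leading and trailing bigrams.

    Hypothetical helper for illustration; the project's own preprocessing
    may differ.
    """
    if len(idiom) != 4:
        raise ValueError(f"expected 4 characters, got {len(idiom)}: {idiom!r}")
    return idiom[:2], idiom[2:]


# Sample idioms chosen for illustration only.
idioms = ["一石二鳥", "四字熟語"]
bigram_pairs = [split_bigrams(idiom) for idiom in idioms]

for idiom, (first, second) in zip(idioms, bigram_pairs):
    print(idiom, "->", first, "+", second)
```

Treating each idiom as two bigrams rather than four independent characters gives a model units that match how the idioms are usually read, at the cost of a much larger vocabulary of possible tokens.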

Ryan Hildebrandt
Author
Data Scientist, etc.