
Hanakotoba

13 August 2022

ryancahildebrandt/hanakotoba

Literature in Bloom

This project contains 0% LLM-generated content

Purpose

A project to explore 花言葉 (hanakotoba, lit. flower language) in Japanese and other literary corpora.

Dataset

The dataset used for the current project was pulled from the following sources:

Aozora Bunko Corpus for Japanese full text works
Hanakotoba for flower names, translations, and associated characteristics
Wikipedia for conversions of Japanese decimal classification codes (分類番号)
Wikipedia for a list of major Japanese eras (時代)
This page for a list of sub-eras (元年)

Some of these didn’t end up being necessary for the main project, but they are included with the accompanying code for genre and date conversions.
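
As a rough illustration of the kind of date conversion mentioned above, the sketch below turns a Japanese era name plus era year into a Gregorian year. It is only a minimal sketch under assumptions: the `ERA_START` dictionary and the `era_to_gregorian` function are hypothetical stand-ins, not the project's actual code, and the real era tables may be structured differently.

```python
# Start years of the modern eras. This dictionary is a hypothetical stand-in
# for an era lookup table; the project's own tables may use other columns.
ERA_START = {"明治": 1868, "大正": 1912, "昭和": 1926, "平成": 1989, "令和": 2019}


def era_to_gregorian(era: str, year: int) -> int:
    """Convert an era name and era year (1 = 元年, the first year) to a Gregorian year."""
    if year < 1:
        raise ValueError("era years start at 1")
    # The first year of an era coincides with the era's start year.
    return ERA_START[era] + year - 1


showa_20 = era_to_gregorian("昭和", 20)
meiji_1 = era_to_gregorian("明治", 1)
```

Here 昭和20年 maps to 1945 and 明治元年 maps to 1868. Eras before 明治 would need additional rows in the lookup table.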

Outputs

The main report, compiled with Datapane and also available in HTML format
Historical era dataframe : Jidai.csv
Sub-era dataframe : Gannen.csv
Japanese genre code dataframe : Genres.csv
Dataframe of all flowers/plants and associated characteristics : Hk_df.csv
Dataframe with all text metainfo, calculated date columns, and tagged flower occurrences with locations in the text : All_df.csv
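
To make "tagged flower occurrences with locations in the text" more concrete, here is a minimal sketch of one way such tagging could work, using only the Python standard library. The `FLOWERS` list and the `tag_flowers` function are hypothetical illustrations; the project's actual flower list would come from its own data, and its tagging logic may differ.

```python
import re

# Hypothetical sample of flower names; the real list would come from the
# project's flower dataframe rather than being hard-coded.
FLOWERS = ["桜", "梅", "菊"]


def tag_flowers(text, flowers=FLOWERS):
    """Return (flower, character offset) pairs for every occurrence in text."""
    # Try longer names first so a long name wins over a shorter name it contains.
    ordered = sorted(flowers, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in ordered))
    return [(m.group(0), m.start()) for m in pattern.finditer(text)]


sample = "庭の桜が散り、梅の香りが残る。"
occurrences = tag_flowers(sample)
```

For the sample sentence, `occurrences` holds one entry for 桜 at offset 2 and one for 梅 at offset 7. Plain substring matching like this will also match flower characters that appear inside unrelated words or names, so a real pipeline would likely need tokenization or additional filtering.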

Author

Ryan Hildebrandt

Data Scientist, etc.