Random Word Generator (β)

A few days ago, a new game, Genkai Shiritori Mobile (GSM), was released by Baton Co., Ltd. and QuizKnock, a web media outlet and team of YouTubers. Shiritori is a traditional Japanese word game in which each player says a new word that starts with the last letter (or rather, kana) of the previous word. Genkai Shiritori originated from QuizKnock, who added a few new rules on top of the simple Shiritori game, including:

After the Genkai Shiritori video series gained popularity on YouTube, QuizKnock modified the rules further and turned it into a mobile game. This article introduces my analysis, attempts and thoughts on building a semi-automated AI (?) agent, which I later named Random Word Generator.

Analysis of the game

Like other popular games such as Scrabble and Boggle, Shiritori relies heavily on dictionaries. Luckily, one of the game’s developers, @imadake398yen, revealed the base dictionary in a tweet.

https://twitter.com/imadake398yen/status/1164291987169669120?s=09

The article linked in the tweet points to a GitHub repository of a dictionary used for NLP, [mecab-ipadic-NEologd](<https://github.com/neologd/mecab-ipadic-neologd>). That repository is also listed in the app’s credits. Building the repository produces a set of dictionaries in CSV format. Here, we are only interested in each word’s part of speech and pronunciation.
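
As a rough illustration of pulling those two fields out of the CSV files, the following minimal Python sketch keeps only rows whose top-level part of speech is 名詞 (noun) and collects their readings. It assumes the standard IPADIC column order, in which column 12 (counting from 1) is the reading (読み) and column 13 is the pronunciation (発音); column 12 is the one the `cut -f12` command later in this article selects. The function name `extract_noun_readings` and the sample rows are hypothetical.

```python
import csv
import io
from typing import Iterable, List, Sequence

# Assumed IPADIC column layout, 0-based:
# 0 surface, 1 left context id, 2 right context id, 3 cost,
# 4 part of speech, 5-7 part-of-speech subcategories,
# 8 conjugation type, 9 conjugation form, 10 base form,
# 11 reading, 12 pronunciation
POS_COLUMN = 4
READING_COLUMN = 11


def extract_noun_readings(rows: Iterable[Sequence[str]]) -> List[str]:
    """Return the sorted, deduplicated readings of all noun rows."""
    readings = set()
    for row in rows:
        # Skip rows too short to contain a reading column.
        if len(row) <= READING_COLUMN:
            continue
        if row[POS_COLUMN] == "名詞":  # 名詞 = noun
            readings.add(row[READING_COLUMN])
    return sorted(readings)


if __name__ == "__main__":
    # Made-up sample rows; the ids and costs are illustrative only.
    sample = io.StringIO(
        "東京,1293,1293,3003,名詞,固有名詞,地域,一般,*,*,東京,トウキョウ,トーキョー\n"
        "走る,772,772,7435,動詞,自立,*,*,五段・ラ行,基本形,走る,ハシル,ハシル\n"
        "トマト,1285,1285,5000,名詞,一般,*,*,*,*,トマト,トマト,トマト\n"
    )
    print(extract_noun_readings(csv.reader(sample)))  # ['トウキョウ', 'トマト']
```

The `csv` module is used instead of splitting on commas so that any quoted fields are parsed correctly.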

Since standard Shiritori only allows nouns, only nouns are extracted from the dictionary. These words are then passed through a Python script for some postprocessing:

Iterations

Iteration 0: Simple lookup script

At the beginning, I had only seen the iOS version of the app running on my iPad. I first chose the noun word lists that seemed most likely to be included in the real database, then extracted only the pronunciations, sorted them and removed duplicates for further processing.

cat Noun.csv Noun.adverbal.csv Noun.org.csv Noun.place.csv Noun.proper.csv Noun.verbal.csv mecab-user-dict-seed.* > words.csv
cut -d ',' -f12 words.csv | sort -u > ./words-kana

After that, I wrote a fairly simple script that goes through each line of the file, converts katakana to hiragana, then adds each word to its slot in a table. The table is easy to build; in Python type hint notation, it is a Dict[str, Dict[int, Tuple[str, ...]]]. The outer dict is keyed by kana and only includes kana that are valid as the first kana of a word, which filters out invalid characters such as Latin letters and digits (although these are uncommon in this dataset). The inner dict is keyed by word length: per the GSM rules, the only valid lengths are 2 to 7 and “8+”, where the latter takes any word of 8 kana or longer.

I chose tuples over the other common options, sets and lists, because a tuple uses less memory once loaded for queries and, unlike a set, can still be passed straight to random.choice from the standard library.
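
A minimal Python sketch of such a table follows. Katakana ァ through ヶ (U+30A1 to U+30F6) map to hiragana by subtracting 0x60 from the code point. Several details are assumptions rather than the author’s code: the set of valid first kana (all hiragana except small kana and ん), counting length in characters so that ー counts as one kana, storing the “8+” bucket under the key 8, and the helper names `to_hiragana`, `length_bucket`, `build_table` and `pick`, which are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

# First kana -> length bucket -> candidate words (bucket 8 means "8+").
Table = Dict[str, Dict[int, Tuple[str, ...]]]

# Assumption: a word may start with any hiragana (U+3041..U+3096)
# except small kana and ん.
SMALL_KANA = set("ぁぃぅぇぉっゃゅょゎゕゖ")
VALID_FIRST = {chr(c) for c in range(0x3041, 0x3097)} - SMALL_KANA - {"ん"}


def to_hiragana(word: str) -> str:
    """Shift katakana ァ..ヶ (U+30A1..U+30F6) down by 0x60 into hiragana."""
    return "".join(
        chr(ord(ch) - 0x60) if 0x30A1 <= ord(ch) <= 0x30F6 else ch
        for ch in word
    )


def length_bucket(word: str) -> Optional[int]:
    """Map lengths 2..7 to themselves, 8 or more to 8, shorter to None."""
    if len(word) < 2:
        return None
    return min(len(word), 8)


def build_table(readings: Iterable[str]) -> Table:
    staging: Dict[str, Dict[int, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for line in readings:
        word = to_hiragana(line.strip())
        bucket = length_bucket(word)
        # Drop words that are too short or start with an invalid character.
        if bucket is None or word[0] not in VALID_FIRST:
            continue
        staging[word[0]][bucket].add(word)  # the set removes duplicates
    # Freeze each bucket into a sorted tuple for compact, indexable storage.
    return {
        kana: {n: tuple(sorted(words)) for n, words in buckets.items()}
        for kana, buckets in staging.items()
    }


def pick(table: Table, kana: str, length: int) -> Optional[str]:
    """Return a random word starting with kana in the given length bucket."""
    candidates = table.get(kana, {}).get(min(length, 8), ())
    return random.choice(candidates) if candidates else None


if __name__ == "__main__":
    table = build_table(["トウキョウ", "トマト", "ア", "ンジャメナ", "とけい"])
    print(pick(table, "と", 3))  # either とけい or とまと
```

Words are collected into sets while building, which also removes duplicates, and only frozen into sorted tuples at the end, so the tuples that stay in memory are exactly the sequences `random.choice` samples from.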