ryan/anki-wiktionary-english-dictionary.html.json at main · anil.recoil.org/thicket-eeg

Thicket data repository for the EEG
{
  "id": "https://ryan.freumh.org/anki-wiktionary-english-dictionary.html",
  "title": "Expanding My Vocabulary to a Million Words",
  "link": "https://ryan.freumh.org/anki-wiktionary-english-dictionary.html",
  "updated": "2025-07-16T00:00:00",
  "published": "2025-07-16T00:00:00",
  "summary": "<div>\n        \n        <span>Published 16 Jul 2025.</span>\n        \n        \n    </div>\n    \n        <div> Tags: <a href=\"/technology.html\" title=\"All pages tagged 'technology'.\">technology</a>. </div>\n    \n    \n\n        <p><span>I often find myself coming across new words that I\nlook up in a dictionary and promptly forget about. I’ve been using Anki\nto learn Mandarin with my <a href=\"https://github.com/RyanGibb/anki-hsk-strokes/\">HSK stroke\norder</a> deck, and I want an easy way to use the same approach for\nEnglish. Existing decks I found were too small (didn’t contain words I\nwanted to learn) and lacked detail (I find the etymology very handy in\nunderstanding the meaning of words), so I decided to make my\nown.</span></p>\n<p><span>Wiktionary is a collaborative dictionary with\nincredibly detailed entries for 1.2+ million English words. The data is\nfreely available from <a href=\"https://kaikki.org\">kaikki.org</a> under\nCC BY-SA 4.0 and GFDL licenses in a raw JSONL format. I’ve written <a href=\"https://github.com/RyanGibb/anki-wiktionary-english-dictionary\">anki-wiktionary-english-dictionary</a>\nto transform this data into Anki flashcards. Each card includes\ndefinitions, IPA pronunciation, etymology, audio pronunciation, word\nforms, and hyphenation (for syllable breaks). I’ve taken the top 500K words\nfrom Wiktionary according to Google Books’ <a href=\"https://storage.googleapis.com/books/ngrams/books/datasetsv3.html\">ngram\nviewer dataset</a>. You can download the deck from <a href=\"https://ankiweb.net/shared/info/1140417632\">AnkiWeb</a> if you\ndon’t want to build it yourself. This code should also be useful in\ndoing the same for other languages, or adding cross-language decks with\nWiktionary’s translation data.</span></p>\n\n\n<img src=\"./images/anki.png\">\n\nAn example Anki card for <a href=\"https://en.wiktionary.org/wiki/anathema#English\">Anathema</a>\n\n<p><span>After discovering <a href=\"https://en.m.wiktionary.org/wiki/homoiconicity#English\">homoiconicity</a>\nwas in the top 800K, I imported another 500K words which brought me\nabove the free sync server’s limit of 500MB (and to the clickbait\ntitle), so I <a href=\"https://github.com/RyanGibb/nixos/commit/74d478b5abd8a5d4b410bdb0566b34554c87d08b\">deployed</a>\nmy own sync server.</span></p>\n<p><span><em>Now if you’ll excuse me, I have a few words to\nlearn…</em></span></p>",
  "content": "<div>\n        \n        <span>Published 16 Jul 2025.</span>\n        \n        \n    </div>\n    \n        <div> Tags: <a href=\"/technology.html\" title=\"All pages tagged 'technology'.\">technology</a>. </div>\n    \n    \n\n        <p><span>I often find myself coming across new words that I\nlook up in a dictionary and promptly forget about. I’ve been using Anki\nto learn Mandarin with my <a href=\"https://github.com/RyanGibb/anki-hsk-strokes/\">HSK stroke\norder</a> deck, and I want an easy way to use the same approach for\nEnglish. Existing decks I found were too small (didn’t contain words I\nwanted to learn) and lacked detail (I find the etymology very handy in\nunderstanding the meaning of words), so I decided to make my\nown.</span></p>\n<p><span>Wiktionary is a collaborative dictionary with\nincredibly detailed entries for 1.2+ million English words. The data is\nfreely available from <a href=\"https://kaikki.org\">kaikki.org</a> under\nCC BY-SA 4.0 and GFDL licenses in a raw JSONL format. I’ve written <a href=\"https://github.com/RyanGibb/anki-wiktionary-english-dictionary\">anki-wiktionary-english-dictionary</a>\nto transform this data into Anki flashcards. Each card includes\ndefinitions, IPA pronunciation, etymology, audio pronunciation, word\nforms, and hyphenation (for syllable breaks). I’ve taken the top 500K words\nfrom Wiktionary according to Google Books’ <a href=\"https://storage.googleapis.com/books/ngrams/books/datasetsv3.html\">ngram\nviewer dataset</a>. You can download the deck from <a href=\"https://ankiweb.net/shared/info/1140417632\">AnkiWeb</a> if you\ndon’t want to build it yourself. This code should also be useful in\ndoing the same for other languages, or adding cross-language decks with\nWiktionary’s translation data.</span></p>\n\n\n<img src=\"./images/anki.png\">\n\nAn example Anki card for <a href=\"https://en.wiktionary.org/wiki/anathema#English\">Anathema</a>\n\n<p><span>After discovering <a href=\"https://en.m.wiktionary.org/wiki/homoiconicity#English\">homoiconicity</a>\nwas in the top 800K, I imported another 500K words which brought me\nabove the free sync server’s limit of 500MB (and to the clickbait\ntitle), so I <a href=\"https://github.com/RyanGibb/nixos/commit/74d478b5abd8a5d4b410bdb0566b34554c87d08b\">deployed</a>\nmy own sync server.</span></p>\n<p><span><em>Now if you’ll excuse me, I have a few words to\nlearn…</em></span></p>",
  "content_type": "html",
  "categories": [],
  "source": "https://ryan.freumh.org/atom.xml"
}