I've been working on adding new words to my default dictionary, or rather, words that should be in there but aren't. Recently I added a whole load of proper names, but there are still plenty of regular words that the list should have.
So I went to Project Gutenberg, which stores a whole load of free-to-download eBooks. I picked out (pretty much at random) some books, as plain-text files, by searching for "adventure" and then "love" - ha!
For the curious, these were: King Solomon's Mines, The Adventures of Tom Sawyer, Night and Day, The Mysterious Island, and Pride and Prejudice.
(I could, of course, print out any of these as pocket-sized books, using K-Pad!)
Then I extracted all the words from each book and ran them through my dictionary to pick out the ones it didn't recognize. There were probably a couple of hundred in total. For example:
blockhouse
bloodcurdling
boatman
bookseller
bookstall
bothersome
Now, I'd always hyphenate blood-curdling, but I suppose many wouldn't. And there's no good excuse for not having words like "bothersome" in there. From the next version, they will be.
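For anyone wanting to try the same trick, here is a rough sketch in Python of the "extract the words, then check them against the dictionary" step. It is not the code I actually used, and the names extract_words and unknown_words are made up for illustration. It assumes the dictionary is just a collection of words, and it lowercases everything, so proper names come out in lowercase.

```python
import re

def extract_words(text):
    """Return the lowercase words in text.

    A word is a run of letters, optionally joined by single
    apostrophes or hyphens (so "blood-curdling" stays in one piece).
    """
    return re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", text.lower())

def unknown_words(text, dictionary):
    """Return the sorted, de-duplicated words in text that are not in dictionary."""
    known = {word.lower() for word in dictionary}
    return sorted(set(extract_words(text)) - known)

# Tiny made-up example; a real run would read a whole book's text file
# and load the full word list instead.
sample = "The boatman found a bothersome bookstall."
dictionary = {"the", "found", "a"}
print(unknown_words(sample, dictionary))
# prints ['boatman', 'bookstall', 'bothersome']
```

Using sets keeps the lookup fast even for a whole novel, and removes duplicates for free, so each unrecognized word is listed only once.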
I might well do this for more books in future, a few at a time - or just write code to download a thousand Gutenberg books. From the list gathered, I still have to filter out strange words, typos and American spellings.
Anyway, the other way I find words is simply to use K-Pad to write. I find a lot of words-that-should-be-there that way. The latest batch includes:
git
entomologist
swimwear
Pelé
paraplegic
...and, ironically, "booklet"!
K-Pad is a multi-featured notepad / organizer for Windows: print out pages or booklets of photos, tables, and rich text.
This is my support blog, featuring help, tutorials, and comment. Welcome. :-)
Thursday, 16 October 2008