K-Pad is a multi-featured notepad / organizer for Windows: print out pages or booklets of photos, tables, and rich text.
This is my support blog, featuring help, tutorials, and comment. Welcome. :-)

Thursday 16 October 2008

A little help from Mark Twain

I've been working on adding new words to my default dictionary. Or rather, words that should be in there but aren't. Recently I added a whole load of proper names, but there are still plenty of regular words that the list should have.

So I went to Project Gutenberg, which stores a whole load of free-to-download eBooks. I picked out (pretty much at random) some books, as plain-text files, by searching for "adventure" and then "love" - ha!

For the curious, these were: King Solomon's Mines, The Adventures of Tom Sawyer, Night and Day, The Mysterious Island, and Pride and Prejudice.

(I could, of course, print out any of these as pocket-sized books, using K-Pad!)

Then I extracted all the words from each book and ran them through my dictionary, to pick out the ones it didn't recognize. That turned up probably a couple of hundred in total. For example:

blockhouse
bloodcurdling
boatman
bookseller
bookstall
bothersome

Now, I'd always hyphenate blood-curdling, but I suppose many wouldn't. And there's no good excuse for not having words like "bothersome" in there. But, from the next version, they will be.
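
For anyone who wants to try the same trick, here is a rough sketch in Python of the extract-and-check step described above. It is not K-Pad's actual code: the function names, the word pattern, and the tiny sample dictionary are all made up for illustration. It also lowercases everything, which is a simplification that would lose the capital letter on proper names.

```python
import re

# Hypothetical word pattern: runs of letters, optionally joined by
# internal apostrophes or hyphens (so "blood-curdling" stays one word).
WORD_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def extract_words(text):
    """Return the set of distinct lowercased words found in text."""
    return {match.group(0).lower() for match in WORD_RE.finditer(text)}


def unknown_words(text, dictionary):
    """Return, sorted, the words in text that are missing from dictionary."""
    known = {word.lower() for word in dictionary}
    return sorted(extract_words(text) - known)


# Illustrative input only; a real run would read each book's plain-text file.
sample = "The boatman found the bookstall bothersome."
known_words = ["the", "found"]
print(unknown_words(sample, known_words))
# prints ['boatman', 'bookstall', 'bothersome']
```

Using a set means each word is checked only once per book, however often it appears, and sorting the result makes the list easier to read through by eye afterwards.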

I might well do this for more books in future, a few at a time - or just write code to download a thousand Gutenberg books. From the list gathered, I still have to filter out strange words, typos and American spellings.

Anyway, the other way I find words is simply to use K-Pad to write. I find a lot of words-that-should-be-there that way. The latest batch includes:

git
entomologist
swimwear
Pelé
paraplegic

...and, ironically, "booklet"!
