NAWL: The North American (Scrabble™) Word List
With the release of Hoot 2.0, a maturing word study program, I’m starting to feel pulled in another direction. I'd like to do a lot more with Hoot, but one frustrating thing about developing it was that the most used lexicons are “copyrighted” and it’s challenging to get permission to use them. That’s also frustrating from a player’s point of view, since the tools available for word study are somewhat limited and the lists cannot be shared with prospective players. Some online games (Letterpress, Words with Friends) use a dictionary that is not available for study by players. Of course, that's just an indication that they are not serious word games. (Yes, WWF uses ENABLE, but it adds some words to it and omits others.)
Another shortcoming of current word lists is that they are not kept up to date. The official list is updated every few years, but the set of North American lexicon publications is quite short: TWL98, OWL2, and the current OWL3, or OTCWL2014 (with a very minor update in 2016). The alternate dictionaries ENABLE and YAWL haven't been updated in about a decade or more.
So I'm thinking it may be time to devote some of my time to developing a plan for a new Scrabble™ dictionary, more specifically a new word list for North American Scrabble, which I would codename NAWL. Ideally it could be adopted by organizations and game programs such as NASPA, WGPO, Words with Friends, WordSmith, and possibly others. Words with Friends and WordSmith currently use the unrestricted word lists ENABLE and YAWL, respectively.
You can see a working site at http://www.tylerhosting.com/nawl/.
The NAWL Project
Similar word list projects have enjoyed some success. ENABLE, developed by Alan Beale in 1997, is the primary word list for Words with Friends. A revision was published in 2000 as ENABLE2K.
YAWL was first developed by Mendel Leo Cooper and has been adopted by WordSmith as their official lexicon. Although it has been “republished” at https://github.com/elasticdog/yawl, the last update I saw on the original site http://freecode.com/projects/yawl/ was in 2008.
I believe both these older word lists and the OTCWL should be replaced with a more current word list that is more freely available and that could gain more universal acceptance. TWL98 was sort of universal until the powers that be started cracking down on "illegal" distribution and use. Of course, I’m sure I will get the response “What's wrong with OTCWL2016?”
What's wrong with OTCWL2016?
Criteria
The OWL does go through a systematic selection process that requires a word to be published in certain source dictionaries, but those criteria are a little restrictive. Current dictionaries are often slow in adding words. By the time you wait for the sources to update, and then wait for <current Scrabble organization> to update, there is a significant time lapse. The same is true for the other dictionaries in use (ENABLE, YAWL). That dictionary dependency also leads to the total omission of commonly used terms. In some areas, regularly used terms are still not published in any dictionary.
For example, I started playing disc golf in 1996 or so. Even before that, HYZER, ANHYZER, OVERSTABLE, and other terms were used regularly in the game. With over 5700 courses in North America, disc golf has grown tremendously since then, with any respectable town having a course. Yet those terms still do not appear in dictionaries and are still not valid in Scrabble. The term HYZER is so common in disc golf that some companies include it in their name. Google the term HYZER and you get 355,000 results, but it’s still not valid. Searching for TENAILS (a valid word), I found only 40,400 results.
Restriction
Another problem with OWL3 is its limited distribution. Because of that, it’s not the standard dictionary for Scrabble-type games, and different games use different dictionaries. Their publishers may have different opinions about what should be valid, but the main reason is probably that the most popular list is copyrighted, so they have to find another source.
Of course, if NASPA were to change their policies and procedures so that their dictionary were more accurate (included more missed words) and could be distributed without restriction (or with more relaxed restrictions), I would be all for it. The members of the NASPA dictionary committee http://www.scrabbleplayers.org/w/Dictionary_Committee are much more experienced with word sources than I am. As I mention later, a successful project would require a team to maintain it over the long term so it doesn’t go stale like ENABLE and YAWL.
What about CSW?
While CSW contains many words that are NA English, it is for English spoken outside of North America. It will probably always be the word source for Scrabble outside North America. It also has more relaxed criteria for word inclusion and they seem to be less paranoid about a digital form of the list being available to players. It could be updated more frequently but it has its place in Scrabble competitions.
Project Plans
NAWL would require some planning, and several phases. These are some of my notes, not a complete plan.
- Organizing/Managing the project goals and steps
- Organizing the word collection process
- Developing word selection criteria
Project Management
Develop Software
Software to manage the word collection and analysis would be one of the first tasks. It could be based on my word study program Hoot, in which case it would be easy to develop, manipulate, and integrate into Hoot development. Of course, other programmers could also create an app for that; I’m not sure I would want to be tied down to another piece of software. Whatever the tool, it would include word variations, definitions, and the source/explanation of selections.
Schedule
The next task would be to develop an update schedule. Ideally, the list would be updated annually, with completion by October 1 and an effective date of January 1. Players would then know what to expect and be more in tune with the process. That might also help motivate involvement.
Publishing
Publishing a new dictionary may sound like a daunting task, but in this digital age, self-publishing is quite easy. I'm not saying that because I've been in the printing industry for 35 years. Books can be printed on demand and don’t have to be printed in bulk. It's just a list of words, so not a lot of design work is needed. If the list were adopted by organizations, the required distribution channels would already be in place. Variations could also be published, such as one excluding offensive terms or one for long words such as in Super Scrabble. Of course, most serious players will be looking for digital sources instead, so the lists would be made available to software developers.
Organize Word Collection
The grunt work in developing a word list would be word collection, though systematic processes can be developed to manage that as well.
Base Dictionary of common words
The first step would probably be developing a base dictionary of common words. One possible method would be to use the words that are common to two of three popular Scrabble dictionaries (OTCWL, CSW, YAWL).
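As a rough illustration of that "two of three" idea, here is a minimal Python sketch. It assumes each dictionary has already been loaded into a collection of words, and it reads "two of three" as "at least two". The function name `base_words`, the `min_sources` parameter, and the tiny sample lists are all made up for the example; they are not real lexicon contents or anything from Hoot.

```python
from collections import Counter


def base_words(lexicons, min_sources=2):
    """Return the words that appear in at least `min_sources` of the lexicons.

    `lexicons` maps a lexicon name to an iterable of words.
    """
    counts = Counter()
    for words in lexicons.values():
        # Normalise to upper case and count each word once per lexicon.
        counts.update({w.strip().upper() for w in words if w.strip()})
    return {word for word, n in counts.items() if n >= min_sources}


# Placeholder data only: these are NOT the actual contents of any word list.
sample = {
    "OTCWL": ["cat", "dog", "emu"],
    "CSW": ["cat", "dog", "fox"],
    "YAWL": ["cat", "gnu", "fox"],
}

print(sorted(base_words(sample)))
```

With real lists, each value would simply be the lines read from the corresponding word file. Raising `min_sources` to 3 would give a stricter base of words that all three sources agree on.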
Dictionary Survey – Compare with other word lists, dictionaries
The next process would be to evaluate all words unique to one of the sources. It is easy to get lists of unique words from digital sources. The OWL criteria dictionaries could also be included, though that would take more time. Each dictionary would provide a resource for words that have been overlooked in the others. CSW could also be a good source. With a quick random look at CSW, I noticed the following words missing from OWL3, all of which deserve serious consideration.
POLISHABLE
POLYETHENE
POLYETHYLENE
POLYGENOUS
POLYCOTTON
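The comparison described above, pulling out the words unique to one source, is simple set arithmetic once the lists are in digital form. The sketch below is only an illustration under that assumption. The function name `unique_words` and the sample data are hypothetical placeholders, not real lexicon contents.

```python
def unique_words(lexicons):
    """Map each lexicon name to the words found in it and in no other lexicon.

    `lexicons` maps a lexicon name to an iterable of words.
    """
    normalised = {
        name: {w.strip().upper() for w in words if w.strip()}
        for name, words in lexicons.items()
    }
    unique = {}
    for name, words in normalised.items():
        # Union of every other lexicon's words.
        others = set().union(
            *(w for other, w in normalised.items() if other != name)
        )
        unique[name] = words - others
    return unique


# Placeholder data only: these are NOT the actual contents of any word list.
sample = {
    "OTCWL": ["cat", "dog", "emu"],
    "CSW": ["cat", "dog", "fox"],
    "YAWL": ["cat", "gnu", "fox"],
}

for name, words in unique_words(sample).items():
    print(name, sorted(words))
```

Each resulting set would then be a review queue: candidates that one source accepts and the others have either rejected or overlooked.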
Subject Survey
Another method for developing (and maintaining) a word list would be to create subject lists. Subject lists (or categories) would include the terminology in use in a certain field, sport, or activity. For example, all common golf terms should be valid words in Scrabble. CYCLOSPORIN is a valid word, as are ACETAMINOPHEN and other drug names, so drug names (not brand names) might be another category. I know of many that are not valid.
Isolated Selections
Finally, individual (random) submissions would be considered, although such submissions would probably have already been included in one of the subject lists. They could also be used to identify categories to include in subject surveys.
Develop Word Selection Guidelines
Of course, before beginning work, guidelines have to be established to determine what to include in or exclude from the list. For example, presence in or absence from source dictionaries wouldn't settle the matter automatically, as it does in the OWL.
Automatic inclusion
Automatic inclusions might include any word that is used in one of the subject areas mentioned above. Other possible automatic inclusions might include drug names, disease names, and common names for plants and animals. Annual subject surveys would ensure these words are promptly added.
Exclusions
Some exclusions would include proper names and abbreviations. If it’s pronounced using letters it would be an abbreviation. That’s why EMF is not in OWL any more, and why OMG shouldn't be.
Trouble areas
Obviously there are going to be trouble areas, such as when to include "foreign" words. Although foreign words are generally excluded, foreign currency isn’t. In some cases the names of animals that are not common in North America are omitted, but I would think that using such a name doesn’t constitute speaking a foreign word.
Criteria
For the questionable words, some other criteria to determine commonality might include whether the word is used as a Google search term, its frequency in Google search results, and how long the word has been in use.
Administration and Promotion
One of my biggest concerns is administration. I could just do it and release it (eventually), but there needs to be some organization in place that will continue to update the list. ENABLE was a good project that resulted in a usable word list, but it hasn’t been updated and is quite out of date. Then there was YAWL, but there are still no mechanisms in place to keep it updated. There needs to be an independent volunteer organization in place to oversee future updates. What about NASPA? Who would want to volunteer to create a resource that a big business would claim as its own and restrict its use?
Who Owns Scrabble’s Word List?
New Scrabble Dictionary Disrespects The Game
I do have some ideas about organization, but with this issue in the air I’m still thinking about this project. I’m not quite committed to spending that much time on the project if it doesn’t get used. It could get adopted like ENABLE and YAWL, but it could be so much more. What do you think about it? Some of the words I've looked at are shown on my NAWL site.