Scrabble Playability in Hoot
Continuous Improvement
In developing the word study program Hoot, I'm constantly looking for ways to improve it. In many work environments this is called Continuous Improvement (CI). I've included almost any search you would want to do. One thing I noticed was that similar programs provided a playability index in their results, so I decided I should provide a comparable feature.
True Playability
As I investigated, I discovered that the playability index was not really playability, but more of an affordability index representing the value of certain plays. Although that is a good thing, I decided to get a more genuine indicator of playability, that is, how able you are to play a certain word in a game of Scrabble. Of course, this is an estimate since the board's status is constantly changing. True playability depends on the words already played and how open the board is; only the first word is always 100% playable. But I can estimate playability based on the total number of plays possible without those constraints.
One indicator of how playable a word is involves how many different places you can play it and how that compares with other words. Except for the first move, a play must connect with a previous play, so I need to determine how many words our word can connect with. That is a time-consuming task, but with computers it's not impossible.
Problem Definition
The following is my approach to the problem of finding a word's playability. Words can be played in basically six different ways in Scrabble.
- Hooks
- Hooking
- Parallel
- Play through
- Extensions
- Bridges (vertical and horizontal)
For our discussion, our primary word will be TALK.
Hooks
Hooks are the letters that can be played on the beginning or end of our word. Since they depend on an additional letter, I need to factor in the probability of seeing that letter on the board. The easiest way would be to calculate the probability of drawing that letter. Looking at the word in Hoot we see
s TALK sy
Thus, TALK can be played after an S to form STALK, or before either an S or a Y to form TALKS or TALKY.
The probability of drawing an S is 4/100, or 1/25, or .04. Instead of looking at all words, to be conservative I only consider words with that letter as a hook.
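As a rough illustration of the hook calculation, the sketch below looks up the draw probability for each hook letter of TALK. The tile counts are the standard English-language distribution (100 tiles, including two blanks). The name `draw_probability` is my own for illustration and is not part of Hoot, and how the individual hook probabilities get combined is left open here.

```python
# Standard English-language Scrabble tile distribution ("?" is a blank).
TILE_COUNTS = {
    "A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2,
    "I": 9, "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2,
    "Q": 1, "R": 6, "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1,
    "Y": 2, "Z": 1, "?": 2,
}
TOTAL_TILES = sum(TILE_COUNTS.values())  # 100 tiles in a full bag


def draw_probability(letter):
    """Chance that a single tile drawn from a full bag is `letter`."""
    return TILE_COUNTS[letter.upper()] / TOTAL_TILES


# Hooks for TALK as shown in Hoot: s TALK sy
front_hooks = "S"
back_hooks = "SY"

for letter in front_hooks + back_hooks:
    print(letter, draw_probability(letter))
```

This treats every draw as coming from a full bag, which is a simplification; it ignores tiles already on the board or on the racks.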
Hooking
Hooking is a different concept. This measures how easy it is to play our word next to another word with one of our word's letters acting as a hook. TALK can hook onto BAR to form either BARK or KBAR. "Hooking" doesn't consider the probability of that letter being drawn since we already have it, but it does depend on the ability to hook it onto an existing word. For that, we can calculate the relative number of words that have that letter as a hook. In all, there are 314 words in TWL98 that take K as a hook. Thus, we determine how many words take T, A, L, or K as a hook.
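The hooking count can be sketched as follows. This is a minimal illustration, not Hoot's code: `words_taking_hook` is a hypothetical helper, and the tiny lexicon is a stand-in for TWL98 so the numbers it produces mean nothing beyond the example.

```python
def words_taking_hook(letter, lexicon):
    """Return the words that form another valid word when `letter`
    is added to the front or to the back."""
    letter = letter.upper()
    return sorted(
        w for w in lexicon
        if letter + w in lexicon or w + letter in lexicon
    )


# Toy lexicon for illustration only (an assumption, not TWL98).
LEXICON = {
    "BAR", "BARK", "KBAR", "ALE", "TALE", "KALE",
    "OW", "OWL", "TOW", "LOW", "ALOW",
}

# For each letter of TALK, list the words it could hook onto.
for letter in "TALK":
    print(letter, words_taking_hook(letter, LEXICON))
```

With a full lexicon, the length of each list would give the per-letter count, such as the 314 words that take K.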
Parallel
Then there are parallel plays. A parallel play is a play made parallel to an existing word. Yes, Hoot can determine that. We can count all words, but for practical reasons, I have some rules for what qualifies as a parallel play.
- The parallel word must match all but one letter in our word.
- The parallel word is limited to 5 letters, or 1 more than our word.
TALK can be played parallel to AMIA, forming AT, MA, LA, and KA. There are also almost 2000 other words that can play over 3 or more letters in the word.
Play Through
Play through is simply playing the letters through an existing word to form another word. In actual play, we could play the letters in a different order, thus TILAK. Although we can calculate the number of blank anagrams made from the letters in our word, I'm disregarding play-throughs because they form a different word. That's also why I don't consider anagrams of our word in the calculations. Each anagram will have its own playability rating.
Extensions
Extensions are plays made at the beginning or end of a word on the board. To find those words we don't look at words beginning with our word. Instead we have to look at words that take our word as a prefix or suffix. We look at those in the Query for words that Take Prefix/Suffix. One is CORNS (CORNS+TALK). In TWL98, there are 11 such words, even though 28 words in the lexicon begin or end with TALK.
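One way to read "take our word as a prefix or suffix" is: find every word X such that X+TALK or TALK+X is itself a valid word. The sketch below follows that reading, which is my assumption rather than a description of Hoot's query; `extension_partners` and the toy lexicon are hypothetical.

```python
def extension_partners(word, lexicon):
    """Return the words X in `lexicon` for which X+word or word+X
    is also in `lexicon`."""
    word = word.upper()
    partners = set()
    for candidate in lexicon:
        if len(candidate) <= len(word):
            continue
        # X + word, e.g. CORNS + TALK = CORNSTALK
        if candidate.endswith(word) and candidate[:-len(word)] in lexicon:
            partners.add(candidate[:-len(word)])
        # word + X
        if candidate.startswith(word) and candidate[len(word):] in lexicon:
            partners.add(candidate[len(word):])
    return sorted(partners)


# Toy lexicon for illustration only (an assumption, not TWL98).
LEXICON = {"TALK", "CORNS", "CORNSTALK", "TALKS", "TALKY", "STALK"}

print(extension_partners("TALK", LEXICON))
```

In this toy lexicon four words begin or end with TALK, but only CORNS qualifies, because the leftover pieces S and Y are not words on their own. That is the same kind of gap as the 11 versus 28 noted above.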
Bridging
Bridging involves inserting our word between two other words. Yes, you can find those with Hoot. You can use the Letter Studies search and enter our word, then in the custom box select Inserts.
That will give you the sole word,
DEERS - TALK - ERS
Bridges can also be used vertically, that is, playing our word between two other words with one of its letters acting as a bridge. For example, using a Hoot search I find that the T can be inserted between the words LAVA and ION to form LAVA-T-ION. Just for the letter T there are another 2800 sets of words where that is possible. Again, this aspect will be ignored since it is difficult to calculate the likelihood that the words appear on the board in a playable manner. Such plays are not commonly encountered, and some are already covered by extensions.
That leaves me with
- Playability of a word/letter as a hook, or hooking.
- Playability of a word as an extension.
- Playability of a word as a parallel play.
Considering the likelihood that a play would be made in one of these ways, that is a reasonable set of parameters. But there's more to consider than these counts.
Initial Playability
The words we are playing on or against have their own playability. How likely is it that these words will show up on the board in the first place? That is, we need to give more consideration to words that we are likely to encounter on the board. Thus, for the word AMIA, we count that word and then multiply by its playability rating/100. If its playability is 31, the value associated with that word is 31/100. Unfortunately, we don't know that at the outset, since playability is what we are trying to determine. This is where we encounter circular logic. We can't calculate the playability of the first word until we have the playability of all other words.
One option is to use the ratings calculated by O'Laughlin as preliminary playability ratings. The likelihood of words being played there was based on the frequency the words have been played in the past. However, because the ratings are based on past performance and value rather than actual (best) playability, they don't reflect actual likelihood that the word has been played.
Instead, I need to establish a benchmark playability using probability as a preliminary marker; after that I can use the benchmark ratings to calculate a more accurate playability rating. Even though probability of drawing is not the same as playability, it is the best estimate we have. That means two passes: first calculate playability using the temporary, probability-based ratings of the played words, then go through another iteration using those initial playability ratings.
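The two passes might look something like the sketch below. Everything in it is an assumption for illustration: the `partners` map (which words each word can be played against), the benchmark numbers, and the choice to rescale each pass so the top word gets 100. Only the idea of weighting a partner word by rating/100 comes from the discussion above.

```python
def one_pass(partners, ratings):
    """Weight each partner word by rating/100, sum the weights, and
    rescale so the most playable word scores 100."""
    raw = {
        word: sum(ratings[p] / 100 for p in plays_on)
        for word, plays_on in partners.items()
    }
    top = max(raw.values())
    return {word: 100 * score / top for word, score in raw.items()}


# Hypothetical data: which words each word can be played on or against.
partners = {
    "TALK": ["AMIA", "BAR"],
    "AMIA": ["TALK"],
    "BAR": ["TALK", "AMIA"],
}
# Hypothetical benchmark ratings standing in for draw probability.
benchmark = {"TALK": 40, "AMIA": 31, "BAR": 80}

first = one_pass(partners, benchmark)   # pass 1: uses the benchmark
second = one_pass(partners, first)      # pass 2: uses pass-1 ratings

for word in partners:
    print(word, round(first[word], 1), round(second[word], 1))
```

The ratings can shift noticeably between passes on data this small; with a full lexicon the question of whether two passes are enough would have to be checked against the real numbers.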
What is the formula?
I can count all the different plays and adjust the counts using playability. However, even with playability rating adjustments, adding them all together will bias the result toward words with the most hooks. I need to weigh the importance of each type of play, and for simplicity I weight them evenly. After determining the results, I establish a rating for each type of play. Then I average the ratings to establish a new overall rating that will be the word's playability.
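To show why averaging per-type ratings differs from adding raw counts, here is a small sketch. The counts are made up, the words are placeholders, and scaling each type against its highest count is my assumption about how a per-type rating could be established.

```python
def average_type_ratings(counts):
    """Rate each play type 0-100 against the best word for that type,
    then average the type ratings with equal weight."""
    types = sorted({t for per_word in counts.values() for t in per_word})
    top = {t: max(per_word[t] for per_word in counts.values()) for t in types}
    return {
        word: sum(100 * per_word[t] / top[t] for t in types) / len(types)
        for word, per_word in counts.items()
    }


# Made-up adjusted counts for two placeholder words.
counts = {
    "WORD_A": {"hook": 300, "parallel": 50, "extension": 2},
    "WORD_B": {"hook": 100, "parallel": 100, "extension": 10},
}

raw_totals = {w: sum(c.values()) for w, c in counts.items()}
ratings = average_type_ratings(counts)

print(raw_totals)   # WORD_A leads on raw totals because of its hooks
print(ratings)      # WORD_B leads once each type is weighted evenly
```

Here WORD_A wins on the raw sum purely because hook counts are large, while WORD_B comes out ahead once each type of play contributes equally.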
Rank
Zyzzyva ranks the words in order and gives them a number based on what place they finished. For 2 letter words, 1 to 105, for 3 letter words, 1 to 1000, etc. However, number 25 among 2 letter words does not mean the same thing as number 25 among 3 letter words. In order to use those values the player must know how many words are in each set.
As in Hoot's relative probability ratings, I rate words 1 to 100, with 100 being the most playable. A rating of 25 is not necessarily 75 places down. It represents a comparison against the most playable word: a word rated 100 is four times as playable as one rated 25. In a sample set, for two letter words that means OF (rating 22) was actually 7 places from the bottom while WE (rating 50) may be 12 places from the bottom. As might be expected, 2 letter words are more equally playable. As words get longer, fewer words may stand out as being playable.
Please wait...
Because of the number of words and the number of calculations, the process of calculating playability will involve days of processing. When I have completed the calculations, and made adjustments in the formula, I'll review the ratings. If you have any suggestions or comments, please use the contact form.