One of my favorite hobbies is playing Scrabble. Believe it or not, there is a fairly large subculture of people who play this game very seriously. Yes, many of them are really dorky and have virtually no social skills, but there are some "normal" people too. Anyway, the game goes far beyond what most people have played casually. There are well over 100,000 words that serious players memorize, and many of them (I'd say roughly 75%) are words you would never know unless you saw them listed and studied them. [/lame intro]

Seen above is a screen capture of a program called ZYZZYVA (which incidentally is a word) that unscrambles words, creates quizzes, etc. One of the functions it has is ranking words by probability, which is a great way to study. In other words, you are much better off studying words like AILERON or DARIOLE, made with common letters and therefore more probable, than words like FILIBEG or FUMULUS, at least initially. So what I was wondering is: how does this program go about determining the probability of playing a word?

Here is the letter distribution:

At first I thought I could assign a probability to each letter (for instance E=12/100, G=3/100, Z=1/100) and then just multiply all these numbers together, take the inverse of the product to get 1-in-N odds, and whichever letter combination had the lowest N would be the most probable. Well, just by using this method on a few different cases I found it to be an epic FAIL. I would personally think a word like BEEBEES would be near the top, considering it has 4 E's, which are the most common tile, along with B, B, and S, which are not totally uncommon. However, BEEBEES ranks about 23,000th out of 24,000 7-letter words.

Being able to calculate this during a game would be an immense help. So if anyone out there could give me some help, I would be greatly appreciative.
3/18/2009 6:46:52 AM
I just woke up, but let me make one comment.

Take a word like ZEE (if it is a word; I only use it because those are the two letters you gave us).

One thing I noticed is that the probability by your method would not be

(1/100)*(12/100)*(12/100)

it would be

(1/100)*(12/99)*(11/98)

because once you get to your first E (12/99), you've already pulled out one letter, and when you get to your second E you've already pulled out two letters, one of which is an E.

[Edited on March 18, 2009 at 8:11 AM. Reason : ]
3/18/2009 8:08:39 AM
I agree that I didn't take into account the smaller number of remaining tiles, but I don't believe that would affect the results, because it's a common error across the board.
3/19/2009 1:04:21 AM
you'd be surprised...
3/19/2009 10:06:09 PM
It especially matters when letters already have low probabilities. For instance, B is 2/100; once you use the first, the probability of getting the second (1/99) is nearly cut in half.
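To put numbers on how much the shrinking bag matters, here is a minimal Python sketch. It assumes the standard English tile counts (the `TILES` table is transcribed by hand, not taken from this thread), and the function names `naive_in_order` and `exact_in_order` are made up for illustration. It compares the everything-over-100 product with the draw-without-replacement product for a word drawn in spelling order.

```python
from fractions import Fraction

# Assumed standard English tile counts; "?" is the blank. Transcribed by hand.
TILES = {"A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
         "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
         "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1, "?": 2}
BAG_SIZE = sum(TILES.values())  # 100 tiles in a full bag


def naive_in_order(word):
    """Multiply count/100 for every letter, as if each tile went back in the bag."""
    p = Fraction(1)
    for letter in word.upper():
        p *= Fraction(TILES[letter], BAG_SIZE)
    return p


def exact_in_order(word):
    """Probability of drawing exactly these letters, in this order, from a full bag."""
    remaining = dict(TILES)
    tiles_left = BAG_SIZE
    p = Fraction(1)
    for letter in word.upper():
        p *= Fraction(remaining[letter], tiles_left)
        remaining[letter] -= 1  # that tile is gone
        tiles_left -= 1         # and the bag is one tile smaller
    return p


for word in ("ZEE", "BEEBEES"):
    naive, exact = naive_in_order(word), exact_in_order(word)
    print(f"{word}: naive={float(naive):.3e} exact={float(exact):.3e} "
          f"exact/naive={float(exact / naive):.3f}")
```

For ZEE the two numbers are close, but for a word that repeats scarce letters, like BEEBEES, the exact value is only about a third of the naive one, so the error is not the same across the board.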
3/20/2009 11:36:06 PM
this is why scrabble is bullshit and i can never win

haha

god i hate this game, but i love it

and no, i dont have a clue what u talkin bout...but im feelin it man
3/22/2009 1:53:41 AM
If you alone were to draw letters from an initially full set, the probability of drawing BEEBEES, in that order, is:

(2/100)*(12/99)*(11/98)*(1/97)*(10/96)*(9/95)*(4/94) = 1.2E-9

However, there are multiple orders in which the letters can be drawn. Also, you can draw 'different' letters; e.g., a different set of 4 E's than you drew the previous time. Each individual scenario that gives the necessary letters must be taken into account to get a 'true' probability of a particular word.

I'm sure ZYZZYVA is making assumptions about how and when the letters are drawn. If you want to duplicate ZYZZYVA probabilities, you're going to need to know those assumptions.

[Edited on March 22, 2009 at 1:04 PM. Reason : ]
3/22/2009 12:45:44 PM
I'll stick to playing dumbass slutbags on lexulous on facebook..
3/24/2009 12:06:35 AM
The official Scrabble dictionary is filled with such bs words.
3/24/2009 4:41:59 AM
3/24/2009 11:07:28 AM
...but you're choosing 4 out of 12 E's. There are 495 ways to do that.

Ignoring the blank tiles as wildcards and assuming you're simply pulling 7 tiles from a full bag:

[ C(2,2) * C(12,4) * C(4,1) ] / C(100,7)
= [ 1 * 495 * 4 ] / 1.60E10
= 1980 / 1.60E10
= 1.24E-7

which is about 1 in 8.1 million, or 1 in 7 million if you drop the two blank tiles from the bag (i.e., use C(98,7) as the denominator).

We need FeebleMinded to tell us what the probability is according to ZYZZYVA.
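The arithmetic above is easy to double-check with Python's `math.comb`. This is just a sketch of that one calculation; the variable names are arbitrary, and the tile counts (2 B's, 12 E's, 4 S's, 100 tiles with 2 blanks) are the ones already used in this post.

```python
from math import comb

# Ways to pick the physical tiles for BEEBEES: both B's, 4 of 12 E's, 1 of 4 S's.
ways = comb(2, 2) * comb(12, 4) * comb(4, 1)

# Denominators: any 7 of all 100 tiles, or any 7 of the 98 non-blank tiles.
p_full_bag = ways / comb(100, 7)
p_no_blanks = ways / comb(98, 7)

print(f"ways = {ways}")
print(f"full bag:  {p_full_bag:.3e}  (about 1 in {1 / p_full_bag:,.0f})")
print(f"no blanks: {p_no_blanks:.3e}  (about 1 in {1 / p_no_blanks:,.0f})")
```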
3/24/2009 8:41:12 PM
It doesn't say what the probability is; it simply ranks the words in order from most to least probable. If anyone is a computer programmer type person, the source code is on the website. I couldn't even begin to comprehend it though.

http://www.zyzzyva.net/
3/26/2009 12:31:48 AM
whats the probability of me starting with 8 vowels in back to back games? because it happened.
3/26/2009 9:40:29 AM
(1/whatever chances of starting with 8 vowels)*(1/whatever chances of starting with 8 vowels)..overall it's probably a pretty high chance respectively.
3/26/2009 3:40:35 PM
^^^^So take your BEEBEES example.

The probability of drawing that in that order is

(2/100)*(12/99)*(11/98)*(1/97)*(10/96)*(9/95)*(4/94) = 1.2E-9

but if you take a different set of E's it's still the same scenario, because your (12/99) captures all the ways you can get an E, not an individual E. What you need to factor in is the order in which you can draw the different letters.

[Edited on March 26, 2009 at 3:58 PM. Reason : ]
3/26/2009 3:54:26 PM
3/26/2009 4:49:30 PM
^^ Take a look at http://svn.pietdepsi.com/repos/projects/zyzzyva/trunk/src/libzyzzyva/LetterBag.cpp

It looks like he's using combinations to calculate probabilities.

^ For each word he's calculating the probability based on drawing the number of tiles in the word from an initially full bag; i.e., he's determining the probability of spelling a three-letter word after drawing 3 tiles, not the probability of being able to spell a particular three-letter word after drawing 7 tiles. He includes the blank tiles (which I didn't do above).
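Here is a rough Python sketch of what "draw as many tiles as the word has letters, blanks included" could look like. To be clear, this is not ZYZZYVA's code and I have not checked it against LetterBag.cpp; the tile table, the function names, and the way blanks are counted (each blank may stand in for any one letter of the word) are all my own assumptions.

```python
from collections import Counter
from itertools import product
from math import comb

# Assumed standard English tile counts; "?" is the blank. Transcribed by hand.
TILES = {"A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
         "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
         "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1, "?": 2}
BAG_SIZE = sum(TILES.values())
BLANKS = TILES["?"]


def ways_with_blanks(word):
    """Count sets of len(word) tiles that can spell word, blanks acting as wildcards."""
    need = Counter(word.upper())
    letters = sorted(need)
    total = 0
    # Try every way of covering some letters (at most BLANKS of them) with blanks.
    for replaced in product(*(range(min(need[l], BLANKS) + 1) for l in letters)):
        blanks_used = sum(replaced)
        if blanks_used > BLANKS:
            continue
        ways = comb(BLANKS, blanks_used)
        for letter, r in zip(letters, replaced):
            ways *= comb(TILES[letter], need[letter] - r)  # real tiles still required
        total += ways
    return total


def spell_probability(word):
    """Chance that len(word) tiles drawn from a full bag can spell word."""
    return ways_with_blanks(word) / comb(BAG_SIZE, len(word))


for word in ("ZEE", "BEEBEES"):
    print(word, ways_with_blanks(word), f"{spell_probability(word):.3e}")
```

Sorting a word list by `spell_probability` would then give a ranking, though whether it matches ZYZZYVA's ranking depends on assumptions I can't confirm from this thread.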
3/26/2009 6:30:43 PM
I'll explain it better when I get outta class, but a combination would say that EEEE = EEEE.

Essentially, what you are saying is that each E is unique. If that is the case, then the probability of pulling an E is 1/100, not 12/100.

Another thing you're saying is that the order of the E's matters, but then you use C(100,7) as your denominator, which doesn't care about order. The denominator, counting all the ways you can draw 7 letters in order, will be

100!/93!

100 choices for the first letter, 99 for the second, and so on.

[Edited on March 26, 2009 at 6:53 PM. Reason : ]
3/26/2009 6:44:37 PM
That's why I started using combinations, because order doesn't matter.
3/26/2009 6:59:56 PM
You can only select 4 E's from 12 one way. It's because they aren't unique.

You get EEEE; you're trying to make the E's unique.

I'll come back with a much longer explanation tomorrow.
3/26/2009 7:07:33 PM
3/26/2009 8:45:34 PM
It's a pretty straightforward probability problem to determine the probability of picking any particular 7 letters:

Let C(n,r) be the binomial coefficient: n! / [r!(n-r)!], or 0 if r > n.

The method is easiest to illustrate by example:

There are C(100,7) ways of selecting 7 tiles at the beginning of the game.

If you want to calculate the probability of, say, "aaeeejt": Count the number of ways you can get this combination:

C(9,2) * C(12,3) * C(1,1) * C(6,1)

There's exactly one binomial coefficient for each distinct letter:

Count how many ways to select 2 of the 9 a's
times
Count how many ways to select 3 of the 12 e's
times
Count how many ways to select 1 of the 1 j's
times
Count how many ways to select 1 of the 6 t's

That's how to get the numerator. Then divide by C(100,7) to get the actual probability.

With this formula, it's more or less straightforward to program a computer to calculate the probability of any 7-letter combination, then order them from most to least likely.
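A minimal Python sketch of that recipe, for anyone who wants to try it. The `TILES` table is my hand transcription of the standard English distribution, the function names are hypothetical, and blanks stay in the bag but are never used as wildcards here, which is a simplification.

```python
from collections import Counter
from math import comb

# Assumed standard English tile counts; "?" is the blank. Transcribed by hand.
TILES = {"A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
         "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
         "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1, "?": 2}


def rack_ways(rack):
    """One binomial coefficient per distinct letter, multiplied together."""
    ways = 1
    for letter, needed in Counter(rack.upper()).items():
        ways *= comb(TILES[letter], needed)  # comb() is 0 if needed > available
    return ways


def rack_probability(rack):
    """Probability that the first len(rack) tiles drawn are exactly these letters."""
    return rack_ways(rack) / comb(sum(TILES.values()), len(rack))


print("aaeeejt:", rack_ways("aaeeejt"), f"{rack_probability('aaeeejt'):.3e}")

# Order a few of the words mentioned in this thread from most to least likely.
words = ["BEEBEES", "FUMULUS", "AILERON", "FILIBEG", "DARIOLE"]
ranked = sorted(words, key=rack_probability, reverse=True)
for word in ranked:
    print(f"{word}: {rack_probability(word):.3e}")
```

With these counts AILERON and DARIOLE come out well ahead of FILIBEG, BEEBEES, and FUMULUS, which fits the study advice in the first post.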
3/31/2009 9:55:32 PM
^Sorry, error there.

First, you use C(100,7); you need a permutation. Your numerator is off too.

I forgot about this thread, I'll write something up in a bit.

[Edited on March 31, 2009 at 10:29 PM. Reason : ]
3/31/2009 10:26:12 PM
You're going to need to tell me why I'm wrong. Just telling me I'm wrong doesn't make it so.

Edited to add: For that matter, I think it's clear that this is a combination problem, not a permutation problem. If you draw a,e,e,e,e,e,e, how's that any different from e,e,a,e,e,e,e? You still have one a and 6 e's--that's all that matters.

[Edited on March 31, 2009 at 10:35 PM. Reason : adding stuff]
3/31/2009 10:33:33 PM
For aaeeejt

You have

(9/100)*(8/99)*(12/98)*(11/97)*(10/96)*(1/95)*(6/94)

That is the probability you draw aaeeejt in that order. However, what if you draw jtaaeee? I mean, you can still play aaeeejt, so we need to factor that in too. There are c(7,3) ways to place the e's, c(4,2) ways to place the a's, and 2 ways to place the j and t:

(9/100)*(8/99)*(12/98)*(11/97)*(10/96)*(1/95)*(6/94)*c(7,3)*c(4,2)*2

That's your probability.

Also notice, the denominator is 100!/93!, a permutation, not a combination.
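The permutation-style calculation here and the C(n,r)-style calculation argued for earlier in the thread can be compared directly. Below is a small Python sketch using exact fractions; the tile table is an assumed standard English distribution typed in by hand, and the function names are invented for this example.

```python
from collections import Counter
from fractions import Fraction
from math import comb, factorial

# Assumed standard English tile counts; "?" is the blank. Transcribed by hand.
TILES = {"A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
         "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
         "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1, "?": 2}
BAG_SIZE = sum(TILES.values())


def ordered_draw(rack):
    """Probability of drawing the letters in exactly the order given."""
    remaining, tiles_left, p = dict(TILES), BAG_SIZE, Fraction(1)
    for letter in rack.upper():
        p *= Fraction(remaining[letter], tiles_left)
        remaining[letter] -= 1
        tiles_left -= 1
    return p


def arrangements(rack):
    """Distinct orderings of the letters, e.g. c(7,3)*c(4,2)*2 for aaeeejt."""
    n = factorial(len(rack))
    for count in Counter(rack.upper()).values():
        n //= factorial(count)
    return n


def permutation_answer(rack):
    return ordered_draw(rack) * arrangements(rack)


def combination_answer(rack):
    ways = 1
    for letter, count in Counter(rack.upper()).items():
        ways *= comb(TILES[letter], count)
    return Fraction(ways, comb(BAG_SIZE, len(rack)))


for rack in ("aaeeejt", "beebees"):
    print(rack, permutation_answer(rack) == combination_answer(rack),
          float(combination_answer(rack)))
```

Both routes give the same fraction: the ordered denominator 100!/93! is 7! times C(100,7), and the ordered numerator picks up the same factor of 7! once every ordering of the physical tiles is counted.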
3/31/2009 10:37:27 PM
3/31/2009 11:33:01 PM
the odds you draw any 1 combination that you are looking for is so low, it isnt worth trying to study the "most" likely 7 letter word combos.
4/1/2009 12:08:19 PM
this takes me back to st311
4/1/2009 1:00:33 PM
4/1/2009 1:43:50 PM
4/1/2009 3:15:41 PM
If you got the same number as me, rock on. I must have messed up calculating one of the numbers. All I know is that mine is right.
4/1/2009 5:03:21 PM
By the way, I was curious how many different seven letter combinations you can get in Scrabble, so I got a CAS to expand the generating function for me:

1 + 27*x + 373*x**2 + 3509*x**3 + 25254*x**4 + 148150*x**5 + 737311*x**6 + 3199724*x**7 + 12353822*x**8 + 43088473*x**9 + 137412392*x**10 + 404600079*x**11 + 1108793943*x**12 + 2847262062*x**13 + 6890404765*x**14 + 15792242064*x**15 + 34425824044*x**16 + 71646518736*x**17 + 142827698985*x**18 + 273533670283*x**19 + 504576050285*x**20 + 898623709228*x**21 + 1548387401915*x**22 + 2586170833356*x**23 + 4194275182613*x**24 + 6615385384601*x**25 + 10161692700549*x**26 + 15221174189579*x**27 + 22259221214607*x**28 + 31813753798288*x**29 + 44482134367066*x**30 + 60898641337468*x**31 + 81701986711369*x**32 + 107493329723951*x**33 + 138786376090493*x**34 + 175952346689553*x**35 + 219163709706077*x**36 + 268341443489446*x**37 + 323111088944227*x**38 + 382772844896252*x**39 + 446290391042394*x**40 + 512301987174498*x**41 + 579155760119564*x**42 + 644969083769945*x**43 + 707709770134396*x**44 + 765294643135632*x**45 + 815699194394498*x**46 + 857070636209692*x**47 + 887835941961195*x**48 + 906796502925404*x**49 + 913201857455724*x**50 + 906796502925404*x**51 + 887835941961195*x**52 + 857070636209692*x**53 + 815699194394498*x**54 + 765294643135632*x**55 + 707709770134396*x**56 + 644969083769945*x**57 + 579155760119564*x**58 + 512301987174498*x**59 + 446290391042394*x**60 + 382772844896252*x**61 + 323111088944227*x**62 + 268341443489446*x**63 + 219163709706077*x**64 + 175952346689553*x**65 + 138786376090493*x**66 + 107493329723951*x**67 + 81701986711369*x**68 + 60898641337468*x**69 + 44482134367066*x**70 + 31813753798288*x**71 + 22259221214607*x**72 + 15221174189579*x**73 + 10161692700549*x**74 + 6615385384601*x**75 + 4194275182613*x**76 + 2586170833356*x**77 + 1548387401915*x**78 + 898623709228*x**79 + 504576050285*x**80 + 273533670283*x**81 + 142827698985*x**82 + 71646518736*x**83 + 34425824044*x**84 + 15792242064*x**85 + 6890404765*x**86 + 2847262062*x**87 + 1108793943*x**88 + 404600079*x**89 + 137412392*x**90 + 43088473*x**91 + 12353822*x**92 + 3199724*x**93 + 737311*x**94 + 148150*x**95 + 25254*x**96 + 3509*x**97 + 373*x**98 + 27*x**99 + x**100

The exponent corresponds to how many tiles you draw, and the corresponding coefficient counts the number of different combinations. So if you draw seven tiles (like in the regular rules) there are 3,199,724 different letter combinations you could possibly get.
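No CAS is strictly needed for this. As a sketch, the generating function is presumably the product over tile types of (1 + x + ... + x^n), where n is how many copies of that tile exist, and plain Python can multiply it out. The tile table below is an assumed standard English distribution including the two blanks, and `multiset_counts` is just a name chosen for this example.

```python
# Assumed standard English tile counts; "?" is the blank. Transcribed by hand.
TILES = {"A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
         "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
         "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1, "?": 2}


def multiset_counts(tiles):
    """Expand the product of (1 + x + ... + x**n) over all tile types.

    coeffs[k] is the number of distinct letter combinations of k tiles.
    """
    coeffs = [1]  # the polynomial "1"
    for count in tiles.values():
        product = [0] * (len(coeffs) + count)
        for i, c in enumerate(coeffs):
            for j in range(count + 1):  # take j copies of this tile type
                product[i + j] += c
        coeffs = product
    return coeffs


coeffs = multiset_counts(TILES)
print(coeffs[7])    # distinct seven-tile combinations
print(coeffs[:4])   # constant term and the first few coefficients
```

The low-order coefficients can be checked by hand: 27 tile types for one tile, and C(27,2) + 22 = 373 for two tiles, since 22 of the 27 types have at least two copies.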
4/1/2009 11:50:00 PM
^Generating functions are really helpful, but you almost always need a computer to expand them.

[Edited on April 2, 2009 at 11:08 AM. Reason : ]
4/2/2009 11:07:41 AM
4/2/2009 7:06:23 PM
Yes, I included the blank tile.
4/2/2009 10:42:40 PM
4/5/2009 12:48:59 AM
What language did they use to code it?
4/5/2009 3:29:35 PM
C++

and

Yay, combinations!
4/6/2009 4:52:15 PM