# Dollar Words with MATLAB

My middle-schoolers are looking for dollar words.

These are words ‘priced’ at exactly \$1, where the price of a word is the sum of the prices of its letters:

• ‘a’ is 1 cent
• ‘b’ is 2 cents
• ‘c’ is 3 cents, etc.

So ‘thirty’ is a dollar word!
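To see why, add up the letter prices for ‘thirty’ (t = 20, h = 8, i = 9, r = 18, t = 20, y = 25). A worked sum, using only the pricing rule stated above:

```latex
\underbrace{20}_{t} + \underbrace{8}_{h} + \underbrace{9}_{i} + \underbrace{18}_{r} + \underbrace{20}_{t} + \underbrace{25}_{y} = 100 \text{ cents}
```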

They have a wall in their classroom where they post dollar words they find.

My son coded a Scratch program for finding the price of a word.

And I thought: one can find lots of dollar words using MATLAB!

To do this, I downloaded a dictionary of English words from GitHub here.

`fid = fopen('words_alpha.txt');`
`words = textscan(fid, '%s');`
`fclose(fid);`
`words = words{1};`
`words = lower(words);`

How many words are there in our dictionary?

`>> length(words)`
`ans =       370101`

We put together a function that calculates the ‘dollar value’ of a word:

`dollarvalue = @(word)(sum(arrayfun(@(letter)(letter-'a'+1), word)));`
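As a cross-check, the same price can be computed without `arrayfun`, since MATLAB subtracts character codes element-wise. This is only a minimal sketch, not the function used in the rest of this post; the name `dollarvalue_vec` is made up here, and it assumes the word contains nothing but the lowercase letters a to z.

```matlab
% Hypothetical vectorized variant (the name is an assumption, not from the post).
% Subtracting 'a' from a char vector gives numbers: 'a' -> 0, 'b' -> 1, ...
dollarvalue_vec = @(word) sum(double(word) - double('a') + 1);

price = dollarvalue_vec('thirty');   % 20 + 8 + 9 + 18 + 20 + 25
disp(price)                          % displays 100
```

Any character outside a to z (a capital letter, a hyphen, an apostrophe) would get a meaningless price, so the word list should be lowercased and letters-only first.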

Let’s visualize the dollar value distribution:

`histogram(cellfun(dollarvalue, words), 1:400)`
`hold on`
`plot([100, 100], [0, 4000], 'LineWidth',3,'LineStyle','--','Color','green')`
`title('Dollar value distribution for 400,000 English words')`
`xlabel('Dollar value')`
`ylabel('Word count')`

How many dollar words is that?

`>> ind = cellfun(@(w)(dollarvalue(w)==100),words);`
`>> dollarwords = words(ind);`
`>> length(dollarwords)`
`ans =        3771`

So about 1% of all words in our dictionary are dollar words!
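The figure comes straight from the two counts printed above:

```latex
\frac{3771}{370101} \approx 0.0102 \approx 1.02\%
```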

Let’s print out a few of the dollar words we found:

`>> dollarwords(1:20)`
`ans =       20×1 cell array`
`{'abactinally' } {'abatements' } {'abbreviatable'} {'abettors' } {'abomasusi' } {'abreption' } {'abrogative' } {'absconders' } {'absinthol' } {'absorbancy' } {'acceptavit' } {'acceptors' } {'acclimation' } {'accounter' } {'accumulate' } {'acenaphthene' } {'achronism' } {'achroous' } {'acylation' } {'acknowledge' }`

You may see the problem: most of these are not words that a middle-school kid can relate to. We could always use a smaller dictionary of common words instead, e.g. one we found here.

`fid = fopen('google-10000-english.txt');`
`commonwords = textscan(fid, '%s');`
`fclose(fid);`
`commonwords = commonwords{1};`
`commonwords = lower(commonwords);`

How many common words are we looking at?

`>> length(commonwords)`
`ans =       10000`

Let’s plot our dollar value distribution again!

`histogram(cellfun(dollarvalue, commonwords), 1:400)`
`hold on`
`plot([100, 100], [0, 140], 'LineWidth',3,'LineStyle','--','Color','green')`
`title('Dollar value distribution for 10,000 common English words')`
`xlabel('Dollar value')`
`ylabel('Word count')`
`ind = cellfun(@(w)(dollarvalue(w)==100),commonwords);`
`commondollarwords = commonwords(ind);`

How many dollar words is that?

`>> length(commondollarwords)`
`ans =       99`

Ha! And again about 1% of all words in our dictionary are dollar words!

We’ll go ahead and list them all:

`>> sort(commondollarwords)`
`ans =       99×1 cell array`
`{'acknowledge'} {'addressing' } {'afghanistan'} {'analysis' } {'annually' } {'applying' } {'appointed' } {'arrivals' } {'asbestos' } {'attitude' } {'automated' } {'boulevard' } {'boundary' } {'browser' } {'colleagues' } {'collecting' } {'companion' } {'congress' } {'courses' } {'culture' } {'delivery' } {'designers' } {'discipline' } {'edmonton' } {'elsewhere' } {'excellent' } {'explains' } {'filtering' } {'fountain' } {'generating' } {'highways' } {'honduras' } {'hospital' } {'identifies' } {'imported' } {'inflation' } {'interfaces' } {'keyboards' } {'lightning' } {'likelihood' } {'maintains' } {'maximize' } {'milwaukee' } {'molecular' } {'motors' } {'outlined' } {'performed' } {'permits' } {'personal' } {'portland' } {'posting' } {'prevent' } {'primary' } {'printer' } {'problems' } {'producer' } {'profiles' } {'publicly' } {'pursue' } {'pussy' } {'quarter' } {'receptor' } {'referring' } {'reprint' } {'researcher' } {'resolved' } {'responded' } {'restore' } {'resumes' } {'roommate' } {'runtime' } {'selective' } {'services' } {'session' } {'sources' } {'standards' } {'status' } {'stress' } {'styles' } {'surely' } {'symantec' } {'syndicate' } {'telephone' } {'telescope' } {'temporal' } {'therefore' } {'thirty' } {'threatened' } {'thumbnail' } {'towards' } {'towers' } {'turkey' } {'twisted' } {'unavailable'} {'variety' } {'wednesday' } {'whenever' } {'wholesale' } {'writing' }`

What’s the distribution of length of our dollar words?

`histogram(cellfun(@length, commondollarwords), 1:20)`
`title('Length distribution for common dollar words')`
`xlabel('Word length')`
`ylabel('Word count')`

So the longest dollar words are…

`longest = find(cellfun(@(x)(length(x)==11), commondollarwords))`
`commondollarwords(longest)`
`ans =       3×1 cell array`
`    'afghanistan'`
`    'unavailable'`
`    'acknowledge'`
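The length 11 was read off the histogram. Here is a small sketch that looks up the maximum length instead of hard-coding it, run on a five-word sample taken from the list of common dollar words. The variable names `sample`, `lens` and `longestwords` are made up for this example.

```matlab
% Five words from the common dollar word list (a sample, not the full list).
sample = {'thirty'; 'afghanistan'; 'turkey'; 'unavailable'; 'acknowledge'};

lens = cellfun(@numel, sample);            % number of letters in each word
longestwords = sample(lens == max(lens));  % keep every word tied for longest

disp(longestwords)
```

On the real data, the same two lines should work with `commondollarwords` in place of `sample`.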

And what about the letter frequencies in dollar words? Look, ma, no ‘j’!

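`letters = @(word)(word - 'a' + 1);  % assumed helper, not shown in the original: maps 'a'..'z' to 1..26`
`c = cellfun(letters, commondollarwords, 'UniformOutput', false)`
`t = tabulate([c{:}])`
`bar(t(:, 2))`
`text(1:26, t(:, 2)+3, char(('a'+[0:25])'))`
`title('Letter frequency distribution for common dollar words')`
`xlabel('Letter')`
`ylabel('Letter Frequency')`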
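`tabulate` ships with the Statistics and Machine Learning Toolbox. If you don't have it, the letter counts can also be sketched with the built-in `accumarray`. The example below runs on a three-word sample rather than the full list, and the names `sample`, `idx`, `counts` and `unused` are made up for illustration.

```matlab
% Three words from the common dollar word list (a sample, not the full list).
sample = {'thirty'; 'turkey'; 'status'};

% Join all words into one char row, then map 'a'..'z' to 1..26.
idx = double([sample{:}]) - double('a') + 1;

% counts(k) = how many times the k-th letter of the alphabet occurs.
counts = accumarray(idx(:), 1, [26 1]);

% Letters that never occur in the sample.
unused = char(find(counts == 0)' + 'a' - 1);

fprintf('t occurs %d times\n', counts(20));   % t is the 20th letter
fprintf('j occurs %d times\n', counts(10));   % j is the 10th letter
```

Passing the size `[26 1]` makes sure `counts` always has 26 entries, even if ‘z’ never shows up.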

(You may be wondering: did I tell my son about this code? Not yet. I’ll talk to him after the dollar word rush is over.)