Dollar Words with MATLAB

My middle-schoolers are looking for dollar words.

These are words that are ‘priced’ to cost exactly $1, using the sum of the prices of their letters, where:

  • ‘a’ is 1 cent
  • ‘b’ is 2 cents
  • ‘c’ is 3 cents, and so on, up to ‘z’ at 26 cents.

So ‘thirty’ is a dollar word!
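In symbols, let $p(\ell)$ be the position of letter $\ell$ in the alphabet ($p(a) = 1$, $p(b) = 2$, …, $p(z) = 26$). A word $w = \ell_1 \ell_2 \cdots \ell_n$ then costs the sum of its letter positions in cents, and it is a dollar word exactly when that sum is 100. The check for ‘thirty’:

```latex
\text{price}(w) = \sum_{k=1}^{n} p(\ell_k) \ \text{cents}, \qquad w \text{ is a dollar word} \iff \text{price}(w) = 100

\text{price}(\text{thirty}) = p(t) + p(h) + p(i) + p(r) + p(t) + p(y) = 20 + 8 + 9 + 18 + 20 + 25 = 100
```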

They have a wall in their classroom where they post dollar words they find.

My son coded a Scratch program for finding the price of a word.

And I thought: we can find lots of dollar words using MATLAB!

To do this, I downloaded a dictionary of English words from GitHub here.

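fid = fopen('words_alpha.txt');
assert(fid > 0, 'Could not open words_alpha.txt')
words = textscan(fid, '%s');   % returns a 1x1 cell holding an N-by-1 cell array of words
fclose(fid);
words = words{1};              % unwrap to the N-by-1 cell array of char vectors
words = lower(words);          % letter prices assume lowercase a-z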
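If you'd rather not manage file IDs, here is a minimal alternative sketch. It is not the code used for this post. It reads the whole file with fileread and splits it on whitespace with strsplit. The name loadwords is our own, and the small temporary file stands in for words_alpha.txt. This approach reads the entire file into memory at once, which is fine for a plain word list like this one.

```matlab
% Hypothetical one-line loader: read the file, trim it, split on any whitespace,
% and return an N-by-1 cell array of lowercase words.
loadwords = @(filename) lower(strsplit(strtrim(fileread(filename)))');

% Try it on a small temporary file (a stand-in for words_alpha.txt).
tmp = [tempname '.txt'];
fid = fopen(tmp, 'w');
fprintf(fid, 'Thirty\nturkey\r\nstatus\n');   % mixed line endings on purpose
fclose(fid);

sample = loadwords(tmp)   % {'thirty'; 'turkey'; 'status'}
delete(tmp);
```

strsplit treats any run of spaces, tabs, or line-break characters as a single delimiter by default, so Windows and Unix line endings both work.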

How many words are there in our dictionary?

>> length(words)
ans = 
370101

We put together a function that calculates the ‘dollar value’ of a word:

% Price of a word in cents: 'a' -> 1, 'b' -> 2, ..., 'z' -> 26 (expects lowercase a-z)
dollarvalue = @(word)(sum(arrayfun(@(letter)(letter-'a'+1), word)));
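Here are a few quick checks on dollarvalue, plus a vectorized variant. MATLAB does arithmetic on char arrays element by element, so `word - 'a' + 1` already gives the price of every letter at once, and arrayfun is not strictly needed. dollarvalue2 is our own name for that variant; it is not part of the original code. The 'Thirty' line shows why the word lists are lowercased first: an uppercase ‘T’ is priced at -12 cents instead of 20.

```matlab
% Original definition, repeated so this snippet runs on its own.
dollarvalue = @(word)(sum(arrayfun(@(letter)(letter-'a'+1), word)));

% Vectorized variant (hypothetical name): char minus char gives a double array.
dollarvalue2 = @(word) sum(word - 'a' + 1);

dollarvalue('a')         % 1
dollarvalue('z')         % 26
dollarvalue('thirty')    % 100
dollarvalue2('thirty')   % 100, the same result without arrayfun
dollarvalue('Thirty')    % 68, because 'T' - 'a' + 1 = 84 - 97 + 1 = -12
```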

Let’s visualize the dollar value distribution:

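figure                                            % new figure, so later plots don't draw on top of this one
histogram(cellfun(dollarvalue, words), 1:400)     % one bin per cent value
hold on
plot([100, 100], [0, 4000], 'LineWidth',3,'LineStyle','--','Color','green')   % mark $1 = 100 cents
hold off
title('Dollar value distribution for 370,101 English words')
xlabel('Dollar value (cents)')
ylabel('Word count')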

How many dollar words are there?

>> ind = cellfun(@(w)(dollarvalue(w)==100),words);
>> dollarwords = words(ind);
>> length(dollarwords)
ans = 
3771
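The histogram and the count above each run dollarvalue over all 370,101 words. Here is a sketch that does that work once and reuses it. It stores every price in a numeric vector, then selects the dollar words with a logical mask. The names samplewords, values, isdollar, and sampledollarwords are our own, and the short sample list stands in for words.

```matlab
dollarvalue = @(word)(sum(arrayfun(@(letter)(letter-'a'+1), word)));

% Small stand-in for the full word list.
samplewords = {'thirty'; 'turkey'; 'apple'; 'status'; 'zebra'};

values = cellfun(dollarvalue, samplewords);   % price of every word, computed once
isdollar = (values == 100);                   % logical mask of dollar words

sampledollarwords = samplewords(isdollar)     % {'thirty'; 'turkey'; 'status'}
nnz(isdollar)                                 % 3
```

With the real list, the same values vector could also feed histogram(values, 1:400) directly, so the prices are only computed once.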

So about 1% of all words in our dictionary are dollar words!

Let’s print out a few of the dollar words we found:

>> dollarwords(1:20)
ans = 
20×1 cell array
    {'abactinally'  }
    {'abatements'   }
    {'abbreviatable'}
    {'abettors'     }
    {'abomasusi'    }
    {'abreption'    }
    {'abrogative'   }
    {'absconders'   }
    {'absinthol'    }
    {'absorbancy'   }
    {'acceptavit'   }
    {'acceptors'    }
    {'acclimation'  }
    {'accounter'    }
    {'accumulate'   }
    {'acenaphthene' }
    {'achronism'    }
    {'achroous'     }
    {'acylation'    }
    {'acknowledge'  }

You may see the problem: most of these are not words that a middle-school kid can relate to. We could always use a smaller dictionary of common words instead, e.g. one we found here.

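fid = fopen('google-10000-english.txt');
assert(fid > 0, 'Could not open google-10000-english.txt')
commonwords = textscan(fid, '%s');   % same loading steps as for the big list
fclose(fid);
commonwords = commonwords{1};
commonwords = lower(commonwords);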

How many common words are we looking at?

>> length(commonwords)
ans = 10000

Let’s plot our dollar value distribution again!

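figure                                                 % new figure, so this plot doesn't overlay the first one
histogram(cellfun(dollarvalue, commonwords), 1:400)    % one bin per cent value
hold on
plot([100, 100], [0, 140], 'LineWidth',3,'LineStyle','--','Color','green')   % mark $1 = 100 cents
hold off
title('Dollar value distribution for 10,000 common English words')
xlabel('Dollar value (cents)')
ylabel('Word count')
% Keep only the common words priced at exactly 100 cents
ind = cellfun(@(w)(dollarvalue(w)==100),commonwords);
commondollarwords = commonwords(ind);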

How many common dollar words are there?

>> length(commondollarwords)
ans = 
99

Ha! Again, about 1% of the words in this list are dollar words!
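Here are the two ratios side by side, for the full dictionary and the common-word list:

```latex
\frac{3771}{370101} \approx 0.0102 \approx 1.02\%, \qquad \frac{99}{10000} = 0.0099 = 0.99\%
```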

We’ll go ahead and list them all:

>> sort(commondollarwords)
ans = 
99×1 cell array
    {'acknowledge'}
    {'addressing' }
    {'afghanistan'}
    {'analysis'   }
    {'annually'   }
    {'applying'   }
    {'appointed'  }
    {'arrivals'   }
    {'asbestos'   }
    {'attitude'   }
    {'automated'  }
    {'boulevard'  }
    {'boundary'   }
    {'browser'    }
    {'colleagues' }
    {'collecting' }
    {'companion'  }
    {'congress'   }
    {'courses'    }
    {'culture'    }
    {'delivery'   }
    {'designers'  }
    {'discipline' }
    {'edmonton'   }
    {'elsewhere'  }
    {'excellent'  }
    {'explains'   }
    {'filtering'  }
    {'fountain'   }
    {'generating' }
    {'highways'   }
    {'honduras'   }
    {'hospital'   }
    {'identifies' }
    {'imported'   }
    {'inflation'  }
    {'interfaces' }
    {'keyboards'  }
    {'lightning'  }
    {'likelihood' }
    {'maintains'  }
    {'maximize'   }
    {'milwaukee'  }
    {'molecular'  }
    {'motors'     }
    {'outlined'   }
    {'performed'  }
    {'permits'    }
    {'personal'   }
    {'portland'   }
    {'posting'    }
    {'prevent'    }
    {'primary'    }
    {'printer'    }
    {'problems'   }
    {'producer'   }
    {'profiles'   }
    {'publicly'   }
    {'pursue'     }
    {'pussy'      }
    {'quarter'    }
    {'receptor'   }
    {'referring'  }
    {'reprint'    }
    {'researcher' }
    {'resolved'   }
    {'responded'  }
    {'restore'    }
    {'resumes'    }
    {'roommate'   }
    {'runtime'    }
    {'selective'  }
    {'services'   }
    {'session'    }
    {'sources'    }
    {'standards'  }
    {'status'     }
    {'stress'     }
    {'styles'     }
    {'surely'     }
    {'symantec'   }
    {'syndicate'  }
    {'telephone'  }
    {'telescope'  }
    {'temporal'   }
    {'therefore'  }
    {'thirty'     }
    {'threatened' }
    {'thumbnail'  }
    {'towards'    }
    {'towers'     }
    {'turkey'     }
    {'twisted'    }
    {'unavailable'}
    {'variety'    }
    {'wednesday'  }
    {'whenever'   }
    {'wholesale'  }
    {'writing'    }

What’s the length distribution of our dollar words?

figure
histogram(cellfun(@length, commondollarwords), 1:20)   % one bin per word length
title('Length distribution for common dollar words')
xlabel('Word length')
ylabel('Word count')

So the longest dollar words are…

longest = find(cellfun(@(x)(length(x)==11), commondollarwords));   % 11 = longest length in the histogram
commondollarwords(longest)
ans =
3×1 cell array
    'afghanistan'
    'unavailable'
    'acknowledge'
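The 11 above was read off the length histogram. Here is a sketch that finds the longest words directly, without looking at the plot first. The names samplewords, lens, and longestwords are our own, and the sample list stands in for commondollarwords.

```matlab
% Small stand-in for commondollarwords.
samplewords = {'thirty'; 'afghanistan'; 'turkey'; 'unavailable'; 'acknowledge'};

lens = cellfun(@length, samplewords);           % length of each word
longestwords = samplewords(lens == max(lens))   % every word tied for the maximum length
```

Logical indexing keeps the words in their original order, and ties are handled automatically.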

And what about the letter frequencies in dollar words? Look, ma, no ‘j’!

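letters = @(word)(word - 'a' + 1);   % map each letter to its alphabet position: 'a' -> 1, ..., 'z' -> 26
c = cellfun(letters, commondollarwords, 'UniformOutput', false);
t = tabulate([c{:}]);   % rows of [letter index, count, percent]; needs the Statistics and Machine Learning Toolbox
figure
bar(t(:, 2))
text(1:26, t(:, 2)+3, char(('a'+[0:25])'))   % label each bar with its letter
title('Letter frequency distribution for common dollar words')
xlabel('Letter')
ylabel('Letter Frequency')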
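tabulate is part of the Statistics and Machine Learning Toolbox. If you don't have it, here is a toolbox-free sketch using accumarray. Passing the size [26 1] guarantees exactly one count per letter, including zeros for letters like ‘j’ that never appear. The names samplewords, idx, and counts are our own, and the two-word list stands in for commondollarwords.

```matlab
% Small stand-in for commondollarwords.
samplewords = {'thirty'; 'quarter'};

idx = [samplewords{:}] - 'a' + 1;          % every letter as 1..26, in one long row
counts = accumarray(idx(:), 1, [26 1]);    % counts(k) = number of times letter k appears

counts('t' - 'a' + 1)   % 3: two in 'thirty', one in 'quarter'
counts('j' - 'a' + 1)   % 0: no 'j' in either word
```

The counts vector can then go straight into bar(counts) in place of t(:, 2).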

(You may be wondering: did I show this code to my son? Not yet. I'll talk to him about it after the dollar-word rush is over.)

You can download the MATLAB code here.

Would love to hear about a problem you are working on!

Like what you read? Give Anoush Najarian a round of applause.
