Project Euler #106 — Subset Sums Meta Testing

Andy Zou
Published in Andy’s Bytes
Aug 13, 2023

Alright, here we go, the final boss of this subset sums series of problems:
https://www.hackerrank.com/contests/projecteuler/challenges/euler106/problem?isFullScreen=true

Let’s go over the example first. Assuming {a,b,c,d} is in ascending order, here are the 25 disjoint subset pairs (writing, e.g., {a,bc} for the pair ({a},{b,c})):

{a,b},{a,c},{a,d},{a,bc},{a,bd},{a,cd},{a,bcd},{b,c},{b,d},{b,ac},{b,ad},{b,cd},{b,acd},{c,d},{c,ab},{c,ad},{c,bd},{c,abd},{d,ab},{d,ac},{d,bc},{d,abc}, {ab,cd},{ac,bd},{ad,bc}

So, there should be a way to logic out which pairings need to be checked for equality.

First of all, we can eliminate every pair whose subsets differ in size, since only equal-size pairs need an equality test. That leaves just the singles and the doubles.

Then, we can eliminate all the singles, since they’re definitely different.

So, it comes down to the three pairs: ({ab},{cd}), ({ac},{bd}), and ({ad},{bc}).

Now, knowing only that the numbers are strictly increasing, we can see that (ab,cd) cannot be equal, since it pits the two smaller numbers against the two larger numbers:

If a < b < c < d, then a < c and b < d, meaning a + b < c + d. We can rule (ab,cd) out.

Then, a < b and c < d, so a + c must be less than b + d. We can rule (ac,bd) out.

Lastly, a < b but d > c! So, a + d ? b + c. The lone ambiguity is (ad,bc).
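The elimination above can be checked concretely. This sketch enumerates the three 2-vs-2 splits of a sorted 4-set; the values are illustrative (chosen so that a + d happens to equal b + c), and only the ({a,d},{b,c}) split can ever tie:

```python
from itertools import combinations

# Illustrative values with a < b < c < d, chosen so that a + d == b + c.
a, b, c, d = 1, 2, 4, 5
nums = [a, b, c, d]

splits = []
for left in combinations(nums, 2):
    right = tuple(x for x in nums if x not in left)
    if left < right:  # list each split only once
        splits.append((left, right, sum(left) == sum(right)))

for s in splits:
    print(s)  # only the (1, 5) vs (2, 4) split ties
```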

Boy, this will be a doozy to suss out. So far it doesn’t seem to have much in common with the exact rules of the previous special-subset problems, but in terms of optimizing it should be similar: you only need to evaluate specific paired terms. It’s a matter of identifying those paired terms.

What it does have in common, though, is efficiently creating these unique set pairs. That’s probably where the evaluation(s) will be; how can I generalize what I just did, and in less than O(n²) time?

Since 1000000 choose 2 is explosively high, it’s clear that there’s got to be a way to skip all of the trivial cases and count more directly: no generating lists or sets, just some way of mathing it out.

But this makes sense; say we had the set {a,b,c…z}. There’s no need to compare any other 2-set against {y,z}; all other 2-sets will be smaller. And any other 2-set will be bigger than {a,b}. And perhaps there are other intuitions about the second rule that might help? Knowing that a + b > z, a + b + c > y + z, etc. So, for a set-pair to be ambiguous, they must “overlap” in terms of where they pull from in the set. {a,z} vs. {b,y} is ambiguous.

But, also considering {a,y} < {b,z}, since both terms on the left are less than a respective term on the right, that’s not ambiguous. Maybe that’s it; can every element in one set be matched in the same inequality with an element from the other? Might need to build out two 3-sets to compare…

{a,b,c} < {d,e,f}, {a,b,d} < {c,e,f}, {a,b,f} ? {c,d,e}
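That third comparison really is ambiguous: with one choice of increasing values the left side is smaller, and with another it is larger. The values here are illustrative, not from the problem:

```python
# Two strictly increasing instantiations of a..f for {a,b,f} vs {c,d,e}.
for a, b, c, d, e, f in [(1, 2, 3, 4, 5, 6), (1, 2, 10, 11, 12, 100)]:
    left, right = a + b + f, c + d + e
    print(left, '<' if left < right else '>', right)  # flips direction
```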

At this point it might be worth building out something that can actually generate the sets, and then test them:

I think I’ll need to see the combinations before I can really see any patterns…

from itertools import combinations

def buildPairSetsBrute(array, setSize):
    """
    Prints the sorted, disjoint, same-size subset pairs from an array.
    """
    combos = list(combinations(array, setSize))
    pairs = set()
    for i in range(len(combos)):
        for j in range(i + 1, len(combos)):
            combo1, combo2 = combos[i], combos[j]
            if not set(combo1) & set(combo2):  # keep only disjoint pairs
                pairs.add((combo1, combo2))
    pairs = sorted(pairs)
    for pair in enumerate(pairs, 1):
        print(pair)

After my simple implementation, I was able to clearly see the pattern: a pair needs to be tested if its elements cannot be matched one-to-one so that every element of one subset is less than its counterpart in the other.

So, building the sets in ascending order is helpful; they will already be sorted.

def consistent(setPair):
    """
    Checks whether every element of setPair[0] is pair-wise less
    than the corresponding element of setPair[1].
    """
    return all(x < y for x, y in zip(setPair[0], setPair[1]))

Then, it should be about summing the inconsistent sets directly instead of building all the set-pairs and checking them; there should be some mathematical intuitions about this! For example, {a,z} can be ambiguously paired with {b,c},{b,d}…{b,y}. But also {c,d},{c,e}…{c,y}. The combinatorics may come in handy again! Or, at least, some summation formulas.
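Before hunting for closed forms, a brute-force counter built on the same element-wise test makes a handy ground truth (a sketch; `count_ambiguous` is a hypothetical name). The original Project Euler statement quotes 1 pair needing testing for n = 4 and 70 for n = 7:

```python
from itertools import combinations

def count_ambiguous(n):
    """Brute-force: count disjoint equal-size subset pairs of range(n)
    that cannot be ordered element-wise, i.e. that need a sum test."""
    total = 0
    for k in range(2, n // 2 + 1):  # single-element pairs are never ambiguous
        combos = list(combinations(range(n), k))
        for i, c1 in enumerate(combos):
            for c2 in combos[i + 1:]:
                if set(c1) & set(c2):
                    continue  # skip overlapping subsets
                dominated = (all(x < y for x, y in zip(c1, c2))
                             or all(x > y for x, y in zip(c1, c2)))
                if not dominated:
                    total += 1
    return total

print(count_ambiguous(4), count_ambiguous(7))  # 1 70, per the PE #106 statement
```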

The rest of the implementation can be an exercise for the reader; build out some ways to see what you need to see for the patterns!

"Some ambiguous 2-subset-pairs for {a...z}"
(1, (('a', 'd'), ('b', 'c')))
(2, (('a', 'e'), ('b', 'c')))
(3, (('a', 'e'), ('b', 'd')))
(4, (('a', 'e'), ('c', 'd')))
(5, (('a', 'f'), ('b', 'c')))
(6, (('a', 'f'), ('b', 'd')))
(7, (('a', 'f'), ('b', 'e')))
(8, (('a', 'f'), ('c', 'd')))
(9, (('a', 'f'), ('c', 'e')))
(10, (('a', 'f'), ('d', 'e')))
...

After toying around with running sums and different ways of categorizing and counting, I found some polynomial patterns in the summations. The polynomial solution may involve several layers of polynomial summing! Get your algebra hats on! I’m coming across binomial expansions, Pascal’s Triangle…
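One concrete pattern visible in the listing above: fixing the first subset as {a, X}, a partner {p, q} is ambiguous exactly when both its elements lie strictly between a and X, so the counts grow as triangular numbers (1, 3, 6, …). A small sketch, with illustrative names:

```python
from itertools import combinations

def ambiguous(s1, s2):
    """True when neither sorted subset dominates the other element-wise."""
    return not (all(x < y for x, y in zip(s1, s2))
                or all(x > y for x, y in zip(s1, s2)))

letters = 'abcdef'
counts = []
for X in 'cdef':  # largest element of the first subset {a, X}
    others = [ch for ch in letters if ch not in ('a', X)]
    n = sum(1 for pq in combinations(others, 2) if ambiguous(('a', X), pq))
    counts.append(n)
print(counts)  # triangular numbers: [0, 1, 3, 6]
```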

I was able to distill several loops into things as simple as nCr. I didn’t know that there were so many fun properties of Pascal’s triangle in the summation of figurate numbers. There are two places so far where I’ve been able to find the sum I was looking for without iterating (too much). While this took my highest passing test cases from 15 to 1,500 to 10,000, it still wasn’t enough to crack n = 1,000,000. I think I’ll still need some better optimizations. But, since I’ve distilled it to a sum of the products of two combinations, there might be some interesting factorial simplifications that can massively reduce my redundant calculations.
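For reference (and not necessarily the route taken here), these counts also admit a known closed form via Catalan numbers: among the C(2k, k)/2 disjoint equal-size pairings of a fixed 2k-element block, exactly C(2k, k)/(k + 1) are element-wise dominated, and the rest are ambiguous. A sketch that matches the figures quoted in the original Project Euler statement:

```python
from math import comb

def ambiguous_pairs(n):
    """Count equal-size disjoint subset pairs of an n-element set that
    need an equality test, using the Catalan-number closed form."""
    total = 0
    for k in range(2, n // 2 + 1):
        catalan = comb(2 * k, k) // (k + 1)         # dominated pairings
        per_block = comb(2 * k, k) // 2 - catalan   # ambiguous pairings of one 2k-block
        total += comb(n, 2 * k) * per_block         # choose which 2k elements
    return total

print(ambiguous_pairs(4), ambiguous_pairs(7), ambiguous_pairs(12))  # 1 70 21384
```

The n = 12 value, 21384, is the published answer to the original Project Euler #106, which is a useful sanity check for any faster method.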

After mulling it over for a few days, I managed to keep optimizing until I had it pretty much distilled down to that sum of products of combinations, at which point I was completely stuck.

It works; it’s a correct algorithm, but it still isn’t capable of handling T = 1,000,000, let alone fit for memoizing across 30 test cases. For that, I finally went to Discord and StackExchange, where folks pointed me to the OEIS page for this exact problem, where a recurrence relation is described. I hate having the answer handed to me that way, but I have to admit I have no idea how to derive it a different way. Folks mentioned “hypergeometric functions” and “fast Fourier transforms for convolutions of two sequences,” and it seems the answer is truly, actually beyond me for now, which is disappointing. It’s a great problem, with lots of steps of analysis and refinement, but the constraints of the HackerRank version are, frankly, a little un-fun. Perhaps you can come up with something more clever: https://en.wikipedia.org/wiki/Partition_problem
