Subset Sum DP -further optimizations

Ashwanth K
Spider R&D
Published in
3 min readSep 18, 2023

Prerequisites: DP, Standard subset sum dp, maths
Topic difficulty: Hard

It is recommended to have a good understanding of the subset sum dp problem before reading this blog.

General Problem:
Given an array of size N, and a target_sum. You need to determine whether any subset of this array has its sum = target_sum.

Constraints:
1≤ N ≤ 2000
1 ≤ target_sum ≤ N
0 ≤ Array[i] ≤ N

The above problem is called the standard subset sum problem which is usually solved using an O(N²) dp approach like dp[index][target_sum].

However, a variation of this problem can be further optimized to O(N*sqrt(N)*logN) using some observations. This variant problem and its interesting solution is what we will see in our blog.

Consider a variant of this subset sum problem where you were asked to find whether the target_sum is achievable or not using some subset of elements.

But additional constraints:
1 ≤ N ≤ 2*10⁴
sum of all array elements ≤ 2*10⁴

In our variant problem, we are provided with additional information that my input array sum ≤ N. Can we use this to optimize our algorithm …..

Yes, we can use this additional piece of information to improve our complexity. Let’s see how can this be done.

Observation 1:
If a positive integer N is represented as a sum of positive integers, such a sum always contains at most O(sqrt(n)) distinct numbers.

  • The reason for this is that to construct a sum that contains a maximum number of distinct numbers, we should choose small numbers.
  • If we choose the numbers 1,2,…,k, the resulting sum is
    k(k +1)/2 = N (total_sum).
  • Thus, the maximum amount of distinct numbers is k = O(sqrt(n))

Conclusion:
My input array always contains O(sqrt(N)) distinct numbers. This observation gives us some hope that we can somehow compress our array into smaller sizes and perform our standard dp algorithm to achieve a good complexity.

Observation 2:
If a positive integer is repeated K times in the input array, we can compress it into just log(K) elements while retaining all possible subset sum information.

Example:

  • Lets say I have number x repeated 10 times : {x,x,x,x,x,x,x,x,x,x}
    10 = (1 + 2 + 4) + 3
  • Try to express 10 as the sum of continuous powers of 2.
  • Since 1+2+4+8 = 15 >10, we just express it as 1+2+4+(remaining) 3

{x,x,x,x,x,x,x,x,x,x} this array can generate subset sums: {x , 2x , … , 10x}
Even our compressed array: {x, 2x, 4x, 3x} can also express the same set of sums as above.

  • Similarly {x,x,x,… 15 times } can be compressed as {x,2x,4x,8x} where all my subset sums are retained.

Conclusion:
My input array has O(sqrt(N)) distinct elements. Each distinct element with frequency K can be compressed into log(K) elements. So the compressed version of my array contains only O(sqrt(N)log(N)) elements.

Running our standard dp on O(sqrt(N)log(N)) elements with target sum ≤ N takes O(Nsqrt(N)log(N)) time complexity.

Code snippet for compressing array:

Problem for practice (Hard):

Summary:

In Competitive coding, It is always good to use up all information provided to you in the problem statements, From the above content we can see that a small additional information has improved the time complexity of our program.

This article is published as a part of the ‘Algo Reads’ under Spider Research and Development Club, NIT Trichy.

Resource:

  • CSES Book chapter 27

--

--