Count number of coprimes to all numbers in array (in Python)
The code from the article is available on github:
https://github.com/AndriyLazorenko/medium_blog/blob/master/Coprimes.ipynb
Part 1. Problem setup
During one of tech interviews I was asked to solve the following challenge on HackerRank (nice platform, auto-completion IDE and auto-tests included):
Given an array of numbers, e.g. [5, 8, 14] , return an array of numbers representing counts of coprimes to given numbers.
E.g. for 5, its coprimes are:
1, 2, 3, 4
A simple test for coprime is if greatest common divisor (gcd) is equal to one.
import math
a = math.gcd(5,1)
# a = 1
So, for an input array of [5,8,14] we expect an output array of [4,4,6]. Let’s go step by step.
Coprimes of 5: 1,2,3,4, so counting them all we arrive at 4
Coprimes of 8: 1,3,5,7, so counting them all we arrive at 4
Coprimes of 14: 1,3,5,9,11,13, so counting them all we arrive at 6
Returning counts of coprimes for every number in input array we receive output array of [4,4,6]
Part 2. Solution, first attempt
My first attempt at a solution successfully passed 5/7 tests, failing last tests on pre-defined timeout, which points out to complexity issues (will be discussed below).
First, I decided to use implementation of gcd taken from math lib. I decided to iterate over all elements in range of given number and look for coprimes with the following test math.gcd(a,b) == 1
, collecting the numbers passing comprimes test in array and counting them with sum()
def coprimes(a):
return sum([math.gcd(a,el) == 1 for el in range(a)])
Having implemented coprimes count for a single number, it’s easy to scale it to an array using listcomp as follows:
def cnt_coprimes_in_array(arr):
return [coprimes(el) for el in arr]
However, the solution didn’t pass several speed tests. Let’s try to understand why.
Part 3. Solution (speeding up the algorithm)
So, we have two list comprehensions, which can be interpreted as two loops. The outer loop iterates over different input numbers in the array, while the inner loop iterates over all the integers up to a given one. For large integers the inner loop can be quite slow (even omitting the fact that gcd(a,b) is sort of a loop as well). So, let’s try to speed up the inner loop.
After some digging on the internet one can find a generalized formula of number of coprimes of a number.
If we factorize a number N into prime numbers as follows:
N = a^x*b^y*c^z*..., where a,b,c are primes
we can express the number of coprimes of N as follows:
coprimes = N * (1 - 1/a) * (1 - 1/b) * (1 - 1/c) * ...
So, let’s implement new method coprimes_2(a)
and time both methods on a test case.
Constructing a test case
ten_thousand = int(1E4)
a =[np.random.randint(ten_thousand) for i in range(ten_thousand)]
Now, a new method:
import numpy as np
from sympy import primefactorsdef coprimes_2(a):
p_f = primefactors(a)
ar = [(1-1/el) for el in p_f]
num_coprimes = int(np.prod(ar) * a)
return num_coprimes
We can test that the method yields the same result on a number 1504 as our working method:
assert coprimes_2(1504) == coprimes(1504)
We are satisfied. Let’s construct a method processing entire array with our new ‘inner loop’ method:
def cnt_coprimes_in_array_2(arr):
return [coprimes_2(el) for el in arr]
Let’s time both methods using jupyter notebook magic:
%%timeit
cnt_coprimes_in_array(a)
# 13.2 s ± 448 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
and the second one:
%%timeit
cnt_coprimes_in_array_2(a)
# 143 ms ± 22.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
We can clearly see a winner here. If only I was that smart while solving tech interview challenge :)
P.S.: this story is a part of my series on algorithmic challenges. Check out other cool algorithms decomposed with tests and jupyter notebooks!