Software Engineer Interview Questions to Know (Entry Level) — Primes & Arrays

Andrew Oliver
The Startup
Published in
9 min readJun 14, 2019

If you’ve just stumbled onto this series, make sure to check out the introduction here.

Question 1: Generate a list of all prime numbers less than or equal to some integer n.

Variants

  • Generate a list of all prime numbers between two integers a and b, including a and b.
  • For a given integer n, let π(n) equal the number of primes that are less than n. Write a method that determines π(n).

What They’re Actually Asking

  • Many solutions to this problem require multiple methods. It can be helpful to organize your approach into methods before developing. Do you plan your code before you begin writing? Or do you jump right in with no plan?
  • This is not a common Computer Science topic. Can you take a novel problem, plan a solution, and write it?
  • Can you optimize your code if your first solution is suboptimal?

Tools for the Answer

  • A prime number is an integer greater than one with no positive divisors besides one and itself. For example, 17 is prime since the only factors of 17 are 17 and 1.
  • To check if a number n is prime, we could iterate from 2 to (n-1) and see if (n % k) is equal to 0. The % (modulus operator) finds the remainder after division. If this remainder is 0, we have found a factor of n and therefore n is not prime. For example, 14 % 2 = 0 since 14 = 2•7 with remainder 0.
  • But we don’t need to check all the way to (n-1). We know factors come in pairs. For example, 36 nontrivially factors as 2•18, 3•12, 4•9, and 6•6. We don’t need to find 18, 12, or 9 since we’ve already effectively found them with 2, 3, and 4. We can iterate from 2 to (√n). This is a huge efficient improvement, especially for large n.
Visualization of factorization of composite numbers
  • However, there are many other primality tests. For those interested, check out the Miller-Rabin Primality Test. We won’t worry go into this. However, we can use external libraries for primality testing. Java has an isProbablePrime() method in its BigInteger Class. Python has an external library SymPy, which has an isprime() method as well.
  • The last tool we need comes from Eratosthenes of Cyrene (276 BCE). It is known as the Sieve of the Eratosthenes.
  • We begin with a Boolean array of size n with all values initialized as true. We start with the first prime number 2 and we mark every multiple of 2 (i.e. 4, 6, 8, 10, …) as false, denoting that these numbers are composite. We then move to our next non-false value, which is 3 and we mark all multiples of 3 (9, 15, 21, …) as false. We’ve already marked 6, 12, and 18 as composite since they have 2 as a factor. We continue this process for all non-false values (i.e. 5, 7, 11, ..). We do this up to √n, for the same reasons above. Any remaining true values in our array are prime numbers. The indices of these true values are prime numbers.
Sieve of Erastothenes (Image source here)

My Answer

My first naive solution would be to define two methods: is_prime(n) and generate_primes(n). I would iterate from 2 to (n-1) and check if each integer is a prime number by using the modulus operator.

Solution Version 1. Code source here.

However, we can certainly make some improvements to the above. In our is_prime() method, we only need to iterate from 2 to √n. Any iterations after the square root of n are redundant. Additionally, we can use a common Python external library, SymPy. This has a built-in isprime() method. It uses a few primality tests, including the Miller-Rabin Primality test and the Baillie-PSW primality test. These tests run more quickly than either of our current methods.

An additional solution uses the Sieve of Eratosthenes. We begin with a Boolean array of size n. Every time we encounter a prime number, we remove all the multiples of this number from our array. The indices of true values in the resulting array will be prime numbers. The implementation of this is shown below.

Implementation of the Sieve of Eratosthenes. Code source here.

Closing Note

This iterative development method is very helpful in interviews. Getting a working solution is most important, even if it is suboptimal. From here, identifying bottlenecks and areas for improvements can often lead to an optimal solution. In this example, we did change our approach for the optimal solution. However, that is because there existed a specific algorithm for this problem.

Additionally, our improvements do drastically improve runtime. Suppose we want to generate all primes less than 1 million. Our first naive solution takes nearly 55 minutes. Using √n extremely reduces our runtime to 9.77 seconds. Using SymPy’s isprime() method gets us down to 6.07 seconds. But the Sieve of Eratosthenes is much more efficient, taking only 0.33 seconds.

Question 2: What is the difference between an Array, an ArrayList, and a Vector?

Variants

  • What is the difference between an Array and an Array List? What are the tradeoffs between an Array and an Array List?
  • Why is the runtime complexity of insertion into an ArrayList? What is amortized time?
  • What is synchronization? What is thread safety and why is it important?

What They’re Asking

  • Do you understand various data structures?
  • Can you explain technical concepts? Can you explain tradeoffs between data structures?
  • Do you understand runtime complexity? Do you understand why arrays allow constant time access to a specific element?
  • Do you have an understanding of how programs execute code? Can you identify when and why thread safety would matter?

Tools for Our Answer

  • Arrays, ArrayLists, and Vectors are abstract data types. An abstract data type is a class of objects whose behavior is defined by a structure, set of values, and operations.
  • An array is an ordered collection of objects (strings, integers, etc.) stored at contiguous locations in memory and indexed by integers. Two objects are contiguous if they are adjacent to each other.
Memory Locations of Elements in an Integer Array in Python
  • In the above code, we have an arbitrary integer array. We see the second memory address is the first memory address + 32. Integers are stored with 32 bits in Python. We see the memory address of the last element is equal to the memory address of the first element + 96. This is because there are 3 32-bit integers “between” the first and last element.
  • This is exactly why we can look up elements in an Array in constant time. If I know where the first element is in an array, I can find the 15th element by adding 15*(size of an element) to the first address. This is also why arrays can only store one type of element. Integers can be different sizes than Strings so our multiplication would not work.
The array from above with each element’s address in memory
  • When an Array is created it has a fixed length that cannot be changed. Arrays can store both primitive data types (int, float, double, etc.) and Objects.
  • For our purposes, we’ll define an ArrayList as an instantiation of the Java ArrayList Class. An ArrayList is a dynamic/resizeable array. The size of an ArrayList increases as elements are added and decreases as elements are removed. An ArrayList can continue growing so long as there is sufficient memory.
Instantiation of an ArrayList of Integers in Java
  • Suppose we have an ArrayList of length n. Once we have inserted n elements into our ArrayList, the size of our ArrayList is increased to 1.5 times the original size. The original ArrayList is copied to a new contiguous block of memory with enough space to hold the new ArrayList.
  • The above process is said to run in amortized time. This is effectively constant time or O(1). Suppose again we have an ArrayList of size n. We will not need to resize our ArrayList for element 1, 2, 3, 4, … , (n-1). Once we reach element n, we’ll need to resize our ArrayList. Every n elements, we have an O(n) operation. So, our average insertion time is O(n) / n ≈ O(1). This is precisely the definition of amortized time.
  • We can remember this using the definition of amortize — “gradually write off the initial cost of an asset over a period”. Our asset is the dynamic resizing of our ArrayList. Our cost is copying n elements. We gradually write off the initial cost by not resizing our ArrayList for elements 1, 2, 3, … , (n-1).
ArrayList Resizing Visualization
  • An ArrayList cannot contain primitive data types. However, Java provides wrapper classes like Integer and String for primitive data types.
  • A Vector is very similar to an ArrayList with one key difference — synchronization. A Vector is synchronized meaning that only one thread can access it a time.
  • From Wikipedia, a thread (of execution) is the smallest sequence of programmed instructions that can be managed independently by an operating system.
  • Due to the lack of synchronization, an ArrayList is faster as operations on an ArrayList can occur simultaneously. If multiple threads want to access a Vector, they effectively have to wait in line.
  • Finally, a Vector doubles in size whereas an ArrayList increases by half.

My Answer

An Array, ArrayList, and Vector are all abstract data types that store multiple objects at contiguous locations in memory. Although multiple objects can be stored, these data types can only store one distinct type of object. Objects in these data structures are all indexed by integers and support constant time access.

When an Array is created, it is created with an unchangeable size. However, an ArrayList and a Vector can be dynamically resized so long as there is sufficient memory. The size of an ArrayList increases by half while the size of a Vector doubles. This increase runs in amortized time as it is an O(n) operation performed every n elements.

The main difference between an ArrayList and a Vector is synchronization. An ArrayList is not synchronized, meaning multiple threads can concurrently access one ArrayList. This allows for quicker performance but it can open up our code to erroneous or unexpected performance. A Vector is synchronized and thread-safe but also slower. If synchronization is important, a Vector provides a built-in way to achieve thread safety.

Closing Note

For most internship interviews, it is unlikely that your interviewers will expect you to have an in-depth knowledge of operating systems, processes, and threads. However, a simple mention of synchronization, with a high-level description, is usually sufficient.

It may also be helpful to specifically mention use cases for each data structure. If I know I have exactly k elements, I can use an Array. For example, an array representing books written by Ernest Hemingway is unlikely to change size. If I am unsure how many elements I’ll need, it makes sense to use an ArrayList. For example, if I need a data structure representing users of an app, this number is likely to increase or decrease frequently. When comparing data structures, it can be helpful to highlight concrete use cases for each data structure.

Check out Part 3 here!

Questions? Comments? Send me an email at andrew.oliver.medium@gmail.com! I’d love to hear from you.

--

--