Technical challenge: The non-dup collection

Recruiters, seriously, please stop…

Jose I Santa Cruz G
CodeX
9 min readJul 1, 2021

--

The past months my LinkedIn profile received far too many job offers, some where really attractive (mainly those which considered working for non-Chilean companies, with a monthly income in USD), others just plain out of scope.

Just for the record, if any of you readers works as a recruiter, please read the candidates resume, and take a look at the links in their profiles. That usually gives a pretty good view of what expertises the candidate has.
It’s really annoying to receive offers asking for some expertise out of our experience.

Jenga has always been a challenge — Photo by Michał Parzuchowski on Unsplash

In more than one of these offers, a kind of technical challenge was involved, sadly all questions and challenges, in most cases, are taken “from the book”. And by “the book” I mean they are typical coding challenges found in most articles, forum posts, and even some published books.

In a mood of screwing every recruiter than will certainly repeat this very same coding challenges I’ll answer the “Non-duplicate collection challenge”.

Lets suppose you have a simple collection of objects. Some of them repeat themselves. You’re asked to get the unique values from this collection, and also get which element has the max number of repetitions.

I’ve seen at least 2 variations of this same problem.

  1. Instead of a collection, the input is a string, where every letter can repeat itself at least once; for eg. "aaaaaaaabbbccddddcccceeee"
  2. The input is an array of strings, for eg. company names
    ["Google", "Apple", "Apple", "Microsoft", "Facebook", "Google", "Apple", "Facebook", "Apple", "Microsoft", "Microsoft", "Google"]

Other variations can consider using a collection of more complex objects instead, but the logic remains the same. Case 2 is usually hidden behind something like:

“Our company has several projects, and each of them is linked to a particular client. We have an array that represents all our projects and all our clients. Your task is getting a list of all unique clients, and tell us which one has the most projects being developed by us.”

And as you see the problem is the same:

  • Some kind of collection (a string is a collection of characters (single char strings))
  • Get unique values
  • Get item with max repetitions …

easy”…

So lets build a solution for this. We’ll be using case 1, and we’ll be coding in Typescript:

// Note: This is one possible solution. You can get as creative as you want.
// Step by step ... (uuuuh baby !) <- NKOTB someone? :P
// 1. Get original collection
// 2. Dedup the values
// 1. & 2. can be joined in teh same function
function dedupeStringSet(data: string): string {
// result is initialized with the input string just to have something to return when testing without any code
let result: string = data;
// As the string is not a collection (per se) turn it into a char Array
const dataArr: Array<string> = data.split('');
// Sets a a "new" (not that new) datatype, also a collection but it doesn't admit any duplicates.
//Google for *javascript unique values* and it's one of the first results
const dataSet = new Set(dataArr);
// Turn the Set again into an Array (using Array.from) and then to a string
result = Array.from(dataSet).join('');
// We're ready so return
return result;
}
// 3. Get max repeated item
// I see a problem here. This deduping strategy gives no information on how many repetitions per char

As our first attempt failed on getting the max values, we’ll try another approach. I’ll put some comments in the different parts of the code.

// The object is declared outside the function so we can use it in any other place we require (for eg. the getMax function)
// Every char/letter on the string has a counter
let cntObj: { [key: string]: number } = {}
function dedupStringObj(data: string): string {
// Initialize the object on every run
cntObj = {};
let result: string = data;
const dataArr: Array<string> = data.split('');
// We'll iterate through all chars in the string
for (let index = 0; index < dataArr.length; index++) {
// This would be a char
const element = dataArr[index];
// If the char does exist as an attribute/key in the object the add 1, else first value is 1
cntObj[element] = (cntObj[element]) ? cntObj[element]+1 : 1;
}
result = (Object.keys(cntObj)).join('');

return result;
}
// 3. Get max repeated item
/**
* Get the most repeated char in a non-unique char string
* @param dataObj JSON object used as a dictionary where each char stores its repetition counter
* @returns the string corresponding to the most repeated item
*/
function mostRepeatedObj(dataObj: { [key: string]: number }): string {
// Initializes the key and maxValue to have a starting pivot
let maxValue: number = 0;
let maxKey: string = '';
// Gets all object keys in an array so we can iterate through them and get the most repeated item
const keys: Array<string> = Object.keys(dataObj);
for (let index = 0; index < keys.length; index++) {
// Get the key and counter value for that key
const key: string = keys[index];
const currValue: number = dataObj[key];
// Typical max value strategy, currValue replaces maxValue when >
// If this happens also keep track of the key (which would be the most repeated item)
if (currValue >= maxValue) {
maxValue = currValue;
maxKey = key;
}
}
return maxKey;
}

We also need some code to make our test runs:

const input = 'aaaaabbbcccccceeeccccdd';
console.log(input, '\n');
let dedup = dedupStringSet(input);
console.log(dedup, '\n');
dedup = dedupStringObj(input);
console.log(dedup);
console.log(cntObj);
const maxObj: string = mostRepeatedObj(cntObj);
console.log(maxObj, '\n');
/**
aaaaabbbcccccceeeccccdd <- input
abced <- dedupStringSetabced <- dedupStringObj
{ a: 5, b: 3, c: 10, e: 3, d: 2 } <- cntObj
c <- mostRepeatedObj
**/

To run the program.

  1. Store it on a separate folder, on a file called <whatever you want>.ts (replace with whatever you want :P )
  2. Run npm install typescript ts-node @types/node
  3. Run ./node_modules/.bin/ts-node <whatever you want>.ts

But… performance may be an issue. I hope you have read about it (if not you’re doing it now :) ) , using a JSON object as a hashmap/dictionary can be slow, as you’ll have to check if every attribute in the object does exist. That wold be this specific line:

cntObj[element] = (cntObj[element]) ? cntObj[element]+1 : 1;

So lets try a solution that uses a Map object instead of a JSON object.

// Almost the same as the object version, but using a Map
let cntMap: Map<string, number> = new Map<string, number>();
function dedupStringMap(data: string): string {
cntMap = new Map<string, number>();
let result: string = data;
const dataArr: Array<string> = data.split('');
for (let index = 0; index < dataArr.length; index++) {
const element = dataArr[index];
// Checks if the char exist as a key, if it does double check when getting (??0 else the Typescript trasnpiler will throw an error)
// and increment. If it doesn exist then first value is 1.
const inc: number = (cntMap.has(element)) ? (cntMap.get(element)??0) + 1 : 1;
// Store the value in the map
cntMap.set(element, inc);
}
// As Sets, Maps do not admit duplicates, so take the key and turn the iterator into an array. Then
// turn the Array back into a string.
result = Array.from(cntMap.keys()).join('');

return result;
}
function mostRepeatedMap(dataMap: Map<string, number>): string {
let maxValue: number = 0;
let maxKey: string = '';
const keys: Array<string> = Array.from(dataMap.keys());
for (let index = 0; index < keys.length; index++) {
const key: string = keys[index];
const currValue: number = dataMap.get(key) || 0;
if (currValue >= maxValue) {
maxValue = currValue;
maxKey = key;
}
}
return maxKey;
}

Only slight differences from the object version. And to run :

let dedup = dedupStringMap(input);
console.log(dedup);
console.log(cntMap);
const maxMap: string = mostRepeatedMap(cntMap);
console.log(maxMap, '\n');
/**
aaaaabbbcccccceeeccccdd
abced
Map(5) { 'a' => 5, 'b' => 3, 'c' => 10, 'e' => 3, 'd' => 2 }
c
**/

OK we have 3 ways of non-deduping a string (or a collection, if you adjust the code properly), and 2 ways of getting a max value from the resulting object.

What about performance?

Let’s pick all our code and mix it up, adding some time measurement messages (hope you don't mind incremental copy & pasting):

function dedupStringSet(data: string): string {
let result: string = data;
const dataArr: Array<string> = data.split('');
const dataSet = new Set(dataArr);
result = Array.from(dataSet).join('');
return result;
}
let cntObj: { [key: string]: number } = {}
function dedupStringObj(data: string): string {
cntObj = {};
let result: string = data;
const dataArr: Array<string> = data.split('');
for (let index = 0; index < dataArr.length; index++) {
const element = dataArr[index];
cntObj[element] = (cntObj[element]) ? cntObj[element]+1 : 1;
}
result = (Object.keys(cntObj)).join('');

return result;
}
// Almost the same as the object version, but using a Map
let cntMap: Map<string, number> = new Map<string, number>();
function dedupStringMap(data: string): string {
cntMap = new Map<string, number>();
let result: string = data;
const dataArr: Array<string> = data.split('');
for (let index = 0; index < dataArr.length; index++) {
const element = dataArr[index];
// Checks if the char exist as a key, if it does double check when getting (??0 else the Typescript trasnpiler will throw an error)
// and increment. If it doesn exist then first value is 1.
const inc: number = (cntMap.has(element)) ? (cntMap.get(element)??0) + 1 : 1;
// Store the value in the map
cntMap.set(element, inc);
}
// As Sets, Maps do not admit duplicates, so take the key and turn the iterator into an array. Then
// turn the Array back into a string.
result = Array.from(cntMap.keys()).join('');

return result;
}
/**
* Get the most repeated char in a non-unique char string
* @param dataObj JSON object used as a dictionary where each char stores its repetition counter
* @returns the string corresponding to the most repeated item
*/
function mostRepeatedObj(dataObj: { [key: string]: number }): string {
// Initializes the key and maxValue to have a starting pivot
let maxValue: number = 0;
let maxKey: string = '';
// Gets all object keys in an array so we can iterate through them and get the most repeated item
const keys: Array<string> = Object.keys(dataObj);
for (let index = 0; index < keys.length; index++) {
// Get the key and counter value for that key
const key: string = keys[index];
const currValue: number = dataObj[key];
// Typical max value strategy, currValue replaces maxValue when >
// If this happens also keep track of the key (which would be the most repeated item)
if (currValue >= maxValue) {
maxValue = currValue;
maxKey = key;
}
}
return maxKey;
}
function mostRepeatedMap(dataMap: Map<string, number>): string {
let maxValue: number = 0;
let maxKey: string = '';
const keys: Array<string> = Array.from(dataMap.keys());
for (let index = 0; index < keys.length; index++) {
const key: string = keys[index];
const currValue: number = dataMap.get(key)??0;
if (currValue >= maxValue) {
maxValue = currValue;
maxKey = key;
}
}
return maxKey;
}
const input = 'aaaaabbbcccccceeeccccdd';
console.log(input, '\n');
console.time('set');
let dedup = dedupStringSet(input);
console.timeEnd('set');
console.log(dedup, '\n');
console.time('obj');
dedup = dedupStringObj(input);
console.timeEnd('obj');
console.log(dedup);
console.log(cntObj);
console.time('maxObj');
const maxObj: string = mostRepeatedObj(cntObj);
console.timeEnd('maxObj');
console.log(maxObj, '\n');
console.time('map');
dedup = dedupStringMap(input);
console.timeEnd('map');
console.log(dedup);
console.log(cntMap);
console.time('maxMap');
const maxMap: string = mostRepeatedMap(cntMap);
console.timeEnd('maxMap');
console.log(maxMap, '\n');
/**
aaaaabbbcccccceeeccccdd
set: 0.108ms
abced
obj: 0.08ms
abced
{ a: 5, b: 3, c: 10, e: 3, d: 2 }
maxObj: 0.045ms
c
map: 0.076ms
abced
Map(5) { 'a' => 5, 'b' => 3, 'c' => 10, 'e' => 3, 'd' => 2 }
maxMap: 0.079ms
c
**/

You can make your own conclusions on the results.
Finally as I was writing this article I had another idea: Will the performance improve for the mostRepeated functions if the collection is sorted descendently and the first item is taken as the max value?

I’ll leave that as an interesting coding exercise for the reader (perhaps a far better challenge than this article’s challenge). It will require rewriting some lines in certain functions, and most probably changing the cntObj typing (I was thinking in an Array<{ key: string, counter: number }>).
A couple of links that can help if you're in the mood of coding for pleasure:

So if you face an interview with a technical challenge with this particular question (or alike), I hope you remember this article.

So recruiters, please stop. A technical coding challenge that requires knowing by memory any library or package method signatures, or complex algorithms such as traversing a binary tree from the bottom to the top will not evaluate with precision if a candidate is capable or not of covering a job position, neither if they are a good fit for the organization.
Worst case, most companies already consider test periods (and already consider all related costs), in which the management decides if hiring the candidate is a good choice or not.

There’re better ways of hiring high level developers.
Stay safe.

--

--

Jose I Santa Cruz G
CodeX
Writer for

Polyglot senior software engineer, amateur guitar & bass player, geek, husband and dog father. Not precisely in that order.