With TypeScript: Generic aggregator with only one array scan-through

Ran Wahle
Published in Israeli Tech Radar

Let’s say you have a collection of data and you want to produce a report on it, for example the average salary by profession.

This is quite a common task, so writing it from scratch every time can be time-consuming, although doing so may come in handy in job interviews.

Grouping and calculation process
Data processing

Here’s some sample data:

const data = [
{ name: "Al", family: "Bundy", profession: "Shoe salesman", salary: 500 },
{ name: "Kelly", family: "Bundy", profession: "Model", salary: 12000 },
{ name: "Bud", family: "Bundy", profession: "Student", salary: 0 },
{ name: "Peggy", family: "Bundy", profession: "Housewife", salary: -500 },
{ name: "Marcy", family: "D'Arcy", profession: "Banker", salary: 40000 },
{ name: "Jefferson", family: "D'Arcy", profession: "Crook", salary: -40000 },
{ name: "Steve", family: "Rhoades", profession: "Forest Ranger", salary: 10 },
];

A hard-coded implementation

Sometimes when we get a task, we don’t have time to write reusable code, so we write something tailor-made for the data:

const avgSalary = (data) => {
const averages = data.reduce(
(acc, current) => {
const { counts, results } = acc;

counts[current.family] = (counts[current.family] || 0) + 1;
results[current.family] =
((results[current.family] || 0) * (counts[current.family] - 1) +
current.salary) /
counts[current.family];
return acc;
},
{ counts: {}, results: {} }
);

return averages.results;
};

console.log(avgSalary(data));

This solution is good. It scans through the array only once, and calculates the average on the fly. However, if we want to have the average per profession, we’d have to change the function implementation. Can’t we have a generic solution for that?

The naive implementation

Our naive implementation would be to group the items by family and then calculate each group’s average salary, right?

export function groupBy<T>(data: T[], field: keyof T): Record<string, T[]> {
return data.reduce((acc, item) => {
const group = acc[item[field] as string | number] || [] as T[];
group.push(item);
return { ...acc, [item[field] as string]: group };
}, {} as Record<string | number, T[]>);
}


// Calculate the average salary per family
function averageSalaryByFamily(salaryData: typeof data) {
  const groups = groupBy(salaryData, "family");
  const averageSalary = (salaries: typeof data) => {
    return salaries.reduce((acc, item) => {
      return acc + item.salary / salaries.length;
    }, 0);
  };
  return Object.entries(groups).reduce(
    (acc, entry) => ({
      ...acc,
      [entry[0]]: averageSalary(entry[1]),
    }),
    {}
  );
}

// Call your function
averageSalaryByFamily(data);

The function above can easily be generalized into one that groups the data and receives the calculation function, and the field to calculate on, as parameters. It would be used like this:

groupByAndCalc(data, "family", average, "salary");

All we have to do is turn averageSalary into a general average function and implement it like this:

function average<T>(dataGroup: T[], field: keyof T): number {
  return dataGroup.reduce((acc, item) => {
    // The field is assumed to hold a number
    return acc + (item[field] as number) / dataGroup.length;
  }, 0);
}

Then we wrap groupBy in a new function, named groupByAndCalc after what it actually does, which looks like this:

export function groupByAndCalc<T>(
  data: T[],
  groupKey: keyof T,
  calc: (group: T[], field: keyof T) => number,
  calcOnField: keyof T
): Record<string, number> {
  const groups = groupBy(data, groupKey);
  return Object.entries(groups).reduce((acc, entry) => {
    return { ...acc, [entry[0]]: calc(entry[1], calcOnField) };
  }, {} as Record<string, number>);
}

The solution above uses generics, which we like, but it is not very efficient: it scans through the data twice, and that can be significant on large data sets.
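
To make the "scans twice" point concrete, here is a rough sketch that counts how many times an item is read when we group first and average afterwards. The `visits` counter, the `Row` type and the tiny `rows` array are assumptions made up for this illustration; they are not part of the code above.

```typescript
// Hypothetical, trimmed-down row type for this sketch.
type Row = { family: string; salary: number };

const rows: Row[] = [
  { family: "Bundy", salary: 500 },
  { family: "Bundy", salary: 12000 },
  { family: "D'Arcy", salary: 40000 },
];

let visits = 0; // incremented every time an item is read

// Pass 1: group the rows by family.
const grouped: Record<string, Row[]> = {};
for (const row of rows) {
  visits++;
  const group = grouped[row.family] || [];
  group.push(row);
  grouped[row.family] = group;
}

// Pass 2: walk every group again to average its salaries.
const averages: Record<string, number> = {};
for (const family of Object.keys(grouped)) {
  const group = grouped[family] || [];
  let total = 0;
  for (const row of group) {
    visits++;
    total += row.salary;
  }
  averages[family] = total / group.length;
}

console.log(visits); // 6: each of the 3 rows was read twice
console.log(averages); // Bundy: 6250, D'Arcy: 40000
```

With three rows the extra pass is harmless, but the number of reads grows as twice the size of the data.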

Could we use both generics and scan only once?

The generic version is very convenient, but it goes over the data twice: once to group the items, and once more through each group to calculate the average.

We still want generics, because they let us enjoy TypeScript goodies such as type inference: the keys we group by and calculate on are auto-completed by our IDE, and the compiler protects us against typos. So can we keep them and scan only once? Of course we can. All we need to do is calculate the average on the fly.

Let’s look at the following function:

/**
 * The type below is used to simplify the code of the calcOnGroup function
 */
type tempAcc = {
counts: Record<string | number, number>;
results: Record<string | number, unknown>;
};

// Implementing our calculation
export function calcOnGroup<T>(
  data: T[],
  groupField: keyof T,
  // tempCount is always supplied; calculations that ignore it may declare fewer parameters
  calcFunc: (prevValue: number, item: T, tempCount: number) => unknown
): Record<string | number, unknown> {
  const tempResults = data.reduce(
    (acc: tempAcc, item) => {
      // For our convenience, let's extract some consts.
      const groupFieldValue = item[groupField] as string;
      const tempCount = acc.counts[groupFieldValue] || 0;
      const result = (acc.results[groupFieldValue] as number) || 0;

      // Here is where calculating functions such as average, sum, count etc. come in
      const newResult = calcFunc(result, item, tempCount + 1);

      // Maintain counters for calculations that may need them
      const counts = {
        ...acc.counts,
        [groupFieldValue]: tempCount + 1,
      };

      // Maintain results to be returned
      const results = {
        ...acc.results,
        [groupFieldValue]: newResult,
      };

      return { counts, results };
    },
    { counts: {}, results: {} } as tempAcc
  );

  // Finally, you don't need the counters, just the results.
  return tempResults.results;
}

Some calculations, such as average, depend on the number of group members. We don’t know the final count up front, but we can use the running count to compute intermediate values that end up as the desired result once the scan is done.

Simple use: count how many shoe salesmen there are in each family

function count<T>(field: keyof T, value:string){
return (prev: number, item: T) => {
return item[field] === value ? prev + 1 : prev;
};
}

console.table(calcOnGroup(data, "family", count("profession", 'Shoe salesman')));
Shoe salesmen per family
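
Other aggregations can follow the same shape. The sketch below is a condensed, self-contained variant of the single-pass idea, not the article's exact code: `calcOnGroupSketch`, `CalcFunc`, `sum`, `max`, the `Employee` type and the `staff` array are all hypothetical names introduced here for illustration. It shows one calculation that ignores the running count and one that relies on it.

```typescript
// Hypothetical row type and data for this sketch.
type Employee = { name: string; family: string; salary: number };

const staff: Employee[] = [
  { name: "Al", family: "Bundy", salary: 500 },
  { name: "Peggy", family: "Bundy", salary: -500 },
  { name: "Kelly", family: "Bundy", salary: 12000 },
  { name: "Jefferson", family: "D'Arcy", salary: -40000 },
];

// Shape of an on-the-fly calculation: previous result, current item, 1-based count.
type CalcFunc<T> = (prev: number, item: T, count: number) => number;

// Condensed single-pass aggregator (mutates two local maps instead of spreading).
function calcOnGroupSketch<T>(
  data: T[],
  groupField: keyof T,
  calcFunc: CalcFunc<T>
): Record<string, number> {
  const counts: Record<string, number> = {};
  const results: Record<string, number> = {};
  for (const item of data) {
    const key = String(item[groupField]);
    const count = (counts[key] || 0) + 1;
    counts[key] = count;
    results[key] = calcFunc(results[key] || 0, item, count);
  }
  return results;
}

// Running sum of a numeric field; the count is not needed.
function sum<T>(field: keyof T): CalcFunc<T> {
  return (prev, item) => prev + Number(item[field]);
}

// Running maximum; the count tells us when we are on a group's first item,
// so the initial 0 is never mistaken for a real value.
function max<T>(field: keyof T): CalcFunc<T> {
  return (prev, item, count) => {
    const value = Number(item[field]);
    return count === 1 ? value : Math.max(prev, value);
  };
}

const totals = calcOnGroupSketch<Employee>(staff, "family", sum<Employee>("salary"));
const highest = calcOnGroupSketch<Employee>(staff, "family", max<Employee>("salary"));

console.log(totals); // Bundy: 12000, D'Arcy: -40000
console.log(highest); // Bundy: 12000, D'Arcy: -40000
```

For the D'Arcy group the maximum comes out as -40000 rather than 0, which is the reason `max` checks the count before comparing against the previous value.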

Calculating the average on the fly

To calculate the average on the fly, we take the previous average and multiply it by (count - 1), which gives back the sum of all the elements checked so far. We then add the next item’s value and divide the result by count.

function average<T>(field: keyof T) {
return (prev: number, item: T, count: number) => {
return ((prev || 0) * (count - 1) + ((item[field] as number) || 0)) / count;
}
}
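
As a sanity check of the update rule, here is a small sketch that applies it to a plain list of numbers and compares the outcome with the usual sum-divided-by-length average. The `nextAverage` helper and the `salaries` list (the Bundy salaries, reordered) are assumptions for this illustration only.

```typescript
// Incremental update: turn the previous average back into a sum,
// add the new value, then divide by the new count.
function nextAverage(prevAverage: number, value: number, count: number): number {
  return (prevAverage * (count - 1) + value) / count;
}

const salaries = [500, -500, 0, 12000];

// Single pass: keep only the running average, never the whole sum or the group.
let running = 0;
salaries.forEach((salary, index) => {
  running = nextAverage(running, salary, index + 1);
});

// Conventional definition, for comparison.
const direct = salaries.reduce((total, salary) => total + salary, 0) / salaries.length;

console.log(running, direct); // both are 3000 for this input
```

With other inputs the two values may differ by a tiny floating-point rounding error, since the running version divides at every step.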

Now let’s use it to calculate the average

// average salary per family
console.table(calcOnGroup(data, "family", average("salary")));
// average salary per profession
console.table(calcOnGroup(data, "profession", average("salary")));
Average salary per family
Average salary per profession

Summary

Our task was to build a useful API for aggregated calculations over a collection of data while scanning through the data only once. The API we’ve developed here gives us both. All that is left for the API consumer is to work out how to calculate the value on the fly, and the result is produced in a single scan of the array.

Full code samples can be obtained here
