Extract me so you do not copy-paste me

Okay, so I suppose we all know that copy-paste code is quite bad for our project. Sometimes, though, it can be difficult to spot how we can reuse a piece of code in multiple places. Recently this very problem manifested in our project so let me share how we approached it.


Currently my colleagues and I are working on a project with ElasticSearch as data source. I am mentioning this so that you have an idea why the data we use is so nested.

What we want to build are two screens, each showing a MultiBarChart. The data for the first screen is the following:

{
"aggregations":{
"profiles":{
"buckets":[
{
"key":"p01",
"doc_count":1

},
{
"key":"p02",
"doc_count":2

}
]
}
}
}

We want the profile key to be the key for each object in the array that the MultiBarChart is expecting. The value inside the values array has to equal the doc_count for the profile. The structure so that we are able to properly show the chart is as follows:

[
{
"key": "p01",
"values": [
{ "label": "some text", "value": 1 }
]
},
{
"key": "p02",
"values": [
{ "label": "some text", "value": 2 }
]
}
]

Easy peasy:

aggregations.profiles.buckets.map(profile => {
key: profile.key,
values: [{
label: 'some text',
value: profile.doc_count
}]
});

Okay, now we have the first screen. Now lets build the second screen. The data for the second screen is a bit more complex:

{
"aggregations":{
"groups":{
"buckets":[
{
"key":"4000",
"doc_count":1,
"profiles":{
"buckets":[
{
"key":"p08",
"doc_count":8

}
]
}
},
{
"key":"5950",
"doc_count":1,
"profiles":{
"buckets":[
{
"key":"p01",
"doc_count":1

}
]
}
}
]
}
}
}

Wow! Let that sink. Our goal here is the same. Each profile key has to become a key for a top level object inside an array. The group key on the other has to become the label inside the values array that is contained inside each top level object. So what we need is the following:

[
{
"key":"p08",
"values": [
{ "label": "4000", "value": 8 }
]
},
{
"key":"p01",
"values": [
{ "label": "5950", "value": 1 }
]
}
]

To achieve this we might do something along the following lines:

aggregations.groups.buckets.reduce(function (acc, group) {
return acc.concat(group.profiles.buckets.map(profile => {
key: profile.key,
values: [{ label: group.key, value: profile.doc_count }]
}));
}, []);

Okay, I would assume that this seems pretty obvious at the moment. But imagine, just for a moment, that the returned data is handled by two different functions and each of this functions does a bunch of looping, mapping, extracting, etc. Looking at the implementation directly we were not able to identify how logic can be reused. Maybe this is the most important point I want to make — comparing not the implementation but the returned data will give you a clear idea how to reuse some piece of code.


Now, on to refactoring the implementation so that is reuses any code possible. To be able to do that we need to extract some logic into a separate function. Looking at the data we identified that the profiles part is pretty much the same in both the responses. Obviously the label is different so we extract it as a function parameter. Another difference is accessing the aggregation property — we decide that this should be done from the invocation point. We are left with the following function:

const createBarChartData = function (profiles, label) {
return profiles.buckets.map(profile => {
key: profile.key,
values: [{
label: label,
value: profile.doc_count
}]
});
}

Reusing this function in both places that handle the different data responses is as follows:

const extractProfiles = function (res) {
return createBarChartData(
res.data.aggregations.profiles,
MESSAGES.DEFAULT_PROFILE_MESSAGE
);
};
const extractNestedProfiles = function (res) {
let groups = res.data.aggregations.groups.buckets;
return groups.reduce(function (acc, group) {
return acc.concat(
createBarChartData(group.profiles, group.key)
));
}, []);

Right, we have done it!

It may seem pretty straightforward in isolation but believe when I tell you that the two functions that handled the different responses were in pretty bad shape. The example is pretty contrived as well — the createBarChartData function does quite a bit more in our case but I think this is enough to prove a point.

Have any ideas? Drop them in the comments below ; )