
Things every software engineer should know: Map & Filter

Doogal Simpson
9 min read · Jul 3, 2020


TL;DR

Map allows you to map a whole stream of objects from one type to another.

Filter allows you to selectively choose which objects in a stream make it to the next stage of the stream.

A stream could be an array held in the memory of your application, or it could be a collection of events spread across multiple servers being processed in parallel.

Map is the stream equivalent of a for loop, and filter is the stream equivalent of an if statement.

They can allow you to write code in a more concise manner, from this:

BookEntity[] bookEntities = getBooksFromDatabase();
// A list rather than an array: skipped books would otherwise leave null gaps
List<BookDTO> bookDTOs = new ArrayList<>();
for (int index = 0; index < bookEntities.length; index++) {
    BookEntity bookEntity = bookEntities[index];
    if (bookEntity.isNotSoftDeleted()) {
        BookDTO bookDTO = convertEntityToDTO(bookEntity);
        bookDTOs.add(bookDTO);
    }
}
return bookDTOs;

to this:

return readBookStream()
    .filter(bookEntity -> bookEntity.isNotSoftDeleted())
    .map(bookEntity -> convertEntityToDTO(bookEntity));

The traditional way of writing control flow

Many programming languages share similar concepts, and two of the most common are the ‘for loop’ and the ‘if’ statement. If you’ve done almost any type of programming you will have come across them; they are some of the most foundational statements from which the complexity of our applications is built up.

The for loop is typically used when you want to iterate over an array or some other kind of ordered collection, like an ArrayList.

Let’s say we wanted to convert an array of BookEntity into an array of BookDTO. This is a pretty standard operation where a BookEntity is the database representation of a book, and the BookDTO is the externally facing representation. That conversion might be written as:

BookEntity[] bookEntities = getBooksFromDatabase();
BookDTO[] bookDTOs = new BookDTO[bookEntities.length];
for (int index = 0; index < bookEntities.length; index++) {
    BookEntity bookEntity = bookEntities[index];
    // Convert the database representation into the externally facing one
    BookDTO bookDTO = convertEntityToDTO(bookEntity);
    bookDTOs[index] = bookDTO;
}
return bookDTOs;

Perhaps there is an extra caveat to this method: we only want to convert and return books that have not been soft deleted. A soft delete is when a book has been ‘deleted’ in every user-facing sense but the row is kept in the database, to allow for things like reporting, historical records and referential integrity.
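One common way to implement a soft delete is a nullable deletion timestamp on the row. A minimal sketch, assuming a hypothetical `deletedAt` field (the story does not say how BookEntity actually tracks deletion):

```java
import java.time.Instant;

// Hypothetical entity: the deletedAt field is an assumption for illustration.
final class SoftDeletableBook {
    private final String title;
    private Instant deletedAt; // null means the book is still live

    SoftDeletableBook(String title) {
        this.title = title;
    }

    String title() {
        return title;
    }

    // "Deleting" only stamps the row; nothing is removed from storage.
    void softDelete(Instant when) {
        this.deletedAt = when;
    }

    boolean isNotSoftDeleted() {
        return deletedAt == null;
    }
}
```

The row survives the delete, which is why every read path has to remember to check the flag.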

In the situation where we are only returning books that have not been soft deleted, our method might look something like this:

BookEntity[] bookEntities = getBooksFromDatabase();
// A list rather than an array: skipped books would otherwise leave null gaps
List<BookDTO> bookDTOs = new ArrayList<>();
for (int index = 0; index < bookEntities.length; index++) {
    BookEntity bookEntity = bookEntities[index];
    if (bookEntity.isNotSoftDeleted()) {
        BookDTO bookDTO = convertEntityToDTO(bookEntity);
        bookDTOs.add(bookDTO);
    }
}
return bookDTOs;

At this point, the for loop and if statement should not be particularly controversial. This is all fairly mundane stuff: moving data from one form to another.

This way of iterating over books and returning only those that have not been soft deleted is all well and good, so long as the array of books can fit inside the memory of your application.

How can we deal with the situation where not all the books fit in memory?

What happens if the books are returned in batches over a period of time?

Streams

One possible solution to these questions would be to change to using streams instead of traditional for loops and ifs.

What are streams?

Streams are a different way of looking at control flow and input data. Instead of dealing with the entirety of an array, they deal with the individual elements. Each element of the array is processed as an independent piece of data.

You could think of a for loop on an array as a little like setting up a crafting bench.

You can fit the whole array on the bench, you can work on any part of the array as you see fit, doing any operation you want to it. You could combine several elements within the array if you choose to, you have the ability to make any change you want because you have everything there, laid out in front of you.

In this analogy, a stream is more like a conveyor belt.

The array is split into its component elements and, one by one, they make their way down the conveyor belt. The same operations are applied to every element in the stream, one after another. You can’t have an operation that combines arbitrary elements together; each operation works on one element at a time and is applied to every element in the stream.

With the conveyor belt, you don’t have the ability to lovingly handcraft each individual element of the array into the form you want; anything you do needs to be done to the whole stream in the same way. The upside is that the array does not need to fit on your bench. It could be massive, or it could arrive periodically in big chunks. It doesn’t matter: you can throw what you have on the conveyor belt and it will process every element the same way.
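The conveyor belt idea can be sketched in Java with a lazily generated stream. This is an illustration rather than a real data source: `fetchBatch` is a hypothetical stand-in for reading one chunk of ids from somewhere slow, and the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ConveyorBelt {
    // Records which batches were fetched, to show that work happens on demand.
    static final List<Integer> fetchedBatches = new ArrayList<>();

    // Hypothetical stand-in for a slow source: batch n holds three ids.
    static List<Integer> fetchBatch(int batchNumber) {
        fetchedBatches.add(batchNumber);
        int start = batchNumber * 3;
        return List.of(start, start + 1, start + 2);
    }

    static List<String> firstEvenLabels(int howMany) {
        return Stream.iterate(0, n -> n + 1)      // an endless sequence of batch numbers
                .map(ConveyorBelt::fetchBatch)    // fetch each batch only when it is needed
                .flatMap(List::stream)            // put the individual elements on the belt
                .filter(id -> id % 2 == 0)        // keep only the even ids
                .map(id -> "book-" + id)          // convert each element
                .limit(howMany)                   // stop once enough have come through
                .collect(Collectors.toList());
    }
}
```

Even though the source is endless, `firstEvenLabels(4)` finishes after fetching only batches 0, 1 and 2, because nothing further down the belt asks for more.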

Why are streams popular?

You may have come across streams already in your coding career; they are being added to most of the popular programming languages.

As the previous section alludes to, they are a popular choice for dealing with situations where traditional control flow would not work.

Things like:

Event driven design — There is a push for software architectures to move towards more event driven designs, particularly in microservice architectures. This involves having systems send out lots of small, independent events and expecting the rest of the system to update itself accordingly. These events tend to lend themselves to the stream model of processing quite easily.

Big data — What even is big data? It’s a buzzword, in this case I’m using it to describe the size of data that would not fit into a single application so it is impossible to process by a single server in one go. The stream model of processing allows data to be processed by several different servers in parallel and then passed to whatever downstream service is interested in the information.

Reactive programming — Events tend to occur in a way that is spread out across a long period of time, rather than all being available all at once. As streams deal with individual elements rather than the collection as a whole, they can happily process an event now, followed by a second event 5 minutes later. There are advantages to adopting a model that reacts to the world rather than trying to force the world to work the way the software is written.

Each one of these subjects is massive and could definitely have its own story written about it. Suffice to say, people like streams because they allow you to deal with more complex situations.
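Because each element is handled independently, the same pipeline can be handed to several workers. Java's parallel streams only spread work across threads in one process, not across servers, so treat the following as a small-scale analogy; the `wordLength` helper and class name are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ParallelSketch {
    // Hypothetical per-element work: it depends only on its own input.
    static int wordLength(String word) {
        return word.length();
    }

    static List<Integer> longWordLengths(List<String> words, boolean parallel) {
        Stream<String> stream = parallel ? words.parallelStream() : words.stream();
        return stream
                .map(ParallelSketch::wordLength)
                .filter(length -> length > 3)
                .collect(Collectors.toList());
    }
}
```

The pipeline itself is identical in both modes; only the way the source is opened changes.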

How do map and filter relate to streams?

Map and filter are the stream equivalents (roughly speaking) of the for loop and the if statement.

They are both functions that take a function as an argument:

Map — Takes a function that reads the input type of the stream and returns some other type. Map will apply that function to every element in the stream. In doing so it ‘maps’ the stream from an input type to an output type.

In the example of books, say we had a stream of BookEntity, the map function on this stream would apply whatever function we wanted to every element in the stream. In our case we want to convert the entity to a DTO:

Stream<BookEntity> bookEntityStream = readBookStream();
Stream<BookDTO> bookDTOStream = bookEntityStream.map(
    bookEntity -> convertEntityToDTO(bookEntity)
);
return bookDTOStream;

In this example, every BookEntity in the input stream is converted to a BookDTO by the function that was passed to the .map() function.

Filter — Takes a function that reads the input type of the stream and returns a boolean. Any element for which that function returns false is not passed to the next stage in the stream.

In the situation where we do not want to return any soft deleted books, we can use a filter to ensure those books do not make it to the rest of the stream:

Stream<BookEntity> bookEntityStream = readBookStream();
Stream<BookEntity> nonSoftDeletedBooks = bookEntityStream.filter(
    bookEntity -> bookEntity.isNotSoftDeleted()
);
Stream<BookDTO> bookDTOStream = nonSoftDeletedBooks.map(
    bookEntity -> convertEntityToDTO(bookEntity)
);
return bookDTOStream;

Typically you would chain all these calls together so that the end result is a bit easier to read, something like this:

return readBookStream()
    .filter(bookEntity -> bookEntity.isNotSoftDeleted())
    .map(bookEntity -> convertEntityToDTO(bookEntity));

When you express the stream in this chained way you also start to see why streams are popular with engineers looking for more concise code. Compared to the for loop example near the beginning of the story, this way of writing the same piece of code is a lot more dense. This is because the .map() and .filter() functions are handling a lot of the boilerplate code that you’d usually be writing with a for loop.
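For reference, here is a self-contained sketch of that chain that compiles on its own. The BookEntity and BookDTO records, their fields, and the in-memory `readBookStream()` are all assumptions made for the example, since the story does not define them. Records need Java 16 or later.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class BookPipeline {
    // Hypothetical shapes for the two representations of a book.
    record BookEntity(long id, String title, boolean softDeleted) {
        boolean isNotSoftDeleted() {
            return !softDeleted;
        }
    }

    record BookDTO(String title) {}

    static BookDTO convertEntityToDTO(BookEntity entity) {
        return new BookDTO(entity.title());
    }

    // Stand-in for a real source such as a database cursor or an event feed.
    static Stream<BookEntity> readBookStream() {
        return Stream.of(
                new BookEntity(1, "Dune", false),
                new BookEntity(2, "Emma", true),
                new BookEntity(3, "Ulysses", false));
    }

    static List<BookDTO> liveBooks() {
        // filter takes a Predicate (element -> boolean);
        // map takes a Function (element -> element of another type).
        Predicate<BookEntity> notDeleted = BookEntity::isNotSoftDeleted;
        Function<BookEntity, BookDTO> toDto = BookPipeline::convertEntityToDTO;
        return readBookStream()
                .filter(notDeleted)
                .map(toDto)
                .collect(Collectors.toList()); // terminal step: nothing runs until here
    }
}
```

The `collect` call is what actually pulls elements through the chain; without a terminal operation a Java stream does no work.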

Naming

Not every language uses the same syntax for the map and filter operations, but they all do the same basic thing that has been described in this story.

Java — The concept of a stream is captured by the Stream<> class, it has methods for .map(), .filter() and several others.

Javascript — There is not really an out of the box implementation of something like a stream, it generally needs to be converted to an array. The array object does come with a .map() and .filter() method on it though, more details can be found here.

C# — The concept of a stream is captured in a library called LINQ, the .map() function is called .Select(), the .filter() function is called .Where(). There is also a syntax which resembles SQL. More information can be found here.

Python — The map() and filter() functions are built into the language and either work on lists or iterators depending on if you are using python 2 or 3. The iterator is closer to a stream because they do not necessarily require the full collection to be defined at the beginning.

What map / filter are good for

As the names might suggest, map is good at mapping a collection of data from one type of object to another type of object. Filter is good at filtering items out of a collection to create a new collection.

The usefulness comes from being able to treat each piece of data as independent from the rest, so these operations can be performed on a collection of data that is spread out, both across several processes and across time. This can be useful for complex systems.

In simpler systems, map and filter can be useful in expressing a common use-case of converting and filtering an array in a form that is more concise and, some would say, easier to read.

What map / filter are not good for

Map and filter aren’t useful in situations where the assumption that the elements are independent breaks down, for example if you wanted to do some calculation that involved combining multiple elements, like subtracting the value of the first 3 elements in a list from all future elements, or calculating a moving average over a series of transactions.

It is possible to do this kind of operation if state is recorded outside the map function, but at that point the map function is probably being misused, particularly as it breaks the expectation that the function passed to map is free of side effects.

The ordering of elements is not usually guaranteed when using streams. There are specific cases, such as creating a stream from an ordered collection like an array, where the stream will usually be ordered. In general, though, streams are not guaranteed to process elements in order, so map or filter functions that rely on the ordering of elements can lead to difficulties.
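A moving average shows the problem: each output depends on neighbouring inputs, so a plain loop over the whole list is the more honest tool. A minimal sketch, with the class name, method name and window parameter chosen for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class MovingAverage {
    // Average of each run of `window` consecutive values.
    static List<Double> of(List<Double> values, int window) {
        if (window <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        List<Double> averages = new ArrayList<>();
        double sum = 0;
        for (int i = 0; i < values.size(); i++) {
            sum += values.get(i);              // newest value enters the window
            if (i >= window) {
                sum -= values.get(i - window); // oldest value leaves the window
            }
            if (i >= window - 1) {
                averages.add(sum / window);    // emit once the window is full
            }
        }
        return averages;
    }
}
```

The running `sum` is exactly the kind of shared state that does not belong inside a function passed to map.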
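In Java specifically, a stream built from an ordered source keeps its encounter order for operations such as `collect`, but side effects inside a parallel `forEach` can run in any order. A small sketch of the difference (the class and method names are made up for the example):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class OrderingSketch {
    // collect() respects the encounter order of an ordered source, even in parallel.
    static List<Integer> doubledInOrder(int count) {
        return IntStream.range(0, count)
                .parallel()
                .map(n -> n * 2)
                .boxed()
                .collect(Collectors.toList());
    }

    // forEach() on a parallel stream makes no promise about processing order,
    // so the returned list holds the right elements in an unpredictable order.
    static List<Integer> doubledInArrivalOrder(int count) {
        List<Integer> seen = new CopyOnWriteArrayList<>(); // thread-safe collector
        IntStream.range(0, count)
                .parallel()
                .map(n -> n * 2)
                .boxed()
                .forEach(seen::add);
        return seen;
    }
}
```

Both methods produce the same set of values; only the first is safe to rely on when position matters.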

Summary

Traditional control flow uses for loops and if statements to operate on arrays of objects.

When the array of objects becomes too large, or when the objects are not being sent in one go, using a for loop becomes more difficult. One solution is to use the concept of streams.

Map and filter operations are the stream equivalents of for loops and if statements.

They allow you to turn something like this:

BookEntity[] bookEntities = getBooksFromDatabase();
// A list rather than an array: skipped books would otherwise leave null gaps
List<BookDTO> bookDTOs = new ArrayList<>();
for (int index = 0; index < bookEntities.length; index++) {
    BookEntity bookEntity = bookEntities[index];
    if (bookEntity.isNotSoftDeleted()) {
        BookDTO bookDTO = convertEntityToDTO(bookEntity);
        bookDTOs.add(bookDTO);
    }
}
return bookDTOs;

into this:

return readBookStream()
    .filter(bookEntity -> bookEntity.isNotSoftDeleted())
    .map(bookEntity -> convertEntityToDTO(bookEntity));

About the author

Hi, I’m Doogal, I’m a Tech Lead who has spent a bunch of years learning software engineering from several very talented people and these stories are my way of trying to pay it forward.

During my time as a Tech Lead I have mentored a lot of new software engineers and I have found there is often a situation where engineers don’t know what they don’t know. So this “Things every software engineer should know” series is a cheatsheet of the information I’d give myself in my first year of doing software.

Software is a large subject and the golden rule is that the answer to any question can start with “It depends, …”. As a result, the information in these stories is not complete. It is an attempt to give the bare essential information, so as you read these stories, please keep in mind that the rabbit hole goes deeper than the subject matter displayed here.

I can be found on Facebook, LinkedIn or Doodl.la.

Doogal Simpson

Technical Lead at pokeprice.io, spent some years learning software and now trying to pay it forward.