# Sliding window median algorithm.

# Brief

In this article I would like to consider an efficient way of implementing the sliding window median algorithm. This algorithm is widely used at data analysis and efficient implementation is valuable for big data runtime analytics. Article will be based on a hard level “leet-code” task statement. Main goal is to develop and implement a fast possible solution. Points considered here will be the same for multiple algorithms based on the sliding window.

# Problem statement

In data analysis we often need to count the aggregating value of the sequential part of the dataset. Then we need to count this aggregating value for the next sequential part of the dataset that has equal size but sequential values selection moved front on the dataset line just by one element. Then this computation needs to be repeated several times after each move by one element.

This approach to computation known as computation in the sliding window. Sliding window is just a dataset part selection algorithm but exact aggregating computation algorithms can vary.

We need to consider a specific aggregation algorithm known as median computation.

This median can be computed different way — but in “leetcode” task statement notation median is:

*middle element of the ascendant ordered sliding window ( if elements counts is odd )**average of two elements in the middle the ascendant ordered of sliding window ( if elements count is even)*

# Design notes

# Sliding window median computation algorithm specific

Property of the aggregation algorithm defines the ability of the sliding window computation algorithm to be implemented in a runtime efficient way runtime wise. And median computation have some properties we need to consider:

*median depends only on elements on specific indexes of sliding window only**on ability of detect of this elements indexes with O(1) complexity we will have best possible solution*

These points work by just one sliding window position. But after moving of the sliding window one element forward on dataset we need

to recompute the median. This one step forward move also needs to be fast. To define the main approach for fast moving — let’s consider the following picture.

At the picture we find what is exactly going on with sliding window internals on the move:

*first we remove the first element from previous step from the sliding window**next we add new last element to sliding window at new position*

It is obvious that this action can have O(1) complexity for the one move in the best case.

To implement the algorithm we need to select or create the most appropriate container to store sliding window values. This container should to match the following parameters:

*store the data in ascending order**be tolerant to have duplicates in stored collection**be able to return middle element / near to middle elements of the collection with most efficient complexity**be able to add and drop element with best possible complexity*

Next we will consider several standard containers to find the complexity of the final solution for this container’s usage. Then we will consider the synthetic container that can operate with the best complexity.

**Vector based solution**

In the case of using a vector we can select any element with O(1) complexity.

*In total median selection complexity will be equal to O(N — K).*

N is dataset size and K is sliding window size

Algorithm will contain sorting of the vector data. This sorting expect to

Runtime wise a significant part of the algorithm is ascendant sorting of the sliding window. We need to make 2 types of sorting:

*Initial full sliding window sorting with complexity*

O(K*log(K)), K is sliding window size*Partial sorting for each of sliding window move with complexity*

O((log(K) + K) * (N — K) ).

Partial sorting will be done at each of (N-K) sliding window moves steps. For each of the moves we need to add one new element and delete the old one from the sorted sliding window in the vector with complexity contains:

*finding the appropriate place on a sorted vector to place new element with complexity O(log(K)).**inserting of new element at the vector have complexity O(K)*

Total complexity of the vector based solution will be:

*O(N-K) + O(K*log(K)) + O((N-K)*(log(K) + K)) =>*

O(N-K) + O(K*log(K)) + O((N-K)*K) =>

O((N-K)*K)

This solution is just one of two most promising trivial solutions based on standard container usage. Let’s look at the second one.

# Binary tree based solution

If we will use binary trees as base collection we will have sorted collection by design. Complexity of the sorting will be:

*Initial full sliding window put in the binary tree complexity is O(K*log(K))**Addition and deletion of the new and old element on all of the sliding window moves complexity O((N-K)*log(K))*

But selection of the middle elements at the sliding window to compute the median will not have O(1) complexity. We need to traverse from the beginning of the sliding window binary tree to the middle to get the median.

*Complexity of the median selection will be O((N-K)*K)*

This makes total complexity of binary tree bases median selection at sliding window algorithm complexity equal to:

*O((N-K)*K) + O(K*log(K)) + O((N-K)*log(K)) =>*

O((N-K)*K)

This complexity will equal the complexity of the solution based on the vector.

# Custom container based solution

Even if binary tree based implementation shows complexity that is not the best it has just one reason:

*selection of the middle element(s) of the binary tree tends to be long.*

If this issue can be resolved — then a solution can be considered as the best.

In the picture above we can see another representation of the sliding window. It is presented by 2 ( not just one ) binary trees. These binary trees have a limited size that equal to half of the sliding window size. These binary trees will contain the first and last half of sorted elements of the sliding window.

*Representation of the sliding window like this makes it possible to select middle elements of the sliding window with O(1) complexity.*

But it still is not clear how to make the move in this case and what complexity we will have. To make the move with container like this we need to split the sliding window move into 3 actions:

*add a new element to the part binary tree selected by comparison with the middle element of the sliding window. Complexity is O(log(K))**delete the old element of the binary tree from the part binary tree selected by comparison with the middle element of the sliding window. Complexity is O(log(K)).**balancing the front and back parts of the sliding window — move the element on the border from one part to another to meet half of the sliding window part size for each of the sliding windows. Complexity is O(log(K)).*

Let’s look at the picture to understand the action sequence and complexity computation order.

This tree criterions makes total complexity of the algorithm equal to:

*O((N-K)*log(K)) + O(K*log(K)) + O((N-K)*log(K)) =>*

O((N-K)*log(K)) + O(K*log(K)) =>

O(N*log(K))

This is the best possible runtime complexity can be achieved for the median of sliding window detection.

# Implementation notes

Runtime wise efficient implementation expects to use language that is appropriate for runtime efficient software development. I have use C++.

We need to have an implementation that will be tolerant to the duplicates at the sliding window. At C++ notation we have a standard container at Standard Template Library named “multiset” that implements the binary tree that is tolerant to elements duplication.

To minimize the memory footprint we need to avoid the current binary trees copying and implement all of the actions in place. It is good to use containers by reference at code or use move semantics of C++ to hold memory tight as possible.

# Conclusion

During this article the best possible runtime wise solution found for sliding window median detection.

Exposed approach to implement the move of the sliding window can be used for different purposes in an efficient way.

Thank you for the time you spend and let’s improve your analytics runtime by information from this article.