Is It Me You’re Looking For?

Searching Algorithms And Their Implementations

Rick Nilon
6 min readMar 20, 2018

When writing code or software, you will inevitably come across a situation where you need to find a certain value in a given data set. There are a few different ways to go about this search. I will discuss both Linear and Binary searches, and another method for improving the performance of the linear search. As far as optimization goes, each method has a different outcome. When comparing performance differences between actions, it is standard to reference the worst case situation, meaning what would the performance be if the item was found in the last execution of the function. For the linear and binary searches, we will see what the worst, best, and average case situations will be.

Worst Case Situation

Linear Search

A linear search literally goes through every item in a data set piece by piece until it locates the item you want to find. To find item x at index n in an ordered list, it would take me exactly n steps to find the item. This is called O(n). For a small list, this is fine and poses no problem. For example, if I have an array that looks like this: [1, 2, 3, 4, 5, 6, 7], and would like to find the number 5, it would take me exactly 5 steps to find the number I’m looking for. Lets take a look at how this works in Javascript with a sorted array.

let sortedArr = [ 2, 4, 5, 6, 8, 10, 16, 32 ]
function linearSearch(num){
sortedArr.forEach(function(item, index){
if(item === num){
console.log(`I found ${item} at index ${index}!`)
return
}
})
}
linearSearch(32)

Working Example: https://repl.it/@r_nilon92/sortedLinearSearchExample

So for each element of the sortedArr, we are checking if the element at each index equals the number(num) we passed in to our function as an argument.

This method of searching is perfectly fine for a small array such as this one. However, for a larger list of data, say one containing 4,000,000,000 items, this would pose an inconvenience. In the worst case performance (such as the number you’re searching for being the last item in the array, or not even existing!), it would take 4,000,000,000 steps, or O(n) to find the number you’re looking for. In the best case, it would be exactly one step (if the element you are looking for is the first element in our array), or O(1). The average case for this type of search is O(n). As you can plainly see, this is not the most efficient way for cases involving larger sets of data. And as programmers, we are always looking for more efficiency.

Binary Search

Binary search is also pretty straightforward and easy to understand, although it requires a bit more logic than the linear search to implement. Binary search, also known as half-interval search, logarithmic search and, my personal favorite, binary chop, is a much more efficient way of looking for an item in a particular data set. The only drawback is that the data that you are looking through needs to be sorted.

To find an item in an ordered array, you would first take a look at the midpoint in this array. If the midpoint is greater than the number you want to find, then you would limit your search to the lower half and discard the upper half. Likewise, if the midway point is less than the number you are looking for, then you would limit your search to the upper half. The number exists in only one of the halves, and so the computer discards the half that the number you are looking for can not belong in. This process occurs repeatedly until the number is found. In computer science, this resulting performance is called O(log N) . To see the difference in the steps taken in linear vs. binary, take the log value of the amount of items in the array you want to search through. Say we have an array with 8,000,000,000 items in it. Thats a lot to search through! Lets cut that in half. Now we have 2,000,000,000 items in which we know the number we want resides in. To see how many steps this would take to find our item, we can compute the binary logarithmic function of log2 8,000,000,000, which gives us 32. While 32 is a lot of steps to find a number, it is considerably shorter than the 8,000,000,000 in our worst case linear search. In the worst and average case scenarios, you find item x in O(log n) time, while the best case scenario is O(1).

let orderedList = [ 2, 3, 5, 7, 8, 10, 12, 15, 18, 19, 24, 87, 111 ];function binarySearch(arr, num) {

let lowestNum = 0;
let highestNum = orderedList.length - 1;
let midPoint;
let checkMidpoint;

while(lowestNum <= highestNum) {
midPoint = Math.floor((lowestNum + highestNum) / 2),
// midPoint = arr.length/2
checkMidpoint = arr[midPoint];
if(checkMidpoint === num) {
console.log('I found the number you were looking for at index: ' + midPoint + '! The value is ' + checkMidpoint + "!");
return;
}
if(num < checkMidpoint) {
highestNum = midPoint - 1;
}
else {
lowestNum = midPoint + 1;
}
}

return `I couldn't find ${num} in the list.` ;
}
binarySearch(orderedList,10);

Working Example: https://repl.it/@r_nilon92/Binary-Search-Example

Binary search works great in this particular situation. But this only works in situations where your data is sorted. What happens if your data isn’t sorted?

Changing Your Data Set To Be More Efficient With Linear Searches

Often times when searching for data, we want to look for the same data again and again. In many cases, 80% of our searches reside in 20% of our search area. This is called the Pareto Principle, named after a 19th century economist and engineer named Vilfredo Pareto. The Pareto principle originally was applied to income inequality and distribution, but it also has found a place in computer science in regards to optimization. For example, Microsoft found that 20% of bugs in their software caused 80% of crashes.

If we take this principle that 80% of our searches reside in such a small area, we can move these most common searches to the front of our data set, thereby making our search in many cases much shorter than it would be otherwise. Lets take a look at an example.

let unsortedArr = [2, 5, 3, 7, 8, 10, 15, 18, 24, 111, 12, 19, 87];
function linearSearch(num) {

unsortedArr.forEach(function(itemValue, index, array){
if(itemValue === num) {

if(index > 0) {
organizeArr(index, array, index -1);
}
}
});
return unsortedArr;
}
function organizeArr(index, array, indexSwitch) {
var tempCurrentValue = array[index];
//tempCurrentValue stays the same
array[index] = array[indexSwitch];
array[indexSwitch] = tempCurrentValue;
}
console.log("Before Organization: " + unsortedArr);
var first = linearSearch(7);
var second = linearSearch(7);
var third = linearSearch(7);
var fourth = linearSearch(7);
console.log("After Organization: " + fourth);
/*
Before Organization: 2,5,3,7,8,10,15,18,24,111,12,19,87
After Organization: 7,2,5,3,8,10,15,18,24,111,12,19,87
*/

Working Example: https://repl.it/@r_nilon92/LinearSearchWIthOrganization

By changing our data over time to reflect the most common searches, we can make our average performance better than it would normally be.

Summary

So which is better? Linear or Binary? Generally, we would want to use linear searching algorithms when the data we are working with is unsorted. We can make this process more efficient by creating a function that moves the most common searched for items towards the front of the array. For sorted arrays, especially when the array is of a great size, binary searching algorithms are more desirable and are considerably shorter to execute.

Sources

--

--

Rick Nilon

Full Stack Developer, Programmer, Musician, Blogger!