Learning data structures – Search algorithms for arrays

Which are the common techniques to search for a key in an array? This is Learning data structures and in today’s series I will go through the fundamental search algorithms for arrays.

In previous post, I did focus on arrays in C#, more specifically, I went through a brief overview of data structures and then I followed with arrays and how an array is created and managed in the C# language. I’ve been through various types of arrays and showed some fundamental operations on them, like insert, search and delete. In this post, I will focus on the search operation, which tends to concern us, as programmers, a lot.

Source code

You can find the code that is outlined in this article on my GitHub repository.

Why do we care about search anyways?

Search is an important operation of any kind of system. Many systems in fact are based on retrievals, while others have retrievals as core part of the system, so core that they cannot live without. These systems tend to be read optimized.

Applications throughout the world perform retrieve operations. Thousands of millions, or billions or trillions of operations are retrieval. In order for you to see this very text, the backend is retrieving this information from a database.

Imagine this scenario. You open your Twitter timeline and you wait for your tweets to load there. What do you expect when you open the timeline? Do you expect tweets to load fast? Do you expect tweets to load fast while you scrolling? I am guessing the answer should be yes, personally I don’t want to wait for ages for my tweets to load. I want them to load and I want them to show fast! This is UX 101 here!
Would you return to an application that makes you wait and wait to retrieve some information for you? Most likely not, right?

Now that we understand why search combined with speed is an important aspect of every application, let’s discuss about this operation on arrays.

Search on arrays

There are two types of search algorithms that can be used on arrays.

  1. Linear search
  2. Binary search

Let’s dissect each one of them.

Linear search

This type of search is nothing more than a plain ol’ simple table scan. If you recall, from the previous post, the code there was a linear search implementation. What we do is to start from the beginning of the array, the 0 index, and move forward until we reach the end of the array. Doing that, we ensure that we have searched the entire array, so if the element is there, we definitely are going to find it.
Here’s a reminder from the previous post.

As you can see, this is a dead simple algorithm to implement. It works excellently on unsorted sets, so if our array has element in random order, then this is the way to search for the given key.
There is a huge caveat though with this approach. The algorithm is fundamentally based on the number of elements it loops through, in other words, how fast it will retrieve the given key is entirely up to the number of elements stored in the array. Look at the for loop on the gist above. It loops and loops and loops until the if statement within succeeds.

So, for an array of N elements, the time to retrieve a given key, K, would be relative to O(N). This is huge caveat over large arrays, the retrieve operation might take a long time to bring back results, as it needs to go through each element in the array. On the other hand, if an array is small and speed is not a concern, this is definitely a good, simple way to retrieve data.

This is a major disappointment, how am I supposed to look for an element in a large array, if that technique is highly inefficient? Good news, there is another algorithm that can be used for this purpose.

Pros

  • Useful when speed is not a concern. In this cases, retrievals are not very frequent for this kind of system.
  • Can work predictively on unordered sets.
  • They allow fast inserts, as an insert is actually an append for the underlying data structure.

Cons

  • Very slow, especially in large tables this operation is inefficient.

Binary search

This algorithm goes back in time of computer science, back to 50’s with mentions and variations, before it was further developed in early 60’s and 70’s.

The idea is simple. Over a sorted set of elements, split the set in half, match the middle element with the key element. If they do not match, move to the lower or upper part, if the key has lower or higher value than the array element in middle, split the subset further in half, until you reach a trivial point, in which you either return success or failure.

Let’s look at this in kinda pseudo code.

With this kind of search algorithm, we can efficiently look for a given element in large arrays, guaranteeing fast retrieval. Of course, this only applies to sorted sets, as you can imagine from above, if you try to perform these steps in an unsorted set, you will not get back the results you might expect.
What follows, is an implementation in C#

Notice how the algorithm breaks the problem into smaller pieces, that is, the large array into smaller arrays. That is called, divide and conquer, you break a large problem into smaller, until it becomes trivial to solve.

Look at the following .NET Fiddle to get a detailed step-by-step walkthrough on the algorithm. I have an array of 10000 numbers and I am looking for 440 and 223. The first takes 12 tries, while the latter needs 9 tries to find the number successfully. I have also added an F# equivalent of this method, just to compare how the code looks using a recursive algorithm.

As you see from above, the F# equivalent is much more concise, using the pattern matching technique.

Tip: The same imperative code for C# can be written as a recursive function. A while (true) loop can be easily transformed to a recursive function, just identify the trivial case and you are done.

Performance of this algorithm is logarithmic, so, in order to calculate its time complexity you would have to calculate the log base 2, because this is a binary search, thus we use the binary logarithm, in the big O notation, which is O (log N). This is incredible speed for an algorithm, not the best, but certainly an improvement over the O (N) of previous case.

Pros

  • Very fast searches, even on huge tables.

Cons

  • Table must be ordered. This means that insert takes longer, as space for the newly added item must be made.
  • As of above, this does not work on unordered sets.

Cons for both

Both algorithms prove to be slow on deletes, as the deleted item is removed, the remaining items must move to fill the gap.

Fun times

One of the most known applications of binary search is the Guess the number game. Two players are required for this. Rules on this game are simple. Given a set of ordered numbers, player #1 picks one without telling player #2. Now the second player has to guess the number. If he gets it right, happy days, else he must know if that number is greater or not from the number he just guessed.

This information is valuable, because if I know that the number in question is greater than my guess, I just simply have to look on the rest of the set (to numbers greater or lower than my last guess).

Now, if you randomly picking up numbers, you might find it, you might not, if the set is large enough. But still it will take awfully lot. By performing a binary search though, you are guaranteed to find the number in question in just a few tries.

I leave this to you as an exercise. Code related to this problem can be found on the GitHub repository as well.

Summary

In this post we’ve gone through the linear and binary search algorithms. We’ve seen that a linear algorithm depends on the number of elements stored in the data structure, which might produce times up to O (N) in worst case, but for sorted sets the binary search algorithm saves the day, with O (logN) performance. The downside of this approach is that it requires a sorted array, which affects the insert time, as we want the elements to be in order for this algorithm to work properly.

If you liked this blog post, please like, share and subscribe! For more, follow me on Twitter @giorgosdyrra.


This post is part of the Learning data structures series

  1. Learning data structures – Arrays in C#
  2. Learning data structures – Search algorithms for arrays
  3. Learning data structures – Stack in depth in C#
  4. Learning data structures – Tuples in C# and F#
  5. Learning data structures – Queue in depth in C#
  6. Learning data structures – Priority queue
  7. Learning data structures – Simple sorting algorithms
  8. Learning data structures – Linked lists
  9. Learning data structures – Doubly linked lists
  10. Learning data structures – Recursive algorithms
  11. Learning data structures – Advanced sorting algorithms
  12. Learning data structures – Binary trees
  13. Learning data structures – Heap
  14. Learning data structures – Hash tables
  15. Learning data structures – Graphs