Exercise: Binary Search

Exercise 24 Easy

Prerequisites for the exercise

  1. PHP Array Sorting
  2. PHP Arrays — Basics
  3. All previous chapters

Objective

Implement the binary search algorithm in PHP.

Description

Searching for an item in an array is quite a common task in programming. For instance, given the array [1, 2, 3], we might be interested in finding out whether it contains 4 in it or not.

Programatically, there are two ways of searching for a given item in an array.

One is the straightforward and elementary sequential approach, referred to as sequential search (and sometimes even as linear search). In this approach, we iterate over the entire array in search of the desired value. As soon as it is found, searching is ended.

All array methods in JavaScript that are meant to search for elements, i.e. indexOf() and includes(), use the sequential approach.

The second way of searching demonstrates an extremely clever and efficient searching technique. It's called binary search.

The name comes from the fact that searching proceeds by consecutively halving the given array and then discarding one half where the item to search is guaranteed to not exist. (Halving refers to 2, and that's what 'binary' means.)

At each stage in a binary search algorithm, the middle element of the array is compared with the item to search.

  • If it gives a match, we are done.
  • If not, then the array is split apart into two subarrays right at the position of the middle element. The next step is to repeat this exact same process on the subarray where the value might exist.

One crucial point to note is that binary search requires the underlying array to be sorted. The halving technique of the algorithm relies on the very fact that the array is sorted; that's essentially how we're able to determine the subarray where the searched item might exist.

Shown below is a demonstration of how binary search would find the number 500 in the array [1, 2, 9, 20, 60, 100, 101, 200, 500]:

In this exercise, you have to create a function binary_search() implementing the binary search algorithm described above.

The function should take in two arguments, the first one being the value to search while the second one being the array. It should return back true if the item exists, or else false.

Shown below is an example of the function's usage:

<?php

function binary_search($val, $arr) { /* ... */ }

$arr = [-1, 0, 0, 10, 50, 600];

var_dump(binary_search(10, $arr));
var_dump(binary_search(-1, $arr));
var_dump(binary_search(600, $arr));
var_dump(binary_search(100, $arr));
bool(true) bool(true) bool(true) bool(false)

Notice how the array $arr is sorted before we provide it to binary_search(). This is a vital step for the correctness of the algorithm.

Hints

Hint 1

Use two variables, $start and $end, to keep track of the indices of the starting and ending elements of the subarray under inspection.

Hint 2

Compute the index of the middle element of the subarray using the expression floor(($start + $end) / 2).

View Solution

New file

Inside the directory you created for this course on PHP, create a new folder called Exercise-24-Binary-Search and put the .php solution files for this exercise within it.

Solution

The solution to this exercise boils down to implementing the binary search algorithm in PHP.

First, let's get the function binary_search() set up:

<?php

function binary_search($val, $arr) {
   // Code to go here.
}

Let's now get to the implementation...

At each step in a binary search, we operate on a subarray of the main array, searching for the item in that subarray.

First of all, note that there is no need to actually slice out subarrays from $arr. We could model them using two indices — one specifying the position where to begin the subarray while the other one specifying the position where to end the subarray.

Let's call these $start and $end, respectively.

At the start of the search, since we are concerned with the whole array, $start would be 0 while $end would be count($arr) - 1 (the index of the last element of $arr).

Given these two indices to represent the concerned segment of $arr, our next step is to read its middle element's value and compare that with $val.

But how do we determine where is the middle element? Let's see it on a couple of dummy numbers.

Given the indices 0 and 2, what would be the index of the middle element? 1? Yes. Now given the indices 7 and 9, what would be the index of the middle element? 8? You're right. Well done.

If you're a math guy, you'd know that computing the middle value of two numbers $a and $b could be simply done by ($a + $b) / 2. However, this expression could sometimes give us a decimal number which won't be suitable as an array index.

For instance, given the indices 0 and 3 (let's say to denote the array [1, 3, 5, 10]), (0 + 3) / 2 would be equal to 1.5.

Now, as we know, there is no such index in the world of arrays. Hence, it turns out that we couldn't directly rely on the return value of this expression.

Any other options? Anything else we could do to this expression?

Definitely yes! We could floor it.

So for instance, given the indices 0 and 3, as before, (0 + 3) / 2 would give 1.5 and then floor(1.5) would give 1.

This index could be used as the middle of the array, although technically, it won't truly be the middle (after all, there is no middle element of an array with an even number of elements).

To summarize this discussion, in order to compute the middle position of the concerned subarray, denoted by the indices $start and $end, we simply use floor(($start + $end) / 2). Let's call this index $mid.

Moving on, with the index of the middle element in hand, the next step is to read $arr[$mid].

There are 3 possibilities at this stage:

  1. $arr[$mid] is equal to $val. Voila! We've found the item. Search ends right here and we exit the function.
  2. $val is greater than $arr[$mid]. This means that $val would be to the right of this element (if it exists) because the array is sorted. Likewise, we discard the left subarray, including the value $arr[$mid], and continue with the right subarray for the next iteration.
  3. $val is less than $arr[$mid]. This means that $val must be to the left of this element. Likewise, we discard the right subarray, including the value $arr[$mid].

The 'discard' step above is accomplished by changing $start or $end to values that limits us only to the concerned subarray.

Now, since all the steps mentioned above are to be repeated for each subsequent subarray of $arr, they would go inside a loop. The question is: what condition to set for the loop?

Hmm. This is a good question.

In the worst case, a binary search algorithm would keep on halving $arr in search of $val uptil the point where the concerned subarray has a length of 0 or less.

More specifically, the last repositioning of the subarray leads to $start becoming greater than $end, which is counter-intuitive and signals that $arr has been exhausted.

In other words, as long as $start is less than or equal to $end, looping should go on.

Altogether, we get to the following code:

<?php
               
function binary_search($val, $arr) {
   $start = 0;
   $end = count($arr) - 1;

   while ($start <= $end) {
      $mid = floor(($start + $end) / 2);
      if ($val > $arr[$mid]) {
         $start = $mid + 1;
      }
      elseif ($val < $arr[$mid]) {
         $end = $mid - 1;
      }
      else {
         return true;
      }
   }
   return false;
}

Now, let's test the function:

<?php

function binary_search($val, $arr) { /* ... */ }

$arr = [-1, 0, 0, 10, 50, 600];

var_dump(binary_search(10, $arr));
var_dump(binary_search(-1, $arr));
var_dump(binary_search(600, $arr));
var_dump(binary_search(100, $arr));
bool(true) bool(true) bool(true) bool(false)

It works perfectly.

Time complexity of binary search

There is a diverse topic in computer science known as asymptotic algorithmic analysis, where we study the running times of different algorithms (and even their memory consumptions, but that's not important for now).

There is quite a decent amount of math involed in it, although not something like rocket science. Asymptotic analysis is an extremely useful tool for analysing the performance of various algorithms for a given problem.

Let's have a look over the time complexities of the two searching approaches we saw above, namely linear search and binary search.

The running time of linear search, also known as sequential search, is denoted as ::O(n)::, while that of binary search is denoted as ::O(\log_2n)::. For now, it's not important to understand the details of these notations. We'll just suppose that given an array arr with length ::n::, each expression returns the number of milliseconds taken to search for val in arr in that respective approach.

That is, linear search takes ::n:: milliseconds and binary search takes ::\log_2n:: milliseconds.

Now, let's say that ::n = 100,000,000::, which is not surprising these days when datasets are humongous!

Linear search would take:

::n \space \text{ms} = 100000000 \space \text{ms} \approx 28 \space \text{hrs}::

However, binary search would take:

::\log_2n \space \text{ms} = \log_2100000000 \space \text{ms} \approx 27 \space \text{ms}::

See the difference? It's enormous, jaw-dropping, insane, and what not!

The implementation using linear search would take over a day to complete, whereas the one using binary search would be ready with an answer even faster than the blink of an eye!

This is the magic of binary search. No matter how enormous the main array becomes, binary search would complete in a very short span of time.

Now you might think that is binary search actually used out there. A big big yes! It's used in almost every application that deals with hefty amounts of data on which searches ought to be conducted frequently.

The world of computer science is just huge!