Course: JavaScript

Progress (0%)

JavaScript Arrays Sorting

Chapter 29 27 mins

Learning outcomes:

  1. The sort() method
  2. Default sorting
  3. The comparison callback function
  4. Numeric sorting in decreasing order

Introduction

By far, one of the most desired operations to be performed on arrays is that of sorting. In JavaScript, this is accomplished using the sort() method.

We saw it first all the way back in the JavaScript Data Types chapter and then in the previous JavaScript Arrays Methods chapter. sort() is quite a peculiar method in that it could give extremely unexpected results when used to sort stuff.

There are many tiny details involved in the method that are necessary for us to know to be able to avoid unexpected results creeping into our programs due to it.

It's finally time to dissect sort() in immense detail. Let's begin...

The sort() method

If we were to define sort() to encompass all cases, we'd simply say:

The sort() method rearranges an array in an organized manner.

This is the right definition. Saying that sort() sorts an array in increasing order would technically be wrong since the method could also sort in decreasing order. The safest way to define it is to remain generic as we do in the definition above.

The syntax of sort() is pretty basic:

arr.sort([comparisonFn])

The optional comparisonFn argument specifies a function to use in comparing two values together to determine which one should come first and which one should come second.

We'll cover this arg later on. For now, let's see what happens when no arg is provided.

Calling sort() without an argument sorts the array in lexicographical order. There is a whole sequence of events performed each of which requires careful attention as soon as the method is invoked.

Let's see it in detail.

Default sorting

As soon as sort() is called without any sort of argument, the comparisonFn parameter is set to a sort-compare function defined internally by JavaScript.

You can read more details about this internal comparison function, as detailed in the ECMAScript spec, at https://tc39.es/ecma262/multipage/indexed-collections.html#sec-sortcompare.

The way this internal function compares two values together is quite interesting. Let's see it...

Given two elements a and b of the array, as arguments, the internally-defined comparison function goes as follows:

  1. Each of the values, a and b, is converted into a string primitive using the same logic employed by the String() function.
  2. Next, both the strings are lexicographically compared against each other.
  3. If a < b, the function returns -1. This instructs the sorting algorithm to put a before b in the sorted array.
  4. If a > b, the function returns 1. This instructs the sorting algorithm to put b before a in the sorted array.
  5. At this point, it is known that a and b are equal to one another. Hence the function returns 0. This instructs the sorting algorithm to keep a and b in their original order.

Note that the coercion into string for the given elements a and b is only done for the comparison — the values placed in the sorted array aren't these coerced strings themselves, rather they are the original values stored in the array before the sorting.

Now, in the whole sequence of steps above, there might be one confusing thing... Did you find any?

Well, it's the term 'lexicographically'.

So what exactly is meant by 'lexicographically comparing two strings'? Sounds pretty geeky? Well, it ain't!

Lexicographical comparison

Lexicographical comparison is just the way we'll compare two pieces of text naturally to determine which one should come first.

For instance, given the strings 'Python' and 'Perl', which one do you think would come first? What steps do you follow to reach to the correct order?

We compare the first character of both the strings together to determine whether there is a difference. If they are both the same, we move on to the second character. If they both are the same as well, comparison moves to the third character. This goes on until the end of one string is reached.

At this point, the lengths of both the strings are compared to one another. The one with a shorter length is said to be lexicographically smaller than the other one. However, if both the strings have the same length, then they are said to be lexicographically equal to one another.

Strings sorted this way are said to be in a lexicographical order.

For example, going with the string 'Python' and 'Perl' as given above, here's how we'd sort them lexicographically:

  1. We compare the first character of both the strings. Since, it's the same — 'P' — we move to consider the second character.
  2. The second character of 'Python' is 'y' whereas that of 'Perl' is 'e'. Since, 'e' comes before 'y' in natural alphabetical order, we say that 'Perl' is lexicographically smaller than 'Python'.

Hence, the strings would be ordered as 'Perl', 'Python'.

Now, in natural language, we are typically used to the 26 English alphabets from A to Z (ignoring the casing). However, in the context of a computer program, characters aren't limited to just 26 characters — there are thousands of them!

So how would a computer be able to lexicographically distinguish between strings that contain characters other than the 26 English alphabets?

To be precise, there are 143,859 characters representable in the piece of code at the time of this writing. Check out more details at https://en.wikipedia.org/wiki/List_of_Unicode_characters.

As an example, given the strings 'C--' and 'C++', how do you think would JavaScript be able to determine which one to put first?

Hmm... Seems like a real problem. Doesn't it?

Well, it ain't a problem. Unicode solves it very easily. 😎

Recall that strings in JavaScript are represented using the UTF-16 encoding scheme. In this scheme, each character is internally represented by a unique number, formally referred to as a code point.

Strings in JavaScript, and in other programming languages, are lexicographically compared using these code points.

That is, given two characters x and y, if the code point of x is smaller than that of y, this means that x comes before y in the Unicode's list of characters, and hence is said to be lexicographically smaller than y. And vice versa.

For instance, the lowercase character a has the code point 97 while the uppercase A has the code point 65. This simply means that A comes before a, and hence is lexicographically smaller than A.

Consider the comparison below to confirm this idea:

'A' < 'a' // is 'A' smaller than 'a'
true
'A' > 'a' // is 'A' larger than 'a'
false
'A' === 'a' // is 'A' equal to 'a'
false

Let's use this idea to see how would JavaScript determine which of the strings 'C--' and 'C++' is lexicographically smaller.

Here are the code points of each of the involved characters: C is 67; - is 45; and + is 43.

The first character of both the strings are compared against each other. Both have the code point 67, likewise, comparison moves to the second character. The second character of 'C--' has the code point 45 which is greater than 43, which is the code point of the second character of 'C++'.

This simply means that 'C--' is lexicographically larger than 'C++', and should consequently come after 'C++', if we were to sort both these strings in increasing order.

Let's see what's returned when we actually sort these two strings in an array:

['C--', 'C++'].sort()
["C++", "C--"]

Just what we computed. Perfect!

So this is the way sort() works by default.

People generally call this alphabetical order, but strictly speaking that encompasses only alphabetical characters — the term 'lexicographical order' on the other hand encompasses the entire range of characters in Unicode.

In the following code, we sort a couple of arrays using sort():

['a', 'Bother', 'call', 'Lost', 'lost', 'News'].sort()
["Bother", "Lost", "News", "a", "call", "lost"]
['Python', 'Perl', 'Apache', 'PHP'].sort()
["Apache", "PHP", "Perl", "Python"]

Now following from the fact that, by default, sort() rearranges elements in an array in lexicographical order, we might get pretty unexpected (but not erroneous) results when sorting an array of numbers.

Consider the following code:

[100, 25, 3, 70, 8, 10].sort()
[10, 100, 25, 3, 70, 8]

One might have expected sort() to correctly sort the given array of numbers, but that's just not the case. Something seems to be wrong!

Let's see why do we get this result...

Below we'll demonstrate just a couple of comparisons that might be made internally by sort() when sorting the numbers given above. In a real program, though, there would be many of these comparisons and maybe the ones that we're demonstrating here won't actually be performed.

When comparing 10 and 3 together, both the values are converted into a string. This gives '10' and '3', respectively. Lexicographically, 1 comes before 3, hence 10 is put before 3.

The same reasoning applies when comparing 70 and 8 together. When converted to strings, they become '70' and '8', respectively. Since 7 is lexicographically smaller than 8, 70 is put before 8 in the sorted array.

In short, we get an invalidly-sorted list of numbers.

The question is, how to solve this problem?

Well, we simply need to provide a function to sort() that tells it how to compare two numbers with one another. Let's see how to lay out this function.

The comparison callback

As we saw in the syntax for sort() above, the method accepts an optional function argument comparisonFn which is used to determine which of two values, when being compared, should come first in the sorted array.

This comparisonFn argument has the following syntax:

function comparisonFn(a, b) {
    // compare a and b here
}

a and b are simply the elements being compared by the internal sorting algorithm.

The interpretation of the return value of this comparison function is the same as the interpretation of the default internally-defined comparison function, which we saw above.

Let's quickly review it in terms of the comparisonFn argument:

  1. If comparisonFn returns a value less than 0, a is put before b.
  2. If comparisonFn returns a value greater than 0, b is put before a.
  3. If comparisonFn returns 0, the order of a and b is kept intact.

Alright, with this in mind, how could we set up a function to sort numbers correctly?

Any suggestions?

Well, let's set up such a function:

function compareNums(a, b) {
    if (a < b) return -1;
    if (a > b) return 1;
    return 0;
}

When a < b we return -1 to place a before b; when a > b, we return 1 to place a after b; otherwise a === b and so we simply return 0 to keep a and b in their original order.

With this comparison function in hand, let's try to use it to sort our old array of numbers:

function compareNums(a, b) {
    if (a < b) return -1;
    if (a > b) return 1;
    return 0;
}

var nums = [100, 25, 3, 70, 8, 10];

console.log(nums.sort(compareNums));
[3, 8, 10, 25, 70, 100]

As can be clearly seen, the numbers are now arranged correctly, thanks to compareNums.

The comparison function can't only be used to sort an array of numbers in increasing order. It could also be used to sort an array in decreasing order.

In fact, it could be used to sort an array based on tons of different criteria. We'll see many such examples in the next section.

Examples

In this section, we show a couple of examples of using sort() along with a callback function to sort an array in a number of various ways.

Sorting numbers in decreasing order

In the section above, we saw how to sort an array of numbers in increasing order. Now what if it's desired to sort it in decreasing order?

We have two options.

One is to first sort the array in increasing order using the method we've shown above, and then reverse the array using the reverse() method.

Consider the code below:

function compareNums(a, b) {
    if (a < b) return -1;
    if (a > b) return 1;
    return 0;
}

var nums = [100, 25, 3, 70, 8, 10];

nums.sort(compareNums); // sort in increasing order
nums.reverse(); // reverse the array

console.log(nums);
[100, 70, 25, 10, 8, 3]

Both the invocations, sort() and reverse(), mutate the original array.

The second way to sort an of numbers in decreasing order is to change the callback function. The change is simply to switch the return values of both the conditionals.

This is demonstrated as follows:

function compareNums(a, b) {
if (a < b) return 1;
if (a > b) return -1; return 0; } var nums = [100, 25, 3, 70, 8, 10]; nums.sort(compareNums); // sort in decreasing order console.log(nums);
[100, 70, 25, 10, 8, 3]

Here, instead of returning -1 for the first conditional, we return 1. This means that when a is less than b, the sorting algorithm puts b before a (or in other words, a after b).

Similarly, instead of returning 1 for the second conditional, we return -1. And this means that when a is greater than b, the sorting algorithm puts a before b.

Instead of switching the return values, we could've also switched the < and > operators.

This way of sorting in decreasing order is relatively faster as compared to the first one since there is no second run of iterations involved in reversing the array. However, in most cases, the performance benefit is negligible.

Sorting objects based on property value

Suppose we are given an array of various programming languages, each represented as an object with two properties: name specifying the name of the language and year specifying the year in which it was first released.

Something like the following:

var langs = [
    {name: 'JavaScript', year: 1995},
    {name: 'Python', year: 1991},
    {name: 'Java', year: 1995},
    {name: 'C++', year: 1989}
];

Now if we want to sort this array in increasing order based on the year in which each programming language was released, what could we do?

Well, we could replace the identifiers a and b in the callback function that we saw above, with langs.year. The behavior of sorting would regardless remain the same.

Consider the code below:

function compareYears(lang1, lang2) {
    if (lang1.year < lang2.year) return -1;
    if (lang1.year > lang2.year) return 1;
    return 0;
}

var langs = [
    {name: 'JavaScript', year: 1995},
    {name: 'Python', year: 1991},
    {name: 'Java', year: 1995},
    {name: 'C++', year: 1989}
];

console.log(langs.sort(compareYears));

Here's the console snippet:

Notice one issue over here. In alphabetic order, the word 'JavaScript' comes after 'Java', however, here the object with name equal to 'JavaScript comes first.

How to solve this?

Let's leave this as a task for you to complete...

Define the callback function compareYears() in such a way so as to sort the array langs shown above in increasing order based on the year, but at the same time making sure that the name property also follows an increasing order.

function compareYears(lang1, lang2) {
    if (lang1.year < lang2.year) return -1;
    if (lang1.year > lang2.year) return 1;

    // at this point, both lang1.year and lang2.year are the same
    // so the comparison shifts to the name property
    if (lang1.name < lang2.name) return -1;
    if (lang1.name > lang2.name) return 1;
    return 0;
}

var langs = [
    {name: 'JavaScript', year: 1995},
    {name: 'Python', year: 1991},
    {name: 'Java', year: 1995},
    {name: 'C++', year: 1989}
];

console.log(langs.sort(compareYears));

Here's the output snippet:

As you can see, now the object with name equal to 'JavaScript' comes after the object with name equal to 'Java', just as was desired.

"I created Codeguage to save you from falling into the same learning conundrums that I fell into."

— Bilal Adnan, Founder of Codeguage