Autocomplete Matching Expressions

Chapter 7 14 mins

Learning outcomes:

  1. What is matching
  2. Different kinds of matching
  3. Combining everything

Introduction

In the last discussion on sorting, we saw how to order suggestions and group labels in an autocompleter as they're shown in suggestionsBox, either in ascending order or in descending order. Uptil this point in this tutorial, we've examined more than a handful of autocomplete subfeatures such as navigation via arrow keys, selecting suggestions using the enter key or mouse pointer, grouping related suggestions, sorting items and much more.

All these subfeatures didn't limit the number of items considered to be a match for a given query - rather they just changed the way given items were laid out in suggestionsBox, or made them interactive that's it. Now in this chapter, we shall cover one of the most useful subfeatures of autocomplete sitting right at its core - searching.

In all the previous chapters, we've sticked to simple matching i.e abc matches abcdef, st matches pasta etc. In this chapter, we shall explore other types of matching, code and ultimately add them to the arsenal of subfeatures of our autocompleter.

Let's begin.

What is matching?

An autocompleter performs a serious amount of searching in order to find possible matches of a given query.

The part where a list item is examined on whether it's match of a given query or not is crucial to an autocompleter. It governs how many items are considered to be suggestions of a given query, and likewise displayed in suggestionsBox.

If the matching part is erroneous, the whole autocomplete script becomes useless - nothing will be shown in suggestionsBox and consequently no subfeature will get a chance to show off its talent.

Uptil this point, we've been using a simple indexOf() based matching expression in our autocomplete script to check whether a given list item contains the entered query "exactly as is".

For example, if the query is "ab", the strings "absolute", "abbreviate", "fabulous" will be considered matches, however the strings "baseball", "harbour" won't, since they do not contain the substring 'ab' in them.

Obviously, in addition to this, we had also normalised the cases of both the query and the list item in order to make the matching case-insensitive.

Apart from this simple matching, we can employ other matching expressions as well, as detailed below:

  1. The characters of the query are searched for in the given list item and if it contains those characters in the original order, the item is considered to be a match.
  2. The query is searched for in the given list item right at its beginning. If it does exist at the beginning of the item, the item is considered a match. For example, 'a' matches "australia", but not "malaysia", because "malaysia" starts with an 'm', and not an 'a'.

This whole chapter aims at teaching you the skills to be able to create any matching expression you want for your autocompleter, not just showcasing two such instances to you. How exactly do we implement these two instances of matching expressions will surely help you build these skills.

Looking for similar characters

One criteria for considering a list item to be a match for a given query is to look for the characters of the query in the list item, in their original order. If the item contains the characters in their original order, it is marked as a match of the query.

Let's say we have list equal to ["Python", "JavaScript", "Platinum", "Typing"].

In what we call character search, if the query entered in inputArea is 'pt', the matches would be "python", "javascript", and "platinum", since they do contain the letters 'p' and 't'; with 'p' coming before the 't'.

"Typing" isn't considered a match, although it does contain 'p' and 't'. This is because, the order of letters in "Typing" isn't the same as in "pt".

In the query "pt", 'p' comes before 't' however in the string "Typing", 'p' comes after 't'. Both the orders don't match, and so doesn't the string for the query.

Similarly, if list is ["Ginger", "Onion", "Tomato", "Turnip", "Spinach"] and the query is 'in', the matches would be "Ginger", "Onion", and "Spinach".

"Turnip" won't be considered a match, because once again, it contains the given characters in a different order - 'n' comes before 'i', although in the query "in", 'n' comes after 'i'.

Now that the type of matching is clear to you, can you write the code to accomplish this for real in an autocompleter? Well, the following discussion will get this done.

Whenever we have to solve a programming problem, we shall first break down the problem into subproblems and solve those subproblems first. This makes development easier, quicker and more logical.

In our case, what we find to be a subproblem of the main problem, of looking for matches of a given query in an autocompleter using character search, is to construct a function that checks whether a given string matches a given query.

Once we have constructed this function, we can call it for every list item and get our job done in the span of seconds. So let's focus on the function.

We'll call the function characterSearch(). It takes in two arguments: q that is the query; and item that is the item to be tested against q, using our character search logic.

If item is a match for the query q, characterSearch() shall return true, or else false.

The string is considered a match if and only if it contains each character of the query in its original order.

This gives us one hint - we need to run a loop over all the characters of the query and for each of them, check its existence in str. Following we code this hint:

function characterSearch(q, item) {
    for (var i = 0; i < q.length; i++) {
        if (item.indexOf(q) === -1) return false;
    }
    return true;
}

If at any instant, indexOf() returns -1 for the character being looked up, we exit right away with a return value of false. Similarly, if the loop completes, this indicates that all the characters do exist in item and so we return true.

Now that we have a rough solution in hand, we can rectify it until it becomes the ultimate solution. Notice that in the function above, we only check if a given character exists in item - not that it exists in the order it appears in the query. This can be accomplished very easily, if one knows about the indexOf() method really well.

Its second argument specifies the index where to start the search of a substring within a main string. So what we could do is:

Starting from the first character, save the index where the character exists in item and then for the next character, pass this index to the indexOf() method in order to start the search from the last character's position.

In this way the search for the next character will be done from the position where the last character was found.

As before, if indexOf() returns -1 for the character being looked up, we exit right away. However, if this is not the case then we need to increment the saved index by 1.

The issue this incrementing solves is highlighted as follows:

Suppose q is "tt" and item is "Python". The first character of q, that is 't', is found at index 2 in item. For the second character of q, that is again 't', searching begins at 2 (without the increment). At index 2, a 't' is indeed found and so "Python" is considered a match for 'tt'. However notice that "Python" only contains 't' once.

If we increment the previous character's index this issue can be mitigated.

In the example above, try incrementing 2 to 3 and see how does that affect the search for the second 't'.

Once more, if the loop completes successfully, this implies that every character was found in item, in its original order, and so we return true.

The following code takes into account the order of characters as well:

function characterSearch(q, item) {
    var index = 0;
    for (var i = 0; i < q.length; i) {
        index = item.indexOf(q, index);
        if (index === -1) return false;
        else index++;
    }
    return true;
}

The main player here is the local variable index. It holds the index of the character being looked up for. Initially it's set to 0 so that the indexOf() call in line 4 starts searching from the beginning of item.

With this function in hand what we need to do now is to go to our autocomplete script and replace the expression that matches q with a list item, with this function's invocation expression along with the desired arguments.

This is illustrated as follows:

inputArea.onkeyup = function() {
    /* ... code here ... */

    for (var i = 0; i < list.length; i++) {
if (characterSearch(q.toLowerCase(), nativeList[i].toLowerCase())) {
/* ... code here ... */ } } /* ... code here ... */ }

Let's take some lists and test this subfeature on 'em.

var list = ["Python", "JavaScript", "Platinum", "Typing"];

Live Example

Looking from the beginning

Sometimes when we type a query in an input field, we only mean to choose from those items that start with that query.

For example, let's say we have an online form where we have to the specify the country where we were born. When we type 'a' into the input field, we would probably not be wanting to enter a country name like "Japan", rather we would be going for something like "Australia" or "Argentina", that starts with an 'a', not just contain it.

In all the previous autocompleters, we have been using a matching expression that let items go through it that merely contained the query. There was nothing to confirm that the items had the query right at their beginning.

Accomplishing this is just the matter of adding a small expression to our old code.

We just need to check whether the index of query in the current list item is equal to 0. If it is, then we consider the item as a perfect match of the query.

Consider the following code:

inputArea.onkeyup = function() {
    /* ... code here ... */

    for (var i = 0; i < list.length; i++) {
if (nativeList[i].toLowerCase().indexOf(q.toLowerCase()) === 0) {
/* ... code here ... */ } } /* ... code here ... */ }

Live Example

Combining everything

It's good to know that we have two to three choices of a matching expression to let candidate items turn into suggestions for a given query. But, won't it be even better if we could specify right at the start of the autocomplete script that which choice do we wish to take.

Let's create a global variable matchingExp, just like the variables sort, group we created in the previous chapters to modify the behavior of the autocomplete script based on our taste.

It'll hold one of the numbers 0, 1 or 2, corresponding to the following functions.

But before that, let's first create these three functions:

function wordSearch(q, item) {
    return item.indexOf(q) !== -1;
}
function wordSearchFromStart(q, item) {
    return item.indexOf(q) === 0;
}
function characterSearch(q, item) {
    var index = 0;
    for (var i = 0; i < q.length; i) {
        index = item.indexOf(q, index);
        if (index === -1) return false;
        else index++;
    }
    return true;
}
We've already seen the last function here in the second section of this chapter.

The functions being constructed, it's time to create matchingExp and the code that selects one the above functions based on its value.

var matchingExp = 0;

if (matchingExp === 2) {
    matchingFunc = characterSearch;
}
else if (matchingExp === 1) {
    matchingFunc = wordSearchFromStart;
}
else {
    // default matching expression
    matchingFunc = wordSearch;
}

We've used an if..else statement to determine which function to use in the matching phase. There are other ways to do this as well, like by using a switch statement, or by using an array whose elements are these three functions sitting right at the indexes 0, 1 and 2.

Once the function is determined, its reference is saved in matchingFunc, which is ultimately called in the onkeyup handler:

inputArea.onkeyup = function() {
    /* ... code here ... */

    for (var i = 0; i < list.length; i++) {
if (matchingFunc(q.toLowerCase(), nativeList[i].toLowerCase())) {
/* ... code here ... */ } } /* ... code here ... */ }

This marks an end to our matching subfeature!