JavaScript Regex Basics

Chapter 2 17 mins

Learning outcomes:

  1. The replace() string method
  2. Regex literal and constructor syntax
  3. Regex patterns and flags
  4. Working with RegExp()
  5. The match() string method

Introduction

As we start with regular expressions, let's first take some time to understand how they work at a basic level, and then relate this to what we learned in the previous introductory chapter.

So let's begin without wasting any more time...

Simple searching

Suppose you have the string s shown below:

var s = 'Programming is amazing.';

You want to replace the word 'Programming' with 'Coding'. How would you do this?

Well, the replace() method of strings in JavaScript is what's required here.

It serves to replace given substrings in a string with another string.

The first argument to replace() is the string to search for in the main string (on which replace() is called) while the second argument is the string to replace it with.

In our case, the first argument would be 'Programming' and the second one would be 'Coding'.

var s = 'Programming is amazing.';

console.log(s.replace('Programming', 'Coding'));
Coding is amazing.

As we can see, the method replaced the word 'Programming' with 'Coding' and returned the resulting string, which was then logged to the console.

We could reassign the returned string to s, or store it in a new variable and log that. Either way, the problem is solved.
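Strings in JavaScript are immutable, so replace() never changes s itself; it always returns a new string. A minimal sketch confirming this (the variable name result is just for illustration):

```javascript
var s = 'Programming is amazing.';

// replace() returns a new string; s itself is left untouched
var result = s.replace('Programming', 'Coding');

console.log(result); // Coding is amazing.
console.log(s);      // Programming is amazing.

// To keep the new value, reassign it explicitly
s = s.replace('Programming', 'Coding');
console.log(s);      // Coding is amazing.
```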

Let's consider another example.

This time we have a longer sentence to perform replacements in.

As before, we have to replace 'Programming' with 'Coding' wherever it occurs. Clearly, the substring occurs twice here.

How would we solve this problem?

Let's try our previous code...

var s = 'Programming is amazing. Programming is love.';

console.log(s.replace('Programming', 'Coding'));
Coding is amazing. Programming is love.

Unfortunately, the code has only replaced the first occurrence of the string 'Programming', not the second one, which had to be replaced as well.

Any other solutions?

One way is to call replace() again on the resulting string after the first replacement.
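A quick sketch of this approach, chaining a second replace() call onto the result of the first:

```javascript
var s = 'Programming is amazing. Programming is love.';

// Each call replaces only the first remaining 'Programming',
// so we need exactly one call per occurrence
var result = s.replace('Programming', 'Coding').replace('Programming', 'Coding');

console.log(result); // Coding is amazing. Coding is love.
```

This works only because we know in advance that the substring occurs exactly twice.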

This does solve our problem, but what if we had three or four occurrences of 'Programming' in the main string? Or worse, what if we didn't know in advance how many occurrences there are?

Then what?

It turns out that our way of manually calling replace() as many times as the substring occurs in the main string is inflexible and inefficient.

Is there a better solution?

Did someone say 'regex'? OK, let's try it out...

As before, we'll call replace() to do the desired replacement in s. But this time, instead of passing a string as the first argument to the method, we'll pass a regular expression.

Here's the code:

var s = 'Programming is amazing. Programming is love.';

console.log(s.replace(/Programming/g, 'Coding'));
Coding is amazing. Coding is love.

Through some sort of magic, we see that the strange expression /Programming/g somehow replaced both the desired substrings in s — not just the first one — even though we called replace() just once.

This is the power of regular expressions. And behold, this is just the start!

It's time to understand what exactly is happening in /Programming/g.

Is it really that strange?

Regex syntax

In JavaScript, just like we have two ways to create an array — [] and new Array() — we have two ways to create a regular expression.

The literal way is to use a pair of forward slashes (//) and then put a pattern between them.

In general terms, this could be denoted as follows:

/pattern/

The second way is to use the RegExp() constructor, passing it a pattern in the form of a string as the first argument.

This can be seen as follows:

new RegExp('pattern')

The most important thing in both these snippets is the pattern.

A regex pattern is a sequence of tokens to look for in a test string.

Regular expressions search for patterns in strings. Sometimes these patterns are fixed words, but often they are dynamic and much more complicated; that is, a single pattern can match many different strings.

For instance, expressing in words, a pattern could say 'match a digit, followed by a space or a tab, followed by an alphanumeric character', or 'match a word, followed by @, followed by another word, followed by . and ending at another word', and so on.

As we move through this course, we'll explore the various concepts used to build regex patterns, such as quantifiers, character sets, character classes, assertions, grouping, backreferencing and a lot more.

Anyway, coming back to our expression /Programming/g, here Programming is the pattern. There isn't anything fancy about it; it simply says to match the word 'Programming'.

The second aspect of a regular expression is its flags. We'll cover all the regex flags in detail in the next chapter, but for now let's understand the basics.

A flag specifies how exactly a regex should search for its pattern in a given string.

It changes the default behavior of searching.

But how do we give a flag to a regular expression?

In the literal way to create a regex, flags go at the end of the whole expression. In general form, this could be expressed as follows:

/pattern/flags

In the constructor way, flags go in the form of a string as the second argument to RegExp(), as follows:

new RegExp('pattern', 'flags')
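As a quick check, the two forms below build equivalent regexes, using the g flag from our earlier /Programming/g example. The source and flags properties of a RegExp object hold its pattern and its flags as strings:

```javascript
var literal = /Programming/g;
var constructed = new RegExp('Programming', 'g');

// Both objects carry the same pattern and the same flags
console.log(literal.source, literal.flags);         // Programming g
console.log(constructed.source, constructed.flags); // Programming g

// ...and therefore behave the same in replace()
var s = 'Programming is amazing. Programming is love.';
console.log(s.replace(literal, 'Coding') === s.replace(constructed, 'Coding')); // true
```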

Coming back to our expression /Programming/g, notice the letter g here. This is a flag.

It's called the global flag. Its purpose is to tell the regex engine to search for all occurrences of the given pattern (Programming in this case) in the given string.

If we were to remove this flag and then execute our code, the second 'Programming' substring would remain unchanged.

var s = 'Programming is amazing. Programming is love.';

console.log(s.replace(/Programming/, 'Coding'));
Coding is amazing. Programming is love.

This is because, by default, a regex searches for the first occurrence of the pattern. After this, it simply stops searching.

Eagerness of regex

In the world of regex, the fact that an expression stops searching after the first match, by default, has a special name.

We say that a regex is eager to give a match: as soon as it finds one, it stops searching.

However, using the g flag as we did above, we can change this normal behavior and instead get the expression to find all occurrences of the given pattern.

var s = 'Programming is amazing. Programming is love.';

console.log(s.replace(/Programming/g, 'Coding'));
Coding is amazing. Coding is love.
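One way to observe this eagerness directly is to pass a function, instead of a string, as the second argument to replace(); JavaScript calls that function once for every match it replaces. The counter variables below are just for illustration:

```javascript
var s = 'Programming is amazing. Programming is love.';

var callsWithoutG = 0;
s.replace(/Programming/, function () {
  callsWithoutG++; // runs once per match that gets replaced
  return 'Coding';
});

var callsWithG = 0;
s.replace(/Programming/g, function () {
  callsWithG++;
  return 'Coding';
});

console.log(callsWithoutG); // 1: the search stopped after the first match
console.log(callsWithG);    // 2: every occurrence was visited
```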

More examples

Let's strengthen our foundation by considering more examples of using regex patterns to solve string matching problems.

Suppose you have the following string s:

var s = '10 x 10 = 100';

and you want to replace each 10 with 20, and 100 with 400.

We'll solve this problem with the code below:

var s = '10 x 10 = 100';
s = s.replace(/100/g, '400');
s = s.replace(/10/g, '20');

console.log(s);
20 x 20 = 400

Step one is to replace '100' with '400'. Step two is to replace '10' with '20' in the string resulting from step one. Handling '100' first is important: otherwise /10/g would also match the '10' inside '100'.
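To see that the order of the two calls matters, here is a sketch running them both ways:

```javascript
var s = '10 x 10 = 100';

// Wrong order: replacing 10 first also rewrites the '10' inside '100'
var wrong = s.replace(/10/g, '20').replace(/100/g, '400');
console.log(wrong); // 20 x 20 = 200

// Right order: handle the longer number first
var right = s.replace(/100/g, '400').replace(/10/g, '20');
console.log(right); // 20 x 20 = 400
```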

And this solves the problem! Wasn't this too easy?

Time for another example...

Suppose you have the following string s:

var s = 'Abc is abc and abc is ABC, BUT ABC is easy.';

Here you have to replace the first occurrence of the character 'a' with the string '(a)', and every occurrence of the character 'A' with '(A)'.

The problem seems easy, and indeed it is:

var s = 'Abc is abc and abc is ABC, BUT ABC is easy.';
s = s.replace(/a/, '(a)');
s = s.replace(/A/g, '(A)');

console.log(s);
(A)bc is (a)bc and abc is (A)BC, BUT (A)BC is easy.

First, we replace the first 'a' with '(a)' using the expression /a/ (without the g flag) and assign the result back to s. Next, we replace each 'A' with '(A)' using the expression /A/g (with the g flag to find all matches) and again assign the result to s.

Finally, we log the string to show the result of the program.

Another easy problem!

Working with RegExp()

As we saw above, there are two ways to create a regex in JavaScript — one is using the // regex literals and the other is using the RegExp() constructor.

The question is: why would we even need the second way? The literal syntax is shorter, cleaner and easier to understand, so why go with the RegExp() constructor?

Well, the constructor has its own applications.

Regex literals are a static way to create a regex, i.e. the pattern is fixed in the source code and can't be built from data input by the user or from some other string.

In contrast, RegExp() is provided mainly as a dynamic way to create a regex. Since the first argument to RegExp() is a string, the constructor can be called at run time with any string as its first argument to create the desired regex.

For example, if we want to create an expression based on a pattern input by the user, we'd use RegExp() and pass it the input string as the first argument. This creates the corresponding regex in memory on the fly.

This isn't possible with regex literals; they can only be used to create a regex at the time of writing the code.

Below we use the dynamic nature of RegExp() to match all the occurrences of the pattern patt in a given test string s:

var s = 'Java is easy to learn. Is it really easy? Java is not easy. OK. But is it really not easy?';

var patt = 'Java';
var regex = new RegExp(patt, 'g');

console.log(s.replace(regex, 'Python'));
Python is easy to learn. Is it really easy? Python is not easy. OK. But is it really not easy?

Strictly speaking, this isn't a very dynamic example, but it illustrates one thing: using RegExp(), a regular expression can be created from a string. There is no need to always define it at the time of writing the code.
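For a slightly more dynamic sketch, the pattern can arrive as a function parameter, so it isn't known until the function is actually called. The function name replaceAllOf is hypothetical, chosen only for illustration. Keep in mind that the string passed to RegExp() is interpreted as regex syntax, so characters such as . or ? in it would carry special meaning instead of matching literally.

```javascript
// Hypothetical helper: replaces every occurrence of `word` in `text`.
// The regex is built at run time from whatever string is passed in.
function replaceAllOf(text, word, replacement) {
  var regex = new RegExp(word, 'g');
  return text.replace(regex, replacement);
}

var s = 'Java is easy to learn. Is it really easy? Java is not easy.';

console.log(replaceAllOf(s, 'Java', 'Python'));
// Python is easy to learn. Is it really easy? Python is not easy.
console.log(replaceAllOf(s, 'easy', 'hard'));
// Java is hard to learn. Is it really hard? Java is not hard.
```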

The match() method

The replace() method on strings is useful, as the name suggests, when we want to replace substrings within a string. The method can work with regular expressions, as we've seen so far.

However, sometimes we're only interested in getting the matches of a regular expression, not in replacing anything.

In such cases, a handy method is match().

Let's understand how it works...

The string method match() takes a regex and returns an array holding the matches of the regex in the given string. When the regex has the g flag set, the array contains all of the matches.

If there are no matches, null is returned.

Shown below is a simple example:

var s = 'Programming is amazing. Programming is love.';

console.log(s.match(/Programming/g));
['Programming', 'Programming']

There are two matches of /Programming/g in the string s, and accordingly the method returns an array of two elements, both 'Programming'.

The output here isn't very interesting since the regex is pretty much static; it only matches the substring 'Programming'.

However, as we progress through this course, we'll construct complex regexes that'll yield interesting results when used along with match().
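Since match() returns null rather than an empty array when nothing matches, code that uses the result should check for null first. A minimal sketch counting occurrences with a fallback to an empty array (the || [] idiom is one common choice, not the only one):

```javascript
var s = 'Programming is amazing. Programming is love.';

console.log(s.match(/Coding/g)); // null

// Fall back to an empty array so .length is always safe to read
var count = (s.match(/Programming/g) || []).length;
var none = (s.match(/Coding/g) || []).length;

console.log(count); // 2
console.log(none);  // 0
```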

One important thing about match()

When match() is called with a regex that doesn't have the g flag set, the return value of the method is an array with a few additional properties.

This can be seen as follows:

var s = 'Programming is amazing. Programming is love.';
var matches = s.match(/Programming/);

console.log(matches);
["Programming", index: 0, input: "Programming is amazing. Programming is love.", groups: undefined]

The properties are as follows:

  1. index specifies the position in the input string at which the match begins.
  2. input holds the original input string.
  3. groups holds the named capturing groups of the regex, or undefined if it has none. You'll understand this once you read JavaScript Regex Grouping.
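Reading these properties directly makes them more concrete. In the sketch below, the string is chosen so that the match doesn't start at position 0:

```javascript
var s = 'I love Programming';
var matches = s.match(/Programming/);

console.log(matches[0]);     // Programming
console.log(matches.index);  // 7 (position where the match begins)
console.log(matches.input);  // I love Programming
console.log(matches.groups); // undefined (the regex has no named groups)
```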

We'll see the significance of these properties as we progress through this course.

Coming back to the returned array stored in matches: since /Programming/ doesn't have the g flag, match() returns only the first match, even though 'Programming' occurs twice in s. Accordingly, matches has one element in it.

This can be confirmed by inspecting its length property:

matches.length
1
matches[0]
'Programming'

As stated, the length is 1 which means that there is only one element in the array.

So one thing is clear from this: if the given regex finds at least one match, match() returns an array; otherwise, it returns null. With the g flag, the array holds every match; without it, the array holds only the first match (plus any capturing groups, which we'll cover later).

Whether the array also has the additional properties index, input and groups depends on the g flag as well: they are present only when g is not set.
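To compare both cases side by side: without g, the array holds just the first match and carries the index property; with g, it holds every match and has no index property.

```javascript
var s = 'Programming is amazing. Programming is love.';

var withoutG = s.match(/Programming/);
var withG = s.match(/Programming/g);

console.log(withoutG.length, withoutG.index); // 1 0
console.log(withG.length, withG.index);       // 2 undefined
```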

From this point on, whenever we call match() with a regex that doesn't have g set, we'll specifically refer to the first element of the returned array, avoiding the unnecessary information shown when logging the whole array.

Like this:

var s = 'Programming is amazing. Programming is love.';

console.log(s.match(/Programming/)[0]);
Programming

And if the flag is set, we'd go with logging the whole array:

var s = 'Programming is amazing. Programming is love.';

console.log(s.match(/Programming/g));
["Programming", "Programming"]

Perfect!