Introduction

In the previous chapter we got a fairly decent introduction to the syntax of regular expressions in JavaScript. Now we will take it on from there and explore the bits and pieces in more detail.

In this chapter we shall begin by understanding what flags are in the world of regular expressions and how they can be used to modify the searching behaviour of given patterns. Once we're done, we'll test our skills in the JavaScript Regex Flags Quiz.

Of all the flags detailed here, you'll find only the i and g flags to be easily understandable, since the others require concepts that we'll learn later in this tutorial.

Anyways, let's dive right into it.

What are flags?

In basic terminology:

Flags, in a regular expression, are tokens that modify its searching behaviour.

Flags are optional parameters that we can add to a plain expression to make it search in a different way. Each flag is denoted by a single alphabetic character, and serves different purposes in modifying the expression's searching behaviour.

For example, the flag i, which stands for ignore casing, does the job of carrying out a case-insensitive search.

Similarly, the flag g, which stands for global, serves to extend the searching to find all matches for a given expression inside a string, instead of stopping on the first match.

Talking about the way to add these flags to a regular expression, it's very easy.

For an expression created literally, i.e. using the forward slashes //, flags come after the second slash. In general notation we can express this as follows:

/pattern/flags

Multiple flags can be set up by simply writing them one after another without any spaces or other delimiters.

Remember to write the flags in lowercase, as it's invalid to capitalise them!

As a quick example, suppose we need to make the expression /Hello/ search case-insensitively, that is, match any of 'hello', 'HELLO', 'HeLLO', 'heLlo' and so on.

To do so we'll need to add the flag i in the regular expression as shown below:

/Hello/i

Similarly, in addition to case-insensitive searching, if we also wanted to make the search extend to match all occurrences of the pattern we would have to add the g global flag as well, yielding the expression below:

/Hello/ig (or equivalently /Hello/gi)

The order in which flags appear doesn't matter - flags only modify the behaviour of searching and so putting one before the other doesn't make any difference whatsoever.
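The points above can be checked with a short sketch. The flags property and the RegExp constructor are standard JavaScript; the variable names and the sample string are only illustrative assumptions.

```javascript
// Two literals that differ only in the order of their flags.
var first = /Hello/ig;
var second = /Hello/gi;

// The flags property reports the flags in a fixed order,
// regardless of the order in which they were written.
console.log(first.flags);  // "gi"
console.log(second.flags); // "gi"

// Both expressions match the same text.
var sample = "hello HELLO";
var firstMatches = sample.match(first);
var secondMatches = sample.match(second);
console.log(firstMatches);  // ["hello", "HELLO"]
console.log(secondMatches); // ["hello", "HELLO"]

// An uppercase flag is rejected. The RegExp constructor is used here
// so that the error can be caught at runtime instead of at parse time.
var uppercaseRejected = false;
try {
  new RegExp("Hello", "I");
} catch (err) {
  uppercaseRejected = err instanceof SyntaxError;
}
console.log(uppercaseRejected); // true
```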

Most of the time, these two flags will get your job done; however, a handful more exist in the dictionary of regular expressions.

The table below lists all the flags covered in this chapter, including the two we've just seen.

Flag | Name | Modification
i | Ignore Casing | Makes the expression search case-insensitively.
g | Global | Makes the expression search for all occurrences.
s | Dot All | Makes the wildcard character . match newlines as well.
m | Multiline | Makes the boundary characters ^ and $ match the beginning and ending of every single line instead of the beginning and ending of the whole string.
y | Sticky | Makes the expression match only at the index indicated in its lastIndex property.
u | Unicode | Makes the expression treat individual characters as code points, not code units, and thus correctly handle characters outside the BMP as well.
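Each flag in the table is also exposed as a read-only boolean property on a regular expression object. The sketch below is only an illustration, with hypothetical variable names, and simply reads those standard properties on an expression with no flags and on one with all six flags set.

```javascript
var plain = /cat/;
var flagged = /cat/gimsuy;

// Standard RegExp properties, one for each flag in the table.
var properties = ["ignoreCase", "global", "dotAll", "multiline", "sticky", "unicode"];

properties.forEach(function (name) {
  // Logs false for the plain expression and true for the flagged one.
  console.log(name, plain[name], flagged[name]);
});
```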

Ignore casing

The first and foremost flag we shall explore in this section is the i flag, where the 'i' stands for ignore casing.

As the name suggests, the i flag serves to make an expression look for its matches while ignoring character casing. That is, a lowercase character in the expression matches both lowercase as well as uppercase characters in the string.

For simplicity, many people like to think that the 'i' here stands for case-insensitive.

By default, a regular expression searches for its first match case-sensitively. However, using the i flag, we can modify this default behaviour, and usually we do need to.

Consider the example below:

Replace the first occurrence of the word "Hello" in the string str below without modifying the pattern in the expression exp. You may however flag the expression any way you want.

var str = "Hello world! This 'Hello World' convention is quite common in introducing programming languages.";
var exp = /hello/;

It has been said that we can't modify the pattern i.e. we can't make it /Hello/ to solve the problem directly. However we can add flags and that is just what we will do.

Since the first occurrence in this case is "Hello", not "hello", we will have to use the flag i to get the pattern /hello/ to match case-insensitively. However, because we need to match the first occurrence only, we won't use g.

Hence the modified expression is /hello/i.

exp = /hello/i;
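To see the flag at work, here is a minimal sketch of the replacement itself. The task doesn't say what to replace "Hello" with, so the replacement string "Goodbye" is purely an assumption made for illustration.

```javascript
var str = "Hello world! This 'Hello World' convention is quite common in introducing programming languages.";

// Without the i flag, /hello/ finds nothing, so the string is returned unchanged.
var withoutFlag = str.replace(/hello/, "Goodbye");
console.log(withoutFlag === str); // true

// With the i flag, the first "Hello" is matched and replaced.
// The second "Hello" is left alone because the g flag is not set.
var withFlag = str.replace(/hello/i, "Goodbye");
console.log(withFlag);
```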

Are the two expressions /Hello/i and /HELLO/i the same?

That is, do they match the same set of substrings in a given test string or not?

The i flag makes an expression ignore casing. Therefore, /A/i will match both 'A' and 'a', and similarly /a/i will also match both 'a' and 'A'.

This simply means that /Hello/i and /HELLO/i are in effect the same expressions.
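A quick way to convince ourselves is to run both expressions over a few sample words. The list of words below is an arbitrary assumption and not an exhaustive check.

```javascript
var samples = ["hello", "HELLO", "HeLLO", "heLlo", "help"];

// Compare the verdict of both expressions on every sample.
var sameResults = samples.every(function (word) {
  return /Hello/i.test(word) === /HELLO/i.test(word);
});

console.log(sameResults);            // true
console.log(/Hello/i.test("heLlo")); // true
console.log(/HELLO/i.test("help"));  // false
```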

Global search

The second most important flag in the world of regular expressions is g.

The flag g stands for global, more specifically, global searching. It serves to make an expression look for all its matches, rather than stopping at the first one.

By default, when a regex engine finds the first match for a given pattern in a given test string, it stops and does no further searching. To modify this behaviour, we have at our disposal the g flag.

For example, let's say we have two expressions /cats/ and /cats/g and our string is "cats love cats".

The first expression (without the g flag) would match only the first word 'cats' in the string. In contrast, the second expression (with the g flag) would match both occurrences of the word 'cats'.
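The difference can be observed with the string method match(). This is a small sketch with illustrative variable names.

```javascript
var str = "cats love cats";

// Without g, match() returns details of the first match only.
var firstOnly = str.match(/cats/);
console.log(firstOnly[0]);    // "cats"
console.log(firstOnly.index); // 0

// With g, match() returns an array of every matched substring.
var all = str.match(/cats/g);
console.log(all);             // ["cats", "cats"]
```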

Consider the following example:

Consider the string below:

var str = "50 is the half of 50 x 2 that is 80.";

Construct a regular expression to replace all occurrences of '50' in this string with the number '40'.

You shall save the replaced string in a new variable replacedStr.

The code is:

var replacedStr = str.replace(/50/g, "40");

We need to come up with an expression that can match all occurrences of '50' in str. This ain't difficult - just use the g flag.

The expression hence becomes /50/g. With this expression in hand, we call str.replace() with "40" as the replacement string.
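For comparison, the sketch below runs the replacement both with and without the g flag. The variable firstReplaced is a hypothetical name added only to show the contrast.

```javascript
var str = "50 is the half of 50 x 2 that is 80.";

// Without g, only the first '50' is replaced.
var firstReplaced = str.replace(/50/, "40");
console.log(firstReplaced); // "40 is the half of 50 x 2 that is 80."

// With g, every '50' is replaced.
var replacedStr = str.replace(/50/g, "40");
console.log(replacedStr);   // "40 is the half of 40 x 2 that is 80."
```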

Create a variable replacedStr in the following code that is equal to str, but with every substring 'home' replaced with 'cake'.

var str = "home sweet home";

The question clearly says that we need to replace all occurrences of 'home' in str with 'cake'. This means that our expression would be /home/g.

The code will therefore become:

var str = "home sweet home";
var replacedStr = str.replace(/home/g, "cake");

Construct an expression to match all occurrences of 'hello' in a given test string.

Note that the expression shall match the following substrings as well: 'HELLO', 'Hello', 'HEllo', 'HELlo' and so on. In other words, it should ignore casing.

Firstly, to match all occurrences of 'hello' in the given test string we'll need the g flag. Secondly, to match all these occurrences while ignoring casing we'll need the i flag.

Altogether our expression would be /hello/gi.

The expression could also be /hello/ig with the order of the flags changed, as the order doesn't matter!
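Here is the expression applied to a sample string. The test string is made up for this illustration.

```javascript
var testString = "Hello there, HELLO again, and heLLo once more.";

// g finds every occurrence and i ignores the casing of each one.
var matches = testString.match(/hello/gi);
console.log(matches); // ["Hello", "HELLO", "heLLo"]

// With g alone nothing matches, since no occurrence is entirely lowercase.
var caseSensitive = testString.match(/hello/g);
console.log(caseSensitive); // null
```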

Dot all

A fairly recent introduction to the list of flags in JavaScript's regular expressions is that of s.

The flag s means dot all. That is, it makes the . dot character (technically referred to as the wildcard character) match everything, even newlines. In other words, with the s flag, the dot matches all possible characters.

By default, the dot character in a regular expression matches everything except newline characters. To get it to match newline characters as well, we are given the s flag.
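As a small illustration of this default and of how s changes it, consider the sketch below. The sample string is an assumption, and the code needs an environment that supports the s flag.

```javascript
var sample = "one\ntwo";

// The dot refuses to match the newline between 'one' and 'two'.
var withoutS = /one.two/.test(sample);
console.log(withoutS); // false

// With the s flag, the dot matches the newline as well.
var withS = /one.two/s.test(sample);
console.log(withS); // true
```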

What does the letter 's' in the flag s represent?

For all the curious people out there, who reason to themselves that s doesn't appear anywhere in the word 'dot all', know that s is not an abbreviation for 'dot all' at all. Many alls at a time!

Rather it's an abbreviation for single-line mode.

When the s flag is set on an expression, the expression goes into single line mode. That is, it treats a test string as a single line, not as a sequence of lines delimited by newline characters.

The s flag was introduced in ECMAScript 2018, so it is unsupported in older browsers.

Consider the example below:

What do both the expressions /.+/g and /.+/gs match in the string str shown below?

var str = "Content flows\ndownward and\ndownward";

The substring \n here is the newline character.

To fully understand this example, you'll first need to learn about JavaScript Regex Quantifiers.

The first expression /.+/g without the s flag will match every single line in str, since the dot stops at each \n. It yields three matches:

'Content flows', 'downward and' and 'downward'

The second expression /.+/gs with the s flag will make . match every character including \n, which means that the expression will match the whole string str in a single match:

"Content flows\ndownward and\ndownward"

Multiline mode

The flag m stands for multiline mode and serves to make the boundary tokens ^ and $ match the beginning and end of each line.

By default, the ^ and $ characters in an expression match the beginning and ending boundaries of a given test string. But with the m flag in place, they instead do this for every line in the string.

In the previous section, we saw how the s flag serves to put a regular expression into single line mode, where a given test string is treated as one single line of characters. To many of you, multiline mode would seem to be the opposite of this - a given test string is treated as a sequence of multiple lines of characters.

However this is NOT the case. In fact, treating a string as a sequence of multiple lines of characters is the default behaviour of a regular expression.

Why would a flag do something that's already there by default?

The single-line and multiline modes set up by the flags s and m respectively, have nothing to do with one another. This usually confuses developers.

The flag s targets the wildcard character and makes it match everything. In contrast, the m flag targets the ^ and $ characters, and makes them match the start and end of each line respectively.

s treats a string as one single line so that the dot can match everything, even newlines. Similarly, m treats a string as a sequence of multiple lines so that ^ and $ can match the beginning and ending positions of each line.

Consider the example below:

Construct an expression to match all lines in a given string, that begin with an 'A'.

To fully understand this example, you'll first need to learn about JavaScript Regex Quantifiers and JavaScript Regex Boundaries.

The expression to solve this problem is /^A.+/mg.

The ^ character matches the start of every line, thanks to the m flag. Altogether the expression looks for an 'A' at the beginning of every line and if one is found, it matches the whole line, till the end.

The global flag here gets the expression to search for all such lines that begin with an 'A'.
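The sketch below tries the expression on a three-line string, with and without the m flag. The string and the variable names are assumptions chosen for illustration.

```javascript
var text = "Apples are red\nBananas are yellow\nAvocados are green";

// With m, ^ matches at the start of every line.
var withM = text.match(/^A.+/mg);
console.log(withM);    // ["Apples are red", "Avocados are green"]

// Without m, ^ matches only at the start of the whole string.
var withoutM = text.match(/^A.+/g);
console.log(withoutM); // ["Apples are red"]
```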

Sticky searching

Oftentimes, we might want an expression to look for a match, within a given test string, at an index other than 0. In other words, we might want to test for a match at a custom position, like 2, 3, 4 and so on.

This can be accomplished using the y flag.

The y flag stands for sticky searching. It makes an expression attempt a match only at the exact position specified in its lastIndex property, without scanning ahead to later positions.

If lastIndex is left unchanged on an expression that has the y flag set, it keeps its default value 0, and so the expression can only match at the very start of the string.

The letter 'y' comes from the ending of the word 'sticky'.
The word 'sticky' here can be thought of as follows: it makes an expression stick to a desired position, at which the match must begin.

The y flag was introduced in ECMAScript 2015, so you won't find it supported in older browsers.

Consider the example below:

Explain the difference between the expressions /cats/ig and /cats/igy.

To fully understand this example, you'll first need to learn about JavaScript Regex Quantifiers and JavaScript Regex Boundaries.

Let's suppose we have the following code set up:

var str = "Cats love cats, and we love cats."

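The first expression /cats/ig would match all three occurrences in str: 'Cats' at index 0, 'cats' at index 10 and 'cats' at index 28.

In contrast, the expression /cats/igy can only match at the exact position held in its lastIndex property, so it behaves differently.

Consider the code below where we save the expression /cats/igy in a variable exp so that we can easily change its lastIndex property:

var exp = /cats/igy;
exp.lastIndex = 10;

With lastIndex set to 10, calling exp.exec(str) matches the 'cats' that begins at index 10, after which lastIndex advances to 14. We use exec() here because methods such as match() and replace() reset lastIndex to 0 when the g flag is set.

Notice how the first substring 'Cats' is not matched. This is because it appears at index 0, whereas the expression exp is sticky and attempts its match only at index 10. Had we set lastIndex to 4, where the string holds a space, exp.exec(str) would return null, since a sticky expression doesn't scan ahead. The non-sticky /cats/ig, given the same lastIndex, would instead move forward and find the match at index 10.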
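The behaviour of sticky matching can be verified with exec(), which honours lastIndex. The following is a minimal sketch with illustrative variable names; it contrasts a sticky expression with a non-sticky one at the same starting positions.

```javascript
var str = "Cats love cats, and we love cats.";

// Sticky: a match is attempted only at lastIndex.
var sticky = /cats/igy;
sticky.lastIndex = 10;
var hit = sticky.exec(str);
var afterHit = sticky.lastIndex;
console.log(hit[0], hit.index); // "cats" 10
console.log(afterHit);          // 14

// Index 4 holds a space, so the sticky match fails
// and lastIndex is reset to 0.
sticky.lastIndex = 4;
var miss = sticky.exec(str);
var afterMiss = sticky.lastIndex;
console.log(miss);      // null
console.log(afterMiss); // 0

// Non-sticky: the search scans forward from lastIndex.
var scanning = /cats/ig;
scanning.lastIndex = 4;
var found = scanning.exec(str);
console.log(found.index); // 10
```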

Unicode search

The u flag, which stands for unicode, makes an expression treat characters in a given test string as code points, rather than code units.

This means that with the u flag set, we can get our expressions to behave normally on characters that are outside the BMP (Basic Multilingual Plane), which the UTF-16 encoding represents as a pair of code units.

The u flag is only required in special cases, where test strings contain characters outside the BMP. It's not a flag you'll be using very often.

Consider the simple example below to understand how u works.

Construct an expression to match all occurrences of the non-BMP character 𐍅 in a given test string.

Since 𐍅 lies outside the BMP, we refer to it by its code point using the escape \u{10345}. This escape is only recognised when the unicode flag u is set, so we'll need that flag in order to match the character. And we'll also need the global flag g to match all such occurrences.

The expression to solve this problem is /\u{10345}/ug.
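The sketch below exercises this expression on a made-up test string. It builds the character from its code point with String.fromCodePoint() so that the code doesn't depend on the file's encoding; the variable names are illustrative.

```javascript
var letter = String.fromCodePoint(0x10345);
var str = "a" + letter + "b" + letter;

// The character occupies two UTF-16 code units.
console.log(letter.length); // 2

// With u and g, both occurrences are found.
var matches = str.match(/\u{10345}/ug);
console.log(matches.length);        // 2
console.log(matches[0] === letter); // true

// Without u, \u{10345} is not read as a code point escape,
// so the expression fails to match the character.
var withoutU = /\u{10345}/.test(letter);
console.log(withoutU); // false

// The dot sees two code units without u, but one code point with u.
var dotWithoutU = /^.$/.test(letter);
var dotWithU = /^.$/u.test(letter);
console.log(dotWithoutU); // false
console.log(dotWithU);    // true
```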