JavaScript Regex Flags

Chapter 3 23 mins

Learning outcomes:

  1. What are flags
  2. The ignore casing (i) and global (g) flags
  3. The dot all (s) and multiline (m) flags
  4. The sticky (y) flag
  5. The unicode (u) flag

Introduction

In the previous chapter we got a fairly decent introduction to the syntax of regular expressions in JavaScript. Now we'll take it on from there and explore the bits and pieces in more detail.

In this chapter we shall begin with an understanding of what flags are in the world of regular expressions and how they can be used to modify the searching behavior of given patterns. Once we're done, we'll test our skills at the JavaScript Regex Flags Quiz.

Of all the flags detailed here, you'll find only the i and g flags to be easily understandable, since the others require concepts that we'll learn later in this tutorial.

Anyways, let's dive right into it.

What are flags?

We'll start by defining what exactly is a flag:

A flag is an optional parameter to a regex that modifies its behavior of searching.

A flag changes the default searching behavior of a regular expression. It makes a regex search in a different way.

A flag is denoted using a single lowercase alphabetic character.

In this chapter we cover 6 flags of JavaScript regexes, each serving a different purpose. (Newer versions of the language add a couple more flags, which aren't covered here.)

Flag | Name | Modification
i | Ignore Casing | Makes the expression search case-insensitively.
g | Global | Makes the expression search for all occurrences.
s | Dot All | Makes the wildcard character . match newlines as well.
m | Multiline | Makes the boundary characters ^ and $ match the beginning and ending of every single line instead of the beginning and ending of the whole string.
y | Sticky | Makes the expression attempt its match only at the index indicated in its lastIndex property.
u | Unicode | Makes the expression treat individual characters as code points, not code units, and thus correctly handle characters outside the BMP as well.

But how to add these flags to a regex?

Very simple...

For an expression created literally, i.e. using the forward slashes //, flags come after the second slash. In general notation we can express this as follows:

/pattern/flags

Similarly, for an expression created using RegExp(), flags go as a string in the second argument, as shown below:

new RegExp('pattern', 'flags')

For example, if we were to give the flag i to the regex /a/, we'd write /a/i. And similarly in the case of new RegExp('a'), we'd write new RegExp('a', 'i').

To give multiple flags to a regex, we write them one after another (without any spaces or other delimiters).

For example, if we were to give both the flags i and g to the regex /a/, we'd write /a/ig (or equivalently /a/gi, since the order doesn't matter).

The order in which flags appear doesn't matter - flags only modify the behavior of searching and so putting one before the other doesn't make any difference whatsoever.
We'll see more examples of using multiple flags in a regex in the sections below as we understand the different kinds of flags in the world of JavaScript regex.
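
As a quick sanity check of the two notations, here is a minimal sketch. The variable names literal and constructed are just illustrative. It relies on the standard flags, ignoreCase and global properties of regex objects.

```javascript
// Two ways of attaching the flags i and g to the same pattern.
var literal = /a/ig;
var constructed = new RegExp('a', 'gi');

// The flags property reports the flags in a fixed order,
// regardless of the order in which they were written.
console.log(literal.flags);      // gi
console.log(constructed.flags);  // gi

// Each flag also has a boolean property of its own.
console.log(literal.ignoreCase, literal.global); // true true
```

Both expressions end up with exactly the same set of flags, which is why the order in which we write them is irrelevant.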

Let's consider a quick example to understand everything...

By default, a regular expression does a case-sensitive search. That is, /a/ matches only 'a'. However, by using the flag i, which stands for 'ignore casing', we can make the expression carry out a case-insensitive search. That is, /a/i would match 'a', as well as 'A'.

Similarly, by default, a regex stops searching after finding the first match for a given pattern. However, using the flag g, which stands for 'global', we can get it to find all matches for the pattern — not just stop at the first one.

With the idea of flags understood, we are good to go and explore the 6 flags that we have at our disposal in JavaScript.

Ignore casing — i

The first and foremost flag we shall explore in this section is the i flag, where the 'i' stands for ignore casing.

As the name suggests, the i flag serves to make an expression look for its matches while ignoring character casing. That is, with the flag set, a lowercase/uppercase character in the pattern matches both lowercase as well as uppercase characters in the string.

For simplicity, many people like to think that the 'i' here stands for case-insensitive.

By default, a regular expression searches for its first match case-sensitively i.e. character casing matters. However, using the i flag, we can modify this default behavior.

Consider the example below:

Replace the first occurrence of the word "Hello" in the string str below with '(Hello)' without modifying the pattern in regex, and save the result in newStr.

var str = "Hello world! This 'Hello World' convention is quite common in introducing programming languages.";
var regex = /hello/;

var newStr;

It has been said that we can't modify the pattern i.e. we can't make it /Hello/ to solve the problem directly. However we can add flags and that is just what we will do.

Since the casing of the substring to be matched ("Hello") is different from the casing of the pattern /hello/, we'll need to normalize it using the flag i — the casing should not matter anymore.

The regex hence becomes /hello/i.

This takes us to the following code:

var str = "Hello world! This 'Hello World' convention is quite common in introducing programming languages.";
var regex = /hello/i;

var newStr = str.replace(regex, '(Hello)');
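
To confirm that the flag is what makes the difference, the sketch below runs the same replacement with and without i. The variable unchanged is an illustrative name that isn't part of the original exercise.

```javascript
var str = "Hello world! This 'Hello World' convention is quite common in introducing programming languages.";

// Without the i flag, /hello/ finds nothing, so the string is left as is.
var unchanged = str.replace(/hello/, '(Hello)');
console.log(unchanged === str); // true

// With the i flag, the first 'Hello' is matched and replaced.
var newStr = str.replace(/hello/i, '(Hello)');
console.log(newStr);
// (Hello) world! This 'Hello World' convention is quite common in introducing programming languages.
```

Note that only the first 'Hello' is replaced, since the expression doesn't have the g flag set.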

Are the two expressions /Hello/i and /HELLO/i the same?

That is, do they match the same set of substrings in a given test string or not?

The i flag makes an expression ignore casing. Therefore, /A/i will match both 'A' and 'a'; and similarly /a/i will also match both 'a' and 'A'.

This simply means that /Hello/i and /HELLO/i are, in effect, the same expressions.
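
One way to convince ourselves of this is to run both expressions over a few sample strings and compare the results. The samples array below is made up for illustration.

```javascript
// A few hypothetical test strings; the last one should not match at all.
var samples = ['hello', 'HELLO', 'HeLLo', 'help'];

var results = samples.map(function (s) {
  return { text: s, first: /Hello/i.test(s), second: /HELLO/i.test(s) };
});

results.forEach(function (r) {
  console.log(r.text, r.first, r.second);
});
// hello true true
// HELLO true true
// HeLLo true true
// help false false
```

For every sample, both expressions give the same answer.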

Global search — g

The second most important flag in the world of regular expressions is g.

The flag g stands for 'global' — or more specifically, 'global searching'. It serves to make an expression look for all its matches, rather than stopping at the first one.

By default, when a regex engine finds the first match for a given pattern in a given test string, it terminates right at that point without looking any further. We say that the engine is eager to give a match.

To modify this eager nature of regexes, we can use g.

Let's say we have two expressions /cats/ and /cats/g and our string is "cats love cats".

The first expression (/cats/, without the g flag) would match only the first word 'cats', i.e. the one at index 0. In contrast, the second expression (/cats/g, with the g flag) would match both occurrences of 'cats', i.e. the ones at indexes 0 and 10.
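
The difference can be observed with the string method match(), as sketched below. The use of matchAll() (available since ES2020) to collect the match positions is an addition for illustration.

```javascript
var str = 'cats love cats';

// Without g, match() returns the details of the first match only.
var first = str.match(/cats/);
console.log(first[0], first.index); // cats 0

// With g, match() returns every matching substring.
var all = str.match(/cats/g);
console.log(all.length); // 2

// matchAll() also reports where each match begins.
var indexes = Array.from(str.matchAll(/cats/g), function (m) {
  return m.index;
});
console.log(indexes.join(', ')); // 0, 10
```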

Consider the following example:

Consider the string below:

var str = "50 is the half of 50 x 2 that is 80.";

Construct a regular expression to replace all occurrences of '50' in this string with the number '40'.

You shall save the replaced string in a new variable newStr.

We need to come up with an expression that can match all occurrences of '50' in str. This ain't difficult — just use the g flag.

The expression would be /50/g.

Here's the code to solve the problem:

var str = "50 is the half of 50 x 2 that is 80.";
var newStr = str.replace(/50/g, "40");

Let's inspect the value of newStr:

newStr
"40 is the half of 40 x 2 that is 80."

Create a variable newStr in the following code that is equal to str, but with every substring 'home' replaced with 'cake', using a regex.

var str = "home sweet home";

The question clearly says that we need to replace all occurrences of 'home' in str with 'cake'. This means that our expression would be /home/g.

The code will therefore become:

var str = "home sweet home";
var newStr = str.replace(/home/g, "cake");
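
For contrast, here's a small sketch of what happens when the g flag is left out. The variable names withoutG and withG are illustrative.

```javascript
var str = 'home sweet home';

// Without g, only the first 'home' is replaced.
var withoutG = str.replace(/home/, 'cake');
console.log(withoutG); // cake sweet home

// With g, every 'home' is replaced.
var withG = str.replace(/home/g, 'cake');
console.log(withG);    // cake sweet cake
```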

Construct an expression to match all occurrences of 'hello' in a given test string.

The casing of characters should not matter here. That is, the regex should also match 'HELLO', 'HELLo', 'HELlo', 'helLO' and all other casing variations.

Firstly to match all occurrences of 'hello' in the given test string we'll need the g flag. Secondly, to match all these occurrences while ignoring casing we'll need the i flag.

Altogether our expression would be /hello/gi.

The expression could also be /hello/ig with the order of the flags changed, as the order doesn't matter!

Let's test this expression on a test string. We'll replace each match with '(hello)':

var str = "Hello Guys. HeLLO. HELLO. hello. HeLlo, hElLo and so on and so forth...";

console.log(str.replace(/hello/ig, '(hello)'));
(hello) Guys. (hello). (hello). (hello). (hello), (hello) and so on and so forth...

Dot all — s

A fairly recent introduction to the list of flags in JavaScript's regular expressions is that of s.

The flag s means 'dot all'. That is, it makes the dot character . (technically referred to as the wildcard character) match everything, even newlines.

In other words, with the s flag set, the dot matches all possible characters.

By default, the dot character in a regular expression matches everything, but newline characters. To get it to match newline characters as well, we are given the s flag.
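
Here's a minimal sketch of this behavior, assuming an environment that supports the s flag. The two-line string is made up for illustration.

```javascript
// A hypothetical two-line string: 'a', a newline, then 'b'.
var str = 'a\nb';

// Without s, the dot refuses to match the newline between 'a' and 'b'.
var withoutS = /a.b/.test(str);
console.log(withoutS); // false

// With s, the dot matches the newline too.
var withS = /a.b/s.test(str);
console.log(withS);    // true

// The dotAll property tells whether the s flag is set on an expression.
console.log(/a.b/s.dotAll); // true
```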

Where does the 's' come from?

For all the curious people out there, who reason to themselves that 's' doesn't appear anywhere in the word 'dot all', it's time to explain where the letter 's' comes from.

First of all, s is not an abbreviation for 'dot all' at all. Many alls at a time!

Rather, it's an abbreviation for single-line mode.

When the s flag is set on an expression, the expression goes into single-line mode. That is, it treats a test string as a single line, not as a sequence of lines delimited by newline characters.

The s flag was introduced in ES2018. It is supported on all current major browsers, but not on older ones such as Internet Explorer.

Consider the example below:

What do both the expressions /.+/g and /.+/gs match in the string str shown below?

var str = "Content flows\ndownward and\ndownward";

The substring \n here is the newline character.

To fully understand this example, you'll first need to learn about JavaScript Regex Quantifiers.

The first expression /.+/g, without the s flag, will match every line in str separately. That is, it produces three matches: 'Content flows', 'downward and' and 'downward'.

The second expression /.+/gs, with the s flag, will make . match every character including \n, which means that the expression will match the whole string str in one go:

"Content flows\ndownward and\ndownward"

Multiline mode — m

The flag m stands for multiline mode and serves to make the boundary tokens ^ and $ match the beginning and end of each line.

By default, the ^ and $ characters in an expression match the beginning and ending boundaries of a given test string. But with the m flag in place, they instead do this for every line in the string.

In the previous section, we saw how the s flag serves to put a regular expression into single line mode, where a given test string is treated as one single line of characters. To many of you, multiline mode would seem to be the opposite of this - a given test string is treated as a sequence of multiple lines of characters.

However this is NOT the case. In fact, treating a string as a sequence of multiple lines of characters is the default behavior of a regular expression.

Why would a flag do something that's already there by default?

The single-line and multiline modes set up by the flags s and m respectively, have nothing to do with one another. This often confuses developers.

The flag s targets the wildcard character and makes it match everything. In contrast, the m flag targets the ^ and $ characters, and makes them match the start and end of each line respectively.

s treats a string as one single line so that the dot can match everything, even newlines. Similarly, m treats a string as a sequence of multiple lines so that ^ and $ can match the beginning and ending positions of each line.
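
The independence of the two flags can be checked with a short sketch like the one below. The string str and the result names are assumptions made for illustration.

```javascript
// A hypothetical string consisting of two lines.
var str = 'first line\nsecond line';

// m only affects ^ and $: 'second' sits at the start of the second line.
var caretPlain = /^second/.test(str);   // false
var caretWithM = /^second/m.test(str);  // true
var caretWithS = /^second/s.test(str);  // false, as s doesn't touch ^

// s only affects the dot: it can now step over the newline.
var dotPlain = /first.+second/.test(str);   // false
var dotWithS = /first.+second/s.test(str);  // true
var dotWithM = /first.+second/m.test(str);  // false, as m doesn't touch the dot

console.log(caretPlain, caretWithM, caretWithS); // false true false
console.log(dotPlain, dotWithS, dotWithM);       // false true false
```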

Consider the example below:

Construct an expression to match all lines in a given string, that begin with an 'A'.

To fully understand this example, you'll first need to learn about JavaScript Regex Quantifiers and JavaScript Regex Boundaries.

The expression to solve this problem is /^A.+/mg.

The ^ character matches the start of every line, thanks to the m flag. Altogether the expression looks for an 'A' at the beginning of every line and if one is found, it matches the whole line, till the end.

The global flag here gets the expression to search for all such lines that begin with an 'A'.
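
Let's try the expression on a sample string. The string below is hypothetical, made up of four lines, two of which begin with an uppercase 'A'.

```javascript
var str = 'Apples are red\nBananas are yellow\nAvocados are green\nan apple a day';

// With m, ^ matches at the start of every line.
var withM = str.match(/^A.+/mg);
console.log(withM.join(' | ')); // Apples are red | Avocados are green

// Without m, ^ matches only at the very start of the string.
var withoutM = str.match(/^A.+/g);
console.log(withoutM.join(' | ')); // Apples are red
```

The last line isn't matched in either case, since it begins with a lowercase 'a' and the expression doesn't have the i flag set.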

Sticky searching — y

Oftentimes, we might want an expression to look for a match at one particular index of a given test string, rather than anywhere in it. In other words, we might want to check whether the pattern occurs exactly at a custom position, like 2, 3, 4 and so on.

This can be accomplished using the y flag.

The y flag stands for sticky searching. It makes an expression attempt its match only at the position specified in its lastIndex property, without scanning ahead in the string for a later match.

If lastIndex is left unchanged on an expression that has the y flag set, it keeps its default value 0, and so the expression can only match right at the start of the string.

The letter 'y' comes from the ending of the word 'sticky'.
The word 'sticky' here can be thought of as follows: it makes an expression stick to a desired position, at which its match must begin.

The y flag was introduced in ES2015. It is supported on all current major browsers, but not on older ones such as Internet Explorer.

Consider the example below:

Explain the difference between the expressions /cats/ig and /cats/igy.

To fully understand this example, you'll first need to learn about JavaScript Regex Quantifiers and JavaScript Regex Boundaries.

Let's suppose we have the following code set up:

var str = "Cats love cats, and we love cats.";

The first expression /cats/ig would match all three occurrences in str, i.e. 'Cats' at index 0, and 'cats' at indexes 10 and 28:

"Cats love cats, and we love cats."

In contrast, the expression /cats/igy matches only at the position given by its lastIndex property.

Consider the code below where we save the expression /cats/igy in a variable exp so that we can easily change its lastIndex property:

var exp = /cats/igy;
exp.lastIndex = 10;

With lastIndex specified, calling exp.exec(str) now matches the second 'cats', which begins exactly at index 10. (We use exec() here because string methods such as match() and replace() reset lastIndex to 0 when the expression has the g flag set.)

Notice how the first substring 'Cats' is not matched. This is because it appears at index 0, whereas the expression exp is sticky and only looks at index 10. Had we set lastIndex to 4 instead, there would be no match at all, since index 4 holds a space and a sticky expression doesn't scan ahead.
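
The following sketch traces how a sticky expression behaves under repeated calls to exec(). The result names atStart, atSpace and atTen are illustrative, and the index 10 is where the second 'cats' begins in str.

```javascript
var str = 'Cats love cats, and we love cats.';
var exp = /cats/igy;

// lastIndex starts at 0, and 'Cats' begins at index 0, so this matches.
var atStart = exp.exec(str);
console.log(atStart[0], exp.lastIndex); // Cats 4

// lastIndex is now 4, which holds a space. A sticky expression does not
// scan ahead, so the match fails and lastIndex is reset to 0.
var atSpace = exp.exec(str);
console.log(atSpace, exp.lastIndex);    // null 0

// Pointing lastIndex at the second 'cats' (index 10) gives a match there.
exp.lastIndex = 10;
var atTen = exp.exec(str);
console.log(atTen[0], atTen.index);     // cats 10
```

A sticky expression never skips over characters to find a match. It either matches right at lastIndex or fails.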

Unicode search — u

The u flag, which stands for unicode, makes an expression treat characters in a given test string as code points, rather than code units.

This means that with the u flag set, we can get our expressions to behave normally on characters that are outside the Basic Multilingual Plane (BMP), which the UTF-16 encoding stores as pairs of code units (called surrogate pairs).

The u flag is only required in special cases, where patterns or test strings contain characters outside the BMP. It's not a flag you'll be using very often.
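
The difference between code units and code points can be seen in the short sketch below. It assumes the non-BMP code point U+10345, written with a string escape, and the variable names are illustrative.

```javascript
// A single non-BMP character, stored as two UTF-16 code units.
var ch = '\u{10345}';
console.log(ch.length); // 2

// Without u, the dot consumes just one code unit, so ^.$ fails.
var withoutU = /^.$/.test(ch);
console.log(withoutU);  // false

// With u, the dot consumes the whole code point, so ^.$ succeeds.
var withU = /^.$/u.test(ch);
console.log(withU);     // true
```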

Consider the simple example below to understand how u works.

Construct an expression to match all occurrences of the non-BMP character 𐍅 in a given test string.

Since 𐍅 lies outside the BMP, we write it using the code point escape \u{10345}, which is only recognized when the unicode flag u is set. And we'll also need the global flag g to match all such occurrences.

The expression to solve this problem is /\u{10345}/ug.
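
Here's a sketch of the expression at work on a hypothetical test string that contains the character twice.

```javascript
// A made-up test string with two occurrences of U+10345.
var str = 'a\u{10345}b\u{10345}c';

var matches = str.match(/\u{10345}/ug);
console.log(matches.length); // 2

// Replacing each occurrence with '?' shows where the matches were.
var replaced = str.replace(/\u{10345}/ug, '?');
console.log(replaced);       // a?b?c

// Without u, \u{10345} isn't read as a code point escape, so nothing matches.
var withoutU = /\u{10345}/.test(str);
console.log(withoutU);       // false
```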