PHP Strings Basics

Chapter 17

Learning outcomes:

  1. A quick recap
  2. String interpolation
  3. Heredoc and nowdoc strings
  4. Changing the casing of strings
  5. Searching in strings
  6. What is string mutability

Introduction

Strings are, without a doubt, one of the pillars of programming and computing. They represent a simple idea that is nonetheless flexible enough to solve many of the problems faced when producing program output.

In this chapter, we'll unravel the technical details of strings in PHP and consider some related concepts, such as interpolation (also known as templating) and the creation of strings via the heredoc and nowdoc syntax.

Not only this, but we'll also consider a handful of functions that perform key operations on strings, such as searching, indexing, case-changing and so on.

It's time to begin learning...

A quick recap

Before we begin, it's worthwhile to quickly recap the ideas about strings in PHP that we've learnt so far in this course.

Starting with the most basic thing:

A string is a sequence of textual characters.

Each character is followed by another character in memory, which is followed by yet another, and so on until the end of the string. That's how a string is laid out internally.

Owing to the fact that a string is a sequence, two concepts related to sequences apply to strings as well — the concepts of length and index.

The length of a string is the total number of characters in it (strictly speaking, the number of bytes, which is the same thing for plain ASCII text). It can be obtained using the strlen() function.

<?php

echo strlen('Hello World!'), "\n";
echo strlen('Hello'), "\n";
echo strlen('123'), "\n";
echo strlen('   '), "\n";
12
5
3
3
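Because strlen() counts bytes rather than characters, a character outside ASCII can make the result larger than expected. The minimal sketch below builds 'café' using the \u{...} escape (assuming PHP 7.0 or later), which always produces UTF-8 bytes, so the result doesn't depend on how the source file is saved.

```php
<?php

// "\u{00E9}" is the UTF-8 encoding of é, which occupies two bytes.
$cafe = "caf\u{00E9}";

$asciiLength = strlen('cafe'); // 4 bytes, 4 characters
$utf8Length  = strlen($cafe);  // 5 bytes, although only 4 characters are visible

echo $asciiLength, "\n";
echo $utf8Length, "\n";
```

If the number of characters is what you need, the mbstring extension's mb_strlen() can provide it.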

The index is associated with each of the characters of a given string.

That is, it represents the position of the character in the string. Indexes begin at 0 and increase by 1. Hence, the first character is at the index 0, the second one is at 1, the third one is at 2, and so on and so forth.

We can use the familiar bracket notation to access a character at a particular index.

<?php

$greeting = 'Hello World!';

echo $greeting[0], "\n";
echo $greeting[1], "\n";
echo $greeting[5], "\n";
H
e
 

The last character output here is a space, which is why it isn't visible.
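The last character of a string sits at the index strlen() - 1. Assuming PHP 7.1 or later, a negative index can also be used; it counts backwards from the end, so -1 is the last character. A small sketch on the same $greeting:

```php
<?php

$greeting = 'Hello World!';

// The last index is always one less than the length.
$lastIndex = strlen($greeting) - 1;

echo $greeting[$lastIndex], "\n"; // !

// Negative indexes count from the end (PHP 7.1+).
echo $greeting[-1], "\n";         // !
echo $greeting[-6], "\n";         // W
```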

Moving on, a string can be created in a multitude of ways in PHP, two of which we already know, i.e. via a pair of single quotes ('') or a pair of double quotes ("").

A pair of single quotes ('') denotes a string literal where almost no character has a special meaning. That is, each character is represented exactly as it's written; the only exceptions are \' (a single quote) and \\ (a backslash).

An example follows:

<?php

echo 'Hello\nWorld!';
Hello\nWorld!

We know that \n otherwise represents a newline character; however, inside a single-quoted string, it's parsed literally as a backslash (\) followed by n.
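The two escape sequences that single-quoted strings do recognize, \' and \\, can be seen in this short sketch, alongside the literal \n from above:

```php
<?php

// \' lets a single quote appear inside a single-quoted string.
$quote = 'It\'s here';

// \\ produces one backslash; the \n below stays as two characters.
$path = 'C:\\new';
$raw  = 'Hello\nWorld!';

echo $quote, "\n";       // It's here
echo $path, "\n";        // C:\new
echo strlen($raw), "\n"; // 13
```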

A double-quoted string ("") works differently. That is, it treats certain characters specially. In particular, the backslash (\) is used to denote escape sequences, as is standard across numerous programming languages, while the dollar ($) symbol is used to denote a variable.

We'll explore the latter idea, known as interpolation, later in this chapter.

For now, let's focus on escape sequences:

One of the most common things required in output from almost any program in any programming language is a newline. To denote a newline in PHP, we use the \n escape sequence inside a double-quoted string.

The n in \n comes from the word 'newline'.

Let's consider the same example as above, just this time replacing the single quotes with double quotes:

<?php

echo "Hello\nWorld!";
Hello
World!

Notice how the output is split across two lines right after the word 'Hello', which is exactly where the escape sequence \n appears.
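\n is not the only escape sequence available in double-quoted strings. The sketch below shows three other common ones; the variable names are just for illustration.

```php
<?php

// A few other escape sequences recognized in double-quoted strings:
//   \t  a horizontal tab
//   \"  a literal double quote
//   \\  a literal backslash
$row   = "Name\tAge";
$quote = "She said \"Hi\"";
$slash = "back\\slash";

echo $row, "\n";   // Name, a tab, then Age
echo $quote, "\n"; // She said "Hi"
echo $slash, "\n"; // back\slash
```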

You'll see a common convention out there whereby a pair of single quotes ('') is used to denote all strings, except for those that need to have a special character such as a newline (or maybe need some interpolated data — more on this later). We'll use this very convention throughout this course as well.

And this is all that we need to keep in mind about strings in PHP before we begin reading onwards from here.

String interpolation

Putting data from variables into strings is a very common need in most programs. One way to do so, which we know and have been using thus far in this course, is concatenation.

The variables are concatenated with the strings to ultimately give one single string that contains the data from the variables.

As an example, consider the following:

<?php

$x = 10;
$y = 0;

// Print as a coordinate.
echo '(' . $x . ', ' . $y . ')';

Here, we have two variables $x and $y both holding integers and want to create a coordinate representation using them. The way we do so is by concatenating five things together with one another.

Now this approach works but there are certain limitations to it.

The main problem is that we have to constantly switch contexts while typing such a concatenated expression. That is, we have to go from a string literal to PHP code, then again to a string literal, then to PHP code, and so on and so forth.

This context switching is difficult and error-prone.

A better, faster and much more readable approach is to use string interpolation.

String interpolation refers to creating string literals with variables embedded inside them.

When parsing code, the PHP engine automatically recognizes that a given string has interpolated variables in it and injects the values of those variables into the respective locations in the string.

As we already know, a single-quoted ('') literal gives the $ symbol no special meaning whatsoever. Hence, string interpolation, which uses $ to denote a variable, obviously can't be done using single-quoted strings.

What enables interpolation is a double-quoted string ("") (and also a heredoc string, more on which later in this chapter).

If a variable is to be injected inside a double-quoted ("") string, we use the exact same notation otherwise used to access the variable in normal code, i.e. $variable_name.

Similarly, if a more complex expression is desired, we wrap it in a pair of curly braces, i.e. {$expression}. Note, however, that this can't be an arbitrary expression: the { must be immediately followed by a $, so PHP expects the expression to start with a variable.
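A common reason to reach for the braces is a variable that is immediately followed by characters which could be read as part of its name. In this minimal sketch (the variables are purely illustrative), "$fruits" would make PHP look for a variable named $fruits, whereas "{$fruit}s" makes the boundary explicit:

```php
<?php

// Illustrative variables for this sketch.
$fruit = 'apple';
$point = [10, 0];

// Without braces, "$fruits" would refer to a (non-existent) $fruits variable.
// The braces mark exactly where the variable name ends.
$plural = "{$fruit}s";

// Braces also work around array access.
$label = "x = {$point[0]}";

echo $plural, "\n"; // apples
echo $label, "\n";  // x = 10
```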

Anyway, let's create a simple interpolated string.

Below we have the same coordinate example as before, just this time the respective string is formed by means of interpolation, not concatenation:

<?php

$x = 10;
$y = 0;

// Print as a coordinate.
echo "($x, $y)";
(10, 0)

The main thing to notice is the string "($x, $y)". $x doesn't literally denote the sequence of characters $ followed by x, but rather the global variable $x whose value is 10. Same goes for $y. In other words, the string "($x, $y)" gets translated to the string "(10, 0)" automatically by PHP.

See how readable this code is. We don't have to continuously quote, unquote and concatenate string literals and other values.

For the second example, suppose we have both the x and y coordinates in an array as follows and want to make the same output as before but using these values directly:

<?php

$point = [10, 0];

The code below accomplishes this:

<?php

$point = [10, 0];

// Print as a coordinate.
echo "($point[0], $point[1])";
(10, 0)

Since the desired numbers are in the $point array, we access them using bracket notation.
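The same bracket notation works with string keys, with one quirk worth knowing: in the simple syntax the key is written without quotes, while inside {...} it is quoted as in normal code. (Writing "$user['name']" without braces is a parse error.) A sketch using a hypothetical $user array:

```php
<?php

// Hypothetical associative array for this illustration.
$user = ['name' => 'Alice', 'age' => 30];

// Simple syntax: the key is written WITHOUT quotes.
$simple = "Name: $user[name]";

// Curly-brace syntax: the key is quoted, just like in normal code.
$curly = "Age: {$user['age']}";

echo $simple, "\n"; // Name: Alice
echo $curly, "\n";  // Age: 30
```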

Denoting $ literally in a double-quoted string

One last thing left to be discussed in this section is that given the fact that $ has a special meaning inside a double-quoted ("") string, if we want to literally denote $ inside such a string, we have to escape it.

The escape sequence is simple — \$.

Shown below is an example:

<?php

$x = 10;
echo "\$x: $x";
$x: 10

If we don't escape the first $, here's what'll happen — the first $x will be parsed as a variable as well:

<?php

$x = 10;
echo "$x: $x";
10: 10

Heredoc strings

Another way to represent a string in PHP is via the heredoc syntax, an idea borrowed from Unix shells. Such a string is usually called a heredoc string.

In a heredoc string, instead of using quotes to delimit the string's data, we use an identifier and the special symbols <<<.

Here's how the heredoc syntax looks:

<<<identifier
<string_data>
identifier

The string begins with the <<< symbol sequence followed by the identifier (here literally named identifier). Next, on a new line, comes the data of the string (represented as <string_data>). Finally, once the string's data is complete, the same identifier used at the start of the string is written on a new line to mark the end of the heredoc string.

Now there are certain things to note regarding this syntax:

  • Firstly, a heredoc string, as with a single-quoted ('') or double-quoted ("") string, denotes an expression. Hence, if the heredoc is part of a larger expression, then it might be invalid to put a semicolon (;) after the ending identifier (depending on the expression).
  • There is no necessity to use \n inside a heredoc to denote a new line. We can write the string just as we'd want it to be output.
  • Conventionally, names such as END and EOS are used as the identifier, but any other name works as well, as long as it follows the rules for identifier names in PHP.
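To illustrate the first point, here is a heredoc passed straight into strtoupper(). This sketch assumes PHP 7.3 or later, where the closing identifier may be followed by other code on the same line (the ) here); in older versions it had to stand alone. Note also that the newline just before the closing identifier is not part of the string.

```php
<?php

// The heredoc is just an expression, so it can be a function argument.
// The semicolon goes after the closing ), not directly after END.
$shout = strtoupper(<<<END
hello
there
END);

echo $shout, "\n"; // HELLO and THERE on two lines
```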

We'll review each of these points in the examples below.

The best part of using a heredoc string is that we can easily add new lines in the string's data and even include interpolated pieces to be parsed along with the string.

Moreover, we don't even need to worry about escaping the quote characters (' or "), since the string is delimited by an identifier rather than by quotes, and that identifier is unlikely to appear on a line of its own within the string itself.

All in all, the heredoc syntax is highly useful when we want to output large pieces of precisely-formatted text in PHP.

Let's create a very basic 'Hello World!' greeting message using a heredoc string:

<?php

$greeting = <<<END
Hello World!
END;

echo $greeting;

Try your best to understand, and even memorize, the syntax of the heredoc string as shown here.

  1. The starting bit <<<END denotes the start of the string, with the identifier as END. Nothing can be written on the same line after <<<END.
  2. On the next line, which is where the string's data begins, we have the text Hello World!.
  3. Finally, on the following line, we end the string with the identifier END. Note that END can't be put on the same line as Hello World! — if we did that, it would've been parsed as the string's data as well.
  4. With the string complete, we end the whole variable-assignment statement with a semicolon (;).
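Assuming PHP 7.3 or later, the closing identifier may also be indented, and exactly that amount of indentation is stripped from every line of the string. This lets a heredoc line up with the surrounding code. The greet() function below is hypothetical, written just for this illustration:

```php
<?php

// Hypothetical helper for this illustration.
function greet(string $name): string
{
    // The closing END is indented by 8 spaces, so 8 spaces are
    // removed from the start of every line in the body.
    return <<<END
        Hello, $name!
          (this line keeps 2 extra spaces)
        END;
}

echo greet('Ada'), "\n";
```

Every body line must be indented at least as much as the closing identifier; otherwise PHP reports a parse error.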

Let's see the output produced:

Hello World!

Simple, yet amazing.

Uppercasing the identifier in a heredoc string is not required. It is just conventionally done to improve the readability of the code.

Time to consider another example.

In the following code, we denote a slightly more complicated string with newlines, single quotes ('), double quotes (") and variable interpolations:

<?php

$a = 7;
$d = 3;
$q = (int) ($a / $d);
$r = $a % $d;

echo <<<END
The 'division algorithm' states that:

"An integer a when divided by a positive integer d produces
two integers q and r, such that a = dq + r"

For instance:
$a = $d x $q + $r    (a = $a; d = $d; q = $q; r = $r)
END;
The 'division algorithm' states that:

"An integer a when divided by a positive integer d produces
two integers q and r, such that a = dq + r"

For instance:
7 = 3 x 2 + 1    (a = 7; d = 3; q = 2; r = 1)

Also notice that instead of passing a variable (containing the heredoc string) to echo, we directly pass the heredoc string literal.

As stated before, it isn't necessary to use END as the identifier in a heredoc string; we can use any valid identifier we like. In the code below, we use HEREDOC (following the uppercasing convention):

<?php

echo <<<HEREDOC
Hello World!
HEREDOC;
Hello World!

Nowdoc strings

The fourth and last way to denote string literals in PHP is via the nowdoc syntax. Such a string is called a nowdoc string.

The difference between nowdoc and heredoc strings is the same as the difference between single-quoted ('') and double-quoted ("") strings. That is, the data in a nowdoc string isn't parsed, while that inside a heredoc string is parsed, as we already know from the last section.

Syntactically, a nowdoc string is written exactly like a heredoc string, except that the identifier next to <<< is enclosed in single quotes:

<<<'identifier'
<string_data>
identifier
The identifier next to <<< must be enclosed in single quotes ('') to create a nowdoc. Enclosing it in double quotes instead (as in <<<"END") doesn't create a nowdoc at all: it's simply an alternative way of writing a heredoc, so the data is still parsed.

Let's consider an example.

In the code below, we make the elementary 'Hello World!' greeting message:

<?php

$greeting = <<<'END'
Hello World!
END;

echo $greeting;
Hello World!

In the code below, we output a simple multi-line message to the console containing multiple single quotes, double quotes, $ signs and backslash (\) characters in it:

<?php

echo <<<'END'
This is a 'nowdoc string' that can contain ' and " without any need for
escaping, and even the $ and \ characters.

Isn't this amazing?
END;
This is a 'nowdoc string' that can contain ' and " without any need for
escaping, and even the $ and \ characters.

Isn't this amazing?

Thanks to the nowdoc syntax, we don't have to worry about escaping a character or breaking the string at any point — the message is simply written as it is.

The nowdoc syntax is really useful when we want to print long pieces of text to the terminal that don't need to be parsed for any special characters.

As discussed previously, a nowdoc string isn't parsed like a heredoc string. This can be confirmed by the following code:

<?php

$a = 10;

echo <<<'END'
$a
END;
$a

Likewise, escaping the $ symbol using the sequence \$ doesn't work in a nowdoc string (since $ isn't special there). In the code below, we demonstrate this idea:

<?php

$a = 10;

echo <<<'END'
\$a
END;
\$a

As is apparent, the sequence \$ is output as two literal characters, not as a single $ sign.

As a rule of thumb, just remember that a nowdoc string treats every single character literally: no escaping, no interpolation, nothing.

With this in mind, let's try solving a simple question:

What does the following code print?

<?php

echo <<<'END'
\\
END;
  • \
  • \\

Changing the casing

To change a string into all lowercase characters or all uppercase characters, we use the strtolower() and the strtoupper() functions, respectively.

Both functions take one argument, which is the string whose casing should be changed, and return a new string. Keep in mind that the original string isn't modified.

Let's consider an example:

<?php

$greeting = 'Hello World!';

echo $greeting, "\n";
echo strtolower($greeting), "\n";
echo strtoupper($greeting);
Hello World!
hello world!
HELLO WORLD!

The first 'Hello World!' greeting is followed by the same text in lowercase and then in uppercase.

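Although PHP doesn't provide a dedicated function to check whether a string is entirely lowercase or uppercase (ctype_lower() and ctype_upper() come close, but they return false as soon as the string contains a non-letter such as a space), this is very easy to do using strtolower() and strtoupper() alone.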

Let's see if you can figure out the solution.

Create a function strislower() that takes in a string argument and returns true or false depending on whether the string is in lowercase or not.

<?php

function strislower($str) {
   return strtolower($str) === $str;
}
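The same trick gives a companion strisupper() (a hypothetical name, mirroring strislower() above). One subtlety shows up in this sketch: a string with no letters at all, such as '123', is left unchanged by both strtolower() and strtoupper(), so under this approach it counts as both lowercase and uppercase.

```php
<?php

// A string is all-lowercase if lowercasing it changes nothing.
function strislower(string $str): bool
{
    return strtolower($str) === $str;
}

// Likewise, it is all-uppercase if uppercasing it changes nothing.
function strisupper(string $str): bool
{
    return strtoupper($str) === $str;
}

var_dump(strislower('hello world'));            // bool(true)
var_dump(strisupper('Hello'));                  // bool(false)
var_dump(strislower('123'), strisupper('123')); // both bool(true)
```

If that edge case matters, an extra check that the string contains at least one letter would be needed.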

Searching for a substring

Searching is a routine activity in computer science. And searching for certain substrings within a string is an even more routine activity.

The function str_contains(), available since PHP 8.0, can be used to determine whether a given substring exists in a string or not.

Here's its syntax:

str_contains($string, $substring)

$string is the main string in which we want to search for a substring, while $substring is that substring.

Haystacks and needles

The official documentation of PHP uses a different naming convention for almost all of its string functions. That is, instead of $string, the main string is called the $haystack. Similarly, instead of $substring, the substring is called the $needle.

If you're unfamiliar with it, this naming comes from the idiom 'looking for a needle in a haystack': a haystack is a large pile of hay stacked into a pointed shape, and finding a small needle hidden inside it is notoriously hard.

The needle is small while the haystack is large, which is why the main string (almost always the larger one) is called the 'haystack'.

We try not to use this naming convention because at times, it can be a little less meaningful than going with $string and $substring.

Let's perform some simple searching...

In the code below, we have the $greeting variable as before. We search for 'Hello' and then for 'Word' via two separate calls to str_contains().

<?php

$greeting = 'Hello World!';

var_dump(str_contains($greeting, 'Hello'));
var_dump(str_contains($greeting, 'Word'));
bool(true)
bool(false)

Since the first word, i.e. 'Hello', is there in $greeting, the first str_contains() call returns true. However, since the second word, i.e. 'Word', isn't there, the second str_contains() call returns false.

Simple.

The str_contains() function operates case-sensitively. That is, lowercase only matches with lowercase and uppercase only matches with uppercase.

This can be confirmed as follows:

<?php

$greeting = 'Hello World!';
var_dump(str_contains($greeting, 'hello'));
bool(false)

The text searched for in $greeting is 'hello', however the variable only contains the text 'Hello', which obviously has a different case. Hence, str_contains() yields false.
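When a case-insensitive check is needed, one simple approach is to lowercase both strings with strtolower() before calling str_contains(). A minimal sketch, assuming PHP 8.0 or later for str_contains():

```php
<?php

$greeting = 'Hello World!';

// Lowercase both strings first, so that case no longer matters.
$found = str_contains(strtolower($greeting), strtolower('hello'));

var_dump($found); // bool(true)
```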

As we shall see in the upcoming PHP String Functions chapter, PHP provides programmers with a wide variety of searching functions to choose from in order to cater to particular searching needs.

For instance, strpos() returns the index at which the substring occurs in the main string; stripos() does the same thing, but case-insensitively; strstr() returns the portion of the string starting from where the substring occurs; and so on.
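As a small preview of those functions, the sketch below applies them to the same $greeting. The one trap worth knowing about: strpos() returns false when nothing is found, but it returns 0 for a match at the very start, and 0 is falsy. Comparing the result strictly with !== false avoids mistaking that match for a miss.

```php
<?php

$greeting = 'Hello World!';

$posWorld  = strpos($greeting, 'World');  // int(6)
$posHello  = strpos($greeting, 'Hello');  // int(0): found, but falsy!
$posLower  = strpos($greeting, 'world');  // false (case-sensitive)
$posIgnore = stripos($greeting, 'world'); // int(6) (case-insensitive)
$tail      = strstr($greeting, 'World');  // 'World!'

// Correct way to test for a match: compare strictly against false.
if ($posHello !== false) {
    echo "'Hello' found at index $posHello\n";
}

var_dump($posWorld, $posLower, $posIgnore, $tail);
```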

String mutability

Certain languages such as Python and JavaScript make their string type immutable.

That is, we can't modify a particular character in the original string. If we really want modification, we ought to create a completely new string with the desired changes.

This can be irritating at times. For instance, changing just a single character in JavaScript requires us to split the string in two at that character and then concatenate the slices together with the new character in between.

In PHP, however, this isn't the case. The string data type is mutable.

A string in PHP being mutable means that it's possible to change a particular character to any other character in the original string.

Let's consider a quick illustration of this idea:

<?php

$str = 'Good';

$str[0] = 'F';
echo $str;
Food

The first character of $str, i.e. G, is changed to F. This changes the string to 'Food'.
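Two related details can be seen in the sketch below. First, assigning a string to another variable copies its value, so changing a character in the copy leaves the original untouched. Second, assigning to an index past the end of the string pads the gap with spaces. The variable names are only for illustration.

```php
<?php

$original = 'Good';
$copy = $original; // $copy holds its own copy of the value

$copy[0] = 'F';
echo $original, "\n"; // Good
echo $copy, "\n";     // Food

$short = 'ab';
$short[4] = 'x';      // indexes 2 and 3 are filled with spaces
var_dump($short);     // string(5) "ab  x"
```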

The syntax used to assign a new character to a given position in the string is the same as the syntax used to accomplish the same thing for an array. In fact, internally a string in PHP is merely an array of bytes (more on that later).

Note that when replacing a character in this way, PHP only reads the first character from the string that provides the replacement.

In other words, we can't magically replace a particular position in the main string with an arbitrary length of characters — a character (spanning a byte) gets replaced with another character (also spanning a byte).

Consider the code below:

<?php

$str = 'Good';

$str[0] = 'First';
echo $str;

In line 5, we assign the string 'First' to the 0th position in $str. As stated before, PHP only reads the first character from the given replacement string, hence only F is used from 'First' and ultimately put in place of G in $str.

Better yet, PHP even raises a self-explanatory warning when it encounters such a situation, as shown in the output below:

Warning: Only the first byte will be assigned to the string offset in <path> on line 5
Food