PHP Format Strings

Chapter 20

Learning outcomes:

  1. Why we need format strings
  2. What are format strings
  3. Going from data to text — printf(), sprintf() and fprintf()
  4. Going from text to data — sscanf() and fscanf()

Introduction

Thus far in the PHP Strings unit, we've covered a lot of solid ground related to strings in the language. Now, let's expand upon that knowledge and learn about string formatting: another highly important topic, one that'll make you an aficionado when working with strings.

In this chapter, we shall take a look at a collection of PHP string functions that work with format strings, i.e. strings containing special sequences, referred to as format specifiers, that are parsed either to extract data out of a piece of text or to meld data into it.

If you're coming from a C background, this chapter will be peaches and cream for you, owing to the fact that C programmers routinely use a similar concept when retrieving input or producing output (the scanf() and printf() functions).

Why do we need format strings?

Oftentimes, while creating complex apps, we need to output a piece of text that's made up of many individual pieces of data.

For instance, consider the following code:

<?php

$id = '48xp-89aa-93ks-00s4';
$name = 'Wooden table';
$price = 39.99;

echo '(' . $id . ') ' . $name . ' ----- $' . $price;

We are trying to output a single line of text, yet the echo statement looks quite intimidating, thanks to the numerous concatenation operations.

(48xp-89aa-93ks-00s4) Wooden table ----- $39.99

If we think about it for a moment, such pieces of text can be expressed much more clearly and neatly if we work with placeholders for actual data. For instance, the output desired above could be expressed as follows,

(<id>) <name> ----- $<price>

where <id>, <name> and <price> are all placeholders (that have to be filled with actual data).

Software specifications relating to text-based input/output are typically laid out in this manner, i.e. as templates with placeholders.

No one would ever go like: "start off with the id, enclosed in parentheses, followed by a space, followed by the name of the product, and then ..." — this seems totally senseless.

In PHP, as we already know, double-quoted strings support variable interpolation, which lets us embed variables directly in a string. Hence, the code above can be rewritten as follows:

<?php

$id = '48xp-89aa-93ks-00s4';
$name = 'Wooden table';
$price = 39.99;

echo "($id) $name ----- $$price";

Clearly, this is much better.

But wait. If interpolated strings do the job, then why are we even here? There must be some reason to learn about format strings, right?

Well, there is.

The interpolated string above only allows us to express what piece of data we want in a given position in the string — it doesn't allow us to go beyond that, expressing the format of each individual piece of data.

Take the variable $price in the example above. In the string, we use it as it is; no rounding to a given number of decimal places is performed, as is typically done when representing money. More than likely, the specification of the output for this case would also have stated the desired precision of the price (typically 2 d.p.).

Now if we continue to stick to the interpolated string example above, we won't be able to express this requirement directly in the string.
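To see the limitation concretely, here's a minimal sketch. It assumes a hypothetical price of 5.5 (not the price from the example above): interpolation prints the float exactly as PHP converts it to a string, so the trailing zero expected in a money amount is lost, whereas a format string, here passed to sprintf() (which we'll meet later in this chapter), can state the precision directly.

```php
<?php

$price = 5.5; // hypothetical price, chosen to expose the missing trailing zero

// Interpolation: the float is converted to a string as-is.
$interpolated = "$$price";

// Format string: %.2f asks for exactly two decimal places.
$formatted = sprintf('$%.2f', $price);

echo $interpolated, "\n"; // $5.5
echo $formatted, "\n";    // $5.50
```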

This is where format strings enter the game.

What are format strings?

Format strings essentially allow us to express the general format of input or output text.

They do so by using placeholders, also referred to as format specifiers. A format specifier is denoted using the % character, followed by a range of other things as we shall learn below.

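Since a format string is just a string, it can be denoted in any way that a normal string could be denoted in PHP. That is, we can use single quotes ('), double quotes ("), or even the heredoc (or nowdoc) syntax. As far as the formatting functions are concerned, the only special character in a format string is the % character; the syntax used to denote the string doesn't matter. (Keep in mind, though, that double-quoted and heredoc strings still undergo their usual interpolation before the formatting function ever sees them.)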
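Here's a minimal sketch of that idea: the same format string written with single quotes, double quotes and nowdoc syntax yields identical results. It uses sprintf() (covered later in this chapter), which returns the formatted text; the product values are just sample data.

```php
<?php

// Single quotes: no interpolation, so $ is a plain character.
$single = '(%s) costs $%.2f';

// Double quotes: escaping the $ is a good habit, since a $ followed
// by a letter would be interpolated as a variable.
$double = "(%s) costs \$%.2f";

// Nowdoc: no interpolation at all, just like single quotes.
$nowdoc = <<<'FMT'
(%s) costs $%.2f
FMT;

$a = sprintf($single, 'Wooden table', 39.99);
$b = sprintf($double, 'Wooden table', 39.99);
$c = sprintf($nowdoc, 'Wooden table', 39.99);

echo $a, "\n";                    // (Wooden table) costs $39.99
var_dump($a === $b && $b === $c); // bool(true)
```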

A format specifier ends with a character, usually a letter, denoting the type of the value that it represents. This character is referred to as the specifier (not format specifier).

The table below lists the most commonly used specifiers:

Specifier   Meaning
%           Represents the literal % character (written as %%).
d           Represents a decimal integer.
b           Represents a binary integer.
o           Represents an octal integer.
x           Represents a hexadecimal integer, with lowercase letters.
X           Represents a hexadecimal integer, with uppercase letters.
f           Represents a float.
g           Represents a float, whose precision is given in terms of significant figures.
c           Represents a character (given as an integer character code when formatting output).
s           Represents a string.

So, for instance, the placeholder %d represents a decimal integer, %f denotes a float, %s denotes a string, and so on.
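To get a feel for the specifiers in the table, here's a minimal sketch that formats one arbitrary sample value with each of them. It uses sprintf() (covered later in this chapter), which returns the formatted text instead of printing it.

```php
<?php

// One arbitrary sample value per specifier from the table above.
$results = [
    sprintf('%d', 42),   // decimal integer
    sprintf('%b', 10),   // binary
    sprintf('%o', 8),    // octal
    sprintf('%x', 255),  // hexadecimal, lowercase
    sprintf('%X', 255),  // hexadecimal, uppercase
    sprintf('%f', 1.5),  // float, 6 decimal places by default
    sprintf('%c', 65),   // character with code 65
    sprintf('%s', 'hi'), // string
    sprintf('100%%'),    // literal % sign
];

echo implode(' | ', $results), "\n";
// 42 | 1010 | 10 | ff | FF | 1.500000 | A | hi | 100%
```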

Besides the type, there's a host of other things that we could include in a format specifier. The general format of a format specifier could be expressed as follows:

%[argnum$][flags][width][.precision]specifier

  • argnum specifies the number of the argument that the placeholder refers to. This will only make sense once we consider the string functions where format strings are used.
  • flags describes the padding to apply to the placeholder, including the pad character. For numbers, it also specifies whether or not to precede them with the + sign in case they are positive.
  • width specifies the width of the placeholder (in number of characters).
  • precision specifies the precision for a number or the cut-off length for a string.
  • specifier (as we've seen above) indicates the type of the placeholder. Based on the type, the corresponding data might get automatically coerced to fit the desired type.
The square brackets ([]) here indicate that the enclosed token is optional.
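For a quick preview of argnum, here's a minimal sketch using sprintf() (covered later in this chapter). The 2$ and 1$ parts pick arguments by position, so an argument can be reordered or reused. Note the single quotes: in a double-quoted string, $s would be interpolated as a variable.

```php
<?php

// %2$s refers to the second argument, %1$s to the first.
$greeting = sprintf('%2$s, %1$s!', 'world', 'Hello');

// The same argument can be referenced more than once.
$laugh = sprintf('%1$s %1$s %1$s', 'ha');

echo $greeting, "\n"; // Hello, world!
echo $laugh, "\n";    // ha ha ha
```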

By default, when a given datum is formatted, it's right-aligned, and in the case of numbers, it's only preceded by its sign if it's a negative number (which is just how we normally represent numbers in maths).

To override this behavior, we can use certain flags, denoted above as flags. The table below lists the possible flags:

Flag        Meaning
-           Left-align the data within the given width of the format specifier.
+           Precede the number with its sign, even for positive numbers.
0           Pad with zeros.
'c          Pad with the given character c.
<space>     Pad with the space character (the default).

Keep in mind that flags are optional (apparent by the square brackets surrounding them in the syntax above).
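Here's a minimal sketch of each flag in action, using sprintf() (covered later in this chapter) and arbitrary sample values. The square brackets in the format strings are plain text, added only to make the padding visible.

```php
<?php

$flags = [
    sprintf('[%6s]', 'abc'),   // no flag: right-aligned, padded with spaces
    sprintf('[%-6s]', 'abc'),  // - : left-aligned
    sprintf('[%+d]', 7),       // + : sign shown even for a positive number
    sprintf('[%05d]', 42),     // 0 : padded with zeros
    sprintf("[%'*6s]", 'abc'), // '* : padded with a custom character (double
                               //      quotes avoid escaping the ')
];

echo implode("\n", $flags), "\n";
```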

The width is an integer that specifies the minimum number of characters to use for the placeholder.

precision follows a . character and specifies the precision of the corresponding value. In the case of an f specifier, it represents the precision in terms of decimal places; in the case of a g specifier, it represents the precision in terms of significant figures. In the case of an s specifier, it represents the cut-off length of the string.
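A minimal sketch of width and precision, again with sprintf() (covered later in this chapter) and arbitrary sample values:

```php
<?php

$samples = [
    sprintf('%.3f', 3.14159),    // 3 decimal places
    sprintf('%.3g', 3.14159),    // 3 significant figures
    sprintf('%.3s', 'abcdef'),   // string cut off after 3 characters
    sprintf('[%8.2f]', 3.14159), // minimum width 8, 2 decimal places
    sprintf('[%2s]', 'abcdef'),  // width is only a minimum: nothing is truncated
];

echo implode("\n", $samples), "\n";
```

Notice the last case: a width never shortens the data; only a precision cuts a string off.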

PHP provides us with a handful of string functions to work with these format strings.

Now, these functions can be divided into two categories:

  1. One where we go from data to text.
  2. One where we go from text to data.

In the former category, we have printf(), sprintf() and fprintf(), along with their array-based variants vprintf(), vsprintf() and vfprintf().

The common phrase 'printf' in each of these functions' names hints to us that they work more or less the same way, printing data to a given entity (which might be the standard output stream, a string, or a file).

In the latter category, we have the following functions: sscanf() and fscanf(). Once again, the common phrase 'scanf' in each of the functions' names hints to us that they work more or less the same way, scanning an entity containing some text and then extracting data out of it.

Let's start by exploring the former...

From data to text

The functions printf(), sprintf(), and fprintf() all allow us to work with format strings in order to go from a given set of data to a given piece of text.

That piece of text, depending on the function used, might be printed to standard output (as is the case with echo), returned as a string, or written to a file.

Let's commence the exploration with the printf() function.

printf()

The printf() function is used to output some formatted text to the standard output stream (more on streams later in this course).

In simpler words, it's similar to our old friend echo, just with some spectacular formatting capabilities.

Remember that echo is a language construct while printf() is a function.

Here's the syntax of printf():

printf($format, $arg_1, $arg_2, ..., $arg_n)

The first argument is the format string which specifies the general format of the output. Each subsequent argument thereafter is data to be plugged into this format string.

Now, it's time to consider some real examples of working with format strings and printf(), in particular.

Let's first see how to accomplish the task that we accomplished above using an interpolated string.

To recall the task, given an id, a product name, and its price, we have to output all of these in the following format: (<id>) <name> ----- $<price>.

With a format string, this could be accomplished as follows:

<?php

$id = '48xp-89aa-93ks-00s4';
$name = 'Wooden table';
$price = 39.99;

printf('(%s) %s ----- $%f', $id, $name, $price);

Here's what the format string says: $id is a string, hence it's represented as %s; $name is also a string, hence it's also represented as %s; $price is a float (not an integer), hence it's represented as %f.

(48xp-89aa-93ks-00s4) Wooden table ----- $39.990000

Great. But notice the formatting of the product's price; it's a little bit more than what we want.

By default, the f specifier produces a precision of 6 decimal places. In order to reduce this down to 2 d.p., we ought to use the precision parameter in the corresponding format specifier.

Let's do this now:

<?php

$id = '48xp-89aa-93ks-00s4';
$name = 'Wooden table';
$price = 39.99;

printf('(%s) %s ----- $%.2f', $id, $name, $price);
(48xp-89aa-93ks-00s4) Wooden table ----- $39.99

Amazing!

Here's how the format specifier %.2f works. The .2 specifies the precision of the given value (because we are working with a float here, .2 refers to a precision of 2 d.p). The f tells us that the placeholder represents a float value.

As mentioned before, the specifier gives more meaning to the preceding tokens in the whole format specifier. In this case, f gives meaning to .2.

Let's try another example.

Suppose we want to output each of these three pieces of information in tabular format (in the terminal). First, obviously, to give meaning to each row of the table, we need a header row. So let's create that first.

But before that, let's make some assumptions:

  • The product's ID won't be any longer than 20 characters.
  • The product's name won't be any longer than 30 characters.
  • The product's price won't be any longer than 10 characters.

We'll use these maximum lengths to pad each column to the respective length. Also note that in order to keep the overall example simple, we'll refrain from creating horizontal and vertical lines in the table (using the _ and | characters).

As stated before, let's start off with the header:

<?php

$id = '48xp-89aa-93ks-00s4';
$name = 'Wooden table';
$price = 39.99;

printf('%-20s %-30s %-10s', 'ID', 'Name', 'Price ($)');
ID                   Name                           Price ($)

Let's build some intuition for the format specifier %-20s; similar reasoning applies to the rest of the format specifiers.

%-20s has the - flag, a width of 20, and the s specifier. The - flag serves to align the text to the left (and apply the padding to the right). The width of 20, obtained using the assumptions above, is necessary to reserve a large area for the ID in each row. Finally, the s specifier is used because the argument 'ID' is a string.

Now, let's head over to the first (and only) data row of the table:

<?php

$id = '48xp-89aa-93ks-00s4';
$name = 'Wooden table';
$price = 39.99;

printf('%-20s %-30s %-10s', 'ID', 'Name', 'Price ($)');
echo "\n";
printf('%-20s %-30s %-10.2f', $id, $name, $price);
ID                   Name                           Price ($)
48xp-89aa-93ks-00s4  Wooden table                   39.99

In both printf() calls here, the first and second pieces of data (corresponding to the first and second format specifiers, respectively) are strings, so the first and second format specifiers stay the same across the two format strings.

This, however, isn't the case with the third format specifier.

In the first printf() call, we have a string ('Price ($)') and, likewise, a format specifier meant for a string (i.e. %-10s). In the second printf() call, we have a float, and thus modify the format specifier slightly — changing the s specifier to an f, and also configuring the precision of the float (via .2) in %-10.2f.
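Since the format strings stay the same from row to row, adding more rows is just a matter of looping. Here's a minimal sketch with a second, made-up product. It builds the table with sprintf() (covered in the next section) so the result can be echoed in one go, and it puts the \n directly inside each double-quoted format string instead of using a separate echo.

```php
<?php

// Product data: the first row is from the example above,
// the second one is made up for illustration.
$products = [
    ['48xp-89aa-93ks-00s4', 'Wooden table', 39.99],
    ['71kd-02mz-55qa-19xe', 'Desk lamp', 12.5],
];

$headerFormat = "%-20s %-30s %-10s\n";
$rowFormat    = "%-20s %-30s %-10.2f\n";

// Header first, then one formatted line per product.
$table = sprintf($headerFormat, 'ID', 'Name', 'Price ($)');
foreach ($products as [$id, $name, $price]) {
    $table .= sprintf($rowFormat, $id, $name, $price);
}

echo $table;
```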

So what do you say? Is this simple or not?

Let's experiment a little more with this same example.

In the following code, we drop the - flag from each and every format specifier, just to see the difference it makes:

<?php

$id = '48xp-89aa-93ks-00s4';
$name = 'Wooden table';
$price = 39.99;

printf('%20s %30s %10s', 'ID', 'Name', 'Price ($)');
echo "\n";
printf('%20s %30s %10.2f', $id, $name, $price);

And here's that difference:

                  ID                           Name  Price ($)
 48xp-89aa-93ks-00s4                   Wooden table      39.99

See how the padding is applied on the left and the text is aligned to the right? This is the default formatting behavior which we modified above using the - flag.

Let's now consider the sprintf() function.

sprintf()

From the perspective of the format string, the sprintf() function works exactly like printf(). But from the perspective of functionality, it's different...a little bit different.

The 's' in sprintf stands for 'string'. Consequently, sprintf means to print inside a string. This is how sprintf() differs from printf() — the latter prints to standard output while the former just produces a string with the desired format.

Syntactically, sprintf() resembles printf():

sprintf($format, $arg_1, $arg_2, ..., $arg_n)

However, whereas printf() returns the length of the printed string, sprintf() returns the string itself.

sprintf() does NOT print anything by itself! It only returns a string.
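The difference in return values is easy to check. Here's a minimal sketch using the same format string with both functions; the text being formatted is just sample data.

```php
<?php

// printf() prints the text and returns the number of bytes it printed.
$length = printf('%s costs $%.2f', 'Wooden table', 39.99);
echo "\n";

// sprintf() prints nothing; it hands the formatted text back instead.
$text = sprintf('%s costs $%.2f', 'Wooden table', 39.99);

var_dump($length); // int(25)
var_dump($text);   // string(25) "Wooden table costs $39.99"
```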

Let's consider an example.

In the code below, we obtain a formatted string using sprintf() and then echo it out:

<?php

$id = '48xp-89aa-93ks-00s4';
$name = 'Wooden table';
$price = 39.99;

$formatted_str = sprintf('(%s) %s ----- $%.2f', $id, $name, $price);
echo $formatted_str;
(48xp-89aa-93ks-00s4) Wooden table ----- $39.99

fprintf()

In addition to printf() and sprintf(), fprintf() is yet another function that allows us to go from a given set of data to some formatted text.

But fprintf() dumps the produced text inside a given file. That's apparent by the 'f' in the function's name — it stands for 'file'.

Because fprintf() deals with files, we'll cover it later on in this course once we explore files in PHP in detail.

vprintf(), vsprintf() and vfprintf()

Each of the three functions printf(), sprintf() and fprintf() has an analog function that takes the data in the form of a single array instead of as multiple arguments.

These functions are vprintf(), vsprintf() and vfprintf(), respectively.

The 'v' in each of these functions' names is commonly read as 'vector'. If you have experience with C++, you'll know that a vector is basically just an array. Accordingly, vprintf() is printf() that works with a vector (i.e. an array).

Consider the following example where we demonstrate how vprintf() differs from printf():

<?php

$id = '48xp-89aa-93ks-00s4';
$name = 'Wooden table';
$price = 39.99;

vprintf('(%s) %s ----- $%.2f', [$id, $name, $price]);
(48xp-89aa-93ks-00s4) Wooden table ----- $39.99

With printf(), we have to pass each concrete piece of data to the function individually, as a separate argument. However, with vprintf(), we can put all of the data inside an array and then just pass the array to the function.

This can be really handy if all that we have to work with is an array holding the data, and we don't have to go very far to find a real-world scenario where that's the case.

For example, when retrieving a record from a database, we might get back an indexed array holding each field of the record in the order in which the fields were requested in the underlying query. Using vsprintf() and the returned array, we can directly format the record into a string, without having to manually access each field and pass it over to sprintf().
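Here's a minimal sketch of that scenario. No database is involved: the $row array is a hypothetical stand-in for a fetched record, e.g. one returned as an indexed array by PDO's PDO::FETCH_NUM mode, with fields in the order id, name, price.

```php
<?php

// Hypothetical record: an indexed array of fields in query order.
$row = ['48xp-89aa-93ks-00s4', 'Wooden table', 39.99];

// vsprintf() takes the whole array and returns the formatted string.
$line = vsprintf('(%s) %s ----- $%.2f', $row);

echo $line, "\n"; // (48xp-89aa-93ks-00s4) Wooden table ----- $39.99
```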

Text to data

Besides going from data to text, PHP also provides us with certain functions to go from text to data. These functions parse the text for a given format and then extract data out of it.

To name them, we have sscanf() and fscanf().

sscanf()

The sscanf() function is meant to parse a given string based on a given format and then extract data out of it.

As with sprintf(), the 's' in sscanf() stands for 'string', but here it doesn't mean that the function returns a string like sprintf() does; instead, sscanf() takes in the text in the form of a string argument.

Here's the signature of sscanf():

sscanf($string, $format[, $arg_1, $arg_2, ..., $arg_n])

The first argument is the string to parse for data. The second argument is the format string.

From this point onwards, we can either pass on variables as (reference) arguments to the function in order to get them populated with the respective data, or let the function collect all the data in the form of an array and then return it.

If no argument is provided after $format, we get back an array holding the extracted data. Otherwise, the extracted data is dumped into the provided variables in the same order that it was expressed in the format string, and the function returns the number of values that were assigned.
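Here's a minimal sketch of both modes, using made-up input strings:

```php
<?php

// Mode 1: no extra arguments, so sscanf() returns an array
// holding the extracted values, in order.
$parts = sscanf('2024-01-15', '%d-%d-%d');

// Mode 2: the variables passed by reference get filled in,
// and the return value is how many of them were assigned.
$assigned = sscanf('12 apples', '%d %s', $qty, $fruit);

var_dump($parts, $assigned, $qty, $fruit);
```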

Format strings for input text work differently!

One important thing to note when working with sscanf() (and even fscanf()) is that the format string passed to it has a different parsing ruleset applied to it.

In particular, format specifiers describing input can't contain flags or a precision. (A width is allowed, but in this context it means the maximum number of characters to read.)

For instance, the format string '%.2f' is invalid if we pass it to sscanf(). This is evident in the following illustration:

<?php

$str = '50.99';

// Can't use %.2f in a format string describing input data!
sscanf($str, '%.2f', $value);
PHP Fatal error: Uncaught ValueError: Bad scan conversion character "." in <file>:4
Stack trace:
...

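If you're coming from a C background, this shouldn't surprise you: C's scanf() doesn't accept a precision in its conversion specifications either. The difference is that PHP refuses such a format string outright, throwing a ValueError.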
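Flags and precision are ruled out, but a width does work in input format strings: there, it caps how many characters a conversion may read. Here's a minimal sketch that uses this to split a made-up date string that has no separators at all:

```php
<?php

// %4d reads at most 4 characters, each %2d at most 2,
// so the digits are split into year, month and day.
$parts = sscanf('20240115', '%4d%2d%2d');

var_dump($parts); // the integers 2024, 1 and 15
```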

Let's now consider a quick example of sscanf() in action.

Suppose we have a string with the phrase 'Price: $' followed by a float (for instance, 'Price: $15.99'). Our job is to extract out the exact price from this string.

Well, thanks to sscanf(), this job is pretty easy to accomplish, as demonstrated below:

<?php

$str = 'Price: $15.99';
sscanf($str, 'Price: $%f', $price);

var_dump($price);
float(15.99)

As you can see, a variable $price is passed as an argument to sscanf() here so that the function could dump the extracted float into it. The var_dump() call clearly shows us that, indeed, the variable holds a float.

If we were to change only one of the two (either the main string, i.e. $str, or the format string) and keep the other as before, the extraction could fail.

This can be seen as follows:

<?php

$str = 'Price: $15.99';
sscanf($str, 'Price: %f', $price);

var_dump($price);

We've removed the $ sign preceding the %f specifier in the format string, while $str is exactly the same as before. Now, if we run this code, $price doesn't turn out to be what we expect it to:

NULL

And rightly so. The %f specifier only represents a floating-point number (optionally preceded by whitespace); it doesn't represent a $-prefixed price. Hence, the format string here doesn't match the main string $str, and therefore $price is NULL.

Surely, though, if we remove the $ from the main string as well, our example would resume its expected behavior:

<?php

$str = 'Price: 15.99';
sscanf($str, 'Price: %f', $price);

var_dump($price);

Time for another example.

In the following code, we have a string consisting of a word, followed by a space, followed by an integer, representing the word's count in a large piece of text.

<?php

$str = 'awesome 37';

Once again, with the help of sscanf(), it's really simple to extract both these pieces of data from the string: just use a %s followed by a %d:

<?php

$str = 'awesome 37';
sscanf($str, '%s %d', $word, $count);

var_dump($word);
var_dump($count);
string(7) "awesome"
int(37)

And that's awesome!

fscanf()

The fscanf() function allows us to extract data out of a file based on a given format. Akin to fprintf(), the 'f' in the function's name stands for 'file'.

As before, because fscanf() also deals with files, we'll cover it later on in this course once we explore files in PHP in detail.