PHP References

Chapter 12 40 mins

Learning outcomes:

  1. What are references
  2. Assign by reference (=&)
  3. Pass by reference (&$param)
  4. Return by reference (&function_name)
  5. How references work internally in PHP

Introduction

In many ways, PHP is quite different and unique in its own right if we compare it with other scripting languages such as JavaScript, Python and Ruby.

One of those ways in which it's different is that PHP allows us to modify given entities (variables, array elements, object instances) with the help of other entities. This is powered by the idea of references in the language.

In this chapter, we aim to unravel the concept of references in PHP. In particular, we'll see what exactly references are and when we might need them; the three main ways to leverage references in our code; the internals of PHP regarding references; and much more.

Understanding references is paramount to working predictably and effectively with certain kinds of values in PHP and, in general, to writing code that behaves as expected.

What are references?

In simple words,

References in PHP are simply a mechanism to get two or more entities to refer to the exact same content in memory.

First of all, let's clarify the term 'entity' that we've used here.

An 'entity' refers to a variable, an array element, or an object instance (more on that later in the PHP OOP unit). In general, it refers to anything that holds a value in PHP.

With this, it means that a reference is a way to get two or more variables, array elements, or other such entities to refer to the exact same location in memory and, likewise, to the exact same value.

When the underlying value is changed using any of those entities, all of the remaining entities reflect the change. That's simply because they all point to the same location in memory, so when the data in that location is updated, every entity shows the latest data.

There are three kinds of references in PHP:

  1. Assign by reference
  2. Pass by reference
  3. Return by reference

Let's explore each of these.

Assign by reference

One way to create references in PHP is via assign by reference.

What happens here is just that while assigning a value to a variable, we use an extended syntax to create a reference to the underlying value.

That extended syntax is simply to have an ampersand (&) following the assignment operator (=), something like the following:

entity_1 =& entity_2

When this statement is executed, both the entities point to the exact same value.

Just like entity_1 = entity_2 resolves to the value of entity_2, entity_1 =& entity_2 also resolves to entity_2. Remember that both of these are expressions in PHP.

Let's consider a really simple example.

In the code below, we have a variable $a holding an integer 10 and a second variable $b assigned the value $a:

<?php

$a = 10;
$b = $a;

Clearly we know by this point in this course that modifying $a won't have any effect on $b, and vice versa. This is because PHP makes a copy of the underlying value (stored in $a) and then stores that in $b.

Let's review this concretely with some echos:

<?php

$a = 10;
$b = $a;

// Changing $a
$a = 500;
echo "\$a: $a\n\$b: $b\n\n";

// Changing $b
$b = 1000;
echo "\$a: $a\n\$b: $b\n";

$a: 500
$b: 10

$a: 500
$b: 1000

See? Changing $a doesn't change $b, and similarly changing $b doesn't change $a either.

So far, so good.

Now, let's incorporate assign by reference in this very example. Previously, we assigned $a to $b normally; this time we'll assign it by reference:

<?php

$a = 10;
$b =& $a;

// Changing $a
$a = 500;
echo "\$a: $a\n\$b: $b\n\n";

// Changing $b
$b = 1000;
echo "\$a: $a\n\$b: $b\n";

$a: 500
$b: 500

$a: 1000
$b: 1000

The statement $b =& $a is the only difference in this code. It's what performs the reference assignment. Once it executes, $b and $a effectively become aliases, i.e. different names referring to the same value in memory, which is 10 at that point (because that's what was assigned to $a).

As we change $a in line 7, $b also gets changed. Similarly, when we change $b in line 11, $a also gets changed.

This is amazing, isn't it?

& is not really an operator

Keep in mind that it's not necessary to have the ampersand (&) come immediately after the = sign. However, the norm is to pair it with the = sign, as in $a =& $b, instead of $a = &$b.

This is done in order to emphasize the fact that & is NOT really an operator, per se.

In the expression $a = &$b, it seems as if & operates on $b, that is, it retrieves a reference pointing to the value stored in $b. Precisely speaking, that's not how PHP works.

A better way to think about &, in the context of assignment, is that it provides us with a new kind of assignment — one where a reference gets assigned.

Pass-by-reference and return-by-reference, once we learn them later on in this chapter, could be thought of in the same way, i.e. & provides us with a new way of passing arguments and of returning values.
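To make the point about spelling concrete, here is a minimal sketch showing that both placements of the ampersand create the same kind of alias. The variable names are chosen purely for illustration.

```php
<?php

$a = 10;

// Conventional spelling: '&' paired with '='.
$b =& $a;

// Alternative spelling: '&' written next to the variable. Same effect.
$c = &$a;

$a = 20;
echo "$a $b $c\n"; // prints "20 20 20"
```

Whichever spelling is used, $a, $b and $c end up as aliases of one another.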

Assign by reference isn't just limited to variables. Remember we used the term 'entities' in the definition above? As one example, it can also be used on array elements.

In the code below, we use two reference assignments: $arr[0] assigned by-reference to $a, and $arr[1] assigned by-reference to $b:

<?php

$arr = [10, 50, 60];

$a =& $arr[0];
$b =& $arr[1];

Now if we change $a, $arr[0] would change as well, and vice versa. On the same lines, if we change $b, $arr[1] would change as well, and vice versa again. This is demonstrated as follows:

<?php

$arr = [10, 50, 60];

$a =& $arr[0];
$b =& $arr[1];

$a = -5;
echo "\$a: $a\n\$arr[0]: {$arr[0]}\n\n";

$arr[1] = 99;
echo "\$b: $b\n\$arr[1]: {$arr[1]}\n\n";

echo "\$arr[2]: {$arr[2]}\n";

$a: -5
$arr[0]: -5

$b: 99
$arr[1]: 99

$arr[2]: 60

This is purely magical.

Let's consider yet another example, this time a bit more challenging to reason about.

In the code above, what would happen if we copied $arr to another variable $arr2 and then modified $arr2[0] and $arr2[1]? Would the references in $arr continue on in $arr2?

Well, let's see:

<?php

$arr = [10, 50, 60];

$a =& $arr[0];
$b =& $arr[1];

// Copy $arr to $arr2
$arr2 = $arr;

$a = -5;
echo "\$a: $a\n\$arr[0]: {$arr[0]}\n\$arr2[0]: {$arr2[0]}\n\n";

$arr[1] = 99;
echo "\$b: $b\n\$arr[1]: {$arr[1]}\n\$arr2[1]: {$arr2[1]}\n\n";

echo "\$arr[2]: {$arr[2]}\n\$arr2[2]: {$arr2[2]}\n";

$a: -5
$arr[0]: -5
$arr2[0]: -5

$b: 99
$arr[1]: 99
$arr2[1]: 99

$arr[2]: 60
$arr2[2]: 60

As can be seen, the references do continue. As we change $a (or $arr[0]), $arr2[0] also changes, and vice versa. The same applies to $b, $arr[1] and $arr2[1].

Let's see why and how this happens...

Copying an array with elements that are references

In the code above, the first and second elements of $arr, as we know, are both references. In simpler words, they both are reference values. When $arr2 = $arr executes, a copy of each element in $arr is made and stored in $arr2. The first and second elements of $arr obviously also get copied in this process.

This means that when the first element of $arr is copied, it's the reference that gets copied and stored in $arr2[0], NOT the underlying value pointed to by the reference (i.e. 10).

The same applies to the second element of $arr, which gets copied and stored in $arr2[1].

Now since the references are mere copies, they point to the exact same respective locations in memory. That is, $arr2[0], $arr[0] and $a are aliases (identical to each other), whereas $arr2[1], $arr[1] and $b are aliases of one another.
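The flip side of this explanation is that only the elements that are references stay linked; the other elements of the copy are independent. The following is a small sketch of that contrast, writing to the copy instead of the original. The values -1 and -3 are arbitrary.

```php
<?php

$arr = [10, 50, 60];
$a =& $arr[0];      // $arr[0] now holds a reference

$arr2 = $arr;       // the reference in slot 0 is carried over to the copy

$arr2[0] = -1;      // writes through the shared reference
$arr2[2] = -3;      // slot 2 holds a plain value, so only the copy changes

echo "{$arr[0]} {$arr2[0]} $a\n"; // prints "-1 -1 -1"
echo "{$arr[2]} {$arr2[2]}\n";    // prints "60 -3"
```

This is worth keeping in mind whenever an array containing references is copied, since the copy is not fully independent of the original.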

Simple?

Pass by reference

Besides assign by reference, another way to work with references in PHP is pass by reference.

Pass by reference is when we pass arguments to a function via their references. More specifically, we label parameters in the function's definition as reference parameters.

This is done by preceding the parameter with an ampersand (&), as shown below:

function function_name(&$param_1, ...) { ... }

Recall that passing arguments to functions is merely the assignment of those arguments to the given parameters (which are simply local variables). So, theoretically, pass-by-reference is simply a special kind of assign-by-reference.

The pass-by-reference feature is commonly used by many functions in PHP that want to modify the actual variables sent into them. One familiar example is the sort() array function.

As we know, sort() takes an array and reorders the elements in it in ascending order. The fact that sort() is able to modify the array sent to it is by virtue of its first parameter which is a reference parameter.

So when we call sort($arr), where $arr is an array, we are actually providing sort() with a reference to $arr, NOT a copy of it.
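As a quick illustration of the sort() behavior described above, here is a minimal sketch. The $scores array is a made-up example.

```php
<?php

$scores = [42, 7, 19];

// sort() takes its array argument by reference, so $scores itself is reordered.
// The sorted array is NOT what sort() returns.
sort($scores);

echo implode(', ', $scores), "\n"; // prints "7, 19, 42"
```

Note that the call is not written as $scores = sort($scores); the change happens through the reference parameter.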

Let's consider pass-by-reference on a custom function that works with an array.

Suppose we have an array of numbers and want to create a function such that when given this (or any other) array, it transforms every number in the original array to the square of that number. The function must not return anything; the provided array must be modified in-place.

Let's translate this example into code. Here's the array:

<?php

$nums = [-5, -3, 12, 29];

And here's our take on the function square():

<?php

$nums = [-5, -3, 12, 29];

function square($arr) {
   for ($i = 0, $len = count($arr); $i < $len; $i++) {
      $arr[$i] *= $arr[$i];
   }
}

Now, let's try running square() over the $nums array and then inspect the array:

<?php

$nums = [-5, -3, 12, 29];

function square($arr) {
   for ($i = 0, $len = count($arr); $i < $len; $i++) {
      $arr[$i] *= $arr[$i];
   }
}

square($nums);
print_r($nums);

Array
(
    [0] => -5
    [1] => -3
    [2] => 12
    [3] => 29
)

As we run this code, we notice that there's a logical error in it — the array doesn't hold the square of each number after the completion of square().

What could possibly be the problem here?

Well, the problem is in how PHP passes arrays to functions.

In PHP, when an array is assigned to another entity, the entire array is copied over. Since passing arguments to functions is basically just assigning values to variables, when we pass $nums to square() above, the $arr parameter receives a copy of this array.

Thereafter, all the modifications performed inside square() to $arr apply to this copied array, NOT to the original $nums array. Thus, in the end, when square() exits, the global $nums array remains unchanged.
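The same copying behavior can be observed with a plain assignment, without any function involved. This is a minimal sketch with illustrative variable names.

```php
<?php

$original = [1, 2, 3];
$copy = $original;   // plain assignment: $copy behaves as an independent array

$copy[0] = 100;

echo $original[0], "\n"; // prints "1"
echo $copy[0], "\n";     // prints "100"
```

Passing $nums to square($arr) behaves just like $copy = $original here.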

How could we solve this problem? Well, just make $arr a reference parameter.

Let's accomplish this now:

<?php

$nums = [-5, -3, 12, 29];

function square(&$arr) {
   for ($i = 0, $len = count($arr); $i < $len; $i++) {
      $arr[$i] *= $arr[$i];
   }
}

square($nums);
print_r($nums);

Array
(
    [0] => 25
    [1] => 9
    [2] => 144
    [3] => 841
)

And voila! $nums is indeed modified following the invocation of square().

By virtue of &$arr, when we now call square($nums), a reference to $nums is passed on to the $arr parameter and, consequently, all the modifications to $arr get applied to $nums.
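Pass by reference isn't limited to arrays. As a further sketch, the hypothetical swap() function below (not a PHP built-in) takes two reference parameters and exchanges the values of the variables passed to it.

```php
<?php

// Hypothetical helper: exchanges the values of the two variables passed in.
function swap(&$left, &$right) {
   $temp = $left;
   $left = $right;
   $right = $temp;
}

$x = 1;
$y = 2;
swap($x, $y);

echo "$x $y\n"; // prints "2 1"
```

Without the ampersands, swap() would only exchange its own local copies and $x and $y would stay as they were.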

Return by reference

The third and final way to work with references in PHP is to return them, i.e. use return by reference.

The idea is that a function returns a value that is a reference. Then this value is used to create a second reference, obviously at the time and site of the function's invocation.

To make a function return a reference, we have to precede the function's name in its definition with...you guessed it — an ampersand (&):

function &function_name(...) { ... }

When a function returns a reference, its invocation expression can be used in the context of assign-by-reference and pass-by-reference.

Let's consider an example to help clarify what this means.

Suppose we have a function get_first_item() that takes in a given list and returns a reference to its first item. By calling this function, we obtain a reference to the first item of the list. This reference can be stored inside a variable, and that variable can then be modified in order to modify the underlying list's item.

For now, we model the list given to the function as an indexed array. However, in reality, we could have any data structure representing the list, such as an associative array in PHP, or maybe even a linked list. In such a case, the function would be really handy because it would abstract away the entire logic of figuring out the first item of the given list and then providing us with access to it.

Here's the function's definition:

<?php

function &get_first_item(&$arr) {
   return $arr[0];
}

Notice the reference parameter &$arr — it's important to make $arr a reference parameter so that we don't return a reference to the first item of a copied array. Also notice the & preceding the function's name.

Remember that passing an array to a function in a non-reference parameter results in the array being copied.

Let's now test this function on a dummy array $nums:

<?php

function &get_first_item(&$arr) {
   return $arr[0];
}

$nums = [1, 2, 3];

$element =& get_first_item($nums);
$element = 50;

print_r($nums);

First we call get_first_item() on $nums and assign its return value to $element by-reference. As expected, this should make $element an alias of $nums[0]. To check this, we modify $element and then print the $nums array.

Let's see the array printed:

Array
(
    [0] => 50
    [1] => 2
    [2] => 3
)

Just as expected, $element is indeed an alias of $nums[0]. $nums has been modified in-place; its first element is 50, which is what we set $element to in line 10.

References are really powerful in PHP.
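Earlier we noted that the invocation of a reference-returning function can also be used in the context of pass-by-reference. Here is a minimal sketch of that, reusing get_first_item() together with a hypothetical add_one() helper that has a reference parameter.

```php
<?php

function &get_first_item(&$arr) {
   return $arr[0];
}

// Hypothetical helper with a reference parameter.
function add_one(&$number) {
   $number++;
}

$nums = [1, 2, 3];

// The reference returned by get_first_item() is handed straight to add_one().
add_one(get_first_item($nums));

echo $nums[0], "\n"; // prints "2"
```

No intermediate variable is needed; the returned reference flows directly into the reference parameter.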

One important thing to note is that returning a reference value from a function that's not labeled as a reference-returning function would simply NOT work as expected.

For instance, consider the following modification to the code above:

<?php

// The function is no longer labeled to return by-reference.
// There is no '&' preceding the function's name here.
function get_first_item(&$arr) {
   return $arr[0];
}

$nums = [1, 2, 3];

$element =& get_first_item($nums);
$element = 50;

print_r($nums);

Everything is the same as before, except that the function's name in its definition isn't preceded by an ampersand (&), i.e. the function does not return by-reference.

If we run the code again, $element doesn't seem to be an alias of $nums[0]:

Array
(
    [0] => 1
    [1] => 2
    [2] => 3
)

And rightly so. But why?

Well, there's a simple way to reason about this behavior.

If a function returns a reference, label it as one!

Consider the following code:

<?php

function ref(&$var) {
   return $var;
}

$x = 10;
$y =& ref($x);

We have a function ref() with a reference parameter $var that is returned as it is, and two global variables $x and $y, one that is initialized to 10 while the other to the return value of ref().

Now just by reading this code, we might think that since ref() returns $var, which is a reference, the statement ref($x) would be returning a reference to $x, and then $y =& ref($x) would make $y an alias of $x.

Surprisingly, that's NOT the case at all!

If we change $y, $x won't change, as demonstrated below:

<?php

function ref(&$var) {
   return $var;
}

$x = 10;
$y =& ref($x);

// Change $y and see if $x changes as well.
$y = 500;
echo "\$x: $x\n\$y: $y";

PHP Notice: Only variables should be assigned by reference in <file> on line 8
Notice: Only variables should be assigned by reference in <file> on line 8
$x: 10
$y: 500

This confirms that $y is not an alias of $x.

It's now time to reason about this behavior. For that, let's go through the code above step-by-step and see where the problem originates:

  • In order to execute the statement $y =& ref($x), first ref($x) is called.
  • Because $var in ref() is a reference parameter, its value becomes a reference to the value stored in $x. In other words, $var becomes an alias of $x.
  • Next up, $var (which is a reference value) is returned.
  • When returning, PHP creates a separate space in memory to hold the returned value. (This could be thought of as a separate variable $return_value to help understand the idea better.)
  • Now, since the function ref() isn't labeled to be a reference-returning function, the value stored in this separate memory space isn't the reference value $var, but rather the value pointed to by that reference, i.e. 10. (This is analogous to assigning a reference value to an entity without using assign-by-reference, i.e. something like $return_value = $var.)
  • Hence, 10 is returned in the end by ref($x).
  • Following this, the value 10 is assigned-by-reference to $y.

To boil it down, $y does NOT get assigned a reference to the value pointed to by $x once $x is passed into the ref() function.

And the reason for this is simple: because ref() is NOT labeled to be a reference-returning function.

Let's now do that and reason about the code again:

<?php

function &ref(&$var) {
   return $var;
}

$x = 10;
$y =& ref($x);

  • While executing $y =& ref($x), first ref($x) is called.
  • The $var parameter in ref() (and the variable $x) becomes a reference to the value currently stored in $x.
  • $var is returned.
  • When returning, PHP creates a separate space in memory to hold the returned value.
  • This time, since the function ref() is labeled to be a reference-returning function, the value dumped into this separate memory space is the reference value $var. (This is analogous to assigning a reference value to an entity using assign-by-reference, i.e. something like $return_value =& $var.)
  • Hence, a reference to $x is returned in the end by ref($x).
  • Following this, the statement $y =& ref($x) effectively becomes $y =& $x. Thus, $y becomes an alias of $x.

Let's confirm this final proposition by modifying $y and then echoing the value of $x:

<?php

function &ref(&$var) {
   return $var;
}

$x = 10;
$y =& ref($x);

// Change $y and see if $x changes as well.
$y = 500;
echo "\$x: $x\n\$y: $y";

$x: 500
$y: 500

Remarkable.
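The $return_value analogy used in the two walkthroughs can be written out with ordinary variables. The sketch below is only a model of the reasoning: $var stands in for the reference parameter, and $return_value and $return_value_2 are made-up names for the imagined return slot in each case.

```php
<?php

$x = 10;
$var =& $x;              // $var is an alias of $x (the reference parameter)

// Case 1: the function is NOT labeled with '&'.
$return_value = $var;    // normal assignment: the value 10 is copied
$y =& $return_value;     // $y aliases the copy, not $x

$y = 500;
echo "$x\n"; // prints "10"

// Case 2: the function IS labeled with '&'.
$return_value_2 =& $var; // reference assignment: joins the aliases of $x
$z =& $return_value_2;   // $z is now an alias of $x as well

$z = 500;
echo "$x\n"; // prints "500"
```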

Note that it's really important to realize that the return-value analogy employed in the steps above is merely a way to better understand the difference caused by labeling a function as a reference-returning function vs. not doing so — whether the PHP engine really proceeds this way or not is by no means asserted here.

The PHP engine is obviously free to implement references, and to distinguish functions labeled as reference-returning from those that are not, in whatever way it sees fit.

How references work internally

It's really helpful to learn a little bit about how references are implemented internally in PHP. This might help clarify some of the magical power of references.

Still, because a discussion of an engine's internals can quickly reach extreme levels of sophistication, we'll abstract away many low-level details.

To start with, note that all values in PHP are represented by a large structure describing that value and also its type.

For instance, when a variable $x holds an integer 10, here's how the value 10 gets represented in PHP:

How PHP represents values internally (simplified).

The structure obviously contains the actual value, i.e. 10, in it and also the type information for the value, i.e. it's an integer.

If you've ever worked with C, the language in which the PHP engine is written, you'll know that C can't determine the type of an arbitrary value at runtime. That's simply because this information is already known in the source code.

Hence, the PHP engine has to track the type of every PHP value itself and store that information in the structure representing the value. That's why each value's structure also holds its corresponding type information.

As we all know, there are many different kinds of values in PHP. We have integers, floats, Booleans, strings, and so on and so forth.

Another kind, a rather special one, accessible only from within the engine, is reference values.

The structure representing a reference value has its type suggesting that it's a reference and the actual underlying value pointing to the value denoted by the reference.

For instance, if we do $y =& $x, where $x = 10, here's what $y would hold:

How PHP stores references internally (simplified).

Since $y =& $x also makes $x a reference value, $x will have the exact same configuration as that of $y shown above.

To restate the point, from the perspective of ordinary PHP code there's no straightforward way to determine whether a given value is a reference value or not; this is kept local to the engine.

Likewise, from the standpoint of PHP, there isn't really much difference between working with an entity that stores a value v directly and working with an entity that stores the same value v via a reference.

One might think that an entity holding v directly is not the same as an entity holding a reference that points to v. However, that's not the case. In fact, had this been the case, dealing with references in PHP would've become much more complex and low-level (somewhat similar to pointers in C, if you've heard of them).
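A small sketch of what this means in practice: the usual ways of inspecting a value give the same answers whether the entity holds the value directly or through a reference. The variable names are illustrative only.

```php
<?php

$direct = 10;        // holds 10 directly

$x = 10;
$viaRef =& $x;       // holds 10 through a reference

var_dump(gettype($direct) === gettype($viaRef)); // bool(true): both report "integer"
var_dump($direct === $viaRef);                   // bool(true)
var_dump($viaRef + 5);                           // int(15)
```

There is no separate "reference" type reported by gettype(); the reference is followed transparently.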

How 'normal' and 'by-reference' assignments work

It's also paramount to know that when assigning a value to a variable, where the value is a reference value, the variable doesn't store that reference value, but rather the value pointed to by the reference.

An example is shown as follows:

<?php

$a = 6;
$b =& $a;

// $a holds a reference value at this point.
// So, what would happen when we normally assign it to $c?  
$c = $a;

After the execution of $b =& $a, both $a and $b hold references that point to the same value in memory.

Next up, the statement $c = $a gets executed. The suspense lies in what happens here. That is, does $c become a reference value itself (and likewise an alias for $a and $b) or not?

Well, just by reading the statement $c = $a, we can naturally reason that we don't mean to make $c an alias for $a (and $b); instead, $c is meant to be an independent variable that just holds the value pointed to by $a, i.e. 6.

Based on this natural, intuitive reasoning, the way PHP executes $c = $a is as follows: the reference value stored in $a is not copied into $c; rather, the value pointed to by that reference, i.e. 6, is copied and stored in $c.

Consequently, modifying $c won't have any effect on $a (or $b).

The following snippet demonstrates this:

<?php

$a = 6;
$b =& $a;

$c = $a;

$c = 50;
echo "\$a: $a\n\$b: $b\n\$c: $c";

$a: 6
$b: 6
$c: 50

To elaborate it, whenever we 'normally' assign an entity entity_2 to an entity entity_1, i.e. entity_1 = entity_2, it's first checked whether entity_2 is a reference.

  • If entity_2 is a reference, then the value pointed to by that reference is copied into entity_1.
  • Else, the value stored in entity_2 (which is not a reference) is copied into entity_1.

In case we assign a value to an entity 'by-reference', i.e. entity_1 =& entity_2, the flow is a little bit different. As before, it's first checked whether entity_2 is a reference or not.

  • If entity_2 is a reference, then it's merely copied into entity_1.
  • Else, a new reference value is created, pointing to the corresponding value currently stored in entity_2, and then both the entities are assigned this new reference value.

This is generalized as follows:

How entity_1 = entity_2 is resolved by PHP
How entity_1 =& entity_2 is resolved by PHP
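The two resolution procedures can also be sketched as a toy model in PHP itself. This is emphatically not how the engine is implemented; it is a simplified simulation under stated assumptions. The Ref class and the read_slot(), write_slot(), assign_normal() and assign_by_ref() functions are all hypothetical names invented for this sketch, and entities are modeled as keys of a $slots array. The write-through rule in write_slot() (storing into a slot that already holds a reference updates the shared value) isn't spelled out in the steps above, but it matches how $b = 1000 behaved in the first assign-by-reference example.

```php
<?php

// Toy stand-in for the engine's reference value: a shared box around one value.
final class Ref {
   public $value;
   public function __construct($value) { $this->value = $value; }
}

// Read an entity: follow the reference if the slot holds one.
function read_slot(array $slots, string $name) {
   $slot = $slots[$name];
   return $slot instanceof Ref ? $slot->value : $slot;
}

// Store a plain value: write through the reference if the slot holds one.
function write_slot(array &$slots, string $name, $value): void {
   if (($slots[$name] ?? null) instanceof Ref) {
      $slots[$name]->value = $value;
   } else {
      $slots[$name] = $value;
   }
}

// Models entity_1 = entity_2: the value is copied, never the reference.
function assign_normal(array &$slots, string $to, string $from): void {
   write_slot($slots, $to, read_slot($slots, $from));
}

// Models entity_1 =& entity_2: both slots end up sharing one Ref.
function assign_by_ref(array &$slots, string $to, string $from): void {
   if (!($slots[$from] instanceof Ref)) {
      $slots[$from] = new Ref($slots[$from]);
   }
   $slots[$to] = $slots[$from];
}

$slots = ['a' => 6];
assign_by_ref($slots, 'b', 'a');   // $b =& $a;
assign_normal($slots, 'c', 'a');   // $c = $a;
write_slot($slots, 'c', 50);       // $c = 50;

echo read_slot($slots, 'a'), ' ', read_slot($slots, 'b'), ' ', read_slot($slots, 'c'), "\n";
// prints "6 6 50"

write_slot($slots, 'b', 7);        // $b = 7;
echo read_slot($slots, 'a'), "\n"; // prints "7"
```

The model reproduces the output of the $a, $b, $c example above: $c receives a copy of 6 and stays independent, while $a and $b keep sharing a single value.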