PHP References: Assign, Pass, Return

Why talk about references in PHP? Because most PHP developers would not be able to correctly predict the output of this program. Will it terminate or run forever?

$orig_array = [4,5,6];
foreach ($orig_array as $elem) {
    $orig_array[] = $elem + 1;
    echo $elem;
}

Or reason about the output of this piece of code:

class SimpleClass {
    public $var1;
}
$sc = new SimpleClass();
$sc->var1 = 23;
var_dump($sc);
// php shell code:1:
// class SimpleClass#1 (1) {
//   public $var1 =>
//   int(23)
// }

$sd = $sc;
$sd->var1 = 34; // reference-like behaviour
var_dump($sc);
// php shell code:1:
// class SimpleClass#1 (1) {
//   public $var1 =>
//   int(34)
// }

$sd = 1001; // non-reference-like behaviour
var_dump($sc);
// php shell code:1:
// class SimpleClass#1 (1) {
//   public $var1 =>
//   int(34)
// }

$se = &$sc;
$se = 1002; // reference-like behaviour?
var_dump($sc);
// php shell code:1:
// int(1002)

Confused much? Cool, let’s dive into it then.

Let’s start where the php.net manual starts for references (more here: http://php.net/manual/en/language.references.php). There are three ways to use references:

  1. Assign by reference
  2. Pass by reference
  3. Return by reference

Let’s take a closer look at assignment by reference first. Passing and returning by reference are comparatively much simpler.

Assignment by Reference

We will divide the discussion into three types of variables: typical variables, arrays, and objects.

Typical variables

$a = 'xyz';
$b = &$a;
$b = 'zxy';
var_dump($a); // string(3) "zxy"

For variables containing ints, strings, etc., a reference to the variable and the variable itself are completely interchangeable: both names point to the same value.

zval

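Internally, PHP stores every variable in a struct called zval. In PHP 5 it has the following four properties (PHP 7 reorganized the struct, but this model is still a useful way to reason about references):

  1. type: IS_STRING, IS_LONG, IS_BOOL etc.
  2. value: the value held by the variable
  3. is_ref__gc: whether the zval is a reference
  4. refcount__gc: the number of variables pointing to this zval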

In the above example, $a and $b are aliases and both of these point to the same zval structure.

{
    type: IS_STRING,
    value: 'zxy',
    is_ref__gc: 1,
    refcount__gc: 2
}

Changing the value through one name changes it for the other as well, because both names read from the same zval structure, which has been modified in place.
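To make the aliasing concrete, here is a minimal sketch. The helper variables ($viaA, $viaB, $bExists, $stillThere) are made up for illustration. It shows that the binding is symmetric, and that unset() on one name only removes that name, not the value.

```php
<?php
// Two names bound to one value: a write through either name is visible via the other.
$a = 'xyz';
$b = &$a;

$b = 'zxy';            // write through $b
$viaA = $a;            // 'zxy'

$a = 'abc';            // write through $a
$viaB = $b;            // 'abc'

// unset() removes only the name $b; the value stays reachable through $a.
unset($b);
$bExists = isset($b);  // false
$stillThere = $a;      // 'abc'
```

Neither name is "the original" and the other "the reference"; once bound, they are equal partners.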

Copy-on-write (COW)

PHP (PHP 7 included) uses copy-on-write: when a new variable is initialized from an existing variable ($b = $a;), nothing is duplicated at that point and, internally, both variables share the same underlying value. However, as soon as we change the value of one of the variables ($b = 2;), that variable is separated from the shared structure and gets its own value, while the other variable still points to the original.

If the initialization is by reference ($b = &$a;), then a modification changes the existing zval structure in place, and hence the value seen through both variables is updated.
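The two cases can be put side by side. This is only an illustrative sketch with made-up variable names; it checks the observable behaviour, not the internal structures.

```php
<?php
// Plain assignment: $copy shares the string at first and separates on write.
$original = 'xyz';
$copy = $original;
$copy .= '!';          // only $copy changes
// $original is still 'xyz', $copy is 'xyz!'

// Reference assignment: $alias and $source are one and the same variable.
$source = 'xyz';
$alias = &$source;
$alias .= '!';         // changes the shared value in place
// $source is now 'xyz!'
```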

Arrays

Internally, PHP stores arrays in a data structure called a HashTable. Its primary data store is a C array which holds the PHP array’s values, indexed by the hash of each PHP array key.

Simple assignment

$a = [1,2,3];
$b = $a;
$b[] = 4;
var_dump($a);
// php shell code:1:
// array(3) {
//   [0] =>
//   int(1)
//   [1] =>
//   int(2)
//   [2] =>
//   int(3)
// }

This employs COW as described above. Both variables point to the same HashTable until $b is modified, at which point a copy is made and the modification is applied to that copy. $a still holds the original HashTable.
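One way to observe that the copy is deferred is to watch memory usage around the assignment and around the first write. The sketch below assumes a hypothetical helper, measureCopyOnWrite(), and an arbitrary array size; the exact byte counts depend on the PHP version and build, so only their relative size is meaningful.

```php
<?php
// Hypothetical helper: returns the memory cost of "$b = $a" and of the first write to $b.
function measureCopyOnWrite(int $size): array
{
    $a = range(1, $size);

    $before = memory_get_usage();
    $b = $a;                        // no duplication yet: both names share one HashTable
    $afterAssign = memory_get_usage();

    $b[] = 0;                       // first write to $b forces the actual copy
    $afterWrite = memory_get_usage();

    return [$afterAssign - $before, $afterWrite - $afterAssign, count($a), count($b)];
}

[$assignCost, $writeCost, $countA, $countB] = measureCopyOnWrite(200000);
echo "assignment: {$assignCost} bytes, first write: {$writeCost} bytes\n";
```

The assignment should cost next to nothing, while the first write pays for duplicating the whole array.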

$a = [1,2,3];
$b = &$a;
$b[] = 4;
var_dump($a);
// php shell code:1:
// array(4) {
//   [0] =>
//   int(1)
//   [1] =>
//   int(2)
//   [2] =>
//   int(3)
//   [3] =>
//   int(4)
// }

In the case of assignment by reference, the modification is applied to the original HashTable, and hence both $a and $b have the value 4 as their last element.

foreach

$a = [1,2,3];
foreach ($a as $item) {
    $a[2] = 0;
    echo $item."\n";
}
var_dump($a);

What do you think the output is?

The original array has been modified to [1,2,0], while foreach prints 1,2,3.

This happens because, as soon as the array $a is modified inside the foreach loop, a copy of the array is created for the modification, while foreach continues to run on the original (basically, copy-on-write).
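The same reasoning settles the puzzle from the top of the article. The sketch below, assuming PHP 7 or later, collects the echoed values in a made-up $printed variable so the result can be inspected after the loop.

```php
<?php
// By-value foreach walks the array as it was when the loop started.
$orig_array = [4, 5, 6];
$printed = '';
foreach ($orig_array as $elem) {
    $orig_array[] = $elem + 1;   // grows $orig_array, not the array being iterated
    $printed .= $elem;
}
echo $printed, "\n";             // 456
```

The loop terminates after three iterations, and $orig_array ends up as [4, 5, 6, 5, 6, 7].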

What about this next piece of code? Can you predict the output?

$a = [1,2,3,4,5];
foreach ($a as $item1) {
    foreach ($a as $item2) {
        if (($item1 == 1) && ($item2 == 1)) {
            unset($a[1]);
        }
        echo $item1.",".$item2." ; ";
    }
    echo "\n";
}

Output:

1,1 ; 1,2 ; 1,3 ; 1,4 ; 1,5 ;
2,1 ; 2,3 ; 2,4 ; 2,5 ;
3,1 ; 3,3 ; 3,4 ; 3,5 ;
4,1 ; 4,3 ; 4,4 ; 4,5 ;
5,1 ; 5,3 ; 5,4 ; 5,5 ;

Let’s look at this one closely. When $a is initialized, a new HashTable object is created internally.

Internally, PHP 7 runs foreach with the help of HashPointers, which are basically iterator-like objects registered with the array. Whenever the array is modified, the HashPointers are notified of the change.

During the first iteration of the outer loop and the first iteration of the inner loop, the array is modified by unset($a[1]). Because of copy-on-write, $a now refers to a modified copy, while both loops that are already running keep iterating over the original array. That is why the first row still contains 1,2.

From the second iteration of the outer loop onwards, every new run of the inner foreach loop starts on the modified array, so the value 2 no longer shows up as $item2.

Since the outer foreach loop keeps working with the original array, it still runs for the 2nd element, which is why there is a row starting with 2.

And let’s do one last example to drive the point home.

$a = [1,2,3,4,5];
foreach ($a as &$item1) {
    foreach ($a as &$item2) {
        if (($item1 == 1) && ($item2 == 1)) {
            unset($a[1]);
        }
        echo $item1.",".$item2." ; ";
    }
    echo "\n";
}

Output:

1,1 ; 1,3 ; 1,4 ; 1,5 ;
3,1 ; 3,3 ; 3,4 ; 3,5 ;
4,1 ; 4,3 ; 4,4 ; 4,5 ;
5,1 ; 5,3 ; 5,4 ; 5,5 ;

Here the unset call changes the original variable $a as well as the array being iterated by both foreach loops (all of them point to the same zval internally). The element 2 therefore disappears for both loops as soon as it is unset.
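Because a by-reference foreach works on the live array, elements appended during the loop are visited too. The following sketch assumes PHP 7 or later; the $steps counter is a guard added for illustration so that the example stops.

```php
<?php
$numbers = [4, 5, 6];
$steps = 0;
foreach ($numbers as &$elem) {
    $numbers[] = $elem + 1;      // appended to the very array being iterated
    if (++$steps >= 10) {        // safety guard: otherwise the loop keeps finding new elements
        break;
    }
}
unset($elem);                    // drop the reference left over from the loop

echo $steps, " iterations, ", count($numbers), " elements\n";
```

The loop runs well past the three original elements, which is the by-reference counterpart of the opening example.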

Objects

The other major class of variables when it comes to references is objects. Let’s run through the example given at the beginning of this article. The apparent problem with this code is that in one place the object variable behaves like a reference ($sd->var1 = 34; changes the value of var1 in $sc), while in another it does not ($sd = 1001; does not change the value of $sc).

class SimpleClass {
    public $var1;
}
$sc = new SimpleClass();
$sc->var1 = 23;
var_dump($sc);
// php shell code:1:
// class SimpleClass#1 (1) {
//   public $var1 =>
//   int(23)
// }

$sd = $sc;
$sd->var1 = 34; // reference-like behaviour
var_dump($sc);
// php shell code:1:
// class SimpleClass#1 (1) {
//   public $var1 =>
//   int(34)
// }

$sd = 1001; // non-reference-like behaviour
var_dump($sc);
// php shell code:1:
// class SimpleClass#1 (1) {
//   public $var1 =>
//   int(34)
// }

$se = &$sc;
$se = 1002; // reference-like behaviour?
var_dump($sc);
// php shell code:1:
// int(1002)

This is explained simply when we consider how PHP stores object variables. The zval structure for an object stores a unique identifier for the object in the value parameter.

A construct like $sc = new SimpleClass(); creates a new object of SimpleClass and assigns the id of this object to the $sc variable. It is this unique identifier that the $sc variable holds, and not the object itself. Whenever this variable is accessed with a class-like construct (for example, $sc->var1), PHP internally fetches the object identified by the id in $sc and gets the object’s property or method.

Now when we do $sd = $sc;, $sd also comes to have the same id value (because it is pointing to the same zval struct), and using any class-like construct on $sd (like $sd->var1) ends up making changes to the original object, and therefore to what $sc sees.

// when we do $sd = $sc, both point to this zval structure
{
    type: IS_OBJECT,
    value: obj_id_1,
    is_ref__gc: 0,
    refcount__gc: 2
}

Assigning a different value to $sd ($sd = 1001;) simply employs copy-on-write on the zval: $sd starts pointing to a brand new zval, leaving $sc untouched.

// $sd = 1001 creates a new zval struct
// $sc points to the original zval
{
    type: IS_LONG,
    value: 1001,
    is_ref__gc: 0,
    refcount__gc: 1
}

However, when we modify a variable which references $sc ($se = &$sc; $se = 1002;), the $sc variable itself comes to contain a value other than the object id, and no longer points to the original object.
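The "variable holds an id" model can be checked from userland. This sketch assumes PHP 7.2 or later for spl_object_id(); the result variables ($sameObject, $seenViaSc, and so on) are made up for illustration.

```php
<?php
class SimpleClass
{
    public $var1;
}

$sc = new SimpleClass();
$sc->var1 = 23;

$sd = $sc;                                     // copies the id, not the object
$sameObject = spl_object_id($sc) === spl_object_id($sd);   // true

$sd->var1 = 34;                                // goes through the id to the shared object
$seenViaSc = $sc->var1;                        // 34

$sd = 1001;                                    // rebinds $sd only
$scStillObject = $sc instanceof SimpleClass;   // true

$se = &$sc;                                    // $se and $sc are now the same variable
$se = 1002;                                    // overwrites the id held by $sc
$scNow = $sc;                                  // 1002
```

Property access follows the id to the object, while plain assignment only changes what a single variable holds, unless that variable is a reference.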

Passing by Reference

function foo(&$var) {
    $var = 213;
}
$var1 = 312;
foo($var1);
var_dump($var1); // int(213)

Passing by reference makes the parameter inside the function an alias of the variable passed in the function call. Modifying the alias changes the original variable passed to the function.
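The difference is easiest to see with two otherwise identical functions. Both function names below are hypothetical, chosen for this sketch.

```php
<?php
function appendByValue(array $items)
{
    $items[] = 'new';            // modifies the function's own copy
}

function appendByReference(array &$items)
{
    $items[] = 'new';            // modifies the caller's array
}

$list = ['a', 'b'];

appendByValue($list);
$afterByValue = $list;           // ['a', 'b']

appendByReference($list);
$afterByReference = $list;       // ['a', 'b', 'new']
```

Note that the ampersand appears only in the function signature; the call site looks the same in both cases.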

Return by reference

class BarWrap {
    private $barTest = 123;
    function &bar() {
        return $this->barTest;
    }
    function printBarTest() {
        echo $this->barTest;
    }
}
$bw = new BarWrap();
$bt = &$bw->bar();
$bt = 234;
$bw->printBarTest(); // 234

Returning by reference lets you hand the caller a variable it can modify, as opposed to a copy of the variable.

Note that when returning by reference, the ampersand has to be used in the function definition as well as in the assignment at the call site ($bt = &$bw->bar();).
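What happens when the ampersand is left out at the call site can be sketched as follows. The Counter class and its method names are made up for this example.

```php
<?php
class Counter
{
    private $count = 0;

    public function &countRef()
    {
        return $this->count;
    }

    public function current()
    {
        return $this->count;
    }
}

$counter = new Counter();

$copy = $counter->countRef();       // no & at the call site: $copy is an independent value
$copy = 5;
$afterCopy = $counter->current();   // 0

$alias = &$counter->countRef();     // & on both sides: $alias is bound to the property
$alias = 7;
$afterAlias = $counter->current();  // 7
```

Forgetting the ampersand in the assignment does not raise an error; it silently falls back to a copy, which is why the note above matters.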
