PHP References: Assign, Pass, Return
Why talk about references in PHP? Because most PHP developers could not correctly predict the output of this program. Will it terminate or run forever?
$orig_array = [4, 5, 6];
foreach ($orig_array as $elem) {
    $orig_array[] = $elem + 1;
    echo $elem;
}
Or try to reason about the output of this piece of code:
class SimpleClass {
    public $var1;
}

$sc = new SimpleClass();
$sc->var1 = 23;
var_dump($sc);

Output:
class SimpleClass#1 (1) {
  public $var1 =>
  int(23)
}

$sd = $sc;
$sd->var1 = 34; // reference-like behaviour
var_dump($sc);

Output:
class SimpleClass#1 (1) {
  public $var1 =>
  int(34)
}

$sd = 1001; // non-reference-like behaviour
var_dump($sc);

Output:
class SimpleClass#1 (1) {
  public $var1 =>
  int(34)
}

$se = &$sc;
$se = 1002; // reference-like behaviour?
var_dump($sc);

Output:
int(1002)
Confused much? Cool, let’s dive into it then.
Let’s start where the php.net manual starts on references (more here: http://php.net/manual/en/language.references.php). References can be used in three ways:
- Assign by reference
- Pass by reference
- Return by reference
Let’s take a closer look at assignment by reference. Passing and returning by reference are comparatively much simpler.
Assignment by Reference
We will divide the discussion into three types of variables: typical variables, arrays, and objects.
Typical variables
$a = 'xyz';
$b = &$a;
$b = 'zxy';
var_dump($a);

Output:
string(3) "zxy"
For variables containing ints, strings etc, references to the variable and the variable itself are completely identical to each other, and all of them point to the same value.
zval
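Internally, PHP stores every variable in a struct called zval. In PHP 5 it has the following four properties (PHP 7 reorganized the struct, but this model is still a useful way to reason about references):
- type: IS_STRING, IS_LONG, IS_BOOL etc.
- value
- is_ref__gc: whether this variable is a reference or not
- ref_count__gc (spelled refcount__gc in the PHP source): the number of variables pointing to this zval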
In the above example, $a and $b are aliases, and both of them point to the same zval structure.
{
type: IS_STRING,
value: 'zxy',
is_ref__gc: 1,
ref_count__gc: 2
}
Changing the value of one changes the value for the other, because the shared zval structure has changed.
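To make the aliasing concrete, here is a minimal sketch that contrasts plain assignment with assignment by reference. The variable names ($copy, $alias, $valueViaA, $stillSet) are only for illustration and are not part of the example above.

```php
<?php
// Plain assignment: $copy ends up with its own value.
$a = 'xyz';
$copy = $a;
$copy = 'changed';          // $a is still 'xyz' at this point

// Assignment by reference: $alias and $a are two names for one value.
$alias = &$a;
$alias = 'zxy';
$valueViaA = $a;            // reading through $a sees the write made through $alias

// unset() removes only the name $alias, not the value it shared with $a.
unset($alias);
$stillSet = isset($a);

var_dump($copy, $valueViaA, $stillSet);
```

Note that unset($alias) only breaks the binding between the name and the value; the value lives on for as long as $a still refers to it.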
Copy-on-write (COW)
PHP uses copy-on-write for plain assignment. When a new variable is initialized with an existing variable ($b = $a;), both variables internally point to the same value. (In PHP 7 this sharing applies to reference-counted values such as strings and arrays; simple scalars such as integers are small enough to be copied directly.) As soon as we change the value of one of the variables ($b = 2;), that variable is separated from the shared value and gets its own, while the other variable still points to the original.
If the initialization is by reference ($b = &$a;), then upon modification the existing zval structure is changed, and hence the value seen through both variables is updated.
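One rough way to observe copy-on-write is to watch memory usage around an assignment and around the first write. The sketch below assumes PHP 7 or later; the variable names are made up for illustration, and the exact byte counts depend on the PHP version and build, so none are shown.

```php
<?php
$a = range(1, 100000);

$before = memory_get_usage();
$b = $a;                         // no copy yet: both names share one array
$afterAssign = memory_get_usage();

$b[] = 0;                        // first write to $b forces the real copy
$afterWrite = memory_get_usage();

$costOfAssign = $afterAssign - $before;
$costOfWrite  = $afterWrite - $afterAssign;

printf("assign: %d bytes, first write: %d bytes\n", $costOfAssign, $costOfWrite);
var_dump(count($a), count($b));  // $a keeps its original length
```

The assignment itself should cost next to nothing, while the first write pays for duplicating the whole array.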
Arrays
Internally, PHP stores arrays in a data structure called a HashTable. Its primary data store is a C array which stores the PHP array’s values, indexed by the hash of the PHP array’s key.
Simple assignment
$a = [1, 2, 3];
$b = $a;
$b[] = 4;
var_dump($a);

Output:
array(3) {
  [0] =>
  int(1)
  [1] =>
  int(2)
  [2] =>
  int(3)
}
This employs COW as described above. Both variables point to the same HashTable until $b is modified, which leads to a copy being made and modified. $a still holds the original HashTable.
$a = [1, 2, 3];
$b = &$a;
$b[] = 4;
var_dump($a);

Output:
array(4) {
  [0] =>
  int(1)
  [1] =>
  int(2)
  [2] =>
  int(3)
  [3] =>
  int(4)
}
In the case of assignment by reference, the original HashTable is modified in place, and hence both $a and $b have 4 as their last element.
foreach
$a = [1, 2, 3];
foreach ($a as $item) {
    $a[2] = 0;
    echo $item."\n";
}
var_dump($a);
What do you think the output is?
The original array has been modified to [1,2,0], while foreach prints 1,2,3.
This happens because as soon as the array $a is modified inside the foreach loop, a copy of the array is created for the modification, while foreach continues to run on the original (basically, copy-on-write).
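The same reasoning settles the question from the very beginning of the article. The sketch below is that opening loop, assuming PHP 7 or later, with the output collected in a $printed string (a name added here for illustration) so the result is easy to inspect.

```php
<?php
$orig_array = [4, 5, 6];
$printed = '';

foreach ($orig_array as $elem) {
    // Appending grows $orig_array, but the loop keeps walking the
    // three elements the array had when the loop started.
    $orig_array[] = $elem + 1;
    $printed .= $elem;
}

echo $printed, "\n";     // 456
print_r($orig_array);    // six elements: 4, 5, 6, 5, 6, 7
```

So the loop terminates after three iterations, even though the array doubles in size while it runs.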
What about this next piece of code? Can you predict the output?
$a = [1, 2, 3, 4, 5];
foreach ($a as $item1) {
    foreach ($a as $item2) {
        if (($item1 == 1) && ($item2 == 1)) {
            unset($a[1]);
        }
        echo $item1.",".$item2." ; ";
    }
    echo "\n";
}
Output:
1,1 ; 1,2 ; 1,3 ; 1,4 ; 1,5 ;
2,1 ; 2,3 ; 2,4 ; 2,5 ;
3,1 ; 3,3 ; 3,4 ; 3,5 ;
4,1 ; 4,3 ; 4,4 ; 4,5 ;
5,1 ; 5,3 ; 5,4 ; 5,5 ;
Let’s look at this one closely. When $a is initialized, a new HashTable object is created internally.
In PHP 7, a by-value foreach iterates over the array as it was when the loop started. The loop holds its own reference to the HashTable, so a modification of $a inside the loop triggers copy-on-write: $a gets a modified copy, and the running loop keeps the original.
During the first iteration of the outer loop and the first iteration of the inner loop, unset($a[1]) modifies $a. Both loops are already running, so both keep iterating over the original five elements, which is why the first row of the output is complete.
From the second iteration of the outer loop onwards, each inner foreach starts afresh on $a, which is now the modified array without the value 2.
Since the outer foreach loop keeps working with the original array, it still runs for the 2nd element.
And let’s do one last example to drive the point home.
$a = [1, 2, 3, 4, 5];
foreach ($a as &$item1) {
    foreach ($a as &$item2) {
        if (($item1 == 1) && ($item2 == 1)) {
            unset($a[1]);
        }
        echo $item1.",".$item2." ; ";
    }
    echo "\n";
}
Output:
1,1 ; 1,3 ; 1,4 ; 1,5 ;
3,1 ; 3,3 ; 3,4 ; 3,5 ;
4,1 ; 4,3 ; 4,4 ; 4,5 ;
5,1 ; 5,3 ; 5,4 ; 5,5 ;
Here the unset call changes the original variable $a as well as the array seen by both foreach loops (all of them point to the same zval internally).
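The difference between the two kinds of loop is easier to see side by side on a small array. This is a minimal sketch assuming PHP 7 or later; the names $byValue, $byRef, $seenByValue and $seenByRef are made up for illustration.

```php
<?php
// By value: the loop walks the array as it was when the loop started.
$byValue = [1, 2, 3];
$seenByValue = [];
foreach ($byValue as $item) {
    $byValue[2] = 0;
    $seenByValue[] = $item;
}

// By reference: the loop walks the live array, so it sees the change.
$byRef = [1, 2, 3];
$seenByRef = [];
foreach ($byRef as &$item) {
    $byRef[2] = 0;
    $seenByRef[] = $item;
}
unset($item); // drop the leftover reference to the last element

echo implode(',', $seenByValue), "\n"; // 1,2,3
echo implode(',', $seenByRef), "\n";   // 1,2,0
```

The unset($item) after the by-reference loop is a common precaution: without it, $item stays bound to the last array element, and a later assignment to $item would silently change the array.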
Objects
The other major class of variables when it comes to references is objects. Let’s run through the example given at the beginning of this article. The apparent problem with this code is that in one place the object variable behaves like a reference ($sd->var1 = 34; changes the value of var1 in $sc), while in another it does not ($sd = 1001; does not change the value of $sc).
class SimpleClass {
    public $var1;
}

$sc = new SimpleClass();
$sc->var1 = 23;
var_dump($sc);

Output:
class SimpleClass#1 (1) {
  public $var1 =>
  int(23)
}

$sd = $sc;
$sd->var1 = 34; // reference-like behaviour
var_dump($sc);

Output:
class SimpleClass#1 (1) {
  public $var1 =>
  int(34)
}

$sd = 1001; // non-reference-like behaviour
var_dump($sc);

Output:
class SimpleClass#1 (1) {
  public $var1 =>
  int(34)
}

$se = &$sc;
$se = 1002; // reference-like behaviour?
var_dump($sc);

Output:
int(1002)
This is easily explained once we consider how PHP stores object variables. The zval structure for an object stores a unique identifier for the object in its value field.
A construct like $sc = new SimpleClass(); creates a new object of SimpleClass and assigns the id of this object to the $sc variable. It is this unique identifier that the $sc variable holds, and not the object itself. Whenever this variable is accessed with a class-like construct (for example, $sc->var1), PHP internally fetches the object identified by the id in $sc and gets the object’s property or method.
Now when we do $sd = $sc;, $sd also comes to have the same id value (because it is pointing to the same zval struct), and using any class-like construct on $sd (like $sd->var1) ends up making changes to the original object, and therefore to what $sc sees.
// when we do $sd = $sc, both point to this zval structure
{
type: IS_OBJECT,
value: obj_id_1,
is_ref__gc: 0,
ref_count__gc: 2
}
Assigning a different value to $sd ($sd = 1001;) simply employs copy-on-write on the zval: $sd starts pointing to a brand new zval, leaving $sc untouched.
// $sd = 1001 creates a new zval struct
// $sc points to the original zval
{
type: IS_LONG,
value: 1001,
is_ref__gc: 0,
ref_count__gc: 1
}
However, when we assign through a variable that references $sc ($se = &$sc; $se = 1002;), the $sc variable itself comes to contain a value other than the object id, and no longer points to the original object.
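The whole walkthrough can be checked in one short script. This is a sketch assuming PHP 7.2 or later, since it uses spl_object_id() to compare object ids. The clone step and the helper variables ($sameObject, $seenThroughSc, $sf, $afterClone, $scIsObject) are additions for illustration and are not part of the original example.

```php
<?php
class SimpleClass
{
    public $var1;
}

$sc = new SimpleClass();
$sc->var1 = 23;

$sd = $sc;                    // copies the object id, not the object
$sameObject = spl_object_id($sc) === spl_object_id($sd);

$sd->var1 = 34;               // mutates the one shared object
$seenThroughSc = $sc->var1;

$sf = clone $sc;              // a real copy: new object, new id
$sf->var1 = 45;
$afterClone = $sc->var1;      // the original object is unaffected

$sd = 1001;                   // rebinds $sd only; $sc still holds the object
$scIsObject = $sc instanceof SimpleClass;

$se = &$sc;                   // $se is an alias of the variable $sc itself
$se = 1002;                   // so this overwrites $sc

var_dump($sameObject, $seenThroughSc, $afterClone, $scIsObject, $sc);
```

When an independent copy of the object is what you want, clone is the tool to reach for; plain assignment only ever copies the id.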
Passing by Reference
function foo(&$var) {
    $var = 213;
}

$var1 = 312;
foo($var1);
var_dump($var1);

Output:
int(213)
Passing by reference makes the function’s parameter an alias of the variable passed in the function call. Modifying the alias changes the original variable passed to the function.
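For contrast, here is a small sketch that puts a by-value parameter next to a by-reference one. The function names addTagByValue and addTagByReference are hypothetical and exist only for this illustration.

```php
<?php
// By value: the function works on its own copy of the array.
function addTagByValue(array $tags, string $tag): void
{
    $tags[] = $tag;
}

// By reference: $tags is an alias of the caller's variable.
function addTagByReference(array &$tags, string $tag): void
{
    $tags[] = $tag;
}

$tags = ['php'];

addTagByValue($tags, 'zval');
$afterByValue = $tags;        // unchanged

addTagByReference($tags, 'zval');
$afterByReference = $tags;    // 'zval' was appended

var_dump($afterByValue, $afterByReference);
```

The ampersand appears only in the function definition; the call site looks the same in both cases.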
Return by reference
class BarWrap {
    private $barTest = 123;

    function &bar() {
        return $this->barTest;
    }

    function printBarTest() {
        echo $this->barTest;
    }
}

$bw = new BarWrap();
$bt = &$bw->bar();
$bt = 234;
$bw->printBarTest();

Output:
234
Returning by reference lets you return a variable that the caller can modify, as opposed to returning a copy of it.
Note that when returning by reference, the ampersand has to be used in the function definition as well as in the assignment that receives the result ($bt = &$bw->bar();).
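The need for the ampersand on both sides is easy to check. In this sketch the Counter class and its method names are hypothetical, chosen only to show what happens when the ampersand is left out at the call site.

```php
<?php
class Counter
{
    private $count = 0;

    // The & in the definition allows the property itself to be returned.
    public function &countRef()
    {
        return $this->count;
    }

    public function current()
    {
        return $this->count;
    }
}

$counter = new Counter();

$copy = $counter->countRef();   // no & in the assignment: $copy is a plain copy
$copy = 10;
$afterCopy = $counter->current();

$ref = &$counter->countRef();   // & on both sides: $ref aliases the property
$ref = 20;
$afterRef = $counter->current();

var_dump($afterCopy, $afterRef);
```

Without the ampersand in the assignment, PHP quietly hands back a copy, so forgetting it produces no error and no change to the property.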
I hope this was as entertaining for you as it was for me. Please feel free to drop comments below.