One of the nice new features of Perl5 is the ability to create references: a scalar that points to another Perl data object (e.g., a list or an associative array). Along with references comes the ability to create compound data types (lists of lists or arrays of lists, for example), which were difficult to create in Perl4. These new compound data objects have the typical properties of other Perl data structures - most importantly they automatically allocate storage for themselves, unlike C.
You create a reference by putting a backslash in front of the variable's name:

    $scalar_ref = \$a_scalar;

When you want to get to the value of the scalar, you just substitute the reference for the name of the variable:

    $$scalar_ref = "some value";
    print "$$scalar_ref\n";

Note the double dollar signs. Perl uses the leftmost dollar sign to recognize what type of object we are talking about - in this case a scalar variable. With this information, Perl can appropriately dereference anything that might follow.
You can also create references to lists and associative arrays:

    $list_ref = \@some_list;
    $hash_ref = \%the_hash;

Again, the symbols surrounding the reference determine exactly how Perl will dereference and use the object. Here are a couple of examples using the list reference defined above:

    @$list_ref = localtime();
    $hour = $$list_ref[2];

In the first case we are resetting the entire contents of the list pointed to by $list_ref. In the second we are accessing a single element. In the second case, Perl deduces the context from both the dollar sign to the left of and the square brackets following the reference.
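As a small sketch of the two forms just described (whole-list and single-element access), the following can be run as is. The `use strict`, `use warnings`, and `my` declarations are my additions in the style of current Perl and are not part of the column's own examples; the variable contents are made up for illustration.

```perl
use strict;
use warnings;

my @some_list = (10, 20, 30);
my $list_ref  = \@some_list;      # reference to the named list

my $second = $$list_ref[1];       # read one element through the reference
$$list_ref[1] = 99;               # write one element; @some_list changes too
my $count = scalar @$list_ref;    # @$list_ref stands for the whole list

@$list_ref = ("a", "b");          # replace the entire contents

print "second was $second, count was $count\n";
print "list is now: @some_list\n";
```

Because the reference points at the original list rather than a copy, the final print shows the replaced contents of @some_list.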
The same idea applies to references to associative arrays, except the special characters there are % instead of @ and curly braces instead of square brackets:

    %$hash_ref = (
        "January"  => 1,
        "February" => 2,
    );
    $$hash_ref{"March"} = 3;

Things get even more complicated when we start having compound data types (arrays of list references, etc.). Suppose we were going to store various time vectors in an associative array. First we create lists holding the values, and then we store references for those lists in the array:

    @gmtime = gmtime();
    @localtime = localtime();
    $time{"greenwich"} = \@gmtime;
    $time{"localtime"} = \@localtime;

Sometime later, we want to get the hours value out of the lists. You might be tempted to do:

    # WRONG! WRONG! WRONG!
    $gmhour = $$time{"greenwich"}[2];

but this does not work. There is a precedence problem - scalar variables get dereferenced BEFORE key lookups, so Perl treats the scalar $time as a reference to an associative array and never looks at %time at all. Because the scalar $time is undefined in our example, you will never get the value you want.
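To make the precedence problem visible, here is a hedged sketch in which both a hash %time and an unrelated scalar $time exist at once. The scalar and its contents are hypothetical, invented only so that the wrong expression returns something we can see; the strict, warnings, and `my` lines are also my additions.

```perl
use strict;
use warnings;

# Hypothetical data: the hash %time, plus an unrelated scalar $time that
# happens to hold a reference to a different hash.
my %time = ( greenwich => [ "from %time" ] );
my $time = { greenwich => [ "from \$time" ] };

my $via_scalar = $$time{"greenwich"}[0];      # parsed as ${$time}{"greenwich"}[0]
my $via_hash   = ${ $time{"greenwich"} }[0];  # braces force the lookup in %time

print "$via_scalar\n";
print "$via_hash\n";
```

The first expression reads through the scalar, the second through the hash, which is exactly the distinction the curly braces are there to make.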
What you have to do is enclose compound references in curly braces:

    # CORRECT
    $gmhour = ${$time{"greenwich"}}[2];

The formal rule at work here is that anywhere you would write the name of a scalar holding a reference, you can instead write a Perl block - that is, an expression in curly braces - that returns a reference. So the expression above is the moral equivalent of writing:

    $list_ref = $time{"greenwich"};
    $gmhour = $$list_ref[2];

This nested curly brace syntax is extremely cumbersome, so you can use the following shortcut:

    $gmhour = $time{"greenwich"}->[2];

C programmers should be familiar with the -> operator, which means "follow pointer" - same thing here. The left-hand side of the -> is an expression whose result is a reference, and the right-hand side is an index in the object that reference points to.
Because this is Perl, there is yet another way to do the same thing. You can omit the -> between adjacent subscripts (i.e., things in square or curly brackets):

    $gmhour = $time{"greenwich"}[2];

I generally prefer this last syntax, but your mileage may vary. The -> was made optional for these operations simply because programmers commonly want to use multidimensional arrays and lists, and it is more natural to write

    $coord[$x][$y] = $z;

than

    $coord[$x]->[$y] = $z;
    ${$coord[$x]}[$y] = $z;

which are equivalent, but ugly and cumbersome.
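If you want to convince yourself that the three spellings really name the same element, a quick check along these lines should do it. The values of $x and $y and the stored string are arbitrary choices of mine, as are the strict and `my` declarations.

```perl
use strict;
use warnings;

my @coord;
my ($x, $y) = (1, 2);

$coord[$x][$y] = "z";                   # arrow omitted between subscripts

my $implicit = $coord[$x][$y];          # preferred form
my $arrow    = $coord[$x]->[$y];        # explicit arrow
my $block    = ${ $coord[$x] }[$y];     # block form

print "all three agree\n"
    if $implicit eq $arrow and $arrow eq $block;
```

Note that @coord started out empty; the inner list was created by the first assignment.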
The easiest cases are where we want to create an anonymous list or associative array and a reference to the object:

    $short_months = ["Sep", "Apr", "Jun", "Nov", "Feb"];
    # single quotes keep Perl from treating @netmarket etc. as lists
    $mail_info = {
        "hal"  => 'hal@netmarket.com',
        "tina" => 'tmd@iwi.com',
        "rob"  => 'kolstad@bsdi.com',
    };

So, square brackets for anonymous lists and curlies for anonymous hashes, just like their index brackets. These examples are not very interesting, however, because we could have just explicitly declared a list, @short_months, or an associative array, %mail_info.
Things get more interesting when we start declaring compound objects. Here is an example of declaring an associative array that has one value that is a list reference:

    %hostinfo = (
        "name"   => "myhost",
        "domain" => "netmarket.com",
        "addrs"  => ["199.79.247.20", "204.25.36.200"],
        "owner"  => "Hal Pomeranz",
    );

You would print the second address with:

    print "$hostinfo{'addrs'}[1]\n";

Yes, you can nest these kinds of declarations arbitrarily deeply:

    @hosts = (
        { "name"   => "myhost",
          "domain" => "netmarket.com",
          "addrs"  => ["199.79.247.20", "204.25.36.200"],
          "owner"  => "Hal Pomeranz",
        },
        { "name"   => "thathost",
          "domain" => "netmarket.com",
          "addrs"  => ["199.79.247.21"],
          "owner"  => "Bob Smith",
        },
        # etc, etc, etc,
    );

Given the declaration above,

    print "$hosts[1]{'addrs'}[0]\n";

would print "199.79.247.21". Just to reiterate, you could also rewrite the above print statement either of the following ways:

    print "$hosts[1]->{'addrs'}->[0]\n";
    print "${${$hosts[1]}{'addrs'}}[0]\n";

You can see now why I prefer the first syntax.
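Nested structures like @hosts are usually walked with loops rather than picked apart one index at a time. The following is a minimal sketch of that, using a cut-down copy of the host list; the loop variables and the @lines list are names I made up, and strict and `my` are again my additions.

```perl
use strict;
use warnings;

my @hosts = (
    { name => "myhost",   addrs => ["199.79.247.20", "204.25.36.200"] },
    { name => "thathost", addrs => ["199.79.247.21"] },
);

my @lines;
foreach my $host (@hosts) {                     # each $host is a hash reference
    foreach my $addr (@{ $host->{addrs} }) {    # dereference the inner list
        push @lines, "$host->{name}: $addr";
    }
}

print "$_\n" for @lines;
```

The block form @{ ... } is needed in the inner loop because the thing being dereferenced is an expression, not a simple scalar variable.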
Recall our earlier time vector example:

    @gmtime = gmtime();
    @localtime = localtime();
    $time{"greenwich"} = \@gmtime;
    $time{"localtime"} = \@localtime;

Rather than creating the @gmtime and @localtime arrays, we could write:

    @$gm_vec_ref = gmtime();
    @$loc_vec_ref = localtime();
    $time{"greenwich"} = $gm_vec_ref;
    $time{"localtime"} = $loc_vec_ref;

Because $gm_vec_ref and $loc_vec_ref are undefined when we first assign through them, Perl creates an anonymous list for each and stores a reference to it in the variable. This is not very exciting. True, we got rid of those annoying backslashes, but who really cares? Remember one of the early rules we learned: you can put a block in place of a scalar reference. This means that we can get rid of the extra assignment statements altogether:

    @{$time{"localtime"}} = localtime();
    @{$time{"greenwich"}} = gmtime();

We are just replacing $gm_vec_ref with the block {$time{"greenwich"}}, and the same for the localtime() vector.
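Here is a small runnable sketch of that last form, checking that Perl really did create the lists and references for us. The $kind, $count, and $hour variables are mine, added only to inspect the result.

```perl
use strict;
use warnings;

my %time;                                    # starts out empty

@{ $time{"localtime"} } = localtime();       # creates the list and the reference
@{ $time{"greenwich"} } = gmtime();

my $kind  = ref $time{"greenwich"};          # what does the value point to?
my $count = scalar @{ $time{"greenwich"} };  # gmtime() yields 9 values in list context
my $hour  = $time{"greenwich"}[2];

print "$kind with $count elements, hour $hour\n";
```

No named arrays and no temporary reference variables are involved; the hash values are the only handles on the two lists.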
You can also take a reference to a subroutine and call the subroutine through it:

    sub hello { print "Hello world!\n"; }
    $sub_ref = \&hello;
    &$sub_ref();

Perl5 allows you to call your own subroutines without the &, but when you are dealing with references, Perl needs the & as a hint to tell it what type of data the reference points to.
You can also create references to anonymous subroutines:

    $sub_ref = sub { print "Hello World!\n"; };
    &$sub_ref();

Notice the trailing semicolon after the closing curly brace.
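Subroutine references earn their keep when you hand them to other subroutines. The sketch here assumes a hypothetical helper, apply_twice, that is not from this column; it also shows the arrow call form, $ref->(), which current versions of Perl accept alongside &$ref().

```perl
use strict;
use warnings;

# apply_twice is a hypothetical helper, invented for this illustration.
sub apply_twice {
    my ($code_ref, $value) = @_;
    return &$code_ref(&$code_ref($value));   # call through the reference twice
}

my $double = sub { return $_[0] * 2; };      # anonymous subroutine

my $result = apply_twice($double, 5);        # 5 -> 10 -> 20
my $arrow  = $double->(7);                   # arrow form of the same kind of call

print "$result $arrow\n";
```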
Perl5 also provides the ref() operator, which tells you what kind of object a given reference points to. So,

    $hash_ref = \%this_hash;
    print ref($hash_ref), "\n";

prints HASH. Other values returned by ref() include SCALAR, ARRAY (for lists), and CODE (for subroutine references). ref($foo) returns a false value (the empty string) if $foo is not a reference.
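A quick way to see these return values side by side is to run ref() over one of each kind of thing. This is only a sketch; @things, @kinds, and the "(not a reference)" label are my own names.

```perl
use strict;
use warnings;

my $text   = "plain";
my @things = (\$text, [1, 2, 3], { key => "value" }, sub { 1 }, $text);

my @kinds;
foreach my $thing (@things) {
    my $kind = ref($thing);                  # empty string for a non-reference
    push @kinds, $kind eq "" ? "(not a reference)" : $kind;
}

print join(", ", @kinds), "\n";
```

Testing the result of ref() for truth, as in `if (ref($foo)) { ... }`, is the usual way to ask whether a value is a reference at all.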
By the way, the following code:

    $refname = "foo";
    $$refname = "Surprise!";
    print "$foo\n";

prints Surprise! In other words, if you use a variable as a reference and if the value of that variable is not a reference, then Perl interprets the value of the variable as the name of a variable (a symbolic reference). You can really shoot yourself in the foot with this one.
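One guard against this, in versions of Perl that ship the strict pragma, is `use strict 'refs'` (included in plain `use strict`), which turns the accidental case into a run-time error. The sketch traps that error with eval so you can see the message; $ok and $error are names of my own choosing.

```perl
use strict;
use warnings;

my $refname = "foo";

# With strict refs in effect, using a plain string as a reference is a
# run-time error, which the eval block traps.
my $ok    = eval { $$refname = "Surprise!"; 1 };
my $error = $ok ? "" : $@;

print "caught: $error" if $error;
```

When you really do want a symbolic reference, you can switch the check off for a small enclosing block with `no strict 'refs'`.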
In my last column I briefly mentioned the concept of "marshalling" data: converting complex data objects to a format that can easily be saved to disk and retrieved later. The idea is to create a function marshall() such that if we execute

    $string = marshall($some_ref);
    eval("\$other_ref = $string");

then the data structure pointed to by $other_ref will have the same contents as the data structure pointed to by $some_ref. Remember that the data structure pointed to by $some_ref could be arbitrarily complex: a list of associative arrays whose elements could be lists, associative arrays, and/or scalars, for example.
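To give a feel for how such a function might look, here is a minimal sketch of my own; it is one possible approach, not a definitive implementation. It assumes the data contains only plain scalars, list references, and hash references (no code references, no structures that point back at themselves), and it quotes every scalar as a single-quoted string.

```perl
use strict;
use warnings;

# Sketch only: handles plain scalars and (nested) list and hash references.
sub marshall {
    my ($data) = @_;
    my $type = ref($data);

    if ($type eq "ARRAY") {
        return "[" . join(",", map { marshall($_) } @$data) . "]";
    }
    if ($type eq "HASH") {
        return "{" . join(",", map { marshall($_) . "=>" . marshall($data->{$_}) }
                               sort keys %$data) . "}";
    }
    die "marshall: cannot handle $type references\n" if $type;
    return "undef" unless defined $data;

    # Quote scalars as single-quoted strings, escaping \ and '.
    (my $quoted = $data) =~ s/([\\'])/\\$1/g;
    return "'$quoted'";
}

my $some_ref = [ { name => "myhost", addrs => ["199.79.247.20"] }, "it's" ];
my $string   = marshall($some_ref);

my $other_ref;
eval("\$other_ref = $string");
die $@ if $@;

print "$string\n";
print "$other_ref->[0]{name} $other_ref->[0]{addrs}[0]\n";
```

The recursion mirrors the shape of the data: each reference type emits the same brackets you would type to declare it anonymously, which is why eval can turn the string back into a structure. Only eval strings you produced yourself, since eval will run whatever Perl code it is given.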
Good luck with your coding. See you next time.
Reproduced from ;login: Vol. 21 No. 1, February 1996.
12/3/96ah