Question:
How do I sort a hash by the hash value?
Answer:
First, sorting a hash by the hash key
Sorting the output of a hash by the hash key
is a pretty well-known recipe. It's covered in another Q&A
article titled "How
to sort a hash by the hash key".
Sorting a hash by the hash value
Sorting a hash by the hash value is a bit more
difficult than sorting the hash by the key, but it's
not too bad. It just requires a small "helper" function.
This is easiest to demonstrate by example. Suppose we have a
class of five students. Rather than give them names, we'll
call them student1, student2, etc. Suppose these students just
took a test, and we stored their grades in a hash
(called associative arrays prior to the release of
Perl 5) named grades.
The hash definition might look like this:
%grades = (
    student1 => 90,
    student2 => 75,
    student3 => 96,
    student4 => 55,
    student5 => 76,
);
If you're familiar with hashes, you know that the
student names are the keys, and the
test scores are the hash values.
The key to sorting a hash by value is the function
you create to help the sort command do
its job. Following the format defined by the
creators of Perl, you write a comparison function (I call it a helper
function) that tells Perl how to order the list it's about
to receive. Inside that function, Perl sets $a and $b to the two
items being compared. In the program you're about to
see, I've created two helper functions named
hashValueDescendingNum (sort by hash value in
descending numeric order) and
hashValueAscendingNum (sort by hash value in
ascending numeric order).
Here's a program that prints the contents of
the grades hash, sorted numerically by
the hash value:
#!/usr/bin/perl -w
#----------------------------------------------------------------------#
# printHashByValue.pl #
# #
# Copyright 1998 DevDaily Interactive, Inc. All Rights Reserved. #
#----------------------------------------------------------------------#
#----------------------------------------------------------------------#
# FUNCTION: hashValueAscendingNum #
# #
# PURPOSE: Help sort a hash by the hash 'value', not the 'key'. #
# Values are returned in ascending numeric order (lowest #
# to highest). #
#----------------------------------------------------------------------#
sub hashValueAscendingNum {
    $grades{$a} <=> $grades{$b};
}
#----------------------------------------------------------------------#
# FUNCTION: hashValueDescendingNum #
# #
# PURPOSE: Help sort a hash by the hash 'value', not the 'key'. #
# Values are returned in descending numeric order #
# (highest to lowest). #
#----------------------------------------------------------------------#
sub hashValueDescendingNum {
    $grades{$b} <=> $grades{$a};
}
%grades = (
    student1 => 90,
    student2 => 75,
    student3 => 96,
    student4 => 55,
    student5 => 76,
);
print "\n\tGRADES IN ASCENDING NUMERIC ORDER:\n";
foreach $key (sort hashValueAscendingNum (keys(%grades))) {
    print "\t\t$grades{$key} \t\t $key\n";
}
print "\n\tGRADES IN DESCENDING NUMERIC ORDER:\n";
foreach $key (sort hashValueDescendingNum (keys(%grades))) {
    print "\t\t$grades{$key} \t\t $key\n";
}
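The named helper functions are not strictly required: sort also accepts an inline comparison block. The following is a minimal sketch of that variation, assuming the same grades data as above; the array names @ascending and @descending are illustrative and not part of the original program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %grades = (
    student1 => 90,
    student2 => 75,
    student3 => 96,
    student4 => 55,
    student5 => 76,
);

# Sort the keys by comparing the values they map to.
# $a and $b are set by sort to the two keys being compared.
my @ascending  = sort { $grades{$a} <=> $grades{$b} } keys %grades;
my @descending = sort { $grades{$b} <=> $grades{$a} } keys %grades;

print "Ascending:\n";
printf "  %3d  %s\n", $grades{$_}, $_ for @ascending;

print "Descending:\n";
printf "  %3d  %s\n", $grades{$_}, $_ for @descending;
```

An inline block keeps the comparison next to the call that uses it, while a named function is easier to reuse when several loops need the same ordering.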
Perl Functions by Category
map BLOCK LIST
map EXPR,LIST
Evaluates the BLOCK or EXPR for each element of LIST (locally setting $_ to each element) and returns the list value composed of the results of each such evaluation. Evaluates BLOCK or EXPR in a list context, so each element of LIST may produce zero, one, or more elements in the returned value.
@chars = map(chr, @nums);
translates a list of numbers to the corresponding characters. And
%hash = map { getkey($_) => $_ } @array;
is just a funny way to write
%hash = ();
foreach $_ (@array) {
    $hash{getkey($_)} = $_;
}
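The "zero, one, or more elements" behaviour described above can be sketched as follows. The data and variable names are made up for illustration.

```perl
use strict;
use warnings;

my @nums = (1, 2, 3, 4);

# One result per element: square each number.
my @squares = map { $_ * $_ } @nums;

# Zero or one result per element: an empty list drops the element.
my @evens = map { $_ % 2 == 0 ? ($_) : () } @nums;

# Two results per element: each number followed by its double.
my @pairs = map { ($_, $_ * 2) } @nums;

print "@squares\n";   # 1 4 9 16
print "@evens\n";     # 2 4
print "@pairs\n";     # 1 2 2 4 3 6 4 8
```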
eval EXPR
eval BLOCK
EXPR is parsed and executed as if it were a little
Perl program. It is executed in the context of the current Perl program, so
that any variable settings or subroutine and format definitions remain
afterwards. The value returned is the value of the last expression
evaluated, or a return statement may be used, just as with subroutines. The
last expression is evaluated in scalar or array context, depending on the
context of the eval.
If there is a syntax error or runtime error, or a die()
statement is executed, an undefined value is returned by
eval(), and $@ is set to the error message. If there was no error, $@ is guaranteed to be a null string. If
EXPR is omitted, evaluates $_. The final semicolon, if any, may be omitted from the expression. Beware that using eval() neither silences perl from printing warnings to
STDERR, nor does it stuff the text of warning messages into
$@. To do either of those, you have to use the $SIG{__WARN__} facility. See warn() and the perlvar manpage.
Note that, because eval() traps otherwise-fatal errors, it is
useful for determining whether a particular feature (such as
socket() or symlink()) is implemented. It is also
Perl's exception trapping mechanism, where the die operator is used to
raise exceptions.
If the code to be executed doesn't vary, you may use the eval-BLOCK form to
trap run-time errors without incurring the penalty of recompiling each
time. The error, if any, is still returned in $@. Examples:
# make divide-by-zero nonfatal
eval { $answer = $a / $b; }; warn $@ if $@;
# same thing, but less efficient
eval '$answer = $a / $b'; warn $@ if $@;
# a compile-time error
eval { $answer = };
# a run-time error
eval '$answer ='; # sets $@
When using the eval{} form as an exception trap in libraries, you may wish
not to trigger any __DIE__ hooks that user code may have installed. You can use the local $SIG{__DIE__} construct for this purpose, as shown in this example:
# a very private exception trap for divide-by-zero
eval { local $SIG{'__DIE__'}; $answer = $a / $b; }; warn $@ if $@;
This is especially significant, given that __DIE__ hooks can call die() again, which has the effect of changing
their error messages:
# __DIE__ hooks may modify error messages
{
local $SIG{'__DIE__'} = sub { (my $x = $_[0]) =~ s/foo/bar/g; die $x };
eval { die "foo foofs here" };
print $@ if $@; # prints "bar barfs here"
}
With an eval(), you should be especially careful to remember
what's being looked at when:
eval $x; # CASE 1
eval "$x"; # CASE 2
eval '$x'; # CASE 3
eval { $x }; # CASE 4
eval "\$$x++"; # CASE 5
$$x++; # CASE 6
Cases 1 and 2 above behave identically: they run the code contained in the
variable $x. (Although case 2 has misleading double quotes making the
reader wonder what else might be happening (nothing is).) Cases 3 and 4
likewise behave in the same way: they run the code '$x', which does nothing
but return the value of $x. (Case 4 is preferred for purely visual reasons, but it also has the
advantage of compiling at compile-time instead of at run-time.) Case 5 is a
place where normally you WOULD like to use double quotes, except that in this particular situation, you
can just use symbolic references instead, as in case 6.
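As a small illustration of the eval-BLOCK form used as an exception trap, here is a sketch in which safe_divide is a hypothetical helper (not part of Perl) that raises an exception with die, and the caller inspects $@ afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper: dies (raises an exception) on a zero divisor.
sub safe_divide {
    my ($num, $den) = @_;
    die "division by zero\n" if $den == 0;
    return $num / $den;
}

# eval BLOCK returns the block's value on success.
my $ok = eval { safe_divide(10, 2) };
print "result: $ok\n" unless $@;

# After a die() inside the block, eval returns undef and sets $@.
my $bad = eval { safe_divide(1, 0) };
print "caught: $@" if $@;
```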
wantarray
Returns TRUE if the context of the currently executing subroutine is looking for a list value. Returns FALSE if the context is looking for a scalar. Returns the undefined value if the context is looking for no value (void context).
return unless defined wantarray; # don't bother doing more
my @a = complex_calculation();
return wantarray ? @a : "@a";
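The three possible results can be seen with a short sketch. The sub name context is hypothetical and exists only to report how it was called.

```perl
use strict;
use warnings;

# Hypothetical sub that reports the context it was called in.
sub context {
    return 'void' unless defined wantarray;   # undef means void context
    return wantarray ? 'list' : 'scalar';
}

my @list   = context();   # called in list context
my $scalar = context();   # called in scalar context
context();                # called in void context; the result is discarded

print "$list[0] $scalar\n";   # list scalar
```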
Perl Predefined Names
The following names have special meaning to
perl.
I could have used alphabetic symbols for some of these, but I didn't want
to take the chance that someone would say reset "a-zA-Z" and wipe them all
out.
You'll just have to suffer along with these silly symbols.
Most of them have reasonable mnemonics, or analogues in one of the shells.
- $_
- The default input and pattern-searching space.
The following pairs are equivalent:
while (<>) {... # only equivalent in while!
while ($_ = <>) {...
/^Subject:/
$_ =~ /^Subject:/
y/a-z/A-Z/
$_ =~ y/a-z/A-Z/
chop
chop($_)
(Mnemonic: underline is understood in certain operations.)
- $.
- The current input line number of the last filehandle that was read.
Readonly.
Remember that only an explicit close on the filehandle resets the line number.
Since <> never does an explicit close, line numbers increase across ARGV files
(but see examples under eof).
(Mnemonic: many programs use . to mean the current line number.)
- $/
- The input record separator, newline by default.
Works like
awk's
RS variable, including treating blank lines as delimiters
if set to the null string.
You may set it to a multicharacter string to match a multi-character
delimiter.
Note that setting it to "\n\n" means something slightly different
than setting it to "", if the file contains consecutive blank lines.
Setting it to "" will treat two or more consecutive blank lines as a single
blank line.
Setting it to "\n\n" will blindly assume that the next input character
belongs to the next paragraph, even if it's a newline.
(Mnemonic: / is used to delimit line boundaries when quoting poetry.)
- $,
- The output field separator for the print operator.
Ordinarily the print operator simply prints out the comma separated fields
you specify.
In order to get behavior more like
awk,
set this variable as you would set
awk's
OFS variable to specify what is printed between fields.
(Mnemonic: what is printed when there is a , in your print statement.)
- $"
- This is like $, except that it applies to array values interpolated into
a double-quoted string (or similar interpreted string).
Default is a space.
(Mnemonic: obvious, I think.)
- $\
- The output record separator for the print operator.
Ordinarily the print operator simply prints out the comma separated fields
you specify, with no trailing newline or record separator assumed.
In order to get behavior more like
awk,
set this variable as you would set
awk's
ORS variable to specify what is printed at the end of the print.
(Mnemonic: you set $\ instead of adding \n at the end of the print.
Also, it's just like /, but it's what you get "back" from
perl.)
- $#
- The output format for printed numbers.
This variable is a half-hearted attempt to emulate
awk's
OFMT variable.
There are times, however, when
awk
and
perl
have differing notions of what
is in fact numeric.
Also, the initial value is %.20g rather than %.6g, so you need to set $#
explicitly to get
awk's
value.
(Mnemonic: # is the number sign.)
- $%
- The current page number of the currently selected output channel.
(Mnemonic: % is page number in nroff.)
- $=
- The current page length (printable lines) of the currently selected output
channel.
Default is 60.
(Mnemonic: = has horizontal lines.)
- $-
- The number of lines left on the page of the currently selected output channel.
(Mnemonic: lines_on_page - lines_printed.)
- $~
- The name of the current report format for the currently selected output
channel.
Default is name of the filehandle.
(Mnemonic: brother to $^.)
- $^
- The name of the current top-of-page format for the currently selected output
channel.
Default is name of the filehandle with "_TOP" appended.
(Mnemonic: points to top of page.)
- $|
- If set to nonzero, forces a flush after every write or print on the currently
selected output channel.
Default is 0.
Note that
STDOUT
will typically be line buffered if output is to the
terminal and block buffered otherwise.
Setting this variable is useful primarily when you are outputting to a pipe,
such as when you are running a
perl
script under rsh and want to see the
output as it's happening.
(Mnemonic: when you want your pipes to be piping hot.)
- $$
- The process number of the
perl
running this script.
(Mnemonic: same as shells.)
- $?
- The status returned by the last pipe close, backtick (``) command or
system operator.
Note that this is the status word returned by the wait() system
call, so the exit value of the subprocess is actually ($? >> 8).
$? & 255 gives which signal, if any, the process died from, and whether
there was a core dump.
(Mnemonic: similar to sh and ksh.)
- $&
- The string matched by the last successful pattern match
(not counting any matches hidden
within a BLOCK or eval enclosed by the current BLOCK).
(Mnemonic: like & in some editors.)
- $`
- The string preceding whatever was matched by the last successful pattern match
(not counting any matches hidden within a BLOCK or eval enclosed by the current
BLOCK).
(Mnemonic: ` often precedes a quoted string.)
- $'
- The string following whatever was matched by the last successful pattern match
(not counting any matches hidden within a BLOCK or eval enclosed by the current
BLOCK).
(Mnemonic: ' often follows a quoted string.)
Example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi
- $+
- The last bracket matched by the last search pattern.
This is useful if you don't know which of a set of alternative patterns
matched.
For example:
/Version: (.*)|Revision: (.*)/ && ($rev = $+);
(Mnemonic: be positive and forward looking.)
- $*
- Set to 1 to do multiline matching within a string, 0 to tell
perl
that it can assume that strings contain a single line, for the purpose
of optimizing pattern matches.
Pattern matches on strings containing multiple newlines can produce confusing
results when $* is 0.
Default is 0.
(Mnemonic: * matches multiple things.)
Note that this variable only influences the interpretation of ^ and $.
A literal newline can be searched for even when $* == 0.
- $0
- Contains the name of the file containing the
perl
script being executed.
Assigning to $0 modifies the argument area that the ps(1) program sees.
(Mnemonic: same as sh and ksh.)
- $<digit>
- Contains the subpattern from the corresponding set of parentheses in the last
pattern matched, not counting patterns matched in nested blocks that have
been exited already.
(Mnemonic: like \digit.)
- $[
- The index of the first element in an array, and of the first character in
a substring.
Default is 0, but you could set it to 1 to make
perl
behave more like
awk
(or Fortran)
when subscripting and when evaluating the index() and substr() functions.
(Mnemonic: [ begins subscripts.)
- $]
- The string printed out when you say "perl -v".
It can be used to determine at the beginning of a script whether the perl
interpreter executing the script is in the right range of versions.
If used in a numeric context, returns the version + patchlevel / 1000.
Example:
# see if getc is available
($version,$patchlevel) =
$] =~ /(\d+\.\d+).*\nPatch level: (\d+)/;
print STDERR "(No filename completion available.)\n"
if $version * 1000 + $patchlevel < 2016;
or, used numerically,
warn "No checksumming!\n" if $] < 3.019;
(Mnemonic: Is this version of perl in the right bracket?)
- $;
- The subscript separator for multi-dimensional array emulation.
If you refer to an associative array element as
$foo{$a,$b,$c}
it really means
$foo{join($;, $a, $b, $c)}
But don't put
@foo{$a,$b,$c} # a slice--note the @
which means
($foo{$a},$foo{$b},$foo{$c})
Default is "\034", the same as SUBSEP in
awk.
Note that if your keys contain binary data there might not be any safe
value for $;.
(Mnemonic: comma (the syntactic subscript separator) is a semi-semicolon.
Yeah, I know, it's pretty lame, but $, is already taken for something more
important.)
- $!
- If used in a numeric context, yields the current value of errno, with all the
usual caveats.
(This means that you shouldn't depend on the value of $! to be anything
in particular unless you've gotten a specific error return indicating a
system error.)
If used in a string context, yields the corresponding system error string.
You can assign to $! in order to set errno
if, for instance, you want $! to return the string for error n, or you want
to set the exit value for the die operator.
(Mnemonic: What just went bang?)
- $@
- The perl syntax error message from the last eval command.
If null, the last eval parsed and executed correctly (although the operations
you invoked may have failed in the normal fashion).
(Mnemonic: Where was the syntax error "at"?)
- $<
- The real uid of this process.
(Mnemonic: it's the uid you came FROM, if you're running setuid.)
- $>
- The effective uid of this process.
Example:
$< = $>; # set real uid to the effective uid
($<,$>) = ($>,$<); # swap real and effective uid
(Mnemonic: it's the uid you went TO, if you're running setuid.)
Note: $< and $> can only be swapped on machines supporting setreuid().
- $(
- The real gid of this process.
If you are on a machine that supports membership in multiple groups
simultaneously, gives a space separated list of groups you are in.
The first number is the one returned by getgid(), and the subsequent ones
by getgroups(), one of which may be the same as the first number.
(Mnemonic: parentheses are used to GROUP things.
The real gid is the group you LEFT, if you're running setgid.)
- $)
- The effective gid of this process.
If you are on a machine that supports membership in multiple groups
simultaneously, gives a space separated list of groups you are in.
The first number is the one returned by getegid(), and the subsequent ones
by getgroups(), one of which may be the same as the first number.
(Mnemonic: parentheses are used to GROUP things.
The effective gid is the group that's RIGHT for you, if you're running setgid.)
Note: $<, $>, $( and $) can only be set on machines that support the
corresponding set[re][ug]id() routine.
$( and $) can only be swapped on machines supporting setregid().
- $:
- The current set of characters after which a string may be broken to
fill continuation fields (starting with ^) in a format.
Default is "\ \n-", to break on whitespace or hyphens.
(Mnemonic: a "colon" in poetry is a part of a line.)
- $^D
- The current value of the debugging flags.
(Mnemonic: value of
-D
switch.)
- $^F
- The maximum system file descriptor, ordinarily 2. System file descriptors
are passed to subprocesses, while higher file descriptors are not.
During an open, system file descriptors are preserved even if the open
fails. Ordinary file descriptors are closed before the open is attempted.
- $^I
- The current value of the inplace-edit extension.
Use undef to disable inplace editing.
(Mnemonic: value of
-i
switch.)
- $^L
- What formats output to perform a formfeed. Default is \f.
- $^P
- The internal flag that the debugger clears so that it doesn't
debug itself. You could conceivably disable debugging yourself
by clearing it.
- $^T
- The time at which the script began running, in seconds since the epoch.
The values returned by the -M, -A and -C filetests are based on this value.
- $^W
- The current value of the warning switch.
(Mnemonic: related to the
-w
switch.)
- $^X
- The name that Perl itself was executed as, from argv[0].
- $ARGV
- contains the name of the current file when reading from <>.
- @ARGV
- The array ARGV contains the command line arguments intended for the script.
Note that $#ARGV is generally the number of arguments minus one, since
$ARGV[0] is the first argument, NOT the command name.
See $0 for the command name.
- @INC
- The array INC contains the list of places to look for
perl
scripts to be
evaluated by the "do EXPR" command or the "require" command.
It initially consists of the arguments to any
-I
command line switches, followed
by the default
perl
library, probably "/usr/local/lib/perl",
followed by ".", to represent the current directory.
- %INC
- The associative array INC contains entries for each filename that has
been included via "do" or "require".
The key is the filename you specified, and the value is the location of
the file actually found.
The "require" command uses this array to determine whether
a given file has already been included.
- $ENV{expr}
- The associative array ENV contains your current environment.
Setting a value in ENV changes the environment for child processes.
- $SIG{expr}
- The associative array SIG is used to set signal handlers for various signals.
Example:
sub handler { # 1st argument is signal name
    local($sig) = @_;
    print "Caught a SIG$sig--shutting down\n";
    close(LOG);
    exit(0);
}
$SIG{'INT'} = 'handler';
$SIG{'QUIT'} = 'handler';
...
$SIG{'INT'} = 'DEFAULT'; # restore default action
$SIG{'QUIT'} = 'IGNORE'; # ignore SIGQUIT
The SIG array only contains values for the signals actually set within
the perl script.
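A few of the variables in the list above can be exercised together in a short sketch. It assumes a Perl 5 interpreter (version 5.8 or later, because it reads from an in-memory filehandle), and all data and variable names are made up for illustration. Each special variable is changed with local inside a bare block so the change does not leak into the rest of the program.

```perl
use strict;
use warnings;

my @fields = ('a', 'b', 'c');

# $" joins array elements interpolated into a double-quoted string.
my $joined;
{
    local $" = '-';
    $joined = "@fields";          # a-b-c
}

# $, goes between print's arguments; $\ is appended after the last one.
{
    local $, = ':';
    local $\ = "\n";
    print @fields;                # a:b:c followed by a newline
}

# $; joins the subscripts of an emulated multi-dimensional hash key.
my %grid;
$grid{1, 2} = 'x';
my $same_key = exists $grid{ join($;, 1, 2) } ? 'yes' : 'no';

# $/ set to "" reads paragraph by paragraph, collapsing runs of blank lines.
my $text = "one\ntwo\n\n\n\nthree\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my @paragraphs;
{
    local $/ = "";
    @paragraphs = <$fh>;
}
close $fh;

print "$joined\n";
print "same key: $same_key\n";
print scalar(@paragraphs), " paragraphs\n";
```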
grep BLOCK LIST
grep EXPR,LIST
This is similar in spirit to, but not the same as, grep(1) and
its relatives. In particular, it is not limited to using regular
expressions.
Evaluates the
BLOCK or
EXPR for each element of
LIST (locally setting $_ to each element) and returns the list value consisting of those elements for which the expression evaluated to
TRUE. In a scalar context, returns the number of times the expression was
TRUE.
@foo = grep(!/^#/, @bar); # weed out comments
or equivalently,
@foo = grep {!/^#/} @bar; # weed out comments
Note that, because $_ is a reference into the list value, it can be used to modify the elements of the array. While this is useful and supported, it can cause bizarre results if the
LIST is not a named array. Similarly, grep returns aliases into the original list, much like the way that
a foreach loop's index variable aliases the list elements. That is, modifying an element
of a list returned by grep actually modifies the element in the original
list.
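The list-context result, the scalar-context count, and the aliasing behaviour can be sketched as follows. The sample data is invented for illustration.

```perl
use strict;
use warnings;

my @lines = ('# comment', 'alpha', '# another', 'beta');

# List context: the elements for which the test is true.
my @code = grep { !/^#/ } @lines;

# Scalar context: how many elements passed the test.
my $comments = grep { /^#/ } @lines;

# grep returns aliases, so changing the returned elements in a
# foreach loop changes the matching elements of the original array.
my @words = ('apple', 'kiwi', 'banana');
$_ = uc for grep { /an/ } @words;

print "@code\n";       # alpha beta
print "$comments\n";   # 2
print "@words\n";      # apple kiwi BANANA
```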