awk
Last edited June 7, 2008
AWK - Wikipedia, the free encyclopedia
en.wikipedia.org/wiki/Awk

Structure of AWK programs

An AWK program is a series of pattern-action pairs, written as:

pattern { action }

where pattern is typically an expression and action is a series of commands. Each line of input is tested against all the patterns in turn and the action executed if the expression is true. Either the pattern or the action may be omitted. The pattern defaults to matching every line of input. The default action is to print the line of input.

In addition to a simple AWK expression, the pattern can be BEGIN or END causing the action to be executed before or after all lines of input have been read, or pattern1, pattern2 which matches the range of lines of input starting with a line that matches pattern1 up to and including the line that matches pattern2 before again trying to match against pattern1 on future lines.

In addition to normal arithmetic and logical operators, AWK expressions include the tilde operator, ~, which matches a regular expression against a string. As a handy default, /regexp/ without using the tilde operator matches against the current line of input.
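
The defaults described above can be tried from a shell. This is a minimal sketch, assuming a POSIX sh and an awk on the PATH; the sample input and the variable names (input, matches, firsts, count) are made up for illustration.

```shell
# Sample input: three lines, two of which start with "error".
input='error: disk full
ok: all good
error: timeout'

# Pattern only: the default action prints each matching line.
matches=$(printf '%s\n' "$input" | awk '/^error/')

# Action only: the default pattern matches every line.
firsts=$(printf '%s\n' "$input" | awk '{ print $1 }')

# BEGIN runs before any input is read, END after all of it.
count=$(printf '%s\n' "$input" | awk 'BEGIN { n = 0 } $1 ~ /^error/ { n++ } END { print n }')

printf '%s\n' "$matches" "$firsts" "$count"
```

The first command prints the two error lines, the second prints the first field of every line, and the third prints 2.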

AWK commands

AWK commands are the statements that are substituted for action in the examples above. AWK commands can include function calls, variable assignments, calculations, or any combination thereof. AWK contains built-in support for many functions; many more are provided by the various flavors of AWK. Also, some flavors support the inclusion of dynamically linked libraries, which can also provide more functions.

For brevity, the enclosing curly braces ( { } ) will be omitted from these examples.

The print command

The print command is used to output text. The output text is always terminated with a predefined string called the output record separator (ORS) whose default value is a newline. The simplest form of this command is:

print

This displays the contents of the current line. In AWK, lines are broken down into fields, and these can be displayed separately:

print $1
Displays the first field of the current line
print $1, $3
Displays the first and third fields of the current line, separated by a predefined string called the output field separator (OFS) whose default value is a single space character

Although these fields ($X) may bear resemblance to variables (the $ symbol indicates variables in Perl), they actually refer to the fields of the current line. A special case, $0, refers to the entire line. In fact, the commands "print" and "print $0" are identical in functionality.
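
To see OFS and ORS at work, the following sketch overrides both in a BEGIN action. The separator values and the sample input are arbitrary choices for illustration, not defaults.

```shell
# The comma in "print $1, $3" inserts OFS between the items;
# ORS is written after each print statement.
out=$(printf 'a b c\nd e f\n' | awk 'BEGIN { OFS = "-"; ORS = ";\n" } { print $1, $3 }')
printf '%s\n' "$out"
```

This prints a-c; and d-f; on separate lines.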

The print command can also display the results of calculations and/or function calls:

print 3+2
print foobar(3)
print foobar(variable)
print sin(3-2)

Output may be sent to a file:

print "expression" > "file name"

or through a pipe:

print "expression" | "command"

Variables and Syntax

Variable names can use any of the characters [A-Za-z0-9_], provided they do not begin with a digit and are not language keywords. The operators + - * / represent addition, subtraction, multiplication, and division, respectively. For string concatenation, simply place two variables (or string constants) next to each other. A space between them is optional when a string constant is involved, but two variable names must be separated by a space. String constants are delimited by double quotes. Statements need not end with semicolons. Finally, comments can be added to programs with #, which starts a comment that runs to the end of the line.
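
A short sketch of these rules, with made-up variable names (first, second, joined, tagged, total):

```shell
out=$(awk 'BEGIN {
    first = "foo"
    second = "bar"
    joined = first second      # adjacency concatenates: foobar
    tagged = joined"!"         # no space needed next to a string constant
    total = 2 + 3 * 4          # usual arithmetic precedence
    print tagged, total        # a comment may also follow a statement
}')
printf '%s\n' "$out"
```

This prints foobar! 14.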

User-defined functions

In a format similar to C, function definitions consist of the keyword function, the function name, argument names and the function body. Here is an example of a function.

function add_three (number, temp) {
  temp = number + 3
  return temp
}

This function can be invoked as follows:

print add_three(36)     # Outputs 39

Functions can have variables that are in the local scope. The names of these are added to the end of the argument list, though values for these should be omitted when calling the function. It is convention to add some whitespace in the argument list before the local variables, in order to indicate where the parameters end and the local variables begin.
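
The local-variable convention can be sketched as follows. The function sum_to and its locals i and total are hypothetical names chosen for this example; the global i is there only to show that the call leaves it alone.

```shell
out=$(awk '
# sum_to(n) adds 1..n; i and total are locals declared as extra parameters.
function sum_to(n,    i, total) {
    for (i = 1; i <= n; i++)
        total += i
    return total
}
BEGIN {
    i = 99                     # global i, untouched by the call below
    print sum_to(4), i
}')
printf '%s\n' "$out"
```

This prints 10 99: the loop counter inside the function is a separate, local i.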

Sample applications

Hello World

Here is the ubiquitous "Hello, world!" program written in AWK:

BEGIN { print "Hello, world!" }

Note that no explicit exit statement is needed: when the only pattern is BEGIN, awk exits without reading any input.

Print lines longer than 80 characters

Print all lines longer than 80 characters. Note that the default action is to print the current line.

length > 80 

The AWK Programming Language now specifies an explicit $0 in the length function:

length($0) > 80

Print a count of words

Count words in the input, and print lines, words, and characters (like wc)

{
    w += NF
    c += length + 1
}
END { print NR, w, c }

As there is no pattern for the first line of the program, every line of input matches by default so the increment actions are executed for every line. Note that w += NF is shorthand for w = w + NF.

Sum last word

{ s += $NF }
END { print s + 0 }

s is incremented by the numeric value of $NF which is the last word on the line as defined by AWK's field separator, by default white-space. NF is the number of fields in the current line, e.g. 4. Since $4 is the value of the fourth field, $NF is the value of the last field in the line regardless of how many fields this line has, or whether it has more or fewer fields than surrounding lines. $ is actually a unary operator with the highest operator precedence. (If the line has no fields then NF is 0, $0 is the whole line, which in this case is empty apart from possible white-space, and so has the numeric value 0.)

At the end of the input the END pattern matches so s is printed. However, since there may have been no lines of input at all, in which case no value has ever been assigned to s, it will by default be an empty string. Adding zero to a variable is an AWK idiom for coercing it from a string to a numeric value. (Conversely, concatenating an empty string coerces a number to a string, e.g. s "". There is no concatenation operator; the operands are simply placed adjacently.) With the coercion the program prints 0 on an empty input; without it an empty line is printed.
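
The difference the coercion makes can be checked with empty input. In this sketch /dev/null stands in for "no lines of input", and the variable names plain, coerced and sum are illustrative.

```shell
# With no input at all, s is never assigned.
plain=$(awk '{ s += $NF } END { print s }' </dev/null)
coerced=$(awk '{ s += $NF } END { print s + 0 }' </dev/null)

# Summing the last field of each line: 2 + 5 + 6.
sum=$(printf '1 2\n3 4 5\n6\n' | awk '{ s += $NF } END { print s + 0 }')

printf '[%s] [%s] [%s]\n' "$plain" "$coerced" "$sum"
```

This prints [] [0] [13].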

Match a range of input lines

$ yes Wikipedia | awk 'NR % 4 == 1, NR % 4 == 3 { printf "%6d  %s\n", NR, $0 }' | sed 7q
     1  Wikipedia
     2  Wikipedia
     3  Wikipedia
     5  Wikipedia
     6  Wikipedia
     7  Wikipedia
     9  Wikipedia
$

The yes command repeatedly prints the letter "y" on a line. In this case, we tell the command to print the word "Wikipedia". The action statement prints each line numbered. The printf function emulates the standard C printf, and works similarly to the print command described above. The pattern to match, however, works as follows: NR is the number of records, typically lines of input, AWK has so far read, i.e. the current line number, starting at 1 for the first line of input. % is the modulo operator. NR % 4 == 1 is true for the first, fifth, ninth, etc., lines of input. Likewise, NR % 4 == 3 is true for the third, seventh, eleventh, etc., lines of input. The range pattern is false until the first part matches, on line 1, and then remains true up to and including when the second part matches, on line 3. It then stays false until the first part matches again on line 5. The sed command is used to print the first 7 lines, to prevent yes running forever. It is equivalent to head -7 if the head command is available.

The first part of a range pattern being constantly true, e.g. 1, can be used to start the range at the beginning of input. Similarly, if the second part is constantly false, e.g. 0, the range continues until the end of input:

/^--cut here--$/, 0

prints lines of input from the first line matching the regular expression ^--cut here--$, that is, a line containing only the phrase "--cut here--", to the end.

Calculate word frequencies

Word frequency, using associative arrays:

BEGIN { FS="[^a-zA-Z]+" }

{ for (i=1; i<=NF; i++)
     words[tolower($i)]++
}

END { for (i in words)
    print i, words[i]
}

The BEGIN block sets the field separator to any sequence of non-alphabetic characters. Note that separators can be regular expressions. After that, we get to a bare action, which performs the action on every input line. In this case, for every field on the line, we add one to the number of times that word, first converted to lowercase, appears. Finally, in the END block, we print the words with their frequencies. The line

for (i in words)

creates a loop that goes through the array words, setting i to each subscript of the array. This is different from most languages, where such a loop goes through each value in the array. This means that you print the word with each count in a simple way. tolower was an addition to the One True awk (see below) made after the book was published.
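
Because the order in which for (i in words) visits subscripts is unspecified, a run of the program is easier to read when its output is sorted. The sketch below is a variant of the program above with two additions that are assumptions of this example, not part of the original: it skips the empty field that trailing punctuation produces, and it pipes the result through sort.

```shell
out=$(printf 'The cat and the hat.\nThe end\n' | awk '
BEGIN { FS = "[^a-zA-Z]+" }
{
    for (i = 1; i <= NF; i++)
        if ($i != "")                  # skip the empty field after trailing punctuation
            words[tolower($i)]++
}
END {
    for (w in words)                   # w takes each subscript, in unspecified order
        print w, words[w]
}' | LC_ALL=C sort)
printf '%s\n' "$out"
```

For this input the sorted output is and 1, cat 1, end 1, hat 1, the 3, one pair per line.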

Match pattern from command line

This program can be represented in several ways. The first one uses the Bourne shell to make a shell script that does everything. It is the shortest of these methods:

$ cat grepinawk
pattern=$1
shift
awk '/'$pattern'/ { print FILENAME ":" $0 }' $*
$

The $pattern in the awk command is not protected by quotes, so the shell expands it before awk sees the program. A pattern by itself checks, in the usual way, whether the whole line ($0) matches. FILENAME contains the current filename. awk has no explicit concatenation operator; two adjacent strings are simply concatenated. $0 expands to the original unchanged input line.

There are alternate ways of writing this. This shell script accesses the environment directly from within awk:

$ cat grepinawk
pattern=$1
export pattern
shift
awk '$0 ~ ENVIRON["pattern"] { print FILENAME ":" $0 }' $*
$

This is a shell script that uses ENVIRON, an array introduced in a newer version of the One True awk after the book was published. The subscript of ENVIRON is the name of an environment variable; its result is the variable's value. This is like the getenv function in various standard libraries and POSIX. The shell script makes an environment variable pattern containing the first argument, then drops that argument and has awk look for the pattern in each file.

~ checks to see if its left operand matches its right operand; !~ is its inverse. Note that a regular expression is just a string and can be stored in variables.

The next way uses command-line variable assignment, in which an argument to awk can be seen as an assignment to a variable:

$ cat grepinawk
pattern=$1
shift
awk '$0 ~ pattern { print FILENAME ":" $0 }' "pattern=$pattern" $*
$

Finally, this is written in pure awk, with no help from a shell and no need to know much about the implementation of the awk script (as the command-line variable assignment version requires), but it is a bit lengthy:

BEGIN {
    pattern = ARGV[1]
    for (i = 1; i < ARGC; i++) # remove first argument
        ARGV[i] = ARGV[i + 1]
    ARGC--
    if (ARGC == 1) { # the pattern was the only thing, so force read from standard input (used by book)
        ARGC = 2
        ARGV[1] = "-"
    }
}
$0 ~ pattern { print FILENAME ":" $0 }

The BEGIN is necessary not only to extract the first argument, but also to prevent it from being interpreted as a filename after the BEGIN block ends. ARGC, the number of arguments, is always guaranteed to be ≥1, as ARGV[0] is the name of the command that executed the script, most often the string "awk". Also note that ARGV[ARGC] is the empty string, "". # initiates a comment that extends to the end of the line.

Note the if block. awk only checks to see if it should read from standard input before it runs the command. This means that

awk 'prog'

only works because the fact that there are no filenames is only checked before prog is run! If you explicitly set ARGC to 1 so that there are no arguments, awk will simply quit because it feels there are no more input files. Therefore, you need to explicitly say to read from standard input with the special filename -.
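
To inspect what awk places in ARGV, a BEGIN-only program can list the operands without opening any of them. In this sketch the operand names pattern, file1 and file2 are placeholders; no such files need to exist because a BEGIN-only program reads no input.

```shell
out=$(awk 'BEGIN {
    # ARGV[0] is the command name; start at 1 to list the operands only.
    for (i = 1; i < ARGC; i++)
        print i, ARGV[i]
}' pattern file1 file2)
printf '%s\n' "$out"
```

This prints 1 pattern, 2 file1 and 3 file2 on separate lines; the program text itself does not appear in ARGV.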

Self-contained AWK scripts

As with many other programming languages, a self-contained AWK script can be constructed using the so-called "shebang" syntax.

For example, a UNIX command called hello.awk that prints the string "Hello, world!" may be built by creating a file named hello.awk containing the following lines:

#!/usr/bin/awk -f
BEGIN { print "Hello, world!" }

The -f tells awk that the argument that follows is the file to read the awk program from; the system appends the script's path to the command line when the script is run.

Variables and Special Variables

Variables can be used in an awk program by referencing them. With the exception of function parameters (see User-Defined Functions), they are not explicitly declared. Function parameter names shall be local to the function; all other variable names shall be global. The same name shall not be used as both a function parameter name and as the name of a function or a special awk variable. The same name shall not be used both as a variable name with global scope and as the name of a function. The same name shall not be used within the same scope both as a scalar variable and as an array. Uninitialized variables, including scalar variables, array elements, and field variables, shall have an uninitialized value. An uninitialized value shall have both a numeric value of zero and a string value of the empty string. Evaluation of variables with an uninitialized value, to either string or numeric, shall be determined by the context in which they are used.

Field variables shall be designated by a '$' followed by a number or numerical expression. The effect of the field number expression evaluating to anything other than a non-negative integer is unspecified; uninitialized variables or string values need not be converted to numeric values in this context. New field variables can be created by assigning a value to them. References to nonexistent fields (that is, fields after $NF), shall evaluate to the uninitialized value. Such references shall not create new fields. However, assigning to a nonexistent field (for example, $(NF+2)=5) shall increase the value of NF; create any intervening fields with the uninitialized value; and cause the value of $0 to be recomputed, with the fields being separated by the value of OFS. Each field variable shall have a string value or an uninitialized value when created. Field variables shall have the uninitialized value when created from $0 using FS and the variable does not contain any characters. If appropriate, the field variable shall be considered a numeric string (see Expressions in awk).
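
The difference between reading and assigning a nonexistent field can be sketched as follows. OFS is set to a comma here only to make the recomputed $0 easy to see; the sample record is made up.

```shell
out=$(echo 'a b c' | awk 'BEGIN { OFS = "," } {
    if ($(NF + 1) == "")      # reading past $NF creates no field
        print "NF is still " NF
    $(NF + 2) = 5             # assigning creates an empty $4 and sets $5
    print "NF is now " NF
    print $0                  # $0 is rebuilt with OFS between fields
}')
printf '%s\n' "$out"
```

This prints NF is still 3, then NF is now 5, then a,b,c,,5.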

Implementations shall support the following other special variables that are set by awk:

ARGC
The number of elements in the ARGV array.
ARGV
An array of command line arguments, excluding options and the program argument, numbered from zero to ARGC-1.

The arguments in ARGV can be modified or added to; ARGC can be altered. As each input file ends, awk shall treat the next non-null element of ARGV, up to the current value of ARGC-1, inclusive, as the name of the next input file. Thus, setting an element of ARGV to null means that it shall not be treated as an input file. The name '-' indicates the standard input. If an argument matches the format of an assignment operand, this argument shall be treated as an assignment rather than a file argument.

CONVFMT
The printf format for converting numbers to strings (except for output statements, where OFMT is used); "%.6g" by default.
ENVIRON
An array representing the value of the environment, as described in the exec functions defined in the System Interfaces volume of IEEE Std 1003.1-2001. The indices of the array shall be strings consisting of the names of the environment variables, and the value of each array element shall be a string consisting of the value of that variable. If appropriate, the environment variable shall be considered a numeric string (see Expressions in awk); the array element shall also have its numeric value.

In all cases where the behavior of awk is affected by environment variables (including the environment of any commands that awk executes via the system function or via pipeline redirections with the print statement, the printf statement, or the getline function), the environment used shall be the environment at the time awk began executing; it is implementation-defined whether any modification of ENVIRON affects this environment.

FILENAME
A pathname of the current input file. Inside a BEGIN action the value is undefined. Inside an END action the value shall be the name of the last input file processed.
FNR
The ordinal number of the current record in the current file. Inside a BEGIN action the value shall be zero. Inside an END action the value shall be the number of the last record processed in the last file processed.
FS
Input field separator regular expression; a <space> by default.
NF
The number of fields in the current record. Inside a BEGIN action, the use of NF is undefined unless a getline function without a var argument is executed previously. Inside an END action, NF shall retain the value it had for the last record read, unless a subsequent, redirected, getline function without a var argument is performed prior to entering the END action.
NR
The ordinal number of the current record from the start of input. Inside a BEGIN action the value shall be zero. Inside an END action the value shall be the number of the last record processed.
OFMT
The printf format for converting numbers to strings in output statements (see Output Statements); "%.6g" by default. The result of the conversion is unspecified if the value of OFMT is not a floating-point format specification.
OFS
The print statement output field separation; <space> by default.
ORS
The print statement output record separator; a <newline> by default.
RLENGTH
The length of the string matched by the match function.
RS
The first character of the string value of RS shall be the input record separator; a <newline> by default. If RS contains more than one character, the results are unspecified. If RS is null, then records are separated by sequences consisting of a <newline> plus one or more blank lines, leading or trailing blank lines shall not result in empty records at the beginning or end of the input, and a <newline> shall always be a field separator, no matter what the value of FS is.
RSTART
The starting position of the string matched by the match function, numbering from 1. This shall always be equivalent to the return value of the match function.
SUBSEP
The subscript separator string for multi-dimensional arrays; the default value is implementation-defined.
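
As an illustration of the RS rules above, setting RS to the null string makes blank-line-separated paragraphs into records and makes a newline act as a field separator. The input below is made up for this sketch.

```shell
# Two paragraphs separated by blank lines.
out=$(printf 'a b\nc\n\n\nd\n' | awk 'BEGIN { RS = "" } { print NR ": " NF " field(s), last is " $NF }')
printf '%s\n' "$out"
```

The first record spans two lines and has three fields (a, b, c); the second has one.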

Regular Expressions

The awk utility shall make use of the extended regular expression notation (see the Base Definitions volume of IEEE Std 1003.1-2001, Section 9.4, Extended Regular Expressions) except that it shall allow the use of C-language conventions for escaping special characters within the EREs, as specified in the table in the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 5, File Format Notation ( '\\', '\a', '\b', '\f', '\n', '\r', '\t' , '\v' ) and the following table; these escape sequences shall be recognized both inside and outside bracket expressions. Note that records need not be separated by <newline>s and string constants can contain <newline>s, so even the "\n" sequence is valid in awk EREs. Using a slash character within an ERE requires the escaping shown in the following table.

Table: Escape Sequences in awk

Escape Sequence: \"
Description: Backslash quotation-mark
Meaning: Quotation-mark character

Escape Sequence: \/
Description: Backslash slash
Meaning: Slash character

Escape Sequence: \ddd
Description: A backslash character followed by the longest sequence of one, two, or three octal-digit characters (01234567). If all of the digits are 0 (that is, representation of the NUL character), the behavior is undefined.
Meaning: The character whose encoding is represented by the one, two, or three-digit octal integer. Multi-byte characters require multiple, concatenated escape sequences of this type, including the leading '\' for each byte.

Escape Sequence: \c
Description: A backslash character followed by any character not described in this table or in the table in the Base Definitions volume of IEEE Std 1003.1-2001, Chapter 5, File Format Notation ( '\\', '\a', '\b', '\f', '\n', '\r', '\t', '\v' ).
Meaning: Undefined

A regular expression can be matched against a specific field or string by using one of the two regular expression matching operators, '~' and "!~". These operators shall interpret their right-hand operand as a regular expression and their left-hand operand as a string. If the regular expression matches the string, the '~' expression shall evaluate to a value of 1, and the "!~" expression shall evaluate to a value of 0. (The regular expression matching operation is as defined by the term matched in the Base Definitions volume of IEEE Std 1003.1-2001, Section 9.1, Regular Expression Definitions, where a match occurs on any part of the string unless the regular expression is limited with the circumflex or dollar sign special characters.) If the regular expression does not match the string, the '~' expression shall evaluate to a value of 0, and the "!~" expression shall evaluate to a value of 1. If the right-hand operand is any expression other than the lexical token ERE, the string value of the expression shall be interpreted as an extended regular expression, including the escape conventions described above. Note that these same escape conventions shall also be applied in determining the value of a string literal (the lexical token STRING), and thus shall be applied a second time when a string literal is used in this context.
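
The matching operators and the double application of escapes to string literals can be sketched like this. The test strings and the variable re are made up; the string form needs two backslashes because the first round of escape processing happens when the string literal is read.

```shell
out=$(awk 'BEGIN {
    print ("a.b" ~ /a\.b/)      # ERE token: the escaped dot matches a literal dot
    print ("axb" ~ /a\.b/)      # no literal dot in the string
    print ("axb" !~ /a\.b/)     # the inverse operator
    re = "a\\.b"                # string literal: one round of escapes leaves a\.b
    print ("axb" ~ re)          # still requires a literal dot
    print ("xa.bx" ~ re)        # unanchored: a match anywhere in the string counts
}')
printf '%s\n' "$out"
```

The five results are 1, 0, 1, 0 and 1.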

When an ERE token appears as an expression in any context other than as the right-hand of the '~' or "!~" operator or as one of the built-in function arguments described below, the value of the resulting expression shall be the equivalent of:

$0 ~ /ere/

The ere argument to the gsub, match, sub functions, and the fs argument to the split function (see String Functions) shall be interpreted as extended regular expressions. These can be either ERE tokens or arbitrary expressions, and shall be interpreted in the same manner as the right-hand side of the '~' or "!~" operator.

An extended regular expression can be used to separate fields by using the -F ERE option or by assigning a string containing the expression to the built-in variable FS. The default value of the FS variable shall be a single <space>. The following describes FS behavior:

  1. If FS is a null string, the behavior is unspecified.

  2. If FS is a single character:

    1. If FS is <space>, skip leading and trailing <blank>s; fields shall be delimited by sets of one or more <blank>s.

    2. Otherwise, if FS is any other character c, fields shall be delimited by each single occurrence of c.

  3. Otherwise, the string value of FS shall be considered to be an extended regular expression. Each occurrence of a sequence matching the extended regular expression shall delimit fields.
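
The three FS cases that have specified behavior can be compared side by side. The inputs and the variable names blanks, single and regex are illustrative.

```shell
# FS is a single space (the default): leading and trailing blanks are
# skipped and runs of blanks (spaces or tabs) delimit fields.
blanks=$(printf '  a \t b  \n' | awk '{ print NF }')

# FS is any other single character: every occurrence delimits a field,
# so the empty string between the two colons is a field of its own.
single=$(echo 'a::b' | awk -F: '{ print NF }')

# FS is longer than one character: it is treated as an ERE.
regex=$(echo 'a1b22c' | awk -F'[0-9]+' '{ print $1 $2 $3 }')

printf '%s %s %s\n' "$blanks" "$single" "$regex"
```

This prints 2 3 abc.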

Except for the '~' and "!~" operators, and in the gsub, match, split, and sub built-in functions, ERE matching shall be based on input records; that is, record separator characters (the first character of the value of the variable RS, <newline> by default) cannot be embedded in the expression, and no expression shall match the record separator character. If the record separator is not <newline>, <newline>s embedded in the expression can be matched. For the '~' and "!~" operators, and in those four built-in functions, ERE matching shall be based on text strings; that is, any character (including <newline> and the record separator) can be embedded in the pattern, and an appropriate pattern shall match any character. However, in all awk ERE matching, the use of one or more NUL characters in the pattern, input record, or text string produces undefined results.

Patterns

A pattern is any valid expression, a range specified by two expressions separated by a comma, or one of the two special patterns BEGIN or END.

Special Patterns

The awk utility shall recognize two special patterns, BEGIN and END. Each BEGIN pattern shall be matched once and its associated action executed before the first record of input is read (except possibly by use of the getline function, see Input/Output and General Functions, in a prior BEGIN action) and before command line assignment is done. Each END pattern shall be matched once and its associated action executed after the last record of input has been read. These two patterns shall have associated actions.

BEGIN and END shall not combine with other patterns. Multiple BEGIN and END patterns shall be allowed. The actions associated with the BEGIN patterns shall be executed in the order specified in the program, as are the END actions. An END pattern can precede a BEGIN pattern in a program.

If an awk program consists of only actions with the pattern BEGIN, and the BEGIN action contains no getline function, awk shall exit without reading its input when the last statement in the last BEGIN action is executed. If an awk program consists of only actions with the pattern END or only actions with the patterns BEGIN and END, the input shall be read before the statements in the END actions are executed.

Expression Patterns

An expression pattern shall be evaluated as if it were an expression in a Boolean context. If the result is true, the pattern shall be considered to match, and the associated action (if any) shall be executed. If the result is false, the action shall not be executed.

Pattern Ranges

A pattern range consists of two expressions separated by a comma; in this case, the action shall be performed for all records between a match of the first expression and the following match of the second expression, inclusive. At this point, the pattern range can be repeated starting at input records subsequent to the end of the matched range.

Actions

An action is a sequence of statements as shown in the grammar in Grammar. Any single statement can be replaced by a statement list enclosed in braces. The application shall ensure that statements in a statement list are separated by <newline>s or semicolons. Statements in a statement list shall be executed sequentially in the order that they appear.

The expression acting as the conditional in an if statement shall be evaluated and if it is non-zero or non-null, the following statement shall be executed; otherwise, if else is present, the statement following the else shall be executed.

The if, while, do... while, for, break, and continue statements are based on the ISO C standard (see Concepts Derived from the ISO C Standard), except that the Boolean expressions shall be treated as described in Expressions in awk, and except in the case of:

for (variable in array)

which shall iterate, assigning each index of array to variable in an unspecified order. The results of adding new elements to array within such a for loop are undefined. If a break or continue statement occurs outside of a loop, the behavior is undefined.

The delete statement shall remove an individual array element. Thus, the following code deletes an entire array:

for (i in array)      # i, not index: index is the name of a built-in function
    delete array[i]

The next statement shall cause all further processing of the current input record to be abandoned. The behavior is undefined if a next statement appears or is invoked in a BEGIN or END action.

The exit statement shall invoke all END actions in the order in which they occur in the program source and then terminate the program without reading further input. An exit statement inside an END action shall terminate the program without further execution of END actions. If an expression is specified in an exit statement, its numeric value shall be the exit status of awk, unless subsequent errors are encountered or a subsequent exit statement with an expression is executed.
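
The interaction between exit and END can be observed from the shell. In this sketch the exit value 3 and the input are arbitrary; the || status=$? form is used only to capture awk's exit status in a variable.

```shell
status=0
# exit in a main action stops reading input, runs END, then sets the status.
out=$(printf 'a\nb\nc\n' | awk 'NR == 2 { exit 3 } END { print "seen", NR }') || status=$?
printf '%s (exit status %s)\n' "$out" "$status"
```

This prints seen 2 (exit status 3): the third record is never read, but the END action still runs.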

String Functions

The string functions in the following list shall be supported. Although the grammar (see Grammar) permits built-in functions to appear with no arguments or parentheses, unless the argument or parentheses are indicated as optional in the following list (by displaying them within the "[]" brackets), such use is undefined.

gsub(ere, repl[, in])
Behave like sub (see below), except that it shall replace all occurrences of the regular expression (like the ed utility global substitute) in $0 or in the in argument, when specified.
index(s, t)
Return the position, in characters, numbering from 1, in string s where string t first occurs, or zero if it does not occur at all.
length[([s])]
Return the length, in characters, of its argument taken as a string, or of the whole record, $0, if there is no argument.
match(s, ere)
Return the position, in characters, numbering from 1, in string s where the extended regular expression ere occurs, or zero if it does not occur at all. RSTART shall be set to the starting position (which is the same as the returned value), zero if no match is found; RLENGTH shall be set to the length of the matched string, -1 if no match is found.
split(s, a[, fs])
Split the string s into array elements a[1], a[2], ..., a[n], and return n. All elements of the array shall be deleted before the split is performed. The separation shall be done with the ERE fs or with the field separator FS if fs is not given. Each array element shall have a string value when created and, if appropriate, the array element shall be considered a numeric string (see Expressions in awk). The effect of a null string as the value of fs is unspecified.
sprintf(fmt, expr, expr, ...)
Format the expressions according to the printf format given by fmt and return the resulting string.
sub(ere, repl[, in])
Substitute the string repl in place of the first instance of the extended regular expression ERE in string in and return the number of substitutions. An ampersand ( '&' ) appearing in the string repl shall be replaced by the string from in that matches the ERE. An ampersand preceded with a backslash ( '\' ) shall be interpreted as the literal ampersand character. An occurrence of two consecutive backslashes shall be interpreted as just a single literal backslash character. Any other occurrence of a backslash (for example, preceding any other character) shall be treated as a literal backslash character. Note that if repl is a string literal (the lexical token STRING; see Grammar), the handling of the ampersand character occurs after any lexical processing, including any lexical backslash escape sequence processing. If in is specified and it is not an lvalue (see Expressions in awk), the behavior is undefined. If in is omitted, awk shall use the current record ($0) in its place.
substr(s, m[, n])
Return the at most n-character substring of s that begins at position m, numbering from 1. If n is omitted, or if n specifies more characters than are left in the string, the length of the substring shall be limited by the length of the string s.
tolower(s)
Return a string based on the string s. Each character in s that is an uppercase letter specified to have a tolower mapping by the LC_CTYPE category of the current locale shall be replaced in the returned string by the lowercase letter specified by the mapping. Other characters in s shall be unchanged in the returned string.
toupper(s)
Return a string based on the string s. Each character in s that is a lowercase letter specified to have a toupper mapping by the LC_CTYPE category of the current locale is replaced in the returned string by the uppercase letter specified by the mapping. Other characters in s are unchanged in the returned string.

All of the preceding functions that take ERE as a parameter expect a pattern or a string valued expression that is a regular expression as defined in Regular Expressions.
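
A minimal sketch of the point above: the ERE argument does not have to be a /regexp/ literal, it can be any string-valued expression. The variable name re and the sample data are assumptions made for the example.

```shell
out=$(awk 'BEGIN {
    s = "foo123bar"
    re = "[0-9]+"        # the pattern is held in an ordinary string variable
    sub(re, "<&>", s)    # the string value is used as the ERE
    print s
}')
printf '%s\n' "$out"
```

This should print "foo<123>bar". Building the pattern as a string is useful when it comes from a command-line variable or from the input itself.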

Arithmetic Functions

The arithmetic functions, except for int, shall be based on the ISO C standard (see Concepts Derived from the ISO C Standard). The behavior is undefined in cases where the ISO C standard specifies that an error be returned or that the behavior is undefined. Although the grammar (see Grammar) permits built-in functions to appear with no arguments or parentheses, unless the argument or parentheses are indicated as optional in the following list (by displaying them within the "[]" brackets), such use is undefined.

atan2(y,x)
Return arctangent of y/x in radians in the range [-π, π].
cos(x)
Return cosine of x, where x is in radians.
sin(x)
Return sine of x, where x is in radians.
exp(x)
Return the exponential function of x.
log(x)
Return the natural logarithm of x.
sqrt(x)
Return the square root of x.
int(x)
Return the argument truncated to an integer. Truncation shall be toward 0 when x>0.
rand()
Return a random number n, such that 0<=n<1.
srand([expr])
Set the seed value for rand to expr or use the time of day if expr is omitted. The previous seed value shall be returned.
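
The arithmetic functions listed above can be tried out with a short sketch like the one below. The seed value 7 and the variable names are arbitrary choices for the example. The actual values returned by rand depend on the implementation, so the sketch only checks that the same seed reproduces the same value and that the value lies in the documented range.

```shell
out=$(awk 'BEGIN {
    printf "%.4f\n", atan2(0, -1)    # pi, the upper end of the atan2 range
    printf "%.4f\n", exp(1)          # e
    printf "%.4f\n", log(exp(2))     # log undoes exp
    print sqrt(16), int(3.7)         # int truncates toward 0 for positive x

    srand(7); a = rand()
    srand(7); b = rand()             # reseeding with the same value
    same = (a == b)
    inrange = (a >= 0 && a < 1)
    print same, inrange
}')
printf '%s\n' "$out"
```

Calling srand with no argument seeds from the time of day, which is the usual choice when different results are wanted on each run.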

Input/Output and General Functions

The input/output and general functions are:

close(expression)
Close the file or pipe opened by a print or printf statement or a call to getline with the same string-valued expression. The limit on the number of open expression arguments is implementation-defined. If the close was successful, the function shall return zero; otherwise, it shall return non-zero.
expression | getline [var]
Read a record of input from a stream piped from the output of a command. The stream shall be created if no stream is currently open with the value of expression as its command name. The stream created shall be equivalent to one created by a call to the popen() function with the value of expression as the command argument and a value of r as the mode argument. As long as the stream remains open, subsequent calls in which expression evaluates to the same string value shall read subsequent records from the stream. The stream shall remain open until the close function is called with an expression that evaluates to the same string value. At that time, the stream shall be closed as if by a call to the pclose() function. If var is omitted, $0 and NF shall be set; otherwise, var shall be set and, if appropriate, it shall be considered a numeric string (see Expressions in awk).

The getline operator can form ambiguous constructs when there are unparenthesized operators (including concatenate) to the left of the '|' (to the beginning of the expression containing getline). In the context of the '$' operator, '|' shall behave as if it had a lower precedence than '$'. The result of evaluating other operators is unspecified, and conforming applications shall parenthesize properly all such usages.
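
The following sketch shows the piped form of getline together with close. The commands ("echo one; echo two" and "echo three") and the variable names are made up for the example. Each getline is parenthesized as the note above recommends, and so is the concatenation to the left of the '|'.

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two"         # example command, run through the shell
    while ((cmd | getline line) > 0)   # one record per call until end of input
        n++
    close(cmd)                         # closes the stream opened for this string

    cmd | getline first                # after close, the same string starts the command again
    close(cmd)

    word = "three"
    ("echo " word) | getline x         # parenthesized concatenation left of the pipe
    close("echo " word)                # close needs the same string value

    print n, first, x
}')
printf '%s\n' "$out"
```

This should print "2 one three". Without the first close, the second read would continue on the already exhausted stream instead of running the command again.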

getline
Set $0 to the next input record from the current input file. This form of getline shall set the NF, NR, and FNR variables.
getline var
Set variable var to the next input record from the current input file and, if appropriate, var shall be considered a numeric string (see Expressions in awk). This form of getline shall set the FNR and NR variables.
getline [var] < expression
Read the next record of input from a named file. The expression shall be evaluated to produce a string that is used as a pathname. If the file of that name is not currently open, it shall be opened. As long as the stream remains open, subsequent calls in which expression evaluates to the same string value shall read subsequent records from the file. The file shall remain open until the close function is called with an expression that evaluates to the same string value. If var is omitted, $0 and NF shall be set; otherwise, var shall be set and, if appropriate, it shall be considered a numeric string (see Expressions in awk).

The getline operator can form ambiguous constructs when there are unparenthesized binary operators (including concatenate) to the right of the '<' (up to the end of the expression containing the getline). The result of evaluating such a construct is unspecified, and conforming applications shall parenthesize properly all such usages.
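
As a sketch of the difference between the getline forms above, the example below reads a side file with getline var < expression and pairs up records of the main input with getline var. The temporary file, its contents and the variable names are assumptions for the example. It also assumes the mktemp utility is available, which is common but not guaranteed.

```shell
tmp=$(mktemp)
printf 'x\ny\n' > "$tmp"

out=$(printf 'a\nb\nc\nd\n' | awk -v file="$tmp" '
NR == 1 {
    while ((getline line < file) > 0)   # reads the named file; NR is not updated
        side = side line
    close(file)
}
{
    first = $0
    if ((getline second) > 0)           # next record of the main input; NR is updated
        print first "-" second, NR
}
END { print side }')

rm -f "$tmp"
printf '%s\n' "$out"
```

This should print "a-b 2", "c-d 4" and "xy". The NR values show that reading from the named file leaves the record count of the main input alone, while getline var advances it.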

system(expression)
Execute the command given by expression in a manner equivalent to the system() function defined in the System Interfaces volume of IEEE Std 1003.1-2001 and return the exit status of the command.
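
A minimal sketch of system, using two trivial example commands ("true" and "exit 3") chosen for illustration. How a non-zero status is reported can differ in detail between awk implementations, so the example only distinguishes zero from non-zero.

```shell
out=$(awk 'BEGIN {
    ok = system("true")         # command succeeds, status 0
    bad = system("exit 3")      # command fails, status is non-zero
    failed = (bad != 0)
    print ok, failed
}')
printf '%s\n' "$out"
```

This should print "0 1". Unlike the piped form of getline, system does not make the command's output available to the awk program; only the status is returned.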