Taking Notes from Effective awk Programming

An awk program looks like this:

pattern { action }

pattern { action }

 

In an awk rule, either the pattern or the action can be omitted, but not both. If the pattern is omitted, the action is performed for every input line. If the action is omitted, the default action is to print all lines that match the pattern.

 

Running awk Programs

awk 'program' input-file1 input-file2

awk -f program-file input-file1 input-file2

 

Comments in awk Programs

A comment starts with # and continues to the end of the line.

 

Samples:

awk "BEGIN { print \"Don't Panic!\" }"

awk '/foo/ { print $0 }' BBS-list

The slashes (/) around foo indicate that foo is the pattern to search for.

 

awk '{ if (length($0) > max) max = length($0) } END { print max }' data

awk 'length($0) > 80' data

awk '{ if (x < length()) x = length() } END { print "maximum line length is " x }' data

awk 'NF > 0' data

Print the total size, in kilobytes, of the files listed by ls -l:

ls -l files | awk '{ x += $5 } END { print "total K-bytes: " (x + 1023)/1024 }'

awk -F: '{ print $1 }' /etc/passwd | sort

Count the lines in a file:

awk 'END { print NR }' data

awk 'NR % 2 == 0' data

 

awk keeps track of the number of records that have been read from the current input file. This value is stored in a built-in variable called FNR. It is reset to zero when a new file is started. Another built-in variable, NR, is the total number of input records read so far from all datafiles. It starts at zero, but is never automatically reset to zero.

A different character can be used for the record separator by assigning the character to the built-in variable RS.

 

awk 'BEGIN { RS = "/" } { print $0 }' BBS-list

awk '{ print $0 }' RS="/" BBS-list

 

The value of the built-in variable NF is the number of fields in the current record.

After the end of the record has been determined, gawk sets the variable RT to the text in the input that matched RS.

 

When RS is a single character, RT contains the same single character. However, when RS is a regular expression, RT contains the actual input text that matched the regular expression.

 

Examining Fields

A dollar sign ($) is used to refer to a field: $0 means the whole current input record, $1 refers to the first field, $2 to the second, and so on.

NF is a built-in variable whose value is the number of fields in the current record, and the last field in a record can be represented by $NF.

awk automatically updates the value of NF each time it reads a record.

 

awk '$1 ~ /foo/ { print $0 }' BBS-list

 

Changing the Contents of a Field

The contents of a field can be changed. This changes what awk perceives as the current input record. (The actual input is untouched; awk never modifies the input file.)

 

awk '{ nboxes = $3 ; $3 = $3 - 10; print nboxes, $3 }' inventory-shipped

 

When the value of a field is changed, the text of the input record is recalculated to contain the new field where the old one was. In other words, $0 changes to reflect the altered field.

 

OFS: the output field separator

Making an assignment to an existing field changes the value of $0 but does not change the value of NF, even when you assign the empty string to a field.

echo a b c d | awk '{ OFS = ":"; $2 = ""; print $0; print NF }'

a::c:d

4

 

Decrementing NF throws away the values of the fields after the new value of NF and recomputes $0.

 

The built-in variable FS (field separator) specifies how fields are separated.

Using Regular Expressions to Separate Fields

With the default FS (a single space), leading and trailing whitespace is stripped before a record is split into fields. This stripping also comes into play whenever $0 is recomputed.
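A quick way to watch this happen, as a sketch assuming any POSIX awk with the default FS (the wrapper name rebuild is only illustrative): assigning a field to itself forces awk to rebuild $0 from the fields joined with OFS, so the leading blank and the extra spaces disappear.

```shell
# rebuild: illustrative wrapper name. Prints each record twice: as read,
# and after "$2 = $2" forces awk to rebuild $0 from the fields joined by OFS.
rebuild() {
    awk '{ print; $2 = $2; print }'
}

echo ' a   b c d' | rebuild
# prints " a   b c d", then "a b c d"
```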

 

Making Each Character a Separate Field (only for gawk)

echo a b | gawk 'BEGIN { FS = "" } {print $0} END {print NF}'

 

Setting FS from the Command Line

awk -F, 'program' input-files      # sets FS to ","

awk -F\\\\ '...' files ...         # same as FS = "\\"

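If you type -F\t at the shell without quotes, the shell removes the backslash, so awk sees -Ft and assumes that you want fields separated by tabs, not by the letter t.

Use -v FS="t" or -F"[t]" on the command line if you really do want to separate your fields with the letter t.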

awk -F: '$2 == "!"' /etc/passwd

 

Field-Splitting Summary

FS == " "

Fields are separated by runs of whitespace. Leading and trailing whitespace are ignored. This is the default.

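Changing FS inside a rule affects only records read after the change, not the current one. In the following pipeline the first line has already been split with the default FS by the time FS is set, so $1 is the first whitespace-separated word of the line (typically the whole line), not the user name:

sed 1q /etc/passwd | awk '{ FS = ":" ; print $1 }'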
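Because a new FS value only applies to records read after the assignment, the usual fix is to set FS before any input is read, either in a BEGIN rule or with -F:. A minimal sketch, assuming any POSIX awk (the sample passwd-style line and the helper name first_field are made up):

```shell
# first_field: illustrative helper. FS is set in BEGIN, so even the first
# record is split on colons (awk -F: '{ print $1 }' is equivalent).
first_field() {
    awk 'BEGIN { FS = ":" } { print $1 }'
}

printf 'root:x:0:0:root:/root:/bin/sh\n' | first_field   # prints: root
```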

 

Reading Fixed-Width Data

Use the built-in variable FIELDWIDTHS to specify fixed-width fields.

BEGIN { FIELDWIDTHS = "9 6 10 6 7 7 35" }

 

Multiple-Line Records

Another technique is to have blank lines separate records.

An empty string as the value of RS indicates that records are separated by one or more blank lines.

RS == "\n"

Records are separated by the newline character (\n). In effect, every line in the datafile is a separate record, including blank lines. This is the default.

 

RS == any single character

Records are separated by each occurrence of the character. Multiple successive occurrences delimit empty records.

 

RS == ""

Records are separated by runs of blank lines. The newline character always serves as a field separator, in addition to whatever value FS may have. Leading and trailing newlines in a file are ignored.
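A sketch of this "paragraph mode", assuming any POSIX awk (the address data and the helper name addresses are invented): with RS = "" each blank-line-separated block is one record, and setting FS to "\n" makes every line of the block its own field.

```shell
# addresses: illustrative helper. One record per block of lines; with
# FS = "\n" each line of the block is one field. The two blank lines
# between the blocks still count as a single separator.
addresses() {
    awk 'BEGIN { RS = ""; FS = "\n" } { print NR ": " $1 " | " $2 }'
}

printf 'Jane Doe\n123 Main St\n\n\nJohn Smith\n456 Oak Ave\n' | addresses
# prints "1: Jane Doe | 123 Main St", then "2: John Smith | 456 Oak Ave"
```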

 

Explicit Input with getline

The awk language has a special built-in command called getline that can be used to read input under your explicit control.

 

Using getline with No Arguments

All it does in this case is read the next input record and split it up into fields. This is useful if you've finished processing the current record, but want to do some special processing on the next record right now.

 

Using getline into a Variable

The following example swaps every two lines of input:

{
    if ((getline tmp) > 0) {
        print tmp
        print $0
    } else
        print $0
}
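The swap program can be tried from the shell by wrapping it in a function; a sketch assuming any POSIX awk, with swap_pairs as a made-up name. getline tmp sets tmp (and NR) but leaves $0 alone, which is why both lines are still available to print.

```shell
# swap_pairs: illustrative wrapper around the swap program.
# A leftover odd line at the end is printed unchanged.
swap_pairs() {
    awk '{
        if ((getline tmp) > 0) {
            print tmp
            print $0
        } else
            print $0
    }'
}

printf '1\n2\n3\n4\n5\n' | swap_pairs
# prints 2, 1, 4, 3, 5 (one per line)
```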

 

Using getline from a File

Use getline < file to read the next record from file.

 

Using getline into a Variable from a File

Use getline var < file to read input from the file, and put it in the variable var.

 

Using getline from a Pipe

Use command | getline. The string command is run as a shell command and its output is piped into awk to be used as input. This form of getline reads one record at a time from the pipe.

 

Using getline into a Variable from a Pipe

When you use command | getline var, the output of command is sent through a pipe to getline and into the variable var.

"date" | getline current_time

 

A common mistake in using the print statement is to omit the comma between two items. This often has the effect of making the items run together in the output, with no space. The reason for this is that juxtaposing two string expressions in awk means to concatenate them.

 

Output Separator

Set the built-in variable OFS, the output field separator. Its initial value is the string " " (a single space).

The output record separator is ORS. Its initial value is the string "\n" (a newline).

 

awk 'BEGIN { OFS = ";"; ORS = "\n\n" } { print $1, $2 }' BBS-list
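A small sketch of both separators and of the missing-comma pitfall, assuming any POSIX awk (separators is an illustrative name): OFS goes where print has a comma, ORS goes after every print, and items written side by side are concatenated with nothing in between. ORS is set to "|" here only so that the whole output fits on one line.

```shell
# separators: illustrative name. "print $1, $2" uses OFS between the items;
# "print $1 $2" concatenates them. Every print ends with ORS.
separators() {
    awk 'BEGIN { OFS = ";"; ORS = "|" } { print $1, $2; print $1 $2 }'
}

printf 'a b c\nd e f\n' | separators   # prints: a;b|ab|d;e|de|
```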

 

Redirecting Output of print and printf

print items > output-file

When this type of redirection is used, the output-file is erased before the first output is written to it. Subsequent writes to the same output-file do not erase output-file, but append to it. (This is different from how you use redirections in shell scripts.)

 

print items >> output-file

print items | command
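A sketch of the first and last forms, assuming any POSIX awk plus mktemp(1) (the helper name write_lines and the temporary file are illustrative): the output file is emptied only once, when awk first opens it, and the sort command is started once and receives every line.

```shell
# write_lines: illustrative helper; copies stdin to the file named in $1.
# The file is emptied when awk first opens it; later prints append to it.
write_lines() {
    awk -v f="$1" '{ print $0 > f }'
}

tmp_file=$(mktemp)
printf 'b\na\nc\n' | write_lines "$tmp_file"
cat "$tmp_file"                                  # b, a, c: all three lines kept
rm -f "$tmp_file"

# A single sort process receives all the lines; it prints them once awk
# closes the pipe at exit.
printf 'b\na\nc\n' | awk '{ print | "sort" }'    # a, b, c
```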

 

Using printf Statements for Fancier Printing

awk '{ printf "%-10s %s\n", $1, $2 }' BBS-list

 

Piping into sh

{ printf("mv %s %s\n", $0, tolower($0)) | "sh" }

 

Special Filenames in gawk

Special Files for Standard Descriptors: (only for gawk)

/dev/stdin

/dev/stdout

/dev/stderr

/dev/fd/N

The filenames /dev/stdin, /dev/stdout, and /dev/stderr are aliases for /dev/fd/0, /dev/fd/1, and /dev/fd/2, respectively.

 

Special Files for Network Communications

Starting with gawk 3.1, awk programs can open a two-way TCP/IP connection, acting as either a client or a server. This is done using a special filename of the form:

/inet/protocol/local-port/remote-host/remote-port

The protocol is one of tcp, udp, or raw, and the other fields represent the other essential pieces of information for making a networking connection. These filenames are used with the |& operator for communicating with a coprocess.

 

Closing Input and Output Redirections

If the same filename or the same shell command is used with getline more than once, the file is opened (or the command is executed) only the first time, and the first record of input is read from that file or command. The next time the same file or command is used with getline, another record is read from it, and so on.

Similarly, when a file or pipe is opened for output, the filename or command associated with it is remembered by awk, and subsequent writes to the same file or command are appended to the previous writes. The file or pipe stays open until awk exits.

This implies that special steps are necessary in order to read the same file again from the beginning, or to rerun a shell command (rather than reading more output from the same command). The close function makes these things possible:

close(filename)

close(command)

For example, if you read from a command with "sort -r names" | getline foo, then close it with this:

close("sort -r names")

 

sortcom = "sort -r names"

sortcom | getline foo

...

close(sortcom)

 

Always close files and pipes when you are done with them.
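A sketch of why close matters when rerunning a command, assuming any POSIX awk with /bin/sh (rerun_demo is a made-up name, and the echo commands stand in for a real command such as sort -r names): two reads without close continue the same output, and a read after close starts the command again.

```shell
# rerun_demo: illustrative name; "echo one; echo two" is a stand-in command.
rerun_demo() {
    awk 'BEGIN {
        cmd = "echo one; echo two"
        cmd | getline x      # starts the command, reads its first line
        cmd | getline y      # same running command, next line
        close(cmd)
        cmd | getline z      # command is run again from the start
        close(cmd)
        print x, y, z
    }'
}

rerun_demo   # prints: one two one
```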

 

Assigning Variables on the Command Line

awk '{ print $n }' n=4 inventory-shipped n=2 BBS-list
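A runnable version of that idea, assuming any POSIX awk and mktemp(1); the two throwaway files stand in for inventory-shipped and BBS-list. Each n=... argument takes effect when awk reaches it in the argument list, so the two files are printed with different values of n.

```shell
# Throwaway stand-in files created just for this example.
dir=$(mktemp -d)
printf 'a b c d\n' > "$dir/first"
printf 'w x y z\n' > "$dir/second"

# n=4 applies while reading first, n=2 while reading second.
awk '{ print $n }' n=4 "$dir/first" n=2 "$dir/second"   # prints d, then x
```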

 

-v variable=text

The variable is set at the very beginning, even before the BEGIN rules are run. The -v option and its assignment must precede all the filename arguments, as well as the program text.

 

The exact manner in which numbers are converted into strings is controlled by the awk built-in variable CONVFMT.

CONVFMT's default value is "%.6g", which prints a value with at most six significant digits.

 

As a special case, if a number is an integer, then the result of converting it to a string is always an integer, no matter what the value of CONVFMT may be.
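A sketch of both rules, assuming any POSIX awk (convfmt_demo is an illustrative name): concatenating a number with "" converts it to a string using CONVFMT, while an integral value is converted as an integer whatever CONVFMT says. pi is computed with atan2(0, -1) each time so that every conversion starts from a fresh numeric value.

```shell
# convfmt_demo: illustrative name.
convfmt_demo() {
    awk 'BEGIN {
        a = atan2(0, -1) ""      # pi with the default CONVFMT "%.6g"
        CONVFMT = "%.2g"
        b = atan2(0, -1) ""      # same value with CONVFMT "%.2g"
        c = 12345 ""             # integral value: 12345, not 1.2e+04
        print a, b, c
    }'
}

convfmt_demo   # prints: 3.14159 3.1 12345
```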

 

String Concatenation

Concatenation is performed by writing expressions next to one another, with no operator.

awk '{ print "Field number one: " $1 }' BBS-list

 

Conditional Expressions

selector ? if-true-exp : if-false-exp

 

Function Calls

awk '{ print "The square root of", $1, "is", sqrt($1) }'

 

Patterns, Actions, and Variables

 

Pattern Elements

Regular Expressions as Patterns

/regular expression/        equivalent to $0 ~ /regular expression/

Expressions as Patterns

awk '$1 == "foo" { print $2 }' BBS-list

awk '$1 ~ /foo/ { print $2 }' BBS-list

awk '/2400/ && /foo/' BBS-list

awk '/2400/ || /foo/' BBS-list

awk '! /foo/' BBS-list

pat1, pat2

BEGIN

END

 

Specifying Record Ranges with Patterns

A range pattern is made of two patterns separated by a comma, in the form begpat, endpat. It is used to match ranges of consecutive input records. The first pattern, begpat, controls where the range begins, while endpat controls where the pattern ends.

awk '$1 == "on", $1 == "off"' myfile

It prints every record in myfile between on/off pairs, inclusive.

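Range patterns do not combine with other patterns. Because the comma has the lowest precedence of all the operators, /1/,/2/ || /Yes/ is parsed as /1/, (/2/ || /Yes/), not as (/1/,/2/) || /Yes/.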
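A sketch of the on/off range in action, assuming any POSIX awk (on_off is an illustrative name and the input is invented): each range starts at a record matching the first pattern and ends at the next record matching the second; a range still open at end of input simply runs to the end.

```shell
# on_off: illustrative wrapper around the range pattern.
on_off() {
    awk '$1 == "on", $1 == "off"'
}

printf 'a\non\nb\noff\nc\non\nd\n' | on_off
# prints on, b, off, on, d (the second range never sees "off")
```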

 

The BEGIN and END Special Patterns

Startup and cleanup actions

awk ' BEGIN { print "Analysis of \"foo\"" }

 /foo/ { ++n }

 END { print "\"foo\" appears", n, "times." }' BBS-list

 

BEGIN rules are executed before any input is read, so in a BEGIN rule there simply is no input record, and therefore no fields.

 

The POSIX standard specifies that NF is available in an END rule. It contains the number of fields from the last input record.

 

The Empty Pattern

An empty (i.e., nonexistent) pattern is considered to match every input record.

awk '{ print $1 }' BBS-list

 

Using Shell Variables in Programs

The most common method is to use shell quoting to substitute the variable's value into the program inside the script.

 

awk "/$pattern/ "'{ nmatches++ }

END { print nmatches, "found" }' /path/to/data

 

A better method is to use awk's variable assignment feature to assign the shell variable's value to an awk variable.

 

read pattern

awk -v pat="$pattern" '$0 ~ pat { nmatches++ }

END { print nmatches, "found" }' /path/to/data
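The same approach wrapped in a reusable function, as a sketch assuming any POSIX awk (count_matches and the fruit data are made up). The pattern is passed with -v and used as a dynamic regexp; printing n + 0 makes the count 0 rather than an empty string when nothing matches.

```shell
# count_matches: illustrative helper; $1 is the pattern (a dynamic regexp).
count_matches() {
    awk -v pat="$1" '$0 ~ pat { n++ } END { print n + 0, "found" }'
}

printf 'apple\nbanana\ncherry\n' | count_matches 'an'   # prints: 1 found
printf 'apple\nbanana\ncherry\n' | count_matches 'zz'   # prints: 0 found
```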

 

Actions

An action consists of one or more awk statements, enclosed in curly braces ({}).

An omitted action is equivalent to { print $0 }

/foo/  { } # match foo, do nothing -- empty action

/foo/  # match foo, print the record -- omitted action

 

Input statements

getline, next, nextfile

 

Control Statements in Actions

 

The next Statement

The next statement forces awk to immediately stop processing the current record and go on to the next record. This means that no further rules are executed for the current record, and the rest of the current rule's action isn't executed.

 

Contrast this with the effect of the getline function. That also causes awk to read the next record immediately, but it does not alter the flow of control in any way (i.e., the rest of the current action executes with a new input record).

 

The next statement is analogous to a continue statement in a loop: awk skips the rest of the processing for the current record and starts over with the next record.
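A typical use of next, sketched for any POSIX awk (strip_comments is an illustrative name and the input is invented): comment lines are skipped before the second rule can see them, although NR still counts them.

```shell
# strip_comments: illustrative name. Records starting with # hit next,
# so the second rule never runs for them.
strip_comments() {
    awk '/^#/ { next } { print NR ": " $0 }'
}

printf '# header\nalpha\n# note\nbeta\n' | strip_comments
# prints "2: alpha", then "4: beta"
```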

 

Using gawk's nextfile Statement

Instead of abandoning processing of the current record, the nextfile statement instructs gawk to stop processing the current datafile.

 

Upon execution of the nextfile statement, FILENAME is updated to the name of the next datafile listed on the command line, FNR is reset to one, ARGIND is incremented, and processing starts over with the first rule in the program.

 

The exit Statement

When an exit statement is executed from a BEGIN rule, the program stops processing everything immediately. No input records are read. However, if an END rule is present, as part of executing the exit statement, the END rule is executed.

 

In such a case, if you don't want the END rule to do its job, set a variable to nonzero before the exit statement and check that variable in the END rule.
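A sketch of the flag technique, assuming any POSIX awk (count_records and the "header" convention are invented for illustration): exit in a main rule still runs END, so END checks the flag and exits again with the failure status instead of printing a misleading count.

```shell
# count_records: illustrative name. Expects the first line to be "header".
count_records() {
    awk 'NR == 1 && $0 != "header" { failed = 1; exit 1 }
         { n++ }
         END { if (failed) exit 1; print n }'
}

printf 'header\na\nb\n' | count_records                # prints: 3
printf 'oops\na\n' | count_records || echo "bad input (status $?)"
```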

 

Built-in Variables

Built-in Variables That Control awk

The variables that are specific to gawk are marked with a pound sign (#)

CONVFMT

This string controls conversion of numbers to strings. Its default value is "%.6g".

FIELDWIDTHS #

This is a space-separated list of columns that tells gawk how to split input with fixed columnar boundaries.

FS

This is the input field separator.

If the value is the null string (""), then each character in the record becomes a separate field. (#)

 

You can set the value of FS on the command line using the -F option:

awk -F, 'program' input-files

 

IGNORECASE #

If IGNORECASE is nonzero or non-null, then all string comparisons and all regular expression matching are case independent. Thus, regexp matching with ~ and !~, as well as the gensub, gsub, index, match, split, and sub functions, record termination with RS, and field splitting with FS, all ignore case when doing their particular regexp operations.

 

LINT #

When this variable is true (nonzero or non-null), gawk behaves as if the --lint command-line option is in effect.

OFMT

It controls conversion of numbers to strings for printing with the print statement.

OFS

This is the output field separator. It is output between the fields printed by a print statement. Its default value is " ", a string consisting of a single space.

ORS

This is the output record separator. It is output at the end of every print statement. Its default value is "\n", the newline character.

RS

This is awk's input record separator. Its default value is a string containing a single newline character, which means that an input record consists of a single line of text. It can also be the null string, in which case records are separated by runs of blank lines. If it is a regexp, records are separated by matches of the regexp in the input text.

 

Built-in Variables That Convey Information

ARGC, ARGV

The command-line arguments available to awk programs are stored in an array called ARGV. ARGC is the number of command-line arguments present.

The program text is not included in ARGV, nor are any of awk's command-line options.

ARGIND #

The index in ARGV of the current file being processed. Every time gawk opens a new datafile for processing, it sets ARGIND to the index in ARGV of the file name. When gawk is processing the input files, FILENAME == ARGV[ARGIND] is always true.

ENVIRON

An associative array that contains the values of the environment. The array indices are the environment variable names; the elements are the values of the particular environment variables. For example, ENVIRON["HOME"] might be /home/arnold.

ERRNO #

If a system error occurs during a redirection for getline, during a read for getline, or during a close operation, then ERRNO contains a string describing the error.

FILENAME

The name of the file that awk is currently reading. When no datafiles are listed on the command line, awk reads from the standard input and FILENAME is set to "-". FILENAME is changed each time a new file is read.

FNR

The current record number in the current file. FNR is incremented each time a new record is read. It is reinitialized to zero each time a new input file is started.

NF

The number of fields in the current input record. NF is set each time a new record is read, when a new field is created, or when $0 changes.

NR

The number of input records awk has processed since the beginning of the program's execution.

PROCINFO #

The elements of this array provide access to information about the running awk program. The following elements (listed alphabetically) are available:

PROCINFO["egid"]

The value of the getegid system call.

PROCINFO["euid"]

The value of the geteuid system call.

PROCINFO["FS"]

This is "FS" if field splitting with FS is in effect, or "FIELDWIDTHS" if field splitting with FIELDWIDTHS is in effect.

PROCINFO["gid"]

The value of the getgid system call.

PROCINFO["pgrpid"]

The process group ID of the current process.

PROCINFO["pid"]

The process ID of the current process.

PROCINFO["ppid"]

The parent process ID of the current process.

PROCINFO["uid"]

The value of the getuid system call.

 

RLENGTH

The length of the substring matched by the match function. RLENGTH is set by invoking the match function. Its value is the length of the matched string, or -1 if no match is found.

RSTART

The start index in characters of the substring that is matched by the match function. RSTART is set by invoking the match function. Its value is the position of the string where the matched substring starts, or zero if no match was found.

RT #

This is set each time a record is read. It contains the input text that matched the text denoted by RS, the record separator.

 

Using ARGC and ARGV

Notice that the awk program is not entered in ARGV.

The other special command-line options, with their arguments, are also not entered. This includes variable assignments done with the -v option.

Normal variable assignments on the command line are treated as arguments and do show up in the ARGV array.

 

A program can alter ARGC and the elements of ARGV.

Use - to represent the standard input. Storing additional elements and incrementing ARGC causes additional files to be read.

If the value of ARGC is decreased, that eliminates input files from the end of the list.
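A sketch that lists what actually lands in ARGV, assuming any POSIX awk (show_args, data1 and data2 are made-up names). Because the program has only a BEGIN rule, awk never tries to open data1 or data2. ARGV[0] is skipped because its exact value depends on how awk was invoked.

```shell
# show_args: illustrative name. Prints ARGV[1] .. ARGV[ARGC-1].
show_args() {
    awk -v x=1 'BEGIN { for (i = 1; i < ARGC; i++) print i, ARGV[i] }' "$@"
}

show_args data1 n=5 data2
# prints "1 data1", "2 n=5", "3 data2": the -v assignment and the program
# text are absent, while the plain n=5 assignment is present
```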

 

Arrays in awk

Arrays in awk are different from arrays in most other languages: they are associative. This means that each array is a collection of pairs: an index and its corresponding array element value.

The value of IGNORECASE has no effect upon array subscripting. The identical string value used to store an array element must be used to retrieve it.

 

A reference to an array element that has no recorded value yields a value of "", the null string.

Such a reference automatically creates that array element, with the null string as its value. (In some cases, this is unfortunate, because it might waste memory inside awk.)

 

To determine whether an element exists in an array at a certain index, use the expression: index in array

This expression tests whether the particular index exists, without the side effect of creating that element if it is not present.
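A sketch contrasting the two kinds of access, assuming any POSIX awk (membership is an illustrative name): testing with in leaves the array untouched, while merely reading a["y"] creates that element.

```shell
# membership: illustrative name.
membership() {
    awk 'BEGIN {
        if ("x" in a) print "x present"; else print "x absent"   # creates nothing
        v = a["y"]                  # merely referencing a["y"] creates it
        n = 0
        for (k in a) n++
        print n " element(s); y is " (("y" in a) ? "present" : "absent")
    }'
}

membership
# prints "x absent", then "1 element(s); y is present"
```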

 

Scanning All Elements of an Array

for (var in array)

body
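A classic use of this loop is a word-frequency count; a sketch assuming any POSIX awk (word_freq is an illustrative name). The order in which for (w in count) visits the indices is unspecified, so the output is piped through sort to make it predictable.

```shell
# word_freq: illustrative name. Counts every whitespace-separated word.
word_freq() {
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) print w, count[w] }' | sort
}

printf 'a b a\nc a\n' | word_freq
# prints "a 3", "b 1", "c 1"
```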

 

The delete Statement

delete array[index]

All the elements of an array may be deleted with a single statement by leaving off the subscript in the delete statement, as follows:

delete array (only for gawk)

 

Subscripts for awk arrays are always strings. Uninitialized variables, when used as strings, have the value "", not zero.

 

Multidimensional Arrays

grid[x,y]

If the value of SUBSEP is "@", then foo[5,12] = "value" sets the array element foo["5@12"] to "value".

 

The two expressions foo[5,12] and foo[5 SUBSEP 12] are always equivalent.

 

The default value of SUBSEP is the string "\034", which contains a nonprinting character that is unlikely to appear in an awk program or in most input data. The usefulness of choosing an unlikely character comes from the fact that index values that contain a string matching SUBSEP can lead to combined strings that are ambiguous.

 

To test whether a particular index sequence exists in a multidimensional array

(subscript1, subscript2, ...) in array

 

Scanning Multidimensional Arrays

for (combined in array) {
    split(combined, separate, SUBSEP)
    ...
}

This sets the variable combined to each concatenated combined index in the array, and splits it into the individual indices by breaking it apart where the value of SUBSEP appears. The individual indices then become the elements of the array separate.
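A sketch of storing, testing, and scanning a two-dimensional array, assuming any POSIX awk (grid_demo is an illustrative name). The final line is built from a second array so that the output does not depend on the unspecified for-in order.

```shell
# grid_demo: illustrative name.
grid_demo() {
    awk 'BEGIN {
        grid[1, 2] = "x"            # stored under the index 1 SUBSEP 2
        grid[3, 4] = "y"
        if ((3, 4) in grid) print "(3,4) is set"
        # Split each combined index back into its row and column.
        for (k in grid) {
            split(k, idx, SUBSEP)
            col[idx[1]] = idx[2]
        }
        print "row 1 -> column " col[1] ", row 3 -> column " col[3]
    }'
}

grid_demo
# prints "(3,4) is set", then "row 1 -> column 2, row 3 -> column 4"
```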

 

Sorting Array Values and Indices with gawk

n = asort(data)

After the call to asort, the array data is indexed from 1 to some number n, the total number of elements in data (this count is asort's return value): data[1], data[2], data[3], and so on.

An important side effect of calling asort is that the array's original indices are irrevocably lost.

 

n = asort(source, dest)

In this case, gawk copies the source array into the dest array and then sorts dest, destroying its indices. However, the source array is not affected.

 

Often, what's needed is to sort on the values of the indices instead of the values of the elements. To do this, use a helper array to hold the sorted index values, and then access the original array's elements.
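A portable sketch of the helper-array approach that does not rely on gawk's asort, assuming any POSIX awk (sorted_report and the inventory data are made up). The indices are copied into a helper array and sorted with a simple insertion sort, which compares them as strings and is only meant for small arrays.

```shell
# sorted_report: illustrative name. Prints qty[] in index order.
sorted_report() {
    awk '{ qty[$1] = $2 }
    END {
        n = 0
        for (k in qty) keys[++n] = k          # helper array of indices
        for (i = 2; i <= n; i++) {            # insertion sort, string order
            v = keys[i]
            for (j = i - 1; j >= 1 && keys[j] > v; j--)
                keys[j + 1] = keys[j]
            keys[j + 1] = v
        }
        for (i = 1; i <= n; i++) print keys[i], qty[keys[i]]
    }'
}

printf 'pear 3\napple 5\nfig 1\n' | sorted_report
# prints "apple 5", "fig 1", "pear 3"
```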

 

Functions

Built-in Functions

Numeric Functions

int(x)

This returns the nearest integer to x, located between x and zero and truncated toward zero.

sqrt(x)

exp(x)

log(x), sin(x), cos(x), atan2(y, x)

rand()

This returns a random number. The values of rand are uniformly distributed between zero and one. The value is never zero and never one.

int(n * rand()) returns a random integer between 0 and n - 1.

srand([x])

The function srand sets the starting point, or seed, for generating random numbers to the value x.

Each seed value leads to a particular sequence of random numbers. Thus, if the seed is set to the same value a second time, the same sequence of random numbers is produced again.
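A sketch of both points, assuming any POSIX awk (rand_demo and the seed 42 are arbitrary choices): reseeding with the same value replays the same sequence, and int(6 * rand()) + 1 gives die rolls from 1 to 6.

```shell
# rand_demo: illustrative name.
rand_demo() {
    awk 'BEGIN {
        srand(42); a = rand(); b = rand()
        srand(42); c = rand(); d = rand()     # same seed, same sequence
        print ((a == c && b == d) ? "same sequence" : "different sequence")
        ok = 1
        for (i = 0; i < 1000; i++) {
            roll = int(6 * rand()) + 1          # die roll: 1 through 6
            if (roll < 1 || roll > 6) ok = 0
        }
        print (ok ? "all rolls in 1..6" : "roll out of range")
    }'
}

rand_demo
# prints "same sequence", then "all rolls in 1..6"
```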

 

String-Manipulation Functions

asort(source [, dest])#

index(in, find)

This searches the string in for the first occurrence of the string find, and returns the position in characters at which that occurrence begins in the string in.

Remember that string indices in awk start at one.

length([string])

If no argument is supplied, length returns the length of $0.

 

match(string, regexp [, array])

The match function searches string for the longest, leftmost substring matched by the regular expression, regexp. It returns the character position, or index, at which that substring begins (one, if it starts at the beginning of string). If no match is found, it returns zero.

 

The match function sets the built-in variable RSTART to the index. It also sets the built-in variable RLENGTH to the length in characters of the matched substring. If no match is found, RSTART is set to zero, and RLENGTH to -1.
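A sketch of the two-argument (POSIX) form, assuming any POSIX awk (first_number is an illustrative name and the input lines are invented): RSTART and RLENGTH can be fed straight to substr to extract the matched text.

```shell
# first_number: illustrative name. Extracts the first run of digits.
first_number() {
    awk '{
        if (match($0, /[0-9]+/))
            print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
        else
            print RSTART, RLENGTH
    }'
}

printf 'order 12345 shipped\nno digits\n' | first_number
# prints "7 5 12345", then "0 -1"
```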

 

If array is present, it is cleared, and then the 0th element of array is set to the entire portion of string matched by regexp. If regexp contains parentheses, the integer-indexed elements of array are set to contain the portion of string matching the corresponding parenthesized subexpression.

 

echo foooobazbarrrrr | gawk '{ match($0, /(fo+).+(bar*)/, arr); print arr[1], arr[2] }'

foooo barrrrr

 

split(string, array [, fieldsep])

This function divides string into pieces separated by fieldsep and stores the pieces in array. The string value of the third argument, fieldsep, is a regexp describing where to split string. If fieldsep is omitted, the value of FS is used. split returns the number of elements created. If string is the null string, array ends up empty and split returns zero; if fieldsep does not occur in string at all, the whole string becomes array[1] and split returns one.
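A sketch of those three cases, assuming any POSIX awk (split_demo and the sample strings are illustrative):

```shell
# split_demo: illustrative name.
split_demo() {
    awk 'BEGIN {
        n = split("2024-01-15", parts, "-")
        print n, parts[1], parts[2], parts[3]   # 3 pieces
        m = split("abc", one, ",")
        print m, one[1]                         # no separator: one piece
        print split("", none)                   # empty string: no pieces
    }'
}

split_demo
# prints "3 2024 01 15", "1 abc", "0"
```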

sprintf(format, expression1,...)

This returns (without printing) the string that printf would have printed out with the same arguments

strtonum(str)#

sub(regexp, replacement [, target])

The sub function alters the value of target. It searches this value, which is treated as a string, for the leftmost, longest substring matched by the regular expression regexp. Then the entire string is changed by replacing the matched text with replacement. The modified string becomes the new value of target.

This function is peculiar because target is not simply used to compute a value, and not just any expression will do: it must be a variable, field, or array element so that sub can store a modified value there. If this argument is omitted, then the default is to use and alter $0.

 

If the special character & appears in replacement, it stands for the precise substring that was matched by regexp.

 

{ sub(/candidate/, "& and his wife"); print }

changes the first occurrence of candidate to candidate and his wife on each input line.

 

The effect of this special character (&) can be turned off by putting a backslash before it in the string. As usual, to insert one backslash in the string, you must write two backslashes. Therefore, write \\& in a string constant to include a literal & in the replacement.

 

gsub(regexp, replacement [, target])

This is similar to the sub function, except that gsub replaces all of the longest, leftmost, nonoverlapping matching substrings it can find. The g in gsub stands for "global," which means replace everywhere.

 

{ gsub(/Britain/, "United Kingdom"); print }

If the variable to search and alter (target) is omitted, then the entire input record ($0) is used.
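A sketch of sub and gsub on explicit targets, assuming any POSIX awk (subst_demo and the strings are illustrative). It shows & standing for the matched text, "\\&" in the program text producing a literal &, and gsub returning how many replacements it made.

```shell
# subst_demo: illustrative name.
subst_demo() {
    awk 'BEGIN {
        s = "the candidate spoke"
        sub(/candidate/, "& and his wife", s)   # & is the matched text
        print s
        t = "a+b"
        sub(/\+/, "\\&", t)                     # \\& in the program: literal &
        print t
        u = "banana"
        n = gsub(/a/, "o", u)                   # returns the replacement count
        print n, u
    }'
}

subst_demo
# prints "the candidate and his wife spoke", "a&b", "3 bonono"
```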

 

gensub(regexp, replacement, how [, target])#

gensub is a general substitution function. Like sub and gsub, it searches the target string target for matches of the regular expression regexp. Unlike sub and gsub, the modified string is returned as the result of the function, and the original target string is not changed.

If how is a string beginning with g or G, then it replaces all matches of regexp with replacement. Otherwise, how is treated as a number that indicates which match of regexp to replace.

 

If no target is supplied, $0 is used.

 

gensub provides an additional feature that is not available in sub or gsub: the ability to specify components of a regexp in the replacement text. This is done by using parentheses in the regexp to mark the components and then specifying \N in the replacement text, where N is a digit from 1 to 9.

gawk ' BEGIN { a = "abc def"; b = gensub(/(.+) (.+)/, "\\2 \\1", "g", a); print b }'

def abc

 

echo a b c a b c |  gawk '{ print gensub(/a/, "AA", 2) }'

a b c AA b c

 

substr(string, start [, length])

tolower(string)

toupper(string)
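A sketch of the basic string functions together, assuming any POSIX awk (string_demo and the sample string are illustrative); remember that positions start at 1.

```shell
# string_demo: illustrative name.
string_demo() {
    awk 'BEGIN {
        s = "Hello, World"
        print index(s, "World")      # position where "World" starts
        print length(s)
        print substr(s, 8)           # from position 8 to the end
        print substr(s, 1, 5)        # 5 characters from position 1
        print toupper(s), tolower(s)
    }'
}

string_demo
# prints 8, 12, World, Hello, then "HELLO, WORLD hello, world"
```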

 

More about \ and & with sub, gsub, and gensub

Input/Output Functions

close(filename [, how])

fflush([filename])

system(command)

Executes an operating-system command and then returns to the awk program. The return value is the exit status of the command.

END { system("date | mail -s 'awk run done' root") }

print command | "/bin/sh"

 

Using gawk's Timestamp Functions

systime()

This function returns the current time as the number of seconds since the system epoch.

mktime(datespec)

strftime([format [, timestamp]])

If no format argument is supplied, strftime uses "%a %b %d %H:%M:%S %Z %Y". This format string produces output that is (almost) equivalent to that of the date utility.

 

The mktime function allows you to convert a textual representation of a date and time into a timestamp.

The strftime function allows you to easily turn a timestamp into human-readable information.

 

User-Defined Functions

Advanced Features of gawk

Allowing Nondecimal Input Data

If you run gawk with the --non-decimal-data option, you can have nondecimal constants in your input data:

echo 0123 123 0x123 | gawk --non-decimal-data '{ printf "%d, %d, %d\n", $1, $2, $3 }'

 

Two-Way Communications with Another Process

It is useful to be able to send data to a separate program for processing and then read the result.

For gawk, it is possible to open a two-way pipe to another process. The second process is termed a coprocess, since it runs in parallel with gawk. The two-way connection is created using the new |& operator

 

do {
    print data |& "subprogram"
    "subprogram" |& getline results
} while (data left to process)

close("subprogram")

The first time an I/O operation is executed using the |& operator, gawk creates a two-way pipeline to a child process that runs the other program. Output created with print or printf is written to the program’s standard input, and output from the program’s standard output can be read by the gawk program using getline.

 

It is possible to close just one end of the two-way pipe to a coprocess, by supplying a second argument to the close function of either "to" or "from". These strings tell gawk to close the end of the pipe that sends data to the process or the end that reads from it, respectively.

 

Using gawk for Network Programming

The full syntax of the special filename is /inet/protocol/local-port/remote-host/remote-port.

 

Profiling Your awk Programs

pgawk is identical in every way to gawk, except that when it has finished running, it creates a profile of your program in a file named awkprof.out. The --profile option can be used to change the name of the file where pgawk writes the profile:

pgawk --profile=myprog.prof -f myprog.awk data1 data2

 

Running awk and gawk

Command-Line Options

-F fs (--field-separator fs)

-f source-file (--file source-file)

-v var=val (--assign var=val)

Sets the variable var to the value val before execution of the program begins.

Such variable values are available inside the BEGIN rule.

The -v option can set only one variable, but it can be used more than once, setting another variable each time, like this: awk -v foo=1 -v bar=2 ...

 

-- Signals the end of the command-line options. The following arguments are not treated as options even if they begin with -.

-W gawk-opt

Following the POSIX standard, implementation-specific options are supplied as arguments to the -W option.

 

The following list describes gawk-specific options:

-W copyright (--copyright)

-W copyleft (--copyleft)

-W dump-variables[=file] (--dump-variables[=file])

Prints a sorted list of global variables, their types, and final values to file

-W non-decimal-data (--non-decimal-data)

Enable automatic interpretation of octal and hexadecimal values in input data

-W lint-old (--lint-old)

Warns about constructs that are not available in the original version of awk

-W posix (--posix)

Operates in strict POSIX mode. This disables all gawk extensions

-W profile[=file] (--profile[=file])

Enable profiling of awk programs

 

-W version (--version)

 

Other Command-Line Arguments

All these arguments are made available to your awk program in the ARGV array. Command-line options and the program text (if present) are omitted from ARGV. All other arguments, including variable assignments, are included. As each element of ARGV is processed, gawk sets the variable ARGIND to the index in ARGV of the current element.

 

The AWKPATH Environment Variable

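gawk searches the colon-separated list of directories in the AWKPATH environment variable for program files named with -f. Inside a program, its value can be read as ENVIRON["AWKPATH"].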

