How to use awk

awk is another popular tool for processing text streams, similar to sed. Its basic function is to search files for lines (or other units of text) that contain one or more patterns. When a line matches one of the patterns, the actions associated with that pattern are performed on that line.

There are several ways to run awk. If the program is short, it is easiest to run it on the command line:

awk PROGRAM inputfile(s)

If multiple changes have to be made, possibly regularly and on multiple files, it is easier to put the awk commands in a script file and pass it to awk with the -f option:

awk -f PROGRAM-FILE inputfile(s)
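
To make the two forms concrete, here is a minimal sketch that runs the same one-line program both ways. The input line and the temporary program file are made up for illustration.

```shell
# Hypothetical sample input: one line with two fields.
input='hello world'

# Form 1: the program is given directly on the command line.
inline=$(printf '%s\n' "$input" | awk '{ print $1 }')

# Form 2: the same program is stored in a file and read with -f.
prog=$(mktemp)
printf '%s\n' '{ print $1 }' > "$prog"
fromfile=$(printf '%s\n' "$input" | awk -f "$prog")
rm -f "$prog"

printf '%s\n' "$inline" "$fromfile"
```

Both calls should print the first field of the line, so the two forms are interchangeable for short programs.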

The most used awk statement is print, as we will see soon.

Printing selected fields

The print command in awk outputs selected data from the input file.

The variables $1, $2, $3, …, $N hold the values of the first, second, third, and so on up to the last field of an input line. The variable $0 (zero) holds the value of the entire line.
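
As a small sketch of the field variables, the following uses a made-up input line. It also uses the built-in NF (the number of fields), which is not covered in the text above, so $NF is the last field.

```shell
# Hypothetical sample line; fields are split on whitespace by default.
line='alice 42 engineer'

# $1 is the first field, $NF the last field, NF the field count,
# and $0 the whole line.
out=$(printf '%s\n' "$line" | awk '{ print $1; print $NF; print NF; print $0 }')
printf '%s\n' "$out"
```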

To print the size ($2) and the percentage used ($5), use df -h | awk '{print $2,$5}'.
Note: the comma between $2 and $5 separates the two values with a space in the output.

Th3-Gam3 ~ # df -h | awk '{print $2,$5}'
Size Use%
3.9G 0%
786M 2%
58G 89%
3.9G 1%
5.0M 1%
3.9G 0%

Formatting fields

Without formatting, using only the output separator, the output looks rather poor. Inserting a couple of tabs and a string to indicate what output this is will make it look a lot better:

Example:

Th3-Gam3 ~ # df -h | sort -rnk 5 | head -3 | awk '{ print "Partition " $6 "\t: " $5 " full!" }'
Partition /mnt/sdb6    : 99% full!
Partition /mnt/sdb5    : 97% full!
Partition /home    : 89% full!
Th3-Gam3 ~ #
  • We used \t to print a tab. Literal text must be enclosed in double quotes (" ").

Formatting characters for gawk (awk)

Sequence Meaning
\a Bell character
\n Newline character
\t Tab

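Inside an awk string, a double quote or a backslash must be escaped with a backslash. Dollar signs only need escaping when the awk program is enclosed in double quotes on the shell command line, because the shell would otherwise expand them.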
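
The following sketch shows an escaped double quote and a tab inside an awk string. The file name in the input is invented for the example.

```shell
# Hypothetical input: a single file name.
# \t prints a tab; \" prints a literal double quote inside the string.
out=$(printf 'report.txt\n' | awk '{ print "name:\t\"" $1 "\"" }')
printf '%s\n' "$out"
```

Because the whole program sits in single quotes, the shell passes the backslashes through to awk untouched.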

The print command and regular expressions

A regular expression can be used as a pattern by enclosing it in slashes. The regular expression is then tested against the entire text of each record. The syntax is as follows:

awk 'EXPRESSION { PROGRAM }' file(s)

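For example, to list the files in /etc whose names start with the letter a or c and end with .conf, printing the 9th field of the ls -l output (the file name). Note that \< (start of a word) is a gawk extension, and that the pattern is tested against the whole line, not only the file name, so a line can also match when some other word on it starts with a or c: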

Th3-Gam3 ~ # ls -l /etc/ | awk '/\<(a|c).*\.conf$/ { print $9 }'
adduser.conf
apg.conf
ca-certificates.conf
casper.conf
resolv.conf
Th3-Gam3 ~ #

Special patterns

To print text before any input is processed, such as a heading, use the BEGIN pattern.

The END pattern can be added to insert text after the entire input has been processed.
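
A minimal sketch of both special patterns, using two made-up input lines: the BEGIN action runs once before the first line is read, and the END action runs once after the last one.

```shell
# Hypothetical two-line input.
out=$(printf 'one\ntwo\n' | awk '
  BEGIN { print "== start ==" }   # runs before any input is read
        { print "line: " $0 }     # runs for every input line
  END   { print "== end ==" }     # runs after all input is processed
')
printf '%s\n' "$out"
```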

awk scripts

As commands tend to get a little longer, you might want to put them in a script, so they are reusable. An awk script contains awk statements defining patterns and actions.

For example, create a file called test.awk and add these lines:

BEGIN { print "*** WARNING WARNING WARNING ***" }
/\<[89][0-9]%/ { print "Partition " $6 "\t: " $5 " full!" }
END { print "*** Give money for new disks URGENTLY! ***" }

Run awk with the script file:

Th3-Gam3 ~ # df -h | awk -f test.awk 
*** WARNING WARNING WARNING ***
Partition /    : 89% full!
Partition /mnt/sdb5    : 97% full!
Partition /mnt/sdb6    : 99% full!
Partition /home    : 89% full!
*** Give money for new disks URGENTLY! ***
Th3-Gam3 ~ # 

We used awk with BEGIN, END, and a regular expression, all from a script file. That is awesome!

awk variables

As awk is processing the input file, it uses several variables. Some are editable, some are read-only.

input field separator

By default, awk splits fields on spaces and tabs. To use a different separator, set FS in a BEGIN block at the beginning of the awk program: BEGIN { FS="fs" }.

For example, to use ":" as the field separator:

Th3-Gam3 ~ # awk 'BEGIN { FS=":" } { print $1 "\t" $5 }' /etc/passwd
root    root
daemon    daemon
bin    bin
sys    sys
sync    sync
games    games
man    man

output field separator

Fields are normally separated by spaces in the output. This becomes apparent when you use the correct syntax for the print command, where arguments are separated by commas: each comma is replaced by the output field separator, OFS, which defaults to a single space.
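
Here is a small sketch of the difference between a comma and plain concatenation in print. The input line and the "-" separator are chosen only for illustration.

```shell
# Hypothetical input line with three fields.
# With a comma, print inserts OFS between the values;
# without one, the values are simply concatenated.
out=$(printf 'a b c\n' | awk 'BEGIN { OFS="-" } { print $1, $2; print $1 $2 }')
printf '%s\n' "$out"
```

The first print uses the separator and the second does not, which is why OFS only matters when the arguments are separated by commas.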

output record separator

The output from an entire print statement is called an output record. Each print command results in one output record, and then outputs a string called the output record separator, ORS. The default value for this variable is “\n”, a newline character. Thus, each print statement generates a separate line.

To change the way output fields and records are separated, assign new values to OFS and ORS.

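Example (OFS has no visible effect here, because the print arguments are concatenated with "\t" instead of being separated by commas):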

Th3-Gam3 ~ # awk 'BEGIN { FS=":" ; OFS=" ; " ; ORS="\n-->\n" } { print $1 "\t" $5 }' /etc/passwd
root    root
-->
daemon    daemon
-->
bin    bin
-->
sys    sys
-->
sync    sync
-->
games    games
-->
man    man
-->

number of records

The built-in NR holds the number of records that are processed. It is incremented after reading a new input line. You can use it at the end to count the total number of records, or in each output record.

For example:

Th3-Gam3 ~ # awk 'BEGIN { FS=":" ; OFS=" ; " ; ORS="\n-->\n" } { print $1 "\t" $5 "\t" "Record Number: " NR }' /etc/passwd 
root    root    Record Number: 1
-->
daemon    daemon    Record Number: 2
-->
bin    bin    Record Number: 3
-->
sys    sys    Record Number: 4
-->
sync    sync    Record Number: 5
-->
games    games    Record Number: 6
-->
man    man    Record Number: 7
-->
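
NR can also be read in an END block to get the total number of records. A minimal sketch with three made-up input lines:

```shell
# Hypothetical three-line input.
# NR is the current record number while reading,
# and the total number of records once END is reached.
out=$(printf 'x\ny\nz\n' | awk '{ print NR ": " $0 } END { print "total: " NR }')
printf '%s\n' "$out"
```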

Example: to list all scripts in /etc/init.d/ that use awk:

grep -l awk /etc/init.d/*

printf program

For more precise control over the output format than what is normally provided by print, use printf. The printf command can be used to specify the field width to use for each item, as well as various formatting choices for numbers (such as what output base to use, whether to print an exponent, whether to print a sign, and how many digits to print after the decimal point). This is done by supplying a string, called the format string, that controls how and where to print the other arguments.

The syntax is the same as for the C-language printf statement; see your C introduction guide. The gawk info pages contain full explanations.
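
As a rough sketch of a format string, the following pads a name to a fixed width and prints a number with one decimal place. The input values are invented. Unlike print, printf does not add a newline on its own, so the format ends with \n.

```shell
# Hypothetical input: a name and a number per line.
# %-6s  : string, left-aligned in a field 6 characters wide
# %6.1f : number, right-aligned in 6 characters, 1 digit after the point
# %%    : a literal percent sign
out=$(printf 'root 89.456\nhome 7.2\n' | awk '{ printf "%-6s|%6.1f%%\n", $1, $2 }')
printf '%s\n' "$out"
```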

Summary

The gawk utility interprets a special-purpose programming language, handling simple data-reformatting jobs with just a few lines of code. It is the GNU implementation of the standard UNIX awk command.

This tool reads lines of input data and can easily recognize columned output. The print statement is the most common way of filtering and formatting selected fields.

On-the-fly variable declaration is straightforward and allows for simple calculation of sums, statistics and other operations on the processed input stream. Variables and commands can be put in awk scripts for background processing.
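
A minimal sketch of such a calculation, with made-up data: the variable total is not declared anywhere, it starts out empty (treated as 0 in arithmetic) and is used directly.

```shell
# Hypothetical input: an item name and a quantity per line.
# total is created on first use; NR in END is the number of lines read.
out=$(printf 'apples 3\npears 5\nplums 4\n' | awk '
  { total += $2 }
  END { print "sum:", total; print "avg:", total / NR }
')
printf '%s\n' "$out"
```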

Other things you should know about awk:

  • The language remains well known on UNIX and UNIX-like systems, but for similar tasks Perl is now more commonly used. However, awk is quicker to pick up: you learn a lot in a very short time. In other words, Perl is more difficult to learn.
  • Both Perl and awk share the reputation of being incomprehensible, even to the actual authors of the programs that use these languages. So document your code!

That is it. I hope it was simple. Thanks for joining me.
Enjoy!
