Mastering AWK: The Ultimate Guide to Text Processing in Linux


Availability

Default Presence: AWK is typically available in GNU/Linux distributions.
Check Installation: Use the which awk command to verify if AWK is installed.
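Installation: If not installed, install an implementation such as GNU awk (gawk). On Debian and Ubuntu, awk is a virtual package, so name the implementation explicitly:

sudo apt-get install gawk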
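
The which utility is not part of POSIX and can itself be missing on minimal systems. As a small sketch, the same check can be written with the shell builtin command -v; the variable name awk_path is only illustrative.

```shell
# command -v is a POSIX shell builtin, so it works even where `which` is absent
if awk_path=$(command -v awk); then
    echo "awk found at: $awk_path"
else
    echo "awk not found; install gawk or mawk with your package manager"
fi
```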

Introduction to AWK

Stream Processing: AWK is used for processing streams of text, treating the text as a collection of records (lines) and fields (columns).
Basic Unit: The basic unit in AWK is a string, with each line being a record and each word or element within that line being a field.
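
To make the record and field model concrete, here is a minimal sketch that feeds two rows of sample data to AWK and reports how it sees each one. It uses the built-in variables NR (record number) and NF (number of fields), which are described later in this guide; the variable names data and out are just for illustration.

```shell
data='Rahul Saha 23 3
Ketan Bhagat 22 8'

# NR = record (line) number, NF = number of fields, $1 = first field, $NF = last field
out=$(printf '%s\n' "$data" | awk '{ print "record " NR ": " NF " fields, first=" $1 ", last=" $NF }')
printf '%s\n' "$out"
# record 1: 4 fields, first=Rahul, last=3
# record 2: 4 fields, first=Ketan, last=8
```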

AWK Syntax

Basic Syntax:

awk [options] 'pattern {action}' input_file

Patterns: Conditions to match lines.
Actions: Operations to perform on matched lines.
Default Behavior: If no pattern is specified, the action applies to all lines. If no action is specified, AWK prints the matched line.
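
The two defaults are easiest to see side by side. The sketch below uses three headerless rows as assumed input: the first command has only a pattern, the second has only an action.

```shell
data='Rahul Saha 23 3
Ketan Bhagat 22 8
Rajni Kant 1 1'

# Pattern only: the default action prints every line whose third field exceeds 20
pattern_only=$(printf '%s\n' "$data" | awk '$3 > 20')

# Action only: with no pattern, the action runs on every line
action_only=$(printf '%s\n' "$data" | awk '{ print $2 }')

printf '%s\n' "$pattern_only"   # the Rahul and Ketan rows, printed in full
printf '%s\n' "$action_only"    # Saha, Bhagat, Kant (one per line)
```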

Example Usage:

Here is the data file I used for the processing:

Name Surname Roll_number Rank
Rahul Saha 23 3
Ketan Bhagat 22 8
Rajni Kant 1 1
Sameer Anqari 5 -2

# Print the first column (Name)
awk '{print $1}' test

# Print the first and third columns (Name and Roll_number)
awk '{print $1,$3}' test
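
One detail worth noticing in the second command is the comma. A rough illustration of the difference, using a single row as input: with a comma, print separates the values with the output field separator (a space by default); without it, the values are concatenated.

```shell
line='Rahul Saha 23 3'

# Comma: the two fields are joined with the output field separator
with_comma=$(printf '%s\n' "$line" | awk '{ print $1, $3 }')

# No comma: the two fields are concatenated directly
without_comma=$(printf '%s\n' "$line" | awk '{ print $1 $3 }')

printf '%s\n' "$with_comma"      # Rahul 23
printf '%s\n' "$without_comma"   # Rahul23
```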

AWK Processing Flow

AWK scripts are divided into three main parts:

1. BEGIN: Initialization, such as setting up variables or headers.

BEGIN {print "----- Start of Processing -----"}

2. Main Body: Processes each line that matches the pattern.

{print $0}

3. END: Final actions, like summarizing results.

END {print "----- End of Processing -----"}

Example AWK Script:

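# Content of awkscript
# (expects columns 3 to 5 to hold the marks for three subjects, unlike the four-column sample shown earlier)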
BEGIN {print "-----------Find total marks and average-----------------"}
{
  tot=$3+$4+$5
  avg=tot/3
  print "Total of " $2 " " $1 ":",tot
  print "Avg of " $2 " " $1 ":",avg
}
END {print "--------Script Finished--------"}

# Running the script
awk -f awkscript test
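
The script above reads columns 3 to 5 as marks, so it needs an input file with that layout. Here is a self-contained sketch that can be pasted into a terminal; the marks file and the two students in it are invented purely for illustration, and only the main body of the script is kept so the output stays short.

```shell
workdir=$(mktemp -d)

# Hypothetical marks file: Name Surname Maths Physics Chemistry (values invented)
cat > "$workdir/marks" <<'EOF'
Amit Joshi 90 80 70
Neha Rao 60 70 80
EOF

# Same main body as the script above, saved to a file for use with -f
cat > "$workdir/awkscript" <<'EOF'
{
  tot = $3 + $4 + $5
  avg = tot / 3
  print "Total of " $2 " " $1 ":", tot
  print "Avg of " $2 " " $1 ":", avg
}
EOF

out=$(awk -f "$workdir/awkscript" "$workdir/marks")
printf '%s\n' "$out"
# Total of Joshi Amit: 240
# Avg of Joshi Amit: 80
# Total of Rao Neha: 210
# Avg of Rao Neha: 70

rm -rf "$workdir"
```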

Handling NR (Record Number):

To avoid processing header rows:

BEGIN {print "-----------Find total marks and average-----------------"}
{
  if(NR != 1) {
    tot=$3+$4+$5
    avg=tot/3
    print "Total of " $2 " " $1 ":",tot
    print "Avg of " $2 " " $1 ":",avg
  }
}
END {print "--------Script Finished--------"}
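
The same header check can also be written as a pattern instead of an if statement, which is the more common AWK idiom. A minimal sketch using the sample data file from earlier:

```shell
data='Name Surname Roll_number Rank
Rahul Saha 23 3
Ketan Bhagat 22 8
Rajni Kant 1 1
Sameer Anqari 5 -2'

# NR > 1 is a pattern: the action runs for every record except the first (the header)
out=$(printf '%s\n' "$data" | awk 'NR > 1 { print $1, $4 }')
printf '%s\n' "$out"
# Rahul 3
# Ketan 8
# Rajni 1
# Sameer -2
```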

Regular Expressions in AWK:

Operators:
~: Matches a regex.
!~: Does not match a regex.
Enclose regex patterns in slashes /.

# Match lines starting with "Kapil" or "kapil" (only the first letter may vary in case)
awk '$0 ~ /^[Kk]apil/' test

# Exclude lines starting with "Kapil" or "kapil"
awk '$0 !~ /^[Kk]apil/' test
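
The match operators work on individual fields as well as on the whole line. The sketch below, using the sample rows as assumed input, matches against a single field and then shows one way to get a genuinely case-insensitive match by lowercasing the field with the standard tolower function first.

```shell
data='Rahul Saha 23 3
Ketan Bhagat 22 8
Rajni Kant 1 1
Sameer Anqari 5 -2'

# Match against one field: surnames starting with "B"
field_match=$(printf '%s\n' "$data" | awk '$2 ~ /^B/')

# Case-insensitive match: lowercase the field before testing it
any_case=$(printf '%s\n' "$data" | awk 'tolower($1) ~ /^ra/ { print $1 }')

printf '%s\n' "$field_match"   # Ketan Bhagat 22 8
printf '%s\n' "$any_case"      # Rahul, Rajni (one per line)
```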

AWK Operators:

Arithmetic: +, -, *, /, %, ^
Relational: >, >=, <, <=, ==, !=
Logical: &&, ||, !

# Print lines where Surname is "joshi" and Maths score is greater than 80
awk '$2 == "joshi" && $3 > 80 {print}' test

# Print lines where the sum of Maths, Physics, and Chemistry scores is greater than 218
awk '$3+$4+$5 > 218 {print}' test
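
For something runnable against the four-column sample data shown earlier, here is a short sketch that exercises one operator from each group. The conditions are arbitrary and chosen only to show the syntax.

```shell
data='Rahul Saha 23 3
Ketan Bhagat 22 8
Rajni Kant 1 1
Sameer Anqari 5 -2'

# Relational + logical AND: roll number above 10 and rank below 5
both=$(printf '%s\n' "$data" | awk '$3 > 10 && $4 < 5 { print $1 }')

# Arithmetic: % gives the remainder, so this selects even roll numbers
even=$(printf '%s\n' "$data" | awk '$3 % 2 == 0 { print $1 }')

# Logical NOT: ranks that are not greater than 2
not_gt=$(printf '%s\n' "$data" | awk '!($4 > 2) { print $1 }')

printf '%s\n' "$both"     # Rahul
printf '%s\n' "$even"     # Ketan
printf '%s\n' "$not_gt"   # Rajni, Sameer (one per line)
```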

System Variables in AWK:

FS: Field Separator (default: a single space, which makes AWK split on any run of spaces or tabs).
RS: Record Separator (default: newline \n).
NF: Number of Fields in the current record.
NR: Number of the current record.
OFS: Output Field Separator (default: space).
ORS: Output Record Separator (default: newline \n).
FILENAME: Name of the current file being processed.
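
A quick way to get a feel for these variables is to print a few of them for every record. This sketch writes two sample rows to a temporary file named test (the directory is created only for the demonstration) and prints FILENAME, NR and NF for each line.

```shell
workdir=$(mktemp -d)
printf 'Rahul Saha 23 3\nKetan Bhagat 22 8\n' > "$workdir/test"

# Run from inside the directory so FILENAME is the bare name "test"
out=$(cd "$workdir" && awk '{ print FILENAME ": record " NR " has " NF " fields" }' test)
printf '%s\n' "$out"
# test: record 1 has 4 fields
# test: record 2 has 4 fields

rm -rf "$workdir"
```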

Example for FS and OFS:

# test1 file: fields separated by commas
awk 'BEGIN {FS = ","; OFS = "-"; print "-----Script started------"} {print $1, $3} END {print "--------Script Finished--------"}' test1
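
Since the contents of the comma-separated file are not shown, here is a self-contained sketch with assumed sample rows. It also shows the -F command-line option, which is shorthand for setting FS.

```shell
# Assumed comma-separated input standing in for the article's file
data='Rahul,Saha,23,3
Ketan,Bhagat,22,8'

# FS splits input on commas; OFS is placed wherever print has a comma between arguments
out=$(printf '%s\n' "$data" | awk 'BEGIN { FS = ","; OFS = "-" } { print $1, $3 }')

# -F on the command line sets FS without a BEGIN block
short=$(printf '%s\n' "$data" | awk -F, '{ print $2 }')

printf '%s\n' "$out"     # Rahul-23, Ketan-22 (one per line)
printf '%s\n' "$short"   # Saha, Bhagat (one per line)
```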

RS (Record Separator):

awk 'BEGIN {FS = ","; RS = "#"} {print $1, $3}' test
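
As a rough illustration with assumed input, the sketch below puts two records on a single line separated by "#". The input deliberately has no trailing newline: with RS set to "#", a newline at the end would become part of the last field.

```shell
# One line of input, records separated by "#", fields by ","
out=$(printf 'Rahul,Saha,23#Ketan,Bhagat,22' | awk 'BEGIN { FS = ","; RS = "#" } { print $1, $3 }')
printf '%s\n' "$out"
# Rahul 23
# Ketan 22
```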

ORS (Output Record Separator):

# ORS = "\n" is already the default, so this prints each #-separated record on its own line
awk 'BEGIN {RS = "#"; ORS = "\n"} {print $0}' test
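
The effect of ORS is easier to see with a value other than the default. In this sketch (input names are just sample data), every print ends with ";" instead of a newline, so three input lines come out as one.

```shell
# ORS replaces the newline that print normally adds after each record
out=$(printf 'Rahul\nKetan\nRajni\n' | awk 'BEGIN { ORS = ";" } { print $1 }')
printf '%s\n' "$out"
# Rahul;Ketan;Rajni;
```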

NF (Number of Fields):

awk 'BEGIN {RS = "#"} {if (NF == 5) print $0; else print "Does not have 5 fields"}' test1
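
NF is also handy for validating input and for reaching the last field with $NF. A minimal sketch, assuming space-separated input with one short row and one blank line mixed in:

```shell
data='Rahul Saha 23 3
Ketan Bhagat 22

Rajni Kant 1 1'

out=$(printf '%s\n' "$data" | awk '
NF == 0 { next }                                           # skip blank lines
NF < 4  { print "line " NR ": only " NF " fields"; next }  # report short records
        { print $1, $NF }                                  # $NF is the last field
')
printf '%s\n' "$out"
# Rahul 3
# line 2: only 3 fields
# Rajni 1
```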

Summary:

AWK is a powerful text processing tool available by default in most GNU/Linux distributions. It processes text by treating each line as a record and each word as a field.