Mastering AWK: The Ultimate Guide to Text Processing in Linux

Never Ending Saga

Availability

  • Default Presence: AWK is preinstalled on most GNU/Linux distributions.
  • Check Installation: Run which awk (or command -v awk) to verify that AWK is installed.
  • Installation: If it is missing, install an implementation such as gawk. On Debian or Ubuntu, for example:
sudo apt-get install gawk
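
The check above can be scripted. This is a minimal sketch that assumes a POSIX shell; the variable name awk_path is only illustrative.

```shell
# Report where awk lives, or say that it is missing.
if awk_path=$(command -v awk); then
    echo "awk found at: $awk_path"
else
    echo "awk not found; install an implementation such as gawk or mawk"
fi
```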

Introduction to AWK

  • Stream Processing: AWK is used for processing streams of text, treating the text as a collection of records (lines) and fields (columns).
  • Basic Unit: The basic unit in AWK is a string. By default each line is a record and each whitespace-separated word within that line is a field; $1, $2, ... refer to the individual fields and $0 to the whole record.
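
To make the record/field model concrete, here is a small sketch that feeds two lines to AWK and reports how each one is split. The input lines are taken from the sample data used later in this article; NR and NF are AWK's built-in record and field counters.

```shell
# Two lines of input give two records; fields are split on whitespace.
out=$(printf 'Rahul Saha 23 3\nRajni Kant 1 1\n' |
      awk '{ print "record " NR ": " NF " fields, first=" $1 ", last=" $NF }')
printf '%s\n' "$out"
```

Each output line shows the record number, the number of fields, and the first and last field of that record.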

AWK Syntax

  • Basic Syntax:
awk [options] 'pattern {action}' input_file
  • Patterns: Conditions to match lines.
  • Actions: Operations to perform on matched lines.
  • Default Behavior: If no pattern is specified, the action applies to all lines. If no action is specified, AWK prints the matched line.
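
The two defaults can be seen side by side in the following sketch. It writes the sample data from this article to a temporary file (the use of mktemp is just a convenience here) and runs one program with only a pattern and one with only an action.

```shell
# Temporary copy of the sample data file.
data=$(mktemp)
cat > "$data" <<'EOF'
Name Surname Roll_number Rank
Rahul Saha 23 3
Ketan Bhagat 22 8
Rajni Kant 1 1
Sameer Anqari 5 -2
EOF

# Pattern only: the default action prints each matching line in full.
pattern_only=$(awk '$1 == "Rajni"' "$data")

# Action only: with no pattern, the action runs on every line.
action_only=$(awk '{ print $2 }' "$data")

printf '%s\n' "$pattern_only" "$action_only"
rm -f "$data"
```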

Example Usage:

Here is the data file (named test) that I used for the processing:

Name Surname Roll_number Rank
Rahul Saha 23 3
Ketan Bhagat 22 8
Rajni Kant 1 1
Sameer Anqari 5 -2

# Print the first column (Name)
awk '{print $1}' test

# Print the first and third columns (Name and Roll_number)
awk '{print $1,$3}' test

AWK Processing Flow

AWK scripts are divided into three main parts:

  1. BEGIN: Runs once before any input is read; used for initialization, such as setting up variables or printing headers.
BEGIN {print "----- Start of Processing -----"}

  2. Main Body: Runs for each line that matches the pattern.
{print $0}

  3. END: Runs once after all input has been read; used for final actions, like summarizing results.
END {print "----- End of Processing -----"}
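
The three parts can be combined in one program. The sketch below uses a hypothetical three-line marks file (the names and numbers are invented for illustration) and accumulates a sum in the main body so that END can print an average.

```shell
# Hypothetical marks file: Name Surname Maths (values invented for illustration).
marks=$(mktemp)
cat > "$marks" <<'EOF'
Kapil joshi 90
Rahul Saha 60
Ketan Bhagat 75
EOF

out=$(awk '
BEGIN { print "Name Maths"; sum = 0 }     # runs once, before any input
      { print $1, $3; sum += $3 }         # runs for every record
END   { print "Average:", sum / NR }      # runs once, after the last record
' "$marks")

printf '%s\n' "$out"
rm -f "$marks"
```

In the END block NR still holds the total number of records read, which is why it can serve as the divisor.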

Example AWK Script:

# Content of awkscript
# Assumes columns 3-5 of the input hold the Maths, Physics, and Chemistry marks
BEGIN {print "-----------Find total marks and average-----------------"}
{
    tot = $3 + $4 + $5
    avg = tot / 3
    print "Total of " $2 " " $1 ":", tot
    print "Avg of " $2 " " $1 ":", avg
}
END {print "--------Script Finished--------"}

# Running the script
awk -f awkscript test

Handling NR (Record Number):

NR holds the number of the current record, so testing NR != 1 skips the header row:

BEGIN {print "-----------Find total marks and average-----------------"}
{
    if (NR != 1) {
        tot = $3 + $4 + $5
        avg = tot / 3
        print "Total of " $2 " " $1 ":", tot
        print "Avg of " $2 " " $1 ":", avg
    }
}
END {print "--------Script Finished--------"}
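
The same header skip can also be written as a pattern instead of an if statement, which is the more common AWK idiom. A minimal sketch, using a hypothetical marks file whose values are invented for illustration:

```shell
# Hypothetical marks file with a header row (values invented for illustration).
marks=$(mktemp)
cat > "$marks" <<'EOF'
Name Surname Maths Physics Chemistry
Kapil joshi 90 80 70
Rahul Saha 60 75 90
EOF

# NR > 1 is the pattern: the action only runs from the second record onward.
out=$(awk 'NR > 1 { tot = $3 + $4 + $5; print $1, $2, tot }' "$marks")

printf '%s\n' "$out"
rm -f "$marks"
```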

Regular Expressions in AWK:

  • Operators:
  • ~: Matches a regex.
  • !~: Does not match a regex.
  • Enclose regex patterns in slashes /.
# Match lines starting with "Kapil" or "kapil" (only the first letter may vary in case)
awk '$0 ~ /^[Kk]apil/' test

# Exclude lines starting with "Kapil" or "kapil"
awk '$0 !~ /^[Kk]apil/' test
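
The ~ operator also works on a single field, and a fully case-insensitive match can be approximated by lower-casing the field with the standard tolower() function before matching. The sketch below compares the two approaches on a small hypothetical name list.

```shell
# Hypothetical name list (invented for illustration).
names=$(mktemp)
cat > "$names" <<'EOF'
Kapil joshi
KAPIL Dev
Rahul Saha
EOF

# [Kk]apil only covers the first letter, so "KAPIL" is missed.
bracket=$(awk '$1 ~ /^[Kk]apil/' "$names")

# Lower-casing the field first ignores case entirely.
anycase=$(awk 'tolower($1) ~ /^kapil/' "$names")

printf 'bracket:\n%s\nanycase:\n%s\n' "$bracket" "$anycase"
rm -f "$names"
```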

AWK Operators:

  • Arithmetic: +, -, *, /, %, ^
  • Relational: >, >=, <, <=, ==, !=
  • Logical: &&, ||, !
# Print lines where Surname is "joshi" and Maths score is greater than 80
awk '$2 == "joshi" && $3 > 80 {print}' test

# Print lines where the sum of Maths, Physics, and Chemistry scores is greater than 218
awk '$3+$4+$5 > 218 {print}' test
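
For a version that can be run against the sample data shown earlier, the following sketch combines relational, logical, and arithmetic operators on the Roll_number and Rank columns. The NR > 1 guard keeps the header row out of the numeric comparisons.

```shell
# Temporary copy of the sample data file.
data=$(mktemp)
cat > "$data" <<'EOF'
Name Surname Roll_number Rank
Rahul Saha 23 3
Ketan Bhagat 22 8
Rajni Kant 1 1
Sameer Anqari 5 -2
EOF

# Relational + logical: rows whose Rank is below 0 or above 5.
outside=$(awk 'NR > 1 && ($4 < 0 || $4 > 5)' "$data")

# Arithmetic: names of rows whose Roll_number is even.
even=$(awk 'NR > 1 && $3 % 2 == 0 { print $1 }' "$data")

printf '%s\n' "$outside" "$even"
rm -f "$data"
```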

System Variables in AWK:

  • FS: Field Separator (default: a single space, which makes AWK split on runs of whitespace).
  • RS: Record Separator (default: newline \n).
  • NF: Number of Fields in the current record.
  • NR: Number of records read so far, i.e. the number of the current record.
  • OFS: Output Field Separator (default: space).
  • ORS: Output Record Separator (default: newline \n).
  • FILENAME: Name of the current file being processed.
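
A quick way to see several of these variables at once is to print them for every record. In this sketch the file name demo.txt and its contents are made up for the example.

```shell
# Create a small throwaway file in a temporary directory.
dir=$(mktemp -d)
printf 'a b c\nd e\n' > "$dir/demo.txt"

# For each record: current file name, record number, number of fields.
out=$(cd "$dir" && awk '{ print FILENAME, NR, NF }' demo.txt)

printf '%s\n' "$out"
rm -rf "$dir"
```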

Example for FS and OFS:

# test1 file: fields separated by commas
awk 'BEGIN {FS = ","; OFS = "-"; print "-----Script started------"} {print $1, $3} END {print "--------Script Finished--------"}' test1
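
The separators can also be set from the command line: -F sets FS and -v assigns any variable before the program starts. The sketch below, using a small made-up comma-separated file, shows that both forms give the same result.

```shell
# Hypothetical comma-separated input (invented for illustration).
csv=$(mktemp)
printf 'Rahul,Saha,23\nRajni,Kant,1\n' > "$csv"

# Separators set inside a BEGIN block.
in_begin=$(awk 'BEGIN { FS = ","; OFS = "-" } { print $1, $3 }' "$csv")

# The same separators set with command-line options.
with_flags=$(awk -F',' -v OFS='-' '{ print $1, $3 }' "$csv")

printf '%s\n' "$in_begin" "$with_flags"
rm -f "$csv"
```

Note that OFS is only inserted where print arguments are separated by commas; fields joined without a comma are concatenated directly.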

RS (Record Separator):

# Treat "#" (instead of newline) as the end of each record
awk 'BEGIN {FS = ","; RS = "#"} {print $1, $3}' test
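
Since the contents of the input file are not shown here, the following self-contained sketch pipes in a made-up single line whose records are separated by "#" to show the effect of RS.

```shell
# One physical line, three "#"-separated records (data invented for illustration).
out=$(printf 'Rahul,Saha,23#Rajni,Kant,1#Ketan,Bhagat,22' |
      awk 'BEGIN { FS = ","; RS = "#" } { print NR ": " $1, $3 }')

printf '%s\n' "$out"
```

The input deliberately has no trailing newline. With RS set to "#", a newline is no longer a record boundary, so a trailing one would become part of the last field.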

ORS (Output Record Separator):

# Read "#"-separated records and write each on its own line ("\n" is already the default ORS)
awk 'BEGIN {RS = "#"; ORS = "\n"} {print $0}' test
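
The effect of ORS is easier to see with a non-default value. In this sketch, three input lines are joined with ";" because print appends ORS, rather than a newline, after every record.

```shell
# Each print now ends with ";" instead of a newline.
out=$(printf 'a\nb\nc\n' | awk 'BEGIN { ORS = ";" } { print $0 }')

printf '%s\n' "$out"
```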

NF (Number of Fields):

# Print records that have exactly 5 fields; flag every other record
awk 'BEGIN {RS = "#"} {if(NF == 5) print $0; else print "Field count is not 5"}' test1
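
NF is also useful as a field index: $NF is the last field of the current record, whatever its length. A small sketch with made-up input:

```shell
# First line has 5 fields, second has 2.
out=$(printf 'a b c d e\nf g\n' |
      awk '{ if (NF == 5) print "ok:", $NF; else print "short:", NF, "fields" }')

printf '%s\n' "$out"
```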

Summary:

AWK is a powerful text processing tool available by default in most GNU/Linux distributions. It processes text by treating each line as a record and each word as a field.
