Mastering AWK: The Ultimate Guide to Text Processing in Linux
Availability
- Default Presence: AWK is installed by default on most GNU/Linux distributions, usually as GNU AWK (gawk) or mawk.
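- Check Installation: Run the following command to verify that AWK is installed. It prints the path to the awk binary if one is found:
which awk
- Installation: If AWK is not installed, install an implementation such as GNU AWK (gawk). On Debian or Ubuntu, install the gawk package, because awk itself is only a virtual package there:
sudo apt-get install gawk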
Introduction to AWK
- Stream Processing: AWK is used for processing streams of text, treating the text as a collection of records (lines) and fields (columns).
- Basic Unit: AWK reads its input one record at a time (by default, one line) and splits each record into fields (by default, on whitespace). Fields are referenced as $1, $2, and so on, and $0 is the whole record.
AWK Syntax
- Basic Syntax:
awk [options] 'pattern {action}' input_file
- Patterns: Conditions to match lines.
- Actions: Operations to perform on matched lines.
- Default Behavior: If no pattern is specified, the action applies to all lines. If no action is specified, AWK prints the matched line.
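You can check both default behaviors quickly by piping a few lines into AWK with printf. The sample lines below are made up for illustration, and stock is an arbitrary shell variable name:

```shell
# Made-up sample input: an item name and a quantity on each line
stock='apple 3
banana 0
cherry 7'

# Pattern with no action: every matching line is printed unchanged
printf '%s\n' "$stock" | awk '$2 > 0'
# apple 3
# cherry 7

# Action with no pattern: the action runs on every line
printf '%s\n' "$stock" | awk '{print $1}'
# apple
# banana
# cherry
```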
Example Usage:
Here is the data file (named test) used in the examples below:
Name Surname Roll_number Rank
Rahul Saha 23 3
Ketan Bhagat 22 8
Rajni Kant 1 1
Sameer Anqari 5 -2
# Print the first column (Name)
awk '{print $1}' test
# Print the first and third columns (Name and Roll_number)
awk '{print $1,$3}' test
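To try these commands yourself, first recreate the sample file. This sketch writes the data to a temporary file instead of ./test, and the variable name data is arbitrary. It also shows one detail of print. A comma between fields inserts the output field separator (a space by default). Writing fields side by side concatenates them with nothing in between:

```shell
# Recreate the sample data in a temporary file
data=$(mktemp)
trap 'rm -f "$data"' EXIT
cat > "$data" <<'EOF'
Name Surname Roll_number Rank
Rahul Saha 23 3
Ketan Bhagat 22 8
Rajni Kant 1 1
Sameer Anqari 5 -2
EOF

awk '{print $1}' "$data"      # Name, Rahul, Ketan, Rajni, Sameer (one per line)
awk '{print $1,$3}' "$data"   # second line of output: Rahul 23
awk '{print $1 $3}' "$data"   # no comma, so no separator: Rahul23
```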
AWK Processing Flow
An AWK program can have three parts:
1. BEGIN: Runs once, before any input is read. Use it for initialization, such as setting variables or printing headers.
BEGIN {print "----- Start of Processing -----"}
2. Main Body: Runs once for each input line that matches the pattern (every line, if there is no pattern).
{print $0}
3. END: Runs once, after all input has been read. Use it for final actions, such as summarizing results.
END {print "----- End of Processing -----"}
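The sketch below combines all three parts in one program. The main body prints each name and adds up the second column. END then uses the accumulated sum and NR, which still holds the number of records read. The input lines are made up, and the output is captured in an arbitrary variable, out:

```shell
# Made-up input: a name and a roll number on each line
out=$(printf 'Rahul 23\nKetan 22\nRajni 1\n' | awk '
BEGIN { print "----- Start of Processing -----" }  # once, before any input
      { sum += $2; print $1 }                       # once per input line
END   { print "Records:", NR, "Sum:", sum           # once, after all input
        print "----- End of Processing -----" }')
printf '%s\n' "$out"
# ----- Start of Processing -----
# Rahul
# Ketan
# Rajni
# Records: 3 Sum: 46
# ----- End of Processing -----
```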
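Example AWK Script:
This script assumes a marks file whose columns are Name, Surname, Maths, Physics, and Chemistry, rather than the roll-number file shown above.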
# Content of awkscript
BEGIN {print "-----------Find total marks and average-----------------"}
{
tot=$3+$4+$5
avg=tot/3
print "Total of " $2 " " $1 ":",tot
print "Avg of " $2 " " $1 ":",avg
}
END {print "--------Script Finished--------"}
# Running the script
awk -f awkscript test
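Here is a self-contained run of the same script. The marks file is hypothetical: two made-up students with no header row, in the column order Name Surname Maths Physics Chemistry. The temporary directory and file names are also just for illustration:

```shell
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Hypothetical marks file (no header): Name Surname Maths Physics Chemistry
cat > "$dir/marks" <<'EOF'
Amit Joshi 80 70 90
Neha Verma 65 75 85
EOF

# The script from above, saved as awkscript
cat > "$dir/awkscript" <<'EOF'
BEGIN {print "-----------Find total marks and average-----------------"}
{
  tot = $3 + $4 + $5
  avg = tot / 3
  print "Total of " $2 " " $1 ":", tot
  print "Avg of " $2 " " $1 ":", avg
}
END {print "--------Script Finished--------"}
EOF

awk -f "$dir/awkscript" "$dir/marks"
# -----------Find total marks and average-----------------
# Total of Joshi Amit: 240
# Avg of Joshi Amit: 80
# Total of Verma Neha: 225
# Avg of Verma Neha: 75
# --------Script Finished--------
```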
Handling NR (Record Number):
NR holds the number of the current record, so testing NR != 1 skips the header row:
BEGIN {print "-----------Find total marks and average-----------------"}
{
if(NR != 1) {
tot=$3+$4+$5
avg=tot/3
print "Total of " $2 " " $1 ":",tot
print "Avg of " $2 " " $1 ":",avg
}
}
END {print "--------Script Finished--------"}
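An alternative sketch uses NR > 1 as the pattern itself, so the main body does not need an if statement. It also uses printf to round the average to two decimal places. The marks data, including its header row, is hypothetical:

```shell
# Hypothetical marks data with a header row
marks='Name Surname Maths Physics Chemistry
Amit Joshi 80 70 91
Neha Verma 65 75 85'

# The NR > 1 pattern skips the header; %.2f rounds the average
printf '%s\n' "$marks" | awk 'NR > 1 {
  tot = $3 + $4 + $5
  printf "%s %s: total=%d avg=%.2f\n", $2, $1, tot, tot / 3
}'
# Joshi Amit: total=241 avg=80.33
# Verma Neha: total=225 avg=75.00
```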
Regular Expressions in AWK:
- Operators: ~ matches a regex, and !~ does not match a regex.
- Enclose regex patterns in slashes, as in /pattern/.
# Match lines starting with "Kapil" or "kapil" (only the first letter is case-insensitive)
awk '$0 ~ /^[Kk]apil/' test
# Exclude lines starting with "Kapil" or "kapil"
awk '$0 !~ /^[Kk]apil/' test
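The ~ and !~ operators work on any field, not just $0. For a match that ignores case across the whole string, one portable approach is to lowercase the text with tolower() before matching. This sketch rebuilds the roll-number sample in a temporary file (the variable name data is arbitrary):

```shell
data=$(mktemp)
trap 'rm -f "$data"' EXIT
printf '%s\n' 'Name Surname Roll_number Rank' 'Rahul Saha 23 3' \
  'Ketan Bhagat 22 8' 'Rajni Kant 1 1' 'Sameer Anqari 5 -2' > "$data"

# Match the regex against the first field only
awk '$1 ~ /^Ra/' "$data"           # Rahul Saha 23 3 / Rajni Kant 1 1
awk '$1 !~ /^Ra/' "$data"          # the header, Ketan and Sameer lines

# tolower() makes the whole comparison case-insensitive
awk 'tolower($1) ~ /^ra/' "$data"  # same two lines as the first command
```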
AWK Operators:
- Arithmetic: +, -, *, /, % (remainder), ^ (exponentiation)
- Relational: >, >=, <, <=, ==, !=
- Logical: && (AND), || (OR), ! (NOT)
# Print lines where Surname is "joshi" and Maths score is greater than 80
awk '$2 == "joshi" && $3 > 80 {print}' test
# Print lines where the sum of Maths, Physics, and Chemistry scores is greater than 218
awk '$3+$4+$5 > 218 {print}' test
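The next sketch applies each operator group to the roll-number sample, recreated here in a temporary file. Most patterns exclude the header row. Otherwise a comparison such as "Rank" > 5 is done as a string comparison and can unexpectedly be true:

```shell
data=$(mktemp)
trap 'rm -f "$data"' EXIT
printf '%s\n' 'Name Surname Roll_number Rank' 'Rahul Saha 23 3' \
  'Ketan Bhagat 22 8' 'Rajni Kant 1 1' 'Sameer Anqari 5 -2' > "$data"

# Relational and logical: skip the header, keep negative ranks
awk 'NR > 1 && $4 < 0 {print $1, $4}' "$data"          # Sameer -2

# Arithmetic: remainder and square of the roll number
awk 'NR > 1 {print $1, $3 % 2, $3 ^ 2}' "$data"
# Rahul 1 529
# Ketan 0 484
# Rajni 1 1
# Sameer 1 25

# || inside parentheses, then ! to exclude the first record
awk 'NR > 1 && ($3 > 20 || $4 == 1) {print $1}' "$data" # Rahul, Ketan, Rajni
awk '!(NR == 1) && $4 > 5 {print $1}' "$data"           # Ketan
```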
System Variables in AWK:
- FS: Field Separator (default: a single space, which splits fields on runs of spaces and tabs and ignores leading and trailing whitespace).
- RS: Record Separator (default: newline, \n).
- NF: Number of fields in the current record.
- NR: Number of the current record, counted across all input files.
- OFS: Output Field Separator (default: a single space).
- ORS: Output Record Separator (default: newline, \n).
- FILENAME: Name of the current file being processed.
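A quick way to inspect these variables is to print them for every record. This sketch uses the roll-number sample in a temporary file (the variable name data is arbitrary). $NF is the value of the last field, because NF is the index of that field:

```shell
data=$(mktemp)
trap 'rm -f "$data"' EXIT
printf '%s\n' 'Name Surname Roll_number Rank' 'Rahul Saha 23 3' \
  'Ketan Bhagat 22 8' 'Rajni Kant 1 1' 'Sameer Anqari 5 -2' > "$data"

# Record number, field count, and last field of each record
awk '{print NR, NF, $NF}' "$data"
# 1 4 Rank
# 2 4 3
# 3 4 8
# 4 4 1
# 5 4 -2

# FILENAME is the path of the file currently being read
awk 'NR == 1 {print FILENAME}' "$data"
```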
Example for FS and OFS:
# test1 file: fields separated by commas
awk 'BEGIN {FS = ","; OFS = "-"; print "-----Script started------"} {print $1, $3} END {print "--------Script Finished--------"}' test1
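The sketch below uses made-up comma-separated lines in place of test1. It also shows two related details. The -F option is shorthand for setting FS. OFS appears in the output only where print has commas, or after a field is assigned (here $1 = $1), which makes AWK rebuild $0 with OFS:

```shell
# Made-up comma-separated records
csv='Rahul,Saha,23
Ketan,Bhagat,22'

printf '%s\n' "$csv" | awk 'BEGIN {FS = ","; OFS = "-"} {print $1, $3}'
# Rahul-23
# Ketan-22

# Assigning a field rebuilds the whole record using OFS
printf '%s\n' "$csv" | awk -F, 'BEGIN {OFS = "-"} {$1 = $1; print}'
# Rahul-Saha-23
# Ketan-Bhagat-22
```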
RS (Record Separator): setting RS to "#" splits records on # instead of newlines:
awk 'BEGIN {FS = ","; RS = "#"} {print $1, $3}' test1
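Here is a self-contained sketch with made-up #-separated records. The input deliberately has no trailing newline. If the input ended with a newline, as most text files do, that newline would become part of the last record, and therefore part of its last field:

```shell
# Made-up input: three comma-separated records separated by '#'
printf 'Rahul,Saha,23#Ketan,Bhagat,22#Rajni,Kant,1' |
  awk 'BEGIN {FS = ","; RS = "#"} {print $1, $3}'
# Rahul 23
# Ketan 22
# Rajni 1
```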
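ORS (Output Record Separator): here ORS is set to "\n", which is already its default, so each #-separated record is printed on its own line:
awk 'BEGIN {RS = "#"; ORS = "\n"} {print $0}' test1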
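The effect of ORS is easier to see with a non-default value. In this sketch, which uses made-up input, each record is followed by ";" instead of a newline. The END block then adds one final newline with printf, which does not append ORS:

```shell
# Each print now ends with ";" instead of a newline
printf 'Rahul\nKetan\nRajni\n' |
  awk 'BEGIN {ORS = ";"} {print $0} END {printf "\n"}'
# Rahul;Ketan;Rajni;
```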
NF (Number of Fields):
awk 'BEGIN {FS = ","; RS = "#"} {if(NF == 5) print $0; else print "Less than 5 fields"}' test1
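This sketch shows how NF changes from record to record. It assumes comma-separated fields and #-separated records, as described for test1, and uses a made-up input of one 5-field record and one 3-field record:

```shell
# Made-up input: a 5-field record and a 3-field record, separated by '#'
printf 'a,b,c,d,e#f,g,h' |
  awk 'BEGIN {FS = ","; RS = "#"} {print "record " NR ": " NF " fields"}'
# record 1: 5 fields
# record 2: 3 fields

# The same input with an NF == 5 check
printf 'a,b,c,d,e#f,g,h' |
  awk 'BEGIN {FS = ","; RS = "#"} {if (NF == 5) print $0; else print "Less than 5 fields"}'
# a,b,c,d,e
# Less than 5 fields
```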
Summary:
AWK is a powerful text processing tool available by default in most GNU/Linux distributions. It processes text by treating each line as a record and, by default, each whitespace-separated word as a field.