AWK Arrays Explained with 5 Practical Examples

by Sasikala on March 10, 2010

The awk programming language supports arrays. As part of our ongoing awk examples series, we have seen awk user-defined variables and awk built-in variables. Arrays are an extension of variables: an array is a variable that holds more than one value. Like variables, arrays have names. In some programming languages, arrays have to be declared so that memory can be allocated for them, and array indexes are typically integers, such as array[1], array[2], and so on.

Awk Associative Array

Awk supports only associative arrays. Associative arrays are like traditional arrays, except that they use strings as their indexes rather than numbers. You can mimic a traditional array by using numeric strings as indexes.

Syntax:

arrayname[string]=value

In the above awk syntax:

arrayname is the name of the array.
string is the index of an array.
value is the value assigned to that element of the array.
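
The assignment syntax above can be sketched as follows. This is a minimal illustration, not part of the original series; the array name fruit and its keys are made up for the example.

```shell
out=$(awk 'BEGIN {
    # String indexes: the key can be any string.
    fruit["apple"] = 3
    fruit["banana"] = 7
    # Numeric-looking index: awk converts the number 1 to the string "1".
    fruit[1] = "first"
    # fruit[1] and fruit["1"] refer to the same element.
    print fruit["apple"], fruit["banana"], fruit["1"]
}')
echo "$out"
```

This prints "3 7 first", which shows that a numeric index and its string form address the same element.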

Accessing elements of the AWK array

If you want to access a particular element in an array, use its index: arrayname[index] gives you the value assigned at that index.

If you want to access all the array elements, you can use a loop to go through all the indexes of an array as shown below.

Syntax:

for (var in arrayname)
actions

In the above awk syntax:

var is any variable name
in is a keyword
arrayname is the name of the array.
actions is the list of statements to be performed. If you want to perform more than one action, enclose them in braces.

This loop executes the actions once for each index used in the array, with the variable var set to that index. The order in which the indexes are visited is not specified.
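
A small sketch of the loop described above, assuming a made-up array called color. Because awk does not define the order in which a for-in loop visits the indexes, the output is piped through sort so that it is the same on every run.

```shell
out=$(awk 'BEGIN {
    color["sky"] = "blue"
    color["grass"] = "green"
    color["sun"] = "yellow"
    # var takes each index in turn; the visiting order is not defined by awk.
    for (var in color) {
        print var, "is", color[var]
    }
}' | sort)
echo "$out"
```

Note that var receives the index, not the value; the value is looked up with color[var].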

Removing an element from the AWK array

If you want to remove the element at a particular index of an array, use the awk delete statement. Once you have deleted an element from an awk array, you can no longer obtain its value.

Syntax:

delete arrayname[index];

The loop below removes all elements from an array by deleting each element in turn with the awk delete statement. This loop is the most portable form; many implementations, including gawk, also accept delete arrayname with no index to clear the whole array in one statement.

for (var in array)
     delete array[var]
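
The following sketch puts both forms of delete together. The array name stock and its keys are invented for the illustration; the "in" operator is used to check that an element is really gone.

```shell
out=$(awk 'BEGIN {
    stock["pen"] = 10
    stock["ink"] = 4
    delete stock["pen"]                  # remove a single element
    if (!("pen" in stock)) print "pen removed"
    for (k in stock) delete stock[k]     # remove every remaining element
    n = 0
    for (k in stock) n++                 # count what is left
    print "elements left:", n
}')
echo "$out"
```

Testing with "pen" in stock is preferable to reading stock["pen"], because merely referencing a missing element creates it with an empty value.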

5 Practical Awk Array Examples

All the examples below use the Iplogs.txt file shown here. This sample text file contains a list of IP addresses that made requests to the gateway server, in the following format:

[date] [time] [ip-address] [number-of-websites-accessed]

$ cat Iplogs.txt
180607 093423	123.12.23.122 133
180607 121234	125.25.45.221 153
190607 084849   202.178.23.4 44
190607 084859   164.78.22.64 12
200607 012312	202.188.3.2 13
210607 084849   202.178.23.4 34
210607 121435	202.178.23.4 32
210607 132423	202.188.3.2 167

Example 1. List all unique IP addresses and the number of times each one appears

$ awk '{
> Ip[$3]++;
> }
> END{
> for (var in Ip)
> print var, "access", Ip[var]," times"
> }
> ' Iplogs.txt
125.25.45.221 access 1  times
123.12.23.122 access 1  times
164.78.22.64 access 1  times
202.188.3.2 access 2  times
202.178.23.4 access 3  times

In the above script:

The third field ($3) is an IP address. It is used as the index of an array called Ip.
For each line, the script increments the value stored under the corresponding IP address.
Finally, in the END section, the indexes are the unique IP addresses and their values are the occurrence counts.
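
Since the for-in loop visits the indexes in no particular order, the report can come out in a different order on another awk implementation. One possible way to get a stable, ranked report is to print the count first and sort the result. This is only a sketch: the sample data is recreated with single spaces between fields, and the sort options are one choice among several.

```shell
# Recreate the sample data from this article (whitespace normalised).
cat > Iplogs.txt <<'EOF'
180607 093423 123.12.23.122 133
180607 121234 125.25.45.221 153
190607 084849 202.178.23.4 44
190607 084859 164.78.22.64 12
200607 012312 202.188.3.2 13
210607 084849 202.178.23.4 34
210607 121435 202.178.23.4 32
210607 132423 202.188.3.2 167
EOF

# Count per IP, then sort by count (numeric, descending) and by IP address.
out=$(awk '{ Ip[$3]++ } END { for (var in Ip) print Ip[var], var }' Iplogs.txt |
      sort -k1,1nr -k2,2)
echo "$out"
```

The most frequent address, 202.178.23.4 with 3 requests, now always appears on the first line.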

Example 2. List all the IP addresses and calculate how many sites each one accessed

The last field in Iplogs.txt is the number of sites each IP address accessed at a particular date and time. The script below generates a report that lists each IP address, how many times it made a request to the gateway, and the total number of sites it accessed.

$ cat ex2.awk
BEGIN {
print "IP Address\tAccess Count\tNumber of sites";
}
{
Ip[$3]++;
count[$3]+=$NF;
}
END{
for (var in Ip)
	print var,"\t",Ip[var],"\t\t",count[var];
}

$ awk -f ex2.awk Iplogs.txt
IP Address	Access Count	Number of sites
125.25.45.221 	 1 		 153
123.12.23.122 	 1 		 133
164.78.22.64 	 1 		 12
202.188.3.2 	 2 		 180
202.178.23.4 	 3 		 110

In the above example:

The script uses two arrays. Both are indexed by the IP address (third field).
The first array, “Ip”, holds each unique IP address and its occurrence count. The second array, “count”, also uses the IP address as its index, and its value is the running total of the last field (number of sites): every time an IP address appears, the last field is added to its total.
In the END section, the script goes through all the IP addresses and prints each IP address with its access count from the array Ip and its number of sites from the array count.

Example 3. Identify maximum access day

$ cat ex3.awk
{
date[$1]++;
}
END{
for (count in date)
{
	if ( max < date[count] ) {
		max = date[count];
		maxdate = count;
	}

}
print "Maximum access is on", maxdate;
}

$ awk -f ex3.awk Iplogs.txt
Maximum access is on 210607

In this example:

The array named “date” has the date as its index and the occurrence count as its value.
max is a variable that holds the highest count seen so far and is used to find the date with the maximum count.
maxdate is a variable that holds the date for which the count is maximum.

Example 4. Reverse the order of lines in a file

$ awk '{ a[i++] = $0 } END { for (j=i-1; j>=0;) print a[j--] }' Iplogs.txt
210607 132423	202.188.3.2 167
210607 121435	202.178.23.4 32
210607 084849   202.178.23.4 34
200607 012312	202.188.3.2 13
190607 084859   164.78.22.64 12
190607 084849   202.178.23.4 44
180607 121234	125.25.45.221 153
180607 093423	123.12.23.122 133

In this example,

It starts by recording all the lines in the array ‘a’.
When the program has finished processing all lines, Awk executes the END { } block.
The END block loops over the elements in the array ‘a’ and prints the recorded lines in reverse order.
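
The same technique can also be written with the built-in variable NR (the current record number) as the index, which avoids keeping a separate counter. This is a variation for illustration, using three made-up input lines and an array named line.

```shell
out=$(printf 'one\ntwo\nthree\n' | awk '
    { line[NR] = $0 }                 # index each line by its record number
    END {
        for (i = NR; i >= 1; i--)     # walk the indexes from last to first
            print line[i]
    }')
echo "$out"
```

In the END block NR still holds the number of the last record read, so it doubles as the line count.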

Example 5. Remove duplicate, nonconsecutive lines using awk

$ cat > temp
foo
bar
foo
baz
bar

$ awk '!($0 in array) { array[$0]; print }' temp
foo
bar
baz

In this example:

Awk reads every line from the file “temp” and uses the “in” operator to check whether the current line already exists as an index in the array named “array”.
If it does not exist, awk stores the line as a new index and prints it.
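
A widely used shorthand for the same idea is shown below as an alternative sketch; seen is just an illustrative array name. The first time a line appears, seen[$0] is empty (treated as 0), so the negated test is true and awk prints the line; the ++ then raises the count, so later copies of that line fail the test.

```shell
out=$(printf 'foo\nbar\nfoo\nbaz\nbar\n' | awk '!seen[$0]++')
echo "$out"
```

Unlike the version with the "in" operator, this form also leaves the number of occurrences of each line in the array, which can be handy in an END block.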

Comments on this entry are closed.

aldoem March 10, 2010, 2:46 am

Very gut…
Multiarray example:

#!/bin/bash
Matrix=("one" "two" "three")
one=("1.1" "1.2" "1.3")
two=("2.1" "2.2" "2.3")
three=("3.1" "3.2" "3.3")

for i in ${Matrix[@]}
do
one_Item=$(eval echo "\${$i[0]}")
two_Item=$(eval echo "\${$i[1]}")
three_Item=$(eval echo "\${$i[2]}")
done
done

∞
Eric Pulvino March 10, 2010, 8:53 am

Looking at several of these examples here I would refer to these arrays more as MAPS. The difference in my mind is that the indexes are not necessarily numerically consecutive in the awk cases…. they are more like the keys used in C++ style maps which can take any value as needed to suit the data set.

Interesting functionality though, I will be sure to use it at some point I’m sure.

∞
porosec June 6, 2010, 1:30 am

I think a more efficient way to solve example 3 is

awk 'max < $1 { max = $1 } END { print max }' Iplogs.txt

∞
jagadeeshwaran k June 11, 2012, 7:20 am

really interesting . i have learned many things regarding arrays in awk.

∞
kaique June 28, 2012, 12:30 am

..i need more examples of an array!

∞
archana August 10, 2012, 1:25 am

super super super!!!!!!!!

∞
Aishwarya April 30, 2013, 2:26 am

Another approach for Example 4. Reverse the order of lines in a file
awk 'BEGIN{s=""} {s=$0"\n"s;} END{printf("%s",s);}' Iplogs.txt

∞
Jan June 11, 2013, 7:36 am

Please help. I need to add " 0|" before "QESTMD", that is, before every 4th segment. How do I do this with awk?

data:

0||08276101|QESTMD||10|1|257.05|787736015|7877360B|10062013|0|DISP|||
0||08276101|QESTMD||10|2|85.86|744549019|7445490B|10062013|0|DISP|||
0||08276401|PHARMD||100|1|7.49|894672004|8946720E|10062013|0|DISP|||
0||08276402|TRANNS||20|0|0|759694001|7596940B|10062013|0|DISP|||

output should be:

0||08276101| 0|QESTMD||10|1|257.05|787736015|7877360B|10062013|0|DISP|||
0||08276101| 0|QESTMD||10|2|85.86|744549019|7445490B|10062013|0|DISP|||
0||08276401| 0|PHARMD||100|1|7.49|894672004|8946720E|10062013|0|DISP|||
0||08276402| 0|TRANNS||20|0|0|759694001|7596940B|10062013|0|DISP|||

∞
Manish July 30, 2013, 1:19 am

a1 is the file containing the data from the question.
awk 'BEGIN {FS = "|";OFS="|" }; {TT="0|"$4;$4=TT;print $0}' a1

∞
chandigarh chat April 23, 2014, 3:33 pm

How do I split all owners of files in a dir into an array and display the 1st owner? The code I'm using is:

ls -ltr | awk '{ x=split($3,a," "); print x[0];}'

But it gives error. Please help!

Thanks
~C

∞
Ratnendra Pandey April 28, 2016, 12:02 pm

Great tips on using array in gawk. Thanks.

∞
Ratnendra Pandey April 28, 2016, 12:27 pm

Is there a direct way to find out how many times a particular IP has been accessed? I tried this but it did not work:

awk '{ Ip[$3]++; } END{ print var, "access", Ip[202.188.3.2]," times" }'

∞
Paul_Pedant May 27, 2016, 9:48 am

Ratnendra,

First, var is an undefined variable in the print statement.

Second, 202.188.3.2 is an invalid constant. It has too many dots to be a floating-point number, and no quotes to make it a string.

awk would have given you enough diagnostics to help you fix this, but in any case you should post any error messages you see, not just “it did not work”.

∞
Ratnendra Pandey December 7, 2016, 6:39 pm

Hi Paul_Pedant,
Thanks for answering. I am able to print after removing the dots ‘.’ from the ip address above:

plxc25800> cat ~/Iplogs2.txt
180607 093423 123 133
180607 121234 125221 153
190607 084849 2024 44
190607 084859 16464 12
200607 012312 2022 13
210607 084849 2024 34
210607 121435 2024 32
210607 132423 2022 167

plxc25800> awk '{ Ip[$3]++; } END{ print var, "access", Ip[2022], "times"}' ~/Iplogs2.txt
access 2 times

∞
Paul Pedant January 20, 2017, 7:12 am

Ratnendra,
You might like to read my previous post again.

(1) You have not just removed dots from the sample data.

Input (in item 5 of the main post) was:
200607 012312 202.188.3.2 13
210607 084849 202.178.23.4 34
210607 121435 202.178.23.4 32
210607 132423 202.188.3.2 167

Your input (shown in your last post above) is:
200607 012312 2022 13
210607 084849 2024 34
210607 121435 2024 32
210607 132423 2022 167

So in fact you have left out two of the IPv4 address bytes to make this work. That’s made them non-unique anyway, so you are now counting the wrong thing.

(2) You are still starting the print with “var, “. Var is an undefined variable — it appears nowhere else in the code. It happens awk treats that situation as a null (empty) value, but it’s poor practice to do it anyway.

(3) Your problem is not the dots, it is the quoting.

Your code should make the IP address a string, and then it will work on the original data.

Because you present the IP address without quotes, awk will try to figure text stuff as a variable name, and numeric stuff as an integer or floating-point, and anything else as an expression, and it will choke on it, or do something unexpected.

Test this:

awk '
BEGIN { Key = "202.188.3.2"; }
{ Ip[$3]++; }
END{ print "access", Ip[Key], "times"}
' ~/Iplogs.txt

∞
