Thursday, April 19, 2007

awk - The right tool for the right job

While doing some research, I recently needed to know how long it took for a simulated network packet to go from one spot to another. My data looked like this:

1.333103,down,140,3,140,2340025502,0,0,36,sent-9.2.11
1.333103,down,140,2,140,2340025502,0,0,36,ok
1.333103,down,140,1,140,2340025502,0,0,36,ok

I had to take the time (the first number) at which a packet was sent and subtract it from the time at which it was received. I had to track this separately for each node address, which is the long number in the data above, without mixing in any other node's data.

Thankfully there is "awk". With awk, I can solve this problem in two lines of code:

/9\.2\.11/ {s[$5]=$1;}
/received/ {print $1-s[$5]}

The first line of code matches any line of text that has "9.2.11" in it, which is the sign that a packet has been sent. It stores the time the packet was sent in an array.

The cool thing with awk is I don't have to declare my array, and my array index can be a string that has whatever I want in it. So I use field five as the index, and assign the value in field one to that array location. (I know, technically that makes it an associative array rather than a plain array, whatever.)
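
To see undeclared, string-indexed arrays on their own, here is a minimal sketch that tallies the three sample lines by the text in their last field. The function name count_by_status and the choice of field ten are assumptions made for illustration, and -F, is passed so awk splits fields on commas rather than whitespace.

```shell
#!/bin/sh
# Hypothetical helper: count input lines by the string in field ten.
count_by_status() {
  awk -F, '
    # No declaration needed: count[] exists as soon as it is used,
    # and its index is whatever string is in field ten.
    { count[$10]++ }
    # Walk every index that was created and print its tally.
    END { for (status in count) print status, count[status] }
  ' | LC_ALL=C sort   # for-in order is unspecified, so sort for stable output
}

printf '%s\n' \
  '1.333103,down,140,3,140,2340025502,0,0,36,sent-9.2.11' \
  '1.333103,down,140,2,140,2340025502,0,0,36,ok' \
  '1.333103,down,140,1,140,2340025502,0,0,36,ok' | count_by_status
# prints:
# ok 2
# sent-9.2.11 1
```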

The second line matches any line with "received" in it, subtracts the stored sent time from the time on that line, and prints the result.
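
Here is a sketch of the two rules run end to end. A few things are assumptions rather than part of the original data: the "received" line is invented for illustration (the sample above shows none), the function name latency_demo is hypothetical, and -F, is added because the fields are comma-separated. The key stays on field five as in the script; in the sample lines the long number sits in the sixth comma-separated field, so the index may need adjusting for your own data.

```shell
#!/bin/sh
# Hypothetical demo: feed one sent line, one unrelated line and one
# invented "received" line through the two rules from the post.
latency_demo() {
  awk -F, '
    # Sent record: remember the send time, keyed by field five.
    /9\.2\.11/ { s[$5] = $1 }
    # Received record: print the time elapsed since the matching send.
    /received/ { print $1 - s[$5] }
  ' <<'EOF'
1.333103,down,140,3,140,2340025502,0,0,36,sent-9.2.11
1.333103,down,140,2,140,2340025502,0,0,36,ok
1.583103,down,140,1,140,2340025502,0,0,36,received
EOF
}

latency_demo
# prints: 0.25
```

Because the send time is stored under its own key, a second node's packets would land in a different array slot and would not disturb this result.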

This is so simple I can quickly create such an awk script for any ad-hoc reporting I need. In fact I can specify the entire program as a command-line parameter. This functionality would be painful to create in Java or C#.

If you are a programmer, you owe it to yourself to learn text processing tools like grep, sed, awk, and the F-22 of text processing, Perl, for ad-hoc work or even production work with text manipulation.

...and yes, you can get these tools on Windows.