Link: Command-line tools can be 235x faster than your Hadoop cluster

Bash symbol

http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html


What I find funny is how he goes from this intermediate step:

cat *.pgn | grep "Result" | sort | uniq -c

To this intermediate step in one shot.

cat *.pgn | grep "Result" | awk '{ split($0, a, "-"); res = substr(a[1], length(a[1]), 1); \
if (res == 1) white++; if (res == 0) black++; if (res == 2) draw++;} \
END { print white+black+draw, white, black, draw }'

This is what I refer to as “knowing your business”.