http://aadrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html
What I find funny is how he goes from this intermediate step:
cat *.pgn | grep "Result" | sort | uniq -c
To this one, in a single leap:
cat *.pgn | grep "Result" | awk '{ split($0, a, "-"); res = substr(a[1], length(a[1]), 1); \
if (res == 1) white++; if (res == 0) black++; if (res == 2) draw++;} \
END { print white+black+draw, white, black, draw }'
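For anyone who wants to see both stages side by side, here is a minimal, self-contained sketch run against a few made-up Result lines. The sample data and the `tally_results` function name are hypothetical, not from the article. It also deviates slightly from his one-liner: it compares against quoted strings ("1" rather than 1) and adds `+ 0` so that an empty counter prints as 0 instead of a blank.

```shell
#!/bin/sh
# Hypothetical sample input: a handful of PGN tag lines, not real game data.
sample='[Event "Casual game"]
[Result "1-0"]
[Result "0-1"]
[Result "1/2-1/2"]
[Result "1-0"]
[Result "*"]'

# First stage: count each distinct Result line.
# LC_ALL=C keeps the sort order byte-wise and independent of the locale.
printf '%s\n' "$sample" | grep "Result" | LC_ALL=C sort | uniq -c

# Second stage: classify each game by the character just before the first "-":
# "1" for 1-0 (white wins), "0" for 0-1 (black wins), "2" for 1/2-1/2 (draw).
# A line with no hyphen, such as the unfinished "*" game, matches none of these.
tally_results() {
  grep "Result" | awk '{
    split($0, a, "-")
    res = substr(a[1], length(a[1]), 1)
    if (res == "1") white++
    if (res == "0") black++
    if (res == "2") draw++
  }
  END { print white + black + draw, white + 0, black + 0, draw + 0 }'
}

printf '%s\n' "$sample" | tally_results   # prints: 4 2 1 1
```

The first stage still needs a human to read the counts and work out what each Result value means; the second stage bakes that knowledge into the parsing, which is why it can skip sort and uniq entirely and make a single pass over the input.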
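This is what I call “knowing your business”. The awk version only works because he knows that a PGN Result tag can only hold "1-0", "0-1", "1/2-1/2" or "*" (an unfinished game), so the single character just before the first hyphen is enough to tell white wins, black wins and draws apart.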