Recently, I needed to generate and process multi-billion-record synthetic datasets for some data structure benchmarking. Rather than write a bunch of custom code for this, I decided to try using shell scripts and standard system utilities.