Probably the biggest one was cutting down on the amount of Erlang message passing that it takes to append a batch of data to the end of the db file.
There was also an Erlang configuration change which allowed us to make use of a thread pool for disk-io so that fsyncs didn't block other activity.
Really it was a bunch of tiny things, that all added up. The catalyst was the creation and use of a few highly-concurrent benchmark suites that we could use to identify bottlenecks.