Working in the field, that seems overconfidently good, considering it's faster than the wire-to-wire SLA of most world-class FPGAs doing market data translation.
They cite "750 to 800 nanoseconds" wire-to-wire latency and say that their next platform is going to be even faster.
They were using FPGAs, and yes, FPGAs are faster, but not by as much as many people would think.
First, market events come in isolation; there are no concurrent messages. WSE guarantees 1/10000th of a second (100 µs) between messages, so the entire machine can be dedicated to executing the order.
FPGAs are usually used to run multiple copies of the same net to speed up a simple problem, but that is not the case here.
Second, FPGAs are usually used as a shortcut to optimize the execution of a problem. With an FPGA you say, "Rather than trying to solve this problem with generic instructions that add a lot of delay, I will just design a dedicated net that is not bothered by the generic baggage."
So in generic assembly you may write a branch, the branch predictor may guess wrong, and that costs you. On an FPGA you design your net, so you go straight to the point.
But that doesn't mean you can't write ordinary code that is fast. You just need to be aware of the actual cost of every single instruction on the critical path.
And third, FPGAs are clocked slower. This means an FPGA has to do a lot per clock cycle just to be on par with an x86 core.
The application I worked on was not intended for HFT. 5 µs was an arbitrary goal we wanted to reach, knowing full well that it is way behind the HFT-ers.