For this kind of benchmark you should always compare the best case, i.e. the fastest run of each candidate. Slower runs are most likely "your thread got switched out by the OS to let something else run", and that's not really a fair test.
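To make "compare the best case" concrete, here is a minimal sketch that runs a workload several times and keeps the fastest timing. The names `best_of` and `work`, and the repeat count of 20, are made up for illustration; substitute whatever you are actually measuring.

```python
import time

def best_of(fn, repeats=20):
    """Run fn `repeats` times; return (fastest time in seconds, all timings)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(timings), timings

def work():
    # Hypothetical workload, stands in for the code under test.
    return sum(i * i for i in range(10_000))

best, timings = best_of(work)
print(f"best: {best:.6f}s  worst: {max(timings):.6f}s")
```

The gap between best and worst gives a rough idea of how much noise the slower runs picked up, though it does not tell you where that noise came from.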
If you want to guard against context switches by the OS, don't rely on something that "most likely" works. Measure with perf and let it count the context switches, so you know whether a given run was actually interrupted.
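perf does this from the command line (something along the lines of `perf stat -e context-switches` in front of your program). If you'd rather check from inside the process, a rough substitute, and to be clear this is not perf, is to read the kernel's per-process counters through `getrusage`. The sketch below assumes a Unix-like system where Python's `resource` module is available; `count_context_switches` and `work` are hypothetical names.

```python
import resource

def count_context_switches(fn):
    """Call fn once; return (result, voluntary switches, involuntary switches)."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    result = fn()
    after = resource.getrusage(resource.RUSAGE_SELF)
    # ru_nvcsw: the process gave up the CPU itself (e.g. blocked on I/O).
    # ru_nivcsw: the OS preempted the process to run something else.
    voluntary = after.ru_nvcsw - before.ru_nvcsw
    involuntary = after.ru_nivcsw - before.ru_nivcsw
    return result, voluntary, involuntary

def work():
    # Hypothetical workload, stands in for the code under test.
    return sum(i * i for i in range(10_000))

result, voluntary, involuntary = count_context_switches(work)
print(f"voluntary: {voluntary}, involuntary: {involuntary}")
```

A run with zero involuntary switches was not preempted during the measured call, so its timing can be trusted without leaning on "most likely". Note that `RUSAGE_SELF` covers the whole process, so other threads in the same process will also show up in the counts.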