Optimizing memory layout and reducing allocations are largely ISA-independent. But yeah, at some point you will have to start looking at the assembly to optimize further (even if only to know when to tell the compiler to stop inlining a function that causes tons of register spilling inside a tight loop).
And this will almost certainly require a deep understanding of the instruction set and, periodically, dropping into inline ASM.
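To make the inlining point concrete, here is a minimal sketch, assuming GCC or Clang. The function names (`handle_outlier`, `sum_with_outliers`) and the clamping logic are made up for illustration. In real code the slow path would be something big enough that inlining it forces the hot loop to spill registers. Whether that actually happens depends on the compiler and target, so you would confirm it by reading the generated assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang spelling. Other compilers use a different one (MSVC has
 * __declspec(noinline)); here they just fall back to no attribute. */
#if defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Hypothetical rare slow path. Kept out of line so that whatever
 * registers it needs do not compete with the loop below. */
static NOINLINE uint64_t handle_outlier(uint64_t acc, uint32_t limit)
{
    /* Clamp the outlier to the limit before accumulating. */
    return acc + limit;
}

/* Hot loop: the common case is a single add; outliers take the call. */
uint64_t sum_with_outliers(const uint32_t *data, size_t n, uint32_t limit)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] > limit)
            acc = handle_outlier(acc, limit);
        else
            acc += data[i];
    }
    return acc;
}
```

Compile with something like `-O2 -S` with and without the attribute and compare the loop body. If the out-of-line version has fewer stack loads and stores inside the loop, the attribute is earning its keep. If not, drop it.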