SVT-AV1 still has a lot of visual quality issues. PSNR, SSIM and VMAF are useful metrics, but optimising for these won't get you the best encoder.
x264 didn't get its reputation by going after PSNR and SSIM.
The subjective results in testing follow a similar pattern, though. Even with variations between the metrics and subjective scores, there's not really enough wiggle room for it to bridge the gap:
x264 had some psy optimisations that introduced "false detail" that the eye liked but the metrics didn't.
Kind of like what AV1 does with film grain, or what various audio codecs do with filler audio that has roughly the right "texture" even if it isn't accurate to the original signal.
edit: this is on top of the basics all working fast and well. You could argue that many competitors "overfit" to metrics, while the x264 developers had the wisdom or the right organisational incentives to avoid this.