There are some exceptions, like Mistral 0.1 (technically 32K according to the config, but practically closer to 8K because the sliding-window attention handles long contexts poorly) and InternLM (which, at least initially, used automatic RoPE scaling to extend the context as part of the model's architecture).
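To make the "configured vs. practical" distinction concrete, here is a small sketch of how these properties show up in Hugging Face-style config fields. The field names (`max_position_embeddings`, `sliding_window`, `rope_scaling`) follow the `transformers` convention, but the specific values and the helper function are assumptions for illustration, not the actual model configs:

```python
# Abbreviated Hugging Face-style configs (illustrative values only).
mistral_v01 = {
    "max_position_embeddings": 32768,  # what the config advertises
    "sliding_window": 4096,            # each layer only attends this far back
}

internlm = {
    "max_position_embeddings": 2048,   # trained context length
    # Dynamic RoPE scaling rescales rotary frequencies at inference
    # time so the model can run past its trained length.
    "rope_scaling": {"type": "dynamic", "factor": 2.0},
}

def advertised_context(cfg: dict) -> int:
    """Context length as declared by the config, after any RoPE scaling."""
    base = cfg["max_position_embeddings"]
    scaling = cfg.get("rope_scaling")
    return int(base * scaling["factor"]) if scaling else base

print(advertised_context(mistral_v01))  # 32768 on paper
print(advertised_context(internlm))     # 4096 after dynamic scaling
```

The point of the sketch: the advertised number alone says little — a large `max_position_embeddings` can be undercut by a small `sliding_window`, while a small trained length can be stretched by `rope_scaling`.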