There are some exceptions, like Mistral 0.1 (technically 32K according to the config, but practically closer to 8K because the sliding-window attention handles long contexts poorly) and InternLM (which, at least initially, used automatic RoPE scaling to extend the context as part of the model's architecture).
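To make the "configured vs. practical" distinction concrete, here is a small sketch of how these properties show up in Hugging Face-style config fields. The field names (`max_position_embeddings`, `sliding_window`, `rope_scaling`) follow the `transformers` convention, but the specific values and the helper function are assumptions for illustration, not the actual model configs:

```python
# Abbreviated Hugging Face-style configs (illustrative values only).
mistral_v01 = {
    "max_position_embeddings": 32768,  # what the config advertises
    "sliding_window": 4096,            # each layer only attends this far back
}

internlm = {
    "max_position_embeddings": 2048,   # trained context length
    # Dynamic RoPE scaling rescales rotary frequencies at inference
    # time so the model can run past its trained length.
    "rope_scaling": {"type": "dynamic", "factor": 2.0},
}

def advertised_context(cfg: dict) -> int:
    """Context length as declared by the config, after any RoPE scaling."""
    base = cfg["max_position_embeddings"]
    scaling = cfg.get("rope_scaling")
    return int(base * scaling["factor"]) if scaling else base

print(advertised_context(mistral_v01))  # 32768 on paper
print(advertised_context(internlm))     # 4096 after dynamic scaling
```

The point of the sketch: the advertised number alone says little — a large `max_position_embeddings` can be undercut by a small `sliding_window`, while a small trained length can be stretched by `rope_scaling`.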