
We will eventually increase the Phind Model's context window to 100K tokens -- the RoPE embeddings in Code Llama were designed for this.
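As context for the claim above: Code Llama's long-context fine-tuning raises the RoPE base period (theta) from 10,000 to 1,000,000, which stretches every rotation wavelength so that positions well beyond the original training length still map to slowly varying angles. A minimal sketch of that effect (my own illustration, not Meta's or Phind's code; the dimension count is arbitrary):

    import numpy as np

    # Wavelength (tokens per full rotation) of each RoPE dimension pair.
    # A larger base period stretches every wavelength, which is what keeps
    # the low-frequency dimensions well behaved at very long contexts.
    def rope_wavelengths(dim, base):
        inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
        return 2 * np.pi / inv_freq

    print(rope_wavelengths(128, base=10_000)[-1])     # ~5.4e4 tokens (standard base)
    print(rope_wavelengths(128, base=1_000_000)[-1])  # ~5.1e6 tokens (Code Llama's larger base)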


> the RoPE embeddings in Code Llama were designed for this.

The RoPE embeddings were not "designed" for that. The original RoPE was not designed with length extrapolation in mind, and the subsequent methods for extrapolating RoPE (e.g. position interpolation) are post-hoc tweaks (with optional fine-tuning) applied to an entirely vanilla RoPE implementation.
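For concreteness, here is what the "post-hoc tweak" amounts to in the position-interpolation case: positions are simply rescaled back into the trained range before the vanilla RoPE rotation is applied. This is my own sketch (numpy, toy sizes, hypothetical scale factor), not the Phind or Code Llama implementation:

    import numpy as np

    def apply_rope(x, positions, base=10_000.0, scale=1.0):
        # Vanilla RoPE: rotate each pair of dimensions by position-dependent angles.
        # Position interpolation is the single extra knob `scale`: dividing the
        # positions by `scale` squeezes a longer sequence back into the range
        # of positions seen during training.
        dim = x.shape[-1]
        inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
        angles = np.outer(positions / scale, inv_freq)   # (seq_len, dim/2)
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[..., 0::2], x[..., 1::2]
        out = np.empty_like(x)
        out[..., 0::2] = x1 * cos - x2 * sin
        out[..., 1::2] = x1 * sin + x2 * cos
        return out

    # Example: a model trained on 4k positions run at 16k by interpolating with scale=4.
    q = np.random.randn(16_384, 64)
    q_rotated = apply_rope(q, np.arange(16_384), scale=4.0)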


100k tokens and good IDE support would be great. Copy-pasting back and forth between the browser and the IDE is kind of annoying, and you always miss some context. I think the model is now good enough; what's missing is good developer experience, e.g. deciding what to load into the context window and how the model integrates with the IDE. But that's missing from Copilot and ChatGPT-4 as well.


Is it “100k” or really 100k? There are so many ways to do context. I remember seeing 100k claimed before, but it turned out to be some cheap trick to get there.


What about ALiBi and Sliding Window Attention?

Additionally, Apple researchers seem to be playing with "Attention Free" variants.
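Rough sketches of the two ideas in the question above, for anyone unfamiliar (my own illustration with toy sizes, not any particular model's implementation): ALiBi drops position embeddings and instead adds a per-head linear penalty proportional to query-key distance, while sliding-window attention simply masks each query down to the most recent W keys.

    import numpy as np

    def alibi_bias(seq_len, num_heads):
        # Per-head slopes follow the geometric sequence from the ALiBi paper
        # (for power-of-two head counts): 2^(-8/n), 2^(-16/n), ...
        slopes = 2.0 ** (-8.0 * np.arange(1, num_heads + 1) / num_heads)
        distance = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]  # key - query
        # Negative penalty growing with distance to earlier tokens; future
        # positions are clipped to 0 here and removed by the causal mask anyway.
        return slopes[:, None, None] * np.minimum(distance, 0)   # (heads, seq, seq)

    def sliding_window_mask(seq_len, window):
        # True where query i may attend to key j: causal and within the last `window` tokens.
        i = np.arange(seq_len)[:, None]
        j = np.arange(seq_len)[None, :]
        return (j <= i) & (j > i - window)

    bias = alibi_bias(seq_len=8, num_heads=4)        # added to the attention logits
    mask = sliding_window_mask(seq_len=8, window=3)  # applied before the softmax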



