And that’s the point of fine tuning models. Still good to see someone walk throu...

scosman · on July 1, 2024

On that note: is there a good service for “here’s my dataset”, please fine tune these 9 models and give me evaluation stats?

strickvl · on July 1, 2024

OpenPipe - https://openpipe.ai/ - is probably the service that most closely resembles what you’re asking for, but I found the evals weren’t really what I wanted (i.e. following my custom evaluation criteria), so you will probably end up having to do that yourself anyway. For the finetuning itself, they’re all somewhat the same. Predibase and OpenPipe are two good options for that. Predibase has more base models for you to finetune, but it’s a bit more unwieldy to work with. I wrote about that in a previous post here -- https://mlops.systems/posts/2024-06-17-one-click-finetuning.....

kcorbitt · on July 1, 2024

(Disclaimer: founder of OpenPipe). Thanks for the shout-out. Note that we're actively working on improved evaluations that will let you add more specific criteria as well as more evaluation types, like comparing field values to that of a golden dataset. This is definitely something that customers are asking for!
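For readers wondering what "comparing field values to that of a golden dataset" could look like if you roll it yourself, here is a minimal sketch. It is not OpenPipe's implementation; the function name `field_accuracy` and the field names `label` and `count` are hypothetical, and it assumes each model output has already been parsed into a dict paired with one golden record.

```python
from collections import defaultdict


def field_accuracy(predictions, golden):
    """Return per-field exact-match accuracy across paired records.

    Assumption: predictions[i] and golden[i] describe the same input,
    and both are plain dicts of field name -> value.
    """
    if len(predictions) != len(golden):
        raise ValueError("predictions and golden must have the same length")

    hits = defaultdict(int)
    totals = defaultdict(int)
    for pred, gold in zip(predictions, golden):
        # Only fields present in the golden record are scored.
        for field, expected in gold.items():
            totals[field] += 1
            if pred.get(field) == expected:
                hits[field] += 1
    return {field: hits[field] / totals[field] for field in totals}


# Hypothetical records, for illustration only.
golden = [
    {"label": "a", "count": 3},
    {"label": "b", "count": 0},
]
predictions = [
    {"label": "a", "count": 3},
    {"label": "b", "count": 1},
]

scores = field_accuracy(predictions, golden)
print(scores)
```

Exact match is the simplest possible criterion; custom criteria (numeric tolerance, set overlap for list fields) would replace the equality check.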

scosman · on July 1, 2024

Wild to see them advertising collecting GPT4 responses for training other models. That’s definitely not allowed by TOS. I suspect many do, but front page advertising is another thing entirely.

w4nderlust · on July 1, 2024

Predibase ( http://predibase.com ), also referred to in the article, is a platform specifically designed for exactly that. It also has "repos" for finetuning multiple models, comparing their performance, and keeping things organized. It also allows you to query any of the finetuned models on the fly from a single GPU with multi-LoRA serving. (Predibase founder here)

tucnak · on July 1, 2024

Together.AI is a good starting point. Even though I'm not sure what fine-tuning method they're using, the results are REALLY good.

geokon · on July 1, 2024

As I understood it, the point was not that they fine tuned a model and it got better.

They used a much simpler model, fine tuned it, and managed to beat a way more advanced model.

wongarsu · on July 1, 2024

When jumping from 7B parameters to 70B to 400B (or whatever GPT-4 uses) most of the additional neurons seem to go towards a better world model and better reasoning (or whatever you want to call the inference of new information from known information). There don't seem to be any major improvements in basic language skills past 7B, and even 1B and 3B models do pretty well on that front.

In that sense it's not that surprising that on a pure text extraction task with little "thinking" required a 7B model does well and outperforms other models after fine tuning. In the "noshotsfired" label GPT-4 is even accused of overthinking it.

It is interesting how finetuned mistral-7b and llama3-8b outperform finetuned gpt3.5-turbo. I would tend to attribute that to those models being newer and "more advanced" despite their low parameter count, but maybe that's reading too much into a small score difference.

scosman · on July 1, 2024

Re: 7b models vs gpt-3.5, I’m guessing different fine tuning parameters can account for the difference. The OpenAI fine tuning is a black box.

scosman · on July 1, 2024

That’s still the point. That model now does exactly one thing, and because of that can do better than a model 50x the size that tries to do everything. It will crush it in instruction following and consistency.

A fine tuned 500b parameter model would probably beat the fine tuned 7b model, but only by a bit (depending on task obviously). A lot of that capacity is being used for knowledge, and isn’t needed for extraction/classification tasks. Fine tuning isn’t touching most of those weights. The smaller models need to focus on more general language skills, not answering “describe the evolution of France’s economy in the 1800s”.
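To put a rough number on "fine tuning isn’t touching most of those weights": assuming a LoRA-style adapter (the thread mentions multi-LoRA serving, but the method used in the article is not stated here), a frozen weight matrix of shape d_out x d_in gets two small trainable matrices of rank r, adding r * (d_in + d_out) parameters. The dimensions below are illustrative, not taken from any specific model.

```python
def lora_trainable_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Fraction of trainable to frozen parameters for one adapted matrix.

    Assumption: a standard low-rank adapter with factors of shape
    (d_out, rank) and (rank, d_in) next to a frozen (d_out, d_in) weight.
    """
    frozen = d_in * d_out
    trainable = rank * (d_in + d_out)
    return trainable / frozen


# Illustrative sizes only: a 4096 x 4096 projection with rank 8.
fraction = lora_trainable_fraction(d_in=4096, d_out=4096, rank=8)
print(f"{fraction:.4%}")  # prints "0.3906%"
```

Under these assumed sizes the adapter trains well under 1% as many parameters as the matrix it modifies, which is the sense in which most of the base model's weights stay untouched.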