
They mean that they took a 1.3B-parameter model, applied the InstructGPT finetuning process, and found that it worked better for their use case than a 175B-parameter model that had not gone through that process.
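To make the comparison concrete, here's a minimal sketch against the legacy (openai<1.0) Completions API. Treating text-babbage-001 as the ~1.3B instruct-tuned model and davinci as the 175B base model is an assumption based on the naming, not something OpenAI has confirmed:

  import openai  # legacy client: pip install "openai<1.0"

  openai.api_key = "sk-..."  # placeholder

  # Prompt from the InstructGPT write-up; an instruct-tuned model should
  # follow the instruction, while a base model tends to just continue text.
  PROMPT = "Explain the moon landing to a 6 year old in a few sentences."

  # Assumed pairing: text-babbage-001 ~ 1.3B instruct-tuned,
  # davinci ~ 175B base model with no instruction finetuning.
  for model in ["text-babbage-001", "davinci"]:
      resp = openai.Completion.create(
          model=model,
          prompt=PROMPT,
          max_tokens=64,
          temperature=0,
      )
      print(f"--- {model} ---")
      print(resp.choices[0].text.strip())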


Ah I got it now. Thanks.

From the GPT-3 paper, it looks like they have several variants:

- GPT-3-350M

- GPT-3-1.3B

- GPT-3-2.7B

- GPT-3-6.7B

- GPT-3-13B

- GPT-3-175B

Ada, Babbage, Curie, and Davinci line up closely with 350M, 1.3B, 6.7B, and 175B respectively. The names are pretty suggestive: they run in alphabetical order, tracking the increasing model sizes.
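For reference, that guessed correspondence as a small Python mapping; the sizes are from the paper, but the pairing with the API names is inference, not an official statement:

  # Guessed mapping of legacy OpenAI API engine names to GPT-3 paper sizes.
  ENGINE_PARAMS = {
      "ada": "350M",
      "babbage": "1.3B",
      "curie": "6.7B",
      "davinci": "175B",
  }

  for engine, size in ENGINE_PARAMS.items():
      print(f"{engine}: ~{size} parameters")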



