
> it is frustrating to be blocked from doing a lot of research in this space due to computational

Do we need a SETI@home-like project to distribute the training computation across many volunteers so we can all benefit from the trained model?



Long story short, training requires intensive device-to-device communication. Distributed training is possible in theory but so inefficient that it's not worth it. Here is a new paper that looks to be the most promising approach yet: https://arxiv.org/abs/2301.11913


It doesn’t, actually. The model weights can be periodically averaged with each other. No need for synchronous gradient broadcasts.

Why people aren’t doing this has always been a mystery to me.

Relevant: https://battle.shawwn.com/swarm-training-v01a.pdf
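The core idea (sometimes called local SGD or federated averaging) can be sketched in a few lines. This is a toy illustration on a scalar regression problem, not code from the linked paper; the learning rate, shard sizes, and round count are arbitrary choices for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_sgd(w, data, lr=0.1):
    # Each worker runs plain SGD on its own shard -- no gradient sync.
    for x, y in data:
        grad = 2 * (w * x - y) * x  # d/dw of (w*x - y)^2
        w -= lr * grad
    return w

# Toy problem: every worker fits y = 3x from its own data shard.
n_workers = 4
shards = [[(x, 3.0 * x) for x in rng.uniform(-1, 1, 10)]
          for _ in range(n_workers)]
weights = [rng.normal() for _ in range(n_workers)]

for _ in range(50):
    # Workers train independently between communication rounds...
    weights = [local_sgd(w, shard) for w, shard in zip(weights, shards)]
    # ...then the weights are averaged (the only communication step).
    avg = sum(weights) / n_workers
    weights = [avg] * n_workers

print(weights[0])  # converges near 3.0
```

Because the only traffic is one parameter exchange per round instead of a gradient broadcast per step, communication drops by a factor equal to the number of local steps, which is what makes the swarm setting plausible at all.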


You linked a paper with no results and no conclusion. Perhaps you meant to link a different paper?


I never finished it.


so it is unproven? what is the value of it?


It’s how we trained roughly 40 GPT 1.5B models. The technique works; it’s up to you to try it out.


The abstract mentions fine-tuning, not full pre-training?


Yeah, sorry for not being precise. We used the technique to fine-tune around 40 GPT 1.5B models, including the chess one.

It was very apparent that the technique was working well. The loss curve suddenly started dropping dramatically the first day we got it working.


I think the landscape has plenty to explore, with few explorers able to wrap their wetware around all of it?


Wouldn't other signal propagation approaches, like Forward-Forward, make this easier?


It would have to be federated learning to work, I think.


That's brilliant. I would love to spare compute cycles and network bandwidth on my devices for this if there's an open-source LLM on the other side that I can use in my own projects, or commercially.

Doesn't feel like there's much competition for ChatGPT at this point otherwise, which can't be good.


On the generative image side of the equation, you can do the same thing with Stable Diffusion[1], thanks to a handy open source distributed computing project called Stable Horde[2].

LAION has started using Stable Horde for aesthetics training to feed back into and improve their datasets for future models[3].

I think one can foresee the same thing eventually happening with LLMs.

Full disclosure: I made ArtBot, which is referenced in both the PC World article and the LAION blog post.

[1] https://www.pcworld.com/article/1431633/meet-stable-horde-th...

[2] https://stablehorde.net/

[3] https://laion.ai/blog/laion-stable-horde/


> Doesn't feel like there's much competition for ChatGPT at this point otherwise, which can't be good.

Facebook open sourced their LLM, called OPT [1]. There's not much else, and OPT isn't exactly easy to run (requires like 8 GPUs).

I'm not an expert, so I don't know why some models, like the image generators we've seen, are able to fit on phones, while LLMs require $500k worth of GPUs to run. Hopefully this is the first step toward changing that.

[1] https://ai.facebook.com/blog/democratizing-access-to-large-s...



I've seen Petals mentioned several times before and I don't think it's the same thing. Correct me if I'm wrong, but it seems Petals is for running distributed inference and fine-tuning of an existing model. What the above poster and I really want to see is distributed training of a new model across a network.

Much like I was able to choose to donate CPU cycles to a wide variety of BOINC-based projects, I want to be able to donate GPU cycles to anyone with a crazy idea for a new ML model - text, image, finance, audio, etc.


I read about something a few weeks ago which does just this! Does anyone know what it's called?


you are probably thinking of https://arxiv.org/abs/2207.03481

for inference, there is https://github.com/bigscience-workshop/petals

however, both are only in the research phase. start tinkering!


Hell it could even be the proof of work for a usable crypto-currency. "Prove that you lowered the error rate compared to SOTA and earn 50 ponzicoins!"


The labelled data seems more of a blocker than anything else. As far as I'm aware, the actual NNs behind the models are relatively simple; it's the human labor involved in gathering, cleaning, and labeling data for training that is the most resource-intensive.


The data is valuable yes, but training a model still requires millions of dollars worth of compute. That's a perfect cost to distribute among volunteers if it could be done.


Yeah man, and you get access to the model as payment for donating cycles.


Hyperion


Another idea is to dedicate CPU cycles to something else that is easier to distribute, and then use the proceeds to buy massive amounts of GPU time for academic use.

Crypto is an example.


This creates indirection costs and counterparty risks that don't appear in the original solution.


There is also an indirection cost in taking something that is optimized to run on GPUs within a data center and distributing it to individual PCs.


this would be very wasteful


So is trying to distribute training across nodes compared to what can be done inside a data center.



