How Sber Built ruDALL-E

minimaxir · on Dec 30, 2021

A very curious effect of ruDALL-E is that finetuning works unexpectedly well on small datasets. The Sneakers example noted in this article uses about 10k images.

As an experiment, I finetuned ruDALL-E on about 1000 images of Pokemon and generated from that, which yielded incredible results that went viral: https://twitter.com/minimaxir/status/1470913487085785089

I then tried finetuning ruDALL-E on a single Pokemon and still got good/horrifying results: https://twitter.com/minimaxir/status/1474913997807755268

Unfortunately, finetuning ruDALL-E is still a convoluted process; I hope they end up releasing a smaller model so it can be done on a smaller/free GPU. (If they do, I'll release a streamlined Colab notebook + blog post on how to do it.)

smusamashah · on Dec 31, 2021

Did you try feeding it something else, e.g. only human faces? It looks like it should already be able to generate faces as well as Nvidia's StyleGAN.

The Pikachu variations from a single image are still far better than anything we have seen before. It apparently understood the face well and applied variations to it; for the rest, it did as much as it understood.

What else did you try after Pokémon and can you share those results?

etaioinshrdlu · on Dec 30, 2021

How much GPU RAM and time does it take to fine-tune the current model?

minimaxir · on Dec 30, 2021

Essentially all of a 16GB GPU's VRAM, even with some layers frozen.

The more diverse the input images, the more epochs finetuning needs to produce stable results. The first Pokemon model was trained for about 4.5 hours; the one-shot model took about 2 minutes.

lostmsu · on Dec 30, 2021

Curious. How does freezing layers save you memory? Does it save much compute time?

Is it that the frozen layers don't need their gradients stored?

minimaxir · on Dec 30, 2021

Essentially, yes. That technique isn't exclusive to ruDALL-E; due to VRAM constraints, large models are often finetuned with the early layers frozen and only the later layers trained.
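As a rough illustration of why freezing saves memory, here is a back-of-the-envelope sketch. It assumes fp32 weights and an Adam-style optimizer that keeps two extra buffers (first and second moments) per trainable parameter, and it deliberately ignores activation memory. The parameter count of about 1.3 billion is an assumed input for illustration, not a measured ruDALL-E figure; the function name is hypothetical.

```python
# Back-of-the-envelope estimate of training memory with part of a model frozen.
# Assumptions (illustrative only): fp32 everywhere, Adam-style optimizer with two
# moment buffers per trainable parameter, activation memory ignored.

BYTES_FP32 = 4


def training_memory_gb(n_params: float, frozen_fraction: float) -> dict:
    """Estimate GB needed for weights, gradients and optimizer state."""
    trainable = n_params * (1.0 - frozen_fraction)
    weights = n_params * BYTES_FP32          # all weights stay on the GPU, frozen or not
    grads = trainable * BYTES_FP32           # gradients only for trainable weights
    adam_state = trainable * 2 * BYTES_FP32  # two moment buffers per trainable weight
    gb = 1e9
    return {
        "weights": weights / gb,
        "grads": grads / gb,
        "adam_state": adam_state / gb,
        "total": (weights + grads + adam_state) / gb,
    }


if __name__ == "__main__":
    n = 1.3e9  # assumed parameter count (hypothetical input)
    for frac in (0.0, 0.5, 0.75):
        est = training_memory_gb(n, frac)
        print(f"frozen {frac:.0%}: ~{est['total']:.1f} GB before activations")
```

Under these assumptions, full finetuning needs about 20.8 GB before any activations, which already exceeds a 16GB card; freezing half of the parameters brings that to about 13 GB, because gradients and optimizer state scale only with the trainable weights.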

lostmsu · on Dec 30, 2021

Oh, right, only freezing early layers makes sense. I was thinking you froze inner ones, but gradients would need to be computed and kept for them to backpropagate to the unfrozen early ones.
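The point about frozen inner layers can be sketched as a small function under a simplified model: layers are indexed from input (0) to output, and backprop has to pass through every layer from the output down to the earliest trainable one, so those layers must keep what they need from the forward pass whether or not they are frozen. Real frameworks decide this per tensor (in PyTorch, via `requires_grad`), so treat this as an approximation; the function name is hypothetical.

```python
from typing import List


def layers_needing_saved_activations(trainable: List[bool]) -> List[int]:
    """Return indices of layers that must keep forward activations for backprop.

    Simplified model: layer 0 is closest to the input. Gradients flow from the
    loss back through every layer down to the earliest trainable layer, so all
    of those layers (frozen or not) need their forward intermediates kept.
    Layers below the earliest trainable one never receive a gradient.
    """
    if not any(trainable):
        return []  # nothing to train, nothing to backpropagate
    first_trainable = trainable.index(True)
    return list(range(first_trainable, len(trainable)))


# Freezing the two early layers of a 4-layer stack: only layers 2 and 3 keep activations.
print(layers_needing_saved_activations([False, False, True, True]))
# Freezing the two inner layers instead: every layer still keeps activations.
print(layers_needing_saved_activations([True, False, False, True]))
```

In this model, freezing inner layers still saves their gradient and optimizer-state memory, but not activation memory, which is why freezing the early layers is the more effective choice.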

f311a · on Dec 30, 2021

Sber also has an open-source version of GPT-3 for Russian.

Sber is a state-owned Russian bank, which is a pretty funny detail given that a lot of banks can't even build a decent mobile app.

cpursley · on Dec 30, 2021

The Sberbank mobile app in Russia is an order of magnitude better than anything I've used in the US. The apps from other large Russian tech and service companies are very, very good (Ozon, anything Yandex puts out). Even the federal services apps are well executed: you can pay your property taxes and pay for other services by scanning a QR code. Some great tech is coming out of that country (accusations of hacking aside).

baybal2 · on Dec 30, 2021

Sberbank is a joke of a bank, mostly serving the older generation, who kept using it out of inertia from the time when it was the only bank in the country.

Generally, it's bureaucratic, Kafkaesque, and sick, like the country that once made it.

kgeist · on Dec 30, 2021

Have you used Sberbank lately? My experience is different, and I'm not from the "older generation". Its mobile app is pretty decent, and this year I got a mortgage loan through it and it went pretty smoothly; I didn't notice anything bureaucratic or Kafkaesque about it. I've been a client for 4 years now and I'm struggling to remember a negative experience with it. They've been going through an overhaul lately, so maybe it was far worse before. Yeah, the cool kids prefer Tinkoff nowadays, but it's not true that only old people use Sberbank.

gdy · on Dec 31, 2021

That's factually incorrect. You are seriously out of date.

Sberbank is by far the most popular retail bank in Russia (over 80% of Russians use its services) and the distribution of the client age is almost uniform. [0] The performance of its stock since 2015 is the best proof of it. [1]

[0] https://www2.deloitte.com/content/dam/Deloitte/ru/Documents/...

[1] https://finance.yahoo.com/quote/SBER.ME/?guccounter=1

trhway · on Dec 30, 2021

Sberbank's CEO (he is a Russian German and has some characteristic traits that make him noticeably different from the typical Russian bureaucrat) and his posse are the leading part of the technocratic wing of Russia's political elite. Their people also run another important bank, VTB (international payments etc. for large corporations), and Sber has a strong hold on various national networks: naturally anything money-related, such as municipal services and traffic ticket payments, as well as generic network infrastructure and datacenters. If Putin were gone tomorrow, there is a strong chance that those technocrats would take power (I haven't noticed any significant animosity between them and the FSB, which would otherwise be a complication). A particularly telling sign of their power is that there have been no corruption scandals associated with them, at least none I can remember from the last decade. They tread very carefully, making no open political claims and presenting themselves as an essentially apolitical tech-infrastructure platform for efficient government and society, while doubling down on the source of their shadow power: network/infra/technocracy. So they can't afford to be bad technically, and they naturally hire decent technical people (I have some first-hand contacts among the upper management in technology there).

cpursley · on Dec 30, 2021

Even so, they do a pretty good job for a state-backed bank. Better than anything state run I've experienced in the US.

But I agree in principle with you, and from what I hear, Tinkoff is one of the better choices and its founder is well respected.

another_kel · on Dec 30, 2021

It's a shitty bank by Russian standards indeed, but this has nothing to do with the fact that

>The Sberbank mobile app in Russia is an order of magnitude better than anything I've used in the US.

zkid18 · on Dec 30, 2021

Well, so do 95% of retail banks across the globe.

nsenifty · on Dec 31, 2021

Interesting name. In Hindi, rudaali means a professional mourner: someone who was traditionally hired in parts of India to mourn at the ceremony after a death in the family. There's a famous Bollywood movie[1] by that name.

[1] https://en.wikipedia.org/wiki/Rudaali

criticaltinker · on Dec 30, 2021

> The model is considered the greatest computational project in Russia for now, totaling 24,256 GPU days to train the models.

> We don’t know for sure why OpenAI hasn’t shown its work in a more reproducible way. But this step is definitely done to stimulate the further openness and progress of such models.

Super interesting and great commentary, thanks for sharing!

smusamashah · on Dec 31, 2021

People are generating images by finetuning ruDALL-E on different kinds of images (including their own artwork) under the hashtag LookingGlassAI https://mobile.twitter.com/hashtag/LookingGlassAI, and the results are mostly incredible.

junon · on Dec 31, 2021

I follow a handful of people playing with this and the results are incredible.

amelius · on Dec 30, 2021

These models all seem to share the flaw that faces don't come out symmetrical. The eyes in particular often look like they're in the wrong place.