A very curious effect of ruDALL-E is that finetuning works on small datasets with unexpectedly good results. The Sneakers example they note in this article uses about 10k images.
Unfortunately it's still a convoluted process to finetune ruDALL-E; I hope they end up releasing a smaller model to make it possible to do on a smaller/free GPU. (If they do, I'll release a streamlined Colab notebook + blog post on how to do it.)
Did you try feeding it something else, for example human faces only? It looks like it should already be able to generate faces as good as Nvidia's StyleGAN.
The Pikachu variations from a single image are still far better than anything we have seen before. It apparently understood the face well and applied variations to it; for the rest, it did as much as it understood.
What else did you try after Pokémon and can you share those results?
Essentially all of a 16GB GPU's VRAM, even with some layers frozen.
The more diverse the input images, the longer/more epochs the finetuning process should take in order to get stable results. The first Pokemon model was trained for about 4.5 hours; the one-shot model was about 2 minutes.
Essentially yes. That technique is not exclusive to ruDALL-E; with large models it is common to freeze the early layers and train only the later ones due to VRAM constraints.
Oh, right, only freezing early layers makes sense. I was thinking you froze inner ones, but gradients would need to be computed and kept for them to backpropagate to the unfrozen early ones.
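To make the reasoning above concrete, here is a rough Python sketch of which layers the backward pass has to visit for a given freezing pattern. It assumes a plain sequential stack of layers (index 0 closest to the input), and the function names are made up for illustration; it is not how ruDALL-E's finetuning code is organized. It also ignores the separate saving that frozen layers need no weight gradients or optimizer state.

```python
from typing import List


def layers_needing_backward(trainable: List[bool]) -> List[int]:
    """Return the indices of layers the backward pass must traverse.

    trainable[i] is True if layer i (0 = closest to the input) is unfrozen.
    Gradients flow from the loss back toward the input, so every layer from
    the earliest trainable one up to the output takes part in backprop,
    whether or not it is frozen itself.
    """
    if not any(trainable):
        return []
    first_trainable = trainable.index(True)
    return list(range(first_trainable, len(trainable)))


def skipped_fraction(trainable: List[bool]) -> float:
    """Fraction of layers whose activations need not be kept for backprop."""
    total = len(trainable)
    return (total - len(layers_needing_backward(trainable))) / total


# Hypothetical 12-layer model, two ways of freezing 8 layers.
freeze_early = [False] * 8 + [True] * 4               # train the last 4 layers
freeze_inner = [True] * 2 + [False] * 8 + [True] * 2  # train both ends

early_layers = layers_needing_backward(freeze_early)
inner_layers = layers_needing_backward(freeze_inner)
```

With the early layers frozen, backprop stops at layer 8 and the first eight layers can run without keeping activations. With only the inner layers frozen, all twelve layers still take part, which is the point made above.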
The Sberbank mobile app in Russia is an order of magnitude better than anything I've used in the US. The apps from other large Russian tech and service companies are very good too (Ozon, anything Yandex puts out). Even the federal services apps are well executed: you can pay your property taxes and other services by scanning a QR code. Some great tech is coming out of that country (accusations of hacking aside).
Have you used Sberbank lately? I have a different experience and I'm not from the "older generation". Its mobile app is pretty decent; this year I got a mortgage loan and it went pretty smoothly. I didn't notice anything bureaucratic or Kafkaesque about it. I've been a client for 4 years now and I'm struggling to remember a negative experience with it. They've been having an overhaul lately, so maybe it was far worse before. Yeah, the cool kids prefer Tinkoff nowadays, but it's not true that only old people use Sberbank.
That's factually incorrect. You are seriously out of date.
Sberbank is by far the most popular retail bank in Russia (over 80% of Russians use its services) and the age distribution of its clients is almost uniform. [0]
The performance of its stock since 2015 is the best proof of it. [1]
Sberbank's CEO (he is a Russian German and has some typical traits that make him noticeably different from the typical Russian bureaucrat) and his posse are the leading part of the technocratic wing of the political elite in Russia. Their people also lead another important bank, VTB (international payments etc. for large corps), and Sber has a strong hold on various national networks: naturally anything money-related, such as municipal services and traffic ticket payments, as well as generic network infra and datacenters. If Putin is gone tomorrow, there is a strong chance that those technocrats will take power (I haven't noticed any significant animosity between them and the FSB, which would otherwise be a complication). A particularly important sign of their power is that there have been no corruption scandals associated with them, at least none that I can remember in the last decade. They tread very carefully, not making any open political claims while presenting themselves as basically an apolitical tech infrastructure/platform for efficient government and society, and doubling down on the source of their shadow power: network/infra/technocracy. Thus they can't allow themselves to suck too much technically, and so they naturally hire decent technical people (I have some first-degree connections among the upper management in technology there).
Interesting name. Rudaali in Hindi means a professional mourner, who used to be someone you hire for a mourning ceremony when there's a death in the family in parts of India. There's a famous Bollywood movie[1] by that name.
> The model is considered the greatest computational project in Russia for now, totaling 24,256 GPU days to train the models.
> We don’t know for sure why OpenAI hasn’t shown its work in a more reproducible way. But this step is definitely done to stimulate the further openness and progress of such models.
Super interesting and great commentary, thanks for sharing!
People are generating images by finetuning ruDALL-E on different kinds of images (including their own artwork) under the hashtag LookingGlassAI https://mobile.twitter.com/hashtag/LookingGlassAI and they are mostly incredible.
As an experiment, I finetuned ruDALL-E on about 1000 images of Pokemon and generated from that, which yielded incredible results that went viral: https://twitter.com/minimaxir/status/1470913487085785089
I then tried finetuning ruDALL-E on 1 Pokemon, and still got good/horrifying results: https://twitter.com/minimaxir/status/1474913997807755268