Hacker News

To quote myself from a comment on Sora:

Iterations are the missing link. With ChatGPT, you can iteratively improve text (e.g., "make it shorter," "mention xyz"). However, for pictures (and video), this functionality is not yet available. If you could prompt iteratively (e.g., "generate a red car in the sunset," "make it a muscle car," "place it on a hill," "show it from the side so the sun shines through the windshield"), the tools would become exponentially more useful.

I'm looking forward to trying this out and seeing if I was right. Unfortunately, it's not yet available to me.



You can do that with Gemini's image model, Gemini 2.0 Flash (image generation) experimental.[1] It's not perfect, but it does mostly maintain likeness between generations.

[1] https://aistudio.google.com/prompts/new_chat


Whisk, I think, is possibly the best at it. No idea what it uses under the hood, though.

https://labs.google/fx/tools/whisk


DALL-E 3 with ChatGPT has been able to approximate this for a while now by internally locking the seed down as you make adjustments. It's not perfect by any means, but it can be more convenient than manual inpainting.

Ditto Instruct Pix2Pix https://www.timothybrooks.com/instruct-pix2pix


Reading comments in other threads on HN has left me with the impression that iterative improvement within a single chat is not a good idea.

For example, https://news.ycombinator.com/item?id=43388114


You're right. I actually do this quite often when coding: I start with a few iterative prompts to get a general outline of what I want, and once that's OK, I copy the outline into a new chat and flesh out the details. That's still iterative work; I'm just throwing away the intermediate results that I think sometimes confuse the LLM.



