I came to the same conclusion as the authors after generating 1000s of thumbnails[1]. OpenAI alters faces too much and smooths out details by default. NanoBanana is the best but lacks a high-fidelity option. SeeDream is catching up to NanoBanana and is sometimes better. It's been too long since OpenAI's gpt-image-1 came out; I hope they launch a better model soon.
[1] https://thumbnail.ai/
It was interesting to see how often the OpenAI model changed the face of the child. Often the other two models wouldn't, but OpenAI would alter the structure of their head (making it rounder), eyes (making them rounder), or altering the position and facing of the children in the background.
It's like OpenAI nudges every one of these a little toward some sort of median face, whereas the other two models seemed to reproduce the face faithfully.
For some things, exactly reproducing the face is a problem -- for example, when making them into a glass etching, Gemini seemed unwilling to give up the specific details of the child's face, even though losing them would make sense in that context.
https://www.reddit.com/r/ChatGPT/comments/1n8dung/chatgpt_pr...
I've noticed that OpenAI modifies faces on a regular basis. I was using it to try to create examples of different haircuts, and the face would randomly turn into a different face -- similar but noticeably changed. Even when I prompted it not to modify the face, it would do so regardless. Perhaps it's part of their "safety" measures for modifying pictures of people?
It looks to me like OpenAI's image pipeline takes an image as input, derives the semantic details, and then essentially regenerates an entirely new image based on the "description" obtained from the input image.
Even Sam Altman's "Ghiblified" twitter avatar looks nothing like him (at least to me).
Other models seem much more able to operate directly on the input image.
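To make that hypothesis concrete, here is a minimal Python sketch of the difference between a "describe then regenerate" pipeline and direct editing. The caption_model and image_model objects and their methods are purely hypothetical placeholders, and nothing here is claimed to be OpenAI's actual implementation; it's just the behaviour described above, written out.

    # Hypothetical sketch: "describe then regenerate" vs. editing the pixels directly.
    # caption_model and image_model are placeholder objects, not a real SDK.
    from dataclasses import dataclass

    @dataclass
    class EditRequest:
        image_path: str
        instruction: str

    def describe_then_regenerate(req: EditRequest, caption_model, image_model) -> bytes:
        # Step 1: reduce the input image to a semantic description, discarding the pixels.
        caption = caption_model.describe(req.image_path)
        # Step 2: synthesize a brand-new image from that description plus the edit.
        # Anything not captured in the caption (exact face shape, background trees)
        # gets re-invented, which would explain the drift people are seeing.
        return image_model.generate(f"{caption}. {req.instruction}")

    def edit_directly(req: EditRequest, image_model) -> bytes:
        # The alternative: condition on the original pixels, so regions the edit
        # doesn't touch are far more likely to survive unchanged.
        return image_model.edit(req.image_path, req.instruction)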
It also changes scene features, like removing background trees.
It's crazy that the 'piss filter' of OpenAI image generation hasn't been fixed yet. I wonder if it's on purpose for some reason?
They don't want you creating images that mimic either works of other artists to an extent that's likely to confuse viewers (or courts), or that mimic realistic photographs to an extent that allows people to generate low-effort fake news. So they impose an intentionally-crappy orange-cyan palette on everything the model generates.
Peak quality in terms of realistic color rendering was probably the initial release of DALL-E 3. Once they saw what was going to happen, they fixed that bug fast.
It's interesting to me that the models often have their "quirks". GPT has the orange tint, but it also is much worse at being consistent with details. Gemini has a problem where it often returns the image unchanged or almost unchanged, to the point where I gave up on using it for editing anything. Not sure if Seedream has a similar defining "feature".
They noted the Gemini issue too:
> Especially with photos of people, Gemini seems to refuse to apply any edits at all
I've definitely noticed Gemini's tendency to return the image basically unchanged, but not noticed it being worse or better for images of people. When I tested by having it change aspects of a photo of me, I found it was far more likely to cooperate when I'd specify, for instance, "change the hair from long to short" rather than "Make the hair short" (the latter routinely failed completely).
It also helped to specify which other parts should not be changed, otherwise it was rather unpredictable about whether it would randomly change other aspects.
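For illustration, the prompt pattern that worked for me looks roughly like this through the google-genai Python SDK. The model name and response handling follow Google's published examples, but treat the details as an approximation rather than a recipe.

    # Rough sketch of the "explicit from->to edit plus keep-list" prompt pattern,
    # using the google-genai Python SDK. Model name and response handling are
    # based on Google's published examples; adjust for your environment.
    from io import BytesIO
    from PIL import Image
    from google import genai

    client = genai.Client()  # expects a Gemini API key in the environment
    photo = Image.open("portrait.jpg")  # placeholder input path

    prompt = (
        "Change the hair from long to short. "            # state the from -> to edit
        "Keep the face, skin tone, clothing, lighting, "   # spell out what must NOT change
        "and background exactly as they are."
    )

    response = client.models.generate_content(
        model="gemini-2.5-flash-image",   # assumed image-editing model name
        contents=[prompt, photo],
    )

    # Save the first returned image, if any.
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            Image.open(BytesIO(part.inline_data.data)).save("edited.jpg")
            break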
Not only does it return the image unchanged, but if you are using the Gemini interface, it confidently tells you it made the changes.
I have had that problem with nano banana, but when it works I find it so much better than the others for editing an image. Since it's free I usually try it first, and I would say that approximately 10% of the time I find myself having to use something else.
I’m editing mostly pics of food and beverages though, it wouldn’t surprise me if it is situationally better or worse.
Nano Banana in general cannot do style transfer effectively unless the source image/subject is in a similar style to the target style, which is an interesting and unexpected model quirk. Even the documentation examples unintentionally demonstrate this.
Seedream will always alter the global color balance with edits.
Something like a style transfer works better in Whisk. Still quirky and hit and miss.
I found OpenAI too often heavy-handed. On balance, I'd probably pick Gemini narrowly over Seedream and just learn that Gemini sometimes needs a more specific prompt.
Hey. We'd love to fund the generations for free for you to try Riverflow 2 out, if you're up for it. Riverflow 1 ranks above them all, and 2 is now in preview this week.
You can always identify the OpenAI result because it's yellow.
And Midjourney because it's cel shading :)
Also because it’s mid :)
> If you made it all the way down here you probably don’t need a summary
Love the optimism
I skipped to the end to see if they did any local models. spoilers: they didn't.
Honestly, I think it was ill-founded. As a photographer and artist myself, I find the OpenAI results head and shoulders above the others. It's not perfect, and in a few cases one or the other alternative did better, but if I had to pick one, it would be OpenAI for sure. The gap between their aesthetics and mine makes me question ever using their other products (which is purely academic, since I'm not an Apple person).
Interesting experiment, though I'm not certain quite how the models are usefully compared.
I wish they'd used a better image than the low contrast mountain, which rarely transformed into anything much.
Every day I generate more than 600 images and compare them; it takes me 5 hours.
Seedream is the only one that outputs 4K. Last time I checked, that is.
Are artists and illustrators going the way of the horse and buggy?
No, but this is the beginning of a new generation of tools to accelerate productivity. What surprises me is that the AI companies are not market savvy enough to build those tools yet. Adobe seems to have gotten the memo though.
In testing some local image gen software, it takes about 10 seconds to generate a high-quality image on my relatively old computer. I have no idea what the latency is on a current high-end computer, but I expect it's probably near instantaneous.
Right now, though, the software for local generation is horrible. It's a mish-mash of open-source stuff with varying compatibility, loaded with casually excessive vernacular and acronyms, to say nothing of the awkwardness of most of it being driven by Python scripts.
But once it inevitably gets cleaned up, I expect people in the future are going to take being able to generate unlimited, near-instantaneous images, locally, for free, for granted.
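For context, a minimal example of what local generation through a Python script looks like, using Hugging Face's diffusers library; the checkpoint below is just one commonly used example, and the roughly-10-second figure will vary wildly with hardware.

    # Minimal local text-to-image sketch with Hugging Face diffusers.
    # Assumes a CUDA GPU plus the diffusers and torch packages installed;
    # the checkpoint is just one commonly used example.
    import time
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")

    start = time.time()
    image = pipe("a watercolor painting of a mountain at dawn").images[0]
    print(f"generated in {time.time() - start:.1f}s")
    image.save("mountain.png")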
Did you test some local image gen software by installing the Python code from the GitHub page for a local model, which is clearly a LOT for a normal user... or did you look at ComfyUI, which is how most people are running local video and image models? There are "just install this" versions, which ease the path for users (but it's still, admittedly, chaos beneath the surface).
I've been waiting for solutions that integrate into the artistic process instead of replacing it. Right now a lot of the focus is on generating a complete image, but if I were in Photoshop (or another editor) and could use AI tooling to create layers and other modifications that fit into a workflow, that would help with consistency and productivity.
I haven't seen the latest from Adobe over the last three months, but last I saw, the Firefly engine was still focused on "magically" creating complete elements.
> Adobe seems to have gotten the memo though.
So far Adobe AI tools are pretty useless, according to many professional illustrators. With Firefly you can use other (non-Adobe) image generators. The output is usually barely usable at this point in time.
"AI won't replace you, but someone who knows how to use AI will replace you" appears to be too short a phrase.
There is no better recent example than AI comedy made by a professional comedian [0]
Of course, this makes sense once you think about it for a second. Even AGI, without a BCI, could not read your mind to understand what you want. Naturally, the people who have been communicating these ideas to other humans up to this point are the best at doing that.
[0] old.reddit.com/r/ChatGPT/comments/1oqnwvt/ai_comedy_made_by_a_professional_comedian/
Horse and buggy isn't quite the analogy, I think it is more like the arrival of junk food, packed with sugar, salt and saturated fats. You will still be able to find a cafe or restaurant where a full kitchen team cooks from scratch but everything else is fast food garbage.
Maybe just the advent of the microwave oven is the analogy.
Either way, I am out. I have spent many days fiddling with AI image generation but, looking back on what I thought was 'wow' at the time, I now think everything about AI art is practically useless. I only managed one image I was happy with, and most of that was GIMP, not AI.
This study has confirmed my suspicions, hence I am out.
Going back to the fast food analogy, for the one restaurant that actually cooks actual food from actual ingredients, if everyone else is selling junk food then the competition has been decimated. However, the customers have been decimated too. This isn't too bad as those customers clearly never appreciated proper food in the first place, so why waste effort on them? It is a pearls and swine type of thing.
Yes and no. IKEA and co. didn't replace custom-made tables; they just reduced the number of people needing a custom table.
The same will happen to music, artists, etc. They won't vanish, but only a few per city will be left.
For some applications.
Photography didn’t make artists obsolete.
For that matter, the car didn’t make horse riding completely obsolete either.
For artists, the question is whether generative AI is like photography or the car. My guess, at this stage, is photography.
For what it’s worth I think the proponents of generative AI are grossly overestimating the utility and economic value of meh-OK images that approximate the thing you’ve asked for.
I've seen cover art on a lot of magazines already replaced with AI images. I suspect, for the time being, that a lot of the low-hanging art fruit will be destroyed by image generation. The knock-on effect is fewer art jobs but more artists. In the vein of your analogy, it removes the gas station attendants who fill your tank.
When there's a need for something with specific traits and composition at high quality, I've yet to see a model that can deliver that, especially in a reasonable amount of time. It's still way more reliable to just hand a description to a skilled illustrator along w/references and then go back and forth a bit to get a quality result. The illustrator is more expensive, but my time isn't free, so it works out.
I could see that changing in a few years.
Artists no, illustrators and graphic designers yes. They'll mostly become redundant within the next 50 years. With these kinds of technologies, people tend to overestimate the short-term effects and severely underestimate the long-term effects.
Is it just me, or does ChatGPT change subtle and sometimes more prominent things? Like the position of the hand holding a ball, facial features, the shape of the head, background trees, and the like?
It's not you. The model seems to refuse to accurately reproduce details. It changes things and leaves stuff out every time.
I like that they call OpenAI's image generator groundbreaking and then explain that it's prone to taking eight times longer to generate an image, before showing it add a third cat over and over and over again.
We built our sandbox just for this use case: fal.ai/sandbox. Take the same image/prompt and compare across tens of models.
Using gen AI for filters is stupid: a filter guarantees the same object but filtered, while a gen AI version of this guarantees nothing except an expensive AI bill.
It's like using gen AI to do math instead of extracting the numbers from a story and just doing the math with +, -, / and *.
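To make the comparison concrete: a classic filter is a fixed, deterministic pixel transform, so the output is guaranteed to be the same scene, just recolored. A minimal sketch with Pillow (file names are placeholders):

    # A deterministic "filter": a fixed pixel transform that cannot invent,
    # move, or drop objects -- the output is the input, recolored.
    from PIL import Image, ImageOps

    img = Image.open("photo.jpg")          # placeholder input path
    sepia = ImageOps.colorize(
        ImageOps.grayscale(img),           # collapse to luminance
        black="#2b1d0e",                   # shadows -> dark brown
        white="#f4e4c8",                   # highlights -> warm cream
    )
    sepia.save("photo_sepia.jpg")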
This seems to imply that the capabilities being tested are tied to the descriptive words used in the prompts, but, as a category, using random words would be just as valid for exercising the extents of the underlying math. When I think about that, I wonder why a list of tests like this should be interesting, and to what end. The repeated iteration implies that some control or better quality is being sought, but the mechanism of exploration is just trial and error, and it isn't informative about what would constitute repeatable success for anyone else, in any other circumstance, given these discoveries.