I've learned that character consistency with generative AI is becoming a reality
One of the biggest hurdles I was running into when building Droomverhalen was image consistency.
Generating a story is one thing, and quite straightforward. The bigger challenge was generating visuals that matched the part of the story where they appeared while also staying consistent with all the other visuals. On top of that, there was a hallucination risk involved, which made the whole process even more tedious.
This made the use of characters in visuals more or less impossible, since you wouldn't be able to get consistent results in an automated process. E.g. a story about a 'family of three' would produce different-looking characters in each visual if you generated multiple contextually fitting visuals for a story.
It looks like OpenAI has fixed this issue with its latest model update, so it may not be a problem anymore?
I've been on the fence about paying for ChatGPT, but I decided to give it a shot for a month to see if it's worthwhile, and I took this opportunity to test the new model's image consistency. I'm honestly mind-blown.
Prompt #1:
"Generate an image of a world war 1 soldier sitting in a trench reading anime on his iPad, give it a New Yorker illustrative style"

Prompt #2:
"Can you use the same image, visual style and character, but this time he gets busted by his officer"

Prompt #3:
"Can you use the same image, visual style and character, but this time the soldier shows the display of the iPad to the officer and the officer decides to sit next to him"

So in this particular image, the headwear of the officer is different from the one in the second image. It's a minor detail, and I'm still very impressed by the result. However, when I asked ChatGPT to fix it, I received the same result:
Prompt #4:
"In this image, the officer has a wrong helmet (see second image), can you fix that?"

I mean, the fact that the visual looks identical is pretty impressive to begin with, but the helmet is still wrong. My prompt was obviously too simple, but I was hoping it would understand 'second image' since it's within the context window.
So instead, I created a new prompt and attached the visual from the second prompt as a reference image.
Prompt #5:
[ATTACHED REFERENCE IMAGE] "Can you use the same image, visual style and character from the reference image, but this time the soldier shows the display of the iPad to the officer and the officer decides to sit next to him"

Visual-style-wise, you can see a bit of a difference from the other generated visuals. There are also some minor details missing (e.g. see the clothing), but regardless, the results are great, and there's definitely consistency between the generated visuals. The fact that I was able to achieve this with such simple prompts is pretty amazing; imagine using more complex and detailed prompts.