5 things Dall-E and Midjourney don’t (yet) get right

Letters, words and texts

A common trick in the world of video games is to fill in-game posters and signs with unintelligible languages: fully synthetic alphabets that look real and form words that don’t exist. Developers use it to save time, because they don’t have to localize those assets for every region in which the game is released. Well, if you ask an AI to generate a poster with text, the result will be a sign or a paragraph made of completely invented characters, very similar to what you see in video games.

Sometimes the AI will manage to produce characters we recognize, but it won’t be able to put the letters in the right order, and it will even repeat some of them.

Eyes

[Image: eyes generated with Stable Diffusion]

In general, eyes put up some resistance to AI. Software such as Midjourney or Stable Diffusion can generate nearly perfect human or animal faces, but you usually need several attempts before you get eyes that look consistent.

It is completely normal to get red eyes, entirely black eyeballs or completely asymmetrical pairs of eyes. Even among otherwise acceptable results, there are images in which the AI never manages to separate the white of the eye from the iris and the pupil. Luckily, there are other AIs, such as GFPGAN, that can fix images with odd faces or poorly resolved eyes.
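If you want to try that repair step yourself, here is a minimal sketch using the gfpgan Python package; the model path, file names and parameter values are placeholders I’ve chosen for illustration, not anything taken from this article.

```python
# Minimal sketch: restore badly rendered faces/eyes with GFPGAN.
# Model path and image file names below are assumptions.
import cv2
from gfpgan import GFPGANer

restorer = GFPGANer(
    model_path="GFPGANv1.4.pth",  # assumed local path to a pretrained model
    upscale=2,                    # upscale factor for the output image
    arch="clean",
    channel_multiplier=2,
)

img = cv2.imread("ai_portrait_with_broken_eyes.png")  # hypothetical input

# enhance() detects faces, restores them and pastes them back into the image
_, _, restored = restorer.enhance(
    img,
    has_aligned=False,
    only_center_face=False,
    paste_back=True,
)
cv2.imwrite("restored.png", restored)
```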

Hands

[Image: hands generated with Stable Diffusion, via Reddit]

How many fingers does a hand have? No AI seems entirely sure. These systems have a hard time understanding that the five fingers of a human hand are all different. You can just as easily get an image of a hand with only two fingers as the opposite: a whole catalog of index and ring fingers. This problem is very noticeable in Dall-E, Stable Diffusion and Midjourney.

Lateral thinking and context

[Image: context problem in Midjourney]

At this point, the three main AIs each have their advantages and disadvantages, but once again there are problems common to all of them. If you push the AI out of its comfort zone, you will get bad results. Do you want a picture of a person with three eyes? Or of a nine-tailed fox? Things can get complicated, because the AI sometimes simply won’t understand what you are asking of it. These systems are quite rigid and have been trained in a way that doesn’t take kindly to you breaking their patterns.

Along the same lines, we have the analysis of context. Dall-E 2 takes the gold here, but that doesn’t mean you can skip explaining very carefully what you want the AI to paint. For the AI, an egg is one thing and a fried egg is quite another. You have to describe the image as if you were explaining it to an alien; otherwise you will get a result that makes you laugh out loud, as happened to me with the image I used as an example. We will talk a bit more about context in the last block of this article, since it is closely related to the final point.
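To make the “explain it like an alien” idea concrete, here is a small sketch using Hugging Face diffusers with Stable Diffusion; the model checkpoint and the two prompts are my own examples, not from the article, and the exact outputs will of course vary.

```python
# Sketch: how prompt specificity changes the result (diffusers + Stable Diffusion).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed model checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Vague prompt: the model has to guess almost everything about the scene.
vague = pipe("an egg").images[0]

# Explicit prompt: described "as if to an alien".
detailed = pipe(
    "a single fried egg, sunny side up, in a black cast-iron pan, "
    "overhead shot, soft morning light, photorealistic"
).images[0]

vague.save("egg_vague.png")
detailed.save("egg_detailed.png")
```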

Censorship enforcement

[Image: censored words in Midjourney]

When GANs started showing the world their full potential, we quickly knew that censorship was going to be our daily bread. This subject deserves an article of its own, but the problem here is not censorship itself, but the way it is applied.

We fully understand that an AI should prevent you from generating an image that is pornographic or that invites self-harm. But it makes no sense for something called “artificial intelligence” to work from a list of forbidden words.

[Image: AI censorship warning]

In English (which is how you should interact with these AIs), the same word can easily have ten meanings. If just one of those meanings is on the list, you cannot use the word at all. And we are not talking about obscure terms, but normal, everyday words. I tried to generate a leaf texture with lots of branching veins in Midjourney and got a warning, because you cannot paint “veins” with this AI. I tried to make a giant Maine Coon cat merging with the clouds, and the AI wouldn’t let me; after checking Collins, I discovered that “Coon” can carry racist connotations (the same prompt in Stable Diffusion didn’t bother anyone). I wanted to generate a painting of a Renaissance woman cutting onions, but I couldn’t: the verb “to cut” is censored too.
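To see why this happens, here is a toy illustration of a plain keyword blocklist. It is entirely my own sketch, not how Midjourney’s real filter works, and the banned words are assumed for the example; it simply shows how all three harmless prompts above would get rejected.

```python
# Toy example: a naive forbidden-word filter and its false positives.
BLOCKLIST = {"veins", "coon", "cut"}  # hypothetical banned tokens

def naive_filter(prompt: str) -> bool:
    """Reject a prompt if any word matches the blocklist, ignoring context."""
    words = prompt.lower().replace(",", " ").split()
    return any(word in BLOCKLIST for word in words)

prompts = [
    "a leaf texture with many branching veins",
    "a giant Maine Coon cat merging with the clouds",
    "a Renaissance woman about to cut onions in a kitchen",
]

for p in prompts:
    print("BLOCKED" if naive_filter(p) else "ok", "-", p)
# All three harmless prompts are blocked, because the list has no notion
# of the sense in which each word is being used.
```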

Censorship is the weak point of Dall-E 2 and Midjourney. In Stable Diffusion it can be bypassed by running the software on your own computer. It was obvious that these systems were going to be censored, but the program itself should have the tools to determine what is malicious and what is not. Fine, don’t let me generate a photo of Lady Gaga, but don’t stop me from generating a dog wearing Lady Gaga sunglasses. AIs still have a long way to go on this point, because the censorship they are currently subject to makes no sense.
