This week, X launched an AI image generator that lets paying subscribers to Elon Musk’s social platform create their own art. And, of course, some users immediately created images of Donald Trump piloting a plane toward the World Trade Center, of Mickey Mouse wielding an assault rifle, of Mickey Mouse enjoying a cigarette and a beer on the beach, and so on. Some of the images people have created with the tool are deeply disturbing; others are just weird, or even kind of funny. They depict wildly different scenarios and characters. But somehow, they all look kind of the same, bearing unmistakable hallmarks of the AI art that has popped up in recent years thanks to products like Midjourney and DALL-E.
Two years into the generative AI boom, these programs’ creations seem more technically advanced (the Trump image looks better than, say, a similarly tacky image of SpongeBob SquarePants that Microsoft’s Bing Image Creator created last October), but they have an aesthetic all their own. The colors are bright and saturated, the people are beautiful, and the lighting is dramatic. Many of the images appear blurry or retouched, carefully smoothed like the icing on a wedding cake. Sometimes the images seem exaggerated. (And yes, there are common errors, such as extra fingers.) A user can get around this algorithmic monotony by using more specific prompts: for example, by typing “a picture of a dog riding a horse, in the style of Andy Warhol” instead of just “a picture of a dog riding a horse.” But unless someone provides that kind of detail, these tools seem to default to a strange mix of cartoon and dreamscape.
Such programs are becoming more common. Google just announced a new AI image creation app called Pixel Studio that will let users create such artwork on their Pixel phone. The app will come pre-installed on all of the company’s latest devices. Apple will launch Image Playground later this year as part of its Apple Intelligence suite of AI tools. OpenAI now allows ChatGPT users to create two free images per day from DALL-E 3, its latest text-to-image model. (Previously, a user needed a paid premium plan to access the tool.) And so I wanted to understand: Why does so much AI art look the same?
The AI companies themselves aren’t particularly forthcoming. X responded to a request for comment on its new product and the images its users create with a form email. Four companies behind popular image generators (OpenAI, Google, Stability AI, and Midjourney) either didn’t respond or didn’t provide comment. A Microsoft spokesperson pointed me to some of its guides and referred technical questions to OpenAI, since Microsoft uses a version of DALL-E in products like Bing Image Creator.
So I turned to outside experts, who gave me four possible explanations. The first focuses on the data the models are trained on. Text-to-image generators rely on extensive photo libraries paired with text descriptions, which they then use to create their own original images. The tools can inadvertently pick up biases in their datasets, whether racial or gender bias or something as simple as a preference for bright colors and good lighting. The internet is full of decades of filtered and artificially brightened photos, as well as a lot of ethereal illustrations. “We see a lot of fantasy art and stock photos that then go into the models themselves,” Zivvy Epstein, a scientist at the Stanford Institute for Human-Centered AI, told me. Plus, people only have a limited number of good datasets to build image models with, Phillip Isola, a professor at MIT’s Computer Science & Artificial Intelligence Laboratory, told me, meaning the models could overlap in what they’re trained on. (One popular dataset, CelebA, offers 200,000 labeled photos of celebrities. Another, LAION-5B, is an open-source option with 5.8 billion photo-text pairs.)
The second explanation has to do with the technology itself. Most modern models use a technique called diffusion: During training, “noise” is added to existing images that are paired with text descriptions. “Think of it like TV noise,” Apolinário Passos, an art machine learning engineer at Hugging Face, a company that creates its own open-source models, told me. The model is then trained to remove that noise, over and over, across tens of thousands, if not millions, of images, until it learns how to denoise an image. Eventually, it is able to start from nothing but noise and create an original image from it. All it needs is a text prompt.
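To make the noising step concrete, here is a minimal sketch in Python. It assumes the widely used DDPM-style formulation, in which a noisy image is a weighted mix of the original pixels and Gaussian noise, with a linear noise schedule; none of the companies mentioned here has said which formulation or schedule it uses. The function names (`alpha_bar`, `add_noise`, `denoising_loss`), the schedule values, and the five-pixel “image” are all illustrative, not taken from any product.

```python
import math
import random


def alpha_bar(step, total_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Share of the original image's signal left after `step` noising steps.

    Assumes a linear schedule of per-step noise levels (betas); the
    schedule values here are illustrative defaults.
    """
    product = 1.0
    for s in range(step):
        beta = beta_start + (beta_end - beta_start) * s / (total_steps - 1)
        product *= 1.0 - beta
    return product


def add_noise(pixels, step, rng, total_steps=1000):
    """Mix an image with Gaussian noise, as done during training.

    Returns the noisy pixels and the noise that was added, which is
    what the model is asked to predict.
    """
    a = alpha_bar(step, total_steps)
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [math.sqrt(a) * p + math.sqrt(1.0 - a) * n
             for p, n in zip(pixels, noise)]
    return noisy, noise


def denoising_loss(predicted_noise, true_noise):
    """Mean squared error between the model's noise guess and the truth."""
    return sum((p - t) ** 2
               for p, t in zip(predicted_noise, true_noise)) / len(true_noise)


rng = random.Random(0)
image = [0.0, 0.25, 0.5, 0.75, 1.0]  # a hypothetical five-pixel "image"
for step in (0, 250, 1000):
    noisy, _ = add_noise(image, step, rng)
    print(step, round(alpha_bar(step), 4), [round(v, 2) for v in noisy])
```

At step 0 the image is untouched; by the last step almost none of the original signal survives, which is why a trained model can later start from noise alone. The part that actually predicts the noise, a large neural network conditioned on the text prompt, is left out of this sketch.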
Many companies use this technique. “These models are all pretty similar technically,” Isola said, noting that newer tools are based on the Transformer model. Perhaps this technology is geared toward a particular look. Take an example from the not-too-distant past: Five years ago, he explained, image generators tended to produce very blurry results. Researchers realized this was the result of a mathematical fluke; the models were essentially averaging all the images they were trained on. Averaging, it turns out, “looks like blurring.” It’s possible that something similarly technical is happening with this generation of image models today that causes them to produce the same kind of dramatic, highly stylized images—but researchers haven’t quite figured it out yet. In addition, “most models have an ‘aesthetic’ filter on both the input and output that rejects images that don’t meet certain aesthetic criteria,” Hany Farid, a professor at the UC Berkeley School of Information, told me via email. “This kind of filtering of the input and output is almost certainly a big reason why all images generated by AI have a certain ethereal quality.”
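The averaging effect Isola describes is easy to reproduce in miniature. The sketch below is a toy illustration, not a reconstruction of any real image generator: it assumes one-dimensional “images” that each contain a single hard edge at a different position, and the helper names (`step_edge`, `average_images`, `sharpest_edge`) are made up for the example.

```python
def step_edge(width, edge_at):
    """A 1-D 'image' that jumps from black (0.0) to white (1.0) at edge_at."""
    return [0.0 if i < edge_at else 1.0 for i in range(width)]


def average_images(images):
    """Pixel-by-pixel mean of several images of the same size."""
    count = len(images)
    return [sum(column) / count for column in zip(*images)]


def sharpest_edge(image):
    """Largest jump between neighboring pixels; 1.0 means a hard edge."""
    return max(abs(b - a) for a, b in zip(image, image[1:]))


# Five hypothetical training images, each perfectly sharp,
# but with the edge in a slightly different place.
training_images = [step_edge(8, edge_at) for edge_at in range(2, 7)]
blurry = average_images(training_images)

print([sharpest_edge(img) for img in training_images])
print([round(v, 2) for v in blurry], round(sharpest_edge(blurry), 2))
```

Every input has a crisp edge, yet the average ramps gradually from black to white: no single pixel boundary is sharp anymore, which is what a blur looks like.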
The third theory revolves around the people using these tools. Some of these sophisticated models incorporate human feedback; they learn over time. In some cases, this could happen by picking up a signal, such as which photos are being downloaded. In others, Isola said, trainers manually rate which photos they like and don’t like. Perhaps that feedback finds its way into the model. If people download artwork that often features really dramatic sunsets and absurdly beautiful seascapes, then perhaps the tools learn that this is what people want and then offer them more of it. Alexandru Costin, vice president of generative AI at Adobe, and Zeke Koch, vice president of product management at Adobe Firefly (the company’s AI image tool), told me in an email that user feedback can actually be a factor for some AI models, a process called “reinforcement learning from human feedback,” or RLHF. They also pointed to training data as well as assessments by human raters as influencing factors. “Art generated by AI models sometimes has a particular look (especially when created using simple prompts),” they said in a statement. “This is generally due to a combination of the images used to train the image output and the taste of those training or evaluating the images.”
The fourth theory has to do with the developers of these tools. Although Adobe representatives told me their company doesn’t do anything to encourage a particular aesthetic, it’s possible that other AI makers have identified human preferences and programmed them in, basically putting their thumb on the scale and telling models to create dreamier beach scenes and fairytale-like women. This could be intentional: If there were a market for such images, perhaps companies would focus on that. But it could also be unintentional; for example, companies do a lot of manual work on their models to combat bias, and various optimizations that favor one type of imagery over another could inadvertently result in a certain look.
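One ingredient of RLHF, as it is commonly described, is a “reward model” that turns human comparisons into scores. The sketch below is a heavily simplified illustration of that one piece, assuming a Bradley-Terry-style logistic model with a single score per image style. The style names, the comparison data, and the function `train_reward_scores` are invented for the example; nothing here reflects how Adobe or any other company actually implements feedback.

```python
import math


def train_reward_scores(styles, comparisons, learning_rate=0.5, epochs=200):
    """Fit one score per style from (preferred, rejected) pairs.

    Each pass nudges the preferred style's score up and the rejected
    style's score down, by more when the current scores got it wrong.
    """
    scores = {style: 0.0 for style in styles}
    for _ in range(epochs):
        for winner, loser in comparisons:
            # Probability the current scores assign to the human's choice.
            p_win = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            step = learning_rate * (1.0 - p_win)
            scores[winner] += step
            scores[loser] -= step
    return scores


# Hypothetical rater data: each pair is (image kept, image passed over).
styles = ["dramatic sunset", "muted documentary", "flat lighting"]
comparisons = [
    ("dramatic sunset", "flat lighting"),
    ("dramatic sunset", "muted documentary"),
    ("muted documentary", "flat lighting"),
    ("dramatic sunset", "flat lighting"),
]

scores = train_reward_scores(styles, comparisons)
favorite = max(scores, key=scores.get)
print(favorite, {style: round(value, 2) for style, value in scores.items()})
```

In a full RLHF setup, scores like these would then be used to steer the image model toward higher-rated outputs. If raters consistently favor dramatic sunsets, that is the direction the model gets pushed.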
More than one of these explanations could be true. In fact, that’s probably the case: Experts told me that the style we’re seeing is most likely caused by several factors at once. Ironically, all of these explanations suggest that the garish scenes we associate with AI-generated images are actually a reflection of our own human preferences taken to the extreme. It’s no surprise, then, that Facebook is full of AI-generated trash images that creators use to make money, that Etsy recently asked users to label products made with AI after a flood of junk listings, and that craft store Michaels was recently caught selling a canvas with an image partially generated by AI (the company pulled the product, calling it an “unacceptable error”).
AI imagery will soon become even more prevalent in our daily lives. For now, this art is visually distinct enough to be recognized as being made by a machine. But that could change. Technology could improve. Passos told me he sees “an attempt to deviate from” the current aesthetic “on newer models.” In fact, computer-generated art may one day lose its strange, cartoonish look and slip by unnoticed. Perhaps then we will miss the kitschy style that was once a surefire identifying feature.