In 2018, Christie's auction house made history by selling the first AI-generated artwork ever offered at a major auction. The piece, called "Portrait of Edmond de Belamy," not only fetched a cool $432,500 but also showed us that machines can be quite the artists these days! Who knows, maybe one day we'll be hanging AI-generated masterpieces in our living rooms.
AI art is on the rise, and the field has seen some impressive advances in recent years. One exciting area is text-to-image models, and some big players are leading the charge. OpenAI's DALL-E made headlines in January 2021 with a series of stunning sample images, while Google Brain's Imagen and Microsoft's NUWA-Infinity have also made strides in this area.
Interestingly, Google has opted not to release its models publicly, citing concerns about biases in the generated images. But fear not, art lovers: DALL-E 2 has been made available through the OpenAI website and has already been adopted by various online service providers. Who knows what kind of AI-generated art we'll be admiring in the future!
On August 22nd, 2022, StabilityAI took a major step forward by open-sourcing its Stable Diffusion model. This means that anyone can access the code to train and run the model, as well as model weights trained on a subset of LAION-5B, a massive dataset of over 5 billion image/text pairs. And get this: the model fits on consumer hardware, so now everyone can run it locally!
This shift towards open-source AI art tools is a game-changer, allowing independent creators to explore the possibilities of AI-generated art without relying on large software corporations. Who knows where this wave of AI art will take us, but 2022 may very well be remembered as the year the AI Art Renaissance began: a movement that will force us to rethink our relationship with both art and machine intelligence. As artificial intelligence continues to permeate every aspect of our society, our culture will undoubtedly be reshaped in profound ways.
AI art is increasingly being created through the use of diffusion models, which have been rapidly advancing in recent years. But how do these models actually work, and how can they be used to generate visual art?
How does a diffusion model work?
Diffusion models are generative AI models used to create data similar to the data on which they are trained. These models work by destroying training data, such as images, through the successive addition of Gaussian noise. Then, the model learns to recover the data by reversing this noising process.
The forward diffusion process adds small amounts of noise to an image over many small steps, until the final output bears no visual resemblance to the original image. While there are several mathematical formulations of this process, the focus here is on understanding how AI "makes" art.
The goal of a diffusion model is to learn a function that reverses this noise-adding process. This means the model can turn noise into images, and the more denoising steps it takes, the clearer the image becomes.
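To make this concrete, here is a minimal sketch of the forward (noising) process in Python, using the closed-form expression from the DDPM formulation. The schedule values and image are illustrative stand-ins; a production model like Stable Diffusion uses a tuned schedule and operates on compressed latents rather than raw pixels:

```python
import torch

T = 1000                                # total number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)   # how much noise each step adds
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def add_noise(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Jump straight to step t of the forward process:
    x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * noise."""
    a_bar = alphas_cumprod[t]
    noise = torch.randn_like(x0)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

image = torch.rand(3, 64, 64)           # stand-in for a training image
slightly_noisy = add_noise(image, 50)   # still resembles the original
pure_noise = add_noise(image, 999)      # indistinguishable from random noise
```

The model is trained to predict the noise that was added at each step; generation then runs this prediction in reverse, gradually turning pure noise back into an image.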
Generative models that conjure images out of noise have been around for some time. This Person Does Not Exist is a well-known example (strictly speaking, it is powered by a GAN, StyleGAN, rather than a diffusion model, but the idea is similar). The model was trained on pictures of faces and generates a random picture of a nonexistent person every time the page is reloaded.
An unguided diffusion model creates images or texts without any influence from the user. To create specific images, guided diffusion models are used. All text-to-image models, including DALL-E 2, Imagen, Midjourney, and Stable Diffusion, are guided diffusion models.
Adding context
Adding context can significantly improve the output of diffusion models such as Stable Diffusion and other text-to-image models. These models are trained not only to remove noise from images but also to condition on a contextual input, which can be provided in the form of an image or a text known as a "prompt." The prompt tells the diffusion model where to retrieve information from within the so-called latent space.
It's important to note that these models work not on images or texts directly, but on embeddings: numerical representations of compressed data. These embeddings are created by sending data through encoders and are stored in the latent space, which can be thought of as a vast multidimensional room in which bits of information are stored throughout.
To provide context to the model, we can input a prompt which will guide the diffusion process towards a specific region in the latent space. This prompt is not a command, but rather a search query or an arrow pointing to a certain region in an n-dimensional space. With each step, the model denoises random noise into a coherent image that lies close to the prompt's region in the latent space. In this way, the model is guided by the contextual input, rather than simply guessing what is in the image.
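To illustrate, here is roughly how a prompt becomes an embedding in Stable Diffusion v1.x, which uses OpenAI's CLIP text encoder. This sketch assumes the Hugging Face transformers library; the model name matches the publicly released v1.x encoder:

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# The text encoder used by Stable Diffusion v1.x.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

prompt = "dark fantasy portrait, oil on canvas"
tokens = tokenizer(prompt, padding="max_length", truncation=True,
                   return_tensors="pt")
with torch.no_grad():
    embeddings = text_encoder(tokens.input_ids).last_hidden_state

print(embeddings.shape)  # torch.Size([1, 77, 768]): one vector per token slot
```

These per-token vectors are what the denoising network attends to at every step, which is how the prompt steers the image towards its region of the latent space.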
The fun stuff
Now that you have a general understanding of how guided diffusion models can be used to create works of art, it's important to know how to utilize them. To generate an image with a guided diffusion model, you need two inputs: a number called a seed, which initializes the noisy pattern of pixels the process starts from, and a string of words describing the image you wish to create, called a prompt.
The seed is a number that controls the randomness during the image generation process. By using the same seed and settings, you can recreate the exact same output multiple times. However, changing the seed will result in a completely different output.
In essence, the image you generate is primarily determined by the seed and prompt. The seed sets the starting point for the creation process, while the prompt determines the direction. The final image is the result after a certain number of iterations, referred to as runtime.
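In code, the seed-and-prompt pair looks something like this. This is a minimal sketch assuming the Hugging Face diffusers library and a CUDA GPU; the model identifier is the commonly used v1.5 checkpoint:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda")

prompt = "a dark fantasy portrait, oil on canvas, highly detailed"
generator = torch.Generator("cuda").manual_seed(42)  # the seed

# 50 denoising iterations; same seed + prompt + settings => same image.
image = pipe(prompt, num_inference_steps=50, generator=generator).images[0]
image.save("portrait.png")
```

Rerunning this script produces the identical image; changing the seed or the prompt produces a different one.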
To illustrate this, consider the three images created with Stable Diffusion version 1.5:
Prompt: “happy native american chieftain portrait, dark fantasy, red, by zdzisław beksiński and wayne barlowe”
Seed: 581706
Prompt: “angry native american chieftain portrait, dark fantasy, red, by zdzisław beksiński and wayne barlowe”
Seed: 581706
Prompt: “angry native american chieftain portrait, dark fantasy, red, by zdzisław beksiński and wayne barlowe”
Seed: 101131
In the second image, the prompt was changed slightly by replacing "happy" with "angry" while holding the seed constant. In the third image, a different seed was used with the same prompt. Both changes produced different but related images. Changing the prompt while keeping the seed constant tends to preserve the size, placement, and shape of the elements but alters their contents. Conversely, changing the seed while keeping the prompt constant preserves the contents of the image but alters the shapes and placement of the elements. Both the seed and the prompt must therefore be considered to achieve a desired image or effect.
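If you want to reproduce this experiment yourself, the three variations boil down to a loop over (prompt, seed) pairs, continuing from the pipeline sketch above. Note that the exact pixels also depend on the sampler and library version, so your results may not match these images exactly:

```python
variations = [
    ("happy native american chieftain portrait, dark fantasy, red, "
     "by zdzisław beksiński and wayne barlowe", 581706),
    ("angry native american chieftain portrait, dark fantasy, red, "
     "by zdzisław beksiński and wayne barlowe", 581706),
    ("angry native american chieftain portrait, dark fantasy, red, "
     "by zdzisław beksiński and wayne barlowe", 101131),
]

for i, (prompt, seed) in enumerate(variations, start=1):
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt, num_inference_steps=50,
                 generator=generator).images[0]
    image.save(f"chieftain_{i}.png")
```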
The creative process of visual art has evolved over time. Initially, artists had to physically apply paint to a flat surface; later, digital artists had to master software and pixel manipulation. Nowadays, artists must also master the craft of manipulating seeds and prompts to feed various AI models. While each of these forms of art requires different skills, the general tendency is to free human creativity from the time-consuming aspects of the creation process.
Examples
Below you'll see a small set of images I created using Stable Diffusion version 1.5 (K_LMS sampling, 50 iterations).
Prompt: “a beautiful surreal illustration of rouge highly detailed, liquid oilpaint, doug chiang, gustave dore, leonardo da vinci, industry, lucid and intricate, rectilinear, digital art, octane, redshift, vray, 8 k, 6 4 megapixels, hypermaximalist, well rendered”
Seed: 482093
Prompt: “a horrifying sphere of meat and eyeballs, by zdzisław beksinski and greg rutkowski, surreal, horror, oil on canvas, dark, nightmare fuel, highly detailed, hd”
Seed: 472392
Prompt: “a huge gladiator by beksinski and tristan eaton, dark neon trimmed beautiful dystopian digital art”
Seed: 527328
Prompt: “a portrait of an old weather beaten woman with a grim look, a worn but tough old washing women, fisherboat; tom bagshaw, gustav dore, dark smoke in background, photorealistic, lifelike, unreal engine, sharp, sharpness, detailed, 8K, film noir”
Seed: 928879
Prompt: “a japanese sci fi horror girl with big eyes, character portrait, portrait, close up, concept art, intricate details, highly detailed, vintage sci-fi poster, in the style of chris foss, rodger dean, moebius, michael whelan, and gustave dore”
Seed: 849500
Prompt: “a beautiful surreal illustration of an insatiable vietnamese woman, highly detailed, liquid oilpaint, doug chiang, gustave dore, leonardo da vinci, industry, lucid and intricate, rectilinear, digital art, octane, redshift, vray, 8k, 64 megapixels, hypermaximalist”
Seed: 36778