In part one of this series, I talked about the general idea of AI, gave a rough overview of how it works, and explored what I think it could mean for people to be suddenly beholden to the mess it's about to unleash on the world. In short, it was a sustained scream into the void where the hopefulness I once held for technology used to be.
In this part, I’ll be discussing AI image generators.
This is the third or fourth time I've started to write this piece, and it's been difficult to find an angle that hasn't already been covered at length, and by far more serious people than myself. Nonetheless, I think it's worth trying to crystallise exactly what it is about AI imagery that makes me so uneasy. What I want to do is describe to you, on a personal level, how it has already impacted me and a collaborator of mine, then expand on that. This seems to be missing from a lot of the articles I've read, which tend to focus on AI in the abstract.
A few weeks ago, I sat down with the illustrator I’ve worked with on my podcast and on several other works. His name is Jon Stubbington, and he’s a full-time freelance illustrator. I wanted to get a firsthand view of what is really happening for people such as him, and to get his thoughts as to what the problems are with these tools.
I was fairly surprised by the results, and the discussion helped to solidify some of my own thinking. I want to talk broadly about the pressure he feels to reduce his art to nothing more than output. I want to talk about exactly who is providing the value to these datasets, and then we’re going to get into the weeds about bias and misinformation. Yay.
What I’m hoping to convince you of is that even if these generators are able to replicate, or even improve upon the ability of human artists, the implications of that should be enough to make you pause. Ultimately, we should consider whether AI is really something you want to engage with without any control structures in place.
Sound fun? Well, I do this a bit, so hit subscribe if you haven't.
Pretty Pictures for You and Me
Many proponents of AI imagery have a fundamental misunderstanding of what it is that makes art so worthwhile. When I sat down with Jon, I asked him to just broadly discuss what it was that he thought about AI, before we got into the weeds. Almost immediately, I got a response that made me think.
“Do I like the stuff it produces? That is probably a harder question to answer, because people produce some nice looking images, some of the time.”
Since we spoke, Stable Diffusion 5 has been released, and by Jove he's right. Some of the pictures are very nice looking. They still often have that sheen, that uncanny plasticity that AI seems to have, but overall: nice job! The gap between AI-generated imagery and human art or photography is thinning. Where before a brief look at the hands, or a count of the teeth, would quickly reveal the fake, you often need to scrutinise the image a lot more carefully now.
So, will Jon be producing his nice pictures by prompt any time soon?
Erm, no. During our discussion, the way that proponents have tried to convince artists of AI's uses also came up. It boils down to: "Well, if any average joe can do this, imagine what a trained artist would be able to do!"
“So, the arguments that people make are that the people who can benefit from this are artists… you just jump onto the AI bandwagon and you’ll be better than the regular AI users. It’s like, forget all the painting stuff and just learn how to prompt better than other people and you’ll still get a nice picture at the end!”
But Jon feels that this is missing the point of artistry in the first place.
“It’s not the same to sit down and enter prompts. It’s not the same as sitting down and thinking about it, and sketching it, and coming up with different versions of it and trying your best to realize that image that you’ve got in your brain.”
Jon is grappling here with the reason that artists do art. It's an intangible thing, but art is created through a series of tensions. Iteration, discussion, abstraction and an understanding of the different drivers of a project are the things that push artwork to new heights. At every stage of the process, artistic creativity is decision making. Those decisions are not limited to process, but are imbued in every facet of artistic creation. At its best, this process inscribes into a work a deliberate meaning that can be derived from the finished product.
There is an inherent value to a display of virtuosity. It's a difficult, nebulous thing to define, but it's definitely a thing. It's why we get so angry when we find out that musicians have been lip-synching performances, or that athletes have been taking steroids, or that elite card players are cheating. It's a breach of the social contract that says we should be viewing a demonstration of their actual accumulated skill. With art, on a deep level, it's worth knowing that the decisions that led to this output were made through a thoughtful navigation and development of skills.
Now, sure, you might be saying "Well, an AI proponent still has to make decisions about the prompts they enter, so what's the difference?" That's true; the decision-making process hasn't been removed entirely. But it has been stymied and limited. The prompts are only used to guide the denoising algorithm through the vector space to where the "correct" solution rests. The words are not being evaluated by the AI in any meaningful way; they're a conversion of ideas into a weighted algorithm. That is, the thing actually making the art is not going through a decision-making process, only a number-crunching one.
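If you want to see how little "deciding" is involved, that guiding-through-the-vector-space idea can be caricatured in a few lines of Python. Everything below is invented for illustration (the three-number "embedding", the update rule); it mimics only the shape of classifier-free guidance, not any real model:

```python
import numpy as np

def toy_denoise(prompt_embedding, steps=50, guidance=7.5, seed=0):
    """Toy, made-up sketch of prompt-guided denoising -- NOT a real
    diffusion model. The point: the prompt becomes a vector, and every
    step is arithmetic that nudges random noise toward the region of
    the space that vector points at. Nothing here resembles a decision."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=prompt_embedding.shape)  # start from pure noise
    for _ in range(steps):
        uncond = -x                    # drift with no prompt at all
        cond = prompt_embedding - x    # drift toward the prompt's region
        # Classifier-free-guidance-style mix: the prompted direction is
        # pushed harder than the unprompted one. Still just arithmetic.
        update = uncond + guidance * (cond - uncond)
        x = x + update / steps
    return x

prompt = np.array([1.0, -2.0, 0.5])  # stand-in for a text embedding
image = toy_denoise(prompt)          # ends up aligned with the prompt vector
```

Run it and the output vector points almost exactly where the "prompt" points: the words only ever act as coordinates, never as ideas to be weighed.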
Keep that in mind, because it’s really important: AI is fundamentally uncritical in the ways that it produces its imagery. It’s at the behest of its input parameters, and its training data.
Speaking of training data.
How to Train Your (AI) Dragon
So we arrive at the part where I whine about how AI art is theft. I know, you’ve heard it all before, but let’s just for a moment try to drive this conversation home with a very specific example: My artwork.
Actually, it’s Jon’s artwork. I asked him whether or not he was aware of any of his work being taken and fed into the AI databases. He nodded immediately, and told me about haveibeentrained.com, where he was able to verify that his work had been uploaded to the LAION-5B image dataset. To be clear, this is one of the datasets that has been used to train Stable Diffusion, whose maker, Stability AI, was last I checked valued at more than four billion dollars. How did Jon’s artwork end up on there, I hear you ask?
“As far as I can tell from the metadata in that website, it’s scraped it from my website directly.”
Now, it’s not unusual to scrape things from websites. I’ve done it myself, with things like weather data, or historical records for something I’m researching. However, Jon’s website has a section called “Usage terms”. In it, he asserts his copyright over the work, but mentions that in general he’s alright with the images on his site being used for non-commercial purposes, so long as credit is given and the images aren’t manipulated or changed. He points out repeatedly that he relies on these images for income, and that if you want to use them commercially, you should get in contact with him.
Once I found out that he’d had his website scraped, my curiosity was piqued. After all, the illustrations he’d done for my novella (Spice Trader) and my podcast (Sunward Sky) are displayed on his webpage for all to see. I checked haveibeentrained, and sure enough:
As soon as that happened I felt something about my unease click into place. It completed the picture, as it were.
This thing, this whole system, this set of companies that are currently worth billions of dollars, haven’t just taken from artists. They’ve taken from everybody, every step of the way, in building these machines.
When I commissioned Jon, I didn’t purchase the rights to the work. As a matter of fact, at time of writing, I still don’t have the commercial rights to this image. I have, as described in our original contract, the right to use it for:
The podcast.
An e-book cover.
A paperback cover.
At a later date, I checked with Jon if he’d mind if people who supported me on Ko-Fi were given the image as a thank you and he graciously agreed, but I still do not have the commercial rights to this image. I can’t alter it, I can’t put it on shirts and sell it, I can’t make commemorative Sunward Sky mugs, nothing. I can use it for those three things in the contract, and I’m lucky enough to be allowed to use it for an additional use at no extra charge, because I asked for permission.
But wait, there’s more. Images aren’t implicitly understood by computers, as we discussed in the previous section. A non-human eye can’t tell it’s a spaceship, and without knowing it’s a spaceship, the model can’t replicate it. For this to be useful in an AI model, it needs to be tagged. That’s right: someone needs to record that there’s a <horizon>, a <planet>, that it’s in <space>, with a <spaceship> in it. Who did that?
The Time article covering the frankly appalling treatment of people in the global south who tagged the training data has done the rounds, but there’s a long history of systems like Amazon’s Mechanical Turk underpaying people to perform repetitive, machine-unfriendly tasks. As a matter of fact, the only reason these machines are viable today is the proliferation of low-cost labour. Whether or not the Kenyan workers working for OpenAI were technically earning an appropriate wage for their country is beside the point: it’s yet another example of global colonialism in the modern age; modern-day serfdom that those in the first world reap the benefits of, often without knowledge.
So with that horrifying example of globalist exploitation, we’ve now got a complete picture of how a company in 2023 is able to get its hands on enough imagery to make the slow, unwieldy, expensive and time-consuming process worth it. Here’s the recipe:
Someone seeks out a commission artist, taking time to find and communicate with someone who has an appropriate style.
The commission artist goes through the laborious task of drawing an image to specification. This can take days or possibly weeks.
That image is licensed to the commissioner, for the purposes described. This costs money, which has taken several hours or days of other labour to earn.
LAION or another AI image collection company comes along and downloads the image.
Some poor bastard gets paid functionally nothing to label the data.
The image is fed into a machine that is explicitly for commercial use, in contravention of both usage terms and the license agreement.
Automatically repeat these steps six billion times, paying nothing to artists or commissioners who actually produced the work, and paying less than a living wage to the data labelers.
It’s easy, right? So long as you’re not the one paying, it’s really simple to come up with a way to make the finances stack up. Every single person in this chain is being exploited. Every contract is breached. The data labeler is underpaid, and if the Time article is to be believed, likely suffering emotional and mental trauma (NB: the Time article discusses images briefly and is talking mainly about text labelling. I'm not sure who labels the data for LAION specifically, but I believe this applies to some degree to all data sets). My contract has been breached by somebody I didn’t even know existed, and then Jon’s copyright has been treated like it’s nothing more than a joke.
As for whether AI art hurts the artist at the other end, the answer is yes. Jon mentioned that small, indie book authors have made the not unreasonable claim that he’s simply too expensive in a world where they can generate their own covers. I don’t want to speak for him here, but I would find it extremely grating to know that he was losing work to a database that was trained, at least in part, on other work of his. That would rankle.
Now, there are a bunch of lawsuits going on worldwide with regard to this, and the discussion around who retains copyright with AI art is an ongoing one. So this is, at least, starting to be addressed. But don’t get it twisted; these companies know on some level that what they’re doing is wrong. The LAION website takes pains to mention that no, the database doesn’t contain the actual images, it just collects all the URLs and points at them. That’s a semantic loophole for the lawyers to settle, but I would argue that it’s functionally the same the way it’s being used. Nevertheless, in the here and now it’s already causing material financial harm to artists who are already having to deal with the precarity of trying to do honest work in a world filled with hustle culture grifting.
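To make that semantic loophole concrete, here’s roughly what the distinction they’re leaning on looks like. This is a sketch with invented field names and a made-up URL, loosely modelled on the shape of a LAION-style index entry; the point is that the index stores a pointer plus a caption, and the training pipeline fetches the pixels itself, which is why I’d call it functionally the same as storing the image:

```python
# Hypothetical record, loosely modelled on a LAION-style index entry.
# The URL and field names are invented for illustration.
record = {
    "url": "https://example.com/portfolio/spaceship-cover.png",
    "caption": "spaceship over a planet horizon in space",
    "width": 1024,
    "height": 1024,
}

def fetch_for_training(rec):
    """What a training pipeline does with such a record: resolve the
    pointer and pair the pixels with the caption. (The download is
    stubbed out here; a real pipeline would HTTP GET rec["url"].)"""
    return rec["url"], rec["caption"]

url, caption = fetch_for_training(record)
```

The database never "contains" the image; it just tells a machine exactly where to go and get it, and what to learn from it once it has.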
The thing I mentioned before? About the Time article mentioning trauma? Yeah. It’s because these companies have decided to trawl through pretty much the whole internet and download its imagery. I’m not sure how much time you’ve spent on the internet, but it’s worth talking about what happens when you collect all the pictures from it and look at the resultant view of the world you get.
White Lab Coats on White Men
Jon pointed out to me that if you go to haveibeentrained and type in “Doctor”, you get a preponderance of middle-aged white guys in lab coats. While this is an archetypical image of a doctor, it’s not representative of the entire industry.
The gender split of medical graduates has been at parity for years, and while men are still overrepresented in practice, you’re by no means guaranteed to see a white-haired, smiling man when you need medical attention. You wouldn’t know this by looking at what an AI image generator thinks a medical doctor looks like. It just spits out an assortment of white guys with stethoscopes. Occasionally, you’ll get a white woman, but come now. Diversity has gotten a bit better than that.
It’s the same if you type in “Lawyer”, or “Engineer”, or “Scientist”, or any number of professions that have been historically highly regarded. The garb changes, but the man beneath stays the same.
On the other hand, if you go to LAION-5B and type in “Asian” or “Latina”, what do you get? Even with the safe search on, you are inundated with pornographic imagery of women of those ethnicities. Typing “black” doesn’t yield a similar result, but does return images of women in fetishwear. I honestly didn’t have the stomach to type in much else, but I imagine different ethnicities, sexual orientations or gender identities would yield a similar level of bias. Incidentally, “Caucasian” returned nothing but memes, but “White” returned images of Caucasian women in white clothes, and some suggestive but not explicit imagery.
Remember how I said that AI is fundamentally uncritical in the way it produces imagery? This is why that’s bad. When a series of prompts is entered, the program goes to what the collective consciousness of the internet thinks that prompt means. And the collective consciousness of the internet has, historically, turned machines into racist pricks. The internet is a playground for a very loud minority of people with distasteful ideas, from casual racists to explicit white supremacists, and that is borne out by the kind of content you retrieve when you cast a broad net over cyberspace.
Now, I have biases. Everyone does. Some are implicit, some are explicit. I’m definitely not perfect, but I know that when I work on my craft I try to excise those biases as much as possible. In the novel I’m currently writing, I have thought long and hard about how to portray certain members of the population, and there are no easy answers. I’ll get it wrong. I’ll learn, and I’ll do better each time. Similarly, when Jon, or any artist, has to draw something, he can recognise that he has those biases and do his best to overcome them. Recognising, lessening and moving past the implicit bias in society is a key part of improving culture. It’s a level of self-imposed criticality that you have to engage in if you want to improve the quality of the world around you.
AI art won’t do that. In a world where it’s seen as a means to an end, the biases in the machine will be used to produce images of what is expected. Imagery of doctors will be output, then fed back into the machine, providing its algorithm with the idea that this is what a doctor really is. That impression will become reified as people get more and more used to the image models as a source of truth, and it begins to affect hiring practices. All the media shows is white males in positions of power (even more than it already does), and people stop even asking if it could be any other way. The machines feed back into themselves, cementing and amplifying the idea of white supremacy in an endless churn of near-automatic content creation. Those who want to question it can’t produce imagery fast enough, because they have to do it by hand, and hey! That takes time. For every one image created by a human artist, the AI is making hundreds, if not thousands, each one imbued with absolutely no inquisitiveness beyond seeking the right area of the vector space.
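That feedback loop compounds faster than you might expect. Here’s a deliberately crude simulation, with made-up numbers and no real model anywhere in sight: each "generation" of images is trained on the previous generation’s output, with the majority depiction over-sampled by a small factor.

```python
def feedback_loop(majority_share=0.6, generations=30, bias=1.1):
    """Toy model of the churn described above. Each generation trains on
    the last one's output, over-sampling the majority depiction by a
    small factor (bias > 1). Purely illustrative: the 0.6 starting
    share and 1.1 bias are invented numbers, not measurements."""
    share = majority_share
    history = [share]
    for _ in range(generations):
        # The model over-reproduces whatever it saw most of last time,
        # then its output becomes the next round's training data.
        weighted = share * bias
        share = weighted / (weighted + (1.0 - share))
        history.append(share)
    return history

history = feedback_loop()
# A 60/40 split drifts toward near-total uniformity within ~30 rounds.
```

A ten-percent lean, applied uncritically at every round, is all it takes for the minority depiction to effectively vanish from the machine’s idea of the world.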
I don’t want that.
Diabolus ex Machina
So, here we are. Let’s pretend, for a moment, that it all goes the way of the tech industry’s wildest dreams. AI is found not to be a breach of copyright, and every picture ever created is fed into the machine. Diffusion modeling gets all the way there; photorealistic images of anyone and anything you want are available for a negligible price. Artists are out of work, because why would anyone pay for process when outcome is right there? Any image generated is fed back into the machine, and so any inherent biases are reinforced. It’s a quick process, and so people use it unthinkingly, uncaringly. Soon, the machines are seen as a source of truth. What happens then?
Don’t be flippant about it; people are already trying to pass off generated imagery as though it were real. Eric Trump, son of the cameo actor from the Home Alone franchise who was recently indicted and arrested on fraud charges, posted an image of his father walking down the street surrounded by thousands of supporters and American flags. This was to promote the idea that the crimes¹ of his father were not of sufficient import to detract from the size of his base. Ultimately, because the tech isn’t there yet, it was still possible to tell that the image was faked, but there is no guarantee that these generators will stay that way. They absolutely can and will be used by unscrupulous individuals for social and political gain.
Already, people are becoming annoyed at the way Midjourney and Stable Diffusion restrict how their tools can be used. Explicit sexual imagery cannot, in principle, be generated by these two diffusion models (though people have certainly tried). But the tech is digital, and the data is (largely) open source. There was a Kickstarter campaign to remove the limitations of the tools and allow people to generate the objects of their sexual desires, which, spoilers, may include real people whose images are in the dataset. Pokimane, a Twitch streamer, was distraught when she found out people had made deepfake pornography of her. The agency over her own body was, in effect, taken away from her. Now, imagine it’s the images of children on the web, fed through a new AI model created by people who don’t mind it being used for whatever. Got any pictures of your kids on Facebook? In an online article? On a family website that’s been scraped? Are you sure?
And let’s not forget that the worst kinds of people are going to use this to their own ends. The Tuckers Carlson, the Matts Walsh, the Jordans Peterson, the Stefans Molyneux, and worse. They’ll use the internal bias of these machines to demonstrate to their audience that yes, being White(tm) is a sign of intellect and good breeding, while being anything else is a sign of degeneracy. Why, look at the output of the AI generators! These Caucasian men are all lawyers, doctors. They stayed in school and did the Right Thing. Not like the degenerates in (insert other nationalities here). These are machines that have just had the history of white supremacy hard-coded into them, and there are people that think that there’s not a problem with that.
Every technology has a wild west stage. In my opinion, the internet has fallen from the original promise of being the ultimate democratic engine, but never before have we been able to create such convincing and dangerous misinformation, at such scale, with so little oversight. These are just the outcomes that I, with my silly little science fiction brain, can come up with. People with actual nefarious agendas and twisted ideologies can come up with worse, and are probably already trying.
Now, this may be catastrophic thinking, but we’ve just lived through an insurrection at the US Capitol that was fueled by a misinformation campaign. Conspiracy theories abound, from fifteen-minute cities to 5G microchips to the fake moon landing. If we aren’t careful, AI is going to build a suite of tools that allow for the thorough and complete dilution and silencing of truth. The AI doesn’t care if it does that, which means that we have to.
Join me next time, when I’ll talk about how the Turing test ended up being a pretty low bar to clear, as I investigate Large Language Models such as ChatGPT.
Thanks to Jon for giving me his time and being such a thoughtful and gracious interviewee. I really appreciate it.
Do you think I’m overblowing this? Do you agree with me? Leave a comment, join the chat, or just subscribe as I muse on writing and art and the technology of our future. I also post on twitter @huntingsunrise.
¹ Alleged crimes