AI Trials: Mar Pt 2

Unveiling the Perils Within Beauty

Mar 25, 2024

This vibrant image features a low-poly landscape transitioning from cool blue hues on the left to warm reds on the right, reminiscent of a change from day to night or from one season to another. Three stylized trees with fiery orange foliage stand out against the geometric backdrop, their colors popping vividly, suggesting an autumnal theme. Above the landscape, an assortment of letters and numbers in different fonts and sizes float across the image, perhaps indicating a layer of information or data superimposed over a natural scene. The blend of digital art and elements of nature creates a striking contrast between the organic and the constructed.

Preface

This post is part of a year-long project where AI is being used to create content about holiday traditions worldwide. The goal is to track how various AI tools perform, and improve, at content creation with minimal help over time. This is the final post for March; see the project index for the full series.

This post contains detailed interactions with different AI to share the approach, challenges, and prompts used in the creation of the related articles.

Let's recap. I originally planned to experiment with different tones this month, but given the challenges I encountered the previous month and the approach of Holi, a celebration I'd been looking forward to, I invited Leonardo and StableDiffusion to join the party, with special appearances by Midjourney v5.2. I was pleasantly surprised by the images for Spring Festival and equally horrified by the results I received for Holika Dahan. It felt as if all the advantages I had seen in these models in a previous iteration had disappeared, only to be replaced by new issues.

Once I got past my disappointment, and my worry that the Holi images would be a disaster, I reflected on prior experiences and concluded that the models hadn't performed as poorly as I first felt. Rather, I've become a bit spoiled by the alpha release of Midjourney v6.

Leonardo did great with close-ups and small groups of people.
Midjourney v5.2 did reasonably well in general, maintaining its position as the bar for its peers.
StableDiffusion was fine as long as the subject matter wasn't human.

Additionally, I hadn't put in the work to take full advantage of Leonardo or StableDiffusion, especially when it comes to the negative prompts the latter needs. All of this aside, I still question the ethics of pay-as-you-go platforms that produce questionable results by default.

Holi

Article Creation

To maximize the number and variety of descriptions for Holi, I invited Gemini back to the party with Claude and ChatGPT. I had each of the three AI write its own article, followed by the descriptions for their respective articles.

Gemini's article exhibited the same shortcomings as prior articles on well-defined holidays. Rather than fighting to get the desired end result, I spoon-fed it, having it write the article one section at a time until I could assemble the whole piece. Afterwards, I had it review the article for continuity, which is further than I got with Bard. It needed a nudge to complete the process, but it wasn't too challenging.

All three AI provided solid image descriptions using the current photographer role, although Claude and Gemini both needed a nudge regarding word count. I researched more reliable ways to get accurate word counts and found I'm not alone in that quest; it's something to consider for future projects.
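One option, rather than trusting a model's self-reported word count, is to count locally before accepting a description. The snippet below is a minimal sketch of that idea, not the method used for this project; the function names, the regex rule (hyphenated words and contractions count once), and the tolerance window are all assumptions, and other tools may count hyphenated words differently.

```python
import re

# Assumed tokenizing rule: runs of letters/digits, with internal apostrophes
# or hyphens kept together, so "festival's" and "low-poly" each count once.
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*")


def count_words(text: str) -> int:
    """Count words in text using a simple regex tokenizer."""
    return len(WORD_RE.findall(text))


def check_length(text: str, target: int, tolerance: int = 10) -> str:
    """Report whether text falls within +/- tolerance words of target."""
    n = count_words(text)
    if n < target - tolerance:
        return f"{n} words: too short (target {target})"
    if n > target + tolerance:
        return f"{n} words: too long (target {target})"
    return f"{n} words: OK"


description = "A couple shares a tender, color-streaked moment during Holi."
print(check_length(description, target=10, tolerance=2))  # prints "9 words: OK"
```

A description that comes back too short or too long can then be sent back to the AI with the actual count, instead of asking the model to recount.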

Image Creation

With the image descriptions ready, I used Leonardo.ai's Kino model, Midjourney v5.2 and v6 alpha, and StableDiffusion's SDXL 1.0 model to generate images for every description in each article. The result was hundreds of images to review before selecting the best ones for the articles. While it may be somewhat unfair to compare MJv6 against the others, I couldn't ignore its potential to produce the highest-quality images for Holi.

For Midjourney, I provided the original descriptions without any edits or specialized parameters, only defining the aspect ratio.

Images from Midjourney can be found within their respective articles, as well as the gallery for Holi.

I had intended to research optimizing prompts for Leonardo, but the readily available guidance focused mainly on its built-in prompt creator. To compensate, and since most of the images featured people, I used PhotoReal v2, a recently released model.

These images evoke the deep connections and communal joy of Holi, the festival of colors. In the first, a couple shares a tender, color-streaked moment, their closeness framed by whispers of tradition and love. The second picture captures a group of women, their laughter and camaraderie shining as bright as the vivid powders they delight in. Each image, rich with the festival's hues, celebrates not just a cultural event, but the very essence of human togetherness and the shared happiness that colors every aspect of life during this festive time. — Leonardo.ai

To improve the results from StableDiffusion, I read up on writing effective prompts for it. This led to notably better outcomes, but I clearly need to invest more effort to match the quality I've seen others achieve. I started with a short negative prompt and gradually expanded it, updating it along the way, into the version below.

In the throes of Holi's jubilation, these images are a radiant testament to the festival's power to bring people together in a spectacle of shared joy. The first captures the infectious smiles and camaraderie among men adorned with vibrant colors against the backdrop of historic architecture, a living canvas of tradition and merriment. The second image is a whirlwind of ecstatic celebration, with young women lost in the moment, their laughter and dynamic movements creating a symphony of colors that seem to dance in the air. These moments, rich in culture and emotion, underscore the essence of Holi: a time when the zest for life is expressed in its most vivid and communal form. — StableDiffusion SDXL 1.0

ugly | | bad face | | strange face | | disfigured | | misshapen face | | poorly drawn | | extra limbs | | extra hands | | extra feet | | backwards limbs | | extra fingers | | extra toes | | unrealistic, incorrect, bad anatomy | | cut off body pieces | | strange body positions | | impossible body positioning | | Mismatched eyes | | cross eyed | | crooked face | | crooked lips | | unclear | | undefined | | mutations | | deformities | | duplicate faces, plastic

Many of the issues were resolved with this basic negative prompt, but I have considerable work to do if I want to discover how far I can take SD.
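Because the negative prompt grew one batch of terms at a time, it can help to keep those terms as separate lists and merge them when generating. The snippet below is a minimal sketch, assuming you assemble prompts in a script rather than typing them into a web UI; `build_negative_prompt` is a hypothetical helper, and the comma separator is an assumption (the prompt above uses " | | " between terms). The resulting string is the kind of value that Stable Diffusion pipelines, such as those in Hugging Face diffusers, accept through their `negative_prompt` argument.

```python
def build_negative_prompt(*groups: list[str]) -> str:
    """Hypothetical helper: merge groups of negative terms into one string.

    Terms are stripped, empty entries are skipped, and duplicates are removed
    case-insensitively while keeping the first spelling and original order.
    """
    seen = set()
    terms = []
    for group in groups:
        for term in group:
            cleaned = term.strip()
            key = cleaned.lower()
            if key and key not in seen:
                seen.add(key)
                terms.append(cleaned)
    return ", ".join(terms)


# Terms taken from the negative prompt above, split into two example groups.
base = ["ugly", "bad face", "disfigured", "poorly drawn"]
anatomy = ["extra limbs", "extra fingers", "bad anatomy", "Disfigured"]
print(build_negative_prompt(base, anatomy))
# prints "ugly, bad face, disfigured, poorly drawn, extra limbs, extra fingers, bad anatomy"
```

Keeping groups separate makes it easier to test which additions actually helped, in the same incremental way the prompt above was built.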

Your Own Worst Enemy

In my haste to generate Holi visuals, I neglected the most important lesson from my AI experiments this year: there is no magic bullet. AI, like any tool, requires effort to achieve optimal results. This holds true for content creators, designers, and virtually everyone else. I produced countless images that went straight to the digital trash bin. It was a sobering reminder that while AI is incredibly powerful, it's not a panacea. To get the desired output, you must carefully consider your input, refine through multiple iterations, and be prepared to learn from your missteps along the way.

Nevertheless, I witnessed significant advancements with the new Claude models, less remarkable improvements with Gemini, and, of course, the substantial gap between Midjourney v6 and the other AI I used to create images this month. With more releases on the horizon, it promises to be an interesting year.

Key Insights

Positive Observations:

The process of guiding Gemini to produce an article highlighted the AI's ability to create coherent content with step-by-step instructions.
Research into effective prompt creation for StableDiffusion led to improved outcomes, demonstrating the value of tailored prompts.

Encountered Challenges:

The need for negative prompts in StableDiffusion, combined with my limited effort in optimizing prompts for Leonardo, resulted in subpar image quality.
The pay-as-you-go model of some AI platforms raises ethical concerns regarding the default quality of generated content.

This image features a geometric gradient of colors transitioning from cool blues on the left to warm reds on the right, composed of numerous triangular facets that create a dynamic and textured visual effect. Superimposed on this colorful, low-poly background are the numbers "1021" and "47" in a large, simple white font, centrally located and staggered across the horizontal axis. The overall design has a modern, digital feel, and could be associated with contemporary graphic design, data visualization, or abstract art, with the numbers possibly signifying a specific date, quantity, or identifier within a creative or analytical context. — Artwork created with Midjourney v6 alpha

As an eternal tinkerer, my curiosity, passion, and sheer stubbornness fuel a relentless desire to experiment, learn, and share knowledge, which keeps my creative spirit ignited. I'm constantly looking for new areas to explore, driven by imagination to see where new and evolving technologies might take me.

Driven by passion, not profit, though a coffee is always welcome.

Disclaimer: The views and opinions expressed in this article are solely those of the author and do not reflect the official policy or position of Amazon Web Services (AWS). The author is a UX designer at Amazon Web Services (AWS) and has no involvement in, nor does their work pertain to, any collaborative agreements that AWS may have with Anthropic, the creators of Claude. The insights and analyses presented here are entirely independent and unrelated to any projects or initiatives between AWS and Anthropic. All content in this post is based on publicly available interfaces and is not influenced by the author's employer.

Quiet Evolution