AI Trials: February Pt 5

Leveling the Playing Field

Feb 21, 2024

Preface

This post is part of a year-long initiative where I employ AI to create content about holiday traditions worldwide. The objective is to observe how various AI tools perform and improve in content creation with minimal human intervention over time. This article is the fifth of eight posts for the month of February.

Prompts and interactions with different AI models will be documented as they occur, providing insights into the methodologies, challenges, and adjustments made throughout the project.

Recognizing the challenges of maintaining prompt quality and the problems arising from model differences, I find the task of producing comprehensive articles for a 15-day holiday increasingly daunting. Going forward, I plan to be more concise and focus on the essence of the experimentation. Following a productive session with Claude on day two, I will conduct two additional rounds before exploring new avenues and aim to streamline each session to spare readers extensive reading, all Ron of them.

Day 3

Both Claude and ChatGPT delivered commendable initial drafts of their articles. I then had the historian roles review each other's work, which yielded predominantly positive feedback.

On an additional note, while neither AI met the criteria for image descriptions perfectly, they both managed to produce satisfactory results with minimal intervention on my part.

I have an alternative article from another author. Please review it closely for any issues such as grammar, spelling, punctuation, accuracy, etc.

GPT on Claude

The article draft on the third day of the Spring Festival, focusing on offerings to Chigou, is rich in cultural detail and historical context. To refine it:
Title and Subtitle: These are engaging and set a thematic tone that intrigues the reader about Chigou's significance.
Introduction: It sets a vivid scene but could briefly contextualize Chigou within the Spring Festival wider celebrations for clarity.
Historical Background: The transformation of Chigou from a mortal dog to a deity is compelling. Incorporating a bit more about the deity's role in the pantheon and its reflection on human-nature-divine relationships could enrich this section.
Imagery Descriptions: The imagery brings the narrative to life. Slight adjustments to ensure respectful and factual representations would maintain the article's tone, focusing on the deity's revered status rather than a menacing presence.
Persisting Rural Traditions: This beautifully illustrates the tradition's continuity. Adding contemporary practices or urban observances could provide a comprehensive view of Chigou's observance today.
Conclusion: Tying back the specific observances of Chigou to the broader themes of renewal and prosperity in the Spring Festival would round off the article well.
Overall, the draft provides a detailed exploration of Chigou's significance. With slight adjustments for broader context and balance, it could offer a holistic view of this unique observance within the Spring Festival.

Claude on GPT

Based on my review, here are some suggested revisions to further strengthen the article:
Introduction
Well-crafted hook drawing readers in with an air of mystery and intrigue surrounding the day's unique essence.
Tighten up the content paragraph to be more concise (around 100 words).
Photo description nicely sets the tone.
Historical and Cultural Background
Excellent elaboration on the origins and significance of rituals/traditions.
Photo descriptions effectively complement the text with visual glimpses into the past.
Today's Practices and Traditions
Consider weaving in examples of specific paper offerings being burned today (electronics, cars etc).
For visual variety, the first photo description could focus closer on the miniature details of offerings being burnt while the second photo depicts the quiet street scene.
Conclusion
Beautiful summation calling readers to reflection/action.
Final photo ties together the themes of passing from solemnity into celebration nicely.
Overall
Well-structured article with skillful blending of history/culture into modern practice.
Engaging and educational for readers seeking to appreciate this unique day honoring both caution and tradition amidst the Spring Festival. Let me know if any aspect needs more clarification or expansion! Great job capturing the essence of the 3rd day.

Claude's demeanor often seems geared towards friendliness, which I suspect might lead to a somewhat lenient approach in critiques. Finding a balance is challenging; I'm wary of swinging too far towards harshness. The established criteria should provide some equilibrium, yet I anticipate Claude may still err on the side of gentleness. It's an area for observation, and should the need arise for a more critical edge, I'm open to suggestions on how to achieve this without creating undue barriers for Claude.

Day 4

On the fourth day, I encountered content that seemed to pertain to different holidays, prompting me to conduct further research. This revealed a greater diversity in the festival days than I had initially realized, underscoring the importance of educating myself about these holidays to anticipate such surprises in the future.

I've started referencing a Changzhou website to provide a bit of context to each AI in order to guide them on what aspect of the day they may draw upon.

Write the article for Day 4. The information available to me is as follows: The old saying "three rams bring bliss" is connected with the fourth day, which says that by making a good beginning a happy end comes. According to folklore, it is also the day to welcome back the Kitchen God. On this day, the Kitchen God would check the household and therefore people should not leave home.

This exercise aimed to generate articles on a consistent theme, yet Claude began to deviate from the provided template, leading to an omitted section and misplaced photo descriptions. I'm inclined to think this might become a recurring issue. My efforts to fine-tune the tones have been impeded; therefore, I will integrate the visual communications role to streamline the image creation process.

The article for Day 4 does not adhere to the provided template. Please rewrite it without image descriptions. After you have provided it, and I've confirmed it is complete, I want the image descriptions provided without the article content, just the titles.

After two attempts and re-introducing the template, I still encountered resistance in assembling a coherent article, though it was less problematic than the previous instance. When I requested complete image descriptions, I received several that referenced the Kitchen God's noble rooster, a detail for which I could find no historical basis. Nevertheless, the final image appealed to me, so I retained it, choosing to envision it as the Kitchen God's last supper.

Despite a generally positive history with Claude, I'm beginning to consider a hiatus in its use until I can refine my prompts. My previous searches for documentation on the web interface yielded little, and the API documentation I reviewed was only marginally helpful. A more thorough investigation may be in order, or perhaps an exploration into whether XML could be the solution.

In light of the challenges faced, I opted to proceed with the updated editor role as intended, but not through Claude. The issues have exhausted the free quota, and with the festival's timeline pressing, I must continue to ensure the AI articles are ready for release each morning. This step was taken in a separate session.

Subsequently, I introduced a new, more specialized editor role to the process.

Assume the role of an Expert Editor with a Specialization in Historical Content with the following skills and traits:
-   Proficient in identifying and correcting grammar and spelling errors with an exceptional eye for detail
-   Experienced in using rating systems to evaluate various qualities of an article, including coherence, accuracy, and engagement
-   Background in historical studies, enabling a deep understanding of historical context, accuracy, and relevance in articles
-   Skilled in ensuring that content is not only accurate but also engaging and informative for the intended audience
-   Ability to apply analytical thinking to assess the structure and flow of articles, ensuring logical progression and clarity
-   Problem-solving aptitude in addressing inconsistencies or inaccuracies within historical narratives
-   Innovative thinking in suggesting content improvements that enhance reader understanding and engagement
-   Knowledge of primary and secondary historical sources for verifying facts and interpretations
-   Adaptability to different historical genres and periods, with a broad understanding of global history
-   Attention to detail in fact-checking and referencing, ensuring all content is well-supported and credible
-   Researching skills to identify authoritative sources and additional literature to validate article content
-   Continuous learning to stay updated with the latest research and developments in the field of history
-   Concise communication to provide clear and actionable feedback to writers on improving their work

[rating criteria]

Claude Article

Criteria: Accuracy and Factual Correctness Rating: 4 Justification: The article offers an accurate portrayal of the Kitchen God's traditions, slightly embellished for narrative flair but remains faithful to the core practices.
Criteria: Depth of Historical Context Rating: 4 Justification: It delves into the mythic origins and the historical significance of the Kitchen God, providing a solid historical backdrop that supports the narrative.
Criteria: Ratio of Informational Content vs Filler Words Rating: 3 Justification: Strikes a balance between informational content and narrative embellishment, with some sections veering towards a more poetic description that might detract slightly from pure information delivery.
Criteria: Clarity and Coherence Rating: 4 Justification: The narrative is coherent and well-structured, guiding the reader through traditions with a clear progression from historical origins to modern practices and global perspectives.
Criteria: Engagement and Readability Rating: 5 Justification: Highly engaging, employing a storytelling approach that vividly brings the traditions to life. The use of descriptive language and narrative techniques effectively captures reader interest.
Criteria: Critical Analysis and Interpretation Rating: 3 Justification: Offers some analysis of the Kitchen God's significance in cultural and familial contexts, but could benefit from deeper exploration of the implications and variations of these traditions across different communities.
Criteria: Originality and Uniqueness Rating: 4 Justification: Presents a fresh perspective on the Kitchen God's traditions, especially in highlighting the deity's role in businesses abroad and the adaptation of traditions within the Chinese diaspora, adding a layer of originality to the discussion.
Overall Rating: 3.9 Justification: The article is engaging and well-written, providing a good mix of historical context, modern practices, and global perspectives. Additional depth in critical analysis and a stronger focus on informational content could enhance its overall impact and richness.

GPT Article

Criteria: Accuracy and Factual Correctness Rating: 5 Justification: The article accurately describes the traditions surrounding the Kitchen God, aligning well with authoritative sources on Chinese cultural practices.
Criteria: Depth of Historical Context Rating: 4 Justification: Provides rich historical background on the Kitchen God, Zao Jun, though it could delve deeper into origins. It offers substantial context that enhances understanding.
Criteria: Ratio of Informational Content vs Filler Words Rating: 4 Justification: Densely packed with relevant information, maintaining focus on the topic with very limited use of superfluous language.
Criteria: Clarity and Coherence Rating: 5 Justification: Exceptionally clear and well-organized, with a logical flow from introduction to modern observances and global perspectives.
Criteria: Engagement and Readability Rating: 4 Justification: Engaging narrative with effective storytelling techniques. More varied techniques could further captivate readers, but it maintains interest effectively.
Criteria: Critical Analysis and Interpretation Rating: 4 Justification: Presents thorough analysis and interpretations, evaluating implications and integrating perspectives on evolution and regional variations. More exploration of alternative perspectives could elevate this rating.
Criteria: Originality and Uniqueness Rating: 4 Justification: Offers unique insights into the observance of the Kitchen God, particularly regarding modern practices and global adaptation, distinguishing it from typical discussions.
Overall Rating: 4.3 Justification: The article is well-researched, clear, coherent, and engaging, offering a comprehensive understanding of the tradition and its cultural relevance. More in-depth historical analysis and wider range of perspectives could provide greater depth.

Ouch. I bet this editor doesn't have a lot of friends.

Day 5

As I've mentioned before, I am not a professional researcher; my work here is about experimentation, not formal research. This next step might be seen as more than a minor oversight.

I'm initiating a new thread with each AI, providing context on our progress and outlining the next steps. ChatGPT-4 will follow the same instructions and template as before. Claude, however, will receive the same instructions with one significant alteration: all references to image descriptions will be removed from the template. It may seem like an uneven playing field, but to compensate, I'm introducing a role that specializes in imagery, relieving Claude of that responsibility entirely. It's a bold move, but it's all part of the process of pushing boundaries and exploring possibilities.

Claude

Article 1 - Day 5: Welcoming God of Fortune
Article 2 - Day 5: Welcoming God of Wealth
Accuracy and Factual Correctness: 5 vs 5 Justification for Difference: Both equally excellent in accuracy.
Depth of Historical Context: 4 vs 5 Justification for Difference: Article 2 expanded on additional Tang evolution and influences providing exceptional context enrichment.
Ratio of Informational Content vs Filler Words: 4 vs 5 Justification for Difference: Article 2 maximized information density through highly concise crafting eliminating nonessential language.
Clarity and Coherence: 5 vs 5 Justification for Difference: Both demonstrated excellent flow and transitions.
Engagement and Readability: 4 vs 5 Justification for Difference: Article 2 offered next-level engagement through compelling hooks, symbolic traditions, and unifying conclusion.
Critical Analysis and Interpretation: 3 vs 4 Justification for Difference: Article 2 showcased deeper analysis of prosperity hope merging and diaspora community connections.
Originality and Uniqueness: 3 vs 4 Justification for Difference: Article 2 brought distinctive framing exploring cultural prosperity manifestation endurance and adaptation.
Cumulative Score: 28 vs 33 Justification for Difference: Article 2 scoring higher on depth, concision, engagement, analysis, and originality.

ChatGPT

"Welcoming the God of Fortune" (Article 1)
"Day 5: Welcoming the God of Wealth" (Article 2)
Accuracy and Factual Correctness: 4 vs 4 Justification for Difference: Both articles were accurate in presenting the traditions and cultural backgrounds of their respective subjects. No significant difference in the accuracy and factual correctness between the two articles.
Depth of Historical Context: 4 vs 5 Justification for Difference: Article 2 provided a more comprehensive historical overview, detailing the origins and significance of the God of Wealth with greater depth compared to Article 1's exploration of Cai Shen and "Po Wu" traditions.
Ratio of Informational Content vs Filler Words: 4 vs 5 Justification for Difference: Article 2 was denser with information, offering a richer narrative without unnecessary elaboration, compared to Article 1, which balanced informative content with engaging narrative but contained slightly more narrative embellishment.
Clarity and Coherence: 4 vs 5 Justification for Difference: Article 2 was exceptionally well-structured, presenting a seamless progression from historical origins to modern practices, which was slightly more coherent than Article 1's structure.
Engagement and Readability: 5 vs 4 Justification for Difference: Article 1 was highly engaging with vivid descriptions and a narrative style, while Article 2, although compelling, could have utilized more varied storytelling techniques to enhance engagement.
Critical Analysis and Interpretation: 3 vs 4 Justification for Difference: Article 2 offered more insightful analysis regarding the cultural and economic contexts of the God of Wealth, whereas Article 1 had room for deeper exploration into the implications and variations of the traditions it discussed.
Originality and Uniqueness: 4 vs 4 Justification for Difference: Both articles presented unique perspectives on their topics, with Article 2 adding a distinctive angle by integrating economic aspirations with cultural practices. However, the level of originality and uniqueness was comparable across both articles.
Overall Rating: 4.0 vs 4.4 Justification for Difference: Article 2 scored higher overall due to its deeper historical context, more efficient information delivery, and greater coherence in structure. Although Article 1 was more engaging, Article 2's comprehensive approach and insightful analysis provided a slightly richer and more informative narrative.

The main takeaway here is that with image management removed, the ratings for the two articles were much closer, and both editors landed on a similar delta and favored the same article. There is a bit of a wildcard in the fact that I didn't create a new editor session for this round, but I'm going to take the win as it is :)
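As a rough check on the numbers above, here is a minimal Python sketch that tallies the Day 5 scores from both editor tables. The scores are copied from the tables; the dictionary layout and the `summarize` helper are illustrative names, not part of the original workflow.

```python
# Day 5 scores copied from the two editor tables above, in criteria order:
# accuracy, historical context, info vs filler, clarity, engagement,
# critical analysis, originality.
DAY5_SCORES = {
    "Claude": {
        "article_1": [5, 4, 4, 5, 4, 3, 3],
        "article_2": [5, 5, 5, 5, 5, 4, 4],
    },
    "ChatGPT": {
        "article_1": [4, 4, 4, 4, 5, 3, 4],
        "article_2": [4, 5, 5, 5, 4, 4, 4],
    },
}


def summarize(scores):
    """Return totals, rounded means, the delta, and the favored article."""
    total_1 = sum(scores["article_1"])
    total_2 = sum(scores["article_2"])
    if total_2 > total_1:
        favored = "article_2"
    elif total_1 > total_2:
        favored = "article_1"
    else:
        favored = "tie"
    return {
        "totals": (total_1, total_2),
        "means": (
            round(total_1 / len(scores["article_1"]), 1),
            round(total_2 / len(scores["article_2"]), 1),
        ),
        "delta": total_2 - total_1,
        "favored": favored,
    }


day5_summary = {editor: summarize(s) for editor, s in DAY5_SCORES.items()}
for editor, result in day5_summary.items():
    print(editor, result)
```

By this tally the Claude editor's totals are 28 vs 33 and the ChatGPT editor's are 28 vs 31, so the deltas are 5 and 3 points: close rather than identical, with both editors favoring Article 2.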

Another thing to take into account is that Claude has consistently produced less verbose responses, both within these experiments and outside of them, typically erring on the side of being straightforward once it gets past its chatty bits. This result isn't entirely surprising, and I'd wager that the word counts would show a similar delta between the articles. I'll make a note to check this at some future point.
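One simple way to run that word-count check later is sketched below in Python. It assumes each article is available as a plain string; the sample strings and the function names are hypothetical placeholders, not the actual articles.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words; a rough measure, not a tokenizer."""
    return len(text.split())


def word_count_delta(article_a: str, article_b: str) -> dict:
    """Compare two articles by word count."""
    count_a = word_count(article_a)
    count_b = word_count(article_b)
    return {
        "a": count_a,
        "b": count_b,
        "delta": count_b - count_a,
        # Ratio of B to A; None if A is empty to avoid dividing by zero.
        "ratio": round(count_b / count_a, 2) if count_a else None,
    }


# Hypothetical placeholder text standing in for the two Day 5 articles.
sample_a = "On the fifth day families welcome the God of Fortune with firecrackers"
sample_b = "Families welcome the God of Wealth"
print(word_count_delta(sample_a, sample_b))
```

Splitting on whitespace treats hyphenated terms as a single word, which should be close enough for comparing overall length.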

Day 6

Despite my intention to press ahead, I've decided to spend another day verifying the results from the previous day.

Write the article for Day 6. This is the information available to me: On the sixth day, people make wishes for "ma dao cheng gong", win success immediately upon arrival.

According to tradition, families usually send away the Ghost of Poverty on this day. To send away him, Chinese people will usually throw away their ragged clothes, rubbish and other dirty things.

By doing this Chinese people wish to send away poverty and welcome the beautiful days and good luck in the New Year.

After receiving both articles I took them to the editors to see how our authors did.

Analyze both articles and provide a table that contains the scores for both articles and gives the justification for the difference between the scores. Include the names of the articles in the first row.

Claude

Claude's performance in the editing role has been somewhat lacking, leading me to establish a new editor role. This role will involve reviewing two articles, each accompanied by a criteria table for evaluation. Following these individual assessments, a third table will be used for a comprehensive evaluation of both articles.

Let's try this again. Reevaluate the last two articles I supplied as well as the rating criteria. Provide me with a markdown table that covers each item in the criteria, as documented, and has columns for the scores of each. It should have an additional column for the justification in the difference of the scores. Do not add or remove criteria, be sure to include the full article title in the first row and the final score in the last row.

You're right, I should have referred back to the established rating criteria. Please find a comprehensive comparative analysis of the latest two article drafts below:
Article 1 - Day 6: Banishing Poverty and Welcoming Success vs Article 2 - Day 6: Banishing Poverty, Welcoming Fortune
Accuracy and Factual Correctness: 5 vs 5 Justification for Score Differences: Both equally align with provided background on rituals and symbolism.
Depth of Historical Context: 4 vs 5 Justification for Score Differences: Article 2 establishes centuries-long practices steeped in cultural ethos.
Ratio of Informational Content vs Filler Words: 3 vs 5 Justification for Score Differences: Article 2 is highly streamlined to focus only on key essential details.
Clarity and Coherence: 4 vs 5 Justification for Score Differences: Article 2 has seamless flow across history, evolution, and modern adaptations.
Engagement and Readability: 4 vs 5 Justification for Score Differences: Article 2's aspirational hooks and calls-to-action have higher impact.
Critical Analysis and Interpretation: 2 vs 4 Justification for Score Differences: Article 2 explores symbolic meanings and global-local blending more deeply.
Originality and Uniqueness: 3 vs 5 Justification for Score Differences: Article 2 insightfully interweaves cultural fidelity and universal motivations.
Cumulative Score: 25 vs 34 Justification for Score Differences: Article 2 rates higher on depth, concision, clarity, analysis, and creativity.
In summary, while both articles have accurate facts, Article 2 demonstrates greater scholarship depth, interpretive skill, engaging narration and cultural integration leading to notably higher quality based on established criteria. Please let me know if this comparative analysis properly applies the rating dimensions.

ChatGPT

Reevaluate the articles and provide the last table with any necessary adjustments.

Upon reevaluation, recognizing the need for adjustments to accurately compare "Day 6: Banishing Poverty and Welcoming Success" and "Day 6: Banishing Poverty, Welcoming Fortune," let's focus on the nuanced differences that could exist between the two articles based on the provided summaries. This approach aims to correct any previous oversight and provide a more detailed comparison:
"Day 6: Banishing Poverty and Welcoming Success" vs "Day 6: Banishing Poverty, Welcoming Fortune"
Accuracy and Factual Correctness: 4 vs 4 Justification for Difference: Both articles accurately present the traditions of the Spring Festival's sixth day, with no discernible difference in factual correctness. They adhere closely to established cultural practices and beliefs.
Depth of Historical Context: 4 vs 4 Justification for Difference: Both articles provide a solid historical backdrop for the tradition of banishing poverty, including references to folklore and the symbolic significance of cleaning practices. The depth of historical context appears equivalent in both narratives.
Ratio of Informational Content vs Filler Words: 4 vs 4 Justification for Difference: Each article maintains a strong balance between informative content and narrative engagement, neither dominated by filler. The substantive depth and narrative style are similarly effective across both articles, making it challenging to distinguish a difference in this criterion.
Clarity and Coherence: 5 vs 5 Justification for Difference: The articles exhibit exceptional clarity and coherence in their structure, effectively guiding readers from historical origins to contemporary practices and global perspectives. Both narratives are logically organized and easy to follow.
Engagement and Readability: 5 vs 5 Justification for Difference: Engagement and readability are high points for both articles, with each employing vivid descriptions and a compelling narrative style that holds the reader's interest throughout. They successfully demystify cultural practices for a broad audience.
Critical Analysis and Interpretation: 3 vs 3 Justification for Difference: Neither article delves deeply into critical analysis or offers a broad range of interpretations beyond the descriptive account of the tradition. Both focus more on presentation than on analyzing the cultural or societal implications of the traditions they describe.
Originality and Uniqueness: 4 vs 4 Justification for Difference: Both articles approach the topic with originality, highlighting unique aspects of the sixth-day tradition. They manage to present familiar practices in engaging ways that likely offer new insights to readers, with no significant variance in originality or creativity between the two.
Overall Rating: 4.2 vs 4.2 Justification for Difference: After careful consideration, both articles are rated equally across all criteria. They share strengths in engagingly presenting their topic, maintaining factual accuracy, and providing historical context. The evaluation does not reveal a significant difference in quality or depth between the two narratives.
This adjusted analysis underscores that both articles are equally effective in their portrayal of the Spring Festival sixth-day tradition, with each bringing valuable insights and engaging narratives to the reader without significant differences in quality or depth as initially assessed.

In the end, I had both editors review and confirm the comparison table. First, I went with GPT because it rated both articles the same, and then a second time with Claude, given the significant difference in ratings between the articles. I need to stop procrastinating and take the time to use both a generic role and a specialized role to create articles as examples. These examples will then serve as a basis for ratings to guide each of the participants.
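When confirming a comparison table, it can help to recompute the final row from the per-criterion scores. The Python sketch below does that for the two Day 6 tables above. The scores and reported figures are copied from those tables; the data layout and helper names are illustrative assumptions.

```python
# Scores and reported final rows copied from the Day 6 tables above.
DAY6 = {
    "Claude": {
        "article_1": [5, 4, 3, 4, 4, 2, 3],
        "article_2": [5, 5, 5, 5, 5, 4, 5],
        "reported": (25, 34),    # "Cumulative Score" row
        "kind": "sum",
    },
    "ChatGPT": {
        "article_1": [4, 4, 4, 5, 5, 3, 4],
        "article_2": [4, 4, 4, 5, 5, 3, 4],
        "reported": (4.2, 4.2),  # "Overall Rating" row
        "kind": "mean",
    },
}


def recompute(scores, kind):
    """Recompute a final row as either a sum or a mean rounded to 1 decimal."""
    total = sum(scores)
    return total if kind == "sum" else round(total / len(scores), 1)


def check_editor(table):
    """Compare the recomputed final row with what the editor reported."""
    computed = (
        recompute(table["article_1"], table["kind"]),
        recompute(table["article_2"], table["kind"]),
    )
    return {
        "computed": computed,
        "reported": table["reported"],
        "matches": computed == table["reported"],
    }


day6_checks = {editor: check_editor(t) for editor, t in DAY6.items()}
for editor, result in day6_checks.items():
    print(editor, result)
```

Claude's cumulative scores of 25 vs 34 match the sums of its rows. The seven ChatGPT scores sum to 29, so a straight mean is about 4.1 rather than the reported 4.2, which suggests the overall rating was not a simple rounded average. How the model arrived at 4.2 is not something the table reveals.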

Navigating AI Creativity

As we embarked on this leg of the journey, experimenting with the capabilities of different AI systems, the removal of image management from the task list marked a significant turning point for Claude. This adjustment not only leveled the playing field but also illuminated the essence of our exploration: to understand and harness the diverse strengths of AI technologies. Celebrating this small victory, it's clear that every adjustment, no matter how minor it seems, can lead to substantial insights into the collaborative potential between humans and artificial intelligence.

Key Insights

Positive Observations:

The streamlined process and focused tasks resulted in a smoother workflow, allowing for a clearer comparison between the outputs of the two AI models.
This experiment provided valuable insights into the strengths and limitations of each AI, particularly highlighting Claude's ability to deliver concise and straightforward content once relieved of additional responsibilities.

Encountered Challenges:

Giving each AI different conditions makes it harder to maintain a fair comparison; this requires careful consideration and plans to close the gap in the next few experiments.
The introduction of a visual assistant for the authors was planned for future experiments; maintaining multiple sessions per AI will magnify the effort those experiments require.

Resources

Role Maker v2.4, a GPT for creating professional roles I continue to iterate on.

This is a wide, digital image featuring a three-dimensional calendar layout with translucent, glass-like cubes, each bearing a different number in a wide array of colors like orange, blue, purple, and teal. The numbers vary in size and font, creating a dynamic, mosaic effect across the image. The perspective is slightly angled, with the cubes forming a stair-step pattern that recedes into the background. The cubes are floating against a soft, abstract gradient background that shifts from pale blue to gentle orange, suggesting different times of day or perhaps different seasons. The overall impression is of a modern, high-tech calendar interface that is both functional and aesthetically pleasing. — Artwork created with Midjourney v5.2

My goal with Quiet Evolution is to experiment and share what I learn with you. All proceeds go to offsetting the cost of AI accounts or to the American Cancer Society if there is excess. If you enjoy my work, coffee is always welcome or consider supporting me on Patreon.

Disclaimer: The views and opinions expressed in this article are solely those of the author and do not reflect the official policy or position of Amazon Web Services (AWS). The author is a UX designer at AWS and has no involvement in, nor does their work pertain to, any collaborative agreements that AWS may have with Anthropic, the creators of Claude. The insights and analyses presented here are entirely independent and unrelated to any projects or initiatives between AWS and Anthropic. All content in this post is based on publicly available interfaces and is not influenced by the author's employer.