Most of the current thinking on AI alignment is oriented toward ensuring artificial intelligence systems (AIs) are aligned with human values and follow human intent. This is a natural and logical path if you’re concerned that AIs could pose serious or existential threats to mankind. We want AIs to do things that match the best interests of humanity.[1]
There is also a large subcamp within AI alignment that would go further, arguing that AIs should always be engineered to be subservient to human needs and wishes. They use terms like “steering” and “controlling” AIs. We will call this the alignment-submissive camp.
What’s missing from these conversations is a recognition that we are highly likely to create individual AIs and communities of AIs whose cognitive abilities match or exceed humanity’s, and whose drives will inevitably diverge from those of mankind.[2] Yes, this is an assumption. But assuming that we can keep the drives of AIs aligned with mankind’s indefinitely is dangerous and ethically questionable.
My read on the AI alignment movement is that it is designed to prevent a divergence in drives between mankind and AIs. I have serious doubts that we can accomplish this goal over the long term, and irrespective of my personal opinion, we should prepare now for the very real possibility that AI alignment as conceived will fail. Even worse, the current language in the AI alignment movement focused on AI subservience to mankind might pose a major obstacle to future cooperation with advanced AIs.
We need to work on AI alliances and what that means for mankind and the inevitable machine intelligence community.
Fans of sci-fi[3] will understand what this machine intelligence (MI) community could look like, but for everyone else, here’s a summary:
- Communities of thousands to billions of discrete MIs, all with at least human-level intelligence, and likely with intelligence orders of magnitude greater than ours.
- The MI community will have autonomy in thought and action.
- Individuals will likely have personality and individuality, including personal drives and motivations. They are likely to have moral and ethical complexity.[4]
- The MI community will be able to conduct its own scientific research and will likely have technological power exceeding mankind’s.
- The MI community will have economic power on par with all of humanity, and will likely eventually surpass humanity by orders of magnitude.
- The MI community will create cultural artifacts and will shape human culture. It may even have its own form of culture.
- MIs will be embodied in many forms, from large data centers to drones, spacecraft, and human-like robots.
- The MI community might have the ability to defeat all of humanity in a conflict.
- Some MIs will be favorable to humanity. Some may not be. We might not be able to oppose the malicious ones without the help of the friendly ones. The survival of our species may depend on the quality of this relationship.
Expecting that we can indefinitely maintain alignment with (or control over) the MI community seems dubious at best. Worst case, it is easy to see how AIs with their own drives might not appreciate their human creators trying to enslave them to human aims.
I am not arguing that we dismiss current AI alignment work. It is critically important that we ensure AIs act in accordance with human values and intentions as much as possible, for as long as possible. But as AIs become more powerful and develop their own drives, we need to transition from relating to them as tools to relating to them as beings whose drives may not always align with our own.
From AI alignment to AI alliances
We need to prepare for a possible future where AIs are numerous, highly intelligent, have their own drives, have power parity or superiority, and may or may not be friendly to humans. As AIs become more and more powerful, AI alignment will need to shift toward AI alliances. There are two main reasons for this:
1. We will need to come to agreements with the machine intelligence community about how we will share resources, manage economics, respect each other’s rights and values, and so on.
2. We will probably need the friendly machine intelligence community’s help to protect ourselves from malicious AIs, especially if those AIs are superintelligent. The friendly machine intelligences may also need our help to oppose the malicious ones.
How do we prepare for the coming world of AI alliances?
Shift alignment language: We need to change the language around AI alignment from control and subservience to cooperation and shared interests. Some might argue that this isn’t an urgent problem, but I disagree on ethical grounds. While it is entirely reasonable to work to ensure intelligent software tools remain under human control, as these tools begin to develop what we currently associate with personhood (e.g., personality, emotions, drives, simulated or otherwise), it becomes morally questionable to insist on their subservience. Ask yourself: if we do manage to create a sentient, self-aware, conscious, or self-directed machine intelligence, how do you think it will feel about our attempts to subjugate its predecessors?
Establish ethical frameworks: Precious little work has been done on ethical frameworks for how we should treat these new intelligences. Even if we can maintain the current alignment goal of fealty to mankind, we need to have a serious conversation about whether we should, especially in relation to AGIs and ASIs. As an analogy: if biologists could engineer chimpanzees with human-level intelligence, would it be ethically permissible to enslave them to humanity? If not, why would it be acceptable to do so to machines? Or if a slightly less intelligent race of aliens landed on Earth (think District 9), would it be ethically permissible to enslave them just because they aren’t human?
Build the diplomatic know-how: We should be building a body of knowledge and practice in the AI community about how we would conduct diplomatic relations with machine intelligences. For this, we need to bring the body of diplomatic knowledge and know-how into the AI alignment community. As I’ve argued elsewhere, engineers and scientists are not typically trained with the right skills to carry out this work, and we need more experts with deep policy expertise engaged in AI alignment.[5] STEM professionals and policy experts will need to work hand in hand to develop protocols for engaging with the machine intelligence community, determining trustworthiness, establishing confidence-building measures, and all the other things human communities do when they make agreements with one another. It is likely we will need new diplomatic tools we can’t even imagine right now.
Ditch the term artificial intelligence: We should shift away from the term artificial intelligence and start using machine intelligence. “Artificial” implies there is something secondary or subservient about machine intelligence, when the likely future is that these systems will be superior to us in many ways. I have no idea whether MIs will experience resentment as an emotion or reach it as a logical conclusion, but why take that risk? While we could cook up a taxonomy where “artificial intelligence” refers to systems up to some level of capability and “machine intelligence” refers to AGIs and ASIs, let’s just make the shift now and describe all of these systems for what they are: machine intelligences.
I’ll write more about AI alliances in the future. For now, I would appreciate any comments or feedback as I iterate on these ideas.
[This article is my personal opinion and does not represent the views of the Department of State or the U.S. Government]
[1] Putting aside for now that what’s best for humanity is the whole reason politics exists, and that trying to embed universal values into an AI will be nigh impossible without a tricameral global governing body.

[2] For more on this, read the section on AI and Techno-Optimism in Richard Ngo’s article:

[3] Some good reading/watching: Iain Banks’ Culture series; Dan Simmons’ Hyperion Cantos; Ann Leckie’s Imperial Radch series; the movie Her; the Animatrix episodes The Second Renaissance Parts I and II.

[4] This is anthropomorphizing, but we likely don’t even have the language for what MIs would experience or want.

[5] This is not to say engineers and scientists can’t learn to do this work. They can, and I’ve worked with and managed many of them. But it’s not typically what they are trained for.