A Case for Superhuman Governance, using AI

Ozzie Gooen

07 Jun 2024

I believe that:

AI-enhanced organization governance could be a potentially huge win in the next few decades.
AI-enhanced governance could allow organizations to reach superhuman standards, like having an expected "99.99%" reliability rate of not being corrupt or not telling lies.
While there are clear risks to AI-enhancement at the top levels of organizations, it’s likely that most of these can be managed, assuming that the implementers are reasonable.
AI-enhanced governance could synchronize well with AI company regulation. These companies would be well-placed to develop innovations and could hypothetically be incentivized to do much of the work. AI-enhanced governance might be necessary to ensure that these organizations are aligned with public interests.
More thorough investigation here could be promising for the effective altruism community.

Within effective altruism now, there’s a lot of work on governance and AI, but not much on using AI for governance. AI Governance typically focuses on using conventional strategies to oversee AI organizations, while AI Alignment research focuses on aligning AI systems. However, leveraging AI to improve human governance is an underexplored area that could complement these cause areas. You can think of it as “Organizational Alignment”, as a counterpoint to “AI Alignment.”

This article was written after some rough ideation I’ve done about this area. This isn’t at all a literature review or a research agenda. That said, for those interested in this topic, here are a few posts you might find interesting.

What is “AI-Assisted” Governance?

AI-Assisted Governance refers to improvements in governance that leverage artificial intelligence (AI), particularly focusing on rapidly advancing areas like Large Language Models (LLMs).

Example methods include:

Monitoring politicians and executives to identify and flag misaligned or malevolent behavior, ensuring accountability and integrity.
Enhancing epistemics and decision-making processes at the top levels of organizations, leading to more informed and rational strategies.
Facilitating more effective negotiations and trades between organizations, fostering better cooperation and coordination.
Assisting in writing and overseeing highly secure systems, such as implementing differential privacy and formally verified, bug-free decision-automation software, for use at managerial levels.

Arguments for Governance Improvements, Generally

There's already a lot of consensus in the rationalist and effective altruist communities about the importance of governance. See the topics on Global Governance, AI Governance, and Nonprofit Governance for more information.

Here are some main reasons why focusing on improving governance seems particularly promising:

Concentrated Leverage

Real-world influence is disproportionately concentrated in the hands of a relatively small number of leaders in government, business, and other pivotal institutions. This is especially true in the case of rapid AI progress. Improving the reasoning and actions of this select group is therefore perhaps the most targeted, tractable, and neglected way to shape humanity's long-term future. AI tools could offer uniquely potent levers to do so.

A lot of epistemic-enhancing work focuses on helping large populations. But some people will matter many times as much as others, and these people are often in key management positions.

Dramatic Room for Improvement

It's hard to look at recent political and business fiascos and have much confidence about what to expect in the future. It's harder still if you think that the world could get a lot crazier. I think there's wide acceptance in our communities that critical modern organizations are poorly equipped to deal with modern or upcoming challenges.

The main question is whether there are any effective approaches to improvement here. I would argue that AI assistance is a serious option.

Arguments for AI in Governance Improvements

Assuming that governance is an important area, why should we prioritize using AI to improve it?

Rapid Technological Progress

In contrast to most other domains relevant to improving governance, AI has seen remarkable and rapidly accelerating progress in recent years. LLMs are improving rapidly, and billions of dollars are now being invested in improving deep learning capabilities. We can expect this trend to continue.

Lack of Promising Alternatives

There aren't any AI-absent improvements that I could see making very large expected changes in governments or corporations in the next 20 years.

There seems to be surprisingly little interest in improving organizational boards. I don't know of any nonprofits focused on this, for example. Recent FTX and OpenAI board fiascos have demonstrated severe problems with boards, and I don't see these going away soon.
Despite decades of research into organizational psychology, human decision-making, statistics, and so on, government leaders continue to be severely lacking.
Better voting systems for governments seem nice, but very limited. I don't see these making substantial global differences in the next 20 years.
Current forecasting infrastructure is still a long way off from helping at executive levels. I think that next-generation AI-heavy systems could change that, but don't expect much from non-AI systems.

Reliable Oversight

One of the key challenges in holding human leaders accountable is the difficulty of comprehensively monitoring their actions, due to privacy concerns and logistical constraints. For instance, while it could be valuable to have 24/7 oversight of key decision-makers, few would consent to such invasive surveillance. In contrast, AI systems could be designed from the ground up to be fully transparent and amenable to constant monitoring.

Weak AI systems, in particular, could be both much easier to oversee and more effective at providing oversight than human-based approaches. For example, consider the challenge of ensuring that a team of 20 humans maintains 99.99% integrity - never accepting bribes or intentionally deceiving. Achieving this level of reliability with purely human oversight seems impractical. However, well-designed AI systems could potentially provide the necessary level of monitoring and control.
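To make the 99.99% figure more concrete, here is a rough back-of-the-envelope calculation. It assumes one particular reading of the target (the probability that nobody on the team commits a violation over some fixed period) and assumes that members fail independently of one another. Both are simplifying assumptions made for illustration, and the function names are hypothetical.

```python
# Back-of-the-envelope sketch: what a team-level integrity target implies
# per person. Assumes independent failures, which is a simplification.

def required_individual_reliability(team_target: float, team_size: int) -> float:
    """Per-person reliability p such that p ** team_size == team_target."""
    return team_target ** (1.0 / team_size)


def team_reliability(individual: float, team_size: int) -> float:
    """Probability that nobody on the team fails, given per-person reliability."""
    return individual ** team_size


if __name__ == "__main__":
    per_person = required_individual_reliability(0.9999, 20)
    print(f"Per-person reliability needed: {per_person:.6%}")
    print(f"Team of 20 at 99.9% each: {team_reliability(0.999, 20):.2%}")
```

Under these assumptions, each of the 20 people would need to be roughly 99.9995% reliable for the team as a whole to reach 99.99%, while a team of individually "99.9% reliable" people would only reach about 98%. This is one way to see why team-level targets of this kind look impractical without some additional monitoring layer.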

In the field of cybersecurity, social hacking is a well-known corporate vulnerability. Software controls and monitoring are often employed as a solution. As software capabilities advance, we can anticipate improvements in software-based mitigation of social hacking risks.

A simple step in this direction could be to require that teams rely on designated AI decision-support systems, at least in situations where deception or misconduct is most likely. More advanced "AI watchers" could eventually operate at scale to keep both human and machine agents consistently honest and aligned with organizational goals.

Complementary Human-AI Approaches

I'm particularly excited by the potential for well-designed AI governance tools to be mutually reinforcing with responsible human leadership, rather than purely substitutive. In the same way that the AI alignment research community aims to use limited AI assistants to help oversee and validate more advanced AI systems, we could use highly reliable AI to augment human oversight of both artificial and human agents. The complementary strengths and weaknesses of human and machine cognition could be a powerful combination. Humans fail in some predictable ways, and AIs fail in some predictable ways, but there seems to be only a small subset of strategies that could defeat both humans and AIs combined.

Synergies with Key AI Actors

The leading AI organizations and individuals are not only some of the most critical targets for improved oversight and alignment, but also the best positioned to pioneer effective AI governance techniques. They could have both the greatest need and capacity to innovate in this area. Leading AI teams are generally highly innovative, strong at using AI, and knowledgeable about AI oversight. These skills make them well suited to being the earliest candidates to execute and innovate on AI-Assisted Governance strategies.

If AI companies claim to be developing robust and secure AI systems for the public good, it's reasonable to ask them to embrace AI-assisted oversight of their own activities.

The Recursive Enforcement Problem

Traditional human oversight systems struggle to ensure accountability at the highest levels, as the question of "who oversees the overseers" often remains unresolved. This can lead to distrust and corruption.

AI systems can mitigate this issue: the highest-level oversight systems can be designed to maintain privacy while still revealing key problems. These AI systems can be thoroughly tested for competence and reliability, and can focus on ensuring the honesty and alignment of human overseers.

To enhance AI-assisted oversight:

Implement secure communication channels between AI and human overseers
Make sure that AIs reveal key alignment failures, while minimizing the disclosure of private information
Have human evaluators oversee these AI systems. These evaluators can do so without seeing critical private information.
Have a dialogue between the organization and other actors to ensure that the human evaluators of these AI systems do a good job.

Such a system would still face challenges in finding the best tradeoff between oversight and privacy, but it would introduce a lot more options than simply using humans to oversee other humans.

In a more advanced system, it seems possible that all entities could be overseen by certain limited AI overseers, in proportion to the need.

Strategy: Bootstrapping Effective AI Governance

AI-Assisted Governance could enable superhuman performance measures, such as "99.999% resistance to bribes or manipulation," to become feasible. This could provide a roadmap for governments to set increasingly stringent performance requirements that AI companies must meet in order to be permitted to develop more advanced and potentially dangerous technologies.

Prudent governments would start with highly conservative rules that impose rigorous demands and grant limited permissions. If AI developers demonstrate their ability to meet the initial performance benchmarks, the rules for higher levels would be reassessed.

With robust government regulation, this approach could create a powerful positive feedback loop, where at each stage, the system is expected to become safer, even as capabilities grow.

For instance:

AI companies enhance AI capabilities up to the limits they are permitted.
To meet the required standards, AI companies would be incentivized to incorporate AI capabilities into innovative governance tools. Their existing expertise and innovation capabilities in AI put them in a strong position to do so.
Other parties adopt some of these new governance tools. Importantly, some are implemented by government regulators, and many are implemented by AI organization evaluators and auditors.
Regulators review recent changes in AI development and developer governance, and update regulatory standards accordingly. If it appears very safe to do so, they may gradually relax certain measures. These regulators also leverage recent AI advancements to assist in setting standards.
The cycle begins anew.
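The cycle above can be sketched as a toy model. Everything specific here is an illustrative assumption rather than a proposal: the tier names, the reliability thresholds, the idea that an auditor produces a single reliability score, and the tenfold improvement per cycle are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tier:
    name: str
    required_reliability: float  # governance benchmark a developer must demonstrate


# Hypothetical tiers, ordered from least to most capable permitted systems.
# More capable tiers demand stricter governance benchmarks.
TIERS = [
    Tier("tier-0", 0.99),
    Tier("tier-1", 0.999),
    Tier("tier-2", 0.9999),
]


def permitted_tier(audited_reliability: float,
                   tiers: List[Tier] = TIERS) -> Optional[Tier]:
    """Return the highest tier whose benchmark the audited score meets, or None."""
    allowed = None
    for tier in tiers:
        if audited_reliability >= tier.required_reliability:
            allowed = tier
        else:
            break
    return allowed


def run_cycles(initial_failure_rate: float, rounds: int) -> List[str]:
    """Simulate repeated audit cycles and record the tier granted in each."""
    failure_rate = initial_failure_rate
    history = []
    for _ in range(rounds):
        tier = permitted_tier(1.0 - failure_rate)
        history.append(tier.name if tier else "none")
        # Assumption: each cycle's new governance tooling cuts failures tenfold.
        failure_rate /= 10
    return history


if __name__ == "__main__":
    print(run_cycles(0.05, 4))
```

In this toy model a developer starting with a 5% governance failure rate is granted no permissions at first, and only moves up a tier after each round of demonstrated improvement. A real scheme would also need the regulator to revise the thresholds themselves between cycles, which this sketch leaves out.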

Just as we might bootstrap safe AI systems, this could effectively bootstrap safe AI organizations. However, this self-reinforcing dynamic still necessitates robust external checks and balances. Independent auditors and evaluators with strong technical expertise and an adversarial mandate will be crucial to stress-test the AI governance solutions and prevent self-deception or manipulation by AI companies.

Objection 1: Would AI-Assisted Governance Increase AI Risk?

Some may object that integrating AI into governance structures could actually make dangerous AI outcomes more likely. After all, if we are already concerned about advanced AI systems seizing control, wouldn't putting them in positions of power be enormously risky?

There are certainly some very unwise ways one could combine AI and governance, such as recklessly deploying an untested AGI system to replace human leaders. However, I believe that with sufficient care and incremental development, we can find balanced approaches that capture the benefits of AI-assisted governance while mitigating the risks.

"AI" is really an extremely broad category, encompassing a vast range of potential systems with radically different capabilities and risk profiles. Just as we use "technology" to counter misuse of "technology", or "politicians" to oversee other "politicians", we can leverage safer and more constrained AI systems to help govern the development and deployment of more advanced AI. A large spectrum exists between "AIs that are useful" and "AIs that pose existential risk". Many systems could offer significant benefits to governance without being capable of dangerous self-improvement or independent power-seeking.

For example, we could use heavily restricted AI systems to monitor communications and flag potential misconduct, without giving them any executive powers. We could have more advanced AI systems oversee the deployment of more mature and stable ones. Various oversight regimes and "AI watchers" could be implemented to maintain accountability. The key is to start with highly limited and transparent systems, and gradually scale up oversight and functionality as we gain confidence.

It's also worth noting that by the time AIs are truly dangerous, they may gain power regardless of where they are initially deployed. The most concerning AGI scenarios might not be worsened by responsible AI governance efforts in the interim. On the contrary, having more robust governance structures in place, bolstered by our most dependable AI tools, may actually help us navigate the emergence of advanced AI more safely.

In particular, well-designed AI oversight could make institutions more resistant to subversion and misuse by malicious agents, both human and artificial. For instance, requiring both human and AI authorization for sensitive actions could make unilateral corruption or deception much harder. Ultimately, while caution is certainly warranted, I suspect the protective benefits of AI-assisted governance are greater than the risks, if implemented thoughtfully.
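As a minimal sketch of the dual-authorization idea, the snippet below requires sign-off from both a human and an AI reviewer before a sensitive action goes through. The `Action` type, the `authorize` function, and the keyword-based `placeholder_ai_review` are hypothetical stand-ins; a real AI reviewer would be far more involved than a string check.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    description: str
    sensitive: bool


# An approver is any callable that accepts an Action and returns True or False.
Approver = Callable[[Action], bool]


def authorize(action: Action, human_approves: Approver, ai_approves: Approver) -> bool:
    """Every action needs human sign-off; sensitive actions also need AI sign-off."""
    if not human_approves(action):
        return False
    if action.sensitive:
        return ai_approves(action)
    return True


def placeholder_ai_review(action: Action) -> bool:
    # Stand-in for a real model-based check; rejects one hard-coded pattern.
    return "undisclosed" not in action.description.lower()


if __name__ == "__main__":
    def always_yes(action: Action) -> bool:
        return True

    payment = Action("Wire undisclosed payment to vendor", sensitive=True)
    budget = Action("Approve audited budget", sensitive=True)
    print(authorize(payment, always_yes, placeholder_ai_review))
    print(authorize(budget, always_yes, placeholder_ai_review))
```

The point of the design is that neither party can act alone on sensitive matters: a compromised human still needs the AI check to pass, and the AI has no ability to initiate or approve anything without a human.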

Objection 2: Will AI Governance Dangerously Increase Complexity?

Another reasonable concern is that introducing AI into already complex governance structures could make the overall system more intricate and unpredictable. Highly capable AI agents engaging in strategic interactions could generate dynamics far beyond human understanding, leading to a loss of control. In the worst case, this opacity and complexity could make the whole system more fragile and vulnerable to catastrophic failure modes.

This is a serious challenge that would need to be carefully managed. Highly advanced AI pursuing incomprehensible strategies could certainly introduce dangerous volatility and uncertainty. We would need to invest heavily in interpretability and oversight capabilities to maintain a sufficient grasp on what's happening.

However, I suspect that thoughtful AI-assisted governance is more likely to ultimately reduce destructive unpredictability than increase it. Currently, many high-stakes decisions are made by individual human leaders with unchecked biases and failure modes. The rationality and consistency of well-designed AI systems could help stabilize decision-making and make it more legible.

There's a long track record of judicious optimization and standardization increasing the reliability and predictability of essential systems and infrastructures, from units of measurement to building codes to economic policy. I expect we could achieve similar benefits for governance by applying our most capable and aligned AI tools, e.g. to help leaders converge on more rational and evidence-based policies.

Some added complexity is likely inevitable when using advanced tools to improve old processes. The key question is whether the benefits justify the complexity. Given the extraordinary importance of good governance, and its ability to help reduce complexity where that is really needed, I believe some additional complexity in certain places is an acceptable price to pay for much more capable and aligned decision-making.

Objection 3: Could AI Governance Tools Backfire by Causing Overconfidence?

A final concern to consider is that AI-assisted governance tools might not work nearly as well as hoped, but could still engender overconfidence and complacency. If people perceive AI involvement as a panacea that makes governance trustworthy and reliable by default, they may grow overly deferential and reduce their vigilance. Flawed AI tools and overreliance on them could then cause more harm than if we had maintained a healthy skepticism.

This risk of misplaced confidence is real and will require active effort to combat. Any development and deployment of AI governance tools must be accompanied by clear communication about their limitations and failure modes. Overselling their capabilities or implying that they make conventional oversight and accountability unnecessary would be deeply irresponsible.

However, if AI governance researchers maintain a sober and realistic outlook, I'm optimistic that we can realize real gains without much undue hype. It's an empirical question how much the benefits of AI assistance will outweigh the drawbacks of overconfidence, but I suspect we can achieve a favorable balance. Note too that AI improvements in epistemic capabilities could help in making these decisions.

What Should Effective Altruists Do?

If governments mandated exceptional levels of governance, AI organizations would be driven to undertake the necessary work to achieve those standards. Therefore, effective altruists would ideally concentrate on ensuring the implementation of appropriate regulations, if that specific (perhaps very unlikely) approach seems viable.

Determining the specifics of such regulations would be challenging, as would be the process of getting them approved. Setting benchmarks for organizational governance is not a straightforward task and warrants further investigation. It's important to note that these standards may be initially unattainable using current technology, provided there is a path to eventually meeting them with future AI advancements.

However, there are numerous other entities for which we desire excellent governance, but they may lack the capabilities or motivation to conduct the required research. In such cases, some level of philanthropic support in this area appears to be warranted.

I'll also note: the main writing I've seen in this space comes from the AI policy sector. I'm thankful for this, but I'd like to see more ideation from the tech sector. I'd be eager to get better futurist ideas and demos of stories where creative technologies make organizational governance amazing in the next 50 years. Similar to my desire to see more great futuristic epistemic ideas, I'd love to see more futuristic governance ideas as well.