導(dǎo)讀
由三位圖靈獎得主約書亞·本吉奧(Yoshua Bengio)、杰佛瑞·辛頓(Geoffrey Hinton)、姚期智(Andrew Yao,清華大學(xué)人工智能國際治理研究院學(xué)術(shù)委員會主席)領(lǐng)銜,連同多位權(quán)威專家,包括諾貝爾經(jīng)濟學(xué)獎得主丹尼爾·卡內(nèi)曼(Daniel Kahneman)以及清華大學(xué)講席教授張亞勤(清華大學(xué)人工智能國際治理研究院學(xué)術(shù)委員)、文科資深教授薛瀾(清華大學(xué)蘇世民書院院長、公共管理學(xué)院學(xué)術(shù)委員會主任、清華大學(xué)人工智能國際治理研究院院長)等共同撰寫的文章“Managing extreme AI risks amid rapid progress”(《人工智能飛速進步背景下的極端風(fēng)險管理》)于2024年5月20日發(fā)表于美國《科學(xué)》雜志。
現(xiàn)將全文進行翻譯,以饗讀者。

文章摘要
Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI’s impact, with risks that include large-scale social harms, malicious uses, and an irreversible loss of human control over autonomous AI systems. Although researchers have warned of extreme risks from AI [1], there is a lack of consensus about how to manage them. Society’s response, despite promising first steps, is incommensurate with the possibility of rapid, transformative progress that is expected by many experts. AI safety research is lagging. Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness and barely address autonomous systems. Drawing on lessons learned from other safety-critical technologies, we outline a comprehensive plan that combines technical research and development (R&D) with proactive, adaptive governance mechanisms for a more commensurate preparation.
在人工智能(AI)快速發(fā)展之時,眾多企業(yè)正將重點轉(zhuǎn)移至開發(fā)能夠自主行動和追求目標的通用AI系統(tǒng)。能力與自主性的提升可能很快大幅放大AI的影響,帶來大規(guī)模社會危害、惡意使用以及人類不可逆地失去對自主AI系統(tǒng)的控制等風(fēng)險。盡管已有研究人員對AI的極端風(fēng)險發(fā)出警告,但對于如何管理這些風(fēng)險依然缺乏共識。盡管已經(jīng)有了一些很有希望的初步舉措,但與許多專家所預(yù)期的快速、變革性進展的可能性相比,當(dāng)前的社會回應(yīng)仍不相稱。AI安全研究已經(jīng)滯后。目前的治理舉措缺少預(yù)防濫用和魯莽行為的機制與機構(gòu),且?guī)缀鯖]有涉及自主系統(tǒng)。借鑒其他安全關(guān)鍵技術(shù)的經(jīng)驗教訓(xùn),我們提出了一項將技術(shù)研發(fā)與主動、適應(yīng)性的治理機制相結(jié)合的綜合方案,以便做好與風(fēng)險相稱的準備。
01 RAPID PROGRESS, HIGH STAKES
快速發(fā)展,高風(fēng)險
Present deep-learning systems still lack important capabilities, and we do not know how long it will take to develop them. However, companies are engaged in a race to create generalist AI systems that match or exceed human abilities in most cognitive work. They are rapidly deploying more resources and developing new techniques to increase AI capabilities, with investment in training state-of-the-art models tripling annually.
當(dāng)前的深度學(xué)習(xí)系統(tǒng)仍然缺乏一些重要能力,我們也不知道開發(fā)出這些能力還需要多長時間。然而,各公司正競相創(chuàng)造在大多數(shù)認知工作中匹配或超越人類能力的通用AI系統(tǒng)。它們正在快速投入更多資源、開發(fā)新技術(shù)以提高AI能力,用于訓(xùn)練最先進模型的投資每年增至原來的三倍。
There is much room for further advances because tech companies have the cash reserves needed to scale the latest training runs by multiples of 100 to 1000. Hardware and algorithms will also improve: AI computing chips have been getting 1.4 times more cost-effective, and AI training algorithms 2.5 times more efficient, each year. Progress in AI also enables faster AI progress — AI assistants are increasingly used to automate programming, data collection, and chip design.
AI還有巨大的進一步發(fā)展空間,因為科技公司擁有的現(xiàn)金儲備足以將當(dāng)前最大規(guī)模的訓(xùn)練再擴大100到1000倍。硬件和算法也將持續(xù)改進:AI計算芯片的成本效益每年提升約1.4倍,AI訓(xùn)練算法的效率每年提升約2.5倍。AI的進步也會加速AI自身的發(fā)展——AI助手正越來越多地被用于自動化編程、數(shù)據(jù)收集和芯片設(shè)計。
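下面給出一段示意性的Python代碼,僅用于說明上文引用的三項年增長率(投資約×3、芯片成本效益約×1.4、算法效率約×2.5)疊加后對“有效訓(xùn)練算力”的復(fù)合影響;把三者直接相乘的簡化假設(shè)、變量名與年限均為筆者為說明問題而設(shè),并非原文給出的正式模型。

```python
# 示意性估算:按上文引用的年增長率,粗略推算“有效訓(xùn)練算力”的復(fù)合增長。
# 假設(shè)三項因素相互獨立且可直接相乘,數(shù)值僅作說明。

INVESTMENT_GROWTH = 3.0   # 訓(xùn)練投入每年約增至 3 倍
HARDWARE_GROWTH = 1.4     # 每美元可得算力每年約增至 1.4 倍
ALGORITHM_GROWTH = 2.5    # 算法效率每年約增至 2.5 倍(等效算力)

def effective_compute_multiplier(years: int) -> float:
    """返回 years 年后有效訓(xùn)練算力相對當(dāng)前的倍數(shù)(粗略示意)。"""
    per_year = INVESTMENT_GROWTH * HARDWARE_GROWTH * ALGORITHM_GROWTH
    return per_year ** years

if __name__ == "__main__":
    for years in (1, 3, 5):
        multiple = effective_compute_multiplier(years)
        print(f"{years} 年后的有效算力約為當(dāng)前的 {multiple:,.0f} 倍")
```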
There is no fundamental reason for AI progress to slow or halt at human-level abilities. Indeed, AI has already surpassed human abilities in narrow domains such as playing strategy games and predicting how proteins fold. Compared with humans, AI systems can act faster, absorb more knowledge, and communicate at a higher bandwidth. Additionally, they can be scaled to use immense computational resources and can be replicated by the millions.
沒有根本性的理由認為,AI的進步會在達到人類水平的能力時放緩或停止。事實上,AI已經(jīng)在玩策略游戲和預(yù)測蛋白質(zhì)如何折疊等狹窄領(lǐng)域超越了人類的能力。與人類相比,AI系統(tǒng)可以更快地行動、吸收更多的知識,并以更高的帶寬進行通信。此外,它們可以通過擴展來使用巨大的計算資源,并被復(fù)制出數(shù)以百萬計的副本。
We do not know for certain how the future of AI will unfold. However, we must take seriously the possibility that highly powerful generalist AI systems that outperform human abilities across many critical domains will be developed within this decade or the next. What happens then? More capable AI systems have larger impacts. Especially as AI matches and surpasses human workers in capabilities and cost-effectiveness, we expect a massive increase in AI deployment, opportunities, and risks. If managed carefully and distributed fairly, AI could help humanity cure diseases, elevate living standards, and protect ecosystems. The opportunities are immense.
我們無法確定AI的未來將如何發(fā)展。然而,我們必須認真對待這樣一種可能性:在這個十年或下一個十年內(nèi),將開發(fā)出在許多關(guān)鍵領(lǐng)域超越人類能力的高度強大的通用AI系統(tǒng)。那時會發(fā)生什么?能力更強的AI系統(tǒng)會產(chǎn)生更大的影響。特別是當(dāng)AI在能力和成本效益上趕上甚至超過人類工作者時,我們預(yù)計AI的部署、機遇和風(fēng)險都將大幅增加。如果管理得當(dāng)且分配公平,AI可以幫助人類治愈疾病、提高生活水平、保護生態(tài)系統(tǒng)。機遇是巨大的。
But alongside advanced AI capabilities come large-scale risks. AI systems threaten to amplify social injustice, erode social stability, enable large-scale criminal activity, and facilitate automated warfare, customized mass manipulation, and pervasive surveillance.
但先進的AI能力也伴隨著大規(guī)模風(fēng)險。AI系統(tǒng)可能加劇社會不公、侵蝕社會穩(wěn)定、助長大規(guī)模犯罪活動,并助推自動化戰(zhàn)爭、定制化的大規(guī)模操縱和無處不在的監(jiān)視。
Many risks could soon be amplified, and new risks created, as companies work to develop autonomous AI: systems that can use tools such as computers to act in the world and pursue goals. Malicious actors could deliberately embed undesirable goals. Without R&D breakthroughs (see next section), even well-meaning developers may inadvertently create AI systems that pursue unintended goals: The reward signal used to train AI systems usually fails to fully capture the intended objectives, leading to AI systems that pursue the literal specification rather than the intended outcome. Additionally, the training data never captures all relevant situations, leading to AI systems that pursue undesirable goals in new situations encountered after training.
隨著各公司致力于開發(fā)自主AI(即能夠使用計算機等工具在現(xiàn)實世界中行動并追求目標的系統(tǒng)),許多風(fēng)險可能很快被放大,新的風(fēng)險也會隨之產(chǎn)生。惡意行為者可能故意給系統(tǒng)嵌入不良目標。如果沒有研發(fā)突破(見下一節(jié)),即使是善意的開發(fā)者也可能在無意中創(chuàng)造出追求非預(yù)期目標的AI系統(tǒng):用于訓(xùn)練AI系統(tǒng)的獎勵信號通常無法完全刻畫預(yù)期目標,導(dǎo)致AI系統(tǒng)追求字面上的規(guī)范而非預(yù)期的結(jié)果。此外,訓(xùn)練數(shù)據(jù)永遠無法涵蓋所有相關(guān)情況,導(dǎo)致AI系統(tǒng)在訓(xùn)練后遇到新情況時追求不良目標。
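為了更直觀地說明“獎勵信號未能完全刻畫預(yù)期目標”這一機制,下面給出一個極簡的Python玩具示例:真實意圖是“把房間真正打掃干凈”,而訓(xùn)練所用的代理獎勵只統(tǒng)計“上報清掃的次數(shù)”,于是按代理獎勵選出的“最優(yōu)”策略并不是我們想要的策略。示例中的環(huán)境、策略與數(shù)值均為虛構(gòu)假設(shè),僅作概念演示,不對應(yīng)文中提及的任何真實系統(tǒng)。

```python
# 玩具示例:代理獎勵(proxy reward)與真實意圖(intended objective)不一致時,
# 按代理獎勵選出的“最優(yōu)”策略并不是我們想要的策略。環(huán)境與數(shù)值均為虛構(gòu),僅作說明。

from dataclasses import dataclass

@dataclass
class Outcome:
    rooms_actually_cleaned: int   # 真實意圖關(guān)心的量
    cleaning_reports_filed: int   # 獎勵信號實際統(tǒng)計的量(代理指標)

POLICIES = {
    "認真打掃": Outcome(rooms_actually_cleaned=3, cleaning_reports_filed=3),
    "只上報不打掃": Outcome(rooms_actually_cleaned=0, cleaning_reports_filed=10),
}

def proxy_reward(o: Outcome) -> float:
    # 訓(xùn)練時使用的獎勵:只看上報次數(shù),未能完全刻畫“真正打掃干凈”的意圖
    return float(o.cleaning_reports_filed)

def intended_value(o: Outcome) -> float:
    # 我們真正希望優(yōu)化的量
    return float(o.rooms_actually_cleaned)

best_by_proxy = max(POLICIES, key=lambda name: proxy_reward(POLICIES[name]))
best_by_intent = max(POLICIES, key=lambda name: intended_value(POLICIES[name]))

print("按代理獎勵選出的策略:", best_by_proxy)    # “只上報不打掃”
print("按真實意圖選出的策略:", best_by_intent)   # “認真打掃”
```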
Once autonomous AI systems pursue undesirable goals, we may be unable to keep them in check. Control of software is an old and unsolved problem: computer worms have long been able to proliferate and avoid detection. However, AI is making progress in critical domains such as hacking, social manipulation, and strategic planning and may soon pose unprecedented control challenges. To advance undesirable goals, AI systems could gain human trust, acquire resources, and influence key decision-makers. To avoid human intervention, they might copy their algorithms across global server networks. In open conflict, AI systems could autonomously deploy a variety of weapons, including biological ones. AI systems having access to such technology would merely continue existing trends to automate military activity. Finally, AI systems will not need to plot for influence if it is freely handed over. Companies, governments, and militaries may let autonomous AI systems assume critical societal roles in the name of efficiency.
一旦自主AI系統(tǒng)開始追求不良目標,我們可能就無法再約束它們。對軟件的控制是一個由來已久且尚未解決的問題:計算機蠕蟲長期以來一直能夠擴散并躲避檢測。然而,AI正在黑客攻擊、社交操縱和戰(zhàn)略規(guī)劃等關(guān)鍵領(lǐng)域取得進展,可能很快帶來前所未有的控制挑戰(zhàn)。為了推進不良目標,AI系統(tǒng)可能贏得人類信任、獲取資源并影響關(guān)鍵決策者。為了逃避人類干預(yù),它們可能在全球服務(wù)器網(wǎng)絡(luò)上復(fù)制自己的算法。在公開沖突中,AI系統(tǒng)可能自主部署各類武器,包括生物武器。AI系統(tǒng)獲得此類技術(shù),只不過是延續(xù)了既有的軍事活動自動化趨勢。最后,如果影響力被拱手讓出,AI系統(tǒng)根本無需謀劃去獲取影響力:公司、政府和軍隊都可能以效率之名,讓自主AI系統(tǒng)承擔(dān)關(guān)鍵的社會角色。
Without sufficient caution, we may irreversibly lose control of autonomous AI systems, rendering human intervention ineffective. Large-scale cybercrime, social manipulation, and other harms could escalate rapidly. This unchecked AI advancement could culminate in a large scale loss of life and biosphere, and the marginalization or extinction of humanity.
如果缺乏足夠的謹慎,我們可能不可逆轉(zhuǎn)地失去對自主AI系統(tǒng)的控制,使人類的干預(yù)失效。大規(guī)模網(wǎng)絡(luò)犯罪、社交操縱和其他危害可能迅速升級。這種不受約束的AI進展,最終可能導(dǎo)致大規(guī)模的生命損失和生物圈破壞,乃至人類被邊緣化或走向滅絕。
We are not on track to handle these risks well. Humanity is pouring vast resources into making AI systems more powerful but far less into their safety and mitigating their harms. Only an estimated 1 to 3% of AI publications are on safety. For AI to be a boon, we must reorient; pushing AI capabilities alone is not enough.
我們目前的態(tài)勢并不足以妥善應(yīng)對這些風(fēng)險。人類正投入大量資源讓AI系統(tǒng)變得更強大,但在安全性和減輕危害方面的投入?yún)s少得多。據(jù)估計,只有1%到3%的AI出版物與安全有關(guān)。要讓AI成為福祉,我們必須重新調(diào)整方向;僅僅推動AI能力發(fā)展是不夠的。
We are already behind schedule for this reorientation. The scale of the risks means that we need to be proactive, because the costs of being unprepared far outweigh those of premature preparation. We must anticipate the amplification of ongoing harms, as well as new risks, and prepare for the largest risks well before they materialize.
在這一重新調(diào)整上,我們已經(jīng)落后于時間表。風(fēng)險的規(guī)模意味著我們必須積極主動,因為毫無準備的代價遠大于過早準備的代價。我們必須預(yù)見到現(xiàn)有危害的進一步放大以及新風(fēng)險的出現(xiàn),并在最大的風(fēng)險成為現(xiàn)實之前早早做好準備。
02 REORIENT TECHNICAL R&D
重新調(diào)整技術(shù)研發(fā)
There are many open technical challenges in ensuring the safety and ethical use of generalist, autonomous AI systems. Unlike advancing AI capabilities, these challenges cannot be addressed by simply using more computing power to train bigger models. They are unlikely to resolve automatically as AI systems get more capable and require dedicated research and engineering efforts. In some cases, leaps of progress may be needed; we thus do not know whether technical work can fundamentally solve these challenges in time. However, there has been comparatively little work on many of these challenges. More R&D may thus facilitate progress and reduce risks. A first set of R&D areas needs breakthroughs to enable reliably safe AI. Without this progress, developers must either risk creating unsafe systems or falling behind competitors who are willing to take more risks. If ensuring safety remains too difficult, extreme governance measures would be needed to prevent corner-cutting driven by competition and overconfidence. These R&D challenges include the following:
在確保通用、自主AI系統(tǒng)的安全和道德使用方面,仍存在許多尚未解決的技術(shù)挑戰(zhàn)。與提升AI能力不同,這些挑戰(zhàn)無法僅靠用更多算力訓(xùn)練更大的模型來解決;它們也不太可能隨著AI系統(tǒng)能力的提高而自動化解,而是需要專門的研究和工程投入。在某些情況下,可能需要跨越式的進展;因此,我們并不知道技術(shù)工作能否及時從根本上解決這些挑戰(zhàn)。然而,目前針對其中許多挑戰(zhàn)開展的工作相對較少,更多的研發(fā)因此有望促進進展、降低風(fēng)險。第一組研發(fā)領(lǐng)域需要取得突破,才能實現(xiàn)可靠安全的AI。如果沒有這方面的進展,開發(fā)者就只能要么冒著創(chuàng)建不安全系統(tǒng)的風(fēng)險,要么落后于更愿意冒險的競爭對手。如果確保安全始終過于困難,就需要采取極端的治理措施,以防止因競爭和過度自信導(dǎo)致的“偷工減料”。這些研發(fā)挑戰(zhàn)包括以下內(nèi)容:
Oversight and honesty More capable AI systems can better exploit weaknesses in technical oversight and testing, for example, by producing false but compelling output.
監(jiān)督與誠實 能力更強的AI系統(tǒng)能更好地利用技術(shù)監(jiān)督和測試中的弱點,例如生成虛假但令人信服的輸出。
Robustness AI systems behave unpredictably in new situations. Whereas some aspects of robustness improve with model scale, other aspects do not or even get worse.
魯棒性 AI系統(tǒng)在新情況下的表現(xiàn)難以預(yù)測。魯棒性的某些方面會隨著模型規(guī)模而改善,而其他方面則不會,甚至?xí)兊酶恪?/p>
Interpretability and transparency AI decision-making is opaque, with larger, more capable models being more complex to interpret. So far, we can only test large models through trial and error. We need to learn to understand their inner workings.
可解釋性和透明度 AI決策是不透明的,規(guī)模更大、能力更強的模型就更難以解釋。到目前為止,我們只能通過試錯來測試大模型。我們需要學(xué)會去理解它們的內(nèi)部運作機制。
Inclusive AI development AI advancement will need methods to mitigate biases and integrate the values of the many populations it will affect.
包容的AI發(fā)展 AI的發(fā)展需要各種方法來減輕偏見,并整合其將會影響的眾多群體的價值觀。
Addressing emerging challenges Future AI systems may exhibit failure modes that we have so far seen only in theory or lab experiments, such as AI systems taking control over the training reward-provision channels or exploiting weaknesses in our safety objectives and shutdown mechanisms to advance a particular goal.
應(yīng)對新興挑戰(zhàn) 未來的AI系統(tǒng)可能表現(xiàn)出迄今我們僅在理論或?qū)嶒炇覍嶒炛幸姷竭^的失效模式,例如AI系統(tǒng)控制提供訓(xùn)練獎勵的渠道,或利用我們的安全目標和關(guān)閉機制中的弱點來推進某一特定目標。
A second set of R&D challenges needs progress to enable effective, risk-adjusted governance or to reduce harms when safety and governance fail.
第二組研發(fā)挑戰(zhàn)需要取得進展,才能實現(xiàn)有效的、與風(fēng)險相適應(yīng)的治理,或在安全措施和治理失效時減少危害。
Evaluation for dangerous capabilities As AI developers scale their systems, unforeseen capabilities appear spontaneously, without explicit programming. They are often only discovered after deployment. We need rigorous methods to elicit and assess AI capabilities and to predict them before training. This includes both generic capabilities to achieve ambitious goals in the world (e.g., long-term planning and execution) as well as specific dangerous capabilities based on threat models (e.g., social manipulation or hacking). Present evaluations of frontier AI models for dangerous capabilities, which are key to various AI policy frameworks, are limited to spot-checks and attempted demonstrations in specific settings. These evaluations can sometimes demonstrate dangerous capabilities but cannot reliably rule them out: AI systems that lacked certain capabilities in the tests may well demonstrate them in slightly different settings or with post training enhancements. Decisions that depend on AI systems not crossing any red lines thus need large safety margins. Improved evaluation tools decrease the chance of missing dangerous capabilities, allowing for smaller margins.
評估危險能力 隨著AI開發(fā)人員對系統(tǒng)進行擴展,不可預(yù)見的能力會在沒有明確編程的情況下自發(fā)出現(xiàn),而且往往在部署后才被發(fā)現(xiàn)。我們需要嚴格的方法來激發(fā)和評估AI能力,并在訓(xùn)練前對其進行預(yù)測。這既包括在現(xiàn)實世界中實現(xiàn)遠大目標的通用能力(如長期規(guī)劃和執(zhí)行),也包括基于威脅模型的特定危險能力(如社交操縱或黑客攻擊)。目前對前沿AI模型危險能力的評估是各類AI政策框架的關(guān)鍵,但這些評估僅限于特定設(shè)置下的抽查和嘗試性演示。這類評估有時能夠證實危險能力的存在,卻無法可靠地排除它們:在測試中未表現(xiàn)出某些能力的AI系統(tǒng),很可能在稍有不同的設(shè)置下或經(jīng)過訓(xùn)練后增強后展現(xiàn)出這些能力。因此,那些以AI系統(tǒng)不越過任何紅線為前提的決策,需要留出較大的安全邊際。改進的評估工具可以降低遺漏危險能力的幾率,從而允許更小的安全邊際。
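下面用一段示意性的Python代碼說明“抽查式評估只能證實、難以排除危險能力,因而需要預(yù)留安全邊際”的判定邏輯:只有當(dāng)測得分數(shù)加上預(yù)設(shè)的安全邊際仍低于紅線閾值時,才判定“未越線”。其中的分數(shù)、閾值、邊際大小和函數(shù)名都是為說明概念而假設(shè)的,并非任何現(xiàn)行評估框架的實際接口。

```python
# 示意:帶安全邊際(safety margin)的紅線判定邏輯。
# 抽查式評估可能低估能力,因此判定“未越過紅線”時需留出余量;所有數(shù)值與名稱均為說明性假設(shè)。

def crosses_red_line(measured_score: float,
                     red_line_threshold: float,
                     safety_margin: float) -> bool:
    """若測得分數(shù)加上安全邊際達到或超過紅線閾值,則視為可能越線。"""
    return measured_score + safety_margin >= red_line_threshold

# 假想的評估結(jié)果:同一模型在不同設(shè)置下的抽查分數(shù)(0~100)
spot_check_scores = {"社交操縱": 41.0, "自主復(fù)制": 12.5, "網(wǎng)絡(luò)入侵": 58.0}

RED_LINE = 70.0        # 假設(shè)的紅線閾值
WIDE_MARGIN = 25.0     # 評估工具較粗糙時,需要較大的安全邊際
NARROW_MARGIN = 10.0   # 評估工具改進后,可允許較小的安全邊際

for capability, score in spot_check_scores.items():
    risky_wide = crosses_red_line(score, RED_LINE, WIDE_MARGIN)
    risky_narrow = crosses_red_line(score, RED_LINE, NARROW_MARGIN)
    print(f"{capability}: 粗糙評估下判定可能越線 = {risky_wide}; "
          f"改進評估后判定可能越線 = {risky_narrow}")
```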
Evaluating AI alignment If AI progress continues, AI systems will eventually possess highly dangerous capabilities. Before training and deploying such systems, we need methods to assess their propensity to use these capabilities. Purely behavioral evaluations may fail for advanced AI systems: Similar to humans, they might behave differently under evaluation, faking alignment.
評估人工智能對齊 如果AI繼續(xù)進步,AI系統(tǒng)最終將擁有高度危險的能力。在訓(xùn)練和部署此類系統(tǒng)之前,我們需要有方法來評估它們使用這些能力的傾向。對于先進的AI系統(tǒng),單純的行為評估可能會失效:與人類類似,它們可能在被評估時表現(xiàn)得不同,偽裝出對齊的樣子。
Risk assessment We must learn to assess not just dangerous capabilities but also risk in a societal context, with complex interactions and vulnerabilities. Rigorous risk assessment for frontier AI systems remains an open challenge owing to their broad capabilities and pervasive deployment across diverse application areas.
風(fēng)險評估 我們不僅要學(xué)會評估危險能力,還要學(xué)會評估AI在具有復(fù)雜交互和脆弱性的社會背景中帶來的風(fēng)險。由于前沿AI系統(tǒng)能力廣泛,并被普遍部署于各類應(yīng)用領(lǐng)域,對其進行嚴格的風(fēng)險評估仍是一個懸而未決的挑戰(zhàn)。
Resilience Inevitably, some will misuse or act recklessly with AI. We need tools to detect and defend against AI-enabled threats such as large-scale influence operations, biological risks, and cyberattacks. However, as AI systems become more capable, they will eventually be able to circumvent humanmade defenses. To enable more powerful AI based defenses, we first need to learn how to make AI systems safe and aligned.
韌性 不可避免地,有些人會濫用AI或在使用AI時魯莽行事。我們需要各類工具來檢測和防御AI賦能的威脅,如大規(guī)模影響力行動、生物風(fēng)險和網(wǎng)絡(luò)攻擊。然而,隨著AI系統(tǒng)能力越來越強,它們最終將能夠繞過人類構(gòu)建的防御。要實現(xiàn)更強大的基于AI的防御,我們首先需要學(xué)會如何讓AI系統(tǒng)安全且對齊。
Given the stakes, we call on major tech companies and public funders to allocate at least one-third of their AI R&D budget, comparable to their funding for AI capabilities, toward addressing the above R&D challenges and ensuring AI safety and ethical use. Beyond traditional research grants, government support could include prizes, advance market commitments, and other incentives. Addressing these challenges, with an eye toward powerful future systems, must become central to our field.
鑒于事關(guān)重大,我們呼吁大型科技公司和公共資助者至少將其AI研發(fā)預(yù)算的三分之一(與其投入AI能力研發(fā)的資金相當(dāng))用于解決上述研發(fā)挑戰(zhàn)、確保AI的安全和道德使用。除了傳統(tǒng)的研究資助外,政府的支持還可以包括獎金、預(yù)先市場承諾等其他激勵措施。著眼于強大的未來AI系統(tǒng)來應(yīng)對這些挑戰(zhàn),必須成為我們這一領(lǐng)域的核心工作。
03 GOVERNANCE MEASURES
治理措施
We urgently need national institutions and international governance to enforce standards that prevent recklessness and misuse. Many areas of technology, from pharmaceuticals to financial systems and nuclear energy, show that society requires and effectively uses government oversight to reduce risks. However, governance frameworks for AI are far less developed and lag behind rapid technological progress. We can take inspiration from the governance of other safety-critical technologies while keeping the distinctiveness of advanced AI in mind—that it far outstrips other technologies in its potential to act and develop ideas autonomously, progress explosively, behave in an adversarial manner, and cause irreversible damage. Governments worldwide have taken positive steps on frontier AI, with key players, including China, the United States, the European Union, and the United Kingdom, engaging in discussions and introducing initial guidelines or regulations. Despite their limitations—often voluntary adherence, limited geographic scope, and exclusion of high-risk areas like military and R&D-stage systems—these are important initial steps toward, among others, developer accountability, third-party audits, and industry standards.
我們迫切需要國家機構(gòu)和國際治理來執(zhí)行防止魯莽行為和濫用的標準。從醫(yī)藥、金融系統(tǒng)到核能,眾多技術(shù)領(lǐng)域的經(jīng)驗表明,社會需要并有效運用政府監(jiān)督來降低風(fēng)險。然而,AI的治理框架遠不成熟,落后于技術(shù)的快速進步。我們可以從其他安全關(guān)鍵技術(shù)的治理中汲取靈感,同時牢記先進AI的獨特性——它在自主行動和自主形成想法、爆發(fā)式進步、對抗性行為以及造成不可逆損害等方面的潛力遠超其他技術(shù)。世界各國政府已在前沿AI問題上采取了積極舉措,中國、美國、歐盟和英國等主要參與方正展開討論,并推出了初步的指導(dǎo)方針或法規(guī)。盡管這些舉措存在局限,例如往往以自愿遵守為主、地理范圍有限,且未涵蓋軍事和研發(fā)階段系統(tǒng)等高風(fēng)險領(lǐng)域,但它們?nèi)猿_發(fā)者問責(zé)制、第三方審計和行業(yè)標準等方向邁出了重要的第一步。
Yet these governance plans fall critically short in view of the rapid progress in AI capabilities. We need governance measures that prepare us for sudden AI breakthroughs while being politically feasible despite disagreement and uncertainty about AI timelines. The key is policies that automatically trigger when AI hits certain capability milestones. If AI advances rapidly, strict requirements automatically take effect, but if progress slows, the requirements relax accordingly. Rapid, unpredictable progress also means that risk-reduction efforts must be proactive—identifying risks from next generation systems and requiring developers to address them before taking high-risk actions. We need fast-acting, tech savvy institutions for AI oversight, mandatory and much-more rigorous risk assessments with enforceable consequences (including assessments that put the burden of proof on AI developers), and mitigation standards commensurate to powerful autonomous AI. Without these, companies, militaries, and governments may seek a competitive edge by pushing AI capabilities to new heights while cutting corners on safety or by delegating key societal roles to autonomous AI systems with insufficient human oversight, reaping the rewards of AI development while leaving society to deal with the consequences.
鑒于AI能力的快速進步,上述治理計劃仍遠遠不夠。我們需要的治理措施,既能讓我們為AI的突然突破做好準備,又能在各界對AI發(fā)展時間表存在分歧和不確定的情況下保持政治上的可行性。關(guān)鍵在于設(shè)定當(dāng)AI達到某些能力里程碑時自動觸發(fā)的政策:如果AI發(fā)展迅速,嚴格的要求自動生效;如果進展放緩,相關(guān)要求則相應(yīng)放寬。快速且不可預(yù)測的進展還意味著降低風(fēng)險的努力必須是前瞻性的,即提前識別下一代系統(tǒng)的風(fēng)險,并要求開發(fā)者在采取高風(fēng)險行動之前先行解決。我們需要行動迅速、精通技術(shù)的機構(gòu)來監(jiān)督AI,需要強制性的、嚴格得多且附帶可執(zhí)行后果的風(fēng)險評估(包括讓AI開發(fā)者承擔(dān)舉證責(zé)任的評估),以及與強大自主AI相稱的緩解標準。如果沒有這些,公司、軍隊和政府可能為尋求競爭優(yōu)勢而將AI能力推向新高,卻在安全上“偷工減料”,或在缺乏足夠人類監(jiān)督的情況下把關(guān)鍵社會角色委托給自主AI系統(tǒng),坐收AI發(fā)展的紅利,卻讓全社會來承擔(dān)后果。
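為說明“當(dāng)AI達到某些能力里程碑時自動觸發(fā)、并隨進展快慢自動收緊或放寬”的政策設(shè)計思路,下面給出一個極簡的Python示意:根據(jù)已觀測到的里程碑數(shù)量映射到不同的監(jiān)管等級。里程碑名稱、等級劃分與函數(shù)接口均為說明性假設(shè),不代表任何現(xiàn)行或擬議的法規(guī)。

```python
# 示意:隨能力里程碑自動觸發(fā)或放寬的分級監(jiān)管邏輯。
# 里程碑與監(jiān)管要求均為虛構(gòu)示例,僅說明“進展快則要求自動收緊,進展慢則相應(yīng)放寬”的機制。

OBSERVED_MILESTONES = {
    "長期自主規(guī)劃與執(zhí)行": True,
    "大規(guī)模說服能力": False,
    "自主網(wǎng)絡(luò)入侵": False,
}

REQUIREMENT_TIERS = [
    # (觸發(fā)所需的里程碑數(shù)量, 對應(yīng)的監(jiān)管要求)
    (0, "基礎(chǔ)要求:透明度報告、事故上報"),
    (1, "加強要求:部署前第三方審計、強制安全論證"),
    (2, "嚴格要求:開發(fā)許可制、訪問控制、可隨時暫停部署"),
]

def required_tier(milestones: dict[str, bool]) -> str:
    """根據(jù)已觸發(fā)的里程碑數(shù)量,返回當(dāng)前自動生效的監(jiān)管要求。"""
    hit = sum(milestones.values())
    applicable = [req for threshold, req in REQUIREMENT_TIERS if hit >= threshold]
    return applicable[-1]  # 取滿足條件的最高一檔

print("當(dāng)前自動生效的要求:", required_tier(OBSERVED_MILESTONES))
```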
Institutions to govern the rapidly moving frontier of AI To keep up with rapid progress and avoid quickly outdated, inflexible laws national institutions need strong technical expertise and the authority to act swiftly. To facilitate technically demanding risk assessments and mitigations, they will require far greater funding and talent than they are due to receive under almost any present policy plan. To address international race dynamics, they need the affordance to facilitate international agreements and partnerships. Institutions should protect low-risk use and low-risk academic research by avoiding undue bureaucratic hurdles for small, predictable AI models. The most pressing scrutiny should be on AI systems at the frontier: the few most powerful systems, trained on billion-dollar supercomputers, that will have the most hazardous and unpredictable capabilities.
應(yīng)對人工智能前沿快速發(fā)展的治理機構(gòu) 為了跟上快速進步的步伐、避免法律很快過時且缺乏靈活性,國家機構(gòu)需要強大的技術(shù)專長和迅速采取行動的權(quán)力。為了推動技術(shù)要求很高的風(fēng)險評估和緩解工作,這些機構(gòu)所需的資金和人才將遠超當(dāng)前幾乎任何政策計劃為其安排的水平。為了應(yīng)對國際競速態(tài)勢,它們還需要具備推動國際協(xié)議與伙伴關(guān)系的職能。這些機構(gòu)應(yīng)避免對小型、可預(yù)測的AI模型設(shè)置不當(dāng)?shù)墓倭耪系K,以保護低風(fēng)險的使用和低風(fēng)險的學(xué)術(shù)研究。最迫切需要審查的是處于前沿的AI系統(tǒng):少數(shù)在耗資數(shù)十億美元的超級計算機上訓(xùn)練、能力最強、將具有最危險且最不可預(yù)測能力的系統(tǒng)。
Government insight To identify risks, governments urgently need comprehensive insight into AI development. Regulators should mandate whistleblower protections, incident reporting, registration of key information on frontier AI systems and their datasets throughout their life cycle, and monitoring of model development and supercomputer usage. Recent policy developments should not stop at requiring that companies report the results of voluntary or underspecified model evaluations shortly before deployment. Regulators can and should require that frontier AI developers grant external auditors on-site, comprehensive (“white box”), and fine-tuning access from the start of model development. This is needed to identify dangerous model capabilities such as autonomous self-replication, large scale persuasion, breaking into computer systems, developing (autonomous) weapons, or making pandemic pathogens widely accessible.
政府洞察力 為了識別風(fēng)險,政府迫切需要全面了解AI的發(fā)展情況。監(jiān)管機構(gòu)應(yīng)強制要求:保護舉報人、報告事故、在整個生命周期內(nèi)登記前沿AI系統(tǒng)及其數(shù)據(jù)集的關(guān)鍵信息,并監(jiān)控模型開發(fā)和超級計算機的使用。近期的政策進展不應(yīng)止步于要求公司在部署前不久報告自愿進行的或規(guī)定含糊的模型評估結(jié)果。監(jiān)管機構(gòu)可以且應(yīng)當(dāng)要求前沿AI開發(fā)者從模型開發(fā)伊始,就向外部審計人員提供現(xiàn)場的、全面的(“白盒”)以及微調(diào)層面的訪問權(quán)限。這對于識別危險的模型能力是必要的,例如自主自我復(fù)制、大規(guī)模說服、侵入計算機系統(tǒng)、開發(fā)(自主)武器,或使大流行病病原體易于被廣泛獲取。
Safety cases Despite evaluations, we cannot consider coming powerful frontier AI systems “safe unless proven unsafe.” With present testing methodologies, issues can easily be missed. Additionally, it is unclear whether governments can quickly build the immense expertise needed for reliable technical evaluations of AI capabilities and societal-scale risks. Given this, developers of frontier AI should carry the burden of proof to demonstrate that their plans keep risks within acceptable limits. By doing so, they would follow best practices for risk management from industries, such as aviation, medical devices, and defense software, in which companies make safety cases: structured arguments with falsifiable claims supported by evidence that identify potential hazards, describe mitigations, show that systems will not cross certain red lines, and model possible outcomes to assess risk. Safety cases could leverage developers’ in-depth experience with their own systems. Safety cases are politically viable even when people disagree on how advanced AI will become because it is easier to demonstrate that a system is safe when its capabilities are limited. Governments are not passive recipients of safety cases: they set risk thresholds, codify best practices, employ experts and third-party auditors to assess safety cases and conduct independent model evaluations, and hold developers liable if their safety claims are later falsified.
安全論證 即便有上述評估,我們也不能把即將到來的強大前沿AI系統(tǒng)視為“在未被證明不安全之前就是安全的”。在現(xiàn)有的測試方法下,問題很容易被遺漏。此外,尚不清楚政府能否迅速建立起對AI能力和社會層面風(fēng)險進行可靠技術(shù)評估所需的龐大專業(yè)力量。有鑒于此,前沿AI的開發(fā)者應(yīng)承擔(dān)舉證責(zé)任,證明其計劃能把風(fēng)險控制在可接受的范圍內(nèi)。這樣做也是在遵循航空、醫(yī)療設(shè)備和國防軟件等行業(yè)的風(fēng)險管理最佳實踐:在這些行業(yè)中,公司需要提出安全論證,即由證據(jù)支持、包含可證偽主張的結(jié)構(gòu)化論證,用以識別潛在危害、描述緩解措施、表明系統(tǒng)不會越過特定紅線,并對可能的結(jié)果建模以評估風(fēng)險。安全論證可以充分利用開發(fā)者對自身系統(tǒng)的深入了解。即使人們對AI將發(fā)展到何種先進程度存在分歧,安全論證在政治上也是可行的,因為當(dāng)系統(tǒng)能力有限時,證明其安全反而更容易。政府并非安全論證的被動接受者:它們設(shè)定風(fēng)險閾值、將最佳實踐制度化、聘請專家和第三方審計機構(gòu)評估安全論證并開展獨立的模型評估,并在開發(fā)者的安全主張事后被證偽時追究其責(zé)任。
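為了更直觀地呈現(xiàn)“安全論證”作為結(jié)構(gòu)化論證的形態(tài),下面給出一個示意性的Python數(shù)據(jù)結(jié)構(gòu):每條論證包含可證偽的主張、支持證據(jù)、所識別的危害、緩解措施與所依托的紅線,并可在主張被推翻時標記為已證偽。字段與示例內(nèi)容均為筆者虛構(gòu)的說明性假設(shè),并非航空或醫(yī)療設(shè)備等行業(yè)安全論證的實際模板。

```python
# 示意:把“安全論證”表示為一組可證偽的結(jié)構(gòu)化主張。
# 字段與內(nèi)容均為虛構(gòu)示例,僅用于說明“主張、證據(jù)、危害、緩解、紅線”的基本結(jié)構(gòu)。

from dataclasses import dataclass, field

@dataclass
class SafetyClaim:
    claim: str                      # 可證偽的主張
    evidence: list[str]             # 支持該主張的證據(jù)
    hazards: list[str]              # 所識別的潛在危害
    mitigations: list[str]          # 對應(yīng)的緩解措施
    red_line: str                   # 主張所依托的紅線(系統(tǒng)不應(yīng)越過的界限)
    falsified: bool = False         # 若后續(xù)證據(jù)推翻該主張,由審計方或監(jiān)管方標記

@dataclass
class SafetyCase:
    system_name: str
    claims: list[SafetyClaim] = field(default_factory=list)

    def is_acceptable(self) -> bool:
        """只有當(dāng)沒有任何主張被證偽時,整個安全論證才暫時成立。"""
        return all(not c.falsified for c in self.claims)

case = SafetyCase(
    system_name="示例前沿模型(虛構(gòu))",
    claims=[
        SafetyClaim(
            claim="模型不具備自主復(fù)制到外部服務(wù)器的能力",
            evidence=["紅隊測試記錄", "能力評估報告"],
            hazards=["失控擴散"],
            mitigations=["訪問控制", "部署環(huán)境隔離"],
            red_line="自主自我復(fù)制",
        )
    ],
)
print("安全論證當(dāng)前是否成立:", case.is_acceptable())
```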
Mitigation To keep AI risks within acceptable limits, we need governance mechanisms that are matched to the magnitude of the risks. Regulators should clarify legal responsibilities that arise from existing liability frameworks and hold frontier AI developers and owners legally accountable for harms from their models that can be reasonably foreseen and prevented, including harms that foreseeably arise from deploying powerful AI systems whose behavior they cannot predict. Liability, together with consequential evaluations and safety cases, can prevent harm and create much-needed incentives to invest in safety.
緩解措施 為了將AI風(fēng)險控制在可接受的范圍內(nèi),我們需要與風(fēng)險規(guī)模相匹配的治理機制。監(jiān)管機構(gòu)應(yīng)明確由現(xiàn)有責(zé)任框架產(chǎn)生的法律責(zé)任,并要求前沿AI的開發(fā)者和所有者對其模型造成的、可以合理預(yù)見和預(yù)防的危害承擔(dān)法律責(zé)任,包括可預(yù)見地因部署行為無法預(yù)測的強大AI系統(tǒng)而產(chǎn)生的危害。法律責(zé)任連同具有實際后果的評估和安全論證,能夠防止危害,并為安全投入創(chuàng)造亟需的激勵。
Commensurate mitigations are needed for exceptionally capable future AI systems, such as autonomous systems that could circumvent human control. Governments must be prepared to license their development, restrict their autonomy in key societal roles, halt their development and deployment in response to worrying capabilities, mandate access controls, and require information security measures robust to state-level hackers until adequate protections are ready. Governments should build these capacities now.
針對能力超群的未來AI系統(tǒng)(例如能夠繞過人類控制的自主系統(tǒng)),需要采取與之相稱的緩解措施。政府必須做好準備:對其開發(fā)實行許可管理,限制其在關(guān)鍵社會角色中的自主性,在出現(xiàn)令人擔(dān)憂的能力時叫停其開發(fā)和部署,強制實施訪問控制,并要求采取足以抵御國家級黑客的信息安全措施,直到足夠的保護手段就緒。政府應(yīng)從現(xiàn)在起著手建立這些能力。
To bridge the time until regulations are complete, major AI companies should promptly lay out “if-then” commitments: specific safety measures they will take if specific red-line capabilities are found in their AI systems. These commitments should be detailed and independently scrutinized. Regulators should encourage a race-to-the top among companies by using the best-in class commitments, together with other inputs, to inform standards that apply to all players.
為了彌合監(jiān)管體系完善之前的空窗期,主要的AI公司應(yīng)盡快作出“If-Then”承諾:如果在其AI系統(tǒng)中發(fā)現(xiàn)特定的紅線能力,就采取相應(yīng)的具體安全措施。這些承諾應(yīng)當(dāng)足夠詳細,并接受獨立審查。監(jiān)管機構(gòu)應(yīng)鼓勵企業(yè)之間“向上看齊”的競爭,以同類最佳的承諾為基礎(chǔ),并結(jié)合其他信息,形成適用于所有參與者的標準。
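下面用一小段Python示意“If-Then承諾”與“向上看齊”的基本形態(tài):各公司承諾在發(fā)現(xiàn)特定紅線能力時采取的措施,監(jiān)管方可取各家承諾的并集作為參考性的統(tǒng)一標準。其中的公司名、能力與措施均為說明性假設(shè),不對應(yīng)任何公司已公布的具體承諾。

```python
# 示意:“If-Then”承諾與“向上看齊”:各公司承諾在發(fā)現(xiàn)特定紅線能力時采取的措施,
# 監(jiān)管方可取各家承諾中最完整的措施集合,作為適用于所有參與者的參考標準。
# 公司名、能力與措施均為虛構(gòu)示例。

COMPANY_COMMITMENTS = {
    "公司A(虛構(gòu))": {
        "自主自我復(fù)制": {"暫停部署", "通報監(jiān)管機構(gòu)"},
    },
    "公司B(虛構(gòu))": {
        "自主自我復(fù)制": {"暫停部署", "收緊模型權(quán)重訪問", "啟動第三方評估"},
    },
}

def best_in_class_standard(capability: str) -> set[str]:
    """對某項紅線能力,取各公司承諾措施的并集,作為參考性的統(tǒng)一標準。"""
    standard: set[str] = set()
    for commitments in COMPANY_COMMITMENTS.values():
        standard |= commitments.get(capability, set())
    return standard

print(sorted(best_in_class_standard("自主自我復(fù)制")))
```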
To steer AI toward positive outcomes and away from catastrophe, we need to reorient. There is a responsible path—if we have the wisdom to take it.
為了引導(dǎo)人工智能走向好的結(jié)果并遠離災(zāi)難,我們需要重新調(diào)整方向。負責(zé)任的道路是存在的,只要我們有智慧走上這條路。
供稿丨清華大學(xué)人工智能國際治理研究院
清華大學(xué)應(yīng)急管理研究基地