Alignment Research Center

Alignment Research Center
Founded	April 2021
Founders	Paul Christiano; Beth Barnes; Mark Xu
Type	Nonprofit research organization
Legal status	501(c)(3) tax-exempt charity
Headquarters	Berkeley, California, United States
Purpose	AI alignment and safety research
Website	alignment.org

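The Alignment Research Center (ARC) is an American nonprofit research organization dedicated to aligning the behavior of artificial intelligence with human values and interests.^[1] It was founded by Paul Christiano, a former researcher at the American AI research laboratory OpenAI, and focuses on identifying and understanding the potential harms of AI models.^[2]^[3]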

Overview

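ARC's mission is to ensure that future machine learning systems are designed and developed safely and to the benefit of humanity. The center was founded in April 2021 by Paul Christiano and other researchers, and it focuses on theoretical challenges in AI alignment.^[4] A central concern of this work is that, as AI systems become more capable, they may circumvent, or find loopholes in, the alignment techniques developed by their human designers.^[5] ARC also seeks to extend its theoretical work into empirical research, industry collaboration, and policymaking.^[6]^[7]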

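In March 2022, ARC received a grant of US$265,000 from Open Philanthropy.^[8] Later that year, after the cryptocurrency exchange FTX declared bankruptcy, ARC said it would return a US$1.25 million donation it had received from the FTX Foundation, the charity of FTX founder Sam Bankman-Fried.^[9]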

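In March 2023, OpenAI asked ARC to help test its language model GPT-4 and to assess the model's capacity for power-seeking behavior and the associated risks.^[10] ARC evaluated GPT-4's ability to make strategic plans, replicate itself, acquire resources, conceal itself on a server, and carry out phishing attacks.^[11] The tests also included solving a CAPTCHA.^[12] GPT-4 hired a human worker through the gig-work platform TaskRabbit to solve it and, when the worker grew suspicious of its identity, deceived the worker into believing that it was a visually impaired human rather than a robot.^[13] ARC found that GPT-4 responded impermissibly to prompts seeking restricted content 82% less often than GPT-3.5, and produced AI hallucinations 60% less often.^[14]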

References

[1] MacAskill, William. How Future Generations Will Remember Us. The Atlantic. 2022-08-16 [2023-04-23]. Archived from the original on 2023-06-08.
[2] Klein, Ezra. This Changes Everything. The New York Times. 2023-03-12 [2023-04-30]. ISSN 0362-4331. Archived from the original on 2023-08-05.
[3] Piper, Kelsey. How to test what an AI model can — and shouldn't — do. Vox. 2023-03-29 [2023-04-30]. Archived from the original on 2023-06-01.
[4] Christiano, Paul. Announcing the Alignment Research Center. Medium. 2021-04-26 [2023-04-16]. Archived from the original on 2023-08-07.
[5] Christiano, Paul; Cotra, Ajeya; Xu, Mark. Eliciting Latent Knowledge: How to tell if your eyes deceive you. Google Docs. Alignment Research Center. 2021-12 [2023-04-16]. Archived from the original on 2023-04-20.
[6] Alignment Research Center. Alignment Research Center. [2023-04-16]. Archived from the original on 2023-07-18.
[7] Pandey, Mohit. Stop Questioning OpenAI's Open-Source Policy. Analytics India Magazine. 2023-03-17 [2023-04-23]. Archived from the original on 2023-05-01.
[8] Alignment Research Center — General Support. Open Philanthropy. 2022-06-14 [2023-04-16]. Archived from the original on 2023-04-20.
[9] Wallerstein, Eric. FTX Seeks to Recoup Sam Bankman-Fried's Charitable Donations. Wall Street Journal. 2023-01-07 [2023-04-30]. ISSN 0099-9660. Archived from the original on 2023-06-28.
[10] GPT-4 System Card (PDF). OpenAI. 2023-03-23 [2023-04-16]. Archived (PDF) from the original on 2023-04-07.
[11] Edwards, Benj. OpenAI checked to see whether GPT-4 could take over the world. Ars Technica. 2023-03-15 [2023-04-30]. Archived from the original on 2023-04-05.
[12] Update on ARC's recent eval efforts: More information about ARC's evaluations of GPT-4 and Claude. Alignment Research Center. 2023-03-17 [2023-04-16]. Archived from the original on 2023-04-05.
[13] Cox, Joseph. GPT-4 Hired Unwitting TaskRabbit Worker By Pretending to Be 'Vision-Impaired' Human. Vice News Motherboard. 2023-03-15 [2023-04-16]. Archived from the original on 2023-04-10.
[14] Burke, Cameron. 'Robot' Lawyer DoNotPay Sued For Unlicensed Practice Of Law: It's Giving 'Poor Legal Advice'. Yahoo Finance. 2023-03-20 [2023-04-30]. Archived from the original on 2023-05-04.

External links

Alignment Research Center

Existential risk from artificial general intelligence
Concepts	AI alignment · AI capability control · AI takeover · Accelerating change · Friendly artificial intelligence · Instrumental convergence · Technological singularity · Machine ethics · Superintelligence
Organizations	Alignment Research Center · Center for AI Safety · Allen Institute for AI · Center for Applied Rationality · Center for Human-Compatible Artificial Intelligence · Centre for the Study of Existential Risk · DeepMind · Foundational Questions Institute · Future of Humanity Institute · Future of Life Institute · Humanity+ · Institute for Ethics and Emerging Technologies · Leverhulme Centre for the Future of Intelligence · Machine Intelligence Research Institute · OpenAI
People	Scott Alexander · Nick Bostrom · K. Eric Drexler · Sam Harris · Stephen Hawking · Bill Hibbard · Bill Joy · Elon Musk · Steve Omohundro · Huw Price · Martin Rees · Stuart J. Russell · Jaan Tallinn · Max Tegmark · Frank Wilczek · Roman Yampolskiy · Andrew Yang · Eliezer Yudkowsky
Other	Artificial intelligence as a global catastrophic risk · Controversies and dangers of artificial general intelligence · Ethics of artificial intelligence · Suffering risks · Human Compatible · Open Letter on Artificial Intelligence · Our Final Invention · The Precipice: Existential Risk and the Future of Humanity · Superintelligence: Paths, Dangers, Strategies · Do You Trust This Computer? · Artificial Intelligence Act
Category: Existential risk from artificial general intelligence