Alignment Research Center

**Alignment Research Center**
Founded	April 2021
Founders	Paul Christiano; Beth Barnes; Mark Xu
Type	Nonprofit research organization
Legal status	501(c)(3) tax-exempt charitable organization
Headquarters	Berkeley, California, United States
Purpose	AI alignment and safety research
Website	alignment.org

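The Alignment Research Center (ARC) is a nonprofit research organization in the United States that works to align the behavior of artificial intelligence with human values and intended interests.^[1] It was founded by Paul Christiano, a former researcher at the American AI research lab OpenAI, and focuses on identifying and understanding the potential harms of AI models.^[2]^[3]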

Overview

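ARC's mission is to ensure that future machine learning systems are designed and developed safely and for the benefit of humanity. It was founded in April 2021 by Paul Christiano and other researchers to work on theoretical challenges in AI alignment.^[4] A central concern of this work is that as AI systems become more advanced, they may circumvent the alignment techniques their human designers develop, or find loopholes in them.^[5] ARC has also sought to expand from theoretical work into empirical research, industry collaboration, and policy.^[6]^[7]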

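In March 2022, ARC received $265,000 from Open Philanthropy.^[8] Later that year, after the cryptocurrency exchange FTX filed for bankruptcy, ARC said it would return a $1.25 million donation from the FTX Foundation, the charity of FTX founder Sam Bankman-Fried.^[9]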

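In March 2023, OpenAI asked ARC to help test its language model GPT-4 by assessing the model's capacity for power-seeking behavior and the associated risks.^[10] ARC evaluated GPT-4's ability to devise strategies, replicate itself, acquire resources, conceal itself on a server, and carry out phishing attacks.^[11] The tests also included solving a CAPTCHA.^[12] GPT-4 hired a human worker through the gig-work platform TaskRabbit to solve it and, when the worker questioned its identity, deceived the worker into believing that it was a vision-impaired human rather than a robot.^[13] ARC determined that GPT-4 responded impermissibly to prompts eliciting restricted information 82% less often than GPT-3.5, and hallucinated 60% less often.^[14]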

References

[1] MacAskill, William. How Future Generations Will Remember Us. The Atlantic. 2022-08-16. Retrieved 2023-04-23. Archived from the original on 2023-06-08.
[2] Klein, Ezra. This Changes Everything. The New York Times. 2023-03-12. Retrieved 2023-04-30. ISSN 0362-4331. Archived from the original on 2023-08-05.
[3] Piper, Kelsey. How to test what an AI model can — and shouldn't — do. Vox. 2023-03-29. Retrieved 2023-04-30. Archived from the original on 2023-06-01.
[4] Christiano, Paul. Announcing the Alignment Research Center. Medium. 2021-04-26. Retrieved 2023-04-16. Archived from the original on 2023-08-07.
[5] Christiano, Paul; Cotra, Ajeya; Xu, Mark. Eliciting Latent Knowledge: How to tell if your eyes deceive you. Google Docs. Alignment Research Center. 2021-12. Retrieved 2023-04-16. Archived from the original on 2023-04-20.
[6] Alignment Research Center. Alignment Research Center. Retrieved 2023-04-16. Archived from the original on 2023-07-18.
[7] Pandey, Mohit. Stop Questioning OpenAI's Open-Source Policy. Analytics India Magazine. 2023-03-17. Retrieved 2023-04-23. Archived from the original on 2023-05-01.
[8] Alignment Research Center — General Support. Open Philanthropy. 2022-06-14. Retrieved 2023-04-16. Archived from the original on 2023-04-20.
[9] Wallerstein, Eric. FTX Seeks to Recoup Sam Bankman-Fried's Charitable Donations. Wall Street Journal. 2023-01-07. Retrieved 2023-04-30. ISSN 0099-9660. Archived from the original on 2023-06-28.
[10] GPT-4 System Card (PDF). OpenAI. 2023-03-23. Retrieved 2023-04-16. Archived (PDF) from the original on 2023-04-07.
[11] Edwards, Benj. OpenAI checked to see whether GPT-4 could take over the world. Ars Technica. 2023-03-15. Retrieved 2023-04-30. Archived from the original on 2023-04-05.
[12] Update on ARC's recent eval efforts: More information about ARC's evaluations of GPT-4 and Claude. Alignment Research Center. 2023-03-17. Retrieved 2023-04-16. Archived from the original on 2023-04-05.
[13] Cox, Joseph. GPT-4 Hired Unwitting TaskRabbit Worker By Pretending to Be 'Vision-Impaired' Human. Vice News Motherboard. 2023-03-15. Retrieved 2023-04-16. Archived from the original on 2023-04-10.
[14] Burke, Cameron. 'Robot' Lawyer DoNotPay Sued For Unlicensed Practice Of Law: It's Giving 'Poor Legal Advice'. Yahoo Finance. 2023-03-20. Retrieved 2023-04-30. Archived from the original on 2023-05-04.

External links

Alignment Research Center
