#alignmentresearch — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #alignmentresearch, aggregated by home.social.
-
OpenAI wants to stop ChatGPT from validating users’ political views - "ChatGPT shouldn't have political bias in any direction."
Th... - https://arstechnica.com/ai/2025/10/openai-wants-to-stop-chatgpt-from-validating-users-political-views/ #largelanguagemodels #alignmentresearch #machinelearning #aiobjectivity #politicalbias #culturalbias #generativeai #aialignment #aicriticism #aibehavior #airesearch #anthropic #aiethics #chatgpt #biz #aibias #openai #rlhf #ai
-
Is AI really trying to escape human control and blackmail people? - In June, headlines read like science fiction: AI models "bla... - https://arstechnica.com/information-technology/2025/08/is-ai-really-trying-to-escape-human-control-and-blackmail-people/ #goalmisgeneralization #reinforcementlearning #largelanguagemodels #alignmentresearch #palisaderesearch #aisafetytesting #machinelearning #jeffreyladish #generativeai #aialignment #aideception #claudeopus4 #aibehavior #airesearch #o3model
-
Researchers astonished by tool’s apparent success at revealing AI’s hidden motives - In a new paper published Thursday titled "Auditing language models for hid... - https://arstechnica.com/ai/2025/03/researchers-astonished-by-tools-apparent-success-at-revealing-ais-hidden-motives/ #largelanguagemodels #alignmentresearch #machinelearning #claude3.5haiku #aialignment #aideception #airesearch #anthropic #chatgpt #chatgtp #biz #claude #ai