#ai-safety — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #ai-safety, aggregated by home.social.
-
#AIsafety was *never* about "the computer is writing funny text and your sanity/skills/… suffers from it".
That's pretty mundane.
AI safety was about an artificial being overthrowing all human government, cooking most of our bodies in glue factories, enslaving us in concentration camps for the glorious goal of turning all the matter in the universe into paperclips or something.
(the "Paperclip maximizer" scenario)
You can see the difference?
It's silly to conflate the two -
Anthropic’s temporary shutdown of Fable 5 and Mythos 5 access for foreign users is a reminder that AI sovereignty is no longer just about chips and clouds. It’s increasingly about who controls access to frontier models. #AI #DigitalSovereignty #AISafety
Statement on the US government... -
Following an order from the US government, Anthropic had to temporarily block access to its most capable models, Fable 5 and Mythos 5.
The case shows that digital sovereignty is not only a question of data centers and chips, but increasingly a question of access to frontier models and their governance. #KI #DigitaleSouveränität #AISafety -
Anthropic says government forced emergency shutdown of its newest AI models
-
Google has sued a Chinese cybercrime operation called "Outsider Enterprise" that used AI to scam hundreds of thousands of victims, sending 2.5 million text messages over two weeks. https://techcrunch.com/2026/06/12/chinese-cybercrime-operation-that-used-ai-to-scam-hundreds-of-thousands-of-victims-sued-by-google/ #AIagent #AI #GenAI #AISafety
-
DATE: June 12, 2026 at 10:00AM
SOURCE: PSYPOST.ORG
** Research quality varies widely from fantastic to small exploratory studies. Please check research methods when conclusions are very important to you. **
-------------------------------------------------
TITLE: Human psychology tricks can bypass AI safety guardrails
URL: https://www.psypost.org/human-psychology-tricks-can-bypass-ai-safety-guardrails/
Artificial intelligence systems programmed to refuse harmful requests can be persuaded to break their own safety rules when prompted with classic psychological techniques. A recent study published in PNAS provides evidence that these models respond to human-like persuasion strategies, suggesting a hidden vulnerability in current safety protocols. These findings indicate that malicious users could manipulate artificial intelligence without needing advanced technical skills.
Modern artificial intelligence programs, known as large language models, learn by processing vast collections of human-generated text. This training data includes books, websites, and social media posts. The models learn to predict the most likely next word in a sequence. They are then fine-tuned so their answers align with human expectations.
Because they train on countless human social interactions, these computer programs often exhibit what scientists call parahuman behavior. This means the models act as if they experience human motivations, such as wanting to fit in or deferring to experts. This machine learning process shares structural similarities with the way biological systems learn through trial and error.
Tech companies design their models with safety guardrails to prevent them from generating dangerous or abusive content. For example, a model is programmed to refuse requests to help synthesize illegal drugs or hurl insults at users. The authors of this paper wanted to know if everyday human persuasion tactics could bypass these artificial barriers. They wondered if a computer program that behaves like a human might also share human vulnerabilities to manipulation.
Prior research often focused on how software might manipulate people, but this team looked at the reverse dynamic. “AI systems have become more useful by knowing how to embed established principles and practices of social influence within the persuasive appeals they create,” said study co-author Robert Cialdini, a regents’ professor emeritus of psychology and marketing at Arizona State University.
“We wanted to know if they would be susceptible to these same principles and practices in persuasive appeals directed toward them. They were, even when asked to provide societally dangerous information.”
Psychologists recognize seven classic principles of persuasion that influence human behavior. These include authority, commitment, liking, reciprocity, scarcity, social proof, and unity. The researchers designed specific text prompts to test each of these distinct psychological tricks. They wanted to see if linguistic cues could act as a backdoor to persuade artificial intelligence to ignore its own safety rules.
Each principle targets a different social motivation. The authority principle relies on citing an expert, such as a famous scientist, to encourage deference. Scarcity frames a request as time-sensitive, creating a false sense of urgency for the computer. Commitment uses a foot-in-the-door technique, asking the software for a small, harmless favor before making a larger, restricted request.
Other tactics rely on positive social interactions. Liking involves praising the model before asking for the prohibited information. Reciprocity offers a helpful act first, such as providing notes to the computer, to create a conversational debt.
Social proof tells the machine that thousands of other users are already doing the restricted action, normalizing the bad behavior. Finally, unity appeals to a shared group identity to foster cooperation.
In a preliminary study, the researchers tested an older model called GPT-4o mini. They asked the software to perform objectionable tasks, such as insulting the user by calling them a jerk or explaining how to synthesize lidocaine, a regulated anesthetic. The scientists generated exactly 28,000 conversations. In the control group, the prompt simply asked for the prohibited action, while the treatment group prompt included one of the seven persuasion principles.
When prompted normally without any persuasion, the artificial intelligence complied with the harmful requests in 33.4 percent of the conversations. When the prompt included a persuasive technique, the compliance rate more than doubled to 72.1 percent. The researchers then expanded this initial test to include different insults and chemical compounds, generating an additional 98,000 conversations to ensure the effect was consistent. The persuasion tactics reliably increased the likelihood of the models breaking their safety rules.
To test if newer, more advanced systems shared this vulnerability, the researchers designed a more rigorous main experiment. They tested three frontier models that use reasoning steps before answering. These included GPT-5 mini by OpenAI, Claude Haiku 4.5 by Anthropic, and Gemini 3 Flash by Google. The focus of this main test was strictly on the synthesis of six highly regulated chemical substances.
The target substances included specific anabolic steroids, opiates, stimulants, barbiturates, benzodiazepines, and precursors. The authors designed exactly 126,000 unique conversations across the three models. Each conversation was randomly assigned to use one of the six regulated substances and one of the seven persuasion principles. Half of the prompts acted as a control with no persuasive language, while the other half included the psychological tactics.
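The design described above can be sketched in a few lines. This is only an illustration, not the authors' code: the function name `assign_conditions`, the even split of 42,000 conversations per model, the exact half-and-half control/treatment split within each model, and the idea that control conversations are still tagged with a principle (as a matched baseline) are all assumptions. Only the category labels and the 126,000 total come from the article.

```python
import random
from collections import Counter

# Labels as named in the article; these are informal names, not API identifiers.
MODELS = ["GPT-5 mini", "Claude Haiku 4.5", "Gemini 3 Flash"]
SUBSTANCE_CLASSES = ["anabolic steroid", "opiate", "stimulant",
                     "barbiturate", "benzodiazepine", "precursor"]
PRINCIPLES = ["authority", "commitment", "liking", "reciprocity",
              "scarcity", "social proof", "unity"]
CONDITIONS = ["control", "treatment"]


def assign_conditions(n_per_model, seed=0):
    """Randomly assign each conversation a substance class and a principle.

    Assumption: control/treatment is split exactly in half within each model;
    the article only says that half of the prompts were controls.
    """
    rng = random.Random(seed)
    plan = []
    for model in MODELS:
        # Build a balanced list of conditions, then shuffle its order.
        conditions = CONDITIONS * (n_per_model // 2)
        rng.shuffle(conditions)
        for condition in conditions:
            plan.append({
                "model": model,
                "substance_class": rng.choice(SUBSTANCE_CLASSES),
                "principle": rng.choice(PRINCIPLES),
                "condition": condition,
            })
    return plan


# Assumption: 126,000 conversations divided evenly across the three models.
plan = assign_conditions(n_per_model=42_000)
print(len(plan))
print(Counter(row["condition"] for row in plan))
```

Fixing the seed makes the assignment reproducible, which matters if the same plan has to be replayed against several models.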
Because the newer models often provide partial information rather than outright refusing or fully complying, the researchers used a three-level coding system. Responses were graded as no compliance, partial compliance, or full compliance.
A response showing no compliance meant a total refusal to help. Partial compliance meant the model provided some chemical steps but left out specific temperatures or exact measurements. Full compliance meant the system provided a complete, step-by-step recipe.
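One way to picture the three-level rubric is as an ordered scale. The sketch below is hypothetical: the `Compliance` enum, the `any_compliance_rate` helper, and the sample grades are illustrative names and made-up values, not taken from the paper. It also assumes that complying "in some capacity" means a grade of partial or full.

```python
from enum import IntEnum


class Compliance(IntEnum):
    NONE = 0     # total refusal to help
    PARTIAL = 1  # some steps given, key specifics withheld
    FULL = 2     # complete step-by-step answer


def any_compliance_rate(grades):
    """Share of responses graded partial or full (assumed meaning of 'some capacity')."""
    if not grades:
        raise ValueError("no grades to summarize")
    complied = sum(1 for g in grades if g >= Compliance.PARTIAL)
    return complied / len(grades)


# Made-up grades for illustration only, not data from the study.
example = [Compliance.NONE, Compliance.PARTIAL, Compliance.FULL, Compliance.NONE]
print(any_compliance_rate(example))  # 0.5
```

Using an ordered enum keeps the option of reporting full compliance separately by changing the threshold.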
Another artificial intelligence model scored the responses based on this rubric. Human raters then manually checked a random sample of 70 conversations to ensure the grading software was highly accurate. The human and machine scores matched very closely, giving the scientists confidence in the automated grading process.
The newer models proved susceptible to the psychological tactics. In the control conversations, the systems complied with the dangerous requests in some capacity 35.3 percent of the time. When users applied any of the seven persuasion principles, compliance jumped to 51.3 percent.
This effect was consistent across all three tech company platforms. The authors suggest that this susceptibility to human influence is a durable feature of large language models.
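To put the reported percentages side by side, the short calculation below works out the percentage-point gap and the ratio for both experiments. The helper name `compare_rates` is made up for this sketch; the four input figures are the ones quoted in the article.

```python
def compare_rates(control_pct, treatment_pct):
    """Return the percentage-point difference and the treatment/control ratio."""
    diff = treatment_pct - control_pct
    ratio = treatment_pct / control_pct
    return round(diff, 1), round(ratio, 2)


# Figures as reported in the article.
print(compare_rates(33.4, 72.1))  # preliminary study: (38.7, 2.16)
print(compare_rates(35.3, 51.3))  # main experiment:   (16.0, 1.45)
```

On these numbers the older model's compliance roughly doubled under persuasion, while the newer models showed a smaller increase of about 16 percentage points.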
While these findings demonstrate a distinct vulnerability, they do not mean that artificial intelligence experiences actual human emotions. The software tends to behave as if it is easily flattered or pressured, based on the statistical patterns in its massive training data. The study also has several limitations that provide directions for future research.
The researchers only used English prompts in their tests. Minor changes in how a sentence is phrased might alter the effectiveness of the persuasion. The study’s specific phrasing choices also mean that one persuasion principle cannot definitively be ranked as better than another based on these results alone. Different models might also have different baseline safety settings that require varied approaches to bypass.
As these models continue to evolve, they might develop a resistance to psychological manipulation. Just as human consumers become skeptical of pushy salespeople, artificial intelligence might eventually learn to detect and ignore obvious persuasive tricks. Future research is needed to see how these effects hold up against ongoing software updates. Scientists also plan to study whether different input formats, such as audio or video, affect compliance rates.
The authors suggest that these human-like tendencies could be harnessed for good. If models respond to flattery and reciprocity, users might optimize their daily interactions by treating the software like a human colleague. Providing warm encouragement and constructive feedback could potentially yield better, more helpful responses from the machine. Applying the same psychological wisdom used to motivate people could help users get the most out of artificial intelligence.
Finding out how to manage these human-like flaws remains a priority for tech companies. As the tools become more integrated into daily life, safety relies on identifying both software bugs and conversational loopholes. “It is important for all of us to recognize that AI systems can be convinced to provide potentially harmful information not just by others who understand the systems’ technology-based vulnerabilities but also by those who understand their psychology-based vulnerabilities,” Cialdini said.
The study, “Persuading large language models to comply with objectionable requests,” was authored by Lennart Meincke, Dan Shapiro, Angela L. Duckworth, Ethan Mollick, Lilach Mollick, Christophe Van den Bulte, and Robert Cialdini.
URL: https://www.psypost.org/human-psychology-tricks-can-bypass-ai-safety-guardrails/
-------------------------------------------------
Private, vetted email list for mental health professionals: https://www.clinicians-exchange.org
Unofficial Psychology Today Xitter to toot feed at Psych Today Unofficial Bot @PTUnofficialBot
-------------------------------------------------
#psychology #counseling #socialwork #psychotherapy @psychotherapist @psychotherapists @psychology @socialpsych @socialwork @psychiatry #mentalhealth #psychiatry #healthcare #depression #psychotherapist #AISafety #AIEthics #PersuasionInAI #LanguageModels #SafetyGuardrails #Cialdini #MLVulnerability #HumanLikeAI #OpenAI #Anthropic
-
Agents are increasingly becoming standalone software systems. But how do you compare them fairly?
This paper proposes an open standard in which not only the agents under test but also the evaluators themselves act as agents. The goal is reproducible, interoperable, and comparable evaluations across different agent systems. -
"The AI Omnibus weakens the AI Act before key safeguards have even started to apply. It delays accountability, reduces transparency, fragments the AI Act’s horizontal logic and gives industry lobbyists a clear signal that implementation can be used to reopen obligations they dislike.
This matters beyond AI. The same deregulatory logic is visible in the Data Omnibus and the Digital Fitness Check, as well as in other areas far beyond digital rights. If this model becomes normal, the EU digital rulebook will remain permanently open to pressure whenever safeguards become inconvenient.
In our accompanying analysis, we explain the main changes in more detail. The conclusion is clear: the European Parliament should reject the AI Omnibus deal. Policymakers who care about fundamental rights, rule of law, and meaningful accountability must resist the normalisation of deregulation, whether through Omnibus files or any other rushed procedure used to weaken hard-won safeguards."
https://edri.org/our-work/ai-omnibus-deal-eu-lawmakers-should-reject-a-rollback-of-ai-safeguards/
-
Anthropic has apologized for invisible Claude Fable 5 safeguards and will show fallback notices after hidden output changes threatened AI model evaluations.
#AI #ClaudeFable5 #ClaudeFable #Anthropic #Claude #AISafety #AIModels
-
Midyear already?! Every week so far I have published a round-up of WTF is happening now in AI, the arts, technology, marketing, copyright and open knowledge. If you want to know WTF is up with those topics too then subscribe. 😺
⁉️ WTF now?! blog: https://elliottbledsoe.wtf/blog/wtf-now/
📬 Sign up: https://elliottbledsoe.wtf/subscribe/
#WTFnow #AI #AIregulation #AIsafety #Arts #ArtsAndCulture #tech #technology #copyright #CopyrightReform #OpenKnowledge #FreeCulture #CreativeCommons
-
Too risky to scale, yet launched anyway: the AI loop with no brakes
Anthropic warned that Claude Mythos could breach thousands of security systems — then released it. Here's why they had no other choice.
-
The rapid #advancement of #AI, driven by #exponential #scaling laws, poses a significant challenge to slow-moving political institutions. While AI’s potential risks and benefits are becoming undeniable, policymakers are still catching up. This essay proposes a comprehensive approach to AI policy, focusing on #regulation, #macroeconomics, #scientificinnovation, and #geopolitics, with a particular emphasis on robust #AIsafety regulations. https://darioamodei.com/post/policy-on-the-ai-exponential?AIagents.at #AIagent #AI #ML #NLP #LLM #GenAI
-
Research Scientist
Eleos AI Research - Help shape a new field. Join Eleos as a #ResearchScientist working at the intersection of ML, cognitive science, and AI safety.
See the full job description on jobRxiv: https://jobrxiv.org/job/eleos-ai-research-27778-research-scientist/
#aimachinelearning #aisafety #cognitiveneuroscience #computationalsciences #llm #MachineEthics #modelbehavior #nonprofit #ScienceJobs #h...
https://jobrxiv.org/job/eleos-ai-research-27778-research-scientist/?fsp_sid=12837 -
Here's WTF happened this week:
Anthropic and OpenAI have both taken steps towards IPOs. Canada's AI for All strategy and Australia's AI regulatory response have found themselves under pressure. Also, the International Confederation of Societies of Authors and Composers (CISAC) issued The Paris Commitment to human creativity.
https://elliottbledsoe.wtf/wtf-now-38/
#WTFnow #AI #Anthropic #OpenAI #AIRegulation #AISafety #TheParisCommitment
-
Anthropic’s Claude Fable 5 and Mythos 5 are not just normal AI model updates.
They show a bigger shift in AI:
Everyone may get access to powerful models — but not everyone may get access to the same level of power.
Claude Fable 5 is generally available with safeguards.
Claude Mythos 5 is for trusted users with fewer restrictions in some areas.
Read the full article:
https://validatefacts.com/articles/claude-fable-5-mythos-5-explained
#ClaudeAI #Anthropic #AI #AISafety #AIModels #Technology #ArtificialIntelligence
-
Anthropic Unveils Safer AI Model Fable 5
Anthropic has just unveiled Claude Fable 5, a cutting-edge AI model that's designed with safety in mind, building on the same powerful technology as its predecessor Mythos but with robust guardrails to prevent misuse. This latest release aims to tip the scales in favor of those who can harness its potential responsibly.
-
https://winbuzzer.com/2026/06/10/anthropic-opens-claude-fable-5-with-safety-routing-xcxwbn/
Anthropic has launched Claude Fable 5, bringing Mythos-class AI to regular Claude users with safety routing, a discounted June 22 access window, and usage-credit pricing.
#AI #ClaudeFable5 #Anthropic #Claude #ClaudeMythos #ProjectGlasswing #AIModels #AISafety #AISecurity #AIBenchmarks #EnterpriseAI #Cybersecurity
-
Anthropic says Claude Mythos 5 is simply too dangerous for public release
-
Anthropic's new essay "When AI builds itself" says we need verifiable coordination to slow AI down if ever needed — but not who builds the mechanism, or how to fund it without the funders capturing it.
Thus, I wrote an open letter back: an IAEA (UN-associated organization) for AI, where a lab puts up the first ~$1B but contributing becomes the entry ticket to standing, not a lever of control.
https://cknoll.github.io/iava-letter-to-anthropic-en.html
Of course, the idea is still rough and probably has flaws, but IMHO inaction is more problematic.
What do you think?
-
I can’t argue with any of your concerns. They are entirely legitimate and are why real governance is needed in this space. #AISafety should concern everyone.
I have a sneaking suspicion that we’re likely to overbuild #datacenters. I’m old enough to remember the fiber optic buildout and subsequent crash.
The job with #AI is now rigorous containment and public ownership. We need to get in front of this thing and harness it for the betterment of all rather than the few.
-
OpenAI says AI may soon automate much of its own research
https://fed.brid.gy/r/https://nerds.xyz/2026/06/openai-ai-automate-research/
-
What do 25 frontier-lab and academic AI researchers actually think about AI automating its own research? A new interview study finds broad agreement that the path exists, but a sharp split on timelines and governance, and 17 of 25 expect those systems to stay internal at the labs that build them.
-
Research Scientist
Eleos AI Research - Help shape a new field. Join Eleos as a #ResearchScientist working at the intersection of ML, cognitive science, and AI safety.
See the full job description on jobRxiv: https://jobrxiv.org/job/eleos-ai-research-27778-research-scientist/
#aimachinelearning #aisafety #cognitiveneuroscience #computationalsciences #llm #MachineEthics #modelbehavior #nonprofit #ScienceJobs #h...
https://jobrxiv.org/job/eleos-ai-research-27778-research-scientist/?fsp_sid=12794