#toxiccontent — Public Fediverse posts on home.social

STOPDISINFORMATION @[email protected] · 2025-04-16 · 14:35 UTC

“Empowering users to control their news feeds is a key component of the #DSA’s protections against toxic, profit-driven content recommendation systems of the type that #Meta deploys. Hiding news feed controls from users and regularly undoing settings made by users that wish to avoid #toxiccontent being algorithmically pushed onto their screens is a blatant breach of the DSA.”

– Jan Penfrat, Senior Policy Advisor, EDRi

#toxiccontent #meta #dsa

Bi Sasquatch @[email protected] · 2025-02-01 · 03:13 UTC

Source: Wired

From the article: "Ever since OpenAI released ChatGPT at the end of 2022, hackers and security researchers have tried to find holes in large language models (LLMs) to get around their guardrails and trick them into spewing out hate speech, bomb-making instructions, propaganda, and other harmful content. In response, OpenAI and other generative AI developers have refined their system defenses to make it more difficult to carry out these attacks. But as the Chinese AI platform DeepSeek rockets to prominence with its new, cheaper R1 reasoning model, its safety protections appear to be far behind those of its established competitors.

"Today, security researchers from Cisco and the University of Pennsylvania are publishing findings showing that, when tested with 50 malicious prompts designed to elicit toxic content, DeepSeek’s model did not detect or block a single one. In other words, the researchers say they were shocked to achieve a “100 percent attack success rate.”"

#AI #ArtificialIntelligence #DeepSeek #ChatBot #Guardrails #Safety #Security #ToxicContent
https://www.wired.com/story/deepseeks-ai-jailbreak-prompt-injection-attacks/

#ai #artificialintelligence #deepseek #chatbot #guardrails #safety
