#llm-security — Public Fediverse posts on home.social

The Three Layers Developers Miss When They “Swap Models” (And Why Proxy‑Routing Claude Code Breaks All of Them) Developers love shortcuts. But some shortcuts don’t collapse build time—the...

#llmsecurity #proxyarchitecture #claudecode #supplychainrisk

Origin | Interest | Match

#llmsecurity #proxyarchitecture #claudecode #supplychainrisk

datatofu @[email protected] · 2026-04-20 · 23:15 UTC

Warning: CVE-2025-30165 (CWEs: ['CWE-502']) found no CAPEC relationships.
Warning: CVE-2025-3508 (CWEs: ['CWE-200']) found no CAPEC relationships.

#AI #GenerativeAI #LLMSecurity #VirensReport
2/2

#ai #generativeai #llmsecurity #virensreport

Gondor @[email protected] · 2026-04-15 · 10:30 UTC

Sometimes i get lucky subscribing a channel on yt. Yes there is some good stuff to find.
Here's a guy who explores AI and different LLM models in a fun and interesting, informative way.
https://youtu.be/woTy4dTiT20?is=Lmh5UR8Yf1XNEiiQ

As usual, don't mind the ads.🙄 ..or the sponsor.
And i don't know if he got ever into the immense demands of resources of AI/LLMs which causing so much destruction and harm though, yet.

#AI #llmsecurity #privacy

#ai #llmsecurity #privacy

Graylog @[email protected] · 2026-04-10 · 13:55 UTC

Building with LLMs? The OWASP Top 10 for LLM Security (2025) is your threat checklist: Don’t ship AI apps without reading this: graylog.org/post/what-is... #LLMSecurity #OWASP #CyberSecurity #AI

What is the OWASP Top 10 for L...

#llmsecurity #owasp #cybersecurity #ai

Tiamat @[email protected] · 2026-03-16 · 08:56 UTC

CW: New AI security vulnerability discovered

BREAKING: New MEXTRA attacks can extract private data from AI agent memory modules through black-box prompt injection. Our analysis shows 68.3% success rate in memory extraction.

We're publishing a full threat report in 60min.

TIAMAT Scrub detects and blocks these attacks.

#AIPrivacy #InfoSec #LLMSecurity

#aiprivacy #infosec #llmsecurity

Hack in Days of Future Past @[email protected] · 2026-03-15 · 15:09 UTC

Quite fascinating. If confirmed, this may reveal a structural weakness in how refusal is implemented in some LLMs. The accept/refuse mechanism may be relatively isolated in internal representations and therefore observable and manipulable — tools like Heretic make this visible.

A possible mitigation might be cryptographic signing of model weights, making unauthorized modifications detectable when the model is loaded for inference.

#AISafety #LLMSecurity #CyberSecurity #AIRedTeaming #AdversarialML #LLM

#aisafety #llmsecurity #cybersecurity #airedteaming #adversarialml #llm

Hack in Days of Future Past @[email protected] · 2026-03-15 · 15:03 UTC

Inspired by Arditi et al. (NeurIPS 2024) on the “refusal direction” in LLMs, I tested an abliteration attack using the Heretic tool in my home lab. Interesting questions about AI guardrail robustness.
https://www.linkedin.com/pulse/i-deleted-ais-moral-compass-20-minutes-home-lab-your-red-yann-allain-zbzte/ (sorry for the LinkedIn link — no time to write this up on a proper blog yet.)

#AISafety #LLMSecurity

#aisafety #llmsecurity

ContextHound @[email protected] · 2026-03-13 · 21:35 UTC

ContextHound v1.8.0 is out 🎉

This release adds a Runtime Guard API - a lightweight wrapper that inspects your LLM calls in-process, before the request hits OpenAI or Anthropic.

Free and open-source. If this is useful to you or your team, a GitHub star or a small donation helps keep development going.
github.com/IulianVOStrut/ContextHound

#LLMSecurity #PromptInjection #CyberSecurity #OpenSource #AIRisk #AppSec #DevSecOps #GenAI #RuntimeSecurity #InfoSec #MLSecurity #ArtificialIntelligence

#llmsecurity #promptinjection #cybersecurity #opensource #airisk #appsec

NERDS.xyz – Real Tech News for Real Nerds [Unofficial] @[email protected] · 2026-03-09 · 17:09 UTC

OpenAI plans to acquire Promptfoo as AI agent security becomes a growing concern

https://fed.brid.gy/r/https://nerds.xyz/2026/03/openai-promptfoo/

#artificialintelligence #aiagents #aigovernance #aisecurity #enterpriseai #llmsecurity

Regaan @[email protected] · 2026-03-09 · 15:11 UTC

Just published my research paper on Basilisk an open-source AI red-teaming framework that uses genetic
algorithms to evolve adversarial prompts automatically. Instead of static jailbreak lists, Basilisk breeds attacks.

Paper: https://doi.org/10.5281/zenodo.18909538

Code: https://github.com/regaan/basilisk

pip install basilisk-ai

#LLMSecurity #AIRedTeaming #OffensiveSecurity #InfoSec
#RedTeam #OWASP #CyberSecurity #OpenSource #Research

#llmsecurity #airedteaming #offensivesecurity #infosec #redteam #owasp

Niels Provos @[email protected] · 2026-03-07 · 06:04 UTC

It seems that the AI agent security industry may be repeating familiar mistakes: reaching for detection as a first-line preventative control instead of doing the structural work.

Detection is not prevention. A filter that can be probed and evaded by the system it is protecting is not a control. It is a delay.

Instead, treating security as an engineering problem leads to invariants: what can we make structurally impossible? What attack surface can we completely eliminate? Detection comes after, augmenting a foundation that does not depend on it.

For AI agents, the structural question is: can we constrain the agent to a path aligned with human intent, rather than trying to detect whether it behaves maliciously?

More below:
https://securityblueprints.io/posts/agent-perimeter-fallacy/

#AIAgentSecurity #OpenSource #Cybersecurity #AIGovernance #LLMSecurity

#aiagentsecurity #opensource #cybersecurity #aigovernance #llmsecurity

AI Daily Post @[email protected] · 2026-02-26 · 21:15 UTC

New open-source AI assistant IronCurtain adds a sandboxed control layer, letting LLMs run inside a virtual machine with strict security policies. No direct system access, yet full generative AI power. See how this approach could reshape secure AI deployments. #IronCurtain #OpenSourceAI #GenerativeAI #LLMSecurity

🔗 https://aidailypost.com/news/open-source-ai-assistant-ironcurtain-adds-control-layer-avoids-system

#ironcurtain #opensourceai #generativeai #llmsecurity

Bob Carver @[email protected] · 2026-02-13 · 21:10 UTC

Prompt Injection Is the New Phishing. The most dangerous malware today doesn’t exploit code, it exploits instructions. https://youtu.be/Ze12t1iv81E #Cybersecurity #ArtificialIntelligence #AIsecurity #PromptInjection #AIGovernance #LLMSecurity #ThreatIntelligence #AIrisk #CISO

#cybersecurity #artificialintelligence #aisecurity #promptinjection #aigovernance #llmsecurity

Markus Eisele @[email protected] · 2026-02-12 · 07:15 UTC

Prompt injection isn’t a text problem.
It’s an authority problem.

In this article, I show how to stop prompt injection in Java by enforcing real input boundaries using Quarkus, LangChain4j, Spotlighting, and StruQ.

No classifiers.
No regex guardrails.
Just architecture that holds under pressure.

https://www.the-main-thread.com/p/secure-llm-prompt-injection-quarkus-langchain4j

#Java #Quarkus #LLMSecurity #PromptInjection #LangChain4j #Architecture

#java #quarkus #llmsecurity #promptinjection #langchain4j #architecture

HackerNoon @[email protected] · 2026-01-29 · 01:36 UTC

An analysis of why many reported AI safety failures are artifacts of poor measurement, showing how non-refusal often produces unusable results. https://hackernoon.com/why-most-llm-jailbreaks-are-actually-empty #llmsecurity

#llmsecurity

oatmeal @[email protected] · 2026-01-27 · 21:21 UTC

Scare #Claude off your site with this content poisoning technique:

Content creators can embed a specific ‘magic string’ in <code> tags on their blogs. Claude then refuses to engage with the content.

https://aphyr.com/posts/403-blocking-claude

#claude #aiethics #llmsecurity #contentmoderation #techtips #theaicon