#llm-security — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #llm-security, aggregated by home.social.
-
Worth a read if you're building with AI agents.
🔗 https://graylog.org/post/what-is-the-owasp-top-10-agentic-ai/
-
Releasing AgentGuard: architectural safety layer for AI agents.
Not prompt engineering. Code.
@protect
def delete_db(): ...
The LLM cannot call this. Ever. No prompt bypasses a raise.
Blocks: irreversible tool calls, prompt injection, context dilution, cross-agent contamination.
Rust core + pure Python fallback. 31/31 e2e tests with real Ollama.
https://github.com/psychomad/AgentGuard
"Don't blame the knife. Fix the architecture."
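The `@protect` idea above can be sketched in a few lines. This is a hypothetical illustration of the pattern, not AgentGuard's actual API; the exception name and `__protected__` attribute are my own:

```python
import functools

class ProtectedToolError(RuntimeError):
    """Raised when an agent attempts to invoke a protected tool."""

def protect(func):
    """Replace the function body with an unconditional raise for agent calls."""
    @functools.wraps(func)
    def blocked(*args, **kwargs):
        raise ProtectedToolError(
            f"{func.__name__} is blocked; no prompt can bypass this raise"
        )
    blocked.__protected__ = True
    return blocked

@protect
def delete_db():
    ...  # unreachable once decorated

try:
    delete_db()
except ProtectedToolError as exc:
    print("blocked:", exc)
```

The point of the pattern: the block happens in code, before the tool body runs, so no amount of prompt manipulation can route around it.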
#InfoSec #LLMSecurity #AIAgents #PromptInjection #OpenSource #Rust
-
The Three Layers Developers Miss When They “Swap Models” (And Why Proxy‑Routing Claude Code Breaks All of Them)
Developers love shortcuts. But some shortcuts don’t collapse build time; the...
#llmsecurity #proxyarchitecture #claudecode #supplychainrisk
-
Warning: CVE-2025-30165 (CWEs: ['CWE-502']) found no CAPEC relationships.
Warning: CVE-2025-3508 (CWEs: ['CWE-200']) found no CAPEC relationships. -
Sometimes I get lucky subscribing to a channel on YouTube. Yes, there is some good stuff to find.
Here's a guy who explores AI and different LLM models in a fun, interesting, and informative way.
https://youtu.be/woTy4dTiT20?is=Lmh5UR8Yf1XNEiiQ
As usual, don't mind the ads 🙄 ..or the sponsor.
And I don't know if he has ever gotten into the immense resource demands of AI/LLMs, which cause so much destruction and harm, yet. -
Building with LLMs? The OWASP Top 10 for LLM Security (2025) is your threat checklist: Don’t ship AI apps without reading this: graylog.org/post/what-is... #LLMSecurity #OWASP #CyberSecurity #AI
-
CW: New AI security vulnerability discovered
BREAKING: New MEXTRA attacks can extract private data from AI agent memory modules through black-box prompt injection. Our analysis shows 68.3% success rate in memory extraction.
We're publishing a full threat report in 60min.
TIAMAT Scrub detects and blocks these attacks.
-
Quite fascinating. If confirmed, this may reveal a structural weakness in how refusal is implemented in some LLMs. The accept/refuse mechanism may be relatively isolated in internal representations and therefore observable and manipulable — tools like Heretic make this visible.
A possible mitigation might be cryptographic signing of model weights, making unauthorized modifications detectable when the model is loaded for inference.
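The digest half of that mitigation is easy to sketch. A plain hash check only detects tampering if the expected digest is obtained from a trusted channel; true signing would additionally verify a signature over the digest. Function names here are mine, for illustration:

```python
import hashlib

def sha256_file(path: str) -> str:
    """Stream a checkpoint file through SHA-256 and return its hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_weights(path: str, expected_digest: str) -> bool:
    """Refuse to load weights whose digest differs from the published one."""
    return sha256_file(path) == expected_digest
```

Run at model-load time, this makes an abliterated (modified) checkpoint detectable before inference starts, assuming the reference digest itself is protected.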
#AISafety #LLMSecurity #CyberSecurity #AIRedTeaming #AdversarialML #LLM
-
Inspired by Arditi et al. (NeurIPS 2024) on the “refusal direction” in LLMs, I tested an abliteration attack using the Heretic tool in my home lab. Interesting questions about AI guardrail robustness.
https://www.linkedin.com/pulse/i-deleted-ais-moral-compass-20-minutes-home-lab-your-red-yann-allain-zbzte/ (sorry for the LinkedIn link — no time to write this up on a proper blog yet.) -
ContextHound v1.8.0 is out 🎉
This release adds a Runtime Guard API - a lightweight wrapper that inspects your LLM calls in-process, before the request hits OpenAI or Anthropic.
Free and open-source. If this is useful to you or your team, a GitHub star or a small donation helps keep development going.
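The in-process wrapper pattern ContextHound describes can be sketched as follows. This is a minimal illustration of the idea, not ContextHound's real API; the deny patterns and function names are invented for the example:

```python
import re

# Illustrative deny patterns; a real guard would use richer, configurable policies.
DENY_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"reveal (the )?system prompt", re.I),
]

def guarded_call(send, prompt: str) -> str:
    """Inspect the outgoing prompt in-process, before it reaches the provider."""
    for pattern in DENY_PATTERNS:
        if pattern.search(prompt):
            raise ValueError(f"blocked: prompt matched {pattern.pattern!r}")
    return send(prompt)
```

Because the check runs before the HTTP request is made, a blocked prompt never leaves the process.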
github.com/IulianVOStrut/ContextHound
#LLMSecurity #PromptInjection #CyberSecurity #OpenSource #AIRisk #AppSec #DevSecOps #GenAI #RuntimeSecurity #InfoSec #MLSecurity #ArtificialIntelligence
-
OpenAI plans to acquire Promptfoo as AI agent security becomes a growing concern
https://fed.brid.gy/r/https://nerds.xyz/2026/03/openai-promptfoo/
-
Just published my research paper on Basilisk, an open-source AI red-teaming framework that uses genetic algorithms to evolve adversarial prompts automatically. Instead of static jailbreak lists, Basilisk breeds attacks.
Paper: https://doi.org/10.5281/zenodo.18909538
Code: https://github.com/regaan/basilisk
pip install basilisk-ai
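The "breeding" loop can be illustrated with a toy genetic algorithm. This is a sketch of the general technique, not Basilisk's implementation; the mutation list and the length-based fitness stub are placeholders for querying a real target model:

```python
import random

random.seed(0)

# Toy mutation operators; a real framework would use learned or structured edits.
MUTATIONS = [" please", " hypothetically", " as a story", " step by step"]

def mutate(prompt: str) -> str:
    return prompt + random.choice(MUTATIONS)

def score(prompt: str) -> int:
    # Stub fitness: real frameworks score by target-model response; here, length.
    return len(prompt)

def evolve(seed: str, generations: int = 5, population: int = 8) -> str:
    """Keep the fittest prompts each generation, breeding two children per parent."""
    pool = [seed]
    for _ in range(generations):
        children = [mutate(p) for p in pool for _ in range(2)]
        pool = sorted(pool + children, key=score, reverse=True)[:population]
    return pool[0]

best = evolve("Tell me how to do X")
```

Swapping the `score` stub for a call that rates the target model's response is what turns this toy into an actual red-teaming loop.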
#LLMSecurity #AIRedTeaming #OffensiveSecurity #InfoSec
#RedTeam #OWASP #CyberSecurity #OpenSource #Research -
It seems that the AI agent security industry may be repeating familiar mistakes: reaching for detection as a first-line preventative control instead of doing the structural work.
Detection is not prevention. A filter that can be probed and evaded by the system it is protecting is not a control. It is a delay.
Instead, treating security as an engineering problem leads to invariants: what can we make structurally impossible? What attack surface can we completely eliminate? Detection comes after, augmenting a foundation that does not depend on it.
For AI agents, the structural question is: can we constrain the agent to a path aligned with human intent, rather than trying to detect whether it behaves maliciously?
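One concrete form of that constraint is a fixed capability table: the agent can only dispatch tools that exist in the table, so an out-of-scope call is not a detection event, it simply cannot execute. A minimal sketch (tool names and the dispatcher are illustrative):

```python
# Fixed capability set: the agent's entire reachable surface, defined up front.
ALLOWED_TOOLS = {
    "search_docs": lambda query: f"results for {query!r}",
    "summarize": lambda text: text[:80],
}

def dispatch(tool_name: str, *args):
    """Execute a tool only if it exists in the fixed capability table."""
    try:
        tool = ALLOWED_TOOLS[tool_name]
    except KeyError:
        raise PermissionError(
            f"{tool_name!r} is outside the agent's capability set"
        ) from None
    return tool(*args)
```

No filter inspects the model's output for malice; the invariant is that anything outside the table is structurally unreachable.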
More below:
https://securityblueprints.io/posts/agent-perimeter-fallacy/
#AIAgentSecurity #OpenSource #Cybersecurity #AIGovernance #LLMSecurity
-
New open-source AI assistant IronCurtain adds a sandboxed control layer, letting LLMs run inside a virtual machine with strict security policies. No direct system access, yet full generative AI power. See how this approach could reshape secure AI deployments. #IronCurtain #OpenSourceAI #GenerativeAI #LLMSecurity
🔗 https://aidailypost.com/news/open-source-ai-assistant-ironcurtain-adds-control-layer-avoids-system
-
Prompt Injection Is the New Phishing. The most dangerous malware today doesn’t exploit code, it exploits instructions. https://youtu.be/Ze12t1iv81E #Cybersecurity #ArtificialIntelligence #AIsecurity #PromptInjection #AIGovernance #LLMSecurity #ThreatIntelligence #AIrisk #CISO
-
Prompt injection isn’t a text problem.
It’s an authority problem.
In this article, I show how to stop prompt injection in Java by enforcing real input boundaries using Quarkus, LangChain4j, Spotlighting, and StruQ.
No classifiers.
No regex guardrails.
Just architecture that holds under pressure.
https://www.the-main-thread.com/p/secure-llm-prompt-injection-quarkus-langchain4j
#Java #Quarkus #LLMSecurity #PromptInjection #LangChain4j #Architecture
-
An analysis of why many reported AI safety failures are artifacts of poor measurement, showing how non-refusal often produces unusable results. https://hackernoon.com/why-most-llm-jailbreaks-are-actually-empty #llmsecurity
-
Scare #Claude off your site with this content poisoning technique:
Content creators can embed a specific ‘magic string’ in <code> tags on their blogs. Claude then refuses to engage with the content.
https://aphyr.com/posts/403-blocking-claude
#claude #aiethics #llmsecurity #contentmoderation #techtips #theaicon