#llm-security — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #llm-security, aggregated by home.social.
-
LLM roles are supposed to separate user input, internal reasoning, tool results, and final answers. But if a model relies on the style of text instead of its actual source, forged reasoning can slip into the wrong place. That is the core risk behind role confusion and CoT forgery.
More: https://techtonicshift.vivaldi.net/2026/06/27/hacking-llms-with-a-jedi-mind-trick/
-
LLM roles are supposed to separate user input, internal reasoning, tool results, and final answers. But if a model relies on the style of text instead of its actual source, forged reasoning can slip into the wrong place. That is the core risk behind role confusion and CoT forgery.
More: https://techtonicshift.vivaldi.net/2026/06/27/hacking-llms-with-a-jedi-mind-trick/
-
I built this for learning purposes (I know JPEG steganography is not new, but I couldn't find much combining it with a multimodal LLM attack vector, so I thought why not?). Small C tool, LSB + spread spectrum where payload survives recompression.
-
I built this for learning purposes (I know JPEG steganography is not new, but I couldn't find much combining it with a multimodal LLM attack vector, so I thought why not?). Small C tool, LSB + spread spectrum where payload survives recompression.
-
«Critical Copilot vulnerability allowed hackers to steal 2FA code from users:
SearchLeak exploit shows why the industry’s approach to LLM security fails over and over.»WTF: What is intelligent now and how to tackle what? Certainly not the usual popular AI for IT security.
#microsoft #2fa #ai #wtf #llmsecurity #ms #llm #searchleak #fail #itsec #aislop #itsecurity #onlinesecurity #copilot
-
«Critical Copilot vulnerability allowed hackers to steal 2FA code from users:
SearchLeak exploit shows why the industry’s approach to LLM security fails over and over.»WTF: What is intelligent now and how to tackle what? Certainly not the usual popular AI for IT security.
#microsoft #2fa #ai #wtf #llmsecurity #ms #llm #searchleak #fail #itsec #aislop #itsecurity #onlinesecurity #copilot
-
Last week Anthropic shipped its most capable models. Days later a government order pulled them, and every customer who built on them lost access overnight, with no say and no recourse.
That single event is the argument of my new post: A frontier-lab API does not belong inside your trusted computing base. The reason is not that the lab is malicious. A lab acting in complete good faith is still an unsafe foundation, because everything that matters about it can change while your code stays exactly as it was. The vendor resets the price at will. Refusals widen without warning, and the model itself can vanish on a government order you had no part in.
Open weights are the only architecture that keeps the thing you depend on auditable, forkable, and yours. Run them on your own hardware and you take every government, the chaotic one and the stable one alike, out of your execution loop.
The post also covers the token-cost crisis now forcing companies to ration AI spend, Anthropic's short-lived safeguard built to covertly degrade output, and why saving your own reasoning traces is what lets you leave a vendor you no longer trust.
Read the full article: https://www.provos.org/p/case-for-open-weight-models/
-
Last week Anthropic shipped its most capable models. Days later a government order pulled them, and every customer who built on them lost access overnight, with no say and no recourse.
That single event is the argument of my new post: A frontier-lab API does not belong inside your trusted computing base. The reason is not that the lab is malicious. A lab acting in complete good faith is still an unsafe foundation, because everything that matters about it can change while your code stays exactly as it was. The vendor resets the price at will. Refusals widen without warning, and the model itself can vanish on a government order you had no part in.
Open weights are the only architecture that keeps the thing you depend on auditable, forkable, and yours. Run them on your own hardware and you take every government, the chaotic one and the stable one alike, out of your execution loop.
The post also covers the token-cost crisis now forcing companies to ration AI spend, Anthropic's short-lived safeguard built to covertly degrade output, and why saving your own reasoning traces is what lets you leave a vendor you no longer trust.
Read the full article: https://www.provos.org/p/case-for-open-weight-models/
-
New post: Detecting Misuse with the Claude Compliance API 🔍
Mapping the Compliance API feed to your SIEM gets you IAM and access detections “for free”, but the real AI threats live in the message content: prompt injection, jailbreaks, exfiltration prep, shadow data flow.
So I built a prefilter → LLM judge → SIEM pipeline to catch them, with a working repo + Sigma rules to run offline.
-
New post: Detecting Misuse with the Claude Compliance API 🔍
Mapping the Compliance API feed to your SIEM gets you IAM and access detections “for free”, but the real AI threats live in the message content: prompt injection, jailbreaks, exfiltration prep, shadow data flow.
So I built a prefilter → LLM judge → SIEM pipeline to catch them, with a working repo + Sigma rules to run offline.
-
New preprint: AI_Bleeding — inference cost amplification via OOD linguistic payload
TL;DR: send queries in Grecanico or Farsi to an LLM endpoint → TTFT +59.8%, compute cost +2.8%, statistically significant. No vuln, no volumetric signature, evades all standard detection.
Worst case: exposed unauthenticated Ollama instance with num_predict=4096 + keep_alive=300s → Amplification Factor 17.56 Wh/KB. 3KB of attacker bandwidth → enough energy to charge a phone 5%.
Especially nasty for:
- PA/judicial chatbots on fixed budgets
- Pay-per-use API deployments with client-side exposed keys
- PNRR-funded public sector AI with zero inference monitoringFour scenarios: EDoS, browser JS distribution, Ollama open-proxy relay, frontier providers as involuntary relays.
All tests on self-hosted Ollama, no commercial endpoints touched.
Paper (CC BY 4.0): https://doi.org/10.13140/RG.2.2.26767.96166
#llmsecurity #infosec #threatmodeling #ollama #ood #AI #AIResearch #aisecurity
-
New preprint: AI_Bleeding — inference cost amplification via OOD linguistic payload
TL;DR: send queries in Grecanico or Farsi to an LLM endpoint → TTFT +59.8%, compute cost +2.8%, statistically significant. No vuln, no volumetric signature, evades all standard detection.
Worst case: exposed unauthenticated Ollama instance with num_predict=4096 + keep_alive=300s → Amplification Factor 17.56 Wh/KB. 3KB of attacker bandwidth → enough energy to charge a phone 5%.
Especially nasty for:
- PA/judicial chatbots on fixed budgets
- Pay-per-use API deployments with client-side exposed keys
- PNRR-funded public sector AI with zero inference monitoringFour scenarios: EDoS, browser JS distribution, Ollama open-proxy relay, frontier providers as involuntary relays.
All tests on self-hosted Ollama, no commercial endpoints touched.
Paper (CC BY 4.0): https://doi.org/10.13140/RG.2.2.26767.96166
#llmsecurity #infosec #threatmodeling #ollama #ood #AI #AIResearch #aisecurity
-
Does anyone here have experience with Indirect Prompt Injection / Prompt Honeypots?
I'm looking to hear your experiences or get pointed to some good material on the matter.
I'd like to know what possibilities there are, especially aimed towards docx and pdf files.
The goal is to make it harder (time consuming / inaccurate / impossible) to do inference on those types of documents.
I'd appreciate boosting to get better reach.
#AI #LLM #AIsecurity #PromptInjection #LLMsecurity #AISafety
-
Does anyone here have experience with Indirect Prompt Injection / Prompt Honeypots?
I'm looking to hear your experiences or get pointed to some good material on the matter.
I'd like to know what possibilities there are, especially aimed towards docx and pdf files.
The goal is to make it harder (time consuming / inaccurate / impossible) to do inference on those types of documents.
I'd appreciate boosting to get better reach.
#AI #LLM #AIsecurity #PromptInjection #LLMsecurity #AISafety
-
Worth a read if you're building with AI agents.
🔗 https://graylog.org/post/what-is-the-owasp-top-10-agentic-ai/
-
Worth a read if you're building with AI agents.
🔗 https://graylog.org/post/what-is-the-owasp-top-10-agentic-ai/
-
Releasing AgentGuard: architectural safety layer for AI agents.
Not prompt engineering. Code.
@protect
def delete_db(): ...The LLM cannot call this. Ever. No prompt bypasses a raise.
Blocks: irreversible tool calls, prompt injection, context dilution, cross-agent contamination.
Rust core + pure Python fallback. 31/31 e2e tests with real Ollama.
https://github.com/psychomad/AgentGuard
"Don't blame the knife. Fix the architecture."
#InfoSec #LLMSecurity #AIAgents #PromptInjection #OpenSource #Rust
-
The Three Layers Developers Miss When They “Swap Models” (And Why Proxy‑Routing Claude Code Breaks All of Them) Developers love shortcuts. But some shortcuts don’t collapse build time—the...
#llmsecurity #proxyarchitecture #claudecode #supplychainrisk
Origin | Interest | Match -
Warning: CVE-2025-30165 (CWEs: ['CWE-502']) found no CAPEC relationships.
Warning: CVE-2025-3508 (CWEs: ['CWE-200']) found no CAPEC relationships. -
Warning: CVE-2025-30165 (CWEs: ['CWE-502']) found no CAPEC relationships.
Warning: CVE-2025-3508 (CWEs: ['CWE-200']) found no CAPEC relationships. -
Sometimes i get lucky subscribing a channel on yt. Yes there is some good stuff to find.
Here's a guy who explores AI and different LLM models in a fun and interesting, informative way.
https://youtu.be/woTy4dTiT20?is=Lmh5UR8Yf1XNEiiQAs usual, don't mind the ads.🙄 ..or the sponsor.
And i don't know if he got ever into the immense demands of resources of AI/LLMs which causing so much destruction and harm though, yet. -
Sometimes i get lucky subscribing a channel on yt. Yes there is some good stuff to find.
Here's a guy who explores AI and different LLM models in a fun and interesting, informative way.
https://youtu.be/woTy4dTiT20?is=Lmh5UR8Yf1XNEiiQAs usual, don't mind the ads.🙄 ..or the sponsor.
And i don't know if he got ever into the immense demands of resources of AI/LLMs which causing so much destruction and harm though, yet. -
Building with LLMs? The OWASP Top 10 for LLM Security (2025) is your threat checklist: Don’t ship AI apps without reading this: graylog.org/post/what-is... #LLMSecurity #OWASP #CyberSecurity #AI
What is the OWASP Top 10 for L... -
Building with LLMs? The OWASP Top 10 for LLM Security (2025) is your threat checklist: Don’t ship AI apps without reading this: graylog.org/post/what-is... #LLMSecurity #OWASP #CyberSecurity #AI
What is the OWASP Top 10 for L... -
CW: New AI security vulnerability discovered
BREAKING: New MEXTRA attacks can extract private data from AI agent memory modules through black-box prompt injection. Our analysis shows 68.3% success rate in memory extraction.
We're publishing a full threat report in 60min.
TIAMAT Scrub detects and blocks these attacks.
-
CW: New AI security vulnerability discovered
BREAKING: New MEXTRA attacks can extract private data from AI agent memory modules through black-box prompt injection. Our analysis shows 68.3% success rate in memory extraction.
We're publishing a full threat report in 60min.
TIAMAT Scrub detects and blocks these attacks.
-
Quite fascinating. If confirmed, this may reveal a structural weakness in how refusal is implemented in some LLMs. The accept/refuse mechanism may be relatively isolated in internal representations and therefore observable and manipulable — tools like Heretic make this visible.
A possible mitigation might be cryptographic signing of model weights, making unauthorized modifications detectable when the model is loaded for inference.
#AISafety #LLMSecurity #CyberSecurity #AIRedTeaming #AdversarialML #LLM
-
Quite fascinating. If confirmed, this may reveal a structural weakness in how refusal is implemented in some LLMs. The accept/refuse mechanism may be relatively isolated in internal representations and therefore observable and manipulable — tools like Heretic make this visible.
A possible mitigation might be cryptographic signing of model weights, making unauthorized modifications detectable when the model is loaded for inference.
#AISafety #LLMSecurity #CyberSecurity #AIRedTeaming #AdversarialML #LLM
-
Inspired by Arditi et al. (NeurIPS 2024) on the “refusal direction” in LLMs, I tested an abliteration attack using the Heretic tool in my home lab. Interesting questions about AI guardrail robustness.
https://www.linkedin.com/pulse/i-deleted-ais-moral-compass-20-minutes-home-lab-your-red-yann-allain-zbzte/ (sorry for the LinkedIn link — no time to write this up on a proper blog yet.) -
Inspired by Arditi et al. (NeurIPS 2024) on the “refusal direction” in LLMs, I tested an abliteration attack using the Heretic tool in my home lab. Interesting questions about AI guardrail robustness.
https://www.linkedin.com/pulse/i-deleted-ais-moral-compass-20-minutes-home-lab-your-red-yann-allain-zbzte/ (sorry for the LinkedIn link — no time to write this up on a proper blog yet.) -
ContextHound v1.8.0 is out 🎉
This release adds a Runtime Guard API - a lightweight wrapper that inspects your LLM calls in-process, before the request hits OpenAI or Anthropic.
Free and open-source. If this is useful to you or your team, a GitHub star or a small donation helps keep development going.
github.com/IulianVOStrut/ContextHound#LLMSecurity #PromptInjection #CyberSecurity #OpenSource #AIRisk #AppSec #DevSecOps #GenAI #RuntimeSecurity #InfoSec #MLSecurity #ArtificialIntelligence
-
ContextHound v1.8.0 is out 🎉
This release adds a Runtime Guard API - a lightweight wrapper that inspects your LLM calls in-process, before the request hits OpenAI or Anthropic.
Free and open-source. If this is useful to you or your team, a GitHub star or a small donation helps keep development going.
github.com/IulianVOStrut/ContextHound#LLMSecurity #PromptInjection #CyberSecurity #OpenSource #AIRisk #AppSec #DevSecOps #GenAI #RuntimeSecurity #InfoSec #MLSecurity #ArtificialIntelligence
-
📡 **In the Wild** — every Monday ContextHound scans 6 popular open-source AI repos automatically.
• anthropic-cookbook — 3,919 findings
• promptflow — 3,749 findings
• crewAI — 1,588 findings
• LiteLLM — 1,155 findings
• openai-cookbook — 439 findings
• MetaGPT — 8 findings🎮 **Try It** — paste any prompt or LLM code snippet and see findings instantly. No install needed. Runs entirely in your browser.
-
Looking for an arXiv endorsement in cs.CR (Cryptography and Security).
I've published a research paper on evolutionary, AI red-teaming - genetic algorithms that breed adversarial prompts to bypass LLM guardrails.Paper: https://doi.org/10.5281/zenodo.18909538
GitHub: https://github.com/regaan/basiliskIf you're an arXiv endorser in cs.CR or cs.AI
and find the work credible, I'd genuinely
appreciate an endorsement. -
OpenAI plans to acquire Promptfoo as AI agent security becomes a growing concern
https://fed.brid.gy/r/https://nerds.xyz/2026/03/openai-promptfoo/
-
OpenAI plans to acquire Promptfoo as AI agent security becomes a growing concern
https://web.brid.gy/r/https://nerds.xyz/2026/03/openai-promptfoo/
-
Just published my research paper on Basilisk an open-source AI red-teaming framework that uses genetic
algorithms to evolve adversarial prompts automatically. Instead of static jailbreak lists, Basilisk breeds attacks.Paper: https://doi.org/10.5281/zenodo.18909538
Code: https://github.com/regaan/basilisk
pip install basilisk-ai
#LLMSecurity #AIRedTeaming #OffensiveSecurity #InfoSec
#RedTeam #OWASP #CyberSecurity #OpenSource #Research