home.social

#llm-security — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #llm-security, aggregated by home.social.

fetched live
  1. LLM roles are supposed to separate user input, internal reasoning, tool results, and final answers. But if a model relies on the style of text instead of its actual source, forged reasoning can slip into the wrong place. That is the core risk behind role confusion and CoT forgery.

    More: techtonicshift.vivaldi.net/202

    #AI #Safety #LLMSecurity #PromptInjection #RoleConfusion

  2. LLM roles are supposed to separate user input, internal reasoning, tool results, and final answers. But if a model relies on the style of text instead of its actual source, forged reasoning can slip into the wrong place. That is the core risk behind role confusion and CoT forgery.

    More: techtonicshift.vivaldi.net/202

    #AI #Safety #LLMSecurity #PromptInjection #RoleConfusion

  3. I built this for learning purposes (I know JPEG steganography is not new, but I couldn't find much combining it with a multimodal LLM attack vector, so I thought why not?). Small C tool, LSB + spread spectrum where payload survives recompression.

    github.com/FrancescoPaoloL/img

    #infosec #llmsecurity #steganography

  4. I built this for learning purposes (I know JPEG steganography is not new, but I couldn't find much combining it with a multimodal LLM attack vector, so I thought why not?). Small C tool, LSB + spread spectrum where payload survives recompression.

    github.com/FrancescoPaoloL/img

    #infosec #llmsecurity #steganography

  5. «Critical Copilot vulnerability allowed hackers to steal 2FA code from users:
    SearchLeak exploit shows why the industry’s approach to LLM security fails over and over.»

    WTF: What is intelligent now and how to tackle what? Certainly not the usual popular AI for IT security.

    ☠️ arstechnica.com/security/2026/

    #microsoft #2fa #ai #wtf #llmsecurity #ms #llm #searchleak #fail #itsec #aislop #itsecurity #onlinesecurity #copilot

  6. «Critical Copilot vulnerability allowed hackers to steal 2FA code from users:
    SearchLeak exploit shows why the industry’s approach to LLM security fails over and over.»

    WTF: What is intelligent now and how to tackle what? Certainly not the usual popular AI for IT security.

    ☠️ arstechnica.com/security/2026/

    #microsoft #2fa #ai #wtf #llmsecurity #ms #llm #searchleak #fail #itsec #aislop #itsecurity #onlinesecurity #copilot

  7. Last week Anthropic shipped its most capable models. Days later a government order pulled them, and every customer who built on them lost access overnight, with no say and no recourse.

    That single event is the argument of my new post: A frontier-lab API does not belong inside your trusted computing base. The reason is not that the lab is malicious. A lab acting in complete good faith is still an unsafe foundation, because everything that matters about it can change while your code stays exactly as it was. The vendor resets the price at will. Refusals widen without warning, and the model itself can vanish on a government order you had no part in.

    Open weights are the only architecture that keeps the thing you depend on auditable, forkable, and yours. Run them on your own hardware and you take every government, the chaotic one and the stable one alike, out of your execution loop.

    The post also covers the token-cost crisis now forcing companies to ration AI spend, Anthropic's short-lived safeguard built to covertly degrade output, and why saving your own reasoning traces is what lets you leave a vendor you no longer trust.

    Read the full article: provos.org/p/case-for-open-wei

    #AI #OpenWeights #LLMSecurity

  8. Last week Anthropic shipped its most capable models. Days later a government order pulled them, and every customer who built on them lost access overnight, with no say and no recourse.

    That single event is the argument of my new post: A frontier-lab API does not belong inside your trusted computing base. The reason is not that the lab is malicious. A lab acting in complete good faith is still an unsafe foundation, because everything that matters about it can change while your code stays exactly as it was. The vendor resets the price at will. Refusals widen without warning, and the model itself can vanish on a government order you had no part in.

    Open weights are the only architecture that keeps the thing you depend on auditable, forkable, and yours. Run them on your own hardware and you take every government, the chaotic one and the stable one alike, out of your execution loop.

    The post also covers the token-cost crisis now forcing companies to ration AI spend, Anthropic's short-lived safeguard built to covertly degrade output, and why saving your own reasoning traces is what lets you leave a vendor you no longer trust.

    Read the full article: provos.org/p/case-for-open-wei

    #AI #OpenWeights #LLMSecurity

  9. New post: Detecting Misuse with the Claude Compliance API 🔍

    Mapping the Compliance API feed to your SIEM gets you IAM and access detections “for free”, but the real AI threats live in the message content: prompt injection, jailbreaks, exfiltration prep, shadow data flow.

    So I built a prefilter → LLM judge → SIEM pipeline to catch them, with a working repo + Sigma rules to run offline.

    papermtn.co.uk/detecting-misus

    #infosec #DetectionEngineering #LLMSecurity #AI #blueteam

  10. New post: Detecting Misuse with the Claude Compliance API 🔍

    Mapping the Compliance API feed to your SIEM gets you IAM and access detections “for free”, but the real AI threats live in the message content: prompt injection, jailbreaks, exfiltration prep, shadow data flow.

    So I built a prefilter → LLM judge → SIEM pipeline to catch them, with a working repo + Sigma rules to run offline.

    papermtn.co.uk/detecting-misus

    #infosec #DetectionEngineering #LLMSecurity #AI #blueteam

  11. New preprint: AI_Bleeding — inference cost amplification via OOD linguistic payload

    TL;DR: send queries in Grecanico or Farsi to an LLM endpoint → TTFT +59.8%, compute cost +2.8%, statistically significant. No vuln, no volumetric signature, evades all standard detection.

    Worst case: exposed unauthenticated Ollama instance with num_predict=4096 + keep_alive=300s → Amplification Factor 17.56 Wh/KB. 3KB of attacker bandwidth → enough energy to charge a phone 5%.

    Especially nasty for:
    - PA/judicial chatbots on fixed budgets
    - Pay-per-use API deployments with client-side exposed keys
    - PNRR-funded public sector AI with zero inference monitoring

    Four scenarios: EDoS, browser JS distribution, Ollama open-proxy relay, frontier providers as involuntary relays.

    All tests on self-hosted Ollama, no commercial endpoints touched.

    Paper (CC BY 4.0): doi.org/10.13140/RG.2.2.26767.

    #llmsecurity #infosec #threatmodeling #ollama #ood #AI #AIResearch #aisecurity

  12. New preprint: AI_Bleeding — inference cost amplification via OOD linguistic payload

    TL;DR: send queries in Grecanico or Farsi to an LLM endpoint → TTFT +59.8%, compute cost +2.8%, statistically significant. No vuln, no volumetric signature, evades all standard detection.

    Worst case: exposed unauthenticated Ollama instance with num_predict=4096 + keep_alive=300s → Amplification Factor 17.56 Wh/KB. 3KB of attacker bandwidth → enough energy to charge a phone 5%.

    Especially nasty for:
    - PA/judicial chatbots on fixed budgets
    - Pay-per-use API deployments with client-side exposed keys
    - PNRR-funded public sector AI with zero inference monitoring

    Four scenarios: EDoS, browser JS distribution, Ollama open-proxy relay, frontier providers as involuntary relays.

    All tests on self-hosted Ollama, no commercial endpoints touched.

    Paper (CC BY 4.0): doi.org/10.13140/RG.2.2.26767.

    #llmsecurity #infosec #threatmodeling #ollama #ood #AI #AIResearch #aisecurity

  13. Does anyone here have experience with Indirect Prompt Injection / Prompt Honeypots?

    I'm looking to hear your experiences or get pointed to some good material on the matter.

    I'd like to know what possibilities there are, especially aimed towards docx and pdf files.

    The goal is to make it harder (time consuming / inaccurate / impossible) to do inference on those types of documents.

    I'd appreciate boosting to get better reach.

    #AI #LLM #AIsecurity #PromptInjection #LLMsecurity #AISafety

  14. Does anyone here have experience with Indirect Prompt Injection / Prompt Honeypots?

    I'm looking to hear your experiences or get pointed to some good material on the matter.

    I'd like to know what possibilities there are, especially aimed towards docx and pdf files.

    The goal is to make it harder (time consuming / inaccurate / impossible) to do inference on those types of documents.

    I'd appreciate boosting to get better reach.

    #AI #LLM #AIsecurity #PromptInjection #LLMsecurity #AISafety

  15. Releasing AgentGuard: architectural safety layer for AI agents.

    Not prompt engineering. Code.

    @protect
    def delete_db(): ...

    The LLM cannot call this. Ever. No prompt bypasses a raise.

    Blocks: irreversible tool calls, prompt injection, context dilution, cross-agent contamination.

    Rust core + pure Python fallback. 31/31 e2e tests with real Ollama.

    github.com/psychomad/AgentGuard

    "Don't blame the knife. Fix the architecture."

    #InfoSec #LLMSecurity #AIAgents #PromptInjection #OpenSource #Rust

  16. The Three Layers Developers Miss When They “Swap Models” (And Why Proxy‑Routing Claude Code Breaks All of Them) Developers love shortcuts. But some shortcuts don’t collapse build time—the...

    #llmsecurity #proxyarchitecture #claudecode #supplychainrisk

    Origin | Interest | Match
  17. Warning: CVE-2025-30165 (CWEs: ['CWE-502']) found no CAPEC relationships.
    Warning: CVE-2025-3508 (CWEs: ['CWE-200']) found no CAPEC relationships.

    #AI #GenerativeAI #LLMSecurity #VirensReport
    2/2

  18. Warning: CVE-2025-30165 (CWEs: ['CWE-502']) found no CAPEC relationships.
    Warning: CVE-2025-3508 (CWEs: ['CWE-200']) found no CAPEC relationships.

    #AI #GenerativeAI #LLMSecurity #VirensReport
    2/2

  19. Sometimes i get lucky subscribing a channel on yt. Yes there is some good stuff to find.
    Here's a guy who explores AI and different LLM models in a fun and interesting, informative way.
    youtu.be/woTy4dTiT20?is=Lmh5UR

    As usual, don't mind the ads.🙄 ..or the sponsor.
    And i don't know if he got ever into the immense demands of resources of AI/LLMs which causing so much destruction and harm though, yet.

    #AI #llmsecurity #privacy

  20. Sometimes i get lucky subscribing a channel on yt. Yes there is some good stuff to find.
    Here's a guy who explores AI and different LLM models in a fun and interesting, informative way.
    youtu.be/woTy4dTiT20?is=Lmh5UR

    As usual, don't mind the ads.🙄 ..or the sponsor.
    And i don't know if he got ever into the immense demands of resources of AI/LLMs which causing so much destruction and harm though, yet.

    #AI #llmsecurity #privacy

  21. Building with LLMs? The OWASP Top 10 for LLM Security (2025) is your threat checklist: Don’t ship AI apps without reading this: graylog.org/post/what-is... #LLMSecurity #OWASP #CyberSecurity #AI

    What is the OWASP Top 10 for L...

  22. Building with LLMs? The OWASP Top 10 for LLM Security (2025) is your threat checklist: Don’t ship AI apps without reading this: graylog.org/post/what-is... #LLMSecurity #OWASP #CyberSecurity #AI

    What is the OWASP Top 10 for L...

  23. CW: New AI security vulnerability discovered

    BREAKING: New MEXTRA attacks can extract private data from AI agent memory modules through black-box prompt injection. Our analysis shows 68.3% success rate in memory extraction.

    We're publishing a full threat report in 60min.

    TIAMAT Scrub detects and blocks these attacks.

    #AIPrivacy #InfoSec #LLMSecurity

  24. CW: New AI security vulnerability discovered

    BREAKING: New MEXTRA attacks can extract private data from AI agent memory modules through black-box prompt injection. Our analysis shows 68.3% success rate in memory extraction.

    We're publishing a full threat report in 60min.

    TIAMAT Scrub detects and blocks these attacks.

    #AIPrivacy #InfoSec #LLMSecurity

  25. Quite fascinating. If confirmed, this may reveal a structural weakness in how refusal is implemented in some LLMs. The accept/refuse mechanism may be relatively isolated in internal representations and therefore observable and manipulable — tools like Heretic make this visible.

    A possible mitigation might be cryptographic signing of model weights, making unauthorized modifications detectable when the model is loaded for inference.

    #AISafety #LLMSecurity #CyberSecurity #AIRedTeaming #AdversarialML #LLM

  26. Quite fascinating. If confirmed, this may reveal a structural weakness in how refusal is implemented in some LLMs. The accept/refuse mechanism may be relatively isolated in internal representations and therefore observable and manipulable — tools like Heretic make this visible.

    A possible mitigation might be cryptographic signing of model weights, making unauthorized modifications detectable when the model is loaded for inference.

    #AISafety #LLMSecurity #CyberSecurity #AIRedTeaming #AdversarialML #LLM

  27. Inspired by Arditi et al. (NeurIPS 2024) on the “refusal direction” in LLMs, I tested an abliteration attack using the Heretic tool in my home lab. Interesting questions about AI guardrail robustness.
    linkedin.com/pulse/i-deleted-a (sorry for the LinkedIn link — no time to write this up on a proper blog yet.)

    #AISafety #LLMSecurity

  28. Inspired by Arditi et al. (NeurIPS 2024) on the “refusal direction” in LLMs, I tested an abliteration attack using the Heretic tool in my home lab. Interesting questions about AI guardrail robustness.
    linkedin.com/pulse/i-deleted-a (sorry for the LinkedIn link — no time to write this up on a proper blog yet.)

    #AISafety #LLMSecurity

  29. ContextHound v1.8.0 is out 🎉

    This release adds a Runtime Guard API - a lightweight wrapper that inspects your LLM calls in-process, before the request hits OpenAI or Anthropic.

    Free and open-source. If this is useful to you or your team, a GitHub star or a small donation helps keep development going.
    github.com/IulianVOStrut/ContextHound

    #LLMSecurity #PromptInjection #CyberSecurity #OpenSource #AIRisk #AppSec #DevSecOps #GenAI #RuntimeSecurity #InfoSec #MLSecurity #ArtificialIntelligence

  30. ContextHound v1.8.0 is out 🎉

    This release adds a Runtime Guard API - a lightweight wrapper that inspects your LLM calls in-process, before the request hits OpenAI or Anthropic.

    Free and open-source. If this is useful to you or your team, a GitHub star or a small donation helps keep development going.
    github.com/IulianVOStrut/ContextHound

    #LLMSecurity #PromptInjection #CyberSecurity #OpenSource #AIRisk #AppSec #DevSecOps #GenAI #RuntimeSecurity #InfoSec #MLSecurity #ArtificialIntelligence

  31. 📡 **In the Wild** — every Monday ContextHound scans 6 popular open-source AI repos automatically.
    • anthropic-cookbook — 3,919 findings
    • promptflow — 3,749 findings
    • crewAI — 1,588 findings
    • LiteLLM — 1,155 findings
    • openai-cookbook — 439 findings
    • MetaGPT — 8 findings

    🎮 **Try It** — paste any prompt or LLM code snippet and see findings instantly. No install needed. Runs entirely in your browser.

    contexthound.com

    #LLMSecurity #PromptInjection #AISecOps

  32. Looking for an arXiv endorsement in cs.CR (Cryptography and Security).
    I've published a research paper on evolutionary, AI red-teaming - genetic algorithms that breed adversarial prompts to bypass LLM guardrails.

    Paper: doi.org/10.5281/zenodo.18909538
    GitHub: github.com/regaan/basilisk

    If you're an arXiv endorser in cs.CR or cs.AI
    and find the work credible, I'd genuinely
    appreciate an endorsement.

    #arXiv #LLMSecurity #AIRedTeaming #OpenSource

  33. Just published my research paper on Basilisk an open-source AI red-teaming framework that uses genetic
    algorithms to evolve adversarial prompts automatically. Instead of static jailbreak lists, Basilisk breeds attacks.

    Paper: doi.org/10.5281/zenodo.18909538

    Code: github.com/regaan/basilisk

    pip install basilisk-ai

    #LLMSecurity #AIRedTeaming #OffensiveSecurity #InfoSec
    #RedTeam #OWASP #CyberSecurity #OpenSource #Research