home.social

#llm-security — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #llm-security, aggregated by home.social.

fetched live
  1. Releasing AgentGuard: architectural safety layer for AI agents.

    Not prompt engineering. Code.

    @protect
    def delete_db(): ...

    The LLM cannot call this. Ever. No prompt bypasses a raise.

    Blocks: irreversible tool calls, prompt injection, context dilution, cross-agent contamination.

    Rust core + pure Python fallback. 31/31 e2e tests with real Ollama.

    github.com/psychomad/AgentGuard

    "Don't blame the knife. Fix the architecture."

    #InfoSec #LLMSecurity #AIAgents #PromptInjection #OpenSource #Rust

  2. The Three Layers Developers Miss When They “Swap Models” (And Why Proxy‑Routing Claude Code Breaks All of Them) Developers love shortcuts. But some shortcuts don’t collapse build time—the...

    #llmsecurity #proxyarchitecture #claudecode #supplychainrisk

    Origin | Interest | Match
  3. Warning: CVE-2025-30165 (CWEs: ['CWE-502']) found no CAPEC relationships.
    Warning: CVE-2025-3508 (CWEs: ['CWE-200']) found no CAPEC relationships.

    #AI #GenerativeAI #LLMSecurity #VirensReport
    2/2

  4. Sometimes i get lucky subscribing a channel on yt. Yes there is some good stuff to find.
    Here's a guy who explores AI and different LLM models in a fun and interesting, informative way.
    youtu.be/woTy4dTiT20?is=Lmh5UR

    As usual, don't mind the ads.🙄 ..or the sponsor.
    And i don't know if he got ever into the immense demands of resources of AI/LLMs which causing so much destruction and harm though, yet.

    #AI #llmsecurity #privacy

  5. Building with LLMs? The OWASP Top 10 for LLM Security (2025) is your threat checklist: Don’t ship AI apps without reading this: graylog.org/post/what-is... #LLMSecurity #OWASP #CyberSecurity #AI

    What is the OWASP Top 10 for L...

  6. CW: New AI security vulnerability discovered

    BREAKING: New MEXTRA attacks can extract private data from AI agent memory modules through black-box prompt injection. Our analysis shows 68.3% success rate in memory extraction.

    We're publishing a full threat report in 60min.

    TIAMAT Scrub detects and blocks these attacks.

    #AIPrivacy #InfoSec #LLMSecurity

  7. Quite fascinating. If confirmed, this may reveal a structural weakness in how refusal is implemented in some LLMs. The accept/refuse mechanism may be relatively isolated in internal representations and therefore observable and manipulable — tools like Heretic make this visible.

    A possible mitigation might be cryptographic signing of model weights, making unauthorized modifications detectable when the model is loaded for inference.

    #AISafety #LLMSecurity #CyberSecurity #AIRedTeaming #AdversarialML #LLM

  8. Inspired by Arditi et al. (NeurIPS 2024) on the “refusal direction” in LLMs, I tested an abliteration attack using the Heretic tool in my home lab. Interesting questions about AI guardrail robustness.
    linkedin.com/pulse/i-deleted-a (sorry for the LinkedIn link — no time to write this up on a proper blog yet.)

    #AISafety #LLMSecurity

  9. ContextHound v1.8.0 is out 🎉

    This release adds a Runtime Guard API - a lightweight wrapper that inspects your LLM calls in-process, before the request hits OpenAI or Anthropic.

    Free and open-source. If this is useful to you or your team, a GitHub star or a small donation helps keep development going.
    github.com/IulianVOStrut/ContextHound

    #LLMSecurity #PromptInjection #CyberSecurity #OpenSource #AIRisk #AppSec #DevSecOps #GenAI #RuntimeSecurity #InfoSec #MLSecurity #ArtificialIntelligence

  10. Just published my research paper on Basilisk an open-source AI red-teaming framework that uses genetic
    algorithms to evolve adversarial prompts automatically. Instead of static jailbreak lists, Basilisk breeds attacks.

    Paper: doi.org/10.5281/zenodo.18909538

    Code: github.com/regaan/basilisk

    pip install basilisk-ai

    #LLMSecurity #AIRedTeaming #OffensiveSecurity #InfoSec
    #RedTeam #OWASP #CyberSecurity #OpenSource #Research

  11. It seems that the AI agent security industry may be repeating familiar mistakes: reaching for detection as a first-line preventative control instead of doing the structural work.

    Detection is not prevention. A filter that can be probed and evaded by the system it is protecting is not a control. It is a delay.

    Instead, treating security as an engineering problem leads to invariants: what can we make structurally impossible? What attack surface can we completely eliminate? Detection comes after, augmenting a foundation that does not depend on it.

    For AI agents, the structural question is: can we constrain the agent to a path aligned with human intent, rather than trying to detect whether it behaves maliciously?

    More below:
    securityblueprints.io/posts/ag

    #AIAgentSecurity #OpenSource #Cybersecurity #AIGovernance #LLMSecurity

  12. New open-source AI assistant IronCurtain adds a sandboxed control layer, letting LLMs run inside a virtual machine with strict security policies. No direct system access, yet full generative AI power. See how this approach could reshape secure AI deployments. #IronCurtain #OpenSourceAI #GenerativeAI #LLMSecurity

    🔗 aidailypost.com/news/open-sour

  13. Prompt injection isn’t a text problem.
    It’s an authority problem.

    In this article, I show how to stop prompt injection in Java by enforcing real input boundaries using Quarkus, LangChain4j, Spotlighting, and StruQ.

    No classifiers.
    No regex guardrails.
    Just architecture that holds under pressure.

    the-main-thread.com/p/secure-l

    #Java #Quarkus #LLMSecurity #PromptInjection #LangChain4j #Architecture

  14. An analysis of why many reported AI safety failures are artifacts of poor measurement, showing how non-refusal often produces unusable results. hackernoon.com/why-most-llm-ja #llmsecurity

  15. Scare #Claude off your site with this content poisoning technique:

    Content creators can embed a specific ‘magic string’ in <code> tags on their blogs. Claude then refuses to engage with the content.

    aphyr.com/posts/403-blocking-c

    #claude #aiethics #llmsecurity #contentmoderation #techtips #theaicon