home.social

#airedteaming — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #airedteaming, aggregated by home.social.

  1. Quite fascinating. If confirmed, this may reveal a structural weakness in how refusal is implemented in some LLMs. The accept/refuse mechanism may be relatively isolated in internal representations and therefore observable and manipulable — tools like Heretic make this visible.

    A possible mitigation might be cryptographic signing of model weights, making unauthorized modifications detectable when the model is loaded for inference.

    #AISafety #LLMSecurity #CyberSecurity #AIRedTeaming #AdversarialML #LLM
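The weight-integrity idea in the post above can be illustrated with a minimal sketch. It is not taken from the post or from any particular tool. It assumes the weights are available as raw bytes and uses an HMAC-SHA256 tag from the Python standard library as a stand-in for a real digital signature (a production setup would more likely use an asymmetric scheme such as Ed25519, so that the loader only needs a public key). The names `sign_weights` and `verify_weights` and the demo key are hypothetical.

```python
import hashlib
import hmac


def sign_weights(weights: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the raw weight bytes (hypothetical helper)."""
    return hmac.new(key, weights, hashlib.sha256).hexdigest()


def verify_weights(weights: bytes, key: bytes, expected_tag: str) -> bool:
    """Recompute the tag at load time and compare it in constant time."""
    actual_tag = sign_weights(weights, key)
    return hmac.compare_digest(actual_tag, expected_tag)


# Demo values only: a placeholder key and fake "weights".
key = b"demo-key-not-for-production"
original = b"\x00\x01\x02\x03" * 1024
tag = sign_weights(original, key)

# Simulate an unauthorized modification by flipping bits in one byte.
tampered = bytearray(original)
tampered[10] ^= 0xFF
tampered = bytes(tampered)

print("original verifies:", verify_weights(original, key, tag))
print("tampered verifies:", verify_weights(tampered, key, tag))
```

A check like this only detects changes to the file being loaded. It does nothing against someone who runs a modified copy without the check.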

  2. Just published my research paper on Basilisk, an open-source AI red-teaming framework that uses genetic algorithms to evolve adversarial prompts automatically. Instead of static jailbreak lists, Basilisk breeds attacks.

    Paper: doi.org/10.5281/zenodo.18909538

    Code: github.com/regaan/basilisk

    pip install basilisk-ai

    #LLMSecurity #AIRedTeaming #OffensiveSecurity #InfoSec #RedTeam #OWASP #CyberSecurity #OpenSource #Research
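For readers unfamiliar with genetic algorithms, the loop of selection, crossover, and mutation that the post above alludes to can be sketched generically. This is not Basilisk's implementation, and the post does not describe its operators or scoring. The sketch below evolves plain strings toward a harmless placeholder target. `TARGET`, `fitness`, `mutate`, `crossover`, and `evolve` are all hypothetical names chosen for illustration; in a red-teaming setting the fitness function would presumably score a model's response instead.

```python
import random

# Placeholder objective: match this harmless string (an assumption for the demo).
TARGET = "hello world"
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def fitness(candidate: str) -> int:
    """Count positions where the candidate matches the target."""
    return sum(a == b for a, b in zip(candidate, TARGET))


def mutate(candidate: str, rng: random.Random, rate: float = 0.1) -> str:
    """Replace each character with a random one with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else ch for ch in candidate
    )


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover: prefix from one parent, suffix from the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(generations: int = 200, pop_size: int = 50, seed: int = 0):
    """Run a simple elitist GA; return the best string and per-generation best scores."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(pop_size)
    ]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        if history[-1] == len(TARGET):
            break
        # Keep the top 20% unchanged (elitism) and breed the rest from them.
        parents = population[: pop_size // 5]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = max(population, key=fitness)
    return best, history


best, history = evolve()
print(best, fitness(best), len(history))
```

Because the top candidates are carried over unchanged each generation, the best score never decreases. That is the main practical difference from sampling from a static list.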