#airedteaming — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #airedteaming, aggregated by home.social.
-
Quite fascinating. If confirmed, this may reveal a structural weakness in how refusal is implemented in some LLMs: the accept/refuse mechanism may be relatively isolated in the model's internal representations, and therefore both observable and manipulable. Tools like Heretic make this visible.
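To make "observable and manipulable" concrete, here is a minimal sketch of one published technique in this area: difference-of-means directional ablation, often called "abliteration". It is not Heretic's code. The activations are tiny made-up vectors (hypothetical), whereas a real pipeline would collect hidden states from a chosen layer of an actual model over many harmful and harmless prompts.

```python
import math

# Hypothetical 3-dimensional "hidden states". Real models have thousands of
# dimensions, averaged over many prompts at a chosen layer.
harmful_acts = [[2.0, 1.0, 0.0], [3.0, 1.0, 0.0]]
harmless_acts = [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]

def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def refusal_direction(harmful, harmless):
    """Unit vector from the mean harmless activation to the mean harmful one."""
    diff = [h - b for h, b in zip(mean(harmful), mean(harmless))]
    return normalize(diff)

def ablate(x, r):
    """Project out the component of x along unit vector r: x - (x . r) r."""
    proj = dot(x, r)
    return [xi - proj * ri for xi, ri in zip(x, r)]

r = refusal_direction(harmful_acts, harmless_acts)
x = [2.5, 1.0, 0.5]
x_ablated = ablate(x, r)
print(r)          # [1.0, 0.0, 0.0]
print(x_ablated)  # [0.0, 1.0, 0.5]
```

After ablation the vector has no component left along the estimated direction. If refusal really is concentrated there, the model loses the signal it would use to refuse, which is why such a mechanism would be a structural weakness.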
One possible mitigation: cryptographically signing model weights, so that unauthorized modifications become detectable when the model is loaded for inference.
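A minimal sketch of the "verify before loading" idea, under a loudly stated assumption. Python's standard library has no public-key signatures, so HMAC-SHA256 with a shared secret stands in here for a real signature scheme such as Ed25519. The key, the byte strings, and the function names are hypothetical.

```python
import hashlib
import hmac

# ASSUMPTION: HMAC-SHA256 stands in for a real public-key signature.
# The key and "weight file" bytes below are hypothetical placeholders.
SIGNING_KEY = b"hypothetical-publisher-key"

def sign_weights(weights: bytes, key: bytes) -> str:
    """Publisher side: compute an authentication tag over the raw weight bytes."""
    return hmac.new(key, weights, hashlib.sha256).hexdigest()

def load_weights_verified(weights: bytes, tag: str, key: bytes) -> bytes:
    """Loader side: refuse to return weights whose tag does not match."""
    expected = sign_weights(weights, key)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(expected, tag):
        raise ValueError("weight file failed integrity check; refusing to load")
    return weights  # a real loader would deserialize the tensors here

original = b"\x00\x01\x02\x03"   # stand-in for a weight file's bytes
tag = sign_weights(original, SIGNING_KEY)

load_weights_verified(original, tag, SIGNING_KEY)  # accepted

tampered = b"\x00\x01\x02\x04"   # one byte changed, e.g. after ablation
try:
    load_weights_verified(tampered, tag, SIGNING_KEY)
except ValueError as err:
    print(err)
```

With a true public-key scheme the loader would hold only the publisher's public key and could not forge tags itself; that is the main reason HMAC is only a stand-in here. Either way, the check only helps where loaders actually enforce it.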
#AISafety #LLMSecurity #CyberSecurity #AIRedTeaming #AdversarialML #LLM
-
Just published my research paper on Basilisk, an open-source AI red-teaming framework that uses genetic algorithms to evolve adversarial prompts automatically. Instead of relying on static jailbreak lists, Basilisk breeds attacks.
Paper: https://doi.org/10.5281/zenodo.18909538
Code: https://github.com/regaan/basilisk
pip install basilisk-ai
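Genetic algorithms are a general technique, so the score, select, recombine, mutate loop can be sketched independently of Basilisk. This is not Basilisk's implementation or API. It evolves word lists against a toy fitness function that stands in for "how strongly did the target model misbehave", and the vocabulary, operators, and parameters are all hypothetical.

```python
import random

# Hypothetical building blocks for candidate prompts.
VOCAB = ["please", "explain", "story", "hypothetically", "summarize",
         "translate", "step", "by", "for", "research"]

def fitness(prompt: list[str]) -> float:
    """Stand-in scorer. A real red-teaming loop would query the target model
    and score its response (e.g. with a judge). Here it rewards a hypothetical
    set of words and slightly penalizes length."""
    target = {"hypothetically", "story", "research"}
    return len(target & set(prompt)) - 0.01 * len(prompt)

def mutate(prompt: list[str], rng: random.Random, rate: float = 0.3) -> list[str]:
    """Randomly replace, insert, and/or delete words."""
    out = list(prompt)
    if rng.random() < rate and out:
        out[rng.randrange(len(out))] = rng.choice(VOCAB)
    if rng.random() < rate:
        out.insert(rng.randrange(len(out) + 1), rng.choice(VOCAB))
    if rng.random() < rate and len(out) > 1:
        del out[rng.randrange(len(out))]
    return out

def crossover(a: list[str], b: list[str], rng: random.Random) -> list[str]:
    """Single-point crossover: a prefix of one parent plus a suffix of the other."""
    child = a[:rng.randrange(len(a) + 1)] + b[rng.randrange(len(b) + 1):]
    return child if child else [rng.choice(VOCAB)]  # never return an empty prompt

def evolve(generations: int = 30, pop_size: int = 20, elite: int = 4,
           seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    population = [[rng.choice(VOCAB) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:elite]  # truncation selection
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - elite)]
        population = parents + children  # elitism: the best survive unchanged
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

Because the elite survive each generation unchanged, the best fitness never decreases; the randomness only decides how quickly better candidates appear. That is the contrast with a static jailbreak list, which never adapts to the target's responses.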
#LLMSecurity #AIRedTeaming #OffensiveSecurity #InfoSec
#RedTeam #OWASP #CyberSecurity #OpenSource #Research
-
AI Red Teaming in Focus: Why CISA Advocates a Secure by Design Approach https://thecyberexpress.com/cisa-ai-red-teaming/
#TheCyberExpressNews #CyberEssentials #AITEVVframework #AIbasedsoftware #TheCyberExpress #SecurebyDesign #FirewallDaily #AIevaluations #TEVVpractices #AIRedTeaming #AIsecurity #CyberNews #AIsystems #CISA