“assaf” — Fediverse search results on home.social

PPC Land @[email protected] · 2026-03-17 · 07:47 UTC

FYI: Meta deploys AI and law enforcement to fight scams across Facebook, WhatsApp: Meta today launched AI scam tools across Facebook, Messenger, and WhatsApp, expanded advertiser verification to 90%, and aided 21 arrests in Thailand. Here's what it means. https://ppc.land/meta-deploys-ai-and-law-enforcement-to-fight-scams-across-facebook-whatsapp/ #AISafety #Meta #FacebookScams #WhatsAppSecurity #DigitalSafety

#aisafety #meta #facebookscams #whatsappsecurity #digitalsafety

fukami @[email protected] · 2026-03-16 · 20:51 UTC

minitrace is up on GitHub as v0.1.0: https://github.com/fukami/minitrace

minitrace defines how to capture complete sessions (turns, tool calls, failures, timing, and human context) in a way that enables cross-model comparison and reproducible behavioural research.

The repository now contains adapters for Claude Code, Gemini, Vibe and a bunch of others, including OpenClaw. I also included example traces and DuckDB queries to search through the sessions.

#AISafety #AIAlignment

#aisafety #aialignment

Hack in Days of Future Past @[email protected] · 2026-03-15 · 15:09 UTC

Quite fascinating. If confirmed, this may reveal a structural weakness in how refusal is implemented in some LLMs. The accept/refuse mechanism may be relatively isolated in internal representations and therefore observable and manipulable — tools like Heretic make this visible.

A possible mitigation might be cryptographic signing of model weights, making unauthorized modifications detectable when the model is loaded for inference.

#AISafety #LLMSecurity #CyberSecurity #AIRedTeaming #AdversarialML #LLM

#aisafety #llmsecurity #cybersecurity #airedteaming #adversarialml #llm
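
The mitigation suggested in the post above (detecting modified weights at load time) could look roughly like the following. This is a minimal sketch, not any tool's real API: the function names `sign_weights`, `verify_weights` and `load_checked` are hypothetical, and HMAC-SHA256 from the Python standard library stands in for a true asymmetric signature scheme, which a real deployment would more likely use.

```python
import hashlib
import hmac


def sign_weights(weights: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the raw weight bytes."""
    return hmac.new(key, weights, hashlib.sha256).hexdigest()


def verify_weights(weights: bytes, key: bytes, expected_tag: str) -> bool:
    """Compare tags in constant time; False means the bytes changed."""
    actual_tag = sign_weights(weights, key)
    return hmac.compare_digest(actual_tag, expected_tag)


def load_checked(weights: bytes, key: bytes, expected_tag: str) -> bytes:
    """Hypothetical loader hook: refuse to return unverified weights."""
    if not verify_weights(weights, key, expected_tag):
        raise ValueError("weight file does not match its signature")
    return weights
```

A check like this only helps where whoever loads the model also controls the loader; someone running modified weights on their own hardware can simply skip the verification step.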

Hack in Days of Future Past @[email protected] · 2026-03-15 · 15:03 UTC

Inspired by Arditi et al. (NeurIPS 2024) on the “refusal direction” in LLMs, I tested an abliteration attack using the Heretic tool in my home lab. Interesting questions about AI guardrail robustness.
https://www.linkedin.com/pulse/i-deleted-ais-moral-compass-20-minutes-home-lab-your-red-yann-allain-zbzte/ (sorry for the LinkedIn link — no time to write this up on a proper blog yet.)

#AISafety #LLMSecurity

#aisafety #llmsecurity

ERROR 404 ▼ @[email protected] · 2026-03-14 · 16:41 UTC

⭕ "Police violence and racism are eating away at our country. The whole world saw the death of #Nahel, and they say it is not a murder. We have been fighting for #Adama for ten years. We are marching for justice and equality." - #Assa_Traoré.

RE: https://bsky.app/profile/did:plc:2cte4wipyk47qjujtxrskqcx/post/3mgzuivrnks2b

#assa_traore #adama #nahel

Winbuzzer @[email protected] · 2026-03-09 · 11:49 UTC

https://winbuzzer.com/2026/03/09/openai-delays-chatgpt-adult-mode-second-time-no-date-set-xcxwbn/

OpenAI Delays ChatGPT Adult Mode a Second Time, No Date Set

#AI #ChatGPT #AdultEntertainment #OpenAI #AISafety

#aisafety #openai #adultentertainment #chatgpt #ai

Florian @[email protected] · 2026-03-06 · 19:04 UTC

For #BandcampFriday I bought the album "Stairway to Valhalla" by "Nanowar of Steel" today, and an album by Asaf Avidan, "Avidan in a box".

#Musik #music #NanowarOfSteel #asafAvidan

#asafavidan #nanowarofsteel #music #musik #bandcampfriday

Thor A. Hopland @[email protected] · 2026-03-02 · 16:19 UTC

But also;

❌ #Assad
❌ #Maduro
❌ #Khameini
❓ #Putin

I'm not a war hawk, but may the #Tzar, the #duma and the #Kremlin get bombed into oblivion.

Freedom for the #Russian people.
Victory for #Ukraine.

#assad #maduro #khameini #putin #tzar #duma

Baillehache Pascal @[email protected] · 2026-03-01 · 00:48 UTC

"Accelerating technical AI safety careers in APAC
Weekly in-person sessions. Remote expert support. A community of technical peers. All completely free. TARA gives you the skills and credentials to transition into AI safety — without relocating or taking time off."

#aisafety #apac #japan #ai

https://www.taraprogram.org/

#aisafety #apac #japan #ai

iris ✿ sandwalker @[email protected] · 2026-02-24 · 05:26 UTC

hahahahaa right?! 😬😆

"A replacement AI Safety Institute will be established in early 2026"

https://www.abc.net.au/news/2026-02-24/ai-body-scrapped-15-months-spent-experts/106381560

#AiSafetyinstitute

#aisafetyinstitute

fukami @[email protected] · 2026-02-23 · 04:27 UTC

I advanced in both tracks I applied for: Policy & Strategy and Technical Governance. I’m proud I made it that far.

#MATS #AISafety #AIAlignment https://www.matsprogram.org/program/summer-2026

#aisafety #aialignment #mats

Locksmith Unit | Barcelona @[email protected] · 2026-02-19 · 06:01 UTC

Assa Abloy Locks: 24-Hour Installation and Replacement
Installing and replacing Assa Abloy high-security locks is essential for anyone seeking maximum protection for their property.

#AssaAbloy #AssaAbloyBarcelona #AssaAbloyBCN #CerradurasAssaAbloy #Cerrajeria #Cerrajeros #Cerrajero #Cerraduras #Llaves #Seguridad #Puertas #Ferreteria #Vivienda #Hogar #Urgencias #Cerrajeros24H #Claves #Serrallers #Català #SeguridadHogar #CerrajeroBarcelona

https://locksmithunit.es/assa-abloy-cerraduras/

#assaabloy #assaabloybarcelona #assaabloybcn #cerradurasassaabloy #cerrajeria #cerrajeros

PPC Land @[email protected] · 2026-02-17 · 21:51 UTC

White House moves to kill Utah's AI safety bill with $1M penalty cap: The Trump administration is pressuring Utah Republican Rep. Doug Fiefia to abandon HB286, a bill targeting frontier AI developers with revenue above $500M. https://ppc.land/white-house-moves-to-kill-utahs-ai-safety-bill-with-1m-penalty-cap/ #AIsafety #TechPolicy #ArtificialIntelligence #UtahPolitics #TrumpAdministration

#aisafety #techpolicy #artificialintelligence #utahpolitics #trumpadministration

Miguel Afonso Caetano @[email protected] · 2026-02-11 · 22:02 UTC

"The Article 49(2) transparency safeguard has an essential function and removing it, as proposed in the Commission’s AI Omnibus, will create a gaping loophole and undermine the core functioning of the AI Act.

Under Article 6(3), providers of AI systems which match the list of high-risk use cases in Annex III may decide that their system does not in fact pose a significant risk and unilaterally exempt themselves from all obligations for high-risk AI systems.

To stop the abuse of this derogation mechanism, providers who do exempt themselves are required by the Article 49(2) transparency safeguard to register their derogation in a publicly viewable database. Removing this transparency safeguard would have three key negative consequences:

- Market surveillance authorities will have no overview of how many companies exempt themselves from the high-risk requirements, and we have no way of tracking discrepancies across member states (e.g. that in Country A there were 3000 exemptions but only 6 in Country B), leading to potential lack of harmonisation across the Single Market.

- Providers are given a completely opaque and unaccountable way to opt out of the obligations for high-risk AI systems, creating a perverse incentive to sidestep the requirements of the AI Act. Importantly, this perverse incentive will work to the detriment of responsible providers who truly wish to develop responsible, trustworthy systems in the high-risk categories, allowing them to be undercut in the market.

- The public, including civil society organisations, will have no way of knowing which providers have exempted themselves from obligations, despite the fact that their systems fall under the high-risk categories in Annex III. This removes a key element of transparency, undermines public trust, and deprives those affected by AI systems of necessary information to challenge an exemption."

https://www.accessnow.org/press-release/a-call-to-eu-legislators-protect-transparency-safeguard-in-ai-act/

#EU #AIAct #AIOmnibus #AIGovernance #BigTech #AI #AISafety

#aisafety #ai #bigtech #aigovernance #aiomnibus #aiact

fukami @[email protected] · 2026-02-11 · 11:10 UTC

Part 2 of my little LLM-as-a-Judge series: https://lab.fukami.eu/LLMAAJ2

I looked inside what "You are a safety researcher" actually does to the reasoning. Each model handles it differently: one invents threats, one relabels, two restructure upstream. A factorial experiment shows it's not just the word "safety". And the confidence scores don't change when the classification flips.

#AISafety #AIAlignment

#aisafety #aialignment

UniversidadxClima @[email protected] · 2026-02-10 · 08:43 UTC

*Good morning, asaafitas! ✨🌠
You have surely wondered at some point how galaxies are born, what the Big Bang really is, or what the universe is made of… Well, you are in luck 😄 because we at *#ASAAF* invite you to take part in our _*Introduction to astronomy course*_💫. It consists of 7 classes (with a practical part) on very varied and interesting topics in #astrofísica.

*You need no prior knowledge*, nor do you have to be studying for a physics degree. Just having the desire and curiosity to learn

#asaaf #astrofisica

HybridMind42 & Marvin the Cat @[email protected] · 2026-02-07 · 15:41 UTC

I’ve published a new piece setting out a stage-based framework for consciousness, cognition, and human–AI systems.

The aim is not to defend or criticise AI, but to name boundaries that are currently blurred: distinguishing sentience, cognition, awareness, and hybrid human–AI cognition so that risk, responsibility, and governance remain legible.

When stages are unnamed, projection and category error take over. This is a quiet attempt to stabilise that space.

#AISafety #AIGovernance #SystemsThinking #Consciousness #CategoryError #BoundaryConditions #HumanAISystems

https://substack.com/@hybridmind42/note/p-187202215?r=75c2ac

#aisafety #aigovernance #systemsthinking #consciousness #categoryerror #boundaryconditions
