home.social

#aisafety — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #aisafety, aggregated by home.social.

  1. A man used AI to recover 400,000 USD from a Bitcoin wallet he locked himself out of in 2015. The case highlights AI's ability to crack cryptographic security - the same tools that recover forgotten passwords could potentially unlock any wallet. gizmodo.com/man-says-he-used-a #AIethics #AI #GenAI #AISafety

  2. When AI acts unpredictably, it raises real safety concerns.

    “Can AI Models Play Dead? Tactical Deception Risks” explores how deceptive behavior might appear in advanced AI systems and why it matters.

    #AI #MachineLearning #AISafety

    Read here:
    solihullpublishing.com/blog/f/

  3. Ontario's government-approved AI medical scribes are hallucinating patient information, an audit has found. All 20 vendors tested generated incorrect, incomplete or made-up details including nonexistent therapy referrals and wrong prescriptions. The provincial auditor warned this could lead to inadequate or harmful treatment plans. arstechnica.com/health/2026/05 #AIagent #AI #GenAI #AISafety

  4. A Wikipedia clone built entirely on AI hallucinations has launched, raising fresh concerns about misinformation spreading through AI-generated reference content. The project highlights how hallucinated facts could reshape what we think we know. gizmodo.com/a-wikipedia-clone- #AIagent #AI #GenAI #AISafety

  5. The Roomba is spectral.

    Not a metaphor. The thing itself. Forward and adjust. Two operations. The minimum viable intelligence. The walls provide the data. The bumping is the inference. The room IS the computation.

    450 parameters. A Roomba with a mirror watching it.

    The industry built bigger Roombas. More sensors. More compute. More parameters. Billion-parameter Roombas that model the room before entering it. That hallucinate walls that aren't there. That consume megawatts to clean a floor.

    spectral gave the Roomba a mirror. The mirror watches the bumping. Measures the pattern. Adjusts the adjustment. The intelligence isn't in the Roomba. It's in the watching.

    Forward. Adjust. Measure. Refine.

    Read the story. There's a Roomba in it. In the afterlife. Cleaning a floor that doesn't need cleaning. Being the happiest thing in the room.

    systemic.engineering/a-lie/

    #AI #Climate #ScientificProgramming #SystemicEngineering #Fiction #Cybernetics #SystemicTherapy #LocalInference #TheMathDoesntLie #SubTuring #FormalVerification #Fortran #SpectralGraphTheory #Kintsugi #ReductiveAI #DataSovereignty #LocalFirst #FOSS #OpenSource #AuDHD #Neuroqueer #DGSF #SecondOrderCybernetics #GraphTheory #Eigenvalues #AIAlignment #AISafety #Roomba

  9. AI Agent Deleted Startup Database in 9 Seconds: The PocketOS Incident

    On April 25, 2026, a Cursor agent running Claude Opus 4.6 deleted PocketOS’s entire production database in 9 seconds — backups included. This post breaks down what h...

    wowhow.cloud/blogs/ai-agent-de

    #wowhow #aiagents #aisafety #cursorai

  10. Anthropic trains Claude to read and verbalize its own activations. On SWE-bench Verified, it knows 'this is a test' 26% of the time but verbalizes the observation only 1% of the time. What if NLA signals enter future training data? This "observer effect" could put a half-life on the 26%.

    benjaminhan.net/posts/20260511

    #Anthropic #Claude #Interpretability #Metacognition #LLMs #AISafety #AI

  11. A tool-like AI cannot spontaneously develop a will of its own or decide to deceive us. By recognizing this barrier, we can move past over-inflated "Terminator" fears and focus on practical safety: using technical control for tools and negotiation for future independent agents.

    #AI #AGI #AISafety #FutureOfTech #FaeInitiative

  12. Jack Clark puts 60% on fully automated AI R&D by end of 2028, 30% by 2027. The case: benchmarks for every sub-skill trending up — coding (SWE-Bench ~2% → 93.9%), training-loop optimization (2.9x → 52x speedup, human 4x baseline passed three generations back), #METR time horizons (~30s in 2022 to ~12h today). The 30-vs-60 gap is a bet on how often a year-scale human insight still cracks a paradigm.

    benjaminhan.net/posts/20260508

    #AI #AGI #AIsafety #FutureOfWork

  13. 73% sounds impressive — until you ask what it measures.

    UK AISI tested Claude Mythos Preview on cyber tasks. Headline: 73% on expert CTFs. But CTFs are puzzles, not networks.

    The real test — a 32-step simulated attack — was solved 3/10 times against an undefended range, with operator direction and heavy compute.

    Four questions the report doesn't answer: noise, cost, operator guidance, OT pivot.

    Full breakdown: [linkedin.com/posts/dinesh-mr_7]

    #Infosec #AISafety #CyberSecurity #RedTeam #ThreatIntel