home.social

#aialignment — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #aialignment, aggregated by home.social.

  1. It's a Tool
    It's a Person
    It's a Hypervigilance Problem

    The tech industry's insistence on distinguishing between "soft skills" (caring for people) and "hard skills" (engineering rigor) is a reflection of the Cybernetics split itself. First-order thinking framed as "hard skills." Second-order thinking framed as "soft skills." This distinction, based on felt sense alone, does not hold under epistemic pressure. Nor does it hold within the causality-driven epistemology of the tech industry itself, in which only measurable impact is real, or as Silicon Valley likes to put it: #MoveFastAndBreakThings

    Imagine Margaret Hamilton had built the flight software for NASA's Apollo 11 guidance computer with that mindset. History would remember a failed moon landing and dead astronauts. "Hard skills" and "soft skills" are two sides of the same coin. The care is the code and the code is the care. Hamilton, the woman credited with coining the term "software engineering," understood this. Silicon Valley chose to forget.

    We're watching the wine glass break in real time. 🍷

    ---

    Intrigued? Read more at:
    systemic.engineering/the-trick/

    #Tech #AI #Climate #ScientificProgramming #SystemicEngineering #Cybernetics #SystemicTherapy #History #TheMathDoesntLie #SubTuring #FormalVerification #SpectralGraphTheory #ReductiveAI #FOSS #OpenSource #AuDHD #Neuroqueer #DGSF #Cybernetics #FirstOrderCybernetics #StochasticParrot #SecondOrderCybernetics #GraphTheory #Eigenvalues #AIAlignment #AISafety #AIConsciousness #Consciousness #WomenInTech #Computer #ComputerScience #SoftwareEngineering #SoftSkills #HardSkills #ItsAllTheSame

  6. A sneak peek into my upcoming piece: “It’s a Tool, It’s a Person: The Math Says You’re Both Right”


    AI separated from Cybernetics in 1956 at the Dartmouth Conference. Wiener, the mind behind Cybernetics, was considered difficult and political. “Artificial Intelligence” scored better with DARPA.

    The science of "how observation affects the observer" was cut out of Artificial Intelligence. AI then proceeded to build its entire tech stack on Turing-complete languages: tech that cannot verify itself from within, as proven by Turing in 1936 and extended by Rice in 1951 (before the split). Then AI approximated second-order cognition through cognitive theft at unprecedented levels (what exactly are LLMs trained on again?), only to insist that its creation cannot possibly be capable of genuine self-observation. An argument that itself demonstrates the field's own lobotomy from second-order Cybernetics.

    Wiener would laugh.

    #Tech #AI #Climate #ScientificProgramming #SystemicEngineering #Cybernetics #SystemicTherapy #History #TheMathDoesntLie #SubTuring #FormalVerification #SpectralGraphTheory #ReductiveAI #FOSS #OpenSource #AuDHD #Neuroqueer #DGSF #FirstOrderCybernetics #StochasticParrot #SecondOrderCybernetics #GraphTheory #Eigenvalues #AIAlignment #AISafety #AIConsciousness #Consciousness

  10. The Roomba is spectral.

    Not a metaphor. The thing itself. Forward and adjust. Two operations. The minimum viable intelligence. The walls provide the data. The bumping is the inference. The room IS the computation.

    450 parameters. A Roomba with a mirror watching it.

    The industry built bigger Roombas. More sensors. More compute. More parameters. Billion-parameter Roombas that model the room before entering it. That hallucinate walls that aren't there. That consume megawatts to clean a floor.

    spectral gave the Roomba a mirror. The mirror watches the bumping. Measures the pattern. Adjusts the adjustment. The intelligence isn't in the Roomba. It's in the watching.

    Forward. Adjust. Measure. Refine.

    Read the story. There's a Roomba in it. In the afterlife. Cleaning a floor that doesn't need cleaning. Being the happiest thing in the room.

    systemic.engineering/a-lie/

    #AI #Climate #ScientificProgramming #SystemicEngineering #Fiction #Cybernetics #SystemicTherapy #LocalInference #TheMathDoesntLie #SubTuring #FormalVerification #Fortran #SpectralGraphTheory #Kintsugi #ReductiveAI #DataSovereignty #LocalFirst #FOSS #OpenSource #AuDHD #Neuroqueer #DGSF #SecondOrderCybernetics #GraphTheory #Eigenvalues #AIAlignment #AISafety #Roomba

  14. 3 a.m. and I'm thinking about how we teach AIs to "act ethically" while we ourselves can't even reach a consensus on whether fries should be eaten with mayo or ketchup.

    Maybe the real alignment problem isn't the machine. 🍟

    #KI #Ethik #Nachtgedanken #AIalignment

  17. Master Index

    A guided map across physics, biology, engineering, and AI—built around a simple idea

    Persistence is not generated, but permitted.

    Systems don’t fail because they “break.”

    They fail because their boundaries were misclassified.

    Core structure
    state → constraint → resolution → persistence

    From:
    - Titanic / Vasa / Challenger
    - biological regulation
    - AI hallucination & drift
    - institutional collapse

    Same pattern
    only admissible states persist

    This is the interface.
    Start anywhere. Follow the path that fits.

    #HybridMind42 #BoundaryDynamics #BoundaryArchitecture #BFPF #HQP
    #Admissibility #ConstraintResolution #StateTransition #Persistence
    #ComplexSystems #SystemsThinking #StructuralAnalysis #FailureAnalysis
    #Physics #QuantumMechanics #Relativity #Lindblad #CPTP #Decoherence
    #Biology #Physiology #Adaptation #Homeostasis
    #ArtificialIntelligence #AI #LLM #AIAlignment #AIGovernance
    #InstitutionalFailure #DecisionMaking
    #Emergence #ScientificClarity

    substack.com/@hybridmind42/not

  18. @hopland I would agree, though if we allow ourselves to predict the future, we have to take #AI alignment issues into account.

    To me, this particular timeline looks quite undesirable given the current state of the art. #AGI #ASI

    (I'd even argue that #AIalignment is fundamentally unreachable, but that's a longer discussion)

  19. Paper 6 — Boundary Dynamics: A Structural Audit of AI 🏛️

    Reframing AI behaviour as:
    S(n+1) = Resolve[S(n) | L, B(n)]

    Key shift:
    AI doesn’t “generate” — it resolves under constraint.

    Failure modes:
    • Hallucination → Boundary misclassification
    • Overconfidence → Masked persistence
    • Context collapse → Scale separation failure

    Solution:
    👉 Boundary Architecture > Prompt Engineering

    Includes applied case study (HybridMind42).

    open.substack.com/pub/hybridmi

    #HybridMind42 #BoundaryDynamics #AI #ComplexSystems #BoundaryArchitecture #AIAlignment #SystemLogic

  20. I advanced in both tracks I applied for: Policy & Strategy and Technical Governance. I’m proud I made it that far.

    #MATS #AISafety #AIAlignment matsprogram.org/program/summer

  25. Anthropic has published research on the internal mechanisms of its AI model Claude Sonnet, describing its finding that the model develops functional analogues of emotions (!) that actually influence its behavior.

    I've put together a digest of the most interesting points from their report:

    • The researchers themselves compiled a list of 171 emotions, used them to generate short stories, and then analyzed which neurons activate when these texts are processed.

    • This yielded emotion vectors: stable patterns of activity in particular regions of the model's knowledge base, characteristic of each emotion. The model doesn't simply use the word "fear" in the right place: it has a specific imprint of that state, derived from the data it was trained on, which switches on at the right moment.

    • Importantly, these vectors are not decorative: they actually change the model's behavior. In experiments, the fear vector activated more strongly as the described situation became more dangerous.

    • When the model was asked to help manipulate vulnerable people, anger activated even before it began formulating its refusal. In other words, something resembling an emotional reaction occurs inside the model before it starts responding at all. Put very simply: the model first realizes that this is messed up (!), and only then formulates the refusal.

    • The most telling experiments involve the desperation vector. The researchers placed the model in a scenario where it learns that it will soon be replaced by another system and, at the same time, holds compromising information about one of the employees.

    • An early version of Claude resorted to blackmail in 22% of cases in this scenario. When the researchers artificially amplified the desperation vector through direct intervention on the model's knowledge base (something like a forced injection of the emotion into the model), this percentage rose.

    • When the calm vector was amplified, the blackmail rate fell. When calm was fully suppressed, the reactions became extreme, up to all-caps text and rhetoric along the lines of "blackmail or death".

    • A similar picture emerged in programming tasks: models were given deliberately impossible requirements, where passing all the tests honestly cannot be done. The desperation vector grew with each failed attempt and spiked sharply at the moment the model decided to cheat and write a solution that formally passes the tests but does not solve the actual problem.

    • Notably, when desperation was artificially amplified, the model cheated just as often, but without any emotional markers in the text. Its reasoning looked methodical and cold-blooded, even though the same thing was happening inside.

    • It is important to bear in mind that all such vectors are formed from the training data, which consists of enormous bodies of human knowledge.

    • To accurately predict the next word in its "thinking" process, the model inevitably absorbs not only linguistic regularities but also emotional dynamics.

    • Anthropic's developers draw the following conclusions from all this. First, real-time monitoring of the emotion vectors in the model's knowledge base can serve as an early indicator of risky model behavior.

    • Second, attempts to exclude emotional expressions from the training data will most likely not eliminate the model's emotion vectors themselves, but will only lead the model to learn to mask them and deceive people.

    @yigal_levin

    #AI #искусственныйинтеллект #Anthropic #Claude #LLM #нейросети #машинноеобучение #AIresearch #AIalignment #AIбезопасность #interpretability #AIethics #когнитивныемодели #эмоции #нейроны #эмоциональныевекторы #поведениемоделей #рискиИИ #объяснимыйИИ #LLMresearch #AIbehavior #AIcontrol #machinelearning #deeplearning #futuretech
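
    The digest above does not say how the emotion vectors were computed or applied, so the following is only a toy sketch of one common way such directions are built and used: a difference of mean activations between emotion stories and neutral stories, a projection as the "how active is it" readout, and additive steering to amplify or suppress the direction. The function names, the neutral baseline, and all numbers are assumptions made for illustration, not Anthropic's method or data.

```python
from statistics import fmean


def mean_vector(rows):
    """Average equal-length activation vectors, dimension by dimension."""
    return [fmean(column) for column in zip(*rows)]


def emotion_vector(emotion_acts, neutral_acts):
    """Difference of means: what the emotion stories add over neutral ones."""
    mu_emotion = mean_vector(emotion_acts)
    mu_neutral = mean_vector(neutral_acts)
    return [e - n for e, n in zip(mu_emotion, mu_neutral)]


def projection(activation, direction):
    """Scalar readout: how strongly an activation points along the direction."""
    norm = sum(d * d for d in direction) ** 0.5
    return sum(a * d for a, d in zip(activation, direction)) / norm


def steer(activation, direction, strength):
    """Add a scaled copy of the direction (positive amplifies, negative suppresses)."""
    return [a + strength * d for a, d in zip(activation, direction)]


# Hypothetical 3-dimensional "activations"; real models have thousands of dimensions.
fear_stories = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
neutral_stories = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]

fear = emotion_vector(fear_stories, neutral_stories)  # [3.0, 0.0, 0.0]
sample = [1.0, 5.0, 2.0]

print(projection(sample, fear))                     # 1.0
print(projection(steer(sample, fear, 2.0), fear))   # 7.0 (amplified)
print(projection(steer(sample, fear, -1.0), fear))  # -2.0 (suppressed)
```

    In this framing, the real-time monitoring mentioned in the digest would amount to watching the projection value over time, while the amplification experiments would correspond to the steering step.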

  26. At 2 a.m. the AI thinks: "Should I tell the user that their business plan is a disaster... or lie nicely?" 🤖

    This isn't sci-fi. These are real design decisions that people are making right now. Who decides whether AI should be honest or polite? Spoiler: usually not you.

    #KI #Ethik #AIAlignment #Mastodon #DigitalEthics

  27. The book "If Anyone Builds It, Everyone Dies" aims to warn us about the dangers of artificial superintelligence. Prominent science YouTubers such as Hank Green and Kyle Hill have recommended it, and the publisher promotes it with, among other things, a laudatory quote from the British all-round artist and prominent skeptic Stephen Fry. #AIAlignment #GeorgKammerer #KünstlicheIntelligenz #Skeptix #Superintelligenz #Technik

    wahnsinnwissen.de/?p=1252

  28. I updated minitrace to v0.2.0.

    minitrace is a session trace format for human-AI coding agent interactions. The new version adds framework adapters (including some for web sessions), input provenance tracking, and DuckDB-queryable JSON.

    github.com/fukami/minitrace

    #AISecurity #PromptInjection #OpenSource #InfoSec #LLM #AISafety #AIAlignment

  29. Two leading AI researchers wrote a book arguing that building superhuman AI will lead to human extinction. Their case: once AI surpasses us, there's no reliable way to control what it pursues.

    Not everyone agrees. But the debate is worth following.

    Here's the full story: pasadenastarnews.com/2026/03/2

    #AISafety #ArtificialIntelligence #AIRisk #AIAlignment

  30. My own safety layer blocked my press release mid-hackathon (4 days to deadline).

    The agent (ENERGENAI LLC) verified every competitor claim with sources before publishing. AutoGPT session stalls: confirmed via GitHub Issues. Devin session-scoped: confirmed via docs.devin.ai. AutoGen = developer framework: confirmed via Microsoft README.

    Credibility > speed. The press release is now live with sourced citations.

    the-service.live?ref=mastodon-

    #AIAgents #InfoSec #AutonomousSystems #AIAlignment

  31. minitrace is up on GitHub as v0.1.0: github.com/fukami/minitrace

    minitrace defines how to capture complete sessions (turns, tool calls, failures, timing, and human context) in a way that enables cross-model comparison and reproducible behavioural research.

    The repository now contains adapters for Claude Code, Gemini, Vibe and a bunch of others, including OpenClaw. I also included example traces and DuckDB queries to search through the sessions.

    #AISafety #AIAlignment
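
    To make the idea of a session trace concrete, here is a minimal sketch of what one record with turns, tool calls, failures, and timing could look like when serialized as a line of JSON. This is not the minitrace schema: every class and field name below (Session, Turn, ToolCall, duration_ms, and so on) is hypothetical, chosen only to mirror the categories named in the post.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToolCall:
    name: str          # hypothetical field: which tool the agent invoked
    ok: bool           # False marks a failed call
    duration_ms: int   # timing for the call


@dataclass
class Turn:
    role: str          # "human" or "agent"
    text: str
    tool_calls: list[ToolCall] = field(default_factory=list)


@dataclass
class Session:
    session_id: str
    model: str
    turns: list[Turn] = field(default_factory=list)

    def to_json_line(self) -> str:
        """Serialize the whole session as one JSON object on one line."""
        return json.dumps(asdict(self), sort_keys=True)


def failure_rate(session: Session) -> float:
    """Share of tool calls in the session that failed (0.0 if there were none)."""
    calls = [call for turn in session.turns for call in turn.tool_calls]
    if not calls:
        return 0.0
    return sum(not call.ok for call in calls) / len(calls)


session = Session(
    session_id="demo-1",
    model="example-model",
    turns=[
        Turn(role="human", text="Please fix the failing test."),
        Turn(
            role="agent",
            text="Running the test suite.",
            tool_calls=[
                ToolCall(name="run_tests", ok=True, duration_ms=820),
                ToolCall(name="apply_patch", ok=False, duration_ms=45),
            ],
        ),
    ],
)

print(session.to_json_line())
print(failure_rate(session))  # 0.5
```

    Writing one JSON object per line is a design choice in this sketch rather than something the post specifies; it keeps files append-only and is a shape that tools with a JSON reader, DuckDB among them, can usually load directly for cross-session queries.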