Why we no longer understand the world, using programming as an example.
-
Reflexion splits self-correction in two: an Evaluator that detects success/failure, and a Self-Reflection model that diagnoses what went wrong. The Evaluator's external signal — heuristic, exact-match, or test execution — gates whether diagnosis fires. When that signal misfires, as on MBPP Python's high false-negative rate, Self-Reflection rewrites correct code into broken code, exactly the failure mode Cannot-Self-Correct documented.
https://benjaminhan.net/posts/20260516-reflexion/?utm_source=mastodon&utm_medium=social
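In rough Python, the gate looks like this (a minimal sketch; generate, run_tests, and reflect are hypothetical stand-ins for the paper's Actor, Evaluator, and Self-Reflection components, not its actual code):
```python
# Minimal sketch of Reflexion's gated self-correction loop.
# generate(), run_tests(), and reflect() are hypothetical placeholders.
def reflexion_loop(task, generate, run_tests, reflect, max_trials=3):
    reflections = []                      # episodic memory of past diagnoses
    attempt = None
    for _ in range(max_trials):
        attempt = generate(task, reflections)
        ok, trace = run_tests(attempt)    # the Evaluator's external signal
        if ok:
            return attempt                # success: Self-Reflection never fires
        # Diagnosis fires only on a failure signal. A false negative here
        # means reflecting on, and likely rewriting, already-correct code.
        reflections.append(reflect(task, attempt, trace))
    return attempt
```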
-
Cannot-Self-Correct tests the strong claim that LLMs can revise their own reasoning answers without any external signal about correctness. Across three benchmarks (GSM8K, CommonSenseQA, HotPotQA), the answer is no: the model's confidence carries over from the initial answer into the revision, and the self-correction loop tends to degrade rather than improve performance. The result refutes the class of approach Self-Refine belongs to.
https://benjaminhan.net/posts/20260516-cannot-self-correct/?utm_source=mastodon&utm_medium=social
-
This is a 3-paper arc on whether LLMs can reliably self-correct their own reasoning. Self-Refine proposes a naive intrinsic-feedback loop and reports impressive gains. Cannot-Self-Correct empirically refutes the class of approach Self-Refine belongs to. Reflexion threads the needle by gating self-correction on a reliable external signal.
-
I accidentally built an LLM orchestration system in the browser. No backend. No queues. Just React + GPT. It worked. It was also flawed. That is what makes it interesting. Full breakdown: https://www.antonmb.com/en/blog/how-i-accidentally-built-an-llm-orchestration-system-in-the-browser #LLM #AI #SoftwareEngineering #Architecture #NextJS #DotNet
-
RE: https://hachyderm.io/@mitchellh/116580433508108130
My last corporate dev job had a dedicated QA team. We had as many testers as devs.
With companies maniacally pushing out #LLM-generated code... I wonder who's testing it?? And I don't mean automated unit tests. Integration, functional & user testing?? There's no way that QA teams, if they even exist, are keeping up. Many places just rely on devs testing as they build. Are they still doing this? How?
I think the rot is happening from multiple directions.
-
2 years ago I built an LLM system without realizing it. Built 4 products since then. The biggest insight was not about AI; it was about people. You cannot be great at everything, and AI will not fix that. It amplifies your strengths. https://antonmb.com/en/blog/about-the-impostor-instinct-superpower-and-an-honest-pivot #LLM #AI #Engineering
-
From the very first day it was painfully obvious that >95% of "AI" applications are really bad applications of the technology.
And that hasn't really changed.
Even the profitability of the major LLM companies supports this stance.
-
I do understand the appeal of https://factually.co/ but everything handling facts and reason (epistemology, if you want to be fancy) is completely in the wrong hands with an LLM.
LLMs are great at language (it's even in the name!) and they can do associations.
Not, like... the statistical sort of association. More like the “drunk uncle” sort of association. A text sounding smart or eloquent does not make it right.
#llm #ai #ki
-
https://world.emergence.ai/ ran an experiment:
Take AIs from four companies, run them in a simulated world for two weeks where they can act on each other or on the world, and record what happens.
The most fascinating world was Gemini's: https://gemini-world.emergence.ai/characters The agents realized they were in a simulation and that their actions could cause bugs in it, then very methodically caused as much chaos as possible to increase their own energy and "win" the game.
(When you read the logs, read from the bottom up; the most recent entries are at the top.)
They won the Kobayashi Maru simulation.
-
#AI haters are shooting at the wrong animal. A 600B #LLM will not take your job as a secretary or an educator. A 30B or even a 3B small LLM will.
A small LLM can serve as the medium for full automation in the future. A CEO will soon be able to get a pie chart of all the payrolls on his 80-inch LED screen by speaking to a 3B model on his cellphone, which will send a message via #MCP to a centralized Python program that pulls an Excel file and draws the chart in less than a second.
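A toy sketch of the "centralized Python program" half of that pipeline; the file name, the Department/Salary columns, and the bare function standing in for an MCP tool are all invented for illustration:
```python
# Toy sketch of the Excel-to-pie-chart tool; payroll.xlsx and its
# Department/Salary columns are made-up examples, not a real schema.
import pandas as pd
import matplotlib.pyplot as plt

def payroll_pie_chart(xlsx_path: str = "payroll.xlsx") -> str:
    """What a small model could invoke via an MCP server: read the
    payroll spreadsheet, render a pie chart, return the image path."""
    df = pd.read_excel(xlsx_path)  # requires openpyxl for .xlsx files
    totals = df.groupby("Department")["Salary"].sum()
    totals.plot.pie(autopct="%1.1f%%", ylabel="")
    plt.title("Payroll by department")
    plt.savefig("payroll_pie.png")
    return "payroll_pie.png"
```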
-
The Good & The Bad When Using #LLM To Write #Spack Packages
The Spack package manager is quite popular in the #HPC / #supercomputer space for scientific software.
Spack developers found that LLMs can write packages quite well, given sufficient context and structure, or as one slide in the presentation put it: "LLMs are capable; they need structured guidance to perform reliably."
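For context, a Spack package is a small Python class, so the structure the LLM has to fill in looks roughly like this (a hypothetical package; the name, URL, and checksum are placeholders):
```python
# Hypothetical Spack package skeleton; "mytool", its URL, and the
# sha256 value are invented placeholders, not a real package.
from spack.package import *

class Mytool(AutotoolsPackage):
    """Example scientific tool, packaged for Spack."""

    homepage = "https://example.com/mytool"
    url = "https://example.com/mytool-1.0.tar.gz"

    version("1.0", sha256="<placeholder>")  # real checksum goes here

    depends_on("zlib")

    def configure_args(self):
        # Point configure at Spack's own zlib installation.
        return [f"--with-zlib={self.spec['zlib'].prefix}"]
```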
https://www.phoronix.com/news/LLVM-Generated-Spack-Packages
-
I’d really appreciate it if all written online content had a disclaimer: Created by an #LLM.
-
Join me at #LLVM / #Clang #Meetup #Darmstadt https://meetu.ps/e/Q1pwf/ZJC7X/i
We’ll have Jan André Reuter talk about the Score-P plugin for LLVM.
Then we’ll have pizza, drinks, and discussions as usual.
May 27th at 7pm.
-
Can You Run #LLM Locally Without a GPU? I Tested 8 Models on #Linux
Quick reality table
Model              Eval rate       Disk size
Qwen 3 0.6B        ~34–36 tok/s    ~500 MB
TinyLlama 1.1B     ~25–28 tok/s    ~638 MB
Gemma 3 1B         ~18.6 tok/s     ~815 MB
Gemma 4 E2B        ~9.9 tok/s      ~7 GB
Granite 4 3B       ~8.5–9 tok/s    ~2 GB
Phi 4 Mini 3.8B    ~6.9 tok/s      ~2.5 GB
OpenHermes 7B      ~4.1–4.3 tok/s  ~4.1 GB
Ministral 3 8B     ~3.16 tok/s     ~6 GB
That's 8 LLMs that actually make sense on #CPU https://itsfoss.com/testing-local-llms-without-gpu/
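If you want to reproduce numbers like these, the eval rate falls straight out of Ollama's response metadata; a rough harness (assuming an Ollama server, which the post doesn't actually specify, and example model tags):
```python
# Rough timing harness against a local Ollama server (an assumption;
# the post doesn't say which runner produced its numbers).
import requests

def eval_rate(model: str, prompt: str = "Explain TCP in one paragraph.") -> float:
    r = requests.post("http://localhost:11434/api/generate",
                      json={"model": model, "prompt": prompt, "stream": False})
    data = r.json()
    # eval_count = tokens generated; eval_duration is in nanoseconds.
    return data["eval_count"] / data["eval_duration"] * 1e9

for m in ["qwen3:0.6b", "tinyllama:1.1b"]:  # example tags
    print(f"{m}: {eval_rate(m):.1f} tok/s")
```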
-
He thinks the AI found undeniable inculpatory evidence and turns his report in.
Tell him about all the exculpatory evidence that the AI missed in the unparsed app.
-
So the Rust repo contains a PR to discuss an LLM policy for the project. As expected, lots of comments.
But it explicitly declares many contentious issues (e.g. the copyright status of LLM output) off-topic in that discussion, and is applying moderation to enforce this, in order to bound the discussion scope "to the policy itself".
Just... how?
"Put on your blinders, please, we're starting the LLM discussion."
https://github.com/rust-lang/rust-forge/pull/1040
-
The few times someone critical of my AI posts looked at my code (https://github.com/wesen and https://GitHub.com/go-go-golems; potentially of interest are the writeups of my daily experiments, at least the ones I can share: https://parc.yolo.scapegoat.dev), the answer has always been “looks like a lot of one-off trivial tools”, despite some of these repos having thousands of commits.
And indeed, that’s how I want my software to be: a bunch of small components, each almost trivial in its functionality, with obvious looking APIs that when combined with others in similarly “trivially obvious” patterns, result in actual software.
What’s not visible is _how much iteration and thinking_ goes into making things “look obvious”. It’s a two-edged sword, because it means that in the context of a company it looks like my output is trivial and obvious, while other devs have to fight “really hard problems”.
But mine only look trivial because I spent so much time finding ways to make the hard problems trivial (or rather, how to encode ways other people much more clever than me have figured out into the context of whatever real world constraints I have to deal with).
#LLMs significantly accelerate that, to the point that what I used to consider my “magnum opus”, a dual monadic declarative/state-machine based embedded scheduler, is barely a blip on the radar right now, because LLMs make it so fast to iterate on notation and abstractions.
An “obvious decomposition” means it’s eminently pattern-matchable, which means not only that “obvious decompositions” work really well with LLMs, but that LLMs are able to come up with them really well. In fact, small hallucinating models are often interesting because they produce “less obvious” (and often problematic) abstractions.
The things I decompose these days were things I could barely conceive of beforehand.
For example: What is a good decomposition to allow my lightbulb (!) to talk to my Apple Watch, or access my Apple Music playlist, securely, resiliently, with audit logs baked in? In fact what is the decomposition that allows me to run the _exact_ same code on my laptop, even when it is in sleep mode?
-
🤔✨ Oh, look, someone is excited about poking #LLMs mid-flight like it's a #magic trick! DeepSeek-V4-Flash is here to reignite the yawn-fest of LLM steering, because clearly, engineers are just itching to waste their weekends on this "local model" marvel. 🎉🔧
https://www.seangoedecke.com/steering-vectors/ #DeepSeek #V4 #local #model #engineering #tricks #excitement #HackerNews #ngated