#largelanguagemodelsllm_ — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #largelanguagemodelsllm_, aggregated by home.social.
-
Tom’s Hardware: PewDiePie goes all-in on self-hosting AI using modded GPUs, with plans to build his own model soon — YouTuber pits multiple chatbots against each other to find the best answers. “Running open-source models from Baidu and OpenAI, PewDiePie made a ‘council’ of bots that voted on the best responses, and then built “The Swarm” for data collection that will become the foundation of […]
-
Ars Technica: Are you the asshole? Of course not!—quantifying LLMs’ sycophancy problem. “Researchers and users of LLMs have long been aware that AI models have a troubling tendency to tell people what they want to hear, even if that means being less accurate. But many reports of this phenomenon amount to mere anecdotes that don’t provide much visibility into how common this sycophantic […]
-
Ars Technica: AI models can acquire backdoors from surprisingly few malicious documents. “The research involved training AI language models ranging from 600 million to 13 billion parameters on datasets scaled appropriately for their size. Despite larger models processing over 20 times more total training data, all models learned the same backdoor behavior after encountering roughly the same […]
-
TechCrunch: OpenAI ramps up developer push with more powerful models in its API. “OpenAI unveiled new API updates at its Dev Day on Monday, introducing GPT-5 Pro, its latest language model, its new video generation model Sora 2, and a smaller, cheaper voice model.”
-
Tom’s Hardware: Famed gamer creates working 5 million parameter ChatGPT AI model in Minecraft, made with 439 million blocks — AI trained to hold conversations, working model runs inference in the game. “In an amazing feat of Minecraft Redstone engineering, Sammyuri — famed for building a 1Hz CPU inside the game — has built a small language model that runs on a computer inside Minecraft, […]
-
Reuters: China’s Huawei Co-Develops DeepSeek Model, Improves Censoring. “Chinese tech giant Huawei has co-developed a safety-focused version of artificial intelligence model DeepSeek that it said is ‘nearly 100% successful’ in preventing discussion of politically sensitive topics.”
-
ZDNet: AI models know when they’re being tested – and change their behavior, research shows. “Scheming refers to several types of dishonest behavior, including when a model lies, sandbags (strategically underperforms on an evaluation to hide its true abilities), or fakes alignment (when an AI model pretends to follow orders that don’t align with its training in order to avoid being further […]
-
TechCrunch: Thinking Machines Lab wants to make AI models more consistent. “There’s been great interest in what Mira Murati’s Thinking Machines Lab is building with its $2 billion in seed funding and the all-star team of former OpenAI researchers who have joined the lab. In a blog post published on Wednesday, Murati’s research lab gave the world its first look into one of its projects: […]
-
The Register: ChatGPT hates LA Chargers fans. “The reason, according to researchers affiliated with Harvard University, is that the model’s guardrails incorporate biases that shape its responses based on contextual information about the user.”
https://rbfirehose.com/2025/08/29/the-register-chatgpt-hates-la-chargers-fans/
-
Engadget: You can now download and tweak Grok 2.5 for yourself as it goes open source. “Unhinged as Grok may be, it’s now open source. xAI’s CEO, Elon Musk, posted on X that the company made the older Grok 2.5 model available to the public and will do the same with the upcoming Grok 3.”
-
Localghost: This website is for humans. “I write the content on this website for people, not robots. I’m sharing my opinions and experiences so that you might identify with them and learn from them. I’m writing about things I care about because I like sharing and I like teaching. I spend hours writing these posts and AI spends seconds summarising them. I’d much rather people read the whole […]
https://rbfirehose.com/2025/08/25/localghost-this-website-is-for-humans/
-
Ars Technica: Google releases pint-size Gemma open AI model. “Google has announced a tiny version of its Gemma open model designed to run on local devices. Google says the new Gemma 3 270M can be tuned in a snap and maintains robust performance despite its small footprint.”
https://rbfirehose.com/2025/08/19/ars-technica-google-releases-pint-size-gemma-open-ai-model/
-
Ars Technica: Is GPT-5 really worse than GPT-4o? Ars puts them to the test. “To see just how much the new model changed things, we decided to put both GPT-5 and GPT-4o through our own gauntlet of test prompts. While we reused some of the standard prompts to compare ChatGPT to Google Gemini and Deepseek, for instance, we’ve also replaced some of the more outdated test prompts with new, more […]
-
The Register: AI model ‘personalities’ shape the quality of generated code. “Code quality biz Sonar argues that it’s necessary for software developers who use large language models (LLMs) for assistance to understand how these ‘personalities’ shape AI-generated code and affect code security, reliability, and maintainability.”
-
Gary Marcus: GPT-5: Overdue, overhyped and underwhelming. And that’s not the worst of it. “Typical was a comment from Andres Franco, on X ‘GPT 5 has been a huge letdown, way more than I expected’. Another reader, previously an OpenAI fan, told me ‘o3 was a shit good model, [whereas GPT-5] was an utter disappointment, especially given the kind of hype towards its release.’ An NBA President […]
-
The Conversation: ‘Are you joking, mate?’ AI doesn’t get sarcasm in non-American varieties of English. “Large language models are often reported to achieve superlative performance on several standardised sets of tasks known as benchmarks. The majority of benchmark tests are written in Standard American English. This implies that, while large language models are being aggressively sold by […]
-
Mashable: Fans held a funeral for Anthropic’s Claude 3 Sonnet AI. “Roughly 200 people attended a funeral for an AI model. That sentence is not nearly as surreal and dystopian as the event itself, according to a first-person account from Wired’s Kylie Robison.”
https://rbfirehose.com/2025/08/08/mashable-fans-held-a-funeral-for-anthropics-claude-3-sonnet-ai/
-
Anthropic: Claude Opus 4.1. “Today we’re releasing Claude Opus 4.1, an upgrade to Claude Opus 4 on agentic tasks, real-world coding, and reasoning. We plan to release substantially larger improvements to our models in the coming weeks.”
https://rbfirehose.com/2025/08/07/anthropic-claude-opus-4-1/
-
Baltimore Banner: Johns Hopkins University Press will license its authors’ books to train AI models. “The Johns Hopkins University will license its books to train proprietary large language models, an advanced form of artificial intelligence that generates human-like language, the school’s publishing division announced this week. Authors have until the end of August to opt out of the […]
-
The Register: Top AI models – even American ones – parrot Chinese propaganda, report finds. “Five popular AI models all show signs of bias toward viewpoints promoted by the Chinese Communist Party, and censor material it finds distasteful, according to a new report. Just one of the models originated in China. The American Security Project, a non-profit think tank with bipartisan roots and a […]
-
Google Blog: Weather Lab is an interactive website for sharing Google’s AI weather models. “Today Google DeepMind and Google Research are launching a public preview of Weather Lab, an interactive website for sharing our AI weather models, and debuting our newest experimental AI-based cyclone predictions. While still experimental, our goal is to continue to work on this technology to help […]
-
Engadget: OpenAI adds the o3-pro model to ChatGPT today. “OpenAI is keeping up its rapid-fire pace of new AI releases. The company introduced the o3 and o4-mini models to its ChatGPT platform in April. At the time, the business promised that a pro model of the o3 was on the way, and that version became available today.”
https://rbfirehose.com/2025/06/14/engadget-openai-adds-the-o3-pro-model-to-chatgpt-today/
-
Gary Marcus: A knockout blow for LLMs? “There’s actually an interesting weakness in the new argument—which I will get to below—but the overall force of the argument is undeniably powerful. So much so that LLM advocates are already partly conceding the blow while hinting at, or at least hoping for, happier futures ahead.”
https://rbfirehose.com/2025/06/10/gary-marcus-a-knockout-blow-for-llms/
-
TechCrunch: EleutherAI releases massive AI training dataset of licensed and open domain text. “The dataset, called the Common Pile v0.1, took around two years to complete in collaboration with AI startups Poolside, Hugging Face, and others, along with several academic institutions. Weighing in at 8 terabytes in size, the Common Pile v0.1 was used to train two new AI models from EleutherAI, […]
-
TechCrunch: Anthropic unveils custom AI models for US national security customers. “The new models, a custom set of ‘Claude Gov’ models, were ‘built based on direct feedback from our government customers to address real-world operational needs,’ writes Anthropic in the blog post. Compared to Anthropic’s consumer- and enterprise-focused models, the new custom Claude Gov models were designed […]
-
The Verge: Anthropic’s Claude 4 AI models are better at coding and reasoning. “Anthropic has introduced Claude Opus 4 and Claude Sonnet 4, its latest generation of hybrid-reasoning AI models optimized for coding tasks and solving complex problems.”
-
PsyPost: AI chatbots often misrepresent scientific studies — and newer models may be worse. “Published in Royal Society Open Science, the study found that the most widely used language models frequently overgeneralize the results of scientific studies—sometimes making broader or more confident claims than the original research supports. This tendency was more common in newer models and, […]
-
TechXplore: Third-party data annotators often fail to accurately read the emotions of others, study finds. “Machine learning algorithms and large language models (LLMs), such as the model underpinning the functioning of the platform ChatGPT, have proved to be effective in tackling a wide range of tasks. These models are trained on various types of data (e.g., texts, images, videos, and/or […]
-
Cornell Chronicle: Developers, educators view AI harms differently, research finds. “Teachers are increasingly using educational tools that leverage large language models (LLMs) like ChatGPT for lesson planning, personalized tutoring and more in K-12 classrooms around the world. Cornell researchers have found the developers of such tools and the educators who use them have different ideas […]
-
TechCrunch: Google’s Gemma AI models surpass 150M downloads. “Google’s openly available Gemma collection of AI models has reached a milestone: over 150 million downloads. Omar Sanseviero, a developer relations engineer at Google DeepMind, announced the figure on X over the weekend, also revealing that developers have created more than 70,000 variants of Gemma on the AI dev platform […]
https://rbfirehose.com/2025/05/16/techcrunch-googles-gemma-ai-models-surpass-150m-downloads/
-
Hackaday: LLM Ported To The C64, Kinda. “[ytm] did the hard work of porting the Llama 2 model to the most popular computer ever made. Of course, as you might expect, the ancient 8-bit machine doesn’t really have the stones to run an LLM on its own. You will need one rather significant upgrade, in the form of 2 MB additional RAM via a C64 REU.”
https://rbfirehose.com/2025/05/04/hackaday-llm-ported-to-the-c64-kinda/
-
Carnegie Mellon University: Copilot Arena Helps Rank Real-World LLM Coding Abilities. “With so many AI coding assistants out there, it can be hard to keep track of ones that perform well on real-world tasks. To help analyze which leading or emerging code-writing large language models (LLMs) the developer community prefers, researchers at Carnegie Mellon University developed Copilot Arena, a […]
-
Florida International University: “Poisoned” AI models can unleash real-world chaos. Can these attacks be prevented? “The majority of AI systems we encounter today — from ChatGPT to Netflix’s personalized recommendations — are only ‘intelligent’ enough to pull off such impressive feats because of the extensive amounts of text, imagery, speech and other data they are trained on. If […]
-
MIT News: Training LLMs to self-detoxify their language. “Over time, most of us develop an internal ‘guide’ that enables us to learn context behind conversation; it also frequently directs us away from sharing information and sentiments that are, or could be, harmful or inappropriate. As it turns out, large language models (LLMs) — which are trained on extensive, public datasets and […]
https://rbfirehose.com/2025/04/22/mit-news-training-llms-to-self-detoxify-their-language/
-
TechCrunch: Microsoft researchers say they’ve developed a hyper-efficient AI model that can run on CPUs. “Microsoft researchers claim they’ve developed the largest-scale 1-bit AI model, also known as a ‘bitnet,’ to date. Called BitNet b1.58 2B4T, it’s openly available under an MIT license and can run on CPUs, including Apple’s M2.”
-
TechCrunch: OpenAI launches Flex processing for cheaper, slower AI tasks. “In a bid to more aggressively compete with rival AI companies like Google, OpenAI is launching Flex processing, an API option that provides lower AI model usage prices in exchange for slower response times and ‘occasional resource unavailability.’”
-
Carnegie Mellon University: CMU Study Shows Large Language Models Have Distinctive Styles. “In a recent study, Carnegie Mellon University researchers found they could use characteristic word choices to determine which large language model (LLM) generated a particular bit of text with 97% accuracy.”
-
Tom’s Hardware: Meta defends using pirated material, claims it’s legal if you don’t seed content. “Meta claimed in a court filing this week that despite torrenting an 82 TB dataset of pirated, copyrighted material from shadow libraries to train its LLaMA AI models, that employees ‘took precautions not to “seed” any downloaded files’. The act of Seeding in torrenting terminology refers to […]
-
The Register: It’s only a matter of time before LLMs jump start supply-chain attacks. “Now that criminals have realized there’s no need to train their own LLMs for any nefarious purposes – it’s much cheaper and easier to steal credentials and then jailbreak existing ones – the threat of a large-scale supply chain attack using generative AI becomes more real.”
-
arXiv: Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad. “Using expert human annotators, we evaluated several state-of-the-art reasoning models on the six problems from the 2025 USAMO within hours of their release. Our results reveal that all tested models struggled significantly, achieving less than 5% on average. Through detailed analysis of reasoning traces, we identify the most […]
https://rbfirehose.com/2025/04/02/arxiv-proof-or-bluff-evaluating-llms-on-2025-usa-math-olympiad/
-
MIT Press: A note on LibGen and the unauthorized use of our authors’ work. “We want to be clear: The MIT Press has not licensed any of our books or journal articles for LLM training purposes, nor have we granted permission for any such use. However, we are well aware that many MIT Press publications have ended up in pirated training data sets. We share the deep distress of our authors whose […]
-
Search Engine Journal: OpenAI Secretly Funded Benchmarking Dataset Linked To o3 Model. “In addition to accessing the benchmarking dataset, OpenAI funded its creation, a fact that was withheld from the mathematicians who contributed to developing FrontierMath. Epoch AI belatedly disclosed OpenAI’s funding only in the final paper published on Arxiv.org, which announced the benchmark. Earlier […]
-
Hackaday: Running AI Locally Without Spending All Day On Setup. “Msty is a desktop application that lets you do several things. First, it can let you chat with an AI engine either locally or remotely. It knows about many popular options and can take your keys for paid services. For local options, it can download, install, and run the engines of your choice.”
https://rbfirehose.com/2025/01/26/hackaday-running-ai-locally-without-spending-all-day-on-setup/
-
ZDNet: ‘Humanity’s Last Exam’ benchmark is stumping top AI models – can you do any better? “On Thursday, Scale AI and the Center for AI Safety (CAIS) released Humanity’s Last Exam (HLE), a new academic benchmark aiming to ‘test the limits of AI knowledge at the frontiers of human expertise,’ Scale AI said in a release. The test consists of 3,000 text and multi-modal questions on more than […]
-
MIT News: Study: Some language reward models exhibit political bias. “A new study conducted by researchers at MIT’s Center for Constructive Communication (CCC) provides support for the notion that reward models — models trained on human preference data that evaluate how well an LLM’s response aligns with human preferences — may also be biased, even when trained on statements known to be […]
https://rbfirehose.com/2024/12/11/study-some-language-reward-models-exhibit-political-bias-mit-news/
-
MIT News: AI tool generates high-quality images faster than state-of-the-art approaches. “Researchers fuse the best of two popular methods to create an image generator that uses less energy and can run locally on a laptop or smartphone.”