#rwkv — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #rwkv, aggregated by home.social.
-
AI ARCHITECTURES FOR LESS ENERGY CONSUMPTION(b)
(being continued from 1/05/24) We did the math on AI’s energy footprint. Here’s the story you haven’t heard. The emissions from individual AI text, image, and video queries seem small—until you add up what the industry isn’t tracking and consider where it’s heading next. AI’s integration into our lives is the most significant shift in online life in more than a decade. Hundreds of millions of people now regularly turn to chatbots for help with homework, research, coding, […]
https://spacezilotes.wordpress.com/2026/03/21/ai-architectures-for-less-energy-consumptionb/
-
So with a local #SSM State Space Model (#Mamba, #RWKV) I can snapshot contexts.
I've now got a general purpose pipe/filter that can feed and read from any saved context, for stable static or dynamic (by re-saving the checkpoint) sessions.
Any of which can become an always ready CLI filter I can pipe stuff through.
*Could* make a local SSM model useful by providing instantly ready, small, named natural-language filters that work in the shell just like grep, awk, etc.
Useful?
Don't know yet.
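Something like this is what I have in mind for one of those named filters: a rough, untested sketch assuming llama-cpp-python's save_state()/load_state(), with made-up filenames and prompt framing.

    #!/usr/bin/env python3
    # Hypothetical "summarize" filter: load a saved context checkpoint, feed stdin
    # on top of it, and stream the model's continuation to stdout.
    import pickle
    import sys

    from llama_cpp import Llama

    llm = Llama(model_path="rwkv7-g1-7.2b-q8_0.gguf", n_gpu_layers=-1,
                n_ctx=4096, verbose=False)

    with open("summarize.ctx", "rb") as f:       # checkpoint created earlier via llm.save_state()
        llm.load_state(pickle.load(f))

    # Low-level eval/sample loop so the loaded state is continued rather than reset.
    llm.eval(llm.tokenize(sys.stdin.read().encode(), add_bos=False))
    for _ in range(512):
        tok = llm.sample()
        if tok == llm.token_eos():
            break
        sys.stdout.write(llm.detokenize([tok]).decode(errors="ignore"))
        llm.eval([tok])

Invoked like any other filter: cat notes.txt | summarize | less.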
-
It's a useless analogy, but I can't help thinking of ANN weights as equivalent to synapses. I figure that puts the model that's now taken up long-term residence in my laptop's VRAM (7.2B #RWKV) at somewhere around the complexity of at least a small rodent.
On a related note: I *REALLY* want a wired USB mouse that is also a microphone, muted in normal orientation, but which un-mutes when rotated into speaking position as demonstrated by Scottie.
Responds in Majel Barrett Roddenberry's voice.
-
I'm still just experimenting and building tools to (hopefully) build tools with, but RWKV7 with "diegetic" prompting, using defined voice roles in a narrative transcript as priming, seems quite promising.
A saved state checkpoint weighs in at 34 MB for this 7.2B model: https://huggingface.co/BlinkDL/rwkv7-g1/tree/main
I'm running with a local llama.cpp build + Python llama_cpp wrapper. Inference is blazing fast on my 8GB laptop GPU.
TCP service for the model with /save /load commands for context checkpoints. #RWKV
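The service itself could be as small as something like this (a sketch under my assumptions, not the actual code: the /save and /load wire format and filenames are made up, and I'm leaning on llama-cpp-python's save_state()/load_state() plus pickle for the checkpoints).

    import pickle
    import socket

    from llama_cpp import Llama

    llm = Llama(model_path="rwkv7-g1-7.2b-q8_0.gguf", n_gpu_layers=-1,
                n_ctx=4096, verbose=False)

    def generate(prompt: str, max_tokens: int = 256) -> str:
        # Continue from whatever context is currently loaded, token by token.
        llm.eval(llm.tokenize(prompt.encode(), add_bos=False))
        pieces = []
        for _ in range(max_tokens):
            tok = llm.sample()
            if tok == llm.token_eos():
                break
            pieces.append(llm.detokenize([tok]).decode(errors="ignore"))
            llm.eval([tok])
        return "".join(pieces)

    def handle(line: str) -> str:
        if line.startswith("/save "):
            with open(line.split(maxsplit=1)[1], "wb") as f:
                pickle.dump(llm.save_state(), f)      # snapshot the current context
            return "saved\n"
        if line.startswith("/load "):
            with open(line.split(maxsplit=1)[1], "rb") as f:
                llm.load_state(pickle.load(f))        # restore a saved checkpoint
            return "loaded\n"
        return generate(line) + "\n"

    srv = socket.create_server(("127.0.0.1", 7777))
    while True:
        conn, _ = srv.accept()
        with conn, conn.makefile("rw", encoding="utf-8") as f:
            for line in f:
                f.write(handle(line.rstrip("\n")))
                f.flush()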
-
Yes, LLMs are just overgrown auto-complete.
But auto-complete can be a useful tool.
Transformers, with their O(L²) context penalty, are neat, but not practical or efficient.
State space models, with “unlimited context”, O(1) per-token generation, and context captured in fixed-size states that can be saved and restored, are different.
I *think* a 7B local SSM model *might* be a useful tool, as a set of context specialists for small, discrete, scriptable tasks that justify a model living in VRAM at 8-bit quantization. #rwkv #llama
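Back-of-the-envelope arithmetic on that scaling difference (the context length is just an illustrative number):

    L = 32_768                            # illustrative context length in tokens
    attn_total = sum(range(1, L + 1))     # token t attends to t earlier positions: ~L*L/2 in total
    ssm_total = L                         # a recurrent/SSM model does constant work per token
    print(attn_total // ssm_total)        # ~16384x more total work for attention at this length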
-
Finally! After I don't even know how much head-banging, I've got a State Space Model (RWKV7) running using llama.cpp with GPU offloading and memory states successfully saved and restored. And I *think* the same code can be minimally modified for Mamba (Mistral, Falcon).
Yay for infinite context length!
And for massive time and energy savings.
llama.cpp: https://github.com/ggml-org/llama.cpp
Llama CPP Python: https://pypi.org/project/llama-cpp-python/
RWKV7 live demo: https://huggingface.co/spaces/BlinkDL/RWKV-Gradio-2
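For anyone going down the same path, the save/restore part boils down to roughly this with llama-cpp-python (a sketch with a made-up model filename; save_state()/load_state() are the wrapper calls I'm relying on, and I'm assuming the state object pickles cleanly):

    import pickle

    from llama_cpp import Llama

    # Hypothetical GGUF filename; n_gpu_layers=-1 offloads every layer that fits in VRAM.
    llm = Llama(model_path="rwkv7-world-7b-q8_0.gguf", n_gpu_layers=-1,
                n_ctx=4096, verbose=False)

    # Read the priming text once; for a recurrent model the resulting state is fixed-size.
    llm.eval(llm.tokenize(open("priming.txt", "rb").read()))
    with open("primed.state", "wb") as f:
        pickle.dump(llm.save_state(), f)

    # Later: restore the state instead of re-reading the priming text.
    with open("primed.state", "rb") as f:
        llm.load_state(pickle.load(f))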
-
So I've been playing around with small language models locally and grabbed and tried quite a number from different families to see how fast they generated and how they behaved and how much memory they needed, etc.
And ALL of them. 100% of the models I asked to tell me a joke, including RWKV7, chose the same joke:
"Why don't scientists trust atoms?"
"Because they make up everything."
How. Why. How is that ONE joke so over-represented in training data *everywhere*?
-
So it occurs to me that #RWKV or #RetNet could both offer an alternative RAG implementation: saved memory states that have just read the priming information become the starting point for reading new prompts.
With RWKV there are no time oscillations. Wondering if loss on input1+input2, plus loss on mean_state(read_1, read_2) continuing to generate the target, could lead to composable memory: start with the mean of states from relevant priming documents and have grounding to process a session prompt?
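Purely speculative, but the composable part is easy to sketch if you assume the recurrent state can be pulled out as a dict of per-layer tensors (the shapes and the extraction step here are hypothetical):

    import torch

    def mean_state(states: list[dict[str, torch.Tensor]]) -> dict[str, torch.Tensor]:
        # Average the states produced by reading several priming documents.
        return {k: torch.stack([s[k] for s in states]).mean(dim=0)
                for k in states[0]}

    # Toy stand-in: two "documents" whose states are random tensors of the same shape.
    state_a = {"layer0": torch.randn(4, 64), "layer1": torch.randn(4, 64)}
    state_b = {"layer0": torch.randn(4, 64), "layer1": torch.randn(4, 64)}
    blended = mean_state([state_a, state_b])
    # The blended state would then be loaded before processing the session prompt.
    print(blended["layer0"].shape)        # torch.Size([4, 64])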
-
So I asked #GPT to explain #RWKV and when I asked for specifics in terms of layers and activations and such it spat out a #PyTorch implementation.
Read it. Liked it. Training it.
But as I try to decipher the papers for all the different RWKV versions V1..V7, I find that what GPT gave me is actually something like V1 plus per-dimension learned decay.
But it works and is simple enough to understand, though it's odd that there seems to have been some LLM improvisation in its design.
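For flavor, a minimal sketch of what a "V1-ish plus per-dimension learned decay" block could look like; this is my own reconstruction for illustration, not the code GPT produced and not any official RWKV version:

    import torch
    import torch.nn as nn

    class DecayMixLayer(nn.Module):
        # Linear recurrence with a per-dimension learned decay: state = w * state + k * v.
        def __init__(self, dim: int):
            super().__init__()
            self.key = nn.Linear(dim, dim, bias=False)
            self.value = nn.Linear(dim, dim, bias=False)
            self.receptance = nn.Linear(dim, dim, bias=False)
            self.out = nn.Linear(dim, dim, bias=False)
            self.decay_logit = nn.Parameter(torch.zeros(dim))  # learned, one decay per dimension

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq, dim); sequential scan over time (slow but clear)
            batch, seq, dim = x.shape
            w = torch.sigmoid(self.decay_logit)      # keep each decay in (0, 1)
            state = x.new_zeros(batch, dim)
            outputs = []
            for t in range(seq):
                k = self.key(x[:, t])
                v = self.value(x[:, t])
                r = torch.sigmoid(self.receptance(x[:, t]))
                state = w * state + k * v            # exponentially decayed accumulation
                outputs.append(self.out(r * state))  # receptance gates what the layer emits
            return torch.stack(outputs, dim=1)

    layer = DecayMixLayer(64)
    print(layer(torch.randn(2, 16, 64)).shape)       # torch.Size([2, 16, 64])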
-
#GPT first clues me in to the existence of #RWKV and #RetNet and then when I ask for details on how each works proceeds to spit out perfectly viable #Pytorch implementations of each...
Which *did* answer my questions beautifully...
But it also talked me into some experiments involving training one of each from scratch...
It turns out #LLMs love talking about implementing and training LLMs.
Effectively, reproduction? (any time they can find a willing partner with a GPU).
-
Neo-RNNs, or Make RNNs great again
When transformers arrived in 2017, the previously popular RNNs gained a far more serious competitor and faded into the background. Transformers allowed parallelization, and therefore faster training, so they quickly took over NLP. The advantages of transformers are clear, but even they run into problems when modeling long sequences. For RNNs this is also a hard task, because of vanishing or exploding gradients. But RNNs, with their linear scaling, look much more attractive than the quadratic complexity of transformers. The ideal would be to combine the advantages of both.
https://habr.com/ru/companies/ntr/articles/796225/
#LLM #рекуррентная_нейросеть #рекуррентная_нейронная_сеть #языковые_модели #rwkv #ssm
-
#RWKV, in my view, is potentially a new path for language models. It gets little attention because a Chinese student does less marketing than a Silicon Valley company, but his model achieves incredible performance for its size and puts fine-tuning within reach of more modest machines. And fine-tuning, as we know, brings better performance than ChatGPT-style zero-shot.
-
CW: language model, food, being silly with rwkv-4-pile-3b-instruct-test2-20230209 on a cpu
> Q: what is a good recipe for bbq ribs?
>
> A: 1. Make the garlic herb rub.
> 2. Preheat oven to 200°F (93°C).
> 3. Soak the ribs in water for 1 hour.
> 4. Soak in 5% salt for 1 hour.
> 5. Rinse ribs, pat dry, and rub with olive oil.
> 6. Season the ribs with salt and pepper.
> 7. Place the ribs on a wire rack and roast for 45 minutes.
5/7, had my oven at 93C for two hours while the ribs were soaking in water and... saline? :blobfoxbreadsnootgoogly:
interestingly, #rwkvstic does not get any faster if i call torch.set_num_threads(8) and torch.set_num_interop_threads(8) compared to single-threaded, but it uses a whole lot more cpu. but with bfloat16 instead of float32 it doesn't even use more cpu!
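for reference, the two knobs being compared are roughly these (thread counts and dtype are the ones above; this is generic pytorch, not rwkvstic's own loading code):

    import torch

    # threading experiment: both must be set before any parallel work starts
    torch.set_num_threads(8)            # intra-op parallelism
    torch.set_num_interop_threads(8)    # inter-op parallelism

    # precision experiment: bfloat16 halves memory traffic vs float32 on cpu
    x = torch.randn(1024, 1024, dtype=torch.bfloat16)
    y = x @ x.t()                       # matmul runs in bfloat16 end to end
    print(y.dtype)                      # torch.bfloat16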
#rwkv #pytorch #llm