#retnet — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #retnet, aggregated by home.social.
-
So it occurs to me that #RWKV and #RetNet could both offer an alternative RAG implementation: save the recurrent memory state once the model has read the priming information, then start from that saved state when reading new prompts.
With RWKV there are no time oscillations, since the state decay is purely real. Wondering whether training with a loss on input1+input2, plus a loss on continuing to generate the target from mean_state(read_1, read_2), could lead to composable memory: start from the mean of the states left by the relevant priming documents and use that as grounding for processing a session prompt?
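Rough sketch of that state-averaging idea, with a toy decaying linear recurrence standing in for RWKV's time-mixing state; the cell, the sizes, and the mean-state step below are my placeholders, not the actual RWKV API:

```python
import torch
import torch.nn as nn

class ToyRecurrentCell(nn.Module):
    """Toy decaying linear recurrence: s_t = decay * s_{t-1} + W x_t."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)
        # Purely real, learnable decay (no rotational/oscillating term).
        self.log_decay = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        decay = torch.exp(-torch.exp(self.log_decay))  # per-channel, in (0, 1)
        for t in range(x.shape[0]):                    # x: (seq_len, d_model)
            state = decay * state + self.proj(x[t])
        return state

def read_document(cell: ToyRecurrentCell, doc: torch.Tensor) -> torch.Tensor:
    """Run a priming document from a zero state; return the final state."""
    state = torch.zeros(doc.shape[-1])
    return cell(doc, state)

d = 64
cell = ToyRecurrentCell(d)
doc1, doc2 = torch.randn(32, d), torch.randn(32, d)

# "Composable memory": mean of the per-document final states.
mean_state = torch.stack([read_document(cell, doc1),
                          read_document(cell, doc2)]).mean(dim=0)

# Continue from the averaged state on a fresh session prompt.
prompt = torch.randn(8, d)
session_state = cell(prompt, mean_state)
print(session_state.shape)  # torch.Size([64])
```

Whether the averaged state actually stays on-manifold is exactly the open question; the proposed loss on generating the target from mean_state is what would push the model to make those states composable.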
-
#GPT first clues me in to the existence of #RWKV and #RetNet, and then, when I ask for details on how each works, proceeds to spit out perfectly viable #Pytorch implementations of each (core recurrence sketched below)...
Which *did* answer my questions beautifully...
But it also talked me into some experiments involving training one of each from scratch...
It turns out #LLMs love talking about implementing and training LLMs.
Effectively, reproduction? (Any time they can find a willing partner with a GPU.)
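For reference, a minimal single-head sketch of the recurrent retention update at the heart of RetNet (S_n = gamma * S_{n-1} + k_n^T v_n, o_n = q_n S_n, per the RetNet paper); the projections, sizes, and decay value are my own placeholders, and the full model also rotates q/k, which this toy version omits:

```python
import torch
import torch.nn as nn

class ToyRetention(nn.Module):
    def __init__(self, d_model: int, gamma: float = 0.9):
        super().__init__()
        self.q = nn.Linear(d_model, d_model, bias=False)
        self.k = nn.Linear(d_model, d_model, bias=False)
        self.v = nn.Linear(d_model, d_model, bias=False)
        self.gamma = gamma  # real-valued decay; full RetNet also rotates q/k

    def forward(self, x: torch.Tensor):
        """x: (seq_len, d_model). Returns outputs and the final state matrix."""
        d = x.shape[-1]
        state = x.new_zeros(d, d)  # S_0 = 0, a (d x d) memory matrix
        outs = []
        for t in range(x.shape[0]):
            q, k, v = self.q(x[t]), self.k(x[t]), self.v(x[t])
            state = self.gamma * state + torch.outer(k, v)  # S_n update
            outs.append(q @ state)                          # o_n = q_n S_n
        return torch.stack(outs), state

x = torch.randn(16, 32)
out, final_state = ToyRetention(32)(x)
print(out.shape, final_state.shape)  # (16, 32) and (32, 32)
```

That final state matrix is the same object the first post proposes saving and averaging across priming documents.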