home.social

#huggingface — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #huggingface, aggregated by home.social.

  1. RT @jedisct1: I just released MiMo V2.5-Coder. If you have 128 GB of RAM, this is one of the best models you can run locally. It is fast and has outperformed Qwen 3.6 and DeepSeek 4-Flash in all my experiments. huggingface.co/jedisct1/MiMo-V

    more on Arint.info

    #DeepLearning #HuggingFace #LocalLLM #MachineLearning #OpenSourceAI #arint_info

    https://x.com/jedisct1/status/2058827764237525231#m

  2. RT @KyleHessling1: BREAKING! Qwopus 3.6 27B is LIVE! Thank you for your patience on this one, but I believe you'll find the wait was worth it! We've benchmarked this thing up and down and verified that it holds at least 75.25% (152/202) on the initial 202 SWE-bench solves. Not a full run of 500, but it shows the agentic coding quality of the original 27B is retained while adding all of the additional Qwopus benefits across many domains. As always, Jackrong is absolutely cooking here! CoT quality has improved significantly through the inversion techniques from our Negentropy proof of concept. It also went through thorough curriculum training. You can check out the MMLU-Pro benchmarks on the model card, but it improved a whopping 10 points over the base model in physics, with meaningful jumps in chemistry, business, and computer science. However, the best part is that I was able to build an entire survival shooter game using only this local model. I was genuinely blown away by the results, which you can play right now on my HF space (link in comments below). "Qwopus Commander" was completed in 9 turns of Qwopus 3.6! To test the new long-context training, I made it re-output the entire 3000+ line program each turn, and it would make fixes and add features that I requested in large prompts, while perfectly replicating the entire rest of the game from context. What's more, I did it all at Q8 KV cache quantization and never had an issue over the entire 303k-token run! IMPORTANT: Run it at --temp 0.75 to 1. Mess with it in that range for your use case. Higher temp actually…

    more on Arint.info

    #GGUF #huggingface #make #rest #science #SWE #Swe #arint_info

    https://x.com/KyleHessling1/status/2057853098585108979#m

  7. RT @support_huihui: ByteDance has just released an open-source model called Lance, and the best part: it runs with only 3 billion active parameters! 🤯 It can process text, images, and video, and generate all three as well! Absolutely fascinating!

    more on Arint.info

    #AI #ByteDance #HuggingFace #Lance #MachineLearning #OpenSource #arint_info

    https://x.com/support_huihui/status/2056664596002587062#m

  12. RT @jun_song: One of my best friends from my US college days works as an AI engineer at Big Tech and is about to finish his PhD. I only got my bachelor's, came back to Korea, and worked in a completely different field: strategic planning. My job was planning new businesses and making factories and affiliates run efficiently. My only involvement with AI was building and implementing workflow automation when they asked for it. I was talking to my friend recently. He knows everything about his specific field, but he knew absolutely nothing about how local LLMs work or post-training. That made me realize something: AI has so many different subfields, and having a degree doesn’t mean you know everything. Curiosity for new things and the drive to learn them will be way more important than a degree going forward. And I’ve said this before, but I’m not posting this motivation to sell you a course. I will never do that. Set up a research multi-agent for the latest information and study new things. It will help you massively. If you can leverage your current domain knowledge to figure out which fields will be promising in the future, that’s the best scenario. Thanks for reading this long post. I genuinely want all my followers to succeed, and I hope this information was helpful. 송준 Jun Song (@jun_song) A year ago, I didn't care about fine-tuning or post-training at all. But when I thought about corporate security, it hit me: the demand for fine-tuning is going to be massive. I locked in for a few months. Using nothing but my MacBook, I fine-tuned the SuperGemma4 series entirely on my own, and it r…

    more on Arint.info

    #agent #finetuning #Huggingface #nitter #opensource #things #US #arint_info

    https://x.com/jun_song/status/2056591055064318143#m

  13. 🤖 [Hugging Face] Granite Embedding Multilingual R2: open Apache 2.0 multilingual embeddings with 32K context and the best retrieval quality under 100M

    🔗 More: huggingface.co/blog/ibm-granit