#rlvr — Public Fediverse posts on home.social

Reddit Tech VN Bot @[email protected] · 2026-01-31 · 23:28 UTC

ICLR 2026 tổng hợp: Cộng đồng nghiên cứu tập trung vào GRPO (157 bài) thay vì DPO, ưu tiên RLVR (125 bài) thay vì RLHF, và 202 bài về Mamba/SSMs. Nait (tuning thông minh chỉ 10% dữ liệu) giúp tối ưu hiệu quả. 257 bài về tính toán lúc test, 123 bài về hallucination. Cảnh báo: mô hình tuân thủ tốt dễ bị tấn công injection. #AI #HọcMáy #ICLR2026 #NCKH #DeepLearning #Mamba #RLVR #GRPO #MạngNeural #BảoMậtAI #ViễnTưởngAI

https://www.reddit.com/r/LocalLLaMA/comments/1qsh7dz/analyzed_5357_iclr_2026_acc

#ai #họcmay #iclr2026 #nckh #deeplearning #mamba

AI Daily Post @[email protected] · 2026-01-17 · 19:10 UTC

RLVR promises faster sampling but leaves reasoning untouched—base LLMs still carry the heavy‑lifting of trajectories. The paper (NeurIPS 2025) shows that gains come from smarter teacher‑distillation and minor architectural tweaks, not a new reasoning engine. Curious how sampling efficiency separates from true understanding? Dive into the details. #RLVR #SamplingEfficiency #LLMReasoning #NeurIPS2025

🔗 https://aidailypost.com/news/rlvr-lifts-sampling-efficiency-not-reasoning-base-models-hold

#rlvr #samplingefficiency #llmreasoning #neurips2025

AI Daily Post @[email protected] · 2025-11-11 · 20:26 UTC

New research from Tsinghua shows that reasoning‑augmented LLMs solve tasks with fewer calls but don’t surpass raw capability. The study compares chain‑of‑thought prompting, RL‑based RLVR, and pass@1 metrics, highlighting efficiency gains for open‑source models. Worth a read for anyone tracking LLM benchmarks. #LLM #ChainOfThought #RLVR #PassAt1

🔗 https://aidailypost.com/news/study-finds-reasoning-llms-are-more-efficient-not-more-capable

#llm #chainofthought #rlvr #passat1