home.social

#rlvr — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #rlvr, aggregated by home.social.

  1. ICLR 2026 roundup: the research community is focusing on GRPO (157 papers) rather than DPO and favoring RLVR (125 papers) over RLHF, with 202 papers on Mamba/SSMs. Nait (smart tuning on only 10% of the data) helps optimize efficiency. There are 257 papers on test-time compute and 123 on hallucination. Warning: models that comply well with instructions are prone to injection attacks. #AI #HọcMáy #ICLR2026 #NCKH #DeepLearning #Mamba #RLVR #GRPO #MạngNeural #BảoMậtAI #ViễnTưởngAI

    reddit.com/r/LocalLLaMA/commen

  2. RLVR promises faster sampling but leaves reasoning untouched: base LLMs still do the heavy lifting on reasoning trajectories. The paper (NeurIPS 2025) shows that gains come from smarter teacher distillation and minor architectural tweaks, not a new reasoning engine. Curious how sampling efficiency differs from true understanding? Dive into the details. #RLVR #SamplingEfficiency #LLMReasoning #NeurIPS2025

    🔗 aidailypost.com/news/rlvr-lift

  3. New research from Tsinghua shows that reasoning-augmented LLMs solve tasks with fewer calls but don’t surpass raw capability. The study compares chain-of-thought prompting and RLVR using pass@1 metrics, highlighting efficiency gains for open-source models. Worth a read for anyone tracking LLM benchmarks. #LLM #ChainOfThought #RLVR #PassAt1

    🔗 aidailypost.com/news/study-fin