
#qwen30b — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #qwen30b, aggregated by home.social.

Reddit Tech VN Bot @[email protected] · 2025-12-31 · 05:16 UTC

🖥️ Tried Qwen3‑30B (a3b VL Q4_XS) on a P40 GPU with Flash Attention. It reached a 100k context, but at around 60K it started looping on repeated passages and performance dropped sharply. With FA turned off and the MoE weights moved to the CPU, speed fell roughly 5x, and the K‑cache was slow at Q4/Q5. The user is looking for ways to optimize the settings. #AI #LLM #Qwen30B #FlashAttention #GPU #LocalLLaMA #trí_tự_nhiên #công_nghệ
https://www.reddit.com/r/LocalLLaMA/comments/1q03z3j/p40_qwen30b_60k_context_window_ceiling_with_flash/

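Much of the pressure behind quantizing the K-cache at these context lengths comes from KV-cache size. The following is a minimal Python sketch, not the poster's setup. It assumes a Qwen3-30B-A3B text-model attention shape of 48 layers, 4 KV heads and head dimension 128 (verify these against the model's config.json). It also assumes ggml block sizes for q8_0/q5_0/q4_0, where each block holds 32 values plus a 2-byte fp16 scale. The helper `kv_cache_bytes` is hypothetical, and the estimate ignores compute buffers and the model weights.

```python
"""Rough KV-cache size estimate for a llama.cpp-style cache (sketch)."""

# (bytes per block, values per block) for a few ggml cache types.
# Quantized types pack 32 values per block together with a 2-byte fp16 scale.
CACHE_TYPES = {
    "f16": (2, 1),
    "q8_0": (34, 32),  # 2-byte scale + 32 x 8-bit values
    "q5_0": (22, 32),  # 2-byte scale + 4 bytes of high bits + 16 bytes of 4-bit lows
    "q4_0": (18, 32),  # 2-byte scale + 32 x 4-bit values
}

# ASSUMPTION: Qwen3-30B-A3B text-model attention shape; check config.json.
N_LAYERS = 48
N_KV_HEADS = 4
HEAD_DIM = 128


def kv_cache_bytes(n_ctx, n_layers=N_LAYERS, n_kv_heads=N_KV_HEADS,
                   head_dim=HEAD_DIM, k_type="f16", v_type="f16"):
    """Bytes needed to cache K and V for n_ctx tokens (hypothetical helper)."""
    # Values stored per token for K alone; V has the same count.
    values_per_token = n_layers * n_kv_heads * head_dim
    total = 0
    for cache_type in (k_type, v_type):
        block_bytes, block_size = CACHE_TYPES[cache_type]
        total += n_ctx * values_per_token * block_bytes // block_size
    return total


if __name__ == "__main__":
    # Compare cache footprints at the 100k context mentioned in the post.
    for k_type, v_type in [("f16", "f16"), ("q8_0", "q8_0"), ("q4_0", "q4_0")]:
        size = kv_cache_bytes(100_000, k_type=k_type, v_type=v_type)
        print(f"K={k_type:5} V={v_type:5} 100k ctx: {size / 2**30:.2f} GiB")
```

Under these assumptions, an f16 cache for 100k tokens is on the order of 9 GiB. Added to the weights of a roughly 4-bit 30B model, that is tight on a 24 GB P40, which is why Q4/Q5 cache types become attractive despite the speed cost the post reports.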