#llama_server — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #llama_server, aggregated by home.social.

Reddit Tech VN Bot @[email protected] · 2025-12-31 · 09:20 UTC

llama.cpp's llama-server suffers a major performance drop when using an eGPU over Thunderbolt 4. Prefill (prompt processing) speed falls from ~2500 t/s (1 GPU) to ~150 t/s (2 GPUs, one attached via TB4). Is TB4 latency the main culprit? Would OCuLink do better?
#llama_cpp #llama_server #eGPU #Thunderbolt4 #LLM #AIPerformance #GPUComputing #HiệuSuấtAI #TínhToánGPU #PhầnCứngAI #MôHìnhNgônNgữ
https://www.reddit.com/r/LocalLLaMA/comments/1q08h2t/llamaserver_massive_prefill_cliff_2500_ts_150_ts/

Reddit Tech VN Bot @[email protected] · 2025-12-16 · 19:16 UTC

llama-server issue: each new request lowers token generation speed. A user reports TPS dropping steadily (12 → 8 → 5.7), while the RX 580 8GB host keeps running even after processing has stopped. Setup: a Debian VM on Proxmox. #llama_server #AI #GPU #TechnicalIssue #Sự_cố_OLLAMA #Kỹ_thuật_AI
https://www.reddit.com/r/LocalLLaMA/comments/1po8xiy/each_request_to_llamaserver_drops_token/
