#quantization — Public Fediverse posts
Live and recent posts from across the Fediverse tagged #quantization, aggregated by home.social.
-
optimization-kernels: C++ kernels and utilities for quantization and inference optimization.
👉 https://github.com/brandonhimpfen/optimization-kernels
#ai #artificialintelligence #machinelearning #llm #inference #quantization
-
An excellent introduction to #quantization used for #LLMs 👌🏽:
“Quantization From The Ground Up”, Sam Rose, Ngrok (https://ngrok.com/blog/quantization).
On HN: https://news.ycombinator.com/item?id=47519295
#AI #Math #FloatingPoint #NumericalAnalysis #Numbers #NeuralNetworks #Precision #Accuracy
-
Impressive:
“TurboQuant: Redefining AI Efficiency With Extreme Compression”, Amir Zandieh, et al, Google Research (https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/).
The paper: https://arxiv.org/abs/2504.19874
On HN: https://news.ycombinator.com/item?id=47513475
#TurboQuant #Quantization #LLMs #Vectors #Compression #Paper
-
🚀🌐 Oh great, now #Google wants us to #turbocharge our #browsers with "vector quantization" mumbo-jumbo that requires versions of Chrome, Firefox, and Safari that don't even exist yet. 🤖 Because who doesn't want to compress their vectors in 3 bits/dim while their browsers and brains crash simultaneously. 🙄
https://github.com/teamchong/turboquant-wasm #Vector #Quantization #Browser #Update #TechNews #HackerNews #ngated
-
New AI Breakthrough May Bring Full FSD V14 to Tesla’s HW3 Vehicles
March 30, 2026 By Karan Singh For owners of Tesla vehicles equipped with HW3, the wait for the…
#NewsBeep #News #Artificialintelligence #AI #AI4 #ArtificialIntelligence #AU #Australia #FSD #HW3 #memory #neuralnetworks #Nvidia #Quantization #Technology #TESLA
https://www.newsbeep.com/au/575661/
-
Authors: Federico Marcuzzi (INSAIT - Institute for Computer Science, Artificial Intelligence and Technology), Xuefei Ning (Tsinghua University), Roy Schwartz (The Hebrew University of Jerusalem), and Iryna Gurevych (UKP Lab, Technische Universität Darmstadt and ATHENE Center).
See you at #EACL2026 in Rabat 🕌!
#UKPLab #NLProc #ResponsibleAI #Quantization #MLSafety #Fairness #TrustworthyAI #ModelCompression #LLMSafety #EthicalAI #NLP #AIResearch
-
🎉 Wow, an article longer than the collective thoughts of its intended audience! Sam Rose seems to think we're all aspiring data scientists with infinite free time and an endless love for #quantization. 😂 6,658 words later, we're left with an 80 billion-parameter headache and absolutely zero desire to quantize anything ever again. 🚀🔢
https://ngrok.com/blog/quantization #HackerNews #DataScience #LongRead #Humor #ngated
-
Quantization from the Ground Up
https://ngrok.com/blog/quantization
#HackerNews #Quantization #Ground #Up #Machine #Learning #AI #Technology #Blog
-
Compare GGUF, GPTQ, and AWQ quantization formats for LLMs on consumer GPUs. Learn how to balance model quality, speed, and memory usage with Q4_K_M, IQ4_XS, and Q3_K_S variants for optimal inference performance.
#GGUF #quantization #LLM inference #GPU optimization #model deployment
https://dasroot.net/posts/2026/02/gguf-quantization-quality-speed-consumer-gpus/
-
🧠 Why do the NVFP8/MXFP8 formats get no attention in llama.cpp or vLLM, even though they are more accurate than FP8 and optimized for the Blackwell architecture? An open question for the AI community!
#AI #MachineLearning #Quantization #ĐịnhDạng #TríTuệNhânTạo #HọcMáy
https://www.reddit.com/r/LocalLLaMA/comments/1qsi8n2/why_no_nvfp8_or_mxfp8/
-
A Reddit user compared three 4-bit quantization methods (Q4_K_M, Q4_K_XL, and MXFP4) on the GLM-4.7-Flash and Nemotron-3-nano models. MXFP4 gave lower perplexity (10.72 PPL) and loaded faster than Q4_K_M (16.17 PPL). It also used 17% less VRAM and processed 5% faster than Q4_K_XL. The results suggest MXFP4 may be the best choice for large models in the 30–32B parameter range. #AI #Quantization #MôHìnhĐịnhLượng #TríTuệNhânTạo #HọcMáy
-
Comparing MXFP4 vs Q4_K_M/XL quantization on the GLM-4.7-Flash model:
📉 Surprising result: MXFP4 has a lower perplexity (PPL) (~10.72) than Q4_K_XL (~15.73), despite a smaller file size (15.79 GiB vs 16.31 GiB).
🚀 Speed: MXFP4 processes faster and uses less VRAM.
🤔 The open question: does lower PPL also mean better tool-calling and coding ability?
#LLM #AI #Quantization #MXFP4 #MachineLearning #CongNghe #LocalLLM
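For readers unfamiliar with the metric in the post above: perplexity is the exponential of the average negative log-likelihood per token, so lower means the model found the evaluation text less surprising. A minimal Python sketch follows; the token probabilities are made up for illustration and are not data from the Reddit benchmark.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(-mean(log p)), with natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical per-token probabilities, for illustration only.
confident = [math.log(p) for p in (0.5, 0.25, 0.5, 0.25)]
uncertain = [math.log(p) for p in (0.1, 0.05, 0.1, 0.05)]

print(f"confident model PPL: {perplexity(confident):.2f}")
print(f"uncertain model PPL: {perplexity(uncertain):.2f}")
```

Perplexity only measures next-token prediction on the chosen evaluation text, which is why the post's question about tool-calling and coding remains open.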
-
A benchmark on an RTX 4070 Super (12 GB) shows that Qwen 2.5 Coder 7B (AWQ Int4) is 24% faster (44.6 TPS) and uses less VRAM (9.49 GB) than Qwen 2.5 3B FP16 (35.9 TPS, 10 GB). Conclusion: a larger, quantized model performs better on consumer GPUs. #AI #Quantization #Benchmark #RTX4070 #LLM #TríTuệNhânTạo #địnhlượng #đánhgiá
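The arithmetic behind the post above can be checked with a short Python sketch. The throughput figures come from the post; the parameter counts are nominal values assumed from the model names, and the weight estimate ignores quantization scales, zero points, and any layers kept in higher precision.

```python
def relative_speedup(new_tps: float, old_tps: float) -> float:
    """Fractional throughput gain of new over old (0.24 means 24% faster)."""
    return new_tps / old_tps - 1.0


def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30


# Throughput figures quoted in the post (tokens per second).
speedup = relative_speedup(44.6, 35.9)

# Nominal parameter counts: an assumption, not exact model sizes.
int4_7b = weight_gib(7, 4)    # 4-bit weights, ignoring scales and zero points
fp16_3b = weight_gib(3, 16)   # 16-bit weights

print(f"speedup: {speedup:.1%}")
print(f"7B @ 4-bit: {int4_7b:.2f} GiB, 3B @ 16-bit: {fp16_3b:.2f} GiB")
```

The reported VRAM totals (9.49 GB vs 10 GB) are much closer together than the weight sizes alone, most likely because the runtime also reserves memory for the KV cache and activations.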
-
I'm running the QwQ 32B model in LM Studio with 4-bit quantization. Optimizing the K/V cache tripled processing speed (40k context instead of 10k) and cut VRAM use to 19 GB of 24 GB. But does reducing the K/V cache to 4 bits hurt accuracy much? This is an effective optimization for chat/role-play with a local LLM. #AI #MáyHọc #LLM #TốiƯuHóa #Quantization #KVTuning
https://www.reddit.com/r/ollama/comments/1qqan74/effects_of_quantized_kv_cache_on_an_already/
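A rough Python sketch of why a 4-bit K/V cache leaves room for a longer context, as the post above describes. The layer count, KV-head count, and head dimension below are assumptions for a 32B-class model with grouped-query attention, not values from the post; check the model's own config before relying on them. The estimate also ignores the per-block scale factors that real quantized caches store.

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_tokens: int, bits_per_value: int) -> float:
    """Approximate KV cache size in GiB for one sequence."""
    # Factor of 2: one key vector and one value vector per layer, head, token.
    total_bits = (2 * n_layers * n_kv_heads * head_dim
                  * context_tokens * bits_per_value)
    return total_bits / 8 / 2**30


# Assumed model shape (hypothetical; verify against the real config).
SHAPE = dict(n_layers=64, n_kv_heads=8, head_dim=128)

fp16_10k = kv_cache_gib(**SHAPE, context_tokens=10_000, bits_per_value=16)
fp16_40k = kv_cache_gib(**SHAPE, context_tokens=40_000, bits_per_value=16)
q4_40k = kv_cache_gib(**SHAPE, context_tokens=40_000, bits_per_value=4)

print(f"16-bit cache, 10k tokens: {fp16_10k:.2f} GiB")
print(f"16-bit cache, 40k tokens: {fp16_40k:.2f} GiB")
print(f" 4-bit cache, 40k tokens: {q4_40k:.2f} GiB")
```

Under these assumptions a 4-bit cache at 40k tokens takes about the same memory as a 16-bit cache at 10k tokens, since the cache grows linearly in both context length and bits per value. The sketch says nothing about the accuracy cost the post asks about.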
-
Scientific Reports precision medicine AI launches a clinic-first Collection as ML peers rethink LLM review norms and media races to keep up.
-
How much VRAM do neural networks need?
This post will be useful for people who want to understand local models, especially those who use them as a tool for creating content, art, and design (in the context of image and video neural networks). It also covers choosing a graphics card and the parameters that affect generative workflows.
https://habr.com/ru/articles/979092/
#нейросеть_локально #нейросеть_для_генерации_изображений #видеокарты #quantization #comfyui #memory_bandwidth #vram #neural_networks #генеративные_модели
-
NVIDIA deep learning courses add Earth-2, MONAI, and adversarial ML training, with free options and certificates for practitioners.
https://www.aistory.news/machine-learning/nvidia-deep-learning-courses-spotlight-practical-skills/
-
NVIDIA unveils Broadened Reinforcement Learning, using massive rollout scaling to boost LLM reasoning with less compute and stable rewards.
https://www.aistory.news/machine-learning/broadened-reinforcement-learning-adds-rollout-scaling/
-
NVIDIA expands its training catalog with a new Graph Neural Networks course, plus fresh modules on adversarial ML, Earth-2, and Jetson.
https://www.aistory.news/machine-learning/nvidia-adds-graph-neural-networks-course-to-lineup/
-
NVIDIA unveils an interactive AI agent that accelerates ML workflows with CUDA-X and Nemotron Nano-9B-v2, plus fresh training options.
https://www.aistory.news/machine-learning/nvidia-debuts-interactive-ai-agent-to-speed-ml-tasks/
-
NVIDIA expands its AI catalog with federated learning courses and modules on adversarial ML, Earth-2 weather models, and Jetson edge AI.
https://www.aistory.news/machine-learning/nvidia-adds-federated-learning-courses-to-ai-catalog/
-
Accelerated ML workflows get a boost as NVIDIA details a GPU-powered agent that speeds data prep, training, and HPO by up to 43x. Today.
https://www.aistory.news/machine-learning/accelerated-ml-workflows-arrive-with-nvidias-new-agent/
-
Limitless Pendant discontinued after Meta deal. Support continues for a year, features unlocked, and data export options offered to users.
https://www.aistory.news/machine-learning/limitless-pendant-discontinued-as-team-joins-meta/
-
Meta Limitless acquisition signals new AI wearables, while Isaac Lab 2.3 boosts robot learning with whole‑body control and teleoperation.
-
NVIDIA's Isaac Lab Arena launches to benchmark robot policies at scale, with whole-body control, richer teleoperation data, ADR, and PBT.
https://www.aistory.news/machine-learning/isaac-lab-arena-debuts-for-scalable-robot-evaluation/
-
Call Reason beta leads Android’s AI updates, while Superhuman expands Ask AI and new ML courses arrive for developers.
https://www.aistory.news/machine-learning/call-reason-beta-headlines-androids-latest-ai-upgrades/
-
Android 16 AI features add notification summaries, spam checks and Expressive Captions, rolling out to Pixel devices with privacy controls.
https://www.aistory.news/machine-learning/android-16-ai-features-roll-out-to-pixel-phones-first/
-
Ecommerce anomaly detection gains urgency as the Shopify outage underscores needs for ML monitoring, adversarial defenses, and resilient
https://www.aistory.news/machine-learning/ecommerce-anomaly-detection-rises-after-shopify-outage/
-
New Qwen3-Next-80B-A3B GGUF builds are now available! They include imatrix and IQ quantizations, along with MoE optimizations, for better performance with local LLMs.
#Qwen3Next #GGUF #LLM #AI #Quantization
#MôHìnhAI #LượngTửHóa #TríTuệNhânTạo
https://www.reddit.com/r/LocalLLaMA/comments/1p9qe7o/qwen3_next_imatrix_ggufs_up/
-
SGLang has just made FP8 stable for RL training, after finding that the problem lay in the quantization step. This is a big step forward for RLHF and local RL fine-tuning, and it simplifies the use of mixed precision.
#SGLang #FP8 #RLTraining #Quantization #AI #MachineLearning #HuấnLuyệnRL #TríTuệNhânTạo #HọcMáy
-
Nemotron Nano-9B-v2 powers a GPU-accelerated AI agent that automates ML workflows and boosts key tasks by up to 43x, per NVIDIA.
https://www.aistory.news/machine-learning/nemotron-nano-9b-v2-speeds-ml-agent-tasks-up-to-43x/
-
Claude Code updates add longer-running agents, Excel and Chrome tools, while NVIDIA debuts new RL rollout scaling and training paths.
https://www.aistory.news/machine-learning/claude-code-updates-bring-faster-smarter-ml-workflows/
-
Valve signals PC-aligned pricing, and Steam Machine AI like DLSS, FSR, and XeSS could shape performance expectations in the living room.
https://www.aistory.news/machine-learning/steam-machine-ai-stakes-rise-as-valve-signals-pricing/
-
NVIDIA’s CUDA-X Data Science shows 3x–43x ML speedups and expands training, pointing to faster, simpler workflows for teams and researchers.
https://www.aistory.news/machine-learning/cuda-x-data-science-brings-big-ml-speedups-in-new-demos/