home.social

#llama — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #llama, aggregated by home.social.

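  1. EssilorLuxottica × Meta begin selling AI glasses that pioneer the future of eyewear | Dempa Times | News site of the Dempa Times, Japan's only trade paper specializing in broadcasting and information and communications yayafa.com/2805325/ #AgenticAi #AI #ArtificialGeneralIntelligence #ArtificialIntelligence #LLAMA #Meta #MetaAI #エージェント型AI #人工知能 #汎用人工知能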

  2. Benchmark results for Qwen 3.6 27B and 35B MTP speculative decoding in llama.cpp on RTX 4080 16GB. Token speed, VRAM cost, and optimal --spec-draft-n-max settings.

    #SelfHosting #LLM #AI #llama.cpp #NVidia #Hardware

    glukhov.org/llm-performance/be

