#trtllm — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #trtllm, aggregated by home.social.

  1. Fitting an LLM on a GPU is a bit like photography. Model weights = film sensitivity, activation size = shutter speed, I/O tensors = aperture. These 3 dials control your model's memory footprint, just as they shape a photo's exposure.

    Just realised this while trying to fit Llama 3.1 on my 24GB GPU with TRT-LLM: nvidia.github.io/TensorRT-LLM/.
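To put numbers on two of those dials (weights and the tensors kept around at run time), here is a minimal back-of-envelope sketch. It is not TensorRT-LLM's own memory accounting, and several things are assumptions rather than anything the post states. The Llama 3.1 8B shape (about 8.03e9 parameters, 32 layers, 8 KV heads, head dimension 128) comes from the model's published config. The KV cache is treated as the main run-time tensor cost. Activation memory is left out because it depends on how the engine is built. The helper names (`ModelShape`, `weight_bytes`, `kv_cache_bytes`, `headroom_gib`) are hypothetical.

```python
# Rough GPU memory budget for an LLM: weights plus KV cache.
# A back-of-envelope sketch, not TensorRT-LLM's own memory accounting.
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes per GiB


@dataclass(frozen=True)
class ModelShape:  # hypothetical helper type
    num_params: float  # total parameter count
    num_layers: int    # transformer blocks
    num_kv_heads: int  # key/value heads (fewer than query heads under GQA)
    head_dim: int      # size of each attention head


# Assumed Llama 3.1 8B shape (taken from its published config, not from the post).
LLAMA_3_1_8B = ModelShape(num_params=8.03e9, num_layers=32, num_kv_heads=8, head_dim=128)


def weight_bytes(shape: ModelShape, bytes_per_param: float) -> float:
    """Memory for the weights: 2 bytes/param for FP16/BF16, 1 for INT8/FP8."""
    return shape.num_params * bytes_per_param


def kv_cache_bytes(shape: ModelShape, num_tokens: int, bytes_per_elem: int) -> int:
    """Keys and values (factor 2) for every layer, KV head, head dim and cached token."""
    return 2 * shape.num_layers * shape.num_kv_heads * shape.head_dim * num_tokens * bytes_per_elem


def headroom_gib(gpu_gib: float, shape: ModelShape, bytes_per_param: float,
                 batch_size: int, seq_len: int, kv_bytes_per_elem: int = 2) -> float:
    """GiB left for activations, CUDA context, etc. after weights and KV cache."""
    used = weight_bytes(shape, bytes_per_param) + kv_cache_bytes(
        shape, batch_size * seq_len, kv_bytes_per_elem)
    return gpu_gib - used / GIB


if __name__ == "__main__":
    # One 8,192-token sequence on a 24 GiB card, FP16 KV cache.
    for label, bpp in [("FP16 weights", 2), ("INT8 weights", 1)]:
        w = weight_bytes(LLAMA_3_1_8B, bpp) / GIB
        kv = kv_cache_bytes(LLAMA_3_1_8B, 8192, 2) / GIB
        left = headroom_gib(24, LLAMA_3_1_8B, bpp, batch_size=1, seq_len=8192)
        print(f"{label}: weights {w:.1f} GiB, KV cache {kv:.1f} GiB, headroom {left:.1f} GiB")
```

Under these assumptions, FP16 weights take roughly 15 GiB. An 8,192-token FP16 KV cache for a single sequence takes exactly 1 GiB (128 KiB per token). That leaves about 8 GiB on a 24GB card (treated here as 24 GiB). Halving the bytes per parameter, for example with INT8 weights, frees roughly another 7.5 GiB.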