
#trtllm — Public Fediverse posts

Live and recent posts from across the Fediverse tagged #trtllm, aggregated by home.social.

Judith van Stegeren @jd7h · 2024-10-19 · 06:02 UTC

Fitting an LLM on a GPU is a bit like photography. Model weights = film sensitivity, activation size = shutter speed, I/O tensors = aperture. These 3 dials control your model's memory footprint, just as they shape a photo's exposure.
Just realised this while trying to fit Llama 3.1 on my 24GB GPU with TRT-LLM: https://nvidia.github.io/TensorRT-LLM/reference/memory.html.
#llms #genai #llama #gpu #nvidia #trtllm #tensorrt
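A rough sketch of the first dial, model weights, follows. The post does not say which Llama 3.1 size or precision was used, so the 8-billion parameter count and the precision list are assumptions for illustration. The helper name `weight_memory_gib` is hypothetical and not part of TRT-LLM. Activations and I/O tensors need memory on top of this figure.

```python
# Back-of-the-envelope estimate of weight memory only.
# Assumption: ~8e9 parameters (the post does not state the model size).
# Activations and I/O tensors are NOT included here.

GIB = 2**30  # bytes per GiB


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


if __name__ == "__main__":
    assumed_params = 8e9  # hypothetical: an 8B-parameter model
    gpu_budget_gib = 24   # treating the post's 24GB card as a 24 GiB budget

    # Bytes per parameter for a few common storage precisions.
    for name, nbytes in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
        need = weight_memory_gib(assumed_params, nbytes)
        left = gpu_budget_gib - need
        print(f"{name}: weights ~{need:.1f} GiB, ~{left:.1f} GiB left for the rest")
```

Whatever is left after the weights has to cover the other two dials, which is why lowering weight precision is often the first thing to try on a small card.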
