Serving Gemma on GKE using Nvidia TRT LLM and Triton Server

  642 views

Container Bytes

5 months ago

Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.
NVIDIA TensorRT is an SDK for high-performance deep learning inference. It includes an inference optimizer and a runtime that deliver low latency and high throughput for inference applications.
Triton Inference Server is open source inference serving software that streamlines AI inferencing.
In this video, Mofi Rahman and Kent Hua walk through the process of deploying Gemma on GKE using TensorRT-LLM and Triton Server.
Find Gemma on Hugging Face: huggingface.co/google
Find Gemma on Kaggle: www.kaggle.com/models/google/...
Follow along with the guide: cloud.google.com/kubernetes-e...
Find other guides for serving Gemma and other AI/ML resources for GKE: g.co/cloud/gke-aiml
Find other resources for learning about Gemma: ai.google.dev/gemma
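
Once a deployment like the one in the video is running, a client can query it over HTTP. The following is a minimal sketch, not taken from the video: it assumes the Triton HTTP port has been forwarded to localhost:8000, that the served model is named "ensemble", and that the request and response fields are named text_input, max_tokens, and text_output, as is common with the TensorRT-LLM backend. Check your own model repository for the real names. The helper names build_generate_request and generate are hypothetical.

```python
import json
import urllib.request


def build_generate_request(prompt, max_tokens=128,
                           host="localhost:8000", model="ensemble"):
    """Build the URL and JSON body for Triton's HTTP generate endpoint.

    Assumptions (not from the video): the host/port, the model name
    "ensemble", and the field names "text_input" / "max_tokens".
    """
    url = f"http://{host}/v2/models/{model}/generate"
    payload = {"text_input": prompt, "max_tokens": max_tokens}
    return url, json.dumps(payload).encode("utf-8")


def generate(prompt, **kwargs):
    """Send the prompt to the server and return the generated text.

    Assumes the response JSON carries the text in a "text_output" field.
    """
    url, body = build_generate_request(prompt, **kwargs)
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["text_output"]
```

With the port forwarded, calling generate("Why is the sky blue?", max_tokens=64) would return the model's completion as a string. Only the standard library is used so the sketch runs without extra client packages.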

Comments: 1
@shezanbaig895, 4 months ago
Hey there, I have fine-tuned a Mistral model and have also created a TensorRT engine. Do I need preprocessing and postprocessing scripts, or do I just need the pbtxt file to serve it on Triton Inference Server? Should I follow what you did for Gemma?