Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.
NVIDIA TensorRT is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that deliver low latency and high throughput for inference applications.
Triton Inference Server is an open source inference serving software that streamlines AI inferencing.
In this video, Mofi Rahman and Kent Hua walk through the process of deploying Gemma on GKE using TensorRT-LLM and the Triton Inference Server.
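For a sense of what the finished deployment looks like from the client side, here is a minimal Python sketch. It assumes the Triton service has been made reachable on localhost:8000 (for example via kubectl port-forward) and that the model repository exposes the conventional TensorRT-LLM "ensemble" pipeline with text_input/max_tokens inputs and a text_output output; the exact service, model, and tensor names depend on your deployment, so treat them as placeholders. See the guide link below for the full deployment steps.

# Minimal Triton client sketch (assumed names; adjust to your model repository).
# Install the client with: pip install "tritonclient[http]" numpy
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Triton represents string tensors as BYTES; numpy object arrays serialize cleanly.
prompt = np.array([["Why is the sky blue?"]], dtype=object)
max_tokens = np.array([[128]], dtype=np.int32)

inputs = [
    httpclient.InferInput("text_input", [1, 1], "BYTES"),
    httpclient.InferInput("max_tokens", [1, 1], "INT32"),
]
inputs[0].set_data_from_numpy(prompt)
inputs[1].set_data_from_numpy(max_tokens)

# "ensemble" is the conventional TensorRT-LLM pipeline model name in Triton examples.
result = client.infer(model_name="ensemble", inputs=inputs)
print(result.as_numpy("text_output"))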
Find Gemma on Hugging Face: huggingface.co/google
Find Gemma on Kaggle: www.kaggle.com/models/google/...
Follow along with the guide: cloud.google.com/kubernetes-e...
Find other guides for serving Gemma and other AI/ML resources for GKE: g.co/cloud/gke-aiml
Find other resources for learning about Gemma: ai.google.dev/gemma