Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.
NVIDIA TensorRT is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that deliver low latency and high throughput for inference applications.
Triton Inference Server is an open source inference serving software that streamlines AI inferencing.
In this video, Mofi Rahman and Kent Hua walk through the process of deploying Gemma on GKE using TensorRT-LLM and the Triton Inference Server.
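For a sense of what the finished deployment looks like from the client side, here is a minimal Python sketch. It assumes the Triton service has been made reachable on localhost:8000 (for example via kubectl port-forward) and that the model repository exposes the conventional TensorRT-LLM "ensemble" pipeline with text_input/max_tokens inputs and a text_output output; the exact service, model, and tensor names depend on your deployment, so treat them as placeholders. See the guide link below for the full deployment steps.

# Minimal Triton client sketch (assumed names; adjust to your model repository).
# Install the client with: pip install "tritonclient[http]" numpy
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Triton represents string tensors as BYTES; numpy object arrays serialize cleanly.
prompt = np.array([["Why is the sky blue?"]], dtype=object)
max_tokens = np.array([[128]], dtype=np.int32)

inputs = [
    httpclient.InferInput("text_input", [1, 1], "BYTES"),
    httpclient.InferInput("max_tokens", [1, 1], "INT32"),
]
inputs[0].set_data_from_numpy(prompt)
inputs[1].set_data_from_numpy(max_tokens)

# "ensemble" is the conventional TensorRT-LLM pipeline model name in Triton examples.
result = client.infer(model_name="ensemble", inputs=inputs)
print(result.as_numpy("text_output"))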
Find Gemma on Hugging Face: huggingface.co/google
Find Gemma on Kaggle: www.kaggle.com/models/google/...
Follow along with the guide: cloud.google.com/kubernetes-e...
Find other guides for serving Gemma and other AI/ML resources for GKE: g.co/cloud/gke-aiml
Find other resources for learning about Gemma: ai.google.dev/gemma