Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs.
In this video, Mofi Rahman and Ali Zaidi walk through the process of deploying Gemma on GKE with TPUs using JetStream.
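The deployment walked through in the video can be sketched as a Kubernetes manifest targeting a GKE TPU node pool. This is a minimal illustration only, not the exact manifest from the guide linked below: the container image, port, chip count, and topology values are placeholders that will differ per setup.

```yaml
# Hypothetical sketch of a JetStream server Deployment on a GKE TPU node pool.
# Image, port, chip count, and topology are assumed placeholder values.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jetstream-gemma
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jetstream-gemma
  template:
    metadata:
      labels:
        app: jetstream-gemma
    spec:
      nodeSelector:
        # GKE node labels used to schedule pods onto TPU slices;
        # the accelerator type and topology depend on your node pool.
        cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice
        cloud.google.com/gke-tpu-topology: 2x4
      containers:
      - name: jetstream
        image: JETSTREAM_SERVER_IMAGE  # placeholder: your JetStream server image
        ports:
        - containerPort: 9000  # placeholder serving port
        resources:
          requests:
            google.com/tpu: "8"  # TPU chips requested; must match the slice topology
          limits:
            google.com/tpu: "8"
```

Requests and limits for `google.com/tpu` must be equal, and the chip count has to line up with the node pool's topology; see the guide in this description for the tested manifests.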
Find Gemma on Hugging Face - huggingface.co/google
Find Gemma on Kaggle - www.kaggle.com/models/google/...
Follow along the guide: cloud.google.com/kubernetes-e...
Find other guides for serving Gemma and other AI/ML resources for GKE: g.co/cloud/gke-aiml
Find other resources for learning about Gemma: ai.google.dev/gemma