Run your LLM on Text Generation Inference without the Internet and make your Security team happy!

2,395 views

AI_by_AI

A year ago

Here is the "launch" script many teams have asked for: a short, concise way to launch your models on Text Generation Inference (TGI) from Hugging Face in a Docker container with no Internet dependencies. With this startup script you no longer need an Internet connection to start your models. The script is below; you can copy and paste it into a .sh file. Wishing continued good fortune to all!
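Note: the script assumes the model weights are already on disk under ./data; that download is a one-time step you do while online. A minimal sketch of it (git-lfs is assumed to be installed, so the actual weight files are cloned rather than just LFS pointer stubs):
# One-time, while online, from your text-generation-inference directory:
git lfs install
cd data
git clone https://huggingface.co/TheBloke/Llama-2-7b-Chat-GPTQ
cd ..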
#!/bin/bash
# Two-stage startup - runs with or without Internet; requires a previously downloaded model
# Step 1 - Copy this script to your text-generation-inference directory
# Step 2 - From text-generation-inference/data, run: sudo git clone https://huggingface.co/TheBloke/Llama-2-7b-Chat-GPTQ
# Step 3 - Return to the text-generation-inference directory
# Step 4 - Run this startup script

# Variables
model="Llama-2-7b-Chat-GPTQ"
volume="$PWD/data"

# Echo starting message
echo "Starting the Docker container with local files (no Internet): $model and volume: $volume ..."

# Start the container detached, overriding the entrypoint so the launcher
# does not start (and try to download the model) automatically
docker run --rm --entrypoint /bin/bash -itd \
  --name "$model" \
  -v "$volume":/data \
  --gpus all -p 8080:80 ghcr.io/huggingface/text-generation-inference:latest

# Check if the container started successfully
if [ $? -eq 0 ]; then
  echo "Container started successfully!"
else
  echo "Failed to start the container."
  exit 1
fi

# Run the text-generation-launcher launch command from inside the container
# for finer control over the container's execution environment
echo "Running text-generation-launcher from inside the container... local files only will be used"
docker exec "$model" bash -c "text-generation-launcher --model-id /data/$model --quantize gptq --num-shard 1"

# Check if the text-generation-launcher command was successful
if [ $? -eq 0 ]; then
  echo "text-generation-launcher ran successfully!"
else
  echo "Failed to run text-generation-launcher."
  exit 1
fi
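Once the launcher logs that the shard is ready, you can smoke-test the server from the host. The request shape below follows the TGI docs; the prompt is just an example:
curl 127.0.0.1:8080/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
  -H 'Content-Type: application/json'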
Links:
Hugging Face TGI Docs:
huggingface.co/docs/text-generation-inference/index
Hugging Face GitHub:
github.com/huggingface/text-generation-inference
WSL2 on Windows 11 with CUDA 11.8
kzbin.info/www/bejne/Z3ndioholNlmr8U&pp=ygUYd3NsMiB3aW5kb3dzIDExIGFuZCBjdWRh
AI_by_AI GitHub link to scripts and files:
github.com/jjmlovesgit/TGIfiles/blob/main/README.md
Installation Steps:
github.com/jjmlovesgit/TGIfiles/blob/main/Install%20Steps

Comments: 12
@SkyCandy 10 months ago
Just wanted to say thanks for putting together such a great video. For someone new, following the instructions provided by the Hugging Face Text Generation Inference page was less than helpful. Your thorough walk-through of the process from beginning to end got me just what I needed. Having the instructions text document to refer to is so helpful; I keep pulling it up when I try different models, etc. Please keep doing that!
@AI_by_AI_007 10 months ago
I appreciate the feedback and encouragement, as it is about the community for me. Pleased to know it was useful and helpful.
@bobsalita3417 a year ago
Jim, your last two videos really helped me to understand how to put models into production, especially local hosting. I haven't seen other content that dives into a line-by-line explanation of AI scripts. I appreciate the carefulness of your explanations. Most other AI creators, while knowledgeable about the usage of AI models, don't have your hands-on production experience. You really hit the sweet spot of what I need to know. I expect that you'll have good subscriber growth as your channel becomes more recommended. In particular, your examples using WSL, Docker, bash, and Hugging Face are exactly my intended environments. One area which isn't smooth for me is how to update production environments. What are the options for updating source code from a public GitHub? Pushing new releases to an active server? I use Streamlit as my front-end. I find Tailscale to be very useful on my dev systems, but I'm unsure about extending it to production/public servers.
@AI_by_AI_007 a year ago
I appreciate the encouragement and feedback, and I will look to do videos on the subjects recommended... thank you for taking the time to provide it.
@kyledinh8369 a year ago
Cool! Very helpful!
@harshsinha9709 10 months ago
Is this video complete from the start? Copying the command and running it didn't work for me.
@xinyuli6603 5 months ago
Can you explain a bit why the initial command cannot run without the Internet?
@leyocode8868 3 months ago
It loads the model directly from Hugging Face, so without Internet you are not going to be able to make it work!
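For reference, the usual single-command launch from the TGI README looks roughly like the sketch below; passing a Hub repo name as --model-id is what triggers the download at startup (flags as documented in the README, model name as used in the video):
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id TheBloke/Llama-2-7b-Chat-GPTQ --quantize gptq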
@theoeiferman2950 10 months ago
I get "Waiting for shard 0 to be ready..." endlessly, and the inference server doesn't get launched. Any help would be greatly appreciated!
@AI_by_AI_007 10 months ago
Sorry for the late response... If it hangs, it is often because the code cannot find the path to the model files... That is what it does if you don't do the two-part container build. Did you get it to load?
@theoeiferman2950 9 months ago
@AI_by_AI_007 No, I didn't understand the steps for proper configuration.
@m.kaschi2741 8 months ago
Might be that you need more RAM (not VRAM), because shards are first loaded into RAM. Try adding a large swapfile.
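A common recipe for adding a swapfile on Linux (the 32G size is an assumption; size it to comfortably exceed the model weights):
sudo fallocate -l 32G /swapfile   # reserve space for the swapfile
sudo chmod 600 /swapfile          # restrict access to root
sudo mkswap /swapfile             # format it as swap
sudo swapon /swapfile             # enable it immediately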