Comments
@fantasyapart787 8 hours ago
I could see that this video was uploaded 3 years ago. Is it still valid? I mean, are the features and navigation the same?
@tuurblaffe 1 day ago
In what way do you pay that $1? I am starting to see groups of talent, or upcoming talent, who understand that infrastructure costs money, but who are usually still learning, either picking up new technologies for a new job or after having lost one. It is one thing to have enough resources to learn at your own pace, but if you have to pay for cloud compute that requires a credit card, some of these people cannot even get access to one. I notice increasing numbers of people unable to learn or study new technologies because it is $1 here and a couple of API requests there, and in the end it is either that or being able to feed yourself every day of the week.
@SO-vq7qd 2 days ago
Is there a way to connect this to a custom domain? I want to create a simple chat-interface web app.
@SO-vq7qd 2 days ago
Thank you!
@jiegong529 2 days ago
Thanks so much for the crystal-clear explanations! You understand these topics so well, and it's even more amazing how you present them in bullet points and graphs so your audience understands them too!
@josephazhikannickel4188 6 days ago
Hi Julien, thank you for the valuable contribution. Your videos are impressive, I love them. May I ask one question: does deploying Llama 3 on SageMaker meet HIPAA compliance? The data is highly confidential. Hope you can help with an answer. Thank you!
@user-ff2tf3cw9j 6 days ago
Can I use these Transformers models for the Persian language?
@billykotsos4642 8 days ago
Very informative, as always!
@user-vo5ce6kn5t 11 days ago
Hi, I have a question. I have deployed a fine-tuned Llama 3 on AWS, but it generates repeated answers, and if I adjust the payload, it cuts off the end of the response. Please suggest a solution; I have changed the max length and max token size multiple times and still face this issue.
@nb9t7 13 days ago
Hey Julien, where can we find the training video for the food dataset? Also, I am trying to deploy a model on Hugging Face Inference, but it errors out saying I need a config.json file. I'm not sure how to create it. Any leads would be really helpful. Thanks!
@Willtry-l6g 14 days ago
Thank you for your videos 😊 If we have 5 pictures each of two different subjects (say person A1 and person A2), can Stable Diffusion generate these three situations with one model? 1. A1 in different backgrounds; 2. A2 in different backgrounds; 3. A1 and A2 in the same image with different backgrounds. Or do I need to create 3 models for that task? How is this type of task done?
@billykotsos4642 18 days ago
There are very serious academics who swear that LLMs reason.
@billykotsos4642 18 days ago
I'm still on the fence.
@leonardoschenkel9168 18 days ago
Hi Julien! Do you have any tips on how I can convert a ComfyUI workflow based on an SD 1.5 model to 🤗, or run it directly on Inf2?
@Martyniqo 18 days ago
Thanks a lot!
@juliensimonfr 18 days ago
You're welcome!
@ayambavictor1449 19 days ago
Thank you for this. I would like to know how I can query this endpoint from a web service, or if there is a guide you can point me to.
@juliensimonfr 18 days ago
Hi, the endpoint is a web service. You can invoke it either with the SageMaker SDK predict() API or with any HTTP-based library. Each model in JumpStart has a sample notebook; start from there.
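For reference, a minimal sketch of the HTTP route with boto3 (the endpoint name and region are hypothetical placeholders):

```python
import json
import boto3

# The SageMaker runtime client handles request signing via your AWS credentials.
client = boto3.client("sagemaker-runtime", region_name="us-east-1")

response = client.invoke_endpoint(
    EndpointName="my-llama3-endpoint",   # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps({"inputs": "What is model merging?"}),
)
print(json.loads(response["Body"].read()))
```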
@rbrowne4255 23 days ago
Thanks for the overview!! Excellent!!! In terms of sizing for inference, is there a way to calculate the KV cache, or maybe the overall HBM memory usage, based on these optimizations?
@juliensimonfr 23 days ago
You're welcome. The model papers have some numbers, but I think the best resource is huggingface.co/spaces/optimum/llm-perf-leaderboard. You can see how much RAM is required to load a particular model, and of course compare same-size models based on different attention layers :)
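As a back-of-the-envelope complement to the leaderboard, KV cache size can be estimated as 2 (keys and values) × layers × KV heads × head dimension × sequence length × batch size × bytes per element. A minimal sketch, with Llama-3-8B-style values assumed for illustration:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    # Factor of 2 accounts for storing both keys and values per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes

# Llama-3-8B-style config: 32 layers, 8 KV heads (grouped-query attention),
# head_dim 128, fp16/bf16 elements -> ~1.07 GB for an 8K-token sequence.
print(kv_cache_bytes(32, 8, 128, 8192, 1) / 1e9, "GB")
```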
@jonschlinkert 25 days ago
Thank you. Agreed, there is zero reasoning.
@juliensimonfr 25 days ago
Thanks for watching!
@Azazello1482 27 days ago
Seems like a great video, but I can't move from the starting line. You seem to be skipping over very important details about how to deal with the Hugging Face tokens, AWS security keys, regional compatibility settings with SageMaker, etc. For example, when running the copied SageMaker code, I get "ValueError: Must setup local AWS configuration with a region supported by SageMaker", but no region I try seems to work. Did you cut all the authentication code from your demo? Obviously you don't want to disclose security keys, but at least show/explain that part of the setup code and simply redact the sensitive information.
@juliensimonfr 27 days ago
How about going through Hugging Face 101 and SageMaker 101 first?
@Azazello1482 26 days ago
@juliensimonfr Yes, clearly I'll need to do this! Nonetheless, as an educator myself, I think my point is still useful. It's helpful to learners to mention the parts you skip over. You don't have to teach them in this video, but it would be helpful to mention that there are steps one must perform that are not shown here.
@SpiritualItachi 28 days ago
Thanks a lot for this!
@juliensimonfr 28 days ago
You're very welcome!
@AmusedAtom-hh4pt 1 month ago
Hi Julien, is it possible to fine-tune this model using my own data consisting of videos?
@juliensimonfr 28 days ago
Yes, here's a blog post: huggingface.co/blog/bridgetower
@HieuNguyen-qs6ig 1 month ago
Thanks Julien, your videos are really helpful.
@juliensimonfr 1 month ago
Glad you like them!
@Mechnipulation 1 month ago
OK, but how do I use any model on Hugging Face that I want? Who wants to deploy a model that doesn't have any value proposition over GPT-4 or Claude (e.g. being uncensored)?
@juliensimonfr 1 month ago
Not sure what the second statement means, but you can deploy pretty much any Hugging Face model on SageMaker. Go to the model page, click on "Deploy", select "SageMaker", copy-paste the deployment code snippet and run it in your AWS account.
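The generated snippet typically looks something like the sketch below (the model id, instance type, and role setup are example values; the actual snippet on each model page is the authoritative version):

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()  # or an explicit IAM role ARN

model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface"),      # TGI container
    env={"HF_MODEL_ID": "meta-llama/Meta-Llama-3-8B-Instruct"},  # example model
    role=role,
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")
print(predictor.predict({"inputs": "Hello!"}))
```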
@OlabodeAdedoyin 1 month ago
This is AWESOME! I've been pulling my hair out with AWS custom chips over the last few days. Thank you!!!
@juliensimonfr 1 month ago
Glad I could help!
@fadauto 1 month ago
Thanks for the great video! The costs per M tokens seem really high in the Neuron Inf2 benchmark table. For Llama-3-70B they are around $115 per M tokens. Even with 3-year reserved instances, the prices would still be way above the GPT-4 token price, with similar throughput. Are vLLM or TensorRT-LLM on GPUs better options than Inferentia2? Or can vLLM be combined with this?
@juliensimonfr 1 month ago
Thanks! Yes, your numbers are correct (ref: awsdocs-neuron.readthedocs-hosted.com/en/latest/general/benchmarks/inf2/inf2-performance.html), and no, they're not great. Having said that, I meet a lot of enterprise customers and none so far have mentioned 70B models. For conversational apps, the huge majority rely on 7-8B models with RAG, and very often fine-tuning. 1M tokens for Llama 3 8B costs $5 or $6 on inf2.48xlarge (on-demand price), which is hard to beat. With new small models like Phi-3, cost should be even lower :)
@juliensimonfr 1 month ago
A good alternative for larger models (70-100B) is our TGI inference server, combined with your cloud of choice. See huggingface.co/blog/tgi-benchmarking
@fadauto 17 days ago
@juliensimonfr Thanks Julien! That cost looks great, but apparently it only applies to the throughput-optimized configurations; I see that the cost for latency-optimized configurations is above $22 per 1M tokens for Llama-3-8B. So using Inf2 instances wouldn't make much sense for real-time applications with that setting, right? Unless combined with other techniques like continuous batching, which would make the costs look more like the throughput-optimized ones?
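The cost figures in this thread follow from simple arithmetic: instance price per hour divided by tokens generated per hour. A minimal sketch (the throughput number is hypothetical; take real values from the Neuron benchmark tables and current AWS pricing):

```python
def cost_per_m_tokens(instance_usd_per_hour, tokens_per_second):
    # Dollars per one million generated tokens.
    return instance_usd_per_hour / (tokens_per_second * 3600) * 1_000_000

# inf2.48xlarge on-demand (~$12.98/h in us-east-1) at an assumed 600 tokens/s
# aggregate throughput gives roughly $6 per million tokens.
print(round(cost_per_m_tokens(12.98, 600), 2))
```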
@suhasshirbavikar9496 1 month ago
Great video!
@juliensimonfr 1 month ago
Thanks!
@spencerfunk6697 1 month ago
Thank you Julien, I appreciate this so much right now.
@juliensimonfr 1 month ago
Thanks!
@bibiworm 1 month ago
Would you mind sharing the slides, please? Thank you!
@bibiworm 1 month ago
I have been wanting to understand quantization for a very long time. Thank you! Would you mind sharing the slides, please?
@user-kd2st5vc5t 1 month ago
You explained it clearly in no time, impressive! Love from China.
@guyguy12385 1 month ago
Do you need a description to fine-tune it? Is it possible to just have an image dataset without descriptions, so it can learn to produce those images at random?
@juliensimonfr 1 month ago
An SD model needs an input to start generating :) which is why labels are required.
@toufiqmusah6480 1 month ago
Thanks for the tutorial. Greatly appreciated.
@juliensimonfr 1 month ago
You are welcome!
@itayatelis2898 1 month ago
Amazing! Julien, do you have any plans to cover distributed parallel algorithms (data parallelism, inter-layer, intra-layer, etc.)?
@juliensimonfr 1 month ago
Thank you. I covered some techniques in kzbin.info/www/bejne/sIu4Zn2gd6xknKs. Anything else you'd be interested in?
@itayatelis2898 25 days ago
@juliensimonfr Yes, maybe touch on the concept of superposition?
@itayatelis2898 1 month ago
Love your content! Thank you!
@juliensimonfr 1 month ago
Glad you enjoy it!
@maxhenry8886 1 month ago
Is it possible to fine-tune a model such that I feed it images of OpenMaps locations and it then randomly generates its own locations? In this case, would I even need to label the data? Is it possible to train the model on unlabeled data? I just want it to take a set of X map locations and then start inventing its own. Is this possible? Thanks!
@juliensimonfr 1 month ago
Probably, and yes, you would need to label the images. Another idea would be to use an off-the-shelf image-to-image model, e.g. huggingface.co/spaces/tonyassi/image-to-image-SDXL. Maybe it can generate variants of existing images.
@maxhenry8886 1 month ago
@juliensimonfr Thanks! That might work. My plan was to show it half the image (the data) and get it to generate something close to the other half (the target).
@juliensimonfr 1 month ago
I see. Another option would be variations on an input image: huggingface.co/spaces/lambdalabs/stable-diffusion-image-variations
@maxhenry8886 1 month ago
@juliensimonfr Thanks! I tried that one and it didn't work. It might be harder than it sounds! Maybe I need to find a way to fine-tune a model first and then get it to create its own variations somehow? But most of the algorithms work by using the label as the target, right? Is there a model I can fine-tune that generates purely from one half of the image as the training data, with the other half as the target?
@soniatabti6706 1 month ago
Thanks for this, loved the video! Love what they do at Arcee! You really got me at the 20th minute: maybe AutoML works after all 😂 100%
@juliensimonfr 1 month ago
Glad you enjoyed it!
@itayatelis2898 2 days ago
I enjoyed it very much! Thanks Julien
@abse-mj8pw 1 month ago
I can't help wondering if there is an experiment that fully explores these techniques, like applying them to all kinds of models or combining different methods together?
@juliensimonfr 1 month ago
Check out arcee.ai, their platform is definitely going that way.
@abse-mj8pw 1 month ago
@juliensimonfr Thanks for your answer!! I've found some interesting blog posts about it!
@fatihbicer7353 1 month ago
Thank you Julien.
@juliensimonfr 1 month ago
You're welcome!
@barber5937 1 month ago
Can I deploy my own Hugging Face model to Vertex AI? My model says it is Endpoints-compatible, but I don't see anything indicating Vertex AI compatibility or how to achieve it.
@juliensimonfr 1 month ago
Yes, assuming your model is based on a supported architecture (Llama, etc.). If you create a model card with the appropriate tags (architecture, task type, etc.), the "Deploy" button will let you deploy to AWS, Google, etc. See huggingface.co/docs/hub/model-cards for more, as well as the model cards of well-known models (Google, Meta, etc.).
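For illustration, one way to set such tags programmatically is the huggingface_hub metadata helper; a hedged sketch (the repo id and tag values are hypothetical, and the tags your model actually needs depend on its architecture and task):

```python
from huggingface_hub import metadata_update

# Updates the YAML metadata block of the model card on the Hub.
metadata_update(
    "your-username/your-model",             # hypothetical repo id
    {
        "pipeline_tag": "text-generation",  # task type
        "library_name": "transformers",     # serving library
    },
)
```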
@barber5937 1 month ago
@juliensimonfr Awesome, thanks Julien
@Gerald-iz7mv 1 month ago
How do you export to ONNX using CUDA? It seems Optimum doesn't support it. Is there an alternative?
@juliensimonfr 1 month ago
huggingface.co/docs/optimum/onnxruntime/usage_guides/gpu
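In the spirit of that guide, a minimal sketch of export plus GPU inference with Optimum and ONNX Runtime (requires the onnxruntime-gpu package; the model id is an example):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example model

# export=True converts the checkpoint to ONNX on the fly;
# the CUDA execution provider runs it on GPU.
model = ORTModelForSequenceClassification.from_pretrained(
    model_id, export=True, provider="CUDAExecutionProvider"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("I love this!", return_tensors="pt").to("cuda")
print(model(**inputs).logits)
```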
@satsanthony4452 1 month ago
Thanks Julien for the two demos. Good options for developers.
@juliensimonfr 1 month ago
Agreed :)
@Gerald-xg3rq 1 month ago
What's the difference between setfit.exporters.onnx and optimum.onnxruntime (optimizer = ORTModelForFeatureExtraction.from_pretrained(...); optimizer.optimize()), etc.?
@juliensimonfr 1 month ago
Probably the same :)
@spicule123 1 month ago
This is fantastic!
@juliensimonfr 1 month ago
Yes, I like it too :)
@ted2101977854 1 month ago
Amazing video!
@juliensimonfr 1 month ago
Thanks!
@briangman3 1 month ago
I am going to use Inf2 to run a fine-tuned Llama 3 70B, which should be great. I am curious about token generation speed on the different Inf2 sizes; if you can, please mention it as a side note in your next video, e.g. "this generated at x tokens/s".
@juliensimonfr 1 month ago
You'll find benchmarks in the Neuron SDK documentation: awsdocs-neuron.readthedocs-hosted.com/en/latest/general/benchmarks/index.html
@briangman3 1 month ago
Great video!
@juliensimonfr 1 month ago
Glad you enjoyed it!
@alvinvaughn6531 1 month ago
The interface and everything has changed since this video. Can you provide an updated video that walks through the process of loading a model from Hugging Face into SageMaker JumpStart?
@juliensimonfr 1 month ago
Hi, you don't load models from Hugging Face; the models are already in AWS. The UI has evolved, but the workflow is still the same: open JumpStart, select a model, click on Deploy, open the sample notebook.
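The same workflow is scriptable with the SageMaker Python SDK; a minimal sketch (the model id is an example; look up the exact id in the JumpStart UI or docs):

```python
from sagemaker.jumpstart.model import JumpStartModel

# Deploy a JumpStart-hosted model; Llama models require accepting the EULA.
model = JumpStartModel(model_id="meta-textgeneration-llama-3-8b")
predictor = model.deploy(accept_eula=True)
print(predictor.predict({"inputs": "Hello!"}))
```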
@pika9985 2 months ago
Does that work with 3rd-gen or 4th-gen CPUs?
@juliensimonfr 1 month ago
Previous generations don't have AMX, so don't expect the same acceleration.
@SantiagoBima 2 months ago
Hi Julien, thanks a lot for your videos. I am pretty new to this world, and my task is to understand how to deploy serverless using SageMaker. Will you do a new video about serverless on SageMaker? Is there something new that we can use? In the video you mention we needed to use boto3; is this still the case? Thanks a lot!
@juliensimonfr 1 month ago
Nothing new AFAIK. Still no GPU support.
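For reference, the boto3 route still looks like the sketch below (the model, config, and endpoint names are hypothetical placeholders; the SageMaker model itself must already exist):

```python
import boto3

sm = boto3.client("sagemaker")

# Serverless endpoints are CPU-only, sized by memory and max concurrency.
sm.create_endpoint_config(
    EndpointConfigName="my-serverless-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-model",  # hypothetical, pre-created SageMaker model
        "ServerlessConfig": {"MemorySizeInMB": 4096, "MaxConcurrency": 5},
    }],
)
sm.create_endpoint(
    EndpointName="my-serverless-endpoint",
    EndpointConfigName="my-serverless-config",
)
```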
@SantiagoBima 1 month ago
@juliensimonfr Thanks!!
@user-qy8wf8rx4q 2 months ago
Thanks Julien. I don't like this service, I'm struggling with it myself.
@rileyheiman1161 2 months ago
Great video Julien, thank you! Does the model have to be pre-compiled to run on AWS (EC2 or SageMaker)?
@juliensimonfr 2 months ago
Thank you. If you're going to deploy on SageMaker, yes; at the moment, our container won't compile the model for you. On EC2, the model will be compiled on the fly if needed.
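For the EC2 path, a hedged sketch of on-the-fly Neuron compilation with optimum-neuron (the model id, shapes, and core count are example values; compilation can take a while):

```python
from optimum.neuron import NeuronModelForCausalLM

model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # example model id
    export=True,                   # compile for Neuron if no compiled artifact exists
    batch_size=1,
    sequence_length=2048,
    num_cores=2,
    auto_cast_type="bf16",
)
```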