Comments
@fantasyapart787 8 hours ago
I could see that this video was uploaded 3 years ago. Is it still valid? I mean, are the features and navigation the same?
@tuurblaffe 1 day ago
In what way do you pay that $1? I am starting to see groups of talent, or upcoming talent, who understand that infrastructure costs money, but who are usually still learning, either picking up new technologies for a new job or after having lost one. It is one thing to have enough resources to learn at your own pace, but if you have to pay for cloud compute that requires a credit card, some of these people cannot even get access to one. I notice increasing numbers of people unable to learn or study new technologies because it is $1 here and a couple of API requests there, and in the end it is either that or being able to feed yourself every day of the week.
@SO-vq7qd 2 days ago
Is there a way to connect this to a custom domain? I want to create a simple chat-interface web app.
@SO-vq7qd 2 days ago
Thank you!
@jiegong529 2 days ago
Thanks so much for the crystal-clear explanations! You understand these topics so well, and it's even more amazing how you present them in bullet points and graphs so your audience understands them too!
@josephazhikannickel4188 6 days ago
Hi Julien, thank you for the valuable contribution. Your videos are impressive, I love them. May I ask one question: does deploying Llama 3 on SageMaker meet HIPAA compliance? The data is highly confidential. Hope you can help with an answer. Thank you!
@user-ff2tf3cw9j 6 days ago
Can I use these Transformers models for the Persian language?
@billykotsos4642 8 days ago
Very informative, as always!
@user-vo5ce6kn5t 11 days ago
Hi, I have a question. I have deployed a fine-tuned Llama 3 on AWS, but it generates repeated answers, and if I adjust the payload, it cuts off the end of the response. Please suggest a solution; I have changed the max length and max token size multiple times and still face this issue.
@nb9t7 13 days ago
Hey Julien, where can we find the training video for the food dataset? Also, I am trying to deploy a model on Hugging Face Inference, but it errors out saying I need a config.json file. I'm not sure how to create it. Any leads would be really helpful. Thanks!
@Willtry-l6g 14 days ago
Thank you for your videos 😊 If we have 5 pictures each of two different subjects (say person A1 and person A2), can Stable Diffusion generate these three situations with one model? 1. A1 in different backgrounds; 2. A2 in different backgrounds; 3. A1 and A2 in the same image with different backgrounds. Or do I need to create 3 models for that task? How is this type of task done?
@billykotsos4642 18 days ago
There are very serious academics who swear that LLMs reason.
@billykotsos4642 18 days ago
I'm still on the fence.
@leonardoschenkel9168 18 days ago
Hi Julien! Do you have any tips on how I can convert a ComfyUI workflow based on an SD 1.5 model to 🤗, or run it directly on Inf2?
@Martyniqo 18 days ago
Thanks a lot!
@juliensimonfr 18 days ago
You're welcome!
@ayambavictor1449 19 days ago
Thank you for this. I would like to know how I can query this endpoint from a web service, or if there is a guide you can point me to.
@juliensimonfr 18 days ago
Hi, the endpoint is a web service. You can invoke it either with the SageMaker SDK predict() API or with any HTTP-based library. Each model in JumpStart has a sample notebook; start from there.
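For reference, a minimal sketch of the HTTP route with boto3 (the endpoint name and region are hypothetical placeholders):

```python
import json
import boto3

# The SageMaker runtime client handles request signing via your AWS credentials.
client = boto3.client("sagemaker-runtime", region_name="us-east-1")

response = client.invoke_endpoint(
    EndpointName="my-llama3-endpoint",   # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps({"inputs": "What is model merging?"}),
)
print(json.loads(response["Body"].read()))
```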
@rbrowne4255 23 days ago
Thanks for the overview!! Excellent!!! In terms of sizing for inference, is there a way to calculate the KV cache, or maybe the overall HBM memory usage, based on these optimizations?
@juliensimonfr 23 days ago
You're welcome. The model papers have some numbers, but I think the best resource is huggingface.co/spaces/optimum/llm-perf-leaderboard. You can see how much RAM is required to load a particular model, and of course compare same-size models based on different attention layers :)
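As a back-of-the-envelope complement to the leaderboard, KV cache size can be estimated as 2 (keys and values) × layers × KV heads × head dimension × sequence length × batch size × bytes per element. A minimal sketch, with Llama-3-8B-style values assumed for illustration:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    # Factor of 2 accounts for storing both keys and values per layer.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes

# Llama-3-8B-style config: 32 layers, 8 KV heads (grouped-query attention),
# head_dim 128, fp16/bf16 elements -> ~1.07 GB for an 8K-token sequence.
print(kv_cache_bytes(32, 8, 128, 8192, 1) / 1e9, "GB")
```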
@jonschlinkert 25 days ago
Thank you. Agreed, there is zero reasoning.
@juliensimonfr 25 days ago
Thanks for watching!
@Azazello1482 27 days ago
Seems like a great video, but I can't move from the starting line. You seem to be skipping over very important details about how to deal with the Hugging Face tokens, AWS security keys, regional compatibility settings with SageMaker, etc. For example, when running the copied SageMaker code, I get "ValueError: Must setup local AWS configuration with a region supported by SageMaker", but no region I try seems to work. Did you cut all the authentication code from your demo? Obviously you don't want to disclose security keys, but at least show/explain that part of the setup code and simply redact the sensitive information.
@juliensimonfr 27 days ago
How about going through Hugging Face 101 and SageMaker 101 first?
@Azazello1482 26 days ago
@juliensimonfr Yes, clearly I'll need to do this! Nonetheless, as an educator myself, I think my point is still useful. It's helpful to learners to mention the parts you skip over. You don't have to teach them in this video, but it would be helpful to mention that there are steps one must perform that are not shown here.
@SpiritualItachi 28 days ago
Thanks a lot for this!
@juliensimonfr 28 days ago
You're very welcome!
@AmusedAtom-hh4pt 1 month ago
Hi Julien, is it possible to fine-tune this model using my own data consisting of videos?
@juliensimonfr 28 days ago
Yes, here's a blog post: huggingface.co/blog/bridgetower
@HieuNguyen-qs6ig 1 month ago
Thanks Julien, your videos are really helpful.
@juliensimonfr 1 month ago
Glad you like them!
@Mechnipulation 1 month ago
OK, but how do I use any model on Hugging Face that I want? Who wants to deploy a model that doesn't have any value proposition over GPT-4 or Claude (e.g. being uncensored)?
@juliensimonfr 1 month ago
Not sure what the second statement means, but you can deploy pretty much any Hugging Face model on SageMaker. Go to the model page, click on "Deploy", select "SageMaker", copy-paste the deployment code snippet and run it in your AWS account.
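The generated snippet typically looks something like the sketch below (the model id, instance type, and role setup are example values; the actual snippet on each model page is the authoritative version):

```python
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()  # or an explicit IAM role ARN

model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface"),      # TGI container
    env={"HF_MODEL_ID": "meta-llama/Meta-Llama-3-8B-Instruct"},  # example model
    role=role,
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")
print(predictor.predict({"inputs": "Hello!"}))
```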
@OlabodeAdedoyin 1 month ago
This is AWESOME! I've been pulling my hair out with AWS custom chips over the last few days. Thank you!!!
@juliensimonfr 1 month ago
Glad I could help!
@fadauto 1 month ago
Thanks for the great video! The costs per M tokens seem really high in the Neuron Inf2 benchmark table. For Llama-3-70B they are around $115 per M tokens. Even with 3-year reserved instances, the prices would still be way above the GPT-4 token price, with similar throughput. Are vLLM or TensorRT-LLM on GPUs better options than Inferentia2? Or can vLLM be combined with this?
@juliensimonfr 1 month ago
Thanks! Yes, your numbers are correct (ref: awsdocs-neuron.readthedocs-hosted.com/en/latest/general/benchmarks/inf2/inf2-performance.html), and no, they're not great. Having said that, I meet a lot of enterprise customers and none so far have mentioned 70B models. For conversational apps, the huge majority rely on 7-8B models with RAG, and very often fine-tuning. 1M tokens for Llama 3 8B costs $5 or $6 on inf2.48xlarge (on-demand price), which is hard to beat. With new small models like Phi-3, cost should be even lower :)
@juliensimonfr 1 month ago
A good alternative for larger models (70-100B) is our TGI inference server, combined with your cloud of choice. See huggingface.co/blog/tgi-benchmarking
@fadauto 17 days ago
@juliensimonfr Thanks Julien! That cost looks great, but apparently it only applies to the throughput-optimized configurations; I see that the cost for latency-optimized configurations is above $22 per 1M tokens for Llama-3-8B. So using Inf2 instances wouldn't make much sense for real-time applications with that setting, right? Unless combined with other techniques like continuous batching, which would make the costs look more like the throughput-optimized ones?
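The cost figures in this thread follow from simple arithmetic: instance price per hour divided by tokens generated per hour. A minimal sketch (the throughput number is hypothetical; take real values from the Neuron benchmark tables and current AWS pricing):

```python
def cost_per_m_tokens(instance_usd_per_hour, tokens_per_second):
    # Dollars per one million generated tokens.
    return instance_usd_per_hour / (tokens_per_second * 3600) * 1_000_000

# inf2.48xlarge on-demand (~$12.98/h in us-east-1) at an assumed 600 tokens/s
# aggregate throughput gives roughly $6 per million tokens.
print(round(cost_per_m_tokens(12.98, 600), 2))
```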
@suhasshirbavikar9496 1 month ago
Great video!
@juliensimonfr 1 month ago
Thanks!
@spencerfunk6697 1 month ago
Thank you Julien, I appreciate this so much right now.
@juliensimonfr 1 month ago
Thanks!
@bibiworm 1 month ago
Would you mind sharing the slides, please? Thank you!
@bibiworm 1 month ago
I have been wanting to understand quantization for a very long time. Thank you! Would you mind sharing the slides, please?
@user-kd2st5vc5t 1 month ago
You explained it clearly in no time, impressive! Love from China.
@guyguy12385 1 month ago
Do you need a description to fine-tune it? Is it possible to just have an image dataset without descriptions, so it can learn to produce those images at random?
@juliensimonfr 1 month ago
An SD model needs an input to start generating :) which is why labels are required.
@toufiqmusah6480 1 month ago
Thanks for the tutorial. Greatly appreciated.
@juliensimonfr 1 month ago
You are welcome!
@itayatelis2898 1 month ago
Amazing! Julien, do you have any plans to cover distributed parallel algorithms (data parallelism, inter-layer, intra-layer, etc.)?
@juliensimonfr 1 month ago
Thank you. I covered some techniques in kzbin.info/www/bejne/sIu4Zn2gd6xknKs. Anything else you'd be interested in?
@itayatelis2898 25 days ago
@juliensimonfr Yes, maybe touch on the concept of superposition?
@itayatelis2898 1 month ago
Love your content! Thank you!
@juliensimonfr 1 month ago
Glad you enjoy it!
@maxhenry8886 1 month ago
Is it possible to fine-tune a model such that I feed it images of OpenMaps locations and it then randomly generates its own locations? In this case, would I even need to label the data? Is it possible to train the model on unlabeled data? I just want it to take a set of X map locations and then start inventing its own. Is this possible? Thanks!
@juliensimonfr 1 month ago
Probably, and yes, you would need to label the images. Another idea would be to use an off-the-shelf image-to-image model, e.g. huggingface.co/spaces/tonyassi/image-to-image-SDXL. Maybe it can generate variants of existing images.
@maxhenry8886 1 month ago
@juliensimonfr Thanks! That might work. My plan was to show it half the image (the data) and get it to generate something close to the other half (the target).
@juliensimonfr 1 month ago
I see. Another option would be variations on an input image: huggingface.co/spaces/lambdalabs/stable-diffusion-image-variations
@maxhenry8886 1 month ago
@juliensimonfr Thanks! I tried that one and it didn't work. It might be harder than it sounds! Maybe I need to find a way to fine-tune a model first and then get it to create its own variations somehow? But most of the algorithms work by using the label as the target, right? Is there a model I can fine-tune that generates purely from one half of the image as the training data, with the other half as the target?
@soniatabti6706 1 month ago
Thanks for this, loved the video! Love what they do at Arcee! You really got me at the 20th minute: maybe AutoML works after all 😂 100%
@juliensimonfr 1 month ago
Glad you enjoyed it!
@itayatelis2898 2 days ago
I enjoyed it very much! Thanks Julien
@abse-mj8pw 1 month ago
I can't help wondering if there is an experiment that fully explores these techniques, like applying them to all kinds of models or combining different methods together?
@juliensimonfr 1 month ago
Check out arcee.ai, their platform is definitely going that way.
@abse-mj8pw 1 month ago
@juliensimonfr Thanks for your answer!! I've found some interesting blog posts about it!
@fatihbicer7353 1 month ago
Thank you Julien.
@juliensimonfr 1 month ago
You're welcome!
@barber5937 1 month ago
Can I deploy my own Hugging Face model to Vertex AI? My model says it is Endpoints-compatible, but I don't see anything indicating Vertex AI compatibility or how to achieve it.
@juliensimonfr 1 month ago
Yes, assuming your model is based on a supported architecture (Llama, etc.). If you create a model card with the appropriate tags (architecture, task type, etc.), the "Deploy" button will let you deploy to AWS, Google, etc. See huggingface.co/docs/hub/model-cards for more, as well as the model cards of well-known models (Google, Meta, etc.).
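For illustration, one way to set such tags programmatically is the huggingface_hub metadata helper; a hedged sketch (the repo id and tag values are hypothetical, and the tags your model actually needs depend on its architecture and task):

```python
from huggingface_hub import metadata_update

# Updates the YAML metadata block of the model card on the Hub.
metadata_update(
    "your-username/your-model",             # hypothetical repo id
    {
        "pipeline_tag": "text-generation",  # task type
        "library_name": "transformers",     # serving library
    },
)
```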
@barber5937 1 month ago
@juliensimonfr Awesome, thanks Julien
@Gerald-iz7mv 1 month ago
How do you export to ONNX using CUDA? It seems Optimum doesn't support it. Is there an alternative?
@juliensimonfr 1 month ago
huggingface.co/docs/optimum/onnxruntime/usage_guides/gpu
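In the spirit of that guide, a minimal sketch of export plus GPU inference with Optimum and ONNX Runtime (requires the onnxruntime-gpu package; the model id is an example):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # example model

# export=True converts the checkpoint to ONNX on the fly;
# the CUDA execution provider runs it on GPU.
model = ORTModelForSequenceClassification.from_pretrained(
    model_id, export=True, provider="CUDAExecutionProvider"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("I love this!", return_tensors="pt").to("cuda")
print(model(**inputs).logits)
```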
@satsanthony4452 1 month ago
Thanks Julien for the two demos. Good options for developers.
@juliensimonfr 1 month ago
Agreed :)
@Gerald-xg3rq 1 month ago
What's the difference between setfit.exporters.onnx and optimum.onnxruntime (optimizer = ORTModelForFeatureExtraction.from_pretrained(...); optimizer.optimize()), etc.?
@juliensimonfr 1 month ago
Probably the same :)
@spicule123 1 month ago
This is fantastic!
@juliensimonfr 1 month ago
Yes, I like it too :)
@ted2101977854 1 month ago
Amazing video!
@juliensimonfr 1 month ago
Thanks!
@briangman3 1 month ago
I am going to use Inf2 to run a fine-tuned Llama 3 70B, which should be great. I am curious about token generation speed on the different Inf2 sizes; if you can, please mention it as a side note in your next video, e.g. "this generated at x tokens/s".
@juliensimonfr 1 month ago
You'll find benchmarks in the Neuron SDK documentation: awsdocs-neuron.readthedocs-hosted.com/en/latest/general/benchmarks/index.html
@briangman3 1 month ago
Great video!
@juliensimonfr 1 month ago
Glad you enjoyed it!
@alvinvaughn6531 1 month ago
The interface and everything has changed since this video. Can you provide an updated video that walks through the process of loading a model from Hugging Face into SageMaker JumpStart?
@juliensimonfr 1 month ago
Hi, you don't load models from Hugging Face; the models are already in AWS. The UI has evolved, but the workflow is still the same: open JumpStart, select a model, click on Deploy, open the sample notebook.
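The same workflow is scriptable with the SageMaker Python SDK; a minimal sketch (the model id is an example; look up the exact id in the JumpStart UI or docs):

```python
from sagemaker.jumpstart.model import JumpStartModel

# Deploy a JumpStart-hosted model; Llama models require accepting the EULA.
model = JumpStartModel(model_id="meta-textgeneration-llama-3-8b")
predictor = model.deploy(accept_eula=True)
print(predictor.predict({"inputs": "Hello!"}))
```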
@pika9985 2 months ago
Does that work with 3rd-gen or 4th-gen CPUs?
@juliensimonfr 1 month ago
Previous generations don't have AMX, so don't expect the same acceleration.
@SantiagoBima 2 months ago
Hi Julien, thanks a lot for your videos. I am pretty new to this world, and my task is to understand how to deploy serverless using SageMaker. Will you do a new video about serverless on SageMaker? Is there something new that we can use? In the video you mention we needed to use boto3; is this still the case? Thanks a lot!
@juliensimonfr 1 month ago
Nothing new AFAIK. Still no GPU support.
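For reference, the boto3 route still looks like the sketch below (the model, config, and endpoint names are hypothetical placeholders; the SageMaker model itself must already exist):

```python
import boto3

sm = boto3.client("sagemaker")

# Serverless endpoints are CPU-only, sized by memory and max concurrency.
sm.create_endpoint_config(
    EndpointConfigName="my-serverless-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-model",  # hypothetical, pre-created SageMaker model
        "ServerlessConfig": {"MemorySizeInMB": 4096, "MaxConcurrency": 5},
    }],
)
sm.create_endpoint(
    EndpointName="my-serverless-endpoint",
    EndpointConfigName="my-serverless-config",
)
```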
@SantiagoBima 1 month ago
@juliensimonfr Thanks!!
@user-qy8wf8rx4q 2 months ago
Thanks Julien. I don't like this service, I'm struggling with it myself.
@rileyheiman1161 2 months ago
Great video Julien, thank you! Does the model have to be pre-compiled to run on AWS (EC2 or SageMaker)?
@juliensimonfr 2 months ago
Thank you. If you're going to deploy on SageMaker, yes; at the moment, our container won't compile the model for you. On EC2, the model will be compiled on the fly if needed.
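For the EC2 path, a hedged sketch of on-the-fly Neuron compilation with optimum-neuron (the model id, shapes, and core count are example values; compilation can take a while):

```python
from optimum.neuron import NeuronModelForCausalLM

model = NeuronModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # example model id
    export=True,                   # compile for Neuron if no compiled artifact exists
    batch_size=1,
    sequence_length=2048,
    num_cores=2,
    auto_cast_type="bf16",
)
```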