What if I have a multi-tenant SaaS with similar models per tenant (for example, a model that performs client segmentation for each tenant) and I want to do Continuous Training and Deployment for these models? How can this be achieved, since at 16:47 you state that this can't be done with SageMaker Studio?
@SambitTripathy 3 years ago
Great content Julien!
@juliensimonfr 3 years ago
Thank you, glad you like it!
@fatihbicer7353 8 months ago
Thank you Julien.
@juliensimonfr 8 months ago
You're welcome!
@joaosalero9797 2 years ago
Thanks for sharing it!
@juliensimonfr 2 years ago
My pleasure!
@gaboceron100 1 year ago
Thank you! Very instructive.
@juliensimonfr 1 year ago
Glad you enjoyed it!
@herleyshaori 1 year ago
Thank you for the video.
@juliensimonfr 1 year ago
You're welcome!
@poojankothari2440 3 years ago
Julien, thank you for providing good content. It would be very helpful if you could provide some insights on model registration and on linking the project with custom git repos. Kudos!
@juliensimonfr 3 years ago
Thanks Poojan. You can build your own custom templates with your own repos. See docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-templates.html
@abhijeetkabra8525 1 year ago
Hello Julien, great video. I followed your steps to create a project, pipelines, and an endpoint. Could you please answer some questions? 1. We already have a trained model that we want to use in SageMaker Pipelines and then deploy to create endpoints; how do we do that? 2. Are there IAM roles and policies involved in working with SageMaker Pipelines? 3. We have a notebook containing the training code, but when a new user or team member comes in, they aren't able to see the whole code and have to download it offline and upload it back to the notebook. Is there a way to collaborate like we do with a Git or Azure DevOps repo?
@Pr06lemChiId 3 years ago
Interesting to see how TFX will integrate with this.
@Ramyavenkat-r3y 1 year ago
How do I retrieve the built-in SageMaker image URI? Could you help me with the command?
@AnkitSingh-rv2dq 2 years ago
Hi Julien, I got an error in the preprocessing script. Can you please confirm that the script is correct?
@denzilstudios7072 2 years ago
Thank you Julien, I also love your Packt book a lot. Question: for our startup, we want to set up this SageMaker pipeline for dev and prod in separate accounts. Where can I find guidelines on how to set this up?
@juliensimonfr 2 years ago
Thanks Denzil! Here's a nice multi-account example: aws.amazon.com/blogs/machine-learning/multi-account-model-deployment-with-amazon-sagemaker-pipelines/
@carbita1 4 years ago
Hi Julien, as a data engineer, it's difficult to test PySpark workflows without a Jupyter notebook. Is there any way to "replace" the usual AWS Glue workflows by calling Jupyter notebooks? Thanks in advance.
@priteshjain0310 4 years ago
Hi Julien, it's not clear how I should do inference on this. I have a custom processing container, and then I train a TF model. Is it possible to club these two together for inference? I want to give the S3 location of raw data at inference time, have it go through processing, and then predict on it. Can you please let me know if this is possible and how to go about it?
@vinayakdhruv6457 2 years ago
How can I trigger this complete pipeline using Lambdas or cron jobs? Is there any such option?
Hi Julien, really appreciate the explanation. Could you do a video, or point to some demo, showing how to use SageMaker Pipelines for scheduled batch jobs? Say I have a 10 GB dataset loaded into S3 every day; how can I schedule a pipeline to transform and run inference on it?
@juliensimonfr 3 years ago
You can easily run batch transform in your pipelines, see docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-transform. You can also schedule executions with a Lambda function firing up your pipeline, see docs.aws.amazon.com/AmazonCloudWatch/latest/events/RunLambdaSchedule.html
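For readers who want to try the Lambda-based trigger, here is a minimal sketch using the boto3 `start_pipeline_execution` API. The pipeline name, parameter names, and S3 path are placeholders, not values from the video:

```python
def build_pipeline_params(params):
    """Convert a plain dict into the PipelineParameters list format
    expected by start_pipeline_execution."""
    return [{"Name": k, "Value": str(v)} for k, v in params.items()]

def lambda_handler(event, context):
    # boto3 is imported here so the helper above can be exercised
    # without the AWS SDK or credentials available.
    import boto3

    sm = boto3.client("sagemaker")
    # 'MyPipeline' and 'InputDataUrl' are hypothetical; use the names
    # defined in your own pipeline.
    response = sm.start_pipeline_execution(
        PipelineName="MyPipeline",
        PipelineParameters=build_pipeline_params(
            {"InputDataUrl": event.get("input_s3_uri", "s3://my-bucket/data/")}
        ),
    )
    return response["PipelineExecutionArn"]
```

Wiring an EventBridge/CloudWatch Events schedule rule to this Lambda gives you the cron-style trigger asked about above.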
@samuelmathias794 2 years ago
Hello Julien, I would like to ask a question. I'm a bit new to SageMaker and its functionality. How would one go about creating their own project template, assuming I want to start a new project? Or do I modify the existing abalone template to suit my taste?
@juliensimonfr 2 years ago
Here's an example: aws.amazon.com/fr/blogs/machine-learning/build-mlops-workflows-with-amazon-sagemaker-projects-gitlab-and-gitlab-pipelines/
@samuelmathias794 2 years ago
@@juliensimonfr Thank you
@juliocardenas4485 3 years ago
My Studio layout is quite different; the launcher only shows notebooks, and I don't get the add-on with the triangle on the left.
@juliensimonfr 3 years ago
The Studio UI frequently changes, sometimes for the better ;)
@juliocardenas4485 3 years ago
@@juliensimonfr Thank you. I figured out what I was doing wrong 😑. I needed to launch the app rather than just the notebooks.
@dampeel2000 3 years ago
Bonjour Julien, thanks for the video! I would be interested in the final step, where you actually process an inference on the endpoint. I don't see this in the demo. In particular, I'm curious to know how you can propagate the fitted preprocessing "model" (for instance the one-hot encoder) to the model hosted on the endpoint. Thank you very much for any information on this step! Have a great day.
@juliensimonfr 3 years ago
Hi Damien, regarding preprocessing, you would have to apply it to the data sent to the endpoint. A clever way to do this is to use an Inference Pipeline, i.e. a sequence of models invoked as a single unit. Here's an example: github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/scikit_learn_inference_pipeline/Inference%20Pipeline%20with%20Scikit-learn%20and%20Linear%20Learner.ipynb
@dampeel2000 3 years ago
@@juliensimonfr Thanks Julien!
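The Inference Pipeline approach mentioned above can be sketched with the SageMaker Python SDK's `PipelineModel`. This is a hypothetical outline, not the notebook from the link: the model artifacts, entry-point scripts, and framework version are placeholders, and imports are kept inside the function so the sketch reads standalone:

```python
def deploy_inference_pipeline(role, preproc_model_data, model_data):
    """Chain a fitted preprocessing model (e.g. a one-hot encoder) and
    a trained model into a single endpoint. Requests hit the
    preprocessing container first, then the model."""
    from sagemaker.pipeline import PipelineModel
    from sagemaker.sklearn.model import SKLearnModel

    preproc = SKLearnModel(
        model_data=preproc_model_data,   # fitted preprocessor saved to S3
        role=role,
        entry_point="preprocessing.py",  # placeholder inference script
        framework_version="1.2-1",
    )
    model = SKLearnModel(
        model_data=model_data,
        role=role,
        entry_point="inference.py",      # placeholder inference script
        framework_version="1.2-1",
    )
    pipeline = PipelineModel(name="preproc-then-predict", role=role,
                             models=[preproc, model])
    return pipeline.deploy(initial_instance_count=1,
                           instance_type="ml.m5.large")
```

The key point is that the fitted preprocessor travels as a model artifact, so the exact transformation learned at training time is applied to every inference request.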
@narijami 3 years ago
I have another request: could you please make a video on how to join several tables, including some aggregation functions, in SQL? I want to join 3 different tables which are in 2 different schemas in Redshift. The output of the two joined tables will have some aggregation functions in its SQL query. Since the schemas of the two tables are different, I can't write the SQL query directly in Data Wrangler. It would be great if you could help.
@Flopyboy 2 years ago
Hi Julien, I'm trying to create a pipeline and I'm experiencing significant overhead for each individual step (~10 min). Is there any way to test individual steps without running the entire pipeline and having to wait for earlier steps to finish?
@juliensimonfr 2 years ago
Not that I know of. I guess you could test each step in its own mini-pipeline if you have all the intermediate artifacts, and then put them together?
@Flopyboy 2 years ago
@@juliensimonfr Thanks for responding! SageMaker recently made it possible to execute pipelines in local mode, which almost eliminates the overhead I was experiencing :)
@anubhabjoardar9321 2 years ago
Hello Julien, thank you for the video and the channel! It makes understanding AWS SageMaker easier for newbies like me :) I wanted to ask if there is a way to list all resources/components (models, endpoints, training jobs, processing jobs, etc.) associated with (or created in) an AWS SageMaker notebook/Studio project? Thanks a lot for any information on this task!
@juliensimonfr 2 years ago
Thank you! This is a really good question, and the answer is "kind of". You can track model lineage and see all artifacts that led to a particular model, see docs.aws.amazon.com/sagemaker/latest/dg/lineage-tracking.html, but th
@11eagleye 4 years ago
Hi Julien, where do the models get deployed? I don't see any container or Docker details in the demo.
@juliensimonfr 4 years ago
They're deployed on SageMaker endpoints, as usual. It all takes place in the CloudFormation template stored in the 'model-deploy' repository.
@bhujithmadav1481 1 year ago
Hello Julien, thanks for your videos, they were helpful. I have a requirement: I want to create training jobs within a SageMaker pipeline. How do I achieve this?
@juliensimonfr 1 year ago
sagemaker.workflow.steps.TrainingStep ?
@bhujithmadav1481 1 year ago
@@juliensimonfr Thanks for the reply. In the video you used a scikit-learn estimator to train. I will have to create a training job. My doubt is how to integrate training jobs within the pipeline. Please guide.
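For readers with the same doubt: a `TrainingStep` wraps a regular estimator, and each pipeline execution then launches a standard SageMaker training job from it. A minimal sketch (script name, bucket, and pipeline name are placeholders, and imports are kept inside the function so the sketch reads standalone):

```python
def build_training_pipeline(role, bucket):
    """Wrap an estimator in a TrainingStep and return a Pipeline.
    Running the pipeline creates a normal SageMaker training job."""
    from sagemaker.sklearn.estimator import SKLearn
    from sagemaker.inputs import TrainingInput
    from sagemaker.workflow.steps import TrainingStep
    from sagemaker.workflow.pipeline import Pipeline

    estimator = SKLearn(
        entry_point="train.py",          # your existing training script
        framework_version="1.2-1",
        instance_type="ml.m5.xlarge",
        role=role,
    )
    step_train = TrainingStep(
        name="TrainModel",
        estimator=estimator,
        inputs={"train": TrainingInput(f"s3://{bucket}/train/")},
    )
    return Pipeline(name="my-training-pipeline", steps=[step_train])

# Typical usage (requires AWS credentials):
# pipeline = build_training_pipeline(role, bucket)
# pipeline.upsert(role_arn=role)  # create or update the pipeline
# pipeline.start()                # kick off an execution
```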
@sivaprasanth5961 3 years ago
It's a really awesome session, Julien. I have one doubt: if I want to send 70% of requests to Version 1 and 30% to Version 2, how can I do that?
@juliensimonfr 2 years ago
You can deploy multiple variants on the same endpoint: docs.aws.amazon.com/sagemaker/latest/dg/model-ab-testing.html
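With production variants, traffic is split in proportion to each variant's weight, so 0.7/0.3 gives the 70%/30% split asked about. A minimal boto3 sketch (model and endpoint names are placeholders):

```python
def make_variants(model_v1, model_v2):
    """Two production variants on one endpoint; traffic splits in
    proportion to InitialVariantWeight (0.7 / 0.3 = 70% / 30%)."""
    return [
        {"VariantName": "version-1", "ModelName": model_v1,
         "InstanceType": "ml.m5.large", "InitialInstanceCount": 1,
         "InitialVariantWeight": 0.7},
        {"VariantName": "version-2", "ModelName": model_v2,
         "InstanceType": "ml.m5.large", "InitialInstanceCount": 1,
         "InitialVariantWeight": 0.3},
    ]

def create_ab_endpoint(endpoint_name, model_v1, model_v2):
    # boto3 is imported here so make_variants stays testable offline.
    import boto3
    sm = boto3.client("sagemaker")
    sm.create_endpoint_config(
        EndpointConfigName=endpoint_name + "-config",
        ProductionVariants=make_variants(model_v1, model_v2),
    )
    sm.create_endpoint(EndpointName=endpoint_name,
                       EndpointConfigName=endpoint_name + "-config")
```

The weights can also be updated on a live endpoint to shift traffic gradually.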
@sumeshmr9130 4 years ago
Hello Julien, can we use the AWS Step Functions Data Science SDK along with Pipelines? Or are these two different things?
@juliensimonfr 4 years ago
Hi Sumesh, SageMaker Pipelines is two-sided: 1) a Python SDK to build ML workflows (similar to the Data Science SDK), and 2) an MLOps capability based on CodePipeline. I think the integration with SageMaker Studio is really interesting, and a more productive option than the Data Science SDK.
@sumeshmr9130 4 years ago
@@juliensimonfr Is there a document/link with the details to create a custom project template (organization template)? In case I wanted to call a Lambda function or a Glue job as a workflow step in the pipeline, do you think I would be able to customize it this way?
@Koningbob 4 years ago
Hi Julien, thanks for this clear demo. My team uses GitLab for CI/CD; would this be possible instead of CodePipeline? Thx
@narijami 4 years ago
Hi Julien, thanks for the explanation. I work at a company in Germany where we use AWS tools. My question: I have to run millions of rows daily through SQL queries from Redshift, but in SageMaker I have a memory limitation. Is it possible to make this easier with SageMaker Pipelines?
@juliensimonfr 4 years ago
SageMaker Processing is probably what you're looking for. It's easy to automate and you can pick very large instances. Of course, you only pay for the duration of the job.
@narijami 4 years ago
@@juliensimonfr Thanks for your reply. I'm currently working on a normal SageMaker instance. I'm running a SQL query with some joins and aggregation functions, reading some very large tables from Redshift. The query takes very long if I fetch data for a period longer than 6 days. I heard that Data Wrangler can speed up importing tables. Would that be the case for joined tables like mine? Thanks in advance.
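For reference, a minimal sketch of the SageMaker Processing job suggested above. The script name, instance size, and S3 paths are placeholders; imports are kept inside the function so the sketch reads standalone:

```python
def run_processing_job(role, bucket):
    """Offload a heavy transformation to a large, short-lived instance
    with SageMaker Processing; you pay only for the job's duration."""
    from sagemaker.sklearn.processing import SKLearnProcessor
    from sagemaker.processing import ProcessingInput, ProcessingOutput

    processor = SKLearnProcessor(
        framework_version="1.2-1",
        role=role,
        instance_type="ml.m5.4xlarge",   # pick as large as the data needs
        instance_count=1,
    )
    processor.run(
        code="transform.py",             # your query/transform logic
        inputs=[ProcessingInput(source=f"s3://{bucket}/raw/",
                                destination="/opt/ml/processing/input")],
        outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                                  destination=f"s3://{bucket}/processed/")],
    )
```

The same processor can be wrapped in a ProcessingStep to run inside a pipeline on a schedule.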
@vickyshrestha 3 years ago
Hi Julien, can we manage SageMaker Pipelines with Terraform?
@juliensimonfr 3 years ago
Hi Vicky, according to registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sagemaker_model, it's not supported yet.
@fantasyapart787 6 months ago
I can see that this video was uploaded 3 years ago. Is it still valid, I mean the features and navigation?
@juliensimonfr 6 months ago
The UI has changed; the SDK is probably still very, very similar.
@MrChristian331 4 years ago
Where does the model get deployed to when you approve it?
@juliensimonfr 4 years ago
A SageMaker endpoint, configured in the template.
@MrChristian331 4 years ago
@@juliensimonfr Sorry, it's been a while; what is the endpoint? Just part of the container?
@kanishkmair2920 3 years ago
@@MrChristian331 The model endpoint creates a URL for inference once the model is trained.
@lucieackley7432 3 years ago
Thanks for the video, Julien, it gave a very good overview. I'm wondering if there is a good way to learn more about the deploy step. Additionally, I have a model that I want to retrain daily as we get new data daily. What is the best pattern for this?
@juliensimonfr 3 years ago
Thanks Lucie. You can deploy the "usual" way by grabbing the model in S3 and creating an endpoint. For full automation, you can use MLOps as described in docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-walkthrough.html, but this requires diving a bit into CloudFormation. We're covering this topic in SageMaker Fridays S03E04, so make sure to catch that episode at amazonsagemakerfridays.splashthat.com/ :)
@elmirach4706 4 years ago
Hello, how do I customize the abalone pipeline into a custom model pipeline?
Salut Julien! I see the Pipeline templates are not available for us-east-1 in SageMaker Studio (only us-west-1). Is there a reason for that? Any chance they could be available for N. Virginia? Thanks for the tutorial :-) It came in handy with a project delivery.
@juliensimonfr 4 years ago
They should be available there. Please make sure that your Studio user has the appropriate permissions. There's a slider setting in the user details ("Enable SageMaker Projects").
@dasgupta0885 4 years ago
Can you please provide the GitHub link to the Python notebook? Thanks!
@juliensimonfr 4 years ago
If you're only interested in the Python SDK, this one is very close: github.com/aws/amazon-sagemaker-examples/tree/master/sagemaker-pipelines. If you're interested in the full example with MLOps support, it's part of the repos I clone in the video.
@dasgupta0885 4 years ago
@@juliensimonfr Yes, I'm interested in the SDK, so this is perfect. Thanks a bunch!!