Part1: DataBricks - APIs (Clusters, Jobs, Job Runs)

  12,040 views

Next Level

1 day ago

Comments: 21
@future-outlier · 1 year ago
valuable content, thanks for sharing
@NextLevel-LearnWithSubhasis · 1 year ago
Glad you liked it!
@NextLevel-LearnWithSubhasis · 1 year ago
Glad it was helpful.
@ashutoshrai5342 · 2 years ago
Thanks for sharing
@NextLevel-LearnWithSubhasis · 1 year ago
Thanks Ashutosh ❤
@skmn07 · 2 years ago
Super
@NextLevel-LearnWithSubhasis · 1 year ago
Thanks Sharath ! ❤
@gregorythompson3534 · 2 years ago
At the 24:00 minute mark you mentioned the has_more setting. Can you explain more how you would use offset to grab the next batch of records using the API? In the query history API there is a next-page token that can be used, but with the Jobs API, what is the equivalent?
@NextLevel-LearnWithSubhasis · 2 years ago
1st question: Can you explain more how you would use offset to grab the next batch of records using the API?

    page_size = 500
    if 'has_more' in job_runs.keys() and job_runs['has_more'] is True:
        next_page_exists = True
        offset = offset + page_size  # This is how you increase the offset
        job_runs = db.jobs.list_runs(
            job_id=None,
            active_only=None,
            completed_only=None,
            offset=offset,
            limit=limit,
            headers=None,
            version=None,
        )

2nd question: In the query history API there is a next-page token that can be used, but with the Jobs API, what is the equivalent?

Signature of the list_jobs() API:

    def list_jobs(self, job_type=None, expand_tasks=None, offset=None, limit=None, headers=None, version=None):

I made a sample call like this (please note version='2.1' is needed, as pagination is available only from version 2.1):

    jobs_list = jobs_api.list_jobs(job_type=None, offset=0, limit=1, version='2.1')

    {'has_more': True,
     'jobs': [{'created_time': 1649077058434,
               'creator_user_name': 'user1@email.com',
               'job_id': 1061722925895936,
               'settings': {'email_notifications': {'no_alert_for_skipped_runs': False},
                            'format': 'MULTI_TASK',
                            'max_concurrent_runs': 1,
                            'name': 'SampleJobName'}}]}

So you get the same 'has_more' field here as well, to check whether more records are available.
@gregorythompson3534 · 2 years ago
Thanks! Follow-up: where is this page_size value coming from? I am not seeing it anywhere in my response body, so I am still unclear on how much to add to the offset.
@NextLevel-LearnWithSubhasis · 2 years ago
limit = the maximum number of records you want to fetch in each batch/call. I kept it set to a constant 500, so every call to the API fetches at most 500 records, if available. You increase the offset value on every call; in this case I used a separate variable (page_size = 500) to update the offset. Imagine reading thousands of records from a 1D array: you start at offset 0 and read 500 records. On the next call the offset becomes (0+500), on the next (0+500+500), and so on. github.com/SubhasisAndSharath/Cloud-Infra-Cost/blob/main/databricks/ch5/databricks-jobs-run.py
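The full pagination loop described above can be sketched as follows. Here `fake_list_runs` stands in for `db.jobs.list_runs()` so the sketch is self-contained; the `'has_more'`/`'runs'` response shape follows the reply above, but treat this as an illustration rather than the exact databricks-cli API.

```python
PAGE_SIZE = 500
ALL_RUNS = list(range(1200))  # pretend the workspace has 1200 job runs

def fake_list_runs(offset, limit):
    # Stand-in for db.jobs.list_runs(offset=..., limit=...)
    batch = ALL_RUNS[offset:offset + limit]
    return {"runs": batch, "has_more": offset + limit < len(ALL_RUNS)}

def fetch_all_runs():
    offset = 0
    runs = []
    while True:
        page = fake_list_runs(offset=offset, limit=PAGE_SIZE)
        runs.extend(page["runs"])
        if not page.get("has_more"):
            break
        offset += PAGE_SIZE  # same increment as the page_size variable above
    return runs

print(len(fetch_all_runs()))  # 1200
```

Three calls are made here (offsets 0, 500, 1000); the loop stops as soon as a page comes back with `has_more` false.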
@NextLevel-LearnWithSubhasis · 2 years ago
page_size is a variable that is defined by me. By the way, thanks for trying this out and sharing your queries.
@Jaden-lz6pb · 1 year ago
Thanks
@NextLevel-LearnWithSubhasis · 1 year ago
Welcome
@sohilsundaram5609 · 1 year ago
Hi Sir, I want to save the job name, workflow status (success/fail), and error message in a table. How can I do this? I am using Azure Databricks. I have tried many things but am not able to get these.
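A hedged sketch for the question above: the runs returned by the Jobs API already carry the name, result state, and error message, so one approach is to flatten each run into a row and persist the rows. The field names (`run_name`, `state.result_state`, `state.state_message`) follow the Jobs 2.1 runs/list response; the Spark write at the end is commented out and assumes an Azure Databricks notebook session.

```python
def run_to_row(run):
    # Flatten one Jobs API run object into a table row.
    state = run.get("state", {})
    return {
        "job_name": run.get("run_name"),
        "status": state.get("result_state"),        # e.g. SUCCESS / FAILED
        "error_message": state.get("state_message", ""),
    }

# Two hypothetical run objects in the runs/list response shape:
sample_runs = [
    {"run_name": "daily_etl", "state": {"result_state": "SUCCESS",
                                        "state_message": ""}},
    {"run_name": "ml_train", "state": {"result_state": "FAILED",
                                       "state_message": "Cluster terminated"}},
]

rows = [run_to_row(r) for r in sample_runs]
# In a Databricks notebook you could then persist the rows, e.g.:
# spark.createDataFrame(rows).write.mode("append").saveAsTable("job_audit")
print(rows)
```

The table name `job_audit` is a placeholder; swap in whatever catalog/schema you use.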
@anotheremail9257 · 2 years ago
I have to migrate multiple jobs between cross-region workspaces. I have the list of all jobs in a JSON file. Can you share something I can use to import/create the same jobs in the new workspace from that JSON?
@NextLevel-LearnWithSubhasis · 2 years ago
I will give that a try and come back..
@NextLevel-LearnWithSubhasis · 2 years ago
Hey, will you be able to share a sample JSON file?
@NextLevel-LearnWithSubhasis · 2 years ago
Please use the job create API: docs.databricks.com/dev-tools/api/2.0/jobs.html#create. Let me know how it goes.
@anotheremail9257 · 2 years ago
Thanks! But there is one problem here: the JSON file I created contains 40+ job config JSONs, so I would have to call the create API that many times! Do you have any sample code to help with this through Python?
@NextLevel-LearnWithSubhasis · 2 years ago
Hey, no - I think calling the API once per job is the only approach.
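Since each job config must go through the create API individually (as noted above), a simple loop over the exported JSON does the migration. A sketch, with the network call kept separate from the loop so it can be swapped out; the workspace URL, token, and file name are placeholders, and the endpoint path follows the Jobs 2.1 REST API (POST /api/2.1/jobs/create).

```python
import json
import urllib.request

def create_jobs(job_configs, send):
    """Call `send` (a function that posts one job config) for every config."""
    created = []
    for config in job_configs:
        created.append(send(config))
    return created

def make_sender(host, token):
    # Returns a function that POSTs one job config to /api/2.1/jobs/create.
    def send(config):
        req = urllib.request.Request(
            f"{host}/api/2.1/jobs/create",
            data=json.dumps(config).encode(),
            headers={"Authorization": f"Bearer {token}",
                     "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)  # response carries the new job_id
    return send

# Usage (placeholders -- fill in your own workspace URL, token, and file):
# with open("jobs_export.json") as f:
#     configs = [j["settings"] for j in json.load(f)["jobs"]]
# ids = create_jobs(configs, make_sender("https://<workspace-url>", "<token>"))
```

Splitting `create_jobs` from `make_sender` also makes it easy to add retries or rate limiting around the single `send` call later.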