90. Databricks | Pyspark | Interview Question: Read Excel File with Multiple Sheets

8,904 views

Raja's Data Engineering

A day ago

Azure Databricks Learning: Interview Question: Read Excel File with Multiple Sheets
================================================================================
How do you create a DataFrame by reading multiple Excel sheets?
Although creating a DataFrame from Excel sheets is not very common, there are scenarios where we need to read Excel data. Reading data from all sheets of a workbook is a bit challenging because there is no direct solution. I have created an automated solution for that requirement in this video.
To get a thorough understanding of this concept, please watch this video.
#DatabricksExcel, #SparkExcelReading, #PysparkReadingMultipleExcelSheets, #PysparkTips, #DatabricksRealtime, #SparkRealTime, #DatabricksInterviewQuestion, #DatabricksInterview, #SparkInterviewQuestion, #SparkInterview, #PysparkInterviewQuestion, #PysparkInterview, #BigdataInterviewQuestion, #BigDataInterview, #PysparkPerformanceTuning, #PysparkPerformanceOptimization, #PysparkPerformance, #PysparkOptimization, #PysparkTuning, #DatabricksTutorial, #AzureDatabricks, #Databricks, #Pyspark, #Spark, #AzureADF, #LearnPyspark, #LearnDataBRicks, #DataBricksTutorial, #azuredatabricks, #notebook, #Databricksforbeginners

Comments: 33
@oiwelder a year ago
I had a similar case, but it was several positional cells. With your function I managed to reduce 12 dataframes to just 1. Thank you for sharing this precious knowledge.
@rajasdataengineering7585 a year ago
That's amazing. Thank you for sharing your experience
@oiwelder a year ago
@rajasdataengineering7585 However, I found another challenge. The loop repeats 12 times, and each dataframe represents one month. I needed an incrementing counter from 1 to 12. I used the function monotonically_increasing_id(), but it doesn't generate the correct sequence: it starts at 1 and ends at 2000. Would there be a solution for this case?
@rajasdataengineering7585 a year ago
Please share the requirement in detail to audaciousazure@gmail.com
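A note on the month-numbering question in this thread: `monotonically_increasing_id()` guarantees unique, increasing IDs but not consecutive ones, because the partition ID is encoded in the upper bits of each value. One possible approach, which is not from the video, is to tag each sheet's rows with the sheet's position in the list while looping. The sketch below assumes the sheets are passed in month order and share the same columns. The function name `read_sheets_with_month` and the column name `month` are hypothetical, and `spark` is passed in explicitly.

```python
def read_sheets_with_month(spark, path, sheets):
    """Union one DataFrame per sheet, adding a 1-based `month` column.

    Assumes `sheets` is in month order and every sheet has the same columns.
    Returns None when `sheets` is empty.
    """
    combined = None
    for month, sheet in enumerate(sheets, start=1):
        sheet_df = (
            spark.read.format("com.crealytics.spark.excel")
            .option("header", True)
            .option("inferSchema", True)
            .option("dataAddress", f"{sheet}!")
            .load(path)
            # Append the loop position as a constant column for this sheet.
            .selectExpr("*", f"{month} AS month")
        )
        combined = sheet_df if combined is None else combined.union(sheet_df)
    return combined
```

Because the counter comes from the Python loop rather than from Spark, every row of the first sheet gets 1, every row of the second gets 2, and so on, regardless of how the data is partitioned.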
@KG-rw9sr a year ago
Please post the code for the UDF that creates a DataFrame from multiple sheets and performs the union in the description, as in previous videos. Thanks again for the amazing tutorials. You are amazing!
@rajasdataengineering7585 a year ago
def createExcelDataFrame(path, sheets):
    # Read the first sheet with schema inference and keep its schema.
    firstSheet = sheets[0]
    df = (
        spark.read.format("com.crealytics.spark.excel")
        .option("inferschema", True)
        .option("header", True)
        .option("dataAddress", f"{firstSheet}!")
        .load(path)
    )
    schema = df.schema
    # Read each remaining sheet with that schema and append its rows.
    for sheet in sheets[1:]:
        sheetDF = (
            spark.read.format("com.crealytics.spark.excel")
            .schema(schema)
            .option("header", True)
            .option("dataAddress", f"{sheet}!")
            .load(path)
        )
        df = df.union(sheetDF)
    return df
@venkatasai4293 a year ago
Good scenario-based question. Please bring more like this, Raja.
@rajasdataengineering7585 a year ago
Thanks, Venkat. Sure, I will post more videos based on real-time scenarios.
@ashokkumar-wm6yr a year ago
This question was asked in the Winwire Azure data engineer L1 discussion.
@rajasdataengineering7585 a year ago
Thank you for sharing your interview experience. It certainly helps everyone in this community
@stoyyeti3671 a year ago
Waiting for your videos on Delta Live Tables, workflows, jobs, pipelines, and Spark Structured Streaming in Databricks.
@rajasdataengineering7585 a year ago
I have already created videos on all the topics you mentioned except Delta Live Tables. I will create a playlist for DLT in the near future.
@datoalavista581 a year ago
Thank you for sharing!
@rajasdataengineering7585 a year ago
My pleasure!
@PradyutJoshi 3 months ago
Great example but your microphone is having interference. Please resolve that. Thanks!
@sravankumar1767 a year ago
Nice explanation Raja 👌 👍 👏
@rajasdataengineering7585 a year ago
Thanks Sravan!
@rushikeshsuryawanshi9679 a year ago
Please also create videos on reading XML and on streaming.
@rajasdataengineering7585 a year ago
Sure, will do.
@user-ng3oc9pv5v 10 months ago
How do I handle sheet names that contain spaces? I am getting an error while reading an Excel file whose sheet names contain spaces.
@Learn2Share786 a year ago
How do I install the Excel library at the notebook level instead of the cluster level? Could you please share the steps?
@suryateja5323 a year ago
How do I handle sheet names that have spaces, like Sheet Name? Giving dataAddress="Sheet Name!" throws an error.
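On the sheet-names-with-spaces questions: the spark-excel documentation shows `dataAddress` values of the form `'My Sheet'!B3:C35`, with the sheet name wrapped in single quotes, so quoting the name is a likely fix. The helper below is a minimal sketch, not code from the video. The name `data_address` and the default start cell `A1` are assumptions, and sheet names that themselves contain an apostrophe are not handled.

```python
def data_address(sheet_name, start_cell="A1"):
    """Build a dataAddress string with the sheet name in single quotes."""
    return f"'{sheet_name}'!{start_cell}"


print(data_address("Sheet Name"))  # 'Sheet Name'!A1
```

It would be passed to the reader as `.option("dataAddress", data_address(sheet))` in place of the `f"{sheet}!"` string.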
@sravankumar1767 a year ago
Hi Raja, I have a doubt in my current project. I have been given a mapping document with complex logic, including multiple joins. Could you please help me? It needs to be finished by Monday. How should I contact you?
@rajasdataengineering7585 a year ago
Sure Sravan, I will contact t
@sravankumar1767 a year ago
@rajasdataengineering7585 Thank you very much.
@sravankumar1767 a year ago
@rajasdataengineering7585 Can you please share your email? I will forward the mapping document.
@ab0515dat a year ago
But how do you list the Excel sheets using Spark, not Python? Because you are reading from ADLS, you will need Spark only, not Python.
@rajasdataengineering7585 a year ago
Why shouldn't we use Python? Any specific reason? PySpark is a combination of Python and Spark.
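For the sheet-listing question, one dependency-free option is to read the names straight from the workbook, since an .xlsx file is a ZIP archive whose `xl/workbook.xml` part lists the sheets. The sketch below is not the method from the video. The name `list_sheet_names` is hypothetical, and the code assumes a standard (transitional) .xlsx file with the workbook part at its usual location. It does not work for legacy .xls files.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the main SpreadsheetML parts of an .xlsx file.
_MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def list_sheet_names(source):
    """Return the sheet names of an .xlsx workbook in workbook order.

    `source` is a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.attrib["name"] for sheet in root.iter(f"{_MAIN_NS}sheet")]
```

Since the function accepts a file-like object, a file in ADLS could in principle be loaded through Spark's `binaryFile` source and its `content` bytes wrapped in `io.BytesIO` before calling it. pandas users can get the same list from `pandas.ExcelFile(path).sheet_names`, provided the path is readable from the driver.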
@chakhil8000 a year ago
Sir, I am getting an error, please help. I installed the library as you mentioned, but after executing the code to read the Excel file I get this error: java.lang.NoClassDefFoundError: Could not initialize class com.crealytics.spark.excel.WorkbookReader$
@dvsrikanth22 a year ago
Sir, I am getting the error "java.lang.NoSuchMethodError: scala.collection.immutable.Seq.map(Lscala/Function1;)Ljava/lang/Object;". Please tell me how to resolve this issue.
@shubhikatiwari9259 a year ago
I am getting the same error: "java.lang.NoClassDefFoundError: org/apache/poi/ss/usermodel/WorkbookProvider"