flatten nested json in spark | Lec-20 | most requested video

14,687 views

MANISH KUMAR

1 day ago

In this video I talk about how you can flatten nested JSON in Spark.
Connect with me directly on:- topmate.io/man...
Download the data from here:- github.com/man..., www.kaggle.com...
For more queries, reach out to me on the social media handles below.
Follow me on LinkedIn:- / manish-kumar-373b86176
Follow Me On Instagram:- / competitive_gyan1
Follow me on Facebook:- / manish12340
My Second Channel -- / @competitivegyan1
Interview series Playlist:- • Interview Questions an...
My Gear:-
Rode Mic:-- amzn.to/3RekC7a
Boya M1 Mic-- amzn.to/3uW0nnn
Wireless Mic:-- amzn.to/3TqLRhE
Tripod1 -- amzn.to/4avjyF4
Tripod2:-- amzn.to/46Y3QPu
camera1:-- amzn.to/3GIQlsE
camera2:-- amzn.to/46X190P
Pentab (Medium size):-- amzn.to/3RgMszQ (Recommended)
Pentab (Small size):-- amzn.to/3RpmIS0
Mobile:-- amzn.to/47Y8oa4 (You should definitely not buy this one)
Laptop -- amzn.to/3Ns5Okj
Mouse+keyboard combo -- amzn.to/3Ro6GYl
21 inch Monitor-- amzn.to/3TvCE7E
27 inch Monitor-- amzn.to/47QzXlA
iPad Pencil:-- amzn.to/4aiJxiG
iPad 9th Generation:-- amzn.to/470I11X
Boom Arm/Swing Arm:-- amzn.to/48eH2we
My PC Components:-
intel i7 Processor:-- amzn.to/47Svdfe
G.Skill RAM:-- amzn.to/47VFffI
Samsung SSD:-- amzn.to/3uVSE8W
WD blue HDD:-- amzn.to/47Y91QY
RTX 3060Ti Graphic card:- amzn.to/3tdLDjn
Gigabyte Motherboard:-- amzn.to/3RFUTGl
O11 Dynamic Cabinet:-- amzn.to/4avkgSK
Liquid cooler:-- amzn.to/472S8mS
Antec Prizm FAN:-- amzn.to/48ey4Pj

Comments: 49
@Ajay_C_JadhavIII 4 months ago
Brother, how has this man not gone viral yet? He deserves it.
@sanilkumarbarik9151 2 months ago
Share his content on various social media platforms and, if possible, recommend him to your friends.
@nikharjain5876 8 months ago
Useful content, many thanks Manish bhai :)
@nayanjyotibhagawati939 1 year ago
Very helpful video. I had an interview question: how to validate a schema and null values. Please explain it using a real-time scenario as an example.
@da_nalyst 1 year ago
Thank you Manish Bhai, very helpful
@Akshay50826 3 months ago
Thank you Manish for the awesome explanation
@NikhilKulshrestha-j3s 2 months ago
This question is asked a lot in interviews.
@shayankabasi160 1 year ago
Good work. Please upload something on streaming.
@fashionate6527 6 months ago
Thanks for the great quality content
@da_nalyst 1 year ago
Thank you Manish Bhai for this gem
@niladridey9666 1 year ago
Thanks for the quality content. Very helpful for freshers.
@mantukumar-qn9pv 1 year ago
Thank you Guru!
@nayanjyotibhagawati939 1 year ago
Please add a video on how to handle null values and how to validate a schema
@DeepakSingh-nc2wf 1 year ago
Bhai, there are much simpler functions like json_tuple to extract columns from nested JSON instead of exploding columns.
@manish_kumar_1 1 year ago
Oh, I didn't know. Let me check that.
@roshan_off1955 1 year ago
Manish, please make a video on this as well: which projects to put on a CV, and what should be on the resume if you want to switch into data from a different technology.
@lakshya1375 4 months ago
Brother, did you make the switch?
@shreyaspatil4861 10 months ago
Thanks very much for the tutorial :) I have a query about reading JSON files. I have an array of structs where each struct has a different structure/schema. Based on a certain property value of the struct, I apply a filter to get that nested struct. However, when I display it using printSchema, it contains fields that do not belong to that object but are somehow associated with it from the schemas of the other structs. How can I fix this?
@Hrmwnoryza 3 months ago
Hello Manish Kumar, I want to ask: if the JSON data does not have an object in the array, should I use the .drop function?
@Matrix_Mayhem 9 months ago
Thanks Manish!
@gauravsingh-gn4zz 1 year ago
Hello Manish, just one doubt: what if we have 100 columns of struct type and 100 columns of array type? Should we write explode and .column 200 times, or is there another way? Please help me find out. Thanks
@Ronak-Data-Engineer 1 year ago
Very helpful
@____prajwal____ 11 months ago
Thanks. How can we extend this to the case where we have a stringified JSON and need the JSON fields inside it?
@Tanveer_Shaikh_330 1 year ago
Do practical or theory questions on RDDs come up in interviews?
@HanuamnthReddy 10 months ago
Thank you, Guruji...
@poojajoshi871 1 year ago
We are using .select to explode; can we use withColumn as well?
@manish_kumar_1 1 year ago
Yes, it will work using withColumn too
@rudrakasha-t1v 9 months ago
Are nested data and complex data the same?
@RahulRathore-wj9uy 8 months ago
Can we define our own schema for this JSON?
@VIVEKSINGH-us6he 1 year ago
How do we make a generic JSON parser (flattener) function? Do you have that code, and could you please share it? Here you have hard-coded the columns; is there any generic function?
@manish_kumar_1 1 year ago
I have not written one yet. I will try to write a generic function that flattens the entire JSON into a DataFrame.
@ETLMasters 1 year ago
I think you are talking about this:

from pyspark.sql.types import *
from pyspark.sql.functions import col, posexplode_outer

def flattenDataFrame(explodeDF):
    DFSchema = explodeDF.schema
    fields = DFSchema.fields
    fieldNames = DFSchema.fieldNames()
    fieldLength = len(fieldNames)

    for i in range(fieldLength):
        field = fields[i]
        fieldName = field.name
        fieldDataType = field.dataType

        if isinstance(fieldDataType, ArrayType):
            # Explode the first array column found, then recurse on the result.
            fieldNameExcludingArray = list(filter(lambda colName: colName != fieldName, fieldNames))
            fieldNamesAndExplode = fieldNameExcludingArray + ["posexplode_outer({0}) as ({1}, {2})".format(fieldName, fieldName+"_pos", fieldName)]
            arrayDF = explodeDF.selectExpr(*fieldNamesAndExplode)
            return flattenDataFrame(arrayDF)

        elif isinstance(fieldDataType, StructType):
            # Promote the struct's children to top-level columns named parent_child, then recurse.
            childFieldNames = fieldDataType.names
            structFieldNames = list(map(lambda childname: fieldName +"."+childname, childFieldNames))
            newFieldNames = list(filter(lambda colName: colName != fieldName, fieldNames)) + structFieldNames
            renamedCols = map(lambda x: x.replace(".", "_"), newFieldNames)
            zipAliasColNames = zip(newFieldNames, renamedCols)
            aliasColNames = map(lambda y: col(y[0]).alias(y[1]), zipAliasColNames)
            structDF = explodeDF.select(*aliasColNames)
            return flattenDataFrame(structDF)

    # No array or struct columns remain: the DataFrame is flat.
    return explodeDF
@DedloxGMR 1 year ago
Manish bhai, I have watched all the videos in this playlist but did not find the answer to my problem. I am working on a nested JSON dataset. It loads, but show() displays a corrupt record. When I changed its schema type, the dataset was displayed as-is, as raw JSON. How do I load it? I have also tried multiline.
@manish_kumar_1 1 year ago
Please mail me a smaller dataset as a file, or send it on LinkedIn.
@DedloxGMR 1 year ago
@manish_kumar_1 bhai, it is not going through on LinkedIn; please share your email address.
@wayzonic 1 year ago
How many lectures remain to complete this series? Please start an SQL playlist as well.
@manish_kumar_1 1 year ago
After 8-10 videos a new playlist will come
@OmMishra-y9h 1 year ago
I am learning Python, but when I go to GeeksforGeeks to solve easy questions, such as the runner-up question, I am not able to solve them. Can you guide me on this?
@manish_kumar_1 1 year ago
It feels tough at the start. But slowly you will begin to understand the patterns for solving questions.
@dineshughade6741 7 months ago
Hello Manish, could you provide the JSON file that you used here?
@manish_kumar_1 7 months ago
The link to download the data is in the description
@dineshughade6741 7 months ago
Oh great, thanks Manish
@poojajoshi871 1 year ago
Hi Sir, how many videos are still left to complete the series?
@manish_kumar_1 1 year ago
There is a lot to learn, but after 8-10 videos we will move forward with other topics.
@satishlovewanshi2540 1 year ago
data = [
    ("openai", '[{"name":"ram","work":"salesman"}]'),
    ("tech support", '[{"name":"lakhan","work":"service man","lname":"mishra"}]'),
    ("data operator ", '[{"name":"lakhan","work":"service man","salary":"5000","System":"del"}]'),
]
Bhaiya ji, how would we flatten the data given above? It is sample data from my client project. Please help me.
@coolashishful 2 months ago
# Assumes df has the columns job_title and json_data (the JSON string), and that
# schema is an ArrayType(StructType(...)) covering every key in the data.
from pyspark.sql.functions import col, explode, from_json

# Parse the JSON data into structured format
df_parsed = df.withColumn("json_data", from_json(col("json_data"), schema))

# Explode the array of JSON objects into rows
df_exploded = df_parsed.withColumn("json_data", explode(col("json_data")))

# Flatten the DataFrame by selecting individual fields
df_flattened = df_exploded.select(
    col("job_title"),
    col("json_data.name").alias("name"),
    col("json_data.work").alias("work"),
    col("json_data.lname").alias("lname"),
    col("json_data.salary").alias("salary"),
    col("json_data.System").alias("System")
)

# Show the result
df_flattened.show(truncate=False)
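The reply above relies on a `schema` that covers every key appearing in the JSON strings, and the sample rows do not all share the same keys. One way to find the full set of keys before writing the schema is a quick pass in plain Python. This is only a sketch: `union_of_keys` is a hypothetical helper, and it assumes the sample is small enough to inspect on the driver.

```python
import json

# Sample data copied from the question above.
data = [
    ("openai", '[{"name":"ram","work":"salesman"}]'),
    ("tech support", '[{"name":"lakhan","work":"service man","lname":"mishra"}]'),
    ("data operator ", '[{"name":"lakhan","work":"service man","salary":"5000","System":"del"}]'),
]


def union_of_keys(rows):
    """Collect every key used by any JSON object, in first-seen order."""
    keys = []
    for _, raw in rows:
        for obj in json.loads(raw):  # each string holds an array of objects
            for key in obj:
                if key not in keys:
                    keys.append(key)
    return keys


field_names = union_of_keys(data)
print(field_names)
```

Each name in `field_names` can then become a `StructField(name, StringType())` inside an `ArrayType(StructType([...]))`; rows that lack a key simply get null in that column after `from_json`.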
@saisri6404 1 year ago
You haven't started this video with `Possible Interview Questions` 😅
@manish_kumar_1 1 year ago
For this topic they will just ask you to write the code