Spark Scenario Based Question | Deal with Ambiguous Column in Spark | Using PySpark | LearntoSpark

10,749 views

Azarudeen Shahul

1 day ago

Comments: 33
@Akshaykumar-pu4vi · 2 years ago
In PySpark we can simply apply:

df_final = df.withColumn("Name", df["name0"]).drop("name0", "name4")

In newer versions of PySpark the duplicate columns are displayed with an index by default. Create a new column referencing one of the duplicate columns, then drop those duplicate columns; it should work, I hope. Thank you so much for this playlist, Sir!
@srinivasa1187 · 2 years ago
It did not work for me, using PySpark 3. Could you write the exact syntax, or refer me to the page where you got it? Thanks
@localmartian9047 · 2 years ago
@srinivasa1187 It will not work. Here he is assuming the duplicate columns already have an index concatenated, and then he is just keeping one of them and dropping the rest.
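The idea discussed in this thread can be sketched in plain Python, without Spark: when flattening produces duplicate column names, suffix every duplicate with its positional index (giving e.g. "name0" and "name4"), so one copy can be kept and the rest dropped. The column list below is a made-up example, not from the video.

```python
def index_suffix_duplicates(cols):
    """Append the positional index to any column name that occurs more than once."""
    duplicates = {c for c in cols if cols.count(c) > 1}
    return [c + str(i) if c in duplicates else c for i, c in enumerate(cols)]

cols = ["name", "product", "address", "mob", "name"]
print(index_suffix_duplicates(cols))
# -> ['name0', 'product', 'address', 'mob', 'name4']
```

In PySpark one would presumably apply the renamed list with something like df.toDF(*index_suffix_duplicates(df.columns)) before selecting, since toDF renames columns positionally and so is not affected by the ambiguity.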
@ashutoshrai5342 · 4 years ago
Great work. Keep posting new use cases. You will definitely make it big. Thank you
@sivavalluru3864 · 4 years ago
Nice explanation, keep it up
@bhaskarreddy-wt5rc · 3 years ago
Before unwrapping the inner JSON, we can rename the name column and then unwrap the inner JSON, right?
@Shiva-kz6tn · 4 years ago
Good one.. please post in Scala as well!
@akshayanand6803 · 4 years ago
Wanted to deal with duplicate columns as well... This is nice
@ayushmittal3948 · 4 years ago
Can't we rename the column with this code?

for i in df_cols:
    if i in lst:
        i = i + "new"
    lst.append(i)

It will check if the column already exists and, if it does, append "new" to it. As simple as that. Indirectly you are just counting the occurrences and then appending; instead of that, we can do the above.
@localmartian9047 · 2 years ago
Almost works. To make it foolproof in case there are multiple duplicate columns, keep a counter inside the loop and append that instead of "new".
@ayushmittal3948 · 2 years ago
@localmartian9047 Yes, we can do that as well
@nareshreddy-l3f · 1 year ago
lst = []
x = 1
for i in df2.columns:
    if i in lst:
        i = i + str(x)
        x = x + 1
    lst.append(i)
print(lst)
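A hedged, self-contained variant of the loop above (a plain list stands in for df2.columns), with a per-name counter as suggested earlier in the thread, so the result stays unique even when a name repeats more than twice or a suffixed name already exists:

```python
def dedupe_columns(cols):
    """Return cols with later duplicates renamed name1, name2, ... uniquely."""
    seen = []
    for c in cols:
        if c in seen:
            n = 1
            # Bump the counter until the suffixed name collides with nothing.
            while c + str(n) in seen or c + str(n) in cols:
                n += 1
            c = c + str(n)
        seen.append(c)
    return seen

print(dedupe_columns(["name", "product", "name", "name"]))
# -> ['name', 'product', 'name1', 'name2']
```

With a real DataFrame, the renamed list could then be applied positionally, e.g. df2.toDF(*dedupe_columns(df2.columns)), avoiding the ambiguous-reference problem that per-column renames run into.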
@vuppalanaveenkrishna6070 · 3 years ago
Thanks Azhar... I did this exercise
@AzarudeenShahul · 3 years ago
Great job!
@sushantshekhar9409 · 3 years ago
Hi Azarudeen, when I am converting JSON to a DataFrame, one of the ambiguous columns is getting a null value... what to do in that case?
@ramyagudivaka4944 · 2 years ago
Thanks for your efforts. Amazing work. Could you please put this logic in Spark Scala also?
@bhavitavyashrivastava8600 · 4 years ago
I have made a Python machine learning web app. Can I do the same with PySpark MLlib? If yes, then how? I have used Heroku for my Python machine learning apps.
@ravikirantuduru1061 · 4 years ago
Can you share a project template for a PySpark project, to submit a job to a cluster?
@aspait · 4 years ago
We can also use the rename column option
@AzarudeenShahul · 4 years ago
The withColumnRenamed option will not remove the ambiguity issue.. please try and let me know
@SujeetKumarMehta-kk7kw · 1 year ago
Thank you very much .....
@manojkalyan94 · 4 years ago
Bro, can you make a video on unit testing?
@AzarudeenShahul · 4 years ago
Sure bro.. you can expect one soon
@ayushmittal3948 · 4 years ago
Can't we rename the column with this code?

for i in df_cols:
    if i in lst:
        i = i + "new"
    lst.append(i)

It will check if the column already exists and, if it does, append "new" to it. As simple as that. Indirectly you are just counting the occurrences and then appending; instead of that, we can do the above.

o/p: ['name', 'product', 'address', 'mob', 'namenew']
@ppriya8150 · 4 years ago
Hi sir, could you please explain the same in Spark Scala 🙏
@AzarudeenShahul · 4 years ago
Sure, will try to make it in Spark Scala as well
@ppriya8150 · 4 years ago
@AzarudeenShahul Thank you, Sir
@dippusingh3204 · 4 years ago
val df = spark.read.option("multiline", "true").json("input/input1.json")
//df.show(false)
//df.printSchema()
val df0 = df.select("*", "Delivery.*").drop("Delivery")
df0.show(false)
var list = df0.schema.map(_.name).toList
for(i
@ppriya8150 · 3 years ago
@dippusingh3204 Thank you so much
@subramanyams3742 · 4 years ago
Creating our own schema does not help, does it?
@ppriya8150 · 4 years ago
Sir, could you please explain the same thing in Spark Scala in the next video