Spark Scenario Based Question | Deal with Ambiguous Column in Spark | Using PySpark | LearntoSpark

10,749 views

Azarudeen Shahul

1 day ago

Comments: 33
@Akshaykumar-pu4vi · 2 years ago
In PySpark we can simply apply:

df_final = df.withColumn("Name", df["name0"]).drop("name0", "name4")

In newer versions of PySpark the duplicate columns are displayed with an index by default. Create a new column referencing one of the duplicate columns, then drop those duplicate columns; it should work, I hope. Thank you so much for this playlist, Sir!
@srinivasa1187 · 2 years ago
It did not work for me, using PySpark 3. Could you write the exact syntax, or refer me to the page where you got it? Thanks
@localmartian9047 · 2 years ago
@srinivasa1187 It will not work. Here he is assuming the duplicate columns already have an index concatenated, and then he is just keeping one of them and dropping the rest.
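The idea discussed in this thread can be sketched in plain Python, without Spark: when flattening produces duplicate column names, suffix every duplicate with its positional index (giving e.g. "name0" and "name4"), so one copy can be kept and the rest dropped. The column list below is a made-up example, not from the video.

```python
def index_suffix_duplicates(cols):
    """Append the positional index to any column name that occurs more than once."""
    duplicates = {c for c in cols if cols.count(c) > 1}
    return [c + str(i) if c in duplicates else c for i, c in enumerate(cols)]

cols = ["name", "product", "address", "mob", "name"]
print(index_suffix_duplicates(cols))
# -> ['name0', 'product', 'address', 'mob', 'name4']
```

In PySpark one would presumably apply the renamed list with something like df.toDF(*index_suffix_duplicates(df.columns)) before selecting, since toDF renames columns positionally and so is not affected by the ambiguity.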
@ashutoshrai5342 · 4 years ago
Great work. Keep posting new use cases. You will definitely make it big. Thank you
@sivavalluru3864 · 4 years ago
Nice explanation, keep it up
@bhaskarreddy-wt5rc · 3 years ago
Before unwrapping the inner JSON, we can rename the name column and then unwrap the inner JSON, right?
@Shiva-kz6tn · 4 years ago
Good one.. please post in Scala as well!
@akshayanand6803 · 4 years ago
Wanted to deal with duplicate columns as well... This is nice
@ayushmittal3948 · 4 years ago
Can't we rename the column with this code?

for i in df_cols:
    if i in lst:
        i = i + "new"
    lst.append(i)

It will check if the column already exists and, if it does, append "new" to it. As simple as that. Indirectly you are just counting the occurrences and then appending; instead of that, we can do the above.
@localmartian9047 · 2 years ago
Almost works. To make it foolproof in case there are multiple duplicate columns, keep a counter inside the loop and append that instead of "new".
@ayushmittal3948 · 2 years ago
@localmartian9047 Yes, we can do that as well
@nareshreddy-l3f · 1 year ago
lst = []
x = 1
for i in df2.columns:
    if i in lst:
        i = i + str(x)
        x = x + 1
    lst.append(i)
print(lst)
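A hedged, self-contained variant of the loop above (a plain list stands in for df2.columns), with a per-name counter as suggested earlier in the thread, so the result stays unique even when a name repeats more than twice or a suffixed name already exists:

```python
def dedupe_columns(cols):
    """Return cols with later duplicates renamed name1, name2, ... uniquely."""
    seen = []
    for c in cols:
        if c in seen:
            n = 1
            # Bump the counter until the suffixed name collides with nothing.
            while c + str(n) in seen or c + str(n) in cols:
                n += 1
            c = c + str(n)
        seen.append(c)
    return seen

print(dedupe_columns(["name", "product", "name", "name"]))
# -> ['name', 'product', 'name1', 'name2']
```

With a real DataFrame, the renamed list could then be applied positionally, e.g. df2.toDF(*dedupe_columns(df2.columns)), avoiding the ambiguous-reference problem that per-column renames run into.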
@vuppalanaveenkrishna6070 · 3 years ago
Thanks Azhar... I did this exercise
@AzarudeenShahul · 3 years ago
Great job!
@sushantshekhar9409 · 3 years ago
Hi Azarudeen, when I am converting JSON to a DataFrame, one of the ambiguous columns is getting a null value... what to do in that case?
@ramyagudivaka4944 · 2 years ago
Thanks for your efforts. Amazing work. Could you please put this logic in Spark Scala also?
@bhavitavyashrivastava8600 · 4 years ago
I have made a Python machine learning web app. Can I do the same with PySpark MLlib? If yes, then how? I have used Heroku for my Python machine learning apps.
@ravikirantuduru1061 · 4 years ago
Can you share a project template for a PySpark project, to submit a job to a cluster?
@aspait · 4 years ago
We can also use the rename column option
@AzarudeenShahul · 4 years ago
The withColumnRenamed option will not remove the ambiguity issue.. please try and let me know
@SujeetKumarMehta-kk7kw · 1 year ago
Thank you very much .....
@manojkalyan94 · 4 years ago
Bro, can you make a video on unit testing?
@AzarudeenShahul · 4 years ago
Sure bro.. you can expect one soon
@ayushmittal3948 · 4 years ago
Can't we rename the column with this code?

for i in df_cols:
    if i in lst:
        i = i + "new"
    lst.append(i)

It will check if the column already exists and, if it does, append "new" to it. As simple as that. Indirectly you are just counting the occurrences and then appending; instead of that, we can do the above.

o/p: ['name', 'product', 'address', 'mob', 'namenew']
@ppriya8150 · 4 years ago
Hi sir, could you please explain the same in Spark Scala 🙏
@AzarudeenShahul · 4 years ago
Sure, will try to make it in Spark Scala as well
@ppriya8150 · 4 years ago
@AzarudeenShahul Thank you, Sir
@dippusingh3204 · 4 years ago
val df = spark.read.option("multiline", "true").json("input/input1.json")
//df.show(false)
//df.printSchema()
val df0 = df.select("*", "Delivery.*").drop("Delivery")
df0.show(false)
var list = df0.schema.map(_.name).toList
for(i
@ppriya8150 · 3 years ago
@dippusingh3204 Thank you so much
@subramanyams3742 · 4 years ago
Creating our own schema does not help, does it?
@ppriya8150 · 4 years ago
Sir, could you please explain the same thing in Spark Scala in the next video