In PySpark we can simply apply df_final = df.withColumn("Name", df["name0"]).drop("name0", "name4"). In newer versions of PySpark the duplicate columns are displayed with an index suffix by default, so we can create a new column referencing any one of the duplicate columns and then drop the duplicates. It should work, I hope. Thank you so much for this playlist, Sir!
@srinivasa1187 2 years ago
It did not work for me using PySpark 3. Could you write the exact syntax, or point me to the page where you found it? Thanks
@localmartian9047 2 years ago
@@srinivasa1187 It will not work. Here he is assuming the duplicate columns already have the index concatenated, and is then just keeping one of them and dropping the rest.
@ashutoshrai5342 4 years ago
Great work. Keep posting new use cases. You will definitely make it big. Thank you
@sivavalluru3864 4 years ago
Nice explanation, keep it up
@bhaskarreddy-wt5rc 3 years ago
Before unwrapping the inner JSON, we can rename the name column and then unwrap the inner JSON, right?
@Shiva-kz6tn 4 years ago
Good one.. please post in Scala as well!
@akshayanand6803 4 years ago
Wanted to deal with duplicate columns as well... This is nice
@ayushmittal3948 4 years ago
Can't we rename the columns with this code?

for i in df_cols:
    if i in lst:
        i = i + "new"
    lst.append(i)

It checks whether the column already exists and, if it does, appends "new" to it. As simple as that. Indirectly you are just counting the occurrences and then appending; instead of that, we can do the above.
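A runnable sketch of that idea, assuming a hypothetical flattened column list with one duplicate ("df_cols" and "lst" are the commenter's variable names; the column values are made up for illustration):

```python
# Hypothetical column list after flattening nested JSON: "name" appears twice.
df_cols = ["name", "product", "address", "mob", "name"]

lst = []
for i in df_cols:
    if i in lst:
        i = i + "new"  # tag the repeated occurrence
    lst.append(i)

print(lst)  # ['name', 'product', 'address', 'mob', 'namenew']
```

Note that with three or more copies of the same name this would produce "namenew" more than once, which is the case the counter-based replies in this thread address.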
@localmartian9047 2 years ago
Almost works. To make it foolproof in case there are multiple duplicate columns, keep a counter inside the loop and append that instead of "new".
@ayushmittal3948 2 years ago
@@localmartian9047 Yes, we can do that as well
@nareshreddy-l3f 1 year ago
lst = []
x = 1
for i in df2.columns:
    if i in lst:
        i = i + str(x)
        x = x + 1
    lst.append(i)
print(lst)
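That counter approach can be wrapped in a small helper. The column list below is a hypothetical example of a flattened JSON with a duplicate "name" column; "df2" itself is not shown here, and applying the result back through toDF is one common way to use such a list:

```python
def dedupe_columns(cols):
    """Append an incrementing counter to any column name seen before."""
    seen = []
    counter = 1
    for name in cols:
        if name in seen:
            name = name + str(counter)
            counter += 1
        seen.append(name)
    return seen

# Hypothetical duplicate column list after flattening:
print(dedupe_columns(["name", "product", "address", "mob", "name"]))
# ['name', 'product', 'address', 'mob', 'name1']

# With a Spark DataFrame, the renamed list could then be applied as:
# df_fixed = df2.toDF(*dedupe_columns(df2.columns))
```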
@vuppalanaveenkrishna6070 3 years ago
Thanks Azhar... I did this exercise
@AzarudeenShahul 3 years ago
Great job!
@sushantshekhar9409 3 years ago
Hi Azarudeen, when I am converting JSON to a data frame, one of the ambiguous columns is getting a null value... what to do in that case?
@ramyagudivaka4944 2 years ago
Thanks for your efforts, amazing work. Could you please put this logic in Spark Scala also?
@bhavitavyashrivastava8600 4 years ago
I have made a Python machine learning web app; can I do the same with PySpark MLlib? If yes, then how? I have used Heroku for my Python ML apps.
@ravikirantuduru1061 4 years ago
Can you share a project template for a PySpark project to submit a job on a cluster?
@aspait 4 years ago
We can also use the rename column option
@AzarudeenShahul 4 years ago
The column rename option will not remove the ambiguity issue.. please try it and let me know
@SujeetKumarMehta-kk7kw 1 year ago
Thank you very much.....
@manojkalyan94 4 years ago
Bro, can you make a video on unit testing?
@AzarudeenShahul 4 years ago
Sure bro.. you can expect one soon
@ayushmittal3948 4 years ago
Can't we rename the columns with this code?

for i in df_cols:
    if i in lst:
        i = i + "new"
    lst.append(i)

It checks whether the column already exists and, if it does, appends "new" to it. As simple as that. Indirectly you are just counting the occurrences and then appending; instead of that, we can do the above.

o/p: ['name', 'product', 'address', 'mob', 'namenew']
@ppriya8150 4 years ago
Hi sir, could you please explain the same in Spark Scala 🙏
@AzarudeenShahul 4 years ago
Sure, will try to make it in Spark Scala as well
@ppriya8150 4 years ago
@@AzarudeenShahul Thank you, Sir
@dippusingh3204 4 years ago
val df = spark.read.option("multiline", "true").json("input/input1.json")
//df.show(false)
//df.printSchema()
val df0 = df.select("*", "Delivery.*").drop("Delivery")
df0.show(false)
var list = df0.schema.map(_.name).toList
for(i
@ppriya8150 3 years ago
@@dippusingh3204 thank you so much
@subramanyams3742 4 years ago
Creating our own schema does not help, does it?
@ppriya8150 4 years ago
Sir, could you please explain the same thing in Spark Scala in the next video