This is the only video on YouTube that explains RDD, DataFrame and Dataset with hands-on exercises. Quite good material to get started. Thanks for sharing.
@surajsheshadri 6 years ago
Can we always use Dataset instead of DataFrame, or are there cases where we should prefer DataFrame over Dataset?
@wenyian4845 1 year ago
I like this class
@ThePrashant13feb 7 years ago
Thanks for the video, it was a good learning experience. I got stuck on one part:

    val dsPopulation = populationRDD.toDS
    import org.apache.spark.sql.SQLContext
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._
    val dfPopulation2 = sqlContext.createDataFrame(populationRDD)
    val dsPopulation = dfPopulation2.as[Person]
    dsPopulation.filter((p: Person) => p.age > 40).groupBy(round($"age"/5)* 5)

This works fine up to here. The next step fails:

    dsPopulation.filter((p: Person) => p.age > 40).groupBy(round($"age"/5)* 5).agg(avg($"income"))

It gives the error:

    Type Mismatch
    found   : org.apache.spark.sql.Column
    required: org.apache.spark.sql.TypedColumn[Person,?]

Please help me out.
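
One possible way around the type mismatch in the comment above, offered as a sketch and not as the video author's solution: the error suggests that `groupBy` on a typed Dataset returned a grouped type whose `agg` expects `TypedColumn` values, which appears to be the Spark 1.6 behavior. Converting the filtered Dataset back to an untyped DataFrame with `toDF()` before grouping lets `agg` accept a plain `Column` such as `avg($"income")`. The example below assumes Spark 2.x or later with `SparkSession`. The `name` field of `Person`, the sample rows, and the aliases `ageBucket` and `avgIncome` are made up for illustration; only `age` and `income` come from the original comment.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, round}

// Hypothetical shape of Person: only `age` and `income` appear in the comment.
case class Person(name: String, age: Int, income: Double)

object AgeBucketIncome {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("age-bucket-income")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for populationRDD.
    val dsPopulation = Seq(
      Person("a", 42, 1000.0),
      Person("b", 44, 3000.0),
      Person("c", 61, 5000.0),
      Person("d", 30, 9000.0)
    ).toDS()

    val result = dsPopulation
      .filter((p: Person) => p.age > 40)                 // typed filter, as in the comment
      .toDF()                                            // switch to the untyped DataFrame API
      .groupBy((round($"age" / 5) * 5).as("ageBucket"))  // same bucketing expression as in the comment
      .agg(avg($"income").as("avgIncome"))               // a plain Column is accepted here

    result.show()
    spark.stop()
  }
}
```

On Spark 1.6 the same idea should apply by building on `sqlContext` and `dfPopulation2` from the comment instead of `SparkSession`, for example by filtering with `$"age" > 40` directly on the DataFrame before `groupBy` and `agg`.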