53. approx_count_distinct(), avg(), collect_list(), collect_set(), countDistinct(), count()

  Views: 9,735

WafaStudies

Days ago

Comments: 11
@TimelessTrailsbySravani-r6s a month ago
Your videos are helpful for learning PySpark.
@JjCSJ a year ago
Nice explanations in all your PySpark videos. You have helped a lot of people and should be proud of yourself for that. Please continue this good work.
@Archanaishan a year ago
Hi! Your videos are helping me so much. Now I am able to learn PySpark very easily. Thank you so much.
@WafaStudies a year ago
Thank you 😊
@vadderamu5422 a year ago
Awesome explanation, sir ❤
@Dilshad-z4k 5 months ago
What is the difference between approx_count_distinct() and countDistinct()?
@markzohan7835 a year ago
What is the difference between approx_count_distinct() and countDistinct(), since both are giving the same output? Also, please tell us which one performs better.
@ANILKUMARNAGAR a year ago
In PySpark, approx_count_distinct() and countDistinct() both count the number of distinct values in a DataFrame column, but they work differently.

countDistinct() returns the exact number of distinct values. To do that, Spark has to keep track of every distinct value it sees, which means more memory and shuffling. It is fully accurate but can be slow on large datasets.

approx_count_distinct() returns an estimate based on the HyperLogLog++ algorithm. It still reads every row, but instead of remembering each value it keeps only a small, fixed-size summary (a sketch). This usually makes it faster and much lighter on memory for large datasets, at the cost of some accuracy. The optional rsd argument sets the maximum allowed relative standard deviation of the estimate (default 0.05).

If both functions return exactly the same result, the number of distinct values in the column is probably small; at low cardinalities the estimate is often exact. When the number of distinct values is very large, approx_count_distinct() may return a value somewhat higher or lower than the true count. If you need an exact figure, use countDistinct().
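To make the HyperLogLog idea in this answer more concrete, here is a minimal pure-Python sketch of a basic HyperLogLog estimator. It is only an illustration, not Spark's implementation (Spark uses the more refined HyperLogLog++ variant); the function name `hll_estimate`, the register count parameter `p`, and the use of SHA-1 as the hash are all assumptions made for this example. It shows that every value is hashed exactly once, while memory stays fixed at 2**p small counters.

```python
import hashlib
import math


def hll_estimate(values, p=12):
    """Estimate the number of distinct values with a basic HyperLogLog sketch.

    Hypothetical helper for illustration; uses 2**p registers.
    """
    m = 1 << p
    registers = [0] * m
    for v in values:  # every value is visited and hashed; nothing is sampled
        digest = hashlib.sha1(str(v).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash
        idx = h >> (64 - p)                        # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)           # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1    # position of the leftmost 1-bit
        registers[idx] = max(registers[idx], rank)

    alpha = 0.7213 / (1 + 1.079 / m)               # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: fall back to linear counting
        return round(m * math.log(m / zeros))
    return round(raw)


# 100,000 rows but only 1,000 distinct values
data = [i % 1000 for i in range(100_000)]
exact = len(set(data))
approx = hll_estimate(data)
print(exact, approx)
```

Duplicates never change the sketch, which is why the estimate depends only on the set of distinct values. In PySpark itself, the corresponding trade-off is controlled through the optional `rsd` argument of `approx_count_distinct()`; a smaller `rsd` asks for a more precise (and more expensive) estimate.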
@polakigowtam183 a year ago
Thank you.
@manu77564 a year ago
Thank you
@user-nt7lg6wz1o a year ago
Thank you...N