Data validation between source and target table | PySpark Interview Question

  3,626 views

GeekCoders

Days ago

Comments: 11
@beingnagur 4 months ago
At 6:04, instead of copying the same statement again, you can use .otherwise("not matching").
@rishabhkesarwani-br2rx 7 months ago
I use the following steps to compare a source table with a target table:
1) The row counts should match in the source and target tables.
2) The schemas should match in the source and target tables.
3) Use except to check whether any records are present in the source but not in the target, or vice versa.
4) Use a left anti join to find the records that don't match.
5) Debug why the records don't match.
@GeekCoders 7 months ago
Nice
@gudiatoka 7 months ago
exceptAll can be useful too, or an anti join.
@GeekCoders 7 months ago
exceptAll may sometimes miss null values.
@CeejayPTcoach 3 months ago
Won't the join be a costly operation?
@jhonsen9842 7 months ago
The main problem I found while learning PySpark is the brackets; I get some error every time.
@GeekCoders 7 months ago
Yes
@nishirajnikku969 7 months ago
Please create a playlist on PySpark unit testing.
@shivamchandan50 7 months ago
Please make a video on PySpark unit testing.
@VinodKumar-gz8bk 3 months ago
What was the most challenging thing you faced in your project, and how did you overcome it?