03 Combine Two DF

Dataengineering Learning Hub

A day ago

🚀 PYSPARK Challenge - Day 3️⃣
---------------------------------------------
🎯 PROBLEM STATEMENT
---------------------------------------------
Combine Two DF
Write a PySpark program to report the first name, last name, city, and state of each person in the Person DataFrame. If the address of a personId is not present in the Address DataFrame, report null instead.
---------------------------------------------
📝 SCHEMA AND DATA
---------------------------------------------
Difficulty Level: EASY
Input Data:
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Define schema for the 'persons' table
persons_schema = StructType([
    StructField("personId", IntegerType(), True),
    StructField("lastName", StringType(), True),
    StructField("firstName", StringType(), True)
])

# Define schema for the 'addresses' table
addresses_schema = StructType([
    StructField("addressId", IntegerType(), True),
    StructField("personId", IntegerType(), True),
    StructField("city", StringType(), True),
    StructField("state", StringType(), True)
])

# Define data for the 'persons' table: (personId, lastName, firstName)
persons_data = [
    (1, 'Wang', 'Allen'),
    (2, 'Alice', 'Bob')
]

# Define data for the 'addresses' table: (addressId, personId, city, state)
addresses_data = [
    (1, 2, 'New York City', 'New York'),
    (2, 3, 'Leetcode', 'California')
]
📚 Key Concepts (What You'll Learn):
• How to define schemas in PySpark
• Loading sample data into PySpark DataFrames
• Performing a left join to combine two DataFrames
• Handling missing data with null
🔔 Make sure to subscribe and hit the notification bell so you don’t miss any upcoming challenges! 🚀
