Data Analyst Portfolio Project #2: Python Customer Segmentation & Clustering

  115,317 views

Absent Data

A day ago

This is a data analysis portfolio project in which you perform customer segmentation on a specific group of mall customers. You will use the KMeans unsupervised machine learning algorithm to find the best possible univariate, bivariate, and multivariate clusters. Once these clusters are identified, you can compute summary statistics on them to identify the best marketing group. Book a one-on-one mentorship call: topmate.io/gae...
You can find the dataset here:
absentdata.com...
Get the code here:
github.com/Gae...
#datascience
#python
#dataanalysis
#portfolioproject
#kmeans
Learn Python and practice online with case studies. Dive into the examples, answer the questions, and create your own solutions. Use the affiliate link below to start practicing!
______________________________________________________________________
Check out Data Camp: datacamp.pxf.i...
______________________________________________________________________
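The workflow described above (univariate, bivariate, and multivariate KMeans clusters, then summary statistics per cluster) can be sketched as follows. This is a minimal sketch, not the code from the linked repository: the DataFrame is randomly generated, the column names are assumed to match the mall customer dataset, the new cluster column names are illustrative, and the cluster counts (3, 5, 4) are arbitrary rather than chosen with the elbow method.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the mall customer data (random values, assumed column names)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Gender": rng.choice(["Male", "Female"], size=n),
    "Age": rng.integers(18, 70, size=n),
    "Annual Income (k$)": rng.integers(15, 140, size=n),
    "Spending Score (1-100)": rng.integers(1, 101, size=n),
})

# Univariate: cluster on a single column (KMeans needs 2-D input, hence [[...]])
uni = KMeans(n_clusters=3, n_init=10, random_state=0)
df["Income Cluster"] = uni.fit_predict(df[["Annual Income (k$)"]])

# Bivariate: cluster on income and spending score together
bi = KMeans(n_clusters=5, n_init=10, random_state=0)
df["Spending and Income Cluster"] = bi.fit_predict(
    df[["Annual Income (k$)", "Spending Score (1-100)"]]
)

# Multivariate: one-hot encode Gender, then scale so no feature dominates the distances
features = pd.get_dummies(
    df[["Gender", "Age", "Annual Income (k$)", "Spending Score (1-100)"]],
    drop_first=True,
    dtype=int,
)
scaled = StandardScaler().fit_transform(features)
multi = KMeans(n_clusters=4, n_init=10, random_state=0)
df["Multivariate Cluster"] = multi.fit_predict(scaled)

# Summary statistics per cluster: the basis for choosing a marketing target group
summary = df.groupby("Spending and Income Cluster")[
    ["Age", "Annual Income (k$)", "Spending Score (1-100)"]
].mean()
print(summary.round(1))
```

Scaling matters most in the multivariate step, where Age, income, and the 0/1 gender dummy live on very different ranges.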

Comments: 100
@aditideshpande5566 2 months ago
Thank you so much for an exceptional, well-explained, and clear video. Better than what I learned in my master's degree!
@nezzylearns 1 year ago
This is an exceptional walkthrough, especially how you vividly explain the process of visualizing the data.
@absentdata 1 year ago
Lovely feedback! Thanks. I am glad you enjoyed it.
@janakiyeluripati6368 2 years ago
I followed along until the 32-minute mark, as I am not into ML. I just loved it. I understood the univariate and bivariate analysis. I want more videos like this. Love from India. Stay blessed.
@absentdata 2 years ago
Thank you so much! I am glad you finished the video and understood the exploratory data analysis steps. You also stay blessed!
@slacex 8 months ago
For those who get an error with the following line: df.corr() ------> df.corr(numeric_only=True)
@rishiraj1192 6 months ago
I'm facing a similar issue. How do I resolve it?
@vladthelad7298 6 months ago
@@rishiraj1192 He literally said it in his comment
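Why the fix above works, as a small sketch on a made-up DataFrame (the values are illustrative, not the real dataset): in pandas 2.0 the default of `numeric_only` in `DataFrame.corr` changed to `False`, so a text column such as Gender makes `df.corr()` fail. Passing `numeric_only=True` (available since pandas 1.5) limits the calculation to numeric columns.

```python
import pandas as pd

# Small made-up frame with one text column, like Gender in the mall data
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 21, 20, 23],
    "Annual Income (k$)": [15, 15, 16, 16],
    "Spending Score (1-100)": [39, 81, 6, 77],
})

# df.corr() would try to convert "Gender" to float in pandas 2.x and raise;
# numeric_only=True skips non-numeric columns instead.
corr = df.corr(numeric_only=True)
print(corr.round(2))
```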
@ai.simplified.. 3 years ago
More useful than hours of class, good job 😍
@thanomnoimoh9299 3 years ago
Python is a great way to do analysis, awesome!!! Thank you for the great clip.
@mospher9253 6 months ago
Better than most of the big channels out there. Really good explanation, and the project goes step by step. Can you do another video like this using other types of clustering, like GMM, with a more detailed analysis and conclusions as well? Thank you for the time you put into this video; it was super helpful.
@absentdata 6 months ago
I really appreciate this. Sure, I'll do a more detailed analysis on clustering.
@Maliiik804 3 years ago
I was waiting for this for a long time! Thanks for uploading this great content.
@absentdata 3 years ago
Glad you're enjoying the content!
@Yett1hhh 1 year ago
columns = ['Age', 'Annual Income (k$)', 'Spending Score (1-100)']
for i in columns:
    plt.figure()
    sns.kdeplot(data=df, x=i, fill=True, hue='Gender')
@muskanmodi724 1 year ago
Hello. For the kdeplot at 16:35, when I add hue=df['Gender'], it gives the error: The following variable cannot be assigned with wide-form data: `hue`
@juancamilosanchez4693 1 year ago
You can solve this by passing the column as x=df['Annual Income (k$)'] and then adding the hue; then it works.
@mercyolaleye7502 1 year ago
@@juancamilosanchez4693 Thanks, this helped.
@iwojoseph 1 year ago
@@juancamilosanchez4693 This worked! Thanks!
@StrangeMemes52 1 year ago
@@juancamilosanchez4693 Yeah, thanks, this worked.
@kostantinaorselli1093 1 year ago
@@juancamilosanchez4693 Why does this solve that problem?
@batoolalshareef9456 11 months ago
Thanks a lot, it's a great effort. Keep going, and share more videos like this 👍🌺🌺🌺
@aramisfarias5316 3 years ago
The end felt a little rushed and underwhelming, but overall very instructive. Good job. =)
@javeda 1 year ago
Please also tell us how you implemented code autocompletion in Jupyter Notebook.
@RonaldPostelmans 1 year ago
Thanks for your great explanation.
@awaisanjum9023 1 year ago
Amazing video. Kindly make more portfolio project videos.
@absentdata 1 year ago
Thank you, I will
@rachrach9871 1 year ago
Awesome tutorial! I tried to download the dataset, but I don't know where to begin. There's an option for "raw" and "blame". I'm new to data analytics, so I would appreciate some help. Thank you very much.
@absentdata 1 year ago
You can find the data here: absentdata.com/data-analysis/where-to-find-data/
@rachrach9871 1 year ago
@@absentdata Thank you so much for your quick response! I'm already doing tutorial #1, and I'm hoping to learn as much as I can from your tutorials.
@isaacetungu5215 2 years ago
Worth watching and following along. I completed the video and did my work alongside the code. I needed more help with the multivariate clustering analysis; the last part of the video on it was not well explained. Any recommendations or videos on that, @Absent Data??
@adrianapanjiwijaya1520 1 year ago
Hi, this is very helpful. I do have a question, though: after df=df.drop('Customer ID'), I forgot to comment the line out with a hashtag and continued on. From that point on, the Customer ID column disappeared. But in your case, the Customer ID values reappear during clustering. How did that happen, and how do I get the Customer ID back?
@abrilgonzalez7892 5 days ago
Same question here!
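One way to avoid losing the ID, sketched under assumptions: the column name `CustomerID` and the values are illustrative. Reassigning `df = df.drop(...)` overwrites the only copy in memory, so the usual recovery is to re-run the cell that loads the CSV. Building a separate feature table leaves `df` intact for later joins and reporting. Note also that `df.drop('Customer ID')` on its own looks for a row label; dropping a column needs `columns=` (or `axis=1`).

```python
import pandas as pd

# Illustrative frame; "CustomerID" is an assumed column name
df = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Age": [19, 21, 20],
    "Spending Score (1-100)": [39, 81, 6],
})

# Drop the ID from a separate feature table instead of overwriting df
features = df.drop(columns=["CustomerID"])

print(features.columns.tolist())  # features used for clustering
print(df.columns.tolist())        # original frame still has the ID
```

Cluster labels computed on `features` keep the same row index as `df`, so they can be assigned back with `df["Cluster"] = labels`.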
@emilioprill3373 1 year ago
Learned a lot! Thank you
@absentdata 1 year ago
I'm glad to hear that. Please share it with anyone you think it would help.
@nazmussumon5105 2 years ago
Thank you for this awesome tutorial. Learnt a lot.
@brandonwarfield5611 9 months ago
This is gold!!! I'm upset I'm just finding your channel!!!
@absentdata 9 months ago
I am glad that you found the channel. Share it with anyone you think it will help!
@travelofftradition 1 year ago
Hi! Thank you for this video. I have a question. I want to segment bank customers, but the data is in multiple files, like accounts.csv, customer_details.csv, and transactions.csv. How should I approach segmenting the customers when the data is spread across multiple files? Thanks, Mohit
@absentdata 1 year ago
You will need to merge them into a single dataset.
@travelofftradition 1 year ago
@@absentdata OK, so basically I have to join them using some kind of join, like an inner join? But how is it done when there are 10-20 files? Is there any other way?
@absentdata 1 year ago
@@travelofftradition Append the files that are similar, like all the transactional files, to create a single dataset, and merge it with a single customer details file, which may itself be the result of an append.
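The append-then-merge advice above might look like this in pandas. This is a sketch with hypothetical tables and column names (`customer_id`, `amount`, `age`); with real files, the frames would come from `pd.read_csv` calls over your list of paths. The per-customer aggregation step is an added assumption: clustering usually needs one row per customer, so transactions are summarized before the merge.

```python
import pandas as pd

# Hypothetical transaction files that share the same columns
tx_jan = pd.DataFrame({"customer_id": [1, 2, 1], "amount": [20.0, 35.0, 10.0]})
tx_feb = pd.DataFrame({"customer_id": [2, 3], "amount": [15.0, 50.0]})
customers = pd.DataFrame({"customer_id": [1, 2, 3], "age": [34, 45, 29]})

# 1. Append (stack) files with the same schema into one dataset
transactions = pd.concat([tx_jan, tx_feb], ignore_index=True)

# 2. Summarize to one row per customer before joining
per_customer = (
    transactions.groupby("customer_id")["amount"]
    .agg(total_spent="sum", n_transactions="count")
    .reset_index()
)

# 3. Merge onto the customer table; how="left" keeps customers without transactions
segment_input = customers.merge(per_customer, on="customer_id", how="left")
print(segment_input)
```

With 10-20 files, the same `pd.concat` call accepts a list of any length, so only one merge per distinct table type is needed.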
@nadil3230 1 year ago
'list' object has no attribute 'mean'. How do I fix this error?
@faa_z 1 year ago
Amazing video, thank you a lot. I only have one question: at 21:52 you said that, from the graph, it seems like there are more females than males. How did you know? Is it because of the median?
@absentdata 1 year ago
The value_counts function counts the number of males and females to give the actual numbers.
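For reference, a minimal `value_counts` example on a made-up Gender column: the plain call returns counts, and `normalize=True` returns proportions, which answers "are there more females than males?" directly instead of reading it off a chart.

```python
import pandas as pd

# Made-up Gender column for illustration
df = pd.DataFrame({"Gender": ["Female", "Male", "Female", "Female", "Male"]})

counts = df["Gender"].value_counts()                # absolute counts per category
shares = df["Gender"].value_counts(normalize=True)  # proportions that sum to 1
print(counts)
print(shares)
```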
@forecaststatistics8496 3 years ago
Good job!
@anooppainuly5271 2 years ago
Loved it
@ericametta6964 9 months ago
Insightful
@absentdata 9 months ago
Glad you found it insightful
@nadil3230 1 year ago
Why is the y-axis density, and can we change it to some other parameter?
@absentdata 1 year ago
Yes, you can change the variables on the x- and y-axes. You can also use PCA techniques to display the data.
@grhagandanap9912 2 years ago
Thanks for the practice. But I ran into a problem when executing the n_clusters sensitivity analysis at 41:13. Do you know what the problem is?
@44.kieutrang61 4 months ago
Me too 😢
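The comments do not say which error appeared, so this is not a diagnosis. It is a minimal, self-contained version of an n_clusters sensitivity (elbow) loop to compare against, assuming a recent scikit-learn. The data are three synthetic blobs standing in for (income, spending score); `n_init` is set explicitly because its default changed across scikit-learn versions and older releases warn about it.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three synthetic, well-separated blobs standing in for (income, spending score)
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=center, scale=3.0, size=(50, 2))
    for center in [(20, 20), (60, 80), (100, 30)]
])

# Fit KMeans for k = 1..10 and record the within-cluster sum of squares
inertia_scores = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(X)
    inertia_scores.append(km.inertia_)

for k, score in zip(range(1, 11), inertia_scores):
    print(k, round(score, 1))
```

Plotting `inertia_scores` against k shows the "elbow": the point after which adding clusters stops reducing inertia much (k = 3 for these blobs).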
@nayeem9358 3 months ago
What is the spending score?
@absentdata 3 months ago
It is the score (out of 100) given to a customer by the mall authorities, based on the money spent and the behavior of the customer.
@TvsCar30 1 year ago
The following variable cannot be assigned with wide-form data: `hue`. Can someone help me?
@sencxx6368 11 months ago
sns.kdeplot(data=df, x='Annual Income (k$)', shade=True, hue='Gender')
@parhatbazakov1091 1 year ago
Hi, I am new to data. Can anyone answer my question, please? If the correlation matrix showed the strongest correlation with Age (-0.33) and no correlation with Annual Income (0.0099), would it be better to cluster by age?
@absentdata 1 year ago
Low correlation doesn't necessarily mean low similarity. Clustering can still be useful to identify patterns even with low correlation. It depends on the goals of the analysis.
@parhatbazakov1091 1 year ago
@@absentdata Thanks!
@alejandrosalgadolima3745 1 year ago
Hi, great video. I can't understand why hue is not working on my computer. Could you please help me?
@absentdata 1 year ago
What's your issue?
@promise-abasi 1 year ago
@@absentdata Hi, thank you so much for the video. I also have a problem with the hue: I can't get past 'ValueError: The following variable cannot be assigned with wide-form data: `hue`' at the 17-minute mark. How do I solve this, please? Thank you.
@yusufbas035 2 years ago
Thank you
@absentdata 2 years ago
You're welcome
@harryfeng4199 3 years ago
Thnk uuuu
@mahmoudemad8507 1 year ago
I get an error saying that fit_transform must get 2 arguments.
@absentdata 1 year ago
Try posting your code so we can see what's happening.
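No code was posted, so this shows one plausible cause rather than a confirmed one: calling `fit_transform` on the class instead of an instance. `StandardScaler.fit_transform(X)` binds `X` to `self` and then reports a missing positional argument. Creating the scaler first (note the parentheses) fixes it. The array values are made up.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up (income, spending score) rows
X = np.array([[15.0, 39.0], [16.0, 81.0], [17.0, 6.0], [18.0, 77.0]])

# Wrong: StandardScaler.fit_transform(X)  -> missing argument error, X became `self`
# Right: instantiate, then call the method on the instance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```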
@im4485 1 year ago
Hi, is K-means reliable in high dimensions?
@absentdata 1 year ago
I would say no. I would do some PCA to reduce some of your dimensions.
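The PCA suggestion can be sketched as a scikit-learn pipeline: scale, project onto the leading principal components, then run KMeans on the reduced data. This is an illustration under assumptions: the 20-feature table is synthetic (driven by three hidden factors plus noise), and keeping 90% of the variance and 4 clusters are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 20-feature table driven by 3 hidden factors plus a little noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 3))
weights = rng.normal(size=(3, 20))
X = latent @ weights + 0.1 * rng.normal(size=(300, 20))

pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.9, svd_solver="full"),  # keep components covering 90% of variance
    KMeans(n_clusters=4, n_init=10, random_state=0),
)
labels = pipe.fit_predict(X)
print(pipe.named_steps["pca"].n_components_, np.bincount(labels))
```

Clustering in the reduced space keeps distances meaningful, since most of the 20 dimensions here carry redundant information.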
@ai.simplified.. 3 years ago
15:00 practical & useful
@absentdata 3 years ago
Yes, loops are your friends. They save tons of time :)
@tejkumar9018 7 months ago
----> 3 plt.figure()
TypeError: 'module' object is not callable
Please help, it can't execute because of this error.
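The comment shows only the failing line, so the cause here is an assumption. A frequent one is importing the top-level package (`import matplotlib as plt`) instead of `matplotlib.pyplot`. In a notebook where seaborn or pyplot has already loaded the `matplotlib.figure` submodule, `plt.figure` then refers to that submodule, and calling it raises this TypeError. The conventional import avoids it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

# Import the pyplot interface, not the top-level package
import matplotlib.pyplot as plt

fig = plt.figure()
print(type(fig).__name__)
```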
@KarthiKeyan-ci2yj 3 years ago
I would like to learn data analytics. Can I get your contact details for more information?
@absentdata 3 years ago
www.linkedin.com/in/gaelimholland
@adishreepatra7330 1 year ago
Hi, I loved your content! If possible, please share the source code of this project.
@absentdata 1 year ago
I added it in the description
@EpicSharjeel 3 months ago
Everything changes in 4 years, every bit of syntax.
@sikkandarbasha-p8o 7 months ago
Can I put this project on my resume?
@absentdata 7 months ago
Of course you can!
@zubairsultanate5660 1 year ago
zub salute
@hrsh3329 2 years ago
👍🏽👍🏽👍🏽
@vishnua5028 1 year ago
How do I download the dataset?
@absentdata 1 year ago
Check the description 😊
@vishnua5028 1 year ago
@@absentdata I can't see any download option on GitHub.
@slacex 8 months ago
df.groupby('Gender')['Age', 'Annual Income (k$)', 'Spending Score (1-100)'] ---> cannot subset columns with a tuple with more than one element. Use a list instead.
@absentdata 8 months ago
Is that your whole code? There is no aggregation function in your groupby. Also, you are selecting more than one column, so it should be df.groupby('category')[['A','B']].mean()
@slacex 8 months ago
@@absentdata I have just resolved it ----> df.groupby(['Gender'])[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']].mean() (at 30:41)
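A runnable version of the fix in this thread, on a made-up DataFrame: `df.groupby('Gender')['Age', 'Annual Income (k$)', ...]` passes the column names as a tuple, which pandas 2.0 rejects with the error quoted above. Wrapping them in a list (double brackets) selects several columns, and an aggregation such as `.mean()` produces the summary table.

```python
import pandas as pd

# Made-up rows with the mall dataset's column names
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 21, 20, 23],
    "Annual Income (k$)": [15, 15, 16, 16],
    "Spending Score (1-100)": [39, 81, 6, 77],
})

cols = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]
# A list selects several columns; a bare comma-separated selection becomes a tuple
means = df.groupby("Gender")[cols].mean()
print(means)
```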
@mn4769 6 months ago
sns.kdeplot(df['Annual Income (k$)'],shade =True,hue = df['Gender']); Here I get ValueError: The following variable cannot be assigned with wide-form data: `hue`. Can someone explain?
@arindambhunia9862 1 month ago
sns.kdeplot(x=df['Annual Income (k$)'],shade=True,hue=df['Gender']); Write the code this way and it will be resolved. I also had the same issue. Good luck.
@forzahorizon4eliminator206 7 months ago
And you got yourself a subscriber.
@absentdata 7 months ago
Welcome to the family! I am happy to earn your subscription.
@yashjajoria 1 year ago
sns.kdeplot(df['Annual Income (k$)'],shade = True,hue= df['Gender']); - ValueError: The following variable cannot be assigned with wide-form data: `hue`
@absentdata 11 months ago
The updated version of sns.kdeplot may require your data to be in long form, so you need to melt the DataFrame like this:
melted_df = df.melt(id_vars='Gender', value_vars=['Annual Income (k$)'])
sns.kdeplot(data=melted_df, x='value', hue='Gender', fill=True)
@yashjajoria 11 months ago
Thanks for the response, sir. I'm your student @@absentdata