Step By Step Process In EDA And Feature Engineering In Data Science Projects

Views: 147,702

Krish Naik

Comments: 79
@HumairaMaqbool-t2l 1 year ago
Exploratory Data Analysis (EDA) and Feature Engineering are two essential steps in data science projects that help in understanding the data, extracting valuable insights, and preparing the data for model building and analysis.

Exploratory Data Analysis (EDA):
EDA is the initial and crucial phase of any data science project. It involves exploring and summarizing the main characteristics of the dataset to gain insights into its structure, patterns, and relationships between variables. The main objectives of EDA are as follows:
Data Cleaning: Identifying and handling missing or erroneous data points, dealing with outliers, and removing duplicates.
Descriptive Statistics: Calculating basic statistical measures such as mean, median, standard deviation, and percentiles to understand the central tendencies and dispersion of the data.
Data Visualization: Creating visual representations like histograms, scatter plots, box plots, and heatmaps to visualize the distribution and relationships between variables.
Correlation Analysis: Assessing the correlation between different features to understand their interdependencies and potential multicollinearity.
Hypothesis Testing: Conducting statistical tests to validate assumptions and make data-driven decisions.
EDA helps data scientists to identify patterns, trends, and potential issues within the dataset. It provides a foundation for further analysis and model building.

Feature Engineering:
Feature engineering involves transforming the raw data into meaningful features that can be used as inputs for machine learning algorithms. The quality and relevance of features play a significant role in the performance of a predictive model. The key steps in feature engineering are as follows:
Feature Selection: Choosing the most relevant features that have a significant impact on the target variable while disregarding irrelevant or redundant ones. This step helps in reducing dimensionality and enhancing model efficiency.
Feature Transformation: Applying mathematical or statistical transformations to the features to make the data suitable for modeling. Common transformations include scaling, normalization, and log transformations.
Handling Categorical Variables: Converting categorical variables into numerical representations using techniques like one-hot encoding or label encoding to make them usable by machine learning algorithms.
Creating Interaction Features: Introducing new features based on interactions between existing features can help capture non-linear relationships.
Handling Missing Data: Dealing with missing data by imputing or removing missing values, depending on the nature of the dataset.
Feature Extraction: Generating new features from the existing data using domain knowledge or advanced techniques like principal component analysis (PCA) or t-distributed stochastic neighbor embedding (t-SNE).
Effective feature engineering can significantly improve the performance of machine learning models by providing them with more relevant and informative inputs, leading to more accurate predictions and better insights.

In summary, Exploratory Data Analysis (EDA) helps in understanding the data, identifying patterns, and making data-driven decisions. Feature engineering transforms the data into useful features, enabling machine learning models to learn from the data and make predictions effectively. Together, these two steps are fundamental for successful data science projects.
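The EDA objectives listed in the comment above (cleaning checks, descriptive statistics, correlation) can be sketched in a few lines of pandas. This is only a minimal illustration: the dataset and its column names (`age`, `income`, `city`) are made up for the example and do not come from the video.

```python
import numpy as np
import pandas as pd

# Small made-up dataset; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25, 32, 47, np.nan, 51, 32],
    "income": [30000, 42000, 80000, 52000, np.nan, 42000],
    "city": ["Pune", "Delhi", "Delhi", "Pune", None, "Delhi"],
})

# Data cleaning checks: missing values per column and fully duplicated rows.
missing_per_column = df.isna().sum()
duplicate_rows = int(df.duplicated().sum())

# Descriptive statistics (count, mean, std, percentiles) for numerical columns.
summary = df.describe()

# Correlation analysis between the numerical features.
corr = df[["age", "income"]].corr()

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print(summary)
print(corr)
```

Plots such as histograms or box plots would normally follow these checks; they are left out here to keep the sketch free of plotting dependencies.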
@Mothernature-x 7 months ago
Thank you so much
@salehabdullahi9356 7 months ago
Thank you for providing this meaningful description.
@percy8177 3 years ago
💪🤣Facial expression is serious when he said he goes with Box Plots to find the outliers. Gotta love the passion bro.
@Yeyppe 3 years ago
Krish Sir You Know Your Channel Is Not Only A YouTube Channel ... It Is Everything For Us ! Having A Mentor And Teacher Like You Is A Blessing
@write2ruby 2 years ago
1. Feature Engineering (Takes 30% of Project Time)
   a) EDA
      i) Analyze how many numerical features are present using histogram, pdf with seaborn, matplotlib.
      ii) Analyze how many categorical features are present. Are multiple categories present for each feature?
      iii) Missing Values (Visualize all these graphs)
      iv) Outliers - Boxplot
      v) Cleaning
   b) Handling the Missing Values
      i) Mean/Median/Mode
   c) Handling Imbalanced dataset
   d) Treating the Outliers
   e) Scaling down the data - Standardization, Normalization
   f) Converting the categorical features into numerical features
2. Feature Selection
   a) Correlation
   b) KNeighbors
   c) ChiSquare
   d) Genetic Algorithm
   e) Feature Importance - Extra Tree Classifiers
3. Model Creation
4. Hyperparameter Tuning
5. Model Deployment
6. Incremental Learning
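Step 2(e) of the outline above, feature importance from an Extra Trees classifier, could look roughly like the following with scikit-learn. The data is synthetic (generated with `make_classification`), and the sizes and `random_state` values are arbitrary choices for this sketch, not settings from the video.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic data: 6 features, of which only 2 carry signal (assumed setup).
X, y = make_classification(
    n_samples=200,
    n_features=6,
    n_informative=2,
    n_redundant=0,
    random_state=0,
)

# Fit the ensemble and read the impurity-based importances.
model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X, y)
importances = model.feature_importances_

# Rank feature indices from most to least important.
ranked = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)
for index, score in ranked:
    print(f"feature {index}: {score:.3f}")
```

The importances sum to 1, so they are best read as relative scores; a low score suggests, but does not prove, that a feature can be dropped.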
@harithavalmiki9390 2 years ago
Thank you so much!
@Saaii1234 2 years ago
Thank you
@chalmerilexus2072 1 year ago
Thanks. You saved my 5 minutes.
@SidIndian082 1 year ago
thnx a lot Ma'am🙏🙏
@himanshujharwal2512 6 months ago
Thanks, really appreciate it
@kasturibalaji9177 3 years ago
Hi Krishna sir, I got a new job in the data science domain at a product-based company in Chennai. Your videos helped me a lot; before this I was working in a different domain. Best Regards, Balaji
@krishnaik06 3 years ago
Congratulations
@abdulqudusbalogun8057 3 years ago
I have been watching your videos non stop for weeks now, by God, you are my favorite tutor...God bless
@rajeshseemakurthi1595 3 months ago
Top priority for Aspiring Data Scientists like me
@vaishnavi4354 3 years ago
The induction session from the MLDL course is awesome... that's 🔥🔥🔥
@awais2451985 2 years ago
A lot of love and appreciation from Pakistan for your great effort.
@kanikabagree1084 3 years ago
This guy deserves a million subs 🌸❤️
@chaitanyasinghal1098 3 months ago
I am from the future and he has a million subs
@nanda9395 2 years ago
This is clear info about F.E. and E.D.A. 🙏🙏
@ashmitasharma5879 6 months ago
Thank you so much for helping us this way ....🎉🎉🎉🎉 Thank you so much sir. You are a very knowledgeable and helpful person 🎉🎉🎉🎉🎉
@1234560pratik 3 years ago
You know exactly what I need, sir, but how?? You read what is on our minds, you are all-knowing, a great sage; in fact I say you are not just a man, you are a great man 🤩😍😍❤❤❤
@Samtoosoon 25 days ago
Step 1: look at the numerical features, the categorical features, and the missing values (visualise them), find outliers with a box plot, then clean. Step 2: handle missing values with the mean, remove outliers using the box plot / IQR, handle the imbalanced dataset, treat the outliers, scale the data (standardisation and normalisation), and convert categorical features to numerical features.
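The "box plot IQR remove" step mentioned above is the usual 1.5 × IQR rule that a box plot draws as its whiskers. A minimal sketch, assuming a single numerical column with made-up values:

```python
import pandas as pd

# Made-up values; 95 is the obvious outlier.
values = pd.Series([12, 14, 15, 15, 16, 18, 19, 95], name="value")

# Quartiles and interquartile range.
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Anything outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] is flagged as an outlier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
is_outlier = (values < lower) | (values > upper)

# Keep only the rows inside the fences.
cleaned = values[~is_outlier]
print(cleaned.tolist())
```

Dropping is only one way to treat outliers; capping values at the fences is a common alternative when the rows themselves should be kept.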
@rajpatil2442 3 years ago
Sir, one more video on EDA with all the steps and an implementation on a dataset
@akashmanojchoudhary3290 3 years ago
Can we have a video on a real time project with all the necessary steps krish??
@bhargavikoti4208 3 years ago
Thank you..much needed 🙂
@techandtalks6224 2 years ago
sir please teach us ml and dl also...ur teaching way is very good
@harishgehlot__ 3 years ago
Sir, one video on the steps for model training
@ukamakaazode 1 year ago
Thank you Krish!!!!!!!
@arjunsonar6907 3 years ago
Thanks Krish for the video. I am about to start my first ever project as an intern and this helped me in a very deep way. Thank you 🙂. If you could give me any suggestions, that would be very helpful for me.
@equbalmustafa 2 years ago
Plz let us know your experience after 3 months of internship
@kawishdaniyal3640 3 years ago
Great Work sir jii ! 👌👌👌👌
@dalecioustalk9964 2 years ago
Very helpful channel😁
@ShahnawazKhan-xl6ij 3 years ago
Very important step
@apnapython 3 years ago
Thank you…great video
@ankitachaudhari99 3 years ago
Thank you for this video sir
@pritishpattnaik4674 1 year ago
great video sir
@sadiasultana667 3 years ago
please make a project on sign language recognition
@SMHasan9 2 years ago
Thank you, sir.
@hsd287 1 year ago
Tx a lot u did awesome 🥰❤️
@nazmulshohan8807 3 years ago
Sir, need a video on feature extraction with an example.
@AbhishekSherawat 2 years ago
Is data cleaning a part of feature engineering?
@mehrozalam94 3 years ago
Great sir
@surajshukla4910 1 year ago
that expression and sound at 4:30..🤣🤣
@saimanohar3363 3 years ago
Great list of videos for EDA. In case we have more categorical variables and fewer numerical variables, should we work with the CHAID algorithm after EDA? Please suggest. Thanks
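CHAID builds its splits from chi-square tests of independence between categorical variables, so one way to explore mostly categorical data after EDA is to run that test directly. The sketch below uses `scipy.stats.chi2_contingency` on a made-up table; the column names `plan` and `churned` are hypothetical, and this shows only the underlying test, not a CHAID tree (which, as far as I know, scikit-learn does not provide).

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Made-up categorical data; both column names are illustrative.
df = pd.DataFrame({
    "plan": ["basic", "basic", "premium", "premium",
             "basic", "premium", "basic", "premium"],
    "churned": ["yes", "yes", "no", "no", "yes", "no", "no", "yes"],
})

# Contingency table of observed counts.
table = pd.crosstab(df["plan"], df["churned"])

# Chi-square test of independence between the two categorical variables.
chi2, p_value, dof, expected = chi2_contingency(table)
print(table)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}")
```

With counts this small the test has little power; the example is only meant to show the mechanics.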
@GamerBoy-ii4jc 3 years ago
Are all of the things you show in the video available in your feature playlist, with complete guidance?
@krishnaik06 3 years ago
yes sir
@islamickids19 3 years ago
@krishnaik06 I need your help
@shaelanderchauhan1963 3 years ago
in some cases data collection is first
@joeljoseph26 11 months ago
One doubt: can we scale categorical labels even before encoding? Is that possible?
@anuragpandey6760 3 years ago
which pen tablet are you using
@prabhatale1135 3 years ago
great video
@TheKumarAshwin 5 months ago
Do EDA and FE serve the same purpose?
@gurpindersinghmuttar 2 years ago
I have a grade column which contains a mix of percentage and CGPA values. How do I convert all the data into percentages? A sample code will be helpful.
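One possible sketch for the mixed grade column, under two loudly stated assumptions that the question does not confirm: CGPA is on a 10-point scale (so any value at or below 10 is treated as CGPA), and CGPA converts to a percentage by multiplying by 9.5, which is one common convention. Replace `CGPA_TO_PERCENT` and the threshold with whatever rule the institution actually uses.

```python
import pandas as pd

# Assumed conversion factor (percentage = CGPA * 9.5); not from the question.
CGPA_TO_PERCENT = 9.5

# Made-up column mixing percentages and CGPA values in different spellings.
grades = pd.Series(["78%", "8.2", "65", "9.0 CGPA", "91.5"], name="grade")

# Pull the first number out of each entry and convert it to float.
numeric = grades.str.extract(r"(\d+(?:\.\d+)?)")[0].astype(float)

# Heuristic (assumption): values at or below 10 are CGPA on a 10-point scale.
is_cgpa = numeric <= 10

# Keep percentages as they are, convert only the CGPA entries.
percentage = numeric.where(~is_cgpa, numeric * CGPA_TO_PERCENT)
print(percentage.tolist())
```

The threshold heuristic fails for genuine percentages of 10 or less, so a separate column that records the original unit is safer when one is available.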
@harshj84 3 years ago
@krish Naik, I have been following your channel from the early days. I have a question: how do we use the information extracted from EDA? E.g. by plotting a CDF graph, I can say that 70% of people are below the age of 50. But the question is, where is this information used in the project?
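To make the CDF example concrete, here is a small sketch that computes the empirical CDF for a made-up `ages` array and then shows one possible use of the finding: turning the observed cut-off into a binary feature. Whether such a feature helps depends on the model and the target, so treat it as an illustration rather than a recommendation.

```python
import numpy as np

# Made-up ages; 7 of the 10 values are below 50.
ages = np.array([22, 25, 31, 38, 41, 45, 47, 52, 60, 67])

# Empirical CDF: fraction of observations at or below each sorted value.
sorted_ages = np.sort(ages)
ecdf = np.arange(1, len(sorted_ages) + 1) / len(sorted_ages)

# The statement "70% of people are below 50" read straight from the data.
share_below_50 = np.mean(ages < 50)

# One possible use: a binary feature built from the observed cut-off.
is_under_50 = (ages < 50).astype(int)

print(share_below_50)
print(is_under_50.tolist())
```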
@salehjamali6716 1 year ago
u r awesome
@priyanshusain2533 2 years ago
SIR CAN YOU SHOW THIS BY USING AN EXAMPLE STEP BY STEP
@rudrashankhanandy7938 1 year ago
"udush channel" - 0:02😂
@ajaykushwaha4233 2 years ago
Guys, I have a doubt, can anyone help? For scaling data: we have numerical columns, and categorical columns that are encoded into numerical ones. Does scaling need to be done only on the numerical columns, or on the encoded columns as well?
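A common pattern for this question is to scale only the original numerical columns and leave the one-hot encoded columns as 0/1, which scikit-learn's `ColumnTransformer` expresses directly. The sketch below assumes a made-up frame with columns `age`, `income`, and `city`; it shows one reasonable setup, not the only valid one (some distance-based models are also run with the encoded columns scaled).

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up data; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [30000, 42000, 80000, 52000],
    "city": ["Pune", "Delhi", "Delhi", "Mumbai"],
})

# Scale the numerical columns, one-hot encode the categorical one.
# sparse_threshold=0 forces a dense NumPy array as output.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["age", "income"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0,
)

features = preprocess.fit_transform(df)
print(features.shape)  # 2 scaled columns + 3 one-hot columns
```

This also covers the related question of scaling labels before encoding: raw string categories cannot be scaled, so encoding has to come first.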
@MaheshWaranpr 3 years ago
How to handle missing values in NLP data such as reviews and feedback, which are not categorical features?
@thepresistence5935 3 years ago
just drop
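The "just drop" suggestion, and one alternative to it, can be sketched with pandas as follows. The `review` and `rating` columns are made up; whitespace-only reviews are treated as missing here, which is an assumption of this sketch.

```python
import pandas as pd

# Made-up review data with one missing and one whitespace-only review.
reviews = pd.DataFrame({
    "review": ["Great product", None, "  ", "Too slow"],
    "rating": [5, 3, 4, 2],
})

# Normalise: strip whitespace and turn empty strings into missing values.
text = reviews["review"].str.strip()
reviews["review"] = text.mask(text == "")

# Option 1: drop rows that have no review text.
dropped = reviews.dropna(subset=["review"])

# Option 2: keep the rows and use an empty string, so a vectoriser
# simply sees a document with no tokens.
filled = reviews.assign(review=reviews["review"].fillna(""))

print(len(dropped), len(filled))
```

Dropping is simplest when few rows are affected; keeping the rows matters when the other columns, such as the rating, still carry useful signal.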
@BIPLAVKANT 2 years ago
Explaining theory is easier than doing the practical along with the theory
@sathya.r3148 7 months ago
❤❤
@hrideshkumar7228 3 years ago
Sir, are data structures and algorithms used in data science?
@SanjeevKumar-nc2rt 3 years ago
kzbin.info/www/bejne/hHWWeYt5aZuthZY This video of Krish will answer your question.
@yashrajsinghrawat 3 years ago
Sir, but before doing EDA we can also split the data first, so that the test data is completely isolated and has no idea about the training data. Then we can perform EDA on the training data and afterwards transform the test data. Is this a good practice? Or do we perform EDA on the complete data?
@ASAPKep 3 years ago
In theory you can create the training/test split at any point of the "pipeline". Generally you are sampling data points based on some distribution, or at random, and labelling those records as training/testing. That being said, you want the same transformations applied to the training and testing data so you can apply one inverse function to revert these transformations. For example, if you are using a MinMax scaler and you fit it separately on each part after splitting, then the inverse to undo the normalization will be different for each, since the min/max of each dataset is different. So ideally you apply feature engineering on the dataset as a whole before splitting.
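One commonly used alternative that also gives a single inverse is to split first, fit the scaler on the training part only, and reuse that same fitted scaler for the test part. This keeps the test data from influencing the learned min/max. The sketch below uses scikit-learn's `MinMaxScaler` on a made-up one-column array; it illustrates that option and is not a description of what the video does.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Made-up single-feature data.
X = np.arange(20, dtype=float).reshape(-1, 1)
X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)

# Learn min/max from the training split only.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Reuse the same min/max for the test split (no second fit).
X_test_scaled = scaler.transform(X_test)

# One fitted scaler means one inverse for both splits.
X_train_back = scaler.inverse_transform(X_train_scaled)
X_test_back = scaler.inverse_transform(X_test_scaled)

print(np.allclose(X_train_back, X_train), np.allclose(X_test_back, X_test))
```

Test values can land slightly outside the 0 to 1 range with this approach, which is expected because the range comes from the training data alone.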
@vaibhavdubey2474 3 years ago
Can you make a detailed video on hyperparameter tuning?
@remrem6681 3 years ago
He did, I think so
@yashmishra1024 3 years ago
The telegram link is broken
@Ojjas26 3 years ago
But should missing values be handled before or after splitting the dataset into train and test data?
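A frequently recommended answer is to split first and learn the fill value from the training data only, then apply it to the test data. A minimal sketch with scikit-learn's `SimpleImputer`, using made-up arrays that stand in for an already split dataset:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up, already split data with missing entries in both parts.
X_train = np.array([[1.0], [2.0], [np.nan], [9.0]])
X_test = np.array([[np.nan], [4.0]])

# Learn the median from the training split only (median of 1, 2, 9).
imputer = SimpleImputer(strategy="median")
X_train_filled = imputer.fit_transform(X_train)

# Fill the test split with the training median, not its own.
X_test_filled = imputer.transform(X_test)

print(X_train_filled.ravel(), X_test_filled.ravel())
```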
@kancharlasrimannarayana7068 1 year ago
Sir, for numerical data columns that have a large number of zeros, do we have to replace the zeros with the mean or median? Should we consider those zeros as missing values? My dataset is a time series with spends vs. sales columns at the week level. I saw that the spends column for one channel has too many zeros. What should I do in this case?
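Whether a zero is a missing value depends on what it means in the data: a week with no spend on a channel is a real zero, while a zero written in place of an unrecorded value is not. The sketch below, on a made-up weekly frame with hypothetical columns `tv_spend` and `sales`, first measures how common zeros are, then keeps them and adds an "active" flag, and finally shows the replace-with-NaN route that only applies if zeros are known to mean "not recorded".

```python
import numpy as np
import pandas as pd

# Made-up weekly data; the column names are illustrative only.
weekly = pd.DataFrame({
    "week": [1, 2, 3, 4, 5, 6],
    "tv_spend": [0.0, 1200.0, 0.0, 0.0, 900.0, 0.0],
    "sales": [5000.0, 7400.0, 5100.0, 4900.0, 6800.0, 5200.0],
})

# How much of the column is zero?
zero_share = (weekly["tv_spend"] == 0).mean()

# If zeros are genuine "no spend" weeks: keep them and add a flag.
weekly["tv_active"] = (weekly["tv_spend"] > 0).astype(int)

# Only if zeros are known to mean "not recorded": treat them as missing.
spend_as_missing = weekly["tv_spend"].replace(0.0, np.nan)
median_when_active = spend_as_missing.median()

print(zero_share, median_when_active)
```

Filling genuine zero-spend weeks with a mean or median would invent spend that never happened, so the flag approach is usually the safer starting point for spend data.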
@gauravsawant5482 3 years ago
Sir, I am doing an integrated MSc in data science (BSc+MSc) in Goa. In the 5th semester they will teach us machine learning, so should I do MLDL from iNeuron? And can you suggest a course which will be a plus point for my career?
@mukeshkund4465 3 years ago
Go for that MLDL course from iNeuron... You will gain vast knowledge
@gauravsawant5482 3 years ago
@mukeshkund4465 and I have one more question: should I take MLDL from iNeuron or should I do it from the playlist which sir uploaded?
@shansingh9858 3 years ago
If u are planning for a job in AI or ML, then go for the AppliedAI course.. if u are learning for your knowledge, u can consider Krish sir's playlist or courses from iNeuron..