The practice dataset and SQL statements for this video tutorial are available here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@woolfel · 2 years ago
One important thing to ask is "what causes duplicate records in the database?" In large applications where data comes from multiple sources, "de-duping" before insert is a real problem. It's good to know how to delete duplicates, but in some cases that's not a viable option. For example, if the table has 100 million rows, deleting duplicates could be quite expensive and shouldn't be run during business hours. When I ask SQL questions in interviews, I'm not looking for "how do I use group by for de-duping." I want to see the candidate thinking about the larger problem and taking time to understand business needs. Cleaning up dupes after the fact doesn't scale, and candidates who ask "what is causing dupes and what's the impact" are the developers I want. A developer who only knows how to group by, but never bothers to ask "why is this happening and what is the root problem," isn't someone I will hire.
@LearnatKnowstar · 2 years ago
Love your thoughts! Thank you for sharing this with us 👍
@redguard128 · 2 years ago
Depends on what you are interviewing for. Asking a simple developer about business problems doesn't make much sense. If you want to hire a tech lead, team lead, project manager, architect or a consultant, then yes.
@woolfel · 2 years ago
@@redguard128 if I'm interviewing an entry level developer, I still ask the question for a few reasons. The first one is to expose the candidate to important issues they will eventually have to deal with. The second is to emphasize that SQL isn't just for programming's sake, it's to manage data and solve functional needs of the application. If applications aren't checking for dupes before inserting data, your database is going to become a pile of garbage very quickly.
@redguard128 · 2 years ago
@@woolfel For me a developer is an executive role. He/she has to do what I tell him/her to do. The "Why"s and "How"s aren't their concern. Sometimes the business decides that duplicates are what they want, so a developer who prematurely solves a problem actually does more damage than good. Some businesses run on circular logic, repeating themselves, defining settings everywhere, modifying global variables in functions, running multiple databases with duplicate data, having too low spec or too high spec servers, etc.
@woolfel · 2 years ago
@@redguard128 that's one way to do it. I work in the consulting world and growing our developers is very important to me. The faster the developer learns, the less hand holding I need to do and it makes the entire team more productive. I've worked in the fortune 500 world long enough to know that hiring a bunch of low level developers who can't grow causes more problems. In the healthcare and finance sectors, duplicate data causes huge data integrity issues. I would say 90% of the ETL work in fortune 500 deals with dirty data. When issues happen it's because of dirty data (bad references, missing data and dupes). Many of our customers waste 3-12 months dealing with dirty data every year. A developer who isn't thinking about these issues and constantly growing will become obsolete. I have seen fulltime employees (aka not consultants) work this way. I question whether that is a good thing to teach people. Who wants to stay a low level engineer forever and be someone else's slave? Who wants to work at a job where the tech lead treats them like a pair of fingers?
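Editor's sketch for the point raised in this thread about checking for dupes before insert. One common approach is to let the database enforce uniqueness so duplicates never land in the table. The example below uses Python's built-in sqlite3 module; the table, column names and sample rows are hypothetical, and `INSERT OR IGNORE` is SQLite syntax (other engines have their own equivalents).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the UNIQUE constraint defines what counts as a duplicate.
conn.execute("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        last_name   TEXT NOT NULL,
        UNIQUE (first_name, last_name)
    )
""")

incoming = [("Adam", "Owens"), ("Mark", "Wills"), ("Adam", "Owens")]
# INSERT OR IGNORE skips any row that would violate the UNIQUE constraint.
conn.executemany(
    "INSERT OR IGNORE INTO employee (first_name, last_name) VALUES (?, ?)",
    incoming,
)
stored = conn.execute(
    "SELECT first_name, last_name FROM employee ORDER BY employee_id"
).fetchall()
print(stored)
```

Whether a name pair is really the right uniqueness rule is a business question, which is the point being argued above.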
@gayathriarunachalam8346 · 17 days ago
Great one
@LearnatKnowstar · 16 days ago
Thanks, I hope you found it helpful.
@niteshsingh4377 · 2 years ago
please upload dataset as well to follow along.
@LearnatKnowstar · 1 year ago
The practice dataset and SQL statements are now available and you can access them here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@whitecrowuk575 · 2 years ago
Best is to use row_number() with partition by to create a sequence column on duplicates - applicable to all kinds of duplicates (identical rows especially)
@MahaLakshmi_vlogs · 2 years ago
Yes
@KS211000 · 2 years ago
True
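Editor's sketch of the "sequence column" idea from this thread, using Python's sqlite3 module (window functions need SQLite 3.25 or newer). The `emp` table and its rows are hypothetical and deliberately have no key column, so the duplicates are fully identical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with no key column: the duplicate rows are identical.
conn.execute("CREATE TABLE emp (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("Adam", "Owens"), ("Adam", "Owens"), ("Adam", "Owens"), ("Mark", "Wills")],
)

# ROW_NUMBER() restarts at 1 for each (first_name, last_name) group,
# so every copy after the first gets a sequence number greater than 1.
numbered = conn.execute("""
    SELECT first_name, last_name,
           ROW_NUMBER() OVER (
               PARTITION BY first_name, last_name
               ORDER BY first_name
           ) AS rn
    FROM emp
    ORDER BY first_name, last_name, rn
""").fetchall()
for row in numbered:
    print(row)
```

Rows with `rn > 1` are the candidates for deletion; how to delete them depends on the engine.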
@LearnatKnowstar · 4 years ago
Have you been asked a SQL query interview question that you couldn't answer? Let us know in the comments below and we will answer those in our upcoming videos!
@sachingaware6822 · 3 years ago
In my last interview the interviewer asked me what the difference is between count(*) and count(1), and which one is better performance-wise, so can you please make a video on this?
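Editor's sketch for the count(*) vs count(1) question, run against SQLite through Python's sqlite3 module with a hypothetical table. Both forms count rows; only COUNT(column) skips NULLs. On performance, I would expect no meaningful difference on mainstream engines, but that is an assumption worth checking against the execution plan of the engine in question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, email TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com")],
)

# COUNT(*) and COUNT(1) both count every row;
# COUNT(email) counts only rows where email is not NULL.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(email) FROM emp"
).fetchone()
print(counts)
```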
@hellocartoons1735 · 2 years ago
Rollback; before commit;
@akhildas3743 · 3 years ago
Can we use dense rank also?
@surajpatil3579 · 3 years ago
Thanks for making this interview questions series.. cleared my major doubts
@LearnatKnowstar · 3 years ago
Glad to hear that! Thank you for your support.
@pankajbhatt7373 · 2 years ago
So deleting data within a CTE will delete data in the table too, how? Does this happen with derived tables and views also?
@lucas1365 · 2 years ago
Same question
@danieljust295 · 3 years ago
The question is how to delete duplicates from table, not how to display duplicates or how to display unique - two different things. The correct answer is DELETE FROM WHERE max() … , even though this is inefficient. It’s about how to ingest data into table without duplicates.
@Datapassenger_prashant · 2 years ago
It is also about how much time the query takes to execute
@danieljust295 · 2 years ago
prashant verma Right, but execution time is a secondary problem; first is the functionality, then optimization.
@Datapassenger_prashant · 2 years ago
@@danieljust295 absolutely right, if we just consider the test question then yes your approach was the simplest..
@danieljust295 · 2 years ago
prashant verma I'd also add that “deleting duplicates” is an ambiguous phrase. Does it mean completely removing duplicated rows from the table, or leaving one unique row when there are multiple rows with the same values (duplicates)?
@Datapassenger_prashant · 2 years ago
@@danieljust295 that's a question we should ask before proceeding with the query. However, in an interview we don't get the chance. So here we can assume.. remove the duplicate from the table. As she said in her video we can keep the latest one
@gauraosayasikamal4683 · 3 years ago
Good way of teaching
@LearnatKnowstar · 3 years ago
Thank you
@kristyowens2284 · 4 years ago
Please post more SQL queries.
@LearnatKnowstar · 4 years ago
Will be posting more soon.
@ouramazingnature · 2 years ago
Hello there, can you please do a video on how to add a new large dataset (millions of rows) to an existing table without deleting the data in the table, provided that column names and data types are the same in the old table and the newly added data. Thanks
@NoonAndKnight · 2 years ago
May not be the best method, but create a loop to insert records in batches, 20k rows at a time. Make sure you set the loop to end once finished.
@Beast70 · 2 years ago
Best method: drop indexes, create a loop based on the optimal batch size (typically 20k to 500k) using bulk insert, and once complete re-apply the indexes.
@LearnatKnowstar · 2 years ago
Very good approaches have been mentioned in the comments. Thank you
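Editor's sketch of the batching approach described in this thread: drop the index, insert in fixed-size batches with one commit per batch, then re-create the index. It uses Python's sqlite3 module; the `target` table, the index name, the row generator and the 20,000 batch size are all hypothetical, and the right batch size on a real server needs measuring.

```python
import sqlite3
from itertools import islice

def batches(rows, size):
    """Yield lists of at most `size` rows; stops when the source is exhausted."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, val TEXT)")
conn.execute("CREATE INDEX idx_target_val ON target (val)")
conn.execute("INSERT INTO target VALUES (0, 'existing')")  # existing data is kept
conn.commit()

new_rows = ((i, f"row-{i}") for i in range(1, 50_001))  # 50,000 hypothetical rows

conn.execute("DROP INDEX idx_target_val")      # avoid index maintenance per row
batch_count = 0
for chunk in batches(new_rows, 20_000):
    conn.executemany("INSERT INTO target VALUES (?, ?)", chunk)
    conn.commit()                              # one transaction per batch
    batch_count += 1
conn.execute("CREATE INDEX idx_target_val ON target (val)")  # rebuild once at the end

total = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(batch_count, total)
```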
@arshadmohammed1090 · 2 years ago
In the last approach you mentioned, we are deleting the values from the cte right ? Not from the main table ?
@LearnatKnowstar · 2 years ago
Deleting from CTE will delete it from the underlying table.
@arshadmohammed1090 · 2 years ago
@@LearnatKnowstar got it. I wasn't aware of this fact about cte back then.
@Analystmind · 2 years ago
You are an amazing teacher
@LearnatKnowstar · 2 years ago
Thank you 🙏
@visaalakshiselvaraj5572 · 3 years ago
Wonderful 😍 thanks a lot 🙏
@kayk1388 · 2 years ago
Deleting from CTE deletes data from the source table????????? How?
@hnaidu.pro21 · 8 months ago
Yes, you can try the same.
WITH CTE_dup AS (
    SELECT EmpID, FirstName, LastName,
           ROW_NUMBER() OVER (PARTITION BY FirstName, LastName ORDER BY EmpID) AS rownum
    FROM [dbo].[tblDuplicate]
)
DELETE FROM CTE_dup WHERE rownum > 1;
SELECT * FROM [dbo].[tblDuplicate];
@omkarkhandekar7430 · 3 years ago
Excellent Mam😍
@LearnatKnowstar · 3 years ago
Thank you
@m00050 · 1 year ago
Very helpful. I am totally new. Learning SQL. Question, what platform is this where u are running sql queries?
@LearnatKnowstar · 1 year ago
This is SQL Server. You can download the software for free from the Microsoft website.
@harikrishnasai6104 · 2 years ago
We can use row_number also, right? While we are deleting the duplicates in the cte
@LearnatKnowstar · 2 years ago
Yes
@javeedakramshaik7052 · 4 years ago
I appreciate your work mam and the videos are well explained
@LearnatKnowstar · 4 years ago
Thank you
@TechnoSparkBigData · 9 months ago
How is deleting the records from the CTE deleting the rows from the main table?
@skdonsingh · 3 years ago
From rank function it's better .. thanks
@akhildas3743 · 3 years ago
For finding the duplicate values we can use 'having count > 1'. Wouldn't it be possible to use a delete here, with the having clause in a subquery?
@LearnatKnowstar · 3 years ago
It will delete all occurrences of the duplicate records. The method explained retains one occurrence of the duplicate records.
@akhildas3743 · 3 years ago
@@LearnatKnowstar thank u
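Editor's sketch of the difference described in the reply above, using Python's sqlite3 module with a hypothetical `employee` table. Variant 1 deletes through `HAVING COUNT(*) > 1` and removes every copy; variant 2 keeps the row with the highest id per name. The row-value `IN` syntax used in variant 1 is an assumption about the engine (SQLite 3.15 or newer accepts it; some engines do not).

```python
import sqlite3

def make_db():
    """Build a small in-memory table with two duplicated names (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee ("
        "employee_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [(1, "Adam", "Owens"), (2, "Adam", "Owens"), (3, "Mark", "Wills"),
         (4, "Natasha", "Lee"), (5, "Natasha", "Lee")],
    )
    return conn

def remaining_ids(conn):
    query = "SELECT employee_id FROM employee ORDER BY employee_id"
    return [row[0] for row in conn.execute(query)]

# Variant 1: HAVING COUNT(*) > 1 matches every row of a duplicated group,
# so all copies are deleted, not just the extras.
db1 = make_db()
db1.execute("""
    DELETE FROM employee
    WHERE (first_name, last_name) IN (
        SELECT first_name, last_name
        FROM employee
        GROUP BY first_name, last_name
        HAVING COUNT(*) > 1
    )
""")
all_copies_deleted = remaining_ids(db1)

# Variant 2: keep the row with the highest id in each group, delete the rest.
db2 = make_db()
db2.execute("""
    DELETE FROM employee
    WHERE employee_id NOT IN (
        SELECT MAX(employee_id)
        FROM employee
        GROUP BY first_name, last_name
    )
""")
one_copy_kept = remaining_ids(db2)

print(all_copies_deleted, one_copy_kept)
```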
@shivakumarj4080 · 2 years ago
Hello, This video is very helpful. But I have one question, The maximum employee ID is the duplicate one right? But you are deleting min of employee ID. Could you please clarify that?
@LearnatKnowstar · 2 years ago
yes, you can delete max of employee id considering it as a duplicate. It was just assumed in the example that we want to retain the max employee id.
@VinodSharma-z3s · 1 year ago
Really appreciate your effort.. If possible please add the table script as well, it will be helpful for beginners. Thank you!
@LearnatKnowstar · 1 year ago
Thank you. We have started adding the table scripts in our latest videos!
@LearnatKnowstar · 1 year ago
The practice dataset and SQL statements are now available here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@praldandagi9574 · 2 years ago
Video is visible but explanation is superb
@parsuramkumar2826 · 2 years ago
Thanks for the video, can you pls tell me how the datasets used here can be found or accessed ?
@LearnatKnowstar · 1 year ago
The practice datasets are now available and you can access them here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@spedeveryday9699 · 2 years ago
Very Helpful. Thank you
@LearnatKnowstar · 2 years ago
Thank you
@marcuslee2026 · 2 years ago
Excellent. Thank you!
@LearnatKnowstar · 2 years ago
Glad it was helpful!
@kebincui · 3 years ago
very good; thanks
@LearnatKnowstar · 3 years ago
Thank you
@sonyguptaagrawal6028 · 1 year ago
please provide practice database ... that you have used in this video
@LearnatKnowstar · 1 year ago
The practice dataset and SQL statements are available here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@sonyguptaagrawal6028 · 1 year ago
@@LearnatKnowstar thank you
@rajasekharm9049 · 2 years ago
To delete duplicates we can just use distinct * from emp; that way we can remove duplicates from the entire table
@narendrachowdary991 · 2 years ago
Good knowledge Bro
@buttercup9884 · 2 years ago
Not really. In the example these records were not duplicates, as they differed in EmployeeID; distinct would work only if all fields were the same. There are other ways of getting rid of duplicates, such as using rowids, something like this (using the example where duplicates existed on first name, last name, phone and email):
DELETE FROM EMPLOYEE1
WHERE ROWID NOT IN (
    SELECT MIN(ROWID)
    FROM EMPLOYEE1
    GROUP BY FIRSTNAME, LASTNAME, PHONE, EMAIL
);
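Editor's sketch of the rowid approach from the comment above, run on SQLite through Python's sqlite3 module. SQLite's implicit `rowid` stands in for Oracle's ROWID here, which is an assumption: the two are different mechanisms that happen to support the same query shape. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No key column: the first two rows are identical in every field.
conn.execute(
    "CREATE TABLE employee1 (firstname TEXT, lastname TEXT, phone TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO employee1 VALUES (?, ?, ?, ?)",
    [("Adam", "Owens", "555-0100", "adam@example.com"),
     ("Adam", "Owens", "555-0100", "adam@example.com"),
     ("Mark", "Wills", "555-0101", "mark@example.com")],
)

# Keep the physically first row of each group and delete every other copy.
conn.execute("""
    DELETE FROM employee1
    WHERE rowid NOT IN (
        SELECT MIN(rowid)
        FROM employee1
        GROUP BY firstname, lastname, phone, email
    )
""")
remaining = conn.execute(
    "SELECT firstname, lastname FROM employee1 ORDER BY rowid"
).fetchall()
print(remaining)
```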
@sweetysweetyvghb · 4 years ago
Mam...instead of employee id, can we use rowid here and order by rowid ...because in many tables practically a column like emp id won't be present.
@LearnatKnowstar · 4 years ago
You will need to choose a key column that identifies a unique record in the table
@sweetysweetyvghb · 4 years ago
@@LearnatKnowstar can I not use rowid ?
@LearnatKnowstar · 4 years ago
Rowid would be unique for each row in the table. Each duplicate row will have its own rowid and hence rowid can not be used. You need to identify a key column that represents a unique record to business
@narayanagottipati5980 · 4 years ago
Hi madam can you please make a video on how to recover accidentally deleted data from the table. Thanks in advance
@LearnatKnowstar · 4 years ago
That's a great question. Will definitely plan a video soon.
@sarmasvali6147 · 2 years ago
We can use roll back command, I think, if we delete the data accidentally. But this command can't be executed for DDL commands
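Editor's sketch of the rollback idea from this thread, using Python's sqlite3 module with explicit transaction control and a hypothetical table. It only helps while the transaction is still open: once the delete is committed, rollback no longer applies and recovery means restoring from a backup. Whether DDL can be rolled back varies by engine, so the DDL remark above should be checked against the specific database.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode,
# so the transaction below is controlled explicitly with BEGIN / ROLLBACK.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO employee (name) VALUES (?)", [("Adam",), ("Mark",)])

def row_count():
    return conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

conn.execute("BEGIN")
conn.execute("DELETE FROM employee")   # the accidental delete (no WHERE clause)
during = row_count()                   # rows are gone inside the transaction
conn.execute("ROLLBACK")               # undo everything since BEGIN
after = row_count()                    # rows are back

print(during, after)
```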
@sorbasishshaw5161 · 2 years ago
with Employee_CTE as (
    Select *, RANK() over (partition by FirstName, Lastname order by EmployeeID desc) as Rank
    from Employee
);
delete from Employee_CTE where Rank > 1;
Whenever I type this block of code in my Oracle db (11g), I get the error ORA-00923: FROM keyword not found where expected. Can somebody please help me with this?
@LearnatKnowstar · 2 years ago
You do not need to terminate the CTE with a semi colon.
@geraldsegun3297 · 2 years ago
When I try to duplicate your example on MySQL, I get the error "Error Code: 1288. The target table employye_cte of the DELETE is not updatable". This is the query I am trying to run:
with employye_cte as (
    select firstname, lastname, employeenumber,
           row_number() over (partition by lastname order by employeenumber) as rownumber
    from employees1
)
delete from employye_cte where rownumber = '2'
What am I missing? Thanks
@ivanbesando556 · 2 years ago
This is also my problem. I found that MySQL couldn't delete from a subquery. Did you find a solution for this?
@geraldsegun3297 · 2 years ago
@@ivanbesando556 not so far yet
@ishitvasingh9902 · 1 year ago
Hi, I have a query, please reply with what is wrong in it. My employ table contains id, name, sal, email.
with dup_emp as (
    select *, dense_rank() over (partition by email order by id desc) as dens_rnk
    from employ e
)
delete from dup_emp where dens_rnk > 1
Now this code is showing this error: SQL Error [42P01]: ERROR: relation "dup_emp" does not exist Position: 127. I am selecting everything, including the cte, and then executing the query.
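Editor's sketch for the MySQL and PostgreSQL errors reported above. Deleting through a CTE is what the video does on SQL Server; the errors quoted in these comments show that other engines reject it. A more portable shape is to delete from the base table and use the numbered rows only to pick keys. The example runs on SQLite via Python's sqlite3 module (SQLite 3.25 or newer); the table contents are hypothetical, and I would expect the same statement shape to work on MySQL 8 and PostgreSQL, though that is an assumption to verify. Note it uses `rownumber > 1` rather than `= 2`, so a third or fourth copy is removed as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees1 ("
    "employeenumber INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT)"
)
conn.executemany(
    "INSERT INTO employees1 VALUES (?, ?, ?)",
    [(1, "Adam", "Owens"), (2, "Adam", "Owens"),
     (3, "Mark", "Wills"), (4, "Adam", "Owens")],
)

# Delete from the real table; the derived table only supplies the keys
# of every row numbered 2 or higher within its duplicate group.
conn.execute("""
    DELETE FROM employees1
    WHERE employeenumber IN (
        SELECT employeenumber
        FROM (
            SELECT employeenumber,
                   ROW_NUMBER() OVER (
                       PARTITION BY firstname, lastname
                       ORDER BY employeenumber
                   ) AS rownumber
            FROM employees1
        ) AS numbered
        WHERE rownumber > 1
    )
""")
kept = [row[0] for row in conn.execute(
    "SELECT employeenumber FROM employees1 ORDER BY employeenumber"
)]
print(kept)
```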
@garlaamar8973 · 3 years ago
Thank you mam for sharing this.🙏
@LearnatKnowstar · 3 years ago
Thank you
@souravmoha2224 · 2 years ago
Hi, could you please share the sample data. Thanks
@LearnatKnowstar · 1 year ago
The practice dataset and SQL statements are now available and you can access them here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@shashidharr6595 · 2 years ago
Nice 👍helpful
@LearnatKnowstar · 2 years ago
Thank you
@EternalEvanesce · 1 year ago
What is the difference between roll_number and the rank function? Can both be used interchangeably??
@paulwerdak5888 · 2 months ago
Do you mean Row_Number instead of roll_number?
@ersuresh4488 · 1 year ago
As per my understanding, if we have >2 duplicate records in a table then rank() and dense_rank() will not work here; in this case we have to use row_number() only!!!
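Editor's sketch comparing the three functions discussed above, using Python's sqlite3 module (SQLite 3.25 or newer) and a hypothetical keyless table. As far as I can tell, the deciding factor is not how many duplicates exist but whether the ORDER BY inside OVER can tell the copies apart: with a unique column such as EmployeeID all three produce the same numbers, while on ties RANK and DENSE_RANK repeat the same value and only ROW_NUMBER keeps counting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Three fully identical rows, so any ORDER BY inside the window is a tie.
conn.execute("CREATE TABLE emp (first_name TEXT, last_name TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [("Adam", "Owens")] * 3)

window = "OVER (PARTITION BY first_name, last_name ORDER BY first_name)"
ranks = conn.execute(f"""
    SELECT ROW_NUMBER() {window} AS rn,
           RANK()       {window} AS rnk,
           DENSE_RANK() {window} AS drnk
    FROM emp
    ORDER BY rn
""").fetchall()
for row in ranks:
    print(row)
```

With these rows a filter like `rnk > 1` would match nothing, which is why ROW_NUMBER is the safer choice when the copies are identical.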
@ipsita1238 · 3 years ago
Thanks
@LearnatKnowstar · 3 years ago
Thank you
@ShubhamRai06 · 3 years ago
What if there are more columns and the rows differ only in timestamp or user, but have duplicate values in the key columns? In that case how can we delete duplicates using row_number...
@LearnatKnowstar · 3 years ago
You just need to use key columns in partition by clause
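Editor's sketch of the answer above: partition by the key column only, order by the timestamp descending, and delete everything numbered above 1 so the latest row per key survives. It uses Python's sqlite3 module (SQLite 3.25 or newer); the `orders` table, its columns and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id         INTEGER PRIMARY KEY,
        order_key  TEXT,   -- the business key that defines a duplicate
        updated_by TEXT,   -- differs between copies
        updated_at TEXT    -- ISO timestamps sort correctly as text
    )
""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "K1", "alice", "2023-01-01 09:00:00"),
     (2, "K1", "bob",   "2023-02-01 09:00:00"),
     (3, "K2", "alice", "2023-01-15 09:00:00")],
)

# Only order_key goes in PARTITION BY; the newest row per key gets rn = 1.
conn.execute("""
    DELETE FROM orders
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY order_key
                       ORDER BY updated_at DESC
                   ) AS rn
            FROM orders
        ) AS numbered
        WHERE rn > 1
    )
""")
latest = conn.execute(
    "SELECT order_key, updated_by FROM orders ORDER BY order_key"
).fetchall()
print(latest)
```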
@benarjiyt994 · 2 years ago
@@LearnatKnowstar 0
@tiagosilva856 · 2 years ago
You are a blessing in my life 😘
@GopiVardhan · 1 year ago
Where is the dataset to download
@shitalgavasane8699 · 9 months ago
Output of the below query should be -
select firstname, lastname, count(*) from employee group by firstname, lastname
Output:
firstname  lastname  count(*)
Adam       ownes     2
Mark       wills     1
natasha    lee       2
ruley      jones     1
@naveenreddy2059 · 1 year ago
Your style is nice, sister
@asutoshnayak1391 · 3 years ago
If the data set is bigger, suppose 10000 rows, then how would you remove duplicates from it without seeing which ones are duplicates? Please tell me.
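Editor's sketch for the question above: there is no need to eyeball the rows, because GROUP BY with HAVING COUNT(*) > 1 lists every duplicated combination regardless of table size. The example uses Python's sqlite3 module; the table and names are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "employee_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO employee (first_name, last_name) VALUES (?, ?)",
    [("Adam", "Owens"), ("Mark", "Wills"), ("Natasha", "Lee"),
     ("Adam", "Owens"), ("Riley", "Jones"), ("Natasha", "Lee")],
)

# One output row per duplicated name, with the number of copies found.
dupes = conn.execute("""
    SELECT first_name, last_name, COUNT(*) AS copies
    FROM employee
    GROUP BY first_name, last_name
    HAVING COUNT(*) > 1
    ORDER BY first_name, last_name
""").fetchall()
print(dupes)
```

The same grouping columns can then drive whichever delete technique fits the engine.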
@abiodun.alawal8533 · 1 year ago
Does anyone know how I can get access to the data sources used in this video?
@LearnatKnowstar · 1 year ago
The practice dataset and SQL statements are now available and you can access them here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@aakashvaish3757 · 2 years ago
Can you please share the DDL of all the questions mentioned ?
@LearnatKnowstar · 1 year ago
The DDLs are available here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@harshitchoudhary4693 · 3 years ago
Where can I get these practice tables?
@LearnatKnowstar · 3 years ago
You can practice with tables in the Microsoft Adventure Works database.
@LearnatKnowstar · 1 year ago
The practice dataset and SQL statements are now available and you can access them here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@jaydeeppatidar4189 · 2 years ago
Question:- You have created a CTE and deleted records from the CTE, then how did the data get deleted from the original table? Will wait for an answer from anyone.
@LearnatKnowstar · 2 years ago
This is a feature of CTE. If you delete from CTE , it will delete from the underlying table 👍
@jaydeeppatidar4189 · 2 years ago
@@LearnatKnowstar thank you!
@unknownnaanu · 2 years ago
Select rank() over (partition by firstname order by employeeid) as employeenumber,
       firstname, lastname, phone, email
from employee;
Please correct me if I am wrong.
@sakesh404 · 2 years ago
Dear Mam, please zoom, the writing is not readable. 🙏🏼
@LearnatKnowstar · 2 years ago
Sure. Noted. In the latest videos, the font is enlarged.
@shashidharr6595 · 2 years ago
👍👍👍
@LearnatKnowstar · 2 years ago
Thank you
@Sayon____bhattacharjee · 4 years ago
DEAR MADAM, PLEASE HELP.
WITH NEW_TABLE AS (
    SELECT ID, F_N, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS RANK
    FROM EMPLOYEES
)
DELETE FROM NEW_TABLE WHERE RANK > 1;
ERROR MSG: ORA-00928: missing SELECT keyword
00928. 00000 - "missing SELECT keyword"
*Cause:
*Action:
Error at Line: 6 Column: 1
@sarmasvali6147 · 2 years ago
You have used row num in beginning and rank command at the end. How it will execute. Use either rownum or rank.
@vikastiwari6564 · 3 years ago
I appreciate your way of explaining. I just loved it... I would say thank you, but I want you to make a video on common expiration which are very important in SQL. One more question for you is this. Suppose there are two tables, named table A and table B.
Table A:
ID | STUDENT NAME
1  | A
2  | B
3  | C
4  |
Table B:
ID | SUBJECT | MARKS
2  | ENGLISH | 40
4  | ENGLISH | 60
5  | MATHS   | 100
6  | SCIENCE | 80
Find out the student name who got the max mark. I have been asked this question. Please solve it here so other people can also get to know. Thank you very much in advance Mam.... I will keep on waiting for the answer to the above question.
@LearnatKnowstar · 2 years ago
Thank you. Please see the below video- It has a similar query to the one in your comment. kzbin.info/www/bejne/kGSXiWSXYt-Cr8k
@kayk1388 · 2 years ago
What is in table A?
@hanam6138 · 2 years ago
I hope my answer is right; if it's wrong please let me know, thanks. Select A.student name from A inner join B on A.id = B.id where max(b.marks)
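Editor's sketch for the "student with the max mark" question. An aggregate such as MAX() cannot be used directly as a WHERE condition; one common fix is to compare each mark against a subquery that computes the maximum. The example uses Python's sqlite3 module with hypothetical tables and rows, not the exact tables from the question (whose layout is hard to recover from the comment).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for "table A" and "table B".
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, student_name TEXT)")
conn.execute("CREATE TABLE marks (id INTEGER, subject TEXT, marks INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
conn.executemany(
    "INSERT INTO marks VALUES (?, ?, ?)",
    [(1, "ENGLISH", 40), (2, "ENGLISH", 60), (3, "MATHS", 100)],
)

# The subquery finds the highest mark; the outer query returns whoever has it.
top = conn.execute("""
    SELECT s.student_name, m.marks
    FROM students AS s
    JOIN marks AS m ON s.id = m.id
    WHERE m.marks = (SELECT MAX(marks) FROM marks)
""").fetchall()
print(top)
```

If the marks table can contain ids with no matching student, the subquery would need the same join so that the maximum is taken over matched rows only.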
@jaitiwari241 · 2 years ago
Where is the sql code to practice
@LearnatKnowstar · 1 year ago
The practice dataset and SQL statements are now available and you can access them here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@brbr1414 · 3 years ago
Doesn't work, table not updateable
@vishnuvamshi4645 · 2 years ago
PUT QUERY IN DESCRIPTION
@LearnatKnowstar · 1 year ago
The practice dataset and SQL statements are now available and you can access them here - know-star.blogspot.com/2023/04/sql-query-how-to-delete-duplicates-from.html
@ajay5028 · 3 years ago
Video is not clear... your voice is clear... Not able to see the words
@ashokreddy3366 · 2 years ago
Rownum is best instead of these
@geraldmedrano8594 · 2 years ago
My answer: Google
@LearnatKnowstar · 2 years ago
Google might lead you here 👍
@shantanu6940 · 2 years ago
waste of my time
@velo1337 · 2 years ago
delete from dbo.employee
where employeeid not in (
    select max(employeeid)
    from dbo.employee
    group by firstname, lastname
)
@chaitanyakumar6790 · 2 years ago
Hi, actually after running the query
SELECT *, (RANK() OVER (PARTITION BY Firstname, Lastname ORDER BY EmployeeId asc)) AS Rank1
FROM dbo.Employee1
I am unable to get the data in the order of EmployeeId; it comes back in alphabetical order of Firstname. Can you please let me know the issue?
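Editor's sketch for the ordering question above. The ORDER BY inside OVER (...) only controls how ranks are assigned within each partition; it does not set the order of the result rows. To get rows back by EmployeeId, add an ORDER BY at the end of the statement. The example runs on SQLite via Python's sqlite3 module with hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee1 (EmployeeId INTEGER PRIMARY KEY, Firstname TEXT, Lastname TEXT)"
)
conn.executemany(
    "INSERT INTO Employee1 VALUES (?, ?, ?)",
    [(1, "Mark", "Wills"), (2, "Adam", "Owens"), (3, "Adam", "Owens")],
)

ordered = conn.execute("""
    SELECT EmployeeId, Firstname, Lastname,
           RANK() OVER (
               PARTITION BY Firstname, Lastname
               ORDER BY EmployeeId ASC      -- decides the rank only
           ) AS Rank1
    FROM Employee1
    ORDER BY EmployeeId                     -- decides the output order
""").fetchall()
for row in ordered:
    print(row)
```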
@jamesgg9950 · 2 years ago
DELETE FROM aliasB
FROM dbo.employee1 aliasA
INNER JOIN dbo.employee1 aliasB
    ON aliasA.FirstName = aliasB.FirstName
   AND aliasA.LastName = aliasB.LastName
   AND aliasA.EmployeeID < aliasB.EmployeeID -- ONLY valid if PK supports '
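Editor's sketch of the same self-join idea written with EXISTS, for engines that do not accept the `DELETE ... FROM ... JOIN` form shown above. A row is deleted when another row with the same name and a smaller EmployeeID exists, so the lowest id in each group survives. It runs on SQLite via Python's sqlite3 module; the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee1 (EmployeeID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
)
conn.executemany(
    "INSERT INTO employee1 VALUES (?, ?, ?)",
    [(1, "Adam", "Owens"), (2, "Adam", "Owens"),
     (3, "Mark", "Wills"), (4, "Adam", "Owens")],
)

# Correlated subquery: "is there an earlier row with the same name?"
conn.execute("""
    DELETE FROM employee1
    WHERE EXISTS (
        SELECT 1
        FROM employee1 AS a
        WHERE a.FirstName  = employee1.FirstName
          AND a.LastName   = employee1.LastName
          AND a.EmployeeID < employee1.EmployeeID
    )
""")
survivors = [row[0] for row in conn.execute(
    "SELECT EmployeeID FROM employee1 ORDER BY EmployeeID"
)]
print(survivors)
```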