Repeated DNA Sequences - Leetcode 187

Repeated DNA Sequences - Leetcode 187 - Python

Рет қаралды 35,266

NeetCode

Күн бұрын

Пікірлер: 40

@1236emu 2 жыл бұрын

Thanks

@masternobody1896 2 жыл бұрын

me unemployed as always. thanks neetcode

@yungzed Жыл бұрын

r u still unemployed

@hexgrade177 8 ай бұрын

@@yungzed hopefully not for long

@Spidey-2006 8 ай бұрын

@@hexgrade177 lol

@greatestever2914 2 жыл бұрын

Solution : Every iteration starting from index i = 0 -> size of string - 9 you are getting the substring of 10 characters out from i'th index and adding them to the set, while also checking if the added substring exists in the set. And if it does, add it to result (which means the 10 character substring appeared more than once). Then finally returning the result. I understand this very clearly but wouldn't come up with it on the spot.. I guess more practice in recognizing these patterns will eventually help me get through interviews at tech companies. Good luck everyone

@poptart007-b2r 2 жыл бұрын

I was kinda worried we might have to use some kinda varriation of KMP to solve this pb until i realized the brute force solution was not so bad + they said 1

@mostinho7 Жыл бұрын

Done thanks todo take note Using hashset to store every 10 letter sequence in the string to check if seen before If seen before then add it to results hashset so using two hash sets one to identify seen and one to hold the results as a sequence can be found multiple times

@begenchorazgeldiyev5298 2 жыл бұрын

I think we can use a hashmap instead of a set for seen and the value could be false (meaning we didn’t add to result yet) by default and when seen second time and we add it to our result, we can update it to true. That way our result can be a list from the start and second check of true or false value would ensure we don’t add it to our result twice. All said, thank you for a detailed, easy to understand explanation

@anonymy834 Жыл бұрын

I dont understand in example one, why "aaaacccca", "aaacccccaa|","aacccccaaa",etc not in the output? they also appeared more than once in the string

@imPriyansh77 Жыл бұрын

Same doubt

@sahilbhagat5024 11 ай бұрын

i also have same doubt

@venkatanishanthchaganty1697 6 ай бұрын

Pay close attention to the attention. In the output string, there are 5 substrings of Cs and 6 in the second substring of C. Therefore, the examples you mentioned, never happened twice and were not added to the result array.

@tusharmishra8190 2 жыл бұрын

Great videos sir!! your videos are go to whenever I'm stuck on a particular question

@harigovind11 2 жыл бұрын

Isn't substring O(n) operation ? In Java it is. Using rolling hash to compute duplicates is useful here.

@ShikaIE 2 жыл бұрын

But it is a constant 10 isn’t it. Would that be 10n ~ n.

@mixtli1975 4 ай бұрын

The ratings on these problems is crazy. Some mediums require you to have a PhD in rocket surgery, and some are like this. I think my worry with getting this problem in an interview is I'd be wondering of I'm missing something.. like, some magic solution to get rid of the space for the hash.

@Alchemx80 2 жыл бұрын

I'm worried that this initial solution wouldn't satisfy the interviewer. To get full marks I suspect you would have to implement the space saving mapping technique you review, or the bitmasking solution provided as solution #3 on leetcode.

@moizurrehman4748 Жыл бұрын

time complexity will be O(n*k) where k = length of string slicing. to do it in O(n), you have to use rolling hash algorithm.

@anonymoussloth6687 2 жыл бұрын

Instead of two hash sets, couldn't we use a hash map to store the frequency of each string? And for all the strings with frequency more than or equal to two, ad those to the list.

@pramalvi 2 жыл бұрын

Because we dont need frequency of string in output so just hashset is enough to identify the target string.

@admercs 2 жыл бұрын

You could but you would need to add a for loop that is O(m) with respect to the size of the hashmap. The space saving is only ~20%, as each integer is 64 bits, 8 times the size of a UTF-8 character.

@anonymoussloth6687 2 жыл бұрын

@@admercs but internally, when we convert a hashset to a list, that also takes some time right? It isn't a constant time operation

@admercs 2 жыл бұрын

@@anonymoussloth6687 Maybe, list() operates on iterables similar to a for loop and would be O(m) in that case. But, coercing a set to a list likely passes an array of pointers to data in the underlying cpython implementation. That would be O(1) time.

@admercs 2 жыл бұрын

Here is your 4-bit encoding using a hashmap: hashmap = { 'A': 0b00, 'T': 0b01, 'C': 0b10, 'G': 0b11 } print(hashmap['A'])

@jordangamble172 2 жыл бұрын

Superb content. Was wondering if you could go over 1155. Number of Dice Rolls With Target Sum on LeetCode. Got asked this problem in a mock interview with a Google Engineer

@axeliuxTEC 2 ай бұрын

What is the time complexity for cur = s[ l: l + 9] ? Isn't that O(k) where k is the size of the substring? Then wouldn't the time complexity of this solution is O (n * k ) where n is the size of the dns sequence and k the size of the substrings? Why are we taking on this video that the complexity of this is O (n ) ? In java though, this would be O ( n ) given how Java handles substring though ;)

@mucle6 Жыл бұрын

hash sets hash the input. I don't think it saves space. Similar to websites that limit how long your password can be. No website saves your password, they save a >100 character long hash of your password

@vdyb745 2 жыл бұрын

Superb as usual !

@ovasarg3294 2 жыл бұрын

Thank u for your videos

@RockyScenes 2 жыл бұрын

Can someone explain why we do -9 in the for loop?

@riceball100 Жыл бұрын

He mentioned at 5:50 that you want to make sure there's at least 9 characters remaining, so basically it's to only check sequences of 10 chars and if it goes out of bounds of the array this `-9` would prevent that

@simply54 3 ай бұрын

you are awsm bro

@anandakrishnanp6579 2 жыл бұрын

10 letter long Substrings like 'AAAACCCCCA', 'AAACCCCCAA', 'AACCCCCAAA', 'ACCCCCAAAA' are also repeated . The solution provided does cover these as well, just that the example hasn't provided all

@FCBarcelonaXMI 2 жыл бұрын

If you look closely, the 2nd set of repeating C's contains 6 C's not 5, making the examples you listed invalid

@gianniprocida3332 2 жыл бұрын

@@FCBarcelonaXMI I don't get your point. The result clearly doesn't match the expected output. result= ['CCCCCAAAAA', 'AACCCCCAAA', 'AAAAACCCCC', 'ACCCCCAAAA', 'AAACCCCCAA', 'AAAACCCCCA'], output=['AAAAACCCCC','CCCCCAAAAA]

@FCBarcelonaXMI 2 жыл бұрын

@@gianniprocida3332 how are those results repeated? count the number of C's in the original string

@George-qr3tr 2 жыл бұрын

@@FCBarcelonaXMI I overlooked the input as well..thanks for bringing this up

@shauryakumar6372 Жыл бұрын

why this is not working: vector findRepeatedDnaSequences(string s) { int n = s.size(); vector ans; if(n < 10) return ans; map m; int i=0; int j=0; string temp = ""; while(j < n){ temp += s[j]; if(j-i+1 < 10) j++; else if(j-i+1 == 10){ m[temp]++; //so that it will be included only once if more than one occurence if(m[temp] == 2) ans.push_back(temp); j++; } else{ temp.erase(0,1); i++; j++; } } return ans; }