First hour with a Kaggle Challenge

129,519 views

sentdex

Days ago

Neural Networks from Scratch: nnfs.io
Channel membership: / @sentdex
Discord: / discord
Support the content: pythonprogramm...
Twitter: / sentdex
Instagram: / sentdex
Facebook: / pythonprogramming.net
Twitch: / sentdex

Comments: 321
@SexySnorlax 4 years ago
"Keep social distance"... Sir, know your audience. We already are.
@abcdxx1059 4 years ago
Nope, you were at a distance when everyone was outside the house.
@GaneshKumar-zz9py 3 years ago
@@abcdxx1059 mjhjnn bjm
@GaneshKumar-zz9py 3 years ago
@@abcdxx1059 khdw
@suzikang9283 4 years ago
0:00 intro to dataset
3:18 browsing through files
5:58 loading files into a Python program
8:43 (cleaning & structuring) getting keys from text and storing them in variables
15:07 thinking about "extracting meaning from text" --> NLP, i.e. what are we looking for? --> keywords in papers that are consistent
19:03 looking for "incubation" in text
32:18 using regular expressions
42:08 plotting
44:20 adding the rest of the files to the script
45:21 looking at other kernels on Kaggle
@alexdavis9324 4 years ago
Thank you for not editing out the mistake at the 30-minute mark. That makes me feel a lot better about my own silly mistakes.
@sentdex 4 years ago
Heh, happy to keep it realistic :D
@wongright 4 years ago
@@sentdex Improved approachability was the big benefit of watching you debug in real time. Thank you.
@hammerofheaven1313 4 years ago
Not to be confused with Kegel challenges.
@whoisabishag3433 4 years ago
... that's the word ... She whispered to me ...
@Alex-pd5xc 4 years ago
hah!
@caseymeehan5901 4 years ago
So glad you left in that error (printing the full text around the 30-minute mark). It makes me feel so much better :) Thanks for doing this, it is so rad! I was looking at the Kaggle competition, but I am too much of a noob to know where to start.
@TheAcolossus 4 years ago
Everyone: What are you working on?
Me: A Covid-19 machine learning model.
Everyone: How does it help us with Covid-19?
Me: It doesn't.
@sentdex 4 years ago
If you're helping to parse through the insanely dense amount of information and research to answer the questions that are being asked, you *are* helping.
@owoled282 4 years ago
Hey, is machine learning squared deeper than deep learning?
@cesarp6761 4 years ago
If it keeps you home for hours doing this... it does help! :p
@fuba44 4 years ago
I liked this "come as you are" format; it could easily have been longer.
@pw7225 4 years ago
In fact, I think this is way better for learning, since you see the actual process. Like George Hotz's coding sessions.
@non_complete 4 years ago
@@pw7225 I love George's sessions. You might like Jon Gjengset too; he has a similar style and mostly does Rust development.
@parkerdinkins5541 4 years ago
@@pw7225 geohot is an absolute mad lad! This format is definitely much better than structured sessions. It really captures the trial-and-error process of programming.
@danielcolomer7815 4 years ago
@@parkerdinkins5541 Same here for quantum computing: kzbin.info/door/-2knDbf4kzT3uzWo7iTJyw (disclosure: I own the channel xD), and that's exactly the reason I chose this style as well! I find showing the real process is the best way to help people learn.
@FlorianLinscheid 4 years ago
Just to answer that very basic question: make the decimal part optional by grouping it, then make the whole thing a group and you're good to go. re.findall(r'( \d{1,2}(\.\d{1,2})? day[s]?)', sentence)
@sentdex 4 years ago
Heh, thanks!
@floxire7042 4 years ago
Could you please explain why his technique with the parentheses didn't work?
@sentdex 4 years ago
@@floxire7042 If you have just one set of parentheses, you'll find only examples that match the full string you searched for, but only the part of the match inside the parentheses is returned.
@FlorianLinscheid 4 years ago
The main catch here was that he made that one group by using parentheses. Regex will then only output what's inside the parentheses. So to get the whole number again, you need to make the whole expression another group. Putting the first or the second half in parentheses doesn't matter in this case. I just found it more logical to always have the first two digits, followed by an optional decimal. The other way round works just as well. Hope that was clear.
@floxire7042 4 years ago
@@sentdex Oh ok, thanks, I didn't know that.
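The group behavior described in this thread can be checked directly. A small sketch, using a made-up sentence and Florian's pattern with and without its outer group:

```python
import re

# Made-up sentence for illustration; not taken from the dataset.
sentence = "The mean incubation period was 5.2 days, with a range of 2 to 14 days."

# One capturing group: findall returns only what that group captured,
# and an empty string when the optional group did not take part in the match.
one_group = re.findall(r" \d{1,2}(\.\d{1,2})? day[s]?", sentence)

# Outer group added (the fix suggested above): findall now returns one tuple per
# match, with one element per group, ordered by each group's opening parenthesis.
two_groups = re.findall(r"( \d{1,2}(\.\d{1,2})? day[s]?)", sentence)

# Keep just the full matches, stripped of the leading space.
full_matches = [whole.strip() for whole, _decimal in two_groups]

print(one_group)     # ['.2', '']
print(two_groups)
print(full_matches)  # ['5.2 days', '14 days']
```

The full match is always element 0 of each tuple here because the outer group opens first.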
@nmertsch8725 4 years ago
About the cite fields in the JSON data: when you write a scholarly article and use findings of other articles (e.g. to compare your results with them or to build your study on earlier findings), you cite the original articles. "Because others already found out that rotten fish smells (see 1-4), ..." would contain a cite range from 1 to 4, because the cited articles 1 to 4 have shown that rotten fish smells and you build on that without investigating it yourself. At the bottom of the article there is a list of references, where each number is associated with another scholarly article.
@hewypy9015 4 years ago
Loving these live coding videos where you explore real-world datasets. Thank you!
@balthazaromeyer4334 4 years ago
Sentdex, you are one of the best teachers I have ever encountered. Keep strong, keep teaching us! People like you should be glorified! Leaving your errors in the videos is humble and reminds us mortals where we come from.
@gabrielk3733 4 years ago
I've been watching your channel for a few months now... the way you talk, the way you think, your Python knowledge and experience are absolutely amazing to me. You're a GENIUS!
@sebbecht 4 years ago
Just started browsing Kaggle for future challenges two days ago; great to see a series on this :) Excited to see how far you take this!
@TheSaintsVEVO 4 years ago
😂 "That went so fast" - yeah, you do remember you have a supercomputer, right?
@ankushbisht-0055 2 years ago
Love how you keep the debugging part and the small mistakes. Keep doing such great live coding; I had been looking for content like this for a long time.
@thesitcomaddict 4 years ago
Your thought process is so clear! Thanks for showing this, it was really enlightening to watch :)!
@cyruscuenca 4 years ago
I'm learning to analyze image data right now, and even though you're analyzing text, I found this really helpful. Thanks!
@merth17 4 years ago
He's just that inspiring.
@connor4440 4 years ago
SO happy you left the error in lol, it shows that every programmer, no matter the skill level, can make stupid little errors like that.
@HellTriX 4 years ago
I think the most impressive part of this challenge is a 50-minute challenge of not mentioning that which shall be demonetized :)
@nicky_buttigieg 4 years ago
For the incubation day regex you can also do: re.findall(r"(\d{1,2}\.?\d{1,2}) day", sentence). This will find any integer/decimal number followed by 'day' but only output the number, avoiding the need to split.
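One caveat worth testing with this pattern: both `\d{1,2}` pieces must match at least one digit and only the dot is optional, so a single-digit value such as "5 days" is not picked up, while a three-digit run can be. A quick check on made-up sentences:

```python
import re

# The pattern from the comment above.
pattern = r"(\d{1,2}\.?\d{1,2}) day"

# Made-up sentences covering one, two and three digits plus a decimal.
examples = ["lasted 5 days", "lasted 14 days", "lasted 5.2 days", "lasted 123 days"]

results = {text: re.findall(pattern, text) for text in examples}
for text, found in results.items():
    print(f"{text!r:20} -> {found}")
```

If single digits matter, making the integer-plus-dot part optional as a unit, as in `(?:\d{1,2}\.)?\d{1,2}`, is one way to accept them.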
@BrentBrewington 3 years ago
You got my like & subscribe, my dude. Wow, this was super useful to watch. I'm a Sr Data Analyst looking to become a Data Scientist, so I'm looking to learn from more people like you. Also kind of interesting to watch this 1 year later.
@RepiGameplays 4 years ago
To get the names of folders I usually just use F2 instead of right-clicking and choosing rename. I rename stuff a lot and it has surely been helping. Great video!
@attentiondeficitdisorder 4 years ago
These are so awesome to watch. It really helps to see your logic and thought process. As someone new to processing datasets like this, it's great to have confirmation that I'm not doing it in some weird, crazy way.
@PositronQ 4 years ago
Formula:
Pf = the probability of infection with the virus
C = the consequences of the situation
Dn+1 = C*Pf
Dn = the previous day
Dn+1 = the next day (or the current day)
So Dn+1/Dn = "the rate of increase from one day to the next". Example: 22/11 = 2, so the daily count grew by a factor of 2. To predict the next day, multiply the current day by that rate of increase (Dn+1 * rate; in this case 22*2 = 44). This is a formula you can use to predict all the days for your country or for the world.
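The arithmetic in that example can be written out as two small functions. This is a literal transcription of the ratio-and-multiply step only (Pf and C are left out), not a validated epidemiological model:

```python
def growth_factor(previous_day, current_day):
    """Day-over-day ratio, Dn+1 / Dn."""
    return current_day / previous_day


def project_next_day(previous_day, current_day):
    """Multiply the current day's count by the latest day-over-day ratio."""
    return current_day * growth_factor(previous_day, current_day)


print(growth_factor(11, 22))     # 2.0
print(project_next_day(11, 22))  # 44.0
```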
@erosennin950 4 years ago
That's what I'm talking about, a Kaggle challenge, MAN! Big thanks :)) I would like to learn from a pro how to approach problems and solve them in the fastest way possible.
@Mahmoud_Gabr 3 years ago
I'm sure I'm not the first to ask, but please do more videos like this!! The lack of editing is also very helpful. Thank you 👍
@kar-s6716 3 years ago
That print(t) made my day.. 😂😂
@DanipBlog 2 years ago
I'm glad you decided to leave the 'print(t)' blooper in the video 😂😂
@ramzykaram296 4 years ago
I could keep watching you program for the whole quarantine. Seriously, your videos are so interesting, so please make more videos.
@clearthinking5441 4 years ago
Great video, Harrison! I really enjoy seeing how you think; it gives the viewer a more accurate picture of what coding is really like. Keep these videos coming, please!
@shivamshukla438 4 years ago
This is really nice. I think we can apply more regexes and logic to get more information, as you suggested, like cleaning too.
@Evilleoleo 4 years ago
Dude, you have so many good videos. I was going through your data analysis playlist yesterday, so today this was perfect. Thanks, dude!
@sentdex 4 years ago
Glad you like them!
@DP-dc2vv 4 years ago
Super informative, thanks for posting. Two thoughts: (1) Get VS Code or Spyder/Anaconda. I use Spyder for general-purpose Python stuff; its IPython integration is the best I've found. VS Code is potentially better still (depending on preferences), as it provides access to a terminal as well (though its IPython implementation is run through Jupyter and is pretty janky). (2) Re the regex stuff, no shame in googling. There isn't a SINGLE programmer of any sort who doesn't need to google at times. I've been a professional for over a decade and regularly need to google the syntax of basic methods, etc. Regex is a much more involved beast, and unless you use it daily I'd be amazed if you remembered the syntax needed for specific applications.
@qaispalekar 4 years ago
Thanks for making this video. It would be great if you made more videos like this. They give a rough idea of how to tackle such big data.
@not-lain 3 years ago
24:59 beautiful cup noises
@marcelo403 4 years ago
Maybe you mentioned it at some point in the video, but if not: since you took the last number before the word "day" or "days", your estimate is likely biased upward. You might catch the mean incubation period, but also the upper bound in papers that report a lower and an upper bound, such as "from 4 to 12 days", "4 - 12 days", and so on...
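One way to act on this, sketched under assumptions: look for range phrasings first ("4 to 12 days", "4-12 days") and record their midpoint, then fall back to single values. The two patterns and the midpoint choice are illustrative, not the video's code; which number best represents a reported range is a judgment call.

```python
import re

# Hypothetical patterns: a range like "4 to 12 days" or "4-12 days", and a single value.
RANGE = re.compile(r"(\d{1,2}(?:\.\d+)?)\s*(?:-|to)\s*(\d{1,2}(?:\.\d+)?) days?")
SINGLE = re.compile(r"(\d{1,2}(?:\.\d+)?) days?")


def incubation_estimates(sentence):
    """Return one number per reported value; a range becomes its midpoint."""
    estimates = []
    for low, high in RANGE.findall(sentence):
        estimates.append((float(low) + float(high)) / 2)
    # Blank out the ranges so their upper bound is not counted again as a single value.
    remainder = RANGE.sub(" ", sentence)
    for value in SINGLE.findall(remainder):
        estimates.append(float(value))
    return estimates


print(incubation_estimates("median 5.2 days (range 2 to 14 days)"))  # [8.0, 5.2]
```

Ranges are collected before single values, so the output order does not follow the sentence order.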
@vaibhavkhobragade9773 3 years ago
You are so swift. It seems you are invincible at coding. I love your walkthrough of the Kaggle challenge.
@hectoralarcon4888 4 years ago
I envy the fluency of your Python programming. :( I always get stuck on preprocessing for a while.
@Luckylesss 4 years ago
I LOVE these types of videos. Please keep them coming! Maybe even show us your googling to find answers to problems like needing a regex refresher.
@sentdex 4 years ago
Given more time, I would have included that. I've included some of my internet searching in the past. Seems like people really enjoyed this format of video, so maybe more to come :)
@KylePapili 4 years ago
Very interesting seeing your thought process working through a new dataset like in this vid. Loved it!
@Tony-mt4pi 3 years ago
When I saw that he had not noticed the "print(t)" line, I wanted to shout "into" the screen to let him know.
@TiboLatte 4 years ago
This was really useful, please continue! You're doing awesome work, thanks.
@ralphlagos4210 4 years ago
Love this channel! So glad I found it, thanks for uploading :).
@jackbillimack7159 4 years ago
You are the man, sentdex! It's hard to express how much you have inspired me while introducing great concepts. Keep up the great work. Does anyone know if the scientific community is making strides to standardize raw data and move from PDF-type papers that need a lot of cleanup to interactive IPython-type papers that could store all findings? A move like this seems like it would open the floodgates of open-source hypothesis testing and review. Hosting and publicizing poor analysis could be a problem, but I would appreciate any information and opinions folks have.
@Dhukino 4 years ago
In university theses you can have abstracts in multiple languages, hence the list structure. In papers? Probably not.
@pinakeekaushik7803 4 years ago
print("Really loving this bro, can you please continue it for other Kaggle challenges too")
@sentdex 4 years ago
I could try some others like this, sure
@onlyme0349 4 years ago
You're hardcoding a format to search for information that comes in wildly different formats. Just parse out any number not separated by a space. Since you can expect it in the format of "8 to 12 days", you'd have to write something for that too eventually.
@cayanaraycaudhuri 4 years ago
I got 9.93 days with the data I downloaded. They have removed the non-commercial stuff. Have you looked into the nltk package? This video was awesome, and I learned something I never thought was possible.
@waron999 4 years ago
Make a data extractor. Extract plots and tabular data from PDF files and add metadata about the methodologies. Many of the coronavirus research papers are made open access. This would create a database that could be highly useful.
@crazyoldhippieguy 4 years ago
25-03-2020. Hi, I just found you today. I'm the guy who gave the talk in 1990 called "Joy and Creativity in the Quantum Realm", with the blessing of IBM and Honeywell, in Malta. The laser-cooled ion team from Boulder was there; it was to pick up where Feynman left off. Thank you.
@whoisabishag3433 4 years ago
"Planting The Flag Of The First Comment"! Kaggle And Corona ... New Tactics
@shkronjax 4 years ago
Very nice. I'm glad this knowledge is open source.
@souradeepsinha 4 years ago
Do a live chat while doing this challenge. No one has much to do anyway and we can brainstorm together! :P
@alexr7530 4 years ago
36:56 I guess in the regular expression you wanted to make a non-capturing group: '(?: ...)'
@fuat7775 4 years ago
Hey, thanks for the video. Abstract is a list because each paragraph is a separate text item. Look for the schema file included in the directory; that might help you. Cheers
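If the abstract is indeed a list with one entry per paragraph, joining the pieces gives a single string to search. A sketch under that assumption; the `"text"` key per entry and the sample record are hypothetical, so check them against the schema file:

```python
def abstract_text(paper):
    """Join a paper's abstract paragraphs into one string."""
    # Assumed layout: paper["abstract"] is a list of {"text": ...} dicts.
    return "\n\n".join(part["text"] for part in paper.get("abstract", []))


# Hand-made record, not a real dataset entry.
paper = {"abstract": [{"text": "First paragraph."}, {"text": "Second paragraph."}]}
print(abstract_text(paper))
```

A paper without an abstract simply yields an empty string.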
@josephsmy1994 4 years ago
Hey sentdex, one of the wacky things about re in Python is that when you use parentheses, you need to specify that you're not trying to capture that as a group when matching a whole pattern. (?: .... ) denotes a non-capturing group. You're looking for this at 36:00: "(?:\d{1,2}\.)?\d{1,2} day"
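A minimal sketch of that suggestion on a few made-up sentences. Because the group is non-capturing, `findall` returns whole matches such as "5.2 day", so the number has to be split off before averaging; the sentences and the averaging step are illustrative, not the video's code.

```python
import re
from statistics import mean

pattern = re.compile(r"(?:\d{1,2}\.)?\d{1,2} day")

# Hypothetical sentences standing in for text pulled from papers.
sentences = [
    "the incubation period was 5 days",
    "a mean incubation of 5.2 days was reported",
    "incubation lasted up to 14 days",
]

values = []
for sentence in sentences:
    for match in pattern.findall(sentence):
        # Each match looks like "5.2 day"; keep only the numeric part.
        values.append(float(match.split()[0]))

print(values)
print(round(mean(values), 2))
```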
@ahmetdiril824 4 years ago
This also worked for me: " \d{1,2}\.?\d{1,2} day". I think the ? considers all of the expression before it. I also dropped the r at the very front.
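A note on the guess above: in Python's `re`, a quantifier such as `?` applies only to the single item immediately before it (here `\.`), which is why this pattern works: only the dot is optional. To make a longer piece optional, it has to be grouped first. Separately, dropping the `r` prefix happens to work for `\d` and `\.` because Python keeps unrecognized string escapes as-is, but recent versions warn about them (a `SyntaxWarning` since Python 3.12), so a raw string is the safer habit. A small check:

```python
import re

# "?" makes only "b" optional; "a" is still required.
b_optional = re.fullmatch(r"ab?", "a") is not None   # True
a_required = re.fullmatch(r"ab?", "") is not None    # False

# Grouping with (?:...) lets "?" apply to the whole "ab" unit.
group_optional = re.fullmatch(r"(?:ab)?c", "c") is not None  # True
only_b_optional = re.fullmatch(r"ab?c", "c") is not None     # False

print(b_optional, a_required, group_optional, only_b_optional)
```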
@Gamegankk 4 years ago
The forward slash works, and it always works.
@mdougf 4 years ago
Thank you so much! I've been so intimidated by even approaching a Kaggle problem!!!!
@Shubham-ny2ce 4 years ago
You are really doing great at teaching scholars and non-scholars. Why don't you start some open-source projects or build a community, if you are not planning a new startup? You are a great learner and an even better teacher.
@techystuffs371 3 years ago
It was the coffee mug for me :)
@nighteagle9961 4 years ago
38:13 I think this works: single_day = re.findall(r" (?:\d{1,2}[.])?\d{1,2} [D,d]ay", sentence) Thanks for the videos, and keep up the good work. Learning a lot from you.
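A small check on the `[D,d]` character class: inside square brackets a comma is an ordinary character, so the class matches "D", "," or "d". `[Dd]` or the `re.IGNORECASE` flag matches just the two letters. The input string is contrived to show the difference:

```python
import re

# Contrived text; the final ",ay" exists only to expose the comma in [D,d].
text = "a 14 day incubation, then 7 Days of isolation, then 3 ,ay"

# [D,d] matches "D", "," or "d".
with_comma = re.findall(r"\d{1,2} [D,d]ay", text)

# [Dd] lists only the two letters.
letters_only = re.findall(r"\d{1,2} [Dd]ay", text)

# Alternatively, ignore case entirely.
ignore_case = re.findall(r"\d{1,2} day", text, flags=re.IGNORECASE)

print(with_comma)    # ['14 day', '7 Day', '3 ,ay']
print(letters_only)  # ['14 day', '7 Day']
print(ignore_case)   # ['14 day', '7 Day']
```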
@junaidmahmud2894 3 years ago
Can you please do some more competitions like this? This is amazing!
@mihaisabadac9631 4 years ago
Great tutorial and good theme too :) For me, re.findall(r" \d{1,2}\.*\d{1,2} day", sentence) worked. I don't know if someone else wrote some other solution, too many comments :D Thanks, sentdex
@cosmosnomad 4 years ago
You probably want to avoid having more than one incubation time per paper, to avoid skewing the plot/mean. Make sure they're associated with Covid-19. I saw one that had to do with an avian flu strain.
@sentdex 4 years ago
This was addressed in the video with that exact example :p
@nassehk 4 years ago
Hello. Great to see your workflow. I think the median is a better measure of the average than the mean in your case, because you are looking at a population of incubation times.
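A toy illustration of the point, using hypothetical values rather than anything extracted from the dataset: a single large upper bound pulls the mean up noticeably while the median barely moves.

```python
from statistics import mean, median

# Hypothetical incubation values in days; 24.0 plays the role of one outlying upper bound.
values = [4.0, 5.0, 5.0, 5.2, 6.0, 24.0]

print(f"mean:   {mean(values):.2f}")    # pulled toward the outlier
print(f"median: {median(values):.2f}")  # average of the two middle values
```

With an even number of values, `median` averages the two middle ones (5.0 and 5.2 here).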
@adeeb12321 4 years ago
Thank you
@ramil17998 4 years ago
Really enjoyed the video. Thanks for making all the mistakes and raising my confidence bar :P
@sentdex 4 years ago
Heh, happy to help
@puneetsingh5219 4 years ago
Yo, this video was long overdue. Thank you.
@ambarishkapil8004 4 years ago
Nice and insightful tutorial.
@classicrockman90 4 years ago
Definitely look into glob from the standard library. It's much easier than nested for loops for picking up files recursively in a folder structure with a pattern like *.json
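A minimal sketch of the `glob` approach; the root folder name in the usage block is hypothetical. With `recursive=True`, `**` matches any number of directory levels, including none, so files directly under the root are found too. `pathlib.Path(root).rglob("*.json")` is an equivalent alternative.

```python
import glob
import os


def find_json_files(root):
    """Return every .json file under root, at any depth, sorted."""
    # "**" with recursive=True matches zero or more nested directories.
    pattern = os.path.join(root, "**", "*.json")
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    # Hypothetical dataset folder; adjust to wherever the files were extracted.
    for path in find_json_files("CORD-19-research-challenge"):
        print(path)
```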
@adityask277 4 years ago
Hey sentdex. Not sure if you will see this. I have been having problems with recent updates in certain packages. For example, BeautifulSoup.findAll() returns an empty list, but BeautifulSoup.find() doesn't. I'm using version 4.6.
@kuldeepsingh2983 4 years ago
I am in love with shark-coffee
@mubinabdulkader1525 4 years ago
Just quit my 30-hour online course after watching this...
@JustSomeAussie1 4 years ago
The forward slash in os.listdir(f"{}/{}") definitely works on Windows, I just tested it (with Python 3.6.4).
@SomebodyOutTh3re 4 years ago
22:31 hahaha. Great video, thank you!
@leonshamsschaal 4 years ago
Thank you so much! I have always wanted to do Kaggle competitions but never really knew how to approach them.
@sentdex 4 years ago
Happy to help!
@treelight1707 4 years ago
Hey sentdex. I think the cite_span key at 5:25 gives the location of the article in the publication print: 'start'/'end' is like start page and end page, and 'S1' stands for supplementary material, usually put at the end of the print like an appendix. 'abstract' is like the summary of the entire publication, but there seems to be text from other parts of the article in it, which might be more specialized stuff. Don't know if this info helps.
@sentdex 4 years ago
Thanks for the info!
@thinboxdictator6720 4 years ago
@@sentdex $ cat CORD-19-research-challenge/json_schema.txt | less
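A hedged sketch of how the citation fields could be tied back to the reference list described earlier in the thread. It assumes that each `cite_spans` entry stores `start`/`end` as character offsets into its paragraph's `text` (rather than page numbers), plus a `ref_id` that keys into the paper's `bib_entries`; that is an assumption to confirm against json_schema.txt. The paragraph and bibliography below are hand-made, not real records.

```python
def resolve_citations(paragraph, bib_entries):
    """Pair each citation marker in a paragraph with its bibliography title."""
    resolved = []
    for span in paragraph.get("cite_spans", []):
        # Assumed: start/end are character offsets into the paragraph text.
        marker = paragraph["text"][span["start"]:span["end"]]
        entry = bib_entries.get(span.get("ref_id"), {})
        resolved.append((marker, entry.get("title")))
    return resolved


# Hand-made example records.
paragraph = {
    "text": "Rotten fish smells [1-4], as shown earlier.",
    "cite_spans": [{"start": 19, "end": 24, "text": "[1-4]", "ref_id": "BIBREF0"}],
}
bib_entries = {"BIBREF0": {"title": "A hypothetical paper on fish"}}

print(resolve_citations(paragraph, bib_entries))
```

If the slice does not reproduce the span's own `"text"` value on real records, the offset assumption is wrong.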
@borispapic9510 4 years ago
Wow, I just took up this challenge a few days ago but hit a wall and didn't know how to proceed. This is a godsend!
@ahmetdiril824 4 years ago
I put the work done here in a notebook. I improved the regex and added some cleaning: www.kaggle.com/ahmetdiril/01-incubation-from-sentdex
@noctreik 4 years ago
Also, with your style of programming, I recommend running things in an IPython shell and copy/pasting fragments of working code into Sublime Text.
@mahdi7d1rostami 4 years ago
Your text editor looks futuristic. I would use gedit more often if I knew the name of the theme.
@TheMaidenOnes 4 years ago
It's not gedit, it's Sublime Text.
@-nepherim 4 years ago
He's using the Sublime Text editor.
@mahdi7d1rostami 4 years ago
At the beginning, when he opened one of the JSON files, he used gedit. Besides that, the black GTK theme for the whole system was great. I wanted to know its name.
@adityavarma131 4 years ago
One can use os.path.join() to avoid any issues with forward or backward slashes on different operating systems.
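For reference, both standard-library options build paths with the separator of the current operating system. The folder names below are hypothetical placeholders:

```python
import os
from pathlib import Path

folder = "CORD-19-research-challenge"  # hypothetical dataset folder
subfolder = "some_subset"              # hypothetical subfolder name

# os.path.join inserts os.sep ("/" on Linux/macOS, "\\" on Windows).
joined = os.path.join(folder, subfolder)

# pathlib does the same with the "/" operator, on every platform.
as_path = Path(folder) / subfolder

print(joined)
print(as_path)
```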
@abdelrhmandameen2215 3 years ago
Programmers are masters at selling themselves short
@tarsala1995 4 years ago
Your machine took over the camera view. Who knows where this is going.
@mayukh_ 4 years ago
I am starting to like your mugs
@LKokos 4 years ago
28:40 you still had print(t) on line 42, that's why it printed everything. Edit: nvm
@MistaT44 4 years ago
This is an excellent series! Kudos
@EranM 4 years ago
Harrison! Well-put video! I very much enjoyed it! You are hilarious!
@leosdeoilha 4 years ago
Always great videos! Why don't you use spaCy for NLP? It takes a lot of the re out of the way!
@Mr3zoozee 4 years ago
What a coincidence, I was looking for videos like this. Thx sentdex
@chrisherring8733 4 years ago
You had a capturing group there. You need to make it a non-capturing group, like ((?:\d{1, 2}\.)\d{1, 2} day)
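Two details in that pattern are worth checking. The non-capturing group has no `?` after it, so the decimal part becomes mandatory and "5 days" is skipped. And `{1, 2}` is written with a space; Python's `re` only treats `{m,n}` without spaces as a repetition count, so the spaced form is read as literal text. A quick comparison on a made-up sentence:

```python
import re

text = "incubation of 5 days, or 5.2 days in another study"

# No "?" after the non-capturing group: the "5." part is required.
required_decimal = re.findall(r"((?:\d{1,2}\.)\d{1,2} day)", text)

# "?" after the group makes the decimal part optional.
optional_decimal = re.findall(r"((?:\d{1,2}\.)?\d{1,2} day)", text)

# With a space, "{1, 2}" is not a quantifier: this looks for a digit
# followed by the literal characters "{1, 2} day".
spaced = re.findall(r"\d{1, 2} day", text)

print(required_decimal)  # ['5.2 day']
print(optional_decimal)  # ['5 day', '5.2 day']
print(spaced)            # []
```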
@mbappekawani9716 3 years ago
Nice data cleanup, buddy
@sameerzahid3544 4 years ago
I really like how your operating system and text editor look 😍👌
@ВолодимирКузько-б5ж 4 years ago
Why did you define incubation = df[df['full_text'].......some code..]? Was it some kind of wrapping?
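On the question above: the `df[ ... ]` pattern with a condition inside is pandas boolean indexing, not wrapping. The inner expression produces one True/False value per row, and the outer brackets keep the rows marked True. The exact condition used in the video is elided in the question, so the `str.contains("incubation")` call and the two-row frame here are hypothetical:

```python
import pandas as pd

# Hypothetical two-row frame standing in for the papers table.
df = pd.DataFrame({
    "title": ["Paper A", "Paper B"],
    "full_text": ["The incubation period was 5 days.", "No relevant terms here."],
})

# The inner expression builds a boolean Series, one entry per row.
mask = df["full_text"].str.contains("incubation")

# Indexing the frame with that Series keeps only the rows marked True.
incubation = df[mask]

print(mask.tolist())                  # [True, False]
print(incubation["title"].tolist())   # ['Paper A']
```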
@clumsydnkey29 4 years ago
Such a helpful video! Thank you!
@noctreik 4 years ago
I recommend setting up i3wm as your desktop environment. You will be much more efficient. You won't want to go back after spending a couple of weeks using it.
@selcukmisir2399 4 years ago
You are the best, sentdex!!!
@alexr7530 4 years ago
Thanks for the video. Hope you'll continue the series.
@teresitaeyzaguirre4741 2 years ago
New fave channel
@Pythonenthusiast 4 years ago
I don't know if others mentioned it before, but you've got some cool mugs! I guess you could make a video on that as well!