Comments: 321
@SexySnorlax 4 years ago
"keep social distance" sir know your audience we already are
@abcdxx1059 4 years ago
nope you were at distance when everyone was outside the house
@GaneshKumar-zz9py 3 years ago
@@abcdxx1059 mjhjnn bjm
@GaneshKumar-zz9py 3 years ago
@@abcdxx1059 khdw
@suzikang9283 4 years ago
0:00 intro to dataset
3:18 browsing through files
5:58 loading files into python program
8:43 (cleaning & structuring) getting keys from text and storing into variables
15:07 thinking about “extracting meaning from text” --> NLP. i.e. what are we looking for? --> keywords in papers that are consistent
19:03 looking for “incubation” in text
32:18 using regular expressions
42:08 plotting
44:20 adding rest of files to script
45:21 looking at other kernels on kaggle
@alexdavis9324 4 years ago
Thank you for not editing out the mistake at the 30 minute mark. That makes me feel a lot better about my own silly mistakes.
@sentdex 4 years ago
Heh, happy to keep it realistic :D
@wongright 4 years ago
@@sentdex Improved approachability was the big benefit when watching you debug in real time. Thank you.
@hammerofheaven1313 4 years ago
Not to be confused with kegel challenges.
@whoisabishag3433 4 years ago
... that's the word ... She whispered to me ...
@Alex-pd5xc 4 years ago
hah!
@caseymeehan5901 4 years ago
So glad you left in that error (printing full text ~30 min mark). It makes me feel so much better :) Thanks for doing this, it is so rad! I was looking at the kaggle competition but I am too much of a noob to know where to start.
@TheAcolossus 4 years ago
Everyone: What are you working on? Me: A Covid-19 machine machine learning model Everyone: How does it help us with Covid-19? Me: It doesn't
@sentdex 4 years ago
If you're helping to parse through the insanely dense amount of information and research to answer the questions that are being asked, you *are* helping.
@owoled282 4 years ago
Hey, is machine learning squared deeper than deep learning ?
@cesarp6761 4 years ago
if it keeps you home for hours doing this.. it does help! :p
@fuba44 4 years ago
I liked this "come as you are" format, could have easily been longer..
@pw7225 4 years ago
In fact, I think this is way better for learning. Since you see the actual process. Like George Hotz' coding sessions.
@non_complete 4 years ago
@@pw7225 I love george's sessions. You might like Jon Gjengset too he has a similar style, mostly does rust development.
@parkerdinkins5541 4 years ago
@@pw7225 geohot is an absolute mad lad! this format is definitely much better than structured sessions. it really captures the trial and error process of programming
@danielcolomer7815 4 years ago
@@parkerdinkins5541 same here for quantum computing kzbin.info/door/-2knDbf4kzT3uzWo7iTJyw (disclosure I own the channel xD) and that's exactly the reason i chose the style as well! I find showing the real process is the best way to help ppl learn
@FlorianLinscheid 4 years ago
Just to answer that very basic question. Make the decimal part optional by grouping it and then make the whole thing a group and you're good to go. re.findall(r'( \d{1,2}(\.\d{1,2})? day[s]?)', sentence)
@sentdex 4 years ago
Heh, thanks!
@floxire7042 4 years ago
Could you please explain why his technique with the parentheses didn't work ?
@sentdex 4 years ago
@@floxire7042 if you just have 1 set of parentheses, you'll find only examples that match the full string u searched for. But only return the part of the match inside the parentheses
@FlorianLinscheid 4 years ago
The main catch here was that he made that one group by using parentheses. Regex will only output what's inside the parentheses then. So to get the whole number again, you need to make the whole expression another group then. Putting the first or the second half in parenthesis doesn't matter in this case. I just found it more logical to have always the first two digits, followed by an optional decimal. Other way round works just as well. Hope that was clear.
@floxire7042 4 years ago
@@sentdex Oh ok thanks I didn't know that
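The grouping behaviour described in this thread can be checked with a short Python sketch. The sample sentence is invented for illustration and is not from the dataset; the second pattern is the one suggested above.

```python
import re

# Hypothetical sample sentence, not taken from the CORD-19 papers.
sentence = "The mean incubation period was 5.2 days, with a maximum of 14 days."

# One capturing group: findall returns only what the group captured,
# so a full "5.2 days" match comes back as just "5." (or "" when the group is skipped).
one_group = re.findall(r" (\d{1,2}\.)?\d{1,2} day[s]?", sentence)

# Wrapping the whole expression in an outer group makes findall return
# (outer, inner) tuples; the outer group holds the full match.
nested = re.findall(r"( \d{1,2}(\.\d{1,2})? day[s]?)", sentence)
full_matches = [outer.strip() for outer, _inner in nested]

print(one_group)
print(full_matches)
```

With more than one group, `re.findall` returns one tuple per match, which is why the sketch picks out the outer group explicitly.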
@nmertsch8725 4 years ago
About the cite-fields in the JSON data: When you write a scholarly article and use findings of other articles (e.g. to compare your results with them or to build your study on earlier findings), you cite the original articles. "Because others already found out that rotten fish smells (see 1-4), ..." would contain a cite range from 1 to 4, because the cited articles 1 to 4 have shown that rotten fish smells and you build up on that without investigating it yourself. In the bottom of the article there is a list of references, where each number is associated with another scholarly article.
@hewypy9015 4 years ago
loving these live coding videos where you explore real world datasets. thank you!
@balthazaromeyer4334 4 years ago
Sentdex you are one of the best teachers I ever encountered. Keep Strong, Keep Teaching Us! People like you should be Glorified! Leaving your error in the videos is humble and reminds us mortals where we come from.
@gabrielk3733 4 years ago
I've been watching your channel for a few months now...the way you are talking, the way you are thinking, your python knowledge and experience is absolutely amazing for me, you're a GENIUS!
@sebbecht 4 years ago
Just got started browsing kaggle for future challenges two days ago, great to see a series on this :) excited to see how far you take this!
@TheSaintsVEVO 4 years ago
😂 “that went so fast” - yeah, you do remember you have a supercomputer right?
@ankushbisht-0055 2 years ago
Love how you keep the debugging part and small mistakes. Keep doing such great live coding, was looking for such content for a long time.
@thesitcomaddict 4 years ago
Your thought process is so clear! Thanks for showing this was really enlightening to watch :)!
@cyruscuenca 4 years ago
I'm learning to analyze image data right now, and even though you're analyzing text, I found this really helpful. Thanks!
@merth17 4 years ago
he's just that inspiring
@connor4440 4 years ago
SO happy you left the error in lol, shows that every programmer, no matter the skill level can have stupid little errors like that
@HellTriX 4 years ago
I think the most impressive part of this challenge, is a 50 minute challenge of not mentioning that which shall be demonetized :)
@nicky_buttigieg 4 years ago
For the incubation day regex you can also do: re.findall(r"(\d{1,2}\.?\d{1,2}) day", sentence) This will find any integer/decimal number followed by 'day' but only output the number, avoiding the need to split.
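One caveat worth checking, sketched below with an invented sentence: as written, the pattern needs at least two digits in total, so a single-digit value such as "5 days" is skipped. The variant with an optional decimal part is an illustration, not something from the video.

```python
import re

# Hypothetical sample sentence, not taken from the dataset.
sentence = "Incubation took 5 days in one group, 5.2 days in another and 14 days at most."

# Pattern as posted: \d{1,2}\.?\d{1,2} requires two digit runs, so "5 days" is missed.
as_posted = re.findall(r"(\d{1,2}\.?\d{1,2}) day", sentence)

# Variant: the decimal part sits in an optional non-capturing group,
# so integers of any length up to two digits are matched as well.
variant = re.findall(r"(\d{1,2}(?:\.\d{1,2})?) day", sentence)

print(as_posted)
print(variant)
```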
@BrentBrewington 3 years ago
you got my like & subscribe, my dude. wow, this was super useful to watch - i'm a Sr Data Analyst looking to go Data Scientist, so looking to learn from more people like you. also kind of interesting to watch this 1 yr later
@RepiGameplays 4 years ago
To get the names of folder I usually just use F2 instead of right clicking and rename. I rename stuff a lot and it surely has been helping. Great video!
@attentiondeficitdisorder 4 years ago
These are so awesome to watch. It really helps to see your logic and thought process. As someone new to trying to process datasets like this, it's great to have confirmation that I'm not doing it some weird, crazy way.
@PositronQ 4 years ago
Formula: Pf = the probability of infection with the virus; C = the consequences of the situation; D_{n+1} = C * Pf. Here D_n is the previous day and D_{n+1} is the next (or current) day, so D_{n+1} / D_n is the day-over-day rate of increase. Example: 22 / 11 = 2, so the increase is a factor of 2. If you want to predict the following day, multiply the current day's value by that rate of increase, in this case 22 * 2 = 44. You can use this formula to predict every day for your country or for the world.
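The arithmetic in this comment can be written as a tiny Python sketch. The function name is made up for illustration, and the approach assumes the day-over-day ratio stays constant, which is the commenter's simplification and not an epidemiological model.

```python
def predict_next(previous_day, current_day):
    """Extrapolate one day ahead, assuming the growth ratio stays the same."""
    ratio = current_day / previous_day  # e.g. 22 / 11 = 2.0
    return current_day * ratio


print(predict_next(11, 22))
```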
@erosennin950 4 years ago
That's what im talking about a kaggle challenge MAN! big thanks :)) I would like to learn from a pro, how to approach problems and solve them the fastest way possible.
@Mahmoud_Gabr 3 years ago
I’m sure I’m not the first to ask, but please do more videos like this!! The lack of editing is also very helpful. Thank you 👍
@kar-s6716 3 years ago
That print(t) made my day .. 😂😂
@DanipBlog 2 years ago
I'm glad you decided to leave the 'print(t)' blooper in the video 😂😂
@ramzykaram296 4 years ago
I can keep watching you programming the whole quarantine time, seriously your videos are so interesting so please do more videos
@clearthinking5441 4 years ago
Great video Harrison! I really enjoy seeing how you think, it gives the viewer a more accurate picture as to what coding is really like. Keep these videos coming please!
@shivamshukla438 4 years ago
this is really nice i think we can apply more re's and logic to get more information as you suggested like cleaning too
@Evilleoleo 4 years ago
dude you have so many good videos, was going through your data analysis playlist yesterday, so today this was perfect thanks dude!
@sentdex 4 years ago
Glad you like them!
@DP-dc2vv 4 years ago
Super informative, thanks for posting. Two thoughts: (1) Get VS code or Spyder/Anaconda; I use Spyder for general purpose Python stuff--the iPython integration is the best I've found. VS Code is potentially better still (depending on preferences), as it provides access to a terminal as well (though the iPython implementation is run through Jupyter and pretty janky). (2) Re the regex stuff, no shame in googling. There isn't a SINGLE programmer of any sort that doesn't need to google at times. I've been a professional for over a decade and regularly need to google syntax on basic methods, etc. Regex is a much more involved beast, and unless you use it daily I'd be amazed if you remembered syntax needed for specific applications.
@qaispalekar 4 years ago
Thanks for making this video. It would be great if you make more videos like this. Will get a rough idea of how to tackle such big data.
@not-lain 3 years ago
24:59 beautiful cup noises
@marcelo403 4 years ago
maybe you mentioned it at some point in the video, but if not: it is important to note that since you got the last number before the word "day" or "days", likely your estimation is upward biased, because you might catch the mean incubation period but also the upper bound of those papers that report lower and upper bound, such as "from 4 to 12 days", or "4 - 12 days", and so on...
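One hedged way to reduce that bias is to detect ranges first and count each range once. The sketch below is an assumption about how that could be done, not what the video does: the helper name is invented, and using the midpoint of a range is an arbitrary illustrative choice.

```python
import re

NUM = r"\d{1,2}(?:\.\d{1,2})?"
# Ranges such as "4 to 12 days" or "4 - 12 days".
RANGE = re.compile(rf"({NUM})\s*(?:to|-)\s*({NUM}) days?")
# Single values such as "5.2 days".
SINGLE = re.compile(rf"({NUM}) days?")


def incubation_values(sentence):
    """Return one float per reported figure; a range contributes its midpoint."""
    values = [(float(low) + float(high)) / 2 for low, high in RANGE.findall(sentence)]
    # Blank out the ranges so their upper bounds are not counted again as single values.
    remainder = RANGE.sub(" ", sentence)
    values.extend(float(v) for v in SINGLE.findall(remainder))
    return values


print(incubation_values("incubation ranged from 4 to 12 days"))
print(incubation_values("a mean incubation of 5.2 days"))
```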
@vaibhavkhobragade9773 3 years ago
You are so swift. It seems you are invincible in coding. I love your walkthrough for the kaggle challenge.
@hectoralarcon4888 4 years ago
I envy the fluency of your python programming. :( I always get stuck during preprocess for a while.
@Luckylesss 4 years ago
I LOVE these types of videos. Please keep them coming! Maybe even show us your googling to find answers to problems like needing a regex refresher.
@sentdex 4 years ago
Given more time, I would have included that. I've included some of my internet searching in the past, seems like people really enjoyed this format of video, so maybe more to come :)
@KylePapili 4 years ago
Very interesting seeing your thought process working through a new dataset like in this vid. Loved it!
@Tony-mt4pi 3 years ago
When I saw that he did not notice the "print(t)" line, I wanted to shout "into" the screen to let him know that.
@TiboLatte 4 years ago
This was really useful please continue ! You're doing awesome work thanks
@ralphlagos4210 4 years ago
Love this channel! So glad I found it, thanks for uploading :).
@jackbillimack7159 4 years ago
You are the man sentdex! It's hard to express how much you have inspired me while introducing great concepts. Keep up the great work. Does anyone know if the scientific community is making strides to standardize raw data and move from PDF-type papers that need more cleanup to interactive IPython-type papers that could store all findings? A move like this seems like it would open the flood gates of open-source hypothesis testing and review. Hosting and publicizing poor analysis could be a problem, but I would appreciate any information and opinions folks have.
@Dhukino 4 years ago
in university theses you can have abstracts in multiple languages, hence the list structure. in papers? probably not
@pinakeekaushik7803 4 years ago
print( " Really loving this bro, can you please continue it like for other kaggle challenges too" )
@sentdex 4 years ago
I could try some others like this, sure
@onlyme0349 4 years ago
You're hardcoding a format to search information that has wildly different formats. Just parse out any number not separated by a space. Since you can expect it in the format of "8 to 12 days" you'd have to write something for that too eventually.
@cayanaraycaudhuri 4 years ago
I got 9.93 days with the data I downloaded. They have removed non commercial stuff. Have you looked into the nltk package? This video was awesome, and I learnt something I never thought was possible.
@waron999 4 years ago
Make a data extractor. Extract plot and tabular data from pdf files and add meta data about the methodologies. Many of the coronavirus research papers are made open access. This would create a database that could be highly useful.
@crazyoldhippieguy 4 years ago
25-03-2020. Hi, I just found you today. I'm the guy who gave the talk in 1990 called Joy and Creativity in the Quantum Realm, with the blessing of IBM and Honeywell, in Malta. The laser-cooled ion team from Boulder was there. It was to pick up where Feynman left off. Thank you.
@whoisabishag3433 4 years ago
"Planting The Flag Of The First Comment"!" Kaggle And Corona ... New Tactics
@shkronjax 4 years ago
very nice. I'm glad this knowledge is open source.
@souradeepsinha 4 years ago
Do a live chat while doing this challenge.. No one has much to do anyway and we can brainstorm together! :P
@alexr7530 4 years ago
36:56 I guess in the regular expression you wanted to make a non-capturing group: '(?: ...)'
@fuat7775 4 years ago
Hey thanks for the video. Abstract is a list because each paragraph is a text item. Look for the schema file included in the directory, that might help you. Cheers
@josephsmy1994 4 years ago
Hey sentdex, one of the wacky things about re in python is that when you use parentheses, you need to specify that you're not trying to capture that as a group when matching a whole pattern. (?: .... ) denotes a non-capture group you're looking for this... "(?:\d{1,2}\.)?\d{1,2} day" at 36:00
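A minimal sketch of the non-capturing group from this comment, run against an invented sentence. The conversion to floats at the end is an illustrative extra, not part of the comment.

```python
import re

# Hypothetical sample sentence, not taken from the dataset.
sentence = "Symptoms appeared after 6.4 days on average and within 12 days in most cases."

# (?: ... ) groups the optional "digits plus dot" prefix without capturing it,
# so findall returns the whole match instead of the group's contents.
pattern = r"(?:\d{1,2}\.)?\d{1,2} day"
found = re.findall(pattern, sentence)

# Keep only the number in front of "day".
days = [float(match.split()[0]) for match in found]

print(found)
print(days)
```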
@ahmetdiril824 4 years ago
This also worked for me: " \d{1,2}\.?\d{1,2} day" I think the ? considers all of the expression before it. I also dropped the r at the very front.
@Gamegankk 4 years ago
the forward slash works and it always works
@mdougf 4 years ago
Thank you so much! I’ve been so intimidated by even approaching a Kaggle problem!!!!
@Shubham-ny2ce 4 years ago
you are really doing great to teach the scholars and non scholars. Why don't you start some open source projects or start building a community if you are not planning a new startup. You are a great learner and far better teacher.
@techystuffs371 3 years ago
It was the coffee mug for me :)
@nighteagle9961 4 years ago
38:13 I think this works: single_day = re.findall(r" (?:\d{1,2}[.])?\d{1,2} [Dd]ay", sentence) Thanks for the videos and keep this good work. Learning a lot from you.
@junaidmahmud2894 3 years ago
Can you please do some more competitions like this? This is amazing?
@mihaisabadac9631 4 years ago
Great tutorial and good theme also :) For me worked re.findall(r" \d{1,2}\.*\d{1,2} day", sentence). I don't know if someone else wrote some other solution, too many comments :D Thanks sentdex
@cosmosnomad 4 years ago
Probably want to avoid having more than one incubation time per paper to avoid skewing the plot/mean. Make sure they're associated with Covid-19. I saw one to do with an avian flu strain.
@sentdex 4 years ago
This was addressed in the video with that exact example :p
@nassehk 4 years ago
Hello. Great to see your workflow. I think median is a better measure of finding the average rather than mean in your case because you are looking at a population of incubation times.
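A small sketch of why the median can be the more robust summary here. The values are invented for illustration and are not results from the video or the dataset; the 42 stands in for a mis-extracted number.

```python
from statistics import mean, median

# Hypothetical extracted incubation times in days, including one outlier.
incubation_days = [4.0, 5.0, 5.2, 6.0, 6.4, 14.0, 42.0]

avg = mean(incubation_days)    # pulled upward by the outlier
mid = median(incubation_days)  # middle value, unaffected by how extreme the outlier is

print(avg, mid)
```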
@adeeb12321 4 years ago
thank you
@ramil17998 4 years ago
Really enjoyed the video. Thanks for making all the mistakes and raising my confidence bar :P
@sentdex 4 years ago
Heh, happy to help
@puneetsingh5219 4 years ago
Yo, this video was long due. Thank you.
@ambarishkapil8004 4 years ago
Nice and Insightful tutorial.
@classicrockman90 4 years ago
Definitely look into glob from the standard library. Much easier than nested for loops to pick up files recursively in a folder structure with a pattern like *.json
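A minimal sketch of that suggestion, assuming the papers sit as `.json` files somewhere below one root folder. The function name is made up for illustration.

```python
import glob
import json
import os


def load_json_files(root):
    """Load every *.json file found anywhere below root."""
    # "**" together with recursive=True matches zero or more nested directories.
    pattern = os.path.join(root, "**", "*.json")
    docs = []
    for path in sorted(glob.glob(pattern, recursive=True)):
        with open(path, encoding="utf-8") as f:
            docs.append(json.load(f))
    return docs
```

Calling `load_json_files` on the dataset's top-level folder would replace the nested `os.listdir` loops with a single pass.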
@adityask277 4 years ago
Hey sentdex. Not sure if you will see this. I have been having problems with the recent updates in certain packages. For example, BeautifulSoup.findAll() returns an empty list, but BeautifulSoup.find() doesn't. I'm using version 4.6.
@kuldeepsingh2983 4 years ago
i am in love with shark-coffee
@mubinabdulkader1525 4 years ago
Just quit from my 30Hr online course after watching this...
@JustSomeAussie1 4 years ago
The forward slash in os.listdir(f"{}/{}") definitely works on Windows, i just tested it. (tested with Python 3.6.4)
@SomebodyOutTh3re 4 years ago
22:31 hahaha. Great video thank you!
@leonshamsschaal 4 years ago
Thank you so much! I have always wanted to do Kaggle competitions but never really known how to approach them.
@sentdex 4 years ago
Happy to help!
@treelight1707 4 years ago
Hey sentdex. I think the cite_span key at 5:25 are the location of the article in the publication print. 'start' /'end' is like start page-end page, 'S1' stands for supplementary material, usually put at the end of the print, like an appendix. 'abstract' is like the summary of the entire publication, but there seems to be text from other parts of the article, that might be more specialized stuff. Don't know if this info would help.
@sentdex 4 years ago
Thanks for the info!
@thinboxdictator6720 4 years ago
@@sentdex $ cat CORD-19-research-challenge/json_schema.txt | less
@borispapic9510 4 years ago
Wow just took up this challenge a few days ago but hit a wall and didn't know how to proceed. This is a godsend!
@ahmetdiril824 4 years ago
I put the work done here in a notebook. I improved the regex and added some cleaning: www.kaggle.com/ahmetdiril/01-incubation-from-sentdex
@noctreik 4 years ago
Also, with your style of programming, I recommend you to run things in ipython shell and copy/paste fragments of working code in sublime text.
@mahdi7d1rostami 4 years ago
your text editor looks futuristic. i will use gedit more frequently if i know the name of theme.
@TheMaidenOnes 4 years ago
its not gedit, its sublimetext
@-nepherim 4 years ago
He's using the Sublime text editor.
@mahdi7d1rostami 4 years ago
at the beginning when he opened one of the json files he used gedit. beside that black gtk theme for the whole system was great. i wanted to know its name.
@adityavarma131 4 years ago
one can use os.path.join() to overcome any issues with forward or backslashes in different operating systems.
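A short sketch of that suggestion. The subfolder and file names are placeholders chosen for illustration, not paths confirmed by the video.

```python
import os

# Hypothetical folder and file names, used only for illustration.
path = os.path.join("CORD-19-research-challenge", "comm_use_subset", "example.json")

# os.path.join inserts os.sep, which is "\\" on Windows and "/" on Linux and macOS,
# so the same line works on either system.
print(path)
print(path.split(os.sep))
```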
@abdelrhmandameen2215 3 years ago
Programmers are masters in selling themselves short
@tarsala1995 4 years ago
Your machine took over the camera view. Who knows where is this going
@mayukh_ 4 years ago
I am starting to like your mugs
@LKokos 4 years ago
28:40 you still had print(t) on line 42 thats why it printed all edit: nvm
@MistaT44 4 years ago
This is an excellent series! kudos
@EranM 4 years ago
Harrison! Well put video! I very much enjoyed it! You are hilarious!
@leosdeoilha 4 years ago
Always great videos! Why don’t you use spacy for nlp? It takes a lot of the re out of the way!
@Mr3zoozee 4 years ago
what a Coincidence i was looking for videos like this thx sentdex
@chrisherring8733 4 years ago
You had a capturing group there. You need to make it a non-capturing group, like ((?:\d{1,2}\.)?\d{1,2} day)
@mbappekawani9716 3 years ago
nice data cleanup buddy
@sameerzahid3544 4 years ago
I really like how your operating system looks and the text editor 😍👌
@ВолодимирКузько-б5ж 4 years ago
why did you define the incubation = df[df['full_text'].......some code..] - was it some kind of wrapping?
@clumsydnkey29 4 years ago
Such a helpful video! Thank you!
@noctreik 4 years ago
I recommend you setting up i3wm for desktop environment. You will be much more efficient. You will not want to come back after spending couple of weeks using it.
@selcukmisir2399 4 years ago
You are the best sentdex!!!
@alexr7530 4 years ago
Thanks for the video. Hope you'll continue the rubric
@teresitaeyzaguirre4741 2 years ago
new Fave channel
@Pythonenthusiast 4 years ago
I don't know if others mentioned it before, but you got some cool mugs! I guess you can make a video on that as well!