Data Analysis 8: Classifying Data - Computerphile

44,621 views

For your eyes only! Classifying data isn't a spy trick. Dr Mike Pound creates a decision tree automatically from a data set. This is part 8 of the Data Analysis Learning Playlist: • Data Analysis with Dr ...
This Learning Playlist was designed by Dr Mercedes Torres-Torres & Dr Michael Pound of the University of Nottingham Computer Science Department. Find out more about Computer Science at Nottingham here: bit.ly/2IqwtNg
This series was made possible by sponsorship from Google.
The Credit Approval dataset can be found here: archive.ics.uci.edu/ml/datase...
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscomputer
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Comments: 38

@Computerphile 5 years ago

Check out the full Data Analysis Learning Playlist: kzbin.info/aero/PLzH6n4zXuckpfMu_4Ff8E7Z1behQks5ba

@zerokelvin3626 5 years ago

Great video! This training, validation and testing is relevant for modeling and simulation in general, and you would be surprised how many scientists and practitioners get this wrong.

@jurietheron 4 years ago

What a fantastic series! I will definitely rewatch it. I would love a video about image classification and validating results, confusion matrices, etc.

@randomnessgameful 5 years ago

Love this series!

@potatoMaster-wr3jz 6 months ago

You explained so many machine learning concepts easily within the 15 minutes of this video. But this video isn't as popular as your cryptography and cybersecurity videos, which says something about what the general audience likes.

@onuktav 5 years ago

Computer says no 😁

@andresg3110 1 year ago

You are absolutely handsome and brilliant! I'm so happy to learn from such a smart and kind soul. Thank you for sharing your talent with the world.

@heyandy889 4 years ago

that's pretty wild that you can automatically create a reasonable decision tree to classify arbitrary data towards an arbitrary target attribute. likewise one could imagine targeting the decision tree towards gender, or income; it sounds like the algorithm doesn't care, it just uses clustering techniques to best group the data to predict the target attribute.
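Decision-tree learners in the ID3/C4.5 family typically grow the tree greedily rather than by clustering: at each node they try every attribute and keep the split that most reduces class impurity, measured here as entropy (information gain). The sketch below is a minimal, categorical-only illustration of that idea under that assumption; it is not necessarily the exact algorithm or tool used in the video, and the attribute names (`employed`, `prior_default`) and toy rows are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting on a categorical attribute."""
    # Group the target labels by the value each row takes for this attribute.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the child nodes after the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(rows, labels, attributes):
    """Pick the attribute whose split gives the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


def build_tree(rows, labels, attributes):
    """Recursively build a tree as nested dicts: {attribute: {value: subtree}}."""
    # Stop when the node is pure or no attributes remain: predict the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = best_split(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    tree = {attr: {}}
    for value in sorted({row[attr] for row in rows}):  # sorted for a stable order
        sub_rows = [r for r in rows if r[attr] == value]
        sub_labels = [l for r, l in zip(rows, labels) if r[attr] == value]
        tree[attr][value] = build_tree(sub_rows, sub_labels, remaining)
    return tree


# Hypothetical toy credit-approval data (not the UCI dataset).
rows = [
    {"employed": "yes", "prior_default": "no"},
    {"employed": "yes", "prior_default": "no"},
    {"employed": "no", "prior_default": "no"},
    {"employed": "no", "prior_default": "yes"},
    {"employed": "yes", "prior_default": "yes"},
    {"employed": "no", "prior_default": "yes"},
]
labels = ["approve", "approve", "approve", "reject", "reject", "reject"]

print(build_tree(rows, labels, ["employed", "prior_default"]))
# {'prior_default': {'no': 'approve', 'yes': 'reject'}}
```

Here `prior_default` separates the classes perfectly (gain of 1 bit), while `employed` leaves both children mixed, so the learner picks `prior_default` without being told anything about what the attributes mean, which is the "the algorithm doesn't care" behaviour the comment describes.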

@4.0.4 4 years ago

I really want a video just on Support Vector Machines! (Example: why would a traditional neural network outperform it?)

@Fractus 4 years ago

The use of 'precision' here sounds more like 'accuracy' in a truly scientific sense, that being how well it reflects a 'true' or correct outcome. In this vein 'precision' would be more like the ability of the system to repeatedly classify similar data, or the same sets, to the same outcome.

@jlopezg8 4 years ago

In classification, the definitions for precision and accuracy differ from those commonly used in science. Precision is defined as the proportion of instances correctly classified as positive (true positives) among all the instances classified as positive (true positives + false positives). Accuracy, on the other hand, is defined as the proportion of instances classified correctly (true positives + true negatives) among all instances. So, for example, imagine 100 people take a medical test. 20 are diagnosed with a disease, and among those, 15 do have the disease. Furthermore, of the 80 people not diagnosed with the disease, 5 do have the disease, so 75 people are correctly classified as not having the disease. As a result, the precision of the test is 15/20 = 75%, while the accuracy of the test is (15+75)/100 = 90%.
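The arithmetic in the medical-test example can be checked directly from the four confusion-matrix counts. This is a minimal sketch; the function names are illustrative, not from any particular library.

```python
def precision(tp, fp):
    """Fraction of positive predictions that are actually positive."""
    return tp / (tp + fp)


def accuracy(tp, tn, fp, fn):
    """Fraction of all instances that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Counts from the 100-person medical-test example.
tp = 15  # diagnosed and actually ill
fp = 5   # diagnosed but not ill (20 diagnosed - 15)
fn = 5   # ill but not diagnosed
tn = 75  # neither diagnosed nor ill (80 - 5)

print(f"precision = {precision(tp, fp):.2f}")        # precision = 0.75
print(f"accuracy  = {accuracy(tp, tn, fp, fn):.2f}")  # accuracy  = 0.90
```

Note that precision ignores the true negatives entirely, which is why it can differ so much from accuracy when most instances are negative.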

@DerDieDasBoB 5 years ago

Love the videos! He is a really good teacher; thanks for all the good explanations. But when I see the paper he draws on, it reminds me of '80s printer paper... is it still in use, or what is it for?

@WhompingWalrus 5 years ago

Idk if it's true or not, but I've heard that some universities bought a quinjabillion metric clucktonnes of that paper way back when it was expected to be used massively for a long time, so they hand it out gladly to whoever has a use for it now.

@jasonspence 5 years ago

That's exactly what it is, and it's the standard Computerphile paper in all of the videos

@veeek8 2 years ago

Yeah nice touch isn't it, makes me feel like it's the 80s again 😂

@jorgefontenlagonzalez8412 2 years ago

I loved the series, but I got a bit lost with this video. How does the content of video #8 relate to what was explained up to now? Does video #8 continue where video #7 left off, or does it take its output as an input in some way?

@synchro-dentally1965 3 years ago

I'm not sure what the majority of medical doctors would say, but I do hear apprehension about the use of AI to aid in diagnosing patients. Which is interesting, because wouldn't it just be another useful tool at their disposal, like a stethoscope?

@leantide7880 5 years ago

So if the dataset contains attributes such as gender, race, religion, languages spoken, etc., a machine learning model could base decisions on, for instance, loan approvals heavily on those factors. Interesting.

@SiddharthPrabhu1983 4 years ago

Yes. That's precisely why ethics in AI is such a growing concern. Many organizations are working to ensure that these kinds of biases do not inadvertently (or intentionally) make their way into ML-driven decision engines.

@snippletrap 4 years ago

Only if those attributes are positively correlated with, say, debt default.

@ramixnudles7958 5 years ago

How is "validation" different from "testing"?

@MusicBent 5 years ago

@Ramix Nudles Here is how I imagine it. The training data was used for training your model (obviously), so running the model on training data will always show 100% accuracy. The testing data is used by the model developer to analyze the performance. The developer can look into the results, spot any obvious mistakes, and try to correct for them. The validation data would remain invisible to the developer and would represent 'new' data points that the model would see in the real world after it has been developed and deployed. The model should also perform well on this data, with zero developer interaction or knowledge of it.

@MusicBent 5 years ago

Also, nice profile pic 👌🏻

@ramixnudles7958 5 years ago

@@MusicBent :-D

@jlopezg8 4 years ago

@@MusicBent Pretty much, but you mixed up test and validation data. Validation data is used to evaluate the model after training, or even while it's training on the training data, and see if it needs tweaking to improve its performance. But to make sure we ourselves don't overfit the model to the validation data, we evaluate the model on data unseen by the model (test data) to give a final unbiased assessment of its performance.
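Following the usage described in the reply (train on one set, tune on validation, report once on test), a three-way split can be sketched in plain Python. The 60/20/20 proportions, the seed, and the function name `train_val_test_split` are illustrative assumptions, not something specified in the video.

```python
import random


def train_val_test_split(items, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle items and cut them into disjoint train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]                # held back for the final, one-off evaluation
    val = shuffled[n_test:n_test + n_val]   # used to compare and tweak models
    train = shuffled[n_test + n_val:]       # used to fit the model itself
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

Shuffling before cutting matters when the data file is ordered (for example, all approvals first), otherwise the three sets would not be representative of each other.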

@Acampandoconfrikis 3 years ago

I'm passing this exam thanks to you lol

@abhishektyagi4428 5 years ago

Sir, could you please make a video explaining the resources you use to learn or improve your programming skills?

@heyandy889 4 years ago

have a look at reddit.com/r/learnprogramming

@abhishektyagi4428 4 years ago

@@heyandy889 thanks a lot

@grainfrizz 5 years ago

Neural network is gonna beat KNN, Tree, and SVM. But, no, I don't watch Siraj Raval anymore.

@hammad8707 5 years ago

lol ok

@KilgoreTroutAsf 5 years ago

So data classifiers are a new way of building uncompromising bureaucratic rules that escape peer review and public oversight, and that not even their creators understand. Got it.

@4.0.4 4 years ago

And that can be demonstrably (statistically) fairer (more likely to predict if you'll pay back your debt or not) than any human who decides based on emotion.

@KilgoreTroutAsf 4 years ago

@@4.0.4 What a wonderfully naive response.

@clarkkentglasses6443 4 years ago

@@4.0.4 Who says the training data isn't biased?

@quillaja 3 years ago

I love this comment.