Sara Hooker - The Hardware Lottery, Sparsity and Fairness

Machine Learning Street Talk · 4,808 views · 1 day ago

Dr. Tim Scarfe, Yannic Kilcher and Sayak Paul chat with Sara Hooker from the Google Brain team! We discuss her recent hardware lottery paper, pruning / sparsity, bias mitigation and interpretability.
The hardware lottery -- what causes inertia or friction in the marketplace of ideas? Is there a meritocracy of ideas, or do the decisions we made previously enslave us? Sara Hooker calls this a lottery because she feels that machine learning progress is entirely beholden to the hardware and software landscape: ideas succeed when they are compatible with the hardware, software and existing inventions of their time. The machine learning community is exceptional because the pace of innovation is fast and we operate largely in the open; this is largely because we don't build anything physical, which is expensive, slow, and raises the cost of being scooped. We get stuck in basins of attraction based on our technology decisions, and it's expensive to jump outside of them. So is this story unique to hardware and AI algorithms, or is it really just the story of all innovation? Every great innovation must wait for the right stepping stones to be in place before it can happen. We are excited to bring you Sara Hooker to give her take.
00:00:00 Tim Intro - Hardware Lottery
00:03:15 Tim Intro - Cultural divide in machine learning
00:04:47 Tim Intro - Pruning
00:06:47 Tim Intro - Bias Mitigation
00:09:46 Tim Intro - Interpretability
00:11:05 Sara joins
00:11:51 Show introduction with everyone on the call
00:14:45 Elevator pitch on hardware lottery
00:16:08 What's so special about hardware and tooling?
00:17:56 Connectionist approaches losing out and now being stuck with them
00:20:58 GPU to TPU
00:26:08 Isn't this just a story of stepping stones and innovation in general (Kenneth Stanley reference)
00:29:27 We have a missing counterfactual of what hardware could exist.
00:30:37 Capsule networks - we have converged on one "global update" paradigm of NNs
00:32:49 Compression / Structured / Unstructured sparsity
00:35:46 As you prune, what does a model forget? The long tail of model parameters encodes low-frequency information
00:39:14 Cultural Divide In Machine Learning (Welling vs Sutton)
00:42:28 Our own intelligence is based on local updates
00:44:33 Max Welling, priors and focusing on the long tail - let's not treat the data equally!
00:47:07 Is sparse training the future? (model compression research / gradient flow)
00:51:42 Is it a resource allocation problem? Too much exploration would spread us too thin
00:56:50 Isn't it just being at the right place at the right time? Is it a lottery?
01:00:08 National strategy to combat the hardware lottery
01:03:00 Ironic if DL created the next generation of hardware
01:03:28 Maybe we are in the AI winter now though? Maybe we need to go symbolic :)
01:07:13 Periods of feast and famine in ML research (Bayesian causal symbolic DL etc)
01:08:49 Characterising and mitigating bias in compact models.
01:12:18 Bias - dataset vs algorithm - how do we get rid of it? Protected attributes
01:16:54 Are sparse networks more interpretable? Single-image interpretations and saliency methods
01:21:56 Protected attributes
01:25:19 How do you differentiate between something that is underrepresented vs more challenging?
01:28:16 Anna Karenina principle and Sara's eccentric style of paper writing
Show notes: drive.google.com/file/d/1S_rH...
Sara Hooker's page: www.sarahooker.me
Podcast version: anchor.fm/machinelearningstre...
Yannic's video on the hardware lottery: • The Hardware Lottery (...

Comments: 18
@iestynne · 2 years ago
For someone interested in getting into ML, having access to this kind of high level discussion from top experts - for free! - is an amazing privilege. Thank you so much for sharing!!
@sandraviknander7898 · 3 years ago
This was a really interesting episode. Sara is amazing! I have thought about this hardware lottery a lot, since I have been really interested in processing in memory (PIM). I do hope that the development of PIM will help expand the possibilities of ideas. Now, maybe it's going to be as you discuss with R&D resources and it will never take off, and what we really need is a really good software stack for FPGAs, so that everyone would be able to easily make the hardware that best fits the idea they want to develop. However, FPGAs would not be the entire answer for capsule networks, since that is sequential processing, although you can come a long way in latency through effective caching. Perhaps you could even leverage some higher-order prefetching of data tailored to these sequential models. Amazing episode and great insights from all of you!
@markm4642 · 2 years ago
This is the only show where I would listen to a long introduction and get so excited about it. Great work, team.
@quebono100 · 3 years ago
Yay, a new video! I don't get why such good content doesn't have more subscribers.
@MachineLearningStreetTalk · 3 years ago
We are working on it 🙌😂
@quebono100 · 3 years ago
This is really a great one. Thank you
@Hexanitrobenzene · 2 years ago
Great guest - warm personality, wide knowledge, just great overall :)
@TheReferrer72 · 3 years ago
The paper is a good read, and it's hard to find fault. $15M to train a model is peanuts, especially as the weights can be copied for near-zero cost; it's the inference cost that's the big worry. It is also a good thing that commercial hardware is used, because it means the technology is easier to get into the hands of society, as opposed to being the preserve of the military.
@florianhonicke5448 · 3 years ago
Good work!
@ratsukutsi · 3 years ago
I have a feeling that Sara's work on the Hardware Lottery, as well as Kenneth Stanley's, and maybe even Max Welling's, is almost like a socio-political argument isolated from politics by a thick, transparent membrane of technical knowledge.
@jasdeepsinghgrover2470 · 3 years ago
Great work... a nice, deep discussion. But honestly, is a neural network with a billion parameters even as smart as a jellyfish with 5.5k neurons? (They seem to do multi-label object detection, motion planning, group behaviour and much more, all at the same time.)
@shanepeckham8566 · 3 years ago
Fantastic intro Tim!
@machinelearningdojowithtim2898 · 3 years ago
Thanks Shane! Miss you bro!
@DavenH · 3 years ago
God, the popups of the show's main cast have me giggling like a schoolboy! Those buzzy sound effects... lol! Facetious question: how does one become a "named Bayesian" like Keith? And where are your aviators, Tim?

I haven't watched through yet, so this could be redundant, but on your questions in the intro about how to stop models compressing out protected attributes that sit in the long tail -- often a feature we prize in neural nets (i.e. robustness to noise and outliers) -- could you oversample those instances so that the model's exposure to them is closer to uniform, if those attributes are indeed important or constitutionally protected? Or maybe it's as simple as a per-instance learning rate, where the rare instances get higher rates... (on second thought, this would cause spikes for momentum-based optimizers). How would it know what's rare? Hmm -- maybe an autoencoder side-model to indicate likelihood? But how to decide between outliers and underrepresented, yet important, instances? I don't think an automatic process will ever be able to know what those "protected" attributes are, as they are more a reflection of the somewhat arbitrary history of atrocities contingent on some attributes but not others (has there ever been a genocide based upon eye color, say? No -> not protected. Though, perhaps in the Stormlight Archive universe, a liberal society would do so). If that oversampling overfits the long-tail instances because there's only a handful of each, then it will become obvious; get more data for them specifically. This process will inevitably degrade accuracy for a constant model size, but you can always make the model bigger to represent more stuff in the long tail. No-free-lunch kinda thing. Looking forward to the episode.
@machinelearningdojowithtim2898 · 3 years ago
Hey DavenH! Thanks for commenting! Oversampling those protected instances and/or augmenting them with semantically equivalent mutations might help. Cool idea on the per-instance learning rate! Clearly some tweaks to the SGD algorithm would be required -- perhaps selecting mini-batches with protected instances and increasing the learning rate on those batches dynamically.
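For readers who want to see what this exchange is gesturing at in code, here is a minimal PyTorch sketch -- a hypothetical illustration, not anything from the episode or Sara's papers -- of (a) oversampling rare/protected instances with a WeightedRandomSampler so the model sees them closer to uniformly, and (b) crudely boosting the learning rate on mini-batches that contain them. The toy dataset, model and the `rare` flag are all made up for the example.

```python
# Hypothetical sketch of the two ideas in the thread above: oversampling
# rare instances and boosting the learning rate on batches containing them.
import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader, WeightedRandomSampler

# Toy dataset: 1000 examples, the last 50 flagged as "rare"/protected.
X = torch.randn(1000, 16)
y = torch.randint(0, 2, (1000,))
rare = torch.zeros(1000, dtype=torch.bool)
rare[-50:] = True
dataset = TensorDataset(X, y, rare.float())

# (a) Oversample: give rare examples a proportionally larger sampling weight
# so mini-batches see them at roughly the same rate as common ones.
weights = torch.where(rare, (~rare).sum() / rare.sum(), torch.tensor(1.0))
sampler = WeightedRandomSampler(weights.double(), num_samples=len(dataset), replacement=True)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
criterion = nn.CrossEntropyLoss()
base_lr = 1e-2
optimizer = torch.optim.SGD(model.parameters(), lr=base_lr)

# (b) Per-batch learning-rate boost: a crude stand-in for a per-instance rate.
for xb, yb, rb in loader:
    boost = 1.0 + rb.mean().item()          # more rare examples -> higher lr
    for group in optimizer.param_groups:
        group["lr"] = base_lr * boost
    optimizer.zero_grad()
    loss = criterion(model(xb), yb)
    loss.backward()
    optimizer.step()
```

A true per-instance learning rate, as suggested above, would need deeper changes to the optimizer (and, as noted, interacts badly with momentum); the per-batch boost here is only the simplest approximation of that idea.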
@datta97 · 3 years ago
great intro!
@machinelearningdojowithtim2898 · 3 years ago
First!
@XOPOIIIO · 3 years ago
A DNN's conclusions, even from biased data, are far more reliable than human conclusions from the same data, because the algorithm is unbiased, unlike the human brain.