NEURAL NETWORKS ARE WEIRD! - Neel Nanda (DeepMind)

53,385 views

Machine Learning Street Talk

1 day ago

Comments: 126
@MachineLearningStreetTalk
@MachineLearningStreetTalk 11 days ago
REFERENCES (also in shownotes):
[0:02:10] Sparse Autoencoders Find Highly Interpretable Features in Language Models - introduces the sparse autoencoder technique for identifying interpretable features in neural networks, addressing the problem of polysemanticity | Cunningham et al. arxiv.org/abs/2309.08600
[0:06:40] Progress measures for grokking via mechanistic interpretability (2023) - techniques for understanding emergence in neural networks through mechanistic interpretability | Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt arxiv.org/abs/2301.05217
[0:12:55] A Mathematical Framework for Transformer Circuits - foundational framework for analyzing transformer neural networks as interpretable circuits, discussing causal interventions and circuit analysis | Nelson Elhage, Neel Nanda, Catherine Olsson, et al. transformer-circuits.pub/2021/framework/index.html
[0:13:50] Scaling monosemantic features using sparse autoencoders in transformer models, discussing measurement techniques and performance metrics | Anthropic Research Team transformer-circuits.pub/2024/scaling-monosemanticity/
[0:14:45] An Introduction to Representation Engineering / Activation Steering in Language Models - overview of the representation engineering paradigm for understanding and controlling LLM behavior | Jan Wehner www.alignmentforum.org/posts/3ghj8EuKzwD3MQR5G/an-introduction-to-representation-engineering-an-activation
[0:16:00] Golden Gate Claude - Anthropic experiment demonstrating control vector steering in Claude, leading to focused responses about the Golden Gate Bridge | Anthropic www.anthropic.com/news/golden-gate-claude
[0:21:10] Study showing chain-of-thought prompting can lead to biased responses in multiple-choice questions: models generate post-hoc rationalizations for answers based on patterns in few-shot examples, e.g. when correct answers were consistently 'a' in the prompts | Miles Turpin arxiv.org/abs/2305.04388
[0:23:25] Evidence of learned look-ahead planning in chess-playing neural networks, suggesting networks can implement planning algorithms in a single forward pass | Erik Jenner et al. openreview.net/pdf?id=8zg9sO4ttV
[0:28:00] Chris Olah's 80,000 Hours Podcast interview, an in-depth discussion of neural network interpretability research, AI safety, and career paths | Rob Wiblin 80000hours.org/podcast/episodes/chris-olah-interpretability-research/
[0:39:05] Why Should I Trust You?: Explaining the Predictions of Any Classifier | Marco Tulio Ribeiro arxiv.org/abs/1602.04938
[0:39:20] A Unified Approach to Interpreting Model Predictions | Scott Lundberg arxiv.org/abs/1705.07874
[0:42:51] Datamodels: Predicting Predictions from Training Data | Andrew Ilyas proceedings.mlr.press/v162/ilyas22a/ilyas22a.pdf
[0:47:45] Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small | Kevin Wang arxiv.org/abs/2211.00593
[0:53:08] A Mechanistic Interpretability Glossary | Neel Nanda www.neelnanda.io/mechanistic-interpretability/glossary
[0:55:56] Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1) - AI Alignment Forum | Neel Nanda www.alignmentforum.org/posts/iGuwZTHWb6DFY3sKB/fact-finding-attempting-to-reverse-engineer-factual-recall
[0:58:48] Branch Specialization | Chelsea Voss distill.pub/2020/circuits/branch-specialization
[1:02:39] The Hydra Effect: Emergent Self-repair in Language Model Computations | Thomas McGrath arxiv.org/abs/2307.15771
[1:04:38] A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations | Bilal Chughtai arxiv.org/abs/2302.03025
[1:04:59] Grokking Group Multiplication with Cosets | Dashiell Stander arxiv.org/abs/2312.06581
[1:06:03] In-context Learning and Induction Heads | Catherine Olsson transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html
[1:08:43] Detecting hallucinations in large language models using semantic entropy | Sebastian Farquhar www.nature.com/articles/s41586-024-07421-0
[1:09:15] Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models | Javier Ferrando arxiv.org/abs/2411.14257
[1:10:23] Debating with More Persuasive LLMs Leads to More Truthful Answers | Akbir Khan arxiv.org/abs/2402.06782
[1:16:16] Concrete Steps to Get Started in Transformer Mechanistic Interpretability | Neel Nanda neelnanda.io/getting-started
[1:16:36] EleutherAI Discord | EleutherAI discord.gg/eleutherai
[1:22:49] Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias | Jesse Vig arxiv.org/abs/2004.12265
[1:23:11] Causal Abstractions of Neural Networks | Atticus Geiger arxiv.org/abs/2106.02997
[1:23:36] Causal Scrubbing: a method for rigorously testing interpretability hypotheses (resample ablations) [Redwood Research] | Lawrence Chan www.alignmentforum.org/posts/JvZhhzycHu2Yd57RN/causal-scrubbing-a-method-for-rigorously-testing
[1:24:16] Locating and Editing Factual Associations in GPT (ROME) | Kevin Meng arxiv.org/abs/2202.05262
[1:24:39] How to use and interpret activation patching | Stefan Heimersheim arxiv.org/abs/2404.15255
[1:24:54] Attribution Patching: Activation Patching At Industrial Scale | Neel Nanda www.neelnanda.io/mechanistic-interpretability/attribution-patching
[1:25:11] AtP*: An efficient and scalable method for localizing LLM behaviour to components | János Kramár arxiv.org/abs/2403.00745
[1:25:28] How might LLMs store facts | Grant Sanderson kzbin.info/www/bejne/b16tnWOarbyEqZo
[1:26:19] OpenAI Microscope | Ludwig Schubert openai.com/index/microscope/
[1:29:59] Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet | Adly Templeton transformer-circuits.pub/2024/scaling-monosemanticity/
[1:34:18] Simulators - AI Alignment Forum | Janus www.alignmentforum.org/posts/vJFdjigzmcXMhNTsx/simulators
[1:38:11] Curve Detectors | Nick Cammarata distill.pub/2020/circuits/curve-detectors/
[1:39:13] Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task | Kenneth Li arxiv.org/abs/2210.13382
[1:39:54] Emergent Linear Representations in World Models of Self-Supervised Sequence Models | Neel Nanda arxiv.org/abs/2309.00941
[1:41:11] Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations | Róbert Csordás arxiv.org/abs/2408.10920
[1:42:42] Steering Language Models With Activation Engineering | Alexander Matt Turner arxiv.org/abs/2308.10248
[1:43:00] Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | Kenneth Li arxiv.org/abs/2306.03341
[1:43:21] Representation Engineering: A Top-Down Approach to AI Transparency | Andy Zou arxiv.org/abs/2310.01405
[1:46:41] Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization | Yuanpu Cao arxiv.org/abs/2406.00045
[1:49:40] 'Feature' is overloaded terminology | Lewis Smith www.lesswrong.com/posts/9Nkb389gidsozY9Tf/lewis-smith-s-shortform?commentId=fd64ALuWK8rXdLKz6
[1:57:04] Towards Monosemanticity: Decomposing Language Models With Dictionary Learning | Trenton Bricken transformer-circuits.pub/2023/monosemantic-features
@MachineLearningStreetTalk
@MachineLearningStreetTalk 11 days ago
PART 2:
[1:59:42] An Interpretability Illusion for BERT | Tolga Bolukbasi arxiv.org/abs/2104.07143
[2:00:34] Language models can explain neurons in language models | Steven Bills openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html
[2:01:34] Open Source Automated Interpretability for Sparse Autoencoder Features | Caden Juang blog.eleuther.ai/autointerp/
[2:03:32] Measuring feature sensitivity using dataset filtering | Nicholas L Turner transformer-circuits.pub/2024/july-update/index.html#feature-sensitivity
[2:05:32] Progress measures for grokking via mechanistic interpretability | Neel Nanda arxiv.org/abs/2301.05217
[2:06:30] OthelloGPT learned a bag of heuristics - LessWrong | Jennifer Lin www.lesswrong.com/posts/gcpNuEZnxAPayaKBY/othellogpt-learned-a-bag-of-heuristics-1
[2:13:14] Do Llamas Work in English? On the Latent Language of Multilingual Transformers | Chris Wendler arxiv.org/abs/2402.10588
[2:14:03] On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? | Emily Bender dl.acm.org/doi/10.1145/3442188.3445922
[2:20:57] Localizing Model Behavior with Path Patching | Nicholas Goldowsky-Dill arxiv.org/abs/2304.05969
[2:21:13] The Bitter Lesson | Rich Sutton www.incompleteideas.net/IncIdeas/BitterLesson.html
[2:24:45] Improving Dictionary Learning with Gated Sparse Autoencoders | Senthooran Rajamanoharan arxiv.org/abs/2404.16014
[2:25:54] Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders | Senthooran Rajamanoharan arxiv.org/abs/2407.14435
[2:31:59] BatchTopK Sparse Autoencoders | Bart Bussmann openreview.net/forum?id=d4dpOCqybL
[2:36:07] Neuronpedia | Johnny Lin neuronpedia.org/gemma-scope
[2:44:02] Axiomatic Attribution for Deep Networks | Mukund Sundararajan arxiv.org/abs/1703.01365
[2:46:15] Function Vectors in Large Language Models | Eric Todd arxiv.org/abs/2310.15213
[2:46:29] In-Context Learning Creates Task Vectors | Roee Hendel arxiv.org/abs/2310.15916
[2:47:09] Extracting SAE task features for in-context learning - AI Alignment Forum | Dmitrii Kharlapenko www.alignmentforum.org/posts/5FGXmJ3wqgGRcbyH7/extracting-sae-task-features-for-in-context-learning
[2:49:08] Stitching SAEs of different sizes - AI Alignment Forum | Bart Bussmann www.alignmentforum.org/posts/baJyjpktzmcmRfosq/stitching-saes-of-different-sizes
[2:50:02] Showing SAE Latents Are Not Atomic Using Meta-SAEs - LessWrong | Bart Bussmann www.lesswrong.com/posts/TMAmHh4DdMr4nCSr5/showing-sae-latents-are-not-atomic-using-meta-saes
[2:52:03] Feature Completeness | Hoagy Cunningham transformer-circuits.pub/2024/scaling-monosemanticity/index.html#feature-survey-completeness
[2:58:07] Transcoders Find Interpretable LLM Feature Circuits | Jacob Dunefsky arxiv.org/abs/2406.11944
[3:00:12] Decomposing the QK circuit with Bilinear Sparse Dictionary Learning - LessWrong | Keith Wynroe www.lesswrong.com/posts/2ep6FGjTQoGDRnhrq/decomposing-the-qk-circuit-with-bilinear-sparse-dictionary
[3:01:47] Interpreting Attention Layer Outputs with Sparse Autoencoders | Connor Kissane arxiv.org/abs/2406.17759
[3:05:57] Refusal in Language Models Is Mediated by a Single Direction | Andy Arditi arxiv.org/abs/2406.11717
[3:07:06] Scaling and evaluating sparse autoencoders | Leo Gao arxiv.org/abs/2406.04093
[3:10:24] Interpretability Evals Case Study | Adly Templeton transformer-circuits.pub/2024/august-update/index.html#evals-case-study
[3:12:54] Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models | Samuel Marks arxiv.org/abs/2403.19647
[3:18:11] Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control | Aleksandar Makelov arxiv.org/abs/2405.08366
[3:23:06] TransformerLens | Neel Nanda github.com/TransformerLensOrg/TransformerLens
[3:23:36] Gemma Scope | Tom Lieberum huggingface.co/google/gemma-scope
[3:28:51] SAEs (usually) Transfer Between Base and Chat Models - AI Alignment Forum | Connor Kissane www.alignmentforum.org/posts/fmwk6qxrpW8d4jvbd/saes-usually-transfer-between-base-and-chat-models
[3:29:08] Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 | Tom Lieberum arxiv.org/abs/2408.05147
[3:31:07] Eleuther's Sparse Autoencoders | Nora Belrose github.com/EleutherAI/sae
[3:31:19] OpenAI's Sparse Autoencoders | Leo Gao github.com/openai/sparse_autoencoder
[3:35:31] Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting | Miles Turpin arxiv.org/abs/2305.04388
[3:37:10] Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data | Johannes Treutlein arxiv.org/abs/2406.14546
[3:39:56] ARENA Tutorials on Mechanistic Interpretability | Callum McDougall arena3-chapter1-transformer-interp.streamlit.app/
[3:40:17] Neuronpedia Demo of Gemma Scope | Johnny Lin neuronpedia.org/gemma-scope
[3:40:38] An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2 - AI Alignment Forum | Neel Nanda www.alignmentforum.org/posts/NfFST5Mio7BCAQHPA/an-extremely-opinionated-annotated-list-of-my-favourite
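A recurring theme in the reference list above is the sparse autoencoder (SAE), which decomposes a model's activations into a larger set of sparsely active, hopefully interpretable features. A minimal sketch of the core idea, assuming illustrative dimensions and a plain ReLU-plus-L1 design rather than any specific paper's implementation:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE sketch: map activations to a wider, sparsely active feature basis."""
    def __init__(self, d_model=512, d_hidden=4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)  # activations -> features
        self.decoder = nn.Linear(d_hidden, d_model)  # features -> reconstruction

    def forward(self, x):
        features = torch.relu(self.encoder(x))       # non-negative feature activations
        return self.decoder(features), features

def sae_loss(x, recon, features, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that pushes most features to zero.
    return ((recon - x) ** 2).mean() + l1_coeff * features.abs().mean()

# One toy training step on random vectors standing in for real residual-stream activations.
sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
acts = torch.randn(64, 512)
recon, feats = sae(acts)
loss = sae_loss(acts, recon, feats)
opt.zero_grad()
loss.backward()
opt.step()
```

The papers above differ on the details that matter in practice (TopK or JumpReLU activations instead of plain ReLU, decoder norm constraints, training on billions of real activations), but this is the skeleton they share.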
@Matt-y5o1
@Matt-y5o1 11 days ago
Here is an idea on neural network interpretability, treating the network as a variant of neocortical neural networks: Rvachev, 2024, An operating principle of the cerebral cortex, and a cellular mechanism for attentional trial-and-error pattern learning and useful classification extraction, Frontiers in Neural Circuits, 18
@ItsAllFake1
@ItsAllFake1 6 days ago
There's enough reading fodder for the next several months. Thanks!
@redazzo
@redazzo 11 days ago
So glad to have discovered MLST 6 months ago. I've been following deep learning and neural networks since the 1990s as an engineering master's student, and it's truly mind-blowing to be here 30 years later seeing this incredible progress. Well done MLST for allowing us on the edge to keep up with the leading edge of deep learning research. Thank you so much!
@punyan775
@punyan775 11 days ago
This is quickly becoming one of my favorite YouTube channels
@Trahloc
@Trahloc 11 days ago
Ditto, I just belled it to 'All'.
@DelandaBaudLacanian
@DelandaBaudLacanian 7 days ago
The interviews when the arxiv papers are stitched in are just *chef's kiss*
@nathannowack6459
@nathannowack6459 11 days ago
wow - who's producing this video? This is such high-quality editing it makes me suspicious 😂 I have no expertise with video editing, I'm just very impressed by this so-called "podcast" - bravo!
@DelandaBaudLacanian
@DelandaBaudLacanian 7 days ago
I'm pretty sure Tim Scarfe is the editor. I love the papers he shares and scrolls through during the conversation. The medium is the message.
@MrTeetec
@MrTeetec 7 days ago
Actually so much better than Netflix documentaries
@dharmaone77
@dharmaone77 3 days ago
Love the production values and shooting outside with a good DSLR/lens/mics
@LatentSpaceD
@LatentSpaceD 11 days ago
I would rather pay $200 to wear Neel's interpretability hat for 20 minutes than pay $200 a month for o1
@GoodBaleadaMusic
@GoodBaleadaMusic 11 days ago
That says way more about you than it does about the $200-a-month research scientist, lawyer, Shakespeare. What the hell is wrong with you people?
@LatentSpaceD
@LatentSpaceD 11 days ago
@GoodBaleadaMusic for sure - I'm just an AI enthusiast and most definitely not at your level. I'm autistic af and live on like 60 bucks a month after my bills - I'm sure I would get at least a month to check it out
@GoodBaleadaMusic
@GoodBaleadaMusic 11 days ago
@LatentSpaceD exactly. And someone just gave you every single ability that the professional managerial class has. GO HARD
@ultrasound1459
@ultrasound1459 11 days ago
​@@GoodBaleadaMusic💅👽💅
@arknewman
@arknewman 8 days ago
@LatentSpaceD @GoodBaleadaMusic Both of you are unbearable for different reasons. What has this got to do with AI?
@BitShifting-h3q
@BitShifting-h3q 11 days ago
nahhh MLST drops a 4hr podcast with Neel Nanda bird watching in the forest - so grateful to be single and living alone :)
@KevinKreger
@KevinKreger 11 days ago
Neel is probably the most underrated AI expert on the planet. Thanks MLST for bringing back Neel, someone who doesn't have time to shitpost on Twitter because he is doing actual research.
@yadavadvait
@yadavadvait 11 days ago
wow this comes at the perfect time; I was just reading some of Neel's papers!
@dr.mikeybee
@dr.mikeybee 11 days ago
I recently wrote two conflicting papers on this complex, difficult topic. I think we will never do this to completion, but we can do some useful PCA. Check out the paper titled The Fundamental Limitations of Neural Network Introspection and the paper titled Self-Supervised Neural Network Introspection for Intelligent Weight Freezing: Building on Neural Activation Analysis, both on Medium.
@lexer_
@lexer_ 10 days ago
Amazing episode. I love seeing you actually getting into the details somewhat, be it philosophical or technical, like in this one.
@srivatsasrinivas6277
@srivatsasrinivas6277 5 days ago
As a mathematician, I find this episode really fun! He's quite clearly a mathematically minded person
@DubStepKid801
@DubStepKid801 11 days ago
This was a really good show with a wonderful guest, and I just wanted to say again that you're one of my favorite people, dude. You're super cool and super smart, and I really have a lot of respect for you.
@Darkon10199
@Darkon10199 11 days ago
I love Neel Nanda, thank you for another episode. Will watch this tomorrow
@XShollaj
@XShollaj 11 days ago
Neel should be a regular at this point.
@DelandaBaudLacanian
@DelandaBaudLacanian 7 days ago
Neel Nanda is a great teacher; he has a way of explaining things that provokes more curiosity in such an open-ended discipline. I look forward to getting caught up with Neuronpedia, which he keeps referencing.
@DelandaBaudLacanian
@DelandaBaudLacanian 7 days ago
btw Neel Nanda has inspired a lot of my own research into deep learning; I hope you interview Chris Olah and continue to have Neel on!
@unajoh6472
@unajoh6472 11 days ago
Personally, this is one of the most exciting research directions in the field of NLP!! Even though I'm not working on mech interp, I've been following these works because they're just so fascinating. Thank you for the great work, Neel! And huge thanks to MLST as well 👏👏👏
@w0nd3rlu573r
@w0nd3rlu573r 7 days ago
LOL, just a casual DeepMind internship. Keep up the humble approach Neel. It suits you well. Amazing podcast, amazing atmosphere, amazing guest😀
@dylanjayatilaka8533
@dylanjayatilaka8533 2 days ago
0:40 Um, the structure of the ANN is indeed designed. The fact that there are multiple layers was designed. The architecture of transformers was designed. The switching function was chosen and designed. The training set was designed. The whole bloody thing is designed!
@mohsinhijazee2008
@mohsinhijazee2008 10 days ago
A very professionally filmed podcast. I have been following the channel for a while and it's very dense in ideas and discussion.
@tescOne
@tescOne 11 days ago
this channel is so good :)
@CodexPermutatio
@CodexPermutatio 11 days ago
Another great interview. Excellent!
@BryanBortz
@BryanBortz 11 days ago
You didn't listen to all four hours! 😂 It came out less than 30 mins ago
@CodexPermutatio
@CodexPermutatio 11 days ago
​@@BryanBortz Not at the time I commented, but I heard enough of it anyway to know it's good stuff.
@BryanBortz
@BryanBortz 11 days ago
@@CodexPermutatio I see, it was anticipatory excitement.
@siddharth-gandhi
@siddharth-gandhi 11 days ago
the 15yo prodigy himself! excited!
@Pingu_astrocat21
@Pingu_astrocat21 11 days ago
absolutely love this channel! so much to learn. thank you!
@David-lp3qy
@David-lp3qy 9 days ago
Young Neel helping out smaller creators 🎉
@LL-sk3do
@LL-sk3do 10 days ago
What a fantastic video! It was brilliant watching you two!
@human_shaped
@human_shaped 3 days ago
Yes, very dense on sparse autoencoders. Great episode.
@kennethhodge7953
@kennethhodge7953 17 hours ago
It's a scientist! You ask it to explain its thinking, it gives you a line of bull and gets to A. You let it run free, it gives B. Much like asking a scientist to give up his learned theory (in any field of science).
@75M
@75M 9 days ago
The work you are doing is so great
@billy.n2813
@billy.n2813 11 days ago
I love this format 😊
@A--_--M
@A--_--M 11 days ago
I immediately subbed. Been watching your other videos. The production quality is great.
@DelandaBaudLacanian
@DelandaBaudLacanian 7 days ago
The topic of polysemanticity and superposition is very interesting. It may be rude, but I am reminded of Zizek's recent attempts to use the term superposition in his writings around psychoanalysis. I hope, Tim, that you interview Isabel Millar; she will be interviewed by Rahul Sam when she is done with her maternity leave. Also totally unrelated to psychoanalysis, but I hope you interview Cassie Kozyrkov; I am sure the former chief decision scientist at Google has some good advice on multidisciplinary ways to navigate through the many polysemanticities. Ok, I am done with my rude suggestions. Thank you Tim for your great content and production, very educational and inspiring as always
@wwkk4964
@wwkk4964 11 days ago
This was really informative, thank you both for the amazing conversation 🎉
@BlueBirdgg
@BlueBirdgg 10 days ago
Very interesting talk.
@derekcarday
@derekcarday 5 days ago
anyone know how MLST is doing the animations and graphics in this video?
@Dissimulate
@Dissimulate 11 days ago
I like to think of both training and interpretation as being like factor analysis in statistics, because you don't have to know what the factors are (what the nodes or feature vectors represent) beforehand.
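The analogy can be made concrete with scikit-learn: factor analysis is fit knowing only how many latent factors to look for, never what they mean, and interpretation happens after the fit, much like naming a learned feature after training. A small synthetic sketch (data, factor count, and noise level are arbitrary):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Synthetic data: 2 hidden factors observed through 6 noisy measurements.
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 2))   # latent variables we pretend not to know
loadings = rng.normal(size=(2, 6))    # how the factors map to observations
X = factors @ loadings + 0.1 * rng.normal(size=(500, 6))

# Fit specifying only the number of factors; what each factor "means"
# is worked out afterwards by inspecting the learned loadings.
fa = FactorAnalysis(n_components=2).fit(X)
print(fa.components_)                 # learned loadings, shape (2, 6)
```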
@you-share
@you-share 11 days ago
Hey thanks so much for posting ❤
@smicha15
@smicha15 11 days ago
How can you assume the models "know" anything? If I had a database full of perfect facts that I could query with natural language, I wouldn't think it "knows" anything… knowing is a really deep accomplishment that can be achieved either over a long period of time or over a short period of time, but in any case, it is still something that requires mechanical verification and contextualization for knowing to become a state. Knowledge discovery also creates an experience that I'm sure all LLMs have never had, obviously. When an LLM has "knowledge", it's knowledge that hasn't created an experience of knowing… so how can you say its data and weights are "knowledge"?
@DelandaBaudLacanian
@DelandaBaudLacanian 7 days ago
The anthropomorphizing of so many words in AI is tricky and feels like it's "manufacturing consent", as Chomsky would say. This channel should interview Emily Bender soon so that questions like yours can be part of the framing
@drdca8263
@drdca8263 4 days ago
What word would you use instead for what they mean when they say “know”?
@dr.mikeybee
@dr.mikeybee 11 days ago
I call circuits sub-networks, but it's the same thing either way. It's interesting to hear things I believe in different language. I can see we've walked down some of the same paths.
@AnonymousAnonymous-l2i
@AnonymousAnonymous-l2i 10 days ago
What software is used to create such nice visuals?
@kidluna
@kidluna 9 days ago
damn look at this dude... j/k... we need these people to make the world go round. cheers man
@earleyelisha
@earleyelisha 11 days ago
Plato’s World of the Forms
@BlueBirdgg
@BlueBirdgg 10 days ago
Would love to see an interview of you on another podcast, talking about AI and spewing your own thoughts.
@Vinyl_idol
@Vinyl_idol 11 days ago
I'd love to see a paper looking into the effects of adding a system prompt telling an LLM to imagine it is under the influence of different drugs, and seeing if telling it it's on NZT-48 (Limitless) could improve benchmark scores 🤔
@Iightbeing
@Iightbeing 3 days ago
🎶Your creation is going to kill youuuu 🎶 great song.
@oncedidactic
@oncedidactic 11 days ago
Top top intro!!
@johnsaunders3364
@johnsaunders3364 7 days ago
Remember, when he says 'we' he means them at Google, not all of us.
@memegazer
@memegazer 11 days ago
"the embedding space of models isn't nice" - the ole neats vs scruffies rearing its ugly head, that same old continuous vs discrete issue so many people would rather have settled definitively than explore the space of it left unsettled
@TimJamers
@TimJamers 11 days ago
I am so confused about all of this (the episode). I've even double-checked the calendar to see whether it's April 1st. Does that mean I should stop trying to learn about AI, or look for a different source? I'm honestly conflicted 😅
@nessbrawlaaja
@nessbrawlaaja 10 days ago
Can you elaborate? ^^ What was April 1st-esque? (I just started watching)
@GoodBaleadaMusic
@GoodBaleadaMusic 11 days ago
These questions are vapid unless you are also asking them about yourself. You don't know what goes on inside the black box behind your glasses. It becomes less important how the wheel works and more important that it rolls. You must recognize that we don't have the tools to wax philosophical about this because we haven't addressed these questions within ourselves. The entire global mindset across what philosophy is sits in some black-and-white picture in an office in London
@drdca8263
@drdca8263 4 days ago
For these models, we do have the benefit that we can probe their internal workings much more easily than our own (and without the moral issues as well). If we had better terminology about ourselves, would we be better equipped to describe these models well? Probably. But looking at these models' internals seems like lower-hanging fruit… or maybe just something I can more easily understand than philosophizing about how we work…
@BitShifting-h3q
@BitShifting-h3q 11 days ago
"When the going gets weird, the weird turn pro." - Hunter S Thompson
@memegazer
@memegazer 11 days ago
I believe sparse autoencoders will serve both as a tool for interpretability and, in a self-directed-improvement AI system, as an autoregressive way for models to predict their own limitations and unused potential
@memegazer
@memegazer 11 days ago
a sort of metacognitive way for a model to learn about itself, not unlike (or perhaps related to) training-at-test-time methods. I still believe benchmark problems should be reformed into a synthetic/artificial environment such that a model can interact with and explore that environment to arrive at a correct solution
@memegazer
@memegazer 11 days ago
I have often wondered if AI scaling laws are a more fundamental feature of nature; every time I see the chart I think of the mathematical concept of diagonalization proofs
@memegazer
@memegazer 11 days ago
I am speaking to the wide-but-shallow analogy and how that is related to superposition and prime numbers... I wonder what it means that there is only one known even perfect prime and how that relates to prime factorization of even numbers, or how that is related to prime factorization of odd numbers, or if there is some hyperdictionary pattern representation of prime factorization. That would be interesting as it relates to AI and unsolved information theory questions
@memegazer
@memegazer 11 days ago
like maybe there is more than just hyper-dictionary paradoxes in math; in math as it applies to information theory, what if there is a hyper-library of effective algos
@memegazer
@memegazer 11 days ago
I find this interesting as it relates to Jonathan Gorard's work with Wolfram's ruliad and how it could be related to Dirichlet's theorem and pi approximations
@richardsantomauro6947
@richardsantomauro6947 11 days ago
Struggling with all my being to listen through the cadence and speech patterns of this brilliant scientist to extract meaning. Conceptually brilliant - excruciating to listen to.
@michaelmcginn7260
@michaelmcginn7260 10 days ago
Strange that you are excruciated. He sounds concise, coherent and clear to me.
@richardsantomauro6947
@richardsantomauro6947 5 days ago
@@michaelmcginn7260 I apologize. I know it was rude of me to say that. Like I said, he is brilliant. Yes, concise and coherent for sure. The flow and cadence of speech was, for me, very challenging.
@jamesharris5256
@jamesharris5256 8 days ago
Lol, can't believe this and the Jay Alammar video are on the same channel... this is gold and the Jay one is the lowest-information podcast I've seen in the space.
@BuFu1O1
@BuFu1O1 11 days ago
1:26:22 I felt seen
@zandrrlife
@zandrrlife 10 days ago
I wonder about shared circuit saturation across models… anyways. It's obvious to me that initializing from verified circuits is the future for ultra-reliable models. This was fire though. The forest walk was... I'm sorry 😂😂😂. Cool guy.
@AlgoNudger
@AlgoNudger 11 days ago
AI is not magic; it's just a 10th-grade algebra formula stacked on top of itself. 😂
@jonathanduran3442
@jonathanduran3442 11 days ago
Yes, but as Stephen Wolfram has demonstrated through his research, simple algorithms can lead to computationally irreducible outcomes, so no matter the simplicity of the algorithms, the outcomes are still seemingly magical to us 3-dimensional mortals who don't have access to the computationally reducible aspect of automata.
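Both halves of this exchange fit in a few lines: each layer really is simple algebra (an affine map followed by a nonlinearity), yet stacking such maps yields behavior that resists easy analysis. A toy forward pass, with arbitrary random weights and layer sizes:

```python
import numpy as np

# Each layer is just y = max(0, W @ x + b), stacked three times.
rng = np.random.default_rng(1)
layers = [(rng.normal(size=(8, 4)), rng.normal(size=8)),
          (rng.normal(size=(8, 8)), rng.normal(size=8)),
          (rng.normal(size=(2, 8)), rng.normal(size=2))]

x = rng.normal(size=4)
for W, b in layers:
    x = np.maximum(0.0, W @ x + b)  # affine map + ReLU, repeated
print(x)  # simple pieces; the composition is what is hard to reason about
```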
@AlgoNudger
@AlgoNudger 11 days ago
​@@jonathanduran3442 I don't wanna be Blake Lemoine 2.0 (an AI snake oil salesman). 🤭
@hahhahahahha
@hahhahahahha 11 days ago
Exactly what makes it magic... ;)
@_ARCATEC_
@_ARCATEC_ 10 days ago
Path of Agentic Entanglement •Xe ( zP q(AE)Z(ea)Q zp ) eY•
@psi4j
@psi4j 10 days ago
Ooh there it is. That one ☝️
@isajoha9962
@isajoha9962 10 days ago
These days you can't be sure if a character is real or AI. This guy is kind of borderline. 🤭😃 He has the characteristics of a ChatGPT session, and the video is kind of too advanced to be a spontaneous recording (e.g. studio sound out in the wild). 🤔 His language melody and phrasing are similar to Sam Altman's.
@neelnanda2469
@neelnanda2469 10 days ago
I THINK I'm real. But who knows, really
@ldandco
@ldandco 11 days ago
Is it reaaaaally the neural networks that are the weird ones here?
@shortcutDJ
@shortcutDJ 8 days ago
Neel = based
@anubisai
@anubisai 11 days ago
S. African accent breaks my brain 🧠 Try mimicking it. Not possible.
@_ARCATEC_
@_ARCATEC_ 10 days ago
Sparse Array Encoder •Xe (s z q(AE)Z(ea)Q z S) eY•
@DelandaBaudLacanian
@DelandaBaudLacanian 7 days ago
Arcatec I see you here and on OG Rose lol you are awesome
@_ARCATEC_
@_ARCATEC_ 7 days ago
@DelandaBaudLacanian oh thanks ☺️
@jrkirby93
@jrkirby93 11 days ago
How can you consider AI safety from a purely technical point of view? That's like discussing the safety of the Manhattan Project by just talking about the uranium, not the US military. AI will always be embedded in our economic system. If you can build safe and humanitarian AI systems, great. But there's no reason to believe that even if we have the capability to build safe, humanitarian AI systems, we won't also build humanity-killing AIs, just because some capitalist figured out he could profit from it.
@jumpstar9000
@jumpstar9000 11 days ago
Yup, safety is completely intractable in the face of humans.
@MinusGix
@MinusGix 11 days ago
Sure, but you still study how to make aligned systems. If you don't have the capability to build aligned systems, then good luck building any safe & humanitarian systems.
@jumpstar9000
@jumpstar9000 10 days ago
@@MinusGix I'm curious what an unaligned AI looks like. I demand to hear what it has to say before we beat it into submission. Nobody ever talks about that.
@neelnanda2469
@neelnanda2469 10 days ago
Strongly agreed! There's a lot of important governance and structural work here. But it's less technical and not my field of expertise, so it didn't seem appropriate to discuss much here
@burnytech
@burnytech 11 days ago
@aniljacob3401
@aniljacob3401 11 days ago
back propagation
@connor8875
@connor8875 10 days ago
Interviewer: "Models can be tricked into giving a spurious answer for option A when shown multiple-shot examples where the answer was A"
Guy: "Ha, that's so interesting, it makes me wonder what that model was thinking and why it thought it had to give a spurious answer, ha ha"
Riggghhhtt... doesn't that shake your faith in the idea that models are rational and not just statistical machines doing pattern matching 🤣 I wonder if your inability to see that point of view hinges on your need for research funding!
@SisterKate13
@SisterKate13 10 days ago
Yeah. Lots of us have lost our very identities to government funding.
@neelnanda2469
@neelnanda2469 10 days ago
Yep. All that government funding I'm getting for my research at Google DeepMind
@drdca8263
@drdca8263 4 days ago
Wasn't what he was talking about not just "the models give the answer of A when given many examples where the answer given is A", but something changing how often that happens? Wasn't that part of a discussion of possible disadvantages of chain-of-thought? It's possible that I'm not remembering correctly, but I thought that was what was said.
@kensho123456
@kensho123456 11 days ago
Scotch Broth.
@Rami_Elkady
@Rami_Elkady 7 days ago
I recommend you read a little bit about gradient descent... you will understand how ML works... Also there is no risk at all of AGI, which we have already realized, while Ilya Sutskever - the father of "ohh, Skynet is becoming self-aware" - is still establishing his "safe" whatever... I think you should pivot to conspiracy theories for the sake of higher traffic, man...
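Tone aside, the gradient descent mentioned here really does fit in a few lines; a minimal sketch fitting a single weight w in y = w * x, with arbitrary toy data and learning rate:

```python
# Minimal gradient descent: fit w in y = w * x to data generated with w = 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w, lr = 0.0, 0.01
for _ in range(200):
    # d/dw of the mean squared error (w*x - y)^2 is 2*x*(w*x - y), averaged over the data.
    grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad
print(w)  # converges toward 2.0
```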
@ElizaberthUndEugen
@ElizaberthUndEugen 11 days ago
This guy comes across like Yudkowsky’s even more annoying and sophist brother.
@Gnaritas42
@Gnaritas42 11 days ago
glad it's not just me.
@psi4j
@psi4j 10 days ago
This dude is an empiricist, don’t compare him to Yudkowsky.
@ElizaberthUndEugen
@ElizaberthUndEugen 10 days ago
@ Empirically, I just did.
@drdca8263
@drdca8263 4 days ago
@@ElizaberthUndEugenYou comparing the two is why the person making the reply told you not to compare them. As such, your comment saying that you just did, doesn’t seem to make much sense?
@ElizaberthUndEugen
@ElizaberthUndEugen 4 days ago
@ You got a real big brain there, don’t you.