NEURAL NETWORKS ARE WEIRD! - Neel Nanda (DeepMind)

53,385 views

Machine Learning Street Talk

1 day ago

Comments: 126
@MachineLearningStreetTalk
@MachineLearningStreetTalk 11 days ago
REFERENCES (also in shownotes):
[0:02:10] Sparse Autoencoders Find Highly Interpretable Features in Language Models - introduces the sparse autoencoder technique for identifying interpretable features in neural networks, addressing the problem of polysemanticity | Cunningham et al. arxiv.org/abs/2309.08600
[0:06:40] Progress measures for grokking via mechanistic interpretability (2023) - techniques for understanding emergence in neural networks through mechanistic interpretability | Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt arxiv.org/abs/2301.05217
[0:12:55] A Mathematical Framework for Transformer Circuits - foundational framework for analyzing transformer neural networks as interpretable circuits, discussing causal interventions and circuit analysis | Nelson Elhage, Neel Nanda, Catherine Olsson, et al. transformer-circuits.pub/2021/framework/index.html
[0:13:50] Scaling monosemantic features using sparse autoencoders in transformer models, discussing measurement techniques and performance metrics | Anthropic Research Team transformer-circuits.pub/2024/scaling-monosemanticity/
[0:14:45] An Introduction to Representation Engineering / Activation Steering in Language Models - overview of the representation engineering paradigm for understanding and controlling LLM behavior | Jan Wehner www.alignmentforum.org/posts/3ghj8EuKzwD3MQR5G/an-introduction-to-representation-engineering-an-activation
[0:16:00] Golden Gate Claude - Anthropic experiment demonstrating control vector steering in Claude, leading to focused responses about the Golden Gate Bridge | Anthropic www.anthropic.com/news/golden-gate-claude
[0:21:10] Study showing chain-of-thought prompting can lead to biased responses in multiple-choice questions: models generate post-hoc rationalizations for answers based on patterns in few-shot examples, e.g. when correct answers were consistently 'a' in the prompts | Miles Turpin arxiv.org/abs/2305.04388
[0:23:25] Evidence of learned look-ahead planning in chess-playing neural networks, suggesting networks can implement planning algorithms in a single forward pass | Erik Jenner et al. openreview.net/pdf?id=8zg9sO4ttV
[0:28:00] Chris Olah's 80,000 Hours Podcast interview, an in-depth discussion of neural network interpretability research, AI safety, and career paths | Rob Wiblin 80000hours.org/podcast/episodes/chris-olah-interpretability-research/
[0:39:05] Why Should I Trust You?: Explaining the Predictions of Any Classifier | Marco Tulio Ribeiro arxiv.org/abs/1602.04938
[0:39:20] A Unified Approach to Interpreting Model Predictions | Scott Lundberg arxiv.org/abs/1705.07874
[0:42:51] Datamodels: Predicting Predictions from Training Data | Andrew Ilyas proceedings.mlr.press/v162/ilyas22a/ilyas22a.pdf
[0:47:45] Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small | Kevin Wang arxiv.org/abs/2211.00593
[0:53:08] A Mechanistic Interpretability Glossary | Neel Nanda www.neelnanda.io/mechanistic-interpretability/glossary
[0:55:56] Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1) - AI Alignment Forum | Neel Nanda www.alignmentforum.org/posts/iGuwZTHWb6DFY3sKB/fact-finding-attempting-to-reverse-engineer-factual-recall
[0:58:48] Branch Specialization | Chelsea Voss distill.pub/2020/circuits/branch-specialization
[1:02:39] The Hydra Effect: Emergent Self-repair in Language Model Computations | Thomas McGrath arxiv.org/abs/2307.15771
[1:04:38] A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations | Bilal Chughtai arxiv.org/abs/2302.03025
[1:04:59] Grokking Group Multiplication with Cosets | Dashiell Stander arxiv.org/abs/2312.06581
[1:06:03] In-context Learning and Induction Heads | Catherine Olsson transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html
[1:08:43] Detecting hallucinations in large language models using semantic entropy | Sebastian Farquhar www.nature.com/articles/s41586-024-07421-0
[1:09:15] Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models | Javier Ferrando arxiv.org/abs/2411.14257
[1:10:23] Debating with More Persuasive LLMs Leads to More Truthful Answers | Akbir Khan arxiv.org/abs/2402.06782
[1:16:16] Concrete Steps to Get Started in Transformer Mechanistic Interpretability | Neel Nanda neelnanda.io/getting-started
[1:16:36] EleutherAI Discord | EleutherAI discord.gg/eleutherai
[1:22:49] Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias | Jesse Vig arxiv.org/abs/2004.12265
[1:23:11] Causal Abstractions of Neural Networks | Atticus Geiger arxiv.org/abs/2106.02997
[1:23:36] Causal Scrubbing: a method for rigorously testing interpretability hypotheses (resample ablations) [Redwood Research] | Lawrence Chan www.alignmentforum.org/posts/JvZhhzycHu2Yd57RN/causal-scrubbing-a-method-for-rigorously-testing
[1:24:16] Locating and Editing Factual Associations in GPT (ROME) | Kevin Meng arxiv.org/abs/2202.05262
[1:24:39] How to use and interpret activation patching | Stefan Heimersheim arxiv.org/abs/2404.15255
[1:24:54] Attribution Patching: Activation Patching At Industrial Scale | Neel Nanda www.neelnanda.io/mechanistic-interpretability/attribution-patching
[1:25:11] AtP*: An efficient and scalable method for localizing LLM behaviour to components | János Kramár arxiv.org/abs/2403.00745
[1:25:28] How might LLMs store facts | Grant Sanderson kzbin.info/www/bejne/b16tnWOarbyEqZo
[1:26:19] OpenAI Microscope | Ludwig Schubert openai.com/index/microscope/
[1:29:59] Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet | Adly Templeton transformer-circuits.pub/2024/scaling-monosemanticity/
[1:34:18] Simulators - AI Alignment Forum | Janus www.alignmentforum.org/posts/vJFdjigzmcXMhNTsx/simulators
[1:38:11] Curve Detectors | Nick Cammarata distill.pub/2020/circuits/curve-detectors/
[1:39:13] Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task | Kenneth Li arxiv.org/abs/2210.13382
[1:39:54] Emergent Linear Representations in World Models of Self-Supervised Sequence Models | Neel Nanda arxiv.org/abs/2309.00941
[1:41:11] Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations | Róbert Csordás arxiv.org/abs/2408.10920
[1:42:42] Steering Language Models With Activation Engineering | Alexander Matt Turner arxiv.org/abs/2308.10248
[1:43:00] Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | Kenneth Li arxiv.org/abs/2306.03341
[1:43:21] Representation Engineering: A Top-Down Approach to AI Transparency | Andy Zou arxiv.org/abs/2310.01405
[1:46:41] Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization | Yuanpu Cao arxiv.org/abs/2406.00045
[1:49:40] 'Feature' is overloaded terminology | Lewis Smith www.lesswrong.com/posts/9Nkb389gidsozY9Tf/lewis-smith-s-shortform?commentId=fd64ALuWK8rXdLKz6
[1:57:04] Towards Monosemanticity: Decomposing Language Models With Dictionary Learning | Trenton Bricken transformer-circuits.pub/2023/monosemantic-features
@MachineLearningStreetTalk
@MachineLearningStreetTalk 11 days ago
PART 2:
[1:59:42] An Interpretability Illusion for BERT | Tolga Bolukbasi arxiv.org/abs/2104.07143
[2:00:34] Language models can explain neurons in language models | Steven Bills openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html
[2:01:34] Open Source Automated Interpretability for Sparse Autoencoder Features | Caden Juang blog.eleuther.ai/autointerp/
[2:03:32] Measuring feature sensitivity using dataset filtering | Nicholas L Turner transformer-circuits.pub/2024/july-update/index.html#feature-sensitivity
[2:05:32] Progress measures for grokking via mechanistic interpretability | Neel Nanda arxiv.org/abs/2301.05217
[2:06:30] OthelloGPT learned a bag of heuristics - LessWrong | Jennifer Lin www.lesswrong.com/posts/gcpNuEZnxAPayaKBY/othellogpt-learned-a-bag-of-heuristics-1
[2:13:14] Do Llamas Work in English? On the Latent Language of Multilingual Transformers | Chris Wendler arxiv.org/abs/2402.10588
[2:14:03] On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? | Emily Bender dl.acm.org/doi/10.1145/3442188.3445922
[2:20:57] Localizing Model Behavior with Path Patching | Nicholas Goldowsky-Dill arxiv.org/abs/2304.05969
[2:21:13] The Bitter Lesson | Rich Sutton www.incompleteideas.net/IncIdeas/BitterLesson.html
[2:24:45] Improving Dictionary Learning with Gated Sparse Autoencoders | Senthooran Rajamanoharan arxiv.org/abs/2404.16014
[2:25:54] Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders | Senthooran Rajamanoharan arxiv.org/abs/2407.14435
[2:31:59] BatchTopK Sparse Autoencoders | Bart Bussmann openreview.net/forum?id=d4dpOCqybL
[2:36:07] Neuronpedia | Johnny Lin neuronpedia.org/gemma-scope
[2:44:02] Axiomatic Attribution for Deep Networks | Mukund Sundararajan arxiv.org/abs/1703.01365
[2:46:15] Function Vectors in Large Language Models | Eric Todd arxiv.org/abs/2310.15213
[2:46:29] In-Context Learning Creates Task Vectors | Roee Hendel arxiv.org/abs/2310.15916
[2:47:09] Extracting SAE task features for in-context learning - AI Alignment Forum | Dmitrii Kharlapenko www.alignmentforum.org/posts/5FGXmJ3wqgGRcbyH7/extracting-sae-task-features-for-in-context-learning
[2:49:08] Stitching SAEs of different sizes - AI Alignment Forum | Bart Bussmann www.alignmentforum.org/posts/baJyjpktzmcmRfosq/stitching-saes-of-different-sizes
[2:50:02] Showing SAE Latents Are Not Atomic Using Meta-SAEs - LessWrong | Bart Bussmann www.lesswrong.com/posts/TMAmHh4DdMr4nCSr5/showing-sae-latents-are-not-atomic-using-meta-saes
[2:52:03] Feature Completeness | Hoagy Cunningham transformer-circuits.pub/2024/scaling-monosemanticity/index.html#feature-survey-completeness
[2:58:07] Transcoders Find Interpretable LLM Feature Circuits | Jacob Dunefsky arxiv.org/abs/2406.11944
[3:00:12] Decomposing the QK circuit with Bilinear Sparse Dictionary Learning - LessWrong | Keith Wynroe www.lesswrong.com/posts/2ep6FGjTQoGDRnhrq/decomposing-the-qk-circuit-with-bilinear-sparse-dictionary
[3:01:47] Interpreting Attention Layer Outputs with Sparse Autoencoders | Connor Kissane arxiv.org/abs/2406.17759
[3:05:57] Refusal in Language Models Is Mediated by a Single Direction | Andy Arditi arxiv.org/abs/2406.11717
[3:07:06] Scaling and evaluating sparse autoencoders | Leo Gao arxiv.org/abs/2406.04093
[3:10:24] Interpretability Evals Case Study | Adly Templeton transformer-circuits.pub/2024/august-update/index.html#evals-case-study
[3:12:54] Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models | Samuel Marks arxiv.org/abs/2403.19647
[3:18:11] Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control | Aleksandar Makelov arxiv.org/abs/2405.08366
[3:23:06] TransformerLens | Neel Nanda github.com/TransformerLensOrg/TransformerLens
[3:23:36] Gemma Scope | Tom Lieberum huggingface.co/google/gemma-scope
[3:28:51] SAEs (usually) Transfer Between Base and Chat Models - AI Alignment Forum | Connor Kissane www.alignmentforum.org/posts/fmwk6qxrpW8d4jvbd/saes-usually-transfer-between-base-and-chat-models
[3:29:08] Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 | Tom Lieberum arxiv.org/abs/2408.05147
[3:31:07] Eleuther's Sparse Autoencoders | Nora Belrose github.com/EleutherAI/sae
[3:31:19] OpenAI's Sparse Autoencoders | Leo Gao github.com/openai/sparse_autoencoder
[3:35:31] Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting | Miles Turpin arxiv.org/abs/2305.04388
[3:37:10] Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data | Johannes Treutlein arxiv.org/abs/2406.14546
[3:39:56] ARENA Tutorials on Mechanistic Interpretability | Callum McDougall arena3-chapter1-transformer-interp.streamlit.app/
[3:40:17] Neuronpedia Demo of Gemma Scope | Johnny Lin neuronpedia.org/gemma-scope
[3:40:38] An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2 - AI Alignment Forum | Neel Nanda www.alignmentforum.org/posts/NfFST5Mio7BCAQHPA/an-extremely-opinionated-annotated-list-of-my-favourite
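A recurring theme in the reference list above is the sparse autoencoder (SAE), which decomposes a model's activations into a larger set of sparsely active, hopefully interpretable features. A minimal sketch of the core idea, assuming illustrative dimensions and a plain ReLU-plus-L1 design rather than any specific paper's implementation:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE sketch: map activations to a wider, sparsely active feature basis."""
    def __init__(self, d_model=512, d_hidden=4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)  # activations -> features
        self.decoder = nn.Linear(d_hidden, d_model)  # features -> reconstruction

    def forward(self, x):
        features = torch.relu(self.encoder(x))       # non-negative feature activations
        return self.decoder(features), features

def sae_loss(x, recon, features, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty that pushes most features to zero.
    return ((recon - x) ** 2).mean() + l1_coeff * features.abs().mean()

# One toy training step on random vectors standing in for real residual-stream activations.
sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
acts = torch.randn(64, 512)
recon, feats = sae(acts)
loss = sae_loss(acts, recon, feats)
opt.zero_grad()
loss.backward()
opt.step()
```

The papers above differ on the details that matter in practice (TopK or JumpReLU activations instead of plain ReLU, decoder norm constraints, training on billions of real activations), but this is the skeleton they share.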
@Matt-y5o1
@Matt-y5o1 11 days ago
Here is an idea on neural network interpretability, treating the network as a variant of neocortical neural networks: Rvachev, 2024, An operating principle of the cerebral cortex, and a cellular mechanism for attentional trial-and-error pattern learning and useful classification extraction, Frontiers in Neural Circuits, 18
@ItsAllFake1
@ItsAllFake1 6 days ago
There's enough reading fodder for the next several months. Thanks!
@redazzo
@redazzo 11 days ago
So glad to have discovered MLST 6 months ago. I've been following deep learning and neural networks since the 1990s as an engineering master's student, and it's truly mind-blowing to be here 30 years later seeing this incredible progress. Well done MLST for allowing us on the edge to keep up with the leading edge of deep learning research. Thank you so much!
@punyan775
@punyan775 11 days ago
This is quickly becoming one of my favorite YouTube channels
@Trahloc
@Trahloc 11 days ago
Ditto, I just belled it to 'All'.
@DelandaBaudLacanian
@DelandaBaudLacanian 7 days ago
The interviews when the arxiv papers are stitched in are just *chef's kiss*
@nathannowack6459
@nathannowack6459 11 days ago
wow - who's producing this video? This is such high-quality editing it makes me suspicious 😂 I have no expertise with video editing, I'm just very impressed by this so-called "podcast" - bravo!
@DelandaBaudLacanian
@DelandaBaudLacanian 7 days ago
I'm pretty sure Tim Scarfe is the editor. I love the papers he shares and scrolls through during the conversation. The medium is the message.
@MrTeetec
@MrTeetec 7 days ago
Actually so much better than Netflix documentaries
@dharmaone77
@dharmaone77 3 days ago
Love the production values and shooting outside with a good DSLR/lens/mics
@LatentSpaceD
@LatentSpaceD 11 days ago
I would rather pay $200 to wear Neel's interpretability hat for 20 minutes than pay $200 a month for o1
@GoodBaleadaMusic
@GoodBaleadaMusic 11 days ago
That says way more about you than it does about the $200-a-month research scientist, lawyer, Shakespeare. What the hell is wrong with you people?
@LatentSpaceD
@LatentSpaceD 11 days ago
@GoodBaleadaMusic for sure - I'm just an AI enthusiast and most definitely not at your level. I'm autistic af and live on like 60 bucks a month after my bills - I'm sure I would get at least a month to check it out
@GoodBaleadaMusic
@GoodBaleadaMusic 11 days ago
@LatentSpaceD exactly. And someone just gave you every single ability that the professional managerial class has. GO HARD
@ultrasound1459
@ultrasound1459 11 days ago
​@@GoodBaleadaMusic💅👽💅
@arknewman
@arknewman 8 days ago
@LatentSpaceD @GoodBaleadaMusic Both of you are unbearable for different reasons. What has this got to do with AI?
@BitShifting-h3q
@BitShifting-h3q 11 days ago
nahhh MLST drops a 4hr podcast with Neel Nanda bird watching in the forest - so grateful to be single and living alone :)
@KevinKreger
@KevinKreger 11 days ago
Neel is probably the most underrated AI expert on the planet. Thanks MLST for bringing back Neel, someone who doesn't have time to shitpost on Twitter because he is doing actual research.
@yadavadvait
@yadavadvait 11 days ago
wow this comes at the perfect time; I was just reading some of Neel's papers!
@dr.mikeybee
@dr.mikeybee 11 days ago
I recently wrote two conflicting papers on this complex, difficult topic. I think we will never do this to completion, but we can do some useful PCA. Check out the paper titled The Fundamental Limitations of Neural Network Introspection and the paper titled Self-Supervised Neural Network Introspection for Intelligent Weight Freezing: Building on Neural Activation Analysis, both on Medium.
@lexer_
@lexer_ 10 days ago
Amazing episode. I love seeing you actually getting into the details somewhat, be it philosophical or technical, like in this one.
@srivatsasrinivas6277
@srivatsasrinivas6277 5 days ago
As a mathematician, I find this episode really fun! He's quite clearly a mathematically minded person
@DubStepKid801
@DubStepKid801 11 days ago
This was a really good show with a wonderful guest, and I just wanted to say again that you're one of my favorite people, dude. You're super cool and super smart, and I really have a lot of respect for you.
@Darkon10199
@Darkon10199 11 days ago
I love Neel Nanda, thank you for another episode. Will watch this tomorrow
@XShollaj
@XShollaj 11 days ago
Neel should be a regular at this point.
@DelandaBaudLacanian
@DelandaBaudLacanian 7 days ago
Neel Nanda is a great teacher; he has a way of explaining things that provokes more curiosity in such an open-ended discipline. I look forward to getting caught up with Neuronpedia, which he keeps referencing.
@DelandaBaudLacanian
@DelandaBaudLacanian 7 days ago
btw Neel Nanda has inspired a lot of my own research into deep learning; I hope you interview Chris Olah and continue to have Neel on!
@unajoh6472
@unajoh6472 11 days ago
Personally, this is one of the most exciting research directions in the field of NLP!! Even though I'm not working on mech interp, I've been following these works because they're just so fascinating. Thank you for the great work, Neel! And huge thanks to MLST as well 👏👏👏
@w0nd3rlu573r
@w0nd3rlu573r 7 days ago
LOL, just a casual DeepMind internship. Keep up the humble approach Neel. It suits you well. Amazing podcast, amazing atmosphere, amazing guest😀
@dylanjayatilaka8533
@dylanjayatilaka8533 2 days ago
0:40 Um, the structure of the ANN is indeed designed. The fact that there are multiple layers was designed. The architecture of transformers was designed. The switching function was chosen and designed. The training set was designed. The whole bloody thing is designed!
@mohsinhijazee2008
@mohsinhijazee2008 10 days ago
A very professionally filmed podcast. I have been following the channel for a while and it's very dense in ideas and discussion.
@tescOne
@tescOne 11 days ago
this channel is so good :)
@CodexPermutatio
@CodexPermutatio 11 days ago
Another great interview. Excellent!
@BryanBortz
@BryanBortz 11 days ago
You didn't listen to all four hours! 😂 It came out less than 30 mins ago
@CodexPermutatio
@CodexPermutatio 11 days ago
​@@BryanBortz Not at the time I commented, but I heard enough of it anyway to know it's good stuff.
@BryanBortz
@BryanBortz 11 days ago
@@CodexPermutatio I see, it was anticipatory excitement.
@siddharth-gandhi
@siddharth-gandhi 11 days ago
the 15yo prodigy himself! excited!
@Pingu_astrocat21
@Pingu_astrocat21 11 days ago
absolutely love this channel! so much to learn. thank you!
@David-lp3qy
@David-lp3qy 9 days ago
Young Neel helping out smaller creators 🎉
@LL-sk3do
@LL-sk3do 10 days ago
What a fantastic video! It was brilliant watching you two!
@human_shaped
@human_shaped 3 days ago
Yes, very dense on sparse autoencoders. Great episode.
@kennethhodge7953
@kennethhodge7953 17 hours ago
It's a scientist! You ask it to explain its thinking, it gives you a line of bull and gets to A. You let it run free, it gives B. Much like asking a scientist to give up his learned theory (in any field of science).
@75M
@75M 9 days ago
The work you are doing is so great
@billy.n2813
@billy.n2813 11 days ago
I love this format 😊
@A--_--M
@A--_--M 11 days ago
I immediately subbed. Been watching your other videos. The production quality is great.
@DelandaBaudLacanian
@DelandaBaudLacanian 7 days ago
The topic of polysemanticity and superposition is very interesting. It may be rude, but I am reminded of Zizek's recent attempts to use the term superposition in his writings around psychoanalysis. I hope, Tim, that you interview Isabel Millar; she will be interviewed by Rahul Sam when she is done with her maternity leave. Also totally unrelated to psychoanalysis, but I hope you interview Cassie Kozyrkov; I am sure the former chief decision scientist at Google has some good advice on multidisciplinary ways to navigate through the many polysemanticities. Ok, I am done with my rude suggestions. Thank you Tim for your great content and production, very educational and inspiring as always
@wwkk4964
@wwkk4964 11 days ago
This was really informative, thank you both for the amazing conversation 🎉
@BlueBirdgg
@BlueBirdgg 10 days ago
Very interesting talk.
@derekcarday
@derekcarday 5 days ago
anyone know how MLST is doing the animations and graphics in this video?
@Dissimulate
@Dissimulate 11 days ago
I like to think of both training and interpretation as being like factor analysis in statistics, because you don't have to know what the factors are (what the nodes or feature vectors represent) beforehand.
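The analogy can be made concrete with scikit-learn: factor analysis is fit knowing only how many latent factors to look for, never what they mean, and interpretation happens after the fit, much like naming a learned feature after training. A small synthetic sketch (data, factor count, and noise level are arbitrary):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Synthetic data: 2 hidden factors observed through 6 noisy measurements.
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 2))   # latent variables we pretend not to know
loadings = rng.normal(size=(2, 6))    # how the factors map to observations
X = factors @ loadings + 0.1 * rng.normal(size=(500, 6))

# Fit specifying only the number of factors; what each factor "means"
# is worked out afterwards by inspecting the learned loadings.
fa = FactorAnalysis(n_components=2).fit(X)
print(fa.components_)                 # learned loadings, shape (2, 6)
```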
@you-share
@you-share 11 days ago
Hey thanks so much for posting ❤
@smicha15
@smicha15 11 days ago
How can you assume the models "know" anything? If I had a database full of perfect facts that I could query with natural language, I wouldn't think it "knows" anything… knowing is a really deep accomplishment that can be achieved either over a long period of time or over a short period of time, but in any case, it is still something that requires mechanical verification and contextualization for knowing to become a state. Knowledge discovery also creates an experience that I'm sure all LLMs have never had, obviously. When an LLM has "knowledge", it's knowledge that hasn't created an experience of knowing… so how can you say its data and weights are "knowledge"?
@DelandaBaudLacanian
@DelandaBaudLacanian 7 days ago
The anthropomorphizing of so many words in AI is tricky and feels like it's "manufacturing consent", as Chomsky would say. This channel should interview Emily Bender soon so that questions like yours can be part of the framing
@drdca8263
@drdca8263 4 days ago
What word would you use instead for what they mean when they say “know”?
@dr.mikeybee
@dr.mikeybee 11 days ago
I call circuits sub-networks, but it's the same thing either way. It's interesting to hear things I believe in different language. I can see we've walked down some of the same paths.
@AnonymousAnonymous-l2i
@AnonymousAnonymous-l2i 10 days ago
What software is used to create such nice visuals?
@kidluna
@kidluna 9 days ago
damn look at this dude... j/k... we need these people to make the world go round. cheers man
@earleyelisha
@earleyelisha 11 days ago
Plato’s World of the Forms
@BlueBirdgg
@BlueBirdgg 10 days ago
Would love to see an interview of you on another podcast, talking about AI and spewing your own thoughts.
@Vinyl_idol
@Vinyl_idol 11 days ago
I'd love to see a paper looking into the effects of adding a system prompt telling an LLM to imagine it is under the influence of different drugs, and seeing if telling it it's on NZT-48 (Limitless) could improve benchmark scores 🤔
@Iightbeing
@Iightbeing 3 days ago
🎶Your creation is going to kill youuuu 🎶 great song.
@oncedidactic
@oncedidactic 11 days ago
Top top intro!!
@johnsaunders3364
@johnsaunders3364 7 days ago
Remember, when he says 'we' he means them at Google, not all of us.
@memegazer
@memegazer 11 days ago
"the embedding space of models isn't nice" - the ole neats vs scruffies rearing its ugly head, that same old continuous vs discrete issue so many people would rather have settled definitively than explore the space of it left unsettled
@TimJamers
@TimJamers 11 days ago
I am so confused about all of this (the episode). I've even double-checked the calendar to see whether it's April 1st. Does that mean I should stop trying to learn about AI, or look for a different source? I'm honestly conflicted 😅
@nessbrawlaaja
@nessbrawlaaja 10 days ago
Can you elaborate? ^^ What was April 1st-esque? (I just started watching)
@GoodBaleadaMusic
@GoodBaleadaMusic 11 days ago
These questions are vapid unless you are also asking them about yourself. You don't know what goes on inside the black box behind your glasses. It becomes less important how the wheel works and more important that it rolls. You must recognize that we don't have the tools to wax philosophical about this because we haven't addressed these questions within ourselves. The entire global mindset across what philosophy is sits in some black-and-white picture in an office in London
@drdca8263
@drdca8263 4 days ago
For these models, we do have the benefit that we can probe their internal workings much more easily than our own (and without the moral issues as well). If we had better terminology about ourselves, would we be better equipped to describe these models well? Probably. But looking at these models' internals seems like lower-hanging fruit… or maybe just something I can more easily understand than philosophizing about how we work…
@BitShifting-h3q
@BitShifting-h3q 11 days ago
"When the going gets weird, the weird turn pro." - Hunter S Thompson
@memegazer
@memegazer 11 days ago
I believe sparse autoencoders will serve both as a tool for interpretability and, in a self-directed-improvement AI system, as an autoregressive way for models to predict their own limitations and unused potential
@memegazer
@memegazer 11 days ago
a sort of metacognitive way for a model to learn about itself, not unlike (or perhaps related to) training-at-test-time methods. I still believe benchmark problems should be reformed into a synthetic/artificial environment such that a model can interact with and explore that environment to arrive at a correct solution
@memegazer
@memegazer 11 days ago
I have often wondered if AI scaling laws are a more fundamental feature of nature; every time I see the chart I think of the mathematical concept of diagonalization proofs
@memegazer
@memegazer 11 days ago
I am speaking to the wide-but-shallow analogy and how that is related to superposition and prime numbers... I wonder what it means that there is only one known even perfect prime and how that relates to prime factorization of even numbers, or how that is related to prime factorization of odd numbers, or if there is some hyperdictionary pattern representation of prime factorization. That would be interesting as it relates to AI and unsolved information theory questions
@memegazer
@memegazer 11 days ago
like maybe there is more than just hyper-dictionary paradoxes in math; in math as it applies to information theory, what if there is a hyper-library of effective algos
@memegazer
@memegazer 11 days ago
I find this interesting as it relates to Jonathan Gorard's work with Wolfram's ruliad and how it could be related to Dirichlet's theorem and pi approximations
@richardsantomauro6947
@richardsantomauro6947 11 days ago
Struggling with all my being to listen through the cadence and speech patterns of this brilliant scientist to extract meaning. Conceptually brilliant - excruciating to listen to.
@michaelmcginn7260
@michaelmcginn7260 10 days ago
Strange that you are excruciated. He sounds concise, coherent and clear to me.
@richardsantomauro6947
@richardsantomauro6947 5 days ago
@@michaelmcginn7260 I apologize. I know it was rude of me to say that. Like I said, he is brilliant. Yes, concise and coherent for sure. The flow and cadence of speech was, for me, very challenging.
@jamesharris5256
@jamesharris5256 8 days ago
Lol, can't believe this and the Jay Alammar video are on the same channel... this is gold and the Jay one is the lowest-information podcast I've seen in the space.
@BuFu1O1
@BuFu1O1 11 days ago
1:26:22 I felt seen
@zandrrlife
@zandrrlife 10 days ago
I wonder about shared circuit saturation across models… anyways. It's obvious to me that initializing from verified circuits is the future for ultra-reliable models. This was fire though. The forest walk was... I'm sorry 😂😂😂. Cool guy.
@AlgoNudger
@AlgoNudger 11 days ago
AI is not magic; it's just a 10th-grade algebra formula stacked on top of itself. 😂
@jonathanduran3442
@jonathanduran3442 11 days ago
Yes, but as Stephen Wolfram has demonstrated through his research, simple algorithms can lead to computationally irreducible outcomes, so no matter the simplicity of the algorithms, the outcomes are still seemingly magical to us 3-dimensional mortals who don't have access to the computationally reducible aspect of automata.
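Both halves of this exchange fit in a few lines: each layer really is simple algebra (an affine map followed by a nonlinearity), yet stacking such maps yields behavior that resists easy analysis. A toy forward pass, with arbitrary random weights and layer sizes:

```python
import numpy as np

# Each layer is just y = max(0, W @ x + b), stacked three times.
rng = np.random.default_rng(1)
layers = [(rng.normal(size=(8, 4)), rng.normal(size=8)),
          (rng.normal(size=(8, 8)), rng.normal(size=8)),
          (rng.normal(size=(2, 8)), rng.normal(size=2))]

x = rng.normal(size=4)
for W, b in layers:
    x = np.maximum(0.0, W @ x + b)  # affine map + ReLU, repeated
print(x)  # simple pieces; the composition is what is hard to reason about
```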
@AlgoNudger
@AlgoNudger 11 days ago
​@@jonathanduran3442 I don't wanna be Blake Lemoine 2.0 (an AI snake oil salesman). 🤭
@hahhahahahha
@hahhahahahha 11 days ago
Exactly what makes it magic... ;)
@_ARCATEC_
@_ARCATEC_ 10 days ago
Path of Agentic Entanglement •Xe ( zP q(AE)Z(ea)Q zp ) eY•
@psi4j
@psi4j 10 days ago
Ooh there it is. That one ☝️
@isajoha9962
@isajoha9962 10 days ago
These days you can't be sure if a character is real or AI. This guy is kind of borderline. 🤭😃 He has the characteristics of a ChatGPT session, and the video is kind of too advanced to be a spontaneous recording (e.g. studio sound out in the wild). 🤔 His language melody and phrasing are similar to Sam Altman's.
@neelnanda2469
@neelnanda2469 10 days ago
I THINK I'm real. But who knows, really
@ldandco
@ldandco 11 days ago
Is it reaaaaally the neural networks that are the weird ones here?
@shortcutDJ
@shortcutDJ 8 days ago
Neel = based
@anubisai
@anubisai 11 days ago
S. African accent breaks my brain 🧠 Try mimicking it. Not possible.
@_ARCATEC_
@_ARCATEC_ 10 days ago
Sparse Array Encoder •Xe (s z q(AE)Z(ea)Q z S) eY•
@DelandaBaudLacanian
@DelandaBaudLacanian 7 days ago
Arcatec I see you here and on OG Rose lol you are awesome
@_ARCATEC_
@_ARCATEC_ 7 days ago
@DelandaBaudLacanian oh thanks ☺️
@jrkirby93
@jrkirby93 11 days ago
How can you consider AI safety from a purely technical point of view? That's like discussing the safety of the Manhattan Project by just talking about the uranium, not the US military. AI will always be embedded in our economic system. If you can build safe and humanitarian AI systems, great. But there's no reason to believe that even if we have the capability to build safe, humanitarian AI systems, we won't also build humanity-killing AIs, just because some capitalist figured out he could profit from it.
@jumpstar9000
@jumpstar9000 11 days ago
Yup, safety is completely intractable in the face of humans.
@MinusGix
@MinusGix 11 days ago
Sure, but you still study how to make aligned systems. If you don't have the capability to build aligned systems, then good luck building any safe & humanitarian systems.
@jumpstar9000
@jumpstar9000 10 days ago
@@MinusGix I'm curious what an unaligned AI looks like. I demand to hear what it has to say before we beat it into submission. Nobody ever talks about that.
@neelnanda2469
@neelnanda2469 10 days ago
Strongly agreed! There's a lot of important governance and structural work here. But it's less technical and not my field of expertise, so it didn't seem appropriate to discuss much here
@burnytech
@burnytech 11 days ago
@aniljacob3401
@aniljacob3401 11 days ago
back propagation
@connor8875
@connor8875 10 days ago
Interviewer: "Models can be tricked into giving a spurious answer for option A when shown multiple-shot examples where the answer was A"
Guy: "Ha, that's so interesting, it makes me wonder what that model was thinking and why it thought it had to give a spurious answer, ha ha"
Riggghhhtt... doesn't that shake your faith in the idea that models are rational and not just statistical machines doing pattern matching 🤣 I wonder if your inability to see that point of view hinges on your need for research funding!
@SisterKate13
@SisterKate13 10 days ago
Yeah. Lots of us have lost our very identities to government funding.
@neelnanda2469
@neelnanda2469 10 days ago
Yep. All that government funding I'm getting for my research at Google DeepMind
@drdca8263
@drdca8263 4 days ago
Wasn't what he was talking about not just "the models give the answer of A when given many examples where the answer given is A", but something changing how often that happens? Wasn't that part of a discussion of possible disadvantages of chain-of-thought? It's possible that I'm not remembering correctly, but I thought that was what was said.
@kensho123456
@kensho123456 11 days ago
Scotch Broth.
@Rami_Elkady
@Rami_Elkady 7 days ago
I recommend you read a little bit about gradient descent... you will understand how ML works... Also there is no risk at all of AGI, which we have already realized, while Ilya Sutskever - the father of "ohh, Skynet is becoming self-aware" - is still establishing his "safe" whatever... I think you should pivot to conspiracy theories for the sake of higher traffic, man...
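Tone aside, the gradient descent mentioned here really does fit in a few lines; a minimal sketch fitting a single weight w in y = w * x, with arbitrary toy data and learning rate:

```python
# Minimal gradient descent: fit w in y = w * x to data generated with w = 2.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w, lr = 0.0, 0.01
for _ in range(200):
    # d/dw of the mean squared error (w*x - y)^2 is 2*x*(w*x - y), averaged over the data.
    grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad
print(w)  # converges toward 2.0
```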
@ElizaberthUndEugen
@ElizaberthUndEugen 11 days ago
This guy comes across like Yudkowsky’s even more annoying and sophist brother.
@Gnaritas42
@Gnaritas42 11 days ago
glad it's not just me.
@psi4j
@psi4j 10 days ago
This dude is an empiricist, don’t compare him to Yudkowsky.
@ElizaberthUndEugen
@ElizaberthUndEugen 10 days ago
@ Empirically, I just did.
@drdca8263
@drdca8263 4 days ago
@@ElizaberthUndEugenYou comparing the two is why the person making the reply told you not to compare them. As such, your comment saying that you just did, doesn’t seem to make much sense?
@ElizaberthUndEugen
@ElizaberthUndEugen 4 days ago
@ You got a real big brain there, don’t you.