WE MUST ADD STRUCTURE TO DEEP LEARNING BECAUSE...

  73,100 views

Machine Learning Street Talk

1 day ago

Dr. Paul Lessard and his collaborators have written a paper on "Categorical Deep Learning and Algebraic Theory of Architectures". They aim to make neural networks more interpretable, composable and amenable to formal reasoning. The key is mathematical abstraction, as exemplified by category theory - using monads to develop a more principled, algebraic approach to structuring neural networks.
We also discussed the limitations of current neural network architectures in terms of their ability to generalise and reason in a human-like way. In particular, the inability of neural networks to do unbounded computation equivalent to a Turing machine. Paul expressed optimism that this is not a fundamental limitation, but an artefact of current architectures and training procedures.
We also talked about the power of abstraction - allowing us to focus on the essential structure while ignoring extraneous details. This can make certain problems more tractable to reason about. Paul sees category theory as providing a powerful "Lego set" for productively thinking about many practical problems.
Towards the end, Paul gave an accessible introduction to some core concepts in category theory like categories, morphisms, functors, monads etc. We explained how these abstract constructs can capture essential patterns that arise across different domains of mathematics.
Paul is optimistic about the potential of category theory and related mathematical abstractions to put AI and neural networks on a more robust conceptual foundation to enable interpretability and reasoning. However, significant theoretical and engineering challenges remain in realising this vision.
Please support us on Patreon. We are entirely funded from Patreon donations right now.
/ mlst
If you would like to sponsor us, so we can tell your story - reach out on mlstreettalk at gmail
Links:
Categorical Deep Learning: An Algebraic Theory of Architectures
Bruno Gavranović, Paul Lessard, Andrew Dudzik,
Tamara von Glehn, João G. M. Araújo, Petar Veličković
Paper: categoricaldeeplearning.com/
Symbolica:
/ symbolica
www.symbolica.ai/
Dr. Paul Lessard (Principal Scientist - Symbolica)
/ paul-roy-lessard
Neural Networks and the Chomsky Hierarchy (Grégoire Delétang et al)
arxiv.org/abs/2207.02098
Interviewer: Dr. Tim Scarfe
Pod: podcasters.spotify.com/pod/sh...
Transcript:
docs.google.com/document/d/1N...
More info about NNs not being recursive/TMs:
• Can ChatGPT Handle Inf...
Geometric Deep Learning blueprint:
• GEOMETRIC DEEP LEARNIN...
TOC:
00:00:00 - Intro
00:05:07 - What is the category paper all about
00:07:19 - Composition
00:10:42 - Abstract Algebra
00:23:01 - DSLs for machine learning
00:24:10 - Inscrutability
00:29:04 - Limitations with current NNs
00:30:41 - Generative code / NNs don't recurse
00:34:34 - NNs are not Turing machines (special edition)
00:53:09 - Abstraction
00:55:11 - Category theory objects
00:58:06 - Cat theory vs number theory
00:59:43 - Data and Code are one and the same
01:08:05 - Syntax and semantics
01:14:32 - Category DL elevator pitch
01:17:05 - Abstraction again
01:20:25 - Lego set for the universe
01:23:04 - Reasoning
01:28:05 - Category theory 101
01:37:42 - Monads
01:45:59 - Where to learn more cat theory

Comments: 249
@deadeaded (1 month ago)
I'm slightly embarrassed at how excited I got when I saw the natural transformation square in the thumbnail...
@MDNQ-ud1ty (1 month ago)
You shouldn't, it is not a natural transformation square.
@deadeaded (1 month ago)
@@MDNQ-ud1ty What do you mean? It sure looks like a naturality square to me.
@tomhardyofmaths2594 (1 month ago)
Right?? I was like 'Oooh what's this?'
@PlayerMathinson (1 month ago)
@@MDNQ-ud1ty Yes, that is the natural transformation square. Alpha is a natural transformation and F is a functor.
@MDNQ-ud1ty (1 month ago)
@@PlayerMathinson If you interpret it that way, ok then. But that is what mathematicians call a commuting square, not a natural transformation square. A natural transformation, while being what you said (a map between two functors), is written differently (a double arrow between two functor arrows, a globular 2-morphism). The way I see it, the square is just the components of a natural transformation... at least potentially, since of course we have to guess exactly what the other symbols mean. See ncatlab.org/nlab/show/natural+transformation: the first diagram is the natural transformation; the second is the commuting *square* (which looks like where he copied it from, so to speak), and that one is about the components. The reason the square, in my mind, is not technically a natural transformation is that a natural transformation requires it to be true for all morphisms, hence the different notation. Basically the square is a commuting square (assuming things commute) in the functor category. That may or may not be a component of some natural transformation (there may be no natural transformation between F and G). So to call it a natural transformation seems to me a bit loose with terminology.
@johntanchongmin (1 month ago)
I like Dr. Paul's thinking - clear, concise and very analytical. LLMs don't reason, but they can do some form of heuristic search. When used on some structure, this can lead to very powerful search over the structure provided and increase reliability.
@andersbodin1551 (1 month ago)
More like some kind of compression of training data
@jabowery (1 month ago)
Removing the distinction between a function and data type is at the heart of Algorithmic Information. AND gee guess what? That is at the heart of Ockham's Razor!
@stretch8390 (1 month ago)
I haven't encountered this before so I have a basic question: in what way is removing the distinction between function and data type different from having first-class functions?
@walidoutaleb7121 (1 month ago)
@@stretch8390 No difference, it's the same thing. In the original SICP lectures they are talked about interchangeably.
@stretch8390 (1 month ago)
@@walidoutaleb7121 Thanks for that clarification.
@jabowery (1 month ago)
@@stretch8390 Think about 0 argument functions (containing no loops and that call no other functions) as program literals. The error terms in Kolmogorov Complexity programs (the representation of Algorithmic Information) are such functions.
@luisantonioguzmanbucio245 (1 month ago)
Yes! In fact, in typed lambda calculus and other type systems, e.g. the Calculus of Inductive Constructions and so on, functions have a type. Some of these type systems also serve as a foundation of mathematics, including Homotopy Type Theory, discussed in the video.
@AliMoeeny (1 month ago)
Yet another exceptionally invaluable episode. Thank you Tim
@derricdubois1866 (23 days ago)
The point of abstraction is to enable one to achieve a view of some particular forest by avoiding being blinded to such by the sight of some trees.
@thecyberofficial (1 month ago)
As an abstract handle theorist, everything is my nail, my screw, my bolt, ... :) Often, the details thrown away by categorisation are exactly what matters, otherwise you just end up working with the object theory in the roundabout Cat (or Topoi) meta-language.
@radscorpion8 (23 days ago)
YOU THINK YOU'RE SOOOO SMART....and you probably are
@MDNQ-ud1ty (8 days ago)
Details matter. Without details there isn't anything. No one is throwing out details in abstraction, they are abstracting details. That is, generalizing and finding the common representation for what generates the details, or how to factor them into common objects that are general. Category theory isn't really anything special in the sense that humans have been doing "category theory" for thousands of years. What makes formal category theory great is that it gives the precise tools/definitions to deal with complexity. I'm really only talking about your use of the words "throw away", as it has connotations that details don't matter when, in fact, details matter. One of the biggest problems in complexity is being able to operate at the right level of detail at the right time while not losing other levels of detail. When you lose "detail" you can't go back (non-invertible). Because mathematics relies so heavily on functions, and functions are usually non-injective, this creates loss of detail (two things being merged into one thing without a way to "get back"). This can be beneficial, given finite time and resources, if one can precisely "throw away" the detail one doesn't need, but usually if one has to "get back" it becomes an intractable problem or much more complicated. I think the main benefit of modern category theory is that it makes precise how to think about things, rather than having that vague idea that there is a "better way" but not really understanding how to go about doing it. In fact, much of formal category theory is simply dealing with representations. So many things exist in our world (so many details) that are really just the same thing. Being able to determine such things in a formal process makes life much easier, especially when the "objects" are extremely complex. Category theory effectively allows one to treat every layer of complexity as the same (the same tools work at every layer).
@andrewwalker8985 (1 month ago)
How many people started watching this and feel like your passion for AI somehow tricked you into getting a maths degree
@Walczyk (22 days ago)
I got my degree before AI, so no, but I'm more interested now in algebraic geometry.
@captainobvious9188 (21 days ago)
I almost finished my degree in Math back in the 2000s for this reason, but medically got derailed and never made it back. I hope to get back someday!
@KunjaBihariKrishna (19 days ago)
"passion for AI" I just vomited
@andrewwalker8985 (19 days ago)
@@KunjaBihariKrishna lol fair enough
@aitheignis (1 month ago)
This is an amazing video. I really love this tape. The idea about building a formal language based on category theory to reason about some systems isn't limited to just applications in neural networks, for sure. I can definitely see this being used in gene regulatory pathways. Thank you for the video, and I will definitely check out the paper.
@erikpost1381 (1 month ago)
For sure. I don't know anything about the domain you mentioned other than that it sounds interesting, but you may be interested to have a look at the AlgebraicJulia space.
@jumpstar9000 (1 month ago)
With regard to inscrutability around the 26 minute mark. My personal feeling is that the issue we face is with overloading of models. As an example, let's take an LLM. Current language models take a kitchen sink approach where we are pressing them to generate both coherent output and also apply reasoning. This doesn't really scale well when we introduce different modalities like vision, hearing or the central nervous system. We don't really want to be converting everything to text all the time and running it through a black box. Not simply because it is inefficient, but more that it isn't the right abstraction. It seems to me we should be training multiple models as an ensemble that compose from the outset, where we have something akin to the pre-frontal cortex that does the planning in response to stimuli from other systems running in parallel. I have done quite a bit of thinking on this and I'm reasonably confident it can work. As for category theory and how it applies: if I squint I can kind of see it, but mostly in an abstract sense. I have built some prototypes for this that I guess you could say were type-safe and informed by category theory. I can see it might help to have the formalism at this level to help with interpretability (because that's why I built them). Probabilistic category theory is more along the lines of what I have been thinking.
@tomaszjezak0 (1 month ago)
Would love to hear more about the brain approach
@chrism3440 (12 days ago)
The concept of orchestrating multiple specialized models is intriguing and aligns with distributed systems' principles, where modularity and specialization reign. Hierarchical orchestration could indeed create an efficient top-down control mechanism, akin to a central nervous system, facilitating swift decision-making and prioritization. However, this might introduce a single point of failure and bottleneck issues. On the other hand, a distributed orchestration approach, inspired by decentralized neural networks, could offer resilience and parallel processing advantages. It encourages localized decision-making, akin to edge computing, allowing for real-time and context-aware responses. This could also align with principles of category theory, where morphisms between different model outputs ensure type safety and functional composition. Yet, I wonder if a hybrid model might not be the most robust path forward. This would dynamically shift between hierarchical and distributed paradigms based on the task complexity and computational constraints, possibly guided by meta-learning algorithms. Such fluidity might mirror the brain's ability to seamlessly integrate focused and diffused modes of thinking, leading to a more adaptable and potentially self-optimizing system. The implications for AI ethics and interpretability are profound. A hybrid orchestration could balance efficiency with the robustness of diverse inputs, potentially leading to AI systems whose decision-making processes are both comprehensible and auditable. Probabilistic category theory might play a vital role in this, offering a mathematically grounded framework to manage the complexity inherent in such systems.
@adokoka (1 month ago)
I believe Category Theory is the route to uncover how DNN and LLM work. For now, I think of a category as a higher level object that represents a semantic or topology. Imagine how lovely it would be if LLMs could be trained on categories possibly flattened into bytes.
@MikePaixao (1 month ago)
Nah, number theory and fractal logic is where it's at :)
@adokoka (1 month ago)
@@MikePaixao It depends on the application.
@blackmail1807 (1 month ago)
Category theory isn’t a route to anything, it’s just the language of modern math. You can do whatever you want with it.
@grivza (1 month ago)
@@blackmail1807 You are ignoring the role of language in leading your prospective formulations. For a naive example, try doing some calculations using Roman numerals.
@MikePaixao (1 month ago)
@@adokoka The problem with always relying on other people's theories is that you basically dead-end your own creativity. My solutions to AI have ended up looking like bits and pieces of a multitude of theories, but you honestly don't need any math or knowledge of existing models. By recreating or reverse engineering reality as a ground truth, you skip all the existing biases and limitations of existing solutions 🙂 I like to solve problems to truly understand why they behave the way they do. I ask myself "why is q* efficient?", "do you know why converting to -101 can recreate 16-bit float models' precision?" I discovered all those systems last year when I reverse engineered how NeRFs and GPT think and see the world -> then did my own interpretation afterwards 🙃
@oncedidactic (1 month ago)
Great stuff! I enjoyed Paul’s way of talking about math - first the precise definition and then why do we care, part by part. Good work dragging it out until the pump primed itself 😅
@Daniel-Six (23 days ago)
This was an incredibly good discussion. Tim and company are definitely on to something elusive to articulate but crucial to appreciate regarding the real limitations of current machine "intelligence," and I can at least vaguely fathom how this will be made clear in the coming years.
@jonfe (24 days ago)
The guy talking about external read/write memory for improving AI is right, in my opinion. I was thinking exactly the same and have been developing a model that has a kind of memory for a time-series problem, getting a lot of improvement in predictions.
@consumeentertainment9310 (1 month ago)
Amazing Talk!!
@mapleandsteel (29 days ago)
Claude Lévi-Strauss finally getting the respect he deserves
@asdf8asdf8asdf8asdf (1 month ago)
Dizzying abstract complexity surfing on a sea of reasonable issues and goals.
@u2b83 (1 month ago)
40:03 This is why I suspect NNs operated iteratively produce better results (e.g. stable diffusion, step-by-step reasoning, etc...). However, finite recursion appears to be good enough in practice. In SAT problems you can pose recursive problems by unrolling the recursion loop, enabling you to prove properties of programs up to a certain size.
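The unrolling trick in the comment above can be made concrete with a toy example. Here is a minimal Python sketch (my own illustration, not from the video) of replacing a recursive definition with a fixed-depth unrolled loop; bounded model checkers do essentially this before handing the result to a SAT solver, which is why the guarantees only hold up to a chosen bound:

    # Unroll a recursive definition to a fixed depth k; beyond k we give up,
    # which is exactly the "properties proved up to a certain size" caveat.
    def fact_recursive(n):
        return 1 if n <= 1 else n * fact_recursive(n - 1)

    def fact_unrolled(n, k=8):
        acc = 1
        for _ in range(k):          # k copies of the loop body, no recursion
            if n <= 1:
                return acc
            acc *= n
            n -= 1
        raise ValueError("input exceeds the unrolling bound k")

    assert fact_unrolled(5) == fact_recursive(5) == 120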
@hi-literyellow4483 (12 days ago)
The British engineer is spot on. Respect, sir, for your clear vision and clarification of the BS sold by Google marketeers.
@pierredeloince9073 (1 month ago)
Thank you, how interesting 🤝
@davidrichards1302 (1 month ago)
Should we be thinking about "type adaptors"? Or is that too object-oriented?
@jonfe (24 days ago)
Reasoning for me is like having a giant graph of "things" or "concepts" in your brain, learning the relationships between them through experience. For example, you can relate parts of an event to a different one just by finding correlations in the relationships between their internal parts, and by doing that you can pass the learning of one event to the other.
@sirkiz1181 (23 days ago)
Yeah, which makes sense considering the structure of your brain. This sort of structuring is clearly the way forward, but as a newcomer to AI it's unclear to me how easy it is for AI and computers to understand concepts in the way that is so intuitive for us, and what kind of program would make that sort of understanding and subsequent reasoning possible.
@AutomatedLiving09 (1 month ago)
I feel that my IQ increases just by watching this video.
@debunkthis (1 month ago)
It didn’t
@bwhit7919 (1 month ago)
Damn, this is the first podcast I couldn't just leave on 2x speed. Edit: nvm, it was just the first 5 min.
@user-wv9pw9tq1g (1 month ago)
Great discussion that gets into the weeds. Love the software engineer’s point of view. Only thing missing from Dr. Lessard is an ascot and a Glencairn of bourbon - because he wouldn’t dare sip Old Rip Van Winkle from a Snifter.😂
@mrpocock (23 days ago)
I kind of feel machine learning has a few foundational issues that you can only brute-force your way past for so long. 1) As they say, there's no proper stack mechanism, so there are whole classes of problems that it can't actually model correctly but can only approximate special cases of. 2) The layers of a network build up to fit curves, but there's no proper way to extract equations for those curves and then replace that subnet with that equation, including flow-control. So we are left with billions of parameters that are piece-wise fitting products and sine waves and exponentials and goodness knows what as complex sums of sums.
@2bsirius (1 month ago)
All they need is a membership card for admission to Jorge Borges' infinite library. I'm sure the resolution to this riddle is in one of the books in there somewhere.
@srivatsasrinivas6277 (1 month ago)
I'm skeptical about composability explaining neural networks because small neural networks do not show the same properties as many chained together. Composability seems like a useful tool once the nets you're composing are already quite large. I think that the main contribution of category theory will be providing a dependent type theory for neural net specification. The next hype in explainable AI seems to come from the "energy based methods".
@wanfuse (7 days ago)
What an education one gets watching you guys! Thanks! On the stopping condition, why not stop at a proximity distance from the stop condition instead of exactly on it? Trying iteratively can tell you what the limit of proximity is?
@erikowsiak (1 month ago)
I love your podcasts it seems you get all the right people to talk to :) just when I needed it :)
@colbynwadman7045 (1 month ago)
They should stop interrupting the speaker with random questions since it’s super annoying.
@darylallen2485 (1 month ago)
1:57 - It's been several years since I took calculus, but I remember being exposed to some functions that calculated the area of a shape where the domain of the function was negative infinity to positive infinity, but the area was a finite number. Mathematically, it seems it should be possible to achieve finite solutions with infinite inputs.
@lobovutare (1 month ago)
Gabriel's horn?
@darylallen2485 (1 month ago)
@@lobovutare That's certainly one example.
@dhruvdatta1055 (25 days ago)
In my opinion, the curve shape function that we integrate can be considered a single input.
@chadx8269 (26 days ago)
Professor Van Nostram do you allow questions?
@jumpstar9000 (1 month ago)
On the recursion topic, there was a little bit of confusion in the discussion. On the one hand there was something about language models not understanding recursion, but more key, they have trouble using recursion while producing output. Clearly LMs can write recursive code and even emulate it to some degree. In any case, it is possible to train an LM with action tokens that manipulate a stack in a way resembling FORTH and get full recursion. It may be possible to add this capability as a bolt-on to an existing LM via fine-tuning. Having this would expand capabilities no end, providing not just algorithmic execution but also features like context consolidation and general improvements to memory, especially if you also give them a tuple store where they can save and load state... yes, exactly, you said it.
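The "action tokens that manipulate a stack" idea in the comment above can be sketched in a few lines of Python. This is a hypothetical illustration, not an existing API: the token names and the fake_lm stand-in are made up, and a real version would fine-tune a model to emit such tokens.

    # Hypothetical FORTH-like action tokens interpreted outside the model.
    def fake_lm(prompt, step):
        # stand-in for a fine-tuned LM that emits one action token per step
        script = ["PUSH 2", "PUSH 3", "ADD", "EMIT", "HALT"]
        return script[step]

    def run(prompt, max_steps=32):
        stack, out = [], []
        for step in range(max_steps):
            tok = fake_lm(prompt, step)
            if tok.startswith("PUSH"):
                stack.append(int(tok.split()[1]))
            elif tok == "ADD":
                stack.append(stack.pop() + stack.pop())
            elif tok == "EMIT":
                out.append(stack[-1])
            elif tok == "HALT":
                break
        return out

    print(run("2 + 3 ="))    # [5]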
@deadeaded (1 month ago)
Being able to write recursive code is totally irrelevant to what's going on under the hood. That's true in general. GPT can write the rules of chess, for example, but it cannot follow them. Don't be fooled into thinking that LLMs understand their output.
@jumpstar9000 (1 month ago)
@@deadeaded Yes, I was pointing out that there was initially some confusion in the discussion with regard to this.
@SLAM2977 (1 month ago)
This looks like very early stage academic research with very low prospects of returns in the near/mid term; surprised that somebody was willing to put their money into it. Very interesting, but too academic for a company. All the best to the guys.
@alelondon23 (1 month ago)
What makes you think the returns are so far away? Let me remind you "Attention Is All You Need" was a single paper that triggered all these APPARENT (and probably not scalable) AI capabilities producing real returns.
@SLAM2977 (1 month ago)
@@alelondon23 there is no tangible evidence of it being applicable in a way that leads to competitive advantage at the moment, "just" a highly theoretical paper. Attention all you need had tangible results that supported the architecture("On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data."), then you can always remind me of a paper nobody cared about and years later was the solution to everything, somebody has to win the lottery...
@SLAM2977 (1 month ago)
Also frankly Google can afford to throw money at anything they want, hoping that among the many results some of their research will hit jackpot.
@JumpDiffusion (1 month ago)
@@alelondon23 The Attention paper had empirical results/evidence, not just architecture…
@eelcohoogendoorn8044 (1 month ago)
'Early stage academic research' is a bit kind, imo. This 'let's just slather it in category theory jargon so we can sound smart' thing isn't exactly a new idea.
@R0L913 (20 days ago)
Not meet they are making mistakes and need fresh input. I am noting all the terms so I can learn. One of my kids is a linguist. Another is a recruiter and must recruit/ find people who can create programming languages that fit. It’s all one exciting thing. Remember Java, remember object oriented programming you’re important keep at it, you may create a breakthrough ❤
@colbynwadman7045 (1 month ago)
Both branches in an if expression in Haskell have to be of the same type. There are no union types like in other languages.
@davidallen5146 (1 month ago)
I think the future of these AI systems should be structured data in and out. This would support the concept of geometric deep learning as well as AI systems that can be more understandable and composable with each other and with traditionally programmed systems. This would also support the generation and use of domain specific interfaces/languages. What they also need is the ability to operate recurrently on these structures. This recurrence can occur internally to the AI systems, or as part of the composition of AIs.
@MrGeometres (10 days ago)
10:04 "Code is Data" is especially clear in Linear Algebra. A vector |v⟩ is data. A function is code. But a vector also canonically defines a linear function: x ↦ ⟨v∣x⟩.
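The duality in the comment above fits in a one-liner; here is a minimal numpy sketch (my own illustration, not from the video):

    # The same array is "data" and, via the inner product, also the linear
    # functional x -> <v|x>.
    import numpy as np

    v = np.array([1.0, -2.0, 3.0])        # data: just three numbers

    def v_as_function(x):                 # code: the linear map defined by v
        return float(np.dot(v, x))

    print(v_as_function(np.array([4.0, 5.0, 6.0])))   # 1*4 - 2*5 + 3*6 = 12.0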
@lincolnhannah2985 (24 days ago)
LLMs store information in giant matrices of weights. Is there any model that can process a large amount of text and create a relational database structure, where the tables and fields are generated by the model as well as the data in them?
@stacksmasherninja7266 (1 month ago)
what was that template metaprogramming hack to pick the correct sorting algorithms? any references for that please? sounds super interesting
@nomenec (1 month ago)
Any chance you can join our MLST Discord (link at the bottom of the description), and send me (duggar) a mention from the software-engineering channel? We can better share and discuss there.
@nomenec (1 month ago)
Not sorting, but here is an example from my recent code of providing two different downsample algorithms based on iterator traits:

    #include <iterator>
    #include <type_traits>

    // random access iterators
    template <typename Iiter, typename Oiter>
    auto downsample (Iiter & inext, Iiter idone, Oiter & onext, Oiter odone)
      -> typename std::enable_if<
           std::is_same< typename std::iterator_traits<Iiter>::iterator_category,
                         std::random_access_iterator_tag >::value,
           void >::type
    {
      // ...
    }

    // not random access iterators
    template <typename Iiter, typename Oiter>
    auto downsample (Iiter & inext, Iiter idone, Oiter & onext, Oiter odone)
      -> typename std::enable_if<
           !std::is_same< typename std::iterator_traits<Iiter>::iterator_category,
                          std::random_access_iterator_tag >::value,
           void >::type
    {
      // ...
    }

For very cool algebraic group examples check out Chapter 16 of "Scientific and Engineering C++: An Introduction With Advanced Techniques and Examples" by Barton & Nackman.
@andreismirnov69 (1 month ago)
The original paper by Stepanov and Lee: citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=658343dd4b5153eb59f834a2ec8d82106db522a8 - later it became known as the STL and ended up as part of the C++ standard library.
@srivatsasrinivas6277 (1 month ago)
I think that specificity is as important as abstraction. Domain specific languages and programs mutually justify each other's existence.
@drdca8263 (22 days ago)
39:28 : small caveat to the “quantum computers can’t do anything a Turing machine can’t do” statement: while it is true that any individual computation that can be done by a quantum computer, can be done with a Turing machine (as a TM can simulate a QC), a quantum computer could have its memory be entangled with something else outside of it, while a Turing machine simulating a quantum computer can’t have the simulated quantum computer’s data be entangled with something which exists outside of the Turing machine. This might seem super irrelevant, but surprisingly, if you have two computationally powerful provers who can’t communicate with each-other, but do have a bunch of entanglement between them, and there is a judge communicating with both of them, then the entanglement between them can allow them to demonstrate to the judge that many computational problems have the answers they do, which the judge wouldn’t be able to compute for himself, and where the numbers of such problems that they could prove* the answer to to the judge, is greatly expanded when by their having a bunch of entanglement between them. MIP* = RE is iirc the result But, yeah, this is mostly just an obscure edge case, doesn’t really detract from the point being made, But I think it is a cool fact 53:18 : mostly just a bookmark for myself, But hm. How might we have a NN implement a FSM in a way that makes TMs that do something useful, be more viable? Like, one idea could be to have the state transitions be probabilistic, but to me that feels, questionable? But like, if you want to learn the FSM controlling the TM by gradient descent, you need to have some kind of differentiable parameters? Oh, here’s an idea: what if instead of the TM being probabilistic, you consider a probability distribution over FSMs, but use the same realization from the FSM throughout? Hm. That doesn’t seem like it would really like, be particularly amenable to things like, “learning the easy case first, and then learning how to modify it to fix the other cases”? Like, it seems like it would get stuck in a local minimum... Hmm... I guess if one did have a uniform distribution over TMs with at most N states, and had the distribution as the parameter, and like, looked at the expected score of the machines sampled from the distribution (where the score would be, “over the training set, what fraction of inputs resulted in the desired output, within T steps”, taking the gradient of that with respect to the parameters (i.e. the distribution) would, in principle, learn the program, provided that there was a TM with at most N states which solved the task within time T.. but that’s totally impractical. You (practically speaking) can’t just simulate all the N state TMs for T steps on a bunch of inputs. There are too many N state TMs. Maybe if some other way of ordering the possible FSMs was such that plausible programs occurred first? Like, maybe some structure beyond just “this state goes to that state”? Asdf.Qwertyuiop. Idk. Hm, when I think about what I would do to try to find the pattern in some data, I think one thing I might try, is to apply some transformation on either the input or the output, where the transformation is either invertible or almost invertible, and see if this makes it simpler? .. 
Hm, if a random TM which always halts is selected (from some distribution), and one is given a random set of inputs and whether the TM accepts or rejects each input, and one's task is to find a TM which agrees with the randomly selected TM on *all* inputs (not just the ones you were told the output for), how much help is it to also be told how long the secret chosen TM took to run, for each of the inputs on which you are told its output? I feel like it would probably help quite a bit?
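One concrete reading of the "distribution over FSMs learned by gradient descent" idea above is to make the transition table itself soft. The sketch below is my own toy illustration (not from the video or the paper): it trains a 2-state probabilistic automaton on the parity-of-ones language with gradient descent in PyTorch.

    # Soft FSM: softmax-ed transition matrices, learned end to end.
    import torch

    torch.manual_seed(0)
    n_states, n_symbols = 2, 2
    trans_logits = (0.1 * torch.randn(n_symbols, n_states, n_states)).requires_grad_()
    accept_logits = (0.1 * torch.randn(n_states)).requires_grad_()
    opt = torch.optim.Adam([trans_logits, accept_logits], lr=0.1)

    def p_accept(bits):
        state = torch.tensor([1.0, 0.0])               # start in state 0
        for b in bits:
            state = state @ torch.softmax(trans_logits[b], dim=-1)
        return (state * torch.sigmoid(accept_logits)).sum()

    # label 1.0 = even number of 1s
    data = [([], 1.0), ([1], 0.0), ([1, 1], 1.0), ([0, 1], 0.0),
            ([0, 1, 1, 0], 1.0), ([1, 0, 1, 1], 0.0)]
    for step in range(300):
        loss = sum((p_accept(x) - y) ** 2 for x, y in data)
        opt.zero_grad(); loss.backward(); opt.step()

    # Loss should end up small if gradient descent found the 2-state parity machine.
    print(float(loss))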
@Walczyk (22 days ago)
22:48 This is exactly how SQL formed!! The earlier structure of absolute trees stopped being practical once databases grew, and industry moved on fast. This will happen here too, for continued progress.
@dr.mikeybee (1 month ago)
Semantic space is a model of human experience. Human experience is a real thing. Therefore the semantic space that is learned by masked learning is a model of a model. What intrigues me is that semantic space has a definite shape. This makes learned semantic spaces -- even in different languages -- similar.
@jumpstar9000 (1 month ago)
I'm only 8 minutes in, but it is making me nervous. The assertion regarding systems that are non-composable breaks down. Lists and Trees arent composable because of their representation paired with the choice of algorithm you are using. We already know that we can flatten trees to lists, or make list like trees.. or more abstractly introduce an additional dimension to both trees and lists that normalize their representation so we can apply a uniform algorithm. If you want to look at it from a different angle, we know that atoms form the basis of machines and therefore atoms have no problem dealing with both trees or lists. 2D images can also represent both data-types. The thing is, we don't walk around the world and change brains when we see an actual tree vs a train. Anyway, like I said I just got going. It is very interesting so far.. I'm sure all will be revealed. onward...
@tonysu8860 (28 days ago)
In the segment NNs are not Turing machines, a lot of discussion seemed to be about how to limit recursive search and possibly that Turing machines are not capable of recursive functionality. I'm not a data scientist but have read the published AlphaZero paper and am somewhat familiar how that technology is implemented in Lc0. I've never looked at how that app terminates search but it's reasonable to assume it's determined by the parameters of gameplay. But I would also assume that limitation can be determined by other means, the observation that a "bright bit" might never light up is true but only if you think in an absolute sense which is generally how engineers think, in terms of precise and accurate results. I'd argue that problems like this requires a change of thinking more akin to quantum physics or economics where accuracy might be desirable if achievable but is more often determined by predominance when the answer is good enough if all the accumulated data and metadata suggests some very high level but not yet exact accuracy. Someone if not the algorithm itself has to set that threshold to illuminate that bright bit to signal the end to search and produce a result.
@your_utube (1 month ago)
In my view, with my limited knowledge, I think that the conversation about quantifying and classifying the primitives of ANNs should have been done by now and at least recording what has already now been learned over the last 2 decades into a format that allows you to merge it with existing systems is a given minimum. I ask myself whether one can explain existing ways to do computation in terms of the primitives of the ANN system that are popular now. In other words can we transform one process into another and back to at least prove what the limits and capabilities of the new ways are in terms of the well-known.
@FranAbenza (1 month ago)
Is human biological machinery better understood as a functional-driven system or OO? Why? from cell to cognition?
@glasperlinspiel (26 days ago)
Read Amaranthine: How to create a regenerative civilization using artificial intelligence
@u2b83 (1 month ago)
34:45 This diagram is really cool. The same simple finite state controller is iterating over different data structures. The complexity of the data structures enables the recognition or generation of different formal language classes. The surprise to me is that we can use [essentially] the same state machine to drive it.
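A toy version of the point in the comment above (my own sketch, not the construction from the paper): one driver loop, one fixed controller rule, and the memory object you plug in decides which language class you can recognize.

    # Same controller, different memory: no memory ~ FSA, a stack ~ PDA.
    class NoMemory:
        def top(self): return None
        def push(self, s): pass
        def pop(self): pass

    class Stack:
        def __init__(self): self.items = []
        def top(self): return self.items[-1] if self.items else None
        def push(self, s): self.items.append(s)
        def pop(self):
            if self.items: self.items.pop()

    def controller(symbol, top):
        # one fixed rule: push on '(', pop on ')'
        return "push" if symbol == "(" else "pop"

    def accepts(string, memory):
        for ch in string:
            if controller(ch, memory.top()) == "push":
                memory.push(ch)
            else:
                if isinstance(memory, Stack) and memory.top() is None:
                    return False            # unmatched ')'
                memory.pop()
        return memory.top() is None         # balanced iff nothing left over

    print(accepts("(())()", Stack()))       # True: the stack gives context-free power
    print(accepts("(()", Stack()))          # False
    print(accepts("(()", NoMemory()))       # True: without memory the driver can't tell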
@alvincepongos (21 days ago)
Say you apply category theory on NNs and you do find a geometric algebra that operationally formalizes the syntax and semantics of the system. Is it possible that the resulting algebra is exactly what's built in, compositions of activated linear equations? If that is the case, no insights are gained. To prevent this problem, how are CT/ML scientists posing the approach such that category theory's insights are deeper than that?
@CharlesVanNoland (1 month ago)
There was a paper about a cognitive architecture that combined an LSTM with an external memory to create a Neural Turing Machine called MERLIN a decade ago. There was a talk given about it over on the Simons Institute's YouTube channel called "An Integrated Cognitive Architecture".
@MachineLearningStreetTalk (1 month ago)
There are a bunch of cool architectures out there to make NNs simulate some TM-like behaviours, but, none are TMs. It's a cool area of research! It's also possible to make an NN which is like a TM which is not possible to train with SGD. I hope we make some progress here. Researchers - take up arms!
@charllsquarra1677 (1 month ago)
@@MachineLearningStreetTalk Why wouldn't it be possible to train with SGD? After all, commands in a TM are finite actions, which can be modelled with a GFlowNet; the only missing piece is an action that behaves as a terminal state and passes the output to a reward model that feeds back into the GFlowNet.
@nomenec (1 month ago)
@@charllsquarra1677 it's more of an empirical finding that as you increase the computational power of NNs, for example the various MANNs (memory augmented NNs), training starts running into extreme instability problems. I.e. we haven't yet figured out how to train MANNs for general purpose that is to search the entire space of Turing Complete algorithms rather than small subspaces like the FSA space. We might at some point, and the solution might even involve SGD. Just, nobody knows yet.
@transquantrademarkquantumf8894 (8 days ago)
Nice edits, great high-speed symmetry.
@max0x7ba (29 days ago)
You don't run an RNN until bit 26 lights up. Rather, you run it until it produces an end-of-input token.
@andreismirnov69 (1 month ago)
would anyone recommend textbook level publications on category theory and homotopy type theory?
@pounchoutz (1 month ago)
Elements of ∞-Category Theory by Emily Riehl and Dominic Verity.
@hnanacc (29 days ago)
Why is nature infinite? What if it's just the same things repeating but with some variance? So a plausible assumption is that there is a large amount of information to be memorized, which needs further scaling, but the model can emulate the variance.
@alexforget (1 month ago)
For sure you are right. Humans don't need to drive for 1000s of hours in each city. But if Tesla has the compute and data, they can always add a new algo and also win.
@tomaszjezak0 (1 month ago)
Regardless, the method is hitting a wall. The next problem will need a better approach, like they talk about
@dr.mikeybee (1 month ago)
We also explicitly create abstractions in transformers. The attention heads are creating new embeddings.
@MachineLearningStreetTalk (1 month ago)
Some abstractions are better than others. So far, we humans are rather good at creating some which machines can't learn. There are things like in-context learning or "algorithmic prompting" (arxiv.org/pdf/2211.09066.pdf) which explicitly code in certain (limited) types of reasoning in an LLM, like for example, adding numbers together out of distribution. If we could get NNs to learn this type of thing from data, that would be an advancement.
@charllsquarra1677 (1 month ago)
@@MachineLearningStreetTalk I'm sure you saw Andrej Karpathy's video about tokenization. TL;DR: tokenization is currently a mess that is swept under the rug; it is very hard for an LLM to properly do math when some multi-digit numbers are single tokens in their corpus.
@MachineLearningStreetTalk (1 month ago)
I agree, tokenisation could be improved, but I don’t think it’s that big of a thing wrt learning to reason
@dr.mikeybee (1 month ago)
@MachineLearningStreetTalk, Yes, we'll keep working on optimizing what we can, including for prompt engineering and injection engineering, I suppose we can view the attention heads as a case of in-context learning as we calculate similarity weights and produce a newly formed calculated context. Of course the projection matrices are also acting as a kind of database retrieval. So here something is learned in the projection matrices that results in many changes to vector values in the context signature. The built (dare I say structured) new embeddings coming out of the attention heads are "decoded" in the MLP blocks for the tasks the MLPs were trained on. Nevertheless, higher level abstractions are being learned in all the differentiable MLP blocks. I don't think that can be denied. All in all, we need to discuss the semantic learning that happens for the embedding model via masked learning. This creates a geometric high-dimensional representation of a semantic space, positional encoding for syntactical agreement, attention heads for calculating similarity scores, projection matrices for information retrieval and filtering, MLPs for hierarchical learning of abstract features and categories, and residual connections for logic filtering. Of course there are many other possible component parts within this symbolic/connectionist hybrid system, since the FFNN is potentially functionally complete, but I think these are the main parts.
@lemurpotatoes7988 (26 days ago)
More intelligent, structured masking strategies would be extremely helpful IMO. I like thinking about generative music and poetry models in this context. Masking a random one out of n notes or measures doesn't necessarily let you learn all the structure that's there.
@MikePaixao (1 month ago)
Too funny. I've been saying transformer models all put infinity in the wrong place 😂 You can get around the finite limit, but not with transformers. I would describe it like a data compression singularity :) 28:06 not that hard once you think about it for a bit, you end up with circular quadratic algebra 🙂 34:54 you can create a Turing machine and get around the Von Neumann bottleneck, then you end up somewhere near my non-transformer model 😊
@cryoshakespeare4465 (1 month ago)
Well, I had a great time watching this video, and considering I can abstract my own experiences into the category of human experiences in general, I'd say most people who watched it would enjoy it too. Thankfully, I'm also aware that my abductive capacities exist in the category of error-prone generalisations, and hence I can conclude that it's unlikely all human experiences of this show can be inferred from my own. While my ability to reason about why I typed this comment is limited, I can, at present, formalise it within the framework of human joke-making behaviours, itself in the category of appreciative homage-paying gestures.
@shahzodadavlatova7203 (26 days ago)
Can you share the Andrej Karpathy talk?
@mobiusinversion (1 month ago)
Apologies for the pragmatism, but is this applicable in any realistic engineering driven effort?
@oncedidactic (1 month ago)
Well I think the jumping off point is expressly to envision what else is needed besides further engineering today’s systems, so the overlap might not be satisfactory. But I’d be interested to hear other takes.
@mobiusinversion (1 month ago)
@@oncedidactic thank you and I understand. I think my question is about assessing the ground truth of the word “needed”. I’m curious where this touches ground with comprehensible needs. What do you mean by needs and what is this addressing?
@patrickjdarrow (9 days ago)
@@mobiusinversion I took it that the work is motivated at least in part by the issues outlined early in the talk: explainability/interpretability, intractable architecture search, instability. These are the issues and the solutions potentially yielded by refounding ML in category theory are the “needs”
@mobiusinversion (7 days ago)
@@patrickjdarrow This sounds like publish-or-perish obligatory elegance. ML models of any reasonable power are non-testable, that's a fact; there are no QA procedures, only KPIs. Similarly, interpretability should be done at the input and output levels along with human-in-the-loop subjective feedback. Personally, I don't see AI and category theory going anywhere outside of Oxford.
@carlosdumbratzen6332 (21 days ago)
As someone who only has a passing interest in these issues (because so far LLMs have not proven to be very useful in my field, except for pampering papers), this was a very confusing watch.
@derekpmoore (24 days ago)
Re: domain specific languages - there are two domains: the problem domain and the solution domain.
@ariaden (24 days ago)
Yeah. Big props to the thumbnail. Maybe I will even watch the video, some time in my future.
@acortis (25 days ago)
mhmm, ... interesting how we can agree on the premises and yet drift apart on the conclusions. I desperately want to be wrong here, but I am afraid my human nature prevents me from trusting anyone who is trying to sell me something for which they do not have the simplest example of an implementation. And here you are going to tell me, "It is their secret sauce, they are not going to tell you that!" ... maybe, and yet I feel like I spent almost two hours of my life listening to a pitch for "Category Theory" whose only implementation is a GOAT, and that does not mean the Greatest of All Theories. ... Again, I would not be happier than to be proved wrong with the most spectacular commercial product of all time! ... oh, almost forgot, great job on the part of the hosts, love the sharp questions!
@markwrede8878 (1 month ago)
We need A Mathematical Model of Relational Dialectical Reasoning.
@CristianGarcia (1 month ago)
Thanks! Vibes from the first 5 mins are that FSD Beta 12 seems to be working extremely well, so the bet against this will have a hard time. Eager to watch the rest.
@MachineLearningStreetTalk (1 month ago)
I've not looked into it recently. I'm sure it's an incredible feat of engineering and may well work in many well-known situations (much like ChatGPT does). Would you trust it with your life though?
@lukahead6 (26 days ago)
At 32:27, Paul's brain lights up so brightly you can see it through his skull. Dude's so electric, electrons be changing orbitals, and releasing photons
@robmorgan1214 (24 days ago)
This focus on abstraction, whether algebraic or geometric, is not the correct approach to this problem. Physicists made the same mistake with geometry.
@Siroitin (1 month ago)
35:45 yeah!
@samferrer (1 month ago)
The real power of category theory is in the way it treats relations, and especially functional relations. Objects are not first class anymore but a mere consequence ... hence the power of "Yoneda". Yet, I don't think there is a programming language that brings the awesomeness of category theory.
@rylaczero3740 (1 month ago)
Imagine imperative programming for a sec, now you know monads.
@dr.mikeybee (1 month ago)
FFNNs are functionally complete, but recursive loops are a problem. They can be solved in two ways however. A deep enough NN can unroll the loop. And multiple passes through the LLM with acquired context can do recursive operations. So I would argue that the statement that LLMs can't do recursion is false.
@MachineLearningStreetTalk (1 month ago)
LLMs can simulate recursion to some fixed size, but not unbounded depth - because they are finite state automatas. I would recommend to play back the segment a few times to grok it. Keith added a pinned note to our discord, and we have discussed it there in detail. This is an advanced topic so will take a few passes to understand. Keith's pinned comment below: "Traditional recurrent neural networks (RNNs) have a fixed, finite number of memory cells. In theory (assuming bounded range and precision), this limits their formal language recognition power to regular languages [Finite State Automata (FSA)], and in practice, RNNs have been shown to be unable to learn many context-free languages ... Standard recurrent neural networks (RNNs), including simple RNNs (Elman, 1990), GRUs (Cho et al., 2014), and LSTMs (Hochreiter & Schmidhuber, 1997), rely on a fixed, finite number of neurons to remember information across timesteps. When implemented with finite precision, they are theoretically just very large finite automata, restricting the class of formal languages they recognize to regular languages." Next, according to Hava Siegelmann herself, who originally "proved" the Turing-completeness of "RNNs"), we have: "To construct a Turing-complete RNN, we have to incorporate some encoding for the unbounded number of symbols on the Turing tape. This encoding can be done by: (a) unbounded precision of some neurons, (b) an unbounded number of neurons , or (c) a separate growing memory module." Such augmented RNNs are not RNNs, they are augmented RNNs. For example, calling a memory augmented NN (MANN) an NN would be as silly as calling a Turing machine an FSA because it is just a tape augmented FSA. That is pure obscurantism and Siegelmann is guilty of this same silliness depending on the paragraph. Distinguishing the different automata is vital and has practical consequences. Imagine if when Chomsky introduced the Chomsky Hierarchy some heckler in the audience was like "A Turing Machine is just an FSA. A Push Down Automata is just an FSA. All machines are FSAs. We don't need no stinking hierarchy!" arxiv.org/pdf/2210.01343.pdf
@dr.mikeybee (1 month ago)
@@MachineLearningStreetTalk LOL! You're funny. I love that "We don't need no stinkin'" line. I loved the movie too. Anyway thank you for the very thoughtful response. I was aware of the limitations of finite systems, but I love how you make this explicit -- also that a growable memory can give the appearance of Turing-completeness. That's a keeper. Language is tough. Because I talk to synthetic artifacts every day, I'm very aware of how difficult it is to condense high-dimensional ideas into a representation that conveys intended ideas and context. And of course decoding is just as difficult. Thanks for the additional context injection. Cheers!
@Dr.Z.Moravcik-inventor-of-AGI (1 month ago)
Guys, it must be hard for you to talk about AGI when it has already been here since 2016.
@nippur_x3570 (25 days ago)
About NNs and Turing completeness: I don't understand why you specifically need read/write memory to have Turing-complete computation. You just need a Turing-complete language like lambda calculus. So I don't see any obstruction for a neural network, with the right framework and the right language (probably using category theory), to do it.
@drdca8263 (22 days ago)
Well, you do need like, unbounded state? But I think they are saying "an FSM that controls a read/write head is sufficient for Turing completeness" more than "that's the only way to be Turing complete"? To put a NN in there, you do need to put reals/floats in there somewhere I think. Idk where you'd put them in for lambda calculus? Like... hm.
@nippur_x3570 (21 days ago)
@@drdca8263 Sorry for the misunderstanding, that's not my point. My point is that the read/write state is probably not the right point of view on the Turing completeness property for this problem. Lambda calculus was just to illustrate my point. You "just" need complete symbolic manipulation in a Turing-complete language for the NN to be Turing complete.
@drdca8263 (20 days ago)
@@nippur_x3570 I think the same essential thing should still apply? Like, in any Turing complete model of computation, there should be an analogy to the FSM part. The NN component will be a finite thing. Possibly it can take input from an unbounded sized part of the state of the computation, but this can always be split into parts of bounded size along with something that does the same thing over a subset of the data, and there will be like, some alternation between feeding things into the neural net components and getting outputs, and using those outputs to determine what parts are next to be used as inputs, right?
@CharlesVanNoland (1 month ago)
Also: Tim! Fix the chapter title, it's "Data and Code are one *and* the same". :]
@MachineLearningStreetTalk (1 month ago)
Done! Thank you sir
@hammerdureason8926 (20 days ago)
On domain specific languages -- hell is (understanding) other people's code, where "other people" includes yourself 6 months ago.
@Walczyk (22 days ago)
we need more cutaways with explainers
@GlobalTheatreSkitsoanalysis (1 month ago)
In addition to Number Theory..any opinions about Group Theory vs Category Theory? And Set Theory vs Category Theory?
@preston3291 (23 days ago)
chill with the sound effects
@EnesDeumic (25 days ago)
Interesting. But too many interruptions, let the guest talk more. We know you know, no need to prove it all the time.
@CybermindForge
@CybermindForge Ай бұрын
@lemurpotatoes7988 (26 days ago)
I don't see why types are more or less of a problem than values of the same type that are very far or different from one another. Suppose that every piece of data that goes down Branch 1 ends up with its output in a chaotic ugly region of a function and every piece of data that goes down Branch 2 ends up in a nice simple region. You can have a function that handles both cases follow, yes, but that's the exact same scenario as writing a function that takes in either lists or trees as its input.
@lemurpotatoes7988 (26 days ago)
I know neither category theory nor functional programming and I didn't grok Abstract Algebra I, I'm just coming at this from an applied math and stats perspective.
@HJ-gg6ju (23 days ago)
What's with the distracting sound effects?
@jondor654 (26 days ago)
Is the thread of intuition the pervasive glue that grounds the formalisms?
@mootytootyfrooty (1 month ago)
yo okay here's what's up: give me compact ONNX binaries but with non-static weights.
@ICopiedJohnOswald (1 month ago)
The part about if statements was very confused. The guy said that if you have an if expression where both branches return type T then you need to return a union of T and T, and that that is not the same as T. This is wrong. If you look at the typing rule for boolean elimination (if expressions) you have:

    Gamma |- t1 : Boolean    Gamma |- t2 : T    Gamma |- t3 : T
    ------------------------------------------------------------
    Gamma |- (if t1 then t2 else t3) : T

In other words, an if expression is well typed if your predicate evaluates to a boolean and both branches return the same type T, and this makes the if expression have type T.
@user-qm4ev6jb7d (1 month ago)
Agreed, putting it in terms of unions from the outset is rather weird. But I can see how one would arrive at a rule where boolean elimination is always in terms of unions. Specifically, if one is approaching it from the side of a language like TypeScript, in which unions are already everywhere. TypeScript's types are weird.
@ICopiedJohnOswald (1 month ago)
@@user-qm4ev6jb7d I can't speak to TypeScript other than to say, yeah, that is probably a bad place to get an intuition for type theory. But talking about union types (NOT disjoint union types), isn't it the case that `Union T T = T`? Regardless, you don't need union types to deal with if expressions. I think the interviewer generally had trouble thinking with types and also was conflating type theory and category theory at times.
@user-qm4ev6jb7d (1 month ago)
@@ICopiedJohnOswald Even if we assume actual union types, not disjoint ones, claiming that Union T T *equals* T is a very strong claim. Not all type theories have such a strong notion of equality. Am I correct that you are looking at it specifically from the perspective of *univalent* type theory?
@MachineLearningStreetTalk (1 month ago)
Sorry, this is not an area of expertise for us but we hope to make more content and explore it further
@ICopiedJohnOswald (1 month ago)
@@user-qm4ev6jb7d Sorry, I was playing fast and loose in YouTube comments, disregard that comment. And no, I'm not taking the perspective of univalent type theory as I am woefully under-read on HoTT.
@jonfe (24 days ago)
maybe we should get back to analog to improve our AI.
@explicitlynotboundby (1 month ago)
Re: Grothendieck's "theory-building for problem solving" (kzbin.info/www/bejne/qJrIXmx3es2Mmrs) - it reminds me of Rob Pike's Rule 5: "Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming."
@mbengiepeter965 (27 days ago)
In the context of Large Language Models (LLMs) like me, the analogy of training data as the program and the neural network as the compiler isn't quite accurate. Instead, you might think of the **training data** as the **knowledge base** or **source code**, which provides the raw information and examples that the model learns from. The **neural network**, on the other hand, functions more like the **processor** or **execution environment** that interprets this data to generate responses or perform tasks. The **training process** itself could be likened to **compilation**, where the model is trained on the data to create a set of weights and parameters that define how it will respond to inputs. This is a bit like compiling source code into an executable program. However, unlike traditional compilation, this process is not a one-time conversion but an iterative optimization that can continue improving over time.
@Hans_Magnusson (24 days ago)
Just the title should scare you
@kiffeeify (19 days ago)
There is a brilliant (and also quite different) talk quite relevant to the stuff discussed around 16:30 from one of the Rust core devs; they call it generic effects of functions. I would love to see a language that supports stuff like this :-) kzbin.info/www/bejne/g4XRepiuidlses0 One effect would be "this method returns", others could be "this method allocates", "this method has constant complexity".
@henryvanderspuy3632 (1 month ago)
this is the way
@sabawalid (1 month ago)
Programs are data, but Data is NOT programming - the data a NN gets will do nothing without the algorithms of SGD and BackProp.
@BuFu1O1 (1 month ago)
"stop manspreading" 25:00 hahaha
@womp6338 (19 days ago)
If you guys are so smart why are you vaccinated?
@felicityc (18 days ago)
I'm not a fan of tuberculosis
@bacon_boat2641 (11 days ago)
Every researcher on planet earth wants the bitter pill to be false. What's more exciting: 1) get more data + GPUs, or 2) engineering smart solutions?
@Wulk (20 days ago)
Bro can build an AI but doesn't know how to turn off Twitch alerts 💀
@felicityc (18 days ago)
You cannot use those sound effects. Please find a new sound library. I'm going to flip.
@rabbitcreative (15 days ago)
Instead of trying to teach machines, we should be teaching people.
@cryptodax6922 (22 days ago)
33:56 Mind-blowing conversation so far.