I like Dr. Paul's thinking - clear, concise and very analytical. LLMs don't reason, but they can do some form of heuristic search. When used on some structure, it can lead to very powerful search over the structure provided and increase their reliability.
@andersbodin15519 ай бұрын
More like some kind of compression of training data
@devlinbowman5251Ай бұрын
I can’t believe it took me so long to discover this video. There are so many things here that make me feel so much less crazy having heard you guys discuss in the way you did. Very enlightening.
@derricdubois18668 ай бұрын
The point of abstraction is to enable one to achieve a view of some particular forest by avoiding being blinded to such by the sight of some trees.
@carlosdumbratzen63328 ай бұрын
As someone who only has a passing interest in these issues (because so far LLMs have not proven to be very useful in my field, except for pampering papers), this was a very confusing watch.
@aitheignis9 ай бұрын
This is an amazing video. I really love this tape. The idea about building formal language based on category theory to reason about some systems isn't limited to just application in neural network for sure. I can definitely see this being used in gene regulatory pathway. Thank you for the video, and I will definitely check out the paper.
@erikpost13819 ай бұрын
For sure. I don't know anything about the domain you mentioned other than that it sounds interesting, but you may be interested to have a look at the AlgebraicJulia space.
@AliMoeeny9 ай бұрын
Yet another exceptionally invaluable episode. Thank you Tim
@kamilziemian9956 ай бұрын
I subscribe to this message.
@jabowery9 ай бұрын
Removing the distinction between a function and data type is at the heart of Algorithmic Information. AND gee guess what? That is at the heart of Ockham's Razor!
@stretch83909 ай бұрын
I haven't encountered this before so have a basic question: in what way is removing distinction between function and data type different to having first class functions?
@walidoutaleb71219 ай бұрын
@@stretch8390 No difference, it's the same thing. In the original SICP lectures they are talked about interchangeably.
@stretch83909 ай бұрын
@@walidoutaleb7121 thanks for that clarification.
@jabowery9 ай бұрын
@@stretch8390 Think about 0 argument functions (containing no loops and that call no other functions) as program literals. The error terms in Kolmogorov Complexity programs (the representation of Algorithmic Information) are such functions.
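A minimal Python sketch of the point in this thread, not taken from the video: functions are passed, stored, and returned like any other value, and a zero-argument function with no loops or calls behaves like a literal. The names `compose` and `constant` are hypothetical helpers chosen for illustration.

```python
from typing import Callable


def compose(f: Callable[[int], int], g: Callable[[int], int]) -> Callable[[int], int]:
    """Return the function x -> f(g(x)); functions go in and a function comes out."""
    return lambda x: f(g(x))


def constant(value: int) -> Callable[[], int]:
    """Wrap a value as a zero-argument function, i.e. a 'program literal'."""
    return lambda: value


increment = lambda x: x + 1
double = lambda x: x * 2

# Functions stored in a list, exactly like any other piece of data.
pipeline = [increment, double, compose(double, increment)]
results = [f(3) for f in pipeline]

# A zero-argument function carries no logic beyond the data it returns.
literal = constant(42)
```

Here `results` holds the outputs of three functions applied to the same input, and `literal()` simply yields the stored value, which is the sense in which such a function is indistinguishable from data.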
@luisantonioguzmanbucio2459 ай бұрын
Yes! In fact, in typed lambda calculus and other type systems (e.g. the Calculus of Inductive Constructions), functions have a type. Some of these type systems also serve as a foundation of mathematics, including Homotopy Type Theory, discussed in the video.
@colbyn-wadman9 ай бұрын
Both branches in an if expression in Haskell have to be of the same type. There are no union types like in other languages.
@Jorn-sy6ho2 ай бұрын
I once asked how much data can be stored in metadata, but these meta databases are a whole new source of knowledge for AI to be trained on. I really love philosophy!
@darylallen24859 ай бұрын
1:57 - It's been several years since I took calculus, but I remember being exposed to some functions that calculated the area of a shape where the domain of the function was negative infinity to positive infinity, but the area was a finite number. Mathematically, it seems it should be possible to achieve finite solutions with infinite inputs.
@lobovutare9 ай бұрын
Gabriel's horn?
@darylallen24859 ай бұрын
@@lobovutare That's certainly one example.
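For reference, two standard worked examples of the phenomenon described above (these are textbook results, not something stated in the video). The Gaussian integral has an unbounded domain and a finite area, and Gabriel's horn, obtained by rotating $y = 1/x$ for $x \ge 1$ about the $x$-axis, has finite volume but infinite surface area:

```latex
\int_{-\infty}^{\infty} e^{-x^{2}} \, dx = \sqrt{\pi}

V = \pi \int_{1}^{\infty} \frac{1}{x^{2}} \, dx
  = \pi \left[ -\frac{1}{x} \right]_{1}^{\infty} = \pi

A = 2\pi \int_{1}^{\infty} \frac{1}{x} \sqrt{1 + \frac{1}{x^{4}}} \, dx
  \;\ge\; 2\pi \int_{1}^{\infty} \frac{dx}{x} = \infty
```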
@dhruvdatta10558 ай бұрын
In my opinion, the curve shape function that we integrate can be considered a single input.
@jumpstar90009 ай бұрын
With regard to inscrutability around the 26 minute mark: my personal feeling is that the issue we face is with overloading of models. As an example, let's take an LLM. Current language models take a kitchen sink approach where we are pressing them to generate both coherent output and also apply reasoning. This doesn't really scale well when we introduce different modalities like vision, hearing or the central nervous system. We don't really want to be converting everything to text all the time and running it through a black box. Not simply because it is inefficient, but more that it isn't the right abstraction. It seems to me we should be training multiple models as an ensemble that compose from the outset, where we have something akin to the pre-frontal cortex that does the planning in response to stimuli from other systems running in parallel. I have done quite a bit of thinking on this and I'm reasonably confident it can work. As for category theory and how it applies: if I squint I can kind of see it, but mostly in an abstract sense. I have built some prototypes for this that I guess you could say were type safe and informed by category theory. I can see it might help to have the formalism at this level to help with interpretability (because that's why I built them). Probabilistic category theory is more along the lines of what I have been thinking.
@tomaszjezak09 ай бұрын
Would love to hear more about the brain approach
@chrism34408 ай бұрын
The concept of orchestrating multiple specialized models is intriguing and aligns with distributed systems' principles, where modularity and specialization reign. Hierarchical orchestration could indeed create an efficient top-down control mechanism, akin to a central nervous system, facilitating swift decision-making and prioritization. However, this might introduce a single point of failure and bottleneck issues. On the other hand, a distributed orchestration approach, inspired by decentralized neural networks, could offer resilience and parallel processing advantages. It encourages localized decision-making, akin to edge computing, allowing for real-time and context-aware responses. This could also align with principles of category theory, where morphisms between different model outputs ensure type safety and functional composition. Yet, I wonder if a hybrid model might not be the most robust path forward. This would dynamically shift between hierarchical and distributed paradigms based on the task complexity and computational constraints, possibly guided by meta-learning algorithms. Such fluidity might mirror the brain's ability to seamlessly integrate focused and diffused modes of thinking, leading to a more adaptable and potentially self-optimizing system. The implications for AI ethics and interpretability are profound. A hybrid orchestration could balance efficiency with the robustness of diverse inputs, potentially leading to AI systems whose decision-making processes are both comprehensible and auditable. Probabilistic category theory might play a vital role in this, offering a mathematically grounded framework to manage the complexity inherent in such systems.
@andrewwalker89859 ай бұрын
How many people started watching this and feel like your passion for AI somehow tricked you into getting a maths degree
@Walczyk8 ай бұрын
i got my degree before ai so no but i’m more interested now in algebraic geometry
@captainobvious91888 ай бұрын
I almost finished my degree in Math back in the 2000s for this reason, but medically got derailed and never made it back. I hope to get back someday!
@KunjaBihariKrishna8 ай бұрын
"passion for AI" I just vomited
@andrewwalker89858 ай бұрын
@@KunjaBihariKrishna lol fair enough
@kingcoherent4 ай бұрын
My passion for maths tricked me into getting a career in AI!
@camellkachour41122 ай бұрын
Paul Lessard comes across as very didactic! It is incredible that category theory could enhance aspects of deep learning!
@alvincepongos8 ай бұрын
Say you apply category theory on NNs and you do find a geometric algebra that operationally formalizes the syntax and semantics of the system. Is it possible that the resulting algebra is exactly what's built in, compositions of activated linear equations? If that is the case, no insights are gained. To prevent this problem, how are CT/ML scientists posing the approach such that category theory's insights are deeper than that?
@Jorn-sy6ho2 ай бұрын
I love how numbers theory isn't that informative to you. I learned a lot of maths at school and never knew what i was doing and why. I'll be researching category theory, sounds interesting!
@Jorn-sy6ho2 ай бұрын
Category theory sounds so nice! real intersectionality on basis of proof
@srivatsasrinivas62779 ай бұрын
I'm skeptical about composability explaining neural networks because small neural networks do not show the same properties as many chained together. Composability seems like a useful tool once the nets you're composing are already quite large. I think that the main contribution of category theory will be providing a dependent type theory for neural net specification. The next hype in explainable AI seems to come from the "energy based methods".
@thecyberofficial9 ай бұрын
As an abstract handle theorist, everything is my nail, my screw, my bolt, ... :) Often, the details thrown away by categorisation are exactly what matters, otherwise you just end up working with the object theory in the roundabout Cat (or Topoi) meta-language.
@radscorpion88 ай бұрын
YOU THINK YOU'RE SOOOO SMART....and you probably are
@MDNQ-ud1ty8 ай бұрын
Details matter. Without details there isn't anything. No one is throwing out details in abstraction, they are abstracting details. That is, generalizing and finding the common representation for what generates the details, or how to factor them into common objects that are general. Category theory isn't really anything special in the sense that humans have been doing "category theory" for thousands of years. What makes formal category theory great is that it gives the precise tools/definitions to deal with complexity. I'm really only talking about your use of the phrase "throw away", as it has connotations that details don't matter when, in fact, details matter. One of the biggest problems in complexity is being able to operate at the right level of detail at the right time while not losing other levels of detail. When you lose "detail" you can't go back (non-invertible). Because mathematics relies so heavily on functions, and functions are usually non-injective, this creates loss of detail (two things being merged into one thing without a way to "get back"). This can be beneficial because of finite time and resources if one can precisely "throw away" the detail one doesn't need, but usually if one has to "get back" it becomes an intractable problem or much more complicated. I think the main benefit of modern category theory is that it makes precise how to think about things rather than having that vague idea that there is a "better way" but not really understanding how to go about doing it. In fact, much of formal category theory is simply dealing with representations. So many things exist in our world (so many details) that are really just the same thing. Being able to go about determining such things in a formal process makes life much easier, especially when the "objects" are extremely complex. Category theory effectively allows one to treat every layer of complexity as the same (the same tools work at every layer).
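The remark about non-injective functions can be made concrete with a small Python sketch. This is only an illustration under my own assumptions: `abstract` and `preimages` are hypothetical names, and parity is just a convenient stand-in for an abstraction that merges inputs.

```python
from collections import defaultdict


def abstract(n: int) -> str:
    """A non-injective map: many integers collapse onto one of two labels."""
    return "even" if n % 2 == 0 else "odd"


def preimages(f, domain):
    """Group the domain by output, showing what each abstract value stands for."""
    groups = defaultdict(list)
    for x in domain:
        groups[f(x)].append(x)
    return dict(groups)


fibres = preimages(abstract, range(6))

# Given only the label "even" there is no way to recover which input produced it:
# the best one can do is return the whole group of candidates.
candidates_for_even = fibres["even"]
```

The abstraction keeps what the inputs have in common and discards which input it was; "getting back" means carrying extra information alongside the label.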
@asdf8asdf8asdf8asdf9 ай бұрын
Dizzying abstract complexity surfing on a sea of reasonable issues and goals.
@stacksmasherninja72669 ай бұрын
what was that template metaprogramming hack to pick the correct sorting algorithms? any references for that please? sounds super interesting
@nomenec9 ай бұрын
Any chance you can join our MLST Discord (link at the bottom of the description), and send me (duggar) a mention from the software-engineering channel? We can better share and discuss there.
@nomenec9 ай бұрын
Not sorting, but here is an example from my recent code of providing two different downsample algorithms based on iterator traits:

    // random access iterators
    template < typename Iiter, typename Oiter >
    auto downsample ( Iiter & inext, Iiter idone, Oiter & onext, Oiter odone )
      -> typename std::enable_if<
           std::is_same<
             typename std::iterator_traits<Iiter>::iterator_category,
             std::random_access_iterator_tag >::value,
           void >::type
    {
      // ...
    }

    // not random access iterators
    template < typename Iiter, typename Oiter >
    auto downsample ( Iiter & inext, Iiter idone, Oiter & onext, Oiter odone )
      -> typename std::enable_if<
           !std::is_same<
             typename std::iterator_traits<Iiter>::iterator_category,
             std::random_access_iterator_tag >::value,
           void >::type
    {
      // ...
    }

For very cool algebraic group examples check out Chapter 16 of "Scientific and Engineering C++: An Introduction With Advanced Techniques and Examples" by Barton & Nackman.
@andreismirnov699 ай бұрын
original paper by Stepanov and Lee: citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=658343dd4b5153eb59f834a2ec8d82106db522a8 Later it became known as STL and ended up as part of C++ std lib
@CharlesVanNoland9 ай бұрын
There was a paper about a cognitive architecture that combined an LSTM with an external memory to create a Neural Turing Machine called MERLIN a decade ago. There was a talk given about it that's over on the Simons Institute's YouTube channel called "An Integrated Cognitive Architecture".
@MachineLearningStreetTalk9 ай бұрын
There are a bunch of cool architectures out there to make NNs simulate some TM-like behaviours, but, none are TMs. It's a cool area of research! It's also possible to make an NN which is like a TM which is not possible to train with SGD. I hope we make some progress here. Researchers - take up arms!
@charllsquarra16779 ай бұрын
@@MachineLearningStreetTalk why wouldn't it be possible to train with a SGD? after all, commands in a TM are finite actions, which can be modelled with a GFlowNet, the only missing piece is an action that behaves as a terminal state and passes the output to a reward model that feedbacks into the GFlowNet
@nomenec9 ай бұрын
@@charllsquarra1677 it's more of an empirical finding that as you increase the computational power of NNs, for example the various MANNs (memory augmented NNs), training starts running into extreme instability problems. I.e. we haven't yet figured out how to train MANNs for general purpose that is to search the entire space of Turing Complete algorithms rather than small subspaces like the FSA space. We might at some point, and the solution might even involve SGD. Just, nobody knows yet.
@drdca82638 ай бұрын
39:28 : small caveat to the “quantum computers can’t do anything a Turing machine can’t do” statement: while it is true that any individual computation that can be done by a quantum computer can be done with a Turing machine (as a TM can simulate a QC), a quantum computer could have its memory be entangled with something else outside of it, while a Turing machine simulating a quantum computer can’t have the simulated quantum computer’s data be entangled with something which exists outside of the Turing machine.

This might seem super irrelevant, but surprisingly, if you have two computationally powerful provers who can’t communicate with each other, but do have a bunch of entanglement between them, and there is a judge communicating with both of them, then the entanglement between them can allow them to demonstrate to the judge that many computational problems have the answers they do, which the judge wouldn’t be able to compute for himself, and where the number of such problems that they could prove* the answer to, to the judge, is greatly expanded by their having a bunch of entanglement between them. MIP* = RE is iirc the result. But, yeah, this is mostly just an obscure edge case that doesn’t really detract from the point being made. But I think it is a cool fact.

53:18 : mostly just a bookmark for myself. But hm. How might we have a NN implement a FSM in a way that makes TMs that do something useful be more viable? Like, one idea could be to have the state transitions be probabilistic, but to me that feels questionable? But like, if you want to learn the FSM controlling the TM by gradient descent, you need to have some kind of differentiable parameters? Oh, here’s an idea: what if instead of the TM being probabilistic, you consider a probability distribution over FSMs, but use the same realization from the FSM throughout? Hm. That doesn’t seem like it would really be particularly amenable to things like “learning the easy case first, and then learning how to modify it to fix the other cases”? Like, it seems like it would get stuck in a local minimum... Hmm...

I guess if one did have a uniform distribution over TMs with at most N states, and had the distribution as the parameter, and looked at the expected score of the machines sampled from the distribution (where the score would be “over the training set, what fraction of inputs resulted in the desired output, within T steps”), taking the gradient of that with respect to the parameters (i.e. the distribution) would, in principle, learn the program, provided that there was a TM with at most N states which solved the task within time T... but that’s totally impractical. You (practically speaking) can’t just simulate all the N state TMs for T steps on a bunch of inputs. There are too many N state TMs. Maybe if some other way of ordering the possible FSMs was such that plausible programs occurred first? Like, maybe some structure beyond just “this state goes to that state”? Idk.

Hm, when I think about what I would do to try to find the pattern in some data, I think one thing I might try is to apply some transformation on either the input or the output, where the transformation is either invertible or almost invertible, and see if this makes it simpler?

Hm, if a random TM which always halts is selected (from some distribution), and one is given a random set of inputs and whether the TM accepts or rejects each input, and one’s task is to find a TM which agrees with the randomly selected TM on *all* inputs (not just the ones you were told its output for), how much help is it to also be told how long the secret chosen TM took to run, for each of the inputs on which you are told its output? I feel like it would probably help quite a bit?
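To make the "distribution over machines as the parameter" idea above concrete, here is a toy Python sketch. It is an illustration under strong simplifying assumptions that are mine, not the commenter's: it uses 2-state DFAs rather than Turing machines, a softmax over one logit per machine, and parity of the number of 1s as the hypothetical target task. It also shows why the approach does not scale, since every candidate machine has to be enumerated and scored.

```python
import itertools
import math

ALPHABET = "01"
N_STATES = 2


def all_dfas():
    """Enumerate every 2-state DFA over {0,1}; the start state is always 0."""
    keys = [(s, c) for s in range(N_STATES) for c in ALPHABET]
    for targets in itertools.product(range(N_STATES), repeat=len(keys)):
        delta = dict(zip(keys, targets))
        for bits in itertools.product([False, True], repeat=N_STATES):
            yield delta, {s for s, bit in enumerate(bits) if bit}


def run(dfa, word):
    """Return True if the DFA accepts the word."""
    delta, accepting = dfa
    state = 0
    for c in word:
        state = delta[(state, c)]
    return state in accepting


def score(dfa, examples):
    """Fraction of labelled examples the machine classifies correctly."""
    return sum(run(dfa, w) == label for w, label in examples) / len(examples)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical training set: all binary words of length 1..4, labelled by parity of 1s.
words = ["".join(b) for n in range(1, 5) for b in itertools.product(ALPHABET, repeat=n)]
examples = [(w, w.count("1") % 2 == 1) for w in words]

machines = list(all_dfas())
scores = [score(m, examples) for m in machines]

logits = [0.0] * len(machines)           # uniform distribution to start
initial_expected = sum(scores) / len(scores)
learning_rate = 5.0
for _ in range(200):
    probs = softmax(logits)
    expected = sum(p * s for p, s in zip(probs, scores))
    # Gradient of the expected score w.r.t. logit i is p_i * (score_i - expected).
    logits = [l + learning_rate * p * (s - expected)
              for l, p, s in zip(logits, probs, scores)]

probs = softmax(logits)
final_expected = sum(p * s for p, s in zip(probs, scores))
best = max(range(len(machines)), key=lambda i: probs[i])
```

Even in this tiny setting the loop touches every machine on every step, so the cost grows with the number of machines, which is the impracticality the comment points out.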
@lincolnhannah29858 ай бұрын
LLMs store information in giant matrices of weights. Is there any model that can process a large amount of text and create a relational database structure where the tables and fields are generated by the model, as well as the data in them?
@briancornish20766 ай бұрын
As a public service I offer this reading list of polymaths to help to do what the title of the video seems to be looking for. Most of these were mathematically literate: one even invented the calculus. Aristotle on categories. Spinoza on substance and essence. Leibniz on universal language. Locke on empiricism and experience. Roget's synopsis of categories - still the only comprehensive workable schema of categories that I have been able to find. C S Peirce on semiotics. Nicolai Hartmann on levels of reality. Joseph Needham on integrative levels. Norbert Wiener, cybernetics. Turing not only on computing but patterning, activation-inhibition. George Spencer Brown, distinctions or the calculus of indications. Heinz von Foerster on cybernetics. You could go back and read McCulloch et al on neural networks if you haven't already. Only then go back and continue coding, or trying to, if you haven't given up. But I still don't know what problem machine learning is trying to solve or how we will know when we've solved it.
@adokoka9 ай бұрын
I believe Category Theory is the route to uncover how DNN and LLM work. For now, I think of a category as a higher level object that represents a semantic or topology. Imagine how lovely it would be if LLMs could be trained on categories possibly flattened into bytes.
@MikePaixao9 ай бұрын
Nah, Numbers theory and fractal logic is where it's at :)
@adokoka9 ай бұрын
@@MikePaixao It depends on the application.
@blackmail18079 ай бұрын
Category theory isn’t a route to anything, it’s just the language of modern math. You can do whatever you want with it.
@grivza9 ай бұрын
@@blackmail1807 You are ignoring the role of language in leading your prospective formulations. For a naive example try doing some calculations using the Roman numerals.
@MikePaixao9 ай бұрын
@@adokoka the problem with always relying on other people's theories is that you basically dead end your own creativity. My solutions to AI have ended up looking like bits and pieces of a multitude of theories, but you honestly don't need any math or knowledge of existing models. By recreating or reverse engineering reality as a ground truth, you skip all the existing biases and limitations of existing solutions 🙂 I like to solve problems to truly understand why they behave the way they do. I ask myself "why is q* efficient?" "Why can converting to -1, 0, 1 recreate 16-bit float model precision?" I discovered all those systems last year when I reverse engineered how NeRFs and GPT think and see the world -> then did my own interpretation afterwards 🙃
@mrpocock8 ай бұрын
I kind of feel machine learning has a few foundational issues that you can only brute force for so long. 1) As they say, there's no proper stack mechanism, so there are whole classes of problems that it can't actually model correctly but can only approximate special cases of. 2) The layers of a network build up to fit curves, but there's no proper way to extract equations for those curves and then replace that subnet with that equation, including flow-control. So we are left with billions of parameters that are piece-wise fitting products and sine waves and exponentials and goodness knows what as complex sums of sums.
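A small Python sketch of point 1, as an illustration only: bracket matching is a standard example of a problem that needs an unbounded stack, while a fixed-size stand-in handles only the special cases that fit in its budget. `bounded_depth_check` is a hypothetical name, and its `max_depth` parameter is my assumption for modelling "a finite number of states".

```python
def is_balanced(text: str) -> bool:
    """Exact check using an explicit stack; handles arbitrary nesting depth."""
    pairs = {")": "(", "]": "["}
    stack = []
    for ch in text:
        if ch in "([":
            stack.append(ch)
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack


def bounded_depth_check(text: str, max_depth: int) -> bool:
    """Finite-state stand-in for parentheses only: gives up beyond max_depth."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
            if depth > max_depth:
                return False  # out of states: this input is outside the covered cases
        elif ch == ")":
            if depth == 0:
                return False
            depth -= 1
    return depth == 0
```

The bounded checker agrees with the exact one on shallow inputs such as `"(())"` with `max_depth=2` and wrongly rejects `"((()))"`, which is the "approximates special cases" behaviour described above.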
@SLAM29779 ай бұрын
This looks like very early stage academic research with very low prospects of returns in the near/mid term; I'm surprised that somebody was willing to put their money into it. Very interesting but too academic for a company. All the best to the guys.
@alelondon239 ай бұрын
what makes you think the returns are so far? Let me remind you "Attention is all you need" was a single paper that triggered all these APPARENT (and probably not scalable)AI capabilities producing real returns.
@SLAM29779 ай бұрын
@@alelondon23 There is no tangible evidence of it being applicable in a way that leads to competitive advantage at the moment; it is "just" a highly theoretical paper. "Attention Is All You Need" had tangible results that supported the architecture ("On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data."). Then again, you can always remind me of a paper nobody cared about that years later was the solution to everything; somebody has to win the lottery...
@SLAM29779 ай бұрын
Also frankly Google can afford to throw money at anything they want, hoping that among the many results some of their research will hit jackpot.
@JumpDiffusion9 ай бұрын
@@alelondon23 The Attention paper had empirical results/evidence, not just architecture…
@eelcohoogendoorn80449 ай бұрын
'Early stage academic research' is a bit kind, imo. This 'let's just slather it in category theory jargon so we can sound smart' thing isn't exactly a new idea.
@chadx82698 ай бұрын
Professor Van Nostram do you allow questions?
@FranAbenza9 ай бұрын
Is human biological machinery better understood as a functional-driven system or OO? Why? from cell to cognition?
@glasperlinspiel8 ай бұрын
Read Amaranthine: How to create a regenerative civilization using artificial intelligence
@bwhit79199 ай бұрын
Damn this is the first podcast I couldn’t just leave on 2x speed Edit nvm it was just the first 5 min
@jonfe8 ай бұрын
Reasoning for me is like having a giant graph of "things" or "concepts" in your brain and learning the relationships between them through experience. For example, you can relate parts of one event to a different one just by finding correlations in the relationships between their internal parts, and by doing that you can pass the learning from one event to the other.
@sirkiz11818 ай бұрын
Yeah which makes sense considering the structure of your brain. This sort of structuring is clearly the way forward but as a newcomer to AI it’s unclear to me how easy it is for AI and computers to understand concepts in the way that it is so intuitive for us and what kind of program would make that sort of understanding and subsequent reasoning possible
@Walczyk8 ай бұрын
22:48 this is exactly how SQL formed!! The earlier structure of absolute trees stopped being practical once databases grew, and industry moved on fast. this will happen here for continued progress
@max0x7ba8 ай бұрын
You don't run RNN until bit 26 lights up. Rather you run it until it produces end-of-input token.
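A minimal Python sketch of the stopping rule described above, with a step budget as a safety net. Everything here is a placeholder of my own: `toy_step` stands in for one step of a real recurrent model, and `EOS` is an assumed name for the end token.

```python
EOS = "<eos>"


def toy_step(state: int, token: str):
    """Hypothetical stand-in for one RNN step: returns (new_state, next_token)."""
    new_state = state + 1
    next_token = EOS if new_state >= 3 else f"tok{new_state}"
    return new_state, next_token


def generate(step, start_token: str, max_steps: int = 50):
    """Run the recurrent step until it emits EOS or the step budget runs out."""
    state, token, output = 0, start_token, []
    for _ in range(max_steps):
        state, token = step(state, token)
        if token == EOS:
            break
        output.append(token)
    return output


tokens = generate(toy_step, "<bos>")
```

The `max_steps` cap matters because nothing guarantees that a learned model ever emits the end token, which is the halting concern raised in the video.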
@alivecoding49957 ай бұрын
@Tim What paper are you referring to when speaking about “type-two generalization“ and reasoning?
@lukahead68 ай бұрын
At 32:27, Paul's brain lights up so brightly you can see it through his skull. Dude's so electric, electrons be changing orbitals, and releasing photons
@tonysu88608 ай бұрын
In the segment "NNs are not Turing machines", a lot of discussion seemed to be about how to limit recursive search, and possibly that Turing machines are not capable of recursive functionality. I'm not a data scientist, but I have read the published AlphaZero paper and am somewhat familiar with how that technology is implemented in Lc0. I've never looked at how that app terminates search, but it's reasonable to assume it's determined by the parameters of gameplay. But I would also assume that limitation can be determined by other means. The observation that a "bright bit" might never light up is true, but only if you think in an absolute sense, which is generally how engineers think, in terms of precise and accurate results. I'd argue that problems like this require a change of thinking more akin to quantum physics or economics, where accuracy might be desirable if achievable, but the answer is more often determined by predominance: it is good enough if all the accumulated data and metadata suggest some very high, but not yet exact, level of accuracy. Someone, if not the algorithm itself, has to set that threshold to illuminate that bright bit, to signal the end of search and produce a result.
@your_utube9 ай бұрын
In my view, with my limited knowledge, I think that the conversation about quantifying and classifying the primitives of ANNs should have been done by now and at least recording what has already now been learned over the last 2 decades into a format that allows you to merge it with existing systems is a given minimum. I ask myself whether one can explain existing ways to do computation in terms of the primitives of the ANN system that are popular now. In other words can we transform one process into another and back to at least prove what the limits and capabilities of the new ways are in terms of the well-known.
@hnanacc8 ай бұрын
why is the nature infinite? what if it's just the same things repeating but with some variance? So also a plausible assumption is there is a large amount of information to be memorized, which needs further scaling but the model can emulate the variance.
@dr.mikeybee9 ай бұрын
Semantic space is a model of human experience. Human experience is a real thing. Therefore the semantic space that is learned by masked learning is a model of a model. What intrigues me is that semantic space has a definite shape. This makes learned semantic spaces -- even in different languages -- similar.
@alivecoding49957 ай бұрын
I was thinking of Plato‘s Allegory of the Cave all the time through this episode.
@jumpstar90009 ай бұрын
I'm only 8 minutes in, but it is making me nervous. The assertion regarding systems that are non-composable breaks down. Lists and trees aren't composable because of their representation paired with the choice of algorithm you are using. We already know that we can flatten trees to lists, or make list-like trees, or more abstractly introduce an additional dimension to both trees and lists that normalizes their representation so we can apply a uniform algorithm. If you want to look at it from a different angle, we know that atoms form the basis of machines and therefore atoms have no problem dealing with both trees and lists. 2D images can also represent both data types. The thing is, we don't walk around the world and change brains when we see an actual tree vs a train. Anyway, like I said I just got going. It is very interesting so far. I'm sure all will be revealed. Onward...
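The "flatten trees to lists, or make list-like trees" remark can be sketched in a few lines of Python. This is only an illustration: `Node`, `flatten`, and `list_to_tree` are hypothetical names, and pre-order traversal is one arbitrary choice of flattening.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    value: int
    children: List["Node"] = field(default_factory=list)


def flatten(node: Node) -> List[int]:
    """Pre-order traversal: turn a tree into a list."""
    out = [node.value]
    for child in node.children:
        out.extend(flatten(child))
    return out


def list_to_tree(values: List[int]) -> Optional[Node]:
    """View a list as a degenerate tree where each node has at most one child."""
    root: Optional[Node] = None
    for v in reversed(values):
        root = Node(v, [root] if root is not None else [])
    return root


def total(node: Node) -> int:
    """One algorithm that works on both, because both are now trees."""
    return node.value + sum(total(c) for c in node.children)


tree = Node(1, [Node(2, [Node(4)]), Node(3)])
chain = list_to_tree([1, 2, 3])
```

Note that flattening loses the branching structure, so the two representations are interchangeable only for algorithms that do not depend on it.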
@julianbruns74594 ай бұрын
You can actually make any directed multigraph (where each edge has an identity node) into a category consisting of two objects: a set V of vertices and a set A of edges. Then you define two morphisms s,t from A to V where s assigns each edge its source node and t assigns each edge its target node. (You also add identity morphisms for A and V). Composition is trivial here. There is a common theme in category theory that you don't "look inside" the objects themselves but rather study the morphisms between them.
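The description above translates almost directly into data. A minimal Python sketch, assuming a small hypothetical graph: the two sets are `V` and `A`, and the two maps `s` and `t` are dictionaries from edges to vertices.

```python
# Two objects: the set of vertices and the set of edges (arrows).
V = {"a", "b", "c"}
A = {"f", "g", "h"}

# Two morphisms A -> V: source and target. f and h are parallel edges a -> b,
# which is what makes this a multigraph.
s = {"f": "a", "g": "b", "h": "a"}
t = {"f": "b", "g": "c", "h": "b"}


def is_well_formed(vertices, edges, source, target) -> bool:
    """Check that s and t are total functions from edges to vertices."""
    return (set(source) == edges and set(target) == edges
            and set(source.values()) <= vertices
            and set(target.values()) <= vertices)


def composable(first: str, second: str) -> bool:
    """Edges can be chained when the first one ends where the second begins."""
    return t[first] == s[second]
```

Nothing here inspects what a vertex or an edge "is"; all the structure lives in the two maps, which matches the closing point about studying morphisms rather than looking inside objects.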
@cryoshakespeare44658 ай бұрын
Well, I had a great time watching this video, and considering I can abstract my own experiences into the category of human experiences in general, I'd say most people who watched it would enjoy it too. Thankfully, I'm also aware that my abductive capacities exist in the category of error-prone generalisations, and hence I can conclude that it's unlikely all human experiences of this show can be inferred from my own. While my ability to reason about why I typed this comment is limited, I can, at present, formalise it within the framework of human joke-making behaviours, itself in the category of appreciative homage-paying gestures.
@srivatsasrinivas62779 ай бұрын
I think that specificity is as important as abstraction Domain specific languages and programs mutually justify each other's existence
@davidallen51469 ай бұрын
I think the future of these AI systems should be structured data in and out. This would support the concept of geometric deep learning as well as AI systems that can be more understandable and composable with each other and with traditionally programmed systems. This would also support the generation and use of domain specific interfaces/languages. What they also need is the ability to operate recurrently on these structures. This recurrence can occur internally to the AI systems, or as part of the composition of AIs.
@hi-literyellow44838 ай бұрын
The british engineer is spot on. Respect sir for your clear vision and clarification of the BS sold by Google marketeers.
@dr.mikeybee9 ай бұрын
We also explicitly create abstractions in transformers. The attention heads are creating new embeddings.
@MachineLearningStreetTalk9 ай бұрын
Some abstractions are better than others. So far, we humans are rather good at creating some which machines can't learn. There are things like in-context learning or "algorithmic prompting" (arxiv.org/pdf/2211.09066.pdf) which explicitly code in certain (limited) types of reasoning in an LLM, like for example, adding numbers together out of distribution. If we could get NNs to learn this type of thing from data, that would be an advancement.
@charllsquarra16779 ай бұрын
@@MachineLearningStreetTalk I'm sure you saw Andrej Karpathy's video about tokenization. TL;DR: tokenization is currently a mess that is swept under the rug; it is very hard for an LLM to properly do math when some multi-digit numbers are single tokens in their corpus.
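A toy Python sketch of the tokenization issue mentioned above. It is deliberately simplified and the details are my assumptions: real LLM tokenizers typically use learned BPE merges rather than greedy longest-match, and `VOCAB` is a made-up vocabulary in which a few multi-digit strings happen to be single tokens.

```python
def greedy_tokenize(text: str, vocab: set) -> list:
    """Longest-match tokenizer; falls back to single characters."""
    tokens = []
    max_len = max(len(v) for v in vocab)
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical vocabulary: all digits plus a few multi-digit "words".
VOCAB = set("0123456789") | {"12", "123", "2024", "99"}

same_year = greedy_tokenize("2024", VOCAB)
next_year = greedy_tokenize("2025", VOCAB)
mixed = greedy_tokenize("1299", VOCAB)
```

Two numbers that differ by one end up with completely different token structures (one token versus four), so digit-wise arithmetic has no consistent representation to operate on.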
@MachineLearningStreetTalk9 ай бұрын
I agree, tokenisation could be improved, but I don’t think it’s that big of a thing wrt learning to reason
@dr.mikeybee9 ай бұрын
@MachineLearningStreetTalk, Yes, we'll keep working on optimizing what we can, including for prompt engineering and injection engineering. I suppose we can view the attention heads as a case of in-context learning as we calculate similarity weights and produce a newly formed calculated context. Of course the projection matrices are also acting as a kind of database retrieval. So here something is learned in the projection matrices that results in many changes to vector values in the context signature. The built (dare I say structured) new embeddings coming out of the attention heads are "decoded" in the MLP blocks for the tasks the MLPs were trained on. Nevertheless, higher level abstractions are being learned in all the differentiable MLP blocks. I don't think that can be denied. All in all, we need to discuss the semantic learning that happens for the embedding model via masked learning. This creates a geometric high-dimensional representation of a semantic space, positional encoding for syntactical agreement, attention heads for calculating similarity scores, projection matrices for information retrieval and filtering, MLPs for hierarchical learning of abstract features and categories, and residual connections for logic filtering. Of course there are many other possible component parts within this symbolic/connectionist hybrid system, since the FFNN is potentially functionally complete, but I think these are the main parts.
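The "similarity weights" and "newly formed context" described in this comment correspond roughly to scaled dot-product attention. The following is a minimal single-head sketch in plain Python, assuming no masking, no batching and no multi-head splitting; the names `attention_head`, `w_q`, `w_k` and `w_v` are illustrative, not any library's API.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def attention_head(embeddings, w_q, w_k, w_v):
    """Single attention head: project, score by similarity, mix the values."""
    queries = [matvec(w_q, e) for e in embeddings]
    keys = [matvec(w_k, e) for e in embeddings]
    values = [matvec(w_v, e) for e in embeddings]
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # New embedding: weighted average of the value vectors.
        mixed = [sum(w * v[d] for w, v in zip(weights, values))
                 for d in range(len(values[0]))]
        outputs.append(mixed)
    return outputs


identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0]]
new_embeddings = attention_head(tokens, identity, identity, identity)
print(new_embeddings)
```

With identity projections, each output row is a convex combination of the input rows, which is one way to read "new embeddings" here: every token's vector is rebuilt from the others according to similarity.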
@lemurpotatoes79888 ай бұрын
More intelligent, structured masking strategies would be extremely helpful IMO. I like thinking about generative music and poetry models in this context. Masking a random one out of n notes or measures doesn't necessarily let you learn all the structure that's there.
@alexforget9 ай бұрын
For sure you are right. Humans don't need to drive for thousands of hours in each city. But if Tesla has the compute and data, they can always add a new algo and also win.
@tomaszjezak09 ай бұрын
Regardless, the method is hitting a wall. The next problem will need a better approach, like they talk about
@MikePaixao9 ай бұрын
Too funny. I've been saying transformer models all put infinity in the wrong place 😂 You can get around the finite limit, but not with transformers. I would describe it like a data compression singularity :) 28:06 Not that hard once you think about it for a bit; you end up with circular quadratic algebra 🙂 34:54 You can create a Turing machine and get around the Von Neumann bottleneck, then you end up somewhere near my non-transformer model 😊
@dr.mikeybee9 ай бұрын
FFNNs are functionally complete, but recursive loops are a problem. They can be solved in two ways however. A deep enough NN can unroll the loop. And multiple passes through the LLM with acquired context can do recursive operations. So I would argue that the statement that LLMs can't do recursion is false.
@MachineLearningStreetTalk9 ай бұрын
LLMs can simulate recursion to some fixed size, but not unbounded depth - because they are finite state automata. I would recommend playing back the segment a few times to grok it. Keith added a pinned note to our Discord, and we have discussed it there in detail. This is an advanced topic so will take a few passes to understand. Keith's pinned comment below: "Traditional recurrent neural networks (RNNs) have a fixed, finite number of memory cells. In theory (assuming bounded range and precision), this limits their formal language recognition power to regular languages [Finite State Automata (FSA)], and in practice, RNNs have been shown to be unable to learn many context-free languages ... Standard recurrent neural networks (RNNs), including simple RNNs (Elman, 1990), GRUs (Cho et al., 2014), and LSTMs (Hochreiter & Schmidhuber, 1997), rely on a fixed, finite number of neurons to remember information across timesteps. When implemented with finite precision, they are theoretically just very large finite automata, restricting the class of formal languages they recognize to regular languages." Next, according to Hava Siegelmann herself (who originally "proved" the Turing-completeness of "RNNs"), we have: "To construct a Turing-complete RNN, we have to incorporate some encoding for the unbounded number of symbols on the Turing tape. This encoding can be done by: (a) unbounded precision of some neurons, (b) an unbounded number of neurons, or (c) a separate growing memory module." Such augmented RNNs are not RNNs, they are augmented RNNs. For example, calling a memory augmented NN (MANN) an NN would be as silly as calling a Turing machine an FSA because it is just a tape augmented FSA. That is pure obscurantism and Siegelmann is guilty of this same silliness depending on the paragraph. Distinguishing the different automata is vital and has practical consequences.
Imagine if when Chomsky introduced the Chomsky Hierarchy some heckler in the audience was like "A Turing Machine is just an FSA. A Push Down Automaton is just an FSA. All machines are FSAs. We don't need no stinking hierarchy!" arxiv.org/pdf/2210.01343.pdf
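The distinction drawn in this reply can be illustrated with the context-free language a^n b^n. The sketch below is only an analogy under stated assumptions: `accepts_bounded` stands in for a machine with a fixed number of states (its counter is capped at a hypothetical `max_depth`), while `accepts_with_stack` stands in for the same controller plus a growable memory. Neither function models an actual RNN or LLM.

```python
def accepts_bounded(s, max_depth):
    """Recognise a^n b^n (n >= 1) using a counter capped at max_depth.

    Because the counter can take only max_depth + 1 values, this behaves like
    a finite-state machine and must reject inputs nested deeper than the cap.
    """
    depth = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:
                return False  # an 'a' after a 'b' is never allowed
            depth += 1
            if depth > max_depth:
                return False  # ran out of states
        elif ch == "b":
            seen_b = True
            depth -= 1
            if depth < 0:
                return False  # more b's than a's
        else:
            return False
    return seen_b and depth == 0


def accepts_with_stack(s):
    """Same controller logic, but backed by a stack that can keep growing."""
    stack = []
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:
                return False
            stack.append(ch)
        elif ch == "b":
            seen_b = True
            if not stack:
                return False
            stack.pop()
        else:
            return False
    return seen_b and not stack


print(accepts_bounded("aabb", max_depth=3))      # True
print(accepts_bounded("aaaabbbb", max_depth=3))  # False
print(accepts_with_stack("a" * 50 + "b" * 50))   # True
```

Raising `max_depth` only moves the failure point; it never removes it. The stack version has no such built-in cap (beyond the physical memory of the machine running it), which is the sense in which a growing memory module changes the class of automaton.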
@dr.mikeybee9 ай бұрын
@@MachineLearningStreetTalk LOL! You're funny. I love that "We don't need no stinkin'" line. I loved the movie too. Anyway thank you for the very thoughtful response. I was aware of the limitations of finite systems, but I love how you make this explicit -- also that a growable memory can give the appearance of Turing-completeness. That's a keeper. Language is tough. Because I talk to synthetic artifacts every day, I'm very aware of how difficult it is to condense high-dimensional ideas into a representation that conveys intended ideas and context. And of course decoding is just as difficult. Thanks for the additional context injection. Cheers!
@markwrede88789 ай бұрын
We need A Mathematical Model of Relational Dialectical Reasoning.
@acortis8 ай бұрын
mhmm, ... interesting how we can agree on the premises and yet drift apart on the conclusions. I desperately want to be wrong here, but I am afraid my human nature prevents me from trusting anyone who is trying to sell me something for which they do not have the simplest example of implementation. And here you are going to tell me, "It is their secret sauce, they are not going to tell you that!" ... maybe, and yet I feel like I spent almost two hours of my life listening to a pitch for "Category Theory" whose only implementation is a GOAT, and that does not mean the Greatest of All Theories. ... Again, I would not be happier than being proved wrong with the most spectacular commercial product of all time! ... oh, almost forgot, great job on the part of the hosts, love the sharp questions!
@nippur_x35708 ай бұрын
About NNs and Turing completeness: I don't understand why you specifically need read/write memory to have Turing-complete computation. You just need a Turing-complete language like lambda calculus. So I don't see any obstruction for a neural network, with the right framework and the right language (probably using category theory), to do it.
@drdca82638 ай бұрын
Well, you do need like, unbounded state? But I think they are saying more, “an FSM that controls a read/write head is sufficient for Turing completeness”, not “that’s the only way to be Turing complete”? To put a NN in there, you do need to put reals/floats in there somewhere I think. Idk where you’d put them in for lambda calculus? Like... hm.
@nippur_x35708 ай бұрын
@@drdca8263 Sorry for the misunderstanding, that's not my point. My point is that the read/write state for the Turing completeness property is probably not the right point of view on this problem. Lambda calculus was just to illustrate my point. You "just" need complete symbolic manipulation on a Turing-complete language for the NN to be Turing complete.
@drdca82638 ай бұрын
@@nippur_x3570 I think the same essential thing should still apply? Like, in any Turing complete model of computation, there should be an analogy to the FSM part. The NN component will be a finite thing. Possibly it can take input from an unbounded sized part of the state of the computation, but this can always be split into parts of bounded size along with something that does the same thing over a subset of the data, and there will be like, some alternation between feeding things into the neural net components and getting outputs, and using those outputs to determine what parts are next to be used as inputs, right?
@u2b839 ай бұрын
34:45 This diagram is really cool. The same simple finite state controller is iterating over different data structures. The complexity of the data structures enables the recognition or generation of different formal language classes. The surprise to me is that we can use [essentially] the same state machine to drive it.
@k.o.o.p.a.7 ай бұрын
The Discord sounds are VERY distracting
@shadazmi54024 ай бұрын
Quantum computers are not Turing machines. It is yet to be decided. People have conjectured about the possibility of a universal quantum Turing machine, particularly David Deutsch in his paper "Quantum Theory, the Church-Turing Principle and the Universal Quantum Computer". This is for the simple reason of the correspondence between formal systems and the halting problem. Since a quantum system is not, strictly speaking, a precise “formal system”, it doesn't follow that the rules of formal systems, in particular the incompleteness theorems of Gödel, also get carried over to quantum systems. Ergo, quantum computers are not mere Turing machines (as of yet). Maybe a modification of the notion of Turing machines, commonly known as the universal quantum Turing machine, can be it, but the classical idea of the Turing machine is simply not just a quantum computer. For further inquiries, I would suggest reading people like Penrose (The Emperor’s New Mind and Shadows of the Mind) and Scott Aaronson.
@ariaden8 ай бұрын
Yeah. Big props to the thumbnail. Maybe I will even watch the video, some time in my future.
@derekpmoore8 ай бұрын
Re: domain specific languages - there are two domains: the problem domain and the solution domain.
@posthocprior7 ай бұрын
I work in the field. Specifically, I make math models. The researcher who said that there's an inherent contradiction in a finite machine "learning to memorize infinity" misses the point of this approach. I know nothing about Tesla's approach but other self-driving companies are using a similar approach. That is, the point of "memorizing" isn't to scale this approach; rather, it's a workaround for the computational limits of predicting events and objects in a car and, also, the theoretical limits of computational geometry. To scale, then, requires these "memorized" spaces as a basis function in order to build a predictive geometry. (Sorry if this is unclear. This is a complex topic and I'm not sure it can be clearly explained in one paragraph.)
@JTan-fq6vy5 ай бұрын
What do you mean by "memorized" space?
@jonfe8 ай бұрын
The guy talking about external read/write memory for improving AI is right, in my view. I was thinking exactly the same and have been developing a model that has a kind of memory for a time-series problem, getting a lot of improvement in predictions.
@radupopescu19857 ай бұрын
I am not convinced. Of course you can slap category theory over this thing, it has arrows and compositions, so, well, you can put things in this language. I am missing that crunching argument where category theory reveals why it works, the same way functional analysis reveals why the Fourier transform works.
@christhames87317 ай бұрын
It won't ..to a materialistic mindset asking such questions...it's abstract nd u gotta change ur mindset to see it clearly..it's even linked to neuroscience and quantum cognition..
@pebre797 ай бұрын
All input to a brain can be separated into categories. That's why it works
@shahzodadavlatova72038 ай бұрын
Can you share the andrej karpathy talk?
@felicityc8 ай бұрын
You cannot use those sound effects. Please find a new sound library. I'm going to flip.
@robmorgan12148 ай бұрын
This focus on abstraction, whether algebraic or geometric, is not the correct approach to this problem. Physicists made the same mistake with geometry.
@bacon_boat26418 ай бұрын
Every researcher on planet Earth wants the bitter pill to be false. What's more exciting: 1) get more data + GPUs 2) engineering smart solutions
@u2b839 ай бұрын
40:03 This is why I suspect NNs operated iteratively produce better results (e.g. stable diffusion, step by step reasoning, etc...). However finite recursion appears to be good enough in practice. In SAT problems you can pose recursive problems by unrolling the recursion loop, enabling proving properties of programs up to a certain size.
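The "unroll the loop up to a certain size" idea in this comment can be sketched with a small example. This is a minimal illustration, assuming Euclid's algorithm as the recursive problem; `gcd_unrolled` and its `depth` budget are hypothetical names, and the `for` loop stands in for `depth` literal copies of the step, as a bounded unrolling would produce.

```python
def gcd_recursive(a, b):
    """Euclid's algorithm with unbounded recursion depth (in principle)."""
    return a if b == 0 else gcd_recursive(b, a % b)


def gcd_unrolled(a, b, depth):
    """The same step applied at most `depth` times.

    Returns None when the budget is too small, which mirrors how a bounded
    unrolling can only establish results for inputs up to a certain size.
    """
    for _ in range(depth):
        if b == 0:
            return a
        a, b = b, a % b
    return a if b == 0 else None


print(gcd_recursive(48, 18))    # 6
print(gcd_unrolled(48, 18, 3))  # 6
print(gcd_unrolled(48, 18, 2))  # None
```

For inputs that fit within the budget, the two versions agree; beyond it, the unrolled version simply cannot answer, which is the "good enough in practice" trade-off the comment describes.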
@samferrer9 ай бұрын
The real power of category theory is in the way it treats relations, and especially functional relations. Objects are not first class anymore but a mere consequence ... hence the power of "Yoneda". Yet, I don't think there is a programming language that brings the awesomeness of category theory.
@wanfuse8 ай бұрын
what an education one gets watching you guys! Thanks! on the stopping condition, why not stop on a proximity distance from the stop condition instead of exact? Trying iteratively can tell you what the limit of proximity is?
@Dr.Z.Moravcik-inventor-of-AGI9 ай бұрын
Guys, it must be hard for you to talk about AGI that has already been here since 2016.
@jondor6548 ай бұрын
Is the thread of intuition the pervasive glue that grounds the formalisms?
@AutomatedLiving099 ай бұрын
I feel that my IQ increases just by watching this video.
@debunkthis9 ай бұрын
It didn’t
@JGeo16 ай бұрын
Maybe it's because Dr. Lessard is sporting the "professor" look today. Glasses, beard, jacket, chair, painting, candles (overall eccentric hippie professor vibe)...
@GlobalTheatreSkitsoanalysis9 ай бұрын
In addition to Number Theory..any opinions about Group Theory vs Category Theory? And Set Theory vs Category Theory?
@oncedidactic9 ай бұрын
Great stuff! I enjoyed Paul’s way of talking about math - first the precise definition and then why do we care, part by part. Good work dragging it out until the pump primed itself 😅
@preston32918 ай бұрын
chill with the sound effects
@mattwesney5 ай бұрын
4:40 this is pentacle existence...10k hours of Skyrim alone
@ICopiedJohnOswald9 ай бұрын
The part about if statements was very confused. The guy said that if you have an If expression where both branches return type T then you need to return a union of T and T and that that is not the same as T. This is wrong. If you look at the typing rules for boolean elimination (if expressions) you have:
Gamma |- (t1 : Boolean)    Gamma |- (t2 : T)    Gamma |- (t3 : T)
------------------------------------------------------------------
(if t1 then t2 else t3) : T
In other words, an if statement is well typed if your predicate evaluates to a boolean and both branches return the same type T and this makes the if expression have type T.
@СергейМакеев-ж2н9 ай бұрын
Agreed, putting it in terms of unions from the outset is rather weird. But I can see how one would arrive at a rule where boolean elimination is always in terms of unions. Specifically, if one is approaching it from the side of a language like Typescript, in which unions are already everywhere. Typescript's types are weird.
@ICopiedJohnOswald9 ай бұрын
@@СергейМакеев-ж2н I can't speak to TypeScript other than to say yeah, that is probably a bad place to get an intuition for type theory, but talking about Union types (NOT Disjoint Union types), isn't it the case that `Union T T = T`? Regardless, you don't need Union types to deal with if expressions. I think the interviewer generally had trouble thinking with types and also was conflating type theory and category theory at times.
@СергейМакеев-ж2н9 ай бұрын
@@ICopiedJohnOswald Even if we assume actual union types, not disjoint ones, claiming that Union T T *equals* T is a very strong claim. Not all type theories have such a strong notion of equality. Am I correct that you are looking at it specifically from the perspective of *univalent* type theory?
@MachineLearningStreetTalk9 ай бұрын
Sorry, this is not an area of expertise for us but we hope to make more content and explore it further
@ICopiedJohnOswald9 ай бұрын
@@СергейМакеев-ж2н Sorry, I was playing fast and loose in YouTube comments, disregard that comment. And no, I'm not taking the perspective of univalent type theory as I am woefully under-read on HoTT.
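The typing rule for if expressions discussed in this thread can be sketched as a tiny type checker. This is a minimal illustration with an empty context (no variables, so no Gamma), and the term classes `BoolLit`, `IntLit` and `If` are hypothetical names invented for the example. It implements the rule as stated by the original commenter: the condition must be Boolean, both branches must share a type T, and the whole expression then has type T, with no union type involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoolLit:
    value: bool


@dataclass(frozen=True)
class IntLit:
    value: int


@dataclass(frozen=True)
class If:
    cond: object
    then: object
    orelse: object


def type_of(term):
    """Return the type name of a term, or raise TypeError if it is ill-typed."""
    if isinstance(term, BoolLit):
        return "Boolean"
    if isinstance(term, IntLit):
        return "Int"
    if isinstance(term, If):
        # Premise 1: the condition must have type Boolean.
        if type_of(term.cond) != "Boolean":
            raise TypeError("condition of 'if' must be Boolean")
        # Premises 2 and 3: both branches must have the same type T.
        then_type = type_of(term.then)
        else_type = type_of(term.orelse)
        if then_type != else_type:
            raise TypeError(f"branches differ: {then_type} vs {else_type}")
        # Conclusion: the whole expression has type T.
        return then_type
    raise TypeError(f"unknown term: {term!r}")


print(type_of(If(BoolLit(True), IntLit(1), IntLit(2))))  # Int
```

A language that instead assigns `Union[T1, T2]` to mismatched branches (as the TypeScript-style reading in the replies suggests) would replace the second `raise` with a union constructor; that is a different rule, not a refinement of this one.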
@EnesDeumic8 ай бұрын
Interesting. But too many interruptions, let the guest talk more. We know you know, no need to prove it all the time.
@erikowsiak9 ай бұрын
I love your podcasts it seems you get all the right people to talk to :) just when I needed it :)
@lemurpotatoes79888 ай бұрын
I don't see why types are more or less of a problem than values of the same type that are very far or different from one another. Suppose that every piece of data that goes down Branch 1 ends up with its output in a chaotic ugly region of a function and every piece of data that goes down Branch 2 ends up in a nice simple region. You can have a function that handles both cases follow, yes, but that's the exact same scenario as writing a function that takes in either lists or trees as its input.
@lemurpotatoes79888 ай бұрын
I know neither category theory nor functional programming and I didn't grok Abstract Algebra I, I'm just coming at this from an applied math and stats perspective.
@CharlesVanNoland9 ай бұрын
Also: Tim! Fix the chapter title, it's "Data and Code are one *and* the same". :]
@MachineLearningStreetTalk9 ай бұрын
Done! Thank you sir
@hammerdureason89268 ай бұрын
on domain specific languages -- hell is ( understanding ) other people's code where "other people" includes yourself 6 months ago
@andreismirnov699 ай бұрын
would anyone recommend textbook level publications on category theory and homotopy type theory?
@pounchoutz9 ай бұрын
Elements of infinity category theory by Emily Riehl and Dominic Verity
@rabbitcreative8 ай бұрын
Instead of trying to teach machines, we should be teaching people.
@Daniel-Six8 ай бұрын
This was an incredibly good discussion. Tim and company are definitely on to something elusive to articulate but crucial to appreciate regarding the real limitations of current machine "intelligence," and I can at least vaguely fathom how this will be made clear in the coming years.
@R0L9138 ай бұрын
Not meet they are making mistakes and need fresh input. I am noting all the terms so I can learn. One of my kids is a linguist. Another is a recruiter and must recruit/ find people who can create programming languages that fit. It’s all one exciting thing. Remember Java, remember object oriented programming you’re important keep at it, you may create a breakthrough ❤
@rylaczero37408 ай бұрын
Imagine imperative programming for a sec, now you know monads.
@Walczyk8 ай бұрын
we need more cutaways with explainers
@ScreenProductions9 ай бұрын
Great discussion that gets into the weeds. Love the software engineer’s point of view. Only thing missing from Dr. Lessard is an ascot and a Glencairn of bourbon - because he wouldn’t dare sip Old Rip Van Winkle from a Snifter.😂
@kurtdobson4 ай бұрын
LLM’s cannot give a confidence level or an audit trail of factual information used to create an answer.
@oryxchannel8 ай бұрын
1:00 What's wrong with MIT's liquid neural networks? Have they gone IP on it?
@Walczyk8 ай бұрын
those are not scalable now
@oryxchannel8 ай бұрын
@@Walczyk With the right combination of an AI science LLM config + the right group of AI research assistants, it may be revisited if it's hit a wall. Wasn't it Euclid that saw everything through water?
@CristianGarcia9 ай бұрын
Thanks! Vibes from the first 5 mins is that FSD Beta 12 seems to be working extremely well so the bet against this will have a hard time. Eager to watch the rest.
@MachineLearningStreetTalk9 ай бұрын
I've not looked into it recently. I'm sure it's an incredible feat of engineering and may well work in many well-known situations (much like ChatGPT does). Would you trust it with your life though?
@Acceleratedpayloads4 ай бұрын
"ugh, it late, do I want to watch this? It's a bunch of talking". Yes. Watch it. Do the thing.
@vahidhosseinzadeh46307 ай бұрын
The topic is great and the guest is great, but why are you interrupting the guest this much? Let him talk. The setting is also very bad.
@explicitlynotboundby9 ай бұрын
Re: "Grothendieck's Theory-building for problem solving" (kzbin.info/www/bejne/qJrIXmx3es2Mmrs) reminds me of Rob Pike's Rule 5: "Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming."
@mbengiepeter9658 ай бұрын
In the context of Large Language Models (LLMs) like me, the analogy of training data as the program and the neural network as the compiler isn't quite accurate. Instead, you might think of the **training data** as the **knowledge base** or **source code**, which provides the raw information and examples that the model learns from. The **neural network**, on the other hand, functions more like the **processor** or **execution environment** that interprets this data to generate responses or perform tasks. The **training process** itself could be likened to **compilation**, where the model is trained on the data to create a set of weights and parameters that define how it will respond to inputs. This is a bit like compiling source code into an executable program. However, unlike traditional compilation, this process is not a one-time conversion but an iterative optimization that can continue improving over time.
@CYBERPOX9 ай бұрын
@Siroitin9 ай бұрын
35:45 yeah!
@domenicperito46354 ай бұрын
how would you ever search unbounded space. you would have to search forever. we gave the AI infinite memory and it took forever to train. whats going on guys????
@jumpstar90009 ай бұрын
On the recursion topic, there was a little bit of confusion in the discussion. On the one hand there was something about language models not understanding recursion, but more key, they have trouble using recursion while producing output. Clearly LMs can write recursive code and even emulate it to some degree. In any case, it is possible to train an LM with action tokens that manipulate a stack in a way resembling FORTH and get full recursion. It may be possible to add this capability as a bolt-on to an existing LM via fine-tuning. Having this would expand capabilities no end, providing not just algorithmic execution but also features like context consolidation and general improvements to memory, especially if you also give them a tuple store where they can save and load state... yes, exactly, you said it.
@jumpstar90009 ай бұрын
@@deadeaded Yes, I was pointing out that there was initially some confusion in the discussion with regard to this.
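The FORTH-like "action tokens" idea from this thread can be sketched as a small interpreter. This is a minimal illustration of the stack mechanics only, assuming a hypothetical token set (`DUP`, `DROP`, `SWAP`, `ADD`, `MUL`, plus integer literals); it says nothing about how a language model would be trained or fine-tuned to emit such tokens.

```python
def run_stack_tokens(tokens):
    """Execute a sequence of FORTH-style tokens against an explicit stack."""
    stack = []
    for tok in tokens:
        if tok == "DUP":
            stack.append(stack[-1])      # duplicate the top item
        elif tok == "DROP":
            stack.pop()                  # discard the top item
        elif tok == "SWAP":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif tok == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif tok == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            stack.append(int(tok))       # anything else is an integer literal
    return stack


# (2 + 3) squared, written in postfix form.
print(run_stack_tokens(["2", "3", "ADD", "DUP", "MUL"]))  # [25]
```

The point of the design is that the stack lives outside whatever emits the tokens, so its depth is not limited by the emitter's own fixed internal state.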
@theorist197 ай бұрын
A wonderful roadmap for the future: computational architectures that can be argued algebraically to have certain properties?! Though, shouldn't categorical conversation (even if it is an informal one) be accompanied by a lot of visual syntax diagrams? Since we abstracted away the semantics/structure of the underlying Obj, diagrams are all we've got. Isn't that the modus operandi of reasoning in Category Theory? Especially for pedagogical purposes. Maybe time for ML Street Talk to add a virtual whiteboard to their excellent interview forum! It is quite evident that computational experts like Tim and Keith were not "thinking" categorically, while Paul (the Categorist) was not quite in the Algorithmic arena, but he groks it beautifully categorically! -- Maybe a Grothendieck of ML in the making! :) Should we throw Topos Theory into the mix, while we are still trying to sway the VCs to fund fundamental R&D in AI? My question: what is the equivalent of the "Weil Conjectures" for this bold Langlands program for Deep Learning?
@jonfe8 ай бұрын
maybe we should get back to analog to improve our AI.
@domenicperito46354 ай бұрын
i feel like im watching 3 ai agents talk to each other for our sake.