OUTLINE:
0:00 - Introduction
2:20 - Summary of the Paper
16:10 - Start of Interview
17:15 - Why this research direction?
20:45 - Overview of the method
30:10 - Embedding space of input tokens
33:00 - Data generation process
42:40 - Why are transformers useful here?
46:40 - Beyond number sequences, where is this useful?
48:45 - Success cases and failure cases
58:10 - Experimental Results
1:06:30 - How did you overcome difficulties?
1:09:25 - Interactive demo

Paper: arxiv.org/abs/2201.04600
Interactive demo: recur-env.eba-rm3fchmn.us-east-2.elasticbeanstalk.com/
@brandom255 (2 years ago)
Great video! "cos mul 3 x" is Polish notation though, not reverse Polish (i.e. "3 x mul cos").
@anttikarttunen1126 (2 years ago)
@@brandom255 I guess it's RPN if you read it backwards... ;-) (Noticed the same mistake in the video).
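For readers unfamiliar with the two notations, here is a minimal sketch (the function names are illustrative, not from the paper) showing that "cos mul 3 x" parses as prefix (Polish) notation, while "3 x mul cos" is the postfix (reverse Polish) form of the same expression, cos(3x):

```python
import math

def eval_prefix(tokens, x):
    """Evaluate a Polish (prefix) expression like ['cos', 'mul', '3', 'x']."""
    def rec(it):
        tok = next(it)
        if tok == "cos":
            return math.cos(rec(it))
        if tok == "mul":
            return rec(it) * rec(it)
        return x if tok == "x" else float(tok)
    return rec(iter(tokens))

def eval_postfix(tokens, x):
    """Evaluate a reverse Polish (postfix) expression like ['3', 'x', 'mul', 'cos']."""
    stack = []
    for tok in tokens:
        if tok == "cos":
            stack.append(math.cos(stack.pop()))
        elif tok == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            stack.append(x if tok == "x" else float(tok))
    return stack.pop()

# Both encode cos(3 * x):
#   prefix:  cos mul 3 x
#   postfix: 3 x mul cos
```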
@ChaiTimeDataScience (2 years ago)
This series is absolutely amazing! Thanks Yannic, your videos just keep getting better!
@yabdelm (2 years ago)
The paper explanation + interview format is amazing: the paper explanation provides the interesting nitty-gritty, and the interview sheds light on the overall concepts with less jargon and more intuition.
@volotat (2 years ago)
Extremely interesting line of work. I imagine one day something like this could fully automate the process of building mathematical models for any scientific data. Feels like a huge step toward an automated scientific discovery process.
@Adhil_parammel (2 years ago)
kzbin.info/www/bejne/ppytnHt4lMhmpKM They have a good approach.
@majorminus66 (2 years ago)
Those visualizations in the appendix really look like you're staring into some transcendent, unfathomable principles at the very base of reality. Neat.
@me2-21 (2 years ago)
Cool approach to cracking pseudo-random generators.
@Adhil_parammel (2 years ago)
Having a website like this for every data science algorithm, with dummy data and inference, would be awesome for learners.
@ericadar (2 years ago)
I wasn't familiar with the even/odd conditional function representation discussed at 40:45 and had to work it out on paper to make sure I got it. "Collatz" generator LaTeX: (n \mod 2) \times (3n+1) + [1 - (n \mod 2)] \times (n \div 2)
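The branchless form in the comment can be checked numerically; a minimal sketch (the function name is just illustrative):

```python
def collatz_step(n):
    # (n mod 2) * (3n + 1) + [1 - (n mod 2)] * (n / 2):
    # odd n selects the 3n+1 term, even n selects the n/2 term.
    return (n % 2) * (3 * n + 1) + (1 - n % 2) * (n // 2)

# collatz_step(7)  -> 22  (odd branch)
# collatz_step(10) -> 5   (even branch)
```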
@grillmstr (2 years ago)
More like this, please. Very informative.
@benjamindonnot3902 (2 years ago)
Really inspiring topic, and the interview was really well done. Thanks!
@billykotsos4642 (2 years ago)
This is your best series, after the always-punctual ML news.
@JTMoustache (2 years ago)
Neurosymbolic for the win: the return!
@WahranRai (2 years ago)
This reminds me of programming "le compte est bon" from the game show Des chiffres et des lettres (French TV).
@brll5733 (2 years ago)
Anyone else feel like they haven't seen a paper about decision making, planning, neural reasoning etc. in a long time? Nothing about agents acting in a complex environment?
@jabowery (2 years ago)
You need a simple model of a complex environment to make decisions; otherwise, computing the consequences of your decisions becomes intractable. That is true even if you adopt an AlphaZero approach.
@brll5733 (2 years ago)
@@jabowery I mean, we have pretty good toy simulations in the form of video games. Another challenge like Deepmind's Starcraft challenge would really help focus the field, imo.
@rothn2 (2 years ago)
Cool paper, interesting observations! I'd be curious to see the UMAP of these (learned) embeddings too. Sometimes these can capture global structure better, whereas t-SNE has a bias towards capturing local structure. The UMAP authors also made a huge effort to justify their approach with algorithmic geometry theory.
@fredericln (2 years ago)
Opening new frontiers for DL, congratulations! A maybe silly question (as I have only watched until 31' so far): is the "6 is a magic number" finding robust to changes in the hyperparameter (fixed at 6 in the example table) that limits the recurrence to n-1, …, n-6?
@jrkirby93 (2 years ago)
So I tried out the demo and came across an interesting "paradox". When I click the "OEIS sequence" button on the demo, it almost always loads up a sequence and perfectly predicts a solution with 0% error. But when I go over to OEIS and type in a couple of numbers, grab a sequence, and slap that into the user input, the results are... usually not great. Very rarely does my OEIS sampling strategy yield a sequence this model can solve. Usually the error is huge. What is going on? Am I somehow only grabbing "very hard" sequences from OEIS? Or are the OEIS sequences that the demo samples coming from a smaller subsection of OEIS that this model can solve reliably?
@anttikarttunen1126 (2 years ago)
Many (I would say: most) OEIS sequences are not expressible as such simple recurrences (with such a limited set of operations). For example, almost any elementary number-theoretic sequence: sigma, phi, etc.
@rylaczero3740 (2 years ago)
OEIS does have some ridiculous sequences; if I were an alien, I would think it was specifically generated by humans to train their ML models.
@anttikarttunen1126 (2 years ago)
@@rylaczero3740 Well, OEIS has a _few_ ridiculous sequences, but admittedly there are many sequences of questionable relevance, where somebody has dug themselves into a rabbit hole a little bit too deep for others to follow. As for the training of AI, I admit that sometimes I like to submit sequences that are "a little bit on the fringe", to serve as a challenge to any programmed agents.
@anttikarttunen1126 (2 years ago)
@@rylaczero3740 And in any case, the scope of mathematics is infinitely larger than what is "useful" for any application we might think of.
@yoshiyuki1732ify (2 years ago)
Would be interested in seeing how it deals with chaotic systems with some noise.
@Emi-jh7gf (2 years ago)
Hasn't Wolfram Alpha been able to do this for some time? How is this better?
@harunmwangi8135 (1 year ago)
Amazing stuff
@yourpersonaldatadealer2239 (2 years ago)
Love your content, Yannic. I would be extremely interested in hearing your thoughts on, and seeing any interesting papers about, AI in cybersec. I imagine the types of networks that were exceptional at playing Atari games would also shift relatively easily into a variety of cyber attacks, and given how useless the corporate world appears to be at securing data, I'm guessing this will be unbearable for most to endure (at least early on).
@tclf90 (2 years ago)
Amazing. The next step would be to produce a sequence of prime numbers :P
@Adhil_parammel (2 years ago)
Riemann 🤔🤔
@rylaczero3740 (2 years ago)
At this pace, machines will definitely beat humans at finding the secrets of the primes.
@MIbra96 (2 years ago)
Hey Yannic, thank you for your videos. They have been very helpful for me. I just wanted to ask what tablet and app do you use for reading and annotating papers?
@YannicKilcher (2 years ago)
OneNote
@MIbra96 (2 years ago)
@@YannicKilcher Thank you!
@rylaczero3740 (2 years ago)
The next step should be predicting a latent space, so that when you sample an equation from it, it gives the expected results but also doesn't diverge from other equations sampled from the same space.
@axe863 (9 months ago)
Why not use deep symbolic regression as a mapping mechanism between different neural architectures?
@Eltro101 (2 years ago)
Why not represent the numbers as matrices or complex values? Then you could reproduce the group structure of addition or multiplication, in addition to being able to make each number high-dimensional.
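One concrete instance of the matrix idea (a sketch of the group-structure point, not anything the paper does): embedding n as the unipotent shear matrix [[1, n], [0, 1]] makes matrix multiplication reproduce integer addition.

```python
def embed(n):
    # 2x2 unipotent shear matrix carrying n in the corner
    return ((1, n), (0, 1))

def matmul(a, b):
    # Plain 2x2 matrix product, no external dependencies
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def decode(m):
    return m[0][1]

# matmul(embed(3), embed(4)) == embed(7): addition as matrix product
```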
@anttikarttunen1126 (2 years ago)
How hard would it be for a system like this to find Euclid's gcd algorithm by itself? With that it could then detect which sequences are divisibility sequences, and which are multiplicative or additive sequences. That is, most of the essential number-theoretic sequences (A000005, A000010, A000203, A001221, A001222, A001511, A007913, A007947, A008683, A008966, A048250, and a couple of thousand others) are either multiplicative or additive. I'm not saying that it would yet find the formula for most such sequences, but it could at least make an informed hypothesis about them.
@anttikarttunen1126 (2 years ago)
While in the "base" world, how hard would it be to detect which sequences are k-automatic or k-regular?
@anttikarttunen1126 (2 years ago)
I mean, could this be done in the "AlphaZero way", so that it would find such concepts by itself, without the need to hardcode them?
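Sketching the commenter's idea: Euclid's gcd algorithm, plus a brute-force check of whether a 1-indexed sequence prefix looks multiplicative, i.e. a(m*n) = a(m)*a(n) whenever gcd(m, n) = 1. Helper names are made up for illustration; A000005 (the number-of-divisors function) serves as a known multiplicative example.

```python
def gcd(a, b):
    # Euclid's algorithm
    while b:
        a, b = b, a % b
    return a

def looks_multiplicative(a):
    """Check a(m*n) == a(m)*a(n) for coprime m, n on a 1-indexed prefix."""
    n_terms = len(a)
    for m in range(2, n_terms + 1):
        for n in range(2, n_terms + 1):
            if m * n <= n_terms and gcd(m, n) == 1:
                if a[m * n - 1] != a[m - 1] * a[n - 1]:
                    return False
    return True

# A000005, the number of divisors of n, is multiplicative:
d = [sum(1 for k in range(1, n + 1) if n % k == 0) for n in range(1, 31)]
```

Such a test only ever gives an "informed hypothesis" in the commenter's sense: passing on a finite prefix does not prove the whole sequence is multiplicative.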
@drdca8263 (2 years ago)
Hm, I guess if they wanted to add conditionals to the language in order to make it better at recognizing things like the Collatz / hailstone ((3n+1)/2) sequences, they would have to make the tree support ternary operations. Not sure how much more that would require.
@drdca8263 (2 years ago)
The thing about the embeddings for the integer tokens makes me wonder if it would be beneficial to hard-code (before normalizing them, I guess) some basic properties of the integer into a few of the dimensions, such as the number of distinct prime factors, the number of divisors, whether it is divisible by 2, by 3, or by 5, whether it is one or two more than a multiple of 3, similarly whether it is one more or one less than a multiple of 5, and maybe a handful more, and then let the other dimensions of the embedded vector be initially random and learned. Would this increase performance, or reduce it?
@anttikarttunen1126 (2 years ago)
@@drdca8263 Well, having access to the prime factorization of n would be great (e.g., for 12 = 2*2*3, 18 = 2*3*3, 19 = 19), or figuring that out by itself (of course then you can also detect which numbers are primes). Also, I wonder how utopian it would be to go "full AlphaZero" with OEIS data (if that even makes any sense?). And whether it would then be able to learn to detect some common patterns in sequences, like for example the divisibility sequences (of which keyword:mult sequences form a big subset), and the additive sequences. Sequences that are monotonic or not, injective or not, that satisfy the "restricted growth" constraint, etc. Detecting sequences that are shifted convolutions of themselves (e.g. the Catalan numbers, A000108), or eigensequences of other simple transforms.
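A minimal sketch of the hand-coded feature dimensions proposed in this thread; the particular feature choice and the function name are just illustrative, and in the proposal these values would fill a few embedding dimensions while the rest stay random and learned.

```python
def integer_features(n):
    """A few hand-coded integer properties, as suggested in the thread."""
    def distinct_prime_factors(m):
        count, p = 0, 2
        while p * p <= m:
            if m % p == 0:
                count += 1
                while m % p == 0:
                    m //= p
            p += 1
        return count + (1 if m > 1 else 0)

    num_divisors = sum(1 for k in range(1, n + 1) if n % k == 0)
    return [
        distinct_prime_factors(n),  # omega(n)
        num_divisors,               # d(n), A000005
        n % 2,
        n % 3,
        n % 5,
    ]

# integer_features(12) -> [2, 6, 0, 0, 2]
```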
@meisherenow (2 years ago)
Work on discovering scientific laws from data has a very long history in AI: Pat Langley and Herb Simon's BACON system was built 40 years ago, with about as much computing power as a modern toaster. Damn, I'm old.
@victorkasatkin9784 (2 years ago)
The linked demo fails to continue the sequence 0,0,0,0,0,0: "Predicted expression: Unknown error"
@anttikarttunen1126 (2 years ago)
What is the unary operator "relu" ?
@MrGreenWhiteRedTulip (2 years ago)
relu(x) = x if x > 0, otherwise 0
@anttikarttunen1126 (2 years ago)
@@MrGreenWhiteRedTulip Thanks, that was new to me, as I'm an outsider in this field. Just found a Wikipedia article about ReLU (Rectified Linear Unit) activation function.
@MrGreenWhiteRedTulip (2 years ago)
No prob. It’s just used as a function here though, not an activation function!
@rylaczero3740 (2 years ago)
@@anttikarttunen1126 Makes sense. For people in the field, all operators except ReLU are outside their common set of operators.
@anttikarttunen1126 (2 years ago)
@@rylaczero3740 I see: that's why people in the field are so prejudiced about most of the mathematical sequences, thinking that they are absolutely ridiculous if not expressible with relu(s). 🤔
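For reference, the operator under discussion is just relu(x) = max(x, 0) used as a unary primitive in the expression vocabulary; among other things it can build piecewise expressions, e.g. |x| = relu(x) + relu(-x). A one-line sketch:

```python
def relu(x):
    # A plain unary operator here, not a network activation
    return x if x > 0 else 0

# |x| = relu(x) + relu(-x): a piecewise expression built from relu
```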
@TheDavddd (2 years ago)
Try this with the digits of pi, or some time series that obeys a visible but mysterious pattern :D
@Kram1032 (2 years ago)
I don't think base-12 enthusiasts have a club name lol. But imo, base 720720 is *obviously* the *clear* way to go, as any division involving anything up to 16ths is going to be super easy with it 🤓
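The number in the joke does check out: 720720 = lcm(1, …, 16), so every fraction with a denominator up to 16 has a terminating expansion in that base. A quick verification:

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

# Smallest base in which 1/2 .. 1/16 all terminate
base = reduce(lcm, range(1, 17))
# base == 720720
```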
@veliulvinen (2 years ago)
14:08 hmmm where have I heard about base 6 being the best fit before... kzbin.info/www/bejne/p3qnY3VqgrBqj5I
@anttikarttunen1126 (2 years ago)
kzbin.info/www/bejne/gpq4mmN4osSHfrs
@anttikarttunen1126 (2 years ago)
Sorry, but I must object to what you both seem to suggest at 1:04:20: that only the keyword:easy sequences in OEIS (as of now 80236 sequences) have some logic behind them, and that the rest (268849 as of now) are all some kind of "bus stop sequences" with no logic whatsoever behind them. Certainly the absolute majority (98% at least) of the sequences in OEIS are well defined in the mathematical sense, even though they do not always conform to the simple recurrence model that your project is based on. For example, the primes, and most sequences arising in elementary number theory. Moreover, although your paper is certainly interesting from the machine learning perspective, its performance doesn't seem to me any better than many of the programs and algorithms listed on the Superseeker page of the OEIS, some of which are of considerable age. (See the separate text file whose link is given in the section "Source Code For Email Servers and Superseeker" at the bottom of that OEIS page.)
@anttikarttunen1126 (2 years ago)
Also, Christian Krause's LODA project and Jon Maiga's Sequence Machine are very good in mining new programs and formulas for the OEIS sequences, mainly because they can also search for the relations _between_ the sequences, instead of limiting themselves to just standalone expressions with a few arithmetical operations.
@julius333333 (2 years ago)
Your French pronunciation is great for a non-native.
@viktortodosijevic3270 (2 years ago)
I liked it until he said it was trained for weeks on 16 GPUs 😢
@WhiterockFTP (2 years ago)
why do you wear sunglasses when watching a screen? :(
@MrGreenWhiteRedTulip (2 years ago)
Jan misali was right all along…
@anttikarttunen1126 (2 years ago)
Yes: kzbin.info/www/bejne/gpq4mmN4osSHfrs
@JP-re3bc (2 years ago)
There is little "symbolic" in this thing other than the name.
@ThetaPhiPsi (2 years ago)
Or is it in the data? Tokens basically represent semantics, and the embeddings represent relations among the numbers.