Amazing video as always! I think now might be a good time to quote our good friend Chollet from his limitations-of-DL article: "This is because a deep learning model is 'just' a chain of simple, continuous geometric transformations mapping one vector space into another. All it can do is map one data manifold X into another manifold Y, assuming the existence of a learnable continuous transform from X to Y, and the availability of a dense sampling of X:Y to use as training data. So even though a deep learning model can be interpreted as a kind of program, inversely most programs cannot be expressed as deep learning models; for most tasks, either there exists no corresponding practically-sized deep neural network that solves the task, or even if there exists one, it may not be learnable, i.e. the corresponding geometric transform may be far too complex, or there may not be appropriate data available to learn it." The key thing you mentioned here is that DL models can just learn shortcuts. There are so many "surface patterns", e.g. "did they use the cosine in this context?" You said they should try to look for the presence of the intermediate step in the learned model. Well, that was almost a rhetorical statement; of course it's not there. What we actually need is a way to remove the shortcuts. But then we will realise that we can't use DL, because there are too many shortcuts. Even with program synthesis there are too many shortcuts, and conversely too many programs which are too verbose and may or may not do the right thing on the test samples. It reminds me of Chollet's ARC challenge, in the sense that we really want models to find the rules, not the patterns/shortcuts. But as the authors point out, the shortcuts are super impressive.
@lightningblade9347 4 years ago
"... Then you are in the same category as most people." This statement just made me feel 10 times smarter than I actually am. Thank you for the reassuring words, Yannic!
@sanderbos4243 4 years ago
That intro was absolutely hilarious, I don't think you could've introduced the complexity of the problem this paper tries to tackle with AI in a better way!
@herp_derpingson 4 years ago
0:00 Most of the time I fail to understand the solution proposed by the paper. This time, I fail to understand the problem itself. Now, if you'll excuse me, I will go eat some ice cream and cry on the floor. Jokes apart: where is the architecture? I have no idea what they are backpropagating. Just sequence to sequence? That's it? Where is the novelty?
@YannicKilcher 4 years ago
Yes, it's a standard seq2seq language model, and the math is fed in as tokens. Crazy.
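To make "the math is fed as tokens" concrete, here is a minimal sketch of a prefix-notation tokenizer in the spirit of the paper's encoding. The exact vocabulary and tuple representation here are illustrative assumptions, not the paper's actual code; the key idea that survives is that even digits become individual tokens, so the model has to learn the number system itself.

```python
# Hypothetical prefix-notation tokenizer for simple expressions.
# Operators come first, then their arguments, so no parentheses are needed.

def tokenize_number(n: int) -> list[str]:
    """Split an integer into sign + digit tokens, e.g. -12 -> ['-', '1', '2']."""
    sign = ['-'] if n < 0 else []
    return sign + list(str(abs(n)))

def tokenize_prefix(expr) -> list[str]:
    """Flatten a nested (op, arg1, arg2, ...) tuple into a flat token list."""
    if isinstance(expr, int):
        return tokenize_number(expr)
    if isinstance(expr, str):  # a variable symbol like 'x'
        return [expr]
    op, *args = expr
    tokens = [op]
    for a in args:
        tokens += tokenize_prefix(a)
    return tokens

# 3*x + cos(2*x) represented as a nested tuple:
expr = ('+', ('*', 3, 'x'), ('cos', ('*', 2, 'x')))
tokens = tokenize_prefix(expr)  # flat token sequence for the seq2seq model
```

The transformer then sees only this flat symbol sequence on both the input and output side.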
@Chr0nalis 4 years ago
The novelty is in finding a suitable architecture for learning these types of tasks and using learnability as a criterion for providing insight into the 'intrinsic difficulty' of various computational tasks that we are used to solving some specific way.
@JoaoVitor-mf8iq 4 years ago
It would be nice to use some metric that computes the mean and variability over the most similar equations (k-NN) for every equation in the train and test data. They used the 10% margin, but it would be nice to see a plot sweeping the margin from 10% down to 1%. And, as you said at the end, they didn't do any experiment to check whether the model really learned math. Most papers that use transformers and language models don't really test whether the models actually learned (how they work internally); I think that's a big problem we are having now.
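Sweeping the accuracy margin, as suggested above, is easy to express. This is a sketch with made-up data and a made-up function name, not anything from the paper; it just shows how the "fraction within X% of the target" curve could be computed for several tolerances.

```python
def fraction_within_margin(preds, targets, margin):
    """Fraction of predictions whose relative error is within `margin` of the target."""
    hits = sum(abs(p - t) <= margin * abs(t) for p, t in zip(preds, targets))
    return hits / len(targets)

# Toy predictions vs. ground truth (illustrative only).
preds   = [10.0, 10.3, 10.8, 12.0]
targets = [10.0, 10.0, 10.0, 10.0]

# Accuracy at 10%, 5%, and 1% relative-error margins.
curve = {m: fraction_within_margin(preds, targets, m) for m in (0.10, 0.05, 0.01)}
```

Plotting `curve` over a finer grid of margins would show how quickly the reported 90%-within-10% number degrades as the tolerance tightens.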
@kazz811 4 years ago
Excellent review and critique. I have always thought of integration as a language in itself: there is a finite set of techniques which can be learnt in a similar fashion to the process of learning a language. But the idea of representing numbers as tokens and learning them is kinda wild. It will be interesting to examine the latent space through simple clustering to see how the model is partitioning the problem space.
@kaixuanwang7604 4 years ago
I agree with the last sentence, that the model may just learn shortcuts. However, it is questionable whether such a model is of any practical use, because control theory is always about guarantees. 90% is not enough, although it seems like a lot.
@kaixuanwang7604 4 years ago
I would be glad if one could demonstrate that such a model can be trained in an unsupervised way instead of by label fitting. Or that, in a real control problem, a learned model can perform better than mathematical control policies (which we have seen in a few research papers).
@vitocorleone1991 2 years ago
The introduction was cool :))))))))))))))))))
@Zantorc 4 years ago
I think it has managed to map these types of equation into some high dimensional space and is able to interpolate within that. I wouldn't be surprised if there was such an approximate mapping.
@julientane9556 4 years ago
Not sure whether the poster is the speaker of the video... but Guillaume is pronounced with a hard G, like in "great".
@yabdelm 4 years ago
haha funniest intro yet
@machinelearningdojo 4 years ago
I find it fascinating that the model is just fed tokens for the numbers and the floating-point representation. Surely, to be able to do this, it would have to somehow internally know the meaning of the number system. Maybe GPT-3 can do maths after all 😂😂
@Soulixs 4 years ago
thank you
@yusunliu4858 4 years ago
Maybe I don't get the point of why this paper is interesting. Isn't a neural network a universal approximator of arbitrary functions? Aren't the math problems experimented with in the paper just "arbitrary functions"?
@YannicKilcher 4 years ago
Yes, but they are very complicated and involve multiple steps of symbolic and numeric manipulation. And the model is a language model.
@yusunliu4858 4 years ago
@@YannicKilcher I can't judge the complexity of the problems. Assuming the problems can be approximated by, e.g., a Taylor series, then it is equivalent to learning the coefficients of polynomial equations. But again, I have no clue whether it is difficult to solve the polynomial equations.
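The Taylor-series intuition above can be made concrete: truncating the series does reduce a smooth function to a handful of polynomial coefficients, but only locally. A quick stdlib-only sketch (illustrative, not from the paper) comparing cos(x) with its degree-4 Taylor polynomial at 0 shows why a learned "coefficient table" alone can't cover the whole problem space:

```python
import math

def cos_taylor(x: float, terms: int = 3) -> float:
    """Partial Taylor series of cos at 0: sum of (-1)^k * x^(2k) / (2k)!."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k)
               for k in range(terms))

# Near the expansion point the truncation is very accurate...
err_near = abs(cos_taylor(0.5) - math.cos(0.5))
# ...but farther out it degrades badly, so one fixed set of
# polynomial coefficients cannot represent the function globally.
err_far = abs(cos_taylor(3.0) - math.cos(3.0))
```

So "learn the polynomial coefficients" only works within a small neighbourhood, which is part of why the complexity question is hard to judge.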
@adamantidus 4 years ago
Many equations are equivalent under simple manipulations, e.g., (cos(x)+4)^0 = 1, or 12x/4 = 3x, or x+3=3+x, etc. My guess is that the training set is full of such equivalences and that the transformer learns to recognize them as near duplicates. I think in the end there is some sort of k-nn as Yannic suggests. Believing the model is figuring out the complicated underlying procedure just by looking at examples is IMHO too pretentious.
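The kind of near-duplicate equivalence described above is cheap to detect numerically. This is a sketch of a heuristic check (not anything the paper does) that probes two expressions at random points; the examples use the equivalences from the comment, e.g. (cos(x)+4)^0 = 1 and 12x/4 = 3x.

```python
import math
import random

def numerically_equal(f, g, trials=100, tol=1e-9):
    """Heuristic equivalence test: compare f and g at random sample points."""
    rng = random.Random(0)  # fixed seed for reproducibility
    return all(abs(f(x) - g(x)) < tol
               for x in (rng.uniform(-10.0, 10.0) for _ in range(trials)))

same_pow  = numerically_equal(lambda x: (math.cos(x) + 4) ** 0, lambda x: 1.0)
same_frac = numerically_equal(lambda x: 12 * x / 4, lambda x: 3 * x)
different = numerically_equal(lambda x: math.sin(x), lambda x: x)
```

If the training set is full of such pairs, a model that clusters them as near duplicates would look far more capable than it is.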
@meditationMakesMeCranky 4 years ago
Also, doing the interesting experiments is hard :p
@siyn007 4 years ago
I don't know why anyone would want a language model to solve these problems given how they are not as reliable as symbolic computation or numerical computation.
@mgostIH 4 years ago
They achieved better results than symbolic computation engines in their integration paper; who knows how much performance would have increased with a different input representation or depth.
@teetanrobotics5363 4 years ago
Could you please make a video on the paper "harvard-edge/airlearning-reinforcement-learning"? Link to GitHub page: github.com/harvard-edge/airlearning-rl
@bluel1ng 4 years ago
*g* Let's throw "seemingly" complex problems at a black box and celebrate AGI... "I want to believe" click-bait science? Hey look, this is super complex for humans, while my one-layer transformer gets it within a 10% margin 90% of the time; obviously the model knows all the math, because it is too small to be a lookup table. yip yip yip