Comments
@ciaranhaines9094 6 days ago
First, thank you for making this series, and for publishing it for later watching. I've watched a few of them and they've been interesting, have switched me on to areas I was unaware of, and occasionally of direct relevance to my work. Commenting to ask a question that I didn't understand about the MCMC error. It's the sum of the gamma auto-covariances for all step lags, fine. When calculating, does this mean a) we must add all lag covariances even if we use a single lag for sampling? b) that the error changes depending on our sample size, i.e. that we have to include a lag for each record in our sample? For a) I can understand that we're premised on the situation that things are "well mixed", so even if we consistently sample 1 time lag, the real distribution has overlapping causal effects as it dwells in its stationary space. I find b) more confusing though. I can't puzzle it out.
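A rough numerical sketch of the quantity being asked about (my illustration, not from the video; the AR(1) chain and the truncation lag are arbitrary choices): the Monte Carlo error of a chain's mean sums the autocovariances over all lags, and because that sum is divided by the number of samples N, the error does depend on, and shrink with, the sample size, which speaks to question b).

```python
import numpy as np

def mcmc_standard_error(chain, max_lag=200):
    """Error of the chain mean: sqrt((gamma_0 + 2*sum_k gamma_k) / N).

    gamma_k is the lag-k autocovariance; the sum is truncated at max_lag
    because high-lag estimates are dominated by noise.
    """
    x = np.asarray(chain, dtype=float)
    n = len(x)
    x = x - x.mean()
    gammas = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(max_lag)])
    var_of_mean = (gammas[0] + 2.0 * gammas[1:].sum()) / n
    return np.sqrt(max(var_of_mean, 0.0))

# A correlated AR(1) chain: the naive iid error underestimates the truth.
rng = np.random.default_rng(0)
n, rho = 20000, 0.9
chain = np.empty(n)
chain[0] = rng.normal()
for t in range(1, n):
    chain[t] = rho * chain[t - 1] + rng.normal()

se_mcmc = mcmc_standard_error(chain)
se_iid = chain.std(ddof=1) / np.sqrt(n)
print(se_mcmc > se_iid)  # autocorrelation inflates the true error
```

The gap between the two estimates is exactly the sum over nonzero lags: for an independent chain all gamma_k with k >= 1 vanish and the two coincide.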
@maloukemallouke9735 29 days ago
What if your variables are time-based and you want to predict when an event happens at a specific time?
@THEPAGMAN 1 month ago
2 years later, really helpful, thanks
@ash3844 2 months ago
How come this explanation is so underrated?! By far the best explanation I have come across so far!!
@shaohuailiu7683 2 months ago
It's u in LaTeX.
@nPlan 3 months ago
We apologize for the poor audio. We will make sure in the future to provide a quieter space for the speaker.
@BadWithNames123 3 months ago
There is AI to fix your audio... it's not possible to listen to this.
@amandajrmoore3216 4 months ago
The unfiltered background noise makes this difficult to listen to; a bit surprising given the production quality. Sardonically, I suggest using AI to clean up the audio.
@moyprofile 4 months ago
I would detect my hallucination in her large model, if you know what I mean.
@מוגוגוגו 4 months ago
Isn't a better solution a specialized agent that scans the answer of the larger LLM for BS, instead of training the model to detect its own BS?
@Skeleman 4 months ago
I agree. The LLM is like Broca's area: it has generative grammar and semantic categories, but there should be a separate model that checks relevant corpora for agreement. The only issue would be the large energy and time costs at runtime, hence why they try to do both in the LLM, I think.
@Luixxxd1 4 months ago
Then wouldn't that make the whole tool redundant?
@מוגוגוגו 4 months ago
@Luixxxd1 Pretty much... It's more efficient to run a specialized agent to check math when needed than to blob everything together at runtime.
@AnshumanAwasthi-kd7qx 4 months ago
Are you pursuing LLM security as a research aim?
@nPlan 4 months ago
We are performing research on using LLMs and a part of that research is safety related i.e. making sure that our models are truthful, useful, and robust to adversarial attacks. However, we are not focused on security specifically.
@AnshumanAwasthi-kd7qx 4 months ago
@nPlan Good; my thesis is on LLM security, and I was looking for any master's/PhD researcher for a possible collaboration of sorts.
@vahan.hovhannisyan 4 months ago
Market makers use a lot of Nash equilibria for asset pricing. Also, poker strategies are based on Nash equilibria.
@yasminahachhouch8716 5 months ago
Thank you for the video. I have a question: Is it necessary to transform the data into Q&A and distractor format as described in the paper?
@artukikemty 5 months ago
Amazing, thanks for sharing!
@Jacquesds 6 months ago
Amazing video! Thank you for it, it helped me a lot to understand this paper :)
@w3w3w3 6 months ago
interesting 🤓
@Fr0z3nMus1k 6 months ago
It would be really helpful if you could post in the description the PDF files of the papers, so we can read your PDF notes and possibly achieve a better understanding of the papers. Thank you. If you can't post them, can you send them to me via email or something?
@axe863 6 months ago
Amazing paper
@amirhosseinalimohammadi4018 7 months ago
It would be much better if you knew the topic better and knew more details about the paper.
@nPlan 7 months ago
That is true. Do you have any resources you would suggest to learn more about the topic? We choose papers for paper club that we find interesting but that are not necessarily in our research background, so we cannot always give an in-depth presentation of the topic.
@alivecoding4995 8 months ago
With respect to energy-based models, where we need Langevin dynamics to sample data from the model (p_theta(x)), what role do the 'empirical' and prior distributions play then? Do we use training data as samples from the prior? And samples from our current model to model our empirical distribution?
@benboys_ 6 months ago
The empirical distribution is the training data; it will be a mixture of point masses (look up 'Dirac delta') at the locations of the samples in the sample space. Then you match forward and reverse Markov chains that go from p_theta(x, t=0) to a normal distribution at t=T, which gives you a nice denoising score-matching objective that can be used to train energy-based models (train p_theta(x, t)) or score-based models (train grad_{x}(p_theta)(x, t)). This training is done by noising samples from the empirical distribution and predicting the amount of noise added. Inductive bias or regularisation gives an inaccurate score after training, with the result that you don't recover the empirical distribution but something more desirable to practitioners, which can generalise and achieve good results on metrics they are interested in, such as FID score.
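A toy sketch of the noise-and-predict objective described above (my illustration, not from this thread; a single noise level, 1-D Gaussian data, and a linear score model are all simplifying assumptions chosen so the fit has a closed form):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Empirical distribution": point masses at the training samples.
mu, s, sigma = 2.0, 0.5, 0.3
x0 = rng.normal(loc=mu, scale=s, size=5000)

# Denoising score matching at one noise level: perturb each sample and
# regress toward the conditional score (x0 - xt) / sigma**2, which is
# equivalent to predicting the noise that was added.
eps = rng.normal(size=x0.shape)
xt = x0 + sigma * eps
target = (x0 - xt) / sigma**2

# Toy linear score model score(x) = a*x + b, fitted in closed form.
A = np.stack([xt, np.ones_like(xt)], axis=1)
(a, b), *_ = np.linalg.lstsq(A, target, rcond=None)

# For Gaussian data the true score of the noised marginal is
# -(x - mu) / (s**2 + sigma**2); the fit recovers it roughly.
print(a)           # roughly -1 / (s**2 + sigma**2)
print(a * mu + b)  # roughly 0: the score vanishes at the data mean
```

With a neural network in place of the linear model and a schedule of noise levels, this is the same objective the comment describes for training score-based or energy-based models.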
@alivecoding4995 3 months ago
@benboys_ Thank you very much!
@alivecoding4995 3 months ago
I was wondering because I saw an explanation that said we need Langevin dynamics for sampling from the model, such that those samples can then be used in an MCMC estimator for the true likelihood of the model.
@codeXXai 8 months ago
Thank you for sharing the discussions. Very good discussions.
@BrunoAlmeidaSilveira 8 months ago
I came here by chance, hooked by the discussion about ClimSim, and discovered some very nice content. Thanks for sharing!
@131xcom 8 months ago
I think the speaker did not understand the paper
@lion87563 9 months ago
How is the feature grid updated during backpropagation?
@aqsanaveed1942 10 months ago
Please cover the paper 'Solving Nonlinear Conservation Laws of Partial Differential Equations Using Graph Neural Network' by Steinar Evje and Chunming Rong.
@NguyenThiTam0910 10 months ago
Thank you so much, it helps me a lot!
@MisterDivineAdVenture 1 year ago
In this case I would excuse you for using a text-to-voice dub. Sheesh.
@alivecoding4995 1 year ago
Are you on Twitter, Ben?
@R24-q6b 1 year ago
😎
@axe863 1 year ago
Is the variable selection technique robust to high concurvity, non-cointegrating relationships, varying degrees of persistence amongst predictors, pattern-destroying non-stationarity (EMH), etc.?
@AK-wn1rm 1 year ago
I would think generally not. That is your most general data science problem: how do you predict things that are wildly different from the past data you have seen? That doesn't mean the model isn't somewhat tracking, since the latest data does go in and most forecasts will be somewhat guided by the latest "is" state. Since you typically predict over many entities, you would also hope that some generalisation naturally occurs, as similar patterns have already been observed in other entities. Now, these models are not truly causal, at least not out of the box. And they can't be, because causality cannot be inferred from data (caveats, caveats, caveats). So it falls back to the modeller to provide sensible covariates. As long as causal pathways don't break down (and sometimes that can happen, at least temporarily), the model generalises. If one only throws features at the model and hopes it finds all it needs by itself, one might be in for a bad surprise.
@TheAmazonExplorer731 1 year ago
Please upload the link to the code for the paper.
@nPlan 1 year ago
github.com/kyegomez/RT-2
@Zenchiyu 1 year ago
Concavity instead of convexity? Since we try to push samples towards regions of high density (noisy gradient ascent).
@benboys_ 7 months ago
Yes, you're right; it's the same thing up to a sign change, and people usually refer to convex optimization or log-concave sampling (of a probability density).
@xtraeone5947 1 year ago
Hi, can we have a paper review of Direct Voxel Grid NeRF?
@piotr780 1 year ago
This is called in-context learning; it is nothing new in the area of transformers. P.S. "What Can Transformers Learn In-Context? A Case Study of Simple Function Classes" - someone trained a transformer to do linear regression.
@CausalDisco 1 year ago
Regarding your points on synthetic data generation (starting around 32:27): the equation shown does not lose indirect relationships, which can be checked by solving for X and generating some data in this way. The resulting data is in fact causal data (33:56), not purely correlational data. Nonetheless, the scepticism towards the data generation is well-founded: it turns out that standardizing (or changing the weights or noise) changes the results tremendously, and usually for the worse. This is especially problematic in real-world problems with no canonical data scale. The reason is that the convex optimization procedure isn't guaranteed to converge on this non-convex problem; it's just that many ANM simulations give rise to mean-squared-error loss landscapes that are close to convex, due to exploding variances (on the raw data scale) along the causal order. So the results should be treated very carefully, but the reason isn't quite so obvious and can affect other algorithms (for example, those with the same or a similar score function) as well.
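To make the first point concrete, here is a small sketch (my own, using a hypothetical 3-node chain with arbitrary weights) of generating data from a linear additive-noise model by solving for X. Indirect relationships (X1 -> X3) survive in the data, and raw-scale variances explode along the causal order, which is exactly the property that standardization destroys:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical chain X1 -> X2 -> X3 as a linear additive-noise model.
# Column form X = W^T X + N solves to X = (I - W^T)^{-1} N, so with
# row-vector samples we can write X = N @ inv(I - W).
W = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])   # W[i, j] = weight of edge i -> j
n = 10000
N = rng.normal(size=(n, 3))
X = N @ np.linalg.inv(np.eye(3) - W)   # rows are samples

# Indirect X1 -> X3 relationship is present in the generated data,
# and marginal variances explode along the causal order (1, 5, 21)...
print(X.var(axis=0))

# ...which standardization erases, changing what variance-sensitive
# score functions see.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
print(Xs.var(axis=0))  # all ~1
```

On the raw scale, simply sorting variables by variance nearly recovers the causal order here, which is why rescaling changes benchmark results so dramatically.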
@nPlan 1 year ago
Arvid was kind enough to share his slides with us. Here is a link to the slides storage.googleapis.com/dockertest-191011/jc_temporal_fusion.html#1
@tashfeenahmed3526 1 year ago
Thank you so much, you explained it greatly. What are the loopholes in this paper that could be improved, maybe computation time or something like that? I hope your suggestions will be quite valuable for us.
@SarahGhiyasi 1 year ago
Thank you for this incredible video. Really helpful. I just wanted to know how a reward function is set. Like in section 4.1 (33:40 of this video), they choose a reward function, but based on what?
@nPlan 1 year ago
Based on my experience with reinforcement learning problems, the reward function is often a heuristic or a tuned hyperparameter. I think the authors chose the reward function in section 4.1 because by changing its weighting values they are able to change the difficulty of the learning problem. So while the reward function does not have a theoretical justification, it makes sense in terms of experimental design to illustrate an idea in the paper, which is that GFlowNets are able to solve harder RL problems. I think there are two properties which mainly characterize reward functions. All reward functions output a real number which is used as the training signal for RL. One property is dense versus sparse. A dense reward function provides a different reward for each state-action pair. A sparse reward function usually provides a reward of 0 except when a goal, or a keypoint towards a goal, is reached, and then some positive reward is received by the agent (usually 1). A dense reward function makes it easier for an agent to learn to accomplish a task, but usually makes it harder for the agent to overcome bottlenecks in the task. A sparse reward function provides the agent with the most accurate training signal for learning to accomplish a task, but may provide very few rewards and consequently very little training signal to learn from. The other property is stochastic versus deterministic, where a stochastic reward function determines its reward for a state-action pair by sampling from some distribution it outputs as an intermediary step. The researcher / engineer using reinforcement learning has to choose or design the reward function if they use a dense reward function like that in section 4.1. I hope this information was helpful!
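The dense-versus-sparse distinction can be shown with a toy example (entirely hypothetical; a 1-D corridor task with a goal state, not the environment from the paper):

```python
# Hypothetical 1-D corridor task: agent starts at 0, goal at position 5.
GOAL = 5

def dense_reward(state: int, action: int) -> int:
    """Shaped reward: every (state, action) pair gets a distinct signal,
    here the negative distance to the goal after taking the action."""
    next_state = state + action
    return -abs(GOAL - next_state)

def sparse_reward(state: int, action: int) -> float:
    """Goal reward: 0 everywhere except when the goal is reached."""
    next_state = state + action
    return 1.0 if next_state == GOAL else 0.0

# A trajectory stepping right from 0 to the goal.
traj = [(s, +1) for s in range(5)]
print([dense_reward(s, a) for s, a in traj])   # [-4, -3, -2, -1, 0]
print([sparse_reward(s, a) for s, a in traj])  # [0.0, 0.0, 0.0, 0.0, 1.0]
```

The dense version gives a learning signal at every step, while the sparse version gives nothing until the goal is reached, which is the trade-off described above.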
@icechanz8279 1 year ago
Hi, thank you for your helpful explanation. The paper club sounds interesting to me. How can I find you guys?
@vahan.hovhannisyan 1 year ago
Just search for nPlan paper club. We do weekly discussions open to the public and in-person in London once a month!
@ChristopherKlugmann 1 year ago
I would like to emphasize again that diffusion models are not about generating an isotropic Gaussian over the spatial dimensions of the image. It is about transforming the entire (!) distribution into another tractable distribution, in this case the isotropic Gaussian. If you make the steps small enough and do enough steps, the forward process under Gaussian transition probabilities converges exactly to such an isotropic Gaussian distribution. The question that came up in the video about why this is important is quite simple to answer: we can only sample if we know the distribution after T time steps. The reversibility of the process is a remarkable property that arises under the assumptions formulated here: (i) small diffusion steps, (ii) enough iterations, and (iii) a transition probability in the forward process that has a conjugate functional form in the reverse process. The last condition is satisfied in the case of Gaussian or Bernoulli transitions. Only because of this property does it make sense to parameterize the dynamics of the backward process as Gaussian/Bernoulli distributions - this is a result of the theory and not a heuristic.
@benboys_ 1 year ago
Thanks for your explanation about what assumptions are required to make a process reversible! I think it is also useful to understand the functional form of the transition density through the lens of Bayes' rule on the reverse transition density and using the Taylor expansion (e.g., see the paper "Diffusion Schrödinger Bridge with Applications to Score-Based Generative Modeling", eq. (3) and eq. (4)), since we always compute in discrete time. This also uses the small-step-size limit and the conjugacy property of the forward Gaussian and the Gaussian term resulting from the truncated Taylor expansion, but has the advantage of being interpretable algorithmically as well as theoretically. A couple of other points of discussion I picked up from this video: Arvid's question at 21:35, which I paraphrase as 'why does the reverse process seem to denoise if it is a stochastic transition' - Nayef and Arvid correctly come to the answer that it is because the mean function of the reverse transition density counteracts, or 'wins out' over, the diffusion by forcing the particles to areas of high probability. 27:50, 29:40: the reverse step leads to a distribution that is not isotropic Gaussian because the mean function is parameterized by a neural network, which is nonlinear in the state, leading to a non-Gaussian distribution from the first step onwards. 28:20, 38:25: in practice, the forward chain does not quite reach an isotropic Gaussian; an approximation is made by sampling from the isotropic Gaussian in the first step of the reverse chain. I think that the initial sample of the reverse chain is not independent from the final denoised sample, as Peter hints at 28:49. I think this is complicated to show, and it depends on the number of steps and the parameterization of the forward chain. Usefully, running parallel chains independently will lead to independent samples from the reverse chain.
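The convergence to an isotropic Gaussian discussed in this thread can be checked numerically. A sketch (my illustration; the DDPM-style linear beta schedule is an assumed choice, and the closed form for q(x_t | x_0) is the standard variance-preserving one):

```python
import numpy as np

rng = np.random.default_rng(0)

# Variance-preserving forward process (DDPM-style):
# x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps_t.
# With small steps and enough of them, x_T approaches an isotropic
# Gaussian regardless of the data distribution.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)   # prod_t (1 - beta_t)

# Start far from Gaussian: a two-point distribution at +/- 3.
x0 = rng.choice([-3.0, 3.0], size=20000)

# Closed form: x_t = sqrt(alpha_bar_t)*x0 + sqrt(1 - alpha_bar_t)*eps.
eps = rng.normal(size=x0.shape)
xT = np.sqrt(alpha_bar[-1]) * x0 + np.sqrt(1.0 - alpha_bar[-1]) * eps

print(xT.mean(), xT.var())  # both close to 0 and 1: ~isotropic Gaussian
```

Since alpha_bar at t=T is tiny but not exactly zero, the final marginal is only approximately the isotropic Gaussian, matching the point above that sampling the reverse chain from N(0, 1) is an approximation.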
@arijaa.9315 1 year ago
Thanks for this explanation. I just did not understand one thing. In the paper, they mention that after pre-training, they throw out the generator and only fine-tune the discriminator (the ELECTRA model) on downstream tasks. On Hugging Face, the ELECTRA model itself is the encoder (generator). So how come in this case they throw it out?
@mateuszsmendowski2677 1 year ago
The explanation is quite useful after having some basic understanding of the TFT architecture. So reading the original manuscript beforehand is recommended to get the essence of this video.
@cosmicfissure924 1 year ago
I would like to join the club.
@michelemontebovi7750 1 year ago
How can I participate in future meetings where you discuss papers of this type?
@nPlan 1 year ago
Sorry for the delay. You can sign up for our upcoming Paper Clubs by following this link: www.meetup.com/home/?suggested=true&source=EVENTS
@alessela7119 1 year ago
How do they compute backpropagation?
@JoeMcDonald-x8c 1 year ago
This was very helpful after reading the paper! Is there any chance that the slides used in this video are available somewhere?
@divugoel 1 year ago
Hello, were you able to find the slides from this video?
@JoeMcDonald-x8c 1 year ago
@divugoel No, I haven't. Still hoping someone with them will see this.
@arisioz 1 year ago
Definitely not by Vahan; it's the guy in the background who actually understood and explained some points of this paper, lol.
@InturnetHaetMachine 1 year ago
30:00 I'm not sure I get the point. At least Facebook released this massive dataset for researchers and sourced it properly/ethically. OpenAI didn't even bother with that. I'm sure they have used writers' and journalists' work without giving them credit, but we won't even know, because they never released the data they used. And the Stable Diffusion companies just stole artists' and photographers' work without giving them credit or compensation. I think it's completely fair to restrict the usage to non-commercial purposes. If you want to make money, then buy your own dataset. I'm sure if your plan is to commercialize your model, you won't publish back the model and the dataset it was trained on. This can't be a one-way street where you want to have your cake and eat it too.
@jonatan01i 1 year ago
Thanks for sharing!
@dippatel1739 1 year ago
Best discussion and presentation. Not sure if I can attend this Paper Club because it looks company-sponsored, but definitely keep posting these videos.
@vahan.hovhannisyan 1 year ago
Thanks for your interest! nPlan's paper club is open for all to attend, in person in London or online! Just search for nPlan paper club :)