First, thank you for making this series and for publishing it for later viewing. I've watched a few of them; they've been interesting, have switched me on to areas I was unaware of, and are occasionally of direct relevance to my work. Commenting to ask a question I didn't understand about the MCMC error. It's the sum of the autocovariances gamma(k) over all step lags, fine. When calculating it, does this mean a) we must add the autocovariances at all lags even if we use a single lag for sampling? b) that the error changes depending on our sample size, i.e. that we have to include a lag for each record in our sample? For a) I can understand that we're premised on the situation that things are "well mixed", so even if we consistently sample at a lag of 1, the real distribution has overlapping causal effects as it dwells in its stationary space. I find b) more confusing, though; I can't puzzle it out.
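In case a concrete sketch helps anyone puzzling over the same thing, here is how I currently read the estimator (my own illustration, not from the talk); the AR(1) chain, the lag cutoff and the variable names are all assumptions made up for the example:

import numpy as np

# Asymptotic variance of the MCMC mean for a stationary chain x_1..x_N:
#   Var(mean) ~ (1/N) * (gamma(0) + 2 * sum_{k>=1} gamma(k))
# So (a) the sum over lags is a property of the chain's mixing, regardless of how we sample,
# and (b) the sample size only enters through the 1/N factor -- not one lag per record.

rng = np.random.default_rng(0)
phi, n = 0.9, 100_000
x = np.zeros(n)
for t in range(1, n):                       # illustrative AR(1) chain standing in for MCMC output
    x[t] = phi * x[t - 1] + rng.normal()

def autocovariance(x, k):
    xc = x - x.mean()
    return np.dot(xc[:len(xc) - k], xc[k:]) / len(xc)

max_lag = 200                               # truncate where the autocovariances become negligible
gammas = np.array([autocovariance(x, k) for k in range(max_lag + 1)])
var_of_mean = (gammas[0] + 2 * gammas[1:].sum()) / n
print("estimated MCMC standard error of the mean:", np.sqrt(var_of_mean))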
@maloukemallouke9735 29 days ago
What if your variables are time-based and you want to predict when an event happens at a specific time?
@THEPAGMAN a month ago
2 years later, really helpful, thanks
@ash3844 2 months ago
How come this explanation is so underrated?! By far the best explanation I have come across so far!!
@shaohuailiu7683 2 months ago
It's u in LaTeX
@nPlan 3 months ago
We apologize for the poor audio. We will make sure in the future to provide a quieter space for the speaker.
@BadWithNames123 3 months ago
There is AI to fix your audio... this is impossible to listen to.
@amandajrmoore3216 4 months ago
The background noise not being filtered out makes this difficult to listen to; a bit surprising given the production quality. Sardonically suggesting you use AI to clean up the audio.
@moyprofile 4 months ago
I would detect my hallucination in her large model, if you know what I mean.
@מוגוגוגו 4 months ago
Isn't a better solution a specialized agent that scans the larger LLM's answer for BS, instead of training the model to detect its own BS?
@Skeleman 4 months ago
I agree. The LLM is like Broca's area: it has generative grammar and semantic categories, but there should be a separate model that checks relevant corpora for agreement. The only issue would be the large energy and time costs at runtime, hence why they try to do both in the LLM, I think.
@Luixxxd1 4 months ago
Then wouldn't that make the whole tool redundant?
@מוגוגוגו 4 months ago
@@Luixxxd1 Pretty much... It's more efficient to run a specialized agent to check the maths when needed than to blob everything together at runtime.
@AnshumanAwasthi-kd7qx 4 months ago
Are you pursuing LLM security as a research aim?
@nPlan 4 months ago
We are performing research on using LLMs, and a part of that research is safety-related, i.e. making sure that our models are truthful, useful, and robust to adversarial attacks. However, we are not focused on security specifically.
@AnshumanAwasthi-kd7qx 4 months ago
@@nPlan Good, my thesis is on LLM security and I was looking for any master's/PhD researcher for a possible collaboration of sorts.
@vahan.hovhannisyan 4 months ago
Market makers use Nash equilibria a lot for asset pricing. Also, poker strategies are based on Nash equilibrium.
@yasminahachhouch8716 5 months ago
Thank you for the video. I have a question: Is it necessary to transform the data into Q&A and distractor format as described in the paper?
@artukikemty 5 months ago
Amazing, thanks for sharing!
@Jacquesds 6 months ago
Amazing video! Thank you for it, it helped me a lot to understand this paper :)
@w3w3w3 6 months ago
interesting 🤓
@Fr0z3nMus1k 6 months ago
It would be really helpful if you could post your PDF files of the papers in the description so we can read your notes and possibly achieve a better understanding of the papers. Thank you. If you can't post them, could you send them to me via email or something?
@axe863 6 months ago
Amazing paper
@amirhosseinalimohammadi4018 7 months ago
It would be much better if you knew the topic better and knew more of the details of the paper.
@nPlan 7 months ago
That is true. Do you have any resources you would suggest for learning more about the topic? We choose papers for the paper club that we find interesting but that are not necessarily in our research background, so we cannot always give an in-depth presentation of the topic.
@alivecoding4995 8 months ago
With respect to energy-based models, where we need Langevin dynamics to sample data from the model (p_theta(x)), what role do the 'empirical' and prior distributions play then? Do we use training data as samples from the prior? And samples from our current model to model our empirical distribution?
@benboys_ 6 months ago
The empirical distribution is the training data; it is a mixture of point masses (look up 'Dirac delta') at the locations of the samples in sample space. You then match forward and reverse Markov chains that go from p_theta(x, t=0) to a normal distribution at t=T, which gives you a nice denoising score-matching objective that can be used to train energy-based models (train p_theta(x, t)) or score-based models (train the score grad_x log p_theta(x, t)). The training is done by noising samples from the empirical distribution and predicting the amount of noise added. Inductive bias or regularisation gives an inaccurate score after training, so you don't recover the empirical distribution exactly but rather something more desirable to practitioners, something that generalises and achieves good results on the metrics they are interested in, such as FID score.
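In case it helps, here is a minimal sketch of that noise-then-predict training loop (my own illustration, not from the talk); the tiny network, the noise schedule and the toy data are all placeholder assumptions:

import torch
import torch.nn as nn

# Tiny stand-in for s_theta(x, t); a real score/energy model would be much larger.
class ScoreNet(nn.Module):
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, hidden), nn.SiLU(),
                                 nn.Linear(hidden, hidden), nn.SiLU(),
                                 nn.Linear(hidden, dim))

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

def dsm_loss(model, x0, alpha_bar):
    # Noise the data with a random step of the forward chain and predict the added noise:
    #   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    t = torch.randint(0, len(alpha_bar), (x0.shape[0],))
    a = alpha_bar[t].unsqueeze(-1)
    eps = torch.randn_like(x0)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * eps
    pred = model(xt, t.float().unsqueeze(-1) / len(alpha_bar))
    return ((pred - eps) ** 2).mean()

# Toy "empirical distribution": a cloud of training points (point masses) in 2D.
x0 = torch.randn(256, 2) * 0.1 + torch.tensor([2.0, -1.0])
alpha_bar = torch.cumprod(1 - torch.linspace(1e-4, 0.02, 100), dim=0)
model = ScoreNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    dsm_loss(model, x0, alpha_bar).backward()
    opt.step()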
@alivecoding4995 3 months ago
@@benboys_ Thank you very much!
@alivecoding4995 3 months ago
I was wondering because I saw an explanation saying that we need Langevin dynamics for sampling from the model, such that those samples can then be used in an MCMC estimator of the true likelihood of the model.
@codeXXai 8 months ago
Thank you for sharing the discussions. Very good discussions.
@BrunoAlmeidaSilveira 8 months ago
I came here by chance, hooked by the discussion about ClimSim, and discovered very nice content. Thanks for sharing!
@131xcom 8 months ago
I think the speaker did not understand the paper
@lion87563 9 months ago
How is the feature grid updated during backpropagation?
@aqsanaveed1942 10 months ago
Please cover the paper 'Solving Nonlinear conservation laws of Partial differential equations using Graph neural network' by Steinar Evje and Chunming Rong.
@NguyenThiTam0910 10 months ago
Thank youuu so much, it helps me a lot!
@MisterDivineAdVenture a year ago
In this case I would excuse you for using a text-to-speech dub. Sheesh.
@alivecoding4995 a year ago
Are you on Twitter, Ben?
@R24-q6b a year ago
😎
@axe863 a year ago
Is the variable selection technique robust to high concurvity, non-cointegrating relationships, varying degrees of persistence amongst predictors, pattern-destroying non-stationarity (EMH), etc.?
@AK-wn1rm a year ago
I would think generally not. I mean, that is your most general data science problem: how do you predict things that are wildly different from the past data you have seen? Now, that doesn't mean the model isn't somewhat tracking, since the latest data does go in and most forecasts will be somewhat guided by the latest "is" state. Given that you typically predict over many entities, you would also hope that some generalisation naturally occurs, as similar patterns have already been observed in other entities. Now, these models are not truly causal, at least not out of the box. And they can't be, because causality cannot be inferred from data alone (caveats, caveats, caveats). So it falls back on the modeller to provide sensible covariates. As long as causal pathways don't break down (and sometimes that can happen, at least temporarily), the model generalises. If one only throws features at the model and hopes it finds all it needs by itself, one might be in for a bad surprise.
@TheAmazonExplorer731 a year ago
Please upload a link to the code for the paper.
@nPlan a year ago
github.com/kyegomez/RT-2
@Zenchiyu a year ago
Concavity instead of convexity? Since we try to push samples towards regions of high density (noisy gradient ascent).
@benboys_ 7 months ago
Yes, you're right; it's the same thing up to a sign change, and people usually refer to convex optimization or log-concave sampling (of a probability density).
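For intuition, here is a minimal sketch of that 'noisy gradient ascent' (unadjusted Langevin dynamics) on a log-concave density; the Gaussian target, step size and step count are arbitrary assumptions for illustration:

import numpy as np

rng = np.random.default_rng(0)

def grad_log_p(x, mu=2.0, sigma=1.0):
    # Score of a Gaussian target N(mu, sigma^2): gradient of its log density.
    return -(x - mu) / sigma**2

eta, n_steps = 0.05, 5_000
x, samples = 0.0, []
for _ in range(n_steps):
    # Drift towards high density plus injected noise (noisy gradient ascent on log p).
    x = x + eta * grad_log_p(x) + np.sqrt(2 * eta) * rng.normal()
    samples.append(x)

# After burn-in the samples approximate the target: mean ~ 2, std ~ 1.
print(np.mean(samples[1000:]), np.std(samples[1000:]))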
@xtraeone5947 a year ago
Hi, can we have a paper review of the direct voxel grid NeRF?
@piotr780 a year ago
This is called in-context learning; it is nothing new in the area of transformers. P.S. "What Can Transformers Learn In-Context? A Case Study of Simple Function Classes" - someone trained a transformer to do linear regression.
@CausalDisco a year ago
Regarding your points on synthetic data generation (starting around 32:27): the equation shown does not lose indirect relationships, which can be checked by solving for X and generating some data in this way. The resulting data is in fact causal data (33:56), not purely correlational data. Nonetheless, the scepticism towards the data generation is well founded: it turns out that standardizing (or changing the weights or noise) changes the results tremendously, and usually for the worse. This is especially problematic in real-world problems with no canonical data scale. The reason is that the convex optimization procedure isn't guaranteed to converge on this non-convex problem; it's just that many ANM simulations give rise to mean-squared-error loss landscapes that are close to convex, due to exploding variances (on the raw data scale) along the causal order. So the results should be treated very carefully, and because the reason isn't so obvious, the issue can affect other algorithms as well (for example, ones with the same or a similar score function).
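For anyone who wants to check this themselves, here is a minimal sketch of the kind of linear additive-noise simulation being discussed (the graph size, weights and noise scale are illustrative assumptions, not the paper's exact setup):

import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 10_000

# Random upper-triangular weight matrix W, i.e. a DAG whose causal order is 0, 1, ..., d-1.
W = np.triu(rng.uniform(0.5, 2.0, size=(d, d)), k=1) * rng.choice([-1, 1], size=(d, d))
E = rng.normal(size=(n, d))                       # additive noise

# X = X W + E  =>  X = E (I - W)^{-1}, so indirect (multi-hop) effects are retained.
X = E @ np.linalg.inv(np.eye(d) - W)

print("raw variances along the causal order:", X.var(axis=0).round(1))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)      # standardizing removes the variance cue
print("variances after standardizing:       ", X_std.var(axis=0).round(1))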
@nPlan a year ago
Arvid was kind enough to share his slides with us. Here is a link to the slides: storage.googleapis.com/dockertest-191011/jc_temporal_fusion.html#1
@tashfeenahmed3526 a year ago
Thank you so much, you explained it greatly. What are the loopholes in this paper that could be improved, maybe computation time or something like that? I hope your suggestions will be quite valuable for us.
@SarahGhiyasi a year ago
Thank you for this incredible video. Really helpful. I just wanted to know how a reward function is set. Like in section 4.1 (33:40 of this video), they choose a reward function, but based on what?
@nPlan a year ago
Based on my experience with reinforcement learning problems, the reward function is often a heuristic or a tuned hyperparameter. I think the authors chose the reward function in section 4.1 because by changing its weighting values they are able to change the difficulty of the learning problem. So while the reward function does not have a theoretical justification, it makes sense in terms of experimental design, as it illustrates an idea in the paper, namely that GFlowNets are able to solve harder RL problems. I think there are two properties which mainly characterize reward functions. All reward functions output a real number which is used as the training signal for RL. The first property is dense versus sparse. A dense reward function is one that provides a different reward for each state-action pair. A sparse reward function is usually one that provides a reward of 0 except when a goal, or a keypoint towards a goal, is reached, at which point the agent receives some positive reward (usually 1). A dense reward function makes it easier for an agent to learn to accomplish a task, but usually makes it harder for the agent to overcome bottlenecks in the task. A sparse reward function provides the agent with the most accurate training signal for learning to accomplish a task, but may provide very few rewards and consequently very little training signal to learn from. The second property is stochastic versus deterministic, where a stochastic reward function determines its reward for a state-action pair by sampling from a distribution it outputs as an intermediate step. The researcher / engineer using reinforcement learning has to choose / design the reward function, especially if they use a dense reward function like the one in section 4.1. I hope this information was helpful!
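To make the dense-versus-sparse distinction concrete, here is a minimal sketch (our own toy illustration, not the paper's environment); the grid-world goal and the weighting values are assumptions:

import numpy as np

GOAL = np.array([9, 9])

def sparse_reward(state, action, next_state):
    # Reward only when the goal is reached; everywhere else there is no training signal.
    return 1.0 if np.array_equal(next_state, GOAL) else 0.0

def dense_reward(state, action, next_state, w_progress=1.0, w_step=0.01):
    # A different reward for every transition: progress towards the goal minus a step cost.
    # Changing the weights changes how hard the learning problem is.
    progress = np.linalg.norm(GOAL - state) - np.linalg.norm(GOAL - next_state)
    return w_progress * progress - w_step

# One step towards the goal: the sparse reward is still 0, the dense reward is already positive.
s, a, s_next = np.array([0, 0]), "right", np.array([1, 0])
print(sparse_reward(s, a, s_next), dense_reward(s, a, s_next))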
@icechanz8279 a year ago
Hi, thank you for your helpful explanation. The paper club sounds interesting to me. How can I find you guys?
@vahan.hovhannisyan a year ago
Just search for nPlan paper club. We do weekly discussions open to the public and in-person in London once a month!
@ChristopherKlugmann a year ago
I would like to emphasize again that diffusion models are not about generating an isotropic Gaussian over the spatial dimensions of the image. It is about transforming the entire (!) distribution into another tractable distribution, in this case the isotropic Gaussian. If you make the steps small enough and take enough of them, the forward process under Gaussian transition probabilities converges exactly to such an isotropic Gaussian distribution. The question that came up in the video about why this is important is quite simple to answer: we can only sample if we know the distribution after T time steps. The reversibility of the process is a remarkable property that arises under the assumptions formulated here: (i) small diffusion steps, (ii) enough iterations, and (iii) a transition probability in the forward process that has a conjugate functional form in the reverse process. The last condition is satisfied in the case of Gaussian or Bernoulli transitions. Only because of this property does it make sense to parameterize the dynamics of the backward process as Gaussian/Bernoulli distributions - this is a result of the theory and not a heuristic.
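A small numerical sketch of that convergence, with an arbitrary linear noise schedule and a made-up data distribution (both are assumptions for illustration, not the paper's exact setup):

import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)                     # assumed linear schedule
alpha_bar = np.cumprod(1.0 - betas)

x0 = rng.uniform(-3, 3, size=10_000)                   # arbitrary 1D "data" distribution
for t in [10, 100, T - 1]:
    # Closed-form forward marginal: x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    print(t, round(xt.mean(), 3), round(xt.std(), 3))  # drifts towards mean 0, std 1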
@benboys_ a year ago
Thanks for your explanation of the assumptions required to make a process reversible! I think it is also useful to understand the functional form of the transition density through the lens of Bayes' rule on the reverse transition density and a Taylor expansion (e.g., see the paper "Diffusion Schrödinger Bridge with Applications to Score-Based Generative Modeling", eq. (3) and eq. (4)), since we always compute in discrete time. This also uses the small-step-size limit and the conjugacy between the forward Gaussian and the Gaussian term resulting from the truncated Taylor expansion, but it has the advantage of being interpretable algorithmically as well as theoretically. A couple of other points of discussion I picked up from this video: regarding Arvid's question at 21:35, which I paraphrase as 'why does the reverse process seem to denoise if it is a stochastic transition', Nayef and Arvid correctly come to the answer that the mean function of the reverse transition density counteracts, or 'wins out' over, the diffusion by forcing the particles to areas of high probability. At 27:50 and 29:40: the reverse step leads to a distribution that is not an isotropic Gaussian because the mean function is parameterized by a neural network, which is nonlinear in the state, leading to a non-Gaussian distribution from the first step onwards. At 28:20 and 38:25: in practice, the forward chain does not quite reach an isotropic Gaussian; an approximation is made by sampling from the isotropic Gaussian in the first step of the reverse chain. I think the initial sample of the reverse chain is not independent of the final denoised sample, as Peter hints at 28:49. I think this is complicated to show and depends on the number of steps and the parameterization of the forward chain. Usefully, running parallel chains independently will lead to independent samples from the reverse chain.
@arijaa.9315 a year ago
Thanks for this explanation, I just did not understand one thing. In the paper, they mention that "after pre-training, we throw out the generator and only fine-tune the discriminator (the ELECTRA model) on downstream tasks". On Hugging Face, the ELECTRA model itself is the encoder (generator). So how come in this case they throw it away?
@mateuszsmendowski2677 a year ago
This explanation is quite useful once you have some basic understanding of the TFT architecture, so reading the original manuscript beforehand is recommended to get the essence of this video.
@cosmicfissure924 a year ago
I would like to join the club.
@michelemontebovi7750 a year ago
How can I participate in future meetings where you discuss papers of this type?
@nPlan a year ago
Sorry for the delay, you can sign up to our upcoming Paper Clubs by following this link: www.meetup.com/home/?suggested=true&source=EVENTS
@alessela7119 a year ago
How do they compute backpropagation?
@JoeMcDonald-x8c a year ago
This was very helpful after reading the paper! Is there any chance that the slides used in this video are available somewhere?
@divugoel a year ago
Hello, were you able to find the slides from this video?
@JoeMcDonald-x8c a year ago
@@divugoel No, I haven't. Still hoping someone who has them will see this.
@arisioz a year ago
Definitely not by Vahan, it's this guy in the background who actually understood and explained some points of this paper lol
@InturnetHaetMachine a year ago
30:00 I'm not sure I get the point. At least Facebook released this massive dataset for researchers and sourced it properly / ethically. OpenAI didn't even bother with that. I'm sure they have used writers' and journalists' work without giving them credit, but we won't even know because they never released the data they used. And the Stable Diffusion companies just stole artists' and photographers' work without giving them credit or compensation. And I think it's completely fair to restrict the usage to non-commercial purposes. If you want to make money, then buy your own dataset. I'm sure if your plan is to commercialize your model, you won't publish back the model and the dataset it was trained on. This can't be a one-way street where you want to have your cake and eat it too.
@jonatan01i a year ago
Thanks for sharing!
@dippatel1739 a year ago
Best discussion and presentation. Not sure if I can attend this paper club because it looks company-sponsored, but definitely keep posting these videos.
@vahan.hovhannisyan a year ago
Thanks for your interest! nPlan's paper club is open for all to attend in person in London or online! Just search for nPlan paper club :)