Hey, I’m pretty sure you got it backwards at 3:03. Isn’t p representing the probability of proceeding, not turning? Setting p=0 makes the expected value 0, which corresponds to always turning.
@Poly_morphia15 күн бұрын
Ah shoot LOL I had that typo and just ended up copy pasting the text in the video didn't I. Good catch. Yes, the boxes at the top of the screen should say proceed, my bad.
@xil1232314 күн бұрын
The paradox arises because the action-optimal formula mixes world states and belief states. The formula essentially starts by summing up the contributions of the individual nodes as if you were an "outside" observer that knows where you are, but then calculates the probabilities at the nodes as if you were an absent-minded "inside" observer that merely believes to be there (to a degree). So the probabilities you're summing up are apples and oranges, and no wonder the result doesn't make any sense.

As stated, the formula for action-optimal planning is a bit like looking into your wallet more often, and then observing the exact same money more often. Seeing the same 10 dollars twice isn't the same thing as owning 20 dollars.

If you want to calculate the utility and optimal decision probability entirely in belief-space (i.e. action-optimal), then you need to take into account that you can be at X, and already know that you'll consider being at X again when you're at Y. So in belief space, your formula for the expected value also needs to take into account that you'll forget, and the formula becomes recursive. The formula should actually be:

E = alpha * p * E + alpha * (1 - p) * 0 + (1 - alpha) * p * 1 + (1 - alpha) * (1 - p) * 4

Explanation of the terms in order of appearance:
- If we are in X and CONTINUE, then we will "expect the same value again" when we are in Y in the future. This enforces temporal consistency.
- If we are in X and EXIT, then we should expect 0 utility.
- If we are in Y and CONTINUE, then we should expect 1 utility.
- If we are in Y and EXIT, then we should expect 4 utility.

We also know that alpha must be 1 / (1 + p), because when driving n times, you're in X n times, and in Y p * n times. Under that constraint, we get E = -3 * p^2 + 4 * p. The optimum here is at p = 2/3 with an expected utility of 4/3, which matches the planning-optimal formula.
TL;DR: There's no paradox, the planning-optimal and action-optimal value actually agree if you take into account that belief states and world states are different things.
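A quick numeric check of this recursive formula (a sketch; the function name is mine): solving the recursion for E and scanning over p should recover the planning-optimal answer.

```python
# Sketch: solve E = a*p*E + a*(1-p)*0 + (1-a)*p*1 + (1-a)*(1-p)*4
# with a = 1/(1+p), and check it reduces to the planning-optimal 4p - 3p^2.

def belief_space_value(p):
    a = 1 / (1 + p)                    # degree of belief that Dave is at X
    # Rearranged: E * (1 - a*p) = (1-a) * (p*1 + (1-p)*4), then solve for E.
    return (1 - a) * (p * 1 + (1 - p) * 4) / (1 - a * p)

best_p = max((i / 1000 for i in range(1001)), key=belief_space_value)
print(best_p, belief_space_value(best_p))  # near 2/3 and 4/3
```

Algebraically the a's cancel and the whole thing collapses to 4p - 3p^2, so the scan peaks where the planning-optimal formula does.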
@Poly_morphia13 күн бұрын
Ooh this is a good explanation very solid. The wallet analogy is pretty good too.
@alex_zetsu13 күн бұрын
I didn't think world states and belief states could be different. I mean, I know that one's beliefs wouldn't have full certainty over the world state. However, I thought that as long as you knew the probability of each world state, that wouldn't matter. I can see your explanation makes sense, but I never realized I should think of it that way.
@xClairy13 күн бұрын
Super good explanation
@sophigenitor9 күн бұрын
Awesome explanation!
@Mikemk_14 күн бұрын
So what should Dave do? Pull over, put his hazards on, and take a nap.
@Mikemk_14 күн бұрын
Also, next city council meeting, he should bring up the road that drives directly off a cliff.
@preethamrn14 күн бұрын
But then he falls prey to the sleeping beauty problem where he doesn't know what coin was flipped before he woke up.
@Poly_morphia13 күн бұрын
These comments are gold
@Mikemk_12 күн бұрын
@preethamrn Once he's slept and can remember where he's been, drive all the way to the end, and then it's your first turn.
@gcewing12 күн бұрын
If for some reason taking a nap isn't an option, he should just accept that he's never going to get home and start a new life at C. A utility advantage of 3 isn't enough to risk falling over a cliff for.
@dat-pup14 күн бұрын
congrats you got me to click on and watch a statistics video hopefully I will never let this happen again but this is an achievement for you
@Poly_morphia14 күн бұрын
Nooooo stat is really cool I promise plus you gotta at least watch the veritasium video linked in the description this stuff messes with your mind (in a good way, I think). Thanks for the support though!
@Nzargnalphabet14 күн бұрын
@Poly_morphia yeah, but I just want to do combinatorics because uncertainty gives me the heebie jeebies, I’m fine just being oh so close to probability, as long as they don’t make me sum my nCrs
@TheGoodMorty13 күн бұрын
"probabilities" was in the title so I think it's a little on you
@isavenewspapers889013 күн бұрын
what are the odds?
@MaksimCherkazyanov-b9s13 күн бұрын
It wasn’t for me
@deepspacemachines12 күн бұрын
Dave is distracted because he's doing math while driving instead of focusing on the road
@fireballferret814612 күн бұрын
This problem might be better stated as being stuck in a roundabout: two or more exits, -1 utility for extra laps, all exits "look the same" And instead of selective (and highly specific) bad memory, we can say bad eyesight.
@Poly_morphia12 күн бұрын
This is also very fair! It definitely would feel more generalizable and I think Dave would get a lot less roasted in the comments LOL
@michaelgarrow323913 күн бұрын
Dave needs to quit driving while he is this stoned… 😹
@Poly_morphia13 күн бұрын
I love all these comments just roasting Dave LOL poor guy
@michaelgarrow323913 күн бұрын
@ - I resemble that! 😁 The thought I had with statistics and this problem is that while Dave should turn 2/3 of the time, he is not given an infinite number of chances: he has a binary choice to turn or not turn. And depending on his concern for living into the future, he may choose to go straight all the time. However, that is like going to Vegas(!) and just playing the machine that converts dollar bills into quarters. Not really human nature. 🙄 You could, perhaps, explain how the math works for people that are mathematically curious but not sure if they swing that way. 😳 I think this is an excellent idea and has a lot of potential. 👍
@Pabloparsil9 күн бұрын
You put so much effort into the video, clearly believing (correctly) that it's super interesting and then you decide to go so quickly like you feared that we wouldn't like it otherwise! Take your time dude, it's fine
@Poly_morphia9 күн бұрын
Honestly this is really good advice I’ve probably needed to hear. I’ve been rushing too much with uploads cause I don’t want to be scrambling during the semester. The next video will be on another math topic and I’ve already recorded the voice part, so it’ll probably be similar in pace. However, the next next video will definitely be on a strategy game most people have played and I’m gonna do my best to make sure that one maintains a high quality. Thanks for the reassurance! Means a lot :)
@Pabloparsil9 күн бұрын
@@Poly_morphia just to be clear I mean that you don't need to speak so fast or edit pauses too much, I wouldn't mind at all if this video was 20 minutes long for example. You are the one who sees the data so it's your decision of course. I could also see that the longer the video the lower the click through rate, so it's probably a balancing act.
@aliquis44605 күн бұрын
A follow-up to xil12323's answer, with a bit more theory sprinkled in.

TL;DR: we are confusing two perspectives, i.e. Dave's and an observer's. We can't mix the two, but sticking to either gives the correct answer: xil12323's answer sticks to Dave's, while the calculations in the planning-optimal part stick to the observer's. Amazing video by the way, just subscribed.

--- the full answer starts here ---

To make things clear, suppose Dave reached X at 1am, and, if he didn't go off the cliff, he reaches Y at 2am. The two perspectives, in math jargon, are probability measures (i.e. the function P when we write P[Dave is at X] or P[Dave goes straight]). We will call them P_subjective and P_objective.

In calculating the action-optimal expected utility, we had the term P[Dave is at X at 1am] * P[Dave goes straight] * E[utility | Dave is at X and goes straight]. We used the subjective probability P_subjective for the first term. It can't be P_objective, since that event has probability 1 under P_objective -- to an observer watching him drive to X, that's the only possibility. But for Dave himself, his P_subjective is not so sure.

In calculating the third term, however, we used the probability P[Dave is at Y at 2am | Dave is at X at 1am and goes straight] -- this has to be the objective probability, as Dave's subjective probability can't understand the event {Dave was at X and went straight}, since Dave has no clue where he was. With more jargon, we say that the event {Dave was at X and went straight} is not measurable under P_subjective.

As a result, what we actually computed is P_subjective[Dave is at X at 1am] * P_objective[Dave is at Y at 2am | he was at X at 1am and went straight], which mixes the two people's perspectives and doesn't simplify to P[Dave was at X, went straight, and is at Y] as if they were the same measure.
However, xil12323's answer salvaged the calculation by sticking to P_subjective: when calculating E[utility at Y | Dave is at X and goes straight], their answer replaced the expected utility under the objective measure (i.e. p * 1 by going straight at Y), which knows he is at Y, with the expected utility under the subjective measure (i.e. p * E by going straight at an intersection). Going the other way, by sticking to P_objective, results in the same correct expected utility formula from the planning-optimal part.
@rzeqdw14 күн бұрын
2:23 why does +0 have a (1-P)^2 utility? Shouldn't it be (1-P)? It's the probability of not proceeding at the first intersection, no?
@Poly_morphia14 күн бұрын
TRUE. Thank god the math cancels the expectation out anyway I was debating whether to include that part at all but I guess in hindsight I really just shouldn't have. Sorry about that. Hope it didn't detract too much from the video.
@justyourfriendlyneighborho20613 күн бұрын
Was just wondering abt that, glad to see I haven’t gone mad!
@catcatcatcatcatcatcatcatcatca14 күн бұрын
I suspect anthropic puzzles have a really useful real-life equivalent in software development, because what it effectively boils down to is choosing the best stateless policy. Why I say "I suspect" is because if that were true, this would be really easy to test with random trials, and the initially minor gap in probability should become noticeable pretty quickly.

For a developer, the action-optimal line of reasoning might be the more intuitive choice, as that would be the point where the code actually runs: some other logic arrives at an intersection, which is some gap in the stateful chain of logic. It then calls this policy function. So the policy function at the very least knows it is at an intersection every time it is called. Yet, according to this video, the policy should still be derived from the ex ante point of view, and be planning-optimal, thus accounting for all following calls to the same stateless function when choosing a policy.

Obviously, in a vast number of cases, even in embedded systems, it would be trivial to record how many times the policy function is called. It could just be stateful. But exceptions also exist: what if we don't know when it is meaningful to reset this counter? Having this policy stateless and probabilistic is desirable as a last resort.
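A minimal sketch of that stateless-policy setup (all names here are hypothetical): the policy function is memoryless, and the stateful outside world calls it once per fork.

```python
import random

def intersection_policy(rng):
    # Stateless: called once per fork, with no record of previous calls.
    # The policy bakes in the planning-optimal p = 2/3 chance to proceed.
    return "proceed" if rng.random() < 2 / 3 else "turn"

def drive(rng):
    # The stateful caller invokes the stateless policy at X, then at Y.
    if intersection_policy(rng) == "turn":
        return 0          # turned at X: off the cliff
    if intersection_policy(rng) == "turn":
        return 4          # turned at Y: home
    return 1              # proceeded twice: motel at C

rng = random.Random(0)
avg = sum(drive(rng) for _ in range(100_000)) / 100_000
print(avg)  # close to the planning-optimal 4/3
```

The random trials the comment asks for are exactly this: averaging over full trips rewards the planning-optimal p.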
@Poly_morphia13 күн бұрын
That's actually pretty cool! I haven't really worked with policy functions before but might be a cool video on its own.
@orchidquack15 күн бұрын
Excited for another video, keep up the quality content!
@Poly_morphia15 күн бұрын
Thanks! Wild notification bell timing.
@drewmariani296415 күн бұрын
I'd be interested to double check that a simulated version of this setup really does match the "planning optimized" definition of utility and not the action optimized. (Cos my brain is still screaming that no!! It's got to be the action optimized! It made so much sense!!!)
@Poly_morphia15 күн бұрын
This should work for a very naive simulation:

import random

def simulate_driver(p_continue, num_simulations=10000):
    total_payoff = 0
    for _ in range(num_simulations):
        payoff = 0
        if random.random() < p_continue:
            if random.random() < p_continue:
                payoff = 1
            else:
                payoff = 4
        else:
            payoff = 0
        total_payoff += payoff
    return total_payoff / num_simulations

def find_optimal_p():
    best_p = 0
    best_payoff = 0
    for p in [i / 100 for i in range(101)]:
        average_payoff = simulate_driver(p)
        if average_payoff > best_payoff:
            best_payoff = average_payoff
            best_p = p
    return best_p, best_payoff

if __name__ == "__main__":
    p_optimal = 2 / 3
    avg_payoff = simulate_driver(p_optimal)
    print(f"Average payoff with p = {p_optimal:.2f}: {avg_payoff:.4f}")
    best_p, best_payoff = find_optimal_p()
    print(f"Optimal p found through simulation: {best_p:.2f}")
    print(f"Maximum payoff: {best_payoff:.4f}")
@Eve-esdf123421 сағат бұрын
Simple. Once Dave encounters an intersection, continue forwards, then turn at the next intersection.
@benniboi723113 күн бұрын
Wow, that's a consistent-ish upload schedule. I thought I'd only see the next video from you in a few months
@Poly_morphia13 күн бұрын
Haha thanks hopefully school won't be that bad this semester. Am currently working on frontloading a lot of video prep right now too.
@robertseater232614 күн бұрын
Please take a look at the game "Are You a Robot?" which I think exposes a similar dilemma (but without the paradox). That is a 2-player 3-card game of social deduction, which itself sounds impossible. In that game, I think there is no ability to cooperate in the moment (action optimality) offering a reward of 1/3 (human always shoots) but strong ability to plan prior to starting the game (planning optimality) offering a reward of 2/3 (everyone agrees to always shake before knowing if they are robot or a human). There is no paradox in that game, but the two strategies reveal a common issue in game theory, and an ambiguity in the printed rules -- can players make binding deals? In this case, the presence of binding deals raises the expected value of the game. However, I'm not sure I've done the game theory analysis quite right for the game, and I would love it if you took a look.
@Poly_morphia14 күн бұрын
I’ll put it on the list and if it can fit somewhere, I’ll definitely try to include it in the future. Thanks for the suggestion!
@MrRyanroberson114 күн бұрын
The way I saw it was... he's either at intersection X (with weight 1) or Y (weight p), and the expected value at Y is 4 - 3p, so the expected value at X is 4p - 3p^2. Then the expectation should be weighted accordingly: 2(4 - 3p)p/(1+p). This way, whatever his probability of turning is, it should be optimal.
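A quick numeric look at that weighted expression (a sketch; the function name is mine): note that its maximizer is not 2/3, which is exactly the tension the video is about.

```python
def action_optimal(p):
    # The alpha-weighted expectation above: 2 * (4 - 3p) * p / (1 + p)
    return 2 * (4 - 3 * p) * p / (1 + p)

best = max((i / 10000 for i in range(10001)), key=action_optimal)
print(best, action_optimal(best))  # roughly 0.53 and 1.67, not 2/3 and 4/3
```

So an agent maximizing this quantity picks a different p than the planning-optimal 2/3, even though simulations of full trips reward 2/3.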
@Poly_morphia14 күн бұрын
Yup! This is exactly the paradox presented in the video, where you're referring to the action-optimal standpoint. Unless you're disagreeing with the reasoning presented in the video on why this is paradoxical?
@markos.553912 күн бұрын
This feels like a busy beaver algorithm
@Poly_morphia12 күн бұрын
I should lowkey look into that more
@markos.553912 күн бұрын
@Poly_morphia I remember seeing Numberphile videos about it from back in the day. There's also a Veritasium video about big numbers or infinities, I think. There was a great explanation of what this is.
@SanokFrolic15 күн бұрын
Watched the "sleeping beauty problem". I guess PO are halfers and they bet once on a whole path, while AO are thirders: they bet every time they wake up. Each strategy has its game where it's better than the other.
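That halfer/thirder split can be seen in a quick simulation (a sketch; the betting framing is mine): count how often the coin is heads per experiment versus per awakening.

```python
import random

# Sleeping Beauty: heads -> woken once, tails -> woken twice.
rng = random.Random(0)
experiments = 100_000
heads_runs = 0          # experiments where the coin landed heads
awakenings = 0          # total times Beauty is woken
heads_awakenings = 0    # awakenings that happen inside a heads experiment

for _ in range(experiments):
    heads = rng.random() < 0.5
    wakes = 1 if heads else 2
    heads_runs += heads
    awakenings += wakes
    if heads:
        heads_awakenings += wakes

print(heads_runs / experiments)        # near 1/2: bet once per whole path
print(heads_awakenings / awakenings)   # near 1/3: bet at every awakening
```

Same coin, two different sample spaces, hence the two "right" answers.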
@Poly_morphia15 күн бұрын
Yep, that was my conclusion too. There are also people called "double halfers" (check the description's references), and one of the articles said that group of people is wrong. If you're curious, I can look into it more, but I'd generally agree with your take.
@DarkAlgae12 күн бұрын
so i'm one of those generally new at this (extent of my game-theory is the prisoners' dilemma, so... not much); but as you were explaining the paradox there was some intuitive buzzing that the action-optimal calculations were "wrong" and i was happy to see that formalized
@Poly_morphia12 күн бұрын
Glad you liked it!
@Quincitic11 күн бұрын
I think it's just the fact that, at an intersection, you know you're not at C at the end of the day. When planning, you know that you might wind up wondering what to do when you were already headed to C, unaware that you'd passed two intersections.
@tristancole815812 күн бұрын
Why would being late be closer in value to dying than to getting home on time?
@3eH09obp212 күн бұрын
I think there might be a math error at 2:30: the probabilities don't sum to 1 because you are using (1-p)^2 instead of (1-p), and you also switch from p meaning the probability to proceed to p meaning the probability to turn
@Poly_morphia12 күн бұрын
Yea, a few other people have commented; that part of the problem gets zeroed out, so I didn't look as closely as I should have. The overall result will be the same though
@tylerrussell756012 күн бұрын
I am a math person but haven't done much probability and this video was great.
@Poly_morphia12 күн бұрын
Thanks for the support!
@farklegriffen26244 күн бұрын
Quickly coding this, the probability seems to peak at around 33% chance to turn, with an average utility of 1.33 I wonder if a variable got mixed up somewhere
@Garathon12 күн бұрын
If Dave is allowed a planning paper, a coin, and the ability to flip/interact with the coin at an intersection, then I propose a different strategy. The coin starts tails up, and the paper states: "if you arrive at an intersection and the coin is tails, then turn it over to heads and pass through, but if you arrive at an intersection and the coin is heads, take that intersection." From the video's assumption, Dave always knows he's at an intersection when he's at one, and turning a coin over is even less interaction than a flip, so we get 4 expected utility. This fails only if the starting setup isn't as stated, Dave doesn't act as stated (maybe he doesn't notice he's at an intersection), or a paper and a coin aren't allowed, which would undermine the probability approach anyway
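A sketch of that coin rule as code (assuming, as the comment does, that the coin's state survives even though Dave's memory doesn't):

```python
def drive_with_coin():
    coin = "tails"                 # coin starts tails up before leaving
    for turn_payoff in (0, 4):     # intersection X, then intersection Y
        if coin == "tails":
            coin = "heads"         # turn it over and pass through
        else:
            return turn_payoff     # coin is heads: take this intersection
    return 1                       # unreachable under this rule (motel at C)

print(drive_with_coin())  # 4, every time
```

The coin is one bit of external memory, which is exactly what the absent-mindedness assumption forbids, hence the reply below.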
@Poly_morphia12 күн бұрын
True but it also does undermine the absent mindedness of the problem
@asdbanz31612 күн бұрын
My solution for the sleeping beauty paradox is to determine how many times the experiment happens. If only once, you're a halfer; if it never ends, you're a thirder; and anything in between if it's more than once.
@chadarmstrong745812 күн бұрын
My interpretation of the error was that the action optimal people incorrectly state that the driver has a "probability" of being at either location. In reality, he is at one or the other and just doesn't know which one. Just like the paradox with 2 envelopes of money, one with double.
@Poly_morphia12 күн бұрын
Ah shoot I should have mentioned the 2 envelope paradox. I've been meaning to do a video on the topic and lowkey this would have been a good one to pair it with. Oh well. Hopefully I can in the future.
@ericherde114 күн бұрын
He can maximize his utility by pulling over and sleeping until he is no longer so tired as to be a threat to himself and others on the road.
@Poly_morphia13 күн бұрын
Someone else said if he goes to sleep when he wakes up he'll be subject to the sleeping beauty problem and wondering if the coin came up heads with probability 1/2 or 1/3 though.
@52flyingbicycles13 күн бұрын
All these probability paradoxes, yet we could easily solve them with Monte Carlo simulations…
@Poly_morphia13 күн бұрын
True, might make a video on MCMC in the future.
@orpal4 күн бұрын
This is why car dependency is so fucked. All the intersections look the same because of our shitty road designs. Taking a bus seems like the best choice.
@zoc212 күн бұрын
What do monte carlo trials say that the optimal p is?
@Poly_morphia12 күн бұрын
I wrote a super brief but easy script here:

import random

def simulate_driver(p_continue, num_simulations=10000):
    total_payoff = 0
    for _ in range(num_simulations):
        payoff = 0
        if random.random() < p_continue:
            if random.random() < p_continue:
                payoff = 1
            else:
                payoff = 4
        else:
            payoff = 0
        total_payoff += payoff
    return total_payoff / num_simulations

def find_optimal_p():
    best_p = 0
    best_payoff = 0
    for p in [i / 100 for i in range(101)]:
        average_payoff = simulate_driver(p)
        if average_payoff > best_payoff:
            best_payoff = average_payoff
            best_p = p
    return best_p, best_payoff

if __name__ == "__main__":
    p_optimal = 2 / 3
    avg_payoff = simulate_driver(p_optimal)
    print(f"Average payoff with p = {p_optimal:.2f}: {avg_payoff:.4f}")
    best_p, best_payoff = find_optimal_p()
    print(f"Optimal p found through simulation: {best_p:.2f}")
    print(f"Maximum payoff: {best_payoff:.4f}")

It does output that the optimal p = 2/3. Also, side note, p does mean the probability to PROCEED. There was a typo in the video script that got copied a bunch of times :( Hopefully it didn't hurt engagement.
@YouTube_username_not_found13 күн бұрын
I entered the video because I knew it is related to the Sleeping Beauty Problem.
@YouTube_username_not_found10 күн бұрын
After watching the video: Apparently I was wrong. In the SB Problem, you try to calculate probabilities of events, but here, you try to choose some strategy. Edit: You try to choose some optimal mixed strategy by finding a probability for which the utility is maximized.
@tedarcher912013 күн бұрын
Dave has extra information when he arrives at an intersection. The fact that he is alive means he didn't turn at the first intersection yet
@Poly_morphia13 күн бұрын
Actually I think it’s the inverse; he’d have information if he weren’t alive since that would imply he took the first intersection. Unfortunately I don’t think he’d appreciate that scenario though. Better to be a lively fool sometimes than a knowledgeable cliff jumper
@tedarcher912013 күн бұрын
@Poly_morphia yes but we are interested in his behaviour at an intersection, not in mid-flight down the cliff
@Poly_morphia13 күн бұрын
Ah I see. But in that case he can’t actually know what “first” means. If he chooses to continue past X, he can’t remember that he went through it, so just cause he’s alive doesn’t mean he knows he’s at Y. From his POV, he’s still too tired to remember his previous driving/decisions.
@tedarcher912013 күн бұрын
@Poly_morphia yes but he knows he DIDN'T turn at X yet. Also, he's so tired that falling off a cliff is only slightly less attractive than not falling lmao
@jessesmrekar27736 күн бұрын
By being at an intersection you gain the knowledge (and ability) to not terminate at the C (utility value 1) outcome, thus improving your bayesian score, no? Is this not sufficient to explain the increased score with the second approach?
@Phlosioneer2 күн бұрын
Contrary to your video wrap-up, this does come up IRL. Some real world problems cannot have a cohesive plan-based approach, and you’re forced to use an action based approach. A simple yet effective example is a maze-solving robot with only bumper sensors. You might think that is solvable, but motors are imperfect and the robot only knows a probability distribution of possible motor outputs (they’re not servos). Even with infinite memory, even with a fixed prior knowledge of the maze, it still cannot navigate perfectly, it still cannot figure out where it is, and it still needs to move randomly to solve the maze.
@XahhaTheCrimson13 күн бұрын
Interesting. Thank you for sharing this. But I'd like to say this: you should put more silence between sentences. Currently the intervals between audio are too short, so one can't digest each thought before the next sentence is heard. Isn't it a lot to expect viewers to pause the video themselves after each sentence?
@Poly_morphia13 күн бұрын
Sounds good, thanks for the feedback! I did kind of notice that after my second video haha I've been working on speaking a lot slower ever since. I think hopefully February should be a lot better in this aspect.
@aquatoonzed14 күн бұрын
Dave could just crash the car lol I cannot handle a car if I can’t remember where I am 😭 Great video as always ❤
@Poly_morphia14 күн бұрын
Thanks for the support! I too don’t like driving much haha
@alex_zetsu14 күн бұрын
Feel free to upload every 3 weeks or more infrequent. I'd rather see high quality videos instead of creator burnout. Also consider a buffer. Like make 4 videos before starting a release schedule. That way if you get sick or just hit a mental block, you have a bit of extra time. I remember for my fanfics, I generally only release chapter 1s after making chapter 4s.
@Poly_morphia13 күн бұрын
Thanks for the support! Also not a bad idea to buffer videos for sure. Will do my best.
@farklegriffen26244 күн бұрын
Why not code it? If the action-optimal approach is actually better, then it should work in application too.
@boklasarmarkus14 күн бұрын
I didn't really understand why the action-optimal approach didn't work. To be fair, I didn't get the equation for it either 😢
@Poly_morphia13 күн бұрын
Someone else left a pretty good explanation too if you're curious on more. Sorry about the clarity though, will try to improve in the future.
@Dudeguymansir14 күн бұрын
Dave’s not here, man
@michaelgarrow323913 күн бұрын
@@Dudeguymansir - Cheech and Chong…
@NoNameAtAll213 күн бұрын
my nitpicks (with presentation, not the problem):
1) at ~7:46 I got really confused by all the "planning even further ahead" stuff... I think it boils down to "if we use the action strategy in the overall calculation"?
2) I don't like how at 8:38 you first said the action calculation overcounts, and then said it leaves out worlds. The problem is the opposite: 1 woken-up Dave jumps off a cliff, but 2 awakened Daves drive through to +1, thus a bias towards turning
@Poly_morphia13 күн бұрын
That's very fair criticism, thanks for the feedback! I'll try to be more clear for the future too.
@NoNameAtAll211 күн бұрын
@Poly_morphia I love your content! thank you for making these
@grignaak929211 күн бұрын
3:17 Short-term absent-mindedness is fairly common (examples: post-concussion, ADHD-Inattentive, short-term memory loss, etc.). That is, forgetting which intersection you're at while remembering rules to get home.
@Poly_morphia10 күн бұрын
True true these are also much more realistic too
@Nosirrbro8 күн бұрын
@Poly_morphia I was gonna say this thought experiment just sounds like me trying to get home
@Tumbolisu14 күн бұрын
Doesn't alpha depend on p, since the chance of being at X instead of Y depends on how likely you are to turn? If you never turn, alpha is 50%. If you always turn, alpha is 100%. And because alpha depends on p, how can it be of any use? The moment the whole action-oriented thing with alpha was brought up, I simply got confused. The literal problem is that we don't know what intersection we are at, so what are we even talking about?
@Poly_morphia13 күн бұрын
So alpha conceptually is the probability of being at X. The reason why it depends on p in those scenarios is because we're weighting the probabilities of being at each intersection. The probability of having to be at X at some point is 1, but the probability of being at Y is p. Thus, if you weight the intersections accordingly, you get 1/(1+p) and p/(1+p). But the last statement is basically the point that yes, we're absent minded so we can't take that approach.
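That weighting can be checked by counting intersection visits over many trips (a quick sketch; names are mine):

```python
import random

def empirical_alpha(p, trips=100_000, rng=None):
    # Fraction of all intersection visits that happen at X.
    rng = rng or random.Random(0)
    x_visits = y_visits = 0
    for _ in range(trips):
        x_visits += 1              # every trip passes through X
        if rng.random() < p:       # proceeds with probability p...
            y_visits += 1          # ...and so also visits Y
    return x_visits / (x_visits + y_visits)

print(empirical_alpha(0.5))   # near 1/(1 + 0.5) = 2/3
print(empirical_alpha(1.0))   # 0.5: never turning means X and Y equally often
```

The counts match the 1/(1+p) and p/(1+p) weights described above, including the edge cases Tumbolisu mentions.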
@davidpercy621014 күн бұрын
It would be really interesting to see simulations that show what each approach is thinking: I think a simulation will show the planning-optimized agent getting the higher average payoff, of 1.33. Because in the story, and in the simulation, the driver gets his reward only at the end when he leaves the road. The average payoff is an average over complete trials of the game. The action-optimized agent is maximizing a different average: over observations. If we counted each decision as a separate game, I think we’d see this agent win, with average payoff 1.67. This agent is assuming it gets paid right away, which is wrong given the story.
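The two averages the comment describes can be made concrete in a sketch (the bookkeeping variables are mine): payoff counted once per completed trip versus the trip's payoff credited once per intersection visited.

```python
import random

def two_averages(p, trips=200_000, rng=None):
    rng = rng or random.Random(0)
    total = 0        # payoff counted once per trip
    credited = 0     # trip payoff credited once per intersection visited
    visits = 0
    for _ in range(trips):
        if rng.random() < p:                       # proceed at X
            payoff = 1 if rng.random() < p else 4  # proceed/turn at Y
            n = 2                                  # visited X and Y
        else:
            payoff, n = 0, 1                       # visited X only
        total += payoff
        credited += payoff * n
        visits += n
    return total / trips, credited / visits

per_trip, per_visit = two_averages(2 / 3)
print(per_trip)   # near 4/3: what a simulation of full trips rewards
print(per_visit)  # near 1.6: the double-counted per-observation figure
```

Per trip the average is 4p - 3p^2; per visit it works out to 2p(4 - 3p)/(1+p), the action-optimal quantity, which is why the two viewpoints disagree about the best p.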
@Poly_morphia13 күн бұрын
Hmmm I left a very basic python script replying to someone else's comments, if you're interested further feel free to use it as a base!
@robertseater232614 күн бұрын
I see the troubling contradiction, but I'm not sure 5:16 is quite right -- I don't think that alpha =1 implies that Dave knows what intersection he is at. It just implies that it is optimal to always turn even when you aren't sure what intersection you are at. I think that contradicts the planning-optimal solution but does not contradict his absent-mindedness. Or do I have that wrong?
@Poly_morphia14 күн бұрын
Check the definition of alpha again: the "reasoning" for why action-optimal people think they're right is that they're splitting cases of being at X and of being at not-X (Y), so alpha is the probability of being at X. Thus, if alpha is 1 then he has to be at X. Hope that makes sense! I think you're mistaking alpha for p.
@foobar6912 күн бұрын
2:21 why isn't it just (1-p)? since he made only one choice. also, if we're being consistent, shouldn't this be (1-p)*(1-p) + (1-p)*p, which is just, again, (1-p)?
@Poly_morphia12 күн бұрын
I think someone else commented this, pretty sure it’s a typo but it gets zeroed out so it doesn’t end up mattering mathematically
@Koresaurus14 күн бұрын
well, I tried to understand it, but I'm just not good enough at statistics.. as for the sleeping beauty problem.. I felt that it doesn't have a clear answer for a non-mathematical reason: the phrasing is ambiguous.
@Poly_morphia13 күн бұрын
Someone else left a pretty good conceptual explanation in the comments, but I'm sure your statistics is fine! Also true statement on English being super ambiguous generally yeah.
@erikeriks13 күн бұрын
Hi Polymorphia, could you maybe review this strategy game idea?

Description: Midnight is played on a board with 12 sections that are called hours. Each player has 4 stones that look a bit like checker pieces. These pieces have a value ranging from 1 through 4 (shocker). The four-piece is also referred to as the max-piece.

Classical Rules: Each round, a player must place all of his pieces on the board. For argument's sake, let's call the opponents blue (BL) and red (RD). Suppose that BL places his two-piece on hour 12, but RD places her max-piece on hour 1. In this case, BL would win nothing, because a higher piece has been placed on a consecutive hour. The same logic would apply if RD were to place her piece on hour 11. Now suppose that BL didn't play his two-piece, but also placed his max-piece. Neither party would gain anything, as the pieces are equal.

Objective: The objective is to claim the most hours on the board. To make things more even, the losing party always goes first in the next round. If BL claims an hour, he must place one of the game's white stones (called "lockstones") on the section that he has won. Neither player can claim this section again. Now here is the fun part: you also win if you end a round on 12 points. If I claimed hour 3 in round 1, I would thus need hour 9 to win instantly.

Midnight, Europe Variation: The variation that I will choose to play is called the Europe. It follows all the same rules, but adds these ones. Special Rule 1 (Sections): if a player has a max-piece on one of the four sections of the game, and no other pieces are present within that section (whether his own or his opponent's), he wins the section and its sum of points. Special Rule 2 (Royal Piece): if a consecutive 2-, 3- and max-piece are played, the player who has the max-piece claims each of the hours and gains an additional 21 points.
@Poly_morphia13 күн бұрын
Ooh interesting. I've never heard of this game before at all but I'll add it to the list.
@erikeriks12 күн бұрын
@@Poly_morphia great, I forgot to add though that the board is circular. The sections are all slices of the circle, kinda like a dartboard with 12 sections.
@CrystalLily130214 күн бұрын
Okay, I've been looking over it, and I feel like the action-optimized formulation is mathematically identical to the planning-optimized formulation. Further subdividing the solution space shouldn't affect the final answer, but it does in this case, and I'm struggling to understand how that's actually happening. I can follow the math, but since the optimization fails there must be some logical error, and maybe I just don't get Bayesian statistics at all, but the argument provided here fails to really convince me that "when I arrive at an intersection, it has some probability of being each one" is an invalid claim to make, because, like, that has to be true?
@Poly_morphia14 күн бұрын
Hmmm, I can understand the struggle with intuition, but I'm not sure the two are mathematically identical. I'm just copying from my script here, but see if you can find a math error (otherwise, I'm not sure they're identical).

Action Optimal:
alpha * (1 * p^2 + 4 * (1-p)*p + 0 * (1-p)) + (1 - alpha) * (4 * (1-p) + 1 * p)
With alpha = 1/(1+p), we get
1/(1+p) * (1 * p^2 + 4 * (1-p)*p + 0 * (1-p)) + (p/(1+p)) * (4 * (1-p) + 1 * p) = (2p(4 - 3p))/(p + 1)

Planning Optimal:
1 * p^2 + 4 * (1-p)*p + 0 * (1-p) = 4p - 3p^2

I'm not exactly sure how these can be identical. In fact, if we set them equal, the solutions are p = 0, 1, 4/3, none of which would make sense if the equations were truly identical, so the problem goes beyond the optimization.

I guess here's my best attempt to answer your question about intuition. It definitely is true that you're at an intersection if you see a fork in the road. The key reason Bayesian statistics works is that the information you collect UPDATES your priors. However, in an absent-minded scenario, you can't ever update your priors. So yes, you know you're at AN intersection, but crazily enough, the probability that you are at a SPECIFIC intersection becomes invalid because you cannot ascribe value to these states. In decision theory, you usually represent decisions via trees (which makes sense, i.e. we do A or B; in this case, we're at X or not X). However, that would imply there are two nodes, each with different actions and equilibria. Here, there is literally just one node in the tree, just being at "some" intersection, so there is no Bayesian update. Hope this slightly clarifies things. Let me know if you want me to try another one though! Hope you liked the video too :)
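A quick numeric sanity check of the two expressions in the comment above (just a sketch; the function names are mine). It confirms that the action-optimal and planning-optimal expressions are genuinely different functions of p, and that the planning-optimal one peaks at p = 2/3 with expected utility 4/3:

```python
def action_optimal(p):
    # with alpha = 1/(1+p), the action-optimal expression simplifies to 2p(4-3p)/(1+p)
    return 2 * p * (4 - 3 * p) / (1 + p)

def planning_optimal(p):
    # expected utility fixed before the drive: 0*(1-p) + 4*p*(1-p) + 1*p^2
    return 4 * p - 3 * p ** 2

grid = [i / 10000 for i in range(10001)]

# the two expressions disagree substantially over (0, 1)...
assert max(abs(action_optimal(p) - planning_optimal(p)) for p in grid) > 0.1

# ...and the planning-optimal curve peaks at p = 2/3 with value 4/3
best = max(grid, key=planning_optimal)
print(best, round(planning_optimal(best), 4))  # -> 0.6667 1.3333
```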
@jonathansfox14 күн бұрын
@@Poly_morphia I'm very tired, so maybe my wires just aren't connecting right, but does (1 - alpha) actually become (p/(1+p)) when you substitute 1/(1+p) for alpha in the simplification of the action optimal equation? Shouldn't it be (1 - 1/(1+p)), not (p/(1+p))?
@BigDBrian14 күн бұрын
Why did you define p as the probability of continuing, and then proceed as if p were the probability of turning left instead?
@Poly_morphia14 күн бұрын
Hi, check the pinned comment but basically it was a typo early in the script that unfortunately got copy pasted into the animation assets. The math is still correct, the English interpretation is unfortunately stuck in the animation :( apologies about the confusion
@patrickwright85529 күн бұрын
The obviously dumb part about this is the assigned utility. "Certain death" being -1 relative to "driving aimlessly", while "driving aimlessly" is -3 relative to "getting home", is obviously absurd. Assigned numerical utility is a fraught and only marginally useful concept as it is, except in artificial contexts like games (the idea that it's useful for approximating "happiness" or some other ephemeral property, and that it's taught so widely, is somehow related to the continuing decline of civilization). It's not even scaled properly (the brain is logarithmic), but I haven't thought through whether that's significant.
@Poly_morphia9 күн бұрын
Tragic you feel that way; the utilities are just scaled for the story telling of the problem. Hope you got something out of the video though!
@patrickwright85529 күн бұрын
@@Poly_morphia I really shouldn't have included my last part on the "scaling problem" vis-a-vis the brain, because without stated assumptions about what the utility values mean, it could be claimed they are scaled properly for any challenges that may come, and it's a more technical criticism of the calculation than anything too important.

I understand the point you get to: there is some situation where the optimal outcome is calculated with the first method rather than the second. As you say, it doesn't seem applicable in many situations. The storytelling is an issue. As I understand your video, it's not your story, it's someone else's, but it's really a dumb story. The amnesia is fine, the idea that the driver could write down a plan and roll a die is fine (I've watched Memento), but the utilities make no sense, and the story is limited in other ways that don't make sense. The story does not help me imagine other scenarios for applying this technique; a different story with the same utilities might.

A more realistic set of utilities will give a different p. If he misses the turn, Dave has the option to pull over and sleep. Dave may not know he missed the turn, but he will feel sleepier and sleepier. Certainly he cannot drive forever. He will end up asleep on the road or on the side of it. For everyone who doesn't know Dave, him driving off a cliff is probably optimal, because sleepy drivers can kill you, so the optimal social outcome is p=1. Unless you believe "no man is an island", in which case you probably need to calculate the odds he kills himself and someone else with option C. Which points to utilities as ideological objects.

Hence my problem with numerical utilities as good ways to understand or recommend behavior. The assignment is subjective and ideological, while the use of numbers gives an illusion of objectivity. This is often abused, e.g. to "objectively calculate" the costs and benefits of killing off whole species for more wood.
It's just one person's ideology shoved into the language of math, no more or less valid than another person's ideology shoved into the language of English.
@Tata-ps4gy15 күн бұрын
Isn't this a reformulation of the sleeping beauty problem?
@Poly_morphia15 күн бұрын
Yes and no. Both are a type of anthropic puzzle, but each has its own solution depending on your prior. For example, you wouldn’t say all angle chasing problems and triangle similarity proofs are the same, but they definitely belong in the same genre. However, if you believe the implications of the solution presented, you’ll have varying opinions about sleeping beauty too. I think reformulation implies they’re the same thing repackaged, which maybe isn’t exactly the case.
@ToasterLightning13 күн бұрын
I'm confused about how you got the planning probability of "turn with 2/3 probability"; in my calculations I got "turn with 1/3", which gives an expected utility of 1.333..., while turning with 2/3 gives an EU of 1. The equation for EU is P(Turn at X)*0 + P(!Turn at X)*(P(Turn at Y)*4 + P(!Turn at Y)*1), or (1-p)*(4p+(1-p)), which is -3p^2+2p+1, which peaks at p = 1/3.
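The arithmetic in the comment above checks out numerically, with p taken as the probability of turning (the reverse of the video's convention). A small sketch (names are mine) showing the peak at 1/3 and that this is the same curve as the video's 4q - 3q^2 once q = 1 - p:

```python
def eu_turn(p):
    # p = probability of TURNING at each intersection (the commenter's convention)
    # EU = P(turn at X)*0 + P(continue at X)*(P(turn at Y)*4 + P(continue at Y)*1)
    return (1 - p) * (4 * p + (1 - p))

grid = [i / 10000 for i in range(10001)]
best = max(grid, key=eu_turn)
print(round(best, 4), round(eu_turn(best), 4))  # -> 0.3333 1.3333

# same curve as 4q - 3q^2 with q = 1 - p (probability of proceeding)
assert all(abs(eu_turn(p) - (4 * (1 - p) - 3 * (1 - p) ** 2)) < 1e-12 for p in grid)
```

So both parametrizations agree: turn with probability 1/3, i.e. proceed with probability 2/3.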
@Poly_morphia13 күн бұрын
Hey glad you did the math! Check the pinned comment for more but it’s unfortunately an English error that passed through the script, but yes we have the same answers!
@UltraRik12 күн бұрын
I'm too dumb to understand the math stuff, but this just sounds like gambling apologia
@Poly_morphia12 күн бұрын
Nah you got it! But if you like gambling I think you’ll like the next upload I’m working on
@clairegu787115 күн бұрын
i love lotp
@Poly_morphia15 күн бұрын
Thanks for the support! Crazy you're this early.
@hoagie91110 күн бұрын
My problem with this is that the driver should be able to assign some probabilities as to which junction he is at. Even though they are identical to him, the probabilities he assigns will differ according to the strategy he thinks is most rational. For example, if he comes to the incorrect conclusion that always turning is the best strategy, and he knows he will always reason the same way, he knows with 100% certainty that he is at the first junction. Anyway, once we find what those probabilities should be, his strategy should be consistent with them, i.e. it should maximise (expected util of strat starting at junction 1)*(prob at junction 1) + (expected util of strat starting at junction 2)*(prob at junction 2). I tried to do the maths but I can't see how it fits with the video.
@Poly_morphia9 күн бұрын
Hey, yes, this is actually a really common question! Thanks so much for commenting!

Basically, the approach you're describing, i.e. assigning probabilities to which intersection he's at, IS the planning-optimal strategy. You can observe that we set p = the probability of PROCEEDING (sorry, this is a typo in the video that got extended for like 7 minutes, my bad -_-). This means your probability of ending up at A is (1-p), your probability of ending up at B is p(1-p), and the probability of ending up at C is p^2, which matches the description you're talking about.

The catch is that once you find yourself at some intersection (remember, you can't tell which one), that's when your ability to assign probabilities vanishes. Conditioning on the information that you are currently at an intersection doesn't actually help you if you are truly memoryless, so you're left sticking with the planning-optimal approach from before you ever saw an intersection, even though it is tempting to say that you're more likely to be at Y now.

So, the TLDR is that you can assign probabilities to where your destination will be, but you cannot actually assign probabilities to being at a particular intersection. It's a weird, mind-bending thing, but I think it's a pretty cool topic.
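A quick Monte Carlo sketch of those destination probabilities (my own code; destination labels A/B/C and the payoffs 0/4/1 follow the comment above). With p = 2/3 the frequencies approach (1-p), p(1-p), p^2, and the average utility approaches 4/3:

```python
import random

def drive(p, rng):
    """One memoryless drive: at each intersection, proceed with prob p, else exit."""
    if rng.random() >= p:      # exit at the first intersection X -> destination A
        return "A"
    if rng.random() >= p:      # exit at the second intersection Y -> destination B
        return "B"
    return "C"                 # proceeded through both -> destination C

rng = random.Random(0)
p = 2 / 3
n = 100_000
counts = {"A": 0, "B": 0, "C": 0}
for _ in range(n):
    counts[drive(p, rng)] += 1

# frequencies should approach (1-p) = 1/3, p(1-p) = 2/9, p^2 = 4/9
print({k: round(v / n, 3) for k, v in counts.items()})
# expected utility with payoffs A=0, B=4, C=1 should approach 4/3
print(round((4 * counts["B"] + counts["C"]) / n, 3))
```

Note that the simulation can only check the planning-optimal view; the whole puzzle is that the driver inside the loop has no way to condition on which branch he's in.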
@hoagie9119 күн бұрын
@@Poly_morphia "... you cannot actually assign probabilities of being at an intersection." This is the core of the problem, and of my worry. In the Bayesian philosophy, we should always in principle be able to assign probabilities of anything. Having no information doesn't prevent that; if we flip a coin and don't look at the result, we still apply a 50-50 probability distribution over the possible states. So either this thought experiment is so profound it proves something is fundamentally wrong with that philosophy, or there is something wrong with your interpretation.
@Poly_morphia9 күн бұрын
Hmmm, I see your concern, and I actually think it's a little simpler than you imagine. Bayesian updating matters because you get new information that informs your priors, yes? Well, we've simply constructed a case where, even though you've updated your location status, the rules explicitly make it so that you can't actually use that information to update the priors, even though technically not being at an intersection is different from being at one in terms of a "world state" as opposed to a "belief state". There's another really good comment I would definitely recommend reading too, but I hope this helps!
@hoagie9119 күн бұрын
@@Poly_morphia You say you can't update your priors; okay, but priors are probabilities, so what are the probabilities for being at each intersection?
@mikefischbein323011 күн бұрын
The impossibility of Dave remembering whether he has been at an intersection means it's also impossible for him to ever know that he has arrived at any of the three destinations.
@Poly_morphia10 күн бұрын
It could be really distinct destinations though
@mikefischbein323010 күн бұрын
@Poly_morphia Even so, none of the destinations are reachable without first arriving at X. Therefore, he can't know that he has arrived at a destination without also knowing he has been to at least one intersection. Since he is incapable of retaining knowledge of being at an intersection, he can't retain knowledge of arriving at a destination, or receiving a utility reward, or experiencing any of the consequences of arriving at a destination. Dave is doomed to spend the rest of his life believing he is still driving and unsure if he has even made it as far as the first intersection.
@JellyfishJellyfish-bk7cr8 күн бұрын
This is not a comment on the maths but on the presentation. You spoke way too fast; give me room to think and comprehend. Also, it would be nice if you could explain a little more of how you get to these magical probability formulas. I know this would make for a longer video and YouTube doesn't like that, so you might break it up into 2 parts...
@bigfool88195 күн бұрын
The speed felt fine, and the formulas were pretty normal, although a bit more explanation would make the flow better. 16 minutes would be better for YouTube.