Welcome! In this ungraded lab you will see the counter intuitive nature of probability by studying the famous Monty Hall problem. This problem seems very trivial at first glance but it exemplifies the fact that probabilities can have behaviours you might not initially expect.
Begin by importing the required libraries for the lab:
Suppose you are in a TV show where you may win a car by playing a game. The game is very simple: you have to choose among three closed doors. One door has the car and the other two have goats.
The game is played in two steps:
The host lets you choose one among the three doors, but you do not open it yet.
Then, the host (who knows where the car is) choose one among the two remaining doors and open it, revealing a goat.
The time to choose has come and let's suppose you have chosen door number 1. Then, just before they open the door number 1, the Host - who already knows in which door the car is behind - opens door number 3, revealing a goat and leaving doors number 1 and 2 closed. The Host then asks you:
"Would you like to switch your choice to door number 2?"
This question seems weird, since the host knows which door is the winner, maybe he is trying to trick you into choosing poorly. What would you do? Would you change doors, or you would stick to door number 1?
Since you are becoming more familiar with Probability and Statistics, you can think even further. What would give you the highest probability to win? Switching doors or keeping your choice? Does it matter?
Well, you have Python in your hands, so, in this noteobok you will simulate this game and answer this question by yourself! At first, you can try the game in real time just below, and you may get some idea of what might goes on!
By running the next cell you can play the game for a while and try out different strategies. In the left panel you will get the actual game, these are the instructions to play:
To start a new game simply select one of the three available doors.
After you select an initial door the host will open one of the two remaining ones and it will always have a goat behind it.
Then you can decide if you would like to stay or switch doors.
If you pick the door opened by the host that game will not count.
If you click outside any of the three doors then the game will restart and not be counted.
After the prizes are shown (game has ended) click anywhere on the screen to restart the game.
If you want to restart the counters, run the cell again
The right panel keeps track of the number of games played and the success rate for both strategies. Try it for a while and see if you can find any patterns!
Before going forward make sure that you played the game enough times to formulate an hypothesis. Is is better to switch doors? Is it better to stay on your initial guess? Or it simply does not make a difference?
After playing for a while you might have come up with some hypothesis about the preferred strategy to beat this game. Now you will simulate the game for many iterations and see if the success rate varies from one strategy to the other.
In order to do this, the monty_hall function is provided. This function takes a single argument which is a boolean that controls if you decide to switch doors or not. Take a look at the code comments if you want to understand how the implementation works. Notice that the value of 0 is used to represent a goat, while 1 represents a car:
You can use the function above to simulate one run of the game. However this would not be very practical, it is better to use the function to try a bunch of different runs at once and save the results. This way you can know for sure if one strategy beats the other after consistently using it.
You can pass the above function to another function that lets you decide a strategy and perform simulations for 1, 10, 100 and 1000 runs. As you increase the number of runs you will see that the strategies converge to their true success rate:
Now you are familiar with the problem and you have gotten a strong evidence that the best strategy is to switch doors because it will make you win about 67% of the times!
You now will see it analytically! For this, first let's make some definitions.
Define the events:
E1β = the car is behind door 1
E2β = the car is behind door 2
E3β = the car is behind door 3
Or, in a more concise way: Eiβ = the car is behind door i for i=1,2,3.
Note that these events are mutually exclusive, in other words, you cannot have a car simuntaneously in two doors, because of the rules of the game. This means that,
Let's suppose you've chosen door number 1. Since there is an equal chance of the car being behind one of the three doors, we know that
P(E1β)=31β.
By the complement rule, we know that P(E1cβ)=1βP(E1β)=1β31β=32β
Since the universe is given by E1ββͺE2ββͺE3β (the car is behind door 1 OR door 2 OR door 3), then E1cβ=E2ββͺE3β, therefore P(E2ββͺE3β)=32β. You can have a visual idea in the image below.
Now that you chose door 1, the Host then opens door 3, revealing a goat and asks you if you want to switch doors. If you don't switch, the probability of winning remains 31β because this is your initial choice. If you do switch, then, you can notice that the Host gave you an additional information. They showed to you that door 3 does not have a car, which means that
In other words, the probability that the car is behind door 2, given that it is not behind door 3 is 32ββ0.67 as you have just seen in your simulations!
Again, the Eiβ's are independent from each other, because there is only 1 car available.
Note that, since the Host never opens the same door the player chose and also never opens the winning door, there is an upper bound for k, which is nβ2, so 0β€kβ€nβ2.
Two facts can be assumed:
The player chooses door 1
The host opens doors 2,β¦,k+1
This is because we can always rename the doors to get this result. For instance, if the player chooses door number 10, we can rename it as door 1 and door 1 will become door 10. This is just to avoid getting too complex on indices notations. In math terminology, it is usually said that we can do this without loss of generality, since it will not affect the final result.
Now that there are n doors, the probability that the car is behind door 1 is n1β, i.e.,
P(E1β)=n1β.
By the complement rule, the probability that the car is not behind door 1 is:
P(E1cβ)=1βP(E1β)=1βn1β=nnβ1β.
Note that E1cβ=E2ββͺE3ββͺβ¦βͺEnβ.
There is a notation to simplify the right hand side equation above, we can write it as:
βi=2nβEiβ.
This works in the same fashion as a summation symbol, but the opeartion being performed is set union.
So, we know that
P(βi=2nβEiβ)=nnβ1β.
Now we can answer the question: What is the probability of winning, given that we switch doors?
Let's take a look on the following image:
If the player switches to a random available door, then they must choose one of the k+2,k+3,β¦,nβ1,n. Therefore, the probability of picking the car is:
The probability of not picking the car in door 1(P(E1cβ)=nnβ1β) times the probability of picking the car now, which is nβkβ11β because this is the number of remaining doors.
And the equality only holds when k=0. This means that the host does not open any door.
Therefore, it is always better to switch doors. This may sound counterintuitive at first, but think that switching doors you are using the new piece of information that the host gave you, whereas if you choose not to switch, you will be ignoring this new information.