Welcome! This lab shows how you can use NumPy to simulate rolling dice, from rolling a single die up to summing the results of multiple rolls. You will also see how to handle situations in which one of the sides of the die is loaded (the die has a greater probability of landing on that side compared to the rest).
The first thing you will need to do is define how many sides your die will have. You can even go a step further and represent the die as a NumPy array, assigning each side a label equal to the number of that side:
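A minimal sketch of this step. The names `n_sides` and `dice` are assumptions chosen for illustration; the list comprehension mirrors the description above, and `np.arange(1, n_sides + 1)` would build the same array.

```python
import numpy as np

# Number of sides of the die
n_sides = 6

# Represent the die as an array whose entries are the side labels 1..n_sides
dice = np.array([i for i in range(1, n_sides + 1)])

print(dice)  # [1 2 3 4 5 6]
```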
With your die ready, it is time to roll it. For now, assume that the die is fair, which means the probability of landing on each side is the same (it follows a uniform distribution). To achieve this behaviour you can use the function np.random.choice, which, given a NumPy array, returns one of its entries at random:
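A single fair roll might look like the sketch below, assuming the `dice` array from the previous step (redefined here so the snippet runs on its own). When no `p` argument is passed, `np.random.choice` picks every entry with equal probability.

```python
import numpy as np

n_sides = 6
dice = np.arange(1, n_sides + 1)  # sides labelled 1..6

# Without a `p` argument every entry of `dice` is equally likely
roll = np.random.choice(dice)
print(roll)
```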
This is great, but if you wanted to roll the die 20 times you would need to run the cell 20 times and record each result. What you need is a way to simulate several rolls at once. For this, you can define the number of rolls you want and use a list comprehension to roll the die that many times, saving every roll in a NumPy array:
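A sketch of the list-comprehension approach described above; `n_rolls` and `rolls` are illustrative names, not necessarily the ones used in the lab.

```python
import numpy as np

n_sides = 6
dice = np.arange(1, n_sides + 1)

n_rolls = 20

# Roll the die n_rolls times and store every result in a NumPy array
rolls = np.array([np.random.choice(dice) for _ in range(n_rolls)])
print(rolls)
```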
Now you have a convenient way of keeping track of the result of each roll, nice!
What if you would like to know the mean and variance of this process? For this you can use NumPy's functions np.mean and np.var:
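A short sketch computing both metrics on 20 simulated rolls (the variable names are assumptions). Note that `np.var` uses `ddof=0` by default, so it returns the population variance of the array; pass `ddof=1` if you want the sample variance instead.

```python
import numpy as np

dice = np.arange(1, 7)
n_rolls = 20
rolls = np.array([np.random.choice(dice) for _ in range(n_rolls)])

# Mean and (population, ddof=0) variance of the simulated rolls
rolls_mean = np.mean(rolls)
rolls_var = np.var(rolls)
print(f"mean: {rolls_mean:.2f}, variance: {rolls_var:.2f}")
```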
You can even check the distribution of the rolls by plotting a histogram of the NumPy array that holds the result of each throw. For this you will use the plotting library Seaborn, specifically the sns.histplot function:
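If you want to see the numbers behind the plot, or do not have Seaborn available, the bar heights of a count histogram can be computed with `np.unique(..., return_counts=True)`. This sketch prints them as a simple text histogram; it is an illustration of what the plot shows, not a replacement for it.

```python
import numpy as np

dice = np.arange(1, 7)
rolls = np.array([np.random.choice(dice) for _ in range(20)])

# Count how many times each side appeared; these are the bar heights
# that a count histogram of `rolls` displays
sides, counts = np.unique(rolls, return_counts=True)
for side, count in zip(sides, counts):
    print(f"side {int(side)}: {'#' * int(count)} ({int(count)})")
```

With Seaborn installed, `sns.histplot(rolls, discrete=True)` draws the same counts as bars; `discrete=True` uses a bin width of 1 and centers one bar on each integer value.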
You probably didn't get a distribution that looks uniform. Because the results are random, 20 rolls are too few for the frequency of each side to even out. Now try doing the same but for 20000 rolls:
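A sketch of the 20,000-roll version, assuming the same `dice` array as before (redefined so the snippet is self-contained). It also shows a vectorized alternative, `np.random.choice(dice, size=n_rolls)`, which draws all rolls in a single call instead of one call per roll; both produce arrays of independent fair rolls.

```python
import numpy as np

dice = np.arange(1, 7)
n_rolls = 20_000

# Same list-comprehension approach as before, just with more rolls
rolls = np.array([np.random.choice(dice) for _ in range(n_rolls)])

# Vectorized alternative: draw all rolls in one call
rolls_fast = np.random.choice(dice, size=n_rolls)

print(f"loop:       mean={np.mean(rolls):.3f}, var={np.var(rolls):.3f}")
print(f"vectorized: mean={np.mean(rolls_fast):.3f}, var={np.var(rolls_fast):.3f}")
```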
Does this plot and the metrics of mean and variance align with what you have learned about the uniform distribution during the course?
Simulations are a great way of checking results against analytical solutions. For example, in this case the theoretical mean and variance are 3.5 and 35/12 (approximately 2.917) respectively; these follow from the formulas for the mean and variance of a discrete uniform distribution. The important thing to keep in mind is that the more simulations you perform, the closer your results will tend to be to the analytical values, so always choose an appropriate number of simulations!
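For a fair die with sides 1 to n, the discrete uniform distribution has mean (n + 1)/2 and variance (n^2 - 1)/12, which for n = 6 gives 7/2 and 35/12. The sketch below evaluates these formulas exactly with `fractions.Fraction` and compares them against a 20,000-roll simulation; the comparison uses `np.var` with its default `ddof=0`, matching the population variance in the formula.

```python
from fractions import Fraction
import numpy as np

n = 6  # number of sides of a fair die

# Mean and variance of a discrete uniform distribution on {1, ..., n}
theoretical_mean = Fraction(n + 1, 2)        # (n + 1) / 2
theoretical_var = Fraction(n * n - 1, 12)    # (n^2 - 1) / 12
print(theoretical_mean, theoretical_var)     # 7/2 35/12

# Compare against a simulation with 20,000 rolls
rolls = np.random.choice(np.arange(1, n + 1), size=20_000)
print(f"simulated mean: {np.mean(rolls):.3f} vs {float(theoretical_mean):.3f}")
print(f"simulated var:  {np.var(rolls):.3f} vs {float(theoretical_var):.3f}")
```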
NumPy is quite fast, so performing 20 thousand runs takes very little time.
Now you want to throw the die twice and record the sum of the two rolls. For this you can do as before and save all results of the first roll in a NumPy array, but this time you will also have a second array that saves the results of the second roll.
To get the sum you can simply add the two arrays. This is possible because NumPy supports vectorized operations such as this one: when you add two NumPy arrays of the same shape, you get a new array containing the element-wise sum of their elements.
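A sketch of the two-roll setup; `first_rolls`, `second_rolls` and `sum_of_rolls` are illustrative names. The `+` operator adds the arrays element by element, with no explicit loop.

```python
import numpy as np

dice = np.arange(1, 7)
n_rolls = 20_000

first_rolls = np.array([np.random.choice(dice) for _ in range(n_rolls)])
second_rolls = np.array([np.random.choice(dice) for _ in range(n_rolls)])

# Vectorized element-wise sum: sum_of_rolls[i] == first_rolls[i] + second_rolls[i]
sum_of_rolls = first_rolls + second_rolls

print(first_rolls[:5], second_rolls[:5], sum_of_rolls[:5])
```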
Notice that now you can compute the mean and variance of the first rolls, the second rolls and the sum of rolls. You can also compute the covariance between the first and second rolls:
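A sketch of these metrics using vectorized draws (the variable names are assumptions). `np.cov` returns the full 2x2 covariance matrix of its two inputs, so the covariance between the rolls is the off-diagonal entry. Be aware that `np.cov` normalizes by N - 1 by default, whereas `np.var` normalizes by N; with 20,000 rolls the difference is negligible.

```python
import numpy as np

dice = np.arange(1, 7)
n_rolls = 20_000
first_rolls = np.random.choice(dice, size=n_rolls)
second_rolls = np.random.choice(dice, size=n_rolls)
sum_of_rolls = first_rolls + second_rolls

for name, arr in [("first", first_rolls), ("second", second_rolls), ("sum", sum_of_rolls)]:
    print(f"{name}: mean={np.mean(arr):.2f}, var={np.var(arr):.2f}")

# 2x2 matrix; the off-diagonal entry is Cov(first_rolls, second_rolls)
cov_matrix = np.cov(first_rolls, second_rolls)
covariance = cov_matrix[0, 1]
print(f"covariance: {covariance:.3f}")
```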
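The resulting plot has a symmetric, bell-like shape that peaks at 7, as you might expect (strictly speaking, the sum of two fair dice follows a triangular distribution; it only approaches a Gaussian as more dice are summed). Notice that the covariance between the first and second rolls is very close to zero, since these two processes are independent of one another.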
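Because the two rolls are independent, the exact distribution of their sum is the convolution of the two single-roll distributions. This sketch uses `np.convolve` on the fair-die probabilities to get the exact values that the simulated histogram approximates.

```python
import numpy as np

# Exact probabilities of a fair six-sided die for sides 1..6
p = np.full(6, 1 / 6)

# For independent rolls, the distribution of the sum is the convolution of the
# two distributions. Index k of `p_sum` corresponds to a sum of k + 2 (2..12).
p_sum = np.convolve(p, p)
for total, prob in zip(range(2, 13), p_sum):
    print(f"{total:2d}: {prob:.4f}")
```

The probabilities rise linearly from 1/36 at a sum of 2 to 6/36 at 7 and fall back to 1/36 at 12, which is the triangular shape.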
Also notice that you can change the statistic displayed in the histogram through the stat parameter of the sns.histplot function. In the previous exercises you were displaying how many times each value appeared, but in this latter one you are plotting the probability, which makes more sense in this context. To see which other stats are available, check the sns.histplot documentation.
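To make the difference between the two stats concrete, this sketch computes both by hand: raw counts per sum, and those counts divided by the total so the bar heights add up to 1, which is what Seaborn's `stat="probability"` normalization does. The variable names are assumptions.

```python
import numpy as np

dice = np.arange(1, 7)
n_rolls = 20_000
sum_of_rolls = (np.random.choice(dice, size=n_rolls)
                + np.random.choice(dice, size=n_rolls))

# Counts per sum (what a count histogram shows)
sums, counts = np.unique(sum_of_rolls, return_counts=True)

# Dividing by the total turns counts into probabilities that add up to 1
probabilities = counts / counts.sum()
for s, c, p in zip(sums, counts, probabilities):
    print(f"{int(s):2d}: count={int(c):5d}  probability={p:.3f}")
```

With Seaborn, `sns.histplot(sum_of_rolls, stat="probability", discrete=True)` plots these normalized heights.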
So far you have only simulated dice that are fair (all of the sides on them have the same probability of showing up), but what about simulating loaded dice (one or more of the sides have a greater probability of showing up)?
It is actually pretty simple. np.random.choice supports this kind of scenario through its p parameter, which sets the probability of selecting each one of the entries in the array (the probabilities must add up to 1).
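A minimal illustration of the `p` parameter. The probabilities used here (side 6 twice as likely as each other side) are a hypothetical example; `p[i]` is the probability of drawing `dice[i]`, and NumPy raises a `ValueError` if the entries do not sum to 1.

```python
import numpy as np

dice = np.arange(1, 7)

# Hypothetical loading: weights 1,1,1,1,1,2 normalized to sum to 1
p = np.array([1, 1, 1, 1, 1, 2]) / 7

rolls = np.random.choice(dice, size=20_000, p=p)

# Observed share of each side, to compare with `p`
sides, counts = np.unique(rolls, return_counts=True)
for side, share in zip(sides, counts / counts.sum()):
    print(f"side {int(side)}: {share:.3f}")
```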
To see it in action, code a function that returns the probabilities of the die landing on each side, given that one of the sides must be twice as likely as each of the others:
Before using this function, check what the probabilities of a fair die look like:
Now get the probabilities by using the load_dice function. Try changing the loaded side!
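The three steps above can be sketched together. The signature `load_dice(n_sides, loaded_number)` with a 1-based `loaded_number` is an assumption for illustration. The idea: give every side weight 1 and the loaded side weight 2, then divide by the total weight n + 1, so the loaded side gets 2/(n + 1) and every other side 1/(n + 1).

```python
import numpy as np

def load_dice(n_sides, loaded_number):
    """Return side probabilities where side `loaded_number` (1-based) is
    twice as likely as each of the other sides."""
    weights = np.ones(n_sides)
    weights[loaded_number - 1] = 2.0
    probs = weights / weights.sum()        # total weight is n_sides + 1
    assert np.isclose(probs.sum(), 1.0)    # sanity check: valid distribution
    return probs

n_sides = 6

# Fair die: every side has probability 1 / n_sides
probs_fair_dice = np.full(n_sides, 1 / n_sides)
print(probs_fair_dice)

# Loaded die: try changing the loaded side
probs_loaded_dice = load_dice(n_sides, loaded_number=2)
print(probs_loaded_dice)
```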
Now, feed the probs_loaded_dice array into np.random.choice and see how this affects the metrics and the plot:
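A sketch of the loaded simulation, assuming side 6 is the loaded one (weights 1,1,1,1,1,2, written out directly so the snippet runs on its own). Both rolls use the same `p`, and they are still drawn independently of each other.

```python
import numpy as np

n_sides = 6
dice = np.arange(1, n_sides + 1)

# Side 6 twice as likely as each other side (weights sum to 7)
probs_loaded_dice = np.array([1, 1, 1, 1, 1, 2]) / 7

n_rolls = 20_000
first_rolls = np.random.choice(dice, size=n_rolls, p=probs_loaded_dice)
second_rolls = np.random.choice(dice, size=n_rolls, p=probs_loaded_dice)
sum_of_rolls = first_rolls + second_rolls

print(f"mean of first rolls: {np.mean(first_rolls):.2f}")
print(f"mean of sum: {np.mean(sum_of_rolls):.2f}")
print(f"variance of sum: {np.var(sum_of_rolls):.2f}")
print(f"covariance: {np.cov(first_rolls, second_rolls)[0, 1]:.3f}")
```

With this loading a single roll has expected value (1 + 2 + 3 + 4 + 5 + 2 * 6) / 7 = 27/7, about 3.86 instead of 3.5, so the sums shift toward larger values.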
Now the histogram is skewed, since sums that involve the loaded side are now more likely than before. Try changing the loaded side and see how the histogram changes!
Notice that the covariance is still very close to zero, since there is no dependence between the rolls of the die.
To finish this lab, you will now simulate the scenario in which the second roll depends on the result of the first one. Say that you are playing a variant of the game you have played so far, in which you only roll the die a second time if the result of the first roll is greater than or equal to 4.
Before doing the simulations reflect on what might happen in this scenario. Some behavior you will probably see:
1 is now a possible result since if you get a 1 in the first roll you don't roll again
1, 2 and 3 now have a greater chance of showing up
4 is no longer a possible result, since a first roll of 4 forces a second roll, and a first roll below 4 ends the game with a result of at most 3
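These predictions can be checked exactly, without simulation, by enumerating all 36 equally likely pairs of rolls and applying the rule from the text (the second roll only counts if the first is at least 4). This sketch accumulates exact probabilities with `fractions.Fraction`.

```python
from collections import Counter
from fractions import Fraction

# Enumerate all 36 equally likely (first, second) pairs of a fair die.
# The second roll only counts when the first roll is >= 4.
outcomes = Counter()
for first in range(1, 7):
    for second in range(1, 7):
        total = first + second if first >= 4 else first
        outcomes[total] += Fraction(1, 36)

for total in sorted(outcomes):
    print(total, outcomes[total])
```

Results 1, 2 and 3 each get probability 1/6, and 4 never appears in the output.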
To achieve this behaviour you can use the np.where function: np.where(condition, x, y) returns the elements of x where the condition holds and those of y elsewhere, so setting y to 0 zeroes out the second rolls that should not have happened:
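A sketch of the dependent-roll simulation with fair dice; the variable names are assumptions. Replacing a second roll by 0 is equivalent to not rolling again, so the sum reduces to the first roll whenever it is below 4.

```python
import numpy as np

dice = np.arange(1, 7)
n_rolls = 20_000

first_rolls = np.random.choice(dice, size=n_rolls)
second_rolls = np.random.choice(dice, size=n_rolls)

# Keep the second roll only where the first roll is >= 4; elsewhere use 0
second_rolls = np.where(first_rolls >= 4, second_rolls, 0)

sum_of_rolls = first_rolls + second_rolls

print(f"covariance: {np.cov(first_rolls, second_rolls)[0, 1]:.3f}")
print(f"4 appears: {np.any(sum_of_rolls == 4)}")
```

For this rule the theoretical covariance is E[XY] - E[X]E[Y] = 8.75 - 3.5 * 1.75 = 2.625, where X is the first roll and Y the (possibly zeroed) second roll, so the simulated value should land near that.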
Looks like all of the predictions of this new scenario indeed happened. Notice that the covariance now is nowhere near zero since there is a dependency between the first and the second roll!