Report 9

Maximilian Fernaldy - C2TB1702

Part 1 - Poisson Distribution

Introduction to the Poisson Distribution

The Poisson distribution expresses the probability of a given number of events happening in a certain amount of time, if they occur with a known, constant mean rate and the events are independent of each other (meaning, an event happening does not change the probability of more events happening after it).

The Poisson distribution can be expressed as an equation of probability of nn events happening in a fixed amount of time:

P(Xt=k)=eλt(λt)kk!P(X_t = k) = \frac{e^{-\lambda t} (\lambda t)^k}{k!}

With P(Xt=k)P(X_t = k) as probability of kk events happening in time tt and λ\lambda as average number of events that has happened in time tt.

Something to note is that while λ\lambda can be any positive real number, kk is only defined for positive integers. Let's do an example problem to understand why this is.

An example

Suppose a restaurant is visited by 13.513.5 customers per day on average. The restaurant is underground, so people can't see inside and consequently, how many customers are inside the store doesn't affect the probability of new customers visiting it.

We can then denote λ=13.5\lambda = 13.5 as the average amount of customers per day and t0=1d=24ht_0 = 1d = 24h as the reference time interval.

An investor is interested in purchasing part ownership of the restaurant for a single trial week, but he wants to know if the restaurant will turn a profit. To get any return on investment (ROI), the investor needs 9090 or more customers in that week. He has a statistician calculate the probability of this happening.

Now we can get a sense of why XtX_t and kk has to be integers. It doesn't make sense to try and calculate the probability of, say, 66.66ˉ66.6\bar6 customers visiting the restaurant in a week. Events happen in integers: either they happen or they don't, 0 or 1.

The statistician uses the Poisson distribution to do the task assigned to them. They note that the investor wanted to know the probability of 9090 or more customers visiting the restaurant (k90)(k \geq 90) in tq=7t_q = 7 days, which means an easy way to calculate this probability is to instead calculate the inverse probability, then subtract it from unity:

P(Xt90)=1P(0Xt<90)P(X_t \geq 90) = 1 - P(0 \leq X_t < 90)

However, the statistician quickly realizes that plugging in these values one by one like this:

P(0Xt<90)=P(Xt=0)+P(Xt=1)+...+P(Xt=89)P(0 \leq X_t < 90) = P(X_t = 0) + P(X_t = 1) + ... + P(X_t = 89)

will take too long and it's too much work for how much they're getting paid. Instead, the statistician starts up Octave and writes this code to compute the probability:

lambda = 13.5 % Average events in reference time interval
t0 = 1 % Reference time interval in days
tq = 7 % Asked-for time interval
t = tq/t0 % Time interval as expressed in multiple of t0

for i = 0:89 % Iterate from k = 0 to k = 89
  % Use Poisson distribution formula
  p(i+1) = (e^(-lambda * t) * (lambda * t)^i) / factorial(i);
endfor

% Subtract sum of probabilities from unity to get answer
1 - sum(p)

Figure 1 - Output of restaurantpoisson.m

Apparently, the probability of 90 or more customers visiting the restaurant is 69.2%69.2\%. The investor is happy with this percentage, and goes through with his decision.

Solving Exercise 9.1

Parameters

My student number is C2TB1702.

  1. Earthquakes occur 0.7×(1+1)=1.40.7 \times (1+1) = 1.4 times in 7+1=87+1 = 8 days on average.
  2. There were 10+0+2=1210 + 0 + 2 = 12 earthquakes in the last four weeks.

Applying Poisson Distribution

From parameter (1), λ=1.4\lambda = 1.4 and the reference time period is t0=8t_0 = 8 days. From parameter (2), k=12k = 12 and tq=28t_q = 28 days.

We want to know P(Xt12)P(X_t \geq 12), which we can calculate by subtracting its inverse from unity.

P(Xt12)=1P(0Xt11)P(X_t \geq 12) = 1 - P(0 \leq X_t \leq 11)

To calculate the inverse probability:

P(0Xt11)=i=011P(Xt=i)=i=011eλt(λt)ii!P(0 \leq X_t \leq 11) = \displaystyle\sum_{i = 0}^{11} P(X_t = i) = \displaystyle\sum_{i = 0}^{11}\frac{e^{-\lambda t} (\lambda t)^i}{i!}

Because λ\lambda is defined for the reference time interval t0t_0, to continue to use λ\lambda, we want to linearly adjust tt so that it is a ratio of tqt_q and t0t_0. We can do this by dividing tqt_q by t0t_0, which will leave use with the appropriate number for λt\lambda t. In octave code:

lambda = 1.4; % Average number of earthquakes in 8 days
t0 = 8; % Reference time period
tq = 28;
t = tq/t0; % Expressed as multiple of t0
k = 12; % Limiter for events

for i = 0:k - 1 % Iterate from 0 to 11
  prob(i+1) = (lambda * t)^i * e^(-lambda * t) / factorial(i);
endfor

Then simply subtracting it from unity, we get the probability for 12 or more earthquakes happening (P(Xt12))(P(X_t \geq 12)):

probKAndMore = 1 - sum(prob)

which outputs

Figure 2 - Output of earthquakes.m

This means the probability of 12 earthquakes or more happening is very small under normal circumstances - smaller than 0.4%0.4\%. This suggests that something else might be at play, something out of the ordinary that caused the sudden jump in earthquake frequency.

Part 2 - Three Dice

Parameters

The sum of the last 4 digits of my student number is 1+7+0+2=101 + 7 + 0 + 2 = 10.

Applying basic probability theory

When rolling three dice, the number of possible sums is actually only 1616. The lowest possible sum is 1+1+1=31 + 1 + 1 = 3 and the highest possible sum is 6+6+6=186 + 6 + 6 = 18. Therefore there are only 1616 possible different sums. However, they have different odds of happening. For example, to get a sum of 33, there is only one possible combination of the three dice, that being all of them showing 11. On the other hand, there are many different combinations that result in a sum of 77. This means the probability of getting one sum is not equal to the others.

This is why when calculating the probability of getting a certain sum, we can't just say it's 116\frac{1}{16}. We need to consider all the possible events and take note of the combinations that result in that sum.

Applying this to our problem, there are 63=2166^3 = 216 possible combinations of the three dice, and the possible events that result in a sum of 10 are:

  1. 1+3+61 + 3 + 6 with unique numbers, so 3!=63! = 6 events
  2. 1+4+51 + 4 + 5 with unique numbers, so 3!=63! = 6 events
  3. 2+2+62 + 2 + 6 with a number appearing twice, so 3!/2!=33!/2! = 3 events
  4. 2+3+52 + 3 + 5 with unique numbers, so 3!=63! = 6 events
  5. 2+4+42 + 4 + 4 with a number appearing twice, so 3!/2!=33!/2! = 3 events
  6. 3+3+43 + 3 + 4 with a number appearing twice, so 3!/2!=33!/2! = 3 events

Adding them all up, we have 6×3+3×3=276 \times 3 + 3 \times 3 = 27 events. The probability of getting a sum of 10 is:

27216=18=12.5%\frac{27}{216} = \frac{1}{8} = 12.5\%

Using Octave to get the probability

Counting the different probable events one by one is impractical and prone to mistakes. Instead, we can use the help of computers to do the counting for us.

It's quite apparent that a quick and easy way to do this is utilizing nested for loops to repeatedly check the sum of the three dice and add the combination to an array if it equals 10.

wantedSum = 10;

combinations = []; % Initialize empty array
for i = 1:6 % Iterate through all possible combinations
  for j = 1:6
    for k = 1:6
      if i + j + k == wantedSum % If the sum matches
        newComb = sort([i,j,k]')'; % sort the combination
        % put the combination inside the array
        combinations = [combinations; newComb];
      endif
    endfor
  endfor
endfor

Before adding into the array, we sort the combination so we can take out the duplicates later. To compute the probability, we divide the total number of occurrences length(combinations) by the total number of possible events, 63=2166^3 = 216.

% Probability of getting sum of 10
probTen = length(combinations) / 6^3;
probText = ["The probability of getting " num2str(wantedSum) " as a sum is " num2str(probTen*100) "%%.\n"]; % Show probability as percentage
printf(probText)

To display the combinations that result in a sum of 10, we can use the unique() function. This function searches for duplicates, deletes them and sorts the remaining entries in ascending order. The documentation for unique() can be found here.

printf("The valid combinations are: \n")
disp(unique(combinations, "rows")); % Display unique combinations

As we don't need to show differently ordered, but otherwise identical combinations, this function is perfect for our purpose. The output of this script is:

Figure 3 - Output of threedice.m

Comparing this to our earlier manual computation, we get the same result.

Part 1 - Poisson Distribution

Introduction to the Poisson Distribution

Solving Exercise 9.1