## More Bayesian Probability
*June 25, 2007*

*Posted by Peter in Exam 1/P, Exam 4/C.*

We know that 10% of all proteins are membrane proteins. There are three types of amino acid: hydrophobic (H), polar (P), and charged (C). In a globular (non-membrane) protein, the three types occur in equal proportions: 1/3 each. In a membrane protein, the proportions are H 50%, P 25%, C 25%. Now we have an unidentified sequence: HHHPH. What is the probability that it is a membrane protein?

**Solution:** Let M be the event that the protein is a membrane protein, and G be the event that the protein is globular (non-membrane). Then

$$\Pr[M] = 0.1, \qquad \Pr[G] = 1 - \Pr[M] = 0.9,$$

if we assume that all proteins are classified as belonging to either type M or type G. Now, we are also given that

$$\Pr[H \mid G] = \Pr[P \mid G] = \Pr[C \mid G] = \tfrac{1}{3};$$

that is, the probability that a selected amino acid is hydrophobic, polar, or charged, given that it belongs to a globular protein, is 1/3 each. We also have

$$\Pr[H \mid M] = \tfrac{1}{2}, \qquad \Pr[P \mid M] = \Pr[C \mid M] = \tfrac{1}{4}.$$

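Our prior hypothesis is that there is a 0.1 probability that the protein is a membrane protein. Now, the likelihood of observing the amino acid sequence HHHPH, given that the protein is membrane, is

$$\Pr[\mathrm{HHHPH} \mid M] = \left(\tfrac{1}{2}\right)^4 \left(\tfrac{1}{4}\right) = \frac{1}{64}.$$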

This assumes that amino acid types are independent of each other within a given protein. Similarly, the likelihood of observing the same sequence, given that the protein is globular, is

$$\Pr[\mathrm{HHHPH} \mid G] = \left(\tfrac{1}{3}\right)^5 = \frac{1}{243}.$$

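The joint probabilities are then

$$\Pr[\mathrm{HHHPH} \cap M] = \frac{1}{64} \cdot \frac{1}{10} = \frac{1}{640}, \qquad \Pr[\mathrm{HHHPH} \cap G] = \frac{1}{243} \cdot \frac{9}{10} = \frac{1}{270},$$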

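and therefore the unconditional probability of observing the sequence HHHPH is, by the law of total probability,

$$\Pr[\mathrm{HHHPH}] = \frac{1}{640} + \frac{1}{270} = \frac{27 + 64}{17280} = \frac{91}{17280}.$$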

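Hence by Bayes’ theorem, the posterior probability of the protein being membrane, given that we observed the particular amino acid sequence HHHPH, is

$$\Pr[M \mid \mathrm{HHHPH}] = \frac{\Pr[\mathrm{HHHPH} \cap M]}{\Pr[\mathrm{HHHPH}]} = \frac{1/640}{91/17280} = \frac{27}{91},$$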

which is approximately 29.67%. This answer makes sense, because in the absence of any information, we can only conclude there is a 10% probability of selecting a membrane protein. However, once we observed the sequence HHHPH, the posterior probability is significantly greater, since it is far more likely to observe such a sequence if the protein were membrane than if it were globular—indeed, the likelihood was 1/64 versus 1/243. However, because the overall distribution of proteins is such that 90% are globular, the posterior probability is not vastly greater—only 30%.
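The arithmetic in this solution is easy to check mechanically. The following is a minimal sketch in Python using exact fractions. The names `PRIOR`, `EMISSION`, `likelihood`, and `posterior_membrane` are illustrative choices rather than part of the original problem, and the code assumes, as the solution does, that amino acid types are independent given the protein type.

```python
from fractions import Fraction

# Prior probabilities of the two protein types (from the problem statement).
PRIOR = {"M": Fraction(1, 10), "G": Fraction(9, 10)}

# Probability of each amino acid type given the protein type.
EMISSION = {
    "M": {"H": Fraction(1, 2), "P": Fraction(1, 4), "C": Fraction(1, 4)},
    "G": {"H": Fraction(1, 3), "P": Fraction(1, 3), "C": Fraction(1, 3)},
}


def likelihood(sequence, protein_type):
    """Probability of the sequence given the type, assuming independence."""
    result = Fraction(1)
    for amino_acid in sequence:
        result *= EMISSION[protein_type][amino_acid]
    return result


def posterior_membrane(sequence):
    """Posterior probability of a membrane protein, by Bayes' theorem."""
    joint = {t: PRIOR[t] * likelihood(sequence, t) for t in PRIOR}
    return joint["M"] / sum(joint.values())


p = posterior_membrane("HHHPH")
print(p, float(p))
```

Because the function accepts any string over the letters H, P, and C, it can also be used to check an answer to the exercise that follows.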

**Exercise:** Suppose you observed the sequence HHPCHHPHHHCH. What is the posterior probability of the protein being membrane? Why do we get a different result here? Why do we have to observe far longer sequences before we can have a high posterior probability that the sequence belongs to a membrane protein, compared to a similar degree of confidence that the sequence belongs to a globular protein?

## The moment calculation shortcut
*June 20, 2007*

*Posted by Peter in Exam 1/P, Exam 3/MLC, Exam 4/C.*

Suppose X is a continuous random variable that takes on nonnegative values. Then we have the following definition:

$$E[X] = \int_0^\infty x f(x)\,dx,$$

where $f(x)$ is the probability density function of X. Indeed, in the general case, let g be a function on the support of X. Then

$$E[g(X)] = \int_0^\infty g(x) f(x)\,dx.$$

So when $g(x) = x^k$ for some positive integer k, we obtain the formula for the **k-th raw moment** of X. Let’s work through an example.

Show that the expected value (first raw moment) of a Pareto distribution with parameters α and θ is equal to θ/(α-1). Recall that the density of the Pareto distribution is

$$f(x) = \frac{\alpha\theta^\alpha}{(x+\theta)^{\alpha+1}}, \qquad x > 0.$$

**Solution:** We compute

$$E[X] = \int_0^\infty \frac{\alpha\theta^\alpha x}{(x+\theta)^{\alpha+1}}\,dx.$$

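Then make the substitution $u = x + \theta$, $du = dx$, to obtain

$$E[X] = \alpha\theta^\alpha \int_\theta^\infty (u - \theta)\,u^{-\alpha-1}\,du = \alpha\theta^\alpha \left( \int_\theta^\infty u^{-\alpha}\,du - \theta \int_\theta^\infty u^{-\alpha-1}\,du \right).$$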

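For $\alpha > 1$, the integrals converge, giving

$$E[X] = \alpha\theta^\alpha \left( \frac{\theta^{1-\alpha}}{\alpha - 1} - \theta \cdot \frac{\theta^{-\alpha}}{\alpha} \right) = \frac{\alpha\theta}{\alpha - 1} - \theta = \frac{\theta}{\alpha - 1},$$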

which proves the desired result. However, the computation is quite tedious, and there is often an easier approach. We will now show that instead of using the density of X, we can use the survival function of X when computing moments. Recall that

$$S(x) = \Pr[X > x] = \int_x^\infty f(t)\,dt = 1 - F(x);$$

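that is, the survival function is the probability that X exceeds x, which is the integral of the density on the interval $(x, \infty)$, or the complement of the cumulative distribution function F(x). With this in mind, let’s try integration by parts on the definition of the expected value, with the choices $u = g(x)$, $du = g'(x)\,dx$; $dv = f(x)\,dx$, $v = -S(x)$:

$$E[g(X)] = \int_0^\infty g(x) f(x)\,dx = \Big[ -g(x) S(x) \Big]_0^\infty + \int_0^\infty g'(x) S(x)\,dx = \int_0^\infty g'(x) S(x)\,dx,$$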

where the last equality holds because of two assumptions: first, that the boundary term vanishes, which requires $\lim_{x \to \infty} g(x) S(x) = 0$ and $g(0) = 0$ (if $g(0)$ is finite but nonzero, it must be added to the right-hand side, since $S(0) = 1$); and second, that the resulting integral of g'(x) S(x) is convergent. Note that the individual terms of the integration by parts need not themselves be convergent, even when their sum is. Thus, a fully rigorous proof requires a more formal treatment than what is furnished here.

A consequence of this result is that for positive integers k,

$$E[X^k] = \int_0^\infty k x^{k-1} S(x)\,dx.$$

This formula is easier to work with in some instances, compared to the original definition. For instance, we know that the Pareto survival function is

$$S(x) = \left( \frac{\theta}{x + \theta} \right)^\alpha, \qquad x > 0,$$

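so we find

$$E[X] = \int_0^\infty \left( \frac{\theta}{x + \theta} \right)^\alpha dx = \theta^\alpha \left[ \frac{(x + \theta)^{1-\alpha}}{1 - \alpha} \right]_0^\infty = \frac{\theta}{\alpha - 1}, \qquad \alpha > 1,$$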

which we can immediately see results in a simpler integrand. This result also proves the life contingencies relationship

$$\mathring{e}_x = \int_0^\infty {}_tp_x\,dt,$$

since the complete expectation of life $\mathring{e}_x$ is simply the expected value E[T(x)] of the future lifetime variable T(x), and ${}_tp_x$ is the survival function of T(x). In life contingencies notation, the definition of expected value would then look like this:

$$\mathring{e}_x = \int_0^\infty t \; {}_tp_x \, \mu_{x+t}\,dt,$$

which is usually more cumbersome than the formula using only the survival function.
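As a numerical sanity check, the Pareto mean can be computed both ways: from the density, and from the survival function. The sketch below is only illustrative. The helper `integrate_half_line`, the change of variable x = u/(1 - u), and the parameter values α = 3 and θ = 2 are assumptions made for this example; with those values, both integrals should be close to θ/(α - 1) = 1.

```python
def integrate_half_line(h, n=10_000):
    """Midpoint-rule estimate of the integral of h over (0, infinity).

    Uses the substitution x = u / (1 - u), which maps (0, 1) onto (0, infinity),
    so that dx = du / (1 - u)**2.
    """
    du = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * du              # midpoints never hit u = 0 or u = 1
        x = u / (1.0 - u)
        dx_du = 1.0 / (1.0 - u) ** 2
        total += h(x) * dx_du * du
    return total


alpha, theta = 3.0, 2.0                 # illustrative Pareto parameters


def density(x):
    return alpha * theta**alpha / (x + theta) ** (alpha + 1)


def survival(x):
    return (theta / (x + theta)) ** alpha


mean_from_density = integrate_half_line(lambda x: x * density(x))
mean_from_survival = integrate_half_line(survival)
print(mean_from_density, mean_from_survival, theta / (alpha - 1))
```

Replacing `survival` with `lambda x: k * x**(k - 1) * survival(x)` would give the k-th raw moment in the same way, provided α > k so that the integral converges.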

## Conditional Probability and Bayes’ Theorem
*June 6, 2007*

*Posted by Peter in Exam 1/P.*

Here’s a question that seems to make the rounds in the Exam 1/P circles every so often, because it’s an excellent example of the use of Bayes’ Theorem to compute conditional probability.

A writes to B and does not receive an answer. Assuming that one letter in *n* is lost in the mail, find the chance that B received the letter. It is to be assumed that B would have answered the letter if he had received it.

**Solution:** What makes this question tricky is (1) no numerical probabilities are supplied, and (2) it is not clear how to specify the appropriate events. A typical candidate shouldn’t have issues with (1), but resolving (2) takes some careful consideration. We want to find the probability that B received A’s letter, given that A did not receive a response from B. The lack of response could have happened in two ways: either A’s letter did not reach B, or it did but B’s response didn’t reach A. This naturally suggests that the event to condition on is the initial delivery of A’s letter to B. So let us denote the following events:

- *D* = A’s letter is delivered to B
- *R* = A receives B’s response

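along with their complementary events, *D’* and *R’*. Then the desired probability is

$$\Pr[D \mid R'] = \frac{\Pr[R' \mid D]\,\Pr[D]}{\Pr[R']} = \frac{\Pr[R' \mid D]\,\Pr[D]}{\Pr[R' \mid D]\,\Pr[D] + \Pr[R' \mid D']\,\Pr[D']},$$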

where the leftmost equality follows from Bayes’ Theorem, and the rightmost equality follows from the law of total probability. It’s easy to see that the given information corresponds to

$$\Pr[D] = 1 - \frac{1}{n} = \frac{n-1}{n},$$

since this is simply the probability of a successful delivery. Similarly,

$$\Pr[R' \mid D] = \frac{1}{n}, \qquad \Pr[R' \mid D'] = 1,$$

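since the probability of a lost response is 1/*n* given that A’s letter was successfully delivered to B, and 1 if A’s letter to B was lost (and thus B never writes a response). It is now a straightforward exercise in algebra to substitute and simplify:

$$\Pr[D \mid R'] = \frac{\frac{1}{n} \cdot \frac{n-1}{n}}{\frac{1}{n} \cdot \frac{n-1}{n} + 1 \cdot \frac{1}{n}} = \frac{n-1}{(n-1) + n} = \frac{n-1}{2n-1}.$$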
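The substitution can also be checked mechanically. The sketch below evaluates Bayes’ theorem with exact fractions for a few values of *n* and compares the result with the closed form (n - 1)/(2n - 1). The function name `prob_letter_received` is a hypothetical label chosen for this example.

```python
from fractions import Fraction


def prob_letter_received(n):
    """Pr[D | R'] when one letter in n is lost, computed via Bayes' theorem."""
    p_lost = Fraction(1, n)
    p_d = 1 - p_lost                         # A's letter reaches B
    p_no_reply_given_d = p_lost              # B's reply is lost
    p_no_reply_given_not_d = Fraction(1)     # B never writes a reply
    numerator = p_no_reply_given_d * p_d
    denominator = numerator + p_no_reply_given_not_d * (1 - p_d)
    return numerator / denominator


for n in (2, 10, 100):
    # Compare against the closed form (n - 1) / (2n - 1).
    assert prob_letter_received(n) == Fraction(n - 1, 2 * n - 1)
    print(n, prob_letter_received(n))
```

Note that the probability is always slightly below 1/2: a silent B is a little more likely to have never received the letter than to have answered it in vain.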

## Order Statistics of Exponential RVs
*June 5, 2007*

*Posted by Peter in Exam 1/P, Exam 3/MLC, Exam 4/C.*

Here’s a question I read from the AoPS forum, and answered therein:

In analyzing the risk of a catastrophic event, an insurer uses the exponential distribution with mean $\theta$ as the distribution of the time until the event occurs. The insured had *n* independent catastrophe policies of this type. Find the expected time until the insured will have the first catastrophe claim.

The sum of *n* independent and identically distributed exponential random variables is gamma distributed. Specifically, if $X_1, X_2, \ldots, X_n$ are exponential with mean $\theta$, then $S = X_1 + X_2 + \cdots + X_n$ is gamma with mean $n\theta$ and variance $n\theta^2$.

It’s noteworthy that the sum *S* is a random variable that describes the time until the *n*-th claim if claims followed a Poisson process (whose **interarrival** times are exponentially distributed).

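However, according to the model you specified, the events are not interarrival times, but rather they run concurrently. So the time until the *k*-th event is NOT gamma; rather, it is the *k*-th order statistic $X_{(k)}$. Fortunately, the first such order statistic is exponential, which we show by recalling that $X_{(k)}$ has PDF

$$f_{X_{(k)}}(x) = \frac{n!}{(k-1)!\,(n-k)!}\, F(x)^{k-1} \big( 1 - F(x) \big)^{n-k} f(x),$$

where $F$ and $f$ are the common CDF and PDF of the $X_i$.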

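If *k* = 1, we immediately obtain

$$f_{X_{(1)}}(x) = n \big( 1 - F(x) \big)^{n-1} f(x) = n \left( e^{-x/\theta} \right)^{n-1} \frac{1}{\theta} e^{-x/\theta} = \frac{n}{\theta}\, e^{-nx/\theta}, \qquad x > 0,$$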

which is exponential with mean $\theta/n$. Note that this answer makes sense because as the number of policies *n* held by the insured increases, the expected waiting time until the first claim **decreases**. This would not be the case if one and only one policy at a time were in force at all times until the last policy claim; such a scenario would correspond to the gamma distribution previously mentioned.
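A quick simulation can make the result concrete: the minimum of *n* independent exponential waiting times should average about 1/*n* of a single policy’s mean. The sketch below uses only Python’s standard library. The function name, the choice of 5 policies with a per-policy mean of 10, the number of trials, and the seed are all assumptions made for illustration.

```python
import random


def simulate_first_claim_mean(n_policies, mean_time, trials=100_000, seed=0):
    """Monte Carlo estimate of the expected time until the first claim."""
    rng = random.Random(seed)
    rate = 1.0 / mean_time                  # expovariate takes the rate, not the mean
    total = 0.0
    for _ in range(trials):
        # Policies run concurrently, so the first claim is the minimum time.
        total += min(rng.expovariate(rate) for _ in range(n_policies))
    return total / trials


estimate = simulate_first_claim_mean(n_policies=5, mean_time=10.0)
print(estimate)   # should be close to 10 / 5 = 2
```

Replacing `min` with `sum` in the loop would instead estimate the mean of the gamma-distributed total, which corresponds to the one-policy-at-a-time scenario.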

To check your understanding, here are some exercises:

- What is the PDF of the second order statistic $X_{(2)}$, and what does it represent?
- Which of the $X_{(k)}$ belong to the gamma or exponential family of distributions?
- Prove that

## A Tangled Tale
*May 31, 2007*

*Posted by Peter in Exam 1/P.*

As is now common knowledge, Lewis Carroll was not merely a writer of children’s tales, but an amateur mathematician. He was fond of puzzles of a logical nature, and in his work *A Tangled Tale*, he posed a question that is particularly relevant to the concepts of probability theory tested on Exam 1/P:

“Sad — but very curious when you come to look at it arithmetically,” was her aunt’s less romantic reply. “Some of them have lost an arm in their country’s service, some a leg, some an ear, some an eye — ”

“And some, perhaps, all!” Clara murmured dreamily….

“Say that 70 per cent have lost an eye — 75 per cent an ear — 80 per cent an arm — 85 per cent a leg — that’ll do it beautifully. Now, my dear, what percentage, at least, must have lost all four?”

Being the writer that he was, Carroll posed the question in the setting of a conversation between a young girl, Clara, and her apparently unsympathetic aunt, who I wonder must have had Asperger’s syndrome, and perhaps would have made an excellent actuary. But the question is this: Given that 70% of veterans have lost an eye, 75% an ear, 80% an arm, and 85% a leg, what is the minimum percentage of veterans that have lost all four appendages? The solution, as I furnished to the individual who directed my attention to this question, is as follows:

If no less than 70% of the soldiers lost one eye, then no more than 30% of the soldiers did not lose one eye. Similarly, no more than 25% of the soldiers did not lose one ear; no more than 20% of the soldiers did not lose one arm; and no more than 15% of the soldiers did not lose one leg.

We see that the minimum possible percentage of soldiers who lost all four of these parts is attained when the groups of soldiers who retained each part are mutually disjoint. This is because if we maximize the number of soldiers who retained at least one body part, we minimize the number of soldiers who lost all such parts. To this end, the maximum number of soldiers retaining at least one body part occurs if no soldier has more than one surviving body part; that is, the 30%, 25%, 20%, and 15% of soldiers who retained an eye, ear, arm, and leg, respectively, are assumed to have lost all other parts.

Consequently, the total percentage of soldiers who have retained at least one body part is the maximum 30+25+20+15 = 90%. Therefore, we are guaranteed that at least 10% of the soldiers have lost all such body parts.
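The same argument works for any list of percentages, which the following short sketch illustrates. The function name `min_percent_with_all` is hypothetical, and the `max(0, ...)` guard is an addition for inputs where the retained percentages sum to more than 100, in which case the bound is simply zero.

```python
def min_percent_with_all(lost_percentages):
    """Smallest possible percentage of people who lost every listed part."""
    # Percentage who retained each part, treated as disjoint groups (worst case).
    retained = [100 - p for p in lost_percentages]
    return max(0, 100 - sum(retained))


print(min_percent_with_all([70, 75, 80, 85]))   # 10
```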

## Question 21, Spring 2007 MLC
*May 29, 2007*

*Posted by Peter in Exam 3/MLC.*

This question was received with a bit of controversy because of its wording.

**21.** You are given the following information about a new model for buildings with limiting age ω.

- The expected number of buildings surviving at age x will be $l_x = (\omega - x)^\alpha$, x < ω.
- The new model predicts a 33.3% higher complete life expectancy (over the previous DeMoivre model with the same ω) for buildings aged 30.
- The complete life expectancy for buildings aged 60 under the new model is 20 years.

Calculate the complete life expectancy under the previous DeMoivre model for buildings aged 70.

The problem with the way this question reads lies in the phrase “previous DeMoivre model” mentioned in item 2, and at the end of the question. Usually, one does not relegate essential information to an offhandedly casual and parenthetical remark. A properly posed question should read “…previous model, **which is DeMoivre**….” This eliminates the ambiguity of whether the word “previous” belongs to “DeMoivre” or to “model,” the latter being the intended meaning. This issue is further exacerbated by the fact that the new model is a modified/generalized DeMoivre, and that both models share the same ω. Together, this leads to a confusingly written question, because the author did not take care to make it absolutely clear (preferably in a separately listed item) that the old model was DeMoivre (or equivalently, α = 1). That said, the solution is as follows:

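**Solution:** We first compute the complete life expectancy of a building aged x under the new model, noting that the old model has α = 1:

$$\mathring{e}_x = \int_0^{\omega - x} \frac{l_{x+t}}{l_x}\,dt = \int_0^{\omega - x} \left( \frac{\omega - x - t}{\omega - x} \right)^\alpha dt = \frac{\omega - x}{\alpha + 1}.$$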

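Then item (2) gives the condition $\mathring{e}_{30}^{\,\text{new}} = \tfrac{4}{3}\, \mathring{e}_{30}^{\,\text{old}}$, from which we obtain

$$\frac{\omega - 30}{\alpha + 1} = \frac{4}{3} \cdot \frac{\omega - 30}{2},$$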

and hence α = 1/2. Item (3) then gives the condition $\dfrac{\omega - 60}{3/2} = 20$, so ω = 90. Therefore,

$$\mathring{e}_{70}^{\,\text{old}} = \frac{90 - 70}{2} = 10.$$
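The whole solution can be cross-checked in a few lines. The sketch below assumes the survival model $l_x = (\omega - x)^\alpha$, for which the complete life expectancy at age x is (ω - x)/(α + 1), and solves items (2) and (3) in turn with exact fractions. The function name `life_expectancy` is an illustrative choice.

```python
from fractions import Fraction


def life_expectancy(x, omega, alpha):
    """Complete life expectancy at age x when l_x = (omega - x)**alpha."""
    return Fraction(omega - x) / (alpha + 1)


# Item (2): 1 / (alpha + 1) = (4/3) * (1/2), since DeMoivre has alpha = 1.
alpha = 1 / (Fraction(4, 3) * Fraction(1, 2)) - 1

# Item (3): (omega - 60) / (alpha + 1) = 20.
omega = 60 + 20 * (alpha + 1)

# Previous (DeMoivre) model at age 70, with the same omega.
answer = life_expectancy(70, omega, Fraction(1))
print(alpha, omega, answer)
```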

## From the AIME
*May 29, 2007*

*Posted by Peter in Exam 1/P.*

Some time ago, I answered a question that was featured in the American Invitational Mathematics Examination (AIME), which is one step in the series of AMC exams that leads to the US team selection for the International Mathematical Olympiad (IMO). The AMC is open to high school students in the US, and every now and then, a question from the theory of probability crops up on the exam. This one would make a particularly challenging question for CAS/SOA Exam 1/P:

A jar has 10 red candies and 10 blue candies. Terry picks two candies at random, then Mary picks two of the remaining candies at random. Given that the probability that they get the same color combination, irrespective of order, is *m*/*n*, where *m* and *n* are relatively prime positive integers, find *m* + *n*.

**Solution:**

Terry has a 1/2 probability of choosing a red candy on his first draw. He then has a 9/19 probability of choosing another red candy on his second draw. Thus the probability of his having two red candies is 9/38. Similarly, the probability of his having two blue candies is 9/38. Therefore, the probability of his having one of each color is 1 – 2(9/38) = 10/19.

Now, given that Terry has two candies of the same color, the probability that Mary selects two more candies of that same color is (8/18)(7/17). Therefore, the probability that Terry and Mary have all chosen candies of the same color (all red or all blue) is

(9/19)(8/18)(7/17) = 28/323.

However, given that Terry has one candy of each color, the probability that Mary also selects one red and one blue candy is simply 9/17. This is because it is equivalent to the probability of her second candy color not being the same as her first. Another way to view it is to see that she can either choose red, then blue, or blue, then red. Each occurs with a probability of

(9/18)(9/17) = 9/(2·17),

so their combined probability is twice this, or 9/17. Hence the combined probability that Terry and Mary each choose candies of both colors is

(10/19)(9/17) = 90/323.

Therefore, the probability Terry and Mary choose candies of the same type is

(28+90)/323 = 118/323,

and since 118 and 323 are relatively prime, the answer is 441.
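The result can be confirmed by counting. The sketch below sums over Terry’s three possible color combinations and uses binomial coefficients for both draws. The function name is a hypothetical label, and the code assumes a jar with the same number *n* ≥ 2 of candies of each color.

```python
from fractions import Fraction
from math import comb


def same_combination_probability(n):
    """Probability that Terry and Mary draw the same color combination.

    Assumes n red and n blue candies, with n >= 2.
    """
    total = Fraction(0)
    for reds in range(3):                   # Terry draws `reds` red, 2 - reds blue
        blues = 2 - reds
        terry = Fraction(comb(n, reds) * comb(n, blues), comb(2 * n, 2))
        # Mary must draw the same combination from what remains.
        mary = Fraction(
            comb(n - reds, reds) * comb(n - blues, blues),
            comb(2 * n - 2, 2),
        )
        total += terry * mary
    return total


p = same_combination_probability(10)
print(p, p.numerator + p.denominator)
```

`Fraction` reduces to lowest terms automatically, so the sum of numerator and denominator is the requested *m* + *n*.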

One can also work out the problem for the general case where one has *n* red candies and *n* blue candies. The desired probability is

$$\frac{3n^2 - 7n + 6}{2(2n-1)(2n-3)}, \qquad n \ge 2,$$

which reduces to 118/323 when *n* = 10.

Can you generalize the question to the case where there are *r* red and *b* blue candies?

## Question 12, Spring 2007 Exam 4/C
*May 27, 2007*

*Posted by Peter in Exam 4/C.*

**12.** For 200 auto accident claims you are given:

- Claims are submitted *t* months after the accident occurs, *t* = 0, 1, 2, ….
- There are no censored observations.
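- $\hat{S}(t)$ is calculated using the Kaplan-Meier product limit estimator.
- $c_S^2(t) = \dfrac{\widehat{\mathrm{Var}}[\hat{S}(t)]}{\hat{S}(t)^2}$, where $\widehat{\mathrm{Var}}[\hat{S}(t)]$ is calculated using Greenwood’s approximation.
- $\hat{S}(8) = 0.22$, $\hat{S}(9) = 0.16$, $c_S^2(9) = 0.02625$, $c_S^2(10) = 0.04045$.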

Determine the number of claims that were submitted to the company 10 months after an accident occurred.

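**Solution.** There are two key observations we need to make. Let $r_t$ denote the risk set at time *t* and $s_t$ the number of claims submitted at time *t*. The first observation is that we are given the risk set at time *t* = 0, namely $r_0 = 200$. The second observation is that because no observations are censored, the Kaplan-Meier estimator of the survival function takes on a particularly simple form. This is because in the absence of censoring, $r_{t+1} = r_t - s_t$; that is, the risk set at time $t+1$ is simply the risk set at time $t$ minus those who died in the meantime. Therefore

$$\hat{S}(t) = \prod_{j=0}^{t} \frac{r_j - s_j}{r_j} = \prod_{j=0}^{t} \frac{r_{j+1}}{r_j} = \frac{r_{t+1}}{r_0}.$$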

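So with this in mind, we have $r_{10} = r_0\, \hat{S}(9) = 200(0.16) = 32$. Recalling Greenwood’s approximation,

$$\widehat{\mathrm{Var}}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{j=0}^{t} \frac{s_j}{r_j (r_j - s_j)},$$

so the ratio $c_S^2(t) = \widehat{\mathrm{Var}}[\hat{S}(t)] / \hat{S}(t)^2$ satisfies

$$c_S^2(10) - c_S^2(9) = \frac{s_{10}}{r_{10}(r_{10} - s_{10})}.$$

Substituting $c_S^2(10) - c_S^2(9) = 0.04045 - 0.02625 = 0.0142$ and $r_{10} = 32$, and solving, we obtain $s_{10} = 10$.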
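The final step is a one-line rearrangement that is easy to check numerically. The sketch below assumes the values from the exam question as I recall them, namely Ŝ(9) = 0.16, c²(9) = 0.02625, and c²(10) = 0.04045 for 200 claims. It solves d = s/(r(r - s)) for s, where d is the difference of the two c² values and r is the risk set at time 10. The variable names are illustrative.

```python
n_claims = 200
s_hat_9 = 0.16                      # assumed Kaplan-Meier estimate at t = 9
c2_9, c2_10 = 0.02625, 0.04045      # assumed Greenwood ratios at t = 9 and t = 10

# With no censoring, the risk set at time 10 is r_0 * S_hat(9).
r10 = n_claims * s_hat_9

# Greenwood: c2(10) - c2(9) = s10 / (r10 * (r10 - s10)); solve for s10.
d = c2_10 - c2_9
s10 = d * r10**2 / (1 + d * r10)
print(round(s10, 3))
```

The result differs from an integer only because the given ratios are rounded to five decimal places.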

## Interpolation
*May 27, 2007*

*Posted by Peter in Exam 3/MLC.*

We use interpolation whenever we want to construct a continuous model from discrete data. Interpolation methods range from the rudimentary (linear interpolation) to the sophisticated (polynomial splines). Since splines are no longer on the 4/C syllabus, we’ll instead talk about forms of interpolation based on the relation

$$g\big( s(x+t) \big) = (1-t)\, g\big( s(x) \big) + t\, g\big( s(x+1) \big).$$

Here, s(x) represents the function to be interpolated, $g$ is an interpolation assumption, and *t* is a parameter on [0,1]. The simplest instance of the above is when $g$ is the identity function:

$$s(x+t) = (1-t)\, s(x) + t\, s(x+1).$$

This is called linear interpolation, and it is the basis of the uniform distribution of deaths (UDD) assumption in life contingencies, where s is the survival distribution and x is age. But we also use this relationship (albeit slightly modified) when interpolating values in, say, a normal distribution table:

$$\Phi(z + 0.01t) = (1-t)\, \Phi(z) + t\, \Phi(z + 0.01).$$

This assumes that adjacent entries in the table are listed in increments of 0.01. For instance, suppose we want to find Φ(1.263). 1.263 is between 1.26 and 1.27, so we have

$$\Phi(1.263) = \Phi\big( 1.26 + 0.01(0.3) \big) = (0.7)\, \Phi(1.26) + (0.3)\, \Phi(1.27),$$

and looking up the values in the table, we get Φ(1.263) = (0.7)(0.8962) + (0.3)(0.8980) = 0.89674.

In life contingencies, we are also sometimes interested in the constant force of mortality interpolation assumption; that is to say, deaths are not uniformly distributed at fractional ages, but rather, survival is exponentially distributed between integer ages. In this case, $g(u) = \ln u$ and the interpolation relation becomes

$$\ln s(x+t) = (1-t) \ln s(x) + t \ln s(x+1),$$

or equivalently,

$$s(x+t) = s(x)^{1-t}\, s(x+1)^{t}.$$

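To see that this indeed results in a constant force of mortality between integer ages, we differentiate the logarithm of the above with respect to t to obtain a force of mortality that is free of *t*:

$$\mu_{x+t} = -\frac{d}{dt} \ln s(x+t) = \ln s(x) - \ln s(x+1) = \ln \frac{s(x)}{s(x+1)} \ge 0,$$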

since s(x) ≥ s(x+1). Finally, there is the Balducci, or hyperbolic, interpolation assumption, where we set $g(u) = 1/u$:

$$\frac{1}{s(x+t)} = \frac{1-t}{s(x)} + \frac{t}{s(x+1)}.$$

This model is so called because the survival function at fractional ages forms an arc of a hyperbola. In all cases, we can use the resulting relation on the survival function to derive the other life table functions ${}_tp_x$, ${}_tq_x$, $\mu_{x+t}$, etc. But as we have seen, these three interpolation assumptions are not the only ones we can use, even in the very simple case of two-point interpolation.
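All three assumptions share the same structure: transform the endpoints, interpolate linearly, and transform back. A minimal sketch of that idea follows. The dictionary `ASSUMPTIONS`, its keys, and the function `interpolate` are hypothetical names introduced for illustration, and the endpoint values 1.0 and 0.5 in the second example are arbitrary.

```python
import math

# Each assumption is a pair (g, inverse of g).
ASSUMPTIONS = {
    "udd": (lambda u: u, lambda v: v),                   # linear
    "constant_force": (math.log, math.exp),              # exponential
    "balducci": (lambda u: 1.0 / u, lambda v: 1.0 / v),  # hyperbolic
}


def interpolate(s0, s1, t, assumption="udd"):
    """Value at fractional point t in [0, 1] between endpoint values s0 and s1."""
    g, g_inverse = ASSUMPTIONS[assumption]
    return g_inverse((1.0 - t) * g(s0) + t * g(s1))


# Normal table example from the text: Phi(1.263) by linear interpolation.
print(interpolate(0.8962, 0.8980, 0.3))

# The three assumptions applied to the same pair of survival values.
for name in ASSUMPTIONS:
    print(name, interpolate(1.0, 0.5, 0.5, name))
```

Any other strictly monotone function with a known inverse could be added to the dictionary to define a further two-point interpolation assumption.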

## Random Variables
*May 26, 2007*

*Posted by Peter in Exam 1/P.*

Suppose you have a standard, fair, 6-sided die. We are interested in the possible outcomes of rolling this die. Naturally, the outcomes of multiple rolls of the die are not predetermined or fixed, but rather are the result of a *random* process. And yet, “random” doesn’t mean that we don’t have any information about what the possible outcomes might be. We could ask any of the following questions of any given roll of the die:

- What is the numerical outcome?
- What is the square of the numerical outcome?
- How many other possible outcomes are less than the value rolled?
- What is the sum of the top and opposite faces?

Each of these questions corresponds to a distinct **random variable** on the probability space of the die roll. Loosely speaking, a random variable (or RV) is simply a function of the outcome of a random process.

Question 1. Suppose we roll a 5. What are the values of the random variables described in the above items 1-4?

Now suppose we have two fair 6-sided dice, and let *X* be the RV that denotes the sum of the rolled values. What is the probability that *X* > 8? Well, we have the possible outcomes {(3,6), (4,5), (4,6), (5,4), (5,5), (5,6), (6,3), (6,4), (6,5), (6,6)}, so there are 10 outcomes where *X* > 8. But there are 6(6) = 36 total outcomes, so the desired probability is Pr[*X* > 8] = 10/36 = 5/18. The idea behind this example is that we can construct an RV and compute an associated probability, because the RV has an associated *probability distribution* of possible values. For instance, we can compute

Pr[*X* = 2] = 1/36; Pr[*X* = 3] = 2/36; Pr[*X* = 4] = 3/36; ….

and in doing so, we have completely specified the probability distribution of *X*.
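The counting argument for two dice is small enough to enumerate directly. The following sketch treats the RV *X* literally as a function of the outcome (a pair of faces) and tallies its values over all 36 equally likely outcomes. The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# X maps each outcome (a, b) to the sum a + b; count how often each sum occurs.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Each of the 36 outcomes is equally likely.
pmf = {total: Fraction(count, 36) for total, count in counts.items()}

prob_gt_8 = sum(p for total, p in pmf.items() if total > 8)
print(prob_gt_8)                  # 5/18
print(pmf[2], pmf[3], pmf[4])     # fractions are shown in lowest terms
```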

Question 2. Complete the above list by computing Pr[*X* = *n*] for any real number *n*.