The moment calculation shortcut June 20, 2007

Posted by Peter in Exam 1/P, Exam 3/MLC, Exam 4/C.

Suppose X is a continuous random variable that takes on nonnegative values. Then we have the following definition:

\displaystyle {\rm E}[X] = \int_0^\infty x f_X(x) \, dx,

where f_X(x) is the probability density function of X. Indeed, in the general case, let g be a function on the support of X. Then

\displaystyle {\rm E}[g(X)] = \int_0^\infty g(x) f_X(x) \, dx.

So when g(x) = x^k for some positive integer k, we obtain the formula for the k-th raw moment of X. Let’s work through an example.

Show that the expected value (first raw moment) of a Pareto distribution with parameters α and θ is equal to θ/(α-1). Recall that the density of the Pareto distribution is \displaystyle f(x) = \frac{\alpha \theta^\alpha}{(x+\theta)^{\alpha+1}}.

Solution: We compute

\displaystyle {\rm E}[X] = \int_0^\infty \frac{x \alpha\theta^\alpha}{(x+\theta)^{\alpha+1}} \, dx = \alpha\theta^\alpha \!\! \int_0^\infty \frac{x}{(x+\theta)^{\alpha+1}} \, dx.

Then make the substitution u = x+\theta , du = dx , to obtain

\displaystyle {\rm E}[X] = \alpha\theta^\alpha \!\!\int_{u=\theta}^\infty \!\frac{u-\theta}{u^{\alpha+1}} \, du = \alpha\theta^\alpha \left( \int_\theta^\infty \!u^{-\alpha} \, du - \theta \!\int_\theta^\infty  \!u^{-\alpha-1} \, du \right) .

For \alpha > 1 , the integrals converge, giving

\displaystyle {\rm E}[X] = \alpha\theta^\alpha\! \left(\!-\frac{\theta^{-\alpha+1}}{-\alpha+1} + \theta \cdot \frac{\theta^{-\alpha}}{-\alpha}\right) = \alpha\theta \left(\frac{1}{\alpha-1} - \frac{1}{\alpha}\right) = \frac{\theta}{\alpha-1},

which proves the desired result. However, the computation is quite tedious, and there is often an easier approach. We will now show that instead of using the density of X, we can use the survival function of X when computing moments. Recall that

\displaystyle S_X(x) = \Pr[X > x] = \int_x^\infty f_X(t) \, dt = 1 - F_X(x);

that is, the survival function is the probability that X exceeds x, which is the integral of the density on the interval [x, \infty) , or the complement of the cumulative distribution function F_X(x). With this in mind, let’s try integration by parts on the definition of the expected value, with the choices u = g(x) , du = g'(x) \, dx ; dv = f_X(x) \, dx , v = \int f_X(x) \, dx = F_X(x) :

{\setlength\arraycolsep{2pt} \begin{array}{rcl} {\rm E}[g(X)] &=& \displaystyle \int_0^\infty \!\! g(x) f_X(x) \, dx = \bigg[g(x) F_X(x)\bigg]_{x=0}^\infty \! - \!\!\int_0^\infty \!\! g'(x) F_X(x) \, dx \\ &=& \displaystyle \bigg[g(x)\left(1-S(x)\right)\bigg]_{x=0}^\infty - \int_0^\infty \! g'(x)\left(1 - S(x)\right) \, dx \\ &=& \displaystyle\int_0^\infty g'(x) S(x) \, dx, \end{array}}

where the last equality holds because of two assumptions: first, that \displaystyle \lim_{x \rightarrow \infty} g(x) S(x) = 0 and g(0) = 0 (otherwise a constant term g(0) is left over); and second, that the resulting integral of g'(x) S(x) is convergent. Note that the individual terms of the integration by parts need not be convergent on their own, even though their combination is; thus, a fully rigorous proof requires a more formal treatment than what is furnished here.

A consequence of this result is that for positive integers k,

\displaystyle {\rm E}[X^k] = \int_0^\infty kx^{k-1} S(x) \, dx.

This formula is easier to work with in some instances, compared to the original definition. For instance, we know that the Pareto survival function is

\displaystyle S(x) = \left(\frac{\theta}{x+\theta}\right)^\alpha,

so we find

\displaystyle {\rm E}[X] = \int_0^\infty S(x) \, dx = \int_0^\infty \left(\frac{\theta}{x+\theta}\right)^\alpha \, dx,

which we can immediately see results in a simpler integrand. This result also proves the life contingencies relationship

\displaystyle \overset{\circ}{e}_x = \int_{t=0}^{\omega-x} \,_t p_x \, dt,

since the complete expectation of life is simply the expected value E[T(x)] of the future lifetime variable T(x), and \,_t p_x is the survival function of T(x). In life contingencies notation, the definition of expected value would then look like this:

\displaystyle \overset{\circ}{e}_x = \int_{t=0}^{\omega-x} t \,_t p_x \, \mu_x(t) \, dt,

which is usually more cumbersome than the formula using only the survival function.
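
As a quick numerical sanity check of the shortcut, here is a minimal Python sketch that computes Pareto moments both from the density and from the survival function. The parameter values α = 3 and θ = 2, and the helper `integrate_to_infinity` (a midpoint rule after the change of variable x = t/(1-t)), are illustrative assumptions and not part of the derivation above.

```python
# Illustrative parameters (assumed for this sketch); alpha > 2 so that
# both the first and second moments exist.
alpha, theta = 3.0, 2.0


def integrate_to_infinity(h, steps=100_000):
    """Midpoint rule for the integral of h over [0, inf).

    Uses the substitution x = t / (1 - t), dx = dt / (1 - t)^2,
    which maps [0, inf) onto [0, 1).
    """
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) / steps
        x = t / (1.0 - t)
        total += h(x) / (1.0 - t) ** 2
    return total / steps


def pareto_pdf(x):
    return alpha * theta**alpha / (x + theta) ** (alpha + 1)


def pareto_sf(x):
    return (theta / (x + theta)) ** alpha


# First moment: definition versus the survival-function shortcut (k = 1).
mean_from_density = integrate_to_infinity(lambda x: x * pareto_pdf(x))
mean_from_survival = integrate_to_infinity(pareto_sf)

# Second moment: definition versus the shortcut with k = 2.
second_from_density = integrate_to_infinity(lambda x: x**2 * pareto_pdf(x))
second_from_survival = integrate_to_infinity(lambda x: 2 * x * pareto_sf(x))

print(mean_from_density, mean_from_survival, theta / (alpha - 1))
print(second_from_density, second_from_survival)
```

Both estimates of the mean should agree with θ/(α-1) up to quadrature error, and the two second-moment estimates should agree with each other.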


Order Statistics of Exponential RVs June 5, 2007

Posted by Peter in Exam 1/P, Exam 3/MLC, Exam 4/C.

Here’s a question I read from the AoPS forum, and answered therein:

In analyzing the risk of a catastrophic event, an insurer uses the exponential distribution with mean \theta as the distribution of the time until the event occurs. The insured had n independent catastrophe policies of this type. Find the expected time until the insured will have the first catastrophe claim.

The sum S = X_{1}+X_{2}+\cdots+X_{n} of n independent and identically distributed exponential random variables X_{1}, X_{2}, \ldots, X_{n} is gamma distributed. Specifically, if X_{k} are exponential with mean \theta , then S is gamma with mean n\theta and variance n\theta^{2} .

It’s noteworthy that the sum S is a random variable that describes the time until the n-th claim if claims followed a Poisson process (whose interarrival times are exponentially distributed).

However, according to the model you specified, the events are not interarrival times, but rather they run concurrently. So the time until the k-th event is NOT gamma; rather, it is the k-th order statistic Y_{k} . Fortunately, the first such order statistic Y_{1} is exponential, which we show by recalling that Y_{k} has PDF

\displaystyle f_{Y_{k}}(x) = \frac{n!}{(k-1)!(n-k)!}F_{X}(x)^{k-1}(1-F_{X}(x))^{n-k}f_{X}(x).

If k = 1, we immediately obtain

\displaystyle f_{Y_{1}}(x) = n(e^{-x/\theta})^{n-1}\frac{1}{\theta}e^{-x/\theta} = \frac{n}{\theta}e^{-x(n/\theta)},

which is exponential with mean \theta/n . Note that this answer makes sense because as the number of policies n held by the insured increases, the expected waiting time until the first claim decreases. This would not be the case if one and only one policy at a time were in force at all times until the last policy claim; such a scenario would correspond to the gamma distribution previously mentioned.
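
The contrast between the two scenarios can be seen in a small simulation. The following is a rough sketch only; the values n = 5 and θ = 10, the trial count, the seed, and the function name `simulate` are all assumptions chosen for illustration.

```python
import random


def simulate(n_policies, theta, trials=100_000, seed=2007):
    """Estimate E[min] (concurrent policies) and E[sum] (one policy at a time)."""
    rng = random.Random(seed)
    total_first = 0.0
    total_sum = 0.0
    for _ in range(trials):
        # expovariate takes the rate, so mean theta corresponds to rate 1/theta.
        times = [rng.expovariate(1.0 / theta) for _ in range(n_policies)]
        total_first += min(times)  # first order statistic Y_1
        total_sum += sum(times)    # gamma-distributed sum S
    return total_first / trials, total_sum / trials


mean_first, mean_sum = simulate(n_policies=5, theta=10.0)
print(mean_first)  # should be close to theta / n = 2
print(mean_sum)    # should be close to n * theta = 50
```

The simulated mean of the minimum shrinks as n grows, while the mean of the sum grows linearly in n, matching the two formulas above.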

To check your understanding, here are some exercises:

  1. What is the PDF of the second order statistic Y_{2} , and what does it represent?
  2. Which of the Y_{k} belong to the gamma or exponential family of distributions?
  3. Prove that \displaystyle {\rm E}[Y_{k}] = \theta \sum_{j=1}^{k}\frac{1}{n-k+j}.
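
Before attempting the proof in exercise 3, it may help to confirm the claimed formula numerically. The sketch below integrates x times the order-statistic density given earlier and compares the result with the sum; it is a sanity check, not a proof. The values n = 4 and θ = 10, the truncation point 40θ, and the function names are assumptions for illustration.

```python
import math


def order_stat_pdf(x, k, n, theta):
    """Density of the k-th order statistic of n iid exponentials with mean theta."""
    cdf = 1.0 - math.exp(-x / theta)
    pdf = math.exp(-x / theta) / theta
    coeff = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    return coeff * cdf ** (k - 1) * (1.0 - cdf) ** (n - k) * pdf


def mean_by_quadrature(k, n, theta, steps=40_000):
    """Midpoint rule for the integral of x * f_{Y_k}(x) on [0, 40 * theta].

    The tail beyond 40 * theta is negligible for an exponential parent.
    """
    upper = 40.0 * theta
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x * order_stat_pdf(x, k, n, theta)
    return total * h


def mean_by_formula(k, n, theta):
    """The claimed closed form: theta * sum_{j=1}^{k} 1 / (n - k + j)."""
    return theta * sum(1.0 / (n - k + j) for j in range(1, k + 1))


n, theta = 4, 10.0
results = [
    (k, mean_by_quadrature(k, n, theta), mean_by_formula(k, n, theta))
    for k in range(1, n + 1)
]
for k, numeric, closed_form in results:
    print(k, numeric, closed_form)
```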

Question 21, Spring 2007 MLC May 29, 2007

Posted by Peter in Exam 3/MLC.

This question was received with a bit of controversy because of its wording.

21. You are given the following information about a new model for buildings with limiting age ω.

  1. The expected number of buildings surviving at age x will be l_x = (\omega - x)^\alpha , x < ω.
  2. The new model predicts a 33.3% higher complete life expectancy (over the previous DeMoivre model with the same ω) for buildings aged 30.
  3. The complete life expectancy for buildings aged 60 under the new model is 20 years.

Calculate the complete life expectancy under the previous DeMoivre model for buildings aged 70.

The problem with the way this question reads lies in the phrase “previous DeMoivre model” mentioned in item 2, and at the end of the question. Usually, one does not relegate essential information to an offhandedly casual and parenthetical remark. A properly posed question should read “…previous model, which is DeMoivre….” This eliminates the ambiguity of whether the word “previous” belongs to “DeMoivre” or to “model,” the latter being the intended meaning. This issue is further exacerbated by the fact that the new model is a modified/generalized DeMoivre, and that both models share the same ω. Together, this leads to a confusingly written question, because the author did not take care to make it absolutely clear (preferably in a separately listed item) that the old model was DeMoivre (or equivalently, α = 1). That said, the solution is as follows:

Solution: We first compute the complete life expectancy of a building aged (x) under the new model, noting that the old model has α = 1:

{\setlength\arraycolsep{2pt} \begin{array}{rcl}\displaystyle\overset{\circ}{e}_x(\alpha) &=& \displaystyle\int_0^{\omega - x} \!\!_t p_x \, dt = \int_0^{\omega - x} \frac{l_{x+t}}{l_x} \, dt = \int_0^{\omega - x} \!\left(1 - \frac{t}{\omega - x}\right)^{\!\alpha} dt \\ &=& \displaystyle\left[-\frac{\omega-x}{\alpha+1} \left(1 - \frac{t}{\omega - x}\right)^{\alpha+1}\right]_{t=0}^{\omega-x} = \frac{\omega-x}{\alpha+1}. \end{array}}

Then item (2) gives the condition \overset{\circ}{e}_{30}(\alpha) = \frac{4}{3} \overset{\circ}{e}_{30}(1) , from which we obtain

\displaystyle \frac{\omega-30}{\alpha+1} = \frac{4}{3} \cdot \frac{\omega - 30}{2},

and hence α = 1/2. Item (3) then gives the condition \displaystyle \overset{\circ}{e}_{60}(1/2) = \frac{\omega-60}{\frac{1}{2}+1} = 20, so ω = 90. Therefore,

\overset{\circ}{e}_{70}(1) = \frac{90-70}{2} = 10.
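
The arithmetic of the solution can be retraced in a few lines of Python. This is only a sketch under the same reading of the question (old model is DeMoivre, α = 1); the function names `e_complete` and `e_complete_numeric` are made up for illustration.

```python
from fractions import Fraction


def e_complete(x, omega, alpha):
    """Closed form (omega - x) / (alpha + 1) for l_x = (omega - x)^alpha."""
    return (omega - x) / (alpha + 1)


def e_complete_numeric(x, omega, alpha, steps=100_000):
    """Midpoint-rule check of the integral of (1 - t/(omega - x))^alpha dt."""
    width = omega - x
    h = width / steps
    return h * sum((1.0 - (i + 0.5) * h / width) ** alpha for i in range(steps))


# Item 2: (omega - 30)/(alpha + 1) = (4/3) * (omega - 30)/2, so alpha + 1 = 2 / (4/3).
ratio = Fraction(4, 3)
alpha = Fraction(2) / ratio - 1

# Item 3: (omega - 60)/(alpha + 1) = 20.
omega = 60 + 20 * (alpha + 1)

answer = e_complete(70, omega, 1)  # old (DeMoivre) model at age 70
check_60 = e_complete_numeric(60, float(omega), float(alpha))

print(alpha, omega, answer)  # 1/2 90 10
print(check_60)              # numerically close to 20
```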

Interpolation May 27, 2007

Posted by Peter in Exam 3/MLC.

We use interpolation whenever we want to construct a continuous model from discrete data. Interpolation methods range from the rudimentary (linear interpolation) to the sophisticated (polynomial splines). Since splines are no longer on the 4/C syllabus, we’ll instead talk about forms of interpolation based on the relation

\varphi(s(x+t)) = (1-t)\varphi(s(x)) + t\varphi(s(x+1)), \quad t \in [0,1].

Here, s(x) represents the function to be interpolated, \varphi is an interpolation assumption, and t is a parameter on [0,1]. The simplest instance of the above is when \varphi is the identity function:

s(x+t) = (1-t)s(x) + t s(x+1).

This is called linear interpolation, and it is the basis of the uniform distribution of deaths (UDD) assumption in life contingencies, where s is the survival distribution and x is age. But we also use this relationship (albeit slightly modified) when interpolating values in, say, a normal distribution table:

\Phi\left(x+\frac{t}{100}\right) = (1-t)\Phi(x) + t\Phi\left(x+\frac{1}{100}\right).

This assumes that adjacent entries in the table are listed in increments of 0.01. For instance, suppose we want to find Φ(1.263). 1.263 is between 1.26 and 1.27, so we have

\Phi(1.263) \approx (1-0.3)\Phi(1.26) + 0.3 \Phi(1.27),

and looking up the values in the table, we get Φ(1.263) ≈ (0.7)(0.8962) + (0.3)(0.8980) = 0.89674.
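
To see how close the table interpolation comes to the true value, here is a short sketch that repeats the calculation and compares it with the normal CDF computed from the error function. The helper names are assumptions; the two table entries are the four-decimal values quoted above.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interpolate_table(lower_value, upper_value, t):
    """Linear interpolation between two adjacent table entries, t in [0, 1]."""
    return (1.0 - t) * lower_value + t * upper_value


phi_126, phi_127 = 0.8962, 0.8980  # table entries for 1.26 and 1.27
t = 0.3                            # 1.263 lies 30% of the way from 1.26 to 1.27

approx = interpolate_table(phi_126, phi_127, t)
exact = std_normal_cdf(1.263)

print(approx)               # about 0.89674
print(abs(approx - exact))  # interpolation plus table-rounding error
```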

In life contingencies, we are also sometimes interested in the constant force of mortality interpolation assumption; that is to say, deaths are not uniformly distributed at fractional ages, but rather, survival is exponentially distributed between integer ages. In this case, \varphi(s) = \log s and the interpolation relation becomes

\log s(x+t) = (1-t) \log s(x) + t \log s(x+1)

or equivalently,

s(x+t) = s(x)^{1-t} s(x+1)^t.

To see that this indeed results in a constant force of mortality between integer ages, we differentiate the above with respect to t to obtain

\displaystyle \mu_x(t) = -\frac{d}{dt}\left[\log s(x+t)\right] = \log s(x) - \log s(x+1) = \log \frac{s(x)}{s(x+1)} \ge 0,

since s(x) ≥ s(x+1). Finally, there is the Balducci, or hyperbolic, interpolation assumption, where we set \varphi(s) = 1/s :

\displaystyle \frac{1}{s(x+t)} = \frac{1-t}{s(x)} + \frac{t}{s(x+1)}.

This model is so called because the survival function at fractional ages forms an arc of a hyperbola. In all cases, we can use the resulting relation on the survival function to derive the other life table functions l_{x+t}, \,_t p_x, \,_t q_x, \mu_x(t) , etc. But as we have seen, these three interpolation assumptions are not the only ones we can use, even in the very simple case of two-point interpolation.
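
Since all three assumptions are instances of the same relation with a different \varphi, they can be written as one function. The following is a minimal sketch; the survival values s(x) = 0.80 and s(x+1) = 0.60 are hypothetical numbers chosen only to make the comparison visible, and the names `interpolate` and `ASSUMPTIONS` are illustrative.

```python
import math


def interpolate(s0, s1, t, phi, phi_inverse):
    """Solve phi(s(x+t)) = (1 - t) * phi(s(x)) + t * phi(s(x+1)) for s(x+t)."""
    return phi_inverse((1.0 - t) * phi(s0) + t * phi(s1))


# Each entry is (phi, inverse of phi).
ASSUMPTIONS = {
    "UDD (linear)": (lambda s: s, lambda y: y),
    "constant force": (math.log, math.exp),
    "Balducci": (lambda s: 1.0 / s, lambda y: 1.0 / y),
}

s0, s1 = 0.80, 0.60  # hypothetical s(x) and s(x+1)

midpoint = {
    name: interpolate(s0, s1, 0.5, phi, inv)
    for name, (phi, inv) in ASSUMPTIONS.items()
}
for name, value in midpoint.items():
    print(name, value)

# Under constant force, -d/dt log s(x+t) equals log(s0 / s1) for every t.
force = math.log(s0 / s1)
print(force)
```

At t = 1/2 the three assumptions give the arithmetic, geometric, and harmonic means of s(x) and s(x+1), so UDD yields the largest interpolated survival value and Balducci the smallest.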