THE THEORY OF THE INTERNALLY STUDENTIZED RANGE DISTRIBUTION REVISITED

This paper revisits the distribution of the ratio of the range to the sample standard deviation, known as the internally studentized range, in the normal case. The importance of this distribution is recognized in several areas, such as quality control and inference, for testing lack of homogeneity of the data or kurtosis. An alternative to the approximation presented by David et al. (1954), based on the distribution of the maximum, is proposed. We exhibit a detailed proof of the distribution of the internally studentized range in the normal case with sample size 3. We also provide a new result: the distribution in the uniform case with sample size 3.


Introduction
Consider two independent random samples, both of size n, from a normal distribution. Let X_(1) be the minimum, X_(n) the maximum, and W = X_(n) − X_(1) the range of the first sample, and let S be the standard deviation of the second sample. Then S and W are independent, and the distribution of the quantity U = W/S is very well known, usually called the externally studentized range distribution; it is very important in statistical inference. This situation occurs naturally in experimental statistics. Note that assuming two samples is only a device to guarantee independence (RENCHER; SCHAALJE, 2008). The externally studentized range distribution is also intensively applied in multiple comparison problems, such as Tukey's, Duncan's and the Student-Newman-Keuls tests (HINKELMANN; KEMPTHORNE, 2007, p. 224). When S and W are computed from the same sample, they are no longer independent and have a complicated joint distribution. The exact distribution of the internally studentized range U = W/S is not known (DAVID; HARTLEY; PEARSON, 1954; CURRIE, 1980) and is the focus of this work.
David, Hartley and Pearson (1954) argue that approximations of this distribution have been effective both in exploring homogeneity of data or departure from normality and in testing for kurtosis. They studied it using two approximations. First, they calculated the first four moments of U = W/S and approximated the true distribution by Pearson curves. Second, they related the distribution of U = W/S to the Student t distribution and used the latter to approximate upper-tail quantiles of the distribution of U = W/S. After the work of David, Hartley and Pearson (1954), very little was done to improve the knowledge of this distribution. Thomson (1955) exhibits, without proof, the distribution in the normal case for samples of size 3. Currie (1980), considering a normal parent distribution and using geometric arguments, obtained the distribution for samples of size 4.
This work is outlined as follows: in section 2 we review the theory of the internally studentized range distribution and exhibit detailed proofs of all the important original mathematical statistics results used in David, Hartley and Pearson (1954)'s article. We also provide a new approximation to the distribution of the internally studentized range for normal random samples in subsection 2.2 and a Monte Carlo approach in subsection 2.3. In section 3 our approximations are compared with the one presented by David, Hartley and Pearson (1954). In section 4 we exhibit a detailed proof of the distribution of the internally studentized range, originally obtained by Thomson (1955), in the normal case with samples of size 3. We also provide a new result: the distribution, in the uniform case, for samples of size 3. Finally, in section 5, we show an application of the internally studentized range: we present a simple new normality test and evaluate its performance.

The theory of the internally studentized range distribution
Let X_1, X_2, ..., X_n be a random sample of size n from a normal distribution with mean µ and variance σ². Let X_(1), X_(2), ..., X_(n) be the order statistics for this sample, where X_(j) is the j-th smallest order statistic. Thus X_(1) is the minimum and X_(n) is the maximum order statistic. The sample range is defined as the difference W = X_(n) − X_(1).
Proposition 2.1. (X_(n) − X̄)/S, (X_(1) − X̄)/S and W/S = (X_(n) − X_(1))/S are independent of S, where all these statistics are functions of a random sample X_1, X_2, ..., X_n from a normal distribution with mean µ and variance σ².
Proof. We present here a proof based on Basu's theorem, which states that any ancillary statistic is independent of any complete sufficient statistic.
The quantity (X_i − X̄)/S, i = 1, 2, ..., n, is invariant under linear transformations αX + β, α > 0, and therefore is an ancillary statistic. As (X_(i) − X̄)/S depends only on the (X_i − X̄)/S, it is also ancillary. By Basu's theorem (CASELLA; BERGER, 2002, p. 287) it is independent of the complete sufficient statistics X̄ and S. An alternative proof is presented in Appendix A.
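This invariance can be illustrated numerically. Below is a minimal Python sketch (the function name is ours, not from the paper): the vector of studentized deviations (x_i − x̄)/s is unchanged under any transformation αx + β with α > 0.

```python
import math

def studentized_deviations(x):
    """Return the ancillary vector ((x_i - xbar)/s, i = 1, ..., n)."""
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    return [(xi - xbar) / s for xi in x]

x = [0.4, -1.3, 2.2, 0.9]
y = [2.5 * xi + 7.0 for xi in x]      # alpha = 2.5 > 0, beta = 7.0
assert all(abs(a - b) < 1e-12
           for a, b in zip(studentized_deviations(x), studentized_deviations(y)))
```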
As the moments of W and S are known, this result allows an easy calculation of the moments of U = W/S: since W = US with U independent of S, E(W^r) = E(U^r)E(S^r) and hence E(U^r) = E(W^r)/E(S^r), r = 1, 2, .... Consider now the statistic U = (X_j − X_k)/S, where (j, k) is any pair of elements in {1, 2, ..., n} with j ≠ k. The random variable U is related to Student's t distribution. To see this we need a very tricky identity for sums of squares.

Proposition 2.2. For any sample x_1, x_2, ..., x_n and any pair (j, k) with j ≠ k,

Σ_{i=1}^n (x_i − x̄)² = (x_j − x_k)²/2 + Σ_{i≠j,k} (x_i − ξ̄)² + [2(n − 2)/n][(x_j + x_k)/2 − ξ̄]²,

where ξ̄ is the mean of the n − 2 observations other than x_j and x_k.
Proof. An algebraic proof, which the authors could not find anywhere, is left to Appendix A. We present here a new geometric proof. We suppose, without loss of generality, that j = 1 and k = 2.
In Figure 1 and in what follows we use the notation: x = (x_1, x_2, ..., x_n)′ is the sample vector; x̄ is the vector with all n coordinates equal to the sample mean x̄; η is the vector obtained from x by replacing x_1 and x_2 by their mean m = (x_1 + x_2)/2; and ξ is the vector whose first two coordinates equal m and whose last n − 2 coordinates equal ξ̄, the mean of x_3, ..., x_n. This gives the decomposition x − x̄ = (x − η) + (η − ξ) + (ξ − x̄). Now, if we prove the orthogonality of x − η and η − x̄, we may write ‖x − x̄‖² = ‖x − η‖² + ‖η − x̄‖²; if we prove the orthogonality of η − ξ and ξ − x̄, we may also write ‖η − x̄‖² = ‖η − ξ‖² + ‖ξ − x̄‖². Putting all together we may, finally, write:

‖x − x̄‖² = ‖x − η‖² + ‖η − ξ‖² + ‖ξ − x̄‖².    (**)

To show the orthogonality of x − η and η − x̄ we observe that x − η = ((x_1 − x_2)/2, (x_2 − x_1)/2, 0, ..., 0)′, whose nonzero coordinates sum to zero, while the first two coordinates of η − x̄ are both equal to m − x̄; hence (x − η)′(η − x̄) = 0 and the vectors x − η and η − x̄ are orthogonal. In a similar way one shows that (η − ξ)′(ξ − x̄) = 0 and, so, η − ξ and ξ − x̄ are also orthogonal. Therefore ‖x − η‖² = (x_1 − x_2)²/2, ‖η − ξ‖² = Σ_{i=3}^n (x_i − ξ̄)² and ‖ξ − x̄‖² = 2(m − x̄)² + (n − 2)(ξ̄ − x̄)².
Observe that the expression ‖ξ − x̄‖² = 2(m − x̄)² + (n − 2)(ξ̄ − x̄)² simplifies: since n x̄ = 2m + (n − 2)ξ̄, we have m − x̄ = (n − 2)(m − ξ̄)/n and ξ̄ − x̄ = 2(ξ̄ − m)/n, so

‖ξ − x̄‖² = [2(n − 2)/n](m − ξ̄)²;

then, the result follows:

Σ_{i=1}^n (x_i − x̄)² = (x_1 − x_2)²/2 + Σ_{i=3}^n (x_i − ξ̄)² + [2(n − 2)/n][(x_1 + x_2)/2 − ξ̄]².

2.1 An approximation for the upper tail of the internally studentized range distribution

Taking (j, k) instead of (1, 2) and considering random variables, the last result may be written as

Σ_{i=1}^n (X_i − X̄)² = (X_j − X_k)²/2 + Σ_{i≠j,k} (X_i − ξ̄)² + [2(n − 2)/n][(X_j + X_k)/2 − ξ̄]².

We show below that the rank of the left-side quadratic form is n − 1 and the ranks of the right-side quadratic forms are, respectively, 1, n − 3 and 1. These quadratic forms are all invariant under the translation X_i → X_i − µ, and dividing each of them by σ² is equivalent to dividing each X_i by σ. So the variables in the quadratic forms may be taken as standard normal. Under this setting we have:

Proposition 2.3. U = (X_j − X_k)/S = T √(2(n − 1)/(T² + n − 2)),    (1)
where T has a Student's t distribution with n − 2 degrees of freedom.

Proof. Write T = (X_j − X_k)/√(2Q/(n − 2)), where Q = Σ_{i≠j,k} (X_i − ξ̄)² + [2(n − 2)/n][(X_j + X_k)/2 − ξ̄]² is the sum of the last two quadratic forms of the decomposition. To derive the distribution of T we show that Q has a chi-square distribution with n − 2 degrees of freedom. For this we need the classical Fisher-Cochran theorem (RAO, 2002, p. 185): consider n independent standard normal variables X_i ∼ N(0, 1) and let Q_1, Q_2, ..., Q_k be quadratic forms in X = (X_1, X_2, ..., X_n)′, with ranks n_1, n_2, ..., n_k, respectively, such that X′X = Q_1 + Q_2 + ... + Q_k. Then a necessary and sufficient condition for Q_1, Q_2, ..., Q_k to be independently distributed as chi-square variables with n_1, n_2, ..., n_k degrees of freedom is that n_1 + n_2 + ... + n_k = n. In our decomposition the quadratic forms can be expressed as X′PX = X′P_1X + X′P_2X + X′P_3X, where P, P_1, P_2 and P_3 are symmetric and idempotent matrices (P = I − J/n, with J the n × n unit matrix, all of whose entries are one) and their ranks are their traces, that is, rank(P) = tr(P) = n − 1, rank(P_1) = tr(P_1) = 1, rank(P_2) = tr(P_2) = n − 3 and rank(P_3) = tr(P_3) = 1. Since 1 + (n − 3) + 1 = n − 1, the three quadratic forms on the right side are independent chi-square variables and, in particular, Q = X′P_2X + X′P_3X has a chi-square distribution with n − 2 degrees of freedom. Moreover, Q depends on X_j and X_k only through the sum X_j + X_k and, therefore, we have to prove only that X_j − X_k and X_j + X_k are independently distributed. Since both variables are normal, as linear combinations of normal variables, it suffices to show that their covariance is zero. That is,

Cov(X_j − X_k, X_j + X_k) = Var(X_j) − Var(X_k) = σ² − σ² = 0.

So Q and X_j − X_k are independently distributed and, therefore, T ∼ t_{n−2}. Finally, writing (n − 1)S² = (X_j − X_k)²/2 + Q and eliminating Q between U = (X_j − X_k)/S and T = (X_j − X_k)/√(2Q/(n − 2)) gives U² = 2(n − 1)T²/(T² + n − 2), which is (1).

If we now consider a particular sample x = (x_1, x_2, ..., x_n)′, there are n(n − 1) different choices of j and k, each pair defining a value of U. These n(n − 1) values may be arranged in descending order and denoted u_[1] ≥ u_[2] ≥ ... ≥ u_[n(n−1)], where u_[1] = (x_(n) − x_(1))/s is the internally studentized range of that particular sample. This construction defines the random variables U_[1], U_[2], ..., U_[n(n−1)]. The random variable U may be written as a mixture of the variables U_[i], i = 1, 2, ..., n(n − 1): as j and k are arbitrary in the definition of U, we choose uniformly a number i in {1, 2, ..., n(n − 1)} and take the variable U_[i]. Formally, let Z = (Z_1, Z_2, ..., Z_{n(n−1)})′ be a multivariate random variable (ANDERSON, 2003) assuming values, uniformly, in the sample space Ω = {ω_1, ω_2, ..., ω_{n(n−1)}}, with ω_i = (0, 0, ..., 1, ..., 0)′ a vector n(n − 1) × 1 that has 1 in the i-th position and zeroes elsewhere. So it is easy to see that the distribution of U may be expressed as the mixture (FLORESCU; TUDOR, 2014)

f_U(u) = [1/(n(n − 1))] Σ_{i=1}^{n(n−1)} f_{U_[i]}(u).

We know f_U(·) and we are interested in the distribution f_{U_[1]}(·). The idea, then, is: if there is no overlap between f_{U_[1]}(·) and f_{U_[2]}(·), the upper quantiles of U_[1] can be approximated by the upper quantiles of U. For this we have the following proposition.

Proposition 2.4. f_{U_[2]}(u) = 0 for u > √(3(n − 1)/2).

Proof. A detailed proof may be found in the appendix of the original article (DAVID; HARTLEY; PEARSON, 1954). For the sake of completeness, we outline it here. The key feature of the statistic U = (X_j − X_k)/S is its invariance under transformations of the form αX + β; that is, it does not depend on origin or scale. So we may assume, without loss of generality, that any sample x = (x_1, x_2, ..., x_n), arranged in ascending order of magnitude, has x_1 = 0 and x_{n−1} = 1. Then the maximum of u_[2] = (x_{n−1} − x_1)/s = 1/s can be found by minimizing s, restricted to samples with x_1 = 0 and x_{n−1} = 1, in three steps. First consider all samples with a given x_n. Then the possible values for x̄ are in the interval [(1 + x_n)/n, ∞), and x̄ = (1 + x_n)/n corresponds to the configuration with x_2 = ... = x_{n−2} = 0. Now, in the set of all samples with a given x̄, clearly the minimum of s is attained with the configuration x_2 = ... = x_{n−2} = x̄, so we may consider only samples of this form. Among them, it is clear that the minimum of s corresponds to x_n = 1 and, in this case, the minimum is attained at the configuration x_1 = 0, x_2 = ... = x_{n−2} = x̄ = 2/3, x_{n−1} = x_n = 1, for which s² = 2/[3(n − 1)]. So f_{U_[2]}(u) = 0 for u > 1/s = √(3(n − 1)/2) and, as the function g(t) = t√(2(n − 1)/(t² + n − 2)) of relation (1) is strictly increasing, the 100p% upper quantile of U_[1] is approximately the function g(·) applied to the upper quantile t_{p/[n(n−1)]; n−2} of a Student t distribution with n − 2 degrees of freedom. For quantiles near the quantity √(3(n − 1)/2),

u_p ≈ g(t_{p/[n(n−1)]; n−2})    (2)

is still a good approximation, where t_{p/[n(n−1)]; n−2} is the 100p/[n(n − 1)]% upper quantile of the Student's t distribution with n − 2 degrees of freedom. This expression is valid only for computing upper quantiles.

Rev. Bras. Biom., Lavras, v.36, n.4, p.802-826, 2018 - doi: 10.28951/rbb.v36i4.308
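The link between u and t used above can be checked numerically. The following Python sketch (function names are ours) computes, for an arbitrary sample and pair (j, k), the statistic u = (x_j − x_k)/s directly and through t = (x_j − x_k)/√(2Q/(n − 2)), with Q = (n − 1)s² − (x_j − x_k)²/2, and confirms that u = t√(2(n − 1)/(t² + n − 2)).

```python
import math

def g(t, n):
    """Map the t statistic to u = (x_j - x_k)/s."""
    return t * math.sqrt(2 * (n - 1) / (t * t + n - 2))

def check_relation(x, j, k):
    """Return u computed directly and via the t statistic, for the pair (j, k)."""
    n = len(x)
    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)          # (n - 1) s^2
    s = math.sqrt(ss / (n - 1))
    d = x[j] - x[k]
    u = d / s                                       # direct computation
    q = ss - d * d / 2                              # Q = (n - 1)s^2 - d^2/2
    t = d / math.sqrt(2 * q / (n - 2))
    return u, g(t, n)

u, u_via_t = check_relation([1.2, -0.7, 0.3, 2.1, -1.5], j=3, k=1)
assert abs(u - u_via_t) < 1e-12
```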
The distribution of U has upper bound √(2(n − 1)) and lower bound 2√((n − 1)/n), in case n is even, and 2√(n/(n + 1)), in case n is odd. Note the overlap of the U_[1] and U_[2] distributions in the interval above. Thomson (1955) justifies these limits only by exhibiting the corresponding sampling configurations. As the authors could not find a proof for this anywhere, it is given here.

The upper bound: Consider the sample x_1 ≤ x_2 ≤ ... ≤ x_n, with x_1 = 0 and x_n = 1. Then w = 1 and the upper bound for U = W/S depends only on S. In this case,

s² = [Σ_{i=1}^n x_i² − n x̄²]/(n − 1) and ∂s²/∂x_i = 2(x_i − x̄)/(n − 1), for i = 2, ..., n − 1,

which vanishes at x_i = x̄. Also, ∂²s²/∂x_i² = 2(1 − 1/n)/(n − 1) and ∂²s²/∂x_i∂x_k = −2/[n(n − 1)], i ≠ k, so the Hessian matrix, restricted to the coordinates i = 2, ..., n − 1, is [2/(n − 1)](I − J/n); its determinant, [2/(n − 1)]^{n−2}(2/n), is positive and the matrix is positive definite, indicating that s² has a minimum value at x_i = x̄. Therefore the configuration that maximizes U = W/S is x_i = x̄, i = 2, ..., n − 1. In this case, x̄ = 1/2 and s² = 1/[2(n − 1)] and, so, U = W/S ≤ √(2(n − 1)).
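The extremal configuration can be verified numerically. A minimal Python sketch (the helper name is ours): the sample 0, 1/2, ..., 1/2, 1 attains u = √(2(n − 1)), the claimed upper bound.

```python
import math

def internally_studentized_range(x):
    """Compute u = w/s for a sample x."""
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    return (max(x) - min(x)) / s

for n in range(3, 12):
    x = [0.0] + [0.5] * (n - 2) + [1.0]   # extremal configuration
    u = internally_studentized_range(x)
    assert abs(u - math.sqrt(2 * (n - 1))) < 1e-12
```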
For another proof of the upper bound, consider (1): since u = t√(2(n − 1)/(t² + n − 2)) is increasing in t and tends to √(2(n − 1)) as t → ∞, we get U ≤ √(2(n − 1)).

The lower bound: As s² has only one null derivative in the interior of the hypercube 0 ≤ x_2 ≤ ... ≤ x_{n−1} ≤ 1, any other extremes will be at vertices, that is, at configurations with k observations equal to 0 and n − k equal to 1, 1 ≤ k ≤ n − 1. In this case x̄ = (n − k)/n, (n − 1)s² = k(n − k)/n and u = 1/s = √(n(n − 1)/[k(n − k)]). Supposing k continuous and differentiating, the minimum of u corresponds to the maximum of the product k(n − k), attained at k = n/2. In case n is even, say n = 2α with α an integer, k = α is attainable and u = 2√((n − 1)/n). In case n is odd, the integers closest to n/2 are k = (n ± 1)/2, both giving k(n − k) = (n² − 1)/4 and u = 2√(n/(n + 1)).
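The vertex configurations for the lower bound can be checked in the same way (Python sketch, helper name ours): with k observations at 0 and n − k at 1, u = √(n(n − 1)/[k(n − k)]), and minimizing over k reproduces the bounds 2√((n − 1)/n) for even n and 2√(n/(n + 1)) for odd n.

```python
import math

def u_vertex(n, k):
    """u = w/s for the vertex configuration: k zeros and n - k ones."""
    xbar = (n - k) / n
    s2 = (k * xbar ** 2 + (n - k) * (1.0 - xbar) ** 2) / (n - 1)
    return 1.0 / math.sqrt(s2)          # w = 1 for these configurations

for n in range(2, 20):
    u_min = min(u_vertex(n, k) for k in range(1, n))
    if n % 2 == 0:
        bound = 2 * math.sqrt((n - 1) / n)
    else:
        bound = 2 * math.sqrt(n / (n + 1))
    assert abs(u_min - bound) < 1e-12
```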

A new approximation to the upper tail based on the maximum of U
Our approach to the distribution of U = W/S is based on the distribution of the maximum of the U_[i] and should be used only to obtain upper quantiles. Since we have n(n − 1) statistics, the distribution of the maximum is

F_{U_[1]}(u) = [F_U(u)]^{n(n−1)}.

Following the proof of Proposition 2.4, we consider that the different U's are identically distributed and nearly independent, since their distributions have a small overlap.
If we use the relationship given by (1) we get

F_{U_[1]}(u) = [F_T(g⁻¹(u))]^{n(n−1)},    (3)

where F_T(t) is the cumulative distribution function of the Student's t with n − 2 degrees of freedom and g(t) = t√(2(n − 1)/(t² + n − 2)). Upper quantiles of the distribution of U can be obtained by inverting (3) for a given probability 1 − α. Thus, considering p = (1 − α)^{1/[n(n−1)]}, we can obtain the quantile u_{1−α} by

u_{1−α} = g(t_{p; n−2}) = t_{p; n−2} √(2(n − 1)/(t²_{p; n−2} + n − 2)),    (4)

where t_{p; n−2} is the 100p% quantile of the Student's t distribution with n − 2 degrees of freedom.
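The quantile approximation above is straightforward to implement. A Python sketch is given below (the paper's own code, in R, is in Appendix B; here we assume availability of scipy.stats.t for the Student t quantile, and the function name is ours):

```python
import math
from scipy.stats import t as student_t

def u_upper_quantile(n, alpha):
    """Approximate upper 100*alpha% point of U = W/S from the t quantile."""
    p = (1.0 - alpha) ** (1.0 / (n * (n - 1)))   # p = (1 - alpha)^(1/[n(n-1)])
    tq = student_t.ppf(p, df=n - 2)              # 100p% Student t quantile
    return tq * math.sqrt(2 * (n - 1) / (tq ** 2 + n - 2))

# The approximation never exceeds the exact upper bound sqrt(2(n - 1)):
for n in (5, 10, 20):
    assert u_upper_quantile(n, 0.05) < math.sqrt(2 * (n - 1))
```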

Approximation based on Monte Carlo simulations
The above approximations are valid only for computing upper quantiles of the internally studentized range. The Pearson curves used to approximate the true distribution must be obtained for each sample size, and they are not available in David, Hartley and Pearson (1954). Also, only some combinations of lower percentile points and sample sizes are available, with other lower points obtained by interpolation. To overcome these limitations we propose here an approach based on Monte Carlo simulation. For this, a random sample X_1, X_2, ..., X_n of size n from the standard normal distribution is generated by the Box-Muller algorithm. Then we construct a realization of U = W/S, denoted by u. This process is repeated to obtain a Monte Carlo sample of size N from the distribution of U. The sample size N should be chosen to achieve some previously fixed precision; the error bound of the Monte Carlo simulation is proportional to 1/√N. Let u_1, u_2, ..., u_N be this sample. To compute the cumulative distribution function at a given value ξ, we compute

p = (1/N) Σ_{i=1}^N I(u_i ≤ ξ),    (5)

where I(u_i ≤ ξ) is the indicator function that returns 1 if u_i ≤ ξ and 0 otherwise. Given a cumulative probability p, we arrange the elements in ascending order and pick the ⌊Np⌋-th element of the resulting vector: since Np is not necessarily an integer, we define j = ⌊Np⌋, the largest integer less than or equal to Np. Considering, in this context, that u_(1), u_(2), ..., u_(N) represents the internally studentized range Monte Carlo sample of size N arranged in ascending order, the quantile ξ_p is computed by

ξ_p = u_(j), where j = ⌊Np⌋.    (6)

The R (R Core Team, 2018) codes to generate random samples of the internally studentized range distribution and to compute the cumulative distribution function (5) and quantiles (6) are available in Appendix B.
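Appendix B contains the R implementation; a minimal Python sketch of the same Monte Carlo scheme (function names are ours; for brevity we use the standard library's random.gauss in place of an explicit Box-Muller step) is:

```python
import math
import random

def simulate_u(n, N, seed=2018):
    """Sorted Monte Carlo sample of size N from the distribution of U = W/S."""
    rng = random.Random(seed)
    sample = []
    for _ in range(N):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(x) / n
        s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
        sample.append((max(x) - min(x)) / s)
    return sorted(sample)

def cdf_hat(u_sorted, xi):
    """Empirical cdf (5): proportion of simulated values <= xi."""
    return sum(ui <= xi for ui in u_sorted) / len(u_sorted)

def quantile_hat(u_sorted, p):
    """Empirical quantile (6): the floor(N p)-th order statistic."""
    j = max(1, math.floor(len(u_sorted) * p))
    return u_sorted[j - 1]

u = simulate_u(n=5, N=20000)
# All simulated values respect the exact bounds for n = 5 (odd n):
assert 2 * math.sqrt(5 / 6) <= u[0] and u[-1] <= math.sqrt(8)
```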
Illustrative examples are presented below. In the next section, we compare our Monte Carlo approach with the other two approaches for chosen sample sizes and percentiles. Besides that, we show some lower-tail quantiles of the distribution of the internally studentized range.

Comparison of approximations
In a first stage we obtain quantiles from both proposed approximations to compare their accuracy with that of the approach of David, Hartley and Pearson (1954). Quantiles from the approximations (2) and (4) are compared for different combinations of n and cumulative probability 1 − α. Values of n = 3(1)20, 20(5)50, 50(10)100, 500, 1,000, 10,000 and 1 − α = 0.900, 0.950, 0.990 and 0.995 are considered. Results are shown in Table 1. There is very good agreement between the two approximations, even for small sample sizes. They should be used only for determining upper tails, since they are based on quantities that are not independently distributed, and the lower quantiles are very inaccurate.
The upper and lower quantiles from the above approximations are compared to those from Monte Carlo simulations, considering some combinations of 1 − α and n (Table 2). Lower quantiles are computed for 1 − α = 0.005, 0.01, 0.05 and 0.10. The results for the upper quantiles should be compared to those shown in Table 1. There is very accurate agreement among all approximations, especially for small sample sizes (n ≤ 20). For large n, however, the Monte Carlo simulations show larger differences in the upper quantiles among the three methods. One could consider that the differences are due to Monte Carlo errors; the following example shows otherwise. With n = 100, using N = 100,000 Monte Carlo simulations, the 95% upper quantile is 5.909 (Table 2). If we use N = 3,000,000 Monte Carlo simulations, the 95% upper quantile is 5.903, which is close neither to the David, Hartley and Pearson (1954) approximation, 5.990, nor to our approximation, 5.983.
For very large samples, we expect the range and the standard deviation to be asymptotically independently distributed. Therefore, we can use the externally studentized range distribution to compute approximate quantiles. For n = 10,000, using this approach, we find that the 95% upper quantile is 8.481. This value is

Exact distributions
The exact distribution of the internally studentized range is known only for sample sizes n = 2, 3 and 4. For n = 2, with any parent distribution, U is degenerate: writing d = x_2 − x_1, we have w = |d| and s² = d²/2, so U = W/S = √2 identically.
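This degenerate n = 2 case is easy to confirm numerically (Python sketch, helper name ours):

```python
import math

def u_stat(x):
    """u = w/s for a sample x."""
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    return (max(x) - min(x)) / s

# For n = 2 the statistic equals sqrt(2), whatever the two (distinct) values:
for pair in ([0.0, 1.0], [-3.2, 5.7], [10.0, 10.001]):
    assert abs(u_stat(pair) - math.sqrt(2)) < 1e-9
```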
For n = 3, Thomson (1955), based on Lieblein (1952), presents, without proof, the expression of the distribution for the normal case. As the authors could not find that proof anywhere, it is presented here for the normal case. For the uniform distribution, a new result was obtained and is shown in the sequel.

Conclusions
The mathematical statistics behind the theory of the internally studentized range is complex and full of original ideas. For sample size 3, the problem is related to the distribution of the closest pair in a sample of three observations. The geometric approach has proved helpful in clarifying some aspects of the theory.
The approximation using the distribution of the maximum of the studentized sample pair differences was very accurate even for small sample sizes. It should be used only for determining upper tails.

A An algebraic proof for the decomposition of the sum of squares
We want to prove the identity

Σ_{i=1}^n (x_i − x̄)² = (x_j − x_k)²/2 + Σ_{i≠j,k} (x_i − ξ̄)² + [2(n − 2)/n][(x_j + x_k)/2 − ξ̄]²,

where ξ̄ is the mean of the n − 2 observations other than x_j and x_k. Without loss of generality, we can consider (x_j, x_k) = (x_{n−1}, x_n). Write m = (x_{n−1} + x_n)/2, so that n x̄ = (n − 2)ξ̄ + 2m. Splitting the left side,

Σ_{i=1}^n (x_i − x̄)² = Σ_{i=1}^{n−2} (x_i − ξ̄)² + (n − 2)(ξ̄ − x̄)² + (x_{n−1} − x̄)² + (x_n − x̄)².    (7)

Since (x_{n−1} − x̄)² + (x_n − x̄)² = (x_{n−1} − x_n)²/2 + 2(m − x̄)², while ξ̄ − x̄ = 2(ξ̄ − m)/n and m − x̄ = (n − 2)(m − ξ̄)/n give (n − 2)(ξ̄ − x̄)² + 2(m − x̄)² = [2(n − 2)/n](m − ξ̄)², the right side of (7) equals

Σ_{i=1}^{n−2} (x_i − ξ̄)² + (x_{n−1} − x_n)²/2 + [2(n − 2)/n][(x_{n−1} + x_n)/2 − ξ̄]².    (8)

Since (7) = (8), the proof is complete.
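The identity can also be verified numerically for arbitrary samples. A minimal Python sketch (function name ours):

```python
import math

def sum_of_squares_identity(x, j, k):
    """Return (lhs, rhs) of the sum-of-squares decomposition for the pair (j, k)."""
    n = len(x)
    xbar = sum(x) / n
    lhs = sum((xi - xbar) ** 2 for xi in x)
    rest = [x[i] for i in range(n) if i not in (j, k)]
    xibar = sum(rest) / (n - 2)                  # mean of the other n - 2 values
    m = (x[j] + x[k]) / 2
    rhs = ((x[j] - x[k]) ** 2 / 2
           + sum((r - xibar) ** 2 for r in rest)
           + 2 * (n - 2) / n * (m - xibar) ** 2)
    return lhs, rhs

lhs, rhs = sum_of_squares_identity([1.0, 4.0, -2.0, 0.5, 3.3], j=1, k=4)
assert abs(lhs - rhs) < 1e-12
```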

Figure 1 - Geometric construction of the partition of x − x̄.

Figure 1(A) shows the vectors x, x̄ and the difference x − x̄. Figure 1(B) shows the decomposition of x − x̄ as the sum of the vectors x − η and η − x̄. Figure 1(C) decomposes the vector η − x̄ as the sum of η − ξ and ξ − x̄, which allows us to write x − x̄ = (x − η) + (η − ξ) + (ξ − x̄).

Table 1 -
Percentage points of the distribution of the ratio of the range to the standard deviation, U = W/S, in samples of size n from a normal population, using the approximations of David, Hartley and Pearson (1954) and of the maximum of the studentized sample pair differences

Table 2 -
Percentage points of the distribution of the ratio of the range to the standard deviation, U = W/S, in samples of size n from a normal population, using the Monte Carlo approximation with N = 100,000 (* N = 3,000,000)

Table 3 -
Type I error rates computed from 5,000 Monte Carlo simulations of the internally studentized range distribution test (ISRD) and the Shapiro-Wilk test (SW) for normality, considering several values of n and α

Table 4 -
Power computed from 5,000 Monte Carlo simulations of the internally studentized range distribution test (ISRD) and the Shapiro-Wilk test (SW) for normality, considering several alternative distributions, n and α