This class provides methods to compute several types of EDF goodness-of-fit test statistics and to apply certain transformations to a set of observations. This includes the probability integral transformation \(U_i = F(X_i)\), as well as the power ratio and iterated spacings transformations [226]. Here, \(U_{(0)},…,U_{(n-1)}\) stand for the \(n\) observations \(U_0,…,U_{n-1}\) sorted in increasing order, where \(0\le U_i\le1\).

Note: This class uses the Colt library.

Nested class OutcomeCategoriesChi2 helps manage the partition of possible outcomes into categories for applying chi-square tests.
◆ andersonDarling() [1/3]

static double andersonDarling (DoubleArrayList sortedData)

Computes and returns the Anderson-Darling statistic \(A_n^2\) (see method #andersonDarling(double[]) ).
Computes and returns the Anderson-Darling statistic \(A_n^2\) (see [165], [225], [6] ), defined by
\begin{align*} A_n^2 & = -n -\frac{1}{ n} \sum_{j=0}^{n-1} \left\{ (2j+1)\ln(U_{(j)}) + (2n-1-2j) \ln(1-U_{(j)}) \right\}, \tag{Andar} \end{align*}
assuming that sortedData contains \(U_{(0)},…,U_{(n-1)}\) sorted in increasing order. When computing \(A_n^2\), all observations \(U_i\) are projected on the interval \([\epsilon, 1-\epsilon]\) for some \(\epsilon> 0\), in order to avoid numerical overflow when taking the logarithm of \(U_i\) or \(1-U_i\). The variable EPSILONAD gives the value of \(\epsilon\).
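The computation of ( Andar ) can be sketched as follows, using a plain double[] in place of Colt's DoubleArrayList; the class name and the EPS constant are illustrative stand-ins, not the library's API:

```java
// Sketch of the Anderson-Darling statistic (eq. Andar); `u` must already
// be sorted in increasing order. EPS plays the role of EPSILONAD.
class AndersonDarling {
    static final double EPS = 1.0e-15; // assumed clamping threshold

    static double statistic(double[] u) {
        int n = u.length;
        double sum = 0.0;
        for (int j = 0; j < n; j++) {
            // project each observation on [EPS, 1 - EPS] to avoid log of 0
            double v = Math.min(Math.max(u[j], EPS), 1.0 - EPS);
            sum += (2 * j + 1) * Math.log(v)
                 + (2 * n - 1 - 2 * j) * Math.log(1.0 - v);
        }
        return -n - sum / n;
    }
}
```

For a single observation \(U_{(0)} = 0.5\), the formula reduces to \(-1 - 2\ln(0.5) = \ln 4 - 1\).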
Computes the Anderson-Darling statistic \(A_n^2\) and the corresponding \(p\)-value \(p\).
The \(n\) (unsorted) observations in data are assumed to be independent and to come from the continuous distribution dist. Returns the 2-element array [ \(A_n^2\), \(p\)].
Computes and returns the chi-square statistic for the observations \(o_i\) in count[smin...smax], for which the corresponding expected values \(e_i\) are in nbExp[smin...smax].
Assuming that \(i\) goes from 1 to \(k\), where \(k =\) smax-smin+1
is the number of categories, the chi-square statistic is defined as
\[ X^2 = \sum_{i=1}^k \frac{(o_i - e_i)^2}{e_i}. \tag{chi-square} \]
Under the hypothesis that the \(e_i\) are the correct expectations and if these \(e_i\) are large enough, \(X^2\) follows approximately the chi-square distribution with \(k-1\) degrees of freedom. If some of the \(e_i\) are too small, one can use OutcomeCategoriesChi2
to regroup categories.
Parameters
    smin  index of the first valid data in count and nbExp
    smax  index of the last valid data in count and nbExp
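Formula ( chi-square ) can be sketched directly over the categories smin..smax; the class name below is a hypothetical stand-in for the library method:

```java
// Sketch of the chi-square statistic: count holds the observed o_i and
// nbExp the expected e_i, for categories smin..smax inclusive.
class Chi2 {
    static double chi2(double[] nbExp, int[] count, int smin, int smax) {
        double x2 = 0.0;
        for (int s = smin; s <= smax; s++) {
            double diff = count[s] - nbExp[s];
            x2 += diff * diff / nbExp[s];
        }
        return x2;
    }
}
```

For example, with observed counts {12, 8} and expected counts {10, 10}, the statistic is \(4/10 + 4/10 = 0.8\), to be compared with the chi-square distribution with 1 degree of freedom.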
Computes and returns the chi-square statistic for the observations \(o_i\) in count, for which the corresponding expected values \(e_i\) are in cat.
This assumes that cat.regroupCategories has been called beforehand to regroup categories, ensuring that the expected numbers in each category are large enough for the chi-square test.
Computes and returns the chi-square statistic for the observations stored in data, assuming that these observations follow the discrete distribution dist.
For dist, we assume that there is one set \(S=\{a, a+1,…, b-1, b\}\), where \(a<b\) and \(a\ge0\), for which \(p(s)>0\) if \(s\in S\) and \(p(s)=0\) otherwise.
Generally, it is not possible to divide the integers in intervals satisfying \(nP(a_0\le s< a_1)=nP(a_1\le s< a_2)=\cdots=nP(a_{j-1}\le s< a_j)\) for a discrete distribution, where \(n\) is the sample size, i.e., the number of observations stored into data
. To perform a general chi-square test, the method starts from smin
and finds the first non-negligible probability \(p(s)\ge\epsilon\), where \(\epsilon=\) DiscreteDistributionInt.EPSILON. It uses smax
to allocate an array storing the number of expected observations ( \(np(s)\)) for each \(s\ge\) smin
. Starting from \(s=\) smin
, the \(np(s)\) terms are computed and the allocated array grows if required until a negligible probability term is found. This gives the number of expected elements for each category, where an outcome category corresponds here to an interval in which sample observations could lie. The categories are regrouped to have at least minExp
observations per category. The method then counts the number of samples in each category and calls #chi2(double[],int[],int,int) to get the chi-square test statistic. If numCat is not null, the number of categories after regrouping is returned in numCat[0]. The number of degrees of freedom is equal to numCat[0]-1. We usually choose minExp = 10.
Similar to #chi2(double[],int[],int,int), except that the expected number of observations per category is assumed to be the same for all categories, and equal to nbExp.
Parameters
    smin  index of the first valid data in count and nbExp
    smax  index of the last valid data in count and nbExp
Computes the chi-square statistic for a continuous distribution.
Here, the equiprobable case can be used. Assuming that data contains observations coming from the uniform distribution, the \([0,1]\) interval is divided into \(1/p\) subintervals, where \(p=\) minExp \(/n\), \(n\) being the sample size, i.e., the number of observations stored in data. For each subinterval, the method counts the number of contained observations and the chi-square statistic is computed using #chi2Equal(double,int[],int,int). We usually choose minExp = 10.
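The equiprobable binning just described can be sketched as follows, under the assumption that all observations lie in \([0,1]\); the class name is illustrative:

```java
// Sketch of the equiprobable chi-square test: [0,1] is split into
// k = floor(n / minExp) equal subintervals, each with expected count n / k.
class Chi2Equal {
    static double chi2Equal(double[] data, double minExp) {
        int n = data.length;
        int k = (int) (n / minExp);   // number of equal-probability categories
        int[] count = new int[k];
        for (double u : data) {
            int c = (int) (u * k);    // subinterval containing u
            if (c >= k) c = k - 1;    // guard for u == 1.0
            count[c]++;
        }
        double e = (double) n / k;    // expected count per category
        double x2 = 0.0;
        for (int c = 0; c < k; c++) {
            double d = count[c] - e;
            x2 += d * d / e;
        }
        return x2;
    }
}
```

With 20 observations and minExp = 10, this produces 2 categories of expected size 10 each.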
Equivalent to chi2Equal (data, 10).
Computes and returns the Cramér-von Mises statistic \(W_n^2\) (see [55], [224], [225] ), defined by
\[ W_n^2 = \frac{1}{ 12n} + \sum_{j=0}^{n-1} \left(U_{(j)} - \frac{(j+0.5) }{ n}\right)^2, \tag{CraMis} \]
assuming that sortedData contains \(U_{(0)},…,U_{(n-1)}\) sorted in increasing order.
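Formula ( CraMis ) translates directly into code; a minimal sketch with a plain sorted double[] (the class name is a hypothetical stand-in):

```java
// Sketch of the Cramér-von Mises statistic (eq. CraMis);
// u must be sorted in increasing order and lie in [0,1].
class CramerVonMises {
    static double statistic(double[] u) {
        int n = u.length;
        double w = 1.0 / (12.0 * n);
        for (int j = 0; j < n; j++) {
            double d = u[j] - (j + 0.5) / n;
            w += d * d; // squared deviation from the midpoint (j+0.5)/n
        }
        return w;
    }
}
```

For a single observation at 0.5, the sum vanishes and the statistic equals \(1/12\).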
Assumes that the real-valued observations \(U_0,…,U_{n-1}\) contained in sortedData are already sorted in increasing order and computes the differences between the successive observations.
Let \(D\) be the differences returned in spacings. The difference \(U_i - U_{i-1}\) is put in \(D_i\) for n1 < i <= n2, whereas \(U_{n1} - a\) is put into \(D_{n1}\) and \(b - U_{n2}\) is put into \(D_{n2+1}\). The number of observations must be greater than or equal to n2, we must have n1 < n2, and n1 and n2 must be greater than 0. The size of spacings will be at least \(n+1\) after the call returns.
Parameters
    sortedData  the sorted observations
    n1  starting index, in sortedData, of the processed observations
    n2  ending index, in sortedData, of the processed observations
    a  minimum value of the observations
    b  maximum value of the observations
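The spacings computation above can be sketched as follows, returning the differences in a fresh array rather than a Colt list; the class name is illustrative:

```java
// Sketch of the spacings computation: D[n1] = U[n1] - a,
// D[i] = U[i] - U[i-1] for n1 < i <= n2, and D[n2+1] = b - U[n2].
class Spacings {
    static double[] diff(double[] sortedData, int n1, int n2, double a, double b) {
        double[] d = new double[sortedData.length + 1];
        d[n1] = sortedData[n1] - a;
        for (int i = n1 + 1; i <= n2; i++)
            d[i] = sortedData[i] - sortedData[i - 1];
        d[n2 + 1] = b - sortedData[n2];
        return d;
    }
}
```

When a = 0 and b = 1, the resulting spacings sum to 1.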
Same as method diff(IntArrayList,IntArrayList,int,int,int,int), but for the continuous case.
Parameters
    sortedData  the sorted observations
    n1  starting index, in sortedData, of the processed observations
    n2  ending index, in sortedData, of the processed observations
    a  minimum value of the observations
    b  maximum value of the observations
Applies one iteration of the iterated spacings transformation [112], [226] .
Let \(U\) be the \(n\) observations contained in data, and let \(S\) be the spacings contained in spacings. Assumes that \(S[0..n]\) contains the spacings between \(n\) real numbers \(U_0,…,U_{n-1}\) in the interval \([0,1]\). These spacings are defined by
\[ S_i = U_{(i)} - U_{(i-1)}, \qquad1\le i < n, \]
where \(U_{(0)}=0\), \(U_{(n-1)}=1\), and \(U_{(0)},…,U_{(n-1)}\), are the \(U_i\) sorted in increasing order. These spacings may have been obtained by calling diff(DoubleArrayList,DoubleArrayList,int,int,double,double). This method transforms the spacings into new spacings, by a variant of the method described in section 11 of [177] and also by Stephens [226] : it sorts \(S_0,…,S_n\) to obtain \(S_{(0)} \le S_{(1)} \le S_{(2)} \le\cdots\le S_{(n)}\), computes the weighted differences
\begin{align*} S_0 & = (n+1) S_{(0)}, \\ S_1 & = n (S_{(1)}-S_{(0)}), \\ S_2 & = (n-1) (S_{(2)}-S_{(1)}), \\ & \vdots \\ S_n & = S_{(n)}-S_{(n-1)}, \end{align*}
and computes \(V_i = S_0 + S_1 + \cdots+ S_i\) for \(0\le i < n\). It then returns \(S_0,…,S_n\) in S[0..n] and \(V_1,…,V_n\) in V[1..n].
Under the assumption that the \(U_i\) are i.i.d. \(U (0,1)\), the new \(S_i\) can be considered as a new set of spacings having the same distribution as the original spacings, and the \(V_i\) are a new sample of i.i.d. \(U (0,1)\) random variables, sorted by increasing order.
This transformation is useful to detect clustering in a data set: A pair of observations that are close to each other is transformed into an observation close to zero. A data set with unusually clustered observations is thus transformed to a data set with an accumulation of observations near zero, which is easily detected by the Anderson-Darling GOF test.
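One iteration of this transformation can be sketched as follows; the array s of length \(n+1\) is transformed in place and the partial sums \(V_i\) are returned (the class name is illustrative, not the library's API):

```java
import java.util.Arrays;

// Sketch of one iterated-spacings step: sort the n+1 spacings, form the
// weighted differences S_i = (n+1-i)(S_(i) - S_(i-1)) with S_(-1) = 0,
// then accumulate the partial sums V_i.
class IterateSpacings {
    static double[] iterate(double[] s) {
        int n = s.length - 1;
        Arrays.sort(s);                        // S_(0) <= ... <= S_(n)
        double prev = 0.0;
        for (int i = 0; i <= n; i++) {
            double cur = s[i];
            s[i] = (n + 1 - i) * (cur - prev); // weighted differences
            prev = cur;
        }
        double[] v = new double[n + 1];
        v[0] = s[0];
        for (int i = 1; i <= n; i++) v[i] = v[i - 1] + s[i];
        return v;
    }
}
```

Since the original spacings sum to 1, the transformed spacings also sum to 1, so the last partial sum is always 1.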
Computes the Kolmogorov-Smirnov (KS) test statistics \(D_n^+\), \(D_n^-\), and \(D_n\) (see method kolmogorovSmirnov(DoubleArrayList) ).
Returns the array [ \(D_n^+\), \(D_n^-\), \(D_n\)].
Computes the Kolmogorov-Smirnov (KS) test statistics \(D_n^+\), \(D_n^-\), and \(D_n\) defined by
\begin{align} D_n^+ & = \max_{0\le j\le n-1} \left((j+1)/n - U_{(j)}\right), \tag{DNp} \\ D_n^- & = \max_{0\le j\le n-1} \left(U_{(j)} - j/n\right), \tag{DNm} \\ D_n & = \max (D_n^+, D_n^-). \tag{DN} \end{align}
and returns an array of length 3 that contains [ \(D_n^+\), \(D_n^-\), \(D_n\)]. These statistics compare the empirical distribution of \(U_{(0)},…,U_{(n-1)}\), which are assumed to be in sortedData, with the uniform distribution over \([0,1]\).
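Equations ( DNp ), ( DNm ), and ( DN ) can be sketched directly; a minimal version with a plain sorted double[] (the class name is a hypothetical stand-in):

```java
// Sketch of the KS statistics for sorted u in [0,1]:
// returns the array [Dn+, Dn-, Dn].
class KolmogorovSmirnov {
    static double[] statistics(double[] u) {
        int n = u.length;
        double dp = Double.NEGATIVE_INFINITY, dm = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < n; j++) {
            dp = Math.max(dp, (j + 1.0) / n - u[j]); // eq. DNp
            dm = Math.max(dm, u[j] - (double) j / n); // eq. DNm
        }
        return new double[]{dp, dm, Math.max(dp, dm)}; // eq. DN
    }
}
```

For the sorted sample {0.1, 0.9}, both one-sided statistics equal 0.4.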
Computes the Kolmogorov-Smirnov (KS) test statistics and their \(p\)-values.
This is to compare the empirical distribution of the (unsorted) observations in data with the theoretical distribution dist. The KS statistics \(D_n^+\), \(D_n^-\) and \(D_n\) are returned in sval[0], sval[1], and sval[2] respectively, and their corresponding \(p\)-values are returned in pval[0], pval[1], and pval[2].
Computes the KS statistics \(D_n^+(a)\) and \(D_n^-(a)\) defined in the description of the method FDist.kolmogorovSmirnovPlusJumpOne, assuming that \(F\) is the uniform distribution over \([0,1]\) and that \(U_{(1)},…,U_{(n)}\) are in sortedData. Returns the array [ \(D_n^+\), \(D_n^-\)].
Computes a variant of the \(p\)-value \(p\) whenever a test statistic has a discrete probability distribution.
This \(p\)-value is defined as follows:
\begin{align*} p_L & = P[Y \le y], \\ p_R & = P[Y \ge y], \\ p & = \left\{ \begin{array}{ll} p_R, & \mbox{if } p_R < p_L, \\ 1 - p_L, & \mbox{if } p_R \ge p_L \mbox{ and } p_L < 0.5, \\ 0.5, & \mbox{otherwise.} \end{array} \right. \end{align*}
The function takes \(p_L\) and \(p_R\) as input and returns \(p\).
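The piecewise definition reduces to a few comparisons; a minimal sketch (the class name is illustrative):

```java
// Sketch of the discrete p-value rule: pL = P[Y <= y], pR = P[Y >= y].
class PDisc {
    static double pDisc(double pL, double pR) {
        if (pR < pL) return pR;        // right tail is the smaller one
        if (pL < 0.5) return 1.0 - pL; // here pR >= pL and pL < 0.5
        return 0.5;                    // both tails are large
    }
}
```

For instance, pDisc(0.9, 0.2) returns 0.2, pDisc(0.3, 0.8) returns 0.7, and pDisc(0.6, 0.7) returns 0.5.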
Applies the power ratios transformation \(W\) described in section 8.4 of Stephens [226] .
Let \(U\) be the \(n\) observations contained in sortedData. Assumes that \(U\) contains \(n\) real numbers \(U_{(0)},…,U_{(n-1)}\) from the interval \([0,1]\), already sorted in increasing order, and computes the transformations:
\[ U'_i = (U_{(i)} / U_{(i+1)})^{i+1}, \qquad i=0,…,n-1, \]
with \(U_{(n)} = 1\). These \(U'_i\) are sorted in increasing order and put back in U[1...n]. If the \(U_{(i)}\) are i.i.d. \(U (0,1)\) sorted by increasing order, then the \(U'_i\) are also i.i.d. \(U (0,1)\).
This transformation is useful to detect clustering, as explained in iterateSpacings(DoubleArrayList,DoubleArrayList), except that here a pair of observations close to each other is transformed into an observation close to 1. An accumulation of observations near 1 is also easily detected by the Anderson-Darling GOF test.
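The power ratios transformation can be sketched as follows, returning the sorted transformed values in a new array; the class name is illustrative, not the library's API:

```java
import java.util.Arrays;

// Sketch of the power ratios transformation: U'_i = (U_(i)/U_(i+1))^(i+1)
// with U_(n) = 1; u must already be sorted in increasing order.
class PowerRatios {
    static double[] transform(double[] u) {
        int n = u.length;
        double[] r = new double[n];
        for (int i = 0; i < n; i++) {
            double next = (i + 1 < n) ? u[i + 1] : 1.0; // U_(n) = 1
            r[i] = Math.pow(u[i] / next, i + 1);
        }
        Arrays.sort(r); // returned sorted in increasing order
        return r;
    }
}
```

For the sorted sample {0.25, 0.5}, the ratios are \((0.25/0.5)^1 = 0.5\) and \((0.5/1)^2 = 0.25\), returned as {0.25, 0.5}.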
Computes and returns the scan statistic \(S_n (d)\), defined in ( scan ).
Let \(U\) be the \(n\) observations contained in sortedData. The \(n\) observations in \(U[0..n-1]\) must be real numbers in the interval \([0,1]\), sorted in increasing order. (See FBar.scan for the distribution function of \(S_n (d)\).)
Applies the probability integral transformation \(U_i = F (V_i)\) for \(i = 0, 1, …, n-1\), where \(F\) is a continuous distribution function, and returns the result as an array of length \(n\).
\(V\) represents the \(n\) observations contained in data, and \(U\), the returned transformed observations. If data contains random variables from the distribution function dist, then the result will contain uniform random variables over \([0,1]\).
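The probability integral transformation is a one-line loop over the cdf; in this sketch a small functional interface stands in for the library's continuous distribution object (both names are illustrative):

```java
// Sketch of the probability integral transformation U_i = F(V_i).
class UnifTransform {
    interface Cdf { double cdf(double x); } // stand-in for a distribution object

    static double[] unifTransform(double[] data, Cdf dist) {
        double[] u = new double[data.length];
        for (int i = 0; i < data.length; i++)
            u[i] = dist.cdf(data[i]); // U_i = F(V_i)
        return u;
    }
}
```

For example, with the exponential cdf \(F(x) = 1 - e^{-x}\), the observation \(\ln 2\) maps to 0.5.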
Applies the transformation \(U_i = F (V_i)\) for \(i = 0, 1, …, n-1\), where \(F\) is a discrete distribution function, and returns the result as an array of length \(n\).
\(V\) represents the \(n\) observations contained in data, and \(U\), the returned transformed observations.

Note: If \(V\) are the values of random variables with distribution function dist, then the result will contain the values of discrete random variables distributed over the set of values taken by dist, not uniform random variables over \([0,1]\).
Computes and returns the Watson statistic \(G_n\) (see [238], [41] ), defined by
\begin{align} G_n & = \sqrt{n} \max_{0\le j \le n-1} \left\{ (j+1)/n - U_{(j)} + \overline{U}_n - 1/2 \right\} \tag{WatsonG} \\ & = \sqrt{n}\left(D_n^+ + \overline{U}_n - 1/2\right), \nonumber \end{align}
where \(\overline{U}_n\) is the average of the observations \(U_{(j)}\), assuming that sortedData contains the sorted \(U_{(0)},…,U_{(n-1)}\).
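Using the second form of ( WatsonG ), the statistic is \(D_n^+\) shifted by the sample mean; a minimal sketch (class name illustrative):

```java
// Sketch of the Watson G statistic: Gn = sqrt(n) * (Dn+ + mean(U) - 1/2),
// for u sorted in increasing order in [0,1].
class WatsonG {
    static double statistic(double[] u) {
        int n = u.length;
        double dp = Double.NEGATIVE_INFINITY, mean = 0.0;
        for (int j = 0; j < n; j++) {
            dp = Math.max(dp, (j + 1.0) / n - u[j]); // Dn+
            mean += u[j];
        }
        mean /= n;
        return Math.sqrt(n) * (dp + mean - 0.5);
    }
}
```

For the single observation 0.5, \(D_1^+ = 0.5\) and \(\overline{U}_1 = 0.5\), giving \(G_1 = 0.5\).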
Computes and returns the Watson statistic \(U_n^2\) (see [55], [224], [225] ), defined by
\begin{align} W_n^2 & = \frac{1}{ 12n} + \sum_{j=0}^{n-1} \left\{U_{(j)} - \frac{(j + 0.5)}{ n} \right\}^2, \\ U_n^2 & = W_n^2 - n\left(\overline{U}_n - 1/2\right)^2. \tag{WatsonU} \end{align}
where \(\overline{U}_n\) is the average of the observations \(U_{(j)}\), assuming that sortedData contains the sorted \(U_{(0)},…,U_{(n-1)}\).
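Formula ( WatsonU ) combines the Cramér-von Mises sum with a correction for the sample mean; a minimal sketch (class name illustrative):

```java
// Sketch of the Watson U statistic: Un^2 = Wn^2 - n*(mean(U) - 1/2)^2,
// for u sorted in increasing order in [0,1].
class WatsonU {
    static double statistic(double[] u) {
        int n = u.length;
        double w = 1.0 / (12.0 * n), mean = 0.0;
        for (int j = 0; j < n; j++) {
            double d = u[j] - (j + 0.5) / n;
            w += d * d;   // Cramér-von Mises sum
            mean += u[j];
        }
        mean /= n;
        return w - n * (mean - 0.5) * (mean - 0.5);
    }
}
```

The mean correction makes \(U_n^2\) invariant to shifting the sample: for a single observation, the statistic is always \(1/12\) regardless of its value.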