Signpost Testing to Navigate the Parameter Space of the Gaussian Graphical Model With High-Dimensional Data.
Ruan K, van de Wiel MA, van Wieringen WN (2026). Signpost Testing to Navigate the Parameter Space of the Gaussian Graphical Model With High-Dimensional Data. Biometrical Journal, 68(1), e70115. https://doi.org/10.1002/bimj.70115
PMID: 41674311
Abstract
We evaluate the relevance of external quantitative information on the parameter of a Gaussian graphical model from high-dimensional data. This information comes in the form of a parameter value available from a related knowledge domain or population. We contrast the external information to 'null' information, i.e., an internally accepted parameter value. The direction from a null to this externally provided parameter value is dubbed the signpost. The signpost test evaluates whether to follow the signpost in the search of the true parameter value. We present various test statistics to measure the informativeness of the signpost and ways to obtain their distribution under the null hypothesis of non-informativeness. By simulation, we investigate the power and other properties of the various signpost tests, and compare them to the likelihood ratio test. Finally, we employ the signpost test to illustrate how the learning of the Gaussian graphical model of a low-prevalence breast cancer subtype benefits from external knowledge obtained from data of a more prevalent and related but fundamentally different subtype.
1 Introduction
Cellular regulatory pathways are often described stochastically by graphical models (Dobra et al. 2004). These models portray the pathway as a network. In this network, the nodes represent the entities comprising the pathway, and the edges connecting the nodes correspond to the molecular interactions among the nodes. The network topology is encoded in the graphical model's parameters. The model parameters and, subsequently, the network's topology are typically learned from scratch, often from high‐dimensional data. Apart from the challenges that this high‐dimensionality brings about, we need not necessarily start from scratch. Knowledge of the pathway's workings is often present from a related disease, context, or model organism. In this work, we present a test to assess whether that knowledge is relevant to the context at hand.
We consider data Y from the Gaussian graphical model, i.e., Y∼N(0p,Ω−1). The model parameter Ω is the precision matrix from the space of p×p‐dimensional, symmetric positive‐definite matrices Sp++. The support of the precision matrix encodes the conditional (in)dependencies among the variates of this model. If (Ω)j,j′=0 with j,j′∈{1,…,p} and j≠j′, the variate pair Yj and Yj′ is independent, conditional on all other variates. Simultaneously, a nonzero off‐diagonal element of Ω indicates a conditional dependency between the corresponding variate pair. For more on graphical models, see Whittaker (1990) and Lauritzen (1996).
The learning of a Gaussian graphical model from high‐dimensional data is geared towards the reconstruction of the conditional (in)dependencies. This poses a challenging estimation and multiple testing problem. These challenges may be overcome by regularization (e.g., Friedman et al. 2008) and multiple testing procedures (e.g., Benjamini and Hochberg 1995). However, the network topologies inferred by these methods are unstable (Husmeier 2003; van Wieringen and Chen 2021). This is countered by procedures that assess the inferred solution's stability (Bodinier et al. 2021; Liu et al. 2010; Meinshausen and Bühlmann 2010; Müller et al. 2016).
Notwithstanding the sophistication of the aforementioned methodology for learning a Gaussian graphical model, it may be too much to ask to infer such a detail as a particular conditional independency from high‐dimensional data. High‐dimensional data are generated by high‐throughput techniques designed for automated screening purposes. These techniques comprise many parallel gauges, usually of lesser quality than the standard measurement, e.g., microarrays/sequencing versus the polymerase chain reaction (PCR). Such preliminary gauges suffice to arrive at a first impression of the complex phenomenon under study. In line with the screening nature of the high‐throughput techniques, we do not aim to infer anything regarding a particular element of the precision matrix. Instead, we present a test that facilitates a more ‘global’ but still constructive type of inference regarding the Gaussian graphical model.
Our test assumes knowledge of the phenomenon under study to be available from related domains and evaluates this knowledge's informativeness for the current setting. The test thus assesses whether we can borrow information or should start from a clean sheet. For instance, if we aim to reconstruct a pathway's regulatory network in a rare cancer subtype, knowledge from different, more prevalent, and well‐studied subtypes of the same tissue may benefit our endeavor. We make this more precise and assume that two proposals, denoted T0,Ta∈Sp++, for the precision matrix Ω are available from related domains. The proposal T0 represents the current belief in the precision matrix. This proposal may represent the absence of knowledge, e.g., through a diagonal matrix that harbors no conditional dependencies. The proposal Ta is a competing one, e.g., derived from a different and more prevalent cancer subtype. This is illustrated in the application (Section 6), where we take Ta to be the precision matrix estimate of a different subtype's gene–gene interaction network obtained from publicly available transcriptomic data. We dub the direction from T0 to Ta the signpost. Our test, accordingly called the signpost test, evaluates whether, for the current setting, the signpost points in the direction of the precision matrix's true value.
The paper is structured as follows. We first introduce the signpost test statistic, which is defined implicitly but can be evaluated numerically. We show that, under conditions, the signpost test statistic may be conceived as the step size on the line in the direction of the alternative precision matrix proposal. We describe how to generate the signpost test statistic's null distribution by a parametric bootstrap procedure. Alternatively, if bootstrapping is computationally demanding, we propose an approximate signpost test for which, under the null hypothesis, an asymptotic distribution is available. We investigate by simulation the type I error and power of the proposed (approximate) signpost test, under a variety of proposals, sample sizes, and dimensions. Moreover, we compare the signpost test with the likelihood ratio test. Finally, we consider the reconstruction of a pathway's gene–gene interactions in estrogen negative breast cancer samples. In this context, we decide by means of the signpost test on the informativeness of the estimated precision matrix from the more prevalent estrogen positive breast cancer samples. We substantiate the test's significance by showing (i) the empirical validity of the signpost test statistic as a metric of informativeness, (ii) a loss improvement in the direction of the estrogen positive target confirming the signpost test's significance, and (iii) how the signpost test's result may be exploited for network reconstruction.
2 Test Statistic
Suppose we have n independently and identically distributed draws Y1,…,Yn∈Rp from the Gaussian graphical model N(0p,Ω−1) available. Let T0 and Ta be the null and alternative proposals for the precision matrix. We aim to test whether the data support the signpost, i.e., the direction from T0 to Ta, as pointing toward the true value of Ω. We assume that Ω=(1−γ)T0+γTa with γ∈[0,1], a weighted average of two precision matrices. Rewritten as Ω=T0+γ(Ta−T0), it can be viewed as a point on the line segment that connects the precision matrices T0 and Ta. In this view, γ quantifies how far Ω is from T0 in the direction of the signpost. The signpost test evaluates the null hypothesis H0:γ=0 against the alternative Ha:γ>0. Rejection of H0 does not imply that proposal Ta is to be preferred over T0, but rather that a better value of the precision matrix can be found in the direction of Ta. Similarly, failure to reject does not imply that the precision matrix equals T0, only that the data do not support the direction of Ta in the search for a better parameter value.
Our test statistic development relies on the bi‐targeted ridge precision matrix estimator (van Wieringen et al. 2020). This estimator maximizes the loglikelihood augmented with a ridge penalty centered at the weighted average of two different locations, called targets or proposals and denoted T0 and Ta, of the parameter space Sp++. Formally, we define this estimator as

Ω^(λ,θ) = argmaxΩ∈Sp++ { ln|Ω| − tr(SΩ) − ½λ‖Ω − (1−θ)T0 − θTa‖F2 },

with sample covariance matrix S=n−1∑i=1nYiYi⊤, penalty parameter λ>0, and weight parameter θ∈[0,1]. The estimator has an analytic expression,

Ω^(λ,θ) = {½(S−λT) + [λIp + ¼(S−λT)2]1/2}−1 with T = (1−θ)T0 + θTa;

see van Wieringen and Peeters (2016) and van Wieringen et al. (2020) for its derivation. Both λ and θ are tuning parameters, for instance chosen through cross‐validation. Simulations in van Wieringen et al. (2020), which take either of the two targets to equal the true Ω, indicate that the cross‐validated θ puts most weight on the true target. Hence, θ is potentially a good metric to decide between the two proposals.
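As an illustration, the estimator can be evaluated directly. The sketch below is our own code, not the paper's implementation; it assumes the closed form of the targeted ridge estimator of van Wieringen and Peeters (2016), with the weighted‐average target T = (1−θ)T0 + θTa plugged in, and the function name is ours:

```python
import numpy as np
from scipy.linalg import sqrtm

def ridge_precision(S, T0, Ta, lam, theta):
    """Bi-targeted ridge precision estimator (sketch).

    Assumed closed form, with T = (1 - theta) T0 + theta Ta:
        Omega(lam, theta) = { (1/2)(S - lam T)
                              + [lam I_p + (1/4)(S - lam T)^2]^{1/2} }^{-1}.
    """
    p = S.shape[0]
    T = (1 - theta) * T0 + theta * Ta   # weighted-average target
    A = S - lam * T
    X = np.real(sqrtm(lam * np.eye(p) + 0.25 * A @ A))  # matrix square root
    return np.linalg.inv(0.5 * A + X)
```

One can check numerically that the result solves the penalized score equation Ω−1 − S − λ(Ω − T) = 0, and that it tends to the target T as λ grows large, in line with the regularization limit exploited for the test statistic.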
We take the signpost test statistic to be the θ that yields the best fit as measured by the loglikelihood. Formally,

θ^(λ) = argmaxθ∈[0,1] { ln|Ω^(λ,θ)| − tr[SΩ^(λ,θ)] },

with Ω^(λ,θ) the bi‐targeted ridge estimator above evaluated at weight θ.
We use this test statistic to decide between T0 and Ta. It involves the penalty parameter, and its influence on the test's power is investigated later (see Section 5).
We prove that θ^(λ) is in some settings uniquely defined (the proof is, as are all proofs, deferred to the Supporting information, henceforth abbreviated to SM). Simulations exploring other settings indicate that the function maximized by θ^(λ) is concave there as well. Possible non‐uniqueness is circumvented by choosing the smallest maximizing θ^(λ), which may render the signpost test conservative. The test statistic's definition in the display above can be evaluated straightforwardly by standard root‐finding procedures, which we initiate at a value close to zero to arrive at the smallest maximizing θ^(λ). Numerical experiments indicate that only for small λ may the test statistic differ noticeably from θ^∞ (see SM).
In the remainder, we focus mostly on the test statistic that corresponds to the λ→∞ limit, which we denote by θ^∞. This θ^∞ is the limit of the test statistic's definition, and the asymptote of θ^(λ) as λ tends to infinity. For finite but not too large λ, the difference between θ^(λ) and θ^∞ is small and the latter is a good approximation of the former. Practically, the use of θ^∞ avoids having to choose λ. Moreover, it makes the choice of penalty, e.g., ridge or lasso, irrelevant. More importantly, within the context of ridge regression, we showed that this choice yields the most powerful test (van Wieringen 2024). This observation can be understood within the Bayesian framework. There, priors instead of penalties are used, but the Bayesian maximum a posteriori (MAP) estimator coincides with its penalized counterpart. This equivalence motivates the adoption of the Bayesian view to understand the power improvement of the signpost test with the θ^∞ test statistic. To see this, we contrast vague with concentrated priors. The former, corresponding to a small λ, result in overlapping priors that are hard to distinguish through the corresponding, almost identical MAP estimators. The adopted choice of a large λ, on the other hand, corresponds to concentrated priors that yield clearly different MAP estimators and facilitate discrimination between the underlying priors. To formally define our test statistic, note that in the regularization limit, limλ→∞Ω^(λ)=(1−θ)T0+θTa. Substitute this into the loglikelihood and choose θ^∞ as its maximizer:

θ^∞ = argmaxθ∈[0,1] { ln|(1−θ)T0+θTa| − tr{S[(1−θ)T0+θTa]} }.
The θ^∞ is uniquely defined as it is the maximizer of a concave function, which follows from the negative sign of the function's second‐order derivative with respect to θ. To find θ^∞, take the derivative of the objective function with respect to θ and equate it to zero:

tr{(Ta−T0)[(1−θ)T0+θTa]−1} − tr[(Ta−T0)S] = 0. (1)

The root of this equation is our test statistic θ^∞.
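A minimal numerical sketch of this root search, in our own code (names and tolerance are ours, not the paper's): since the objective is concave, its derivative tr{(Ta−T0)[(1−θ)T0+θTa]−1} − tr[(Ta−T0)S] is decreasing in θ, so a sign check at the endpoints of [0,1] decides whether the maximizer is interior before a bracketing root finder is called.

```python
import numpy as np
from scipy.optimize import brentq

def theta_inf(S, T0, Ta, eps=1e-10):
    """theta_inf: maximizer over [0, 1] of the limiting loglikelihood (sketch)."""
    D = Ta - T0
    rhs = np.trace(D @ S)                  # tr[(Ta - T0) S]
    def deriv(theta):                      # derivative of the concave objective
        Omega = (1 - theta) * T0 + theta * Ta
        return np.trace(np.linalg.solve(Omega, D)) - rhs
    if deriv(eps) <= 0:                    # decreasing at 0: boundary maximizer 0
        return 0.0
    if deriv(1 - eps) >= 0:                # still increasing at 1: maximizer 1
        return 1.0
    return brentq(deriv, eps, 1 - eps)     # interior root = test statistic
```

A sanity check under the model assumption: if S equals the true covariance [(1−γ)T0+γTa]−1 exactly, the derivative vanishes at θ=γ, so the root is γ itself.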
2.1 Computationally Efficient Evaluation
In the absence of an analytic expression of the test statistic θ^∞, we resort to numerical means for its evaluation. To that end it is computationally most efficient to reformulate Equation (1) to:
where ∘ is the Hadamard matrix product and dj,a/0 is the jth eigenvalue of T0−1/2TaT0−1/2, in which A1/2 denotes the matrix square root. Then, to avoid any matrix operation inside the search procedure, evaluate ∑j,j′=1p[(Ta−T0)∘S]j,j′ and the eigenvalues of T0−1/2TaT0−1/2 prior to the initiation of the search. Note that in this computationally efficient reformulation of the test statistic equation, we have introduced an additional root, θ=0, which is to be ignored unless its multiplicity exceeds one.
The search procedure for the test statistic can be sped up if we provide a warm start and are able to bound the search space. The warm start is discussed in Section 4 and Proposition 2.1 provides the bounds.
Proposition 2.1. Let d1,a/0 and dp,a/0 denote the largest and smallest eigenvalue, respectively, of T0−1/2TaT0−1/2. Assume θ∈[0,1], tr[(Ta−T0)S]≠0, and d1,a/0≠1≠dp,a/0. Then, the root of Equation (1) satisfies:
i) θ≤{p−tr[T0(T0+Ta)−1]}/tr[(Ta−T0)S];
ii) θ≤p/tr[(Ta−T0)S]−(d1,a/0−1)−1;
iii) θ≥p/tr[(Ta−T0)S]−(dp,a/0−1)−1;
iv) θ<{p−[tr(T0−1/2TaT0−1/2)]−1}{p−1−[tr(T0−1/2TaT0−1/2)]−1+tr[(Ta−T0)S]}−1.
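The precomputation idea above can be sketched as follows (our own code; the bounds of Proposition 2.1 could additionally be used to narrow the bracket passed to the root finder). All matrix work, the eigenvalues dj,a/0 and the trace tr[(Ta−T0)S], happens once before the search, leaving only a scalar function of θ inside it:

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import brentq

def theta_inf_fast(S, T0, Ta, eps=1e-10):
    """theta_inf via a scalar equation: no matrix algebra inside the search."""
    # Precompute tr[(Ta - T0) S]; the elementwise (Hadamard) sum equals the
    # trace because both factors are symmetric.
    rhs = np.sum((Ta - T0) * S)
    T0_inv_sqrt = np.real(np.linalg.inv(sqrtm(T0)))
    d = np.linalg.eigvalsh(T0_inv_sqrt @ Ta @ T0_inv_sqrt)  # d_{j,a/0}
    def deriv(theta):                      # scalar reformulation of Equation (1)
        return np.sum((d - 1) / (1 + theta * (d - 1))) - rhs
    if deriv(eps) <= 0:
        return 0.0
    if deriv(1 - eps) >= 0:
        return 1.0
    return brentq(deriv, eps, 1 - eps)
```

The reduction rests on the identity tr{(Ta−T0)[(1−θ)T0+θTa]−1} = ∑j(dj,a/0−1)/[1+θ(dj,a/0−1)], obtained by the similarity transform with T0−1/2.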
2.2 Unbiasedness
The construction of the null distribution of our test hinges upon the following proposition, which states that, under conditions, the test statistic θ^∞ can be viewed as an approximate and asymptotically unbiased estimator of γ. Importantly, the test statistic thereby inherits the interpretation of a step size: it measures how far to follow the direction of the signpost in the search for the true precision matrix.
Proposition 2.2. Assume Ω=(1−γ)T0+γTa with γ∈[0,1]. Then,
i) E(θ^∞)=γ if γ=0;
ii) θ^∞ is an asymptotically unbiased estimator of γ∈[0,1];
iii) if in addition 1−δ(1+δγ)−1<dj,a/0<1+δ(1−δγ)−1 for j=1,…,p and δ=O(p−1/3), then θ^∞ is unbiased for γ∈[0,1] in the high‐dimensional limit p→∞.
The proof of Proposition 2.2 also reveals that, when approximated by a Taylor series, θ^∞ is unbiased for γ up to second and higher order terms. The latter terms vanish asymptotically and, under conditions, as the number of dimensions grows large. Together, these findings suggest that θ^∞ is close to unbiased for γ on the whole unit interval for finite sample sizes and dimensions.
Simulation evidence indicates that θ^∞ is also a virtually unbiased estimator of the step size γ for finite sample sizes and dimensions (see SM). This is illustrated by Figure 1, which results from a small simulation study. It comprises drawing from Y∼N{0p,[(1−γ)T0+γTa]−1} with n=50, p=300, γ∈{0,0.1,0.2,…,1}, and targets T0=Ip×p and, for Ta, a precision matrix with scale‐free support (details in the SM). From each draw, we evaluate θ^∞. For each choice of γ and Ta, this is repeated a thousand times. The resulting θ^∞'s are plotted against γ (see Figure 1 and the SM). The plot reveals that the θ^∞'s are distributed along the x=y line with their mode at the corresponding γ value. Hence, also for finite sample sizes, the test statistic θ^∞ is indicative of the true precision matrix's position on the line between the targets.
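A miniature version of this simulation can be run in a few lines. The sketch below is ours, not the paper's: it uses a smaller n and p than the paper's n=50, p=300 to keep it fast, and a hypothetical diagonal Ta in place of the scale‐free target whose details are in the SM, so it is illustrative only:

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
p, n, gamma, reps = 25, 200, 0.4, 200
T0 = np.eye(p)
Ta = np.diag(rng.uniform(0.5, 2.0, size=p))   # hypothetical alternative target
Sigma = np.linalg.inv((1 - gamma) * T0 + gamma * Ta)  # true covariance
d = np.linalg.eigvalsh(Ta)   # eigenvalues of T0^{-1/2} Ta T0^{-1/2}, as T0 = I

def theta_inf(S):
    """Root of the scalar first-order condition on [0, 1] (sketch)."""
    rhs = np.sum((Ta - T0) * S)
    g = lambda t: np.sum((d - 1) / (1 + t * (d - 1))) - rhs
    if g(1e-10) <= 0:
        return 0.0
    if g(1 - 1e-10) >= 0:
        return 1.0
    return brentq(g, 1e-10, 1 - 1e-10)

est = []
for _ in range(reps):
    Y = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    est.append(theta_inf(Y.T @ Y / n))
mean_est = np.mean(est)   # should lie close to gamma, per Proposition 2.2
```

Averaging the θ^∞ replicates recovers the step size γ up to simulation noise, mirroring the behavior displayed in Figure 1.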
The assumption of Proposition 2.2 that the true parameter Ω is on the line between the null and alternative target is stringent, but this assumption is exactly what underpins the signpost test. We will therefore investigate the signpost test's robustness against violations of this assumption (see Section 5).
3 Signpost Testing
The signpost test evaluates whether there is relevant information, with respect to the parameter of the Gaussian graphical model, in the direction from the null to the alternative target. This direction is parameterized by the line segment (1−γ)T0+γTa for γ∈[0,1]. The signpost test assesses whether the model parameter is on this line. It does so by testing H0:γ=0 vs. Ha:γ>0. Rejection of H0 indicates that there is relevant information with respect to the true model parameter to be found in the direction from the null to the alternative target. Failure to reject the null hypothesis does not imply that T0 is the location of the model parameter, for that may well be found in directions other than that of Ta.
The signpost test of the Gaussian graphical model uses θ^∞, or θ^(λ), as test statistic. It is, under the Ω=(1−γ)T0+γTa assumption and by Proposition 2.2, a good metric to quantify the amount of relevant information regarding the location of the true parameter. The signpost test compares the observed test statistic to its null distribution generated by the parametric bootstrap. Formally, the outline of the signpost test procedure is:
Signpost test of the Gaussian graphical model. To test H0:γ=γ0 vs. Ha:γ≠γ0:
1)
Evaluate the test statistic from observed data and
denote it θ^∞H0,(0) and, for later use, θ^∞obs.
2)
To generate the null distribution, iterate the following steps K times:
i)
Sample null data Y1H0,(k),…,YnH0,(k)∼N(0p,T0−1) for
the k‐th iteration with k=1,…,K.
ii)
Evaluate θ^∞ using the null data. Denote this result by θ^∞H0,(k)
for the k‐th iteration with k=1,…,K.
3)
Calculate the p‐value as the proportion of the bootstrapped test statistics θ^∞H0,(k), k=1,…,K, that are at least as large as the observed θ^∞obs.
The p‐value of the signpost test may be accompanied by a more refined uncertainty quantification through the construction of an approximate confidence interval. This interval is obtained by similar means as the signpost test's null distribution. In the second step of the signpost test procedure, we sample from N{0p,[(1−θ^∞obs)T0+θ^∞obsTa]−1}. Then, by virtue of the approximate unbiasedness of θ^∞ for γ (Proposition 2.2), for α∈(0,1) the ½α and 1−½α quantiles of the resulting distribution form an approximate 100(1−α)% confidence interval.
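The procedure can be sketched as follows. Because Equation (1) defining θ^∞ is not reproduced in this section, the statistic is passed in as a function. The stand‐in used below is the linear statistic of Section 4, reconstructed from that section's notation as b−1tr[(T0−Ta)(S−T0−1)]; both this closed form and the add‐one p‐value convention are assumptions of this sketch, not necessarily the paper's exact choices.

```python
import numpy as np

def signpost_pvalue(Y, T0, stat, K=200, rng=None):
    """Parametric bootstrap signpost test of H0: gamma = 0.
    `stat` maps a sample covariance matrix S to a test statistic."""
    rng = rng or np.random.default_rng()
    n, p = Y.shape
    t_obs = stat(Y.T @ Y / n)
    sigma0 = np.linalg.inv(T0)  # null covariance T0^{-1}
    t_null = np.empty(K)
    for k in range(K):
        Yk = rng.multivariate_normal(np.zeros(p), sigma0, size=n)
        t_null[k] = stat(Yk.T @ Yk / n)
    # one-sided p-value: share of bootstrap draws at least as extreme
    # (the add-one convention is ours)
    return (1 + np.sum(t_null >= t_obs)) / (1 + K)

# stand-in statistic: b^{-1} tr[(T0 - Ta)(S - T0^{-1})], an assumed
# reconstruction of the Section 4 linear statistic
p = 4
T0 = np.eye(p)
Ta = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
b = np.linalg.norm(Ta - T0, "fro") ** 2     # valid because T0 = I
stat = lambda S: np.trace((T0 - Ta) @ (S - np.linalg.inv(T0))) / b
rng = np.random.default_rng(2)
Y = rng.multivariate_normal(np.zeros(p), np.linalg.inv(T0), size=40)  # H0 data
print(signpost_pvalue(Y, T0, stat, K=99, rng=rng))
```

With data drawn under H0, as here, the p‐value is approximately uniformly distributed over repeated draws.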
4
An Approximate Signpost Test
Here, we present an alternative test based on an approximate signpost test statistic for which the asymptotic distribution is known. The corresponding exceedance probability can then be calculated directly. This circumvents the potentially computationally demanding generation of the null distribution of the original signpost test statistic introduced in Section 2.
We approximate the test statistic of the signpost test. To this end, we linearize the right‐hand side of the test statistic equation (1):
The root of the resulting equation,
serves as an approximate test statistic.
The approximate test statistic θ∼∞ also uses evidence present in the data to measure how far the true precision matrix is from the null target matrix, in the direction of the alternative target. This is expressed by the following proposition.
Proposition 4.1If Ω=(1−γ)T0+γTa, then
The interval provided by Proposition 4.1 that bounds the bias of the approximate test statistic as an estimator of γ contains zero under the mild assumption that da/0,1>1 and da/0,p<1. This interval grows with γ, but for γ close to zero the approximate test statistic is almost unbiased. In particular, if γ=0, the approximate test statistic is an unbiased estimator of γ. This suggests that the approximate test statistic θ∼∞ is sensitive enough to evaluate H0:γ=0 versus Ha:γ>0.
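The near‐unbiasedness at small γ can be checked by Monte Carlo. The closed form used for θ∼∞ below, b−1tr[(T0−Ta)(S−T0−1)], is reconstructed from the notation of Proposition 4.1 and Corollary 4.2 (the paper's own display is not reproduced in this excerpt) and should be treated as an assumption of this sketch.

```python
import numpy as np

p, n, reps, gamma = 4, 100, 2000, 0.1
T0 = np.eye(p)
Ta = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
b = np.linalg.norm(Ta - T0, "fro") ** 2            # T0 = I simplifies b
# reconstructed approximate statistic; T0^{-1} = T0 for the identity target
approx_stat = lambda S: np.trace((T0 - Ta) @ (S - T0)) / b
sigma = np.linalg.inv((1 - gamma) * T0 + gamma * Ta)
rng = np.random.default_rng(3)
vals = np.empty(reps)
for r in range(reps):
    Y = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    vals[r] = approx_stat(Y.T @ Y / n)
print(vals.mean())  # close to gamma = 0.1
```

The Monte Carlo mean lands near γ, in line with the first‐order unbiasedness discussed above; for larger γ the bias becomes visible.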
The SM provides an impression of the quality of θ∼∞ as an approximation of θ^∞. This impression recycles the simulation described in Section 2.2 but now calculates the original and approximate test statistics for each iteration. These are plotted against each other in a qq‐plot (see the SM). It reveals, as suggested by Proposition 4.1, that the approximation is good for small γ close to zero but becomes poorer for larger γ. Nonetheless, this shows the suitability of the approximate test statistic for testing H0:γ=0 versus Ha:γ>0.
The null distribution of the approximate signpost test statistic is known asymptotically (cf. Corollary 4.2) as (i) θ∼∞∝tr[(Ta−T0)S] and (ii) the asymptotic distribution of this trace has been characterized (Fujikoshi 1970).
Corollary 4.2 Let nS follow the central Wishart distribution Wp(n,Σ) and let Ta−T0 be nonsingular. The asymptotic distribution of θ∼∞ is then characterized in terms of σ=2tr[(R−1Σ)2], b=∥T0−1/2TaT0−1/2−Ipp∥F2, R−1=b−1(T0−Ta), θ∞, defined as θ∼∞ with S replaced by Σ, and Φ(r)(z), the rth derivative of the standard normal distribution function Φ(z).
This is a direct consequence of Theorem 11.2 of Fujikoshi (1970), which for completeness is quoted in the SM.
Knowledge of the approximate test statistic's asymptotic distribution facilitates a parametric test. In low dimensions, under the null hypothesis H0:γ=0, and with a reasonable sample size, the asymptotic distribution of Corollary 4.2 is expected to be a good approximation to the null distribution. Simulation evidence presented in Section 5 indicates that even in high dimensions this parametric test has power comparable (see SM) to that of the test presented in Section 3.
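A parametric p‐value based on the leading (normal) term of this asymptotic distribution can be sketched as follows. The higher‐order terms involving Φ(r) in Corollary 4.2 are omitted, and the closed form of θ∼∞, b−1tr[(T0−Ta)(S−T0−1)], is again our reconstruction and hence an assumption.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.stats import norm

def approx_pvalue(Y, T0, Ta):
    """One-sided p-value for H0: gamma = 0 from the leading normal term,
    using the reconstructed statistic b^{-1} tr[(T0 - Ta)(S - T0^{-1})]."""
    n, p = Y.shape
    S = Y.T @ Y / n
    T0inv = np.linalg.inv(T0)
    M = np.real(np.linalg.inv(sqrtm(T0)))              # T0^{-1/2}
    b = np.linalg.norm(M @ Ta @ M - np.eye(p), "fro") ** 2
    Rinv = (T0 - Ta) / b                               # R^{-1} of Corollary 4.2
    t = np.trace(Rinv @ (S - T0inv))
    # null variance of a linear Wishart functional: 2 tr[(R^{-1} Sigma0)^2] / n
    var = 2.0 * np.trace((Rinv @ T0inv) @ (Rinv @ T0inv)) / n
    return float(norm.sf(t / np.sqrt(var)))

p = 4
T0 = np.eye(p)
Ta = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
rng = np.random.default_rng(4)
Y = rng.multivariate_normal(np.zeros(p), np.linalg.inv(T0), size=60)  # H0 data
print(approx_pvalue(Y, T0, Ta))
```

Under H0 this p‐value is approximately uniform; under the alternative it concentrates near zero.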
5
Test Performance
We evaluate, by simulation, the power of the proposed signpost tests with test statistics θ^(λ), θ^∞, and θ∼∞. We do so under various choices of sample size, dimension, effect size, and conditional independence graph. Moreover, we assess the signpost tests' robustness against violation of the key assumption that the true parameter is a weighted average of the null and alternative proposals.
We first specify the simulation details. We draw data Y1,…,Yn from the Gaussian graphical model N(0p,Ω−1). The sample size n and dimension p are chosen from the sets {10,20,…,100} and {50,150}, respectively. The precision matrix Ω is a weighted average of the null and alternative proposals, i.e., Ω=(1−γ)T0+γTa. The weight parameter γ takes a value from {0,0.1,0.2,…,1}. Throughout we set T0=Ipp and employ various Ta. All employed Ta have a unit diagonal and differ in their off‐diagonal structure. These structures represent the topology of the conditional independence graphs: uniform (saturated), banded, stripe (two hubs), and scale‐free (see the SM for details). With the generated data, we use the signpost test with a thousand parametric bootstraps to evaluate the null hypothesis H0:γ=0 vs. the alternative Ha:γ>0. The signpost test is declared significant if the p‐value falls below the significance level α=0.05. We repeat the above a thousand times for all (n,p,γ,Ta)‐combinations. Per such combination, we calculate the proportion of rejected null hypotheses.
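One cell of this simulation grid can be sketched as below. For speed, the stand‐in statistic b−1tr[(T0−Ta)(S−T0−1)] (our reconstruction of the Section 4 linear statistic) replaces θ^∞, and the bootstrap null distribution is generated once, which is valid here because it does not depend on the observed data.

```python
import numpy as np

def rejection_rate(n, gamma, T0, Ta, stat, alpha=0.05, reps=300, K=199, seed=0):
    """Monte Carlo rejection rate of a signpost-style test at level alpha."""
    rng = np.random.default_rng(seed)
    p = T0.shape[0]
    sigma = np.linalg.inv((1 - gamma) * T0 + gamma * Ta)
    sigma0 = np.linalg.inv(T0)
    cov = lambda Y: Y.T @ Y / n
    draw = lambda s: rng.multivariate_normal(np.zeros(p), s, size=n)
    t_null = np.array([stat(cov(draw(sigma0))) for _ in range(K)])
    crit = np.quantile(t_null, 1 - alpha)   # upper-alpha critical value
    return np.mean([stat(cov(draw(sigma))) > crit for _ in range(reps)])

p = 4
T0 = np.eye(p)
Ta = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
b = np.linalg.norm(Ta - T0, "fro") ** 2
stat = lambda S: np.trace((T0 - Ta) @ (S - T0)) / b   # T0^{-1} = T0 here
size = rejection_rate(n=50, gamma=0.0, T0=T0, Ta=Ta, stat=stat)   # ~ alpha
power = rejection_rate(n=50, gamma=0.8, T0=T0, Ta=Ta, stat=stat, seed=1)
print(size, power)
```

Repeating this over the (n,p,γ,Ta)‐grid traces out power curves analogous to those in the SM.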
Before we report the signpost test's power, we note that the type I error is well‐controlled by construction, for, under the null hypothesis, both the observed and bootstrapped test statistics are obtained from data drawn from the same distribution. As a sanity check, we confirmed in simulation that the histogram (not shown) of the signpost test's p‐values is indeed uniform under H0.
The power curves (see SM) of the signpost test equipped with the θ^∞ test statistic show an improvement in power with an increase in sample size and/or effect size. Both are properties of a useful test, as larger sample sizes enable better inference and a larger effect is easier to detect.
The power of the signpost test equipped with the θ^∞ test statistic profits from a larger dimension of the precision matrix. The gain in power is consistent over sample sizes, effect sizes, and topologies. This gain can intuitively be understood in two ways.
i)
The number of elements that differ between the null and alternative targets increases with the dimension. More differing elements mean more locations that exhibit the effect of a nonzero γ, which enhances the signpost test's capacity to detect it. This is supported by the differences among the power curves of the simulations with the various employed topologies of the alternative target. For instance, compare the situations where Ta has a chain or a banded conditional independence graph. The latter has two extra off‐diagonals with nonzero elements, and its power curves are systematically above those of the former.
ii)
The (absolute) size of the differences among the elements of the null and alternative targets affects the gain in power. This is because the element‐wise difference and γ enter the precision matrix multiplicatively. Hence, a larger difference among the elements of T0 and Ta amplifies the effect of γ, making it easier for the signpost test to pick it up.
Practically, the power curves enable us to deduce a rough sample size indication for a study. For instance, if we want to detect a step size of γ=0.20 to distinguish between two 150×150‐dimensional precision matrices with an empty and a scale‐free topology, we require at least n=50 samples to do so with 90% power. This sample size depends heavily on the assumed topologies of the precision matrices' conditional independence graphs and on the difference between the precision matrices' elements. Irrespective of the exact sample size, it tells us that a limited sample size suffices to detect the relevance of a proposed direction in a high‐dimensional parameter space.
The power curves of the signpost test equipped with the approximate test statistic θ∼∞, with p‐values calculated from its asymptotic distribution as an approximation to the null distribution, are depicted in the SM. This version of the signpost test exhibits comparable, if not equal, power to the one equipped with the θ^∞ test statistic and the parametric‐bootstrap‐generated null distribution. This is understandable, as the approximate test statistic θ∼∞ is an unbiased estimator of γ at γ=0 (Proposition 4.1) and thereby an appropriate summary of the evidence against H0:γ=0.
We now study the power of the signpost test with the test statistic θ^(λ) as defined in Display (1). We adopt the same simulation set‐up as described above and use the modified test statistic with λ=10,100,1000. The resulting power curves for a scale‐free alternative target (representative of all alternative target types) are shown in the SM. These curves reveal that the signpost test equipped with the finite‐λ test statistic θ^(λ) has slightly less power than its counterpart with the θ^∞ test statistic. Hence, the signpost test with the θ^∞ test statistic is to be preferred, as it yields better power. We believe this to be rooted in the minimum variance of the bi‐targeted ridge precision estimator at λ=∞. Moreover, it conveniently avoids the issue of making an informed choice of λ.
So far we have investigated the signpost test's behavior under the assumption that the precision matrix Ω is on the line between the two proposals T0 and Ta. This assumption is rather stringent. We therefore provide some insight into the signpost test's behavior under violation of this assumption. We take Ω to be a perturbed version of Ta (or vice versa). The perturbation amounts to a rotation of (part of) the eigenspace but leaves the eigenvalues unaffected. Technical details of the perturbation are deferred to the SM. The rotation adheres to the signpost test's motivation, as the test evaluates a direction. Moreover, the rotation ensures that the perturbed target/precision matrix remains positive definite. We then repeat the power simulations with identical settings as before but with a precision matrix that does not adhere to the key assumption. In these simulations, we vary the severity of the violation of the assumption through different 'angles' of the rotation.
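A minimal version of such an eigenvalue‐preserving perturbation rotates a pair of eigenvectors by a Givens rotation; the SM's exact construction may differ from this sketch.

```python
import numpy as np

def rotate_eigenspace(M, angle, i=0, j=1):
    """Perturb symmetric positive-definite M by rotating eigenvectors i and j
    over `angle` radians; eigenvalues (hence positive definiteness) are kept."""
    vals, vecs = np.linalg.eigh(M)
    c, s = np.cos(angle), np.sin(angle)
    G = np.eye(M.shape[0])
    G[i, i], G[j, j], G[i, j], G[j, i] = c, c, -s, s   # Givens rotation block
    V = vecs @ G                                       # rotated eigenvectors
    return V @ np.diag(vals) @ V.T

p = 4
Ta = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
Tp = rotate_eigenspace(Ta, angle=0.3)
# spectrum unchanged, but the matrix itself has moved off the original
print(np.allclose(np.sort(np.linalg.eigvalsh(Tp)),
                  np.sort(np.linalg.eigvalsh(Ta))))  # True
```

The rotation angle plays the role of the severity of the assumption violation: angle zero recovers Ta exactly, while larger angles move the perturbed matrix further from the null‐to‐alternative line.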
The plots showing the signpost test's power under the aforementioned misspecification of its underpinning assumption Ω=(1−γ)T0+γTa can be found in the SM. Their main takeaway is that the signpost test still has considerable power under (mild) misspecification of the alternative proposal. Put differently, the signpost test can still detect (with some power) whether it is worthwhile to pursue the direction of its signpost, even though the true parameter is not on the line from the null to the alternative target.
5.1
Comparison
We compare the signpost test with the likelihood ratio test. The latter is a natural competitor of the signpost test, as the null hypothesis (γ=0) is nested in the alternative (γ∈(0,1]). For λ=∞, the statistics of these tests are one‐to‐one related, as that of the likelihood ratio test is maxθ∈[0,1]2{L[S,(1−θ)T0+θTa]−L(S,T0)}. A different test statistic brings about a different null distribution: asymptotically, a mixture of a point mass at zero and a χ12 distribution for the likelihood ratio test statistic (Self and Liang 1987). This may affect the p‐value calculation and inference. Both test statistics extend naturally into the finite‐λ domain, but there the quality of the asymptotic approximation of the likelihood ratio test's null distribution is unknown. The comparison between the two tests adopts the same set‐up as the type I error and power evaluation described above. The resulting power curves can be found in the SM. They convey the following messages. The type I error of the likelihood ratio test is inflated, which questions the suitability of the asymptotic χ12 distribution as the null distribution. Furthermore, the likelihood ratio test is on a par with the signpost test in terms of power.
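The likelihood ratio statistic can be computed by a one‐dimensional optimization over θ. The Gaussian log‐likelihood below, L(S,Ω)=(n/2)(log det Ω−tr(SΩ)), drops additive constants, and the n/2 scaling is our convention for this sketch.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def lrt_statistic(S, T0, Ta, n):
    """2 * max_{theta in [0,1]} { L[S, (1-theta)T0 + theta*Ta] - L(S, T0) }
    with L(S, Omega) = (n/2) * (log det Omega - tr(S Omega)); constants drop."""
    def negloglik(theta):
        omega = (1 - theta) * T0 + theta * Ta
        sign, logdet = np.linalg.slogdet(omega)
        return -(n / 2) * (logdet - np.trace(S @ omega))
    res = minimize_scalar(negloglik, bounds=(0.0, 1.0), method="bounded")
    # clip at zero in case the numerical optimum sits at the boundary
    return max(0.0, 2.0 * (negloglik(0.0) - res.fun))

p, n = 4, 60
T0 = np.eye(p)
Ta = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
rng = np.random.default_rng(6)
Y = rng.multivariate_normal(np.zeros(p), np.linalg.inv(0.5 * T0 + 0.5 * Ta),
                            size=n)
print(lrt_statistic(Y.T @ Y / n, T0, Ta, n))
```

The boundary of the parameter space at γ=0 is precisely why the asymptotic null distribution is the point‐mass/χ12 mixture of Self and Liang (1987) rather than a plain χ12.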
An additional comparison involves the signpost test that employs the approximate signpost test statistic and its asymptotic distribution. Its type I error is well‐controlled, while its power curves overlap with those of the signpost test with a bootstrapped null distribution.
6
Application
We show how to employ the signpost test in practice through a re‐analysis of six datasets from transcriptomic breast cancer studies. These datasets have been curated and formatted in identical fashion and are publicly available from the Bioconductor platform (https://bioconductor.org/) through the breastCancer*** packages (Schroeder et al. 2022). Each dataset includes the expression levels of the same 22,283 transcripts. These transcripts are subset to the 120 transcripts that map to the ‘breast cancer ‐ homo sapiens (human)’ pathway as defined by the KEGG repository (Ogata et al. 1999) with KEGG‐ID hsa05224. To adhere to the distributional assumptions of the Gaussian graphical model, the data are Gaussianized marginally, which does not affect the underlying conditional independence graph (Liu et al. 2009). Furthermore, each dataset comprises samples with estrogen receptor negative and positive status, denoted ER− and ER+, indicating the absence and presence, respectively, of the estrogen receptor, a pivotal driver in breast cancer (Foulkes et al. 2010). The ER− samples are less prevalent (see Table 1) and will be the focus of our re‐analysis. The re‐analysis of these data by means of the signpost test's methodology aims to illustrate how the information of the ER+ samples can be utilized in the learning of the pathway's regulatory network (as operationalized by the conditional independence graph of the Gaussian graphical model) from the ER− samples.
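Marginal Gaussianization of the kind referred to can be implemented as a rank‐based normal‐scores transform; whether the authors used exactly this variant is not stated here, so treat this as one standard realization.

```python
import numpy as np
from scipy.stats import rankdata, norm

def gaussianize(X):
    """Map each column to normal scores via its ranks. This monotone,
    marginal transform leaves the conditional independence graph intact."""
    n = X.shape[0]
    ranks = np.apply_along_axis(rankdata, 0, X)
    return norm.ppf(ranks / (n + 1))        # dividing by n+1 keeps scores finite

rng = np.random.default_rng(7)
X = rng.exponential(size=(200, 3))          # skewed, non-Gaussian margins
Z = gaussianize(X)
print(Z.shape)  # (200, 3)
```

Because the transform is applied column‐wise and is strictly monotone, only the margins change; the rank (copula) dependence among the variates is preserved.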
6.1
Empirical Validity
First, we assess, as shown in simulation and by mathematical analysis in preceding sections, the empirical substantiation of the test statistic as a metric for detection of the relevance of a signpost. This requires knowledge of the truth, which is typically absent for the data at hand. We may nonetheless construct a situation that reveals the validity of the proposed metric of the signpost's relevance through its behavior.
We require informative targets for the ER− and ER+ groups that are independent of the data used to evaluate the informativeness of the signpost. This ensures that the data are not used twice. To this end, we sacrifice (say) the bth breast cancer dataset, which is then only used for target construction. The resulting targets are employed in the signpost test with another dataset. As such, the hypothesis, which involves the constructed targets, and the data used for its evaluation are independent. From the data of the bth dataset, we form the ER− and ER+ targets through ridge penalized precision matrix estimation with a leave‐one‐out cross‐validated penalty parameter (van Wieringen and Peeters 2016). We denote these targets Ter-,b and Ter+,b for b=1,…,B. In the SM, we discuss other methods of target construction and investigate their effect.
To assess the signpost test statistic's validity, we assume the ER− and ER+ targets are formed from the b′th breast cancer dataset and consider the bth breast cancer dataset with b≠b′ to construct a sequence of ner-,b+1 datasets of ner-,b samples each. The first constructed dataset comprises only the ner-,b ER− samples of this dataset. Each subsequent dataset is formed (details in the SM) by randomly replacing an ER− sample with a randomly selected ER+ sample from the same dataset. At the beginning and end of this sequence of datasets, we expect, respectively, Ter-,b′ and Ter+,b′ to be more informative. In particular, if we run through this sequence from the first to the last dataset, any valid metric of the signpost's relevance should increase monotonically, thereby expressing an increasing preference for the ER+ target.
We evaluate the signpost test statistic θ^∞ for the afore‐described sequence of datasets with an increasing ratio of ER+ to ER− samples. This is done for all possible (b,b′) dataset pairs, with b′ used for target formation and b for test statistic evaluation. We repeat this exercise 50 times to reduce the randomness in the construction of the dataset sequences, as the ER− samples are replaced randomly by ER+ samples. Furthermore, to investigate the dependence of the target formation on the datasets' particulars, we also consider targets sampled from the convex hull of, e.g., the {Ter+,b}b=1B. This gives ER+ targets Ter+,all=∑b=1BwbTer+,b with the wb>0 randomly sampled from the B‐simplex to ensure that ∑b=1Bwb=1.
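Sampling a target from the convex hull of per‐study targets amounts to drawing weights from the simplex, e.g., via a Dirichlet draw. The paper does not specify the sampler, so this construction, along with the small random targets, is purely illustrative.

```python
import numpy as np

def convex_hull_target(targets, rng):
    """Weighted average of positive-definite targets with simplex weights;
    the result is again positive definite."""
    w = rng.dirichlet(np.ones(len(targets)))   # w_b > 0 and sum_b w_b = 1
    return sum(wb * T for wb, T in zip(w, targets)), w

p = 3
rng = np.random.default_rng(8)
# hypothetical per-study targets: identity plus small symmetric perturbations
targets = []
for _ in range(4):
    A = 0.1 * rng.standard_normal((p, p))
    targets.append(np.eye(p) + (A + A.T) / 2)
T_all, w = convex_hull_target(targets, rng)
print(w.round(3))
```

Convexity matters here: a convex combination of positive‐definite matrices remains positive definite, so the pooled target is a valid precision matrix proposal.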
The estimated signpost test statistics of all aforementioned settings are plotted against the dataset order (Figure 2). The estimated signpost test statistics exhibit positive concordance with the order of the dataset sequence. This behavior is systematic over all considered settings, although the concordance strengths vary, which, apart from inherent particulars of the lab and population, stems from differences in the sample size and the ER+/ER− prevalence. The signpost test statistic is thus a valid metric of a signpost's relevance.
One may object to the concluded validity, as there is no setting where the signpost test statistic runs from θ^∞=0 to θ^∞=1 over the order of the dataset sequence. This was not to be expected, as the validity exercise hinges upon the strong assumption that the results from one breast cancer study are informative for the analysis of another. This informativeness is obscured by the inherent differences in their sampled populations, lab protocols, measurement devices, and so on. Other reasons may explain the moderate differences of the θ^∞’s at both ends of the dataset sequences. First, high‐throughput platforms are notoriously noisy, and this noise may obscure the detection of the signpost's relevance. Second, the ER− and ER+ Gaussian graphical models may not be too different after all. Or either the ER− or the ER+ group is heterogeneous, which also hampers the detection of the signpost's relevance. Hence our focus on the concordance to empirically substantiate the signpost test statistic's validity.
6.2
Test and Fit
Within the context of the breast cancer studies, we now apply the signpost test to the archetypical case we encounter in practice. In that case, we have no knowledge on the Gaussian graphical model from the ER− samples. This is represented by the null target T0=Ipp, which corresponds to an empty conditional independence graph and its unit diagonal matches with the variates' marginal variances after the Gaussianization. The alternative target is the Gaussian graphical model's precision matrix from a related population, the ER+ samples, which has been estimated by ridge penalized likelihood estimation with a cross‐validated penalty parameter and has been standardized to have a unit diagonal. We then apply the signpost test to each study's ER− samples for which we assess the relevance of an ER+ target.
The results of the signpost test, both the found test statistic θ∼∞ and the p‐value (based on K=1000 redraws), are displayed in Table 1. In all datasets, the signpost test is significant at the α=0.05 level. We thus conclude that (temporarily) following the direction of the alternative target derived from the ER+ samples brings us closer to the value of the parameter of the ER− samples' Gaussian graphical model. The test statistic indicates the step size to be taken in that direction, which is substantial as all are ≥0.2. The results of the signpost test with the approximate test statistic θ∼∞, provided in the SM, confirm the conclusion drawn from Table 1.
We diagnose the validity of the principal assumption, i.e., the precision matrix' parameterization Ω=T0+θ(Ta−T0), underlying the test for the breast cancer data. This assumption implies a linear relationship between the non‐redundant elements of Ω−T0 and those of Ta−T0. We assess this linearity visually, with the unknown Ω replaced by an estimate. The use of a (regularized) estimate may partially obscure the linearity. We deem the principal assumption tenable if the plot shows a concordant relation that can roughly be approximated by a linear one. We generated this diagnostic plot for each performed signpost test. Two representative ones are shown in Figure 3. In one plot, the principal assumption is clearly unproblematic, while in the other the assumption is not invalid but certainly very noisy.
A natural follow‐up question is to ask whether the model fit also improves in the proposed direction? To assess this, we exploit the interpretation of the test statistic as the position of the true precision matrix on the line segment connecting the targets. We evaluate whether the corresponding precision matrix, i.e., the average of the targets weighted by the test statistic, yields an improved fit. We therefore study the Frobenius and quadratic loss:
respectively. For each dataset, both losses are plotted against θ (see Figure 4). The curves are compared to the loss of the evaluated test statistic. A first takeaway from Figure 4 is that all curves decrease on the interval θ∈[0,a) for some a>0 clearly away from zero. This interval always includes θ=θ^∞. Moreover, the loss is generally minimized not at but certainly close to θ=θ^∞. This suggests that the test statistics, when interpreted as the informativeness of the direction of the alternative target, are conservative. In all, these findings corroborate the signpost test's result that the data support the proposed direction from T0 to Ter+ to find the parameter of the ER− samples’ Gaussian graphical model.
6.3
Further Analyses
The SM contains additional analyses of the breast cancer data. These
i)
provide further down‐stream analyses of the pathway's reconstructed conditional independence graph.
ii)
employ differently constructed targets to investigate their effect on the signpost test's inference.
Application
We show how to employ the signpost test in practice through a re‐analysis of six datasets from transcriptomic breast cancer studies. These datasets have been curated and formatted in identical fashion and are publicly available from the Bioconductor platform (https://bioconductor.org/) through the breastCancer*** packages (Schroeder et al. 2022). Each dataset includes the expression levels of the same 22,283 transcripts. These are subset to the 120 transcripts that map to the ‘breast cancer ‐ homo sapiens (human)’ pathway as defined by the KEGG repository (Ogata et al. 1999) with KEGG‐ID hsa05224. To adhere to the distributional assumptions of the Gaussian graphical model, the data are Gaussianized marginally, which does not affect the underlying conditional independence graph (Liu et al. 2009). Furthermore, each dataset comprises samples with estrogen receptor negative and positive status, denoted ER− and ER+, indicating the absence and presence, respectively, of the estrogen receptor, a pivotal driver in breast cancer (Foulkes et al. 2010). The ER− samples are less prevalent (see Table 1) and are the focus of our re‐analysis. The re‐analysis of these data by means of the signpost test's methodology aims to illustrate how the information of the ER+ samples can be utilized in the learning of the pathway's regulatory network (as operationalized by the conditional independence graph of the Gaussian graphical model) from the ER− samples.
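The marginal Gaussianization can be sketched as a rank‐based normal‐score transform, in the spirit of the nonparanormal of Liu et al. (2009); the paper does not spell out its exact transform here, so the concrete mapping below is an illustrative assumption:

```python
import numpy as np
from statistics import NormalDist

def gaussianize(X):
    """Marginally transform each column of X (n x p) to normal scores.

    Ranks are mapped through the standard normal quantile function; the
    transform is rank-preserving, hence (under a nonparanormal model) it
    leaves the conditional independence graph untouched. Ties are assumed
    absent, as for continuous expression data.
    """
    n, p = X.shape
    inv = NormalDist().inv_cdf
    Z = np.empty((n, p))
    for j in range(p):
        ranks = X[:, j].argsort().argsort() + 1  # ranks 1..n
        Z[:, j] = [inv(r / (n + 1)) for r in ranks]
    return Z
```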
6.1
Empirical Validity
First, we assess empirically whether the test statistic, which the simulations and mathematical analysis of the preceding sections support, is a valid metric for detecting the relevance of a signpost. This requires knowledge of the truth, which is typically absent for the data at hand. We may nonetheless construct a situation that reveals the validity of the proposed metric of the signpost's relevance through its behavior.
We require informative targets for the ER− and ER+ groups that are independent of the data used to evaluate the informativeness of the signpost. This ensures that the data are not used twice. To this end, we sacrifice (say) the bth breast cancer dataset, which is then only used for target construction. The resulting targets are employed in the signpost test with another dataset. As such, the hypothesis, which involves the constructed targets, and the data used for its evaluation are independent. From the data of the bth dataset, we form the ER− and ER+ targets through ridge penalized precision matrix estimation with a leave‐one‐out cross‐validated penalty parameter (van Wieringen and Peeters 2016). We denote these targets Ter-,b and Ter+,b for b=1,…,B. In the SM, we discuss other methods of target construction and investigate their effect.
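Target construction uses the ridge penalized precision estimator with target T of van Wieringen and Peeters (2016), which solves the stationarity condition Ω⁻¹ = S + λ(Ω − T) of the penalized Gaussian log‐likelihood in closed form. A minimal sketch for a fixed penalty λ (the leave‐one‐out cross‐validation of λ is omitted):

```python
import numpy as np

def ridge_precision(S, target, lam):
    """Ridge precision estimate with target T:
    Omega(lam) = { [lam*I + (1/4)(S - lam*T)^2]^(1/2) + (1/2)(S - lam*T) }^{-1},
    the closed-form solution of Omega^{-1} = S + lam*(Omega - T).
    """
    p = S.shape[0]
    E = S - lam * target
    # symmetric matrix square root via eigendecomposition
    w, V = np.linalg.eigh(E @ E / 4 + lam * np.eye(p))
    sqrt_part = (V * np.sqrt(w)) @ V.T
    return np.linalg.inv(sqrt_part + E / 2)
```

The estimate is positive definite for any λ > 0, even when the sample covariance S is singular (n < p).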
To assess the signpost test statistic's validity, we assume the ER− and ER+ targets are formed from the b′th breast cancer dataset and consider the bth breast cancer dataset with b≠b′ to construct a sequence of ner-,b+1 datasets of ner-,b samples. The first constructed dataset comprises only the ner-,b ER− samples of this dataset. Each subsequent dataset is formed (details in the SM) by randomly replacing an ER− sample by a randomly selected ER+ sample from the same dataset. At the beginning and end of this sequence of datasets, we expect, respectively, Ter-,b′ and Ter+,b′ to be more informative. In particular, if we run through this sequence from the first to the last dataset, any valid metric of the signpost's relevance should increase monotonically, thereby expressing an increasing preference for the ER+ target.
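The construction of the dataset sequence can be sketched as follows; this sketch assumes the dataset holds at least as many ER+ as ER− samples, so that replacements can be drawn without repetition (full details are in the SM):

```python
import numpy as np

def dataset_sequence(X_neg, X_pos, rng):
    """Sequence of n_neg + 1 datasets, each of n_neg samples.

    Starts with all ER- samples; each subsequent dataset replaces one
    randomly chosen, not-yet-replaced ER- sample by a randomly selected
    ER+ sample. Assumes X_pos has at least n_neg rows.
    """
    n_neg = X_neg.shape[0]
    slots = rng.permutation(n_neg)                         # order of ER- rows swapped out
    pos_idx = rng.choice(X_pos.shape[0], size=n_neg, replace=False)
    current = X_neg.copy()
    seq = [current.copy()]
    for k in range(n_neg):
        current[slots[k]] = X_pos[pos_idx[k]]
        seq.append(current.copy())
    return seq
```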
We evaluate the signpost test statistic θ^∞ for the afore‐described sequence of datasets with an increasing ratio of ER+ to ER− samples. This is done for all possible (b,b′) dataset pairs, with b′ used for target formation and b for test statistic evaluation. We repeat this exercise 50 times to reduce the randomness in the construction of the dataset sequences as the ER− samples are replaced randomly by ER+ samples. Furthermore, to investigate the dependence of the target formation on the datasets' particulars, we also consider targets sampled from the convex hull of, for example, {Ter+,b}b=1B. This gives ER+ targets Ter+,all=∑b=1BwbTer+,b with the wb>0 randomly sampled from the B‐simplex, ensuring that ∑b=1Bwb=1.
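Sampling a target from the convex hull of the per‐study targets amounts to drawing the weights wb uniformly from the B‐simplex, for example via a flat Dirichlet distribution:

```python
import numpy as np

def convex_hull_target(targets, rng):
    """Random convex combination sum_b w_b * T_b of the per-study targets.

    Weights are drawn from Dirichlet(1, ..., 1), i.e., uniformly on the
    B-simplex, so they are positive and sum to one; the combination of
    positive definite targets is again positive definite.
    """
    B = len(targets)
    w = rng.dirichlet(np.ones(B))
    return sum(wb * Tb for wb, Tb in zip(w, targets))
```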
The estimated signpost test statistics of all aforementioned settings are plotted against the dataset order (Figure 2). The estimated signpost test statistics exhibit positive concordance with the order of the dataset sequence. This behavior is systematic over all considered settings, although the concordance strengths vary, which, apart from inherent particulars of the lab and population, stems from differences in sample size and ER+/ER− prevalence. The signpost test statistic is thus a valid metric of a signpost's relevance.
One may object to the concluded validity as there is no setting in which the signpost test statistic runs from θ^∞=0 to θ^∞=1 over the order of the dataset sequence. This was not to be expected, however, as the validity exercise hinges on the strong assumption that the results from one breast cancer study are informative for the analysis of another. This informativeness is obscured by the inherent differences in their sampled populations, lab protocols, measurement devices, and so on. Other reasons may explain the moderate differences of the θ^∞'s at both ends of the dataset sequences. First, high‐throughput platforms are notoriously noisy, and this noise may obscure the detection of the signpost's relevance. Second, the ER− and ER+ Gaussian graphical models may not be that different after all. Or either the ER− or the ER+ group is heterogeneous, which also hampers the detection of the signpost's relevance. Hence our focus on the concordance to empirically substantiate the signpost test statistic's validity.
6.2
Test and Fit
Within the context of the breast cancer studies, we now apply the signpost test to the archetypical case we encounter in practice: we have no knowledge of the Gaussian graphical model of the ER− samples. This is represented by the null target T0=Ip×p, which corresponds to an empty conditional independence graph and whose unit diagonal matches the variates' marginal variances after the Gaussianization. The alternative target is the precision matrix of the Gaussian graphical model of a related population, the ER+ samples, estimated by ridge penalized likelihood estimation with a cross‐validated penalty parameter and standardized to have a unit diagonal. We then apply the signpost test to each study's ER− samples to assess the relevance of an ER+ target.
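The standardization of a target to unit diagonal is a symmetric rescaling, D^{-1/2} Ω D^{-1/2} with D = diag(Ω), which preserves the zero pattern and hence the conditional independence graph:

```python
import numpy as np

def unit_diagonal(omega):
    """Rescale a precision matrix to unit diagonal: D^{-1/2} Omega D^{-1/2}.

    Off-diagonal zeros (absent edges) are preserved, so the encoded
    conditional independence graph is unchanged.
    """
    d = 1.0 / np.sqrt(np.diag(omega))
    return omega * np.outer(d, d)
```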
The results of the signpost test, both the observed test statistic θ^∞ and the p‐value (based on K=1000 redraws), are displayed in Table 1. In all datasets, the signpost test is significant at the α=0.05 level. We thus conclude that (temporarily) following the direction of the alternative target derived from the ER+ samples brings us closer to the value of the parameter of the ER− samples' Gaussian graphical model. The test statistic indicates the step size to be taken in that direction, which is substantial as all are ≥0.2. The results of the signpost test with the approximate test statistic θ∼∞, provided in the SM, confirm the conclusion drawn from Table 1.
We diagnose the validity of the principal assumption underlying the test for the breast cancer data, i.e., the precision matrix's parameterization Ω=T0+θ(Ta−T0). This assumption implies a linear relationship between the non‐redundant elements of Ω−T0 and those of Ta−T0. We assess this linearity visually, with the unknown Ω replaced by an estimate. The use of a (regularized) estimate may partially obscure the linearity. We deem the principal assumption tenable if the plot shows a concordant relation that can roughly be approximated by a linear one. We generated this diagnostic plot for each performed signpost test; two representative ones are shown in Figure 3. In one plot, the principal assumption is clearly unproblematic, while in the other the assumption is not invalid, but the relation is certainly very noisy.
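The diagnostic amounts to pairing the non‐redundant elements of an estimate of Ω−T0 with those of Ta−T0; under the principal assumption, the least‐squares slope through the origin of this scatter estimates θ. A sketch (using the lower triangle, diagonal included, as the non‐redundant elements):

```python
import numpy as np

def signpost_diagnostic(omega_hat, T0, Ta):
    """Pairs of non-redundant elements of (omega_hat - T0) vs (Ta - T0).

    Under Omega = T0 + theta * (Ta - T0), y is proportional to x and the
    least-squares slope through the origin recovers theta; the returned
    (x, y) pairs can be scattered to inspect the linearity visually.
    """
    idx = np.tril_indices_from(omega_hat)
    x = (Ta - T0)[idx]
    y = (omega_hat - T0)[idx]
    slope = float(x @ y / (x @ x))
    return x, y, slope
```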
A natural follow‐up question is whether the model fit also improves in the proposed direction. To assess this, we exploit the interpretation of the test statistic as the position of the true precision matrix on the line segment connecting the targets. We evaluate whether the corresponding precision matrix, i.e., the average of the targets weighted by the test statistic, yields an improved fit. We therefore study the Frobenius and quadratic loss:
respectively. For each dataset, both losses are plotted against θ (see Figure 4). The curves are compared to the loss of the evaluated test statistic. A first takeaway from Figure 4 is that all curves decrease on the interval θ∈[0,a) for some a>0 clearly away from zero. This interval always includes θ=θ^∞. Moreover, the loss is generally minimized not at but certainly close to θ=θ^∞. This suggests that the test statistics, when interpreted as the informativeness of the direction of the alternative target, are conservative. In all, these findings corroborate the signpost test's result that the data support the proposed direction from T0 to Ter+ to find the parameter of the ER− samples’ Gaussian graphical model.
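The display equation defining the two losses is not reproduced above. A computable reading, our assumption rather than the paper's verbatim definitions, evaluates Ω(θ)=T0+θ(Ta−T0) against the sample covariance S:

```python
import numpy as np

def losses(theta, T0, Ta, S):
    """Assumed loss forms along the target line segment:

    Frobenius loss  ||Omega(theta)^{-1} - S||_F^2
    quadratic loss  ||Omega(theta) S - I||_F^2

    with Omega(theta) = T0 + theta * (Ta - T0); both are computable even
    when S is singular. These definitions are illustrative assumptions.
    """
    p = T0.shape[0]
    omega = T0 + theta * (Ta - T0)
    frob = np.linalg.norm(np.linalg.inv(omega) - S, 'fro') ** 2
    quad = np.linalg.norm(omega @ S - np.eye(p), 'fro') ** 2
    return frob, quad
```

Evaluating either loss over a grid of θ values reproduces the kind of loss curves shown in Figure 4.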
6.3
Further Analyses
The SM contains additional analyses of the breast cancer data. These
i)
provide further down‐stream analyses of the pathway's reconstructed conditional independence graph.
ii)
employ differently constructed targets to investigate their effect on the signpost test's inference.
7
Conclusion
We presented a signpost test to detect whether valuable information regarding the Gaussian graphical model parameter is to be found in the direction of a suggested value. The test has good power to do so with limited sample size, and even benefits from larger dimensions. Moreover, the signpost test still has reasonable power if the true direction deviates from the suggested one. We illustrated the signpost test's practical use in a breast cancer application, where the reconstruction of the gene–gene interaction network of a less prevalent subgroup benefited from an estimate of the more prevalent one.
We envision that the signpost test will also find its use in federated learning. For privacy reasons, the EU's General Data Protection Regulation (GDPR) limits the exchange of data between medical institutions. But these medical institutions are allowed to exchange summary statistics, e.g., parameter estimates, of the data. These summary statistics serve as the external knowledge evaluated in the signpost test. For instance, this external knowledge may originate from a different country with a different population. Or, summary statistics may be exchanged between a general hospital and a cancer clinic. Both medical institutions treat cancer patients, but the latter typically receives the more severe cases. In either case, the signpost test assesses the informativeness of the external knowledge for the in‐house learning problem. This we illustrate in the SM.
Envisioned future work would extend this work in two directions. A first extension allows for the incorporation of more than one alternative target/proposal. The signpost test then evaluates whether the precision matrix lies in a hyperplane spanned by parameter values suggested by multiple external knowledge domains. Another line of future research would venture deeper into the realm of graphical models. For instance, vector auto‐regressive models do more justice to the dynamics of the cellular regulatory system.
Conflicts of Interest
The authors declare no conflict of interest.
Open Research Badges
This article has earned an Open Data badge for making publicly available the digitally‐shareable data necessary to reproduce the reported results. The data is available in the Supporting Information section.
This article has earned an open data badge “Reproducible Research” for making publicly available the code necessary to reproduce the reported results. “The results reported in this article could fully be reproduced.”
Supporting information
Supporting File: bimj70115‐sup‐0001‐Datacode.zip.