Title: | Two-Directional Simultaneous Inference for High-Dimensional Models |
---|---|
Description: | A general framework for two-directional simultaneous inference in high-dimensional as well as fixed-dimensional models with manifest-variable or latent-variable structure, such as high-dimensional mean models, high-dimensional sparse regression models, and high-dimensional latent factor models. Simultaneous inference on a set of parameters is carried out in two directions: one tests whether the parameters estimated as zero are indeed zero, and the other tests whether the set of parameters estimated as nonzero contains a zero. For details, see Wei Liu, et al. (2023) <doi:10.1080/07350015.2023.2191672>. |
Authors: | Wei Liu [aut, cre], Huazhen Lin [aut] |
Maintainer: | Wei Liu <[email protected]> |
License: | GPL |
Version: | 0.3.0 |
Built: | 2024-12-19 03:05:51 UTC |
Source: | https://github.com/feiyoung/tosi |
Evaluate the model selection consistency rate (SCR), the F-measure, and the smallest canonical correlation; larger values mean better performance in model selection and parameter estimation.
assessBsFun(hB, B0)
hB |
a |
B0 |
a |
return a vector with three components named scr, fmea and ccorB.
Liu Wei
dat <- gendata_Fac(n = 300, p = 500)
res <- gsspFactorm(dat$X)
assessBsFun(res$sphB, dat$B0)
n <- nrow(dat$X)
res <- gsspFactorm(dat$X, lambda1=0.05*n^(1/4), lambda2=9*n^(1/4))
assessBsFun(res$sphB, dat$B0)
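As a rough illustration of what a support-recovery measure such as the F-measure captures, the following base R sketch compares the zero/nonzero patterns of an estimated and a true loading matrix. The helper `support_metrics` and the `agreement` rate are hypothetical and are not part of the package; the exact definitions used by assessBsFun may differ.

```r
# Hypothetical helper (not part of TOSI): compare zero/nonzero patterns.
support_metrics <- function(hB, B0) {
  est  <- hB != 0            # estimated nonzero pattern
  true <- B0 != 0            # true nonzero pattern
  tp <- sum(est & true)      # correctly detected nonzero entries
  precision <- tp / sum(est)
  recall    <- tp / sum(true)
  c(agreement = mean(est == true),  # share of entries with matching pattern
    fmeasure  = 2 * precision * recall / (precision + recall))
}

# Toy 3 x 2 loading matrices
B0 <- rbind(c(1, 0), c(2, 3), c(0, 0))
hB <- rbind(c(0.9, 0), c(1.8, 0), c(0, 0.1))
m <- support_metrics(hB, B0)
print(m)
```

Both measures lie between 0 and 1, and larger values indicate closer agreement between the estimated and the true sparsity pattern.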
Evaluate the BIC values on a grid of penalty parameters.
bic.spfac(X, c1.max= 10, nlamb1=10, C10=4, c2.max=10, nlamb2=10, C20=4)
X |
a |
c1.max |
a positive scalar, the maximum of the grids of c1. |
nlamb1 |
a positive integer, the length of grids of penalty parameter lambda1. |
C10 |
a positive scalar, the penalty factor C1 of modified BIC. |
c2.max |
a positive scalar, the maximum of the grids of c2. |
nlamb2 |
a positive integer, the length of grids of penalty parameter lambda2. |
C20 |
a positive scalar, the penalty factor C2 of modified BIC. |
return a list with class named pena_info and BIC, including the following components:
lambda1.min |
a positive number, the penalty value for lambda1 corresponding to the minimum BIC on grids. |
lambda2.min |
a positive number, the penalty value for lambda2 corresponding to the minimum BIC on grids. |
bic1 |
a numeric matrix with three columns named c1, lambda1 and bic1, where each row corresponds to one grid point. |
bic2 |
a numeric matrix with three columns named c2, lambda2 and bic2, where each row corresponds to one grid point. |
Liu Wei
Wei Liu, Huazhen Lin, Jin Liu (2020). Estimation and inference on high-dimensional sparse factor models.
datlist1 <- gendata_Fac(n= 100, p = 500)
X <- datlist1$X
spfac <- gsspFactorm(X, q=NULL) # use default values for lambda's.
assessBsFun(spfac$sphB, datlist1$B0)
biclist <- bic.spfac(datlist1$X, c2.max=20,nlamb1 = 10) # select lambda's values using BIC.
Evaluate the smallest canonical correlation between two sets of variables, where each set is represented by a matrix whose columns are variables.
ccorFun(hH, H)
hH |
a |
H |
a |
return a scalar value, the smallest canonical correlation.
Liu Wei
dat <- gendata_Fac(n = 300, p = 500)
res <- gsspFactorm(dat$X)
ccorFun(res$hH, dat$H0)
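The smallest canonical correlation can also be computed with base R, which may help when checking results independently of the package. The sketch below uses `stats::cancor` on simulated matrices; the data and variable names are illustrative only, and it is an assumption that ccorFun returns a comparable quantity.

```r
set.seed(1)
n <- 200
H  <- matrix(rnorm(n * 2), n, 2)               # "true" factors
rot <- matrix(c(1, 1, -1, 1), 2, 2)            # an invertible mixing matrix
noise <- 0.1 * matrix(rnorm(n * 2), n, 2)
hH <- H %*% rot + noise                        # rotated, slightly noisy copy

# cancor() returns the canonical correlations in decreasing order,
# so the smallest one is the minimum of the vector.
min_ccor <- min(stats::cancor(hH, H)$cor)
print(min_ccor)
```

Because canonical correlations are invariant to invertible linear transformations of either set, a value close to 1 means the two matrices span nearly the same column space, which is the relevant notion for factors identified only up to rotation.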
Evaluate the CV values on a grid of penalty parameters.
cv.spfac(X, lambda1_set, lambda2_set, nfolds=5)
X |
a |
lambda1_set |
a positive vector, the grid for lambda_1. |
lambda2_set |
a positive vector, the grid for lambda_2. |
nfolds |
a positive integer, the number of folds of cross validation. |
return a list including following components:
lamcv.min |
a 3-dimensional vector, the penalty value for lambda_1 and lambda_2 corresponding to the minimum CV on grids. |
lamcvMat |
a numeric matrix with three columns named lambda_1, lambda_2 and cv, where each row corresponds to one grid point. |
lambda1_set |
the used grid for lambda_1. |
lambda2_set |
the used grid for lambda_2. |
Liu Wei
Wei Liu, Huazhen Lin (2019). Estimation and inference on high-dimensional sparse factor models.
datlist1 <- gendata_Fac(n= 100, p = 300, rho=1)
X <- datlist1$X
spfac <- gsspFactorm(X, q=NULL) # use default values for lambda's.
assessBsFun(spfac$sphB, datlist1$B0)
lambda1_set <- seq(0.2, 2, by=0.3)
lambda2_set <- 1:8
# select lambda's values using CV method.
lamList <- cv.spfac(X, lambda1_set, lambda2_set, nfolds=5)
spfac <- gsspFactorm(X, q=NULL,lamList$lamcv.min[1], lamList$lamcv.min[2])
assessBsFun(spfac$sphB, datlist1$B0)
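The relationship between a grid table such as lamcvMat and its minimizer lamcv.min can be sketched in base R. The criterion below is a toy function standing in for a real cross-validation error, and the fold assignment is one common way to build balanced folds; neither is taken from the package source.

```r
set.seed(1)
n <- 100
nfolds <- 5
# Balanced random fold labels: each observation gets one of 1..nfolds.
fold_id <- sample(rep(seq_len(nfolds), length.out = n))

# All combinations of the two penalty grids, one row per grid point.
grid <- expand.grid(lambda_1 = seq(0.2, 2, by = 0.3), lambda_2 = 1:8)

# Toy criterion (assumption): a real CV error would be averaged over folds.
grid$cv <- (grid$lambda_1 - 1.1)^2 + (grid$lambda_2 - 3)^2

# Row with the smallest criterion: (lambda_1, lambda_2, cv).
lamcv_min <- unlist(grid[which.min(grid$cv), ])
print(lamcv_min)
```

The first two entries of the resulting vector are the penalty values that would then be passed on as lambda1 and lambda2, in the same way the example above uses lamList$lamcv.min[1] and lamList$lamcv.min[2].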
Conduct simultaneous inference for a set of loading vectors under the null hypothesis H01, which assumes that all loading vectors in the set are zero.
FacRowMaxST(X, G1, q=NULL, Nsplit= 5, sub.frac=0.5, alpha=0.05, standardized=FALSE,seed=1)
X |
a |
G1 |
an index set with values between 1 and p, the testing set in H01. |
q |
a positive integer, the number of factors. It is selected automatically by a criterion if it is NULL. |
Nsplit |
a positive integer, the number of data splits, default as 5. |
sub.frac |
a positive number between 0 and 1, the proportion of the sample used in stage I. |
alpha |
a positive real, the significance level. |
standardized |
a logical value, whether to use the standardized test statistic. |
seed |
a non-negative integer, the random seed. |
return a vector with names 'CriticalValue', 'TestStatistic', 'reject_status', 'p-value' if Nsplit=1, and 'reject_status' and 'adjusted_p-value' if Nsplit>1.
Liu Wei
Wei Liu, Huazhen Lin, Jin Liu (2020). Estimation and inference on high-dimensional sparse factor models.
### Example
dat <- gendata_Fac(n = 300, p = 500)
res <- Factorm(dat$X)
X <- dat$X
# ex1: H01 is false
G1 <- 1:10 # all are nonzero loading vectors
FacRowMaxST(X, G1=G1, alpha=0.05, sub.frac=0.5)
FacRowMaxST(X, q= 6, G1=G1, alpha=0.05, sub.frac=0.5) # specify the true number of factors
# ex2: H01 is true
G1 <- 481:500 # all are zero loading vectors
FacRowMaxST(X, G1=G1, alpha=0.05, sub.frac=0.5)
FacRowMaxST(X, q= 7, G1=G1, alpha=0.05, sub.frac=0.5) # specify a false number of factors
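When Nsplit>1, the functions report an adjusted p-value instead of a single-split p-value. The rule used by the package is not stated here; the sketch below shows one well-known way to combine p-values from repeated random splits, the twice-the-median rule of Meinshausen, Meier and Bühlmann (2009). The helper name and the p-values are hypothetical, and it is only an assumption that the package behaves similarly.

```r
# Hypothetical helper: twice-the-median aggregation, capped at 1.
aggregate_pvalues <- function(pvals) min(1, 2 * stats::median(pvals))

# Illustrative p-values from five random splits (made-up numbers).
p_splits <- c(0.012, 0.030, 0.004, 0.210, 0.018)
adj_p <- aggregate_pvalues(p_splits)
reject <- adj_p < 0.05
print(c(adjusted_p = adj_p, reject = reject))
```

Aggregating over several splits makes the conclusion less dependent on one particular random partition of the sample, at the price of a somewhat conservative p-value.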
Conduct simultaneous inference for a set of loading vectors under the null hypothesis H02, which assumes that the set contains at least one zero loading vector.
FacRowMinST(X, G2, q=NULL, Nsplit= 5, sub.frac=0.5, alpha=0.05, standardized=FALSE,seed=1)
X |
a |
G2 |
a positive vector with values between 1 and p, the set of H02. |
q |
a positive integer, the number of factors. It is selected automatically by a criterion if it is NULL. |
Nsplit |
a positive integer, the number of data splits, default as 5. |
sub.frac |
a positive number between 0 and 1, the proportion of the sample used in stage I. |
alpha |
a positive real, the significance level. |
standardized |
a logical value, whether to use the standardized test statistic. |
seed |
a non-negative integer, the random seed. |
return a vector with names 'CriticalValue', 'TestStatistic', 'reject_status', 'p-value' if Nsplit=1, and 'reject_status' and 'adjusted_p-value' if Nsplit>1.
Liu Wei
Wei Liu, Huazhen Lin, Jin Liu (2020). Estimation and inference on high-dimensional sparse factor models.
### Example
dat <- gendata_Fac(n = 300, p = 500)
res <- Factorm(dat$X)
X <- dat$X
# ex1: H02 is false
G2 <- 1:200 # all are nonzero loading vectors
FacRowMinST(X, G2=G2, alpha=0.05, sub.frac=0.5)
FacRowMinST(X, q= 6, G2=G2, alpha=0.05, sub.frac=0.5) # specify the true number of factors
# ex2: H02 is true
G2 <- 1:500 # the set includes zero loading vectors
FacRowMinST(X, G2=G2, alpha=0.05, sub.frac=0.5)
FacRowMinST(X, q= 7, G2=G2, alpha=0.05, sub.frac=0.5) # specify a false number of factors
Factor analysis to extract latent linear factors and estimate loadings.
Factorm(X, q=NULL)
X |
a |
q |
an integer between 1 and |
return a list with class named fac, including the following components:
hH |
a |
hB |
a |
q |
an integer between 1 and |
sigma2vec |
a p-dimensional vector, the estimated variance for each error term in model. |
propvar |
a positive number between 0 and 1, the explained proportion of cumulative variance by the |
egvalues |
an n-dimensional (n<=p) or p-dimensional (p<n) vector, the eigenvalues of the sample covariance matrix. |
Liu Wei
Fan, J., Xue, L., and Yao, J. (2017). Sufficient forecasting using factor models. Journal of Econometrics.
dat <- gendata_Fac(n = 300, p = 500)
res <- Factorm(dat$X)
ccorFun(res$hH, dat$H0) # the smallest canonical correlation
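For readers who want to see the kind of computation behind a principal-component factor estimate, here is a minimal base R sketch. It assumes the common normalization in which the estimated factors satisfy hH'hH/n = I and the loadings are obtained by regressing the centred data on the factors; this is an illustration under that assumption, not necessarily what Factorm does internally.

```r
set.seed(1)
n <- 100; p <- 30; q <- 2
H0 <- matrix(rnorm(n * q), n, q)                 # true factors
B0 <- matrix(rnorm(p * q), p, q)                 # true loadings
X  <- H0 %*% t(B0) + 0.3 * matrix(rnorm(n * p), n, p)

Xc <- scale(X, center = TRUE, scale = FALSE)     # centre each column
sv <- svd(Xc, nu = q, nv = 0)                    # leading q left singular vectors
hH <- sqrt(n) * sv$u                             # n x q factors, hH'hH / n = I
hB <- crossprod(Xc, hH) / n                      # p x q loadings
propvar <- sum(sv$d[1:q]^2) / sum(sv$d^2)        # variance share of first q components

print(min(stats::cancor(hH, H0)$cor))            # closeness to the true factor space
```

Since factors are identified only up to rotation, the comparison with H0 is done through canonical correlations rather than entry by entry.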
Generate simulated data from a high-dimensional sparse factor model.
gendata_Fac(n, p, seed=1, q=6, pzero= floor(p/4), sigma2=0.1, gamma=1, heter=FALSE, rho=1)
n |
a positive integer, the sample size. |
p |
a positive integer, the variable dimension. |
seed |
a nonnegative integer, the random seed, default as 1. |
q |
a positive integer, the number of factors. |
pzero |
a positive integer, the number of zero loading vectors, default as floor(p/4). |
sigma2 |
a positive real number, the homogeneous variance of error term. |
gamma |
a positive number, the common component of heteroscedasticity of error term. |
heter |
a logical value, indicating whether to generate heteroscedastic error terms. |
rho |
a positive number, controlling the magnitude of loading matrix. |
return a list including the following components:
X |
a |
H0 |
a |
B0 |
a |
ind_nz |
an integer vector, the index vector for which rows of |
Liu Wei
dat <- gendata_Fac(n=300, p = 500)
str(dat)
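The structure of such simulated data can be sketched in a few lines of base R. The sketch assumes a linear factor model X = H0 B0' + error with the zero loading vectors placed in the last pzero rows, which is consistent with the FacRowMaxST example treating rows 481:500 as zero when p = 500. The distributions used here are illustrative and need not match those in gendata_Fac.

```r
set.seed(1)
n <- 50; p <- 20; q <- 3
pzero <- floor(p / 4)                       # number of zero loading vectors
sigma2 <- 0.1                               # homogeneous error variance

H0 <- matrix(rnorm(n * q), n, q)            # latent factors
B0 <- matrix(rnorm(p * q), p, q)            # loading matrix
B0[(p - pzero + 1):p, ] <- 0                # assumption: zero rows come last

E <- matrix(rnorm(n * p, sd = sqrt(sigma2)), n, p)
X <- H0 %*% t(B0) + E                       # n x p observed data

ind_nz <- which(rowSums(B0^2) > 0)          # rows with nonzero loading vectors
str(list(X = X, H0 = H0, B0 = B0, ind_nz = ind_nz))
```

Variables whose loading vector is zero consist of pure noise, which is what the H01 and H02 tests on loading vectors are designed to detect.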
Generate simulated data from a high-dimensional mean model.
gendata_Mean(n, p, s0= floor(p/2), seed=1, rho= 1, tau=1)
n |
a positive integer, the sample size. |
p |
a positive integer, the variable dimension. |
s0 |
a positive integer, the number of nonzero components of the mean. |
seed |
a nonnegative integer, the random seed, default as 1. |
rho |
a positive number between 0 and 1, controlling the correlation of data. |
tau |
a positive number, controlling the magnitude of the covariance matrix. |
return a list including the following components:
X |
a |
mu |
a p-dimensional vector, the mean vector. |
p0 |
an integer vector, the number of nonzero components of mean. |
Liu Wei
dat <- gendata_Mean(n=100, p = 100, s0=3)
str(dat)
Generate simulated data from a high-dimensional sparse regression model.
gendata_Reg(n=100, p = 20, s0=5, rho=1, seed=1)
n |
a positive integer, the sample size, default as 100. |
p |
a positive integer, the dimension of covariates, default as 20. |
s0 |
a positive integer, the number of nonzero components of regression coefficients, default as 5. |
rho |
a positive number, controlling the magnitude of coefficients. |
seed |
a nonnegative integer, the random seed, default as 1. |
return a list including the following components:
Y |
a |
X |
a |
beta0 |
a p-dimensional vector, the regression coefficients. |
index_nz |
an integer vector, the index of nonzero components of the regression coefficients. |
Liu Wei
dat <- gendata_Reg(n=100, p = 100, s0=3)
str(dat)
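A sparse linear regression data set of this shape can be sketched in base R as follows. The sketch assumes that the first s0 coefficients are the nonzero ones (as the RegMax and RegMin examples suggest) and that rho scales their magnitude; the covariate and error distributions are illustrative choices, not taken from gendata_Reg.

```r
set.seed(1)
n <- 100; p <- 20; s0 <- 5; rho <- 1

X <- matrix(rnorm(n * p), n, p)               # covariate matrix
beta0 <- c(rep(rho, s0), rep(0, p - s0))      # assumption: first s0 entries nonzero
Y <- drop(X %*% beta0 + rnorm(n))             # response with standard normal noise
index_nz <- which(beta0 != 0)                 # indices of nonzero coefficients

str(list(Y = Y, X = X, beta0 = beta0, index_nz = index_nz))
```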
Sparse factor analysis to extract latent linear factors and estimate a row-sparse and entry-wise-sparse loading matrix.
gsspFactorm(X, q=NULL, lambda1=nrow(X)^(1/4), lambda2=nrow(X)^(1/4))
X |
a |
q |
an integer between 1 and |
lambda1 |
a non-negative number, the row-sparse penalty parameter, default as |
lambda2 |
a non-negative number, the entry-sparse penalty parameter, default as |
return a list with class named fac, including the following components:
hH |
a |
sphB |
a |
hB |
a |
q |
an integer between 1 and |
propvar |
a positive number between 0 and 1, the explained proportion of cumulative variance by the |
egvalues |
an n-dimensional (n<=p) or p-dimensional (p<n) vector, the eigenvalues of the sample covariance matrix. |
Liu Wei
Liu, W., Lin, H., Liu, J., & Zheng, S. (2020). Two-directional simultaneous inference for high-dimensional models. arXiv preprint arXiv:2012.11100.
dat <- gendata_Fac(n = 300, p = 500)
res <- gsspFactorm(dat$X)
ccorFun(res$hH, dat$H0) # the smallest canonical correlation
## comparison of l2 norm
oldpar <- par(mar = c(5, 5, 2, 2), mfrow = c(1, 2))
plot(rowSums(dat$B0^2), type='o', ylab='l2B', main='True')
l2B <- rowSums(res$sphB^2)
plot(l2B, type='o', main='Est.')
Bind <- ifelse(dat$B0==0, 0, 1)
hBind <- ifelse(res$sphB==0, 0, 1)
## Select good penalty parameters
dat <- gendata_Fac(n = 300, p = 200)
res <- gsspFactorm(dat$X, lambda1=0.04*nrow(dat$X)^(1/4), lambda2=1*nrow(dat$X)^(1/4))
ccorFun(res$hH, dat$H0) # the smallest canonical correlation
## comparison of l2 norm
plot(rowSums(dat$B0^2), type='o', ylab='l2B', main='True')
l2B <- rowSums(res$sphB^2)
plot(l2B, type='o', main='Est.')
## comparison of structure of loading matrix
Bind <- ifelse(dat$B0==0, 0, 1)
hBind <- ifelse(res$sphB==0, 0, 1)
par(oldpar)
Conduct simultaneous inference for a set of mean components under the null hypothesis H01, which assumes that all mean components in the set are zero.
MeanMax(X, test.set, Nsplit = 5,frac.size=0.5, standardized=FALSE,alpha=0.05, seed=1)
X |
a |
test.set |
a positive vector with values between 1 and p, the set of H01. |
Nsplit |
a positive integer, the number of random splits used, default as 5. |
frac.size |
a positive real between 0 and 1, the proportion of the sample used in stage I. |
standardized |
a logical value, whether to standardize variables in stage I. |
alpha |
a positive real, the significance level. |
seed |
a non-negative integer, the random seed. |
return a vector with names 'CriticalValue', 'TestStatistic', 'reject_status', 'p-value' if Nsplit=1, and 'reject_status' and 'adjusted_p-value' if Nsplit>1.
Liu Wei
### Example
n <- 100; p <- 100; i <- 1
s0 <- 5 # First five components are nonzeros
rho <- 1; tau <- 1;
dat1 <- gendata_Mean(n, p, s0, seed=i, rho, tau)
# ex1: H01 is false
MeanMax(dat1$X, 1:p)
MeanMax(dat1$X, 1:p, Nsplit=1)
# ex2: H01 is true
MeanMax(dat1$X, p)
MeanMax(dat1$X, p, Nsplit=1)
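The arguments frac.size and Nsplit suggest a two-stage, sample-splitting procedure. The base R sketch below illustrates that general idea for H01 with a single split, under the assumption that stage I picks the coordinate with the largest absolute sample mean on a subsample and stage II tests that one coordinate on the held-out observations with a one-sample t-test. The function `split_max_test` is hypothetical and is not the package's implementation.

```r
# Hypothetical single-split illustration (not TOSI's implementation).
split_max_test <- function(X, test.set, frac = 0.5, alpha = 0.05, seed = 1) {
  set.seed(seed)
  n <- nrow(X)
  idx1 <- sample(n, floor(frac * n))                 # stage I subsample
  m1 <- colMeans(X[idx1, test.set, drop = FALSE])    # stage I sample means
  j <- test.set[which.max(abs(m1))]                  # most extreme coordinate
  p_val <- t.test(X[-idx1, j])$p.value               # stage II on held-out rows
  c(index = j, p.value = p_val, reject = as.numeric(p_val < alpha))
}

set.seed(123)
X <- matrix(rnorm(200 * 10), 200, 10)
X[, 1:3] <- X[, 1:3] + 1                 # first three means are nonzero

res_false <- split_max_test(X, 1:10)     # H01 is false for this set
res_true  <- split_max_test(X, 4:10)     # H01 is true for this set
print(rbind(res_false, res_true))
```

Because the coordinate is selected and tested on disjoint parts of the sample, the stage II p-value is not distorted by the selection step.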
Conduct simultaneous inference for a set of mean components under the null hypothesis H02, which assumes that the set contains at least one zero mean component.
MeanMin(X, test.set, Nsplit = 5, frac.size=0.5, standardized=FALSE, alpha=0.05, seed=1)
X |
a |
test.set |
a positive vector with values between 1 and p, the set of H02. |
Nsplit |
a positive integer, the number of random splits used, default as 5. |
frac.size |
a positive number between 0 and 1, the proportion of the sample used in stage I. |
standardized |
a logical value, whether to standardize in stage I. |
alpha |
a positive number, the significance level. |
seed |
a non-negative integer, the random seed. |
return a vector with names 'CriticalValue', 'TestStatistic', 'reject_status', 'p-value' if Nsplit=1, and 'reject_status' and 'adjusted_p-value' if Nsplit>1.
Liu Wei
### Example
n <- 100; p <- 100; i <- 1
s0 <- 5 # First five components are nonzeros
rho <- 4; tau <- 1;
dat1 <- gendata_Mean(n, p, s0, seed=i, rho, tau)
# ex1: H02 is false
MeanMin(dat1$X, 1:s0)
MeanMin(dat1$X, 1:s0, Nsplit=1)
# ex2: H02 is true
MeanMin(dat1$X, 1:p)
MeanMin(dat1$X, 1:p, Nsplit=1)
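For the opposite direction, H02 is rejected only when even the weakest component in the set appears to be nonzero. A minimal single-split sketch in base R, assuming that stage I selects the coordinate with the smallest absolute sample mean and stage II tests it on the held-out observations, could look as follows. The function `split_min_test` is hypothetical and serves only to illustrate the logic.

```r
# Hypothetical single-split illustration (not TOSI's implementation).
split_min_test <- function(X, test.set, frac = 0.5, alpha = 0.05, seed = 1) {
  set.seed(seed)
  n <- nrow(X)
  idx1 <- sample(n, floor(frac * n))                 # stage I subsample
  m1 <- colMeans(X[idx1, test.set, drop = FALSE])    # stage I sample means
  j <- test.set[which.min(abs(m1))]                  # weakest coordinate
  p_val <- t.test(X[-idx1, j])$p.value               # stage II on held-out rows
  c(index = j, p.value = p_val, reject = as.numeric(p_val < alpha))
}

set.seed(123)
X <- matrix(rnorm(200 * 10), 200, 10)
X[, 1:5] <- X[, 1:5] + 1                   # first five means are nonzero

res_nozero  <- split_min_test(X, 1:5)      # H02 is false: no zero in the set
res_haszero <- split_min_test(X, 1:10)     # H02 is true: zeros in the set
print(rbind(res_nozero, res_haszero))
```

In the second call the selected coordinate comes from the zero-mean columns, so a rejection there would be a type I error, which occurs with probability of about alpha.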
Conduct simultaneous inference for a set of regression coefficients under the null hypothesis H01, which assumes that all regression coefficients in the set are zero.
RegMax(X, Y, G1, Nsplit = 5, sub.frac=0.5, alpha=0.05, seed=1, standardized=FALSE)
X |
a |
Y |
a |
G1 |
a positive vector with values between 1 and p, the set of H01. |
Nsplit |
a positive integer, the number of random splits used, default as 5. |
sub.frac |
a positive number between 0 and 1, the proportion of the sample used in stage I. |
alpha |
a positive real, the significance level. |
seed |
a non-negative integer, the random seed. |
standardized |
a logical value, whether to standardize the covariate matrix in stage I. |
return a vector with names 'CriticalValue', 'TestStatistic', 'reject_status', 'p-value' if Nsplit=1, and 'reject_status' and 'adjusted_p-value' if Nsplit>1.
Liu Wei
Liu, W., Lin, H., Liu, J., & Zheng, S. (2020). Two-directional simultaneous inference for high-dimensional models. arXiv preprint arXiv:2012.11100.
### Example
n <- 50; p <- 20; i <- 1
s0 <- 5 # First five components are nonzeros
rho <- 1;
dat1 <- gendata_Reg(n, p, s0, seed=i, rho)
# ex1: H01 is false
RegMax(dat1$X, dat1$Y, 1:p)
# ex2: H01 is true
RegMax(dat1$X, dat1$Y, p)
Conduct simultaneous inference for a set of regression coefficients under the null hypothesis H02, which assumes that the set contains at least one zero regression coefficient.
RegMin(X, Y, G2, Nsplit = 5, sub.frac=0.5, alpha=0.05, seed=1, standardized=FALSE)
X |
a |
Y |
a |
G2 |
a positive vector with values between 1 and p, the set of regression coefficients in the null hypothesis H02. |
Nsplit |
a positive integer, the number of random splits used, default as 5. |
sub.frac |
a positive number between 0 and 1, the proportion of the sample used in stage I. |
alpha |
a positive real, the significance level. |
seed |
a non-negative integer, the random seed. |
standardized |
a logical value, whether to standardize the covariate matrix in stage I. |
return a vector with names 'CriticalValue', 'TestStatistic', 'reject_status', 'p-value' if Nsplit=1, and 'reject_status' and 'adjusted_p-value' if Nsplit>1.
Liu Wei
Liu, W., Lin, H., Liu, J., & Zheng, S. (2020). Two-directional simultaneous inference for high-dimensional models. arXiv preprint arXiv:2012.11100.
### Example
n <- 100; p <- 20; i <- 1
s0 <- 5 # First five components are nonzeros
rho <- 1;
dat1 <- gendata_Reg(n, p, s0, seed=i, rho)
# ex1: H02 is false
RegMin(dat1$X, dat1$Y, 1:s0)
# ex2: H02 is true
RegMin(dat1$X, dat1$Y, p)