Title: | High-Dimensional Covariate-Augmented Overdispersed Poisson Factor Model |
---|---|
Description: | A covariate-augmented overdispersed Poisson factor model is proposed to jointly perform a high-dimensional Poisson factor analysis and estimate a large coefficient matrix for overdispersed count data. For details, see Liu et al. (2024) <doi:10.1093/biomtc/ujae031>. |
Authors: | Wei Liu [aut, cre], Qingzhi Zhong [aut] |
Maintainer: | Wei Liu <[email protected]> |
License: | GPL-3 |
Version: | 1.2 |
Built: | 2024-10-30 04:00:44 UTC |
Source: | https://github.com/feiyoung/coap |
Generate simulated data from covariate-augmented Poisson factor models
gendata_simu( seed = 1, n = 300, p = 50, d = 20, q = 6, rank0 = 3, rho = c(1.5, 1), sigma2_eps = 0.1, seed.beta = 1 )
seed |
a positive integer, the random seed for reproducibility of the data generation process. |
n |
a positive integer, specifying the sample size. |
p |
a positive integer, specifying the dimension of the count variables. |
d |
a positive integer, specifying the dimension of the covariate matrix. |
q |
a positive integer, specifying the number of factors. |
rank0 |
a positive integer, specifying the rank of the coefficient matrix. |
rho |
a numeric vector of length 2 with positive elements, specifying the signal strength of the regression coefficient matrix and the loading matrix, respectively. |
sigma2_eps |
a positive real, the variance of the overdispersion error. |
seed.beta |
a positive integer, the random seed that fixes the regression coefficient matrix beta, for reproducibility of the data generation process. |
Returns a list with the following components: (1) X, the high-dimensional count matrix; (2) Z, the high-dimensional covariate matrix; (3) bbeta0, the low-rank large coefficient matrix; (4) B0, the loading matrix; (5) H0, the factor matrix; (6) rank, the true rank of bbeta0; (7) q, the true number of factors.
n <- 300; p <- 100
d <- 20; q <- 6; r <- 3
datlist <- gendata_simu(n=n, p=p, d=20, q=q, rank0=r)
str(datlist)
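The model behind the simulated data can be illustrated with a small base-R sketch. This is not the code of gendata_simu(). It assumes a log-linear form, counts ~ Poisson(exp(Z beta' + H B' + error)), which is consistent with the argument descriptions above but is not spelled out in this manual. The normal distributions for Z and H0 and the sd scaling constants are arbitrary choices made here for illustration.

```r
# Illustrative sketch only; NOT the implementation of gendata_simu().
# Assumed model: X[i, j] ~ Poisson(exp(eta[i, j])), where
# eta = Z %*% t(bbeta0) + H0 %*% t(B0) + eps.
set.seed(1)
n <- 300; p <- 50; d <- 20; q <- 6; rank0 <- 3
rho <- c(1.5, 1); sigma2_eps <- 0.1

# Covariates and latent factors (standard normal entries are an assumption).
Z  <- matrix(rnorm(n * d), n, d)
H0 <- matrix(rnorm(n * q), n, q)

# Rank-rank0 coefficient matrix (p x d) as a product of two thin matrices;
# rho[1] scales its signal strength. The sd of 0.2 is an arbitrary choice.
bbeta0 <- rho[1] * matrix(rnorm(p * rank0, sd = 0.2), p, rank0) %*%
  matrix(rnorm(rank0 * d, sd = 0.2), rank0, d)

# Loading matrix (p x q); rho[2] scales its signal strength.
B0 <- rho[2] * matrix(rnorm(p * q, sd = 0.3), p, q)

# Gaussian error on the log scale: the source of overdispersion.
eps <- matrix(rnorm(n * p, sd = sqrt(sigma2_eps)), n, p)

eta <- Z %*% t(bbeta0) + H0 %*% t(B0) + eps
X <- matrix(rpois(n * p, lambda = exp(eta)), n, p)

dim(X)                         # dimensions of the count matrix
sum(svd(bbeta0)$d > 1e-8)      # numerical rank of bbeta0
```

Because the error term enters on the log scale, the counts have larger variance than a plain Poisson model with the same mean would give, which is what "overdispersed" refers to.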
Fit the covariate-augmented overdispersed Poisson factor model
RR_COAP( X_count, multiFac = rep(1, nrow(X_count)), Z = matrix(1, nrow(X_count), 1), rank_use = 5, q = 15, epsELBO = 1e-05, maxIter = 30, verbose = TRUE, joint_opt_beta = FALSE, fast_svd = TRUE )
X_count |
a count matrix, the observed count matrix. |
multiFac |
an optional vector, the normalization factor for each unit; defaults to a vector of ones. |
Z |
an optional matrix, the covariate matrix; defaults to a single column of ones if there are no additional covariates. |
rank_use |
an optional integer, specify the rank of the regression coefficient matrix; default as 5. |
q |
an optional integer, specifying the number of factors; defaults to 15. |
epsELBO |
an optional positive value, the tolerance for the relative variation rate of the evidence lower bound (ELBO) value; defaults to '1e-5'. |
maxIter |
the maximum iteration of the VEM algorithm. The default is 30. |
verbose |
a logical value, whether to print information at each iteration. |
joint_opt_beta |
a logical value, whether to use the joint optimization method to update bbeta. The default is FALSE. |
fast_svd |
a logical value, whether to use the fast SVD algorithm in the update of bbeta; the default is TRUE. |
Returns a list with the following components: (1) H, the predicted factor matrix; (2) B, the estimated loading matrix; (3) bbeta, the estimated low-rank large coefficient matrix; (4) invLambda, the inverse of the estimated variances of error; (5) H0, the factor matrix; (6) ELBO, the ELBO value when the algorithm stops; (7) ELBO_seq, the sequence of ELBO values.
Liu, W. and Q. Zhong (2024). High-dimensional covariate-augmented overdispersed Poisson factor model. arXiv preprint arXiv:2402.15071.
n <- 300; p <- 100
d <- 20; q <- 6; r <- 3
datlist <- gendata_simu(n=n, p=p, d=20, q=q, rank0=r)
str(datlist)
fitlist <- RR_COAP(X_count=datlist$X, Z = datlist$Z, q=6, rank_use=3)
str(fitlist)
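The epsELBO and maxIter arguments suggest a stopping rule based on the relative change of the ELBO between iterations. A minimal base-R sketch of such a rule follows. The helper elbo_rel_change() is hypothetical, the ELBO values are made up for illustration, and the exact formula used inside RR_COAP() is not stated in this manual, so the formula below is an assumption.

```r
# Hypothetical helper, not part of the coap package:
# relative change between consecutive ELBO values.
elbo_rel_change <- function(elbo_seq) {
  abs(diff(elbo_seq)) / abs(elbo_seq[-length(elbo_seq)])
}

# Made-up ELBO values, for illustration only.
elbo_seq <- c(-1000, -900, -899.9, -899.899)
epsELBO <- 1e-5

rel <- elbo_rel_change(elbo_seq)

# First iteration whose relative change falls below the tolerance
# (NA if the tolerance is never reached).
converged_at <- which(rel < epsELBO)[1] + 1
converged_at
```

The same helper could be applied to the ELBO_seq component of a fitted object to check how quickly a run levelled off.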
Select the number of factors and the rank of coefficient matrix in the covariate-augmented overdispersed Poisson factor model
selectParams( X_count, Z, multiFac = rep(1, nrow(X_count)), q_max = 15, r_max = 24, threshold = c(0.1, 0.01), verbose = TRUE, ... )
X_count |
a count matrix, the observed count matrix. |
Z |
a matrix, the covariate matrix; use a single column of ones if there are no additional covariates. |
multiFac |
an optional vector, the normalization factor for each unit; defaults to a vector of ones. |
q_max |
an optional integer, specifying the upper bound for the number of factors; defaults to 15. |
r_max |
an optional integer, specify the upper bound for the rank of the regression coefficient matrix; default as 24. |
threshold |
an optional positive vector of length 2, specifying the thresholds that filter the singular values of beta and B, respectively. |
verbose |
a logical value, whether to print information at each iteration. |
... |
other arguments passed to the function |
The thresholds filter out singular values with low signal, which helps identify the underlying model structure.
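The idea of filtering singular values can be illustrated in base R. The sketch below is not the procedure used by selectParams(), whose internals are not described in this manual. It only shows, under the assumption of a simple cutoff, how a threshold separates strong singular values from those produced by small noise. The helper count_signal() is hypothetical.

```r
# Hypothetical helper, not part of the coap package:
# count the singular values of M that exceed a cutoff.
count_signal <- function(M, thre) sum(svd(M)$d > thre)

set.seed(1)
# A 40 x 10 matrix of exact rank 3, plus a small dense perturbation.
A <- matrix(rnorm(40 * 3), 40, 3) %*% matrix(rnorm(3 * 10), 3, 10)
A_noisy <- A + 0.001 * matrix(rnorm(40 * 10), 40, 10)

svd(A_noisy)$d                 # three large values, the rest close to zero
count_signal(A_noisy, 0.1)     # counts only the strong singular values
count_signal(A_noisy, 1e-8)    # a near-zero cutoff counts the noise as well
```

A cutoff that is too small treats noise directions as structure, while a cutoff that is too large can discard weak but real signal, which is why the thresholds for beta and B are exposed as an argument.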
Returns a named vector with names 'hr' and 'hq', the estimated rank and the estimated number of factors, respectively.
n <- 300; p <- 100
d <- 20; q <- 6; r <- 3
datlist <- gendata_simu(seed=30, n=n, p=p, d=20, q=q, rank0=r)
str(datlist)
set.seed(1)
para_vec <- selectParams(X_count=datlist$X, Z = datlist$Z)
print(para_vec)