Title: | ALLelic Spectrum of Pleiotropy Informed Correlated Effects |
---|---|
Description: | Provides statistical tools to analyze heterogeneous effects of rare variants within genes that are associated with multiple traits. The package implements methods for assessing pleiotropic effects and identifying allelic heterogeneity, which can be useful in large-scale genetic studies. Methods include likelihood-based statistical tests to assess these effects. For more details, see Lu et al. (2024) <doi:10.1101/2024.10.01.614806>. |
Authors: | Wenhan Lu [aut, cre] |
Maintainer: | Wenhan Lu <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.9 |
Built: | 2024-11-17 05:56:31 UTC |
Source: | https://github.com/cran/ALLSPICER |
ALLSPICE (ALLelic Spectrum of Pleiotropy Informed Correlated Effects)
ALLSPICE( data, pheno_corr, n_ind, gene = "GENENAME", pheno1 = "PHENO1", pheno2 = "PHENO2", beta1_field = "BETA1", beta2_field = "BETA2", af_field = "AF" )
ALLSPICE( data, pheno_corr, n_ind, gene = "GENENAME", pheno1 = "PHENO1", pheno2 = "PHENO2", beta1_field = "BETA1", beta2_field = "BETA2", af_field = "AF" )
data |
Input data with number of rows indicating number of variants, three columns are required: 1) effect sizes of variants for phenotype 1, 2) effect sizes of variants for phenotype 2, 3) allele frequency of variants Note: this should include variants from ONE gene that is associated with the two phenotypes, preferably of the SAME functional category after being filtered to variants with allele frequency below a certain threshold (e.g. 1e-4) |
pheno_corr |
phenotypic correlation between the two phenotypes being tested |
n_ind |
total number of individuals |
gene |
name of the gene being tested, default 'GENENAME' |
pheno1 |
descriptive name of phenotype 1, default 'PHENO1' |
pheno2 |
descriptive name of phenotype 2, default 'PHENO2' |
beta1_field |
field name for effect sizes of variants on phenotype 1, default 'BETA1' |
beta2_field |
field name for effect sizes of variants on phenotype 2, default 'BETA2' |
af_field |
field name for allele frequencies of variants, default 'AF' |
A list of summary statistics from ALLSPICE test including phenotype names, gene names, MLE of slope c, ALLSPICE test statistic - lambda, pvalue from a chi-square distribution, total number of variants being tested
data <- data.frame(x = rnorm(10), y = rnorm(10), z = runif(10, 0,1)) ALLSPICE(data,pheno_corr=0.5,n_ind=10000,beta1_field='x',beta2_field='y',af_field='z')
data <- data.frame(x = rnorm(10), y = rnorm(10), z = runif(10, 0,1)) ALLSPICE(data,pheno_corr=0.5,n_ind=10000,beta1_field='x',beta2_field='y',af_field='z')
Simulate data and run ALLSPICE
ALLSPICE_simulation(n_ind, n_var, c, r, pi, sigma, mle = TRUE, null = TRUE)
ALLSPICE_simulation(n_ind, n_var, c, r, pi, sigma, mle = TRUE, null = TRUE)
n_ind |
total number of individuals |
n_var |
total number of variants |
c |
slope between the two sets of variant effect sizes, only applicable when 'null' == TRUE |
r |
phenotypic correlation between the two phenotypes |
pi |
probability of variant of having no effect on the phenotype |
sigma |
variance of the two sets of effect sizes |
mle |
whether to use MLE of c to compute the test statistic, use true c value if FALSE |
null |
whether to simulate data under the null hypothesis (no linear relationship) or the alternative hypothesis |
A list of two pieces of results: 1) ALLSPICE test results 2) effect size table: true effect size simulated, effect size estimate from linear model, effect size estimated from MLE
ALLSPICE_simulation(n_ind=10000, n_var=100, c=0.6, r=0.5, pi=0.5, sigma=1, mle = TRUE, null=TRUE)
ALLSPICE_simulation(n_ind=10000, n_var=100, c=0.6, r=0.5, pi=0.5, sigma=1, mle = TRUE, null=TRUE)
data formatting function: format raw data to be loaded into ALLSPICE
format_ALLSPICE_data(data, beta1_field, beta2_field, af_field)
format_ALLSPICE_data(data, beta1_field, beta2_field, af_field)
data |
raw input data |
beta1_field |
field name of effect size for the first phenotype |
beta2_field |
field name of effect size for the second phenotype |
af_field |
field name of allele frequency information |
a data frame containing effect sizes of variants on two phenotypes and their allele frequency information
data <- data.frame(x = rnorm(10), y = rnorm(10), z = runif(10, 0,1)) data <- format_ALLSPICE_data(data=data, beta1_field = 'x', beta2_field = 'y', af_field = 'z')
data <- data.frame(x = rnorm(10), y = rnorm(10), z = runif(10, 0,1)) data <- format_ALLSPICE_data(data=data, beta1_field = 'x', beta2_field = 'y', af_field = 'z')
simulation function: simulate allele count information for 'n_var' variants, with a maximum allele count 'max_cnt'
get_ac_mat(n_var, max_cnt = 100)
get_ac_mat(n_var, max_cnt = 100)
n_var |
total number of variants |
max_cnt |
maximum allele count, default 100 |
A 'n_var'x'n_var' diagnal matrix of allele count information for 'n_var' variants
ac_mat <- get_ac_mat(n_var=100, max_cnt = 100)
ac_mat <- get_ac_mat(n_var=100, max_cnt = 100)
simulation function: compute allele frequency information variants with allele counts stored in diagonal matrix 'AC' from a population of sample size 'n_ind'
get_af_mat(AC, n_ind)
get_af_mat(AC, n_ind)
AC |
a diagonal matrix of allele count information for all variants |
n_ind |
total number of individuals in the population |
A 'n_var'x'n_var' diagnal matrix of allele frequency information for 'n_var' (dimension of 'AC') variants
af_mat <- get_af_mat(AC = c(20, 50, 10, 1, 5), n_ind = 10000)
af_mat <- get_af_mat(AC = c(20, 50, 10, 1, 5), n_ind = 10000)
simulation function: compute effect sizes estimated form linear regression model
get_beta_hat(Y, X, A, n_ind)
get_beta_hat(Y, X, A, n_ind)
Y |
phenotype information |
X |
genotype information |
A |
Allele frequency information |
n_ind |
total number of individuals |
A 2x'n_var' matrix of estimated effect size information (first row corresponds to the first phenotype, second row corresponds to the second phenotype)
AC <- get_ac_mat(n_var=100) A <- get_af_mat(AC=AC, n_ind=10000) X <- get_geno_mat(AC, n_ind=10000) b <- get_true_beta(n_var=100, c=0.6, pi=0.5, sigma=1, null=TRUE) Y <- get_pheno_pair(b=b, X=X, r=0.5) b_hat <- get_beta_hat(Y=Y, X=X, A=A, n_ind=10000)
AC <- get_ac_mat(n_var=100) A <- get_af_mat(AC=AC, n_ind=10000) X <- get_geno_mat(AC, n_ind=10000) b <- get_true_beta(n_var=100, c=0.6, pi=0.5, sigma=1, null=TRUE) Y <- get_pheno_pair(b=b, X=X, r=0.5) b_hat <- get_beta_hat(Y=Y, X=X, A=A, n_ind=10000)
ALLSPICE function: compute the slope 'c' that maximize the likelihood (maximum likelihood estimate - MLE)
get_c_hat(b1_hat, b2_hat, A, r)
get_c_hat(b1_hat, b2_hat, A, r)
b1_hat |
estimated effect size of the first phenotype across all variants |
b2_hat |
estimated effect size of the second phenotype across all variants |
A |
Allele frequency information |
r |
phenotypic correlation between the two phenotypes |
the MLE of slope between two sets of effect sizes
AC <- get_ac_mat(n_var=100) A <- get_af_mat(AC=AC, n_ind=10000) X <- get_geno_mat(AC, n_ind=10000) b <- get_true_beta(n_var=100, c=0.6, pi=0.5, sigma=1, null=TRUE) Y <- get_pheno_pair(b=b, X=X, r=0.5) b_hat <- get_beta_hat(Y=Y, X=X, A=A, n_ind=10000) b1_hat <- matrix(b_hat[1, ], nrow = 1) b2_hat <- matrix(b_hat[2, ], nrow = 1) c_hat <- get_c_hat(b1_hat=b1_hat, b2_hat=b2_hat, A=A, r=0.5)
AC <- get_ac_mat(n_var=100) A <- get_af_mat(AC=AC, n_ind=10000) X <- get_geno_mat(AC, n_ind=10000) b <- get_true_beta(n_var=100, c=0.6, pi=0.5, sigma=1, null=TRUE) Y <- get_pheno_pair(b=b, X=X, r=0.5) b_hat <- get_beta_hat(Y=Y, X=X, A=A, n_ind=10000) b1_hat <- matrix(b_hat[1, ], nrow = 1) b2_hat <- matrix(b_hat[2, ], nrow = 1) c_hat <- get_c_hat(b1_hat=b1_hat, b2_hat=b2_hat, A=A, r=0.5)
simulation function: simulate genotype information for a set of loci with allele counts 'AC'
get_geno_mat(AC, n_ind)
get_geno_mat(AC, n_ind)
AC |
allele counts of loci (length 'm') |
n_ind |
total number of indicitions |
An 'n_ind'x'm' matrix of genotype information of 'n_ind' individuals and 'm' variants
geno_mat <- get_geno_mat(AC = c(20, 50, 10, 1, 5), n_ind = 10000)
geno_mat <- get_geno_mat(AC = c(20, 50, 10, 1, 5), n_ind = 10000)
ALLSPICE function: compute the maximum likelihood ratio of the ALLSPICE test statistic
get_likelihood_test_stats(n_ind, r, b1_hat, b2_hat, c, A)
get_likelihood_test_stats(n_ind, r, b1_hat, b2_hat, c, A)
n_ind |
total number of individuals |
r |
phenotypic correlation between the two phenotypes |
b1_hat |
estimated effect size of the first phenotype across all variants |
b2_hat |
estimated effect size of the second phenotype across all variants |
c |
MLE of the slope between the two sets of variant effect sizes |
A |
Allele frequency information |
A single numeric value representing the test statistic of ALLSPICE (maximum likelihood ratio)
AC <- get_ac_mat(n_var=100) A <- get_af_mat(AC=AC, n_ind=10000) X <- get_geno_mat(AC, n_ind=10000) b <- get_true_beta(n_var=100, c=0.6, pi=0.5, sigma=1, null=TRUE) Y <- get_pheno_pair(b=b, X=X, r=0.5) b_hat <- get_beta_hat(Y=Y, X=X, A=A, n_ind=10000) b1_hat <- matrix(b_hat[1, ], nrow = 1) b2_hat <- matrix(b_hat[2, ], nrow = 1) c_hat <- get_c_hat(b1_hat=b1_hat, b2_hat=b2_hat, A=A, r=0.5) lambda <- get_likelihood_test_stats(n_ind=10000, r=0.5, b1_hat=b1_hat, b2_hat=b2_hat, c=c_hat, A=A)
AC <- get_ac_mat(n_var=100) A <- get_af_mat(AC=AC, n_ind=10000) X <- get_geno_mat(AC, n_ind=10000) b <- get_true_beta(n_var=100, c=0.6, pi=0.5, sigma=1, null=TRUE) Y <- get_pheno_pair(b=b, X=X, r=0.5) b_hat <- get_beta_hat(Y=Y, X=X, A=A, n_ind=10000) b1_hat <- matrix(b_hat[1, ], nrow = 1) b2_hat <- matrix(b_hat[2, ], nrow = 1) c_hat <- get_c_hat(b1_hat=b1_hat, b2_hat=b2_hat, A=A, r=0.5) lambda <- get_likelihood_test_stats(n_ind=10000, r=0.5, b1_hat=b1_hat, b2_hat=b2_hat, c=c_hat, A=A)
ALLSPICE function: compute the effect size estimates that maximize the likelihood (maximum likelihood estimate - MLE) conditioning on c
get_mle_beta(b1_hat, b2_hat, c, r, null = TRUE)
get_mle_beta(b1_hat, b2_hat, c, r, null = TRUE)
b1_hat |
estimated effect size of the first phenotype across all variants |
b2_hat |
estimated effect size of the second phenotype across all variants |
c |
slope between the two sets of variant effect sizes, only applicable when 'null' == TRUE |
r |
phenotypic correlation between the two phenotypes |
null |
whether to simulate data under the null hypothesis (no linear relationship) or the alternative hypothesis |
A 2x'n_var' matrix of MLE estimated effect size information (first row corresponds to the first phenotype, second row corresponds to the second phenotype)
AC <- get_ac_mat(n_var=100) A <- get_af_mat(AC=AC, n_ind=10000) X <- get_geno_mat(AC, n_ind=10000) b <- get_true_beta(n_var=100, c=0.6, pi=0.5, sigma=1, null=TRUE) Y <- get_pheno_pair(b=b, X=X, r=0.5) b_hat <- get_beta_hat(Y=Y, X=X, A=A, n_ind=10000) b1_hat <- matrix(b_hat[1, ], nrow = 1) b2_hat <- matrix(b_hat[2, ], nrow = 1) b_mle <- get_mle_beta(b1_hat=b1_hat, b2_hat=b2_hat, c=0.6, r=0.5, null=TRUE)
AC <- get_ac_mat(n_var=100) A <- get_af_mat(AC=AC, n_ind=10000) X <- get_geno_mat(AC, n_ind=10000) b <- get_true_beta(n_var=100, c=0.6, pi=0.5, sigma=1, null=TRUE) Y <- get_pheno_pair(b=b, X=X, r=0.5) b_hat <- get_beta_hat(Y=Y, X=X, A=A, n_ind=10000) b1_hat <- matrix(b_hat[1, ], nrow = 1) b2_hat <- matrix(b_hat[2, ], nrow = 1) b_mle <- get_mle_beta(b1_hat=b1_hat, b2_hat=b2_hat, c=0.6, r=0.5, null=TRUE)
simulation function: simulate true phenotype values of a pair of phenotypes
get_pheno_pair(b, X, r)
get_pheno_pair(b, X, r)
b |
true effect size matrix of variants on the two phenotypes |
X |
genotype matrix |
r |
phenotypic correlation between the two phenotypes |
A 2x'n_ind' matrix of phenotype information (first row corresponds to the first phenotype, second row corresponds to the second phenotype)
AC <- get_ac_mat(n_var=100) X <- get_geno_mat(AC, n_ind=10000) b <- get_true_beta(n_var=100, c=0.6, pi=0.5, sigma=1, null=TRUE) Y <- get_pheno_pair(b=b, X=X, r=0.5)
AC <- get_ac_mat(n_var=100) X <- get_geno_mat(AC, n_ind=10000) b <- get_true_beta(n_var=100, c=0.6, pi=0.5, sigma=1, null=TRUE) Y <- get_pheno_pair(b=b, X=X, r=0.5)
simulation function: simulate genotype information for one locus, where 'cnt' samples out of 'n_ind' has the mutation
get_single_geno(cnt, n_ind)
get_single_geno(cnt, n_ind)
cnt |
number of individuals with the mutation |
n_ind |
total number of individuals |
A binary vector representing the genotype information of 'n_ind' individuals for a particular locus, where 'cnt' entries has value 1.
geno <- get_single_geno(cnt = 100, n_ind = 10000)
geno <- get_single_geno(cnt = 100, n_ind = 10000)
simulation function: simulate true effect size information of 'n_var' variants for two phenotypes
get_true_beta(n_var, c, pi, sigma, null = TRUE)
get_true_beta(n_var, c, pi, sigma, null = TRUE)
n_var |
total number of variants |
c |
slope between the two sets of variant effect sizes, only applicable when 'null' == TRUE |
pi |
probability of variant of having no effect on the phenotype |
sigma |
variance of the two sets of effect sizes |
null |
whether to simulate data under the null hypothesis (no linear relationship) or the alternative hypothesis |
A 2x'n_var' matrix of effect size information for 'n_var' variants (first row corresponds to the first phenotype, second row corresponds to the second phenotype)
true_beta <- get_true_beta(n_var=100, c=0.6, pi=0.5, sigma=1, null=TRUE)
true_beta <- get_true_beta(n_var=100, c=0.6, pi=0.5, sigma=1, null=TRUE)