Ancestry-specific database for cross-tissue
Gene Expression Models
Welcome to AGEMdb, a web-based resource designed to support ancestry-specific Transcriptome-Wide Association Studies (TWAS).
AGEMdb offers a collection of pre-trained gene expression imputation models built using the cross-tissue approach from UTMOST (Hu, 2019). The models are trained on RNA-seq and whole-genome sequencing data from the Genotype-Tissue Expression (GTEx) project (Ardlie, 2015), separately for European ancestry and African American individuals. These ancestry-specific resources are more inclusive of genetic diversity and are intended to reduce bias in genetics and omics research.
Gene expression imputation models have primarily been based on European ancestry populations because available resources lack diversity. Since the success of TWAS depends on the accuracy of these models, training them on European ancestry data alone limits their transferability to other populations. AGEMdb addresses this issue by providing models for the two best-represented ancestry groups in GTEx: European ancestry and African American. By including a broader range of genetic backgrounds, AGEMdb potentially improves the accuracy of TWAS gene-trait association analyses and other omics studies.
Transcriptome-wide association studies (TWAS) link genetically regulated gene expression to complex traits or diseases. These studies typically use expression quantitative trait loci (eQTLs) to predict gene expression levels, which are then tested for association with traits. However, the effectiveness of TWAS depends heavily on the accuracy of gene expression imputation models, which are traditionally trained on reference panels consisting mostly of individuals of European descent. This limitation can reduce prediction accuracy for individuals from other ancestry groups. AGEMdb addresses this challenge by offering ancestry-specific imputation models built from GTEx data for both European ancestry and African American populations, covering 49 tissue types.
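The two TWAS steps described above (predict expression from genotypes, then test the prediction against a trait) can be illustrated with a minimal sketch. It is not the UTMOST implementation: the weight format, the SNP identifiers, and the function names are hypothetical, and the association test is a plain simple linear regression without covariates.

```python
from math import sqrt

def predict_expression(dosages, weights):
    """Impute one gene's expression per individual as a weighted sum of SNP dosages.

    dosages: list of dicts {snp_id: dosage between 0 and 2}, one per individual
    weights: dict {snp_id: weight} for one gene in one tissue (hypothetical format)
    """
    return [sum(w * person.get(snp, 0.0) for snp, w in weights.items())
            for person in dosages]

def association(predicted, trait):
    """Regress the trait on predicted expression; return (slope, t statistic)."""
    n = len(predicted)
    mean_x = sum(predicted) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, trait))
    beta = sxy / sxx
    # Residual variance with n - 2 degrees of freedom (intercept and slope).
    residuals = [(y - mean_y) - beta * (x - mean_x) for x, y in zip(predicted, trait)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)
    se = sqrt(sigma2 / sxx)
    return beta, beta / se

# Toy example with made-up weights, dosages, and trait values.
weights = {"rs1": 0.5, "rs2": -0.2}
dosages = [
    {"rs1": 0, "rs2": 1},
    {"rs1": 1, "rs2": 0},
    {"rs1": 2, "rs2": 2},
    {"rs1": 1, "rs2": 1},
]
trait = [1.0, 2.1, 2.4, 1.9]

predicted = predict_expression(dosages, weights)
beta, t_stat = association(predicted, trait)
```

In practice the weights would come from a downloaded model for the ancestry group matching the study cohort, and the association step would be run with a dedicated TWAS tool rather than this toy regression.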
Gene expression and genotype data from GTEx were used to create the AGEMdb models. After rigorous genotype quality control, data from 689 European ancestry and 111 African American individuals were retained (Pagnuco, 2025). Gene expression was adjusted for potential confounders, and UTMOST cross-tissue models (Hu, 2019) were derived separately for each ancestry group. These models were then integrated into the AGEMdb platform, allowing users to access them through a user-friendly web interface.
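The text does not state which confounders were used or how the adjustment was performed. One common approach, shown here purely as an assumed illustration, is to regress expression on the covariates and keep the residuals. The covariate names and values below are invented, and the helper functions are hypothetical.

```python
def center(values):
    """Subtract the mean so that an intercept is implicitly fitted."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def residualize(expression, covariates):
    """Return expression residuals after linear regression on the covariates.

    expression: list of expression values, one per individual
    covariates: list of covariate columns, each a list with one value per individual
    Uses Gram-Schmidt orthogonalisation of the centered covariates, which gives
    the same residuals as ordinary least squares with an intercept.
    """
    basis = []
    for column in covariates:
        v = center(column)
        for b in basis:
            coef = dot(v, b) / dot(b, b)
            v = [x - coef * y for x, y in zip(v, b)]
        if dot(v, v) > 1e-12:  # skip constant or collinear covariates
            basis.append(v)
    residuals = center(expression)
    for b in basis:
        coef = dot(residuals, b) / dot(b, b)
        residuals = [x - coef * y for x, y in zip(residuals, b)]
    return residuals

# Toy data: five individuals, two assumed covariates (age and sex).
age = [30, 45, 28, 60, 50]
sex = [0, 1, 0, 1, 1]
expression = [2.0, 3.5, 1.0, 4.2, 2.8]

adjusted = residualize(expression, [age, sex])
```

The adjusted values are uncorrelated with each covariate by construction, so a model trained on them captures variation that the covariates do not explain.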
AGEMdb is a web-based database hosting pre-trained gene expression imputation models trained on GTEx data. Users can browse, search, and download models, filtering by tissue type and gene. These models enable gene-trait association analysis in TWAS (with UTMOST and similar tools) by predicting tissue-specific gene expression from genotype data. The inclusion of models derived from African American individuals enhances inclusivity in TWAS resources.
The AGEMdb database is a valuable resource for ancestry-specific TWAS analysis, enhancing multi-tissue gene expression prediction accuracy across diverse populations.
We acknowledge funding support from the Medical Research Council (grant number MR/V020749/1). We are grateful for the assistance given by The University of Manchester IT Services and for the use of the Computational Shared Facility.
The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health (commonfund.nih.gov/GTEx). Additional funds were provided by the NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. Donors were enrolled at Biospecimen Source Sites funded by NCI/Leidos Biomedical Research, Inc. subcontracts to the National Disease Research Interchange (10XS170), Roswell Park Cancer Institute (10XS171), and Science Care, Inc. (X10S172). The Laboratory, Data Analysis, and Coordinating Center (LDACC) was funded through a contract (HHSN268201000029C) to The Broad Institute, Inc. Biorepository operations were funded through a Leidos Biomedical Research, Inc. subcontract to Van Andel Research Institute (10ST1035). Additional data repository and project management were provided by Leidos Biomedical Research, Inc. (HHSN261200800001E). The Brain Bank was supported by supplements to University of Miami grant DA006227. Statistical Methods development grants were made to the University of Geneva (MH090941 & MH101814), the University of Chicago (MH090951, MH090937, MH101825, & MH101820), the University of North Carolina - Chapel Hill (MH090936), North Carolina State University (MH101819), Harvard University (MH090948), Stanford University (MH101782), Washington University (MH101810), and to the University of Pennsylvania (MH101822). The datasets used for this research were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number phs000424.v9.p2.