AGEMdb

Ancestry-specific database for cross-tissue gene expression models

Welcome to AGEMdb, a web-based resource designed to support ancestry-specific transcriptome-wide association studies (TWAS).

AGEMdb offers a comprehensive collection of pre-trained gene expression imputation models built with the cross-tissue approach of UTMOST (Hu et al., 2019). The models are trained on RNA-seq and whole-genome sequencing data from the Genotype-Tissue Expression (GTEx) project (Ardlie et al., 2015), separately for individuals of European ancestry and for African-American individuals. Because they capture more of the genetic diversity present in GTEx, these ancestry-specific resources help reduce ancestry bias in genetics and omics research.

Getting started

Head over to the Weights tab in the navigation bar to start exploring. From there you can filter and download the model weights.
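This page does not describe the file format of the downloaded weights. Assuming they follow the PrediXcan-style SQLite layout that UTMOST-compatible tools commonly use (a `weights` table with `rsid`, `gene`, `weight`, `ref_allele` and `eff_allele` columns), a downloaded file could be inspected with Python's built-in `sqlite3` module as sketched below. The schema, the file name in the comment, and the gene and SNP values are all hypothetical, so check the actual column names in your download first.

```python
import sqlite3

def gene_weights(conn, gene_id):
    """Return (rsid, eff_allele, weight) rows for one gene.

    Assumes a PrediXcan-style `weights` table; this schema is an
    assumption, not a documented AGEMdb format.
    """
    cur = conn.execute(
        "SELECT rsid, eff_allele, weight FROM weights WHERE gene = ? ORDER BY rsid",
        (gene_id,),
    )
    return cur.fetchall()

# A tiny in-memory database with the assumed schema and made-up values stands in
# for a downloaded file (which would be opened with e.g. sqlite3.connect("Whole_Blood.db")).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weights (rsid TEXT, gene TEXT, weight REAL, ref_allele TEXT, eff_allele TEXT)"
)
conn.executemany(
    "INSERT INTO weights VALUES (?, ?, ?, ?, ?)",
    [
        ("rs0001", "GENE_A", 0.25, "A", "G"),
        ("rs0002", "GENE_A", -0.10, "C", "T"),
        ("rs0003", "GENE_B", 0.40, "G", "A"),
    ],
)
rows = gene_weights(conn, "GENE_A")
print(rows)  # [('rs0001', 'G', 0.25), ('rs0002', 'T', -0.1)]
```

Selecting only the effect allele and weight keeps the result small; the reference allele is still needed later if your genotypes must be aligned to the model.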

You are currently exploring the African-American ancestry database.
A separate European ancestry database is also available.

Key features

Ancestry-Specific Models: Independently trained models for individuals of European ancestry and for African-American individuals, covering 49 tissue types to support population-specific gene expression prediction.
TWAS-Optimized Models: Designed for TWAS analysis, these models predict ancestry-specific gene expression using a cross-tissue approach, enabling gene-trait association testing.
Flexible and Intuitive Interface: Browse, search, and download models with filters for tissue type and gene, simplifying access to the resources you need.
Inclusive and Unbiased Research: By including models for African-American populations, AGEMdb helps counteract the over-representation of European ancestry individuals in transcriptomic resources. This can improve the accuracy and generalizability of genomic predictions and supports a more equitable approach to omics research.

Why AGEMdb matters

Gene expression imputation models have primarily been trained on European ancestry populations because of the lack of diversity in available resources. Since the success of TWAS depends on the accuracy of these models, training them on European ancestry data alone limits their transferability to other populations. AGEMdb addresses this issue by providing models for the two best-represented ancestry groups in GTEx: European ancestry and African-American. By covering a broader range of genetic backgrounds, AGEMdb can improve the accuracy of TWAS gene-trait association analyses and other omics studies.

Detailed description

Background

Transcriptome-wide association studies (TWAS) link genetically regulated gene expression to complex traits or diseases. They typically use models trained on expression quantitative trait loci (eQTL) data to predict gene expression levels from genotypes, and then test the predicted expression for association with traits. The effectiveness of TWAS therefore depends heavily on the accuracy of these gene expression imputation models, which have traditionally been trained on reference panels made up mostly of individuals of European descent. This can reduce prediction accuracy for individuals from other ancestry groups. AGEMdb addresses this challenge by offering ancestry-specific imputation models built from GTEx data for both European ancestry and African-American populations, covering 49 tissue types.
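The two TWAS steps described above can be sketched as follows. Predicted expression for one gene is a weighted sum of effect-allele dosages, and the prediction is then tested against the trait. This minimal illustration uses simulated genotypes, hypothetical weights, and a generic simple-regression t statistic. It is not UTMOST's own test, which works from GWAS summary statistics and additionally combines single-tissue results into a cross-tissue test.

```python
import numpy as np

def predict_expression(dosages, weights):
    """Genetically regulated expression for one gene.

    dosages: (n_individuals, n_snps) effect-allele counts in [0, 2].
    weights: (n_snps,) per-SNP model weights.
    """
    return dosages @ weights

def association_test(pred_expr, trait):
    """Regress the trait on predicted expression (with an intercept).

    Returns (slope, t statistic for the slope).
    """
    x = pred_expr - pred_expr.mean()
    y = trait - trait.mean()
    sxx = x @ x
    slope = (x @ y) / sxx
    resid = y - slope * x
    sigma2 = (resid @ resid) / (len(y) - 2)  # residual variance, n - 2 d.o.f.
    se = np.sqrt(sigma2 / sxx)
    return slope, slope / se

rng = np.random.default_rng(0)
n, p = 500, 5
dosages = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # simulated genotypes
weights = np.array([0.5, 0.0, -0.3, 0.0, 0.2])  # hypothetical model weights
pred = predict_expression(dosages, weights)
trait = 0.8 * pred + rng.normal(size=n)  # simulated trait with a true effect of 0.8
slope, t_stat = association_test(pred, trait)
print(f"slope={slope:.2f}, t={t_stat:.1f}")
```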

Materials and methods

Gene expression and genotype data from GTEx were used to build the AGEMdb models. After rigorous genotype quality control, data from 689 European ancestry and 111 African-American individuals were retained (Pagnuco et al., 2025). Gene expression was adjusted for potential confounders, and UTMOST cross-tissue models (Hu et al., 2019) were then trained separately for each ancestry group. The resulting models were integrated into the AGEMdb platform, where users can access them through a web interface.
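The two modelling steps above can be sketched in simplified form. First, expression is residualised on a covariate matrix by least squares. The covariates here are simulated and hypothetical; the confounders actually adjusted for are described in Pagnuco et al. (2025). Second, a cross-tissue model in the spirit of UTMOST is fitted by proximal gradient descent on a sparse group-lasso objective: a lasso penalty within each tissue plus a group penalty on each SNP's weights across tissues. Each tissue may have a different set of individuals. The loss scaling, penalty values, and iteration count are arbitrary choices for illustration, not the settings used to build AGEMdb. The UTMOST software uses its own optimiser and tunes penalties by cross-validation.

```python
import numpy as np

def residualize(expr, covariates):
    """Remove covariate effects (plus an intercept) from expression by OLS."""
    design = np.column_stack([np.ones(len(expr)), covariates])
    beta, *_ = np.linalg.lstsq(design, expr, rcond=None)
    return expr - design @ beta

def fit_cross_tissue(X_list, y_list, lam1=0.05, lam2=0.05, n_iter=500):
    """Sparse group-lasso fit across tissues by proximal gradient descent.

    X_list[t]: centred genotypes (n_t x p) of individuals sampled in tissue t.
    y_list[t]: adjusted expression (n_t,). Returns W with shape (p, n_tissues).
    """
    p, T = X_list[0].shape[1], len(X_list)
    W = np.zeros((p, T))
    # Step size 1/L, where L bounds the Lipschitz constant of the loss gradient.
    L = max(np.linalg.eigvalsh(X.T @ X / len(X)).max() for X in X_list)
    step = 1.0 / L
    for _ in range(n_iter):
        # Gradient of sum_t (1 / (2 n_t)) * ||y_t - X_t w_t||^2, one column per tissue.
        grad = np.column_stack([
            X.T @ (X @ W[:, t] - y) / len(y)
            for t, (X, y) in enumerate(zip(X_list, y_list))
        ])
        V = W - step * grad
        # Lasso part of the proximal step: elementwise soft-thresholding.
        Z = np.sign(V) * np.maximum(np.abs(V) - step * lam1, 0.0)
        # Group part: shrink each SNP's row of weights across tissues.
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        W = np.maximum(1.0 - step * lam2 / np.maximum(norms, 1e-12), 0.0) * Z
    return W

rng = np.random.default_rng(1)
p = 20
true_w = np.zeros(p)
true_w[[2, 7]] = [0.8, -0.6]  # two simulated eQTLs shared by all tissues
X_list, y_list = [], []
for n_t in (150, 120, 90):  # tissues sampled in different numbers of individuals
    X = rng.binomial(2, 0.4, size=(n_t, p)).astype(float)
    X -= X.mean(axis=0)  # centre genotypes so no intercept is needed
    covs = rng.normal(size=(n_t, 2))  # hypothetical confounders
    y = X @ true_w + covs @ np.array([1.0, -0.5]) + rng.normal(scale=0.3, size=n_t)
    X_list.append(X)
    y_list.append(residualize(y, covs))
W = fit_cross_tissue(X_list, y_list)
print(np.round(W[[2, 7]], 2))
```

The group penalty lets a SNP that is informative in several tissues borrow strength across them, while the lasso penalty still allows tissue-specific zeros.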

Results

AGEMdb is a web-based database hosting pre-trained gene expression imputation models derived from GTEx data. Users can browse, search, and download models, filtering by tissue type and gene. By predicting tissue-specific gene expression from genotype data, these models enable gene-trait association analysis in TWAS with UTMOST and similar tools. The inclusion of models derived from African-American individuals makes TWAS resources more inclusive.
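Applying downloaded weights to a new genotype dataset usually requires aligning alleles. If the cohort's dosages count a different allele than the model's effect allele, each dosage must be flipped to 2 - dosage before it is multiplied by the weight. The sketch below predicts one gene's expression in several tissues at once. The dictionary layout, SNP IDs, tissue names, and weights are hypothetical. Strand-ambiguous SNPs (A/T, C/G) and SNPs with non-matching alleles are simply skipped, which is one common but not universal choice.

```python
import numpy as np

# Allele pairs whose strand cannot be resolved from the alleles alone.
AMBIGUOUS = {frozenset("AT"), frozenset("CG")}

def predict_multi_tissue(dosages, cohort_alleles, model):
    """Predict one gene's expression in several tissues.

    dosages: dict rsid -> array of counted-allele dosages (0-2) per individual.
    cohort_alleles: dict rsid -> (other_allele, counted_allele) in the cohort.
    model: dict tissue -> list of (rsid, ref_allele, eff_allele, weight).
    Returns dict tissue -> predicted expression array.
    """
    n = len(next(iter(dosages.values())))
    predictions = {}
    for tissue, snps in model.items():
        pred = np.zeros(n)
        for rsid, ref, eff, weight in snps:
            if rsid not in dosages or frozenset((ref, eff)) in AMBIGUOUS:
                continue  # SNP missing from the cohort, or strand-ambiguous
            other, counted = cohort_alleles[rsid]
            if counted == eff and other == ref:
                d = dosages[rsid]
            elif counted == ref and other == eff:
                d = 2.0 - dosages[rsid]  # cohort counts the model's reference allele
            else:
                continue  # alleles do not match (e.g. a strand flip): skip
            pred += weight * d
        predictions[tissue] = pred
    return predictions

dosages = {"rs1": np.array([0.0, 1.0, 2.0]), "rs2": np.array([2.0, 1.0, 0.0])}
cohort_alleles = {"rs1": ("A", "G"), "rs2": ("C", "T")}
model = {
    "Whole_Blood": [("rs1", "A", "G", 0.5), ("rs2", "T", "C", 0.2)],
    "Liver": [("rs1", "A", "G", -0.1)],
}
preds = predict_multi_tissue(dosages, cohort_alleles, model)
print(preds)
```

Here `rs2` is counted on the opposite allele in the cohort, so its dosages are flipped before they contribute to the Whole_Blood prediction.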

Conclusions

AGEMdb is a valuable resource for ancestry-specific TWAS analysis, supporting more accurate multi-tissue gene expression prediction across diverse populations.

Acknowledgements

We acknowledge funding support from the Medical Research Council (grant number MR/V020749/1). We are grateful for the assistance given by The University of Manchester IT Services and for the use of the Computational Shared Facility.

The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health (commonfund.nih.gov/GTEx). Additional funds were provided by the NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. Donors were enrolled at Biospecimen Source Sites funded by NCI/Leidos Biomedical Research, Inc. subcontracts to the National Disease Research Interchange (10XS170), Roswell Park Cancer Institute (10XS171), and Science Care, Inc. (X10S172). The Laboratory, Data Analysis, and Coordinating Center (LDACC) was funded through a contract (HHSN268201000029C) to The Broad Institute, Inc. Biorepository operations were funded through a Leidos Biomedical Research, Inc. subcontract to Van Andel Research Institute (10ST1035). Additional data repository and project management were provided by Leidos Biomedical Research, Inc. (HHSN261200800001E). The Brain Bank was supported by supplements to University of Miami grant DA006227. Statistical Methods development grants were made to the University of Geneva (MH090941 & MH101814), the University of Chicago (MH090951, MH090937, MH101825, & MH101820), the University of North Carolina - Chapel Hill (MH090936), North Carolina State University (MH101819), Harvard University (MH090948), Stanford University (MH101782), Washington University (MH101810), and to the University of Pennsylvania (MH101822). The datasets used for this research were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number phs000424.v9.p2.

Bibliography

GTEx Consortium (Ardlie, K. G. et al.). (2015). The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science, 348(6235), 648-660. doi: 10.1126/science.1262110.
Hu, Y. et al. (2019). A statistical framework for cross-tissue transcriptome-wide association analysis. Nature Genetics, 51, 568-576. doi: 10.1038/s41588-019-0345-7.
Pagnuco, I., Eyre, S., et al. (2025). Transferability of Single- and Cross-Tissue Transcriptome Imputation Models Across Ancestry Groups. 49(1).