Martin-Geary, Alexandra C, Blakes, Alexander J M, Dawes, Ruebena, Findlay, Scott D, Lord, Jenny, Dong, Shan, Walker, Susan, Talbot-Martin, Jonathan, Wieder, Nechama, D'Souza, Elston N, Fernandes, Maria, Hilton, Sarah, Lahiri, Nayana, Campbell, Christopher, Jenkinson, Sarah, DeGoede, Christian G E L, Anderson, Emily R, Candler, Toby, Firth, Helen, Burge, Christopher B, Sanders, Stephan J, Ellingford, Jamie, Baralle, Diana, Banka, Siddharth and Whiffin, Nicola
(2025)
Systematic identification of disease-causing promoter and untranslated region variants in 8040 undiagnosed individuals with rare disease.
Genome Medicine, 17 (1).
40.
ISSN 1756-994X
Abstract
Both promoters and untranslated regions (UTRs) have critical regulatory roles, yet variants in these regions are largely excluded from clinical genetic testing due to difficulty in interpreting pathogenicity. The extent to which these regions may harbour diagnoses for individuals with rare disease is currently unknown. We present a framework for the identification and annotation of potentially deleterious proximal promoter and UTR variants in known dominant disease genes. We use this framework to annotate de novo variants (DNVs) in 8040 undiagnosed individuals in the Genomics England 100,000 Genomes Project, which were subject to strict region-based filtering, clinical review, and validation studies where possible. In addition, we performed region and variant annotation-based burden testing in 7862 unrelated probands against matched unaffected controls. We prioritised eleven DNVs and identified an additional variant overlapping one of the eleven. Ten of these twelve variants (83%) are in genes that are a strong match to the individual's phenotype and six had not previously been identified. Through burden testing, we did not observe a significant enrichment of potentially deleterious promoter and/or UTR variants in individuals with rare disease collectively across any of our region or variant annotations. Whilst screening promoters and UTRs can uncover additional diagnoses for individuals with rare disease, including these regions in diagnostic pipelines is not likely to dramatically increase diagnostic yield. Nevertheless, we provide a framework to aid identification of these variants.
Item Type: |
Article |
Peer-reviewed: |
Yes |
Date Deposited: |
28 Apr 2025 11:08 |
Publisher: |
BMC |
Additional Information: |
This is an open access article published in Genome Medicine, by BMC. |
Divisions: |
Faculties > Science and Engineering Faculties > Science and Engineering > Department of Sport and Exercise Sciences |
Subject terms: |
Genetic Predisposition to Disease, Non-coding, Molecular Sequence Annotation, Promoter Regions, Genetic, Regulatory regions, Splicing, Rare Diseases - genetics - diagnosis, Untranslated Regions, Genetic Variation, Promoters, Genetic Testing, Humans, Untranslated regions, Rare disease |
Data Access Statement: |
The datasets supporting the conclusions of this article are available in the ‘near-coding annotation’ GitHub repository [85]. This repository includes data and scripts relating to MANE transcripts of ‘green’ PanelApp genes with a dominant mode of inheritance (GRCh38), including: .tsv, .bed and .bb files containing the coordinates of UTR and promoter regions; the coordinates of annotation features not already available via the Ensembl VEP, in .tsv format; and all scripts used to annotate near-coding variants that fall within regulatory elements (https://github.com/Computational-Rare-Disease-Genomics-WHG/Near_coding_annotation). In addition, a UCSC genome browser public session entitled ‘CRDG Near coding regions’ Martin-Geary et al ‘24’ has been made available. This session contains a custom track showing near-coding regions, including UTR exons, introns, and promoter regions in PanelApp green genes, and can be accessed via the UCSC Public Sessions portal: https://genome.ucsc.edu/cgi-bin/hgPublicSessions. This research was made possible through access to data in the National Genomic Research Library, which is managed by Genomics England Limited (a wholly owned company of the Department of Health and Social Care). The National Genomic Research Library holds data provided by patients and collected by the NHS as part of their care and data collected as part of their participation in research. All data from the Genomics England 100k Genomes Project v15 (26.05.2022), including genetic, phenotypic and RNA-seq data, were accessed via the Genomics England Trusted Research Environment (TRE) and are available to registered users of the National Genomic Research Library through the TRE platform. Detailed methylation data used in this analysis cannot be shared due to EpiSign-related proprietary issues and legal constraints on the redistribution of NHS patient data of this kind.
If required, part of these data may be made available upon request subject to gaining the necessary approvals. |
URI: |
https://e-space.mmu.ac.uk/id/eprint/639697 |
DOI: |
https://doi.org/10.1186/s13073-025-01464-2 |