Abstract
BackgroundGenetic variants that severely alter protein products (e.g. nonsense, frameshift) are often associated with disease. For some genes, these predicted loss-of-function variants (pLoFs) are observed throughout the gene, whilst in others, they occur only at specific locations. We hypothesised that, for genes linked with monogenic diseases that display incomplete penetrance, pLoF variants present in apparently unaffected individuals may be limited to regions where pLoFs are tolerated. To test this, we investigated whether pLoF location could explain instances of incomplete penetrance of variants expected to be pathogenic for Mendelian conditions.MethodsWe used exome sequence data in 454,773 individuals in the UK Biobank (UKB) to investigate the locations of pLoFs in a population cohort. We counted numbers of unique pLoF, missense, and synonymous variants in UKB in each quintile of the coding sequence (CDS) of all protein-coding genes and clustered the variants using Gaussian mixture models. We limited the analyses to genes with ≥ 5 variants of each type (16,473 genes). We compared the locations of pLoFs in UKB with all theoretically possible pLoFs in a transcript, and pathogenic pLoFs from ClinVar, and performed simulations to estimate the false-positive rate of non-uniformly distributed variants.ResultsFor most genes, all variant classes fell into clusters representing broadly uniform variant distributions, but genes in which haploinsufficiency causes developmental disorders were less likely to have uniform pLoF distribution than other genes (P < 2.2 × 10−6). We identified a number of genes, including ARID1B and GATA6, where pLoF variants in the first quarter of the CDS were rescued by the presence of an alternative translation start site and should not be reported as pathogenic. For other genes, such as ODC1, pLoFs were located approximately uniformly across the gene, but pathogenic pLoFs were clustered only at the end, consistent with a gain-of-function disease mechanism.ConclusionsOur results suggest the potential benefits of localised constraint metrics and that the location of pLoF variants should be considered when interpreting variants.</p>