Abstract
Non-crossover gene conversion is a type of meiotic recombination characterized by the non-reciprocal transfer of genetic material between homologous chromosomes. Gene conversions are thought to occur within relatively short tracts of DNA. In this study, we propose a statistical method to model the length distribution of gene conversion tracts in humans, using nearly one million gene conversion tracts detected from the UK Biobank whole autosome data. To handle the large number of tracts, we designed a computationally efficient inferential framework. Our method further accounts for regional variation in the density of variant sites and heterozygosity across the genome, which can influence the observed length of gene conversion tracts. We allow for multiple candidate tract length distributions and select the best fitting distribution using the Bayesian Information Criterion (BIC). Using a mixture of two geometric components for the tract length distribution, we estimate that the smaller component has a mean of 16.9 bp (95% CI: [16.4, 17.0]), and the larger component has a mean of 724.7 bp (95% CI: [720.1, 728.7]). We further estimate the proportion of tracts from the second component to be 0.00525 (95% CI: [0.005, 0.00525]). After stratifying by crossover-hotspot overlap, we infer that tracts whose midpoints lie within crossover hotspots are, on average, longer than the remaining tracts.</p>