
Deletion size-range ascertained by different modes of SV discovery. Three groups are visible, with AS and SR, PD and RP, as well as RD and ‘RL’ (RP analysis involving relatively long-range (≥1 kb) insert size libraries, resulting in a different deletion detection size range compared to the predominantly used <500 bp insert size libraries), respectively, ascertaining similar size-ranges. Pie charts display the contribution of different SV discovery modes to the release set. Outer pie = based on number of SV calls; inner pie = based on total number of variable nucleotides. Of note, not all approaches were applied across all individuals (see Supplementary Table 2).

Sensitivity and FDR estimates for individual deletion discovery methods based on gold standard sets for individuals sequenced at high coverage (NA12878) and low coverage (NA12156), respectively. All depicted estimates are summarized in Supplementary Tables 3, 4 and 6. Vertical dotted lines correspond to the specificity threshold (FDR≤10%).

Breakpoint mapping resolution of three deletion discovery methods (the respective method names are in Supplementary Table 2). The blue and red histograms are the breakpoint residuals for predicted deletion start and end coordinates, respectively, relative to assembled coordinates (here assessed in low-coverage data). The horizontal lines at the top of each plot mark the 98% confidence intervals (labeled for each panel), with vertical notches indicating the positions of the most probable breakpoint (the distribution mode).

Breakpoint junction homology/microhomology length plotted as a function of SV size for SVs originally identified as deletions compared to a human reference. Dots are colored according to the SVs’ classification as deletions, insertions/duplications, or “undetermined” relative to inferred ancestral genomic loci. Gray lines mark groups of SVs likely formed by a common mechanism. The diagonal highlights tandem duplications (and a few reciprocal deletion events), in which the length of the duplicated sequence correlates linearly with the length of the longest breakpoint junction sequence identity stretch. The ellipses indicate MEIs, i.e., Alu (~300 bp) and L1 (~6 kb) insertions, associated with target site duplications of up to 28 bp in size at the breakpoints.
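The sensitivity and FDR estimates referred to in the legend can be illustrated with a minimal sketch. It assumes that a call and a gold standard deletion match when they share at least 50% reciprocal overlap, and that calls without a gold standard match count as false discoveries. Both are illustrative assumptions rather than the criteria used in the study, and the names `reciprocal_overlap`, `sensitivity_and_fdr` and `min_ro` are hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end), half-open coordinates; a hypothetical representation
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Return the smaller of the two overlap fractions between a and b."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def sensitivity_and_fdr(
    calls: List[Interval], gold: List[Interval], min_ro: float = 0.5
) -> Tuple[float, float]:
    """Estimate sensitivity and FDR of a call set against a gold standard.

    Assumption: unmatched calls are treated as false discoveries.
    """
    # Gold standard deletions recovered by at least one call
    found = sum(
        1 for g in gold if any(reciprocal_overlap(g, c) >= min_ro for c in calls)
    )
    # Calls supported by at least one gold standard deletion
    supported = sum(
        1 for c in calls if any(reciprocal_overlap(c, g) >= min_ro for g in gold)
    )
    sensitivity = found / len(gold) if gold else 0.0
    fdr = (len(calls) - supported) / len(calls) if calls else 0.0
    return sensitivity, fdr


# Toy data, invented for illustration only
gold_set = [("chr1", 100, 200), ("chr1", 1000, 2000), ("chr2", 50, 150)]
call_set = [("chr1", 110, 210), ("chr1", 5000, 6000)]
sens, fdr = sensitivity_and_fdr(call_set, gold_set)
```

With this toy input, one of three gold standard deletions is recovered and one of two calls is unsupported. In practice an FDR estimate may instead rest on experimental validation of a sample of calls, in which case only the sensitivity part of the sketch applies.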

Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact.

Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions.

Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.
