Category 100315

Description

Genotype calls (based on genotyping array measurements) and related measurements. The genotypes are aligned to the + strand of the reference and positions are in GRCh37 coordinates.

The fields listed here are indicators: adding them to an Application basket allows researchers to download the corresponding underlying information from the UKB repository. This information includes:

Calls (0.1TB)
Confidences (2.9TB)
Intensities (2.9TB)
CNV B-allele frequencies (1.5TB)
CNV log2ratios (2.3TB)

Marker quality control (QC) and Sample QC information, including population structure (see Relatedness in the Notes below), is also available. Intensity and SNP posteriors are available for cluster plotting. B-allele frequency and log2ratio are available for CNV analysis.

The lists of SNPs in the Genotype datasets can be downloaded from each Field's Resources tab on a per-chromosome basis or as a combined tar in Resource 1963.

Researchers who want only a few specific genotyped SNPs (rather than the whole-chromosome datasets below) may, once logged in, request them either by identifying them individually using the Genomics Search function or by entering their Affy IDs directly using the List Actions tab of the Basket screen.

See Resource 530 for additional information, including a citable reference for publications.

Notes

Calls

The Confidence files contain the Affymetrix 'confidence' that a genotype belongs to the call cluster. Each is a plaintext file with space-separated columns. Values are in the range 0-1, with 0 being most confident. Missing values are represented by -1. The order of markers and samples is given by the BIM and FAM files.
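
The following is a minimal Python sketch of how such a confidence matrix might be read and its missing values masked. It assumes one row per marker and one column per sample (the layout stated for the CNV files; the text does not state it explicitly for confidences), and the function name and the 0.1 cut-off are purely illustrative. Real files are very large, so in practice a streaming or chunked read would be more appropriate than loading everything at once.

```python
import io
import numpy as np

def load_confidences(source):
    """Read a space-separated confidence matrix and mask missing (-1) entries."""
    values = np.loadtxt(source, dtype=np.float32, ndmin=2)
    return np.ma.masked_equal(values, -1.0)

# Tiny in-memory stand-in for a real file (assumed 2 markers x 3 samples).
demo = io.StringIO("0.01 0.20 -1\n0.00 0.05 0.50\n")
conf = load_confidences(demo)

# Count missing calls and flag confident ones (0 is most confident).
# The 0.1 threshold is an illustrative assumption, not a recommended value.
n_missing = int(np.ma.count_masked(conf))
confident = (conf <= 0.1).filled(False)
```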

The CNV files contain the B-Allele-Frequency (baf) and Log2Ratio (log2r) transformed intensity values for performing CNV calling. There is a separate file for baf and log2r per chromosome. These are plaintext files with space-separated columns. The rows correspond to markers (ordered as the calls BIM file) and the columns correspond to samples (ordered as the calls FAM file). Missing values are represented by -1.
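
As a rough illustration of the row/column layout described above, the Python sketch below walks a CNV file and the matching BIM file in step, so that each row of values is labelled with its marker. The function name and the toy marker IDs are hypothetical; the only format facts relied on are the ones stated above plus the standard PLINK .bim layout, in which the marker ID is the second column.

```python
import io
import numpy as np

def iter_cnv_rows(cnv_file, bim_file):
    """Yield (marker_id, values) per marker; missing (-1) entries become NaN."""
    for bim_line, cnv_line in zip(bim_file, cnv_file):
        marker_id = bim_line.split()[1]  # second column of a PLINK .bim line
        values = np.array([float(v) for v in cnv_line.split()])
        values[values == -1.0] = np.nan  # -1 is the documented missing code
        yield marker_id, values

# Toy stand-ins: 2 markers x 3 samples of B-allele frequencies.
bim = io.StringIO("1\trs1\t0\t1000\tA\tG\n1\trs2\t0\t2000\tC\tT\n")
baf = io.StringIO("0.5 -1 0.98\n0.02 0.51 1.0\n")
rows = list(iter_cnv_rows(baf, bim))
```

Reading one row at a time keeps memory use proportional to the number of samples rather than to the whole chromosome file.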

The Intensity files contain the A,B intensity data measured by Affymetrix. The files are in a simple custom binary format. There are two intensity values, A and B, for each genotype, each represented as a 4-byte float. The A,B values for each marker are ordered consecutively by sample (analogous to a matrix with rows=SNPs and columns=Samples), e.g.

SNP_1_SAMPLE_1_A SNP_1_SAMPLE_1_B SNP_1_SAMPLE_2_A SNP_1_SAMPLE_2_B ... SNP_1_SAMPLE_N_A SNP_1_SAMPLE_N_B SNP_2_SAMPLE_1_A SNP_2_SAMPLE_1_B ...

Missing pairs of intensities are represented by -1 -1. The order of the markers and samples is given by the BIM and FAM files supplied with the calls.
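
The binary layout above can be viewed as a three-dimensional array of shape (markers, samples, 2). The Python sketch below shows one way to do that with a memory map. The byte order is not stated above, so little-endian ("<f4") is an assumption here, and the function and file names are hypothetical; the number of samples would come from the FAM file.

```python
import os
import tempfile
import numpy as np

def open_intensities(path, n_samples, dtype="<f4"):
    """Memory-map an intensity file as an array of shape (markers, samples, 2)."""
    flat = np.memmap(path, dtype=dtype, mode="r")
    if flat.size % (2 * n_samples) != 0:
        raise ValueError("file size is not a whole number of markers")
    return flat.reshape(-1, n_samples, 2)

# Toy file: 2 markers x 3 samples, with one missing (-1 -1) pair.
demo = np.array([[[100, 900], [-1, -1], [500, 480]],
                 [[800, 120], [450, 470], [90, 950]]], dtype="<f4")
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_intensities.bin")
    demo.tofile(path)
    ab = np.array(open_intensities(path, n_samples=3))  # copy out of the map

# A pair is missing when both A and B carry the -1 code.
missing = (ab == -1).all(axis=-1)
```
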
Affymetrix transform the A,B values into 'contrast' and 'strength' for their calling algorithm. If the intensity data is to be used for making cluster plots it is strongly suggested that the transformed values are plotted. The ellipses described by the snp-posterior data are only compatible with the transformed intensity values.

contrast (X) = log2(A/B)

strength (Y) = log2(AB)/2
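
The two formulas above translate directly into code. In the Python sketch below, the function name is hypothetical, and returning NaN for non-positive inputs is an assumption made here so that the missing code (-1 -1) is not silently turned into a contrast of 0.

```python
import numpy as np

def contrast_strength(a, b):
    """Return (contrast, strength) = (log2(A/B), log2(A*B)/2) for A,B arrays."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    valid = (a > 0) & (b > 0)  # excludes the -1 -1 missing pairs
    contrast = np.full(a.shape, np.nan)
    strength = np.full(a.shape, np.nan)
    contrast[valid] = np.log2(a[valid] / b[valid])
    strength[valid] = np.log2(a[valid] * b[valid]) / 2
    return contrast, strength

# Three genotypes: A much larger than B, A equal to B, and a missing pair.
x, y = contrast_strength([1024.0, 256.0, -1.0], [256.0, 256.0, -1.0])
# x is approximately [2, 0, nan]; y is approximately [9, 8, nan]
```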

The Relatedness of individuals is obtained using the gfetch utility (Resource 668), which generates a plaintext file with 5 space-separated columns:

ID for participant 1 in related pair;

ID for participant 2 in related pair;

HetHet : fraction of markers for which the pair both have a heterozygous genotype;

IBS0 : fraction of markers for which the pair shares zero alleles;

Estimate of the kinship coefficient for this pair based on the set of markers used in the kinship inference.
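
A file with these five columns could be parsed along the following lines. This Python sketch uses hypothetical field names, made-up participant IDs and an illustrative kinship cut-off; whether the real file carries a header row is not stated above, so lines whose numeric columns fail to parse are simply skipped.

```python
import io
from typing import Iterator, NamedTuple, TextIO

class RelatedPair(NamedTuple):
    id1: str        # ID for participant 1 in the related pair
    id2: str        # ID for participant 2 in the related pair
    het_het: float  # fraction of markers where both are heterozygous
    ibs0: float     # fraction of markers where the pair shares zero alleles
    kinship: float  # estimated kinship coefficient

def read_related_pairs(handle: TextIO) -> Iterator[RelatedPair]:
    """Yield one RelatedPair per well-formed line of the relatedness file."""
    for line in handle:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        try:
            yield RelatedPair(fields[0], fields[1], *map(float, fields[2:]))
        except ValueError:
            continue  # e.g. a header row, if the file has one

# Toy input with made-up IDs and values.
demo = io.StringIO(
    "ID1 ID2 HetHet IBS0 Kinship\n"
    "1000001 1000002 0.05 0.012 0.25\n"
    "1000003 1000004 0.03 0.020 0.06\n"
)
pairs = list(read_related_pairs(demo))
close = [p for p in pairs if p.kinship >= 0.2]  # illustrative threshold only
```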

Field ID	Description
22418	Genotype calls
22419	Genotype confidences
22437	Genotype copy number variants B-allele frequencies
22431	Genotype copy number variants, log2ratios
22430	Genotype intensities

Category ID	Description	Items
263	Genotypes	+35

Name	Res ID
BIM (marker) files for genotype SNP data	1963
MD5 checksums for bulk genotype datasets	998
PLINK 1.9 file format reference	19
SNP Quality Control information	1955
SNP posterior batch list	1968
SNP posterior data	1817
UKBiLEVE GWAS results	2077
