VCFv4.3 and BCFv2.2 27 Jul 2021
The Variant Call Format Specification
VCFv4.3 and BCFv2.2 22 Aug 2022
The master version of this document can be found at . This printing is version 8073fda from that repository, last modified on the date shown above.
Contents
1 The VCF specification
1.1 An example
1.2 Character encoding, non-printable characters and characters with special meaning
1.3 Data types
1.4 Meta-information lines
1.4.1 File format
1.4.2 Information field format
1.4.3 Filter field format
1.4.4 Individual format field format
1.4.5 Alternative allele field format
1.4.6 Assembly field format
1.4.7 Contig field format
1.4.8 Sample field format
1.4.9 Pedigree field format
1.5 Header line syntax
1.6 Data lines
1.6.1 Fixed fields
1.6.2 Genotype fields
2 Understanding the VCF format and the haplotype representation
2.1 VCF tag naming conventions
3 INFO keys used for structural variants
4 FORMAT keys used for structural variants
5 Representing variation in VCF records
5.1 Creating VCF entries for SNPs and small indels
5.1.1 Example 1
5.1.2 Example 2
5.1.3 Example 3
5.2 Decoding VCF entries for SNPs and small indels
5.2.1 SNP VCF record
5.2.2 Insertion VCF record
5.2.3 Deletion VCF record
5.2.4 Mixed VCF record for a microsatellite
5.3 Encoding Structural Variants
5.4 Specifying complex rearrangements with breakends
5.4.1 Inserted Sequence
5.4.2 Large Insertions
5.4.3 Multiple mates
5.4.4 Explicit partners
5.4.5 Telomeres
5.4.6 Event modifiers
5.4.7 Inversions
5.4.8 Uncertainty around breakend location
5.4.9 Single breakends
5.4.10 Sample mixtures
5.4.11 Clonal derivation relationships
5.4.12 Phasing adjacencies in an aneuploid context
5.5 Representing unspecified alleles and REF-only blocks (gVCF)
6 BCF specification
6.1 Overall file organization
6.2 Header
6.2.1 Dictionary of strings
6.2.2 Dictionary of contigs
6.3 BCF2 records
6.3.1 Site encoding
6.3.2 Genotype encoding
6.3.3 Type encoding
6.4 Encoding a VCF record example
6.4.1 Encoding CHROM and POS
6.4.2 Encoding QUAL
6.4.3 Encoding ID
6.4.4 Encoding REF/ALT fields
6.4.5 Encoding FILTER
6.4.6 Encoding the INFO fields
6.4.7 Encoding Genotypes
6.5 BCF2 block gzip and indexing
7 List of changes
7.1 Changes to VCFv4.3
7.2 Changes between VCFv4.2 and VCFv4.3
7.3 Changes between BCFv2.1 and BCFv2.2
7.4 Changes between VCFv4.1 and VCFv4.2
1 The VCF specification
1 The VCF specification
VCF is a text file format (most likely stored in a compressed manner). It contains meta-information lines (prefixed with "##"), a header line (prefixed with "#"), and data lines, each containing information about a position in the genome and genotype information on samples for that position (text fields separated by tabs). Zero-length fields are not allowed; a dot (".") must be used instead. To ensure interoperability across platforms, VCF-compliant implementations must support both LF ("\n") and CR+LF ("\r\n") newline conventions.
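The layout described above can be illustrated with a minimal Python sketch. It assumes the file has already been decompressed and read into a string; the function name classify_vcf_lines is hypothetical and not part of the specification. It splits on both LF and CR+LF, sorts lines by their prefix, and rejects zero-length fields.

```python
import re

def classify_vcf_lines(text):
    """Split VCF text into meta-information lines, the header line and data records.

    Hypothetical helper for illustration only; it does not validate field contents.
    """
    meta, header, records = [], None, []
    # Accept both LF and CR+LF line endings.
    for line in re.split(r"\r?\n", text):
        if line == "":
            continue  # e.g. the empty string after the final newline
        if line.startswith("##"):
            meta.append(line)
        elif line.startswith("#"):
            header = line[1:].split("\t")
        else:
            fields = line.split("\t")
            if "" in fields:
                raise ValueError("zero-length field found; a dot ('.') must be used instead")
            records.append(fields)
    return meta, header, records
```

A real reader would normally stream the file line by line instead of holding it in memory; the sketch only shows how the three line kinds are told apart.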
1.1 An example
##fileformat=VCFv4.3
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=
##contig=
##phasing=partial
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##INFO=
##FILTER=
##FILTER=
##FORMAT=
##FORMAT=
##FORMAT=
##FORMAT=
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
20 1234567 microsat1 GTC G,GTCT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2 1/1:40:3
This example shows (in order): a good simple SNP, a possible SNP that has been filtered out because its quality is below 10, a site at which two alternate alleles are called, with one of them (T) being ancestral (possibly a reference sequencing error), a site that is called monomorphic reference (i.e. with no alternate alleles), and a microsatellite with two alternative alleles, one a deletion of 2 bases (TC), and the other an insertion of one base (T). Genotype data are given for three samples, two of which are phased and the third unphased, with per sample genotype quality, depth and haplotype qualities (the latter only for the phased samples) given as well as the genotypes. The microsatellite calls are unphased.
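As a rough illustration of how the per-sample columns in this example can be read, the following Python sketch pairs the FORMAT keys with one sample's colon-separated values and reports whether the genotype is phased ("|") or unphased ("/"). The function name parse_sample and the returned dictionary layout are assumptions made for this sketch, not something the specification defines.

```python
def parse_sample(format_field, sample_field):
    """Pair FORMAT keys with one sample's values (illustrative sketch only)."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # zip() stops at the shorter list, so a sample that omits trailing values
    # (such as "0/0:41:3" under GT:GQ:DP:HQ in the example) simply lacks those keys.
    call = dict(zip(keys, values))
    genotype = call.get("GT", ".")
    call["phased"] = "|" in genotype
    # Allele indices: 0 is REF, 1 is the first ALT allele, and so on; "." is missing.
    call["alleles"] = [
        None if allele == "." else int(allele)
        for allele in genotype.replace("|", "/").split("/")
    ]
    return call
```

For instance, parse_sample("GT:GQ:DP:HQ", "1|0:48:8:51,51") gives a phased call with alleles [1, 0], while the third sample of the second record, "0/0:41:3", is unphased and has no HQ entry.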
1.2 Character encoding, non-printable characters and characters with special meaning
The character encoding of VCF files is UTF-8. UTF-8 is a multi-byte character encoding that is a strict superset of 7-bit ASCII and has the property that none of the bytes in any multi-byte characters are 7-bit ASCII bytes. As a result, most software that processes VCF files does not have to be aware of the possible presence of multi-byte UTF-8 characters. VCF files must not contain a byte order mark. Note that non-printable characters U+0000–U+0008, U+000B–U+000C, U+000E–U+001F are disallowed. Line separators must be CR+LF or LF and they are allowed only as line separators at end of line. Some characters have a special meaning when they appear (such as field delimiters ";" in INFO or ":" in FORMAT fields), and for any other meaning they must be represented with the capitalized percent encoding:
%3A : (colon)
%3B ; (semicolon)
%3D = (equal sign)
%25 % (percent sign)
%2C , (comma)
%0D CR
%0A LF
%09 TAB
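The encoding table can be turned into a small encoder and decoder. The Python sketch below is one possible approach, not a reference implementation; the names ENCODINGS, encode_value and decode_value are hypothetical. Both directions work in a single pass, so a percent sign produced by one substitution is never processed a second time.

```python
import re

# Special characters and their capitalized percent encodings, taken from the table above.
ENCODINGS = {
    ":": "%3A", ";": "%3B", "=": "%3D", "%": "%25",
    ",": "%2C", "\r": "%0D", "\n": "%0A", "\t": "%09",
}
DECODINGS = {code: char for char, code in ENCODINGS.items()}
ENCODED_PATTERN = re.compile("|".join(re.escape(code) for code in DECODINGS))

def encode_value(value):
    """Percent-encode the special characters in a field value."""
    # One pass over the characters, so the '%' of an emitted code is not re-encoded.
    return "".join(ENCODINGS.get(char, char) for char in value)

def decode_value(value):
    """Reverse encode_value; any other '%XX' sequence is left untouched."""
    return ENCODED_PATTERN.sub(lambda match: DECODINGS[match.group(0)], value)
```

Whether every special character needs encoding in every field depends on the field's own delimiters; this sketch encodes all eight unconditionally for simplicity.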
1.3 Data types
Data types supported by VCF are: Integer (32-bit, signed), Float (32-bit IEEE-754, formatted to match one of the regular expressions ^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$ or ^[-+]?(INF|INFINITY|NAN)$ case insensitively), Flag, Character, and String. For the Integer type, the values from -2^31 to -2^31 + 7 cannot be stored in the binary version and therefore are disallowed in both VCF and BCF; see section 6.3.3.
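The Float and Integer constraints can be checked mechanically. The following Python sketch uses the two regular expressions quoted above, matched case-insensitively, together with the Integer range implied by the text (a signed 32-bit range with the eight lowest values excluded). The function and constant names are hypothetical.

```python
import re

# The two Float patterns quoted in the text, matched case-insensitively.
FLOAT_NUMERIC = re.compile(r"^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$", re.IGNORECASE)
FLOAT_SPECIAL = re.compile(r"^[-+]?(INF|INFINITY|NAN)$", re.IGNORECASE)

# Values from -2**31 to -2**31 + 7 are disallowed, so the smallest usable value is -2**31 + 8.
INT_MIN_ALLOWED = -2**31 + 8
INT_MAX_ALLOWED = 2**31 - 1

def is_valid_float(text):
    """Return True if text matches one of the two Float patterns."""
    # fullmatch avoids Python's '$' also matching just before a trailing newline.
    return bool(FLOAT_NUMERIC.fullmatch(text) or FLOAT_SPECIAL.fullmatch(text))

def is_valid_integer(value):
    """Return True if value is a 32-bit signed integer outside the disallowed low range."""
    return INT_MIN_ALLOWED <= value <= INT_MAX_ALLOWED
```

Note that the numeric pattern requires at least one digit after an optional decimal point, so ".5" is accepted while "5." is not.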
1.4 Meta-information lines
File meta-information lines start with "##" and must appear first in the VCF file, before the header line (section 1.5) and data record lines (section 1.6). They may be either unstructured or structured.
An unstructured meta-information line consists of a key (denoting the type of meta-information recorded) and a value (which may not be empty and must not start with a ` ................