Thursday 28 June 2012

FAQ: What is genome build 'hg_g1k_v37'?

I have wondered about the g1k bit before as well.
Here's an explanation lifted from the galaxy-user list:

---------- Forwarded message ----------
From: Jennifer Jackson
Date: 27 June 2012 23:50
Subject: Re: [galaxy-user] Problem with Depth of Coverage on BAM files (GATK tools)

Hello Lilach,

The genome build 'hg_g1k_v37' is build "b37" in the GATK documentation. Hg19 is also included (as a distinct build). I encourage you to examine these if you are interested in crossing over between genomes or identifying other projects that have data based on the same genome build.

http://www.broadinstitute.org/gsa/wiki/index.php/Introduction_to_the_GATK ->
http://www.broadinstitute.org/gsa/wiki/index.php/GATK_resource_bundle

" GATK resource bundle: A collection of standard files for working with human resequencing data with the GATK.

The standard reference sequence we use in the GATK is the b37 edition from the Human Genome Reference Consortium. All of the key GATK data files are available against this reference sequence. Additionally, we used to use UCSC-style (chr1, not 1) for build hg18, and provide lifted-over files from b37 to hg18 for those still using those files.

b37 resources: the standard data set
* Reference sequence (standard 1000 Genomes fasta) along with fai and dict files
<more, please follow link for details ...>

hg19 resources: lifted over from b37
* Includes the UCSC-style hg19 reference along with all lifted over VCF files."

Hopefully this helps,

Jen
Galaxy team
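
To make the "chr1, not 1" point concrete, here is a minimal Python sketch of the naming difference between b37-style and UCSC-style contigs. The helper names (to_ucsc, to_b37, guess_style) are my own and purely illustrative; they are not part of GATK or Galaxy. The sketch only covers the primary chromosomes and assumes the usual MT/chrM pairing for the mitochondrial contig. Unplaced and random contigs are named differently in the two builds and are not handled here.

```python
# Illustrative only: hypothetical helpers, not a GATK or Galaxy API.
# Covers primary chromosomes 1-22, X, Y and the mitochondrial contig.
PRIMARY = [str(i) for i in range(1, 23)] + ["X", "Y"]

# b37 style uses bare names ("1"); UCSC style adds a "chr" prefix ("chr1").
B37_TO_UCSC = {name: "chr" + name for name in PRIMARY}
B37_TO_UCSC["MT"] = "chrM"  # the mitochondrial contig is the exception
UCSC_TO_B37 = {ucsc: b37 for b37, ucsc in B37_TO_UCSC.items()}


def to_ucsc(contig):
    """Return the UCSC-style name for a b37-style primary contig."""
    return B37_TO_UCSC[contig]


def to_b37(contig):
    """Return the b37-style name for a UCSC-style primary contig."""
    return UCSC_TO_B37[contig]


def guess_style(contigs):
    """Guess the naming style from contig names (e.g. from a BAM header)."""
    names = set(contigs)
    if names & set(UCSC_TO_B37):
        return "ucsc"
    if names & set(B37_TO_UCSC):
        return "b37"
    return "unknown"


if __name__ == "__main__":
    print(to_ucsc("1"), to_b37("chrM"), guess_style(["1", "2", "MT"]))
```

Renaming contigs like this is only a label change. As far as I know, the mitochondrial sequence in UCSC hg19 is not the same as the one in b37, so for anything beyond a quick sanity check of which build a file uses, it is safer to work from the matching files in the resource bundle than to rename contigs by hand.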



Datanami, Woe be me