Friday, 23 December 2011

biotoolbox - Tools for querying and analysis of genomic data - Google Project Hosting

http://code.google.com/p/biotoolbox/

A collection of Perl scripts that use BioPerl modules for bioinformatics analysis. Tools are included for processing microarray data, next generation sequencing data, data file format conversion, querying datasets, and general high level analysis of datasets.

This toolbox relies heavily on storing genome annotation, microarray, and next generation sequencing data in BioPerl databases, which allows data to be retrieved relative to any annotated feature in the database.
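To make "data retrieval relative to an annotated feature" concrete, here is a minimal Python sketch of the idea: average the data scores that fall inside a feature's coordinates. It is not biotoolbox code. It assumes an in-memory dictionary of scores instead of a BioPerl database, and the names `Feature` and `mean_score` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Optional, Tuple


@dataclass
class Feature:
    """Hypothetical annotated feature, using GFF-style 1-based inclusive coordinates."""
    name: str
    chrom: str
    start: int
    end: int


def mean_score(feature: Feature, scores: Dict[Tuple[str, int], float]) -> Optional[float]:
    """Average the scores of data points lying within the feature.

    `scores` maps (chromosome, 1-based position) to a value, standing in
    for a database of microarray or sequencing data. Returns None when no
    data point overlaps the feature.
    """
    values = [
        value
        for (chrom, pos), value in scores.items()
        if chrom == feature.chrom and feature.start <= pos <= feature.end
    ]
    return mean(values) if values else None


scores = {("chr1", 100): 2.0, ("chr1", 150): 4.0, ("chr1", 500): 10.0, ("chr2", 120): 7.0}
gene = Feature("geneA", "chr1", 90, 200)
print(mean_score(gene, scores))
```

A real database query would use an index rather than scanning every point; the linear scan here only keeps the sketch short.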

Also included are programs for converting and importing data from UCSC gene tables and Ensembl, as well as from a variety of other formats, into GFF3 files that can be loaded into a BioPerl database.
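As a rough illustration of what a UCSC-table-to-GFF3 conversion involves, the Python sketch below turns one genePred-style row into an mRNA line plus exon lines. It assumes the basic 10-column genePred layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) with UCSC's 0-based, half-open coordinates; real UCSC tables such as refGene carry extra columns, and this sketch omits CDS features. The function `genepred_to_gff3` is hypothetical, not one of the biotoolbox programs.

```python
from typing import List


def genepred_to_gff3(line: str, source: str = "UCSC") -> List[str]:
    """Convert one tab-separated, 10-column genePred row into GFF3 lines."""
    f = line.rstrip("\n").split("\t")
    name, chrom, strand = f[0], f[1], f[2]
    tx_start, tx_end = int(f[3]), int(f[4])
    # exonStarts/exonEnds are comma-separated lists with a trailing comma
    starts = [int(x) for x in f[8].split(",") if x]
    ends = [int(x) for x in f[9].split(",") if x]
    if not (len(starts) == len(ends) == int(f[7])):
        raise ValueError(f"exon count mismatch for {name}")

    # GFF3 is 1-based and inclusive: add 1 to each start, keep each end
    out = [f"{chrom}\t{source}\tmRNA\t{tx_start + 1}\t{tx_end}\t.\t{strand}\t.\tID={name}"]
    # Exons are numbered in genomic order, regardless of strand
    for i, (s, e) in enumerate(zip(starts, ends), start=1):
        out.append(
            f"{chrom}\t{source}\texon\t{s + 1}\t{e}\t.\t{strand}\t.\t"
            f"ID={name}.exon{i};Parent={name}"
        )
    return out


row = "NM_0001\tchr1\t+\t99\t300\t120\t280\t2\t99,200,\t150,300,"
for gff_line in genepred_to_gff3(row):
    print(gff_line)
```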

Who is this for?

This set of tools is designed to complement the Generic Genome Browser (GBrowse). If you view your model organism's genome annotation and microarray or next generation sequencing data in GBrowse, these tools will help you analyze that data fully.

Even if you don't use GBrowse, these programs may still be useful. Please check the list of programs to see whether any of them meet your needs.

What programs are available? How do I use them?

This is a list of programs.

This is an example of preparing and loading data into the database.

This is a list of supported data formats. In short, the tools work with GFF, BED, wig, bigWig, bigBed, and BAM data files and annotations. Most bioinformatic data can be represented in one or more of these formats, or at least converted into one of them.

This is an example of how to collect data.

This is an example of working with next generation sequencing data.

What are the requirements?

These are command-line Perl programs designed for modern Unix-based computers. Most of the analysis programs rely heavily on BioPerl modules, so at a minimum BioPerl must be installed. Additional modules are needed to use Bam, bigWig, bigBed, or wig data files. Most, if not all, of the programs should fail gracefully if a required module is not installed, and a few minimal programs may run without BioPerl at all. A utility is provided to check for missing or out-of-date modules.
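The bundled utility is the way to check dependencies. For readers curious how a basic "is this module installed?" check can work, here is a minimal Python sketch that probes each Perl module with `perl -M<Module> -e 1`, which exits with a non-zero status when the module cannot be loaded. It only detects missing modules, not outdated ones. The function name `missing_perl_modules` is hypothetical, and the module names in the example are illustrative CPAN modules, not a statement of biotoolbox's exact requirements.

```python
import shutil
import subprocess
from typing import Iterable, List


def missing_perl_modules(modules: Iterable[str], perl: str = "perl", timeout: int = 30) -> List[str]:
    """Return the Perl modules that fail to load with `perl -M<Module> -e 1`.

    If no perl executable is found on PATH, every module is reported missing.
    Module names are spliced into perl's -M switch, so only pass trusted names.
    """
    modules = list(modules)
    exe = shutil.which(perl)
    if exe is None:
        return modules
    missing = []
    for mod in modules:
        result = subprocess.run(
            [exe, f"-M{mod}", "-e", "1"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        if result.returncode != 0:
            missing.append(mod)
    return missing


print(missing_perl_modules(["Bio::Root::Version", "Bio::DB::Sam", "Bio::DB::BigFile"]))
```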

Why were these programs written?

They were initially written to assist me in my own laboratory research. As they grew in scope, I realized they could be useful to others in the same predicament, so I am releasing them for others to use.


Datanami, Woe be me