Bioinformatics Software Packages
This software is free. You can redistribute and/or modify it under the terms of the GNU General Public License, as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed with helpful intent, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
Please give credit where credit is due. If you use functions from Mayo Clinic, please acknowledge the original contributor of the material.
Genotype imputation has become a standard tool in genetics, but performing the analysis correctly requires considerable expertise and is time- and labor-intensive. We developed an IMPUTE2-based genotype imputation workflow that greatly simplifies imputation and achieves a significant speedup by using multiple CPUs on a computer cluster. The user simply provides a genotype dataset, and the workflow handles the detailed steps: matching the strand of the input genotypes to the reference, smart segmentation of the genome, and generation of QC metrics.
Availability and implementation: The workflow works on the two most popular cluster management systems, Sun Grid Engine (SGE) and Portable Batch System (PBS). It is available under the GNU General Public License.
Authors: Hugues Sicotte, Naresh Prodduturi
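The strand-matching step mentioned above can be illustrated with a minimal sketch. This is not the workflow's own code; the function name classify_strand and its return labels are hypothetical, and the logic shown is only the common allele-based check, in which a SNP's alleles are compared with the reference alleles directly and after complementing.

```python
# Hypothetical illustration of allele-based strand checking; not taken from the workflow.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_strand(study_alleles, ref_alleles):
    """Compare a SNP's study alleles with the reference panel's alleles.

    Returns one of "ambiguous", "match", "flip", or "mismatch".
    """
    study = {a.upper() for a in study_alleles}
    ref = {a.upper() for a in ref_alleles}
    flipped = {COMPLEMENT[a] for a in study}

    # A/T and C/G SNPs look the same on both strands, so the alleles
    # alone cannot tell us which strand was genotyped.
    if study == flipped:
        return "ambiguous"
    if study == ref:
        return "match"      # same strand as the reference
    if flipped == ref:
        return "flip"       # opposite strand: complement the genotypes
    return "mismatch"       # alleles disagree with the reference either way


if __name__ == "__main__":
    print(classify_strand(("T", "C"), ("A", "G")))
```

SNPs reported as "ambiguous" or "mismatch" would typically need extra information (for example allele frequencies) or exclusion before imputation; how the workflow itself treats them is not stated here.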
GenomeSmasher is a set of tools for creating diploid FASTA files containing SNPs, indels, duplications, deletions, and translocations. These FASTA files can then be used in conjunction with next-generation sequencing simulators to artificially create sequencing experiments. The purpose of these tools is to assess the performance and reliability of data analysis in next-generation sequencing pipelines.
Authors: Steven N. Hart, Naresh Prodduturi
SoftSearch was developed as a sensitive structural variant (SV) detection tool for Illumina paired-end next-generation sequencing data. SoftSearch simultaneously uses soft-clipping and read-pair strategies for detecting SVs to increase sensitivity. Soft clips are proxies for split-reads: they indicate that part of the read maps to the reference genome, but the other part is not localized at the same place (for example, breakpoint-spanning reads). Discordant read-pairs are a read and its mate where the insert size is greater than (or less than) expected from the distribution of the dataset, or where the mapping orientation of the reads is unexpected (for example, both on the same strand). SoftSearch looks for areas of the genome with soft-clipping that also have discordant read-pairs supporting the anomaly. Once areas meeting both conditions are identified, the read and mate information is extracted directly from the BAM file containing the discordant reads, obviating the need for time-consuming and error-prone complex alignment strategies. Only a small number of soft-masked bases and discordant read-pairs are necessary to identify an SV; either signal on its own would not be sufficient to make an SV call, which highlights SoftSearch's improved sensitivity (see Performance).
Authors: Steven N. Hart, Jaysheel Bhavsar, Saurabh Baheti, Vivekananda (Vivek) Sarangi, Jean-Pierre A. Kocher.
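The two kinds of evidence described for SoftSearch can be sketched in a few lines. This is an illustrative sketch only, not SoftSearch's implementation: the function names, the dictionary-based read records, and all thresholds (min_insert, max_insert, minimum counts) are assumptions made for the example. Soft-clipped bases are read from the standard CIGAR "S" operation.

```python
import re

# Matches one CIGAR operation, e.g. "60M" or "40S".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_bases(cigar):
    """Return the total number of soft-clipped (S) bases in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "S")


def is_discordant(insert_size, same_strand, min_insert, max_insert):
    """Flag a pair whose insert size falls outside the expected range,
    or whose two mates map to the same strand."""
    in_range = min_insert <= abs(insert_size) <= max_insert
    return same_strand or not in_range


def has_sv_evidence(reads, min_clipped=1, min_discordant=1,
                    min_insert=100, max_insert=600):
    """Require BOTH soft-clipped reads and discordant pairs in a region.

    `reads` is a list of dicts with hypothetical keys
    "cigar", "insert_size", and "same_strand".
    """
    clipped = sum(1 for r in reads if soft_clipped_bases(r["cigar"]) > 0)
    discordant = sum(
        1 for r in reads
        if is_discordant(r["insert_size"], r["same_strand"],
                         min_insert, max_insert)
    )
    return clipped >= min_clipped and discordant >= min_discordant


if __name__ == "__main__":
    region = [
        {"cigar": "60M40S", "insert_size": 350, "same_strand": False},
        {"cigar": "100M", "insert_size": 5000, "same_strand": False},
    ]
    print(has_sv_evidence(region))
```

Requiring both signals in the same region is what lets each threshold stay low: a single soft-clipped read or a single discordant pair would not be flagged on its own in this sketch.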
A post-processor to optimize the selection of tag SNPs from common bin-tagging programs. SNPPicker uses a multi-step search strategy in combination with a statistical model to produce optimal genotyping panels. Authors: Hugues Sicotte, David N. Rider, Gregory A. Poland, Neelam Dhiman, Jean-Pierre A. Kocher. [03/2011]
A Targeted RE-sequencing Annotation Tool that offers a comprehensive, open-framework, end-to-end solution for analyzing and interpreting targeted re-sequencing data. TREAT encompasses sequence alignment, variant calling, variant annotation, variant filtering, and visualization in one comprehensive analytic workflow. The rich set of annotations provided by TREAT enables detected variants to be filtered by their functional characteristics, and visualizations at the variant positions allow investigators to closely examine the identified variants of interest. An Amazon Cloud Image of TREAT is provided for researchers with no access to local bioinformatics infrastructure, with instructions given in the tutorial below. The source code for local installation is available via the link below.
Authors: Yan W. Asmann, Sumit Middha, Asif Hossain, Saurabh Baheti, Ying Li, High-Seng Chai, Zhifu Sun, Patrick H. Duffy, Ahmed A. Hadad, Asha Nair, Xiaoyu Liu, Yuji Zhang, Eric W. Klee, Jean-Pierre A. Kocher. [06/2011]
TREAT files available upon request.
A bioinformatics tool to identify fusion transcripts from paired-end transcriptome sequencing data. The tool employs multiple steps of false-positive filtering and nominates fusion candidates with high confidence (approaching 100% true positive rate). The unique features of SnowShoes-FTD include: (i) the ability to discover multiple fusion isoforms in which the two gene partners give rise to transcripts with different junctions; (ii) prediction of potential fusion mechanisms, including inversion, translocation, and/or interstitial deletion; (iii) identification of whether the junction point in a fusion transcript occurs at the boundaries of known exons, which implies that the fusion event may have occurred inside an intron in the DNA and then been transcribed into the fusion transcript.
Furthermore, SnowShoes-FTD greatly simplifies validation of the fusion candidates by providing a 5’ to 3’ oriented template region spanning the fusion junction point that is long enough for designing primers for PCR validation. SnowShoes-FTD also predicts the protein sequences of the fusion genes using known transcript sequences of the fusion partners and identifies in-frame vs. out-of-frame fusion products. In addition, mutations at the fusion junction points of in-frame fusion proteins, including non-synonymous single amino acid changes and insertions, are identified. The source code of SnowShoes-FTD is provided in two formats: one configured to run on the Sun Grid Engine for parallelization with shorter run time, and the other formatted to run on a single Linux node.
Note: The SnowShoes-FTD download package contains the tool itself, the reference files necessary to run the tool, and the test data. Because of its large size, we will set up an FTP transfer site for each request. We apologize for the inconvenience; we are looking for alternative sites to host the download.
Authors: Yan W. Asmann, Asif Hossain, Brian M. Necela, Sumit Middha, Krishna R. Kalari, Zhifu Sun, H.S. Chai, D.W. Williamson, Derek C. Radisky, G.P. Schroth, Jean-Pierre A. Kocher, Edith A. Perez, E. Aubrey Thompson
Please contact the author to gain access to the software:
Reduced representation bisulfite sequencing (RRBS) is a cost-effective approach for genome-wide methylation pattern profiling. Analyzing RRBS sequencing data is challenging and specialized alignment/mapping programs are needed. Although such programs have been developed, a comprehensive solution that provides researchers with good quality and analyzable data is still lacking.
To address this need, we have developed a Streamlined Analysis and Annotation Pipeline for RRBS data (SAAP-RRBS) that integrates read quality assessment/clean-up, alignment, methylation data extraction, annotation, reporting, and visualization. With this package, bioinformaticians or investigators can start from sequencing reads and quickly get a fully annotated CpG methylation report, allowing more time for biological interpretation. The SAAP-RRBS program:
Authors: Zhifu Sun, Saurabh Baheti, Sumit Middha, Rahul Kanwar, Y. Zhang, X. Li, Andreas S. Beutler, Eric W. Klee, Yan W. Asmann, E. Aubrey Thompson, Jean-Pierre A. Kocher
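As a rough illustration of what methylation data extraction produces, the sketch below computes a per-CpG methylation ratio from read counts. It is not SAAP-RRBS code; the function name and the min_coverage parameter are hypothetical. It relies only on the general principle of bisulfite sequencing: unmethylated cytosines are read as T after conversion, while methylated cytosines are still read as C.

```python
def methylation_ratio(c_count, t_count, min_coverage=1):
    """Return the fraction of reads supporting methylation at one CpG site.

    c_count: reads showing C (cytosine protected, i.e. methylated)
    t_count: reads showing T (cytosine converted, i.e. unmethylated)
    Returns None when coverage is below min_coverage (hypothetical cutoff).
    """
    coverage = c_count + t_count
    if coverage < min_coverage or coverage == 0:
        return None
    return c_count / coverage


if __name__ == "__main__":
    # A site covered by 3 C-reads and 1 T-read.
    print(methylation_ratio(3, 1))
```

A coverage cutoff is commonly applied because ratios from only a few reads are unreliable; the appropriate value depends on the experiment.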
Zhifu Sun, M.D.
© 2013 Mayo Foundation for Medical Education and Research. All rights reserved.