Bioinformatics

SBGrid supports 23 bioinformatics software titles, listed below. Availability of a specific package may be limited by license requirements.

Software Title Description Linux 32-bit Linux 64-bit OS X Intel Links

Aline

yes yes yes

  • an interactive perl/tk application that can read common sequence alignment formats that the user can then alter, embellish, markup, etc. to produce the kind of sequence figure commonly found in biochemical articles. Developed by Charlie Bond and Alex Schüttelkopf.

Developers

Charlie Bond

Categories

Bioinformatics

Versions

Citations

Bond, C.S. and Schüttelkopf, A.W. (2009), Acta Cryst. D65, 510-512


Static link to the SBGrid Aline page.

Alscript

yes yes

  • a program to format multiple sequence alignments in PostScript for publication and to assist in analysis. Alscript does not support point-and-click, but has a scripting language to allow complex effects.

Developers

Geoff Barton

Categories

Bioinformatics

Versions

Citations

Barton. ALSCRIPT: a tool to format multiple sequence alignments. Protein Engineering. 1993. 6(1):37-40.


Static link to the SBGrid Alscript page.

AMPS

yes yes yes

  • a suite of programs designed for the alignment of multiple protein sequences and flexible pattern matching.

Developers

Geoff Barton

Categories

Bioinformatics

Versions

Citations

Barton and Sternberg. Evaluation and Improvements in the Automatic Alignment of Protein Sequences. 1987a. Prot. Eng. 1:89-94.

Barton and Sternberg. A Strategy for the Rapid Multiple Alignment of Protein Sequences: Confidence Levels from Tertiary Structure Comparisons. J. Mol. Biol. 1987b. 198:327-337.


Static link to the SBGrid AMPS page.

BLAST

yes yes yes

  • (Basic Local Alignment Search Tool) finds regions of similarity between biological sequences.

Categories

Bioinformatics

Versions

Citations

Altschul et al. Basic local alignment search tool. J Mol Biol (1990) vol. 215 (3) pp. 403-10

Technical Notes

Full BLAST database documentation is available at ftp://ftp.ncbi.nlm.nih.gov/blast/db/blastdb.html

Pre-formatted databases must be downloaded using the update_blastdb.pl script or via FTP in binary mode. Documentation for the update_blastdb.pl script can be obtained by running the script without any arguments (perl is required).

The downloaded compressed files must be inflated with gzip or another decompression tool. The BLAST database files can then be extracted from the resulting tar file using the tar program on Unix/Linux, or WinZip and StuffIt Expander on Windows and Macintosh platforms, respectively.
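The inflate-and-extract step can also be scripted. The following is a minimal sketch using Python's standard tarfile module; the function name extract_blast_archive and the paths passed to it are hypothetical, and the path check is an extra precaution that is not part of the NCBI instructions.

```python
import tarfile
from pathlib import Path


def extract_blast_archive(archive_path, dest_dir):
    """Inflate a .tar.gz archive and unpack its members into dest_dir.

    Returns the sorted list of member names found in the archive.
    (Hypothetical helper, not part of the BLAST distribution.)
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:gz" performs the gzip inflation and the tar read in one pass.
    with tarfile.open(archive_path, "r:gz") as archive:
        members = archive.getmembers()
        # Refuse members that would be written outside dest_dir.
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(dest)
    return sorted(member.name for member in members)
```

For a downloaded volume, a call such as extract_blast_archive("est.00.tar.gz", "blastdb") would unpack the database files into a blastdb directory (both names are placeholders).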

Large databases are split into multiple 1-gigabyte volumes, named using the database.##.tar.gz convention. All relevant volumes are required. An alias file (.nal or .pal) is provided so that the database can be called by its alias name without the extension. For example, to call the est database, use the "-d est" option on the command line (without the quotes).
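Because every volume is required, it can help to check a download directory for gaps before running a search. The sketch below groups file names that follow the database.##.tar.gz convention and reports missing volume numbers. It assumes volumes are numbered consecutively starting at 00, which the notes above do not state explicitly; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches names such as "est.00.tar.gz": database name, two-digit volume number.
VOLUME_PATTERN = re.compile(r"^(?P<db>.+)\.(?P<vol>\d{2})\.tar\.gz$")


def group_volumes(filenames):
    """Map each database name to the sorted list of volume numbers present."""
    volumes = defaultdict(list)
    for name in filenames:
        match = VOLUME_PATTERN.match(name)
        if match:  # single-file archives such as "swissprot.tar.gz" are skipped
            volumes[match.group("db")].append(int(match.group("vol")))
    return {db: sorted(numbers) for db, numbers in volumes.items()}


def missing_volumes(numbers):
    """Return the volume numbers absent from 0..max(numbers).

    Assumes numbering starts at 00 and has no intentional gaps.
    """
    present = set(numbers)
    return [n for n in range(max(numbers) + 1) if n not in present]
```

Note that this check cannot detect a missing final volume, since the total number of volumes is not encoded in the file names.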

Certain databases are subsets of a larger parent database. For those databases, alias and mask files are provided rather than the actual databases. The mask file needs the parent database to function properly, and the parent database should be generated on the same day as the mask file. For example, to use the pre-formatted swissprot database, swissprot.tar.gz, you also need nr.tar.gz with the same date stamp.


Static link to the SBGrid BLAST page.

breseq

yes yes yes

  • a computational pipeline for finding mutations relative to a reference sequence in short-read DNA re-sequencing data.

Developers

Dave Knoester

Jeffrey Barrick

Categories

Bioinformatics

Versions


Static link to the SBGrid breseq page.

Clustal

yes yes yes

  • a general purpose multiple sequence alignment program for DNA or proteins.

Developers

Clustal Developer Group

Categories

Bioinformatics

Versions

Citations

Chenna et al. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res (2003) vol. 31 (13) pp. 3497-500


Static link to the SBGrid Clustal page.

EMBOSS

yes yes yes

  • integrates a range of currently available packages and tools for sequence analysis into a seamless whole.

Developers

Alan Bleasby

Peter Rice

Categories

Bioinformatics

Utilities

Versions

Citations

Olson. EMBOSS opens up sequence analysis. European Molecular Biology Open Software Suite. Briefings in bioinformatics (2002) vol. 3 (1) pp. 87-91


Static link to the SBGrid EMBOSS page.

FASTA

yes yes yes

  • a DNA and protein sequence alignment software package that searches for matching sequence patterns or words, called k-tuples. Its legacy is the FASTA format, which is now ubiquitous in bioinformatics. First described (as FASTP) by David J. Lipman and William R. Pearson in 1985.
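To illustrate what matching on k-tuples means, here is a toy sketch that indexes every word of length k in one sequence and counts shared words with a second sequence, grouped by diagonal (the offset between the two positions). It is an illustration of the word-lookup idea only, not the FASTA program's implementation; the function names and the default k are assumptions.

```python
from collections import Counter, defaultdict


def ktuple_index(sequence, k):
    """Map each k-tuple (word of length k) to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        index[sequence[pos:pos + k]].append(pos)
    return index


def diagonal_hits(query, subject, k=2):
    """Count shared k-tuples per diagonal (subject position minus query position).

    A diagonal with many hits suggests a region of similarity at that offset.
    """
    index = ktuple_index(subject, k)
    diagonals = Counter()
    for q_pos in range(len(query) - k + 1):
        for s_pos in index.get(query[q_pos:q_pos + k], ()):
            diagonals[s_pos - q_pos] += 1
    return diagonals
```

For the query "ACGT" and the subject "TTACGT" with k=2, all three shared words (AC, CG, GT) fall on diagonal 2, which reflects the query matching the subject at an offset of two positions.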

Developers

William Pearson

Categories

Bioinformatics

Versions

Citations

Mount. Using a FASTA Sequence Database Similarity Search. CSH protocols (2007) vol. 2007 pp. pdb.top16

Pearson. Rapid and sensitive sequence comparison with FASTP and FASTA. Methods in enzymology (1990) vol. 183 pp. 63-98


Static link to the SBGrid FASTA page.

Jalview

yes yes yes

  • a multiple sequence alignment editor written in Java. It is used widely in a variety of web pages (e.g. the EBI ClustalW server and the Pfam protein domain database) but is also available as a general purpose alignment editor.

Developers

Geoff Barton

Categories

Bioinformatics

Versions

Citations

Clamp et al. The Jalview Java alignment editor. Bioinformatics (2004) vol. 20 (3) pp. 426-7

Waterhouse et al. Jalview Version 2--a multiple sequence alignment editor and analysis workbench. Bioinformatics (2009) vol. 25 (9) pp. 1189-91


Static link to the SBGrid Jalview page.

MAFFT

yes yes yes

  • a multiple sequence alignment program. It offers a range of multiple alignment methods, L-INS-i (accurate; for alignment of <∼200 sequences), FFT-NS-2 (fast; for alignment of <∼10,000 sequences), etc.

Developers

Kazutaka Katoh

Categories

Bioinformatics

Versions

Citations

Katoh et al. Multiple alignment of DNA sequences with MAFFT. Methods Mol Biol (2009) vol. 537 pp. 39-64

Katoh and Toh. Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinformatics (2008) vol. 9 (4) pp. 286-98


Static link to the SBGrid MAFFT page.

Matt

yes yes yes

  • a multiple protein structure alignment program. It uses local geometry to align segments of two sets of proteins, allowing limited bends in the backbones between the segments.

Developers

Lenore Cowen

Matthew Menke

Categories

Bioinformatics

Versions

Citations

Menke et al. Matt: local flexibility aids protein multiple structure alignment. PLoS Comput Biol (2008) vol. 4 (1) pp. e10


Static link to the SBGrid Matt page.

MOLPHY

yes yes yes

  • (MOLecular PHYlogenetics) is a computer program package for molecular phylogenetics.

Developers

Masami Hasegawa

Institute for Statistical Mathematics software & Data Library

Jun Adachi

Categories

Bioinformatics

Versions

Citations

ADACHI, J., & HASEGAWA, M. MOLPHY, programs for molecular phylogenetics, I: PROTML, maximum likelihood inference of protein phylogeny. (1992). Tokyo, Japan, Institute of Statistical Mathematics.


Static link to the SBGrid MOLPHY page.

MUSCLE

yes yes yes

  • (multiple sequence comparison by log-expectation) is a public domain multiple alignment software for protein and nucleotide sequences.

Developers

Robert Edgar

Categories

Bioinformatics

Versions

Citations

Edgar. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research (2004) vol. 32 (5) pp. 1792-7


Static link to the SBGrid MUSCLE page.

NETBLAST

yes yes yes

  • NETBLAST (also called netblast and blastcl3) is a simple command-line program that allows you to submit a single file of FASTA sequences over an internet connection to the NCBI BLAST databases. Searches are submitted through the client to the NCBI servers, so the databases do not need to be downloaded locally.

Categories

Bioinformatics

Versions

Citations

Altschul et al. Basic local alignment search tool. 1990. J. Mol. Biol. 215:403-410.


Static link to the SBGrid NETBLAST page.

PHYLIP

yes yes yes

  • a free package of software programs for inferring phylogenies.

Developers

Joseph Felsenstein

Categories

Bioinformatics

Utilities

Versions

Citations

Felsenstein, J. 2005. PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle.

Felsenstein, J. 1989. PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166.


Static link to the SBGrid PHYLIP page.

Primer3

yes yes yes

  • a widely used program that designs PCR primers (PCR = "Polymerase Chain Reaction"). Primer3 can also design hybridization probes and sequencing primers.

Developers

Steve Rozen

Categories

Bioinformatics

Versions

Citations

Koressaar and Remm. Enhancements and modifications of primer design program Primer3. Bioinformatics (Oxford, England) (2007) vol. 23 (10) pp. 1289-91


Static link to the SBGrid Primer3 page.

PROBCONS

yes yes yes

  • an efficient protein multiple sequence alignment program, which has demonstrated a statistically significant improvement in accuracy compared to several leading alignment tools.

Categories

Bioinformatics

Versions

Citations

Do et al. ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome research (2005) vol. 15 (2) pp. 330-40


Static link to the SBGrid PROBCONS page.

PSIPRED

yes yes yes

  • uses a simple and accurate secondary structure prediction method incorporating two feed-forward neural networks which perform an analysis on output obtained from BLAST.

Developers

Marina Santilli

David Jones

Categories

Bioinformatics

Versions

Citations

McGuffin et al. The PSIPRED protein structure prediction server. Bioinformatics (Oxford, England) (2000) vol. 16 (4) pp. 404-5


Static link to the SBGrid PSIPRED page.

SAM

yes yes yes

  • a collection of tools for creating, refining, and using linear hidden Markov models for biological sequence analysis. The model states can be viewed as representing the sequence of columns in a multiple sequence alignment, with provisions for arbitrary position-dependent insertions and deletions in each sequence. The models are trained on a family of protein or nucleic acid sequences using an expectation-maximization algorithm and a variety of algorithmic heuristics. A trained model can then be used to both generate multiple alignments and search databases for new members of the family.

Developers

Anders Krogh

Richard Hughey

SAM Developer Group

Categories

Bioinformatics

Versions

Citations

SAM: Hughey and Krogh. Hidden Markov models for sequence analysis: Extension and analysis of the basic method. CABIOS. 1996. 12(2):95-107.

SAM-T2K: Karplus et al. Hidden Markov Models for Detecting Remote Protein Homologies, Bioinformatics. 1998. 14(10):846-856.

HMM: Krogh et al. Hidden Markov models in computational biology: Applications to protein modeling. Journal of Molecular Biology. 1994. 235:1501-1531.


Static link to the SBGrid SAM page.

SCC

yes yes yes

  • a suite of programs for sequence alignment including: aln, swg, prrn, phyln and makmdm.

aln: performs pairwise alignment of biological sequences and supports spliced alignment procedures.

swg: locally aligns a pair of DNA or protein sequences using the Smith-Waterman-Gotoh algorithm. Spliced alignment is currently not supported. The profile version is very slow.

prrn: performs global multiple alignment of a set of protein or DNA sequences using a doubly nested iterative refinement method.

phyln: builds a phylogenetic tree from a multiple alignment using the UPGMA or NJ method.

makmdm: constructs binary PAM matrices. It must be run once before the first run of aln, swg or prrn.
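For readers unfamiliar with the Smith-Waterman-Gotoh method named in the swg entry, the sketch below computes a local alignment score with affine gap penalties, where a gap of length L costs gap_open + (L - 1) * gap_extend. It is a score-only illustration of the general recurrence, not the swg implementation; the function name and the scoring values (match, mismatch, gap_open, gap_extend) are arbitrary assumptions rather than SCC defaults.

```python
def smith_waterman_gotoh(a, b, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Return the best local alignment score of a and b with affine gaps.

    H holds the best score ending at (i, j); E and F hold the best scores
    ending in a gap along the row and along the column, respectively.
    """
    neg_inf = float("-inf")
    cols = len(b) + 1
    h_prev = [0] * cols          # H values for the previous row
    f_prev = [neg_inf] * cols    # F values for the previous row
    best = 0
    for i in range(1, len(a) + 1):
        h_curr = [0] * cols
        f_curr = [neg_inf] * cols
        e = neg_inf              # E value for the current row, updated per column
        for j in range(1, cols):
            # Extend an existing gap or open a new one, whichever scores higher.
            e = max(e - gap_extend, h_curr[j - 1] - gap_open)
            f_curr[j] = max(f_prev[j] - gap_extend, h_prev[j] - gap_open)
            diag = h_prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The floor of 0 lets a local alignment restart anywhere.
            h_curr[j] = max(0, diag, e, f_curr[j])
            best = max(best, h_curr[j])
        h_prev, f_prev = h_curr, f_curr
    return best
```

Only two rows of the matrix are kept, so memory grows with the length of the second sequence rather than with the product of both lengths. Recovering the alignment itself would require storing traceback information, which this sketch omits.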

Developers

Osamu Gotoh

Categories

Bioinformatics

Versions


Static link to the SBGrid SCC page.

SSAHA2

yes yes yes

  • (Sequence Search and Alignment by Hashing Algorithm) is a pairwise sequence alignment program designed for the efficient mapping of sequencing reads onto genomic reference sequences.

SSAHA2 supports reads from most sequencing platforms (ABI-Sanger, Roche 454, Illumina-Solexa) and a range of output formats (SAM, CIGAR, PSL etc.). A pile-up pipeline for analysis and genotype calling is available as a separate package.

Developers

Zemin Ning

Hannes Ponstingl

Categories

Bioinformatics

Versions

Citations

Ning et al. SSAHA: a fast search method for large DNA databases. Genome research 2001. 11(10):1725-9.


Static link to the SBGrid SSAHA2 page.

Staden

yes yes yes

  • a set of DNA sequence assembly, editing and analysis tools, developed at the Medical Research Council Laboratory of Molecular Biology, Cambridge, UK.

Developers

James Bonfield

Categories

Bioinformatics

Versions

Citations

Staden et al. The Staden package, 1998. Methods Mol Biol. 2000. 132:115-30.


Static link to the SBGrid Staden page.

T-Coffee

yes yes yes

  • a multiple sequence alignment package. You can use T-Coffee to align sequences or to combine the output of your favorite alignment methods (Clustal, Mafft, Probcons, Muscle...) into one unique alignment (M-Coffee).

Developers

Cedric Notredame

Categories

Bioinformatics

Versions

Citations

Notredame et al. T-Coffee: A novel method for multiple sequence alignments. JMB. 2000. 302:205-217.


Static link to the SBGrid T-Coffee page.