ANGSD
ANGSD is a software package for analyzing next-generation sequencing data. The software can handle a number of different input types, from mapped reads to imputed genotype probabilities. Most methods take genotype uncertainty into account instead of basing the analysis on called genotypes. This is especially useful for low- and medium-depth data. The software is written in C++ and has been used on large sample sizes.
module load angsd/0.921
module load angsd/2019-11-05
For more information visit http://www.popgen.dk/angsd/index.php/ANGSD
BamTools
BamTools is a project that provides both a C++ API and a command-line toolkit for reading, writing, and manipulating BAM (genome alignment) files.
module load bamtools/2.4.1
module load bamtools/2.5.1
For more information visit https://github.com/pezmaster31/bamtools/wiki
Bartender
Bartender is a C++ tool designed to process random barcode data. Bartender is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY.
It currently has three functionalities.
- It extracts barcodes from FASTA or FASTQ files.
- It clusters barcode reads and counts the frequency of each cluster.
- It generates count trajectories for time-course data.
module load bartender/1.1
Example
/share/Apps/examples/bartender
For more information visit https://github.com/LaoZZZZZ/bartender-1.1
BayeScan
BayeScan aims at identifying candidate loci under natural selection from genetic data, using differences in allele frequencies between populations. BayeScan is based on the multinomial-Dirichlet model.
module load bayescan/2.1
For more information visit http://cmpg.unibe.ch/software/BayeScan/
BayesTraits
BayesTraits is a computer package for performing analyses of trait evolution among groups of species for which a phylogeny or sample of phylogenies is available. This new package incorporates our earlier and separate programs Multistate, Discrete and Continuous. BayesTraits can be applied to the analysis of traits that adopt a finite number of discrete states, or to the analysis of continuously varying traits. Hypotheses can be tested about models of evolution, about ancestral states and about correlations among pairs of traits.
module load bayestraits
For more information visit http://www.evolution.rdg.ac.uk/BayesTraitsV3.0.2/BayesTraitsV3.0.2.html
bgc
bgc implements Bayesian estimation of genomic clines to quantify introgression at many loci.
module load bgc/1.03
For more information visit https://sites.google.com/site/bgcsoftware/
BLAST
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
module load blast-plus
For more information visit https://blast.ncbi.nlm.nih.gov/Blast.cgi
BLAT
BLAT is a bioinformatics tool that performs rapid mRNA/DNA and cross-species protein alignments. BLAT is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. (Source: Kent, W.J. 2002. BLAT -- The BLAST-Like Alignment Tool. Genome Research 12(4): 656-664.)
BLAT is not BLAST. DNA BLAT works by keeping an index of the entire genome (but not the genome itself) in memory. Since the index takes up a bit less than a gigabyte of RAM, BLAT can deliver high performance on a reasonably priced Linux box.
module load blat
For more information, visit https://genome.ucsc.edu/cgi-bin/hgBlat or http://www.kentinformatics.com/
Bowtie
Bowtie is an ultrafast, memory-efficient aligner for short DNA sequences (reads) from next-generation sequencers. Please cite: Langmead B, et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.
module load bowtie
For more information visit https://sourceforge.net/projects/bowtie-bio/
Bowtie2
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
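The FM index mentioned above can be illustrated with a toy example. The following is a minimal sketch, not Bowtie 2's implementation: it builds the Burrows-Wheeler transform with a naive suffix sort and stores full occurrence tables, whereas a production index uses far more compact, sampled structures. The function names `build_fm_index` and `count_occurrences` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_fm_index(text: str):
    """Build a toy FM index (first-row offsets, occurrence table, length)."""
    text += "$"  # sentinel character; sorts before A, C, G and T
    # Naive suffix array: fine for a toy example, far too slow for a genome.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT is the character preceding each sorted suffix (cyclically).
    bwt = "".join(text[i - 1] for i in suffix_array)

    # first_row[c] = number of characters in text strictly smaller than c.
    counts = Counter(text)
    first_row, total = {}, 0
    for ch in sorted(counts):
        first_row[ch] = total
        total += counts[ch]

    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return first_row, occ, len(bwt)


def count_occurrences(index, pattern: str) -> int:
    """Count exact matches of pattern using backward search."""
    first_row, occ, n = index
    lo, hi = 0, n  # half-open range of matching rows in the sorted suffixes
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        lo = first_row[ch] + occ[ch][lo]
        hi = first_row[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = build_fm_index("ACGTACGTAC")
    print(count_occurrences(index, "AC"))
```

The search touches the index once per pattern character, independent of the reference length, which is the property that makes this family of indexes attractive for read alignment.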
module load bowtie2/2.3.4.1
module load bowtie2/2.3.5
For more information visit http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
BWA
BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.
module load bwa/0.7.15
module load bwa/0.7.17
For more information visit https://github.com/lh3/bwa
Cactus
Cactus is a reference-free whole-genome multiple alignment program. The principal algorithms are described here: https://doi.org/10.1101/gr.123356.111
Canu
Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION).
module load canu
For more information visit https://canu.readthedocs.io/en/latest/
eXpress
eXpress is a streaming DNA/RNA sequence quantification tool. It has initially been tested for RNA-Seq transcriptome quantification but can be used in any application where abundances of target sequences need to be estimated from short reads sequenced from them. More details, installation instructions, and the manual can be found at http://bio.math.berkeley.edu/eXpress/
module load express
FastQC
A quality control tool for high throughput sequence data.
module load fastqc
For more information visit http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
FASTX-Toolkit
The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
module load fastx-toolkit/0.0.14
For more information visit http://hannonlab.cshl.edu/fastx_toolkit/index.html
FreeBayes
FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.
module load freebayes/1.1.0
For more information visit https://github.com/ekg/freebayes
GATK
A genomic analysis toolkit focused on variant discovery. The GATK is the industry standard for identifying SNPs and indels in germline DNA and RNAseq data. Its scope is now expanding to include somatic short variant calling, and to tackle copy number (CNV) and structural variation (SV). In addition to the variant callers themselves, the GATK also includes many utilities to perform related tasks such as processing and quality control of high-throughput sequencing data, and bundles the popular Picard toolkit.
These tools were primarily designed to process exomes and whole genomes generated with Illumina sequencing technology, but they can be adapted to handle a variety of other technologies and experimental designs. And although it was originally developed for human genetics, the GATK has since evolved to handle genome data from any organism, with any level of ploidy.
module load gatk
For more information visit https://gatk.broadinstitute.org/hc/en-us
Guppy
Hal
Produces multiple alignments and trees from genomic data. Hal is a phylogenetic pipeline. The alignments can be produced by a choice of four alignment programs and analyzed by a variety of phylogenetic programs. The Hal pipeline connects the programs BLASTP, MCL, user specified alignment programs, GBlocks, ProtTest and user specified phylogenetic programs to produce species trees.
JELLYFISH
JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence. JELLYFISH can count k-mers using an order of magnitude less memory and an order of magnitude faster than other k-mer counting packages by using an efficient encoding of a hash table and by exploiting the "compare-and-swap" CPU instruction to increase parallelism.
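To make the definition of a k-mer concrete, here is a minimal sketch of k-mer counting in plain Python. It only illustrates the concept and does not reflect how JELLYFISH works internally (JELLYFISH uses a compact, lock-free hash table); the function name `count_kmers` is hypothetical.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every substring of length k in a DNA sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.upper()
    # Slide a window of width k across the sequence, one base at a time.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    for kmer, n in sorted(count_kmers("ACGTACGT", 3).items()):
        print(kmer, n)
```

A sequence of length n contains n - k + 1 overlapping k-mers, so memory use grows quickly with the number of distinct k-mers, which is the cost that dedicated tools are designed to reduce.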
JELLYFISH is a command-line program that reads FASTA and multi-FASTA files containing DNA sequences. It outputs its k-mer counts in a binary format, which can be translated into a human-readable text format using the "jellyfish dump" command. See the documentation below for more details.
module load jellyfish
For more information visit http://www.cbcb.umd.edu/software/jellyfish/
Kallisto
kallisto is a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment. On benchmarks with standard RNA-Seq data, kallisto can quantify 30 million human reads in less than 3 minutes on a Mac desktop computer using only the read sequences and a transcriptome index that itself takes less than 10 minutes to build. Pseudoalignment of reads preserves the key information needed for quantification, and kallisto is therefore not only fast, but also as accurate as existing quantification tools. In fact, because the pseudoalignment procedure is robust to errors in the reads, in many benchmarks kallisto significantly outperforms existing tools.
module load kallisto
kallisto is described in detail in:
Nicolas L Bray, Harold Pimentel, Páll Melsted and Lior Pachter, Near-optimal probabilistic RNA-seq quantification, Nature Biotechnology 34, 525–527 (2016), doi:10.1038/nbt.3519
For more information visit http://pachterlab.github.io/kallisto
Miniasm
Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by minimap) as input and outputs an assembly graph in the GFA format. Different from mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final unitig sequences. Thus the per-base error rate is similar to the raw input reads.
module load miniasm
For more information visit https://github.com/lh3/miniasm
Minimap2
Minimap2 is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database. Typical use cases include: (1) mapping PacBio or Oxford Nanopore genomic reads to the human genome; (2) finding overlaps between long reads with error rate up to ~15%; (3) splice-aware alignment of PacBio Iso-Seq or Nanopore cDNA or Direct RNA reads against a reference genome; (4) aligning Illumina single- or paired-end reads; (5) assembly-to-assembly alignment; (6) full-genome alignment between two closely related species with divergence below ~15%.
For ~10kb noisy read sequences, minimap2 is tens of times faster than mainstream long-read mappers such as BLASR, BWA-MEM, NGMLR and GMAP. It is more accurate on simulated long reads and produces biologically meaningful alignments ready for downstream analyses. For >100bp Illumina short reads, minimap2 is three times as fast as BWA-MEM and Bowtie2, and as accurate on simulated data. Detailed evaluations are available from the minimap2 paper or the preprint.
module load minimap2
For more information visit https://github.com/lh3/minimap2
NGSTools
NGS (Next-Generation Sequencing) technologies have revolutionised population genetic research by enabling unparalleled data collection from the genomes or subsets of genomes from many individuals. Current technologies produce short fragments of sequenced DNA called reads that are either de novo assembled or mapped to a pre-existing reference genome. This leads to chromosomal positions being sequenced a variable number of times across the genome. This parameter is usually referred to as the sequencing depth. Individual genotypes are then inferred from the proportion of nucleotide bases covering each site after the reads have been aligned.
Low sequencing depth and high error rates stemming from base calling and mapping errors can cause SNP (Single Nucleotide Polymorphism) and genotype calling from NGS data to be associated with considerable statistical uncertainty. Probabilistic models, which take these errors into account, have been proposed to accurately assign genotypes and estimate allele frequencies (e.g. Nielsen et al., 2012; for a review Nielsen et al., 2011).
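The kind of probabilistic model described above can be sketched with a deliberately simplified genotype likelihood calculation. The assumptions here are ours and are not taken from ngsTools or ANGSD: a diploid individual, independent reads, a single uniform per-base error rate, and errors equally likely to produce any of the three other bases. The function name `genotype_log_likelihoods` is hypothetical.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"


def genotype_log_likelihoods(observed_bases: str, error_rate: float = 0.01) -> dict:
    """Return log P(observed bases | genotype) for all 10 diploid genotypes."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")
    log_liks = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        total = 0.0
        for base in observed_bases.upper():
            if base not in BASES:
                continue  # skip ambiguous calls such as N
            # Each read samples one of the two alleles with probability 1/2,
            # then is read correctly (1 - e) or miscalled as this base (e / 3).
            p1 = 1 - error_rate if base == a1 else error_rate / 3
            p2 = 1 - error_rate if base == a2 else error_rate / 3
            total += math.log(0.5 * p1 + 0.5 * p2)
        log_liks[a1 + a2] = total
    return log_liks


if __name__ == "__main__":
    # Bases observed at one site across four reads.
    liks = genotype_log_likelihoods("AAGG")
    for genotype, ll in sorted(liks.items(), key=lambda item: -item[1]):
        print(genotype, round(ll, 3))
```

At low depth the likelihoods of competing genotypes stay close together, which is why carrying them through the analysis can be preferable to committing to a single called genotype.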
ngsTools is a collection of programs for population genetics analyses from NGS data, taking into account the statistical uncertainty of the data. The methods implemented in these programs do not rely on SNP or genotype calling, and are particularly suitable for low sequencing depth data. An application note illustrating its use has been published (Fumagalli et al., 2014).
module load ngstools
For more information visit https://github.com/mfumagalli/ngsTools
PAML
PAML is a package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood.
module load paml
For more information visit http://abacus.gene.ucl.ac.uk/software/paml.html
PEAR
PEAR is an ultrafast, memory-efficient and highly accurate paired-end read merger. It is fully parallelized and can run with as little as a few kilobytes of memory.
PEAR evaluates all possible paired-end read overlaps without requiring the target fragment size as input. In addition, it implements a statistical test for minimizing false-positive results. Together with a highly optimized implementation, it can merge millions of paired-end reads within a couple of minutes on a standard desktop computer.
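The idea of evaluating every possible overlap between the two reads of a pair can be sketched as follows. This is an illustration under simplifying assumptions, not PEAR's algorithm: it ignores base qualities, scores overlaps by mismatch rate alone, omits the statistical test, and only handles fragments longer than a single read. The names `merge_pair`, `min_overlap` and `max_mismatch_rate` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward: str, reverse: str,
               min_overlap: int = 10, max_mismatch_rate: float = 0.1):
    """Merge a read pair by trying every overlap length.

    Returns the merged sequence, or None if no overlap is good enough.
    """
    rc = reverse_complement(reverse)
    best = None  # (mismatch_rate, -overlap): ties prefer the longer overlap
    for overlap in range(min_overlap, min(len(forward), len(rc)) + 1):
        # Align the end of the forward read against the start of the
        # reverse-complemented reverse read.
        mismatches = sum(a != b for a, b in zip(forward[-overlap:], rc[:overlap]))
        rate = mismatches / overlap
        if rate <= max_mismatch_rate:
            candidate = (rate, -overlap)
            if best is None or candidate < best:
                best = candidate
    if best is None:
        return None
    overlap = -best[1]
    return forward + rc[overlap:]


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTTAACCGGTATCGATC"
    fwd = fragment[:20]
    rev = reverse_complement(fragment[8:])
    print(merge_pair(fwd, rev) == fragment)
```

Because no fragment size is supplied, the loop has to consider every overlap length, which mirrors the description above of not requiring the target fragment size as input.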
module load pear
For more information visit https://cme.h-its.org/exelixis/web/software/pear/
PHAST
Phylogenetic Analysis with Space/Time models (PHAST) is a freely available software package consisting of a collection of command-line programs and supporting libraries for comparative and evolutionary genomics. Best known as the search engine behind the Conservation tracks in the University of California, Santa Cruz (UCSC) Genome Browser, PHAST also includes several tools for phylogenetic modeling, functional element identification, as well as utilities for manipulating alignments, trees and genomic annotations.
module load phast
For more information visit http://compgen.cshl.edu/phast/index.php
Pilon
Pilon is a software tool which can be used to:
- Automatically improve draft assemblies
- Find variation among strains, including large event detection
Pilon requires as input a FASTA file of the genome along with one or more BAM files of reads aligned to the input FASTA file. Pilon uses read alignment analysis to identify inconsistencies between the input genome and the evidence in the reads. It then attempts to make improvements to the input genome, including:
- Single base differences
- Small indels
- Larger indel or block substitution events
- Gap filling
- Identification of local misassemblies, including optional opening of new gaps
Pilon then outputs a FASTA file containing an improved representation of the genome from the read data and an optional VCF file detailing variation seen between the read data and the input genome.
To aid manual inspection and improvement by an analyst, Pilon can optionally produce tracks that can be displayed in genome viewers such as IGV and GenomeView, and it reports other events (such as possible large collapsed repeat regions) in its standard output.
module load pilon
For more information visit https://github.com/broadinstitute/pilon/wiki
Porechop
Porechop is a tool for finding and removing adapters from Oxford Nanopore reads. Adapters on the ends of reads are trimmed off, and when a read has an adapter in its middle, it is treated as chimeric and chopped into separate reads. Porechop performs thorough alignments to effectively find adapters, even at low sequence identity.
Porechop also supports demultiplexing of Nanopore reads that were barcoded with the Native Barcoding Kit, PCR Barcoding Kit or Rapid Barcoding Kit.
module load porechop
For more information visit https://github.com/rrwick/Porechop
Relion
RELION (for REgularised LIkelihood OptimisatioN) is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy. It is developed in the research group of Sjors Scheres at the MRC Laboratory of Molecular Biology.
module load relion
For more information visit https://github.com/3dem/relion
RSEM
RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. The RSEM package provides a user-friendly interface and supports threads for parallel computation of the EM algorithm, single-end and paired-end read data, quality scores, variable-length reads and RSPD estimation. In addition, it provides posterior mean and 95% credibility interval estimates for expression levels. For visualization, it can generate BAM and Wiggle files in both transcript and genomic coordinates. Genomic-coordinate files can be visualized by both the UCSC Genome Browser and the Broad Institute's Integrative Genomics Viewer (IGV). Transcript-coordinate files can be visualized by IGV. RSEM also has its own scripts to generate transcript read depth plots in PDF format. A unique feature of RSEM is that the read depth plots can be stacked, with read depth contributed by unique reads shown in black and read depth contributed by multi-reads shown in red. In addition, models learned from data can also be visualized. Last but not least, RSEM contains a simulator.
module load rsem
For more information visit https://github.com/deweylab/RSEM
Salmon
Salmon is a tool for quantifying the expression of transcripts using RNA-seq data. Salmon uses new algorithms (specifically, coupling the concept of quasi-mapping with a two-phase inference procedure) to provide accurate expression estimates very quickly (i.e. wicked-fast) while using little memory. Salmon performs its inference using an expressive and realistic model of RNA-seq data that takes into account experimental attributes and biases commonly observed in real RNA-seq data.
module load salmon
For more information visit http://combine-lab.github.io/salmon/
SAMTools
SAMtools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
module load samtools/1.4
module load samtools/1.9
For more information visit http://www.htslib.org/
SnpEff
SnpEff is a variant annotation and effect prediction tool. It annotates and predicts the effects of genetic variants (such as amino acid changes).
module load snpeff
For more information visit http://snpeff.sourceforge.net/
STAR
STAR is an ultrafast universal RNA-seq aligner.
module load star
For more information visit https://github.com/alexdobin/STAR
tabix
Generic indexer for TAB-delimited genome position files.
module load tabix/2013-12-16
For more information visit https://github.com/samtools/tabix
trimmomatic
A flexible read trimming tool for Illumina NGS data.
module load trimmomatic/0.36
module load trimmomatic/0.38
For more information visit http://www.usadellab.org/cms/?page=trimmomatic
Trinity
Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes.
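The de Bruijn graphs mentioned above can be illustrated with a small construction example. This is a minimal sketch of the general data structure, not Trinity's implementation: nodes are (k-1)-mers, and each k-mer seen in a read adds an edge from its prefix to its suffix. The function name `build_de_bruijn_graph` is hypothetical.

```python
from collections import defaultdict


def build_de_bruijn_graph(reads, k: int) -> dict:
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in a read."""
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    graph = build_de_bruijn_graph(["ACGTC", "CGTCA"], 3)
    for node in sorted(graph):
        print(node, "->", sorted(graph[node]))
```

Overlapping reads collapse onto shared nodes, and a node with more than one outgoing edge marks a point where sequences diverge, for example between alternative isoforms or paralogous genes.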
module load trinity
For more information visit http://trinityrnaseq.github.io/
VCFTools
VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.
module load vcftools/0.1.14
module load vcftools/0.1.15
For more information visit http://vcftools.sourceforge.net/