Package List

ANGSD

ANGSD is a software package for analyzing next-generation sequencing data. It can handle a number of different input types, from mapped reads to imputed genotype probabilities. Most methods take genotype uncertainty into account instead of basing the analysis on called genotypes, which is especially useful for low- and medium-depth data. The software is written in C++ and has been used on large sample sizes.

Code Block

language	bash
title	Usage, version 0.921

module load angsd/0.921

...

Code Block

language	bash
title	Example: /share/Apps/examples/angsd
collapse	true

#!/bin/bash

#SBATCH -p lts
#SBATCH -t 60
#SBATCH -n 1
#SBATCH -c 5
#SBATCH -N 1

echo "This example downloads sample data if not present"

if [[ ! -d bams ]]; then
  # Download the sample BAMs only if the archive is missing, then always unpack it
  if [[ ! -f bams.tar.gz ]]; then
    wget http://popgen.dk/software/download/angsd/bams.tar.gz
  fi
  tar -xvzf bams.tar.gz
  module load samtools/1.10
  # Index each BAM file
  for i in bams/*.bam
  do
    samtools index "$i"
  done
  ls bams/*.bam > bam.filelist
  module unload samtools/1.10
fi

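module load angsd/2019-11-05

# Genotype likelihoods (-GL 1, SAMtools model) and allele frequencies for all sites
angsd -b bam.filelist -GL 1 -doMajorMinor 1 -doMaf 2 -P 5 -out all_sites

# Same analysis, keeping reads with mapping quality >= 30, bases with quality >= 20
# and sites with minor allele frequency >= 0.05 (separate -out so results are not overwritten)
angsd -b bam.filelist -GL 1 -doMajorMinor 1 -doMaf 2 -P 5 -minMapQ 30 -minQ 20 -minMaf 0.05 -out filtered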

For more information visit http://www.popgen.dk/angsd/index.php/ANGSD

BamTools

BamTools is a project that provides both a C++ API and a command-line toolkit for reading, writing, and manipulating BAM (genome alignment) files.

...

Code Block

language	bash
title	Example: /share/Apps/examples/bartender
collapse	true

#!/bin/bash

#SBATCH -p lts
#SBATCH -t 60
#SBATCH -n 1
#SBATCH -N 1

module load bartender

cd /share/Apps/examples/bartender

bartender_single_com -f random_small_data/2M_extracted_barcode.txt -o 2M_barcode -d 3

bartender_single_com -f random_small_data/2M_extracted_barcode_umi.txt -o 2M_barcode_umi -d 3

# Quote the quality threshold and barcode pattern so the shell does not expand ? and [...] as globs
bartender_extractor_com -f random_small_data/2M_test.fq -o 2M_extracted -q '?' -p 'TACC[4-7]AA[4-7]AA[4-7]TT[4-7]ATAA' -m 2 -d both

bartender_extractor_com -f random_small_data/2M_test.fq -o 2M_extracted -q '?' -p 'TACC[4-7]AA[4-7]AA[4-7]TT[4-7]ATAA' -m 2 --direction=forward -u 0,10

echo "Deleting Files created"
rm -rf 2M_barcode* 2M_extracted*

cd simulation_data

echo "This example requires 26GB RAM"

tar -xvzf simulated_data.tar.gz

bash clustering_test.sh
echo "Deleting Files created"
rm -rf cluster_result_new_barcode.csv  cluster_result_new_cluster.csv  cluster_result_new_quality.csv simulation_data.csv

...

For more information, visit https://genome.ucsc.edu/cgi-bin/hgBlat or http://www.kentinformatics.com/

Bowtie

Bowtie is an ultrafast, memory-efficient aligner for short DNA sequences (reads) from next-generation sequencers. Please cite: Langmead B, et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10:R25 (2009).

Code Block

language	bash

module load bowtie
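A minimal sketch of a single-end Bowtie run after `module load bowtie`, assuming a reference FASTA and a FASTQ file. The function name `bowtie_align_se` and all file names are hypothetical. `bowtie-build` creates the index once per reference, and `-S` asks `bowtie` for SAM output instead of its native format.

```bash
#!/bin/bash
# Hypothetical helper: index a reference and align single-end reads with Bowtie 1.
# Usage: bowtie_align_se REF_FASTA INDEX_PREFIX READS_FASTQ OUT_SAM [THREADS]
bowtie_align_se() {
  local ref=$1 index=$2 reads=$3 out=$4 threads=${5:-1}
  # Build the index files (INDEX_PREFIX.*.ebwt); stop if this fails.
  bowtie-build "$ref" "$index" || return
  # -p: worker threads; -S: write SAM to OUT_SAM.
  bowtie -p "$threads" -S "$index" "$reads" "$out"
}

# Example call from a batch script that requests "#SBATCH -c 4":
# bowtie_align_se ref.fa ref_index reads.fq reads.sam "${SLURM_CPUS_PER_TASK:-1}"
```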

For more information visit https://sourceforge.net/projects/bowtie-bio/

Bowtie2

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
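A minimal sketch of the usual Bowtie 2 workflow, building the FM index and then aligning paired-end reads, assuming the Bowtie2 module is loaded (its module name is not shown here). The function name and file names are hypothetical; `-x`, `-1`/`-2`, `-S` and `-p` are standard `bowtie2` options, and adding `--local` would switch to local alignment mode.

```bash
#!/bin/bash
# Hypothetical helper: build a Bowtie 2 index and align paired-end reads.
# Usage: bowtie2_align_pe REF_FASTA INDEX_PREFIX READS_1 READS_2 OUT_SAM [THREADS]
bowtie2_align_pe() {
  local ref=$1 index=$2 r1=$3 r2=$4 out=$5 threads=${6:-1}
  # Writes the FM index as INDEX_PREFIX.*.bt2; only needed once per reference.
  bowtie2-build "$ref" "$index" || return
  # -x: index prefix, -1/-2: mate files, -S: SAM output, -p: threads.
  bowtie2 -p "$threads" -x "$index" -1 "$r1" -2 "$r2" -S "$out"
}

# Example call from a batch script that requests "#SBATCH -c 8":
# bowtie2_align_pe genome.fa genome reads_1.fq reads_2.fq sample.sam "${SLURM_CPUS_PER_TASK:-1}"
```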

...

For more information visit https://github.com/lh3/bwa

Cactus

Cactus is a reference-free whole-genome multiple alignment program. The principal algorithms are described here: https://doi.org/10.1101/gr.123356.111

Canu

Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION).

Code Block

language	bash

module load canu
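A minimal sketch of a Canu run after `module load canu`, assuming raw Nanopore reads; the function name, assembly prefix and genome size are hypothetical. By default Canu submits its own jobs to the cluster scheduler; `useGrid=false` keeps it inside the current allocation and `maxThreads` caps its thread count. For PacBio data, `-pacbio-raw` replaces `-nanopore-raw`; newer Canu releases document `-nanopore`/`-pacbio` instead, so check the Canu documentation for the installed version. Contigs are written to `OUT_DIR/PREFIX.contigs.fasta`.

```bash
#!/bin/bash
# Hypothetical helper: assemble raw Nanopore reads with Canu inside one Slurm allocation.
# Usage: canu_assemble PREFIX OUT_DIR GENOME_SIZE READS [THREADS]
canu_assemble() {
  local prefix=$1 outdir=$2 gsize=$3 reads=$4 threads=${5:-1}
  # useGrid=false: run every stage in this job instead of submitting new jobs.
  # maxThreads: upper limit on threads used by any Canu stage.
  canu -p "$prefix" -d "$outdir" genomeSize="$gsize" \
    useGrid=false maxThreads="$threads" \
    -nanopore-raw "$reads"
}

# Example call (hypothetical 4.8 Mbp genome):
# canu_assemble ecoli ecoli_asm 4.8m reads.fq.gz "${SLURM_CPUS_PER_TASK:-1}"
```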

For more information visit https://canu.readthedocs.io/en/latest/

...

For more information visit http://www.cbcb.umd.edu/software/jellyfish/

Kallisto

kallisto is a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment. On benchmarks with standard RNA-Seq data, kallisto can quantify 30 million human reads in less than 3 minutes on a Mac desktop computer using only the read sequences and a transcriptome index that itself takes less than 10 minutes to build. Pseudoalignment of reads preserves the key information needed for quantification, and kallisto is therefore not only fast, but also as accurate as existing quantification tools. In fact, because the pseudoalignment procedure is robust to errors in the reads, in many benchmarks kallisto significantly outperforms existing tools.
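A minimal sketch of kallisto's two steps, indexing a transcriptome and pseudoaligning a paired-end sample, assuming the kallisto module is loaded (its module name is not shown here). The function and file names are hypothetical. `kallisto quant` writes `abundance.tsv`, `abundance.h5` and `run_info.json` into the output directory.

```bash
#!/bin/bash
# Hypothetical helper: build a kallisto index (if missing) and quantify one sample.
# Usage: kallisto_quant_pe TRANSCRIPTS_FASTA INDEX OUT_DIR READS_1 READS_2 [THREADS]
kallisto_quant_pe() {
  local fasta=$1 index=$2 outdir=$3 r1=$4 r2=$5 threads=${6:-1}
  # The index only needs to be built once per transcriptome.
  if [[ ! -f "$index" ]]; then
    kallisto index -i "$index" "$fasta" || return
  fi
  # -t: threads; the two read files are treated as a paired-end sample.
  kallisto quant -i "$index" -o "$outdir" -t "$threads" "$r1" "$r2"
}

# Example call:
# kallisto_quant_pe transcripts.fa transcripts.idx sample1 s1_1.fq.gz s1_2.fq.gz 4
```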

...

For more information visit http://pachterlab.github.io/kallisto

Miniasm

Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by minimap) as input and outputs an assembly graph in the GFA format. Unlike mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final unitig sequences, so the per-base error rate is similar to that of the raw input reads.

Code Block

language	bash

module load miniasm
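A minimal sketch of the pipeline described above, assuming `module load miniasm` plus the Minimap2 module for the all-vs-all overlap step; the function and file names are hypothetical. Because there is no consensus step, the FASTA extracted from the GFA segment (`S`) lines keeps the error rate of the raw reads.

```bash
#!/bin/bash
# Hypothetical helper: overlap reads with minimap2, lay them out with miniasm,
# and convert the unitigs in the GFA graph to FASTA.
# Usage: miniasm_assemble READS PREFIX [THREADS]
miniasm_assemble() {
  local reads=$1 prefix=$2 threads=${3:-1}
  # ava-ont: all-vs-all preset for Nanopore reads (ava-pb for PacBio).
  minimap2 -x ava-ont -t "$threads" "$reads" "$reads" > "$prefix.paf" || return
  # miniasm writes the assembly graph in GFA format to stdout.
  miniasm -f "$reads" "$prefix.paf" > "$prefix.gfa" || return
  # S lines carry the unitig name (field 2) and sequence (field 3).
  awk '/^S/{print ">"$2"\n"$3}' "$prefix.gfa" > "$prefix.fa"
}

# Example call:
# miniasm_assemble reads.fq asm "${SLURM_CPUS_PER_TASK:-1}"
```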

For more information visit https://github.com/lh3/miniasm

Minimap2

Minimap2 is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database. Typical use cases include: (1) mapping PacBio or Oxford Nanopore genomic reads to the human genome; (2) finding overlaps between long reads with error rate up to ~15%; (3) splice-aware alignment of PacBio Iso-Seq or Nanopore cDNA or Direct RNA reads against a reference genome; (4) aligning Illumina single- or paired-end reads; (5) assembly-to-assembly alignment; (6) full-genome alignment between two closely related species with divergence below ~15%.
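A minimal sketch of use case (1), mapping Nanopore reads to a reference and producing a sorted, indexed BAM, assuming the Minimap2 module and `samtools/1.10` (the SAMtools module used in the ANGSD example) are loaded. The function and file names are hypothetical. Other presets cover the remaining cases, e.g. `map-pb` for PacBio, `splice` for spliced cDNA or Direct RNA, `sr` for Illumina reads and `asm5` for assembly-to-assembly alignment.

```bash
#!/bin/bash
# Hypothetical helper: map long reads with minimap2, then sort and index with samtools.
# Usage: minimap2_map REF_FASTA READS OUT_PREFIX [THREADS]
minimap2_map() {
  local ref=$1 reads=$2 prefix=$3 threads=${4:-1}
  # -a: SAM output; -x map-ont: Nanopore genomic preset; -o: output file.
  minimap2 -a -x map-ont -t "$threads" -o "$prefix.sam" "$ref" "$reads" || return
  samtools sort -@ "$threads" -o "$prefix.bam" "$prefix.sam" || return
  samtools index "$prefix.bam"
}

# Example call:
# minimap2_map ref.fa reads.fq sample1 "${SLURM_CPUS_PER_TASK:-1}"
```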

...

For more information visit https://github.com/lh3/minimap2

NGSTools

NGS (Next-Generation Sequencing) technologies have revolutionised population genetic research by enabling unparalleled data collection from the genomes or subsets of genomes from many individuals. Current technologies produce short fragments of sequenced DNA called reads that are either de novo assembled or mapped to a pre-existing reference genome. As a result, different chromosomal positions are sequenced a variable number of times across the genome; the number of times a position is sequenced is usually referred to as the sequencing depth. Individual genotypes are then inferred from the proportion of nucleotide bases covering each site after the reads have been aligned.

...

For more information visit http://abacus.gene.ucl.ac.uk/software/paml.html

PHAST

Phylogenetic Analysis with Space/Time models (PHAST) is a freely available software package consisting of a collection of command-line programs and supporting libraries for comparative and evolutionary genomics. Best known as the search engine behind the Conservation tracks in the University of California, Santa Cruz (UCSC) Genome Browser, PHAST also includes several tools for phylogenetic modeling, functional element identification, as well as utilities for manipulating alignments, trees and genomic annotations.

Code Block

language	bash

module load phast

...

For more information visit https://github.com/rrwick/Porechop

Relion

RELION (for REgularised LIkelihood OptimisatioN) is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy. It is developed in the research group of Sjors Scheres at the MRC Laboratory of Molecular Biology.

Code Block

language	bash

module load relion

For more information visit https://github.com/3dem/relion

RSEM

RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. The RSEM package provides a user-friendly interface and supports threads for parallel computation of the EM algorithm, single-end and paired-end read data, quality scores, variable-length reads and read start position distribution (RSPD) estimation. In addition, it provides posterior mean and 95% credibility interval estimates for expression levels. For visualization, it can generate BAM and Wiggle files in both transcript and genomic coordinates. Genomic-coordinate files can be visualized by both the UCSC Genome Browser and the Broad Institute's Integrative Genomics Viewer (IGV). Transcript-coordinate files can be visualized by IGV. RSEM also has its own scripts to generate transcript read depth plots in PDF format. A unique feature of RSEM is that the read depth plots can be stacked, with read depth contributed by unique reads shown in black and by multi-reads shown in red. Models learned from data can also be visualized. Last but not least, RSEM contains a simulator.

Code Block

language	bash

module load rsem
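A minimal sketch of an RSEM run after `module load rsem`, assuming a genome FASTA with a GTF annotation and paired-end reads, and that the Bowtie 2 module is also loaded because `--bowtie2` selects it as the aligner. The function and file names are hypothetical. Estimates are written to `SAMPLE.genes.results` and `SAMPLE.isoforms.results`.

```bash
#!/bin/bash
# Hypothetical helper: prepare an RSEM reference and quantify one paired-end sample.
# Usage: rsem_quant_pe GENOME_FASTA GTF REF_NAME READS_1 READS_2 SAMPLE [THREADS]
rsem_quant_pe() {
  local genome=$1 gtf=$2 ref=$3 r1=$4 r2=$5 sample=$6 threads=${7:-1}
  # Extract transcript sequences using the GTF and build a Bowtie 2 index.
  rsem-prepare-reference --gtf "$gtf" --bowtie2 "$genome" "$ref" || return
  # Align the mates with Bowtie 2, then run the EM algorithm.
  rsem-calculate-expression -p "$threads" --paired-end --bowtie2 \
    "$r1" "$r2" "$ref" "$sample"
}

# Example call:
# rsem_quant_pe genome.fa genes.gtf rsem_ref s1_1.fq s1_2.fq sample1 "${SLURM_CPUS_PER_TASK:-1}"
```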

For more information visit https://github.com/deweylab/RSEM

Salmon

Salmon is a tool for quantifying the expression of transcripts using RNA-seq data. Salmon uses new algorithms (specifically, coupling the concept of quasi-mapping with a two-phase inference procedure) to provide accurate expression estimates very quickly (i.e. wicked-fast) while using little memory. Salmon performs its inference using an expressive and realistic model of RNA-seq data that takes into account experimental attributes and biases commonly observed in real RNA-seq data.

Code Block

language	bash

module load salmon
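A minimal sketch of indexing and quantification after `module load salmon`; the function and file names are hypothetical. `-l A` asks Salmon to infer the library type automatically, and per-transcript estimates end up in `OUT_DIR/quant.sf`.

```bash
#!/bin/bash
# Hypothetical helper: index a transcriptome and quantify one paired-end sample.
# Usage: salmon_quant_pe TRANSCRIPTS_FASTA INDEX_DIR READS_1 READS_2 OUT_DIR [THREADS]
salmon_quant_pe() {
  local fasta=$1 index=$2 r1=$3 r2=$4 outdir=$5 threads=${6:-1}
  # Build the quasi-mapping index once per transcriptome.
  salmon index -t "$fasta" -i "$index" -p "$threads" || return
  # -l A: automatic library type detection; -p: threads.
  salmon quant -i "$index" -l A -1 "$r1" -2 "$r2" -p "$threads" -o "$outdir"
}

# Example call:
# salmon_quant_pe transcripts.fa salmon_index s1_1.fq.gz s1_2.fq.gz s1_quant 8
```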

For more information visit http://combine-lab.github.io/salmon/

SAMTools

SAMtools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
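A minimal sketch of the sorting and indexing utilities, assuming a SAMtools module is loaded (for example `samtools/1.10`, as in the ANGSD example); the function and file names are hypothetical. `samtools sort` accepts SAM input directly, and `flagstat` prints a quick summary of the result.

```bash
#!/bin/bash
# Hypothetical helper: convert SAM to a coordinate-sorted, indexed BAM and
# print basic alignment statistics.
# Usage: sam_to_sorted_bam IN_SAM OUT_PREFIX [THREADS]
sam_to_sorted_bam() {
  local sam=$1 prefix=$2 threads=${3:-1}
  # -@: additional threads for sorting/compression.
  samtools sort -@ "$threads" -o "$prefix.sorted.bam" "$sam" || return
  # Creates PREFIX.sorted.bam.bai for random access by region.
  samtools index "$prefix.sorted.bam" || return
  samtools flagstat "$prefix.sorted.bam"
}

# Example call:
# sam_to_sorted_bam sample.sam sample "${SLURM_CPUS_PER_TASK:-1}"
```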

...

For more information visit http://snpeff.sourceforge.net/

STAR

STAR is an ultrafast universal RNA-seq aligner.

Code Block

language	bash

module load star
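A minimal sketch of STAR's two stages after `module load star`: generating a genome index with splice junctions from a GTF, then aligning paired-end reads to a coordinate-sorted BAM. The function and file names are hypothetical. The BAM is written as `PREFIXAligned.sortedByCoord.out.bam`; for gzipped FASTQ input, add `--readFilesCommand zcat`. Index generation needs a lot of memory, so request enough in the job script.

```bash
#!/bin/bash
# Hypothetical helper: build a STAR genome index and align paired-end reads.
# Usage: star_align_pe GENOME_FASTA GTF GENOME_DIR READS_1 READS_2 OUT_PREFIX [THREADS]
star_align_pe() {
  local fasta=$1 gtf=$2 gdir=$3 r1=$4 r2=$5 prefix=$6 threads=${7:-1}
  mkdir -p "$gdir" || return
  # Stage 1: genome index, with annotated splice junctions from the GTF.
  STAR --runMode genomeGenerate --runThreadN "$threads" --genomeDir "$gdir" \
    --genomeFastaFiles "$fasta" --sjdbGTFfile "$gtf" || return
  # Stage 2: align both mates and let STAR sort the BAM output.
  STAR --runThreadN "$threads" --genomeDir "$gdir" --readFilesIn "$r1" "$r2" \
    --outSAMtype BAM SortedByCoordinate --outFileNamePrefix "$prefix"
}

# Example call:
# star_align_pe genome.fa genes.gtf star_index s1_1.fq s1_2.fq sample1_ "${SLURM_CPUS_PER_TASK:-1}"
```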

For more information visit https://github.com/alexdobin/STAR

tabix

tabix is a generic indexer for TAB-delimited genome position files.
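A minimal sketch of indexing a VCF and retrieving one region, assuming `tabix` and `bgzip` are both available after loading the module (its name is not shown here); the function and file names are hypothetical. tabix requires a position-sorted file compressed with `bgzip` rather than plain `gzip`.

```bash
#!/bin/bash
# Hypothetical helper: BGZF-compress and index a sorted VCF, then query a region.
# Usage: index_and_query_vcf IN_VCF REGION
index_and_query_vcf() {
  local vcf=$1 region=$2
  # bgzip -c writes BGZF-compressed output to stdout.
  bgzip -c "$vcf" > "$vcf.gz" || return
  # -p vcf: use the VCF column layout; creates IN_VCF.gz.tbi.
  tabix -p vcf "$vcf.gz" || return
  # REGION has the form chr:start-end, e.g. 1:10000-20000.
  tabix "$vcf.gz" "$region"
}

# Example call:
# index_and_query_vcf calls.vcf 1:10000-20000
```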

...

For more information visit http://www.usadellab.org/cms/?page=trimmomatic

Trinity

Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes.

Code Block

language	bash

module load trinity
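A minimal sketch of a paired-end Trinity assembly after `module load trinity`; the function and file names are hypothetical. `--max_memory` (e.g. `50G`) limits memory in the steps where Trinity can, such as Jellyfish k-mer counting and sorting, and `--CPU` sets the thread count. Recent Trinity releases require the output directory name to contain "trinity", hence the default `trinity_out_dir`.

```bash
#!/bin/bash
# Hypothetical helper: de novo transcriptome assembly from paired-end FASTQ files.
# Usage: trinity_assemble_pe READS_1 READS_2 MAX_MEMORY [THREADS] [OUT_DIR]
trinity_assemble_pe() {
  local r1=$1 r2=$2 mem=$3 threads=${4:-1} outdir=${5:-trinity_out_dir}
  # --seqType fq: FASTQ input; --left/--right: mate 1 and mate 2 files.
  Trinity --seqType fq --left "$r1" --right "$r2" \
    --max_memory "$mem" --CPU "$threads" --output "$outdir"
}

# Example call:
# trinity_assemble_pe reads_1.fq reads_2.fq 50G "${SLURM_CPUS_PER_TASK:-1}"
```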

For more information visit http://trinityrnaseq.github.io/

VCFTools

VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.
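A minimal sketch of two common VCFtools tasks, filtering a gzipped VCF by minor allele frequency and reporting per-site allele frequencies, assuming the VCFtools module is loaded (its name is not shown here). The function and file names are hypothetical; `--recode` writes `PREFIX.recode.vcf` and `--freq` writes `PREFIX.frq`.

```bash
#!/bin/bash
# Hypothetical helper: MAF filter followed by an allele-frequency report.
# Usage: vcf_maf_filter IN_VCF_GZ OUT_PREFIX [MIN_MAF]
vcf_maf_filter() {
  local vcf=$1 prefix=$2 maf=${3:-0.05}
  # --maf: keep sites with minor allele frequency >= MIN_MAF;
  # --recode-INFO-all: keep all INFO fields in the filtered VCF.
  vcftools --gzvcf "$vcf" --maf "$maf" --recode --recode-INFO-all \
    --out "$prefix" || return
  # --freq: per-site allele frequencies for the filtered sites.
  vcftools --vcf "$prefix.recode.vcf" --freq --out "$prefix"
}

# Example call:
# vcf_maf_filter calls.vcf.gz calls_maf05 0.05
```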

...

Version	Old Version 32	New Version 33
Changes made by	Former user	Former user
Saved on	Mar 06, 2020	Mar 06, 2020


ANGSD

BamTools

Bowtie

Bowtie2

Cactus

Canu

Kallisto

Miniasm

Minimap2

NGSTools

PHAST

Relion

RSEM

Salmon

SAMTools

STAR

tabix

Trinity

VCFTools