ANGSD
ANGSD is a software package for analyzing next-generation sequencing data. It accepts several input types, from mapped reads to imputed genotype probabilities. Most methods take genotype uncertainty into account instead of basing the analysis on called genotypes, which is especially useful for low- and medium-depth data. The software is written in C++ and has been used on large sample sizes.
Code Block | ||||
---|---|---|---|---|
| ||||
module load angsd/0.921 |
...
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
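#!/bin/bash
#SBATCH -p lts
#SBATCH -t 60
#SBATCH -n 1
#SBATCH -N 1
echo "This example downloads sample data if not present"
if [[ ! -d bams ]]; then
  # Fetch the tarball only if it is not already here, then always unpack it
  if [[ ! -f bams.tar.gz ]]; then
    wget http://popgen.dk/software/download/angsd/bams.tar.gz
  fi
  tar -xvzf bams.tar.gz
  # Index every BAM file and record the list of inputs for angsd
  module load samtools/1.10
  for i in bams/*.bam
  do
    samtools index "$i"
  done
  ls bams/*.bam > bam.filelist
  module unload samtools/1.10
fi
module load angsd/2019-11-05
# Estimate allele frequencies from genotype likelihoods using 5 threads
angsd -b bam.filelist -GL 1 -doMajorMinor 1 -doMaf 2 -P 5
# Same analysis with mapping-quality, base-quality, and minor-allele-frequency filters
angsd -b bam.filelist -GL 1 -doMajorMinor 1 -doMaf 2 -P 5 -minMapQ 30 -minQ 20 -minMaf 0.05
|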
For more information visit http://www.popgen.dk/angsd/index.php/ANGSD
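Both angsd calls above omit `-out`, so they are expected to write to ANGSD's default output prefix (assumed here to be `angsdput`), with the second run overwriting the first. The following is a minimal sketch, not part of the original example, for summarizing the resulting allele frequency file. The helper name `summarize_mafs` is hypothetical, and the `unknownEM` column name is an assumption about what `-doMaf 2` writes; the function looks the column up by header so another name can be passed instead.

```shell
#!/bin/bash
# Count sites and report the mean allele frequency in an ANGSD .mafs.gz file.
# Assumptions: the file has a one-line header, and the frequency column is
# named by the second argument (default: unknownEM).
summarize_mafs() {
    local mafs_file="$1"
    local freq_col="${2:-unknownEM}"
    gzip -dc "$mafs_file" | awk -v col="$freq_col" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; next }
        c       { n++; sum += $c }
        END     { if (n) printf "sites=%d mean_maf=%.4f\n", n, sum / n
                  else   print "sites=0" }'
}

# Example call; runs only if the assumed default output file exists.
if [ -f angsdput.mafs.gz ]; then
    summarize_mafs angsdput.mafs.gz
fi
```

Running the summary after each angsd call (or giving each call its own `-out` prefix) makes it easy to see how many sites the quality and frequency filters remove.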
BamTools
BamTools is a project that provides both a C++ API and a command-line toolkit for reading, writing, and manipulating BAM (genome alignment) files.
...
Code Block | ||||||
---|---|---|---|---|---|---|
| ||||||
#!/bin/bash
#SBATCH -p lts
#SBATCH -t 60
#SBATCH -n 1
#SBATCH -N 1
module load bartender
cd /share/Apps/examples/bartender
bartender_single_com -f random_small_data/2M_extracted_barcode.txt -o 2M_barcode -d 3
bartender_single_com -f random_small_data/2M_extracted_barcode_umi.txt -o 2M_barcode_umi -d 3
bartender_extractor_com -f random_small_data/2M_test.fq -o 2M_extracted -q '?' -p 'TACC[4-7]AA[4-7]AA[4-7]TT[4-7]ATAA' -m 2 -d both
bartender_extractor_com -f random_small_data/2M_test.fq -o 2M_extracted -q '?' -p 'TACC[4-7]AA[4-7]AA[4-7]TT[4-7]ATAA' -m 2 --direction=forward -u 0,10
echo "Deleting Files created"
rm -rf 2M_barcode*
cd simulation_data
echo "This example requires 26GB RAM"
tar -xvzf simulated_data.tar.gz
bash clustering_test.sh
echo "Deleting Files created"
rm -rf cluster_result_new_barcode.csv cluster_result_new_cluster.csv cluster_result_new_quality.csv simulation_data.csv
|
...
For more information, visit https://genome.ucsc.edu/cgi-bin/hgBlat or http://www.kentinformatics.com/
Bowtie
Bowtie is an ultrafast, memory-efficient aligner for short DNA sequences (reads) from next-generation sequencers. Please cite: Langmead B, et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology 10:R25 (2009).
Code Block | ||
---|---|---|
| ||
module load bowtie |
For more information visit https://sourceforge.net/projects/bowtie-bio/
Bowtie2
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
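As an illustration of the alignment modes mentioned above, the following sketch prints (rather than runs) a `bowtie2` command line for single-end or paired-end reads in either end-to-end or local mode. The helper name `bowtie2_cmd`, the `THREADS` variable, and all file names are hypothetical; the options used (`--local`, `--end-to-end`, `-p`, `-x`, `-1`, `-2`, `-U`, `-S`) are standard bowtie2 options.

```shell
#!/bin/bash
# Print a bowtie2 command line.
# Usage: bowtie2_cmd <index_prefix> <end-to-end|local> <out.sam> <reads_1> [reads_2]
# Note: this builds a plain string, so paths containing spaces are not handled.
bowtie2_cmd() {
    local index="$1" mode="$2" out="$3" r1="$4" r2="${5:-}"
    local reads
    if [ -n "$r2" ]; then
        reads="-1 $r1 -2 $r2"   # paired-end input
    else
        reads="-U $r1"          # unpaired input
    fi
    echo "bowtie2 --$mode -p ${THREADS:-1} -x $index $reads -S $out"
}

# The index is built once beforehand, for example:
#   bowtie2-build genome.fa genome
bowtie2_cmd genome local sample.sam reads_1.fq reads_2.fq
```

Printing the command first is a cheap way to check the options before pasting the line into a batch script.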
...
For more information visit https://github.com/lh3/bwa
Cactus
Cactus is a reference-free whole-genome multiple alignment program. The principal algorithms are described here: https://doi.org/10.1101/gr.123356.111
Canu
Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing (such as the PacBio RSII or Oxford Nanopore MinION).
Code Block | ||
---|---|---|
| ||
module load canu |
For more information visit https://canu.readthedocs.io/en/latest/
...
For more information visit http://www.cbcb.umd.edu/software/jellyfish/
Kallisto
kallisto is a program for quantifying abundances of transcripts from bulk and single-cell RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment. On benchmarks with standard RNA-Seq data, kallisto can quantify 30 million human reads in less than 3 minutes on a Mac desktop computer using only the read sequences and a transcriptome index that itself takes less than 10 minutes to build. Pseudoalignment of reads preserves the key information needed for quantification, and kallisto is therefore not only fast, but also as accurate as existing quantification tools. In fact, because the pseudoalignment procedure is robust to errors in the reads, in many benchmarks kallisto significantly outperforms existing tools.
...
For more information visit http://pachterlab.github.io/kallisto
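A typical kallisto run has two steps, building an index and then quantifying reads against it. The sketch below shows those two commands as comments with placeholder file names, followed by a small hypothetical helper, `top_tpm`, that lists the most abundant transcripts. It assumes the usual `abundance.tsv` layout with a header row and TPM in the fifth column (`target_id`, `length`, `eff_length`, `est_counts`, `tpm`).

```shell
#!/bin/bash
# Typical two-step run (placeholder file names; not executed here):
#   kallisto index -i transcripts.idx transcripts.fa
#   kallisto quant -i transcripts.idx -o quant_out reads_1.fastq.gz reads_2.fastq.gz

# List the N most abundant transcripts (name and TPM) from an abundance.tsv file.
# Assumes a header row and TPM in column 5.
top_tpm() {
    local abundance="$1" n="${2:-10}"
    local tab
    tab=$(printf '\t')
    # -g sorts general numeric values, including scientific notation
    tail -n +2 "$abundance" | LC_ALL=C sort -t "$tab" -k5,5gr | head -n "$n" | cut -f1,5
}

# Example call; runs only if the assumed output directory exists.
if [ -f quant_out/abundance.tsv ]; then
    top_tpm quant_out/abundance.tsv 5
fi
```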
Miniasm
Miniasm is a very fast OLC-based de novo assembler for noisy long reads. It takes all-vs-all read self-mappings (typically by minimap) as input and outputs an assembly graph in the GFA format. Different from mainstream assemblers, miniasm does not have a consensus step. It simply concatenates pieces of read sequences to generate the final unitig sequences. Thus the per-base error rate is similar to the raw input reads.
Code Block | ||
---|---|---|
| ||
module load miniasm |
For more information visit https://github.com/lh3/miniasm
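The overlap, layout, and sequence-extraction steps described above can be sketched as follows. The reads file name is a placeholder, and having `minimap2` available alongside `miniasm` (for example through a separate module) is an assumption; the pipeline only runs when both tools and the input file are found. The `gfa_to_fasta` helper is a hypothetical name for the common awk one-liner that turns GFA segment lines into FASTA records.

```shell
#!/bin/bash
# Convert the segment (S) lines of a GFA file into FASTA records.
# Reads from the named files, or from standard input if none are given.
gfa_to_fasta() {
    awk -F '\t' '$1 == "S" { print ">" $2; print $3 }' "$@"
}

# Full workflow sketch for Nanopore reads (placeholder input name).
reads=reads.fq
if command -v minimap2 >/dev/null 2>&1 && command -v miniasm >/dev/null 2>&1 && [ -f "$reads" ]; then
    # 1. All-vs-all read overlaps in PAF format
    minimap2 -x ava-ont "$reads" "$reads" | gzip -1 > reads.paf.gz
    # 2. Layout: build the assembly graph
    miniasm -f "$reads" reads.paf.gz > reads.gfa
    # 3. Extract unitig sequences from the graph
    gfa_to_fasta reads.gfa > unitigs.fa
fi
```

Because there is no consensus step, the unitigs in `unitigs.fa` carry roughly the error rate of the raw reads and usually need polishing before downstream use.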
Minimap2
Minimap2 is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database. Typical use cases include: (1) mapping PacBio or Oxford Nanopore genomic reads to the human genome; (2) finding overlaps between long reads with error rate up to ~15%; (3) splice-aware alignment of PacBio Iso-Seq or Nanopore cDNA or Direct RNA reads against a reference genome; (4) aligning Illumina single- or paired-end reads; (5) assembly-to-assembly alignment; (6) full-genome alignment between two closely related species with divergence below ~15%.
...
For more information visit https://github.com/lh3/minimap2
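Most of the use cases listed above correspond to a minimap2 preset selected with `-x`. The sketch below maps a few use-case labels to presets and prints an example command. The labels and the helper name `minimap2_preset` are hypothetical; the presets themselves (`map-pb`, `map-ont`, `ava-pb`, `ava-ont`, `splice`, `sr`, `asm5`) are minimap2's own. `asm5` suits closely related assemblies; more divergent genomes would likely call for `asm10` or `asm20`.

```shell
#!/bin/bash
# Map a use-case label (hypothetical names) to a minimap2 preset for -x.
minimap2_preset() {
    case "$1" in
        pacbio-genomic)   echo map-pb ;;
        nanopore-genomic) echo map-ont ;;
        pacbio-overlap)   echo ava-pb ;;
        nanopore-overlap) echo ava-ont ;;
        spliced)          echo splice ;;
        short-reads)      echo sr ;;
        assembly)         echo asm5 ;;
        *) echo "unknown use case: $1" >&2; return 1 ;;
    esac
}

# Print (not run) a SAM-output command for Nanopore genomic reads.
# ref.fa, reads.fq, and aln.sam are placeholder file names.
echo "minimap2 -ax $(minimap2_preset nanopore-genomic) ref.fa reads.fq > aln.sam"
```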
NGSTools
NGS (Next-Generation Sequencing) technologies have revolutionised population genetic research by enabling unparalleled data collection from the genomes or subsets of genomes from many individuals. Current technologies produce short fragments of sequenced DNA called reads that are either de novo assembled or mapped to a pre-existing reference genome. This leads to chromosomal positions being sequenced a variable number of times across the genome. This parameter is usually referred to as the sequencing depth. Individual genotypes are then inferred from the proportion of nucleotide bases covering each site after the reads have been aligned.
...
For more information visit http://abacus.gene.ucl.ac.uk/software/paml.html
PHAST
Phylogenetic Analysis with Space/Time models (PHAST) is a freely available software package consisting of a collection of command-line programs and supporting libraries for comparative and evolutionary genomics. Best known as the search engine behind the Conservation tracks in the University of California, Santa Cruz (UCSC) Genome Browser, PHAST also includes several tools for phylogenetic modeling, functional element identification, as well as utilities for manipulating alignments, trees and genomic annotations.
Code Block | ||
---|---|---|
| ||
module load phast |
...
For more information visit https://github.com/rrwick/Porechop
Relion
RELION (for REgularised LIkelihood OptimisatioN) is a stand-alone computer program for Maximum A Posteriori refinement of (multiple) 3D reconstructions or 2D class averages in cryo-electron microscopy. It is developed in the research group of Sjors Scheres at the MRC Laboratory of Molecular Biology.
Code Block | ||
---|---|---|
| ||
module load relion |
For more information visit https://github.com/3dem/relion
RSEM
RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. The RSEM package provides a user-friendly interface and supports threads for parallel computation of the EM algorithm, single-end and paired-end read data, quality scores, variable-length reads and RSPD estimation. In addition, it provides posterior mean and 95% credibility interval estimates for expression levels. For visualization, it can generate BAM and Wiggle files in both transcript coordinates and genomic coordinates. Genomic-coordinate files can be visualized by both the UCSC Genome Browser and the Broad Institute's Integrative Genomics Viewer (IGV). Transcript-coordinate files can be visualized by IGV. RSEM also has its own scripts to generate transcript read depth plots in PDF format. A unique feature of RSEM is that the read depth plots can be stacked, with read depth contributed by unique reads shown in black and read depth contributed by multi-reads shown in red. In addition, models learned from data can also be visualized. Last but not least, RSEM contains a simulator.
Code Block | ||
---|---|---|
| ||
module load rsem |
For more information visit https://github.com/deweylab/RSEM
Salmon
Salmon is a tool for quantifying the expression of transcripts using RNA-seq data. Salmon uses new algorithms (specifically, coupling the concept of quasi-mapping with a two-phase inference procedure) to provide accurate expression estimates very quickly (i.e. wicked-fast) while using little memory. Salmon performs its inference using an expressive and realistic model of RNA-seq data that takes into account experimental attributes and biases commonly observed in real RNA-seq data.
Code Block | ||
---|---|---|
| ||
module load salmon |
For more information visit http://combine-lab.github.io/salmon/
SAMTools
SAMtools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
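The operations named above can be chained into a short workflow. The sketch below sorts each input BAM, merges the sorted files, indexes the result, and writes a per-position pileup. It defaults to a dry run that only prints each command; the helper names `run` and `sort_merge_pileup`, the `DRY_RUN` variable, and all file names are hypothetical, and the subcommand options assume a samtools 1.x release.

```shell
#!/bin/bash
# Echo each command when DRY_RUN=1 (the default here); otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Usage: sort_merge_pileup <reference.fa> <in1.bam> [in2.bam ...]
# Note: file names containing spaces are not handled by this sketch.
sort_merge_pileup() {
    local ref="$1"; shift
    local bam out_bam sorted=""
    for bam in "$@"; do
        out_bam="${bam%.bam}.sorted.bam"
        run samtools sort -o "$out_bam" "$bam"      # coordinate-sort each input
        sorted="$sorted $out_bam"
    done
    run samtools merge merged.bam $sorted           # combine the sorted files
    run samtools index merged.bam                   # create merged.bam.bai
    run samtools mpileup -f "$ref" -o merged.pileup merged.bam  # per-position output
}

sort_merge_pileup ref.fa a.bam b.bam
```

Setting `DRY_RUN=0` after `module load samtools` would execute the commands instead of printing them.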
...
For more information visit http://snpeff.sourceforge.net/
STAR
STAR is an ultrafast universal RNA-seq aligner.
Code Block | ||
---|---|---|
| ||
module load star |
For more information visit https://github.com/alexdobin/STAR
tabix
tabix is a generic indexer for TAB-delimited genome position files.
...
For more information visit http://www.usadellab.org/cms/?page=trimmomatic
Trinity
Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes.
Code Block | ||
---|---|---|
| ||
module load trinity |
For more information visit http://trinityrnaseq.github.io/
VCFTools
VCFtools is a program package designed for working with VCF files, such as those generated by the 1000 Genomes Project. The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.
...