This page highlights especially useful bioinformatics web services. Given the sheer volume of tools out there, it can never be comprehensive. Instead, it focuses on quick, useful widgets that can save you time and effort.
Some other groups work to maintain comprehensive tool databases.
“Resource Resources”
Online Bioinformatics Resources Collection
Both of these sites have huge lists and descriptions of useful widgets.
An excellent, maintained guide for bioinformatics programs and online resources
Not an online service, but a great free phylogenetic analysis program
https://github.com/microsud/Tools-Microbiome-Analysis
A great repository of R packages for analyzing microbiome data
Benchmarking Studies and Methods Overviews
(work in progress)
There are often many competing programs for performing the same task. Benchmarking studies are invaluable resources for selecting which program to use.
Comparing bioinformatic pipelines for microbial 16S rRNA amplicon sequencing
Comparison of Accuracy of Current Phylogenetic Tree-Building Programs
Comparison of Denoising Programs for 16S Amplicon Library Analysis
Also demonstrates the superiority of denoising approaches vs. classical alignment-based clustering approaches
Comparing Metagenome Assembly Tools
Spoiler: metaSPAdes, MEGAHIT, and IDBA-UD are all good, but which is superior depends on the community. Other assemblers are very poor in comparison.
Comparing Metatranscriptome Assembly Tools
Comparing and Contrasting Functional Annotation Systems
In-depth Comparison of KEGG and MetaCyc Systems
Microbiome Data as Compositional Data
Overview of Compositional Analyses for Amplicon Libraries and Transcriptomes
Overview of Microbiome Study Design Concepts
The most careful MAFFT algorithm (L-INS-i) is the most reliable
Overview of alpha-diversity metrics for microbiome data sets
The impact of PCR polymerase enzyme type on amplicon library error types and rates
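Two of the topics covered by the overviews above, alpha diversity and compositional analysis, come down to short formulas. The following is a minimal sketch, not the method of any of the listed papers: it computes the Shannon index and a centered log-ratio (CLR) transform from one sample's taxon counts. The function names and the pseudocount of 0.5 used to handle zero counts are assumptions made for illustration.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' (natural log) from raw taxon counts of one sample."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def clr_transform(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample.

    The pseudocount is an assumed, simple way to avoid log(0);
    other zero-replacement strategies exist.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


if __name__ == "__main__":
    sample = [120, 30, 0, 50]  # made-up counts for four taxa
    print(shannon_index(sample))
    print(clr_transform(sample))
```

CLR values always sum to zero within a sample, which is one reason they are analyzed with methods that respect the compositional nature of the data.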
Microbial Diversity
Integrated Microbial NGS Platform (IMNGS)
“a platform that uniformly and systematically screens for, retrieves, processes, and analyzes all available prokaryotic 16S rRNA gene amplicon datasets and use them to build sample-specific sequence databases and OTU-based profiles. The retrieved information can be used to address questions of relevance in microbial ecology, for example with respect to the occurrence of specific microorganisms in different ecosystems or to perform targeted diversity studies. For better comparison with existing data, imngs also offers a complete pipeline for de novo analysis of users’ own raw amplicon sequencing data generated using the Illumina technology.”
This is a potentially awesome tool: you can BLAST against all 16S SRA data sets, and IMNGS has pre-computed OTU tables based on UPARSE clustering. The taxonomy is not great, but the potential here is amazing.
A database of microbial taxonomy and *phenotypes* from available literature. There is also the option to get text-mined phenotypic data. It is currently not possible to run very taxonomically broad searches. Custom results can be downloaded as data tables for local processing.
Multifunctional Sites
A suite of fundamental sequence search and manipulation tools for aligning, annotating, translating, etc. This is my new go-to for general low-throughput work / exploratory analysis.
A huge number of useful tools. Of note is BPROM, a bacterial promoter search tool.
The Eukaryotic Pathogens Database Resources page is a Galaxy instance which hosts a variety of tools. Highlights are an OrthoMCL pipeline (Markov protein clustering) and many RNA-seq tools. There are several other useful widgets in here also.
HIV Database at Los Alamos National Labs
Many general-use tools are available here, mostly aimed at alignments and preparation for tree-building. A fun find was Pixel, which makes a picture from an alignment.
Functional Annotation
There are several web-based annotation systems out there to choose from. EggNOG is mentioned here because it is extremely fast and will provide, in addition to its own functional annotations (NOGs), COG, KEGG, and Pfam all in one click (via the “sequence mapper” tool). It’s a great first-line function and orthology exploration tool.
A fast and free service for annotating proteins using a variety of systems (COG, Pfam, TIGRFAM, PRK). Users can submit many proteins at a time (I haven’t found the upper limit yet!).
A “fast” KEGG orthology annotation system. The only reliable way I have found to get your own KEGG annotations… EggNOG will do it, but it has limitations. The turn-around time is often several days, but there is no limit on how many proteins can be submitted. A similar system for smaller protein sets, BlastKOALA, is also available. There is a new HMM-model-based system also, called KofamKOALA.
Genome Classification
The Microbial Genomes Atlas (MiGA) is a web tool (also downloadable and usable locally) which will perform rapid whole-genome taxonomy explorations via ANI and AAI calculations. The web server maintains several very useful databases, such as the RefSeq representatives and TARA Oceans datasets.
This group developed GTDB-Tk, which is supposedly a genome classifier tool. Like MiGA, this should handle taxonomy of genomes for which 16S rRNA genes are not available. They mention a web server that runs GTDB-Tk, but it was offline as of Oct 2019. They also have many links to other useful taxonomy programs.
Phylogeny
Interactive Tree of Life (iTOL)
The easiest way to generate high-quality, customized phylogenetic tree graphics from tree files (such as *.nwk).
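For reference, a Newick (.nwk) file is a single line of nested parentheses in which each pair groups the descendants of one internal node, optionally followed by a support value and a branch length. The sketch below is not part of any tree-drawing site. It simply pulls the leaf names out of a simple Newick string, which can be handy when preparing labels or annotation tables that need to match the tree. The example tree and the function name are made up, and quoted labels or bracketed comments are not handled.

```python
import re


def leaf_names(newick):
    """Return leaf labels from a simple Newick string, in order of appearance.

    A leaf label directly follows "(" or ","; labels after ")" are
    internal-node labels (often support values) and are skipped.
    Assumes unquoted labels without spaces.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


if __name__ == "__main__":
    # Hypothetical four-taxon tree with support values and branch lengths.
    tree = "((A:0.1,B:0.2)90:0.3,(C:0.1,D:0.4)75:0.2);"
    print(leaf_names(tree))
```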
Alignment of very large and diverse sequence sets is a challenging computational task. The MAFFT team maintains a powerful and easy-to-use web server for running their program. Clear help files are also provided.
Free supercomputer on the XSEDE system with GUI-based interfaces of all of the best open-source alignment and tree-building tools. New algorithms are added regularly.
BOOSTER-BOOtstrap support by TransfER
This group hosts a server that implements their new algorithm, an update of Felsenstein’s bootstrap adapted for large (100+) phylogenetic trees. Just upload your best tree and the file with bootstrap trees and the site will create new “transfer bootstrap” values. This algorithm was detailed in a recent Nature Methods publication.
IQ-TREE is a fairly new program for building ML trees. There are potentially many innovations over other available programs (e.g. RAxML). W-IQ-TREE is a free computer cluster and web interface devoted to this program; while easy to use and generally free from traffic, user RAM and time limits are quite narrow. IQ-TREE was also recently installed at CIPRES, where wall times and RAM are very generous.
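The idea behind the transfer bootstrap can be sketched in a few lines. This is a simplified illustration, not the BOOSTER code: each branch is represented as the set of taxa on one side of it, and each bootstrap tree as a list of such sets. Classical bootstrap only counts exact matches, while the transfer version asks how many taxa would have to be moved to turn the closest bootstrap branch into the reference branch. All function names and the toy data are hypothetical.

```python
def transfer_distance(side_a, side_b, n_taxa):
    """Number of taxa to move so that two bipartitions become identical."""
    differing = len(side_a ^ side_b)
    # A bipartition can be described by either of its sides, so also
    # consider the complement of side_b.
    return min(differing, n_taxa - differing)


def transfer_support(ref_side, bootstrap_trees, taxa):
    """Transfer bootstrap support for one reference branch.

    ref_side: set of taxa on one side of the reference branch.
    bootstrap_trees: list of trees, each a list of taxon sets (one per branch).
    taxa: set of all taxa in the analysis.
    """
    n_taxa = len(taxa)
    p = min(len(ref_side), n_taxa - len(ref_side))  # size of the smaller side
    if p < 2:
        raise ValueError("support is only meaningful for internal branches")
    indices = []
    for splits in bootstrap_trees:
        best = min(
            (transfer_distance(ref_side, s, n_taxa) for s in splits),
            default=p - 1,
        )
        # Leaf branches, omitted from the input here, cap the index at p - 1.
        indices.append(min(best, p - 1))
    return 1 - (sum(indices) / len(indices)) / (p - 1)


if __name__ == "__main__":
    taxa = set("ABCDEF")
    reference_branch = {"A", "B", "C"}
    boots = [
        [{"A", "B"}, {"A", "B", "C"}, {"E", "F"}],  # branch recovered exactly
        [{"A", "B"}, {"A", "B", "D"}, {"E", "F"}],  # off by one taxon (C)
    ]
    print(transfer_support(reference_branch, boots, taxa))
```

In this toy case the exact-match bootstrap would give 0.5 (one tree out of two), while the transfer version gives partial credit for the near miss.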
Other
A database of biodegradation pathways, hand-curated, searchable by reaction or compound.
This confusingly named web service offers several tests for positive selection in gene alignments, similar to the tests offered by the classic CodeML program. I haven’t tried it yet, but it is a reasonable place to run casual single gene queries.
A tool for doing core/pangenome analyses and generating… Venn Diagrams! Uses the standard MCL algorithm for delineating orthologs.
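For readers curious about what MCL does under the hood, here is a bare-bones sketch of the Markov clustering loop in plain Python. It is not the code used by any of the tools on this page, and real implementations use sparse matrices and pruning. The input is assumed to be a symmetric matrix of pairwise protein similarity scores; the function names, the self-loop weight of 1.0, and the default inflation of 2.0 are choices made for illustration.

```python
def normalize_columns(matrix):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    size = len(matrix)
    sums = [sum(matrix[r][c] for r in range(size)) for c in range(size)]
    return [[matrix[r][c] / sums[c] for c in range(size)] for r in range(size)]


def markov_cluster(similarity, inflation=2.0, iterations=50, tol=1e-6):
    """Cluster items from a symmetric similarity matrix with a basic MCL loop."""
    size = len(similarity)
    # Add self-loops so every column has a non-zero sum.
    matrix = [
        [1.0 if r == c else similarity[r][c] for c in range(size)]
        for r in range(size)
    ]
    matrix = normalize_columns(matrix)
    for _ in range(iterations):
        # Expansion: random-walk flow spreads out (matrix squared).
        expanded = [
            [sum(matrix[r][k] * matrix[k][c] for k in range(size))
             for c in range(size)]
            for r in range(size)
        ]
        # Inflation: strong flows are boosted, weak ones fade.
        matrix = normalize_columns(
            [[value ** inflation for value in row] for row in expanded]
        )
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for row in matrix:
        members = frozenset(c for c, value in enumerate(row) if value > tol)
        if members:
            clusters.add(members)
    return sorted(sorted(cluster) for cluster in clusters)


if __name__ == "__main__":
    # Toy example: two groups of three proteins with no similarity between groups.
    scores = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(markov_cluster(scores))
```

Raising the inflation value generally yields smaller, tighter clusters, which is the main knob people adjust when using MCL to delineate ortholog groups.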
Last updated June 2, 2020 by RWMurdoch