This page highlights especially useful bioinformatics web services. Given the sheer volume of tools available, it can never be comprehensive; instead, it focuses on quick, useful widgets that can save you time and effort.
Some other groups maintain comprehensive tool databases.
Both of these sites have huge lists and descriptions of useful widgets.
An excellent, maintained guide to bioinformatics programs and online resources.
Not an online service, but a great free phylogenetic analysis program.
A great repository of R packages for analyzing microbiome data.
Benchmarking Studies and Methods Overviews
(work in progress)
There are often many competing programs for performing the same task. Benchmarking studies are invaluable resources for selecting which program to use.
Also demonstrates the superiority of denoising approaches over classical alignment-based clustering approaches.
Spoiler: metaSPAdes, MEGAHIT, and IDBA-UD all perform well, but which is best depends on the community. Other assemblers perform very poorly in comparison.
The most careful MAFFT algorithm (L-INS-i) is the most reliable.
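For local runs, L-INS-i corresponds to MAFFT's `--localpair --maxiterate 1000` options (local pairwise alignments plus up to 1000 rounds of iterative refinement). The sketch below only builds and optionally runs that command from Python; it assumes `mafft` is installed and on your PATH, and the file names are hypothetical.

```python
import shutil
import subprocess


def linsi_command(input_fasta: str, threads: int = 1) -> list[str]:
    """Build a MAFFT L-INS-i command line for an unaligned FASTA file."""
    return [
        "mafft",
        "--localpair",           # all pairwise alignments computed locally
        "--maxiterate", "1000",  # up to 1000 cycles of iterative refinement
        "--thread", str(threads),
        input_fasta,
    ]


def run_linsi(input_fasta: str, output_fasta: str, threads: int = 1) -> None:
    """Run L-INS-i; MAFFT writes the alignment to stdout, which we save to a file."""
    if shutil.which("mafft") is None:
        raise FileNotFoundError("mafft was not found on PATH")
    with open(output_fasta, "w") as out:
        subprocess.run(linsi_command(input_fasta, threads), stdout=out, check=True)
```

Usage: `run_linsi("my_seqs.fasta", "my_seqs.aln.fasta", threads=4)`. L-INS-i is slow on large data sets, which is part of why it is the most careful option.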
“a platform that uniformly and systematically screens for, retrieves, processes, and analyzes all available prokaryotic 16S rRNA gene amplicon datasets and use them to build sample-specific sequence databases and OTU-based profiles. The retrieved information can be used to address questions of relevance in microbial ecology, for example with respect to the occurrence of specific microorganisms in different ecosystems or to perform targeted diversity studies. For better comparison with existing data, imngs also offers a complete pipeline for de novo analysis of users’ own raw amplicon sequencing data generated using the Illumina technology.”
This is a potentially awesome tool: you can BLAST against all 16S SRA datasets, and IMNGS provides pre-computed OTU tables based on UPARSE clustering. The taxonomy is not great, but the potential here is amazing.
A database of microbial taxonomy and *phenotypes* from the available literature. Text-mined phenotypic data are also available as an option. Very taxonomically broad searches are currently not possible. Custom results can be downloaded as data tables for local processing.
A suite of fundamental sequence search and manipulation tools: aligning, annotating, translating, etc. This is my new go-to for general low-throughput work and exploratory analysis.
A huge number of useful tools. Of note is BPROM, a bacterial promoter search tool.
The Eukaryotic Pathogens Database Resources page is a Galaxy instance that hosts a variety of tools. Highlights include an OrthoMCL pipeline (Markov clustering of proteins) and many RNA-seq tools. Several other useful widgets are also available.
There are several web-based annotation systems to choose from. EggNOG is mentioned here because it is extremely fast and, in addition to its own functional annotations (NOGs), provides COG, KEGG, and Pfam annotations in one click (via the “sequence mapper” tool). It's a great first-line tool for exploring function and orthology.
A fast and free service for annotating proteins using a variety of systems (COG, Pfam, TIGRFAM, PRK). Users can submit many proteins at a time (I haven't found the upper limit yet!).
A “fast” KEGG Orthology annotation system, and the only reliable way I have found to get your own KEGG annotations (EggNOG will do it, but it has limitations). The turnaround time is often several days, but there is no limit on how many proteins can be submitted. A similar system for smaller protein sets, BlastKOALA, is also available, as is a newer HMM-profile-based system called KofamKOALA.
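Once KO assignments come back, a common first step is tallying how many proteins map to each KO. The sketch below assumes (not confirmed here for every KOALA service) a plain-text download with one protein per line, a tab separating the protein ID from its KO number, and an empty or missing second column for unannotated proteins.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_ko_file(lines: Iterable[str]) -> Tuple[Counter, int]:
    """Tally KO identifiers from 'protein_id<TAB>KO' lines (assumed format).

    Returns (counts per KO, number of proteins without a KO).
    """
    counts: Counter = Counter()
    unannotated = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        ko = fields[1].strip() if len(fields) > 1 else ""
        if ko:
            counts[ko] += 1
        else:
            unannotated += 1  # protein listed but not assigned a KO
    return counts, unannotated
```

Usage, with a hypothetical file name: `with open("user_ko.txt") as fh: counts, n_missing = summarize_ko_file(fh)`. Check the actual download format before relying on this; KofamKOALA in particular reports extra columns such as scores and thresholds.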
The Microbial Genomes Atlas (MiGA) is a web tool (also downloadable for local use) that performs rapid whole-genome taxonomic exploration via ANI (average nucleotide identity) and AAI (average amino acid identity) calculations. The web server maintains several very useful databases, such as the RefSeq representatives and the TARA Oceans datasets.
This group developed GTDB-Tk, which is described as a genome classification tool. Like MiGA, it should handle the taxonomy of genomes for which 16S rRNA genes are not available. They mention a web server that runs GTDB-Tk, but it is currently offline (Oct 2019). They also link to many other useful taxonomy programs.
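To make the AAI idea concrete: AAI is commonly computed as the mean percent identity of reciprocal best hits between two proteomes, after discarding weak or short matches (ANI follows the same idea on nucleotide fragments). This is a simplified sketch, not MiGA's implementation; it assumes best hits in each direction have already been computed (e.g. with BLAST), and the 30% identity / 70% aligned-fraction cutoffs are assumed defaults.

```python
from statistics import mean
from typing import Dict, List, Tuple

# protein ID -> (best hit ID, percent identity, aligned fraction of the query)
BestHits = Dict[str, Tuple[str, float, float]]


def reciprocal_best_hits(a_to_b: BestHits, b_to_a: BestHits) -> List[Tuple[str, str, float, float]]:
    """Keep only pairs where each protein is the other's best hit."""
    pairs = []
    for query, (subject, identity, fraction) in a_to_b.items():
        back = b_to_a.get(subject)
        if back is not None and back[0] == query:
            pairs.append((query, subject, identity, fraction))
    return pairs


def simple_aai(a_to_b: BestHits, b_to_a: BestHits,
               min_identity: float = 30.0, min_fraction: float = 0.7) -> float:
    """Mean percent identity over reciprocal best hits passing both filters."""
    identities = [
        identity
        for _, _, identity, fraction in reciprocal_best_hits(a_to_b, b_to_a)
        if identity >= min_identity and fraction >= min_fraction
    ]
    if not identities:
        raise ValueError("no reciprocal best hits pass the filters")
    return mean(identities)
```

Requiring reciprocity filters out one-sided matches (for example a protein whose best hit prefers a different paralog), which keeps the average focused on likely orthologs.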
The easiest way to generate high-quality, customized phylogenetic tree graphics from tree files (such as *.nwk).
Aligning very large and diverse sequence sets is a challenging computational task. The MAFFT team maintains a powerful, easy-to-use web server for their program, along with clear help files.
Free supercomputer access on the XSEDE system, with GUI-based interfaces to the best open-source alignment and tree-building tools. New algorithms are added regularly.
This group hosts a server that implements their new algorithm, an updated Felsenstein bootstrap adapted for large phylogenetic trees (100+ taxa). Just upload your best tree and the file of bootstrap trees, and the site will compute new “transfer bootstrap” values. The algorithm was described in a recent Nature Methods publication.
IQ-TREE is a fairly new program for building ML trees, with potentially many innovations over other available programs (e.g. RAxML). W-IQ-TREE is a free computer cluster and web interface devoted to this program; while it is easy to use and rarely congested, per-user RAM and run-time limits are quite tight. IQ-TREE was also recently installed at CIPRES, where wall times and RAM are very generous.
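The difference from the classical bootstrap can be sketched in a few lines. Classical support asks whether a branch appears *exactly* in each bootstrap tree; the transfer bootstrap instead asks how many taxa would have to be moved to make the closest bootstrap branch match, scaled by the size p of the branch's lighter side, so TBE = 1 minus the average of δ/(p − 1). This is a minimal illustration, not the server's code: it assumes each tree has already been reduced to a list of bipartitions (the set of taxa on one side of each branch) and skips Newick parsing entirely.

```python
from typing import FrozenSet, List

Split = FrozenSet[str]  # taxa on one side of a branch


def transfer_distance(a: Split, b: Split, taxa: FrozenSet[str]) -> int:
    """Minimum number of taxa to move so that bipartition a matches bipartition b."""
    diff = len(a ^ b)                    # mismatches with the sides aligned as given
    return min(diff, len(taxa) - diff)   # or with b's sides swapped


def tbe(branch: Split, boot_trees: List[List[Split]], taxa: FrozenSet[str]) -> float:
    """Transfer bootstrap expectation for one internal branch of the reference tree."""
    p = min(len(branch), len(taxa) - len(branch))  # size of the lighter side
    if p < 2:
        raise ValueError("TBE is defined for internal branches (lighter side >= 2 taxa)")
    if not boot_trees:
        raise ValueError("need at least one bootstrap tree")
    trivial = [frozenset([t]) for t in taxa]  # leaf branches exist in every tree
    scores = []
    for splits in boot_trees:
        delta = min(transfer_distance(branch, s, taxa) for s in list(splits) + trivial)
        scores.append(1 - delta / (p - 1))     # delta <= p - 1, so score is in [0, 1]
    return sum(scores) / len(scores)
```

For example, with taxa A to F, the branch {A, B, C} present exactly in one bootstrap tree and off by a single taxon ({A, B}) in another gets a classical support of 0.5 but a TBE of 0.75, because the near-miss still counts for partial credit.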
A database of biodegradation pathways, hand-curated, searchable by reaction or compound.
This confusingly named web service offers several tests for positive selection in gene alignments, similar to those offered by the classic CodeML program. I haven't tried it yet, but it looks like a reasonable place to run casual single-gene queries.
A tool for core/pangenome analyses and for generating… Venn diagrams! It uses the standard MCL (Markov Cluster) algorithm to delineate orthologs.
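Once proteins have been grouped into ortholog clusters, the core/pangenome breakdown and the Venn-diagram regions are simple set operations. A minimal sketch, assuming the clustering output has already been reduced to a mapping from a (hypothetical) cluster ID to the set of genomes with at least one member in that cluster:

```python
from collections import Counter
from typing import Dict, FrozenSet, Set, Tuple


def pangenome_partition(clusters: Dict[str, Set[str]],
                        genomes: Set[str]) -> Tuple[Set[str], Set[str], Dict[str, Set[str]]]:
    """Split clusters into core (all genomes), accessory (some) and unique (one genome).

    Assumes every genome named in `clusters` is also listed in `genomes`.
    """
    core: Set[str] = set()
    accessory: Set[str] = set()
    unique: Dict[str, Set[str]] = {g: set() for g in genomes}
    for cluster_id, present in clusters.items():
        if not present:
            continue  # ignore empty clusters
        if present == genomes:
            core.add(cluster_id)
        elif len(present) == 1:
            (only_genome,) = present
            unique[only_genome].add(cluster_id)
        else:
            accessory.add(cluster_id)
    return core, accessory, unique


def venn_regions(clusters: Dict[str, Set[str]]) -> Counter:
    """Count clusters per exact combination of genomes (one count per Venn region)."""
    return Counter(frozenset(present) for present in clusters.values() if present)
```

The region whose key is the full genome set is the core genome; single-genome regions are the strain-specific genes.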
Last updated June 2, 2020 by RWMurdoch