Bioinformatics
Scooped by Wei Shen

SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation

FASTA and FASTQ are basic and ubiquitous formats for storing nucleotide and protein sequences. Common manipulations of FASTA/Q files include converting, searching, filtering, deduplication, splitting, shuffling, and sampling. Existing tools implement only some of these manipulations, and not particularly efficiently, and some are available only for certain operating systems. Furthermore, the complicated installation of required packages and runtime environments can make these programs less user-friendly. This paper describes a cross-platform, ultrafast, comprehensive toolkit for FASTA/Q processing. SeqKit provides executable binary files for all major operating systems, including Windows, Linux, and Mac OS X, and can be used directly without any dependencies or pre-configuration. SeqKit demonstrates competitive performance in execution time and memory usage compared to similar tools. The efficiency and usability of SeqKit enable researchers to rapidly accomplish common FASTA/Q file manipulations. SeqKit is open source and available on GitHub at https://github.com/shenwei356/seqkit .
Wei Shen's insight:
It's really fast!
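Deduplication, one of the manipulations listed in the abstract, can be illustrated with a short Python sketch. This is not SeqKit's implementation (SeqKit itself is written in Go); the names `parse_fasta` and `dedup_by_sequence` are hypothetical, and the sketch assumes plain, uncompressed FASTA text, keeps the first record seen for each distinct sequence, and compares sequences case-insensitively by default.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence line belonging to the current record
    if header is not None:
        yield header, "".join(chunks)


def dedup_by_sequence(records: Iterable[Record], ignore_case: bool = True) -> Iterator[Record]:
    """Keep only the first record for each distinct sequence."""
    seen = set()
    for header, seq in records:
        key = seq.upper() if ignore_case else seq
        if key not in seen:
            seen.add(key)
            yield header, seq


if __name__ == "__main__":
    fasta = ">r1\nACGT\nACGT\n>r2 same sequence as r1\nacgtacgt\n>r3\nTTTT\n"
    for header, seq in dedup_by_sequence(parse_fasta(fasta)):
        print(header, seq)  # prints "r1 ACGTACGT" then "r3 TTTT"
```

Because both functions are generators, records stream through one at a time; only the set of already-seen sequences is held in memory. A tool built for very large files might store a hash of each sequence instead of the sequence itself to reduce that footprint.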

Scooped by Wei Shen

A novel algorithm for detecting multiple covariance and clustering of biological sequences

Single genetic mutations are always followed by a set of compensatory mutations. Thus, multiple changes commonly occur in biological sequences and play crucial roles in maintaining conformational and functional stability.

Scooped by Wei Shen

gTaxon - a fast cross-platform NCBI taxonomy data querying (gi2taxid, taxid2taxon, name2taxid, LCA) tool

gTaxon is a fast, cross-platform tool for querying NCBI taxonomy data (gi2taxid, taxid2taxon, name2taxid, LCA), with a command-line client and a REST API server for both local and remote use.
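To illustrate what an LCA (lowest common ancestor) query computes, here is a minimal Python sketch over a child-to-parent map of the kind stored in NCBI's `nodes.dmp`, where the root (taxid 1) is its own parent. It is not gTaxon's code, and the tiny taxonomy in the example is hypothetical.

```python
from typing import Dict, List


def lineage(taxid: int, parent: Dict[int, int]) -> List[int]:
    """Path from taxid up to the root, inclusive."""
    path = [taxid]
    while parent[taxid] != taxid:  # the root is its own parent, as in nodes.dmp
        taxid = parent[taxid]
        path.append(taxid)
    return path


def lca(a: int, b: int, parent: Dict[int, int]) -> int:
    """Lowest common ancestor: first node on b's lineage that is also on a's."""
    ancestors_of_a = set(lineage(a, parent))
    for node in lineage(b, parent):
        if node in ancestors_of_a:
            return node
    raise ValueError(f"{a} and {b} do not share a root")


if __name__ == "__main__":
    # Hypothetical toy taxonomy: child -> parent
    parent = {1: 1, 2: 1, 10: 2, 11: 2, 100: 10, 101: 10, 200: 11}
    print(lca(100, 101, parent))  # 10
    print(lca(100, 200, parent))  # 2
```

Walking b's lineage from the leaf upward guarantees that the first shared node is the deepest one. A service answering many queries might cache lineages rather than recompute them per request.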

Scooped by Wei Shen

Cheatsheet – Python & R codes for common Machine Learning Algorithms

A cheat sheet of machine learning algorithms in Python & R, including code for decision trees, random forests, gradient boosting, k-means, kNN, etc.

Scooped by Wei Shen

A catalog of the mouse gut metagenome : Nature Biotechnology : Nature Publishing Group

The mouse gut microbiome is catalogued and compared to the human microbiome.

Scooped by Wei Shen

Viral dark matter and virus-host interactions resolved from publicly available microbial genomes

Scooped by Wei Shen

Machine learning applications in genetics and genomics : Nature Reviews Genetics : Nature Publishing Group

Scooped by Wei Shen

vg - ga4gh refvar 15-04-2015

Resequencing against a human whole-genome variation graph (vg), by Erik Garrison (Wellcome Trust Sanger Institute), presented at the GA4GH Reference Variation Call, April 15, 2015.

Scooped by Wei Shen

DIDA: Distributed Indexing Dispatched Alignment

One essential application in bioinformatics that is affected by the high-throughput sequencing data deluge is the sequence alignment problem, where nucleotide or amino acid sequences are queried against targets to find regions of close similarity. When queries are too many and/or targets are too large, the alignment process becomes computationally challenging. This is usually addressed by preprocessing techniques, where the queries and/or targets are indexed for easy access while searching for matches. When the target is static, such as in an established reference genome, the cost of indexing is amortized by reusing the generated index. However, when the targets are non-static, such as contigs in the intermediate steps of a de novo assembly process, a new index must be computed for each run. To address such scalability problems, we present DIDA, a novel framework that distributes the indexing and alignment tasks into smaller subtasks over a cluster of compute nodes. It provides a workflow beyond the common practice of embarrassingly parallel implementations. DIDA is a cost-effective, scalable and modular framework for the sequence alignment problem in terms of memory usage and runtime. It can be employed in large-scale alignments to draft genomes and intermediate stages of de novo assembly runs. The DIDA source code, sample files and user manual are available through http://www.bcgsc.ca/platform/bioinfo/software/dida . The software is released under the British Columbia Cancer Agency License (BCCA), and is free for academic use.
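The idea of splitting indexing and alignment into per-partition subtasks can be sketched as a toy in Python. This is not DIDA's implementation and makes several loud assumptions: a thread pool stands in for cluster nodes, a simple k-mer set lookup stands in for DIDA's dispatch step, and "alignment" is reduced to counting shared k-mers. All names (`partition`, `build_index`, `align_in_partition`, `dida_like`, `K`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Set, Tuple

K = 4  # hypothetical k-mer size; real aligners use much longer seeds
Hit = Tuple[str, int]  # (target name, number of shared k-mers)


def kmers(seq: str, k: int = K) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def partition(targets: Dict[str, str], n: int) -> List[Dict[str, str]]:
    """Split the targets round-robin into n partitions (one per 'node')."""
    parts: List[Dict[str, str]] = [{} for _ in range(n)]
    for i, name in enumerate(sorted(targets)):
        parts[i % n][name] = targets[name]
    return parts


def build_index(part: Dict[str, str]) -> Dict[str, Set[str]]:
    """Index one partition: k-mer -> names of targets containing it."""
    index: Dict[str, Set[str]] = {}
    for name, seq in part.items():
        for km in kmers(seq):
            index.setdefault(km, set()).add(name)
    return index


def align_in_partition(index: Dict[str, Set[str]], query: str) -> Optional[Hit]:
    """Toy 'alignment': the target sharing the most k-mers with the query."""
    counts: Dict[str, int] = {}
    for km in kmers(query):
        for name in index.get(km, ()):
            counts[name] = counts.get(name, 0) + 1
    # sorted() makes ties resolve to the alphabetically first target
    return max(sorted(counts.items()), key=lambda kv: kv[1]) if counts else None


def dida_like(targets: Dict[str, str], queries: Dict[str, str],
              n_parts: int = 2) -> Dict[str, Optional[Hit]]:
    parts = partition(targets, n_parts)
    # Indexing subtasks run concurrently; threads stand in for cluster nodes
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        indexes = list(pool.map(build_index, parts))
    results: Dict[str, Optional[Hit]] = {}
    for qname, qseq in queries.items():
        q_kmers = kmers(qseq)
        # Dispatch: send the query only to partitions sharing at least one k-mer
        hits = [align_in_partition(idx, qseq)
                for idx in indexes if not q_kmers.isdisjoint(idx)]
        # Merge: keep the best hit across partitions
        found = [h for h in hits if h is not None]
        results[qname] = max(found, key=lambda h: h[1]) if found else None
    return results


if __name__ == "__main__":
    targets = {"t1": "ACGTACGTAA", "t2": "TTTTGGGGCC", "t3": "CCCCAAAATT"}
    queries = {"q1": "ACGTACG", "q2": "GGGGCC", "q3": "NNNNNN"}
    print(dida_like(targets, queries))
```

The point of the design is that no single worker ever holds an index of all targets, and a query only visits partitions that could plausibly contain a match.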

Scooped by Wei Shen

I'm Moving and Hiring

Starting June 1, 2015, my lab is moving to Iowa State University in Ames, Iowa, and I'm very excited about this. I'll be joining a growing cohort of researchers as part of a presidential "big data"...
Wei Shen's insight:

The Friedberg Lab is recruiting postdoctoral fellows for several newly funded projects. The lab is relocating to Iowa State University in Ames, Iowa, as part of a university-wide Big Data initiative. Iowa State is a large research university with world-leading computational resources and a strong, highly collaborative community of bioengineers, bioinformaticians, and life science researchers.

Scooped by Wei Shen

F1000Research Article: Top 10 metrics for life science software good practices.

Read the latest article version by Haydee Artaza, Neil Chue Hong, Manuel Corpas, Angel Corpuz, Rob Hooft, Rafael C. Jimenez, Brane Leskošek, Brett G. Olivier, Jan Stourac, Radka Svobodová Vařeková, Thomas Van Parys, Daniel Vaughan, at F1000Research.

Scooped by Wei Shen

MetaQuery: a web server for rapid annotation and quantitative analysis of specific genes in the human gut microbiome

Scooped by Wei Shen

How to Catch a Virus: Targeted Capture for Viral Sequencing

Metagenomic profiling, also called metagenomic shotgun sequencing (MSS), represents a powerful application made possible by the digital nature of next-gen sequencing technologies. In it, one basical…

Scooped by Wei Shen

Python for big data - webbedfeet - XMind: The Most Professional Mind Mapping Software

Rescooped by Wei Shen from Viruses and Bioinformatics from Virology.uvic.ca

Biologists and bioinformaticians have different software needs

I attended the Bioinformatics Open Source Conference last week in Dublin. Galaxy and Docker were the buzzwords of the conference. A recurring theme was grounding our bioinformatics back in biology,...

Via burkesquires

Scooped by Wei Shen

VirSorter: mining viral signal from microbial genomic data

Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations, including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), new types of microbial genomic data are emerging that are more fragmented and larger scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter’s prophage prediction capability is comparable to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequence (contigs) as short as 3 kb, while providing near-perfect identification (>95% recall and 100% precision) on contigs of at least 10 kb. Because VirSorter scales to large datasets, it can also be used in “reverse” to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA, whether derived from gene transfer agents, generalized transduction or contamination.
Finally, VirSorter is made available through the iPlant Cyberinfrastructure, which provides a web-based user interface interconnected with the required computing resources. VirSorter thus complements existing prophage prediction software to better leverage fragmented, SAG and metagenomic datasets in a way that will scale to modern sequencing. Given these features, VirSorter should enable the discovery of new viruses in microbial datasets and further our understanding of uncultivated viral communities across diverse ecosystems.
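The recall and precision figures quoted in the VirSorter abstract follow the standard definitions: precision is the fraction of predicted viral contigs that are truly viral, and recall is the fraction of truly viral contigs that are predicted. A minimal Python sketch, with hypothetical contig IDs; returning 0.0 for an empty denominator is a convention assumed here, not something the paper specifies.

```python
from typing import Hashable, Set, Tuple


def precision_recall(predicted: Set[Hashable], truth: Set[Hashable]) -> Tuple[float, float]:
    """Precision = TP / predicted positives; recall = TP / actual positives."""
    tp = len(predicted & truth)  # true positives: predicted viral and actually viral
    # Assumed convention: an empty denominator yields 0.0
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    truth = {f"contig_{i}" for i in range(20)}      # hypothetical: 20 truly viral contigs
    predicted = {f"contig_{i}" for i in range(19)}  # 19 of them detected, no false positives
    print(precision_recall(predicted, truth))  # (1.0, 0.95)
```

In this toy case, missing one viral contig lowers recall but leaves precision at 1.0, because no non-viral contig was called viral.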

Scooped by Wei Shen

Microbiome | Full text | Stability of operational taxonomic units: an important but neglected property for analyzing microbial diversity

The operational taxonomic unit (OTU) is widely used in microbial ecology. Reproducibility in microbial ecology research depends on the reliability of OTU-based 16S ribosomal subunit RNA (rRNA) analyses.
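One simple way to quantify OTU stability is to cluster the same reads twice (for example, on their own and as part of a larger dataset) and ask how often pairs of reads that share an OTU in one run also share an OTU in the other. The Python sketch below uses pairwise co-membership agreement (the Rand index) as an illustrative stand-in; it is not necessarily the measure used in the paper, and the read and OTU labels are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable


def pairwise_agreement(run_a: Dict[str, Hashable], run_b: Dict[str, Hashable]) -> float:
    """Rand index: fraction of read pairs whose same-OTU/different-OTU status agrees."""
    shared = sorted(set(run_a) & set(run_b))  # only reads assigned in both runs
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 1.0  # assumed convention: nothing to disagree on
    agree = sum(
        (run_a[x] == run_a[y]) == (run_b[x] == run_b[y]) for x, y in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    run1 = {"r1": "OTU1", "r2": "OTU1", "r3": "OTU2", "r4": "OTU2"}
    run2 = {"r1": "X", "r2": "Y", "r3": "Y", "r4": "Y"}  # hypothetical re-clustering
    print(pairwise_agreement(run1, run2))  # 0.5
```

Because only co-membership is compared, the arbitrary OTU labels produced by the two runs never need to be matched to each other.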

Scooped by Wei Shen

Choosing R or Python for data analysis? An infographic

Wondering whether you should use R or Python for your next data analysis post? Check our infographic "Data Science Wars: R vs Python".

Scooped by Wei Shen

Home : Statistics for Biologists

Scooped by Wei Shen

It's time to reboot bioinformatics education

Nearly 15 years after completion of the human genome, undergraduate and graduate programs still aren't adequately training future scientists with the basic bioinformatics skills needed to be succes...