Virology and Bioinformatics from Virology.ca
Virus and bioinformatics articles with some microbiology and immunology thrown in for good measure
Scooped by Cindy

Assemblytics: a web analytics tool for the detection of variants from an assembly. - PubMed - NCBI

Assemblytics is a web app for detecting and analyzing variants from a de novo genome assembly aligned to a reference genome. It incorporates a unique anchor filtering approach to increase robustness to repetitive elements, and identifies six classes of variants based on their distinct alignment signatures. Assemblytics can be applied both to compare aberrant genomes, such as human cancers, to a reference and to identify differences between related species. Multiple interactive visualizations enable in-depth explorations of the genomic distributions of variants.
AVAILABILITY AND IMPLEMENTATION:
http://assemblytics.com, https://github.com/marianattestad/assemblytics

Scooped by Cindy

Evaluation of hybrid and non-hybrid methods for de novo assembly of nanopore reads. - PubMed - NCBI

AVAILABILITY:
https://github.com/kkrizanovic/NanoMark
CONTACT:
mile.sikic@fer.hr
SUPPLEMENTARY INFORMATION:
Supplementary data are available at Bioinformatics online.
Scooped by Cindy

LSG: An External-Memory Tool to Compute String Graphs for Next-Generation Sequencing Data Assembly

The large amount of short read data that has to be assembled in future applications, such as in metagenomics or cancer genomics, strongly motivates the investigation of disk-based approaches to index next-generation sequencing (NGS) data. Positive results in this direction stimulate the investigation of efficient external memory algorithms for de novo assembly from NGS data. Our article is also motivated by the open problem of designing a space-efficient algorithm to compute a string graph using an indexing procedure based on the Burrows–Wheeler transform (BWT). We have developed a disk-based algorithm for computing string graphs in external memory: the light string graph (LSG). LSG relies on a new representation of the FM-index that is exploited to use an amount of main memory that is independent of the size of the data set. Moreover, we have developed a pipeline for genome assembly from NGS data that integrates LSG with the assembly step of SGA (Simpson and Durbin, 2012), a state-of-the-art string graph-based assembler, and uses BEETL for indexing the input data. LSG is open source software and is available online. We have analyzed our implementation on an 875-million read whole-genome dataset, on which LSG has built the string graph using only 1GB of main memory (reducing the memory occupation by a factor of 50 with respect to SGA), while requiring slightly more than twice the time of SGA. The analysis of the entire pipeline shows an important decrease in memory usage, while managing to have only a moderate increase in the running time.
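
For readers unfamiliar with the Burrows–Wheeler transform that the abstract above builds on, here is a minimal sketch of the transform and its inverse. It is a naive in-memory version based on sorting rotations, written only to illustrate the idea; it is not the external-memory algorithm used by LSG, and the function names and the `$` sentinel are assumptions made for this example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.

    Illustrative only: this holds every rotation in memory, which is exactly
    what disk-based tools avoid.
    """
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending the BWT column and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(c + row for c, row in zip(transformed, table))
    # The original text is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    encoded = bwt("banana")
    print(encoded)               # annb$aa
    print(inverse_bwt(encoded))  # banana
```

The transform tends to group identical characters together, which is what makes BWT-based indexes such as the FM-index compact.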
Scooped by Cindy

Assembly scaffolding with PE-contaminated mate-pair libraries

Results: We have addressed PE-contamination in an update to our scaffolder BESST. We formulate the problem as an integer linear program (ILP) which is solved using an efficient heuristic. The new method shows significant improvement over both integrated and stand-alone scaffolders in our experiments. The impact of modeling PE-contamination is quantified by comparing with the previous BESST model. We also show how other scaffolders are vulnerable to PE-contaminated libraries, resulting in an increased number of misassemblies, more conservative scaffolding, and inflated assembly sizes.
Scooped by Cindy

EPGA2: memory-efficient de novo assembler

"In this article, we present an update algorithm called EPGA2, which applies some new modules and can bring about improved assembly results in small memory. For reducing peak memory in genome assembly, EPGA2 adopts memory-efficient DSK to count K-mers and revised BCALM to construct De Bruijn Graph. Moreover, EPGA2 parallels the step of Contigs Merging and adds Errors Correction in its pipeline. Our experiments demonstrate that all these changes in EPGA2 are more useful for genome assembly.
 

Availability and implementation: EPGA2 is publicly available for download at https://github.com/bioinfomaticsCSU/EPGA2."

Cindy's insight:

Memory-efficient for small bioinformatics lab!
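
The EPGA2 abstract names two steps, k-mer counting and de Bruijn graph construction. The sketch below shows what those steps compute, using a plain in-memory counter. It is an assumption-laden illustration and not the disk-based DSK counter or the BCALM construction that EPGA2 uses; the function names and the `min_count` parameter are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every substring of length k across all reads (in memory)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn_edges(kmer_counts: Counter, min_count: int = 1) -> Dict[str, Set[str]]:
    """Build a de Bruijn graph: each k-mer links its (k-1)-prefix to its (k-1)-suffix.

    k-mers seen fewer than min_count times are skipped, a common way to
    drop likely sequencing errors.
    """
    graph: Dict[str, Set[str]] = defaultdict(set)
    for kmer, n in kmer_counts.items():
        if n >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return graph


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG"]  # toy reads, made up for this example
    counts = count_kmers(reads, k=3)
    graph = de_bruijn_edges(counts, min_count=2)
    for node, successors in sorted(graph.items()):
        print(node, "->", sorted(successors))
```

Memory-efficient assemblers differ mainly in how they avoid holding the full k-mer table in RAM, which this toy version makes no attempt to do.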

Scooped by Cindy

Removal of redundant contigs from de novo RNA-Seq assemblies via homology search improves accurate detection of differentially expressed genes

Here we describe a method for removing redundant contigs within raw contigs; this method involves a homology search against a gene or protein database. In principle, this method can be used with unsequenced plant genomes that lack a well-developed gene database. Redundant contigs were not removed adequately via either of two existing methods, but our method allowed for removal of all redundant contigs. To our knowledge, this is the first reported improvement in accurate detection of DEGs via comparative RNA-Seq analysis that involved preparation of a non-redundant reference sequence. This method could be used to rapidly and cost-effectively detect useful genes in unsequenced plants.
Scooped by Cindy

MetaQUAST: evaluation of metagenome assemblies

Summary: During the past years we have witnessed the rapid development of new metagenome assembly methods. While there are many benchmark utilities designed for single-genome assemblies, there is no well-recognised evaluation and comparison tool for metagenomic-specific analogues. In this paper we present MetaQUAST, a modification of QUAST, the state-of-the-art tool for genome assembly evaluation based on alignment of contigs to a reference. MetaQUAST addresses such features of metagenome datasets as (1) unknown species content, by detecting and downloading reference sequences, (2) huge diversity, by giving comprehensive reports for multiple genomes, and (3) the presence of highly related species, by detecting chimeric contigs. We demonstrate MetaQUAST performance by comparing several leading assemblers on one simulated and two real datasets.
 

Availability: http://bioinf.spbau.ru/metaquast

Scooped by Ed Rybicki

Colocalization of Different Influenza Viral RNA Segments in the Cytoplasm before Viral Budding


Influenza A viruses cause one of the major respiratory infectious diseases in humans. The viruses possess a genome consisting of eight different RNA segments, and the incorporation of all eight RNA segments is required for the generation of an infectious virus particle. The precise process by which these eight viral RNA segments are co-packaged into progeny virus particles remains undefined, owing to the limitations of methods for determining the locations of different vRNA segments in infected cells with single-molecule resolution. In this study, we established an experimental system to examine the localization of different viral RNA segments in an infected cell with high spatial precision. We found that viral RNAs belonging to different segments gather together in the cytoplasm, a process facilitated by the cellular recycling endosomal protein Rab11. Our results support the idea that eight different viral RNAs likely form a super-complex as they travel to the site for virion incorporation. These findings extend our knowledge of the process of influenza virus genome packaging and suggest a mechanism by which the genome assembly of different viral RNA segments is regulated.

 

Ed Rybicki's insight:

It has always been a bit of a mystery how the 8 different nucleoprotein components of the average influenza A virus come together for assembly, especially if, as shown here, they assemble and are exported separately from the nucleus. Yet another example of co-opting of cellular transport and organisation functions for the benefit of a parasite! Nice use of a sophisticated microscopic technique, too.

Scooped by Chris Upton + helpers

BMC Bioinformatics | Abstract | Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer

The analysis of next-generation sequencing data from large genomes is a timely research topic. Sequencers are producing billions of short sequence fragments from newly sequenced organisms.
Scooped by Cindy

Genome assembly from synthetic long read clouds

Here, we introduce Architect, a new de novo scaffolder aimed at SLR technologies. Unlike previous assembly strategies, Architect does not require a costly subassembly step; instead it assembles genomes directly from the SLR’s underlying short reads, which we refer to as read clouds. This enables a 4- to 20-fold reduction in sequencing requirements and a 5-fold increase in assembly contiguity on both genomic and metagenomic datasets relative to state-of-the-art assembly strategies aimed directly at fully subassembled long reads.
Scooped by Cindy

MOCAT2: a metagenomic assembly, annotation and profiling framework

Summary: MOCAT2 is a software pipeline for metagenomic sequence assembly and gene prediction with novel features for taxonomic and functional abundance profiling. The automated generation and efficient annotation of non-redundant reference catalogs by propagating pre-computed assignments from 18 databases covering various functional categories allows for fast and comprehensive functional characterization of metagenomes.
Scooped by Cindy

HGA: de novo genome assembly method for bacterial genomes using high coverage short sequencing reads

In this paper, we introduce a novel hierarchical genome assembly (HGA) methodology that takes further advantage of such very high coverage by independently assembling disjoint subsets of reads, combining assemblies of the subsets, and finally re-assembling the combined contigs along with the original reads.
Scooped by Cindy

VirAmp: a galaxy-based viral genome assembly pipeline

Advances in next generation sequencing make it possible to obtain high-coverage sequence data for large numbers of viral strains in a short time. However, since most bioinformatics tools are developed for command line use, the selection and accessibility of computational tools for genome assembly and variation analysis limits the ability of individual labs to perform further bioinformatics analysis.
Cindy's insight:

Genome assembly pipeline for VIRUS!
Part of the Galaxy project!
I'm learning to use it now!

Scooped by Cindy

Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences « Homolog.us – Bioinformatics

"A few weeks back, we informed readers about Heng Li’s new program “Minimap – A Minimizer-based Mapping Program”. The paper is now available here.
 

Motivation: Single Molecule Real-Time (SMRT) sequencing technology and Oxford Nanopore technologies (ONT) produce reads over 10kbp in length, which have enabled high-quality genome assembly at an affordable cost. However, at present, long reads have an error rate as high as 10-15%. Complex and computationally intensive pipelines are required to assemble such reads.
 

Results: We present a new mapper, minimap, and a de novo assembler, miniasm, for efficiently mapping and assembling SMRT and ONT reads without an error correction stage. They can often assemble a sequencing run of bacterial data into a single contig in a few minutes, and assemble 45-fold C. elegans data in 9 minutes, orders of magnitude faster than the existing pipelines. We also introduce a pairwise read mapping format (PAF) and a graphical fragment assembly format (GFA), and demonstrate the interoperability between ours and current tools.


Availability and implementation: this https URL and this https URL"
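
The abstract above introduces the pairwise mapping format (PAF). As a rough illustration, the sketch below parses the twelve mandatory tab-separated PAF columns into a named record. The class and function names are hypothetical, optional trailing tags are ignored, and the sample line is made up for this example rather than taken from real minimap output.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    """The twelve mandatory PAF columns, in file order."""
    query_name: str
    query_length: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_length: int
    target_start: int
    target_end: int
    matches: int        # number of matching residues
    block_length: int   # alignment block length
    mapq: int           # mapping quality


def parse_paf_line(line: str) -> PafRecord:
    """Parse one PAF line; any optional tag columns after the twelfth are dropped."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 columns, got {len(fields)}")
    f = fields[:12]
    return PafRecord(
        f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
        f[5], int(f[6]), int(f[7]), int(f[8]),
        int(f[9]), int(f[10]), int(f[11]),
    )


if __name__ == "__main__":
    # Hypothetical record, invented for illustration.
    sample = "read1\t1000\t10\t900\t+\tcontig1\t5000\t100\t990\t800\t890\t60"
    record = parse_paf_line(sample)
    print(record.query_name, "->", record.target_name)
    print("approx. identity:", record.matches / record.block_length)
```

Dividing `matches` by `block_length` gives a quick approximate identity for filtering overlaps before passing them to an assembler.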

Scooped by Cindy

hybridSPAdes: an algorithm for hybrid assembly of short and long reads

Motivation: Recent advances in single molecule real-time (SMRT) and nanopore sequencing technologies have enabled high-quality assemblies from long and inaccurate reads. However, these approaches require high coverage by long reads and remain expensive. On the other hand, the inexpensive short reads technologies produce accurate but fragmented assemblies. Thus, a hybrid approach that assembles long reads (with low coverage) and short reads has a potential to generate high-quality assemblies at reduced cost.
 

Results: We describe the HYBRIDSPADES algorithm for assembling short and long reads and benchmark it on a variety of bacterial assembly projects. Our results demonstrate that HYBRIDSPADES generates accurate assemblies (even in projects with relatively low coverage by long reads), thus reducing the overall cost of genome sequencing. We further present the first complete assembly of a genome from single cells using SMRT reads.
 

Availability and implementation: HYBRIDSPADES is implemented in C++ as a part of SPAdes genome assembler and is publicly available at http://bioinf.spbau.ru/en/spades

Cindy's insight:

You may also want to check out this article from Homolog.us: "Very High Efficient Hybrid Assembler for PacBio Data"
---
http://www.homolog.us/blogs/blog/2014/10/13/very-efficient-hybrid-assembler-for-pacbio-data/

Scooped by Chris Upton + helpers

PLOS Pathogens: Membrane Assembly during the Infection Cycle of the Giant Mimivirus


With a particle size comparable to that of small bacteria and a 1.2 Mbp double-strand DNA genome that carries more than 1000 open reading frames, the amoeba-infecting Mimivirus, along with other recently identified members of the Mimiviridae family, is among the largest and most complex viruses yet identified. The Mimivirus particle includes an internal membrane that underlies an icosahedral capsid. The assembly mechanism of the internal membrane during Mimivirus infection remains unclear, as is the case for other viruses containing internal membranes. By using diverse imaging techniques, we showed that membrane biogenesis is an elaborate process that occurs at the periphery of viral factories generated in the host cytoplasm. This multistage process, which includes the formation of open membrane sheets, enables efficient and continuous assembly of multiple Mimivirus progeny. The membrane biogenesis process suggested here provides novel insights into the assembly of internal viral membranes in general.

  
Scooped by Chris Upton + helpers

BMC Genomics | De novo Assembly of highly diverse viral populations

Extensive genetic diversity in viral populations within infected hosts and the divergence of variants from existing reference genomes impede the analysis of deep viral sequencing data.
