Viruses and Bioinformatics from Virology.uvic.ca
Virus and bioinformatics articles with some microbiology and immunology thrown in for good measure
Scooped by Cindy

Estimation of genetic diversity in viral populations from next generation sequencing data with extremely deep coverage

In this paper we propose a method and discuss its computational implementation as an integrated tool for the analysis of viral genetic diversity on data generated by high-throughput sequencing. The main motivation for this work is to better understand the genetic diversity of viruses with high rates of nucleotide substitution, such as HIV-1 and influenza. Most methods for viral diversity estimation proposed so far are intended to benefit from the longer reads produced by some next-generation sequencing platforms in order to estimate a population of haplotypes that represents the diversity of the original population. The method proposed here is custom-made to take advantage of the very low error rate and extremely deep coverage per site, which are the main features of some neglected technologies that have not received much attention due to the short length of their reads, which precludes haplotype estimation. This approach allowed us to avoid some hard problems related to haplotype reconstruction (the need for long reads, preliminary error filtering, and assembly).
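
The abstract does not say which estimator the tool uses, so the following is only an illustrative sketch and not the authors' method. It shows one common way to summarise diversity site by site when coverage is deep: the Shannon entropy of the bases observed in each alignment column. The function names and the idea of passing each column as a string of base calls are assumptions made for the example.

```python
import math
from collections import Counter


def site_entropy(bases):
    """Shannon entropy (in bits) of the A/C/G/T calls in one alignment column.

    `bases` is a string holding every base call that covers the site.
    Calls other than A, C, G, T (for example N) are ignored.
    """
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1/p) summed over observed bases; 0 for a fully conserved site.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def per_site_entropy(columns):
    """Apply site_entropy to every column of a pileup (hypothetical input shape)."""
    return [site_entropy(col) for col in columns]
```

A fully conserved site scores 0 bits and a site with all four bases at equal frequency scores 2 bits. With real data, sequencing error would have to be accounted for before reading small entropies as true diversity.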
Scooped by Cindy

Combined DECS Analysis and Next-Generation Sequencing Enable Efficient Detection of Novel Plant RNA Viruses

The presence of high molecular weight double-stranded RNA (dsRNA) within plant cells is an indicator of infection with RNA viruses as these possess genomic or replicative dsRNA. DECS (dsRNA isolation, exhaustive amplification, cloning, and sequencing) analysis has been shown to be capable of detecting unknown viruses. We postulated that a combination of DECS analysis and next-generation sequencing (NGS) would improve detection efficiency and usability of the technique. Here, we describe a model case in which we efficiently detected the presumed genome sequence of Blueberry shoestring virus (BSSV), a member of the genus Sobemovirus, which has not so far been reported. dsRNAs were isolated from BSSV-infected blueberry plants using the dsRNA-binding protein, reverse-transcribed, amplified, and sequenced using NGS. A contig of 4,020 nucleotides (nt) that shared similarities with sequences from other Sobemovirus species was obtained as a candidate of the BSSV genomic sequence. Reverse transcription (RT)-PCR primer sets based on sequences from this contig enabled the detection of BSSV in all BSSV-infected plants tested but not in healthy controls. A recombinant protein encoded by the putative coat protein gene was bound by the BSSV-antibody, indicating that the candidate sequence was that of BSSV itself. Our results suggest that a combination of DECS analysis and NGS, designated here as “DECS-C,” is a powerful method for detecting novel plant viruses.
Scooped by Cindy

Assembly scaffolding with PE-contaminated mate-pair libraries

Results: We have addressed PE-contamination in an update to our scaffolder BESST. We formulate the problem as an integer linear program (ILP) which is solved using an efficient heuristic. The new method shows significant improvement over both integrated and stand-alone scaffolders in our experiments. The impact of modeling PE-contamination is quantified by comparing with the previous BESST model. We also show how other scaffolders are vulnerable to PE-contaminated libraries, resulting in an increased number of misassemblies, more conservative scaffolding, and inflated assembly sizes.
Scooped by Cindy

HPG pore: an efficient and scalable framework for nanopore sequencing data

The use of nanopore technologies is expected to spread in the future because they are portable and can sequence long fragments of DNA molecules without prior amplification. The first nanopore sequencer available, the MinION™ from Oxford Nanopore Technologies, is a USB-connected, portable device that allows real-time DNA analysis.

Here we present HPG Pore, a toolkit for exploring and analysing nanopore sequencing data. HPG Pore can run on both individual computers and in the Hadoop distributed computing framework, which allows easy scale-up to manage the large amounts of data expected to result from extensive use of nanopore technologies in the future.
Scooped by Cindy

fqtools: An efficient software suite for modern FASTQ file manipulation

The rapidly increasing data volumes involved in NGS make any dataset manipulation a time-consuming and error-prone process. I have developed fqtools, a fast and reliable FASTQ file manipulation suite that can process the full set of valid FASTQ files, including those with multi-line sequences, whilst identifying invalid files. Fqtools is faster than similar tools, and is designed for use in automatic processing pipelines.
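
To show why multi-line FASTQ is awkward to handle, here is a minimal Python sketch of a reader that accepts wrapped sequence and quality lines. It is not fqtools code and makes no claim about how fqtools works; `read_fastq` is a hypothetical name. The key point is that quality strings may themselves contain `@` and `+`, so the end of a record has to be found by matching the quality length to the sequence length, not by looking for the next `@`.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) tuples, allowing multi-line records."""
    lines = (line.rstrip("\r\n") for line in handle)
    for line in lines:
        if not line:
            continue  # tolerate blank lines between records
        if not line.startswith("@"):
            raise ValueError(f"expected '@' header, got: {line!r}")
        header = line[1:]

        # Sequence lines run until the '+' separator line.
        seq_parts = []
        for line in lines:
            if line.startswith("+"):
                break
            seq_parts.append(line)
        else:
            raise ValueError(f"record {header!r} has no '+' separator")
        seq = "".join(seq_parts)

        # Quality lines run until they cover exactly len(seq) characters.
        qual_parts, qual_len = [], 0
        while qual_len < len(seq):
            try:
                line = next(lines)
            except StopIteration:
                raise ValueError(f"record {header!r} has truncated quality") from None
            qual_parts.append(line)
            qual_len += len(line)
        if qual_len != len(seq):
            raise ValueError(f"record {header!r}: quality and sequence lengths differ")
        yield header, seq, "".join(qual_parts)


if __name__ == "__main__":
    demo = "@r1 desc\nACGT\nAC\n+\nII@I\n+I\n@r2\nGG\n+r2\n!!\n"
    for record in read_fastq(io.StringIO(demo)):
        print(record)
```

In the demo, the second quality line of `r1` starts with `+` and the first contains `@`, which is exactly the case that breaks naive four-lines-per-record parsers.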
Scooped by Cindy

ParDRe: Faster Parallel Duplicated Reads Removal Tool for Sequencing Studies. - PubMed - NCBI

"Current next generation sequencing technologies often generate duplicated or near-duplicated reads that (depending on the application scenario) do not provide any interesting biological information but can increase memory requirements and computational time of downstream analysis. In this work we present ParDRe, a de novo parallel tool to remove duplicated and near-duplicated reads through the clustering of Single-End or Paired-End sequences from fasta or fastq files. It uses a novel bitwise approach to compare the suffixes of DNA strings and employs hybrid MPI/multithreading to reduce runtime on multicore systems. We show that ParDRe is up to 27.29 times faster than Fulcrum (a representative state-of-the-art tool) on a platform with two 8-core Sandy-Bridge processors."
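
The abstract mentions a bitwise comparison of DNA strings without giving details. The sketch below is not ParDRe's algorithm: it has no clustering, no suffix handling and no MPI or threads. It only illustrates the general idea that packing each base into two bits lets reads be compared with integer operations. All function names are hypothetical.

```python
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_2bit(read):
    """Pack an A/C/G/T string into (length, integer), two bits per base.

    The length is kept because "A" and "AA" would otherwise both pack to 0.
    Raises KeyError for N or any other symbol.
    """
    value = 0
    for base in read.upper():
        value = (value << 2) | _CODE[base]
    return len(read), value


def count_mismatches(read_a, read_b):
    """Count mismatching positions between two equal-length reads using XOR."""
    len_a, a = pack_2bit(read_a)
    len_b, b = pack_2bit(read_b)
    if len_a != len_b:
        raise ValueError("reads must have the same length")
    diff = a ^ b
    # A base differs if either of its two bits differs: fold each pair onto
    # its low bit, keep only the low bits, then count the set bits.
    mask = int("01" * len_a, 2) if len_a else 0
    return bin((diff | (diff >> 1)) & mask).count("1")


def drop_exact_duplicates(reads):
    """Keep the first occurrence of each distinct read, case-insensitively."""
    seen = set()
    unique = []
    for read in reads:
        key = pack_2bit(read)
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique
```

A near-duplicate filter could be built on `count_mismatches` with a threshold, but comparing every pair of reads does not scale, which is presumably why the real tool clusters reads first.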

Scooped by burkesquires

POTION: an end-to-end pipeline for positive Darwinian selection detection in genome-scale data through phylogenetic comparison of protein-coding genes

BMC Genomics is an open access journal publishing original peer-reviewed research articles in all aspects of genome-scale analysis, functional genomics, and proteomics. BMC Genomics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work. BMC series - open, inclusive and trusted.
Scooped by Chris Upton + helpers

NGS lecture

Genome Wide Methodologies and Future Perspectives. Brian Krueger, PhD, Duke University Center for Human Genome Variation.
Christophe Jacquet's curator insight, January 17, 2014 3:00 AM

Many of the presented methodologies, if not all, can also be applied to plants...

Rescooped by Chris Upton + helpers from Cesar Medina

#NGS Developments in next generation sequencing – a visualisation

With this post I present a figure I’ve been working on for a while now. With it, I try to summarise the developments in (next generation) sequencing, or at least a few aspects of it. I’...

Via César Augusto Medina Culma
Scooped by Chris Upton + helpers

The Genomic and Transcriptomic Landscape of a HeLa Cell Line

Chris Upton + helpers's insight:

Methinks most HeLa cell samples from labs around the world are very different. Doubt all are human. HeLa also has a reputation for contaminating other stocks of cells.

Will it become routine to sequence genomes of cell cultures being used?

Scooped by Cindy

LSG: An External-Memory Tool to Compute String Graphs for Next-Generation Sequencing Data Assembly

The large amount of short read data that has to be assembled in future applications, such as in metagenomics or cancer genomics, strongly motivates the investigation of disk-based approaches to index next-generation sequencing (NGS) data. Positive results in this direction stimulate the investigation of efficient external memory algorithms for de novo assembly from NGS data. Our article is also motivated by the open problem of designing a space-efficient algorithm to compute a string graph using an indexing procedure based on the Burrows–Wheeler transform (BWT). We have developed a disk-based algorithm for computing string graphs in external memory: the light string graph (LSG). LSG relies on a new representation of the FM-index that makes the main memory requirement independent of the size of the data set. Moreover, we have developed a pipeline for genome assembly from NGS data that integrates LSG with the assembly step of SGA (Simpson and Durbin, 2012), a state-of-the-art string graph-based assembler, and uses BEETL for indexing the input data. LSG is open source software and is available online. We have analyzed our implementation on an 875-million read whole-genome dataset, on which LSG has built the string graph using only 1 GB of main memory (reducing the memory occupation by a factor of 50 with respect to SGA), while requiring slightly more than twice the time of SGA. The analysis of the entire pipeline shows an important decrease in memory usage, while managing to have only a moderate increase in the running time.
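
For readers unfamiliar with the Burrows–Wheeler transform that the FM-index is built on, here is a toy in-memory sketch of the transform and its inverse. It has nothing to do with how LSG or BEETL actually compute it: those tools work in external memory on hundreds of millions of reads, while this version sorts every rotation of a single short string. The function names and the `$` sentinel are choices made for the example.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform of `text` via sorted rotations (toy version).

    The sentinel marks the end of the text and must not occur inside it.
    """
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, sentinel="$"):
    """Recover the original text by rebuilding the sorted rotation table."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        # Prepend the transform as a new first column, then re-sort the rows.
        table = sorted(ch + row for ch, row in zip(transformed, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]
```

The transform tends to group identical characters together, which is what makes compressed indexes over it practical. The quadratic memory use of this sketch is precisely the cost that disk-based approaches are designed to avoid.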
Scooped by Cindy

HGA: de novo genome assembly method for bacterial genomes using high coverage short sequencing reads

In this paper, we introduce a novel hierarchical genome assembly (HGA) methodology that takes further advantage of such very high coverage by independently assembling disjoint subsets of reads, combining assemblies of the subsets, and finally re-assembling the combined contigs along with the original reads.
Scooped by Cindy

MaGuS: a tool for quality assessment and scaffolding of genome assemblies with Whole Genome Profiling™ Data

We present MaGuS (map-guided scaffolding), a modular tool that uses a draft genome assembly, a Whole Genome Profiling™ (WGP) map, and high-throughput paired-end sequencing data to estimate the quality and to enhance the contiguity of an assembly. We generated several assemblies of the Arabidopsis genome using different scaffolding programs and applied MaGuS to select the best assembly using quality metrics. Then, we used MaGuS to perform map-guided scaffolding to increase contiguity by creating new scaffold links in low-covered and highly repetitive regions where other commonly used scaffolding methods lack consistency.
Scooped by Cindy

Identification of Known and Novel Recurrent Viral Sequences in Data from Multiple Patients and Multiple Cancers

Virus discovery from high throughput sequencing data often follows a bottom-up approach where taxonomic annotation takes place prior to association to disease. Albeit effective in some cases, the approach fails to detect novel pathogens and remote variants not present in reference databases. We have developed a species independent pipeline that utilises sequence clustering for the identification of nucleotide sequences that co-occur across multiple sequencing data instances. We applied the workflow to 686 sequencing libraries from 252 cancer samples of different cancer and tissue types, 32 non-template controls, and 24 test samples. Recurrent sequences were statistically associated to biological, methodological or technical features with the aim to identify novel pathogens or plausible contaminants that may associate to a particular kit or method. We provide examples of identified inhabitants of the healthy tissue flora as well as experimental contaminants. Unmapped sequences that co-occur with high statistical significance potentially represent the unknown sequence space where novel pathogens can be identified.
Scooped by Cindy

NEAT: a framework for building fully automated NGS pipelines and analyses

In comparison to many publicly available tools including Galaxy, NEAT provides three main advantages: (1) Through the development of double-clickable executables, NEAT is efficient (completes within <24 hours), easy to implement and intuitive; (2) Storage space, maximum number of job submissions, wall time and cluster-specific parameters can be customized as NEAT is run on the institution’s cluster; (3) NEAT allows users to visualize and summarize NGS data rapidly and efficiently using various built-in exploratory data analysis tools including metagenomic and differentially expressed gene analysis.
Scooped by Cindy

Partitioning with khmer is not my recommended approach any more

Cindy's insight:

From the very person who invented khmer, the tool used for digital normalization of sequence reads.

Rescooped by Chris Upton + helpers from Bioinformatics Training

GTPB: IB14S Introductory Bioinformatics, Second course - Home


Via Pedro Fernandes
Pedro Fernandes's curator insight, December 4, 2014 4:47 AM

This is a 4-day course with minimal prerequisites that aims to equip participants with the means to deal with biological sequence data at various levels. On the last day we will focus on NGS data analysis at a basic level.

Scooped by Chris Upton + helpers

On the Trail of Ancient Killers

Armed with new methods, researchers are interrogating the DNA of centuries-old pathogens extracted from the bones and teeth of victims.

Scooped by Chris Upton + helpers

PLOS Computational Biology: Shining a Light on Dark Sequencing: Characterising Errors in Ion Torrent PGM Data

PLOS Computational Biology is an open-access
Ed Rybicki's comment, April 12, 2013 9:31 AM
Abstract, dudes, abstract!
Rescooped by Chris Upton + helpers from Next generation sequencing for global infectious disease control

Next-generation sequencing technologies... [Brief Funct Genomics. 2013] - PubMed - NCBI


Via Marion Koopmans
Rescooped by Chris Upton + helpers from The Steep Slope of Bioinformatics!

Solving genome puzzles without a picture - Egghead


Via Neelima Sinha
Scooped by Chris Upton + helpers

#bioinformatics #NGS Tools for mapping high-throughput sequencing data


Results: This survey focuses on classifying mappers through a wide number of characteristics. The goal is to allow practitioners to compare the mappers more easily and find those that are most suitable for their specific problem.
