Viruses and Bioinformatics from Virology.uvic.ca
Virus and bioinformatics articles with some microbiology and immunology thrown in for good measure

Scooped by Cindy

New releases from NCBI: IgBLAST 1.7.0 and Sequence Viewer 3.21

IgBLAST 1.7.0 release

A new version of IgBLAST is now available on FTP, with the following new features:

Specify whether overlapping nucleotides at VDJ junctions are allowed in matching V, D, and J genes.
Set a custom J gene mismatch penalty.
Report the CDR3 start and stop positions in the sub-region table.
Use alignment length instead of percent identity as the tie-breaker for hits with identical BLAST scores, improving accuracy in the V, D, J gene assignment.
IgBLAST was developed at the NCBI to facilitate the analysis of immunoglobulin and T cell receptor variable domain sequences.
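
The tie-breaking rule in the feature list above can be illustrated with a minimal Python sketch. This is not IgBLAST code: the `Hit` record, its field names, and the example scores are assumptions made up for illustration. It only shows the idea of ranking hits by score first and by alignment length second.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    # Hypothetical hit record; field names are assumptions, not IgBLAST output columns.
    gene: str
    score: float
    align_len: int
    pct_identity: float

def best_hit(hits):
    """Return the top hit: highest score first, longest alignment as the tie-breaker."""
    return max(hits, key=lambda h: (h.score, h.align_len))

# Illustrative values only.
hits = [
    Hit("IGHV1-2*01", 250.0, 280, 99.0),
    Hit("IGHV1-2*02", 250.0, 295, 98.0),
    Hit("IGHV1-3*01", 240.0, 296, 97.0),
]
print(best_hit(hits).gene)
```

Here the first two hits share the same score, so the longer alignment wins even though its percent identity is slightly lower.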

Scooped by Cindy

QuickBLASTP adds pre-processing to BLAST search

QuickBLASTP, an accelerated version of BLASTP, adds a new pre-processing step to the non-redundant (nr) protein database. In a matter of seconds, QuickBLASTP will find approximately 97% of the database sequences with 70% or more identity to your query and around 98% of the database sequences with 80% or more identity to your query.

Scooped by Cindy

ViPTree: the viral proteomic tree server

ViPTree is a web server provided through GenomeNet to generate viral proteomic trees for classification of viruses based on genome-wide similarities. Users can upload viral genomes sequenced either by genomics or metagenomics. ViPTree generates proteomic trees for the uploaded genomes together with flexibly selected reference viral genomes. ViPTree also serves as a platform to visually investigate genomic alignments and automatically annotated gene functions for the uploaded viral genomes, thus providing virus researchers with a first-choice tool for classifying and understanding newly sequenced viral genomes.

ViPTree is freely available at: http://www.genome.jp/viptree.

Scooped by Cindy

EzMap: A Simple Pipeline for Reproducible Analysis of the Human Virome

SUMMARY:
In solid-organ transplant recipients, a delicate balance between immunosuppression and immunocompetence must be achieved, which can be difficult to monitor in real-time. Shotgun sequencing of cell-free DNA (cfDNA) has been recently proposed as a new way to indirectly assess immune function in transplant recipients through analysis of the status of the human virome. To facilitate exploration of the utility of the human virome as an indicator of immune status, and to enable rapid, straightforward analyses by clinicians, we developed a fully-automated computational pipeline, EzMap, for performing metagenomic analysis of the human virome. EzMap combines a number of tools to clean, filter, and subtract WGS reads by mapping to a reference human assembly. The relative abundance of each virus present is estimated using a maximum likelihood approach that accounts for genome size, and results are presented with interactive visualizations and taxonomy-based summaries that enable rapid insights. The pipeline is automated to run on both workstations and computing clusters for all steps. EzMap automates an otherwise tedious and time-consuming protocol and aims to facilitate rapid and reproducible insights from cfDNA.
AVAILABILITY:
EzMap is freely available at https://github.com/dekoning-lab/ezmap.
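
To see why abundance estimates need to account for genome size, here is a minimal Python sketch. It is not EzMap's maximum likelihood estimator: it swaps in plain length normalization (reads per base, rescaled to sum to 1) and ignores reads that map to several genomes. The function name, virus names, and numbers are all hypothetical.

```python
def relative_abundance(read_counts, genome_lengths):
    """Length-normalized relative abundance (a simplification, not a maximum likelihood fit)."""
    # Reads per base: a long genome collects more reads at the same abundance.
    density = {v: read_counts[v] / genome_lengths[v] for v in read_counts}
    total = sum(density.values())
    if total == 0:
        return {v: 0.0 for v in density}
    return {v: d / total for v, d in density.items()}

# Hypothetical example: equal read counts, very different genome sizes.
counts = {"virus_A": 300, "virus_B": 300}
lengths = {"virus_A": 3_000, "virus_B": 150_000}
abundance = relative_abundance(counts, lengths)
print(abundance)
```

With equal read counts, the virus with the smaller genome comes out as far more abundant, which is the effect a genome-size correction is meant to capture.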

Scooped by Cindy

PISCES – alignment free RNA-seq quantitation and QC pipeline | RNA-Seq Blog

Matt Shirley -- PISCES: alignment free RNA-seq quantitation and QC pipeline

Scooped by Chris Upton + helpers

A global perspective on bioinformatics training needs

Abstract In the last decade, life-science research has become increasingly data-intensive and computational. Nevertheless, basic bioinformatics and data stewardship are still only rarely taught in life-science degree programmes, creating a widening skills gap that spans educational levels and career roles. To better understand this situation, we ran surveys to determine how the skills dearth is affecting the need for bioinformatics training worldwide. Perhaps unsurprisingly, we found that respondents wanted more short courses to help boost their expertise and confidence in data analysis and interpretation. However, it was evident that most respondents appreciated their need for training only after designing their experiments and collecting their data. This is clearly rather late in the research workflow, and suboptimal from a training perspective, as skills acquired to address a specific need at a particular time are seldom retained, engendering a cycle of low confidence in trainees. To ensure that such skill gaps do not continue to create barriers to the progress of research, we argue that universities should strive to bring their life-science curricula into the digital-data era. Meanwhile, the demand for point-of-need training in bioinformatics and data stewardship will grow. While this situation persists, international groups like GOBLET are increasing their efforts to enlarge the community of trainers and quench the global thirst for bioinformatics training.

Scooped by Cindy

KNIME4NGS: a comprehensive toolbox for next generation sequencing analysis

Summary: Analysis of Next Generation Sequencing (NGS) data requires the processing of large datasets by chaining various tools with complex input and output formats. In order to automate data analysis, we propose to standardize NGS tasks into modular workflows. This simplifies reliable handling and processing of NGS data, and corresponding solutions become substantially more reproducible and easier to maintain. Here, we present a documented, Linux-based toolbox of 42 processing modules that are combined to construct workflows facilitating a variety of tasks such as DNAseq and RNAseq analysis. We also describe important technical extensions. The high throughput executor (HTE) helps to increase the reliability and to reduce manual interventions when processing complex datasets. We also provide a dedicated binary manager that assists users in obtaining the modules’ executables and keeping them up to date. As a basis for this actively developed toolbox we use the workflow management software KNIME.
Availability and Implementation: See http://ibisngs.github.io/knime4ngs for nodes and user manual (GPLv3 license)

Scooped by Cindy

Introducing COCOS: codon consequence scanner for annotating reading frame changes induced by stop-lost and frame shift variants 

Summary: Reading frame altering genomic variants can impact gene expression levels and the structure of protein products, thus potentially inducing disease phenotypes. Current annotation approaches report the impact of such variants in the context of altered DNA sequence only; attributes of the resulting transcript, reading frame and translated protein product are not reported. To remedy this shortcoming, we present a new genetic annotation approach termed Codon Consequence Scanner (COCOS). Implemented as an Ensembl variant effect predictor (VEP) plugin, COCOS captures amino acid sequence alterations stemming from variants that produce an altered reading frame, such as stop-lost variants and small insertions and deletions (InDels). To highlight its significance, COCOS was applied to data from the 1000 Genomes Project. Transcripts affected by stop-lost variants introduce a median of 15 amino acids, while InDels have a more extensive impact with a median of 66 amino acids being incorporated. Captured sequence alterations are written out in FASTA format and can be further analyzed for impact on the underlying protein structure.
Availability and Implementation: COCOS is available to all users on github: https://github.com/butkiem/COCOS
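
The effect of a stop-lost variant described above can be sketched in a few lines of Python. This is not the COCOS plugin or the VEP API; the function name and the toy sequence are assumptions. The sketch only counts how many codons are read through, in frame, from the position of the former stop codon until the next stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def added_residues(mutated_seq, old_stop_index):
    """Count amino acids added after a stop-lost variant.

    mutated_seq    -- transcript sequence (DNA alphabet) carrying the variant
    old_stop_index -- 0-based index of the first base of the former stop codon
    Returns None when no in-frame stop codon follows.
    """
    count = 0
    for i in range(old_stop_index, len(mutated_seq) - 2, 3):
        codon = mutated_seq[i:i + 3].upper()
        if codon in STOP_CODONS:
            return count
        count += 1
    return None

# Toy transcript: ATG GCC | CAA (formerly TAA) GGT TTC | TGA | AAA
seq = "ATGGCC" + "CAA" + "GGTTTC" + "TGA" + "AAA"
print(added_residues(seq, 6))
```

In this toy case the mutated stop codon and the two codons after it are translated before the next in-frame stop, so three residues are added.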

Scooped by Cindy

Making authentic science accessible—the benefits and challenges of integrating bioinformatics into a high-school science curriculum

Despite the central place held by bioinformatics in modern life sciences and related areas, it has only recently been integrated to a limited extent into high-school teaching and learning programs. Here we describe the assessment of a learning environment entitled ‘Bioinformatics in the Service of Biotechnology’.... Analysis of students’ affective outcomes revealed positive attitudes toward bioinformatics and the learning environment, as well as their perception of the teacher’s role. Insights from this analysis yielded implications and recommendations for curriculum design, classroom enactment, teacher education and research. For example, we recommend teaching bioinformatics in an integrative and comprehensive manner, through an inquiry process, and linking it to the wider science curriculum.

Scooped by Cindy

Collaboration Is Beautiful | Bioinformatics meets graphic design

EMBL-EBI bioinformaticians collaborate with the London-based design company Science Practice to discover new scientific perspectives.

Scooped by Cindy

Descriptive Statistics of the Genome: Phylogenetic Classification of Viruses | Abstract

The typical process for classifying and submitting a newly sequenced virus to the NCBI database involves two steps. First, a BLAST search is performed to determine likely family candidates. That is followed by checking the candidate families with the pairwise sequence alignment tool for similar species. The submitter's judgment is then used to determine the most likely species classification. The aim of this article is to show that this process can be automated into a fast, accurate, one-step process using the proposed alignment-free method and properly implemented machine learning techniques.

We present a new family of alignment-free vectorizations of the genome, the generalized vector, that maintains the speed of existing alignment-free methods while outperforming all available methods. This new alignment-free vectorization uses the frequency of genomic words (k-mers), as is done in the composition vector, and incorporates descriptive statistics of those k-mers' positional information, as inspired by the natural vector.
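
As a rough illustration of combining k-mer frequencies with positional statistics, the Python sketch below records, for each k-mer, its count plus the mean and variance of its start positions. The excerpt does not say which descriptive statistics the generalized vector uses, so the choice of mean and variance, the function name, and the toy sequence are assumptions.

```python
from collections import defaultdict

def kmer_features(seq, k):
    """Map each k-mer to (count, mean start position, variance of start positions)."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    features = {}
    for kmer, pos in positions.items():
        n = len(pos)
        mean = sum(pos) / n
        var = sum((p - mean) ** 2 for p in pos) / n  # population variance
        features[kmer] = (n, mean, var)
    return features

feats = kmer_features("ACGACG", 3)
for kmer in sorted(feats):
    print(kmer, feats[kmer])
```

Two sequences with the same k-mer counts but different k-mer placements give different feature values here, which a purely frequency-based vector cannot distinguish.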

Scooped by Cindy

UniProt-DAAC: domain architecture alignment and classification, a new method for automatic functional annotation in UniProtKB

Results: We propose a method for the automatic annotation of protein sequences in the UniProt Knowledgebase (UniProtKB) by comparing their domain architectures, classifying proteins based on the similarities and propagating functional annotation. The performance of this method was measured through a cross-validation analysis using the Gene Ontology (GO) annotation of a subset of UniProtKB/Swiss-Prot. The results demonstrate the effectiveness of this approach in detecting functional similarity, with an average F-score of 0.85. Applying the method to nearly 55.3 million uncharacterized proteins in UniProtKB/TrEMBL resulted in 44 818 178 GO term predictions for 12 172 114 proteins. Of these predictions, 22% were for 2 812 016 previously non-annotated protein entries, indicating the significance of the value added by this approach.

Availability and implementation: The results of the method are available at: ftp://ftp.ebi.ac.uk/pub/contrib/martin/DAAC/.

Scooped by Cindy

Computational clustering for viral reference proteomes

Motivation: The enormous number of redundant sequenced genomes has hindered efforts to analyze and functionally annotate proteins. As the taxonomy of viruses is not uniformly defined, viral proteomes pose special challenges in this regard. Grouping viruses based on the similarity of their proteins at proteome scale can normalize against potential taxonomic nomenclature anomalies.

Results: We present Viral Reference Proteomes (Viral RPs), which are computed from complete virus proteomes within UniProtKB. Viral RPs based on 95, 75, 55, 35 and 15% co-membership in proteome similarity based clusters are provided. Comparison of our computational Viral RPs with UniProt’s curator-selected Reference Proteomes indicates that the two sets are consistent and complementary. Furthermore, each Viral RP represents a cluster of virus proteomes that was consistent with virus or host taxonomy. We provide BLASTP search and FTP download of Viral RP protein sequences, and a browser to facilitate the visualization of Viral RPs.

Availability and implementation: http://proteininformationresource.org/rps/viruses/

Scooped by Cindy

MarDRe: efficient MapReduce-based removal of duplicate DNA reads in the cloud | Bioinformatics | Oxford Academic

Summary: This paper presents MarDRe, a de novo cloud-ready duplicate and near-duplicate removal tool that can process single-end and paired-end reads from FASTQ/FASTA datasets. MarDRe takes advantage of the widely adopted MapReduce programming model to fully exploit Big Data technologies on cloud-based infrastructures. Written in Java to maximize cross-platform compatibility, MarDRe is built upon the open-source Apache Hadoop project, the most popular distributed computing framework for scalable Big Data processing. On a 16-node cluster deployed on the Amazon EC2 cloud platform, MarDRe is up to 8.52 times faster than a representative state-of-the-art tool.
Availability and Implementation: Source code in Java and Hadoop as well as a user’s guide are freely available under the GNU GPLv3 license at http://mardre.des.udc.es.
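
The MapReduce pattern behind this kind of duplicate removal can be imitated in a single Python process. The sketch below is not MarDRe and does not use Hadoop: it handles exact duplicates only (no near-duplicates, no paired-end reads), and every function name and read in it is hypothetical.

```python
from collections import defaultdict

def map_phase(reads):
    """Map: emit (key, value) pairs keyed by the case-normalized sequence."""
    for read_id, seq in reads:
        yield seq.upper(), read_id

def shuffle(pairs):
    """Shuffle: group all values that share a key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: keep only the first read ID seen for each distinct sequence."""
    return [(ids[0], seq) for seq, ids in groups.items()]

def remove_duplicates(reads):
    return reduce_phase(shuffle(map_phase(reads)))

reads = [("r1", "ACGT"), ("r2", "acgt"), ("r3", "TTGA")]
print(remove_duplicates(reads))
```

Because identical sequences share a key, all copies of a read land in the same reduce group, which is what lets the work be split across cluster nodes.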

Scooped by Cindy

Applying, Evaluating and Refining Bioinformatics Core Competencies (An Update from the Curriculum Task Force of ISCB’s Education Committee)


Scooped by Cindy

Use of AAScatterPlot tool for monitoring the evolution of hemagglutinin cleavage site in H9 avian viruses

Given genome sequences, the AAScatterPlot tool compacts into a single plot information about the hydropathy index, Van der Waals volume, chemical property, and occurrence frequency of amino acid residues. The tool also shows the range of residues that could arise from a single point mutation in the genome, which can then be compared against the observed residues to identify mutation constraints. Through this approach, we found that the 2nd position towards the N-terminus side of the HA PCS (P2 position) avoided hydrophobic residues, whereas the P3 position avoided hydrophilic residues.


AAScatterPlot is available at https://github.com/WhittakerLab/AAScatterPlot.
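
The idea of the residues reachable by a single point mutation can be sketched in Python using the standard genetic code. This is not AAScatterPlot's code; the function name is an assumption, and the sketch works on one DNA codon at a time. A stop codon is reported as `*`.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def single_mutation_residues(codon):
    """Return the set of residues encoded by all codons one substitution away."""
    codon = codon.upper()
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                reachable.add(CODON_TABLE[mutant])
    return reachable

print(sorted(single_mutation_residues("ATG")))
```

Comparing such a reachable set with the residues actually observed at a position is one way to spot residues that are mutationally accessible but never seen.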

Scooped by Cindy

UniProt Protein Feature Viewer, a BioJS component

ProtVista is a comprehensive visualization tool for the graphical representation of protein sequence features in the UniProt Knowledgebase, experimental proteomics and variation public datasets. The complexity and relationships in this wealth of data pose a challenge in interpretation. Integrative visualization approaches such as provided by ProtVista are thus essential for researchers to understand the data and, for instance, discover patterns affecting function and disease associations.

Scooped by Cindy

KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies

Motivation: De novo assembly of whole genome shotgun (WGS) next-generation sequencing (NGS) data benefits from high-quality input with high coverage. However, in practice, determining the quality and quantity of useful reads quickly and in a reference-free manner is not trivial. Gaining a better understanding of the WGS data, and how that data is utilized by assemblers, provides useful insights that can inform the assembly process and result in better assemblies.
Results: We present the K-mer Analysis Toolkit (KAT): a multi-purpose software toolkit for reference-free quality control (QC) of WGS reads and de novo genome assemblies, primarily via their k-mer frequencies and GC composition. KAT enables users to assess levels of errors, bias and contamination at various stages of the assembly process. In this paper we highlight KAT’s ability to provide valuable insights into assembly composition and quality of genome assemblies through pairwise comparison of k-mers present in both input reads and the assemblies.
Availability and Implementation: KAT is available under the GPLv3 license at: https://github.com/TGAC/KAT.
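
A pairwise comparison of the k-mers in reads and in an assembly can be sketched as follows in Python. This is not KAT and does not reproduce its command-line interface or output: the function names and toy sequences are assumptions, and reverse complements are not merged into canonical k-mers.

```python
from collections import Counter

def kmer_counts(sequences, k):
    """Count every k-mer across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def compare_kmers(reads, assembly, k):
    """Summarize how many distinct k-mers are shared or unique to each side."""
    r = kmer_counts(reads, k)
    a = kmer_counts(assembly, k)
    return {
        "shared": len(r.keys() & a.keys()),
        "reads_only": len(r.keys() - a.keys()),
        "assembly_only": len(a.keys() - r.keys()),
    }

reads = ["ACGTA", "CGTAC", "GGGTT"]
assembly = ["ACGTAC"]
summary = compare_kmers(reads, assembly, 3)
print(summary)
```

K-mers found only in the reads may point to sequencing errors, contamination, or content missing from the assembly, while k-mers found only in the assembly may point to misassembly.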

Scooped by Cindy

ORCAN—a web-based meta-server for real-time detection and functional annotation of orthologs

Summary: ORCAN (ORtholog sCANner) is a web-based meta-server for one-click evolutionary and functional annotation of protein sequences. The server combines information from the most popular orthology-prediction resources, including four tools and four online databases. Functional annotation utilizes five additional comparisons between the query and identified homologs: sequence similarity, protein domain architectures, functional motifs, Gene Ontology term assignments and a list of associated articles. Furthermore, the server uses a plurality-based rating system to evaluate the orthology relationships and to rank the reference proteins by their evolutionary and functional relevance to the query. Using a dataset of ∼1 million true yeast orthologs as a sample reference set, we show that combining multiple orthology-prediction tools in ORCAN increases the sensitivity and precision by 1–2 percentage points.
Availability and Implementation: The service is available for free at http://www.combio.pl/orcan/.
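
A plurality-based rating can be pictured as a vote count across prediction resources. The Python sketch below is only that simple picture, not ORCAN's actual scoring scheme, which the summary does not spell out; the tool names and protein identifiers are hypothetical.

```python
from collections import Counter

def rank_by_plurality(predictions):
    """Rank candidate orthologs by the number of resources that report them.

    predictions -- dict mapping a resource name to the orthologs it predicts
    Ties are broken alphabetically by identifier.
    """
    votes = Counter()
    for orthologs in predictions.values():
        votes.update(set(orthologs))  # each resource votes once per candidate
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical predictions from three resources.
predictions = {
    "tool_a": ["P1", "P2"],
    "tool_b": ["P1"],
    "tool_c": ["P1", "P3"],
}
ranking = rank_by_plurality(predictions)
print(ranking)
```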

Scooped by Cindy

phyx: Phylogenetic tools for Unix

Summary: The ease with which phylogenomic data can be generated has drastically escalated the computational burden for even routine phylogenetic investigations. To address this, we present phyx: a collection of programs written in C++ to explore, manipulate, analyze, and simulate phylogenetic objects (alignments, trees, and MCMC logs). Modelled after Unix/GNU/Linux command line tools, individual programs perform a single task and operate on standard I/O streams that can be piped to quickly and easily form complex analytical pipelines. Because of the stream-centric paradigm, memory requirements are minimized (often only a single tree or sequence in memory at any instance), and hence phyx is capable of efficiently processing very large data sets.
Availability and Implementation: phyx runs on POSIX-compliant operating systems. Source code, installation instructions, documentation, and example files are freely available under the GNU General Public License at https://github.com/FePhyFoFum/phyx
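
The stream-centric design can be illustrated with a small Python generator that holds only one FASTA record in memory at a time. This is not phyx code (phyx is written in C++); the function names are assumptions, and the demo reads from an in-memory stream where a real pipeline would pass `sys.stdin`.

```python
import io

def read_fasta(stream):
    """Yield (name, sequence) records one at a time from a FASTA text stream."""
    name, chunks = None, []
    for line in stream:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def min_length(records, n):
    """Pass through only records whose sequence has at least n characters."""
    return (rec for rec in records if len(rec[1]) >= n)

# In a shell pipeline, replace the StringIO object with sys.stdin.
demo = io.StringIO(">a\nAC\nGT\n>b\nTT\n")
for name, seq in min_length(read_fasta(demo), 3):
    print(f">{name}\n{seq}")
```

Because both steps are generators, records flow through the filter one by one, so memory use does not grow with the size of the input file.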

Scooped by Cindy

Can we replace curation with information extraction software?

Can we use programs for automated or semi-automated information extraction from scientific texts as practical alternatives to professional curation? I show that error rates of current information extraction programs (IEPs) are too high to replace professional curation today. Furthermore, current IEPs extract single narrow slivers of information, such as individual protein interactions; they cannot extract the large breadth of information extracted by professional curators for databases such as EcoCyc. They also cannot arbitrate among conflicting statements in the literature as curators can. Therefore, funding agencies should not hobble the curation efforts of existing databases on the assumption that a problem that has stymied Artificial Intelligence researchers for more than 60 years will be solved tomorrow. Semi-automated extraction techniques appear to have significantly more potential based on a review of recent tools that enhance curator productivity. But a full cost-benefit analysis for these tools is lacking. Without such analysis it is possible to expend significant effort developing information-extraction tools that automate small parts of the overall curation workflow without achieving a significant decrease in curation costs.
Gilbert C FAURE's curator insight, January 13, 4:05 AM
curation is curation and curation?
Scooped by Cindy

Multiple sequence alignment modeling: methods and applications

This review provides an overview of the development of multiple sequence alignment (MSA) methods and their main applications. It is focused on progress made over the past decade. The first three sections review recent algorithmic developments for protein, RNA/DNA and genomic alignments. The fourth section deals with benchmarks and explores the relationship between empirical and simulated data, along with the impact on method developments. The last part of the review gives an overview of available MSA local reliability estimators and their dependence on various algorithmic properties of available methods.

Scooped by Cindy

Evolutionary conservation of Ebola virus proteins predicts important functions at residue level

Abstract
MOTIVATION:
The recent outbreak of Ebola virus disease (EVD) resulted in a large number of human deaths. Due to this devastation, the Ebola virus has attracted renewed interest as a model for virus evolution. Recent literature on Ebola virus (EBOV) has contributed substantially to our understanding of the underlying genetics and its scope with reference to the 2014 outbreak. But no study has yet focused on the conservation patterns of EBOV proteins.

RESULTS:
We analyzed the evolution of functional regions of EBOV and highlight the function of conserved residues in protein activities. We apply an array of computational tools to dissect the functions of EBOV proteins in detail: (i) protein sequence conservation, (ii) protein-protein interactome analysis, (iii) structural modeling and (iv) kinase prediction. Our results suggest the presence of novel post-translational modifications in EBOV proteins and their role in the modulation of protein functions and protein interactions. Moreover, on the basis of the presence of ATM recognition motifs in all EBOV proteins, we postulate a role of DNA damage response pathways and ATM kinase in EVD. The ATM kinase is put forward, for further evaluation, as a novel potential therapeutic target.

Scooped by Cindy

Icarus: visualizer for de novo assembly evaluation

Summary: Data visualization plays an increasingly important role in NGS data analysis. With advances in both sequencing and computational technologies, it has become a new bottleneck in genomics studies. Indeed, evaluation of de novo genome assemblies is one of the areas that can benefit from the visualization. However, even though multiple quality assessment methods are now available, existing visualization tools are hardly suitable for this purpose.

Here we present Icarus – a novel genome visualizer for accurate assessment and analysis of genomic draft assemblies, which is based on the tool QUAST. Icarus can be used in studies where a related reference genome is available, as well as for non-model organisms. The tool is available online and as a standalone application.

Availability: http://quast.sf.net/icarus

Scooped by Cindy

ORFanFinder: automated identification of taxonomically restricted orphan genes

Motivation: Orphan genes, also known as ORFans, are newly evolved genes in a genome that enable the organism to adapt to a specific living environment. The gene content of every sequenced genome can be classified into different age groups, based on how widely/narrowly a gene’s homologs are distributed in the context of species taxonomy. Those having homologs restricted to organisms of particular taxonomic ranks are classified as taxonomically restricted ORFans.

Results: Implementing this idea, we have developed an open source program named ORFanFinder and a free web server to allow automated classification of a genome’s gene content and identification of ORFans at different taxonomic ranks. ORFanFinder and its web server will contribute to the comparative genomics field by facilitating the study of the origin of new genes and the emergence of lineage-specific traits in both prokaryotes and eukaryotes.

Availability and implementation: http://cys.bios.niu.edu/orfanfinder
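
The classification idea, placing a gene at the narrowest taxonomic level that still contains all of its homologs, can be sketched in Python as below. This is not ORFanFinder's implementation: the function name, the list-based lineage representation, and the shortened example lineages are assumptions for illustration.

```python
def restriction_level(query_lineage, homolog_lineages):
    """Return the narrowest taxon of the query lineage shared by every homolog.

    Lineages are lists ordered from the broadest to the narrowest taxon.
    With no homologs at all, the gene is restricted to the query taxon itself.
    Returns None if some homolog shares no taxon with the query.
    """
    depth = len(query_lineage)
    for lineage in homolog_lineages:
        shared = 0
        for query_taxon, hit_taxon in zip(query_lineage, lineage):
            if query_taxon != hit_taxon:
                break
            shared += 1
        depth = min(depth, shared)
    return query_lineage[depth - 1] if depth > 0 else None

# Shortened, illustrative lineages.
query = ["Bacteria", "Proteobacteria", "Escherichia", "Escherichia coli"]
close = ["Bacteria", "Proteobacteria", "Escherichia", "Escherichia albertii"]
distant = ["Bacteria", "Firmicutes", "Bacillus", "Bacillus subtilis"]

print(restriction_level(query, [close]))
print(restriction_level(query, [close, distant]))
print(restriction_level(query, []))
```

A gene with homologs only inside the genus would be a genus-level ORFan in this picture, while a single distant homolog pushes it to a much broader and therefore older group.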
