with Rutger Vos and Darin London
May 11th - May 15th 2015

Course Description

Introduction
Next generation sequencing (NGS) technologies for DNA have produced an ever bigger deluge of data. Researchers are learning that analyzing these data efficiently requires sophisticated pipelines, typically built from commandline tools in a Linux or other open-source Unix-like compute environment. Many researchers have created such pipelines and used them successfully to analyze their data. Now they face the challenge of making these pipelines available to their colleagues. Reproducibility has emerged as a major concern (TODO REF), as researchers, peer reviewers, and even pharmaceutical companies discover that the software and data used to produce a particular research finding are unavailable, poorly documented, or targeted at specific compute infrastructures that the wider research community cannot access. To remedy this, funding agencies and journals are creating policies to promote software reproducibility.

In this brief workshop we will establish several best practices for reproducibility in the (comparative) analysis of NGS data. Along the way we will encounter the commonly used technologies that enable these best practices, working through use cases that illustrate the underlying principles. Starting from an existing pipeline of commandline utilities, we will show how the entire compute environment used to run the pipeline can be packaged into a unit that can be shared with other researchers, so that they can make full use of the environment on their own machines or on standard cloud compute environments such as Amazon or Google.
Best practices
- Commandline scripting of analysis steps
- Provisioning systems to standardize software environment requirements
- Packaging of the compute environment into static, portable units
- Sharing of compute environment packages

Technologies
- Next generation sequencing platforms
- Command-line executables, command-line scripting and batching
- Provisioning systems: Puppet, Dockerfile
- Virtualization with VirtualBox and Vagrant
- Containerization with Docker

Target audience
This course is aimed at researchers who have developed pipelines to analyze NGS data and who now, faced with new reproducibility requirements, would like to learn how to package their analysis pipelines in a reproducible (and shareable) way. The course will start with a very basic NGS pipeline that runs in a Linux commandline environment and develop it into two packages that can be shared with, and used by, other researchers. The ideal attendee is a scientist who is already comfortable developing scripted pipelines on the commandline, or who is not afraid to get his/her hands dirty acquiring the computer-literacy skills needed for the informatics side of data analysis.
The course assumes that attendees are not intimidated by the prospect of gaining experience with UNIX-like operating systems (including the shell and shell scripting). Attendees should understand some of the science behind high-throughput DNA sequencing and sequence analysis, as we will not go deeply into the underlying theory (for example, the mechanics of particular algorithms). What will be taught are technical solutions for automating such analyses and sharing them in reusable compute environments; these include, but are not limited to, beginner-level programming and basic Linux provisioning. General computer literacy (e.g. editing plain-text data files, navigating the command line) will be assumed.
From left to right: High school teachers Tami Caraballo and Jennifer Duncan-Taylor work with ISB’s Claudia Ludwig, Baliga Lab Education Program Manager, to learn about ocean acidification, cancer cells, and biofuel.
The non-parametric bootstrap was my first love. I was lost in a muddy swamp of zs, ts and ps when I first saw her. Conceptually beautiful, simple to implement, easy to understand (I thought back then, at least).
Nowadays, the study of environmental samples is developing rapidly. Characterizing the composition of environmental samples broadens our knowledge of the relationship between species composition and environmental conditions.
A new scaffolding paper has been published in Bioinformatics. MOTIVATION: Next-generation high-throughput sequencing (HTS) has become a state-of-the-art technique in genome assembly. Scaffolding is one of the main stages of the assembly pipeline.
With the Ebola outbreak not yet behind us, global health workers are already scrambling to prevent what could be the next big outbreak of an emerging disease caused by a virus that jumped from animals into humans.
In the evolutionary arms race between microbes, their parasites, and their neighbours, the capacity for rapid protein diversification is a potent weapon. Diversity-generating retroelements (DGRs) use mutagenic reverse transcription and retrohoming to generate myriad variants of a target gene. Originally discovered in pathogens, these retroelements have been identified in bacteria and their viruses, but never in archaea. Here we report the discovery of intact DGRs in two distinct intraterrestrial archaeal systems: a novel virus that appears to infect archaea in the marine subsurface, and, separately, two uncultivated nanoarchaea from the terrestrial subsurface. The viral DGR system targets putative tail fibre ligand-binding domains, potentially generating >10¹⁸ protein variants. The two single-cell nanoarchaeal genomes each possess ≥4 distinct DGRs. Against an expected background of low genome-wide mutation rates, these results demonstrate a previously unsuspected potential for rapid, targeted sequence diversification in intraterrestrial archaea and their viruses.