Bioinformatics Training
Bioinformatics, blended with education and health sciences subjects
Scooped by Pedro Fernandes

OPEN ACCESS e-book on Open Science, Open Data, Open Source

OPEN e-book @ http://osodos.org

The goal of these resources is to give a bird's-eye view of developments in open scientific research: we cover both social developments (e.g. the culture in various communities) and technological ones. As such, no part of the contents is especially in-depth or geared towards advanced users of specific practices or tools. Nevertheless, certain sections are more relevant to some people than to others. Specifically:

The most interesting sections for graduate students will be those about navigating the literature, managing evolving projects, and publishing and reviewing.

Lab technicians may derive the most benefit from the sections about capturing data, working with reproducibility in mind and sharing data.

For data scientists, the sections on organizing computational projects as workflows, managing versions of data and source code, open source software development, and data representation will be most relevant.

Principal investigators may be most interested in the sections on data management, data sharing, and coping with evolving projects.

Scientific publishers may be interested to know how scientists navigate the literature, what the expectations are for enhanced publications, and the needs for data publishing.

Science funders and policy makers may find value in the sections on capturing data, data management, data sharing, and navigating the literature.

Science communicators may be more interested in exploring the content by starting with navigating the literature, working with reproducibility in mind and sharing data.
Pedro Fernandes's insight:
Citation: Rutger Vos & Pedro Fernandes. (2017, October 17). Pfern/OSODOS v1.0.0. Zenodo. http://doi.org/10.5281/zenodo.1015288

Reproducibility Initiative receives $1.3M grant to validate 50 landmark cancer studies | The Science Exchange Blog

A $1.3 million grant provided by the Laura and John Arnold Foundation through the Center for Open Science will fund the validation of 50 landmark cancer studies.
Pedro Fernandes's insight:

Congratulations!


GTPB: ARANGS15 Automated and reproducible analysis of NGS data - Home

Pedro Fernandes's insight:
with Rutger Vos and Darin London
May 11th - May 15th, 2015

Course Description

Introduction

Next generation sequencing (NGS) technologies for DNA have resulted in an even bigger deluge of data. Researchers are learning that analyzing the data efficiently requires the creation of sophisticated pipelines, typically built from command-line tools in a Linux or other open-source Unix-like compute environment. Many researchers have created such pipelines to analyze their data successfully. Now they are faced with the challenge of making these pipelines available to their colleagues. Reproducibility has emerged as a major issue (TODO REF), as researchers, peer reviewers, and even pharmaceutical companies discover that the software and data used to produce a particular research finding are either unavailable, poorly documented, or targeted to specific compute infrastructures that the wider research community cannot access. To remedy this, funding agencies and journals are creating policies to promote software reproducibility. In this brief workshop we will establish several best practices of reproducibility in the (comparative) analysis of data obtained by NGS. In doing so we will encounter the commonly used technologies that enable these best practices by working through use cases that illustrate the underlying principles. Building on an existing pipeline of command-line utilities, we will illustrate how the entire compute environment used to run the pipeline can be packaged into a unit that can be shared with other researchers, so that they can make full use of the environment on their own machines, or on standard cloud compute environments such as Amazon or Google.
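To make the idea of a scripted, reproducible analysis step concrete, here is a minimal sketch of the kind of command-line step such a pipeline is built from. The file name reads.fa and its toy contents are invented for illustration; a real pipeline step would operate on actual sequencing output.

```shell
#!/usr/bin/env sh
# Minimal sketch of one scripted pipeline step: count sequences in a FASTA file.
# Fail fast on errors or unset variables, a basic reproducibility habit.
set -eu

# Create a tiny example input (two sequence records) for illustration.
cat > reads.fa <<'EOF'
>read_1
ACGTACGT
>read_2
TTGGCCAA
EOF

# FASTA records begin with '>'; counting those lines counts the sequences.
n_seqs=$(grep -c '^>' reads.fa)
echo "sequences: $n_seqs"   # prints: sequences: 2
```

Because the step is a plain script using standard Unix tools, it runs identically anywhere the same environment is provided, which is exactly what the packaging technologies below aim to guarantee.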

Best practices

- Command-line scripting of analysis steps
- Provisioning systems to standardize software environment requirements
- Packaging of compute environments into static, portable units
- Sharing of compute environment packages

Technologies

- Next generation sequencing platforms
- Command-line executables, command-line scripting and batching
- Provisioning systems: Puppet, Dockerfile
- Virtualization with VirtualBox and Vagrant
- Containerization with Docker

Target audience
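As a sketch of how the Dockerfile approach named above packages a pipeline's compute environment, consider the following hypothetical example. The base image, the installed tools (samtools, bwa), and the script name run_pipeline.sh are all illustrative assumptions, not the course's actual materials.

```dockerfile
# Hypothetical sketch: packaging a command-line pipeline environment with Docker.
FROM ubuntu:22.04

# Install the command-line tools the pipeline depends on (illustrative choices).
RUN apt-get update && apt-get install -y --no-install-recommends \
        samtools bwa \
    && rm -rf /var/lib/apt/lists/*

# Copy the pipeline driver script into the image and make it executable.
COPY run_pipeline.sh /usr/local/bin/run_pipeline.sh
RUN chmod +x /usr/local/bin/run_pipeline.sh

# Running the container runs the pipeline.
ENTRYPOINT ["/usr/local/bin/run_pipeline.sh"]
```

A colleague would build the image with `docker build -t ngs-pipeline .` and run it with `docker run -v "$PWD":/data ngs-pipeline`, obtaining the identical software environment regardless of their host machine, which is the sense in which the packaged unit is shareable.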

This course is aimed at researchers who have developed pipelines to analyze NGS data and now, faced with new reproducibility requirements, would like to learn how to package their analysis pipelines in a reproducible (and shareable) way. The course will start with a very basic NGS pipeline that runs in a Linux command-line environment, and develop this pipeline into two packages that can be shared with, and used by, other researchers. The ideal attendee is a scientist who is already comfortable developing scripted pipelines on the command line, or who is not afraid to get his/her hands dirty to acquire the computer-literacy skills for dealing with the informatics side of data analysis.

Pre-requisites

The course assumes that attendees are not intimidated by the prospect of gaining experience with UNIX-like operating systems (including the shell and shell scripting). Attendees should understand some of the science behind high-throughput DNA sequencing and sequence analysis, as we will not go deeply into the underlying theory (or the mechanics of particular algorithms, for example). What will be taught are technical solutions for automating and sharing such analyses in shareable, reusable compute environments, which will include (but are not limited to) beginner-level programming and basic Linux provisioning. General computer literacy (e.g. editing plain text data files, navigating the command line) will be assumed.
