Open Data Sets
Curating open data sets for scientific research
Curated by Claudia Mihai

100+ Interesting Data Sets for Statistics
Looking for interesting data sets? Here's a list of more than 100 of the best stuff, from dolphin relationships to political campaign donations to death row prisoners.

Web of Life: ecological networks database
The Web of Life project is developed at Jordi Bascompte's lab (www.bascompte.net), a research group focused on the structure and dynamics of ecological networks. It is supported by an ERC Advanced Grant from the European Union.

Scientists losing data at a rapid rate
Decline can mean 80% of data are unavailable after 20 years.

In their parents' attic, in boxes in the garage, or stored on now-defunct floppy disks — these are just some of the inaccessible places in which scientists have admitted to keeping their old research data. Such practices mean that data are being lost to science at a rapid rate, a study has now found.

The authors of the study, which is published today in Current Biology, looked for the data behind 516 ecology papers published between 1991 and 2011. The researchers selected studies that involved measuring characteristics associated with the size and form of plants and animals, something that has been done in the same way for decades. By contacting the authors of the papers, they found that, whereas data for almost all studies published just two years ago were still accessible, the chance of them being so fell by 17% per year. Availability dropped to as little as 20% for research from the early 1990s.


Commission launches pilot to open up publicly funded research data

Valuable information produced by researchers in many EU-funded projects will be shared freely as a result of a Pilot on Open Research Data in Horizon 2020. Researchers in projects participating in the pilot are asked to make the underlying data needed to validate the results presented in scientific publications and other scientific information available for use by other researchers, innovative industries and citizens. This will lead to better and more efficient science and improved transparency for citizens and society. It will also contribute to economic growth through open innovation. For 2014-2015, topic areas participating in the Open Research Data Pilot will receive funding of around €3 billion.


Integrated Cancer Drug Discovery Platform

canSAR is an integrated knowledge base that brings together multidisciplinary data across biology, chemistry, pharmacology, structural biology, cellular networks and clinical annotations, and applies machine learning approaches to provide predictions useful for drug discovery.


Unlocking the link between brain power, personality and life chances

Newly released data allows researchers to review cognitive function and personality traits across the whole life course, from ages 16 to 102.

Wave 3 of Understanding Society – the UK Household Longitudinal Study – includes the results of testing the cognitive ability of nearly 50,000 adults and 4,500 children aged 10 to 15. The study integrates 18 years of data from the British Household Panel Survey (BHPS) and Wave 3 provides up to 21 years of evidence about the changing nature of our society, individuals’ circumstances and behaviours.

Rescooped by Claudia Mihai from Complexity

Syria Tracker - crowdmap of reported casualties

Syria Tracker is a crowdsourcing effort that has been collecting citizen reports on human rights violations and casualties in Syria since April 2011. Syria Tracker's ultimate goal is not only to provide the number of fatalities, but also to preserve the name, location and details of each victim. Whenever possible, each name is linked to a photo or video of the casualty. Syria Tracker provides a continually updated list of eyewitness reports from within Syria, often accompanied by media links; aggregate reports including analysis and visualizations of deaths and atrocities in Syria; and a stream of content-filtered media from news, social media (Twitter and Facebook) and official sources.


Via Hiquda

Exploring Open Data Sets

It’s always fascinating to take a look at the data visualizations and in-depth reports widely available on the web. As an aspiring (or active) data scientist, however, one of the best things you can do to learn about a particular field is to get your own hands dirty.


Datasets from Yahoo! Labs

We have various types of data available to share. They are categorized into Ratings, Language, Graph, Advertising and Market Data, Computing Systems and an appendix of other relevant data and resources available via the Yahoo Developer Network.


Referencing: The reuse factor
The reference is not dead — it is exploding to encompass the full spectrum of research outputs from lines of code to video frames, explains Mark Hahnel.

The sierpinski triangle page to end most sierpinski triangle pages
luiy's curator insight, October 17, 2013 9:54 AM
Constructing the Sierpinski triangle

Throughout my years playing around with fractals, the Sierpinski triangle has been a consistent staple. The triangle is named after Wacław Sierpiński and, as fractals are wont to do, the pattern appears in many places, so there are many different ways of constructing the triangle on a computer.

All of the methods are fundamentally iterative. The most obvious method is probably the triangle-in-triangle approach. We start with one triangle, and at every step we replace each triangle with 3 subtriangles.
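
The triangle-in-triangle step can be sketched in a few lines of Python. This is only an illustrative sketch, not the code from the linked page: the function names (`midpoint`, `subdivide`, `sierpinski`) and the starting coordinates are assumptions made for the example. Each triangle is replaced by the three corner triangles formed from its vertices and edge midpoints, and the middle triangle is left out.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def midpoint(p: Point, q: Point) -> Point:
    """Return the point halfway between p and q."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def subdivide(tri: Triangle) -> List[Triangle]:
    """Replace one triangle with its 3 corner subtriangles (the middle one is dropped)."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c)]


def sierpinski(start: Triangle, steps: int) -> List[Triangle]:
    """Apply the subdivision step `steps` times, starting from a single triangle."""
    triangles = [start]
    for _ in range(steps):
        triangles = [small for tri in triangles for small in subdivide(tri)]
    return triangles


if __name__ == "__main__":
    # Hypothetical starting triangle, chosen only for illustration.
    start = ((0.0, 0.0), (1.0, 0.0), (0.5, 1.0))
    for n in range(4):
        print(n, len(sierpinski(start, n)))
```

Because every step triples the number of triangles, running the script prints counts of 1, 3, 9 and 27 for steps 0 to 3. The resulting list of vertex triples could be passed to any plotting library to draw the figure.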


Rescuing neuroscience from its data deluge

From medicalxpress.com -

Before the digital age, neuroscientists got their information in the library like the rest of us. But the field's explosion has created nearly 2 million papers—more data than any researcher can read and absorb in a lifetime.


Met releases 400,000 hi-rez scans for free download
Robbo sez, "The Metropolitan Museum of Art has just released almost 400,000 visual works in an online searchable database. The images are high rez (10 megapixels) and free to download."

Academic Torrents
Welcome to Academic Torrents! Currently making 207.87GB of research data available.

Sharing data is hard. Emails have size limits, and setting up servers is too much work. We've designed a distributed system for sharing enormous datasets - for researchers, by researchers. The result is a scalable, secure, and fault-tolerant repository for data, with blazing fast download speeds.

luiy's curator insight, January 31, 2014 6:02 AM
We are a distributed data repository

The academic torrents network is built for researchers, by researchers. Its distributed peer-to-peer library system automatically replicates your datasets on many servers, so you don't have to worry about managing your own servers or file availability. Everyone who has data becomes a mirror for those data so the system is fault-tolerant.

What does it mean for you?

The academic torrents system offers blazing fast download speeds and a site for searching available datasets from various sources. For sharing, distributing datasets on the network means no more setting up file servers, less bandwidth usage, and maximum uptime.


British Library uploads one million public domain images to the net for remix and reuse
The British Library has uploaded one million public domain scans from 17th-19th century books to Flickr! They're embarking on an ambitious programme to crowdsource novel uses and navigation tools for the huge corpus.

LHC plans for open data future

When the Large Hadron Collider (LHC) is humming along, the data come in a deluge. The four experimental detectors at the facility, based at CERN, Europe’s particle-physics laboratory near Geneva, Switzerland, collect some 25 petabytes of information each year. Storing the data is not a problem: hard drives are cheap and getting cheaper. The challenge is preserving knowledge that is less commonly stored — the software, algorithms and reference plots specific to each experiment.


JASPAR 2014: an extensively expanded and updated open-access database of transcription factor binding profiles

JASPAR (http://jaspar.genereg.net) is the largest open-access database of matrix-based nucleotide profiles describing the binding preference of transcription factors from multiple species. The fifth major release greatly expands the heart of JASPAR—the JASPAR CORE subcollection, which contains curated, non-redundant profiles—with 135 new curated profiles (74 in vertebrates, 8 in Drosophila melanogaster, 10 in Caenorhabditis elegans and 43 in Arabidopsis thaliana; a 30% increase in total) and 43 older updated profiles (36 in vertebrates, 3 in D. melanogaster and 4 in A. thaliana; a 9% update in total). The new and updated profiles are mainly derived from published chromatin immunoprecipitation-seq experimental datasets. In addition, the web interface has been enhanced with advanced capabilities in browsing, searching and subsetting. Finally, the new JASPAR release is accompanied by a new BioPython package, a new R tool package and a new R/Bioconductor data package to facilitate access for both manual and automated methods.


Iraq Body Count

Iraq Body Count (IBC) records the violent civilian deaths that have resulted from the 2003 military intervention in Iraq. Its public database includes deaths caused by US-led coalition forces and paramilitary or criminal attacks by others.

IBC’s documentary evidence is drawn from crosschecked media reports of violent events leading to the death of civilians, or of bodies being found, and is supplemented by the careful review and integration of hospital, morgue, NGO and official figures.

Rescooped by Claudia Mihai from Data Science

Project Tycho - Data for health

After four years of data digitization and processing, the Project Tycho™ Web site provides open access to newly digitized and integrated data from the entire 125-year history of United States weekly nationally notifiable disease surveillance since 1888. These data can now be used by scientists, decision makers, investors, and the general public for any purpose. The Project Tycho™ aim is to advance the availability and use of public health data for science and decision making in public health, leading to better programs and more efficient control of diseases.

 

The Project Tycho™ data are organized as counts. A count is defined as the number of cases or deaths due to a disease in a specific location and time period. A count is equivalent to a data point. During the 125-year period of weekly disease reporting, the types of reports have changed regularly, leading to different types of data counts across time. This makes the integration and standardization of these data a complex task. Currently, available data are categorized in three levels based on the type of counts included. Level 1 includes different types of counts that have been standardized into a common format for a specific analysis published recently in the NEJM. Level 2 data include only counts that have been reported in a common format, e.g. diseases reported for a one-week period and without disease subcategories. These data can be used immediately for analysis and cover a wide range of diseases and locations, but this level does not include data that have not yet been standardized. Level 3 data include all the different types of counts ever reported. Although these are the most complete data, the large number of different counts requires extensive standardization and various judgment calls before they can be used for analysis.
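
To make the definition of a count concrete, here is a minimal Python sketch of how one such data point might be represented. It is not Project Tycho's actual schema: the class name `Count`, its field names, and the helper `is_weekly_without_subcategory` are hypothetical, and the check only mirrors the Level 2 example given above (a one-week reporting period with no disease subcategory).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Count:
    """One data point: cases or deaths for a disease, place and time period.

    All field names are hypothetical and chosen for illustration only.
    """
    disease: str
    location: str
    period_start: date
    period_end: date          # inclusive last day of the reporting period
    kind: str                 # "cases" or "deaths"
    value: int
    subcategory: Optional[str] = None


def is_weekly_without_subcategory(count: Count) -> bool:
    """True if the count covers exactly seven days and has no disease subcategory."""
    days = (count.period_end - count.period_start).days + 1
    return days == 7 and count.subcategory is None


if __name__ == "__main__":
    # Made-up values for illustration; not taken from the Project Tycho data.
    example = Count("measles", "Pennsylvania", date(1930, 1, 5), date(1930, 1, 11), "cases", 10)
    print(is_weekly_without_subcategory(example))
```

A filter like this only captures the two conditions named in the text. The real level assignment involves standardization decisions that this sketch does not attempt to model.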


Via Hiquda

Project ranks billions of drug interactions

For decades, drug development was mostly a game of trial and error, with brute-force candidate screens throwing up millions more duds than winners. Researchers are now using computers to get a head start. By analysing the chemical structure of a drug, they can see if it is likely to bind to, or ‘dock’ with, a biological target such as a protein. Such algorithms are particularly useful for finding potentially toxic side effects that may come from unintended dockings to structurally similar, but untargeted, proteins.


OpenCorporates, the open database of the corporate world

OpenCorporates is a database which aims to gather information on all the companies in the world. The database currently offers information on 50 million companies in 65 different jurisdictions.

Information that can be found on OpenCorporates includes a company's incorporation date, its registered addresses and its registry page, as well as a list of directors and officers.

Government data relating to the companies featured on the database is also being imported.

The website also shows corporate groupings, to help you see which other companies are connected to the one you are investigating.


Air Transportation Multiplex Datasets

Here, you can find the dataset of a multiplex network composed of the airlines operating in Europe. You are free to use the dataset. We kindly ask you to cite the following paper as the source of the data: "Emergence of Network Features from Multiplexity", A. Cardillo, J. Gómez-Gardeñes, M. Zanin, M. Romance, D. Papo, F. del Pozo, S. Boccaletti, Scientific Reports 3, 1344 (2013).

The data contain up to thirty-seven different layers, each one corresponding to a different airline. We have also prepared two subsets: one containing only the three biggest major airlines and a similar one containing only low-fare (low-cost) airlines.


Scientific Data Has Become So Complex, We Have to Invent New Math to Deal With It

Open data: we need to share research results, even when they are wrong
There are huge flaws in the way research data is uploaded, says Mark Hahnel, but how far are we from a universal solution?