Public Datasets - Open Data -

Scooped by luiy

#OpenData - what the citizens really want
luiy's insight:

To design its first steps, the City of Berlin ran an online Open Data vote. More than 1,300 participants each chose three categories of interest from a list of 20, ranging from “Administration” through “Education” and “Health” to “Waste”. Five categories accounted for nearly 50 percent of respondents' interest. The statistical results of the survey were compared with a 2010 SAS survey on general interest in open data, transparency and access. Citizens were also asked whether they would read published open data and work with it to create apps and visualisations of combined data sets. The approach could serve as a blueprint for other cities.

Rescooped by luiy from Vulgarisation et médiation scientifiques

The tree of evolution in the digital age | #dataviz #openscience

When Charles Darwin meets Larry Page: the first phylogenetic tree, as it appears in 1859 in On the Origin of Species by Means of Natural Selection by Charles...

"Draw me a sheep," the Little Prince asked the aviator. Graphic representation has always been a vehicle for knowledge, but today, in a culture that is gradually abandoning the written word for the image, it is becoming an essential component of how knowledge is passed on. A second challenge arises: how can we account for the mass of available information in order to describe faithfully the complexity of the world we live in, and in particular the diversity of the biosphere? The combination of these two ambitions gave rise to the OneZoom project (www.onezoom.org), a digital tool for visualizing a digital version of the tree of life. An immersion in biodiversity, and a fine invitation to curiosity. (...)  - by Guillaume Frasca, La science infuse, 16/10/2012

 

Source: J. Rosindell and L.J. Harmon, OneZoom: A Fractal Explorer for the Tree of Life, PLoS Biology, 16 October 2012.


Via Julien Hering, PhD
luiy's insight:

The authors introduce a new phylogenetic tree viewer that allows interactive display of large trees. The key concept of their solution is that all the data is on one page, so that all the user has to do is zoom to reveal it, hence the name OneZoom (http://www.onezoom.org). The interface is analogous to Google Earth, where one can smoothly zoom into any local landmark from a start page showing the whole globe, recognizing familiar landmarks at different scales along the way (e.g., continents, countries, regions, and towns). Equivalently, OneZoom can zoom smoothly to one tip of the tree of life (say, human beings), passing the familiar clades of animals, vertebrates, mammals, and primates at different scales along the way.

Scooped by luiy

The First Interactive Network and Graph Data #Repository with Interactive Graph Analytics and Visualization | #opendata #SNA
The First Interactive Network Data Repository with Real-time Interactive Visualization and Analytics
luiy's insight:

Network Data Repository. Exploratory Analysis & Visualization.

 

A network and graph data repository containing hundreds of real-world networks and benchmark datasets. This large comprehensive collection of network graph data is useful for making significant research findings as well as benchmark data sets for machine learning and network science. All data sets are easily downloaded into a standard consistent format. We also have built a multi-level interactive graph analytics engine that allows for visualizing the structure of networks as well as many global graph statistics and local node level properties. 

Scooped by luiy

All the Open #Datasets from New York City Visualized in a Single View | #opendata #dataviz
luiy's insight:

"Visualizing NYC's Open Data" [chriswhong.com] by self-proclaimed urbanist, map maker and data junkie Chris Whong provides a single view of the more than 1,100 open datasets made available by New York City.

 

The visualization of the "dataset of datasets" consists of a force-directed graph whose nodes are colored according to whether the corresponding dataset is a table, a chart, a map, a file or a user-created view (colored blue).

 

The graph acts as an alternative portal to explore the available data, while demonstrating its scale and diversity.

 

Scooped by luiy

#DataMining: Practical Machine Learning #Tools and Techniques | #Weka #datascience #openaccess
luiy's insight:

Teaching material

 

Slides for Chapters 1-5 of the 3rd edition can be found here.

Slides for Chapters 6-8 of the 3rd edition can be found here

 

These archives contain .pdf files as well as .odp files in Open Document Format that were generated using OpenOffice 2.0. Note that there are several free office programs now that can read .odp files. There is also a plug-in for Word made by Sun for reading this format. Corresponding information is on this Wikipedia page.

Rescooped by luiy from Politique des algorithmes

Google has #open sourced a #tool for inferring cause from correlations | #algorithms #datascience
Google open sourced a new package for the R statistical computing software that’s designed to help users infer whether a particular action really did cause subsequent activity. Google has been using the tool, called CausalImpact, to measure AdWords campaigns but it has broader appeal.

Via Dominique Cardon
luiy's insight:

Google announced on Tuesday a new open source tool that can help data analysts decide if changes to products or policies resulted in measurable change, or if the change would have happened anyway. The tool, called CausalImpact, is a package for the R statistical computing software, and Google details it in a blog post.

 

According to blog post author Kay H. Brodersen, Google uses the tool — created it, in fact — primarily for quantifying the effectiveness of AdWords campaigns. However, he noted, the same method could be used to gauge everything from whether adding a new feature caused an increase in app downloads to questions involving events in medical, social or political science.

 

http://google.github.io/CausalImpact/
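
To make the question "would the change have happened anyway?" concrete, here is a minimal sketch of the counterfactual logic in plain Python. It is not the CausalImpact package, which is written for R and uses Bayesian structural time-series models. This simplified stand-in fits an ordinary least-squares line between a target series and a control series before the intervention, predicts what the target would have been afterwards, and reports the average gap. The function names and the toy numbers are assumptions made up for illustration.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def estimate_effect(control, target, start):
    """Average gap between observed and predicted target from index `start` on."""
    # Learn the control/target relationship on the pre-intervention period only.
    a, b = fit_line(control[:start], target[:start])
    # Counterfactual: what the target would have been with no intervention.
    predicted = [a + b * c for c in control[start:]]
    gaps = [obs - pred for obs, pred in zip(target[start:], predicted)]
    return mean(gaps)


# Toy data (invented): target is 2 * control + 1 until index 5, then gains +10.
control = [10, 12, 11, 13, 14, 15, 16, 14, 17, 18]
target = [2 * c + 1 for c in control[:5]] + [2 * c + 11 for c in control[5:]]
effect = estimate_effect(control, target, start=5)
print(round(effect, 2))
```

The sketch only works if the control series is itself unaffected by the intervention; that assumption carries over to any method of this kind.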

 

 

Scooped by luiy

Seven Ways to Create a Storymap | #opendata #maps #ddj
Evidence is Power
luiy's insight:

The above examples describe a wide range of geographical and geotemporal storytelling models, often based around quite simple data files containing information about individual events. Many of the tools make strong use of image files as part of the display. It may be interesting to complete a more detailed review that describes the exact data models used by each of the techniques, with a view to identifying a generic data model that could be used by each of the different models, or transformed into the distinct data representations supported by each of the separate tools.


See more at: http://schoolofdata.org/2014/08/25/seven-ways-to-create-a-storymap/

Scooped by luiy

The #OpenData Research network | #opengov
luiy's insight:

The Open Data Research network.

 

Governments, civil society organisations and companies across the world are actively engaging with open data: publishing and using datasets to promote innovation, development and democratic change.

 

The Open Data Research network has been established to connect researchers from across the world working to explore the implementation and impact of open data initiatives. It is a joint project of IDRC and the Web Foundation, and is seeking to develop wider partnerships over the coming year.

 

The network currently hosts the 'Exploring the Emerging Impacts of Open Data in Developing Countries (ODDC)' programme.

Scooped by luiy

Digital Ecologies Research Partnership | #openResearch #opendata
The Digital Ecologies Research Partnership (DERP) is a joint initiative to promote academic inquiry into social dynamics of the web.
luiy's insight:

Launched in 2014, the Digital Ecologies Research Partnership (DERP) is a joint initiative by an alliance of community websites to promote open, publicly accessible, and ethical academic inquiry into the vibrant social dynamics of the web.

 

DERP seeks to solve two problems in the academic research space:

 

First, it is difficult for academic researchers to easily obtain data for their work beyond the confines of the largest social media platforms. DERP is a single point of contact for researchers to get in touch with relevant team members across a range of different community sites. We envision that this will lower the friction to investigating these sites in more depth, and broaden the scope of research happening within the academic community.

 

Second, it remains difficult to conduct good cross-platform analyses in academic research. By bringing a number of community sites together under a single cooperative effort, we intend to lower the friction of doing so, as well as better enable the sites themselves to coordinate with one another on supporting researchers.

 

DERP focuses on providing public data to academic researchers while facilitating an active online research community of Fellows. DERP will only support research that respects user privacy, responsibly uses data, and meets IRB approval. All research supported by DERP will be released openly and made publicly available. Partner platforms may also have additional guidelines and privacy commitments that apply to the research they support.

Scooped by luiy

#Opendata, crowdsourcing, and #sharing economy tech take on new roles in disasters
Sharing economy services and crowdsourced tech can help aggregate and distribute aid, housing, energy, and transportation to disaster survivors. Airbnb is one company that's stepping up.
luiy's insight:

In a disaster, figuring out which of your technology still works, how long it will work, and who knows how to use it is precious information. The idea of "disaster tech" is evocative of that notion: what works when nothing else does? Last week, at a forum convened by the White House to share commitments from the private sector and to demonstrate data-driven innovation for helping people in disasters, I was reminded of that need.

 

In a sign of the times, the categories in the program included the "sharing economy" and survivor support, crowdsourcing, open data, and public alerts. Behind the buzzwords, what matters most about any disaster technology is whether it works (from being interoperable with existing systems to being accessible to users), and whether it improves upon existing systems used by first responders and aid workers. If sharing economy services aggregate and distribute the demands of survivors to aid, excess housing, energy, or transport capacity, they can help people in need. These outcomes aren't theoretical.

Rescooped by luiy from Data Visualization & Open data

10 Great Places to Find #Datasets for Infographics | #opendata

“Creating an infographic is an excellent way to break down complex information and statistics into an easy-to-follow visual that is designed with your target audience in mind. Infographics have grown in popularity because they are easy to share and a simple tactic for promoting a business. Perhaps you’ve considered incorporating infographics into your content marketing…”


Via massimo facchinetti, Jesse Soininen
luiy's insight:

1. Government Websites
2. Google’s Public Data Directory 
3. Reddit.com 
4. Infochimp 
5. UNData 
6. Visual.ly
7. DataMarket
8. Number Of
9. Gallup Polls
10. Get the Data

Scooped by luiy

Mapping Data: A guide for making #geodata visualizations | #ddj #methods #tools
luiy's insight:

As an add-on to our presentation we produced two more things that some of you out there might find helpful too:

 

- Mappable Toolset: The number of tools to process data, make maps, build interactive visualizations, etc. is continuously growing. While we love new tools, this makes it quite hard to keep an overview of which tools are good for certain tasks, where to find them and how much they cost. To keep track of the tools we've used so far, and as a guide for others, we collected our toolset. Have a look at it here: English version, German version.


-  Mappable Cheat-Sheet: Making maps and other visualizations with a geospatial component is certainly not a trivial task. There are many pitfalls (spatial reference systems alone, for example) that might completely mess up your visualization if you don't handle them correctly. We thus created a checklist for making geodata visualizations in (data-driven) journalism. You can find it here: English version, German version.
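
As one illustration of the spatial reference system pitfall, the sketch below converts longitude/latitude in degrees (WGS 84) to Web Mercator metres (EPSG:3857), the projection used by most web map tiles. Mixing the two coordinate systems in one visualization puts points in the wrong place. The function name is an assumption for illustration and is not taken from the cheat-sheet.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by Web Mercator


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project WGS 84 degrees to Web Mercator (EPSG:3857) metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Berlin, roughly 13.40 E, 52.52 N: same place, very different numbers.
x, y = lonlat_to_web_mercator(13.40, 52.52)
print(f"degrees: (13.40, 52.52)  metres: ({x:.0f}, {y:.0f})")
```

A quick sanity check for any dataset: values within ±180 and ±90 are probably degrees, while values in the millions are probably projected metres.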

Scooped by luiy

The Open Database Of The #Corporate World | #opendata #economy
Free and Open Company Data on millions of companies and corporations in over 20 countries, including UK, Spain, US, ...
luiy's insight:

What is OpenCorporates?

 

OpenCorporates aims to do a straightforward (though big) thing: have a URL for every company in the world.

 

Is that all?

Well, no. Useful though that would be, we're also gradually importing government data relating to companies and trying to match it to specific companies.

 

Why do this?

Few parts of the corporate world are limited to a single country, and so the world needs a way of bringing the information together in a single place, and more than that, a place that's accessible to anyone, not just those who subscribe to proprietary datasets. See also the OpenCorporates Principles

 

There are quite a few countries you're missing

We've grown from 3 territories and a few million companies to over 75 jurisdictions and 55 million companies, and are working with the open data community to add more each week.

 

How can we get hold of the data?

We have a new API service, as well as our highly popular Google Refine reconciliation service (see documentation), which allow access to the information as JSON or XML. If you need data in bulk, whether for academic research, for another cool open data project, or commercially, drop us an email at info@opencorporates.com.
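
For a feel of what JSON access might look like, here is a hedged sketch. The base URL, the `q` parameter and the response layout are assumptions about the public OpenCorporates API; version numbers and field names may differ, and an API token may be required, so check the current documentation before relying on them. The sample payload is invented for illustration and no network request is made.

```python
import json
from urllib.parse import urlencode

# Assumed base URL and version; verify against the current API documentation.
API_BASE = "https://api.opencorporates.com/v0.4"


def build_search_url(name):
    """Build a company-search URL for the given name."""
    return f"{API_BASE}/companies/search?{urlencode({'q': name})}"


def company_names(payload):
    """Extract (jurisdiction, name) pairs from the assumed response layout."""
    companies = payload.get("results", {}).get("companies", [])
    return [(c["company"]["jurisdiction_code"], c["company"]["name"]) for c in companies]


# Invented sample response, shaped like the assumed layout.
sample = json.loads("""
{"results": {"companies": [
  {"company": {"name": "EXAMPLE HOLDINGS LIMITED", "jurisdiction_code": "gb"}}
]}}
""")
url = build_search_url("example holdings")
names = company_names(sample)
print(url)
print(names)
```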

Scooped by luiy

ckan - The #OpenSource data #portal software | #opendata #Opengob
The open source data portal software
luiy's insight:

CKAN is open source, free software. This means that you can use it without any license fees, but more importantly, when you choose CKAN for your catalog you are also ensuring that you retain all rights to the data and metadata you enter, giving you freedom to move it elsewhere or manipulate it with your own tools without restriction.


There are lots of different open source licenses (you can find them at http://opensource.org) – CKAN is licensed specifically under the terms of the GNU Affero GPL v3.0. One of the strengths of the open source model is in the communities that form around free software products. The CKAN community is no different, and is arguably one of the strongest open data communities in the world. Together, the CKAN community has a wealth of knowledge and expertise that other people using the CKAN software can draw on. The Open Knowledge Foundation draws on and contributes to this rich resource to help drive CKAN product development.

See more at: http://ckan.org/developers/about-ckan/ and http://ckan.org/

Scooped by luiy

Project #BigData. Expanding on Project C to look at a different use case | #datascience #opendata
luiy's insight:

Project Big Data is an interactive tool which enables you to visualize and explore the funding patterns of over 600 companies in the Big Data ecosystem! It is based on the work I did for Project C (which you can see and read about here). The list of companies and their classification into categories is based on a dozen published sources and rough text analytics of the Crunchbase database. Crunchbase is a curated, crowdsourced database of over 285k companies.

 

As for the data, there are 645 public & private companies in the data set. From Teradata and IBM to Actuate & Zoomdata. I began by harvesting data from Crunchbase by using their free API w/ Python. As of September, Crunchbase had 1250 funding events for 410 of the companies on my list. I've grouped these companies into 18 categories, allowing you to compare peers as well as trends across categories. Some of the categories are broken down further. For example, the tool allows you to differentiate between cloud-based and on premise solutions or SQL vs. NoSQL databases. I gathered additional data from a variety of sources. For example, LinkedIn was used to find the number of employees.

 

 

OPENACCESS Workbook: Project Big Data v1.0 

https://public.tableausoftware.com/download/workbooks/ProjectBigDatav1_0?format=html

 

Rescooped by luiy from Data is big

Mining of Massive Datasets | #datascience #freebook

Via ukituki
luiy's insight:

Preface and Table of Contents

Chapter 1. Data Mining

Chapter 2. Map-Reduce and the New Software Stack

Chapter 3. Finding Similar Items

Chapter 4. Mining Data Streams

Chapter 5. Link Analysis

Chapter 6. Frequent Itemsets

Chapter 7. Clustering

Chapter 8. Advertising on the Web

Chapter 9. Recommendation Systems

Chapter 10. Mining Social-Network Graphs

Chapter 11. Dimensionality Reduction

Chapter 12. Large-Scale Machine Learning

 

Download Full Book :

http://infolab.stanford.edu/~ullman/mmds/book.pdf

ukituki's curator insight, August 28, 6:22 PM

The book is based on Stanford Computer Science course CS246: Mining Massive Datasets (and CS345A: Data Mining).

Rescooped by luiy from Geo-visualization

Visualizing Publicly Available US Government Data Online | #dataviz #opengov

Via Nicholas Goubert
luiy's insight:

Brightpoint Consulting recently released a small collection of interactive visualizations based on open, publicly available data from the US government. Characterized by a rather organic graphic design style and color palette, each visualization makes a socially and politically relevant dataset easily accessible.

 

Scooped by luiy

Europeana Labs: 30 million metadata records linking to millions of openly licensed media objects | #opendata #datasets
luiy's insight:

Data

 

Our database contains over 30 million metadata records linking to millions of openly licensed media objects - books, photos, art, artefacts, audio clips and more. We'll be featuring some of our very best content here.

 

Europeana Labs combines rights-cleared images, videos, audio and text files with technical expertise, tools, services and business knowledge.

Scooped by luiy

Data Repositories - Mother's Milk for Data Scientists | #datasets #opendata
Mothers are life givers, giving the milk of life. While there are so very few analogies so apropos, data is often considered the Mother's Milk of Corporate Valuation. So, as a data scientist, we sh...
luiy's insight:

Here are a few repositories from KDnuggets that are worth taking a look at:

Scooped by luiy

#OpenData Barometer Data | #opengov #dataviz
luiy's insight:

The Open Data Barometer takes a multidimensional look at the current adoption level of open data policy and practice around the world. Three main categories are considered as part of the barometer:

 

- Readiness - identifies how far a country has in place the political, social and economic foundations for realising the potential benefits of open data. The Barometer covers the readiness of government, entrepreneurs and business, and citizen and civil society.

 

- Implementation - identifies the extent to which government has published a range of key datasets to support innovation, accountability and improved social policy. The Barometer covers 14 datasets split across three clusters to capture datasets commonly used for: securing government accountability; improving social policy; and enabling innovation and economic activity.

 

- Emerging impacts - identifies the extent to which open data has been seen to lead to positive political, social, environmental and economic change. The Barometer looks for political impacts – including transparency & accountability, and improved government efficiency and effectiveness; economic impacts – through supporting start-up entrepreneurs and existing businesses; and social impacts – including environmental impacts, and contributing to greater inclusion for marginalised groups in society.

 

These factors are combined on a radar chart, which represents each country's barometer.

Scooped by luiy

Quantifying Memory: Mapping the #GDELT data in #R (and some Russian protests, too) | #opendata #datascience
luiy's insight:

The Guardian recently published an article linking to a database of 250 million events. Sounds too good to be true, but as I'm writing a PhD on recent Russian memory events, I was excited to try it out. I downloaded the data, generously made available by Kalev Leetaru of the University of Illinois, and got going. It's a large 650 MB zip file (4.6 GB uncompressed!), and this is apparently the abbreviated version. Consequently this early stage of the analysis was dominated by eager anticipation, as the Cambridge University internet did its thing.

 

Meanwhile I headed over to David Masad's writeup on getting started with GDELT in Python.

Scooped by luiy

import.io + Open Refine + Google Fusion Tables = Magic! | #SNA #ddj
luiy's insight:

The University of Ottawa Library holds an employee training week every year, giving colleagues the opportunity to share experiences, skills, and insights with one another. I jumped on this opportunity to showcase import.io as a means of creating datasets from website content. The tutorial I developed demonstrated how to create a dataset from the City of Ottawa’s open data catalogue. It’s a really simple example to get users familiar with the functionality of import.io, an easy way to scrape web content via a simple interface and without having to code. In this post I will also demo how to use Open Refine to clean the data captured by import.io and how to visualize it using Google Fusion Tables.

Scooped by luiy

DiRT Directory : digital research #tools | #openaccess #dh
luiy's insight:

The DiRT Directory is a registry of digital research tools for scholarly use. DiRT makes it easy for digital humanists and others conducting digital research to find and compare resources ranging from content management systems to music OCR, statistical analysis packages to mindmapping software.

QLET's curator insight, August 21, 11:32 AM

Great index of digital research tools for the curious contemporary researcher.

Scooped by luiy

#Open311 : An #OpenModel Provides Transparency, #Participation, and Collaboration.
luiy's insight:

Open311 technologies use the internet to enable these interactions to be asynchronous and many-to-many. This means that several different people can openly exchange information centered around a single public issue. This open model allows people to provide more actionable information for those who need it most and it encourages the public to be engaged with civic issues because they know their voices are being heard. Yet Open311 isn’t just about this more open internet-enabled model for 311 services, it’s also about making sure the technology itself is open so that 311 services and applications are interoperable and can be used everywhere.

Scooped by luiy

@DemocracyOS will become the operating system of a more #open and #participatory #government
luiy's insight:

We are working on a user-friendly, open-source, vote and debate tool, crafted for parliaments, parties and decision-making institutions that will allow citizens to get informed, join the conversation and vote on topics, just how they want their representatives to vote. A tool that will transform the noise we create during protests into a signal that has a clear, direct and strong impact on the political system. Our vision is that DemocracyOS will become the operating system of a more open and participatory government. Live Demo for the City of Buenos Aires.

 

Scooped by luiy

#OpenCorporates : How #complex are corporate structures? | #Opendata #dataviz
luiy's insight:

How complex are international corporate structures?

If you want to understand how complex multinational companies are, consider this:

 

In Hong Kong, there's a company called Goldman Sachs Structured Products (Asia) Limited. It's controlled by another company called Goldman Sachs (Asia) Finance, registered in Mauritius.

 

That's controlled by a company in Hong Kong, which is controlled by a company in New York, which is controlled by a company in Delaware, and that company is controlled by another company in Delaware called GS Holdings (Delaware) L.L.C. 

 

Which itself is a subsidiary of the only Goldman you're likely to have heard of, The Goldman Sachs Group in New York City.

That's only one of hundreds of such chains. All told, Goldman Sachs consists of more than 4000 separate corporate entities all over the world, some of which are around ten layers of control below the New York HQ.
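
The chain described above can be modelled as a simple child-to-parent mapping, where the number of layers of control is the number of hops to the top. This is a minimal sketch: the three intermediate companies are not named in the text, so the placeholder labels below are hypothetical, and the helper name `layers_below_top` is an illustrative choice.

```python
# child -> controlling parent, following the chain described in the text.
# Entries marked "(unnamed)" are placeholders; the text gives only their location.
CONTROLLED_BY = {
    "Goldman Sachs Structured Products (Asia) Limited": "Goldman Sachs (Asia) Finance",
    "Goldman Sachs (Asia) Finance": "(unnamed) Hong Kong company",
    "(unnamed) Hong Kong company": "(unnamed) New York company",
    "(unnamed) New York company": "(unnamed) Delaware company",
    "(unnamed) Delaware company": "GS Holdings (Delaware) L.L.C.",
    "GS Holdings (Delaware) L.L.C.": "The Goldman Sachs Group",
}


def layers_below_top(company, controlled_by):
    """Count control hops from `company` up to the entity with no parent."""
    hops = 0
    seen = {company}
    while company in controlled_by:
        company = controlled_by[company]
        if company in seen:
            # Guard against circular ownership records in messy data.
            raise ValueError("ownership cycle detected")
        seen.add(company)
        hops += 1
    return hops


depth = layers_below_top("Goldman Sachs Structured Products (Asia) Limited", CONTROLLED_BY)
print(depth)  # 6
```

A plain mapping is enough here because each company in this chain has exactly one controller; shared ownership would need a graph with several parents per node.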

 

Of those companies, approximately a third are registered in nations that might be described as tax havens. Indeed, in the world of Goldman Sachs, the Cayman Islands are bigger than South America, and Mauritius is bigger than Africa.

These are maps of the top five banking companies in the US, and are based on publicly available data from the Federal Reserve. Read more about our data on the link at the top left.
