Public Datasets - Open Data -
Scooped by luiy
Scoop.it!

The First Interactive Network and Graph Data #Repository with Interactive Graph Analytics and Visualization | #opendata #SNA

The First Interactive Network Data Repository with Real-time Interactive Visualization and Analytics
luiy's insight:

Network Data Repository. Exploratory Analysis & Visualization.

 

A network and graph data repository containing hundreds of real-world networks and benchmark datasets. This large, comprehensive collection of network graph data is useful both for research and as benchmark data for machine learning and network science. All data sets can be downloaded in a standard, consistent format. We have also built a multi-level interactive graph analytics engine for visualizing the structure of networks, along with many global graph statistics and local node-level properties.
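
As a rough illustration of the difference between a global graph statistic and a local node-level property, here is a minimal Python sketch. It assumes an undirected graph given as a plain edge list; the function name `graph_stats` and the toy edges are hypothetical and are not taken from the repository or its analytics engine.

```python
from collections import defaultdict


def graph_stats(edges):
    """Return (density, degrees) for an undirected simple graph.

    density is a global statistic; degrees maps each node to a
    local, node-level property (its number of distinct neighbors).
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops in this sketch
        neighbors[u].add(v)
        neighbors[v].add(u)

    n = len(neighbors)
    # Each undirected edge is counted once from each endpoint.
    m = sum(len(nb) for nb in neighbors.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    degrees = {node: len(nb) for node, nb in neighbors.items()}
    return density, degrees


# Hypothetical toy network, not a dataset from the repository.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
density, degrees = graph_stats(toy_edges)
```

Density summarizes the whole network in one number, while the degree dictionary holds one value per node.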

Rescooped by luiy from Geo-visualization

Visualizing Publicly Available US Government Data Online | #dataviz #opengov


Via Nicholas Goubert
luiy's insight:

Brightpoint Consulting recently released a small collection of interactive visualizations based on open, publicly available data from the US government. Characterized by a rather organic graphic design style and color palette, each visualization makes a socially and politically relevant dataset easily accessible.

 

Scooped by luiy

Data Repositories - Mother's Milk for Data Scientists | #datasets #opendata

Mothers are life givers, giving the milk of life. While there are so very few analogies so apropos, data is often considered the Mother's Milk of Corporate Valuation. So, as a data scientist, we sh...
luiy's insight:

Here are a few repositories from KDnuggets that are worth taking a look at:

Rescooped by luiy from Data Visualization & Open data

10 Great Places to Find #Datasets for Infographics | #opendata


“Creating an infographic is an excellent way to break down complex information and statistics into an easy-to-follow visual that is designed with your target audience in mind. Infographics have grown in popularity because they are easy to share and a simple tactic for promoting a business. Perhaps you’ve considered incorporating infographics into your content marketing…”


Via massimo facchinetti, Jesse Soininen
luiy's insight:

1. Government Websites
2. Google’s Public Data Directory 
3. Reddit.com 
4. Infochimps
5. UNData 
6. Visual.ly
7. DataMarket
8. Number Of
9. Gallup Polls
10. Get the Data

Scooped by luiy

#OpenCorporates : How #complex are corporate structures? | #Opendata #dataviz

luiy's insight:

How complex are international corporate structures?

If you want to understand how complex multinational companies are, consider this:

 

In Hong Kong, there's a company called Goldman Sachs Structured Products (Asia) Limited. It's controlled by another company called Goldman Sachs (Asia) Finance, registered in Mauritius.

 

That's controlled by a company in Hong Kong, which is controlled by a company in New York, which is controlled by a company in Delaware, and that company is controlled by another company in Delaware called GS Holdings (Delaware) L.L.C. 

 

Which itself is a subsidiary of the only Goldman you're likely to have heard of, The Goldman Sachs Group in New York City.

That's only one of hundreds of such chains. All told, Goldman Sachs consists of more than 4000 separate corporate entities all over the world, some of which are around ten layers of control below the New York HQ.
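
The chain described above can be sketched as a simple child-to-parent mapping. This is only an illustration in Python: the entries in angle brackets are placeholders for companies the text does not name, and the function `control_chain` is a hypothetical helper, not part of any OpenCorporates tooling.

```python
# Child -> controlling parent, following the chain described in the text.
# Entries in <angle brackets> are placeholders for unnamed companies.
controlled_by = {
    "Goldman Sachs Structured Products (Asia) Limited [Hong Kong]":
        "Goldman Sachs (Asia) Finance [Mauritius]",
    "Goldman Sachs (Asia) Finance [Mauritius]": "<unnamed Hong Kong company>",
    "<unnamed Hong Kong company>": "<unnamed New York company>",
    "<unnamed New York company>": "<unnamed Delaware company>",
    "<unnamed Delaware company>": "GS Holdings (Delaware) L.L.C.",
    "GS Holdings (Delaware) L.L.C.": "The Goldman Sachs Group [New York]",
}


def control_chain(entity, controlled_by):
    """Walk from an entity up to its ultimate controlling parent."""
    chain = [entity]
    seen = {entity}
    while chain[-1] in controlled_by:
        parent = controlled_by[chain[-1]]
        if parent in seen:
            raise ValueError("ownership cycle detected")
        chain.append(parent)
        seen.add(parent)
    return chain


chain = control_chain(
    "Goldman Sachs Structured Products (Asia) Limited [Hong Kong]",
    controlled_by,
)
layers_below_top = len(chain) - 1  # number of control links to the top
```

Counting the links in each chain is one way to arrive at a figure such as "layers of control below the New York HQ".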

 

Of those companies, approximately a third are registered in nations that might be described as tax havens. Indeed, in the world of Goldman Sachs, the Cayman Islands are bigger than South America, and Mauritius is bigger than Africa.

These are maps of the top five banking companies in the US, and are based on publicly available data from the Federal Reserve. Read more about our data on the link at the top left.

Scooped by luiy

Where can I find large #datasets open to the public? | #opendata

luiy's insight:

Single datasets and data repositories

http://archive.ics.uci.edu/ml/
http://crawdad.org/
http://data.austintexas.gov
http://data.cityofchicago.org
http://data.govloop.com
http://data.gov.uk/
http://data.gov.in
http://data.medicare.gov
http://data.seattle.gov
http://data.sfgov.org
http://data.sunlightlabs.com
https://datamarket.azure.com/
http://developer.yahoo.com/geo/g...
http://econ.worldbank.org/datasets
http://en.wikipedia.org/wiki/Wik...
http://factfinder.census.gov/ser...
http://ftp.ncbi.nih.gov/
http://gettingpastgo.socrata.com
http://googleresearch.blogspot.c...
http://books.google.com/ngrams/
http://medihal.archives-ouvertes.fr
http://public.resource.org/
http://rechercheisidore.fr
http://snap.stanford.edu/data/in...
http://timetric.com/public-data/
https://wist.echo.nasa.gov/~wist...
http://www2.jpl.nasa.gov/srtm
http://www.archives.gov/research...
http://www.bls.gov/
http://www.crunchbase.com/
http://www.dartmouthatlas.org/
http://www.data.gov/
http://www.datakc.org
http://dbpedia.org
http://www.delicious.com/jbaldwi...
http://www.faa.gov/data_research/
http://www.factual.com/
http://research.stlouisfed.org/f...
http://www.freebase.com/
http://www.google.com/publicdata...
http://www.guardian.co.uk/news/d...
http://www.infochimps.com
http://www.kaggle.com/
http://build.kiva.org/
http://www.nationalarchives.gov....
http://www.nyc.gov/html/datamine...
http://www.ordnancesurvey.co.uk/...
http://www.philwhln.com/how-to-g...
http://www.imdb.com/interfaces
http://imat-relpred.yandex.ru/en...
http://www.dados.gov.pt/pt/catal...
http://knoema.com
http://daten.berlin.de/
http://www.qunb.com
http://databib.org/
http://datacite.org/
http://data.reegle.info/
http://data.wien.gv.at/
http://data.gov.bc.ca
https://pslcdatashop.web.cmu.edu/ (interaction data in learning environments)
http://www.icpsr.umich.edu/icpsrweb/CPES/ (Collaborative Psychiatric Epidemiology Surveys: a collection of three national surveys focused on each of the major ethnic groups to study psychiatric illnesses and health services use)
http://www.dati.gov.it
http://dati.trentino.it

Yves Mulkers's curator insight, June 3, 2014 1:51 PM

Government datasets open to the public should get you started building your innovative solutions.

Scooped by luiy

The Big Data #open source #tools | #bigdata

The Big Data open source tools landscape is growing rapidly. Check it out here.
luiy's insight:

There are already so many open source tools related to Big Data. Check out the figure below for the most important ones. In the near future we will describe each tool in more detail. For now, you can click the logos to open the respective websites.

Scooped by luiy

Applications and datasets from the open government data | #OpenData #OpenGob #dataviz

luiy's insight:

As part of the “Open Government Data Switzerland” project, the Swiss Federal Archives and their project partners are operating a central pilot portal providing access to open data from the Swiss authorities (“open government data” or OGD). The pilot portal was launched on 16 September 2013 and is expected to remain online until the end of 2014. The authorities involved in the project are supplying some of their already accessible data for use in the pilot portal. These include a wide variety of data records, such as Swiss municipal boundaries, population statistics, up-to-date camera images of weather in Switzerland, historical documents and a directory of Swiss literature. 

Scooped by luiy

#Privacy Tools: Opting Out from #DataBrokers | #datasets

Data brokers don't make it easy to see the data they hold about you. Here's what you can do to opt-out.
luiy's insight:

Data brokers have been around forever, selling mailing lists to companies that send junk mail. But in today’s data-saturated economy, data brokers know more information than ever about us, with sometimes disturbing results.


The first spreadsheet below is a list of data brokers who will give you copies of your data. (You can scroll around inside the box below, and you can also download your own copy of the spreadsheet, here.) The second is the list of data brokers from whom I sought to opt-out, with the ones that allowed opt-outs highlighted. (Download that one here.)

Rescooped by luiy from Open Data Sets

Academic Torrents | #opendata #openscience #datasets


Welcome to Academic Torrents! Currently making 207.87GB of research data available.

Sharing data is hard. Emails have size limits, and setting up servers is too much work. We've designed a distributed system for sharing enormous datasets - for researchers, by researchers. The result is a scalable, secure, and fault-tolerant repository for data, with blazing fast download speeds.


Via Claudia Mihai
luiy's insight:
We are a distributed data repository

The academic torrents network is built for researchers, by researchers. Its distributed peer-to-peer library system automatically replicates your datasets on many servers, so you don't have to worry about managing your own servers or file availability. Everyone who has data becomes a mirror for those data so the system is fault-tolerant.

What does it mean for you?

The academic torrents system offers blazing fast download speeds and a site for searching available datasets from various sources. For sharing, distributing datasets on the network means no more setting up file servers, less bandwidth usage, and maximum uptime.

Scooped by luiy

All the Open #Datasets from New York City Visualized in a Single View | #opendata #dataviz

luiy's insight:

"Visualizing NYC's Open Data" [chriswhong.com] by self-proclaimed urbanist, map maker and data junkie Chris Whong provides a single view of the more than 1,100 open datasets made available by New York City.

 

The visualization of the "dataset of datasets" consists of a force-directed graph whose nodes are colored according to whether the corresponding dataset is a table, a chart, a map, a file or a user-created view (colored blue).

 

The graph acts as an alternative portal to explore the available data, while demonstrating its scale and diversity.

 

Scooped by luiy

Europeana Labs: 30 million metadata records linking to millions of openly licensed media objects | #opendata #datasets

luiy's insight:

Data

 

Our database contains over 30 million metadata records linking to millions of openly licensed media objects - books, photos, art, artefacts, audio clips and more. We'll be featuring some of our very best content here.

 

Europeana Labs combines rights-cleared images, videos, audio and text files with technical expertise, tools, services and business knowledge.

Rescooped by luiy from Data is big

Mining of Massive Datasets | #datascience #freebook


Via ukituki
luiy's insight:

Preface and Table of Contents

Chapter 1. Data Mining

Chapter 2. Map-Reduce and the New Software Stack

Chapter 3. Finding Similar Items

Chapter 4. Mining Data Streams

Chapter 5. Link Analysis

Chapter 6. Frequent Itemsets

Chapter 7. Clustering

Chapter 8. Advertising on the Web

Chapter 9. Recommendation Systems

Chapter 10. Mining Social-Network Graphs

Chapter 11. Dimensionality Reduction

Chapter 12. Large-Scale Machine Learning

 

Download the full book:

http://infolab.stanford.edu/~ullman/mmds/book.pdf

ukituki's curator insight, August 28, 2014 6:22 PM

The book is based on Stanford Computer Science course CS246: Mining Massive Datasets (and CS345A: Data Mining).

Scooped by luiy

The Open Database Of The #Corporate World | #opendata #economy

Free and Open Company Data on millions of companies and corporations in over 20 countries, including UK, Spain, US, ...
luiy's insight:

What is OpenCorporates?

 

OpenCorporates aims to do a straightforward (though big) thing: have a URL for every company in the world.

 

Is that all?

Well, no. Useful though that would be, we're also gradually importing government data relating to companies, and trying to match it to specific companies.

 

Why do this?

Few parts of the corporate world are limited to a single country, and so the world needs a way of bringing the information together in a single place, and more than that, a place that's accessible to anyone, not just those who subscribe to proprietary datasets. See also the OpenCorporates Principles

 

There are quite a few countries you're missing

We've grown from 3 territories and a few million companies to over 75 jurisdictions and 55 million companies, and are working with the open data community to add more each week.

 

How can we get hold of the data?

We have a new API service, as well as our highly popular Google Refine reconciliation service (see documentation), and this allows access to the information as JSON or XML. If you need data in bulk, either for academic research work, for another cool open data project, or commercially, drop us an email at info@opencorporates.com.

Scooped by luiy

The GDELT Project: realtime network diagram and database of global human society for open research | #opendata

The GDELT Project
luiy's insight:

The GDELT Project is a realtime network diagram and database of global human society for open research.

Watching the Entire World

GDELT monitors the world's news media from nearly every corner of every country in print, broadcast, and web formats, in over 100 languages, every moment of every day.

 


Global Reach

 

GDELT monitors print, broadcast, and web news media in over 100 languages from across every country in the world to keep continually updated on breaking developments anywhere on the planet. Its historical archives stretch back to January 1, 1979 and update daily (soon to be every 15 minutes). Through its ability to leverage the world's collective news media, GDELT moves beyond the focus of the Western media towards a far more global perspective on what's happening and how the world is feeling about it.

 

 

Querying, Analyzing and Downloading

 

The entire GDELT database is 100% free and open, and you can download the raw datafiles, visualize it using the GDELT Analysis Service, or analyze it at limitless scale with Google BigQuery.

 

Rescooped by luiy from Social Network Analysis #sna

#Bitcoin Transaction Network #Dataset | #datamodel



Via ukituki
luiy's insight:

1.  Overview:

 

Bitcoin (bitcoin.org) is a digital, cryptographically secure currency. Transactions between public-key "addresses" maintained in a distributed, verified public ledger form a transaction network that can be studied by network scientists. This code processes binary-format Bitcoin .dat files generated by the Bitcoin client (bitcoin.org, tested on v0.5.3.1 or lower) into human-readable flat-file formats, retaining all available information. Furthermore, we provide a data model to facilitate storage and querying in a relational database. 

 

 

 

2.  Bitcoin transaction overview:

 

The bitcoin digital currency allows users to securely prove ownership of a portion of coins that cascade through the network as a chain of re-assigned ownership transactions over time. A transaction on the bitcoin network is a many-to-many function, executed by a user who has ownership of (potentially many) outputs of previous transactions; the user takes this owned value and writes ownership to (potentially many) output nodes (users, represented by addresses in the network).
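
The many-to-many structure described above can be sketched in a few lines of Python. This is a simplified illustration, not the dataset's actual data model or file format: the class names, field names and the `address_edges` helper are all hypothetical, and the transactions are made up.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Output:
    address: str  # receiving address (a node in the transaction network)
    value: int    # amount, in arbitrary units for this sketch


@dataclass(frozen=True)
class Transaction:
    txid: str
    inputs: tuple   # (previous txid, output index) pairs being spent
    outputs: tuple  # Output objects created by this transaction


def address_edges(transactions):
    """Derive sender -> receiver address pairs from spent outputs."""
    by_id = {tx.txid: tx for tx in transactions}
    edges = set()
    for tx in transactions:
        # Senders are the addresses that owned the outputs being spent.
        senders = {
            by_id[prev].outputs[index].address
            for prev, index in tx.inputs
            if prev in by_id
        }
        receivers = {out.address for out in tx.outputs}
        edges.update(product(senders, receivers))
    return edges


# Made-up example: t2 spends outputs owned by A and B, paying C and D.
t0 = Transaction("t0", (), (Output("A", 50),))
t1 = Transaction("t1", (), (Output("B", 30),))
t2 = Transaction("t2", (("t0", 0), ("t1", 0)), (Output("C", 60), Output("D", 20)))
edges = address_edges([t0, t1, t2])
```

Here a single transaction with two inputs and two outputs yields four address-to-address edges, which is what makes the resulting network many-to-many.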

Scooped by luiy

Serendipity is a faceted search engine based on #Semantic Web Technologies | #OpenCourse #dataviz
luiy's insight:

Serendipity is a faceted search engine based on Semantic Web technologies. An important feature, Serendipity POIs (Points of Interest), lets users visualize OCW repositories from a dataset based on Linked Data technologies.


Serendipity is sponsored by the research group GICAC from the Universidad Politécnica de Madrid (GICAC-UPM) and the Universidad Técnica Particular de Loja (UTPL) in collaboration with the OCW Institutions. This project aims to improve the searchability and discoverability of open educational content, which will enhance the ability for learners and educators to find and use OCW courses.

Scooped by luiy

Launch of an API for election results - #opendata - Montpellier #opengob | #datasets
luiy's insight:

An API (application programming interface) is now available for election results since 1994 (from the 1994 regional elections through the 2012 presidential election).

The data are extracted once from the open data file http://opendata.montpelliernumerique.fr/datastore/VilleMTP_MTP_Elections_2012.csv and loaded into a MongoDB database.

The code is available in the open repository on GitHub (Node.js server): https://github.com/Bype/apimtnlab

The 57 elections held since 1994 are listed in a JSON file at this URL, each election having its own number: http://api.mtnlab.org/elections/scrutins/election.json

A JSON file can then be generated per election number (the numbers are available in the JSON file of the 57 elections), for example http://api.mtnlab.org/elections/resultat/232 for election number 232.

The results are also available per election number with the geometry (mapping) of the polling stations, for example http://api.mtnlab.org/elections/resultat/232?geometry=1 for election number 232.

The polling station geometry comes from the polling station open data available at this URL: http://opendata.montpelliernumerique.fr/Bureaux-de-vote
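
The URL patterns above can be wrapped in a small Python client, sketched here using only the standard library. The helper names are hypothetical, the structure of the returned JSON is not described in the post and is therefore not assumed, and the service may no longer be online.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the examples in the post.
BASE = "http://api.mtnlab.org/elections"


def elections_index_url():
    """URL of the JSON file listing all elections and their numbers."""
    return f"{BASE}/scrutins/election.json"


def result_url(election_id, with_geometry=False):
    """URL of the results for one election number.

    with_geometry=True adds the geometry=1 query parameter shown in
    the post, which requests the polling station geometry as well.
    """
    url = f"{BASE}/resultat/{int(election_id)}"
    if with_geometry:
        url += "?" + urlencode({"geometry": 1})
    return url


def fetch_json(url, timeout=10):
    """Download a URL and parse the body as JSON (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(result_url(232, with_geometry=True))` would request the results for election number 232 together with the polling station geometry, provided the service is reachable.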

Scooped by luiy

Mining of Massive Datasets | #bigdata #datascience #freebook

Jay Ratcliff's curator insight, February 20, 2014 11:24 AM

I like this book. I had a class at Coursera where we used this text. One of the things it helped me with was the mechanics of clustering and the different ways to measure distance between objects in a Euclidean or non-Euclidean space. Plus there is a lot of material on Map-Reduce as well.
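
To make the Euclidean versus non-Euclidean distinction concrete, here is a minimal Python sketch of one measure of each kind: Euclidean distance between points and Jaccard distance between sets. The function names and sample values are illustrative only and are not taken from the book.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def jaccard_distance(s, t):
    """1 minus the ratio of shared elements to all elements of two sets."""
    s, t = set(s), set(t)
    union = s | t
    if not union:
        return 0.0  # treat two empty sets as identical
    return 1.0 - len(s & t) / len(union)


d_points = euclidean_distance((0, 0), (3, 4))
d_sets = jaccard_distance({1, 2, 3}, {2, 3, 4})
```

Euclidean distance needs numeric coordinates, whereas Jaccard distance only needs set membership, so it can be used where no meaningful coordinates exist.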

Rescooped by luiy from Learning English Language

#datasets | Datamob: Public data put to good use I #opendata

Datamob: Public data put to good use.

Via Mariana Soffer
luiy's insight:

Datamob currently lists 227 data sources, 165 apps and 66 resources, which are categorized by 67 tags. It's sort of like this:

Mariana Soffer's curator insight, December 11, 2013 6:52 AM

Datamob aims to show, in a very simple way, how public data sources can be used.

We believe good things happen when governments and public institutions make data available in developer-friendly formats. Things that can help save us from bad government and bad decisions.

We're out to find the good things, and get developers excited about the data.

You can help. Contribute high-quality public data sources and apps. Post a link to the data behind an app. Build an app on top of a data source and post a comment about it.

 
