Public Datasets - Open Data
Rescooped by luiy from Augmented Collective Intelligence

Hip-hip-Hadoop: Data mining for science


The model of distributed computation, in which a problem is broken into distinct parts that can be solved individually on separate machines and then recombined, has been around for decades. But when Google developed the MapReduce algorithm, it added a distinct wrinkle to this method of distributed computing and opened new doors for commercial and scientific endeavors.

Read more at: http://phys.org/news/2013-05-hip-hip-hadoop-science.html#jCp


Via Howard Rheingold
luiy's insight:


Apache Hadoop is an open-source software framework that evolved from Google's MapReduce algorithm. Many Internet giants—Facebook, Yahoo, eBay, Twitter—rely on Hadoop to crunch data across thousands of computer servers in order to quickly identify and serve customized data to consumers.
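
The map/shuffle/reduce pattern that Hadoop implements at cluster scale can be shown in miniature. A minimal single-process Python sketch of the pattern itself (not Hadoop's API), counting words across documents:

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit an intermediate (word, 1) pair for every word.
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group intermediate values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: fold each key's values into a final result.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["hadoop scales out", "science needs hadoop"]
print(reduce_phase(shuffle(map_phase(docs))))
# {'hadoop': 2, 'scales': 1, 'out': 1, 'science': 1, 'needs': 1}
```

In a real Hadoop job the map and reduce functions run in parallel on many machines, and the shuffle moves data across the network between them.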


--------------------------------------------------------

 

Training Data Scientists

Deploying a new cluster with important but largely untested technology is a great first step for scientists. But you also have to identify and build a community to take advantage of these emerging tools. TACC has been a leader in education and outreach to the public, offering training, tutorials, and university-level instruction on Hadoop as it relates to high-performance parallel computing.

In Fall 2011 and 2012, Xu introduced Hadoop to students in the Visualization and Data Analysis course he co-teaches in the Division of Statistics and Scientific Computing at the university. In addition, Baldridge and Lease jointly designed a new course, "Data-Intensive Computing for Text Analysis," which was offered in Fall 2011, that involved significant use of TACC's Hadoop resources. Interestingly, the course attracted a multi-disciplinary group with 16 computer science students, four iSchool students, three linguistics students, and two electrical and computer engineering students.

At the end of May 2013, Xu will chair a workshop on Benchmarks, Performance Optimization, and Emerging Hardware of Big Data Systems and Applications, held in conjunction with the 2013 IEEE International Conference on Big Data.

Which of the host of new heterogeneous hardware and software technologies available for high-performance clusters are best suited for data-intensive applications? And how can HPC systems be optimally designed to solve big data problems? These are the questions that TACC's Hadoop R&D seeks to answer.



Read more at: http://phys.org/news/2013-05-hip-hip-hadoop-science.html#jCp

Howard Rheingold's curator insight, May 28, 2013 10:41 PM

Distributed computation and big data meets collective intelligence. Expect this hybrid to develop.

Scooped by luiy

Visualizing the TEDx idea network | #SNA

TED Fellows Eric Berlow and Sean Gourley of Quid collaborated with TEDx to visualize the explosion of ideas from the TEDx network through the ideas and theme...
Rescooped by luiy from The Programmable City

If My Data Is an Open Book, Why Can’t I Read It? New York Times | #bigdata #opendata


Despite all the hoopla about an “open data” society, many consumers are being kept in the dark ... Our mobile carriers know our locations: where our phones travel during working hours and leisure time, where they reside overnight when we sleep. Verizon Wireless even sells demographic profiles of customer groups — including ZIP codes for where they “live, work, shop and more” — to marketers. But when I called my wireless providers, Verizon and T-Mobile, last week in search of data on my comings and goings, call-center agents told me that their companies didn’t share customers’ own location logs with them without a subpoena.


Via Rob Kitchin
luiy's insight:

“Stock data, bank data, and bond data are all more valuable when they are looked at together,” says Mr. Searls, the author of “The Intention Economy: When Customers Take Charge.” “If I have a choice between apps and one of them shares the data that I can use more easily, I am going to choose that one.”

 

Intel, for instance, recently introduced a “data economy” project, intended to encourage companies to think of consumers as participants in the information economy, and not just as data-harvesting opportunities. The venture includes a site called WeTheData.com, which looks at current obstacles to information sharing.

Scooped by luiy

Wikipedia Recent Changes Map. #datavis #opendata

A map of recent contributions to Wikipedia from unregistered users.
luiy's insight:

When an unregistered user edits Wikipedia, he or she is identified by IP address. These IP addresses are translated to users' approximate geographic locations. Unregistered users make only a fraction of total edits -- only 15% of the contributions to English Wikipedia are from unregistered users. Edits by registered users do not have associated IP information, so the map actually represents only a small portion of the total edit activity on Wikipedia.

 

Built using d3, DataMaps, freegeoip.net, and the Wikimedia RecentChanges IRC feed, broadcast through wikimon. Source available on GitHub.
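
The original map consumed the RecentChanges IRC feed through wikimon; as a rough sketch of the same pipeline, the snippet below reads Wikimedia's present-day EventStreams endpoint and applies the same heuristic the map depends on (an editor name that parses as an IP address is an unregistered user). The geolocation step is left as a comment:

```python
import ipaddress
import json

import requests  # third-party HTTP client

STREAM = "https://stream.wikimedia.org/v2/stream/recentchange"

def is_anonymous(user: str) -> bool:
    # Unregistered editors are identified by their IP address.
    try:
        ipaddress.ip_address(user)
        return True
    except ValueError:
        return False

with requests.get(STREAM, stream=True) as resp:
    for line in resp.iter_lines():
        if not line.startswith(b"data: "):
            continue  # skip SSE framing (event/id/comment lines)
        event = json.loads(line[len(b"data: "):])
        if event.get("wiki") == "enwiki" and is_anonymous(event.get("user", "")):
            # A GeoIP lookup here would turn the IP into approximate
            # coordinates for plotting, as the map does.
            print(event["user"], event.get("title"))
```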

 

Built by Stephen LaPorte and Mahmoud Hashemi.

Scooped by luiy

LinkedUp Catalogue - Linked Education Cloud. #opendata

luiy's insight:

The Linked Education Cloud is a repository/catalogue of Web datasets relevant to educational applications. It is published according to Web of Data (Linked Data) standards and is built from input contributed by the LinkedUp community.

Scooped by luiy

New algorithm maps cancer cells like nodes on a social network #SNA #health

A group of researchers from Columbia and Stanford have created a method for turning complex cellular datasets into visualizations that map the similarities between tens of thousands of cells within a tissue sample.
luiy's insight:

The idea of representing large or complex data as a graph is nothing new, but it has taken on more prominence thanks to the rise of social media and those ubiquitous social graphs that map out who’s connected to whom. As we highlighted recently, however, graph analysis is becoming more popular outside the realm of social networks, and is being applied to problems that are more complex than just figuring out simple relationships within a network. In cases such as medical research, especially, graphs can provide a very effective way of seeing how potentially hundreds of thousands of data points spanning perhaps hundreds of variables are similar to each other.

 

That’s exactly what the team at Columbia and Stanford has done with a new algorithm that they’ve demonstrated within the realm of mass cytometry. According to a press release announcing the research (which is available via paid download at Nature Biotechnology):

“The method, called viSNE (visual interactive Stochastic Neighbor Embedding), is based on a sophisticated algorithm that translates high-dimensional data (e.g., a dataset that includes many different simultaneous measurements from single cells) into visual representations similar to two-dimensional ‘scatter plots’ ….

“The viSNE software can analyze measurements of dozens of molecular markers. In the two-dimensional maps that result, the distance between points represents the degree of similarity between single cells. The maps can reveal clearly defined groups of cells with distinct behaviors (e.g., drug resistance) even if they are only a tiny fraction of the total population. This should enable the design of ways to physically isolate and study these cell subpopulations in the laboratory.”

 

I assume they say similar to scatter plots because the algorithm is analyzing data across more than two dimensions, although the resulting chart is essentially the same (i.e., data points with similar characteristics will form clusters).
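
viSNE itself sits behind the paywall, but the "Stochastic Neighbor Embedding" in its name refers to t-SNE, which is freely available. A minimal sketch of that underlying step with scikit-learn, using synthetic stand-ins for per-cell marker measurements:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Synthetic stand-in for cytometry data: 300 "cells" with 30 marker
# measurements each, including one small, distinct subpopulation.
bulk = rng.normal(0.0, 1.0, size=(250, 30))
rare = rng.normal(3.0, 1.0, size=(50, 30))
cells = np.vstack([bulk, rare])

# Embed the 30-dimensional measurements into 2-D; distance between
# points in the result reflects similarity between cells.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(cells)
print(embedding.shape)  # (300, 2), ready for a scatter-plot-like map
```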

Rescooped by luiy from Digital Literacy for my students

Free-to-download book: Jóvenes en la era de la #hiperconectividad, tendencias, claves, miradas


The outlook of parents and teachers seems largely trapped in a reactive mindset that makes it difficult for them to articulate a strategic approach to harnessing the great potential (not without risks) that this new context offers us from a cognitive, emotional, moral, and civic point of view. We know the change is inexorable; we can take advantage of it or suffer it.


Via A Petapouca, Pierre Levy
Dr. Doris Molero's curator insight, May 20, 2013 5:50 AM

Jóvenes en la era de la hiperconectividad, tendencias, clave... 

Rescooped by luiy from Open Knowledge

rOpenSci - open source tools for open science #openscience


At rOpenSci we are creating packages that allow access to data repositories through the R statistical programming environment that is already a familiar part of the workflow of many scientists. We hope that our tools will not only facilitate drawing data into an environment where it can readily be manipulated, but also one in which those analyses and methods can be easily shared, replicated, and extended by other researchers. While all the pieces for connecting researchers with these data sources exist as disparate entities, our efforts will provide a unified framework that will quickly connect researchers to open data.


Via Irina Radchenko
Scooped by luiy

What participants recommend you read before the conference | Governing Algorithms, #algorithms #PersonalData

luiy's insight:

Bibliography, links, articles, and books about big data, algorithms, personal data, AI, and more.

Rescooped by luiy from wrightmindweb

TIME and Space : Stunning Satellite Images of Earth | TIME.com

Exclusive timelapse: See climate change, deforestation and urban sprawl unfold as Earth evolves over 30 years.

Via Beth Dichter, mtmeme
luiy's insight:
TIME and Space | By Jeffrey Kluger

Spacecraft and telescopes are not built by people interested in what’s going on at home. Rockets fly in one direction: up. Telescopes point in one direction: out. Of all the cosmic bodies studied in the long history of astronomy and space travel, the one that got the least attention was the one that ought to matter most to us—Earth.

That changed when NASA created the Landsat program, a series of satellites that would perpetually orbit our planet, looking not out but down. Surveillance spacecraft had done that before, of course, but they paid attention only to military or tactical sites. Landsat was a notable exception, built not for spycraft but for public monitoring of how the human species was altering the surface of the planet. Two generations, eight satellites and millions of pictures later, the space agency, along with the U.S. Geological Survey (USGS), has accumulated a stunning catalog of images that, when riffled through and stitched together, create a high-definition slide show of our rapidly changing Earth. TIME is proud to host the public unveiling of these images from orbit, which for the first time date all the way back to 1984.

Over here is Dubai, growing from sparse desert metropolis to modern, sprawling megalopolis. Over there are the central-pivot irrigation systems turning the sands of Saudi Arabia into an agricultural breadbasket — a surreal green-on-brown polka-dot pattern in the desert. Elsewhere is the bad news: the high-speed retreat of Mendenhall Glacier in Alaska; the West Virginia Mountains decapitated by the mining industry; the denuded forests of the Amazon, cut to stubble by loggers.

Beth Dichter's curator insight, May 12, 2013 10:57 PM

How did this come to be? The Landsat program. “Two generations, eight satellites and millions of pictures later, the space agency, along with the U.S. Geological Survey (USGS), has accumulated a stunning catalog of images that, when riffled through and stitched together, create a high-definition slide show of our rapidly changing Earth. TIME is proud to host the public unveiling of these images from orbit, which for the first time date all the way back to 1984.”

Google has taken these “choppy images” and upgraded them into stunning videos with incredible details (more information on this is at the website). TIME has also created a story that utilizes the videos and text to help understand the story they tell.
* Chapter 1 – Satellite Story
* Chapter 2 – Extreme Resources
* Chapter 3 – Climate Change
* Chapter 4 – Urban Explosion
It is said that a picture is worth a thousand words, and these moving images tell a story that is often hard to understand. If we are interested in learning more about how we have impacted our planet this is a great resource.

Tracy Shaw's curator insight, May 13, 2013 12:07 PM

Incredible images showing not only deforestation, but also increasing urban sprawl and vanishing glaciers.

Darren Smith's curator insight, May 13, 2013 6:38 PM

Wow!

Scooped by luiy

MIMIC II Health Databases #bigdata #clinicalData

luiy's insight:

The MIMIC II (Multiparameter Intelligent Monitoring in Intensive Care) Databases contain physiologic signals and vital signs time series captured from patient monitors, and comprehensive clinical data obtained from hospital medical information systems, for tens of thousands of Intensive Care Unit (ICU) patients. Data were collected between 2001 and 2008 from a variety of ICUs (medical, surgical, coronary care, and neonatal) in a single tertiary teaching hospital. The MIMIC II Clinical Database contains clinical data from bedside workstations as well as hospital archives. The MIMIC II Waveform Database includes records of continuous high-resolution physiologic waveforms and minute-by-minute numeric time series (trends) of physiologic measurements. Many, but not all, of the Waveform Database records are matched to corresponding Clinical Database records (for more information, see Record Matching). The databases are thoroughly de-identified (all PHI has been removed and all dates have been changed).

Both databases are distributed freely via PhysioNet. There are no restrictions on access to the MIMIC II Waveform Database. Access to the MIMIC II Clinical Database is available to qualified researchers who obtain human subjects training and sign a simple data use agreement (see Getting Access).
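
For the Waveform Database, PhysioNet's wfdb Python package can fetch records directly. A sketch under the assumption that the package is installed; the record name and directory below are placeholders, not a specific MIMIC II record:

```python
import wfdb  # PhysioNet waveform tools (pip install wfdb)

# Placeholder record path: substitute a real record name and directory
# from the MIMIC II record listings on PhysioNet.
record = wfdb.rdrecord("3000003_0001", pn_dir="mimic2wdb/30/3000003")
print(record.sig_name)        # channel names (e.g., ECG leads, ABP)
print(record.fs)              # sampling frequency in Hz
print(record.p_signal.shape)  # (samples, channels) NumPy array
```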

Rescooped by luiy from Open Knowledge

Global Roads Open Access Data Set (gROADS), v1: Global Roads | SEDAC


The Global Roads Open Access Data Set, Version 1 (gROADSv1) was developed under the auspices of the CODATA Global Roads Data Development Task Group. The data set combines the best available roads data by country into a global roads coverage, using the UN Spatial Data Infrastructure Transport (UNSDI-T) version 2 as a common data model. All country road networks have been joined topologically at the borders, and many countries have been edited for internal topology. Source data for each country are provided in the documentation, and users are encouraged to refer to the readme file for use constraints that apply to a small number of countries. Because the data are compiled from multiple sources, the date range for road network representations ranges from the 1980s to 2010 depending on the country (most countries have no confirmed date), and spatial accuracy varies. The baseline global data set was compiled by the Information Technology Outreach Services (ITOS) of the University of Georgia. Updated data for 27 countries and 6 smaller geographic entities were assembled by Columbia University's Center for International Earth Science Information Network (CIESIN), with a focus largely on developing countries with the poorest data coverage.


Via Irina Radchenko
Rescooped by luiy from Excellent Business Blogs

Big data and the democratisation of decisions


Via Kenneth Mikkelsen
Kenneth Mikkelsen's curator insight, April 10, 2013 8:14 PM
Great Report from The Economist Intelligence Unit.
Rescooped by luiy from Big Data, Cloud and Social everything

Data Intelligence and Analytics Resources

This page contains links to various resources available throughout our network, for analytics practitioners. It is an attempt to add structure to our content.…

Via Pierre Levy
luiy's insight:

1. General Resources

- Data Science Apprenticeship
- Data Science eBook
- Data Science Links (this page) | Share this page on Twitter
- RSS feeds
- Big Data on Google+
- @analyticbridge
- AnalyticTalent
- Most popular blog posts on AnalyticBridge
- Most popular blog posts on DataScienceCentral
- Jobs | Books | Training | Competitions | Code Snippets | Conferences
Pierre Levy's curator insight, May 28, 2013 10:27 PM

Rich collection of resources

Scooped by luiy

Science as an open enterprise Final report | #openscience #opendata

luiy's insight:

The Science as an open enterprise report highlights the need to grapple with the huge deluge of data created by modern technologies in order to preserve the principle of openness and to exploit data in ways that have the potential to create a second open science revolution.

Exploring massive amounts of data using modern digital technologies has enormous potential for science and its application in public policy and business. The report maps out the changes that are required by scientists, their institutions and those that fund and support science if this potential is to be realised.

Areas for action

Six key areas for action are highlighted in the report:

- Scientists need to be more open among themselves and with the public and media
- Greater recognition needs to be given to the value of data gathering, analysis and communication
- Common standards for sharing information are required to make it widely usable
- Publishing data in a reusable form to support findings must be mandatory
- More experts in managing and supporting the use of digital data are required
- New software tools need to be developed to analyse the growing amount of data being gathered

 

http://royalsociety.org/policy/projects/science-public-enterprise/report/

Scooped by luiy

Socrata Releases “Open Source Data Server, Community Edition” | #opendata

Open data platform provider Socrata releases an open source option: “Socrata Open Data Server, Community Edition.”
luiy's insight:

Our Goals for Community Edition

We’re offering an open source product for a number of reasons, all related to accelerating and broadening the growth of open data. We want to:

- Promote data portability throughout the open data ecosystems.

- Support open source software policies in public organizations around the globe.

- Encourage the development of software on top of open data.

In the words of Kevin Merritt, our CEO, “Socrata is investing in an open source product because it will help us accelerate mainstream adoption of the open data cloud model as the de facto enterprise data architecture. We envision a future where the 99 percent of data still locked up in legacy proprietary systems will be open and accessible to the masses.”

 

What “Community Edition” Will Offer

The “Socrata Open Data Server, Community Edition” supports the ongoing development of open data standards in three key areas, all required for a thriving ecosystem:

 

Data Catalog Interoperability – Enables universal federation of different open data catalogs using a standard catalog schema, based on the W3C Data Catalog Vocabulary (DCAT).

 

Data Portability Based on Standard Data Formats – Standardizes outputs including JSON, XML, and CSV, as well as RDF and other Linked Data standards. The goal is to evolve towards standard schemas that developers can use for popular data sets, based on real-world examples and collaboration between data publishers.

 

Application Portability Based on Open Data API Standards – Standardizes the Application Programming Interfaces (APIs) used to programmatically access open data, using established paradigms and protocols such as REST, HTTP, and Structured Query Language (SQL).
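
To make the first of these concrete: a DCAT catalog entry is just structured metadata about a dataset. A hedged sketch that builds one as a Python dict and serializes it to JSON-LD; the vocabulary terms come from the W3C DCAT and Dublin Core vocabularies, while the dataset itself is invented:

```python
import json

record = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dct:title": "City Budget 2013",  # invented example dataset
    "dct:description": "Line-item municipal budget, updated quarterly.",
    "dcat:keyword": ["budget", "finance", "open data"],
    "dcat:distribution": [{
        "@type": "dcat:Distribution",
        "dcat:downloadURL": "https://data.example.gov/budget-2013.csv",
        "dct:format": "text/csv",  # one of the standard output formats
    }],
}
print(json.dumps(record, indent=2))
```

Two catalogs that both describe their holdings this way can federate by exchanging such records, which is the interoperability a standard catalog schema is after.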

Scooped by luiy

LinkedUp: Linking Web Data for Education

An EU project about the potential of open data in education
luiy's insight:
About LinkedUp

LinkedUp aims to push forward the exploitation of the vast amounts of public, open data available on the Web, in particular by educational institutions and organizations.

 
Rescooped by luiy from Linked Data and Semantic Web

Veni Competition is inviting submissions | LinkedUp: Linking Web Data for Education


The LinkedUp Veni Competition has now opened, and is inviting submissions.

 

The first of three consecutive competitions, the Veni Competition is calling for you to submit an innovative and robust prototype or demo that uses linked and/or open data for educational purposes.


Via Irina Radchenko
luiy's insight:
Veni Competition is inviting submissions (May 22, 2013, filed under Calls, Challenge, Dissemination)

The idea is to mash up the vast amount of material and data that can be used in education. You might design a tool for combining datasets and visualising them in interesting new ways, or perhaps develop a search engine that combines search results from different libraries.

If you are looking for further inspiration, please see our use cases, curated from high profile organisations such as the Commonwealth of Learning.

The total prize fund for ‘Veni’ is 5,000 EUR; however, these prizes are only one reason to participate. It is also a fantastic opportunity to work with the large, documented repository of linked datasets that the LinkedUp team has catalogued and structured, including data about open educational resources, course content and university facilities.

The LinkedUp team will also be providing dedicated support, including technical support with code. A designated developer blog contains information such as ‘cooking recipes’ and ‘how-to-guides’.

Participants will also be able to showcase their ideas and solutions to a wide community of researchers and practitioners, and will have the opportunity to meet interesting people from across the public and private sector. After a review process, the teams behind the best applications will be invited to present their submissions at OKCon in Geneva on 17 September 2013.

More information can be found on the Challenge website.

What are you waiting for? Join the Challenge today!

Deadline for submissions: 27 June 2013

Rescooped by luiy from Big Data, Cloud and Social everything

Intel’s Data Economy Initiative Aims to Help People Capture the Value of Personal Data | #dataawareness #databrokers

The world’s largest chip maker wants to see a new kind of economy bloom around personal data.

Via Pierre Levy
luiy's insight:

Intel Labs, the company’s R&D arm, is launching an initiative around what it calls the “data economy”: how consumers might capture more of the value of their personal information, like digital records of their location or work history. To make this possible, Intel is funding hackathons to urge developers to explore novel uses of personal data. It has also paid for a rebellious-sounding website called We the Data, featuring raised fists and stories comparing Facebook to Exxon Mobil.

 

Intel’s effort to stir a debate around “your data” is just one example of how some companies—and society more broadly—are grappling with a basic economic asymmetry of the big data age: they’ve got the data, and we don’t.

 

Internet firms like Google and Amazon are concentrating valuable data about consumers at an unprecedented scale as people click around the Web. But regulations and social standards haven’t kept up with the technical and economic shift, creating a widening gap between data haves and have-nots.

 

“As consumers, we have no right to know what companies know about us. As companies, we have few restrictions on what we can do with this data,” says Hilary Mason, chief data scientist at Bit.ly, a social-media company in New York. “Even though people derive value, and companies derive value, it’s totally chaotic who has rights to what, and it’s making people uncomfortable.”

Renato P. dos Santos's curator insight, May 21, 2013 12:32 PM

Intel launches an initiative so that people can benefit financially from their own data. Most apps do not work without access to personal data such as location. Companies hold our data, and we accept that in exchange for the free content, personalization, and other conveniences we get in return. But there is no real "data economy" (yet?), only profit for the companies. Will Big Data end up in our individual hands?

Scooped by luiy

GLEAMviz v4.2: new data layers available #dataviz #cartographie #mapping #healthcare

The v4.2 release allows users to visualize additional data layers on the geographical map, covering population data and healthcare indicators. The 4.2 version also features major...
luiy's insight:

The 4.2 version also features major improvement to the simulation engine and enhancement to the data visualization settings. GLEAMviz is now also a great teaching tool, fully documented with the new manual v4.2.

The GLEAMviz install package, the new manual and full documentation for the current version 4.2 can be downloaded here. If you already have any GLEAMviz 4.x version you can automatically update the client by

Rescooped by luiy from Open Government Daily

Spanish telecomm regulator launches data visualization tool | European Public Sector Information Platform #opendata


Via Ivan Begtin
luiy's insight:

Inspired by the EU Digital Agenda Scoreboard tool for graphing indicator data, the Spanish telecommunications regulator "Comisión del Mercado de las Telecomunicaciones" (CMT) has opened up a space on its public data portal CMTDATA for data visualization. The aim is to provide visual reporting on telecommunications services and infrastructure in provinces and autonomous regions. The data used for each graph is provided in the annual Sectoral Economic Report, and can also be downloaded in CSV, XLS and PDF file formats.


http://cmtdata.cmt.es/cmtdata/
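
Because each graph's data is also published as CSV, pulling an indicator into an analysis environment is a one-liner. A sketch with pandas; the URL and column names are placeholders rather than the portal's actual ones:

```python
import pandas as pd

# Placeholder CSV link: substitute the download URL of any indicator
# published on cmtdata.cmt.es.
url = "http://cmtdata.cmt.es/cmtdata/example-indicator.csv"
df = pd.read_csv(url)

print(df.head())
# Assuming region/value columns exist, aggregate per autonomous region:
print(df.groupby("region")["value"].sum())
```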

Rescooped by luiy from Open Knowledge

Open Data and Preservation | The Signal: Digital Preservation


Yesterday, May 9, 2013, the U.S. government issued an executive order and an open data policy mandating that federal agencies collect and publish new datasets in open, machine-readable, and, whenever possible, non-proprietary formats.  The new policy gives agencies six months to create an inventory of all the government-produced datasets they collect and maintain; a list of datasets that are publicly accessible; and an online system to collect feedback from the public as to how they would like to use the data.  The goals are twofold — greater access to government data for the public, and the availability of data in forms that businesses and researchers can better use.  This builds on the earlier White House Memorandum on Transparency and Open Government.

 

These documents were accompanied by a link to something that actually caught my fancy even more – a greatly expanded Project Open Data Github repository for guidelines, use cases and tools.  This, alongside the ever-growing (and soon to be extensively updated) data.gov, is evidence of real efforts to release more data and make it truly useful and usable.


Via Irina Radchenko
Rescooped by luiy from Datacenters

Big Data and Data Scientists: an enlightening point of view ...

Extracting value from a company's data cannot be reduced to the mere technological challenge of Big Data. To avoid this trap, the exploitation of information must be entrusted to Data Scientists capable of grasping the problem in all its dimensions: business, IT, statistics, and mathematics.


Via Pascal Hoguet, Lockall
luiy's insight:

The idea is certainly not entirely new, but, riding the media attention paid to Big Data, this is probably the first time it can be placed so centrally and at such a high level in the debate. Beyond the choice of a new tool or technology, reflecting on the place of Data Science in the company allows it to question its practices, its data, and its methods:

 

• Which processes could be made more effective by a fine-grained analysis of information (product design, product marketing, customer relations, online sales, operations, supply chain, industrial maintenance, financial risk management, margin management...)?

• What data is available in the company's information systems that would help answer the questions posed by the owners of these processes more effectively, and what is the nature of that data (volume, quality, variety...)? Conversely, what data is not stored in the information systems but would nonetheless greatly enrich business analysis (web log data, data from industrial sensors...)?

• Which information-processing methods (statistical analysis, numerical analysis, artificial intelligence, machine learning...) would actually turn this data into concrete, operational answers to the questions posed by process owners?

These analyses and questions may well lead some companies to adopt the new technological solutions offered by Big Data, in particular if the data to be processed is voluminous and weakly structured (text, voice, images...), and if the business processes it must inform demand fast (or even real-time) processing.

 

Hire a team of Data Scientists

Running a Big Data project effectively is a complex task, covering a vast scope and involving a great many actors; it can therefore only be entrusted to a triply competent figure: the Data Scientist. "Data Scientist" is a term that crystallizes a series of shifts in the practice of "data exploitation professionals" confronted with:

 

• on the one hand, the rapid progress of the computing resources (hardware and software) at their disposal,

 

• on the other hand, the growing role that operational departments grant them in business decision-making.

In this context, the Data Scientist appears as a synthesis of several skills that are essential to a Big Data project but hard to bring together within the company:

 

• The Data Scientist as mathematician: because they select, adapt, and apply approaches drawn from varied fields of statistics and artificial intelligence to extract value from the data they handle, the Data Scientist must be a mathematician able to evaluate and compare different models or computational methods, anticipate their advantages and drawbacks, and exploit them knowingly in a strongly business-oriented environment.

 

• The Data Scientist as computer scientist: whether it is extracting the relevant data from information systems, programming the algorithms that will process it, or helping design the platforms that will make the results quickly exploitable, the Data Scientist must master the programming languages and technology environments (in particular those of Big Data) suited to the various cases they may encounter.

 

• The Data Scientist as business expert: because their analyses must be driven by the pursuit of the company's efficiency and profitability, the Data Scientist must maintain a business dialogue with the process owners they support, and must also put forward proposals on strategies to implement or tactics to adopt in light of the lessons drawn from their analyses. As such, they cannot be a mere technical expert, but must keep their eyes wide open to the business stakes of their work.

 

For decision-makers keen not to miss the Big Data turn, recruiting (or identifying within existing teams) one or more Data Scientists therefore seems an essential condition for success.

Scooped by luiy

Our Research | Data Sets | Intel Science & Technology Center for Big Data

luiy's insight:
Data Sets

A World of Geo-coded Tweets (Web/Social Media Data Analysis)

A data set that includes links to PostgreSQL dump files containing nearly all geo-tagged Tweets and associated metadata for the whole world, along with detailed instructions for restoring this data into a working database. The data is currently being used as input into MapD (Massively Parallel Database), which uses multiple GPUs to run SQL queries as well as render point and heat maps on the data in real time.
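
Once the dump files are restored into PostgreSQL, spatially filtered queries are straightforward. A sketch with psycopg2, where the table and column names are guesses rather than the data set's documented schema:

```python
import psycopg2  # PostgreSQL driver (pip install psycopg2-binary)

# Hypothetical schema: a geo_tweets table with lon/lat columns. Consult
# the data set's restore instructions for the real table layout.
conn = psycopg2.connect(dbname="tweets", user="postgres")
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT count(*)
        FROM geo_tweets
        WHERE lon BETWEEN %s AND %s
          AND lat BETWEEN %s AND %s
        """,
        (-74.3, -73.7, 40.5, 40.9),  # rough New York City bounding box
    )
    print(cur.fetchone()[0])
```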

 

MIMIC II (Health Care)

Data from hospital ICU information systems, hospital archives and other external data sources. Created as part of a Bioengineering Research Partnership involving an interdisciplinary team from academia (MIT), industry (Philips Medical Systems) and clinical medicine (Beth Israel Deaconess Medical Center), with the goal of developing and evaluating advanced ICU patient monitoring systems that will substantially improve the efficiency, accuracy and timeliness of clinical decision-making in intensive care.

 

MODIS (Telescope/Satellite Imagery)

For our EarthDB project, we’re assembling a year’s worth of satellite imagery (from NASA’s MODIS instrument) for the whole globe.  Our goal is to develop new tools for creating derived data products from the raw data, rather than requiring scientists to use existing data exports that NASA provides. We are using the Level 1B NASA data (the lowest level of raw data that is geo-referenced), which lives here.  We use raw data at three spatial resolutions, available in sub-directories of the above link:  MOD021KM is 1km resolution, MOD02HKM is 500m resolution, and MOD02QKM is 250m resolution. We also use the MOD03 metadata (also available in a sub-directory), and metadata from here to discriminate between data acquired in daytime and nighttime.
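
MODIS Level 1B granules are HDF4 files; one way to inspect them locally is the pyhdf package. A sketch assuming a downloaded MOD021KM granule; the file name below is a placeholder, and data set names vary by product:

```python
from pyhdf.SD import SD, SDC  # HDF4 reader (pip install pyhdf)

# Placeholder file name: any locally downloaded MOD021KM granule.
granule = SD("MOD021KM.A2012001.0000.hdf", SDC.READ)

# List the scientific data sets and their shapes.
for name, info in granule.datasets().items():
    print(name, info[1])

# Read one calibrated band array (name taken from the listing above).
band = granule.select("EV_1KM_RefSB")[:]
print(band.shape)
```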

 

University of Washington CoAddition Testing Use-Case (Astronomy/Telescope Imagery)

The Large Synoptic Survey Telescope (LSST) is a large-scale, multi-organization initiative to build a new telescope and use it to continuously survey the entire visible sky. The LSST will generate tens of TB of telescope images every night. The planned survey will cover more sky with more visits than any survey before. The novelty of the project means that no current dataset can exercise the full complexity of the data expected from the LSST. For this reason, before the telescope produces its first images in a few years, astronomers are testing their data analysis pipelines, storage techniques, and data exploration using realistic but simulated images. More information on the simulation process can be found in this paper.  This use-case provides a set of such simulated LSST images (approximately 1TB in binary format) and presents a simple but fundamental type of processing that needs to be performed on these images.

Scooped by luiy

The FTC says sneaky data brokers may illegally sell your data #dataawareness | Digital Trends

Some data brokers are up to no good. What you need to know about your data.
luiy's insight:

Absolutely nothing good. The FTC’s actions here are a necessary step, but more stringent action is required to actually curb the data brokerage industry. In an ideal world, these letters would absolutely require companies like Brokers Data and U.S. Data Corporation to stop offering consumer information to use in insurance decisions – which is what the FTC says they appear to be doing. And the letters would also make the six companies that appear to offer your information to employers stop doing that. These letters don’t legally compel these businesses, though, so they may take their chances until the FTC actually slams them with a fine. 

 

After all, if they don’t end up getting penalized, the rewards for being in the data brokerage industry are too great. It’s obvious that companies are hungry to use data to inform their marketing campaigns. Data brokerage is booming.  That’s why Facebook keeps tightening alliances with three of the largest data brokers in the U.S., Datalogix, Acxiom, and Epsilon. A small consolation is that none of the Facebook-affiliated data brokers received a warning letter. But that doesn’t mean they won’t in the future. 

The FTC has the power to fine data brokers who violate the FCRA, and it has in the past. Social media aggregator Spokeo settled with an $800,000 fine for selling information that violated the FCRA. And the FTC isn’t shy about issuing larger fines to big companies that run afoul of privacy regulations, as its $22.5 million penalty settlement with Google illustrates. 

So what can be done? Harsh fines are a good first step, but considering the vast potential for profit data brokers have, legislation will probably be more crucial to actually changing the brokerage industry. As Digital Trends’ Andrew Couts wrote earlier this year, “we need laws that empower consumers in the face of big data.” Couts suggested laws like California’s recently proposed bill, “The Right to Know Act 2013,” which would require companies to give up a year’s worth of personal data to people who wanted to see what they’ve collected. 

 

After all, the idea of collecting data that’s been voluntarily thrown into the digital ether isn’t an inherently evil pursuit – it’s more the fact that we aren’t told when and where our data is collected, and that most people don’t realize the extreme reach of these companies, that’s so troubling. At least the FTC warnings help people learn about which companies to investigate and look into ways to opt out of their specific programs. 



Read more: http://www.digitaltrends.com/social-media/watch-out-the-ftc-warns-sneaky-data-brokers-who-may-illegally-sell-your-data/#ixzz2Spohsdh2
