New Media and Social Sciences: A Digital Anthropology
Scooped by Marco Sebastio

"Beware, big data analysis can be misleading"

The alarm was raised by a study in Science. The expert: Google Flu Trends overestimated the prevalence of influenza in the 2012-2013 season by more than ...

The Parable of Google Flu: Traps in Big Data Analysis

Marco Sebastio's insight:

“1) Big data are typically not scientifically calibrated. This goes back to my post last month regarding measurement. This does not make them useless from a scientific point of view, but you do need to build into the analysis that the “measures” of behavior are being affected by unseen things. In this case, the likely culprit was the Google search algorithm, which was modified in various ways that we believe are likely to have increased flu-related searches.

2) Big data + analytic code used in scientific venues with scientific claims need to be more transparent. This is a tricky issue, because there are both legitimate proprietary interests involved and privacy concerns, but much more can be done in this regard than has been done in the 3 GFT papers. [One of my aspirations over the next year is to work together with big data companies, researchers, and privacy advocates to figure out how this can be done.]

3) It’s about the questions, not the size of the data. In this particular case, one could have done a better job of stating the likely flu prevalence today by ignoring GFT altogether and just projecting 3-week-old CDC data to today (better still would have been to combine the two). That is, a synthesis would have been more effective than a pure “big data” approach. I think this is likely the general pattern.

4) More generally, I’d note that there is much more that the academy needs to do. First, the academy needs to build the foundation for collaborations around big data (e.g., secure infrastructures, legal understandings around data sharing, etc.). Second, there needs to be MUCH more work done to build bridges between the computer scientists who work on big data and social scientists who think about deriving insights about human behavior from data more generally. We have moved perhaps 5% of the way that we need to in this regard.”
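
Point 3 in the quote compares three ways of estimating flu prevalence today: a search-based signal alone, a projection of lagged official data, and a combination of the two. The sketch below illustrates that comparison in Python. It is only an illustration under stated assumptions: the function names are hypothetical, the projection is a simple persistence rule (carry the newest available report forward), the combination is an equal-weight average, and every number is made up. None of this is the method or the data used in the Science study, and the toy output says nothing about the real result.

```python
from statistics import mean


def project_lagged(series, lag_weeks=3):
    """Persistence baseline: estimate each week with the value observed
    `lag_weeks` earlier, i.e. the newest report available at that time."""
    return [series[t - lag_weeks] for t in range(lag_weeks, len(series))]


def combine(baseline, signal, weight=0.5):
    """Weighted average of two estimates covering the same weeks."""
    return [weight * b + (1 - weight) * s for b, s in zip(baseline, signal)]


def mean_abs_error(estimates, truth):
    """Average absolute gap between estimates and the values observed later."""
    return mean(abs(e - t) for e, t in zip(estimates, truth))


# Made-up weekly values for illustration only (NOT real CDC or GFT data).
official = [1.0, 1.2, 1.5, 2.0, 2.6, 3.1, 3.3, 3.0, 2.4, 1.8]
# Hypothetical search-based estimate that tracks the shape but overshoots.
search_signal = [1.3, 1.6, 2.1, 2.9, 3.8, 4.6, 4.9, 4.3, 3.4, 2.5]

lag = 3
target = official[lag:]                  # weeks that can be scored
baseline = project_lagged(official, lag)
signal = search_signal[lag:]
blend = combine(baseline, signal)

for name, estimate in [("search only", signal),
                       ("lagged projection", baseline),
                       ("combined", blend)]:
    print(f"{name:>18}: mean absolute error = {mean_abs_error(estimate, target):.2f}")
```

A real comparison would replace the toy lists with published surveillance and search-based series, and would fit the combination weight on past seasons rather than fixing it at 0.5.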
