Soccer Embraces Big Data to Quantify the Beautiful Game - Wired
Portland Timbers' Andrew Jean-Baptiste vies for the ball with New England Revolution's Blake Brettschneider (23) during a Major League Soccer game in March.
Reinventing Society In The Wake Of Big Data
Apache Drill Proposal
Where to start with text mining.
This post is less a coherent argument than an outline of discussion topics I’m proposing for a workshop at NASSR2012 (a conference of Romanticists). But I’m putting this on the blog sin...
Twitter text mining and sentiment analysis
Update: An expanded version of this tutorial will appear in the new Elsevier book Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications by Gary Miner et al., which...
Rooting out rumors, epidemics, and crime – with math
How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did
Target has perfected the technique of analyzing consumers' shopping habits to figure out who's pregnant. How can they send customers congratulatory coupons without freaking them out?
Titan Provides Real-Time Big Graph Data
Titan is an Apache 2 licensed, distributed graph database capable of supporting tens of thousands of concurrent users reading and writing to a single massive-scale graph. In order to substantiate t...
Trident: a high-level abstraction for realtime computation
Big Data Advances in Customer Experience Management
Data Mining with Random Forests
Edge Prediction in a Social Graph: My Solution to Facebook's User Recommendation Contest on Kaggl...
A couple weeks ago, Facebook launched a link prediction contest on Kaggle, with the goal of recommending missing edges in a social graph.
Datablog: Can you predict who will love a song?
Data science communities teamed up with EMI to find out how accurately you can predict someone's opinion of a song based on a handful of details about their general musical taste...
How Google and Facebook are using R « Dataspora
Kaggle’s algorithms show machines are getting too good at judging humans
Kaggle, a San Francisco-based startup that hosts data science competitions, has uncovered some disconcerting insights about human behavior in its two-year run. At times, its founders have been sur...
‘emoto’: Visualising the emotional response to London 2012
Detecting DDoS Attacks with Hadoop
Summary of Euro 2012
To summarize the analysis performed during Euro 2012, we prepared a graphic with the most interesting results. The number of tweets is highest during the games and drops significantly afterwards.
Data mining for network security and intrusion detection
In preparation for the “Haxogreen” hackers' summer camp, which takes place in Luxembourg, I was exploring the world of network security. My motivation was to find out how data mining can be applied to network security and intrusion detection.
Premier Emotions League: title fight Twitter visualization in R
Below we present a use case showing how IBM Netezza 1000 and the statistical package R may be used to explore data from social media.
Revolutions: How Google uses R to make online advertising more effective
At JSM 2011 today, three Google employees (amongst the more than 20 Google delegates there) gave a little insight into how statistical analysis with R yields better results for companies using Google's various advertising products.
Applications of R at Google
At a talk I saw at the useR!2012 conference last month, Googler Karl Millar estimated that there are at least 200 active R users at Google, plus another 300+ occasional users participating in Google's internal R support list.
Two Years Til Algorithms Write News Articles, Say Software Developers
How Physical Stores Can Apply Big Data Like An Amazon.com | SiliconANGLE
Yahoo's "Genome" Uses Big Data To Decode Which Commercials You're Bred To Click On
Yahoo's newly launched Genome platform lets advertisers crunch amazingly specific web information on millions of Americans. Is big data the future of web advertising?
Hadoop: My Experience with Cloudera and MapR
A few months back we started to endeavor on a new Hadoop cluster @ medialets. We have been live with Hadoop in production since April 2010 and we are still...
‘Good uni: Quality nightlife’. How harvesting tweets opens up a new world of valuable qualitative...
Where will you be this time tomorrow? Smartphone data can guess within 20 metres
Smartphone data from 200 volunteers correctly predicted where users would be 24 hours later, often within 20 metres.
Robo-Pricing
There was something of a kerfuffle recently when it became public knowledge that travel website Orbitz were recommending different price ranges of hotels based on the user's operating system.
Because Hadoop isn't perfect: 8 ways to replace HDFS
Hadoop is on its way to becoming the de facto platform for the next generation of data-based applications, but it's not without some flaws.
Everything you need to know about Machine Learning in 30 minutes or less hilarymason.com
Natural Language Processing with Hadoop and Python
Cloudera offers enterprises a powerful new data platform built on the popular Apache Hadoop open-source software package.
How iTunes Genius Really Works - Technology Review
An Apple engineer discloses how the company's premier recommendation engine parses millions of iTunes libraries.
Hama - a Bulk Synchronous Parallel computing framework on top of Hadoop
Classification and Prediction Using Neural Networks
Building High-level Features Using Large Scale Unsupervised Learning
The Data Science Interview: Edwin Chen, Twitter
I don’t do pure research: my analysis enables real-world functionality. Currently mining terabytes of tweets as a data scientist with Twitter, Edwin Chen studied math and linguistics at MIT and then ...
“Data Science for Live Music”
Presentation by Phil Cowans, Chief Architect @Songkick, at Data Science London, 28/05/12...
When Hadoop isn’t fast enough: The Argument for Storm
Big Data is a Big Deal, and Hadoop is arguably the driving force in Big Data. But as awesome as Hadoop is – and it is quite awesome – it’s incomplete. For many things, Hadoop...
Getting Started With Python for Data Science
Is Machine Learning Losing Impact?
Large-Scale Machine Learning at Twitter
NaiveBayes Classifiers 101
Hadoop Beyond MapReduce, Part 1: Introducing Kitten
Demographic Prediction Based on User’s Browsing Behavior
Paper on predicting the user demographic based on browsing behavior
The State of Online Music Discovery
Choosing music that someone else would like is more complex than suggesting toaster ovens or even movies. The reasons we like a song are highly subjective and can hinge on very specific, sometimes subtle characteristics.
Red Hat’s Data Grid 6 Challenges Hadoop on Big Data
Red Hat’s addition of Data Grid 6 to its JBoss family of products cements the company’s position as a major datacenter player.
Netflix Recommendations: Beyond the 5 stars (Part 2)
Predicting movie ratings accurately is just one aspect of our world-class recommender system. In this second part of the blog post, we will give more insight into our broader personalization technology. We will discuss some of our current models, data, and the approaches we follow to lead innovat...
Hadoop Fatigue
Alternatives to Hadoop
Moneyball 2.0: How Missile Tracking Cameras Are Remaking The NBA
The technology was originally developed to track missiles.
Hadoop and Storm are shifting the industry toward Big Data-enabled cloud applications
Dave and I were fortunate to attend a Churchill Club event on Hadoop Tuesday night. Hadoop sits at the center of the burgeoning Big Data universe, and so one might be tempted to conclude that it...