Today the photograph has transformed again. We've gone from the old world of unprocessed rolls of C-41 film sitting in a fridge 20 years ago, to sharing photos on the 1.5” screen of a point-and-shoot camera 10 years back. Today the photograph is something different. Photos automatically leave their capture (and formerly captive) devices for many sharing services. There are a lot of photos: a back-of-the-envelope estimate had 10% of all the photos in the world taken in the preceding 12 months, and that was calculated three years ago. Among these services, Flickr has been a great repository of images that are free to share via Creative Commons.
On Flickr, photos, their metadata, their social ecosystem, and the pixels themselves make for a vibrant environment for answering many research questions at scale. Until now, however, scientific efforts outside of industry have had to rely on one-off datasets of varying sizes. At Flickr and at Yahoo Labs, we set out to provide something more substantial for researchers around the globe.
Data, data, data… A glimpse of a small piece of the dataset. YFCC100M by aymanshamma on Flickr.
Today, we are announcing the Flickr Creative Commons dataset as part of Yahoo Webscope’s datasets for researchers. The dataset, we believe, is one of the largest public multimedia datasets that has ever been released—99.3 million images and 0.7 million videos, all from Flickr and all under Creative Commons licensing.
The dataset (about 12GB) consists of a photo_id, a JPEG URL or video URL, and some corresponding metadata such as the title, description, camera type, and tags. Plus, about 49 million of the photos are geotagged! What’s not there, like comments, favorites, and social network data, can be queried from the Flickr API.
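To give a feel for what a record looks like, here is a minimal parsing sketch. The sample line and the exact column order are assumptions for illustration only; check the Webscope documentation for the real field layout of the release.

```python
# Hypothetical tab-separated record in the spirit of the fields described
# above (photo_id, URL, title, description, camera, tags, lat, lon).
# The values and column order here are made up for illustration.
sample = (
    "6985418911\thttps://farm8.staticflickr.com/7210/example.jpg\t"
    "Golden Gate at dusk\tLong exposure from Fort Point\t"
    "Canon EOS 5D\tbridge,sunset,sanfrancisco\t37.8199\t-122.4783"
)

fields = ["photo_id", "url", "title", "description", "camera", "tags", "lat", "lon"]
record = dict(zip(fields, sample.split("\t")))

# Roughly half the photos carry coordinates; the rest would have empty fields.
geotagged = record["lat"] != "" and record["lon"] != ""
print(record["photo_id"], record["camera"], geotagged)
```

Once parsed this way, each record's photo_id is what you would feed to the Flickr API to pull the comments, favorites, and social data that the dump itself omits.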
A 1 million photo sample of the 48 million geotagged photos from the dataset plotted around the globe. One Million Creative Commons Geo-tagged Photos by aymanshamma on Flickr.
But of course, processing 100 million images takes a fair bit of processing power, time, and resources that not every research institute has. To help here, we’ve worked with the International Computer Science Institute (ICSI) at Berkeley and Lawrence Livermore National Laboratory to compute many open, standardized computer vision and audio features†, which we plan to host on a shared Amazon instance, as the collection is somewhere north of 50TB, for researchers around the world to use. It’s pretty intense: they brought in a first-of-its-kind supercomputer, the Cray Catalyst, to make the calculations.
The dataset is available now!
The dataset can host a variety of research studies and challenges. One of the first challenges we are issuing is the MediaEval Placing Task, where the task is to build a system capable of accurately predicting where in the world the photos and videos were taken without using the longitude and latitude coordinates. This is just the start. We plan to create new challenges through expansion packs that will widen the scope of the dataset with new tasks like object localization, concept detection, and social semantics.
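To make the Placing Task concrete, here is a deliberately tiny baseline sketch: estimate a photo's location as the mean coordinates of training photos that share any of its tags. The three "training" photos below are invented for illustration; real entries would draw on the dataset's millions of geotagged photos and far stronger models.

```python
# Toy tag-based baseline for the Placing Task: predict location from tags
# alone, never touching the test photo's own latitude/longitude.
# The training triples below are fabricated examples, not dataset records.
train = [
    ({"eiffel", "paris"}, (48.858, 2.294)),
    ({"paris", "louvre"}, (48.861, 2.336)),
    ({"goldengate", "sanfrancisco"}, (37.820, -122.478)),
]

def predict(tags):
    """Average the coordinates of training photos sharing a tag."""
    lats, lons = [], []
    for train_tags, (lat, lon) in train:
        if tags & train_tags:  # any shared tag counts as evidence
            lats.append(lat)
            lons.append(lon)
    if not lats:
        return None  # no overlapping tags: abstain
    return (sum(lats) / len(lats), sum(lons) / len(lons))

print(predict({"paris", "seine"}))  # averages the two Paris photos
```

Even this crude scheme shows why tags and other metadata are interesting signals for geolocation, which is what makes the task a meaningful benchmark at this scale.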
Interested? Head over to the Yahoo Webscope site to request the dataset. If you have any questions, you can get those answered there as well.
† In case you’re curious: SIFT, GIST, Auto Color Correlogram, Gabor Features, CEDD, Color Layout, Edge Histogram, FCTH, Fuzzy Opponent Histogram, Joint Histogram, Kaldi Features, MFCC, SACC_Pitch, and Tonality.