"The future is here. It's just not evenly distributed yet." - William Gibson :::: Follow this topic for fresh resources and ideas related to Data Science, Machine Learning, Algorithms and #bigdata ::::
It seems that data mining and machine learning have become so popular that almost every CS student now knows about classifiers, clustering, statistical NLP, etc. So finding data miners does not seem to be hard nowadays.
My question is: what skills could a data miner learn that would set him apart from the others, and make him the kind of person who is hard to find?
Today we’re announcing a new Netflix-OSS project called Surus. Over the next year we plan to release a handful of our internal user-defined functions (UDFs) that have broad adoption across Netflix. The use cases for these functions vary (e.g., scoring predictive models, outlier detection, pattern matching), and together they extend the analytical capabilities of big data.
“Facebook is sharing some of its technology. The company’s artificial intelligence research team today announced that it is open sourcing its deep-learning AI tools. The software will be available on…”
Via Scott Turner
In conclusion, dplyr is pretty fast (much faster than base R or plyr), but data.table is somewhat faster, especially for very large datasets and a large number of groups. For datasets under a million rows, operations in dplyr (or data.table) take less than a second, and the speed difference does not really matter. For larger datasets, one can choose, for example, dplyr with data.table as a backend. For even larger datasets, or for those who prefer the data.table syntax, data.table might be the choice. Either way, R is the best tool for munging tabular data with millions of rows.
Here's a neat demo from Yihui Xie: you can talk to this R graph and customize it with voice commands.
We recommend using Google Chrome to play with this app. To change the title, say something that starts with "title", e.g. "title I love the R language" or "title Good Morning". To change the color of the points, say something that starts with "color", e.g. "color blue" or "color green". When the app cannot recognize the color, the points turn gray. To add a regression line, say "regression". To make the points bigger or smaller, say "bigger" or "smaller". The source code of this app is on GitHub. You can also see my demo of playing with this app.
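The command rules above amount to a small dispatcher keyed on the first spoken word. The following is a minimal sketch of that logic, not the app's actual code (the real app is written in R). The `PlotState` and `apply_command` names, the set of recognized colors, the default color, the 1.5x size step, and the choice to ignore unknown commands are all illustrative assumptions.

```python
from dataclasses import dataclass

# Assumption: the original app's list of recognizable colors is not stated.
KNOWN_COLORS = {"red", "green", "blue", "black", "orange", "purple"}


@dataclass
class PlotState:
    """Hypothetical state of the plot that voice commands modify."""
    title: str = ""
    color: str = "black"      # assumed default point color
    point_size: float = 1.0
    regression: bool = False


def apply_command(state: PlotState, utterance: str) -> PlotState:
    """Update the plot state from one recognized utterance."""
    words = utterance.strip().split()
    if not words:
        return state
    head = words[0].lower()
    rest = " ".join(words[1:])
    if head == "title":
        # Everything after "title" becomes the new title.
        state.title = rest
    elif head == "color":
        # Unrecognized colors fall back to gray, as described above.
        requested = rest.strip().lower()
        state.color = requested if requested in KNOWN_COLORS else "gray"
    elif head == "regression":
        state.regression = True
    elif head == "bigger":
        state.point_size *= 1.5   # assumed step size
    elif head == "smaller":
        state.point_size /= 1.5
    # Assumption: any other utterance is ignored.
    return state


if __name__ == "__main__":
    state = PlotState()
    for spoken in ["title I love the R language", "color blue",
                   "color chartreuse", "regression", "bigger"]:
        apply_command(state, spoken)
    print(state)
```

Dispatching on the first word keeps every rule independent, so adding a new voice command means adding one more branch.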
Course Description: Computer vision has become ubiquitous in our society, with applications in search, image understanding, apps, mapping, medicine, drones, and self-driving cars. Core to many of these applications are visual recognition tasks such as image classification, localization and detection. Recent developments in neural network (aka “deep learning”) approaches have greatly advanced the performance of these state-of-the-art visual recognition systems. This course is a deep dive into the details of deep learning architectures, with a focus on learning end-to-end models for these tasks, particularly image classification. During the 10-week course, students will learn to implement, train and debug their own neural networks and gain a detailed understanding of cutting-edge research in computer vision. The final assignment will involve training a multi-million-parameter convolutional neural network and applying it to the largest image classification dataset (ImageNet). We will focus on teaching how to set up the problem of image recognition, the learning algorithms (e.g. backpropagation), and practical engineering tricks for training and fine-tuning the networks, and we will guide the students through hands-on assignments and a final course project. Much of the background and materials of this course will be drawn from the ImageNet Challenge.
In the first episode of Talking Machines we meet our hosts, Katherine Gorman (nerd, journalist) and Ryan Adams (nerd, Harvard computer science professor), and preview some of the interviews you'll be able to hear this season. Today we hear some short clips on big issues; we'll get technical later, but today is all about introductions.
We start with Kevin Murphy of Google talking about his textbook, which has become a standard in the field. Then we turn to Hanna Wallach of Microsoft Research NYC and UMass Amherst and hear about the founding of WiML (Women in Machine Learning). Next we discuss academia's relationship with business with Max Welling of the University of Amsterdam, program co-chair of the 2013 NIPS (Neural Information Processing Systems) conference. Finally, we sit down with three pillars of the field, Yann LeCun, Yoshua Bengio, and Geoff Hinton, to hear about where the field has been and where it might be headed.
Following a recent post at Online Behavior on Visualizing Google Analytics Data With R, I thought I would share my own Shiny application (Shiny is a free package available for R) for visualising visits, bounce rate, etc. from websites.
The second annual Structure Data Awards are here, in which Gigaom picks the most interesting and most promising data startups that launched in the previous year. The winners, which range from a non-profit data science organization to a company building infrastructure for deep learning, will present during a special session at our Structure Data conference, which takes place March 18.
“RT @sedielem: Some great practical advice from Ilya Sutskever for training large deep neural networks http://t.co/m0GBzeEJBf” Ok. So you’re sold. You’re convinced that LDNNs (large deep neural networks) are the present and the future, and you want to train one. But rumor has it that it’s so hard, so difficult… or is it? The reality is that it used to be hard, but now the community has consolidated its knowledge and realized that training neural networks is easy as long as you keep the following in mind. Here is a summary of the community’s knowledge of what’s important and what to look out for.
Via Dahl Winters
Machine learning is an exciting and fast-moving field of computer science with many recent consumer applications (e.g., Microsoft Kinect, Google Translate, iPhone's Siri, digital camera face detection, Netflix recommendations, Google News) and applications within the sciences and medicine (e.g., predicting protein-protein interactions, species modeling, detecting tumors, personalized medicine). In this undergraduate-level class, students will learn about the theoretical foundations of machine learning and how to apply machine learning to solve new problems.