Data is big
Scooped by ukituki
onto Data is big

Scikit-learn, machine learning and cybercrime attribution

Robert Layton (http://2013.pycon-au.org/schedule/30019/view_talk): The scikit-learn library is a rapidly growing open source toolkit for machine learning in Python.
luiy's curator insight, July 16, 2013 5:51 AM

 

The scikit-learn library is a rapidly growing open source toolkit for machine learning in Python. It allows practitioners and researchers to apply machine learning in a variety of applications and is used by companies worldwide. Developed by programmers from around the world, the project has a large (and increasing) number of machine learning algorithms and a very useful set of utility functions, and has also spawned a set of detail

Data is big
"The future is here. It's just not evenly distributed yet." - William Gibson     :::: Follow this topic for fresh resources and ideas related to Data Science, Machine Learning, Algorithms and #bigdata ::::
Curated by ukituki
Scooped by ukituki

7 building blocks to create value from data

Slides of the course on big data by C. Levallois from EMLYON Business School. For business students. Check the online video connected with these slides. -> I…
Scooped by ukituki

An Introduction to Change Points (packages: ecp and BreakoutDetection)

A forewarning, this post is me going out on a limb, to say the least. In fact, it's a post/project requested from me by Brian Peterson, and it follows a new paper that he's written on how to thorou...
Scooped by ukituki

Skills hard to find in machine learners?

It seems that data mining and machine learning became so popular that now almost every CS student knows about classifiers, clustering, statistical NLP ... etc. So it seems that finding data miners is not a hard thing nowadays.

My question is: What are the skills that a data miner could learn that would make him different than the others? To make him a not-so-easy-to-find-someone-like-him kind of person.
Scooped by ukituki

The Netflix Tech Blog: Introducing Surus and ScorePMML

ukituki's insight:

Today we’re announcing a new Netflix-OSS project called Surus. Over the next year we plan to release a handful of our internal user defined functions (UDFs) that have broad adoption across Netflix. The use cases for these functions are varied in nature (e.g. scoring predictive models, outlier detection, pattern matching, etc.) and together extend the analytical capabilities of big data.

Rescooped by ukituki from R for Journalists

Automated Data Collection with R - htmltab: Next version and CRAN release

Via M. Edward (Ed) Borasky
Rescooped by ukituki from artificial intelligence for students

Facebook open-sources its deep-learning AI tools

“Facebook is sharing some of its technology. The company’s artificial intelligence research team today announced that it is open sourcing its deep-learning AI tools. The software will be available on…”
Via Scott Turner
Scooped by ukituki

Cheatsheet: Data Wrangling with dplyr and tidyr #rstats

ukituki's insight:

Printable cheatsheet PDF from RStudio.

Scooped by ukituki

dplyr and a very basic benchmark

In conclusion, dplyr is pretty fast (way faster than base R or plyr), but data.table is somewhat faster, especially for very large datasets and a large number of groups. For datasets under a million rows, operations in dplyr (or data.table) take less than a second and the speed difference does not really matter. For larger datasets one can choose dplyr with data.table as a backend, for example. For even larger datasets, or for those preferring the data.table syntax, data.table might be the choice. Either way, R is the best tool for munging tabular data with millions of rows.
Scooped by ukituki

Talk to R

Here's a neat demo from Yihui Xie: you can talk to this R graph and customize it with voice commands.

ukituki's insight:

Google Chrome is recommended for playing with this app. To change the title, say something that starts with "title", e.g. "title I love the R language" or "title Good Morning". To change the color of the points, say something that starts with "color", e.g. "color blue" or "color green". When the app is unable to recognize the color, the points will turn gray. To add a regression line, say "regression". To make the points bigger or smaller, say "bigger" or "smaller". The source code of this app is on GitHub. You may also see my demo of playing with this app.
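
The command grammar described above can be sketched in a few lines. This is not the app's actual source (which is in R and not reproduced here), only a rough Python illustration of how recognized text might map to plot settings. The `PlotState` class, the `apply_command` function, the list of known colors and the 1.25 size step are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical set of color names the recognizer is assumed to understand.
KNOWN_COLORS = {"red", "green", "blue", "black", "orange", "purple"}

# Hypothetical scaling step for "bigger" / "smaller".
SIZE_STEP = 1.25


@dataclass
class PlotState:
    """Settings of the scatter plot that voice commands can change."""
    title: str = ""
    color: str = "black"
    point_size: float = 1.0
    regression: bool = False


def apply_command(state: PlotState, utterance: str) -> PlotState:
    """Update the plot state from one recognized utterance."""
    words = utterance.strip().split()
    if not words:
        return state
    keyword = words[0].lower()
    rest = " ".join(words[1:])
    if keyword == "title":
        state.title = rest
    elif keyword == "color":
        name = rest.lower()
        # Unrecognized colors fall back to gray, as the description says.
        state.color = name if name in KNOWN_COLORS else "gray"
    elif keyword == "regression":
        state.regression = True
    elif keyword == "bigger":
        state.point_size *= SIZE_STEP
    elif keyword == "smaller":
        state.point_size /= SIZE_STEP
    # Anything else is ignored in this sketch.
    return state


state = PlotState()
for spoken in ["title I love the R language", "color blue",
               "color blurple", "regression", "bigger"]:
    apply_command(state, spoken)
```

In this run the second color command is not in the known set, so the points end up gray; the speech recognition itself is assumed to happen elsewhere and hand over plain text.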

Scooped by ukituki

Stanford University CS231n: Convolutional Neural Networks for Visual Recognition

ukituki's insight:
Course Description: Computer Vision has become ubiquitous in our society, with applications in search, image understanding, apps, mapping, medicine, drones, and self-driving cars. Core to many of these applications are visual recognition tasks such as image classification, localization and detection. Recent developments in neural network (aka “deep learning”) approaches have greatly advanced the performance of these state-of-the-art visual recognition systems. This course is a deep dive into details of the deep learning architectures with a focus on learning end-to-end models for these tasks, particularly image classification. During the 10-week course, students will learn to implement, train and debug their own neural networks and gain a detailed understanding of cutting-edge research in computer vision. The final assignment will involve training a multi-million parameter convolutional neural network and applying it to the largest image classification dataset (ImageNet). We will focus on teaching how to set up the problem of image recognition, the learning algorithms (e.g. backpropagation), and practical engineering tricks for training and fine-tuning the networks, and will guide the students through hands-on assignments and a final course project. Much of the background and materials of this course will be drawn from the ImageNet Challenge.
Scooped by ukituki

Talking Machines - podcast about #machinelearning and #datascience

ukituki's insight:

In the first episode of Talking Machines we meet our hosts, Katherine Gorman (nerd, journalist) and Ryan Adams (nerd, Harvard computer science professor), and explore some of the interviews you'll be able to hear this season. Today we hear some short clips on big issues. We'll get technical, but today is all about introductions.

We start with Kevin Murphy of Google talking about his textbook that has become a standard in the field. Then we turn to Hanna Wallach of Microsoft Research NYC and UMass Amherst and hear about the founding of WiML (Women in Machine Learning). Next we discuss academia's relationship with business with Max Welling from the University of Amsterdam, program co-chair of the 2013 NIPS conference (Neural Information Processing Systems). Finally, we sit down with three pillars of the field, Yann LeCun, Yoshua Bengio, and Geoff Hinton, to hear about where the field has been and where it might be headed.

Scooped by ukituki

How Google "Translates" Pictures into Words Using Vector Space Mathematics | MIT Technology Review

Google engineers have trained a machine-learning algorithm to write picture captions using the same techniques it developed for language translation.
Scooped by ukituki

Building A Google Analytics App With Shiny & R | Analytics & Optimization

Following a recent post at Online Behavior on Visualizing Google Analytics Data With R, I thought I would share my own Shiny application (Shiny is a free package available for R) for visualising visits, bounce rate, etc. from websites.
Scooped by ukituki

Model Performance metrics: How well does my model perform?

This article discusses metrics (concordance, AUC-ROC, Gini coefficient) for evaluating the performance of classification models, and their advantages.
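
To make the three metrics named above concrete, here is a small sketch in plain Python. It is not taken from the linked article, whose exact definitions may differ. It relies on two standard relations: the AUC equals the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half, and the Gini coefficient is 2 * AUC - 1. The function names and the toy labels and scores are made up for illustration.

```python
from itertools import product


def concordance_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A pair is concordant when the positive example has the higher score;
    tied pairs count as half a concordant pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def gini_from_auc(auc):
    """Gini coefficient of a classifier, derived from its AUC."""
    return 2.0 * auc - 1.0


# Toy data: 1 = positive class, 0 = negative class (illustrative only).
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]

auc = concordance_auc(labels, scores)   # 7 of the 9 pairs are concordant
gini = gini_from_auc(auc)               # 2 * 7/9 - 1 = 5/9
print(f"AUC = {auc:.3f}, Gini = {gini:.3f}")
```

The pairwise loop is quadratic in the number of examples, which is fine for a demonstration; for real datasets a rank-based computation or a library routine would be the usual choice.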
Scooped by ukituki

Here are the winners of the 2015 Structure Data Awards

The second annual Structure Data Awards are here, where Gigaom picks the most interesting and most promising data startups that launched in the previous year. The winners, which range from a non-profit data science organization to a company building infrastructure for deep learning, will present during a special session at our Structure Data conference, which takes place March 18

Scooped by ukituki

The Unreasonable Effectiveness Of Deep Learning

ukituki's insight:

Intro to Unsupervised Learning and Convolutional Networks by Yann LeCun, Computer Science Department, New York University 

Rescooped by ukituki from Big Data Technologies

Practical Advice on Training Large Deep Neural Networks

“RT @sedielem: Some great practical advice from Ilya Sutskever for training large deep neural networks http://t.co/m0GBzeEJBf”
OK, so you’re sold. You’re convinced that large deep neural networks (LDNNs) are the present and the future and you want to train one. But rumor has it that it’s so hard, so difficult… or is it? The reality is that it used to be hard, but now the community has consolidated its knowledge and realized that training neural networks is easy as long as you keep the following in mind. Here is a summary of the community’s knowledge of what’s important and what to look out for.
Via Dahl Winters
Scooped by ukituki

Pivot Tables in R with dplyr #rstats

dplyr offers a new and unexpectedly easy way to create powerful Pivot Tables in R. Learn it all in this new tutorial.
Rescooped by ukituki from R for Journalists

Introducing practical and robust anomaly detection in a time series | Twitter Blogs

“A novel way to detect anomalies in big data.”
Via M. Edward (Ed) Borasky
Scooped by ukituki

20 R Packages That Should Impact Every Data Scientist

Anybody who has used R knows just how frustrating it is to have an analytical idea in mind that is hard to express.

Scooped by ukituki

Introduction To Machine Learning, Fall 2013 NYU Course

ukituki's insight:

Machine learning is an exciting and fast-moving field of computer science with many recent consumer applications (e.g., Microsoft Kinect, Google Translate, iPhone's Siri, digital camera face detection, Netflix recommendations, Google News) and applications within the sciences and medicine (e.g., predicting protein-protein interactions, species modeling, detecting tumors, personalized medicine). In this undergraduate-level class, students will learn about the theoretical foundations of machine learning and how to apply machine learning to solve new problems.

 
Scooped by ukituki

Ashara12/Awesome-DeepLearning

Awesome-DeepLearning - A curated list of awesome Deep Learning tutorials, projects and communities.
Scooped by ukituki

How to turn CSV data into interactive visualizations with R and rCharts

Once your data are in the right format, just a couple of lines of R code can generate a robust chart or graph from your spreadsheet.