Data is big
Scooped by ukituki
onto Data is big

7 Steps for Learning Data Mining and Data Science

How to learn data mining and data science? I outline seven steps and point you to resources for becoming a data scientist.
ukituki's insight:

Here are 7 steps for learning data mining and data science. Although they are numbered, you can do them in parallel or in a different order.

1. Languages: Learn R, Python, and SQL
2. Tools: Learn several data mining software suites
3. Textbooks: Read introductory textbooks to understand the fundamentals
4. Education: Watch webinars, take courses, and consider a certificate or a degree in data science
5. Data: Check available data resources and find something there
6. Competitions: Participate in data mining competitions
7. Interact with other data scientists, via social networks, groups, and meetings

Data is big
"The future is here. It's just not evenly distributed yet." - William Gibson     :::: Follow this topic for fresh resources and ideas related to Data Science, Machine Learning, Algorithms and #bigdata :::: http://www.dataisbig.co/
Curated by ukituki

R Markdowns - perfect for learning R by examples #rstats


R Markdown documents are one of the best ways to learn R by following examples.

Rescooped by ukituki from R for Journalists

Merging Data Sets Based on Partially Matched Data Elements

“A tweet from @coneee yesterday about merging two datasets using columns of data that don't quite match got me wondering about a possible R recipe for handling partial matching. The data in question...”
Via M. Edward (Ed) Borasky
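
The linked post describes an R recipe; as a rough illustration of the same idea in Python, the sketch below left-joins two small tables whose key columns do not quite match. It uses `difflib.get_close_matches` from the standard library. The function name `fuzzy_merge`, the 0.8 cutoff, and the toy rows are assumptions made for this example and are not taken from the post.

```python
from difflib import get_close_matches


def fuzzy_merge(left, right, key, cutoff=0.8):
    """Left-join two lists of dicts on `key`, tolerating near-miss spellings."""
    right_by_key = {row[key]: row for row in right}
    candidates = list(right_by_key)
    merged = []
    for row in left:
        # Best candidate whose similarity ratio is at least `cutoff`, if any.
        match = get_close_matches(row[key], candidates, n=1, cutoff=cutoff)
        partner = right_by_key[match[0]] if match else {}
        # Left-hand values win on conflicts, so the original spelling is kept.
        merged.append({**partner, **row, "matched_key": match[0] if match else None})
    return merged


# Toy data, invented for illustration only.
scores = [
    {"name": "Untied Kingdom", "score": 1},
    {"name": "Germany", "score": 2},
    {"name": "Atlantis", "score": 3},
]
codes = [
    {"name": "United Kingdom", "code": "GB"},
    {"name": "Germany", "code": "DE"},
]
merged = fuzzy_merge(scores, codes, key="name")
```

Rows with no sufficiently close partner keep `matched_key` set to `None`, so they are easy to review by hand. The cutoff is a judgment call: a lower value matches more rows but lets more false pairings through.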
Rescooped by ukituki from Social Network Analysis #sna

What Connects Dutch Corporates? Linked Innovation in the Netherlands

ukituki's insight:
The above graphic is based on all published patent applications by Dutch applicants in 2014. To focus on unique technologies and avoid cluttering the networks, the patent data were filtered to include only patents representing so-called patent families (multiple patent applications in various countries relating to the same invention). Because of the focus on the Dutch patent landscape, non-Dutch organizations that use Dutch legal entities to host their patents (such as oilfield services company Schlumberger or hard drive manufacturer HGST) were omitted. Networks were then constructed with nodes representing patents and connections between patents based on shared CPC codes (Cooperative Patent Classification, a system used by patent offices worldwide to classify patents). The final dataset includes 5,895 published patent applications; the network presented here details the largest interconnected technology cluster, comprising 3,561 patents.
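
The construction described above (patents as nodes, an edge whenever two patents share a CPC code, then the largest interconnected cluster) can be sketched in a few lines of Python. This is only an illustration under assumptions: the patent IDs and code assignments are invented, and `largest_cluster` is a hypothetical helper, not the tooling used for the original graphic.

```python
from collections import defaultdict


def largest_cluster(patent_codes):
    """Return the largest set of patents linked by shared classification codes."""
    parent = {patent: patent for patent in patent_codes}

    def find(patent):
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[patent] != patent:
            parent[patent] = parent[parent[patent]]
            patent = parent[patent]
        return patent

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Group patents by code; every patent in a group is connected to the others.
    by_code = defaultdict(list)
    for patent, codes in patent_codes.items():
        for code in codes:
            by_code[code].append(patent)
    for patents in by_code.values():
        for other in patents[1:]:
            union(patents[0], other)

    clusters = defaultdict(set)
    for patent in patent_codes:
        clusters[find(patent)].add(patent)
    return max(clusters.values(), key=len)


# Invented patent IDs and code assignments, for illustration only.
toy_patents = {
    "P1": {"H01L", "G06F"},
    "P2": {"G06F"},
    "P3": {"A61K"},
    "P4": {"A61K", "C07D"},
    "P5": {"C07D"},
    "P6": {"B60R"},
}
biggest = largest_cluster(toy_patents)
```

Union-find avoids materializing every pairwise edge, which matters when a popular code is shared by thousands of patents.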

Performance and Scale Options for R with Hadoop: A comparison of potential architectures

R and Hadoop go together. In fact, they go together so well, that the number of options available can be confusing to IT and data science teams seeking solutio…

Divining the 'K' in K-means Clustering

The venerable K-means algorithm is a well-known and popular approach to clustering. It does, of course, have some drawbacks. The most obvious one being the need to choose a pre-determined numbe...
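
One common heuristic for choosing K is the "elbow" method: run K-means for several values of K and look for the point where the within-cluster sum of squares (WCSS) stops dropping sharply. The sketch below is a minimal pure-Python version of that heuristic; it is not necessarily the approach taken in the linked article. The farthest-point initialization, the second-difference rule for locating the bend, and the toy points are all assumptions chosen to keep the example deterministic.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then keep adding
    the point that is farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans_wcss(points, k, iterations=50):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centroids = farthest_point_init(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; keep it if it has none.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def elbow(points, k_max):
    """Pick the K where the WCSS curve bends most sharply."""
    wcss = {k: kmeans_wcss(points, k) for k in range(1, k_max + 1)}
    # Second difference of the curve: large where a steep drop flattens out.
    bend = {k: wcss[k - 1] - 2 * wcss[k] + wcss[k + 1] for k in range(2, k_max)}
    return max(bend, key=bend.get), wcss


# Three well-separated toy blobs, invented for illustration only.
centers = [(0, 0), (10, 10), (20, 0)]
offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
best_k, wcss = elbow(points, k_max=5)
```

On real data the elbow is often ambiguous, so this rule is better read as a starting point than as a verdict.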

20 short tutorials all data scientists should read (and practice)


We are now at 20, up from 17. I hope I find the time to write a one-page survival guide for UNIX, Python and Perl. Here's one for R. The links to core data science concepts are below - I need to add links to web crawling, attribution modeling and API design. Relevancy engines are discussed in some of the tutorials listed below. And that will complete my 10-page cheat sheet for data science. 

 


Max Kuhn’s Talk on Predictive Modeling | NYC Data Science Program


Max Kuhn, Director of Nonclinical Statistics at Pfizer and author of Applied Predictive Modeling, joined us on February 17, 2015, and shared his experience with data mining in R.

 

Bivariate Choropleth Maps: A How-to Guide

Joshua Stevens is a cartographer and GIScientist focusing on UI/UX design, human-computer interaction, and the visualization of spatial data.

How-to go parallel in R – basics + tips

Today is a good day to start parallelizing your code. I've been using the parallel package since its integration with R (v. 2.14.0) and it's much easier than it at first seems. In this post I'll go through the basics for implementing parallel computations in R, cover a few common pitfalls, and give tips on how to avoid them.

Don’t waste another second, start parallelizing your computations today! The image is CC by Smudge 9000.

Golden Rule of Forecasting Checklist ©

1. Even if you are unfamiliar with forecasting, you can use the Golden Rule Checklist to determine whether forecasts are independent and were derived using the best available forecasting methods.

Quantitative Economics

ukituki's insight:

This website presents a series of lectures on quantitative economic modelling, designed and written by Thomas J. Sargent and John Stachurski. The primary programming languages are Python and Julia.


How to detect Outliers in your dataset and treat them?

Finding & treating outliers in your dataset can improve your models & predictions. We explain how to detect & treat outliers in a dataset.
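
As a small illustration of what detecting and treating outliers can look like, the sketch below uses Tukey's interquartile-range fences, one of several common rules and not necessarily the one the linked article uses. Values outside the fences are flagged, and one possible treatment clips them to the nearest fence. The function names and the toy readings are assumptions made for this example.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Lower and upper fences: k * IQR beyond the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Values that fall outside the fences."""
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]


def clip_outliers(values, k=1.5):
    """One possible treatment: pull extreme values back to the nearest fence."""
    low, high = tukey_fences(values, k)
    return [min(max(v, low), high) for v in values]


# Toy readings, invented for illustration only.
readings = [10, 12, 11, 13, 12, 11, 10, 95]
flagged = find_outliers(readings)
treated = clip_outliers(readings)
```

Clipping keeps the sample size intact; dropping the flagged values or modeling them separately are alternatives, and the right choice depends on whether the extreme values are errors or genuine observations.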
Rescooped by ukituki from R for Journalists

Fuzzy String Matching - a survival skill to tackle unstructured information

“How to combine different sources of unstructured information using Fuzzy String Matching: a step-by-step tutorial in R”
Via M. Edward (Ed) Borasky
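
The linked tutorial is written in R; the Python sketch below only illustrates the general idea behind fuzzy string matching, using Levenshtein edit distance as the similarity measure. Whether the tutorial relies on this particular distance is not stated here, so treat the choice as an assumption. The helper names and the sample names are invented for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Edit distance rescaled to a 0..1 score, where 1 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def best_match(query, candidates):
    """Candidate most similar to the query, ignoring letter case."""
    return max(candidates, key=lambda c: similarity(query.lower(), c.lower()))


# Invented names, for illustration only.
known_names = ["John Smith", "Jane Smythe", "Joan Smits"]
closest = best_match("Jon Smith", known_names)
```

In practice a minimum similarity threshold is usually added so that a query with no good partner is left unmatched instead of being forced onto the least-bad candidate.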

Deep Learning with {h2o} on MNIST dataset (and Kaggle competition)


In the previous post we saw how Deep Learning with {h2o} works and how Deep Belief Nets implemented by h2o.deeplearning draw decision boundaries for XOR patterns. What kind of decision boundaries does Deep Learning (Deep Belief Net) draw? Practice with R and {h2o} package


Psychology Journal Bans Significance Testing « Science-Based Medicine

This is perhaps the first real crack in the wall for the almost-universal use of the null hypothesis significance testing procedure (NHSTP). The journal, Basic...

R: How to Layout and Design an Infographic


Video Playlist from Deep Learning Summit, San Francisco, 2015 - YouTube

ukituki's insight:

21 videos on #deeplearning


Machine Learning For Journalism at The New York Times

Daeil Kim is a Data Scientist at the New York Times, a role which has crystallised his research work into the niche between Machine Learning and Journalism. Kim spoke at the New York Data Science Meetup last week about how to make journalists' work easier by using Machine Learning, a Bayesian perspective on big data, and a discrete section on non-journalism-related Machine Learning at the NYT.

Shiny App: UEFA CL Round of 16 Drawings 2014/15 #rstats


A new article regarding the 2014/15 drawings is now available (see below). For now you can simulate the drawing process with my newly created Shiny App: Champions League.


"Year Zero: How We’ll Run Our Lives in Ten Years’ Time" -Alistair Croll (Strata + Hadoop 2015)

Roughly every decade, some kind of military or enterprise technology makes its way into the mainstream: the personal computer; the consumer Internet; the mob...

Deep Learning Tutorials

ukituki's insight:

The tutorials presented here will introduce you to some of the most important deep learning algorithms and will also show you how to run them using Theano. Theano is a Python library that makes writing deep learning models easy, and gives the option of training them on a GPU.

 

Should you teach Python or R for data science?


Last week, I published a post titled "Lessons learned from teaching an 11-week data science course", detailing my experiences and recommendations from teaching General Assembly's 66-hour introductory data science course. In the comments, I received the following question...


District Data Labs - How to Transition from Excel to R

How to Transition from Excel to R - An Intro to R for Microsoft Excel Users