Data is big
Scooped by ukituki
onto Data is big

Becoming a Data Scientist - Curriculum via Metromap - Pragmatic Perspectives

Becoming a data scientist is a journey, and for sure a challenging one. But how do you go about becoming one? Where do you start? When do you start seeing light at the end of the tunnel? What is the learning roadmap?
ukituki's insight:

I organized the overall plan progressively into the following areas / domains:

1. Fundamentals
2. Statistics
3. Programming
4. Machine Learning
5. Text Mining / Natural Language Processing
6. Data Visualization
7. Big Data
8. Data Ingestion
9. Data Munging
10. Toolbox

Each area / domain is represented as a "metro line", with the stations depicting the topics you must learn / master / understand in a progressive fashion.

 

The idea is that you pick a line, catch a train, and go through all the stations (topics) until you reach the final destination or switch to the next line. I have progressively marked each station (line) 1 through 10 to indicate the order in which you travel. You can use this as an individual learning plan to identify the areas you most want to develop and to acquire the skills.

 


Data is big
"The future is here. It's just not evenly distributed yet." - William Gibson :::: Follow this topic for fresh resources and ideas related to Data Science, Machine Learning, Algorithms and #bigdata :::: http://www.dataisbig.co/
Curated by ukituki

Comprehensive guide for Data Exploration in R

This guide explains how to read a data set in R, explore the data, impute missing values, visualize the dataset, and merge datasets in R.

pandas: powerful Python data analysis toolkit [Free 1375-page book]

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. 

ukituki's insight:
Free 1375-page book! >> #Python Pandas data analysis toolkit for #DataScience: bit.ly/1HGtBCr via @KirkDBorne

Excellent, “Practical” Talk About Data Science In Today’s Marketplace by Charles Martin

You can always tell when you are listening to someone who knows what they are talking about. In this Edu-video, Charles Martin, an excellent Data Scientist with

GloVe: Global Vectors for Word Representation

ukituki's insight:

GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.
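To make "word-word co-occurrence statistics" concrete, here is a minimal sketch of how such counts could be gathered from a corpus. It is not GloVe itself and trains no vectors; the function name `cooccurrence_counts`, the toy corpus, and the window size of 2 are all assumptions chosen for illustration.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count how often two words appear within `window` tokens of each other.

    Hypothetical helper for illustration only; real GloVe preprocessing
    differs in its details.
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            # Visit the neighbours to the right, then record the pair in
            # both directions so the resulting table is symmetric.
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                counts[(word, tokens[j])] += 1
                counts[(tokens[j], word)] += 1
    return counts


# Toy corpus (an assumption, not taken from the GloVe page).
corpus = [
    "ice is cold and solid",
    "steam is hot and gas",
]
counts = cooccurrence_counts(corpus, window=2)
print(counts[("is", "and")])
```

A table like `counts` is the kind of aggregated statistic the description refers to; the algorithm then fits word vectors to it.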


Five most popular similarity measures implementation in python

The buzz term "similarity distance measure" has a wide variety of definitions among math and data mining practitioners.
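As a rough illustration of what such measures look like in Python, here is a minimal sketch using only the standard library. The excerpt above does not name the article's five measures, so the four shown here (Euclidean, Manhattan, cosine, Jaccard) are an assumption about which ones are commonly meant, and the function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors.

    Raises ZeroDivisionError if either vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Size of the intersection over size of the union of two collections.

    Raises ZeroDivisionError if both collections are empty.
    """
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


print(euclidean_distance([0, 0], [3, 4]))
print(jaccard_similarity([1, 2, 3], [2, 3, 4]))
```

The first two are distances (smaller means more alike), while the last two are similarities (larger means more alike), which is part of why the term carries so many definitions.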

.Rddj - Resources for doing data journalism with R

A curated and opinionated list of resources for learning the ins and outs of R for doing data journalism.

Machine Learning Algorithm Mines 16 Billion E-Mails | MIT Technology Review

Human e-mailing behavior is so predictable that computer scientists have created an algorithm that can calculate when an e-mail thread is about to end.
Pedro Gomez's curator insight, April 13, 4:17 PM

mail data = smart data = Google power


Methods to find the most important variables for building models in R

Selecting the most important predictor variables that explain the major part of the variance of the response variable can be key to identifying and building high-performing models. These techniques are powerful tools that can help reveal the large sediments of gold in your data.

Random Forest
Rando

100+ Interesting Data Sets for Statistics - rs.io

Looking for interesting data sets? Here's a list of more than 100 of the best ones, from dolphin relationships to political campaign donations to death row prisoners.

Computer model generates automatic captions for images

ukituki's insight:

CIFAR fellows have created a machine learning system that generates captions for images from scratch, scanning scenes and putting together a sentence to describe what it sees. 

 

“Instead of it being in French, now it’s in images” 


Enabling Collaboration Workflows for Datasets Using Dat - YouTube

BIDS Data Science Lecture Series | April 10, 2015 | 1:00-2:30 p.m. | 190 Doe Library, UC Berkeley Speaker: Max Ogden, Computer Programmer, US Open Data Spons...

Algorithms and Accountability Conference | NYU School of Law

Scholars, stakeholders, and policymakers question the adequacy of existing mechanisms governing algorithmic decision-making and grapple with new challenges presented by the rise of algorithmic power in terms of transparency, fairness, and equal treatment. Algorithms increasingly shape our news, economic options, and educational trajectories. The centrality of algorithmic decision-making, and the concerns about it, have only increased since we hosted the Governing Algorithms conference in May 2013. This event built upon that conversation to address legal, policy, and ethical challenges related to algorithmic power in three specific contexts: media production and consumption, commerce, and education.