Data is big
Scooped by ukituki
onto Data is big
Scoop.it!

The Signal and the Noise

Stats guru and political forecaster Nate Silver reveals why most predictions fail, and shows how we can isolate a true "signal" from a universe of increasingly big and noisy data.
"The future is here. It's just not evenly distributed yet." - William Gibson :::: Follow this topic for fresh resources and ideas related to Data Science, Machine Learning, Algorithms and #bigdata :::: http://www.dataisbig.co/
Curated by ukituki

Replay: Reproducible data analysis with the checkpoint package

Thanks to all who attended my webinar earlier this week, Reproducibility with Revolution R Open and the Checkpoint Package. If you missed the live session, you can catch up with the slides and video replay which I've embedded below. If you just want to check out the demo of the checkpoint package, it starts at 18:30 in the video below. If you want to follow along at home, you can download the demo script here. Revolution Analytics webinars: Reproducibility with Revolution R Open and the Checkpoint Package

Deep Learning, NLP, and Representations - colah's blog

This post reviews some extremely remarkable results in applying deep neural networks to natural language processing (NLP). In doing so, I hope to make accessible one promising answer as to why deep neural networks work. I think it's a very elegant perspective.

ukituki's insight:

http://colah.github.io/posts/2014-07-NLP-RNNs-Representations/

Reducing your R memory footprint by 7000x

R is notoriously a memory heavy language. I don't necessarily think this is a
bad thing--R wasn't built to be super performant, it was built for analyzing
data! That said, there are times when there are some implementation patterns
that are quite...redundant. As an example, I'm going to show you how you can
prune a 330 MB glm to 45KB without losing significant functionality.


Let's trim the R fat
Le Model
Our model is going ...

Deep Image: Scaling up Image Recognition. Slides of Ren Wu, Scientist @BaiduResearch, #deeplearning

Pinnability: Machine learning in the home feed

Our unique data set contains abundant human-curated content, so that Pin, board and user dynamics provide informative features for accurate Pinnability prediction. These features fall into three general categories: Pin features, Pinner features and interaction features:

Pin features capture the intrinsic quality of a Pin, such as historical popularity, Pin freshness and likelihood of spam. Visual features from Convolutional Neural Networks (CNN) are also included.
Pinner features are about the particulars of a user, such as how active the Pinner is, gender and board status.
Interaction features represent the Pinner’s past interaction with Pins of a similar type.
Rescooped by ukituki from Data Science

Disruptive Tools In The Data Science Toolkit (Dr. Gurjeet Singh) - Exponential Finance 2014

Dr. Gurjeet Singh of Ayasdi, named Fast Company's 2014 Most Innovative Company in Big Data, addresses the cutting edge of big data and how machine learning/b...

Via Karlo Jara

Trick to enhance power of Regression model

We, as analysts, specialize in optimizing already optimized processes. As the optimization gets finer, the opportunity to improve the process gets thinner. One of the most frequently used predictive modeling techniques is regression (linear or logistic). An equally competitive technique (typically considered a challenger) is the decision tree.

Algorithmia Launches With More Than 800 Algorithms On Its Marketplace

Algorithmia, the startup that raised $2.4 million last August to connect academics building powerful algorithms and the app developers who could put them to use, just brought its marketplace out of private beta.

Step by Step Guide to Learn #DataScience on R

This learning path for R provides a step-by-step guide to becoming a data scientist using R. The path includes exercises, tutorials and best practices.

Top 50 Data Science Resources: The Best Blogs, Forums, Videos and Tutorials to Learn All about Data Science

Machine Learning Done Wrong

Statistical modeling is a lot like engineering.

In engineering, there are various ways to build a key-value store, and each design makes a different set of assumptions about the usage pattern. In statistical modeling, there are various algorithms to build a classifier, and each algorithm makes a different set of assumptions about the data.

ukituki's insight:

When dealing with small amounts of data, it’s reasonable to try as many algorithms as possible and to pick the best one since the cost of experimentation is low. But as we hit “big data”, it pays off to analyze the data upfront and then design the modeling pipeline (pre-processing, modeling, optimization algorithm, evaluation, productionization) accordingly.

Flowbox.io & Luna - amazing new programming language

Wojciech Danilo from flowbox.io talks about their new programming language, Luna. Flowbox develops professional video compositing software, which is powered by a new programming language...

Retail Forecasting: Step 1 of 6, data preprocessing

Accurate and timely forecasting drives success in the retail business. It is an essential enabler of supply and inventory planning, product pricing, promotion, and placement. As part of its Azure ML offering, Microsoft provides a template that lets data scientists easily build and deploy a retail forecasting solution.

Data Science Specialization Course Notes by sux13

I have compiled notes for all 9 courses of the Johns Hopkins University/Coursera Data Science Specialization. The notes are all written in R Markdown format and encompass all concepts covered in class, as well as additional examples I have compiled from lecture, my own exploration, StackOverflow, and Khan Academy. 

ukituki's insight:

These documents are intended to be comprehensive sources of reference for future use and they have served me wonderfully in completing the assignments for each course. So I hope you will find them helpful as well.

Machine Learning - on demand version - Stanford University (Coursera)

Machine Learning from Stanford University. Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it.  
ukituki's insight:

The on demand format allows you to work through the materials at your own pace. All materials are available at any time, and there are no deadlines for exercises or assignments.

 

If you joined Machine Learning in a previous session but didn’t quite complete the coursework, I hope you’ll consider revisiting the materials in the on demand format. You can now take as much time as you need to fully understand each lesson and complete each assignment successfully.

Guide to Data Science Cheat Sheets

ukituki's insight:

Selection of the most useful Data Science cheat sheets, covering SQL, Python (including NumPy, SciPy and Pandas), R (including Regression, Time Series, Data Mining), MATLAB, and more.

Rescooped by ukituki from Big Data Analysis in the Clouds

Power to the new people analytics | McKinsey & Company

Techniques used to mine consumer and industry data may also let HR tackle employee retention and dissatisfaction. A McKinsey Quarterly article.

Via Ángel Yustas Domínguez, Klaus Meschede, Pierre Levy

CDO = IS + IG + IR + IE | Blog post

ukituki's insight:

Capgemini’s 2015 survey of 1,000 senior decision-makers across nine industries and 10 countries revealed that some 43% of organizations are restructuring to exploit data opportunities. Encouragingly, 33% of the surveyed companies have appointed a Chief Data Officer (CDO) or a similar C-level role to lead and exploit such data opportunities, with another 19% planning to do so over the next 12 months.

Comprehensive learning path – Data Science in Python

A comprehensive learning path to become a data scientist using Python. Topics include machine learning, deep learning & pandas on Python.

Collaborative Computing with distcomp

A new R package available on GitHub from a group of Stanford researchers has the potential to significantly advance the practice of collaborative computing with large data sets distributed over separate sites that may be unwilling to explicitly share data. The fundamental idea is to be able to rapidly set up a web service based on Shiny and OpenCPU technology that manages and performs a series of master / slave computations which require sharing only intermediate results.

ukituki's insight:

The particular target application for distcomp is any group of medical researchers who would like to fit a statistical model using the data from several data sets, but face daunting difficulties with data aggregation or are constrained by privacy concerns. Distcomp and its methodology, however, ought to be of interest to any organization with data spread across multiple heterogeneous database environments.
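
To make "sharing only intermediate results" concrete, here is a minimal sketch of the general idea, not distcomp's actual API or methodology. It is written in Python rather than R, and every name in it (`SiteSummary`, `summarize_site`, `fit_from_summaries`) is hypothetical. It assumes the simplest possible model, a one-predictor least-squares line: each site reduces its private rows to five sums, and the coordinating node pools those sums to recover the same fit it would get from the combined raw data.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class SiteSummary:
    """Intermediate results a site shares; no raw rows leave the site."""
    n: int
    sum_x: float
    sum_y: float
    sum_xx: float
    sum_xy: float


def summarize_site(x: Sequence[float], y: Sequence[float]) -> SiteSummary:
    """Run locally at each site: reduce private data to five sums."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return SiteSummary(
        n=len(x),
        sum_x=sum(x),
        sum_y=sum(y),
        sum_xx=sum(v * v for v in x),
        sum_xy=sum(a * b for a, b in zip(x, y)),
    )


def fit_from_summaries(summaries: Iterable[SiteSummary]) -> Tuple[float, float]:
    """Run at the coordinating node: pool the sums and solve for the line.

    Returns (intercept, slope) of the least-squares fit over all sites.
    """
    summaries = list(summaries)
    n = sum(s.n for s in summaries)
    sx = sum(s.sum_x for s in summaries)
    sy = sum(s.sum_y for s in summaries)
    sxx = sum(s.sum_xx for s in summaries)
    sxy = sum(s.sum_xy for s in summaries)

    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x has no variance across sites; slope is undefined")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


if __name__ == "__main__":
    # Two sites with private data; only the summaries are exchanged.
    site_a = summarize_site([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
    site_b = summarize_site([4.0, 5.0], [9.0, 11.0])
    print(fit_from_summaries([site_a, site_b]))
```

The pooled sums are sufficient statistics for this model, so the coordinator loses nothing by never seeing individual rows. Models without such closed-form summaries generally need several rounds of exchange, which is presumably what the "series of" computations in the description refers to.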

Book: Data Science in R

A Case Studies Approach to Computational Reasoning and Problem Solving
Deborah Nolan, Duncan Temple Lang
Forthcoming April 2, 2015, from Chapman and Hall/CRC


ukituki's insight:

Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving illustrates the details involved in solving real computational problems encountered in data analysis. It reveals the dynamic and iterative process by which data analysts approach a problem and reason about different ways of implementing solutions.
