Data is big
"The future is here. It's just not evenly distributed yet." - William Gibson
Follow this topic for fresh resources and ideas related to Data Science, Machine Learning, Algorithms and #bigdata: http://www.dataisbig.co/
Curated by ukituki
Scooped by ukituki

Yahoo Releases the Largest-ever Machine Learning Dataset for Researchers

Today, we are proud to announce the public release of the largest-ever machine learning dataset to the research community. The dataset stands at a massive ~110B events (13.5TB uncompressed) of anonymized user-news item interaction data, collected by recording the user-news item interactions of about 20M users from February 2015 to May 2015.


Common Probability Distributions: The Data Scientist's Crib Sheet - Cloudera Engineering Blog

Data scientists have hundreds of probability distributions from which to choose. Where to start?
Data science, whatever it may be, remains a big deal. “A data scientist is better at statistics than any software engineer,” you may overhear a pundit say, at your local tech get-togethers and hackathons. The applied mathematicians have their revenge, because statistics hasn’t been this talked-about since the roaring 20s. They have their own legitimizing Venn diagram of which people don’t make fun.

Creative AI & multimodality: looking ahead

Lecture on Creative AI (Generative Deep Learning) at Imperial College London, 1 December 2015

Google TensorFlow Tutorial

TensorFlow Tutorial given by Dr. Chung-Cheng Chiu at Google Brain on Dec. 29, 2015 http://datasci.tw/event/google_deep_learning

Architecting Predictive Algorithms for Machine Learning

Machine learning is one of the newest tools in a Data Scientist’s arsenal. In this session, you will learn key architectural principles and frameworks for cr...

From academia to freelance #datascience

Data science freelancing (vs academia): freedom, impact, meritocracy, fast pace, money.

Deep Learning on Amazon EC2 GPU with Python and nolearn - PyImageSearch

Your GPU matters. A Lot. The GPUs included in most notebooks are optimized for power efficiency and not necessarily computational efficiency.

Machine Intelligence In The Real World

ukituki's insight:

“Panopticons” Collect A Broad Dataset
“Lasers” Collect A Focused Dataset
“Alchemists” Promise To Turn Your Data Into Gold
“Gateways” Create New Use Cases From Specific Data Types
“Magic Wands” Seamlessly Fix A Workflow
“Navigators” Create Autonomous Systems For The Physical World
“Agents” Create Cyborgs And Bots To Help With Virtual Tasks
“Pioneers” Are Very Smart


Cohort analysis: Retention Rate Visualization with R

When conducting Cohort Analysis, one of the most important measures is Customer Retention Rate. I will share a few ideas for visualizing this parameter in this post. Last year I shared several charts for Customer Retention Rate visualization in this post. However, it is always helpful to analyze and visualize both relative (Customer Retention Rate) and absolute values (number of customers in a cohort). For this, I have created charts that combine these values
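
The linked post works in R and its charts are not reproduced in this excerpt. As a rough illustration of the two quantities it contrasts, here is a minimal Python sketch that pairs the absolute number of customers in a cohort with the relative retention rate. The function name `retention_rates` and all counts are hypothetical, chosen only for illustration, and each cohort is assumed to have a non-zero starting size.

```python
def retention_rates(cohort_counts):
    """Return the relative retention per period for each cohort.

    cohort_counts maps a cohort label to the number of active customers
    in each period, starting with the period the cohort was acquired.
    Assumes every cohort has at least one customer in its first period.
    """
    rates = {}
    for cohort, counts in cohort_counts.items():
        base = counts[0]  # cohort size in its first period
        rates[cohort] = [n / base for n in counts]
    return rates


# Hypothetical counts, for illustration only.
cohorts = {
    "2015-01": [200, 120, 90],
    "2015-02": [150, 75],
}
rates = retention_rates(cohorts)

# Print absolute and relative values side by side.
for cohort, counts in cohorts.items():
    for period, (n, r) in enumerate(zip(counts, rates[cohort])):
        print(f"{cohort} period {period}: {n} customers, {r:.0%} retained")
```

Keeping the raw counts next to the rates makes it easier to see when a high retention rate comes from a very small cohort.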


Learning Simple Algorithms from Examples

We present a neural network based framework to learn algorithms from examples. We tackle problems like copying, reversing sequences, multi-digit addition


The Evolution of Distributed Programming in R

R and distributed programming rank highly on my list of “good things”, so imagine my delight when two new packages, ddR and multidplyr

ukituki's insight:

Distributed programming is normally taken up for a variety of reasons:

To speed up a process or piece of code
To scale up an interface or application for multiple users

There has been a huge appetite for this in the R community for a long time so my first thought was “Why now? Why not before?”


Deep Learning for Natural Language Processing: Word Embeddings

Guest Lecture on NLP & Deep Learning (Word Embeddings) at the course Language technology at KTH, Stockholm, 3 December 2015

The Traveling Salesman with Simulated Annealing, R, and Shiny

I built an interactive Shiny application that uses simulated annealing to solve the famous traveling salesman problem. 
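
The original application is written in R and Shiny and its code is not shown in this excerpt. The following is a minimal Python sketch of simulated annealing on the traveling salesman problem, not the author's implementation. The segment-reversal move, the geometric cooling schedule, and every parameter value (`steps`, `t_start`, `t_end`, `seed`) are assumptions made for illustration.

```python
import math
import random


def tour_length(tour, cities):
    """Total length of the closed tour visiting cities in the given order."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def anneal(cities, steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Search for a short tour; returns (best_tour, best_length)."""
    rng = random.Random(seed)
    tour = list(range(len(cities)))
    rng.shuffle(tour)
    current = tour_length(tour, cities)
    best, best_len = tour[:], current
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        # Propose a neighbour by reversing a random segment of the tour.
        i, j = sorted(rng.sample(range(len(tour)), 2))
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        cand_len = tour_length(candidate, cities)
        delta = cand_len - current
        # Always accept improvements; accept a worse tour with
        # probability exp(-delta / temp), which shrinks as temp falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            tour, current = candidate, cand_len
            if current < best_len:
                best, best_len = tour[:], current
    return best, best_len


# Hypothetical input: the four corners of a unit square.
cities = [(0, 0), (0, 1), (1, 1), (1, 0)]
best, best_len = anneal(cities, steps=2000)
print(best, round(best_len, 3))
```

Accepting some worse tours early on is what lets the search escape local minima; as the temperature drops, the procedure behaves more and more like plain hill climbing.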


[1512.02595] Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our application of HPC techniques, resulting in a 7x speedup over our previous system. Because of this efficiency, experiments that previously took weeks now run in days. This enables us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.

A Bayesian Approach to Monitoring Process Change - Part 1

In this series of articles we discuss the use of probabilistic Bayesian modelling to help efficiently evaluate changes to business processes.
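
The series' own model is not shown in this excerpt. As one common way to frame such a question, the Python sketch below compares a success rate before and after a process change using a Beta-Binomial model and estimates the probability that the rate went up. The uniform Beta(1, 1) priors, the function name `prob_rate_increased`, and all counts are assumptions for illustration, not taken from the articles.

```python
import random


def prob_rate_increased(before, after, draws=20000, seed=0):
    """Estimate P(rate_after > rate_before) under independent Beta(1, 1) priors.

    before and after are (successes, trials) pairs.
    """
    rng = random.Random(seed)
    (s0, n0), (s1, n1) = before, after
    wins = 0
    for _ in range(draws):
        # With a Beta(1, 1) prior, the posterior of a binomial rate is
        # Beta(1 + successes, 1 + failures); sample both posteriors.
        p0 = rng.betavariate(1 + s0, 1 + n0 - s0)
        p1 = rng.betavariate(1 + s1, 1 + n1 - s1)
        wins += p1 > p0
    return wins / draws


# Hypothetical counts: 50 of 1000 successes before, 80 of 1000 after.
p = prob_rate_increased(before=(50, 1000), after=(80, 1000))
print(f"P(rate increased) ~ {p:.3f}")
```

The result reads directly as a probability that the change helped, which is often easier to act on than a p-value.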