Things about Data Analytics
Articles I found on the web focusing on Data Mining, Big Data, BI and Visual Analytics
Scooped by Roberto Rösler

Ideas on interpreting machine learning

You’ve probably heard by now that machine learning algorithms can use big data to predict whether a donor will give to a charity, whether an infant in a NICU will develop sepsis, whether a customer will respond to an ad, and on and on. Machine learning can even drive cars and predict elections. ... Err, wait. Can it? I believe it can, but these recent high-profile hiccups should leave everyone who works with data (big or not) and machine learning algorithms asking themselves some very hard questions: do I understand my data? Do I understand the model and answers my machine learning algorithm is giving me? And do I trust these answers? Unfortunately, the complexity that bestows the extraordinary predictive abilities on machine learning algorithms also makes the answers the algorithms produce hard to understand, and maybe even hard to trust.
Roberto Rösler's insight:
One of the most comprehensive articles on this topic, with a lot of ideas for your own experiments.
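As one concrete, model-agnostic way to start answering "what is my model relying on?", here is a minimal permutation-importance sketch in NumPy: shuffle one input column at a time and measure how much the score drops. It is not taken from the article; `predict`, `metric` and the toy model are hypothetical placeholders. scikit-learn offers a full implementation as `sklearn.inspection.permutation_importance`.

```python
import numpy as np


def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Average drop in `metric` when each column of X is shuffled independently.

    `predict` maps an (n, d) array to predictions; `metric(y_true, y_pred)`
    returns a score where higher is better.
    """
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between column j and the target, keep its distribution
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - metric(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances


def toy_model(A):
    """Hypothetical model that only uses the first column."""
    return 3 * A[:, 0]


def neg_mse(y_true, y_pred):
    """Negative mean squared error, so that higher is better."""
    return -np.mean((y_true - y_pred) ** 2)


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0]
imp = permutation_importance(toy_model, X, y, neg_mse)
```

A column the model ignores gets an importance of zero, while shuffling a column it depends on causes a large drop in score.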
Scooped by Roberto Rösler

Practical Guide to implementing Neural Networks in Python (using Theano)

In this article, I’ll provide a comprehensive practical guide to implementing Neural Networks using Theano. If you are here just for the Python code, feel free to skip sections and learn at your own pace. And if you are new to Theano, I suggest you follow the article sequentially to gain a complete understanding.
Roberto Rösler's insight:
Extremely well-written introduction showing the fundamentals behind neural networks in detail. Recommended if you'd like to know a little more than just how to apply them.
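To make those fundamentals concrete without installing Theano, here is a minimal sketch of the same ideas in plain NumPy (a substitution, not the article's code): a forward pass, backpropagation written out by hand, and a gradient-descent update, on the toy XOR problem. Layer size, learning rate and epoch count are arbitrary assumptions.

```python
import numpy as np


def train_xor(hidden=4, lr=0.5, epochs=5000, seed=0):
    """Train a 2-layer net (tanh hidden layer, sigmoid output) on XOR
    with full-batch gradient descent on binary cross-entropy."""
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)
    W1 = rng.normal(size=(2, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward pass
        h = np.tanh(X @ W1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
        p_safe = np.clip(p, 1e-12, 1 - 1e-12)  # keep log() finite
        losses.append(float(-np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe))))
        # Backward pass: for sigmoid + cross-entropy, dL/dlogit = (p - y) / n
        d_logit = (p - y) / len(X)
        dW2 = h.T @ d_logit
        db2 = d_logit.sum(axis=0)
        d_h = (d_logit @ W2.T) * (1 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
        dW1 = X.T @ d_h
        db1 = d_h.sum(axis=0)
        # Gradient-descent update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


params, losses = train_xor()
```

The loss in `losses` should trend downward; with so few hidden units, some random seeds may converge slowly or stall.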
Scooped by Roberto Rösler

Piping in R and in Pandas - FastML

In the R community, there’s this one guy, Hadley Wickham, who by himself made R great again. One of the many, many things he came up with - so many they call it a hadleyverse - is the dplyr package, which aims to make data analysis easy and fast. It works by allowing a user to take a data frame and apply to it a pipeline of operations resulting in a desired outcome (an example in just a minute). This approach is a good match for the mental model some data scientists have and turned out to be successful. Then people ported key pieces to Pandas.

Roberto Rösler's insight:
I didn't know these Python modules. pandas-ply lacks some functionality: for example, you cannot reference a computation within the same call, as in "mutate(df, var_x = var_z, var_y = var_x + 1)". Will test dplython soon...
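For comparison, plain pandas (neither pandas-ply nor dplython) supports a similar pipeline through method chaining. Since pandas 0.23, `DataFrame.assign` evaluates its keyword arguments in order, so a later callable can use a column created earlier in the same call, which covers the mutate case mentioned above. Column names and data are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"var_z": [1, 2, 3], "group": ["a", "b", "a"]})

result = (
    df
    # var_y refers to var_x, which is created earlier in the same assign() call
    .assign(var_x=lambda d: d["var_z"],
            var_y=lambda d: d["var_x"] + 1)
    .query("var_y > 2")                               # like dplyr::filter
    .groupby("group", as_index=False)["var_y"].sum()  # like group_by + summarise
)
print(result)
```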
Scooped by Roberto Rösler

Better Naive Bayes: 12 Tips To Get The Most From The Naive Bayes Algorithm - Machine Learning Mastery

Naive Bayes is a simple and powerful technique that you should be testing and using on your classification problems. It is simple to understand, gives good results and is fast to build a model and make predictions. For these reasons alone you should take a closer look at the algorithm. In a recent blog post, you learned how to implement the Naive Bayes algorithm from scratch in python. In this post you will learn tips and tricks to get the most from the Naive Bayes algorithm.
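The tips themselves are not listed in the excerpt. Two practices commonly recommended for Naive Bayes are summing log-probabilities instead of multiplying raw probabilities (to avoid floating-point underflow) and Laplace (add-one) smoothing, so that a feature value never seen with a class does not zero out that class. A minimal from-scratch sketch for categorical features, with a made-up toy dataset:

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model; alpha is the Laplace smoothing constant."""
    class_counts = Counter(labels)
    counts = defaultdict(lambda: defaultdict(Counter))  # counts[class][feature][value]
    values = defaultdict(set)                           # distinct values seen per feature
    for row, label in zip(rows, labels):
        for j, v in enumerate(row):
            counts[label][j][v] += 1
            values[j].add(v)
    return class_counts, counts, values, alpha


def predict_nb(model, row):
    """Return the class with the highest log-posterior score."""
    class_counts, counts, values, alpha = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / n)  # log prior
        for j, v in enumerate(row):
            k = len(values[j])
            # Smoothed likelihood: an unseen value gets alpha / (n_label + alpha * k), never 0
            score += math.log((counts[label][j][v] + alpha) / (n_label + alpha * k))
        if score > best_score:
            best, best_score = label, score
    return best


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("rainy", "cool")))  # yes
```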
Scooped by Roberto Rösler

Parallel Coordinates via Pivot and LOD Expressions

Parallel coordinates are a useful chart type for comparing a number of variables at once across a dimension. They aren’t a native chart type in Tableau, but they have been built at different times; here’s one by Joe Mako that I use in this post for the data and basic chart. The data is a set of vehicle attributes from the 1970s; I first saw it used in this post from Robert Kosara. This post updates the method Joe used with two enhancements that make the parallel coordinates plot easier to create and more extensible: pivot and Level of Detail Expressions.
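The Tableau steps do not translate directly to code, but the data shaping behind them can be sketched in pandas as an analog (not Tableau's implementation): pivot the wide table into one row per vehicle and attribute, then rescale each attribute to [0, 1] using that attribute's own min and max, the kind of per-attribute computation a FIXED LOD expression can provide. The vehicle rows are hypothetical.

```python
import pandas as pd

# Hypothetical vehicle data in "wide" form: one column per attribute
cars = pd.DataFrame({
    "name": ["car A", "car B", "car C"],
    "mpg": [18.0, 30.0, 24.0],
    "horsepower": [150, 70, 110],
    "weight": [3500, 2000, 2750],
})

# Step 1: pivot to "long" form, one row per (vehicle, attribute) pair
long = cars.melt(id_vars="name", var_name="attribute", value_name="value")

# Step 2: min and max per attribute, computed over all vehicles for that attribute
grp = long.groupby("attribute")["value"]
lo, hi = grp.transform("min"), grp.transform("max")

# Step 3: put every attribute on a common 0..1 axis
long["scaled"] = (long["value"] - lo) / (hi - lo)
print(long[long["name"] == "car C"])
```

For a quick static plot of the wide table, pandas also ships `pandas.plotting.parallel_coordinates`.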
Scooped by Roberto Rösler

Cause & Effect : A different way to explore temporal data in Tableau with R

Whether it is forecasting your quarterly sales or comparing historical data, working with time series data is big part of business analytics. Sometimes patterns are easy to see, while in others they might be elusive especially when comparing different time series with short/frequent cyclical patterns. A common question could be understanding the impact of a…
Roberto Rösler's insight:
Another post showing how R can add advanced analytics capabilities to Tableau.
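The excerpt is cut off before the method is described. One common way to check whether one time series leads another is lagged (cross-) correlation; R's `ccf()` computes a closely related quantity. A minimal NumPy sketch, assuming two equally spaced series of the same length:

```python
import numpy as np


def lagged_correlation(x, y, max_lag):
    """Pearson correlation of x[t] with y[t + lag] for each lag in [-max_lag, max_lag].

    A peak at a positive lag suggests (but does not prove) that movements
    in x tend to precede movements in y.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        result[lag] = np.corrcoef(a, b)[0, 1]
    return result


# Synthetic example: y repeats x three steps later
t = np.arange(200)
x = np.sin(2 * np.pi * t / 20)
y = np.roll(x, 3)
corr = lagged_correlation(x, y, max_lag=10)
best_lag = max(corr, key=corr.get)  # 3
```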
Scooped by Roberto Rösler

Get off the deep learning bandwagon and get some perspective - PyImageSearch

The following rant is actually more of an indictment of how we treat current “hot” machine learning algorithms — like “silver bullets” and the magic pill to cure our classification ailments. But these algorithms are not silver bullets, they are not magic pills, and they are not tools in a toolbox — they are methodologies backed by rational thought processes with assumptions regarding the datasets they are applied to. By spending a little bit more time thinking about the actual problem rather than blindly throwing a bunch of algorithms at the wall and seeing what sticks, I believe that we can only further the research.
Roberto Rösler's insight:
Even machine learning isn't free of the "silver bullet" belief. A nice article about all the algorithms that get announced as solving most ML tasks once and for all...
Scooped by Roberto Rösler

Deep Learning 101


Deep learning has become something of a buzzword in recent years with the explosion of 'big data', 'data science', and their derivatives mentioned in the media. Justifiably, deep learning approaches have recently blown other state-of-the-art machine learning methods out of the water for standardized problems such as the MNIST handwritten digits dataset. My goal is to give you a layman's understanding of what deep learning actually is so you can follow some of my thesis research this year as well as mentally filter out news articles that sensationalize these buzzwords.

Roberto Rösler's insight:

Good introduction that maps the properties of deep learning to its main statistical concepts without losing the reader in a big mess of formulas.

Scooped by Roberto Rösler

Impala vs Hive: Difference between Sql on Hadoop components


Hadoop has continued to grow and develop ever since it was introduced in the market 10 years ago. Every new release and abstraction on Hadoop is used to improve one drawback or another in data processing, storage and analysis. Apache Hive was introduced by Facebook to manage and process large datasets in Hadoop's distributed storage. Apache Hive is an abstraction on Hadoop MapReduce and has its own SQL-like language, HiveQL. Cloudera Impala was developed to resolve the limitations posed by the low interactivity of SQL on Hadoop. Cloudera Impala provides low-latency, high-performance SQL-like queries to process and analyze data, on one condition: the data must be stored on Hadoop clusters.

Scooped by Roberto Rösler

Seven Techniques for Data Dimensionality Reduction


The recent explosion of data set size, in number of records and attributes, has triggered the development of a number of big data platforms as well as parallel data analytics algorithms. At the same time, though, it has pushed for the use of data dimensionality reduction procedures. Indeed, more is not always better. Large amounts of data can sometimes lead to worse performance in data analytics applications.

One of my most recent projects happened to be about churn prediction, using the large data set from the 2009 KDD Challenge. What makes this data set special is its very high dimensionality: 15K data columns. Most data mining algorithms are implemented column-wise, which makes them slower and slower as the number of data columns grows. The first milestone of the project was therefore to reduce the number of columns in the data set while losing as little information as possible.

Using the project as an excuse, we started exploring the state-of-the-art on dimensionality reduction techniques currently available and accepted in the data analytics landscape.

Roberto Rösler's insight:

Interesting to see that "basic methods" do such a good job at dimensionality reduction (even though the comparison was carried out on only a single dataset).
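Simple column filters are typical examples of such basic methods. A minimal pandas sketch of three of them (missing-value ratio, low variance, high pairwise correlation); the thresholds are illustrative assumptions, not values from the article:

```python
import numpy as np
import pandas as pd


def basic_column_filters(df, max_missing=0.4, min_variance=1e-3, max_corr=0.95):
    """Apply three simple column filters in sequence and return the reduced frame."""
    # 1. Missing-value ratio: keep columns with at most `max_missing` share of NaNs
    df = df.loc[:, df.isna().mean() <= max_missing]
    # 2. Low variance: drop near-constant numeric columns
    #    (variance depends on scale, so normalize features first in practice)
    variances = df.select_dtypes("number").var()
    df = df.drop(columns=variances[variances < min_variance].index)
    # 3. High correlation: for each pair above `max_corr`, drop the later column
    corr = df.select_dtypes("number").corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > max_corr).any()]
    return df.drop(columns=to_drop)


# Hypothetical data
data = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.1],          # almost a multiple of "a"
    "c": [1.0, 1.0, 1.0, 1.0, 1.0],           # constant
    "d": [np.nan, np.nan, np.nan, 1.0, 2.0],  # 60% missing
    "e": [5.0, 3.0, 4.0, 1.0, 2.0],
})
print(list(basic_column_filters(data).columns))  # ['a', 'e']
```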

Scooped by Roberto Rösler

10 more lessons learned from building Machine Learning systems

Presentation at #mlconf 2015 in San Francisco
Roberto Rösler's insight:

Interesting for real-world machine learning...

Martin McGaha's curator insight, March 4, 2016 5:33 AM

Interesting for REAL WORLD machine learning ...

Scooped by Roberto Rösler

DIY Chord Diagrams in Tableau - by Noah Salvaterra

Can I make this in Tableau?… You can now.

Chord Diagrams are a chart type often used for visualizing network flows that represent migration within a system; the one on the left, for example, shows changes in cell phone choice for a collection of users. I’ll introduce the chart type a bit more carefully later and provide a reference, but by the end of this post my hope is that you’ll have a sense of how you might build this type of chart with your own data in Tableau.

Joe Mako suggested that he might visualize the data from this post in a more effective way. I think that would be a great follow-up, and I certainly welcome that sort of feedback. I’m fine with the possibility that something better could turn up. The data underlying the chord diagrams in this post is included at the end, so feel free to have a crack at it. My opinion may not be that convincing on this topic, but the alternatives I’ve seen so far, such as stacked bars, heat maps or, in the case of geographic data, just throwing it all on a map and hoping it makes sense, have their own problems as well. What I think most folks can agree on is that Chord Diagrams aren’t the right choice for every visualization. Searching the internet, I found more bad examples than good ones, so I chose the examples here with some care. My hope is that by enabling chord diagrams in Tableau, we can allow some experimentation and maybe establish some guidelines for when they might be a good option, even if that answer turns out to be never.
Roberto Rösler's insight:

Wow... I recently created chord charts with D3 (because there is a ready-made template for this), but I had never imagined that this is also possible in Tableau.

My experience using chord charts in practice: people are easily impressed, but the chart is not easily understood.
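One building block of any chord diagram is the outer ring: each group gets an arc whose length is proportional to its total flow. A minimal sketch of that layout step in Python, assuming a square flow matrix whose rows are the "from" groups and a fixed gap between arcs; the brand names and counts are hypothetical, and this is not the method from the Tableau post.

```python
import math


def chord_arcs(matrix, labels, pad=0.02):
    """Return {label: (start_angle, end_angle)} in radians for the outer ring.

    Each group's arc is proportional to its row total (its outgoing flow);
    `pad` is a fixed gap in radians after every arc.
    """
    totals = [sum(row) for row in matrix]
    grand_total = sum(totals)
    usable = 2 * math.pi - pad * len(labels)  # full circle minus all gaps
    arcs, angle = {}, 0.0
    for label, total in zip(labels, totals):
        span = usable * total / grand_total
        arcs[label] = (angle, angle + span)
        angle += span + pad
    return arcs


# Hypothetical switches between phone brands: flows[i][j] = users moving from i to j
flows = [[10, 5, 5],
         [2, 6, 2],
         [3, 2, 5]]
arcs = chord_arcs(flows, ["Brand A", "Brand B", "Brand C"])
```

The chords themselves would then be placed inside these arcs, with widths proportional to the individual matrix entries.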

Scooped by Roberto Rösler

Scatter Plots with Marginal Densities - An Example for Doing Exploratory Data Analysis with Tableau and R

Introduction

One of the first stages in most data analysis projects is exploring the data at hand. During this stage the analyst tries to get familiar with the dataset by looking at summary statistics, feature distributions and relationships between different attributes, to name just the key tasks. It is a really important step before starting hypothesis testing and statistical modeling, as it gives important insight into what can be done with the data and where we should expect problems. For example, a discrete target attribute whose labels are extremely unevenly distributed (rare events) should guide our choice of modeling and data prep technique. Or if we detect an independent feature that is highly correlated with the target, this indicates a good candidate for feature selection. Visualizations are the key technique used within exploratory data analysis, which in turn makes Tableau a natural fit for this stage.

Question

One of the most frequently used visualizations is the scatter plot. It is used for showing the relationship between two continuous features. A simple scatter plot can easily be enriched with more “features”, like showing the correlation, marginal density plots and histograms, groupings as well as trend lines. After reading this blog post by John Mount and Nina Zumel that shows exactly such an extended version of a scatter plot, I was wondering whether the same visualization can be created within Tableau and how we can use Tableau’s strengths in interactivity to make the plot even more useful.

 
Roberto Rösler's insight:

My new blog post.
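For readers working outside Tableau, a minimal matplotlib sketch of the same idea: a scatter plot with marginal histograms (rather than density curves) and the Pearson correlation annotated. The data is synthetic and the layout follows matplotlib's gridspec approach; this is not the implementation from the blog post.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np


def scatter_with_marginals(x, y, bins=30):
    """Scatter plot of x vs. y with marginal histograms and the Pearson r annotated."""
    fig = plt.figure(figsize=(6, 6))
    grid = fig.add_gridspec(2, 2, width_ratios=(4, 1), height_ratios=(1, 4),
                            wspace=0.05, hspace=0.05)
    ax = fig.add_subplot(grid[1, 0])
    ax_top = fig.add_subplot(grid[0, 0], sharex=ax)
    ax_right = fig.add_subplot(grid[1, 1], sharey=ax)

    ax.scatter(x, y, s=8, alpha=0.5)
    r = np.corrcoef(x, y)[0, 1]
    ax.annotate(f"r = {r:.2f}", xy=(0.05, 0.95), xycoords="axes fraction", va="top")

    # Marginal distributions, normalized so they read as densities
    ax_top.hist(x, bins=bins, density=True)
    ax_right.hist(y, bins=bins, density=True, orientation="horizontal")
    ax_top.tick_params(labelbottom=False)
    ax_right.tick_params(labelleft=False)
    return fig, r


rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.8 * x + rng.normal(scale=0.6, size=500)  # synthetic, true correlation 0.8
fig, r = scatter_with_marginals(x, y)
```

Calling `fig.savefig(...)` writes the figure to disk; the static image lacks the interactivity that motivated the Tableau version.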

Scooped by Roberto Rösler

The Neural Network Zoo

With new neural network architectures popping up every now and then, it’s hard to keep track of them all. Knowing all the abbreviations being thrown around (DCIGN, BiLSTM, DCGAN, anyone?) can be a bit overwhelming at first.

So I decided to compose a cheat sheet containing many of those architectures. Most of these are neural networks, some are completely different beasts. Though all of these architectures are presented as novel and unique, when I drew the node structures… their underlying relations started to make more sense.
Roberto Rösler's insight:
Sometimes a picture says more than lots of books and papers...
Scooped by Roberto Rösler

#Data16 Keynote: Tableau’s Three-Year Roadmap


We, like you, are data people. Our mission is to help people see and understand their data, because we know data can empower people to achieve great things. In short, we work toward our mission so you can achieve yours. It’s what drives all of us at Tableau. To do your best work, you need an analytics platform that allows you to make the most of all your data in your organization. This platform should answer deeper questions and scale as your usage increases all while keeping your data secure. And that’s where we come in. As we shared during the keynote at TC16, every part of our product roadmap is designed to empower you and your entire organization to make better decisions faster with data. Here is a small sample of what we have planned for the next three years.

Roberto Rösler's insight:
Great: Tableau Server on Linux, map layers and context analytics.
Scooped by Roberto Rösler

Visual Business Intelligence – The Myth of Self-Service Analytics

Exploring and analyzing data is not at all like pumping your own gas. We should all be grateful that when gas stations made the transition from full service to self service many years ago, they did not relegate auto repair to the realm of self service as well. Pumping your own gas involves a simple procedure that requires little skill.
Roberto Rösler's insight:
“If you want more human intelligence… get more humans” - Maggie Boden ...
Scooped by Roberto Rösler

How To Convert Tableau Files: The Conversion Tool Explained

Last week I posted a new Tableau conversion tool here for converting Tableau files from one version of Tableau to a previous version of Tableau, for example, converting a Tableau 9.3 file back to 9.2 or a Tableau 10 beta file back to Tableau 9.3. A big thank you to the Tableau community for all of the emails and tweets. Clearly this is something the community has been looking for and I hope it's helpful. I wanted to follow up with another post to explain how the conversion is done, provide further details about the tool itself and give a disclaimer. So here it goes.
Roberto Rösler's insight:
This looks very interesting if you need to open a workbook in an older Tableau version than the one it was created with. One reason could be that you want to upload it to a server that runs an older version of Tableau (your production server, for example) while you created it with the newest version of Tableau Desktop available.
Scooped by Roberto Rösler

How-to: Detect and Report Web-Traffic Anomalies in Near Real-Time - Cloudera Engineering Blog


This framework based on Apache Flume, Apache Spark Streaming, and Apache Impala (incubating) can detect and report on abnormal bad HTTP requests within seconds.
Website performance and availability are mission-critical for companies of all types and sizes, not just those with a revenue stream directly tied to the web. Web pages can become unavailable for many reasons, including overburdened backing data stores or content-management systems or a delay in load times of third-party content such as advertisements.
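The excerpt does not say how "abnormal" is defined. A minimal pure-Python sketch of one simple rule, without Flume or Spark: count bad responses (status 400 and above) per time window and flag a window whose count exceeds the mean plus k standard deviations of recent windows. The class name, `history`, `k` and the floor on the standard deviation are assumptions for illustration.

```python
import statistics
from collections import deque


class BadRequestMonitor:
    """Flag time windows with an unusually high number of bad HTTP responses."""

    def __init__(self, history=10, k=3.0, min_std=1.0):
        self.past = deque(maxlen=history)  # bad-request counts of recent windows
        self.k = k
        self.min_std = min_std  # floor so a perfectly flat history doesn't flag tiny bumps

    def observe(self, status_codes):
        """Consume one window of status codes; return (bad_count, is_anomalous)."""
        bad = sum(1 for code in status_codes if code >= 400)
        anomalous = False
        if len(self.past) >= 2:  # need two points for a standard deviation
            mean = statistics.mean(self.past)
            std = max(statistics.stdev(self.past), self.min_std)
            anomalous = bad > mean + self.k * std
        self.past.append(bad)  # note: anomalous windows also enter the history
        return bad, anomalous


monitor = BadRequestMonitor()
normal_window = [200] * 95 + [404] * 5
for _ in range(5):
    monitor.observe(normal_window)
print(monitor.observe([200] * 60 + [500] * 40))  # (40, True)
```

In a streaming setup, `observe` would be called once per micro-batch; keeping only a bounded history makes the baseline adapt to slow traffic changes.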

Scooped by Roberto Rösler

R, Python Duel As Top Analytics, Data Science software – KDnuggets 2016 Software Poll Results

Roberto Rösler's insight:
One of the few polls showing a realistic picture of the analytics software landscape and which tools really matter.
Scooped by Roberto Rösler

Dataflow/Beam & Spark: A Programming Model Comparison

With the programming model/SDK portion of Google Cloud Dataflow moving into an Apache Software Foundation incubator project, Apache Beam, we thought now would be a good time to discuss the unique features and capabilities that distinguish Dataflow from Apache Spark, from a strictly programming-model perspective.
Roberto Rösler's insight:
Writing your streaming job once, in one framework instead of several, and running it on different platforms sounds promising. Waiting for the first release of the Python API.
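A central idea in the Dataflow/Beam model is windowing by event time (when something happened) rather than by processing time (when it arrived). A minimal plain-Python sketch of fixed event-time windows; it is not Beam's API and ignores watermarks and triggers, and the events are made up:

```python
from collections import defaultdict


def fixed_window_sums(events, window_size):
    """Sum values per (key, event-time window); arrival order does not matter.

    `events` is an iterable of (key, value, event_time) with integer timestamps.
    """
    sums = defaultdict(int)
    for key, value, event_time in events:
        window_start = event_time - event_time % window_size
        sums[(key, window_start)] += value
    return dict(sums)


# Events arrive out of order: ("a", 3, 12) happened before ("a", 2, 61)
events = [("a", 1, 5), ("a", 2, 61), ("a", 3, 12), ("b", 4, 59)]
print(fixed_window_sums(events, window_size=60))
# {('a', 0): 4, ('a', 60): 2, ('b', 0): 4}
```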
Scooped by Roberto Rösler

Getting rid of “Year of” for Dates


The Tableau blog reposted my Fast Headers for Single Measure Tables post and I got the following question from Pamela Ann in response:

@tableau @jonathandrummey It's so simple! Thanks! Now how do I get a header to stop saying "Year of Year" instead of just "Year"?

— Pamela Ann (@JustPeachie05) November 20, 2015

Roberto Rösler's insight:

A small feature that's nice to know.

Martin McGaha's curator insight, March 4, 2016 5:34 AM

Small feature nice to know

Scooped by Roberto Rösler

Neural Networks, Manifolds, and Topology

Roberto Rösler's insight:

Recently, there’s been a great deal of excitement and interest in deep neural networks because they’ve achieved breakthrough results in areas such as computer vision.

However, there remain a number of concerns about them. One is that it can be quite challenging to understand what a neural network is really doing. If one trains it well, it achieves high quality results, but it is challenging to understand how it is doing so. If the network fails, it is hard to understand what went wrong.

While it is challenging to understand the behavior of deep neural networks in general, it turns out to be much easier to explore low-dimensional deep neural networks – networks that only have a few neurons in each layer. In fact, we can create visualizations to completely understand the behavior and training of such networks. This perspective will allow us to gain deeper intuition about the behavior of neural networks and observe a connection linking neural networks to an area of mathematics called topology.

A number of interesting things follow from this, including fundamental lower-bounds on the complexity of a neural network capable of classifying certain datasets.
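The topological argument rests on the fact that a layer of the form tanh(Wx + b) with an invertible weight matrix W is continuous and has a continuous inverse, so it can stretch and bend the data but not tear it apart. A small NumPy check of that invertibility on 2-D points; W, b and the points are arbitrary choices for illustration:

```python
import numpy as np

# Arbitrary invertible weights (det = 2.15) and bias for a 2-D -> 2-D layer
W = np.array([[1.0, 0.5],
              [-0.3, 2.0]])
b = np.array([0.1, -0.2])


def layer(points, W, b):
    """Apply h = tanh(W x + b) to each row x of `points`."""
    return np.tanh(points @ W.T + b)


def inverse_layer(h, W, b):
    """Undo the layer: x = W^{-1} (arctanh(h) - b), row by row."""
    return np.linalg.solve(W, (np.arctanh(h) - b).T).T


# Points on a circle of radius 0.5
angles = np.linspace(0, 2 * np.pi, 50, endpoint=False)
points = 0.5 * np.column_stack([np.cos(angles), np.sin(angles)])
recovered = inverse_layer(layer(points, W, b), W, b)
print(np.allclose(recovered, points))  # True
```

Because every such layer can be undone this way, a stack of them with two units per layer cannot, for instance, pull apart one class that is completely enclosed by another in the plane.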

Scooped by Roberto Rösler

From DevOps to DataOps – Data Science Central

I believe that it’s time for data engineers and data scientists to embrace a similar new discipline (let’s call it “DataOps”) that at its core addresses the needs of data professionals on the modern internet and inside the modern enterprise.
Roberto Rösler's insight:
So true...
Scooped by Roberto Rösler

Which Algorithm Family Can Answer My Question?

There are a few data science questions that seem to pop up a lot. They’re listed here, together with the best algorithm family. If you don’t see yours or one like it, let us know and we’ll add it. Several of these questions have links to sample experiments or working examples in the Azure ML Marketplace.

Roberto Rösler's insight:

This overview helps in discussions and when approaching a problem for the first time.

Scooped by Roberto Rösler

Data visualization essentials for data scientists, How to properly present a Data Mining project?

Hints on presenting a Data Mining project.