Data Analytics
Scooped by Robinson Worley

New analytics could help measure content efficacy - eSchool News (registration)

Learning analytics have become a key feature within many school software programs.
Scooped by Robinson Worley

How Big Data Analytics is Speeding Cancer Tumor Profiling - A Smarter Planet Blog

By Dr. George Poste Cancer is a formidable foe. Oncologists have long known that cancers arising in different body organs, or in the same organ in different patients, progress and respond differently to treatment.
Robinson Worley's insight:
Combine this signature technology with ubiquitous electronic medical records and you would have a powerful weapon in the fight against cancer.
Scooped by Robinson Worley

How to Turn your Data into a New Product

Fast Forward Labs founder Hilary Mason spoke at Tableau’s user conference about understanding your data and the five questions to ask to move from data to product.
Scooped by Robinson Worley

Algorithms Make Better Predictions — Except When They Don’t

Analytics can't replace intuition.
Rescooped by Robinson Worley from Distributed Architectures

Call me maybe: Elasticsearch

Previously, on Jepsen, we saw RabbitMQ throw away a staggering volume of data. In this post, we’ll explore Elasticsearch’s behavior under various types of network failure.

Elasticsearch is a distributed search engine, built around Apache Lucene–a well-respected Java indexing library. Lucene handles the on-disk storage, indexing, and searching of documents, while Elasticsearch handles document updates, the API, and distribution. Documents are written to collections as free-form JSON; schemas can be overlaid onto collections to specify particular indexing strategies.
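Lucene's core data structure for search is an inverted index, which maps each term to the documents that contain it. The toy Python sketch below indexes the top-level string fields of free-form JSON documents and answers AND queries; the class and function names are hypothetical, and it leaves out everything that makes Lucene fast (segments, compressed postings, relevance scoring).

```python
import json
import re
from collections import defaultdict

# Hypothetical toy inverted index: token -> set of ids of documents
# containing it. Lucene adds segments, compressed postings, scoring, etc.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.docs: dict[str, dict] = {}

    def add(self, doc_id: str, doc_json: str) -> None:
        """Store a free-form JSON document and index its top-level string fields."""
        doc = json.loads(doc_json)
        self.docs[doc_id] = doc
        for value in doc.values():
            if isinstance(value, str):
                for token in tokenize(value):
                    self.postings[token].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Return ids of documents containing every query token (AND semantics)."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result


idx = InvertedIndex()
idx.add("1", '{"title": "Network partitions in Elasticsearch"}')
idx.add("2", '{"title": "RabbitMQ under partitions", "year": 2014}')
print(sorted(idx.search("partitions")))                # ['1', '2']
print(sorted(idx.search("elasticsearch partitions")))  # ['1']
```

Non-string fields such as "year" are stored but not indexed in this sketch, loosely mirroring how a schema decides which fields get which indexing strategy.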

As with many distributed systems, Elasticsearch scales in two axes: sharding and replication. The document space is sharded–sliced up–into many disjoint chunks, and each chunk allocated to different nodes. Adding more nodes allows Elasticsearch to store a document space larger than any single node could handle, and offers quasilinear increases in throughput and capacity with additional nodes. For fault-tolerance, each shard is replicated to multiple nodes. If one node fails or becomes unavailable, another can take over. There are additional distinctions between nodes which can process writes, and those which are read-only copies–termed “data nodes”–but this is primarily a performance optimization.
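Elasticsearch routes a document to a shard by hashing a routing value (the document id by default) modulo the number of primary shards. The Python sketch below models that idea together with replica placement and failover. Assumptions, not Elasticsearch internals: zlib.crc32 stands in for the real hash function (chosen because, unlike Python's built-in hash() for strings, it is stable across processes), copies are placed round-robin, and the function names are hypothetical.

```python
import zlib
from typing import Optional

# Simplified model of sharding plus replication. Assumptions, not
# Elasticsearch internals: crc32 stands in for the real routing hash,
# and shard copies are placed round-robin across nodes.


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Route a document to a primary shard: hash(routing key) % shard count.

    crc32 is deterministic across processes, unlike Python's salted str hash().
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def allocate(num_primary_shards: int, replicas: int, nodes: list[str]) -> dict[int, list[str]]:
    """Map each shard to [primary_node, replica_node, ...], all on distinct nodes."""
    copies = 1 + replicas
    if copies > len(nodes):
        raise ValueError("need at least one node per copy of a shard")
    table = {}
    for shard in range(num_primary_shards):
        # Offsetting by the shard number spreads primaries across nodes;
        # consecutive positions keep a shard's copies on different nodes.
        table[shard] = [nodes[(shard + i) % len(nodes)] for i in range(copies)]
    return table


def surviving_copy(table: dict[int, list[str]], shard: int, failed: set[str]) -> Optional[str]:
    """Return the first copy of `shard` on a live node, a stand-in for failover."""
    for node in table[shard]:
        if node not in failed:
            return node
    return None


table = allocate(num_primary_shards=3, replicas=1, nodes=["n1", "n2", "n3"])
print(table)                                    # {0: ['n1', 'n2'], 1: ['n2', 'n3'], 2: ['n3', 'n1']}
print(surviving_copy(table, 0, failed={"n1"}))  # n2
print(shard_for("doc-42", 3))                   # a shard number in 0..2, same on every run
```

Because the shard count is baked into the modulus, changing the number of primary shards would remap most documents, which is why it is usually fixed when an index is created.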

Because index construction is a somewhat expensive process, Elasticsearch provides a faster, more strongly consistent database backed by a write-ahead log. Document creation, reads, updates, and deletes talk directly to this strongly-consistent database, which is asynchronously indexed into Lucene. Search queries lag behind the “true” state of Elasticsearch records, but should eventually catch up. One can force a flush of the transaction log to the index, ensuring changes written before the flush are made visible.
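The write path in this excerpt can be modeled as a single-process toy in Python: get() reads the log-backed store and sees a write at once, while search() only reflects changes after flush() replays pending log entries into the index. The class and method names are hypothetical (flush() simply follows the excerpt's wording), and the toy ignores replication and network failures entirely.

```python
from typing import Optional

# Toy, single-process model of the write path in the excerpt: writes are
# appended to a log and applied to a document store at once, while the
# search index only catches up when flush() replays pending log entries.
# All names are hypothetical.


class ToyStore:
    def __init__(self) -> None:
        self.log: list[tuple[str, Optional[str]]] = []  # (doc_id, text); text None = delete
        self.docs: dict[str, str] = {}                   # current document state
        self.index: dict[str, set[str]] = {}             # word -> doc ids (search view)
        self.indexed_upto = 0                            # log entries already indexed

    def put(self, doc_id: str, text: str) -> None:
        self.log.append((doc_id, text))  # log first, then apply
        self.docs[doc_id] = text

    def delete(self, doc_id: str) -> None:
        self.log.append((doc_id, None))
        self.docs.pop(doc_id, None)

    def get(self, doc_id: str) -> Optional[str]:
        return self.docs.get(doc_id)  # reads see every logged write

    def search(self, word: str) -> set[str]:
        return set(self.index.get(word.lower(), set()))  # may lag behind get()

    def flush(self) -> None:
        """Replay log entries not yet indexed, in order."""
        for doc_id, text in self.log[self.indexed_upto:]:
            for ids in self.index.values():  # drop stale postings for this doc
                ids.discard(doc_id)
            if text is not None:
                for word in text.lower().split():
                    self.index.setdefault(word, set()).add(doc_id)
        self.indexed_upto = len(self.log)


store = ToyStore()
store.put("a", "network partition")
print(store.get("a"))             # network partition
print(store.search("partition"))  # set()
store.flush()
print(store.search("partition"))  # {'a'}
```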

But this is Jepsen, where nothing works the way it’s supposed to. Let’s give this system’s core assumptions a good shake and see what falls out!

Via Nico
Nico's curator insight, June 20, 2014 10:54 AM

This article echoes the trouble we had with Elasticsearch partition tolerance. Our fix was to decrease the probability of a partition: running on high end servers.

Rescooped by Robinson Worley from Programming Stuffs

How to Become a Data Scientist

Summary:  If you are wondering how to become a Data Scientist or what that title really means, try these insights.

I got started in data science way back.  I’ve been a commercial predictive modeler since 2001 and as naming trends have changed I now identify myself as a Data Scientist.  No one gave me this title.  But by observing the literature, the job listings, and my peers in the field it was clear that Data Scientist communicated most clearly what my knowledge and experience have led me to become.

These days you can get a degree in data science so you can show your diploma that certifies your credentials.  But these are relatively new so, with all due respect, if you only recently got your degree you are still a beginner.  Those of us who use this title today most likely came from combination backgrounds of business, hard science, computer science, operations research, and statistics.

What you call yourself is one thing but what your employer or client is looking for can be quite a different kettle of fish.  A lot has been written about data scientists being as elusive as unicorns.  Not being a unicorn I’d say this sets the bar pretty high.  Additionally, as I’ve perused the job listings it is equally true that the title is used so loosely and with such little understanding that an ad for data scientist may actually describe an entry level analyst and some ads for analysts are looking for polymath data scientists. 

All of this confusion over what we’re called and what we actually do can make you downright schizophrenic.  This makes it all the more complicated to answer the frequent inquiries I get from folks still in school or early in their career about how to become a data scientist.

Imagine my surprise and delight when in the space of a week two publications came across my desk that not only cast new light and understanding on this question but also have helped me understand that there is not just one definition of data scientist, but a reasoned argument (based on statistical analysis) that there are in fact four types.

Four Types of Data Scientists

The information here comes from the O’Reilly paper “Analyzing the Analyzers” by Harris, Murphy, and Vaisman, 2013.  My hat’s off to these folks for their insightful survey and conclusions drawn by statistical analysis of those results.  This is a must read.  I was able to download this at no charge from http://www.oreilly.com/data/free/analyzing-the-analyzers.csp.

There are 40 pages of good analysis here, so this will be only the highest-level summary.  In short, they conclude there are four types of Data Scientists, differentiated not so much by the breadth of their knowledge, which is similar, but by their depth in specific areas and how each type prefers to interact with data science problems.

Data Businesspeople

Data Creatives

Data Developers

Data Researchers

By evaluating 22 specific skills and multi-part self-identification statements they cluster and generalize according to these descriptions.  I am betting you will recognize yourself in one of these categories.

Data Businesspeople are those that are most focused on the organization and how data projects yield profit. They were most likely to rate themselves highly as leaders and entrepreneurs, and the most likely to have reported managing an employee. They were also quite likely to have done contract or consulting work, and a substantial proportion have started a business. Although they were the least likely to have an advanced degree among respondents, they were the most likely to have an MBA. But Data Businesspeople definitely have technical skills and were particularly likely to have undergraduate Engineering degrees. And they work with real data — about 90% report at least occasionally working on gigabyte-scale problems. 

Data Creatives.  Data scientists can often tackle the entire soup-to-nuts analytics process on their own: from extracting data, to integrating and layering it, to performing statistical or other advanced analyses, to creating compelling visualizations and interpretations, to building tools to make the analysis scalable and broadly applicable. We think of Data Creatives as the broadest of data scientists, those who excel at applying a wide range of tools and technologies to a problem, or creating innovative prototypes at hackathons — the quintessential Jack of All Trades. They have substantial academic experience with about three-quarters having taught classes and presented papers. Common undergraduate degrees were in areas like Economics and Statistics. Relatively few Data Creatives have a PhD. As the group most likely to identify as a Hacker they also had the deepest Open Source experience with about half contributing to OSS projects and about half working on Open Data projects.

Data Developers.  We think of Data Developers as people focused on the technical problem of managing data — how to get it, store it, and learn from it. Our Data Developers tended to rate themselves fairly highly as Scientists, although not as highly as Data Researchers did. This makes sense particularly for those closely integrated with the Machine Learning and related academic communities. Data Developers are clearly writing code in their day-to-day work. About half have Computer Science or Computer Engineering degrees.  More Data Developers land in the Machine Learning/Big Data skills group than other types of data scientists.

Data Researchers.  One of the interesting career paths that leads to a title like “data scientist” starts with academic research in the physical or social sciences, or in statistics. Many organizations have realized the value of deep academic training in the use of data to understand complex processes, even if their business domains may be quite different from classic scientific fields. The majority of respondents whose top Skills Group was Statistics ended up in this category. Nearly 75% of Data Researchers have published in peer-reviewed journals and over half have a PhD.

What Does this Mean for Someone Seeking to Enter the Field?

So if I am a young person seeking to enter Data Science how are these descriptions useful?  It’s possible that you could train and develop an emphasis that would lead you into the Researcher, Developer, or Creative roles.  It is less likely that education alone will put you on the Businesspeople track which implies experiences in business, not just education.  But here’s what’s interesting.  According to Harris, Murphy, and Vaisman it’s not the skills that are different but the way we choose to emphasize them in our approach to Data Science problems.  Here’s their chart.

Via Alex Kantone
vitonzhang's curator insight, October 25, 12:06 AM
Scooped by Robinson Worley

Hadoop at a Crossroads?

A few facts and opinions and a couple of announcements, with a prediction on where the "Hadoop stack" might be going.
Scooped by Robinson Worley

Go Hadoop! Err, Hadoop and Go. - Hortonworks

Working with Hadoop and the Go language. (#Golang and #hadoop via YARN http://t.co/sXr9cIar0q)
Scooped by Robinson Worley

Hadoop YARN Installation: The definitive guide

This article guides you in the installation of the new generation Hadoop based on YARN. It is based on the most recent version of Hadoop at the time of this writing (2.2.0) and includes HDFS, YARN and MapReduce configurations for both single-node and cluster environments.
Scooped by Robinson Worley

13 Insanely Useful Predictive Analytics Resources - Radius

Digest:
1. TDWI Best Practices Report | Predictive Analytics for Business Advantage
2. Five Steps to Master Big Data and Predictive Analytics in 2014
4. Business Analytics: Moving From Descriptive to Predictive Analytics
5. Getting Analytics Right From the Start
6. Focus On: Predictive Analytics
10. Predictive Analytics Guide
Scooped by Robinson Worley

IBM Big Data in a Minute: Spotting trends with big data | The Big Data Hub

RT @IBMAnalytics: Video: How to look through your data to uncover trends http://t.co/i2mJzW3zSE #bigdata #analytics
Robinson Worley's insight:
Scouts and big data. Awesome.
Scooped by Robinson Worley

Build a Single Page Application with Angular, Node & Mongo – Part II | Kevin Delemme - Software Engineer. Geek. JavaScript Addict

RT @serebro: Build a Single Page Application with #angular, #node & #mongodb http://t.co/jK6rdoTspR
Rescooped by Robinson Worley from Distributed Architectures

Apache Mahout, Hadoop's original machine learning project, is moving on from MapReduce

The Apache Mahout project will now support Apache Spark and another data engine called H2O as it tries to retain its status as the go-to set of machine learning libraries for Hadoop.

Via Nico
Nico's curator insight, March 28, 2014 8:03 AM

Mahout on Spark, that's good news!

Rescooped by Robinson Worley from BigData NoSql and Data Stuff

Machine Learning: Sentiment Analysis Algorithm Tutorial in JavaScript

This article demonstrates a simple but effective sentiment analysis algorithm built on top of the Naive Bayes classifier I demonstrated in the last ML in JS article. I’ll go over some basic sentiment analysis concepts and then discuss how a Naive Bayes classifier can be modified for sentiment analysis. If you’re just looking for the summary and code demonstration, jump down.

Introduction

Compared to the other major machine learning tasks, sentiment analysis has surprisingly little written about it. This is in part because sentiment analysis (and natural language in general) is difficult, but I also suspect that many people opt to monetize their sentiment algorithms rather than publishing them.

Sentiment analysis is highly applicable to enterprise business (“alert me any time someone writes something negative about one of our products anywhere on the internet!”), and sentiment analysis is fresh enough that there’s a lot of “secret sauce” still out there. And now that the world is run on social sites like Twitter, Facebook, Yelp, and Amazon (Reviews), there are literally billions of data points that can be analyzed every single day. So it’s hard to blame anyone for trying to get a piece of that pie!

Today we’ll discuss an “easy” but effective sentiment analysis algorithm. I put “easy” in quotes because our work today will build upon the Naive Bayes classifier we talked about last time; but this will only be “easy” if you’re familiar with those concepts already–you may want to review that article before continuing.

There are better, more sophisticated algorithms than the one we’ll develop today, but this is a great place to start. As always, I’ll follow up with additional articles that cover more advanced topics in sentiment analysis in the future.

I found a nice, compact data set for this experiment. It consists of 10,662 sentences from movie reviews, labeled “positive” and “negative” (and split evenly). The primary goal today is to figure out how to tweak the Naive Bayes classifier so that it works well for sentiment analysis.
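For readers who have not seen the earlier article, a minimal multinomial Naive Bayes classifier for positive/negative sentences might look like the Python sketch below. It is a generic textbook version using add-one (Laplace) smoothing and log probabilities, not the article's modified classifier; the class name, the whitespace tokenizer, and the four training sentences are made up for illustration and are not drawn from the 10,662-sentence data set.

```python
import math
from collections import Counter

# Generic multinomial Naive Bayes with add-one (Laplace) smoothing.
# Not the article's modified classifier; the whitespace tokenizer and
# training sentences are illustrative assumptions.


class NaiveBayesSentiment:
    def __init__(self) -> None:
        self.word_counts = {"positive": Counter(), "negative": Counter()}
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    @staticmethod
    def tokenize(text: str) -> list[str]:
        return text.lower().split()

    def train(self, text: str, label: str) -> None:
        tokens = self.tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def score(self, text: str, label: str) -> float:
        """log P(label) + sum of log P(token | label); logs avoid underflow."""
        if self.doc_counts[label] == 0:
            return float("-inf")
        log_prob = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        counts = self.word_counts[label]
        denominator = sum(counts.values()) + len(self.vocab)
        for token in self.tokenize(text):
            # Add-one smoothing: an unseen word lowers the score but never zeroes it.
            log_prob += math.log((counts[token] + 1) / denominator)
        return log_prob

    def classify(self, text: str) -> str:
        return max(self.word_counts, key=lambda label: self.score(text, label))


clf = NaiveBayesSentiment()
clf.train("a moving and wonderful film", "positive")
clf.train("wonderful acting and a great script", "positive")
clf.train("a dull and tedious mess", "negative")
clf.train("tedious plot and terrible acting", "negative")
print(clf.classify("a wonderful script"))    # positive
print(clf.classify("terrible and tedious"))  # negative
```

Summing logs instead of multiplying raw probabilities keeps long sentences from underflowing to zero; smoothing matters because any word seen under only one label would otherwise veto the other label outright.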

Via Alex Kantone
Scooped by Robinson Worley

5 ways to add machine learning to Java, JavaScript, and more

Here are the best libraries for four languages, plus Java on Hadoop, to help you turn machine learning into a business tool. (To add #MachineLearning: for Python, scikit-learn; for Hadoop, Mahout; for Java, Weka; for JavaScript, ConvNetJS.)
Scooped by Robinson Worley

Seth's Blog: Analytics without action

Don't measure anything unless the data helps you make a better decision or change your actions. If you're not prepared to change your diet or your workouts, don't get on the scale.
Scooped by Robinson Worley

The CIO and CMO Perspective on Big Data

CMOs now command more of the tech budget than any other executive outside of the CIO. With big data being one of the main drivers of technology spending, a strong relationship between IT and marketing is critical to business success.
Scooped by Robinson Worley

The Internet Of Things Will Radically Change Your Big Data Strategy

Companies are jumping on the Internet of Things (IoT) bandwagon and for good reasons. McKinsey Global Institute reports that the IoT business will deliver $6.2 trillion of revenue by 2025.
Robinson Worley's insight:
I am living the DIY pain right now. Database as a service sounds nice.