Data is big
Scooped by ukituki
onto Data is big

Public Data Sets - Open Science Data Cloud

Repository for public data sets of scientific interest, hosted on the OSDC. 
ukituki's insight:

The data sets below can be downloaded over the internet or over high-performance networks such as Internet2, and can also be computed over directly on the OSDC. Currently, the OSDC hosts about 700 TB of data, and the plan is to steadily increase this to the petabyte level. If you have suggestions about data that should be included, please let us know at info@opencloudconsortium.org.

Data is big
"The future is here. It's just not evenly distributed yet." - William Gibson     :::: Follow this topic for fresh resources and ideas related to Data Science, Machine Learning, Algorithms and #bigdata ::::
Curated by ukituki
Scooped by ukituki
hts with regressors

The hts package for R allows for forecasting hierarchical and grouped time series data. The idea is to generate forecasts for all series at all levels of aggregation without imposing the aggregation constraints, and then to reconcile the forecasts so they satisfy the aggregation constraints. (An introduction to reconciling hierarchical and grouped time series is available in this Foresight paper.)
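
The reconcile step can be illustrated with a minimal sketch. It is not the hts API: it is plain Python, it assumes the simplest possible hierarchy (one total and its bottom-level series), and it assumes ordinary least squares as the reconciliation rule. The function name `reconcile_ols` and the numbers are hypothetical.

```python
def reconcile_ols(total_forecast, bottom_forecasts):
    """Adjust independently made forecasts so the total equals the sum of the parts.

    Assumes a two-level hierarchy: total = sum of bottom-level series.
    The coherent forecasts are the least-squares projection of the base
    forecasts onto the set of vectors that satisfy that constraint.
    """
    base = [total_forecast] + list(bottom_forecasts)
    # Constraint vector c: coherent forecasts satisfy c . y = 0,
    # i.e. total - bottom_1 - bottom_2 - ... = 0.
    constraint = [1.0] + [-1.0] * len(bottom_forecasts)
    gap = sum(c * y for c, y in zip(constraint, base))
    norm = sum(c * c for c in constraint)
    # Orthogonal projection: spread the gap evenly across all series.
    return [y - c * gap / norm for c, y in zip(constraint, base)]


# Base forecasts made separately: the total (100) disagrees with 40 + 50.
reconciled = reconcile_ols(100.0, [40.0, 50.0])
print(reconciled)
```

Here the 10-unit gap is split three ways: the total moves down to about 96.67 and the two bottom series move up to about 43.33 and 53.33, so the reconciled forecasts add up.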

 
Rescooped by ukituki from Data hacking
rBlocks
rBlocks is an attempted port of ipythonblocks to R, to provide a fun and visual tool to explore data structures and control flow.
Via Claudia Mihai
Rescooped by ukituki from Data hacking
The Only Probability Cheatsheet You'll Ever Need
“Handy resource for #datascience: a super-condensed probability cheat sheet http://t.co/BdcgkAdgpi”
Via Claudia Mihai
Yaser Helmy's curator insight, July 21, 10:49 AM

Although I have been a practicing data scientist for years now, I have actually understood some concepts from this sheet!

Loved it.

Scooped by ukituki
Tutorials from Conference on Knowledge Discovery and Data Mining (KDD), New York 2014
Conference on Knowledge Discovery and Data Mining (KDD)
Scooped by ukituki
Machine Learning Work-Flow
I am planning to write a series of posts explaining a basic Machine Learning work-flow (mostly supervised). In this post, my aim is to give a bird's-eye view; later posts will go into the details of each component. I decided to write this series for two reasons: the first is self-education (to get all my bits and pieces together after a period of theoretical research and industrial practice); the second is to present a naive guide to beginn…
luiy's curator insight, October 13, 1:43 PM

Each box has a color tone from YELLOW to RED. The yellower the box, the more the component relies on a Statistics knowledge base. As the box turns red (gets darker), the component depends more heavily on a Machine Learning knowledge base. By saying this, I also imply that, without a good statistical understanding, we are not able to construct a convenient machine learning pipeline. As a footnote, this schema is changed by the post-modernism of Representation Learning algorithms, and I'll touch on this in later posts.

Scooped by ukituki
An Introduction to Feature Selection | Machine Learning Mastery

Which features should you use to create a predictive model? This is a difficult question that may require deep knowledge of the problem domain. 

Scooped by ukituki
Big Data Frameworks
ukituki's insight:

“Big-data” is one of the most inflated buzzwords of recent years. Technologies born to handle huge datasets and overcome the limits of previous products are gaining popularity outside the research environment. The following list is meant as a reference to this world. It’s still incomplete and always will be.

Scooped by ukituki
Data wrangling, exploration, and analysis with R

ukituki's insight:

Learn how to explore, groom, visualize, and analyze data, and make all of that reproducible, reusable, and shareable, using R.
Scooped by ukituki
Learn R : 12 Books and Online Resources - YOU CANalytics
R, an open-source statistical and data mining programming language, is slowly but surely catching up in its race with commercial software like SAS & SPSS. I believe R will eventually replace SAS as the language of choice for modeling and analysis for most organizations. The primary reason for this is plainly commercial. Most organizations are...
Rescooped by ukituki from Data Nerd's Corner
Using data science to build better products
Data science is about extracting knowledge from data and creating practical, actionable insights to improve some facet of a business

Via Carla Gentry CSPO
Carla Gentry CSPO's curator insight, September 23, 6:16 AM

Making sure that data insights are useful to people who don't think about machine learning all day is super important, since the beneficiary of data science work is often a front-line employee, a customer/user or another non-technical stakeholder. For this reason--at least for us at Yhat--data science and product-building go hand-in-hand. Sure, we're data junkies and enjoy walking the parameter space as much as the next guy or girl. But the kicker for us in any data analysis project is the "why".

Rescooped by ukituki from Data hacking
Hidden Markov Models in R
The general idea of an HMM is easy enough to understand: one observes some time series or stochastic process and imagines that it has been generated by an unobserved or "hidden" Markov process. However, the details of formulating and fitting an HMM involve some specialized knowledge, and the sophisticated tools available to develop an HMM in R can add an additional level of complexity. Joe’s presentation helps a beginner to dive right in. He briefly states what HMMs are all about, presents some practical examples, and then goes on to show how to use the functions in the very powerful depmixS4 package to fit an HMM to a time series of S&P 500 returns.
Via Claudia Mihai
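
To make the "hidden process generates what we observe" idea concrete, here is a minimal sketch of the forward algorithm in plain Python. It is not depmixS4 and it does not fit anything: it only computes how likely an observation sequence is under parameters that are already given. The two states ("calm", "volatile"), the two observation symbols, and every probability below are made-up assumptions for illustration.

```python
def forward_likelihood(observations, initial, transition, emission):
    """Return P(observations) under a discrete HMM via the forward algorithm.

    initial[s]       : probability of starting in hidden state s
    transition[p][s] : probability of moving from state p to state s
    emission[s][o]   : probability of observing symbol o while in state s
    Assumes a non-empty observation sequence.
    """
    n_states = len(initial)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            emission[s][obs]
            * sum(alpha[p] * transition[p][s] for p in range(n_states))
            for s in range(n_states)
        ]
    # Summing over the final hidden state gives the sequence likelihood.
    return sum(alpha)


# Hypothetical parameters: state 0 = calm market, state 1 = volatile market;
# symbol 0 = small daily move, symbol 1 = large daily move.
initial = [0.8, 0.2]
transition = [[0.9, 0.1], [0.2, 0.8]]
emission = [[0.9, 0.1], [0.3, 0.7]]

print(forward_likelihood([0, 0, 1, 1], initial, transition, emission))
```

Fitting an HMM, as the presentation does, means searching for the parameters that make this likelihood as large as possible for the observed series.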
Rescooped by ukituki from Data hacking
Agent Based Models and RNetLogo
If I had to pick just one application to be the “killer app” for the digital computer I would probably choose Agent Based Modeling (ABM). Imagine creating a world populated with hundreds, or even thousands of agents, interacting with each other and with the environment according to their own simple rules. What kinds of patterns and behaviors would emerge if you just let the simulation run? Could you guess a set of rules that would mimic some part of the real world? This dream is probably much older than the digital computer, but according to Jan Thiele’s brief account of the history of ABMs that begins his recent paper, R Marries NetLogo: Introduction to the RNetLogo Package in the Journal of Statistical Software, academic work with ABMs didn’t really take off until the late 1990s. Now, people are using ABMs for serious studies in economics, sociology, ecology, socio-psychology, anthropology, marketing and many other fields. No less of a complexity scientist than Doyne Farmer (of Dynamic Systems and Prediction Company fame) has argued in Nature for using ABMs to model the complexity of the US economy, and has published on using ABMs to drive investment models. In the following clip of a 2006 interview, Doyne talks about building ABMs to explain the role of subprime mortgages in the Housing Crisis. (Note that when asked about how one would calibrate such a model, Doyne explains the need to collect massive amounts of data on individuals.)
Via Claudia Mihai
Scooped by ukituki
PCA Revealed | Gaston Sanchez

ukituki's insight:

Principal Components Analysis (PCA) is one of the basic multivariate data analysis methods. PCA Revealed aims to help you understand the basics of PCA in an intuitive and simple way, and how to apply it in R.
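
As a rough illustration of what PCA computes, the sketch below finds the first principal component of a tiny data set: the direction along which the centered data varies most. It is not taken from PCA Revealed, which works in R; plain Python is used here only to keep the example dependency-free. The function name, the use of power iteration, and the toy data are all assumptions made for this sketch.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, variance) of the first principal component.

    Uses power iteration on the sample covariance matrix. Assumes at least
    two rows, some variance in the data, and a start vector that is not
    orthogonal to the leading direction.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] * d
    for _ in range(iterations):
        # Multiply by the covariance matrix, then rescale to unit length.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance of the data along the direction v.
    variance = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    return v, variance


# Toy data lying exactly on the line y = 2x.
direction, variance = first_principal_component(
    [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
)
print(direction, variance)
```

Because the toy points lie on a single line, the first component points along that line and accounts for all of the variance; real data would need further components to capture the rest.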

Scooped by ukituki
Visualizing Payments to Doctors With Package Statebins

Introduction

The purpose of this post is to explain how to graph topological data with
the statebins package. To do this we will play with General Payment Data
for non-research/ownership payments to physicians and teaching hospitals.
This data was recently released and, in short, contains the data for
“gifts” pharma companies and others give to doctors and teaching hospitals
because they are just great people.

Scooped by ukituki
IDEAL MOOC
The IDEAL MOOC will teach you the cognitive science background and the programming bases to design robots and virtual agents capable of autonomous cognitive development driven by their intrinsic motivation. More than that, it will offer a place to discuss research in Developmental AI.
Scooped by ukituki
10 most popular data science presentations on Slideshare
These presentations have been viewed between 14,000 times (for #10) and 75,000 times (for #1 - see below), though pageview numbers are subject to manipulations…
Scooped by ukituki
Webscope | Yahoo Labs
ukituki's insight:

We have various types of data available to share. They are categorized into Ratings, Language, Graph, Advertising and Market Data, Computing Systems and an appendix of other relevant data and resources available via the Yahoo Developer Network.

Scooped by ukituki

Expert Big Data Tips

Whether you are interested in healthcare data analytics or looking to get started with big data and marketing, these fundamental principles from data experts w…
Scooped by ukituki
In-depth introduction to machine learning in 15 hours of expert videos
In January 2014, Stanford University professors Trevor Hastie and Rob Tibshirani (authors of the legendary Elements of Statistical Learning textbook) taught an online course based on their newest textbook, An Introduction to Statistical Learning with Applications in R (ISLR). I found it to be an excellent course in statistical learning...