R You Learning
Scooped by David Meza

A Conversation with Max Kuhn – The useR! 2014 Interview

(This article was first published on DataScience.LA » R, and kindly contributed to R-bloggers)
The Interview
In the video above, Max provides some amazing insights into the why and how of caret, an R package he created.
David Meza's insight:

I had an opportunity to attend a lecture given by Max. Great information, and his book is a must-read for those learning machine learning techniques.


Simpler R coding with pipes > the present and future of the magrittr package

(This article was first published on R-statistics blog » R, and kindly contributed to R-bloggers)
This is a guest post by Stefan Milton, the author of the magrittr package which introduces the %>% operator to R programming.
David Meza's insight:

If you are an R programmer, you need to look at the magrittr package. It will make your life easier.


Buster - a new R package for bagging hierarchical clustering

I recently found myself a bit stuck. I needed to cluster some data. The distances between the data points were not representable in Euclidean space so I had to use hierarchical clustering. But then...

David Meza's insight:

A new package that may be of use to many. It still needs some refining; however, it is a good take on a common issue.


Automated determination of distribution groupings – A StackOverflow collaboration

(This article was first published on me nugget, and kindly contributed to R-bloggers)
For those of you not familiar with StackOverflow (SO), it's a coder's help forum on the StackExchange website.
David Meza's insight:

StackOverflow - A great way to ask for help and collaborate on a useful solution.


A question of model uncertainty

(This article was first published on ExploringDataBlog, and kindly contributed to R-bloggers) It has been several months since my last post on classification tree models, because two things have been consuming all of my spare time.  The...

Beginner's guide to R: Introduction

In part 1 of our hands-on series, we explain why R's a great choice for basic data analysis and visualization work, and how to get started.
David Meza's insight:

Now...for those new to R, here is some help for you.


swirl: Learning Statistics & R

(This article was first published on Econometrics Beat: Dave Giles' Blog, and kindly contributed to R-bloggers) Most of us would acknowledge that getting up to speed with R involves a pretty steep learning curve - but it's worth every drop...
David Meza's insight:

Whether you are learning, teaching, or developing in R, you need to check this out.

Umesh Acharya's curator insight, October 22, 2014 2:49 AM

Very interesting package in R for learning data analysis

Rescooped by David Meza from Data in Social Media

Computing for Data Analysis

In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure software necessary for a statistical programming environment, and discuss generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing, including programming in R, reading data into R, creating informative data graphics, accessing R packages, creating R packages with documentation, writing R functions, debugging, and organizing and commenting R code. Topics in statistical data analysis and optimization will provide working examples.


Via Ed Stenson
David Meza's insight:

I have taken several courses on Coursera; all have been informative. This should be a good course for those wanting to learn R.


Draw nicer Classification and Regression Trees with the rpart.plot package

The basic way to plot a classification or regression tree built with R’s rpart() function is just to call plot. However, in general, the results just aren’t pretty. As it turns out, for some time now there has been a better way to plot rpart() trees: the prp() function in Stephen Milborrow’s rpart.plot package. This function is a veritable “Swiss Army Knife” for plotting trees, and the documentation for the package is quite good: in addition to the package PDF, Stephen maintains a very nice and easy-to-read user manual on his webpage.
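
As a rough illustration of the difference, the sketch below fits a small tree and plots it with prp() when rpart.plot is installed, falling back to the basic plot() otherwise. The choice of the built-in iris data and the default prp() settings are assumptions made for this example; they do not come from the original article.

```r
library(rpart)  # recommended package, shipped with most R installations

# Fit a small classification tree on the built-in iris data (illustrative choice)
fit <- rpart(Species ~ ., data = iris, method = "class")

if (requireNamespace("rpart.plot", quietly = TRUE)) {
  rpart.plot::prp(fit)   # the nicer plot, using prp()'s default settings
} else {
  plot(fit)              # the basic plot draws only the tree skeleton
  text(fit)              # split labels have to be added separately
}
```

prp() accepts many more arguments for controlling labels and layout; the package manual mentioned above is the place to look them up.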


Two forthcoming R books

The first is Applied Predictive Modeling by Max Kuhn and Kjell Johnson. Max Kuhn is the author of the caret package, an extremely useful and powerful R package for fitting and optimizing all kinds of predictive models in R. It's available now on Amazon Kindle and will be published in hardcover by Springer in July.

The second is Dynamic Documents with R and knitr by Yihui Xie, the author of the knitr package. With knitr you can easily create beautiful documents and reports, with text, tables and figures all dynamically generated by R. It will also be available in July.


Data Cleaning is a critical part of the Data Science process

A New York Times article yesterday discovers the 80-20 rule: that 80% of a typical data science project is sourcing, cleaning, and preparing the data, while the remaining 20% is actual data analysis. The article gives short shrift to this important task by calling it "janitorial work", but whether you call it data munging, data wrangling or anything else, it's a critical part of data science. I'm in agreement with Jeffrey Heer, professor of computer science at the University of Washington and a co-founder of Trifacta, who is quoted in the article saying, “It’s an absolute myth that you...
David Meza's insight:

For those of you new to data science, remember that data munging is a tedious yet crucial part of the process. Do not cut corners, or your research will suffer.


Topic Modeling In R

David Meza's insight:

For those interested in text mining, look at this body of work. I can think of some great uses; how about you?


R Notes: vectors

R is different from C-family languages. It has a C-like syntax but Lisp-like semantics. Programmers from the C/C++/Java world may find many R usages ad hoc and feel that they need to memorize special cases. This is because they approach R from a C perspective. R is a very elegant language if we unlearn some C concepts and learn R's rules. I am writing several R notes to explain several important R language rules. This is the first note.
The atomicity of R vectors
The atomic data structure in R is the vector. This is very different from any C-family language. In C/C++, built-in types such as int and char are atomic data structures, while a C array (a contiguous block of memory) is obviously not the simplest type. In R, the vector is the most basic data structure. There is no scalar data structure in R: you cannot have a scalar int in R the way you can write int x = 10 in C. The atomicity of R vectors is stated in many documents. It is often overlooked by R learners because many R users come from C, in which an array is a composite data structure. Many seemingly special cases in the R language come from the atomicity of R vectors, and I will try to cover them coherently.
Display
x <- 10 # equivalent to x <- c(10)
x # or equivalent to print(x)
## [1] 10
y <- c(10, 20)
y
## [1] 10 20
What does [1] mean in the output? It is the index of the first element printed on that line. x is a vector of length 1, so it prints as [1] 10, while y is a vector of length 2, so it prints as [1] 10 20. For a longer vector, the output contains more indices to assist human reading:
z <- 1:25
print(z)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25
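
One way to check the "no scalars" claim is to ask R directly. The following is a minimal sketch using only base R functions; the variable name x is chosen just for illustration.

```r
x <- 10

is.vector(x)          # TRUE: a single number is already a vector
length(x)             # 1: it is simply a vector with one element
identical(x, c(10))   # TRUE: c(10) builds the very same length-1 vector
x[1]                  # 10: indexing works on a "scalar" as on any vector
```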
Vectors with different types
Vectors in R are atomic, but they come in different types: integer, double (floating point), complex, character, and logical vectors. Integer and double vectors are both numeric vectors. Above, we have seen numeric vectors (1:25 is an integer vector, while a literal such as 10 is stored as a double). Let's see more types of vectors below:
x <- c(1, 2.1)
mode(x)
## [1] "numeric"
y <- c("a", "bb", "123")
mode(y)
## [1] "character"
z <- complex(real = 1, imaginary = 1)
mode(z)
## [1] "complex"
Notice that in R, a string (in R's terms, the character type) is a basic type like the integer, double, and logical types. It is not a vector of chars. R does not differentiate between a single character and a sequence of characters. R has a set of special functions such as paste and strsplit for string processing; however, R's character type is not a composite type, and it is not a vector of chars either!
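
The point about strings can be seen by comparing length() and nchar(). This is a small sketch with base R only; the names s and chars are made up for the example.

```r
typeof(1:3)         # "integer"
typeof(c(1, 2.1))   # "double"
typeof(TRUE)        # "logical"

s <- "hello"
length(s)           # 1: one string is one element of a character vector
nchar(s)            # 5: the characters live inside that single element

# To get one element per character, the string has to be split explicitly
chars <- strsplit(s, "")[[1]]
length(chars)       # 5
```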
matrix and array
A matrix is a vector with additional attributes, and this makes matrix an R class. Its core data structure is still a vector. See the example below:
y <- c(1, 2, 3, 4, 5, 6)
x <- matrix(y, nrow = 3, ncol = 2)
class(x) # R 4.0.0 and later report "matrix" "array"
## [1] "matrix"
rownames(x) <- c("A", "B", "C")
colnames(x) <- c("V1", "V2")
attributes(x)
## $dim
## [1] 3 2
##
## $dimnames
## $dimnames[[1]]
## [1] "A" "B" "C"
##
## $dimnames[[2]]
## [1] "V1" "V2"
x
##   V1 V2
## A  1  4
## B  2  5
## C  3  6
as.vector(x)
## [1] 1 2 3 4 5 6
In R, arrays are less frequently used. A 2D array is in fact a matrix. To find out more, see ?array. We can say that an array/matrix is a vector (augmented with dim and other attributes), but we cannot say that a vector is an array. In OOP terminology, array/matrix is a subtype of vector.
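
To see that a matrix really is a vector plus a dim attribute, one can attach that attribute by hand. A minimal sketch in base R follows; v and m are illustrative names.

```r
v <- 1:6
m <- v
dim(m) <- c(3, 2)            # attach a dim attribute to a plain vector

is.matrix(m)                 # TRUE: the attribute alone makes it a matrix
m[2, 2]                      # 5: matrix-style indexing (row 2, column 2)
m[5]                         # 5: the same element via vector-style indexing
identical(as.vector(m), v)   # TRUE: the underlying data never changed
```

The two indexing calls return the same element because the data are stored column by column.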
operators
Because the fundamental data structure in R is the vector, all the basic operators are defined on vectors. For example, + is vector addition, and adding two vectors of length 1 is just a special case.
When the two vectors have different lengths, the shorter one is repeated (recycled) to the length of the longer one. For example:
x <- c(1, 2, 3, 4, 5)
y <- c(1)
x + y # y is repeated to (1,1,1,1,1)
## [1] 2 3 4 5 6
z <- c(1, 2)
x + z # z is repeated to (1,2,1,2,1), a warning is triggered
## Warning: longer object length is not a multiple of shorter object length
## [1] 2 4 4 6 6
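
The recycling rule can be reproduced by hand with rep_len(), which makes explicit what R does implicitly. This is only a sketch; z_recycled, manual, and automatic are names introduced for the example.

```r
x <- c(1, 2, 3, 4, 5)
z <- c(1, 2)

# Stretch z to the length of x by repeating its elements in order
z_recycled <- rep_len(z, length(x))    # 1 2 1 2 1

manual    <- x + z_recycled            # 2 4 4 6 6
automatic <- suppressWarnings(x + z)   # same values; warning silenced here

identical(manual, automatic)           # TRUE
```

Writing the recycling out explicitly avoids the warning and documents the intent when the lengths are not multiples of each other.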
+, -, *, /, etc. are vector operators. When they are used on matrices, their semantics are the same as for vectors: a matrix is treated as a long vector concatenated column by column. So do not expect all of them to work as matrix operators! For example:
x <- c(1, 2)
y <- matrix(1:6, nrow = 2)
x * y
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    4    8   12
For matrix multiplication, we should use the dedicated operator:
x %*% y # 1 x 2 * 2 x 3 = 1 x 3
##      [,1] [,2] [,3]
## [1,]    5   11   17
y %*% x # dimensions do not match: y is 2 x 3, but x has length 2, not 3
## Error: non-conformable arguments
The single-character operators all operate on vectors and generate a vector of the same length. So & and | are element-wise logical operators, while && and || are special operators that produce a logical vector of length 1 (usually used in if conditions).
x <- c(T, T, F)
y <- c(T, F, F)
x & y
## [1]  TRUE FALSE FALSE
x && y # only the first elements are compared; R 4.3.0 and later raise an error for operands longer than 1
## [1] TRUE
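
When a single TRUE/FALSE is needed from whole vectors, all() and any() state the intent explicitly, and && is best kept for length-1 conditions. The sketch below uses base R only; the variable names are illustrative.

```r
x <- c(TRUE, TRUE, FALSE)
y <- c(TRUE, FALSE, FALSE)

elementwise <- x & y     # TRUE FALSE FALSE
all_true <- all(x & y)   # FALSE: not every pair is TRUE
any_true <- any(x & y)   # TRUE: at least one pair is TRUE

# && short-circuits: the right-hand side is skipped when the left is FALSE
safe <- length(x) > 5 && x[6]   # FALSE, and x[6] is never evaluated

if (any_true && !all_true) {
  message("some, but not all, pairs are both TRUE")
}
```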
math functions
All R math functions take vector inputs and generate vector outputs. For example:
exp(1)
## [1] 2.718
exp(c(1))
## [1] 2.718
exp(c(1, 2))
## [1] 2.718 7.389
sum(matrix(1:6, nrow = 2)) # matrix is a vector, for row/col sums, use rowSums/colSums
## [1] 21
cumsum(c(1, 2, 3))
## [1] 1 3 6
which.min(c(3, 1, 2))
## [1] 2
sqrt(c(3, 2))
## [1] 1.732 1.414
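
Two consequences of this are worth a quick sketch: a matrix is summed as one long vector unless a row-wise or column-wise function is used, and an explicit loop is rarely needed for element-wise math. The names m, v, and looped are made up for the example.

```r
m <- matrix(1:6, nrow = 2)

sum(m)       # 21: all elements, because the matrix is a vector
rowSums(m)   # 9 12
colSums(m)   # 3 7 11

# An explicit loop gives the same result as the vectorized call
v <- c(1, 4, 9)
looped <- numeric(length(v))
for (i in seq_along(v)) looped[i] <- sqrt(v[i])

identical(looped, sqrt(v))   # TRUE
```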

NA and NULL

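NA is a valid value: it marks a missing entry and still occupies a position in a vector. NULL means empty: it is an object of length 0 and disappears when combined with other values.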
print(NA)
## [1] NA
print(NULL)
## NULL
c(NA, 1)
## [1] NA 1
c(NULL, 1)
## [1] 1
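
The practical difference shows up in lengths and in aggregate functions. A minimal sketch with base R follows; the vector v is an illustrative example.

```r
length(NA)     # 1: NA occupies a slot
length(NULL)   # 0: NULL is empty

v <- c(1, NA, 3)
length(v)              # 3
is.na(v)               # FALSE TRUE FALSE
sum(v)                 # NA: a missing value propagates
sum(v, na.rm = TRUE)   # 4: drop missing values explicitly

is.null(NULL)          # TRUE
length(c(1, NULL, 3))  # 2: NULL vanishes inside c()
```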
 
 
I find that knitr, integrated with the RStudio IDE, is very helpful for writing tutorials.
David Meza's insight:

Useful for those who are new to R and have a programming background.


How to write and debug an R function

(This article was first published on R for Public Health, and kindly contributed to R-bloggers)

David Meza's insight:

For those new to R, functions are a pivotal part of your learning. This article provides good information.


Plotly and rOpenSci: Make ggplots shareable and interactive.

(This article was first published on Revolutions, and kindly contributed to R-bloggers) By Matt Sundquist, Plotly's Co-Founder. Here at Plotly, we are on a mission to build a platform where data scientists can analyze data, create beautiful...

One Page R: A Survival Guide to Data Science with R

From Togaware.

Many of the documents have been developed and tested whilst visiting the Shenzhen Institutes of Technology as an International Visiting Profess…

60+ R resources to improve your data skills

From books to videos to online tutorials -- most free! -- here are plenty of ideas to burnish your R knowledge.
David Meza's insight:

Great article for those who have started using R and want to learn more. 


Waiting in One Line or Multiple Lines

(This article was first published on Statistical Research » R, and kindly contributed to R-bloggers)
Whenever I go to the grocery store it always seems to be a lesson in statistics.
David Meza's insight:

An interesting way to use R in your everyday life.


Fantastic presentations from R using slidify and rCharts

(This article was first published on Data Community DC » R, and kindly contributed to R-bloggers)
Dr.
David Meza's insight:

Want to add a little style to your data presentations? Check this out.


Installing the RGoogleAnalytics package

In this blog post, I will walk you through the steps from downloading to installing the RGoogleAnalytics package on your machine.

Statistical Quality Control in R

Quality Control is an important part of most workplaces. From manufacturing to software development, you'll be hard pressed to find a business that doesn't have some basic QC practices. Oftentimes this means one person meticulously inspecting products, looking for imperfections. It can be as painstaking as it is boring.

In this post, we're going to show you how to automate this with statistical quality control and R.


Big Data Right Now: Five Trendy Open Source Technologies | TechCrunch

Big Data is on every CIO’s mind this quarter, and for good reason. Companies will have spent $4.3 billion on Big Data technologies by the end of 2012.

But here’s where it gets interesting.