Scooped by
David Meza


(This article was first published on DataScience.LA » R, and kindly contributed to R-bloggers) The Interview. In the video above, Max provides some amazing insights into the why and how of caret, an R package he created.

(This article was first published on R-statistics blog » R, and kindly contributed to R-bloggers) This is a guest post by Stefan Milton, the author of the magrittr package, which introduces the %>% operator to R programming.

I recently found myself a bit stuck. I needed to cluster some data. The distances between the data points were not representable in Euclidean space, so I had to use hierarchical clustering. But then...

(This article was first published on me nugget, and kindly contributed to R-bloggers) For those of you not familiar with StackOverflow (SO), it's a coder's help forum on the StackExchange website.

(This article was first published on ExploringDataBlog, and kindly contributed to R-bloggers) It has been several months since my last post on classification tree models, because two things have been consuming all of my spare time. The...

In part 1 of our hands-on series, we explain why R is a great choice for basic data analysis and visualization work, and how to get started.

(This article was first published on Econometrics Beat: Dave Giles' Blog, and kindly contributed to R-bloggers) Most of us would acknowledge that getting up to speed with R involves a pretty steep learning curve, but it's worth every drop...
In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure the software necessary for a statistical programming environment, and discuss generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing, including programming in R, reading data into R, creating informative data graphics, accessing R packages, creating R packages with documentation, writing R functions, debugging, and organizing and commenting R code. Topics in statistical data analysis and optimization will provide working examples.
Via Ed Stenson

The basic way to plot a classification or regression tree built with R’s rpart() function is just to call plot. However, in general, the results just aren’t pretty. As it turns out, for some time now there has been a better way to plot rpart() trees: the prp() function in Stephen Milborrow’s rpart.plot package. This function is a veritable “Swiss Army Knife” for plotting trees and the documentation for the package is quite good: in addition to the package pdf, Stephen maintains a very nice and easy-to-read user manual on his webpage.
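As a rough illustration of the two plotting routes mentioned here, the sketch below fits a small tree and draws it both ways. The iris data set and the particular prp() arguments (type = 2, extra = 1) are assumptions chosen for the example, not taken from the original post, and the prp() call only runs if rpart.plot happens to be installed.

```r
library(rpart)  # rpart ships with standard R distributions

# Fit a small classification tree on the built-in iris data (example data only)
fit <- rpart(Species ~ ., data = iris, method = "class")

# Base route: plot() draws the branches, text() adds the split labels
plot(fit)
text(fit)

# Nicer route: prp() from rpart.plot, used only when the package is available
if (requireNamespace("rpart.plot", quietly = TRUE)) {
  rpart.plot::prp(fit, type = 2, extra = 1)
}
```

The prp() function has many more display options; the package documentation is the place to look before relying on any particular combination.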

The first is Applied Predictive Modeling by Max Kuhn and Kjell Johnson. Max Kuhn is the author of the caret package, an extremely useful and powerful R package for fitting and optimizing all kinds of predictive models in R. It's available now on Amazon Kindle and will be published in hardcover by Springer in July. The second is Dynamic Documents with R and knitr by Yihui Xie, the author of the knitr package. With knitr you can easily create beautiful documents and reports, with text, tables and figures all dynamically generated by R. It will also be available in July.


A New York Times article yesterday discovers the 80-20 rule: that 80% of a typical data science project is sourcing, cleaning and preparing the data, while the remaining 20% is actual data analysis. The article gives short shrift to this important task by calling it "janitorial work", but whether you call it data munging, data wrangling or anything else, it's a critical part of data science. I'm in agreement with Jeffrey Heer, professor of computer science at the University of Washington and a co-founder of Trifacta, who is quoted in the article saying, “It’s an absolute myth that you...

R is different from C family languages. It has a C-like syntax but Lisp-like semantics. Programmers from the C/C++/Java world tend to find many R usages ad hoc and feel they need to memorize special cases. This is because they look at R from a C perspective. R is a very elegant language once we unlearn some C concepts and learn R's rules. I am writing several R notes to explain important R language rules. This is the first note.

The atomicity of R vectors

The atomic data structure in R is the vector. This is very different from any C family language. In C/C++, built-in types such as int and char are the atomic data structures, while a C array (a contiguous data block in memory) is obviously not the simplest type. In R, the vector is the most basic data structure. There is no scalar data structure in R: you cannot have a scalar int in R the way you write int x = 10 in C. The atomicity of R vectors is stated in many documents. R learners usually skip over it because many of them come from C, in which an array is a composite data structure. Many seemingly special cases in the R language all come from the atomicity of R vectors, and I will try to cover them coherently.

Display

    x <- 10   # equivalent to x <- c(10)
    x         # or equivalently print(x)
    ## [1] 10
    y <- c(10, 20)
    y
    ## [1] 10 20

What does [1] mean in the output? It means that the output is a vector and from index 1, the result is ... x is a vector of length 1, so its value is [1] 10, while y is a vector of length 2, so its value is [1] 10 20. For a longer vector, the output contains more indices to help the reader:

    z <- 1:25
    print(z)
    ##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
    ## [24] 24 25

Vectors with different types

Though vectors in R are atomic, there are different kinds of vectors: int vector, float vector, complex vector, character vector and logical vector. Int and float vectors are numeric vectors. Above, we have seen numeric vectors (1:25 is an integer vector, while c(10, 20) is stored as double). Let's see more types of vectors below:

    x <- c(1, 2.1)
    mode(x)
    ## [1] "numeric"
    y <- c("a", "bb", "123")
    mode(y)
    ## [1] "character"
    z <- complex(real = 1, imaginary = 1)
    mode(z)
    ## [1] "complex"

Notice that in R, a string (in R's terms: character type) is like the int, float and logical types. It is not a vector of chars. R does not differentiate between a character and a sequence of characters. R has a set of special functions such as paste and strsplit for string processing, but R's character type is not a composite type and it is not a vector of chars either!

matrix and array

A matrix is a vector with augmented properties, and this makes matrix an R class. Its core data structure is still a vector. See the example below:

    y <- c(1, 2, 3, 4, 5, 6)
    x <- matrix(y, nrow = 3, ncol = 2)
    class(x)   # R 4.0.0 and later print "matrix" "array"
    ## [1] "matrix"
    rownames(x) <- c("A", "B", "C")
    colnames(x) <- c("V1", "V2")
    attributes(x)
    ## $dim
    ## [1] 3 2
    ##
    ## $dimnames
    ## $dimnames[[1]]
    ## [1] "A" "B" "C"
    ##
    ## $dimnames[[2]]
    ## [1] "V1" "V2"
    x
    ##   V1 V2
    ## A  1  4
    ## B  2  5
    ## C  3  6
    as.vector(x)
    ## [1] 1 2 3 4 5 6

In R, arrays are less frequently used. A 2D array is simply a matrix. To find out more: ?array. We can say that an array/matrix is a vector (augmented with dim and other properties). But we cannot say that a vector is an array. In OOP terminology, array/matrix is a subtype of vector.

operators

Because the fundamental data structure in R is the vector, all the basic operators are defined on vectors. For example, + is vector addition, and adding two vectors of length 1 is just a special case. When the two vectors have different lengths, the shorter one is repeated to the length of the longer one. For example:

    x <- c(1, 2, 3, 4, 5)
    y <- c(1)
    x + y   # y is repeated to (1,1,1,1,1)
    ## [1] 2 3 4 5 6
    z <- c(1, 2)
    x + z   # z is repeated to (1,2,1,2,1), a warning is triggered
    ## Warning: longer object length is not a multiple of shorter object length
    ## [1] 2 4 4 6 6

+, -, *, /, etc. are vector operators. When they are used on matrices, their semantics are the same as when dealing with vectors: a matrix is treated as a long vector concatenated column by column. So do not expect all of them to work properly as matrix operators! For example:

    x <- c(1, 2)
    y <- matrix(1:6, nrow = 2)
    x * y
    ##      [,1] [,2] [,3]
    ## [1,]    1    3    5
    ## [2,]    4    8   12

For matrix multiplication, we shall use the dedicated operator:

    x %*% y   # 1 x 2 * 2 x 3 = 1 x 3
    ##      [,1] [,2] [,3]
    ## [1,]    5   11   17
    y %*% x   # dimensions do not match: neither a 1 x 2 row nor a 2 x 1 column conforms with a 2 x 3 matrix
    ## Error: non-conformable arguments

The single-character operators all operate on vectors and are expected to generate a vector of the same length. So &, |, etc. are vector-wise logic operators, while &&, ||, etc. are special operators that generate a logical vector of length 1 (usually used in if clauses).

    x <- c(T, T, F)
    y <- c(T, F, F)
    x & y
    ## [1]  TRUE FALSE FALSE
    x && y   # older R versions compare only the first elements; R 4.3.0 and later raise an error for length > 1
    ## [1] TRUE

math functions

All R math functions take vector inputs and generate vector outputs. For example:

    exp(1)
    ## [1] 2.718
    exp(c(1))
    ## [1] 2.718
    exp(c(1, 2))
    ## [1] 2.718 7.389
    sum(matrix(1:6, nrow = 2))   # a matrix is a vector; for row/col sums, use rowSums/colSums
    ## [1] 21
    cumsum(c(1, 2, 3))
    ## [1] 1 3 6
    which.min(c(3, 1, 2))
    ## [1] 2
    sqrt(c(3, 2))
    ## [1] 1.732 1.414

NA and NULL

NA is a valid value. NULL means empty.

    print(NA)
    ## [1] NA
    print(NULL)
    ## NULL
    c(NA, 1)
    ## [1] NA  1
    c(NULL, 1)
    ## [1] 1

*I find knitr integrated with the RStudio IDE very helpful for writing tutorials.
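The recycling and matrix-as-vector rules from the note above can be checked in a few lines of base R. This is only a minimal sketch: the variable names (recycled, flat, elementwise, product) are made up for illustration, and the values are chosen so that no recycling warning is triggered.

```r
# Recycling: the shorter vector is repeated to match the longer one
x <- c(1, 2, 3, 4)
z <- c(10, 20)
recycled <- x + z            # z is used as (10, 20, 10, 20)

# A matrix is a vector with a dim attribute, stored column by column
m <- matrix(1:6, nrow = 2)
flat <- as.vector(m)

# Element-wise * versus matrix multiplication %*%
elementwise <- c(1, 2) * m   # c(1, 2) is recycled down each column
product <- c(1, 2) %*% m     # c(1, 2) is treated as a 1 x 2 row vector

print(recycled)
print(flat)
print(elementwise)
print(product)
```

Because the length of x is a multiple of the length of z, the addition runs silently; with lengths 5 and 2, as in the note, R still recycles but also issues a warning.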

(This article was first published on R for Public Health, and kindly contributed to R-bloggers) To leave a comment for the author, please follow the link and comment on his blog: R for Public Health.

(This article was first published on Revolutions, and kindly contributed to R-bloggers) By Matt Sundquist, Plotly Co-Founder. Here at Plotly, we are on a mission to build a platform where data scientists can analyze data, create beautiful...

From Togaware. Many of the documents have been developed and tested whilst visiting the Shenzhen Institutes of Technology as an International Visiting Profess…

From books to videos to online tutorials (most free!), here are plenty of ideas to burnish your R knowledge.

(This article was first published on Statistical Research » R, and kindly contributed to R-bloggers) Whenever I go to the grocery store it always seems to be a lesson in statistics.

(This article was first published on Data Community DC » R, and kindly contributed to R-bloggers) Dr.

In this blog post, I will walk you through the steps from downloading to installing the RGoogleAnalytics package on your machine.

Quality control is an important part of most workplaces. From manufacturing to software development, you'll be hard pressed to find a business that doesn't have some basic QC practices. Oftentimes this means one person meticulously inspecting products, looking for imperfections. It can be as painstaking as it is boring. In this post, we're going to show you how to automate this with statistical quality control and R.

Big Data is on every CIO’s mind this quarter, and for good reason. Companies will have spent $4.3 billion on Big Data technologies by the end of 2012. But here’s where it gets interesting.

Great resource for those learning to program in R. Check out his book, on sale now.