(This article was first published on DataScience.LA » R, and kindly contributed to R-bloggers)
The Interview
In the video above, Max provides some amazing insights into the why and how of caret, an R package he created.
David Meza's insight:
I had an opportunity to attend a lecture given by Max. It was full of great information, and his book is a must-read for those learning machine learning techniques.
(This article was first published on R-statistics blog » R, and kindly contributed to R-bloggers) This is a guest post by Stefan Milton, the author of the magrittr package which introduces the %>% operator to R programming.
David Meza's insight:
If you are an R programmer, you need to look at the magrittr package. It will make your life easier.
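magrittr's %>% passes its left-hand side as the first argument of the call on its right, so x %>% f(y) means f(x, y).

The sketch below uses R's native |> pipe (available since R 4.1.0) instead of magrittr. It follows the same first-argument rule, so it runs without magrittr installed. With magrittr loaded, c(1, 4, 9) %>% sqrt() %>% sum() gives the same value.

```r
# Nested calls read inside-out: take the square roots, then sum them
nested <- sum(sqrt(c(1, 4, 9)))

# The same computation as a left-to-right pipeline. The native pipe
# inserts the left-hand side as the first argument of the call on the
# right, which is the rule magrittr's %>% also follows.
piped <- c(1, 4, 9) |> sqrt() |> sum()

print(nested)  # 6
print(piped)   # 6
```

magrittr additionally accepts bare function names (x %>% sqrt) and the . placeholder. The native pipe requires explicit calls such as sqrt().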
I recently found myself a bit stuck. I needed to cluster some data. The distances between the data points were not representable in Euclidean space, so I had to use hierarchical clustering. But then...
David Meza's insight:
A new package that may be of use to many. It still needs some refining; however, it is a good take on a common issue.
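Base R's hclust() accepts any dissimilarity stored as a dist object. Data that only comes with pairwise dissimilarities can therefore still be clustered hierarchically.

A minimal sketch follows. The dissimilarity matrix is made up for illustration, and average linkage is an assumed choice; this is not the package or method from the post itself.

```r
# Hypothetical pairwise dissimilarities among five items; they need not
# correspond to distances between points in any Euclidean space
labs <- c("a", "b", "c", "d", "e")
d <- matrix(c(
  0, 1, 6, 7, 8,
  1, 0, 5, 7, 9,
  6, 5, 0, 2, 3,
  7, 7, 2, 0, 2,
  8, 9, 3, 2, 0
), nrow = 5, byrow = TRUE, dimnames = list(labs, labs))

# hclust() only needs a "dist" object; as.dist() builds one from the
# lower triangle of a symmetric matrix
hc <- hclust(as.dist(d), method = "average")

# Cut the tree into two groups: {a, b} and {c, d, e}
groups <- cutree(hc, k = 2)
print(groups)
```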
(This article was first published on ExploringDataBlog, and kindly contributed to R-bloggers) It has been several months since my last post on classification tree models, because two things have been consuming all of my spare time. The...
(This article was first published on Econometrics Beat: Dave Giles' Blog, and kindly contributed to R-bloggers) Most of us would acknowledge that getting up to speed with R involves a pretty steep learning curve - but it's worth every drop...
David Meza's insight:
Whether you are learning, teaching or developing in R, you need to check this out.
In this course you will learn how to program in R and how to use R for effective data analysis. You will learn how to install and configure the software needed for a statistical programming environment, and you will discuss generic programming language concepts as they are implemented in a high-level statistical language. The course covers practical issues in statistical computing, including programming in R, reading data into R, creating informative data graphics, accessing R packages, creating R packages with documentation, writing R functions, debugging, and organizing and commenting R code. Topics in statistical data analysis and optimization will provide working examples.
The basic way to plot a classification or regression tree built with R’s rpart() function is just to call plot. However, in general, the results just aren’t pretty. As it turns out, for some time now there has been a better way to plot rpart() trees: the prp() function in Stephen Milborrow’s rpart.plot package. This function is a veritable “Swiss Army Knife” for plotting trees, and the documentation for the package is quite good: in addition to the package pdf, Stephen maintains a very nice and easy-to-read user manual on his webpage.
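A minimal sketch of both approaches, using the kyphosis data that ships with rpart. The sketch assumes rpart.plot may not be installed, so prp() is only called when that package is available. The options shown (uniform, margin, use.n, extra = 1) are only a few of the many the packages offer.

```r
library(rpart)  # recommended package included in standard R installs

# Classification tree on the kyphosis data bundled with rpart
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Basic approach: plot() draws the tree and text() adds the labels
plot(fit, uniform = TRUE, margin = 0.1)
text(fit, use.n = TRUE)

# Nicer plot with prp() from rpart.plot, if the package is installed;
# extra = 1 shows the number of observations per class at each node
if (requireNamespace("rpart.plot", quietly = TRUE)) {
  rpart.plot::prp(fit, extra = 1)
}
```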
The first is Applied Predictive Modeling by Max Kuhn and Kjell Johnson. Max Kuhn is the author of the caret package, an extremely useful and powerful R package for fitting and optimizing all kinds of predictive models in R. It's available now on Amazon Kindle and will be published in hardcover by Springer in July.
The second is Dynamic Documents with R and knitr by Yihui Xie, the author of the knitr package. With knitr you can easily create beautiful documents and reports, with text, tables and figures all dynamically generated by R. It will also be available in July.
A New York Times article yesterday discovers the 80-20 rule: that 80% of a typical data science project is sourcing, cleaning and preparing the data, while the remaining 20% is actual data analysis. The article gives short shrift to this important task by calling it "janitorial work", but whether you call it data munging, data wrangling or anything else, it's a critical part of data science. I'm in agreement with Jeffrey Heer, professor of computer science at the University of Washington and a co-founder of Trifacta, who is quoted in the article saying, “It’s an absolute myth that you...
David Meza's insight:
For those of you new to data science, remember that data munging is a tedious yet crucial part of the process. Do not cut corners, or your research will suffer.
R is different from C-family languages. It has C-like syntax but Lisp-like semantics. Programmers from the C/C++/Java world may find many R idioms ad hoc and feel they must memorize special cases. This is because they approach R from a C perspective. R is a very elegant language once we unlearn some C concepts and learn R’s rules. I am writing several R notes to explain important rules of the R language. This is the first note.

The atomicity of R vectors

The atomic data structure in R is the vector. This is very different from any C-family language. In C/C++, built-in types such as int and char are atomic data structures, while a C array (a contiguous block of memory) is clearly not the simplest type. In R, the vector really is the most basic data structure. There is no scalar data structure in R: you cannot have a scalar int in R the way you can write int x = 10 in C.

The atomicity of R vectors is documented in many places. R learners often skip over it because many of them come from C, where an array is a composite data structure. Many seemingly special cases in the R language stem from the atomicity of R vectors, and I will try to cover them coherently.

Display

x <- 10 # equivalent to x <- c(10)
x # equivalent to print(x)
## [1] 10
y <- c(10, 20)
y
## [1] 10 20

What does [1] mean in the output? It means that the output is a vector and that the first value on that line is element 1. x is a vector of length 1, so it prints as [1] 10, while y is a vector of length 2, so it prints as [1] 10 20. For longer vectors, each output line starts with the index of its first element to assist human reading:

z <- 1:25
print(z)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25

Vectors with different types

Though vectors in R are atomic, they come in different types: integer, double (floating point), complex, character and logical vectors. Integer and double vectors are both numeric vectors. Above, we have seen numeric vectors: 1:25 is an integer vector, while 10 and c(10, 20) are doubles.
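The absence of a scalar type can be checked directly in base R. In the sketch below, the commented values are what each line returns.

```r
x <- 10              # no scalar type: this is a numeric vector of length 1
length(x)            # 1
is.vector(x)         # TRUE
identical(x, c(10))  # TRUE: 10 and c(10) are the same object

typeof(10)           # "double": numeric literals are doubles by default
typeof(10L)          # "integer": the L suffix makes an integer
typeof(1:25)         # "integer": the colon operator returns integers here

# Indexing a length-1 vector with [1] just gives the same value back
x[1] == x            # TRUE
```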
Let's see more types of vectors below:

x <- c(1, 2.1)
mode(x)
## [1] "numeric"
y <- c("a", "bb", "123")
mode(y)
## [1] "character"
z <- complex(real = 1, imaginary = 1)
mode(z)
## [1] "complex"

Notice that in R a string (in R's terms, the character type) is like the integer, double and logical types: it is not a vector of chars. R does not differentiate between a single character and a sequence of characters. R has a set of special functions such as paste and strsplit for string processing, but R's character type is not a composite type, and it is not a vector of chars either!

matrix and array

A matrix is a vector with additional attributes (dim, and optionally dimnames), and this makes matrix an R class. Its core data structure is still a vector. See the example below:

y <- c(1, 2, 3, 4, 5, 6)
x <- matrix(y, nrow = 3, ncol = 2)
class(x)
## [1] "matrix"
# in R 4.0.0 and later this prints: [1] "matrix" "array"
rownames(x) <- c("A", "B", "C")
colnames(x) <- c("V1", "V2")
attributes(x)
## $dim
## [1] 3 2
##
## $dimnames
## $dimnames[[1]]
## [1] "A" "B" "C"
##
## $dimnames[[2]]
## [1] "V1" "V2"
x
##   V1 V2
## A  1  4
## B  2  5
## C  3  6
as.vector(x)
## [1] 1 2 3 4 5 6

In R, arrays are used less frequently. A 2D array is in fact a matrix. To find out more, see ?array. We can say that an array/matrix is a vector (augmented with dim and other attributes), but we cannot say that a vector is an array. In OOP terminology, array/matrix is a subtype of vector.

operators

Because the fundamental data structure in R is the vector, all the basic operators are defined on vectors. For example, + is really vector addition, and adding two vectors of length 1 is just a special case. When the two vectors differ in length, the shorter one is recycled (repeated) to the length of the longer one. For example:

x <- c(1, 2, 3, 4, 5)
y <- c(1)
x + y # y is recycled to (1,1,1,1,1)
## [1] 2 3 4 5 6
z <- c(1, 2)
x + z # z is recycled to (1,2,1,2,1), and a warning is triggered
## Warning: longer object length is not a multiple of shorter object length
## [1] 2 4 4 6 6

+, -, *, /, etc. are vector operators.
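The sketch below illustrates three points from this part using only base R:

- A character value has length 1, however many characters it holds.
- Adding a dim attribute to a plain vector turns it into a matrix.
- When one operand's length divides the other's, the shorter operand is recycled silently.

```r
s <- "hello"
length(s)                       # 1: one element of type character
nchar(s)                        # 5: characters inside that element
chars <- strsplit(s, "")[[1]]   # split into single-character strings
length(chars)                   # 5

# A vector plus a dim attribute is a matrix
v <- 1:6
dim(v) <- c(3, 2)
is.matrix(v)                                    # TRUE
identical(v, matrix(1:6, nrow = 3, ncol = 2))   # TRUE

# Recycling: c(10, 100) is repeated to length 6 with no warning,
# because 6 is a multiple of 2
1:6 + c(10, 100)                # 11 102 13 104 15 106
rep_len(c(10, 100), 6)          # 10 100 10 100 10 100
```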
When these operators are used on matrices, their semantics are the same as for vectors: a matrix is treated as one long vector, concatenated column by column. So do not expect all of them to work as matrix operators! For example:

x <- c(1, 2)
y <- matrix(1:6, nrow = 2)
x * y
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    4    8   12

For matrix multiplication, use the dedicated operator:

x %*% y # (1 x 2) %*% (2 x 3) = 1 x 3
##      [,1] [,2] [,3]
## [1,]    5   11   17
y %*% x # non-conformable: y has 3 columns but x has length 2
## Error: non-conformable arguments

The single-character operators all operate element-wise on vectors and return a vector as long as the longer operand. So &, |, etc. are element-wise logical operators, while && and || are special operators that return a logical vector of length 1 (usually used in if conditions).

x <- c(T, T, F)
y <- c(T, F, F)
x & y
## [1] TRUE FALSE FALSE
x && y
## [1] TRUE
# Older R versions silently used only the first elements here. R 4.2.0 added a warning, and since R 4.3.0 this is an error because x and y have length 3.

math functions

All R math functions take vector inputs and generate vector outputs. For example:

exp(1)
## [1] 2.718
exp(c(1))
## [1] 2.718
exp(c(1, 2))
## [1] 2.718 7.389
sum(matrix(1:6, nrow = 2)) # a matrix is a vector; for row/col sums, use rowSums/colSums
## [1] 21
cumsum(c(1, 2, 3))
## [1] 1 3 6
which.min(c(3, 1, 2))
## [1] 2
sqrt(c(3, 2))
## [1] 1.732 1.414
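Because && and || now require length-1 operands, a logical vector has to be reduced explicitly before it goes into an if(). The sketch below uses any() and all() for that, and rowSums()/colSums() for per-row and per-column totals. The commented values are what each line returns.

```r
x <- c(TRUE, TRUE, FALSE)
y <- c(TRUE, FALSE, FALSE)

x & y                    # element-wise: TRUE FALSE FALSE

# Reduce the element-wise result to a single TRUE/FALSE first
both_any <- any(x & y)   # TRUE: position 1 is TRUE in both
both_all <- all(x & y)   # FALSE
if (both_any && !both_all) message("x and y are both TRUE at some, but not all, positions")
x[1] && y[1]             # TRUE: explicit length-1 operands

# Row and column totals instead of sum() over every cell
m <- matrix(1:6, nrow = 2)
rowSums(m)               # 9 12
colSums(m)               # 3 7 11
```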
NA and NULL
NA is a valid value (a missing value that still occupies a slot), while NULL means empty (an object of length 0):

print(NA)
## [1] NA
print(NULL)
## NULL
c(NA, 1)
## [1] NA 1
c(NULL, 1)
## [1] 1
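The difference also shows up in lengths and in list assignment. The sketch below uses only base R, and the commented values are what each line returns.

```r
length(NA)         # 1: NA is a (missing) value occupying one slot
length(NULL)       # 0: NULL is the empty object
is.na(c(NA, 1))    # TRUE FALSE
is.null(NULL)      # TRUE

# Assigning NULL to a list element removes it; assigning NA keeps the slot
lst <- list(a = 1, b = 2, c = 3)
lst$b <- NULL
names(lst)         # "a" "c"
lst$c <- NA
length(lst)        # 2
```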
*I find knitr, integrated with the RStudio IDE, very helpful for writing tutorials.
David Meza's insight:
Useful for those who are new to R but have a programming background.
(This article was first published on Revolutions, and kindly contributed to R-bloggers) By Matt Sundquist, Plotly's Co-Founder. Here at Plotly, we are on a mission to build a platform where data scientists can analyze data, create beautiful...
Quality Control is an important part of most workplaces. From manufacturing to software development, you'll be hard pressed to find a business that doesn't have some basic QC practices. Oftentimes this means one person meticulously inspecting products, looking for imperfections. It can be as painstaking as it is boring.
In this post, we're going to show you how to automate this with statistical quality control and R.
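A minimal sketch of the idea in base R. This is not the post's own code, which may rely on a dedicated package. It builds an X-bar control chart as follows:

- All data are hypothetical and simulated.
- The center line and 3-sigma limits come from an in-control baseline.
- Sigma is estimated as the mean subgroup standard deviation divided by the constant c4 = 0.9400 for subgroups of 5.
- New subgroups whose means fall outside the limits are flagged.

```r
set.seed(42)
# Hypothetical baseline: 20 in-control subgroups of 5 measurements each
baseline <- matrix(rnorm(100, mean = 10, sd = 0.2), ncol = 5, byrow = TRUE)

# X-bar chart limits: grand mean +/- 3 standard errors of a subgroup mean
n <- ncol(baseline)
c4 <- 0.9400                      # bias-correction constant for n = 5
center <- mean(rowMeans(baseline))
sigma_hat <- mean(apply(baseline, 1, sd)) / c4
ucl <- center + 3 * sigma_hat / sqrt(n)
lcl <- center - 3 * sigma_hat / sqrt(n)

# Hypothetical new production subgroups; the 5th has a +1.0 shift
incoming <- matrix(rnorm(25, mean = 10, sd = 0.2), ncol = 5, byrow = TRUE)
incoming[5, ] <- incoming[5, ] + 1

# Flag subgroups whose mean falls outside the control limits
xbar <- rowMeans(incoming)
out_of_control <- which(xbar > ucl | xbar < lcl)
print(out_of_control)

# Visual check: subgroup means against the center line and limits
plot(xbar, type = "b", ylim = range(c(xbar, ucl, lcl)),
     xlab = "Subgroup", ylab = "Subgroup mean")
abline(h = c(lcl, center, ucl), lty = c(2, 1, 2))
```

Estimating the limits from a baseline phase and then only monitoring new data keeps a shifted batch from widening the very limits it is judged against.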