The first is Applied Predictive Modeling by Max Kuhn and Kjell Johnson. Max Kuhn is the author of the caret package, an extremely useful and powerful R package for fitting and optimizing all kinds of predictive models in R. It's available now on Amazon Kindle and will be published in hardcover by Springer in July.
The second is Dynamic Documents with R and knitr by Yihui Xie, the author of the knitr package. With knitr you can easily create beautiful documents and reports, with text, tables and figures all dynamically generated by R. It will also be available in July.
The community team at Revolution Analytics has just updated this list of resources to learn about R on the Web. Included is this list of the top 3 resources for absolute beginners getting started with R:
Hortonworks' announcement of the Stinger Initiative last week claiming to make Hive 100 times faster utilising and introducing new core Hadoop technologies. It could result in a near-BigQuery-style performance for a wide range of Hadoop users. That would enable ad hoc and interactive data querying for large datasets, something that requires sophisticated, large, traditional data warehouse setups. Hadoop clusters would instantly solve a highly significant use-case in many companies and potentially become dual use.
More and more frequently we see organizations make the mistake of mixing and confusing team roles on a data science or "big data" project - resulting in over-allocation of responsibilities assigned to data scientists. For example, data scientists are often tasked with the role of data engineer leading to a misallocation of human capital. Here the data scientist wastes precious time and energy finding, organizing, cleaning, sorting and moving data. The solution is adding data engineers, among others, to the data science team.
I've been impressed in recent months by the number and quality of free datascience/machine learning books available online. I don't mean free as in some guy paid for a PDF version of an O'Reilly book and then posted it online for others to use/steal, but I mean genuine published books with a free online version sanctioned by the publisher. That is, "the publisher has graciously agreed to allow a full, free version of my book to be available on this site."
In order to meet the challenges of Big Data, you must rethink data systems from the ground up. You will discover that some of the most basic ways people manage data in traditional systems like the relational database management system (RDBMS) is too complex for Big Data systems. The simpler, alternative approach is a new paradigm for Big Data. In this article based on chapter 1, author Nathan Marz shows you this approach he has dubbed the “lambda architecture.”
Sharing your scoops to your social media accounts is a must to distribute your curated content. Not only will it drive traffic and leads through your content, but it will help show your expertise with your followers.
How to integrate my topics' content to my website?
Integrating your curated content to your website or blog will allow you to increase your website visitors’ engagement, boost SEO and acquire new visitors. By redirecting your social media traffic to your website, Scoop.it will also help you generate more qualified traffic and leads from your curation work.
Distributing your curated content through a newsletter is a great way to nurture and engage your email subscribers will developing your traffic and visibility.
Creating engaging newsletters with your curated content is really easy.