Big Data, Statistics and Machine Learning
Scooped by Flavio Barros

ŷhat | Rodeo: A data science IDE for Python

Today we're excited to introduce a new project: Rodeo. Rodeo is an IDE that's
built expressly for doing data science in Python. Think of it as a
lightweight alternative to the IPython Notebook.
We've been using it for projects internally, but today we're releasing it to
the public! We hope you like it as much as we do.

A quick overview.
Why we built it
I like the IPython Notebook for presentations ...

Finding clusters of CRAN packages using igraph

Flavio Barros's insight:

In this follow-up to the graph-based analysis of R packages, the packages were grouped by community detection, and for each group the packages with the highest PageRank scores are shown. The results are very good, as you can see.

Finding the essential R packages using the pagerank algorithm

by Andrie de Vries. A few weeks ago Joseph Rickert wrote an excellent post about using the igraph package, illustrating many concepts of using graphs. His post reminded me of another excellent blog entry by Antonio Piccolboni where he used the page.rank() function in the igraph package to determine the essential R packages. Unfortunately Antonio does not show the code he used, which intrigued me enough to recreate his analysis. In this post I illustrate: (1) using the miniCRAN package to build a graph of package dependencies (see previous blog post); (2) using page.rank() to compute the most relevant packages. Incidentally, I...
Flavio Barros's insight:

A very interesting analysis that shows the most relevant R packages according to the PageRank algorithm.
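
To make the idea concrete, here is a minimal Python sketch of PageRank by power iteration on a dependency graph. It is not the code from the linked post, which uses miniCRAN and igraph in R; the `pagerank` helper and the toy `deps` graph are assumptions made up for illustration.

```python
def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank; graph maps a node to the nodes it links to."""
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        # Every node passes an equal share of its rank to each node it links to.
        for source, targets in graph.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical toy graph: an edge A -> B means "package A depends on package B".
deps = {
    "ggplot2": ["plyr", "scales", "MASS"],
    "plyr": ["Rcpp"],
    "scales": ["plyr", "Rcpp"],
    "dplyr": ["Rcpp"],
    "MASS": [],
    "Rcpp": [],
}

ranks = pagerank(deps)
for name, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{name:8s} {score:.3f}")
```

Because edges point from a package to what it depends on, rank accumulates on the packages that many others rely on, which is the sense of "essential" used in the post.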

GraphLab | GraphLab Create™: Release Notes

Flavio Barros's insight:

Version 1.0 of GraphLab Create has been released. The highlights are a deep learning toolkit and a RESTful API.

Reading the PNAD 2013 with R - Flavio Barros

With the recent error in the release of the PNAD 2013 results, the name of IBGE and also the results of this …
Flavio Barros's insight:

For anyone interested in using the PNAD 2013 in R, here is a solution.

The birth of a word

MIT researcher Deb Roy wanted to understand how his infant son learned language -- so he wired up his house with videocameras to catch every moment (with exceptions) of his son's life, then parsed 90,000 hours of home video to watch "gaaaa" slowly turn into "water." Astonishing, data-rich research with deep implications for how we learn.
Flavio Barros's insight:

He recorded more than 200 TB of video of his son's first steps. The conclusions and results of his research are surprising.

GraphLab thinks its new software can democratize machine learning

A Seattle-based machine learning startup called GraphLab is releasing the first official version of its software, which the company hopes can democratize an historically difficult space. Called Create, the software is focused on simplicity, speed and being able to handle a wide variety of applications.
Flavio Barros's insight:

I tested GraphLab and found it fast and easy to use. Let's see how much it will cost...

A startup is pushing an alternative to Facebook by showing how much info we really share

With a new tool for assigning personality profiles based on Facebook posts, a startup called Five is trying to demonstrate the types of inferences companies can make about consumers. The company hopes it will spur desire for a more-private alternative to today’s very public platforms.
Flavio Barros's insight:

This tool for analyzing your Facebook profile is very cool. Mine is there. Compare and see.

Introducing R for Big Data with PivotalR

Wouldn't it be great if there was a way to harness the familiarity and usability of a tool like R, and at the same time take advantage of the performance and scalability benefits of in-database/in-Hadoop computation? We're happy to announce PivotalR, a package that translates R code into SQL for processing, is available to download from GitHub today.
Flavio Barros's insight:

An interesting solution for scaling R.

Large Scale Machine Learning with Apache Spark

Spark offers a number of advantages over its predecessor MapReduce that make it ideal for large-scale machine learning. For example, Spark includes MLlib, a library of machine learning algorithms for large data. The presentation will cover the state of MLlib and the details of some of the scalable algorithms it includes, mainly K-means.
Flavio Barros's insight:

A presentation showing the use of Spark's MLlib, given by Cloudera.
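
For readers who have not met K-means before, the sketch below shows the basic algorithm (Lloyd's iteration) in plain single-machine Python. It is not MLlib's implementation, which distributes these steps across a cluster; the `kmeans` helper and the sample points are assumptions made up for illustration.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm for 2-D points given as (x, y) tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean_x = sum(x for x, _ in cluster) / len(cluster)
                mean_y = sum(y for _, y in cluster) / len(cluster)
                new_centers.append((mean_x, mean_y))
            else:
                new_centers.append(centers[i])  # an empty cluster keeps its center
        if new_centers == centers:
            break  # assignments can no longer change
        centers = new_centers
    return centers, clusters


# Hypothetical data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6), (10.0, 10.0), (10.4, 9.7), (9.8, 10.5)]
centers, clusters = kmeans(points, k=2)
print(sorted(centers))
```

The assignment step is independent per point and the update step is a per-cluster average, which is why the algorithm maps naturally onto a distributed engine such as Spark.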

Got a Minute? Spin up a Spark cluster on your laptop with Docker.

Apache Spark and Shark have made data analytics faster to write and faster to run on clusters. This post will teach you how to use Docker to quickly and automatically install, configure and deploy Spark and Shark as well. How…
Flavio Barros's insight:

In addition to Ferry, in this post the author shows how to create a multi-node development environment for Spark using Docker.

Ferry | Big Data Development Environment Using Docker

Ferry helps developers provision and deploy big data applications
using Docker. Ferry supports Hadoop, Cassandra, GlusterFS, and Open MPI.

Flavio Barros's insight:

Hadoop has an installation mode that lets you run it on a single machine, for example your laptop. However, for those learning the technology, it would be more useful to have access to a setup closer to what is found in practice. Unless you have access to a Hadoop cluster, one way to "improvise" a cluster on your laptop would be to create several virtual machines and network them together. That strategy is not very good either, since each virtual machine is heavy and a "cluster" with even a few nodes quickly exhausts your computer's resources. This is where Docker saves the day: the OpenCore project lets you create a "cluster" of Docker containers. With just a few commands you can define a basic infrastructure to start exploring Hadoop without having to install a virtual machine.

Database vendor open sources Postgres-XL for scale-out workloads

A new open-source project called Postgres-XL is pushing scale-out and MPP capabilities for the popular database. Postgres-XL is the product of a database vendor called TransLattice and is based on technology it acquired from StormDB in October.
Flavio Barros's insight:

Postgres-XL makes it possible to scale Postgres so that the DBMS can also be used with large datasets in Big Data applications. The project has just been released as open source.

Dockerizing Shiny Apps - Flavio Barros

After a long break of more than four months, I am finally back to posting here. Unfortunately, various commitments kept me from posting, but I ended up giving the blog a makeover and changing the deployment (this blog now runs entirely inside a Docker container, along with some other cool things that I plan to post about further …

Revolution R Enterprise tutorial: Free 8h interactive tutorial on Big Data Analytics

In need of better ways to handle large data sets? Interested in manipulating, visualizing, and analysing large datasets with RevoScaleR? Then make sure to have a look at this free hands-on Revolution R Enterprise tutorial on Big Data Analytics by Revolution Analytics and DataCamp. Everything takes place in the online interactive learning interface of DataCamp, so no […]
The post Revolution R Enterprise tutorial: Free 8h interactive tutorial on Big Data Analytics appeared first on DataCamp Blog
Flavio Barros's insight:

A free 8-hour course on the Revolution Analytics platform. The course will be offered through DataCamp's online platform. I think it is well worth it for anyone who already uses R and would like to get to know Revolution for Big Data better.

Confidence vs. Credibility Intervals

Tomorrow, for the final lecture of the Mathematical Statistics course, I will try to illustrate - using Monte Carlo simulations - the difference between classical statistics and the Bayesian approach. The (simple) way I see it is the following: for frequentists, a probability is a measure of the frequency of repeated events, so the interpretation is that parameters are fixed (but unknown), and data are random; for Bayesians, a probability is a measure of the degree of certainty about values,
Flavio Barros's insight:

Very good for understanding the difference between confidence intervals and credible intervals.
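
The contrast can be sketched with a small Monte Carlo simulation in Python. This is not the code from the linked lecture; it assumes a normal sample with known standard deviation and a conjugate normal prior on the mean, and the function names and parameter values are made up for illustration.

```python
import random
import statistics


def intervals(sample, sigma, prior_mean, prior_sd, z=1.96):
    """Approximate 95% confidence and credible intervals for a normal mean (sigma known)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    # Frequentist: the parameter is fixed and the interval endpoints are random.
    half = z * sigma / n ** 0.5
    confidence = (xbar - half, xbar + half)
    # Bayesian: a normal prior on the mean gives a normal posterior (conjugacy).
    precision = 1 / prior_sd ** 2 + n / sigma ** 2
    post_mean = (prior_mean / prior_sd ** 2 + n * xbar / sigma ** 2) / precision
    post_half = z * precision ** -0.5
    credible = (post_mean - post_half, post_mean + post_half)
    return confidence, credible


def coverage(true_mean, sigma, n, reps, seed=1):
    """Fraction of repeated samples whose confidence interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        conf, _cred = intervals(sample, sigma, prior_mean=0.0, prior_sd=1.0)
        hits += conf[0] <= true_mean <= conf[1]
    return hits / reps


print(coverage(true_mean=2.0, sigma=1.0, n=10, reps=2000))
```

The coverage figure is a statement about repeated sampling with the parameter held fixed. The credible interval is instead a probability statement about the parameter given one observed sample and the chosen prior, so it shifts toward the prior mean and is narrower than the confidence interval.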

Video introduction to data manipulation with dplyr

Hadley Wickham's dplyr package is a great toolkit for getting data ready for analysis in R. If you haven't yet taken the plunge to using dplyr, Kevin Markham has put together a great hands-on video tutorial for his Data School blog, which you can see below. The video covers the five main data-manipulation "verbs" that dplyr provides: filter, select, arrange, mutate and summarise/group_by. (It also introduces the glimpse function, a handy alternative to str, that I had overlooked before.) The vid
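
As a rough analogy for readers coming from Python, the five verbs can be mimicked with plain data structures. This is not dplyr itself and not the code from the video; the `flights` rows and column names are hypothetical values made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows standing in for a data frame.
flights = [
    {"carrier": "AA", "distance": 1000, "air_time": 150, "dep_delay": 5},
    {"carrier": "AA", "distance": 2400, "air_time": 300, "dep_delay": 20},
    {"carrier": "UA", "distance": 700, "air_time": 120, "dep_delay": 0},
    {"carrier": "UA", "distance": 300, "air_time": 60, "dep_delay": 12},
    {"carrier": "DL", "distance": 600, "air_time": 90, "dep_delay": 3},
]

# filter: keep the rows that satisfy a condition.
long_haul = [row for row in flights if row["distance"] > 500]

# select: keep only some columns.
slim = [{key: row[key] for key in ("carrier", "distance", "air_time")} for row in long_haul]

# arrange: sort the rows.
ordered = sorted(slim, key=lambda row: row["distance"], reverse=True)

# mutate: add a column computed from existing ones (miles per hour).
with_speed = [{**row, "speed": row["distance"] * 60 / row["air_time"]} for row in ordered]

# group_by + summarise: collapse each group to a single value.
speeds_by_carrier = defaultdict(list)
for row in with_speed:
    speeds_by_carrier[row["carrier"]].append(row["speed"])
summary = {carrier: sum(v) / len(v) for carrier, v in speeds_by_carrier.items()}

print(summary)
```

Each step takes a table and returns a new table, which is what lets dplyr chain the verbs into a readable pipeline.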

How to publish R and ggplot2 to the web

by Matt Sundquist, Plotly Co-founder. It's delightfully smooth to publish R code, plots, and presentations to the web. For example: Shiny makes interactive apps from R; Pretty R highlights R code for HTML; Slidify makes slides from R Markdown; Knitr and RPubs let you publish R Markdown docs; GitHub and devtools let you quickly release packages and collaborate. Now, Plotly lets you collaboratively edit and publish interactive ggplot2 graphs using these tools. This post shows how. Find us on GitHub
Flavio Barros's insight:

For those who already use ggplot2 graphics, this is a very interesting way to turn them into interactive graphics for the web.

New data packages

I’ve released four new data packages to CRAN: babynames, fueleconomy, nasaweather and nycflights13. The goal of these packages is to provide some interesting, and relatively large, datasets to demonstrate various data analysis challenges in R. The package source code (on github, linked above) is fully reproducible so that you can see some data tidying in […]
Flavio Barros's insight:

There are some very interesting datasets here, now available on CRAN.

The dendextend package for visualizing and comparing trees of hierarchical clusterings (slides from useR!2014)

This week at useR!2014 I presented my package dendextend (also on github), for easily manipulating, visualizing, and comparing dendrograms. Put simply, it is a package designed to easily create figures like these: Here is my presentation from useR: You are also invited to take a look at the current version of the package vignettes: https://github.com/talgalili/dendextend/blob/master/vignettes/dendextend-tutorial.pdf I […]
Flavio Barros's insight:

A very interesting package for comparing hierarchical clusterings.

Discounted and free plans are available for educational use

Teach and learn better, together. Learn to ship software like a pro.
Flavio Barros's insight:

In software development, using a version control system is commonplace. Among researchers, statisticians, and data scientists, however, using such a tool is often a novelty. Git in particular is one of the best version control systems available today, created by none other than Linus Torvalds and used in the development of Linux. Using git to version projects, analyses, and even a thesis in LaTeX is great, but better still is to use it together with GitHub and collaborate with other people, while also keeping a solid backup for emergencies. The only problem with GitHub, at least for me, is that only public repositories are free. HOWEVER, if you are a student or a teacher and plan to host theses and papers on GitHub, privately and without paying anything, GitHub Education is a good option. Just a tip.

More than 250 million global events are now in the cloud for anyone to analyze

The Global Database of Events, Languages, and Tones is a growing trove of information about meaningful events that have happened across the world in the past three decades. Now, it’s available to the public to access and analyze using Google’s cloud computing services.
Flavio Barros's insight:

A database with more than 250 million events that have occurred around the world since 1979 is now available on the internet. Events are an indirect way of measuring activity in various parts of the world, and this database can be used for many kinds of analysis.

Big Data for Social Innovation (SSIR)

Nonprofits lag behind business and science in using big data effectively.
Flavio Barros's insight:

An interesting article on the relationship between Big Data and social innovation. The article was published in the Stanford Social Innovation Review.

Scientific Python Tips and Tricks - Scott Sievert

You want to pick up Python. But it’s hard and confusing to pick up a whole new
framework. You want to try and switch, but it’s too much effort and …
Flavio Barros's insight:

This tip is for physicists, mathematicians, engineers, and other professionals who already use Matlab or Mathematica day to day and would like to replace them with Python. Python is a versatile programming language and has been used in areas as diverse as astrophysics and machine learning.

6 Easy Steps: Deploy Pivotal’s Hadoop on Docker | Pivotal P.O.V.

Flavio Barros's insight:

Docker is a fantastic tool for creating software containers and using them right away. In this post Pivotal shows how to deploy its Hadoop distribution using Docker images.
