Bits 'n Pieces on Big Data
Innovative information and insight into Big Data (if you like the content, please consider donating to my bitcoin address #3Pjof6N9xRAYXXSPZ4EAFLfHGn51ZdPcxi)
Curated by onur savas
Scooped by onur savas

"Data Science: Where are We Going?" - Dr. DJ Patil (Strata + Hadoop 2015) - YouTube

Data Science, where are we going? What impact can we expect? With a special introduction from President Barack Obama. Watch more from Strata + Hadoop San Jos...
onur savas's insight:

This is the most interesting presentation, but there is much more material from this year's Strata: http://strataconf.com/big-data-conference-ca-2015/public/content/home

Rescooped by onur savas from Big Data and NoSQL Daily

Apache Spark for Big Analytics

Via Simon Hunanyan
Simon Hunanyan's curator insight, December 23, 2013 10:09 PM

Spark, an Apache incubator project, is an open source distributed computing framework for advanced analytics in Hadoop. It is reported to run up to 100x faster than MapReduce. Spark includes a machine learning library (MLlib), a graph engine (GraphX), a streaming analytics engine (Spark Streaming) and much more...

Currently, Spark supports programming interfaces for Scala, Java and Python. The R interface is under development and is expected to be released in the first half of 2014.

Scooped by onur savas

Netflix open sources its data traffic cop, Suro

Netflix has open sourced a tool called Suro that collects event data from disparate application servers before sending it to other data platforms such as Hadoop and Elasticsearch.
Scooped by onur savas

Hadapt: Classifying the SQL-on-Hadoop Solutions

Given all of the "SQL-on-Hadoop" initiatives, now is a good time to classify them and study the similarities and differences between these approaches.
Scooped by onur savas

Twitter open sources Storm-Hadoop hybrid called Summingbird

Twitter has open sourced a “streaming MapReduce” system called Summingbird that makes Hadoop and Storm play nicer together so applications that require both batch and stream processing can do their jobs with as little complexity as possible.
webDEVILopers's curator insight, September 4, 2013 8:56 AM

In the case of Twitter, Hadoop handles batch processing, Storm handles stream processing, and the hybrid system is called Summingbird. It’s not a tool for every job, but it sounds pretty handy for those it’s designed to address. Hybrid systems like this are actually becoming more common as companies realize they can’t survive in a real-time world with Hadoop alone.
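
One way to picture a batch-plus-stream hybrid is to write the aggregation once and run it over both the stored data and the live data, then merge the two results. The Python sketch below is only an illustration of that idea under stated assumptions: it is not Summingbird's API (Summingbird is a Scala library), and the function names and sample records are hypothetical.

```python
from collections import Counter


def count_words(records):
    """Aggregation logic written once: count words across an iterable of text records."""
    counts = Counter()
    for text in records:
        counts.update(text.lower().split())
    return counts


def merge(batch_counts, stream_counts):
    """Combine partial results; counter addition is associative, so grouping does not matter."""
    return batch_counts + stream_counts


# Hypothetical data: older records already in batch storage, newer ones arriving live.
batch_records = ["hadoop handles batch", "storm handles streams"]
stream_records = ["hadoop and storm together"]

batch_view = count_words(batch_records)      # stands in for the batch (Hadoop-style) job
realtime_view = count_words(stream_records)  # stands in for the streaming (Storm-style) job
combined = merge(batch_view, realtime_view)

print(combined["hadoop"], combined["storm"], combined["handles"])
```

Because the merge is a plain sum, the combined view matches what a single pass over all records would produce.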

Scooped by onur savas

Facebook's trillion-edge, Hadoop-based and open source graph-processing engine

Facebook has detailed its extensive improvements to the open source Apache Giraph graph-processing platform. The project, which is built on top of Hadoop, can now process trillions of connections between people, places and things in minutes.
DG2's curator insight, September 3, 2013 8:10 AM

Many ML algorithms are most readily expressed in terms of local, vertex-centric computations. Think of label propagation, k-means, spectral clustering ... All in all, moving from vector spaces to graphs is the natural thing to do in many applications. Giraph provides an efficient way to run these kinds of algorithms on top of an existing Hadoop infrastructure.
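
To make "local, vertex-centric computation" concrete, here is a minimal single-machine sketch in Python. It is not Giraph code (Giraph's API is Java) and the function name and toy graph are hypothetical. It uses a simplified variant of label propagation, in which every vertex repeatedly adopts the smallest label it has seen, which ends up labelling each connected component.

```python
def propagate_min_label(adjacency, max_supersteps=50):
    """Vertex-centric minimum-label propagation.

    adjacency: dict mapping each vertex to a list of its neighbours.
    Every neighbour must also appear as a key (undirected graph).
    Returns a dict mapping each vertex to the smallest vertex id in its component.
    """
    labels = {v: v for v in adjacency}  # every vertex starts with its own id as label
    for _ in range(max_supersteps):
        # Message passing: each vertex sends its current label to its neighbours.
        inbox = {v: [] for v in adjacency}
        for v, neighbours in adjacency.items():
            for n in neighbours:
                inbox[n].append(labels[v])
        # Compute: each vertex uses only its own label and its own inbox.
        new_labels = {v: min([labels[v]] + inbox[v]) for v in adjacency}
        if new_labels == labels:  # no vertex changed, so stop early
            break
        labels = new_labels
    return labels


# Toy graph with two components: {1, 2, 3} and {4, 5}.
graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
components = propagate_min_label(graph)
print(components)
```

Each pass of the loop plays the role of one superstep: messages are exchanged, then every vertex updates itself from local information only, which is what lets such algorithms be distributed across machines.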

Scooped by onur savas

ClueWeb12 Dataset Manipulation Tool

This is a collection of tools for manipulating the ClueWeb12 collection.

"clueweb - Hadoop tools for manipulating ClueWeb collections"

onur savas's insight:

ClueWeb12 is a 27.6 TB dataset of web crawls (http://lemurproject.org/clueweb12/). This tool from Jimmy Lin makes it possible to bring it down to 860 GB by representing it as <doc id, term> vectors.
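
For a rough sense of scale, the two sizes quoted above can be compared directly. The short Python calculation below assumes decimal units (1 TB = 1000 GB); with binary units the ratio would come out slightly higher.

```python
TB_IN_GB = 1000  # assumption: decimal units, 1 TB = 1000 GB

original_gb = 27.6 * TB_IN_GB  # full ClueWeb12 collection, as quoted above
reduced_gb = 860               # size of the <doc id, term> vectors, as quoted above

ratio = original_gb / reduced_gb
fraction = reduced_gb / original_gb

print(f"{ratio:.1f}x smaller, {fraction:.1%} of the original size")
```

Under that assumption the vector representation is roughly 32 times smaller, about 3% of the original collection.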

Scooped by onur savas

Hadoop Summit, San Jose - June 26-27, 2013

The 6th Annual Hadoop Summit, the leading conference for the Hadoop community. A two-day event featuring Apache Hadoop thought leaders.
onur savas's insight:

The videos generally become available within a few days after the summit.

Scooped by onur savas

Netflix open sources its Hadoop manager for AWS

Netflix has open sourced its software to make running Hadoop jobs on the Amazon Web Services cloud as easy as possible.
Scooped by onur savas

The Atlas Distributed Data Management System and Databases

onur savas's insight:

They have chosen HBase over MongoDB and Cassandra.

Scooped by onur savas

H2O: Interactive Machine Learning for Hadoop

The open source math and prediction engine for Hadoop.

onur savas's insight:

This set of tools is claimed to be interactive. I have not installed or tried it yet.

Scooped by onur savas

Deploy and Manage Hadoop on OpenStack: Savanna

Savanna aims to provide users with a simple means to provision a Hadoop cluster by specifying several parameters such as Hadoop version, cluster topology, node hardware details and a few more. After the user fills in all the parameters, Savanna deploys the cluster in a few minutes. It also provides a means to scale an already provisioned cluster by adding or removing worker nodes on demand.

onur savas's insight:

Savanna is going to be part of OpenStack. It was presented a few days ago at the "Unconference track" on April 15th, 2013 during the OpenStack Summit in Portland. A great feature: both deployment and monitoring tools will be installed on stand-alone VMs, thus allowing a single instance to manage/monitor several clusters at once.

Scooped by onur savas

Hadoop at a Crossroads?

A few facts and opinions and a couple of announcements, with a prediction on where the "Hadoop stack" might be going.
Scooped by onur savas

Manifact - Real-time Metrics - Statistics - Alerts

Real-time Metrics, Statistics and Alerts. Works with MySQL, Oracle, PostgreSQL, and Hadoop.

Scooped by onur savas

SCAPE - SCAlable Preservation Environments

The SCAPE project will develop scalable services for planning and execution of institutional preservation strategies on an open source platform that orchestrates semi-automated workflows for large-scale, heterogeneous collections of complex digital objects. SCAPE will enhance the state of the art of digital preservation in three ways: by developing infrastructure and tools for scalable preservation actions; by providing a framework for automated, quality-assured preservation workflows; and by integrating these components with a policy-based preservation planning and watch system. These concrete project results will be validated within three large-scale Testbeds from diverse application areas.

onur savas's insight:

SCAPE Public Wiki: http://wiki.opf-labs.org/display/SP/Home

Also, upcoming "Hadoop Driven Digital Preservation" agenda: http://wiki.opf-labs.org/display/SP/Agenda+-+Hadoop+Driven+Digital+Preservation

Scooped by onur savas

From HadoopDB to Hadapt: A Case Study of Transitioning a VLDB paper...

VLDB 2013 Early Career Research Contribution Award Presentation Abstract: Four years ago at VLDB 2009, a paper was published about a research prototype.

onur savas's insight:

Daniel Abadi's Hadapt slides from VLDB.

Scooped by onur savas

Hortonworks Hadoop Sandbox

Learn Hadoop with Hortonworks Sandbox. A free download that comes with many interactive Hadoop tutorials.
Scooped by onur savas

Hadoop Security and Cloudera's new Role Based Access Control

Hadoop security and Cloudera's new Role Based Access Control project, Sentry. Topics: Hadoop, security, Cloudera, Sentry, MapReduce, Big Data.
Scooped by onur savas

SpatialHadoop

onur savas's insight:

Also check out Multi-Dimensional HBase (MD-HBase) from Divy Agrawal and his group (@UCSB): http://www.cs.ucsb.edu/~sudipto/papers/md-hbase.pdf

Scooped by onur savas

The Netflix Tech Blog: Introducing Lipstick on A(pache) Pig

We’re pleased to announce Lipstick (our Pig workflow visualization tool) as the latest addition to the suite of Netflix Open Source Software.

Rescooped by onur savas from Big Data Analysis in the Clouds

Cray integrates Hadoop Big Data analytics with supercomputers

Cray is bringing integrated open source Hadoop Big Data analytics software to its supercomputing platforms.

Via Pierre Levy
Scooped by onur savas

HIPI - Hadoop Image Processing Interface

HIPI - Hadoop Image Processing Interface is a Hadoop MapReduce library for performing image processing tasks in the Hadoop distributed computation framework.
Scooped by onur savas

On Big Data, Analytics and Hadoop: Interview with Daniel Abadi

onur savas's insight:

An interview with Daniel Abadi on Big Data, Analytics and Hadoop. Prof. Abadi is an Associate Professor of Computer Science at Yale University, and Chief Scientist and Co-founder of Hadapt (http://hadapt.com/).
