There is a lot of buzz around Big Data and the NoSQL movement today, and rightly so. The difficulties with data have essentially been two-fold: finding cost-effective ways to store ever-growing volumes of data, and finding ways to mine that data to extract meaningful Business Intelligence.
The problem has been compounded by the emergence of Web 2.0 technologies, whose legions of loyal users, numbering in the millions, generate copious amounts of data every minute; before you know it, you can have gigabytes or even terabytes of web data in a single day. Clearly, this demands a radical departure from the current state of the art in data storage and mining technologies.
While traditional IT shops outside the Web 2.0 mold may not face this kind of storage pressure, mining their data for meaningful intelligence remains a work in progress, and a major headache, no matter the size of the Data Warehouse. So even if you do not need to be on the bleeding edge with a grid-based MPP solution for your growing storage needs, you will certainly want to take a serious look at the emerging algorithm- and heuristics-driven data mining techniques led by Map/Reduce.
Map/Reduce may yet become the killer app that cures your Business Intelligence ailments. This is serious stuff: if Google has bet the house on it and made it the foundation of its search technology, you can assume it is very strong medicine.
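To see why the model is so appealing, here is a minimal, single-process sketch of the Map/Reduce pattern using the canonical word-count example. This is only an illustration of the paradigm: real frameworks like Hadoop distribute the map, shuffle and reduce phases across a cluster, and the function names here are invented for the sketch.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (key, value) pair for every word in the document.
    for word in document.lower().split():
        yield (word, 1)

def reduce_phase(pairs):
    # Shuffle: group all values by key; Reduce: sum the counts per word.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}

documents = ["big data big ideas", "data mining at scale"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(pairs)
# counts["big"] == 2, counts["data"] == 2
```

Because each mapper works on its own slice of the input and each reducer on its own key range, the same logic scales out to thousands of commodity machines with no change to the programming model.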
The limitations of using traditional relational database technology to serve Big Data warehousing (DW) needs are by now well known. It does not cope well with operations across databases, particularly when they span networks. Try performing a join between two database instances and you will see what I mean. To address these issues, there are custom solutions from vendors like Teradata and Netezza. The barrier to entry for adopting these systems remains quite high, however, in terms of license fees and setup and maintenance costs.
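For contrast, the Map/Reduce camp handles exactly this case with a "reduce-side join": each side of the join is mapped to (key, tagged record) pairs, and the reduce step pairs them up by key, wherever the records originally lived. Here is a toy, single-process sketch of the idea; the table shapes and names are invented for illustration and the grouping that a framework's shuffle phase performs is simulated in memory:

```python
from collections import defaultdict

# Toy rows standing in for tables held in two separate database instances.
customers = [(1, "Acme"), (2, "Globex")]                     # (customer_id, name)
orders = [(101, 1, 250.0), (102, 2, 75.0), (103, 1, 40.0)]   # (order_id, customer_id, total)

def map_join():
    # Map: tag every record with its source and emit it under the join key.
    for cid, name in customers:
        yield cid, ("customer", name)
    for oid, cid, total in orders:
        yield cid, ("order", (oid, total))

def reduce_join(pairs):
    # Shuffle: group tagged records by customer_id, then pair the two sides.
    grouped = defaultdict(list)
    for key, tagged in pairs:
        grouped[key].append(tagged)
    joined = []
    for cid, records in grouped.items():
        names = [v for tag, v in records if tag == "customer"]
        order_rows = [v for tag, v in records if tag == "order"]
        for name in names:
            for oid, total in order_rows:
                joined.append((name, oid, total))
    return sorted(joined)

result = reduce_join(map_join())
# result contains one row per matching (customer, order) pair
```

Nothing in the map or reduce functions cares which machine, or which database, a record came from; that indifference is precisely what makes the model attractive for data spread across a network.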
There is an alternative. We are now in the era of framework-based DW, DIY DW and DW in the Cloud. The new set of tools and technologies that has emerged has helped democratize a domain that was long the exclusive preserve of a few select vendors. The revolution was led by grid-based implementations adopted by the leading players: Google (Bigtable), Facebook (Cassandra) and Yahoo (Hadoop).
Hadoop has emerged as the most widely used Map/Reduce-based open source framework for Big Data, and many Internet majors have adopted the technology. Be aware that it is a framework, and it will take a good deal of customization and programming to get it to do what you want. If Hadoop is not to your taste, there are similar implementations like AsterData and GreenPlum, which work on the same concepts but can get you up and running very quickly with their own abstraction libraries, such as SQL-MR, and intelligent dashboards for easy configuration and maintenance. Another very appealing feature of these offerings is the ability to be hosted in the Cloud, so all your advanced analytic needs can be served off premises.
Broadly speaking, there are four general flavors to choose from when it comes to Big Data solutions:
* Custom-built Big Data frameworks like Teradata and the VLDB implementations from Oracle: proprietary frameworks designed to handle very large datasets. These frameworks are still very relational in orientation and are not built to deal with unstructured data sets.
* Data Warehouse Appliances like Oracle's Exadata. These introduce the idea of DW-in-a-box, where the entire stack needed for a typical DW implementation (the hardware, the software framework for the data store, and the advanced analytical tools) is vertically integrated and supplied by the same vendor as a packaged solution.
* Open source NoSQL-oriented Big Data frameworks like Hadoop and Cassandra. These frameworks implement advanced analytical and mining algorithms like Map/Reduce and are designed to be deployed on commodity hardware in an MPP architecture with large Master/Slave clusters. They are excellent at managing vast amounts of unstructured, text-oriented data.
* Commercial Big Data frameworks like AsterData and GreenPlum, which follow the same MPP paradigm but have implemented their own add-ons, such as SQL-MR and other optimizations, for faster analytics.