OK, so you and your company have decided that Big Data is more than a buzzword, and that you’re going to jump in feet first. You’ve worked with various business leaders, planned out a pilot project, and set aside a budget. But then things come to a grinding halt! Where do you find the skills to actually deploy this?
Despite the activity around Big Data, there is still a significant shortage of skilled professionals who can truly be called Data Scientists: people who can evaluate business needs and impact, write the algorithms, and program platforms such as Hadoop.
The Hadoop framework is broad, and it brings a whole new menagerie of jargon and projects: HDFS, HBase, Hive, Pig, ZooKeeper, MapReduce, and R, just to name a few.
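To give a flavor of the MapReduce model at the heart of that list, here is a minimal single-machine sketch of a word count, the canonical MapReduce example. The function and variable names are our own; a real Hadoop job runs these phases distributed across a cluster, with HDFS supplying the input splits.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(line):
    # Map: emit a (key, value) pair for every word seen.
    for word in line.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    # Reduce: combine all the values that share a key.
    return (word, sum(counts))

lines = ["Big Data is big", "Hadoop handles big data"]

# Shuffle/sort: group the mapped pairs by key, as the framework would.
pairs = sorted(p for line in lines for p in map_phase(line))
result = dict(
    reduce_phase(word, (count for _, count in group))
    for word, group in groupby(pairs, key=itemgetter(0))
)
print(result)  # {'big': 3, 'data': 2, 'hadoop': 1, 'handles': 1, 'is': 1}
```

The appeal of the model is that the map and reduce functions know nothing about the cluster; the framework handles the distribution, sorting, and fault tolerance around them.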
During my trip to the Bay Area this week, I was very encouraged to hear from and speak with several companies that have taken some very positive steps toward helping the IT community bridge this gap.
The most significant ones, to me, are the companies providing libraries or interfaces that let traditional database administrators (DBAs), who have spent years learning, honing, and perfecting their skills on well-known platforms such as Oracle and IBM DB2, put those skills to work on Big Data. These traditional database platforms, known as relational database management systems (RDBMS), all use a language called SQL (Structured Query Language). Some Big Data companies are beginning to look at ways of taking those same SQL queries and running them on Hadoop.
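To make the appeal concrete, here is the kind of declarative query a DBA writes every day, shown against an in-memory SQLite database standing in for a traditional RDBMS. The table and column names are invented for illustration; the point of the SQL-on-Hadoop efforts is to accept this familiar syntax over data stored in HDFS instead.

```python
import sqlite3

# In-memory database standing in for a traditional RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0)],
)

# A routine aggregation query: this is the skill set DBAs already have.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 150.0), ('west', 250.0)]
```

If a Hadoop cluster can answer that same statement, years of hard-won SQL expertise carry over without retraining the whole team on MapReduce.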
Now, the RDBMS and SQL approach runs somewhat against the principles that led to Hadoop being created in the first place: the requirement to define a structure (known as a schema) for the data before it is stored. The basic idea behind Big Data systems is to break down these rigid, predefined schemas so that data can be queried and analyzed along any number of dimensions.
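A small sketch can show the contrast. Here, a hypothetical raw event log is stored as self-describing JSON lines with no schema declared up front (the field names and values are made up); the structure is applied only at query time, so the same data could just as easily be analyzed by user, by item, or by any other field a record happens to carry.

```python
import json
from collections import Counter

# Hypothetical raw event log: no predefined schema, and records are
# free to carry different fields.
raw_lines = [
    '{"user": "amy", "action": "click", "page": "/home"}',
    '{"user": "bob", "action": "buy", "item": "book", "price": 12.5}',
    '{"user": "amy", "action": "click", "page": "/cart"}',
]

# Schema-on-read: structure is imposed at query time, not at load time.
events = [json.loads(line) for line in raw_lines]
clicks_per_page = Counter(
    event["page"] for event in events if event.get("action") == "click"
)
print(clicks_per_page)  # one click each on /home and /cart
```

In an RDBMS, the purchase record's extra `item` and `price` fields would have forced a schema change; here they simply ride along until some query wants them.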