Over the last year we have been experimenting with different data processing platforms, including Hadoop, HPCC, several graph databases and NoSQL databases. We selected a hybrid of Storm and HPCC: combined, they are a great match for our processing needs, and both have permissive open source licenses and good supporting resources.
Many Big Data articles fail to mention that Big Data applications have varying requirements, and that the different available toolsets suit different applications and circumstances. We went with HPCC partly because we have good C++ skills on our team and we liked the ECL language and the system architecture. We could see how our target workflows would map onto it (it helps that the LexisNexis team has addressed similar areas in the past, and it shows in the product), and we could easily see how to integrate the HPCC functionality with our C++, Java and Python tools.
If we were a pure Java house, then Hadoop, Cascading and Hive would be a much better match for non-real-time processing. Tools like Storm are better architected for real-time event processing (hence our hybrid choice). One of the essential lessons and takeaways is that each domain and circumstance has different requirements. You have to play with the technologies, test them against your workflows and see what works for your problem domain, performance requirements and team skills. For evaluating HPCC, a great resource is Arjuna's blog. You will find worked examples, discussion of trade-offs and approaches, and a lot of good information. If you are interested in learning about a great Big Data platform, it's a good place to start. Click on the image or the title to learn more.