Big Data and Hadoop
Hadoop-related Big Data News and Articles
Curated by Cho Dong Hwan
Scooped by Cho Dong Hwan onto Big Data and Hadoop

Storm and Hadoop: Convergence of Big-Data and Low-Latency Processing · YDN Blog

Yahoo! engineering teams are developing technologies to enable Storm applications and Hadoop applications to be hosted on a single cluster.
• We have enhanced Storm to support Hadoop-style security mechanisms (including Kerberos authentication), enabling authorized Storm applications to access Hadoop datasets on HDFS and HBase.
• Storm is being integrated into Hadoop YARN for resource management. Storm-on-YARN enables Storm applications to utilize the computation resources in our tens of thousands of Hadoop computation nodes. YARN is used to launch the Storm application master (Nimbus) on demand, and enables Nimbus to request resources for Storm application slaves (Supervisors).

Scooped by Cho Dong Hwan

With Site Ai, Automated Insights Provides A Cliffs Notes Version Of Your Web Analytics | TechCrunch

Automated Insights, a startup that translates raw data into plain English, is launching a new product that could make analytics data a lot more accessible.
Luciano Lampi's curator insight, May 24, 8:33 AM

better than a dashboard!

Scooped by Cho Dong Hwan

Big Data Industry Atlas

Forbes published this chart based on Wikibon data: It’s an $18 billion industry heading to $50 billion in five years, according to tech researchers at Wikibon. Make note of the names in the inner circle.

 

The big data market is still taking shape. But soon (though not very soon), we’ll see some clear segments with leaders and challengers. And then we will see a lot of acquisitions and mergers.

Scooped by Cho Dong Hwan

Extending the Data Warehouse with Hadoop

Cloudera believes that the future of Hadoop is as a Platform for Big Data that will complement, not replace, existing data management systems, enabling new ways of interacting with large and diverse data sets. Last week, for example, Cloudera announced the general availability of Cloudera Impala, the industry’s first and only open source interactive SQL framework for the Hadoop platform. Through innovations like Impala, Hadoop presents exciting new opportunities for the enterprise.

Cho Dong Hwan's insight:

"Extending" Not "Replacing"

Rescooped by Cho Dong Hwan from Corporate Challenge of Big Data

Introducing: Project Open Data | The White House

Last week, President Obama launched the Administration's new Open Data Policy and Executive Order aimed at ensuring that data released by the government will be as accessible and useful as possible.

Project Open Data is an online, public repository intended to foster collaboration and promote the continual improvement of the Open Data Policy.


Via Ian Sykes
Scooped by Cho Dong Hwan

The Next Big Thing in Big Data: People Analytics

By combining data from both real and virtual worlds, we can now understand behavior at a previously unimaginable scale.

When we use data to uncover the workplace behaviors that make people effective, happy, creative, experts, leaders, followers, early adopters, and so on, we are using “people analytics.”

Jacek Bugajski's curator insight, May 18, 5:31 AM

People Analytics - hmmm... Great idea for companies ;) 

Scooped by Cho Dong Hwan

Software Defined Storage Startup PernixData Raises $20M

PernixData, a San Jose, Calif.-based storage software provider, is gearing up for the software-defined storage race. The startup is leveraging server-side flash in the hopes that the technology will give it a leg up on its competitors in the traditional enterprise information technology market.

Scooped by Cho Dong Hwan

Data Science Toolkit

This command-line toolkit helps extract text-based data from various sources. For example, the command "html2text http://nytimes.com | text2people" extracts the text from the front page of the New York Times and pipes it into a filter that keeps only people's names.

Happy Hacking! http://www.datasciencetoolkit.org/developerdocs
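
The two-stage pipeline quoted above (strip HTML down to text, then filter the text for people's names) can be imitated in a few lines of Python. This is only an illustrative sketch, not the toolkit's own implementation: the function names `html_to_text` and `text_to_people`, the tiny `FIRST_NAMES` list, and the "Firstname Lastname" pattern are all assumptions made up for this example.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Stage 1: reduce an HTML string to its visible text, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Hypothetical, deliberately tiny list of known first names (an assumption;
# a real name detector would need a far larger dictionary or a trained model).
FIRST_NAMES = {"Reed", "Noam", "Evan"}

# Two capitalized words in a row, e.g. "Reed Hastings".
NAME_PATTERN = re.compile(r"\b([A-Z][a-z]+) ([A-Z][a-z]+)\b")


def text_to_people(text):
    """Stage 2: keep capitalized word pairs whose first word is a known first name."""
    found = []
    for first, last in NAME_PATTERN.findall(text):
        full = f"{first} {last}"
        if first in FIRST_NAMES and full not in found:
            found.append(full)
    return found


if __name__ == "__main__":
    page = "<p>Netflix CEO Reed Hastings spoke.</p><p>Noam Ross wrote about New York.</p>"
    print(text_to_people(html_to_text(page)))
```

Chaining the two functions mirrors the shell pipe: the output of the first stage is the input of the second. The first-name lookup is what keeps a capitalized pair such as "New York" from being reported as a person.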

Scooped by Cho Dong Hwan

Top 3 R resources for beginners

The community team at Revolution Analytics has just updated this list of resources to learn about R on the Web. Included is this list of the top 3 resources for absolute beginners getting started with R:

 

Scooped by Cho Dong Hwan

Announcing Evan's Awesome A/B Tools

Today I am happy to announce a new suite of online statistics calculators, which I am hereby christening Evan's Awesome A/B Tools. I am calling these tools awesome because they are intuitive, visual, and easy-to-use. Unlike other online statistical calculators you've probably seen, they'll help you understand what's going on "under the hood" of common statistical tests, and by providing ample visual context, they make it easy for you to explain p-values and confidence intervals to your boss. (And they're free!)
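
To make "under the hood" concrete, here is a minimal Python sketch of one common statistical test behind A/B comparisons: a two-proportion z-test with a two-sided p-value and a 95% confidence interval for the difference in conversion rates. It is not taken from the calculators announced above; the function name `two_proportion_z_test` and the sample counts are assumptions chosen for illustration, and the normal approximation it relies on is only reasonable for fairly large samples.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B (normal approximation).

    Returns (z, p_value, (ci_low, ci_high)) where the interval is a 95%
    confidence interval for the difference p_b - p_a.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b

    # Under the null hypothesis both variants share one pooled rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pooled

    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))

    # The confidence interval uses the unpooled standard error.
    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    margin = 1.959964 * se_diff  # 97.5th percentile of the standard normal
    diff = p_b - p_a
    return z, p_value, (diff - margin, diff + margin)


if __name__ == "__main__":
    # Hypothetical counts: 100 of 1000 visitors convert on A, 150 of 1000 on B.
    z, p, (low, high) = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}, 95% CI for difference = ({low:.3f}, {high:.3f})")
```

A small p-value says the observed difference would be unusual if both variants truly converted at the same rate; the confidence interval adds a sense of how large the difference plausibly is.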

Scooped by Cho Dong Hwan

How Reed Hastings' busy 2005 winter vacation led Netflix to embrace big data

Netflix CEO thought he could do a better job at developing a recommendation algorithm than his engineers. He failed – and the episode shaped the way the company has looked at data ever since.
Scooped by Cho Dong Hwan

A guide to speeding up R code

Noam Ross recently shared a very useful guide to speeding up your R code. 

• Get a bigger computer (for example, renting an instance on the Amazon cloud for a few cents an hour)
• Use parallel programming techniques
• Using the R byte-compiler
• Profiling and benchmarking your code
• Using high-performance packages (like xts, for time series)
• And lastly, rewriting your code to use more efficient constructs

One other tip that can have some great performance benefits is linking R to parallel BLAS libraries (Revolution R does this by default). For more details on how to speed up your R code read Noam's excellent guide, linked below.

Noam Ross: FasteR! HigheR! StrongeR! - A Guide to Speeding Up R Code for Busy People

 
Scooped by Cho Dong Hwan

Friends don't let friends calculate p-values (without fully understanding them)

Scooped by Cho Dong Hwan

The Chief Data Officer Rises

For 2013 and beyond, experts are anticipating the advent of the role of Chief Data Officer to better understand when business units should be looking for answers in the company's data, treating data as a strategic asset.
Scooped by Cho Dong Hwan

What Do Scientific Studies Show?

Stories claiming to report useful scientific breakthroughs appear in the news media every day. But what use are they if they are so frequently reversed?
Scooped by Cho Dong Hwan

Apache Hive 0.11: Stinger Phase 1 Delivered | Hortonworks

As representatives of this open, community-led effort, we are very proud to announce the first release of the new and improved Apache Hive, version 0.11. This substantial release embodies the work of a wide group of people from Microsoft, Facebook, Yahoo, SAP and others.

As promised, we have delivered phase 1 of the Stinger Initiative in late spring. This release is another proof point that the open community can innovate at a rate unequaled by any proprietary vendor. As part of phase 1 we promised windowing, new data types, the optimized RC (ORC) file and base optimizations to the Hive query engine, and the community has delivered these key features.

Scooped by Cho Dong Hwan

Linfox crunches big data to keep trucks on time

Logistics giant Linfox is embarking on a big data-crunching exercise that will give its control centres the ability to predict hazards and help drivers navigate around them. 

The company is using an SAP HANA in-memory analytics database engine to crawl about 12 million real-time records generated by telematics equipment on a subset of its 5000-plus truck fleet.

Scooped by Cho Dong Hwan

10 things to know about linear model summary in R

R makes it easy to fit a linear model to your data. The hard part is knowing whether the model you've built is worth keeping and, if so, figuring out what to do next.

This is a post about linear models in R, how to interpret lm results, and common rules of thumb to help side-step the most common mistakes.
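
As a rough companion to the blurb above, the sketch below computes by hand a few of the quantities that a linear model summary reports for a one-predictor fit: the intercept, the slope, the residual standard error, and R-squared. It is written in Python rather than R and is not code from the linked post; the function name `simple_linear_fit` and the sample data are assumptions for illustration only.

```python
import math


def simple_linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_res = sum(r * r for r in residuals)          # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in ys)     # total variation

    return {
        "intercept": intercept,
        "slope": slope,
        # Share of the variation in y explained by the fitted line.
        "r_squared": 1 - ss_res / ss_tot,
        # Typical size of a residual, using n - 2 degrees of freedom.
        "residual_std_error": math.sqrt(ss_res / (n - 2)),
    }


if __name__ == "__main__":
    # Hypothetical data that is close to, but not exactly on, a straight line.
    fit = simple_linear_fit([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
    for name, value in fit.items():
        print(f"{name}: {value:.4f}")
```

A high R-squared on its own does not show that a model is worth keeping; looking at the residuals and their standard error alongside it is one of the basic checks.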

Scooped by Cho Dong Hwan

NYC announces new risk-based (data-mined) fire inspections

MAYOR BLOOMBERG AND FIRE COMMISSIONER CASSANO ANNOUNCE NEW RISK-BASED FIRE INSPECTIONS CITYWIDE BASED ON DATA MINED FROM CITY RECORDS 

Scooped by Cho Dong Hwan

Statistics vs Data Science vs BI (Just for Fun)

As someone who trained as a statistician, I've always struggled with that title. I love the rigor and insight that Statistics brings to data analysis, but let's face it: Statistics — the name — has always had a bit of a branding problem.
Scooped by Cho Dong Hwan

Dell takes SharePlex to Hadoop and beyond

Dell Software's (formerly Quest's) SharePlex replication tool for Oracle now works with Hadoop...or anything else that can talk to a JMS queue.
Scooped by Cho Dong Hwan

GraphLab picks up $6.75m from Madrona and NEA to bolster its ‘Hadoop for graphs’

GraphLab-the-company wants to capitalize on the success of GraphLab-the-open-source-project by building a commercial product for applying advanced machine-learning to massive graph datasets, referring to its platform as a “Hadoop but for graphs” on a high level. The company promises to continue actively supporting the open-source project.

Scooped by Cho Dong Hwan

Free Datascience books

I've been impressed in recent months by the number and quality of free datascience/machine learning books available online. I don't mean free as in some guy paid for a PDF version of an O'Reilly book and then posted it online for others to use/steal, but I mean genuine published books with a free online version sanctioned by the publisher. That is, "the publisher has graciously agreed to allow a full, free version of my book to be available on this site."

Scooped by Cho Dong Hwan

In a Big Data World, Don't Forget Experimentation

In the data world today, "big" dominates. But sometimes you don't need big. You need a small dose of exactly the right data. Data that bear precisely on the question at hand, that you understand deeply, and that you can trust. If such data are already at hand, great. But frequently they are not. And then, nothing beats a well-conceived, -designed, -controlled, -executed, and -analyzed experiment. Companies need to make sure experimentation is included in their "data toolkits," learn when to use it, and develop the skills to conduct effective experiments.

Scooped by Cho Dong Hwan

Tracking Hadoop Jobs from Your Mac: There’s an App for That

JobTracker.app is a Mac menu bar app interface to the Hadoop JobTracker. It provides Growl/Notification Center notices of starting, completed, and failed jobs and gives easy access to the detail pages of those jobs.

 
Scooped by Cho Dong Hwan

The Pitfalls of Prediction | National Institute of Justice

Although the science of prediction continues to improve, the work of making predictions in criminal justice is plagued by persistent shortcomings. Some stem from unfamiliarity with scientific strategies or an over-reliance on timeworn — but unreliable — prediction habits. If prediction in criminal justice is to take full advantage of the strength of these new tools, practitioners, analysts, researchers and others must avoid some commonplace mistakes and pitfalls in how they make predictions.
