Automated Insights, a startup that translates raw data into plain English, is launching a new product that could make analytics data a lot more accessible.
Forbes published this chart based on Wikibon data: It’s an $18 billion industry heading to $50 billion in five years, according to tech researchers at Wikibon. Make note of the names in the inner circle. The big data market is still taking shape. But soon (though not very soon), we’ll see some clear segments with leaders and challengers. And then we will see a lot of acquisitions and mergers.
Cloudera believes that the future of Hadoop is as a Platform for Big Data that will complement, not replace, existing data management systems, enabling new ways of interacting with large and diverse data sets. Last week, for example, Cloudera announced the general availability of Cloudera Impala, the industry’s first and only open source interactive SQL framework for the Hadoop platform. Through innovations like Impala, Hadoop presents exciting new opportunities for the enterprise.
Last week, President Obama launched the Administration's new Open Data Policy and Executive Order aimed at ensuring that data released by the government will be as accessible and useful as possible. Project Open Data is an online, public repository intended to foster collaboration and promote the continual improvement of the Open Data Policy.
Via Ian Sykes
By combining data from both real and virtual worlds, we can now understand behavior at a previously unimaginable scale. When we use data to uncover the workplace behaviors that make people effective, happy, creative, experts, leaders, followers, early adopters, and so on, we are using “people analytics.”
PernixData, a San Jose, Calif.-based storage software provider, is gearing up for the software-defined storage race. The startup is leveraging server-side flash in the hopes that the technology will give it a leg up on its competitors in the traditional enterprise information technology market.
The community team at Revolution Analytics has just updated this list of resources to learn about R on the Web. Included is this list of the top 3 resources for absolute beginners getting started with R:
Today I am happy to announce a new suite of online statistics calculators, which I am hereby christening Evan's Awesome A/B Tools. I am calling these tools awesome because they are intuitive, visual, and easy-to-use. Unlike other online statistical calculators you've probably seen, they'll help you understand what's going on "under the hood" of common statistical tests, and by providing ample visual context, they make it easy for you to explain p-values and confidence intervals to your boss. (And they're free!)
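As a rough illustration of what sits "under the hood" of a common A/B test, here is a minimal sketch of a two-proportion z-test in Python. It is not the code behind the calculators described above; the function name, the fixed 1.96 critical value (a 95% interval), and the visitor counts are all assumptions made for the example.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Compare conversion rates of variants A and B (hypothetical helper).

    Returns (z, two-sided p-value, confidence interval for rate_b - rate_a).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    diff = rate_b - rate_a

    # Pooled standard error: used for the test statistic under the
    # null hypothesis that both variants share one conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled

    # Two-sided p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))

    # Unpooled standard error: used for the interval around the difference.
    se_unpooled = math.sqrt(
        rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b
    )
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return z, p_value, ci

# Made-up counts: 100 of 1000 visitors converted on A, 150 of 1000 on B.
z, p_value, ci = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

If the interval excludes zero and the p-value is small, the difference between the variants is unlikely to be noise alone. The normal approximation used here assumes reasonably large samples.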
Netflix's CEO thought he could do a better job at developing a recommendation algorithm than his engineers. He failed – and the episode shaped the way the company has looked at data ever since.
Noam Ross recently shared a very useful guide to speeding up your R code. Its suggestions: get a bigger computer (for example, renting an instance on the Amazon cloud for a few cents an hour); use parallel programming techniques; use the R byte-compiler; profile and benchmark your code; use high-performance packages (like xts, for time series); and lastly, rewrite your code to use more efficient constructs. One other tip that can have some great performance benefits is linking R to parallel BLAS libraries (Revolution R does this by default). For more details on how to speed up your R code, read Noam's excellent guide, linked below. Noam Ross: FasteR! HigheR! StrongeR! - A Guide to Speeding Up R Code for Busy People
For 2013 and beyond, experts are anticipating the advent of the role of Chief Data Officer to better understand when business units should be looking for answers in the company's data, treating data as a strategic asset.
Stories claiming to report useful scientific breakthroughs appear in the news media every day. But what use are they if they are so frequently reversed?
As representatives of this open, community-led effort, we are very proud to announce the first release of the new and improved Apache Hive, version 0.11. This substantial release embodies the work of a wide group of people from Microsoft, Facebook, Yahoo, SAP and others. As promised, we have delivered phase 1 of the Stinger Initiative in late spring. This release is another proof point that the open community can innovate at a rate unequaled by any proprietary vendor. As part of phase 1 we promised windowing, new data types, the optimized RC (ORC) file and base optimizations to the Hive query engine, and the community has delivered these key features.
Logistics giant Linfox is embarking on a big data-crunching exercise that will give its control centres the ability to predict hazards and help drivers navigate around them. The company is using an SAP HANA in-memory analytics database engine to crawl about 12 million real-time records generated by telematics equipment on a subset of its 5000-plus truck fleet.
R makes it easy to fit a linear model to your data. The hard part is knowing whether the model you've built is worth keeping and, if so, figuring out what to do next. This is a post about linear models in R, how to interpret lm results, and common rules of thumb to help side-step the most common mistakes.
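To make the idea of judging a fitted line concrete, here is a minimal sketch of a one-predictor least-squares fit with its R-squared. It is written in plain Python rather than R and is not the linked post's code; the function name and the sample data are assumptions made for the example. In R itself, lm and summary report these quantities along with much more.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (hypothetical helper).

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Slope is the covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared: share of the variance in y that the fitted line explains.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared

# Made-up data: a roughly linear relationship with some noise.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
slope, intercept, r_squared = fit_line(xs, ys)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r_squared:.3f}")
```

A high R-squared alone does not show that a model is worth keeping. Checking the residuals for patterns is a common next step.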
MAYOR BLOOMBERG AND FIRE COMMISSIONER CASSANO ANNOUNCE NEW RISK-BASED FIRE INSPECTIONS CITYWIDE BASED ON DATA MINED FROM CITY RECORDS
As someone who trained as a statistician, I've always struggled with that title. I love the rigor and insight that Statistics brings to data analysis, but let's face it: Statistics — the name — has always had a bit of a branding problem.
Dell Software's (formerly Quest's) SharePlex replication tool for Oracle now works with Hadoop...or anything else that can talk to a JMS queue.
GraphLab-the-company wants to capitalize on the success of GraphLab-the-open-source-project by building a commercial product for applying advanced machine-learning to massive graph datasets, referring to its platform as a “Hadoop but for graphs” on a high level. The company promises to continue actively supporting the open-source project.
I've been impressed in recent months by the number and quality of free datascience/machine learning books available online. I don't mean free as in some guy paid for a PDF version of an O'Reilly book and then posted it online for others to use/steal, but I mean genuine published books with a free online version sanctioned by the publisher. That is, "the publisher has graciously agreed to allow a full, free version of my book to be available on this site."
In the data world today, "big" dominates. But sometimes you don't need big. You need a small dose of exactly the right data. Data that bear precisely on the question at hand, that you understand deeply, and that you can trust. If such data are already at hand, great. But frequently they are not. And then, nothing beats a well-conceived, -designed, -controlled, -executed, and -analyzed experiment. Companies need to make sure experimentation is included in their "data toolkits," learn when to use it, and develop the skills to conduct effective experiments.
JobTracker.app is a Mac menu bar app interface to the Hadoop JobTracker. It provides Growl/Notification Center notices of starting, completed, and failed jobs and gives easy access to the detail pages of those jobs.
Although the science of prediction continues to improve, the work of making predictions in criminal justice is plagued by persistent shortcomings. Some stem from unfamiliarity with scientific strategies or an over-reliance on timeworn — but unreliable — prediction habits. If prediction in criminal justice is to take full advantage of the strength of these new tools, practitioners, analysts, researchers and others must avoid some commonplace mistakes and pitfalls in how they make predictions.