The city is flanked on three sides by smog-trapping mountain ranges. There are numerous sources of foul air, and a multitude of subtle ways the chemicals interact with each other, which make it hard to identify what problems need fixing.
There was a lot of news about Spark’s ascension in the big data ranks this week, as well as some speculation. According to Cloudera’s Mike Olson, his company is widely embracing Spark — including to run Hive — but not in place of Impala.
The SIGFOX connected world, a fast-growing ecosystem connecting the world through the SIGFOX cellular connectivity for the IoT and M2M....
It is impossible to imagine the deployment of billions of connected devices if communication costs remain high, and with such volumes every square inch of silicon counts. The SIGFOX protocol is compatible with the majority of existing transceivers, making it easily accessible for module and device manufacturers.
Facebook has built its own networking switch and developed a Linux-based operating system to run it. The goal is to create networking infrastructure that mimics a server in terms of how it’s managed and configured.
Founded just two-and-a-half years ago, Qubole now has more than 30 customers running on Hadoop nodes that it manages, including prominent Web-based outfits like Pinterest and Quora, and a handful of online advertising specialists. Qubole, which maintains dual headquarters in Silicon Valley and Bangalore, runs its clusters on AWS and Google Cloud, and exposes a high-level Web interface for users to query their Hadoop clusters via Hive, Pig, Presto, and MapReduce.
The ongoing Big Data media hype stirs up a lot of passionate voices. There are naysayers (“it is nothing new”), doomsayers (“it will disrupt everything”), and soothsayers (e.g., Predictive Analytics experts). The naysayers are most bothersome, in my humble opinion. (Note: I am not talking about skeptics, whom we definitely and desperately need during any period of maximized hype!)
Elasticsearch, the company behind a very popular open source suite for indexing, searching and visualizing JSON documents, has raised a $70 million series C round of venture capital. Just more than two years since being founded, the company has raised $104 million.
Pinterest is no National Security Agency, but the company, which identifies itself as a “visual discovery tool,” has grown into a collector of plenty of information. Like Twitter, Facebook, Google, and other web giants, Pinterest has developed sophisticated systems for storing the data, but it’s also built a tool that lets lots of employees get at it.
In a blog today, Pinterest data engineer Mohammad Shahangian sheds light on the “self-serve platform” he and his colleagues have created for accessing data in Pinterest’s Hadoop clusters sitting in the Amazon Web Services public cloud.
Last year Glassdoor, an online job search engine and career community, incorporated a graph database providing members with real-time job recommendations. In March, it was reported that Cray CEO Pete Ungaro said that a Major League Baseball team had bought a Cray supercomputer, which has the capacity to quickly process vast amounts of data. Just last month, ProgrammableWeb reported that JustGiving, a leading social and charity giving platform, is building the new GiveGraph engine, which should be completed by the end of 2014. According to JustGiving, GiveGraph is the "world's largest graph of giving behavior and contains 44 million people, [tens of thousands] of causes and 111 million connections."
The possibility of equipping the postal network (vehicles, mailboxes, mail pieces and parcels, sorting centers, etc.) with low-cost sensors will exponentially expand the capability of postal operators to collect valuable data. These rich new data sources could help the Postal Service improve operational performance and customer service, create new products and services, and support more efficient decision-making processes. The “Internet of Postal Things”, experts say, could also have a positive spillover effect on other adjacent non-postal sectors, as the information collected by and for the Postal Service could be useful to others.
How should Argentina fans feel about all this, given the disappointment they’ve experienced in World Cups past and the hopes they’ve pinned on Messi this year? So far in the 2014 tournament, Messi has been erasing whatever gap there was between his Barcelona stats and his Argentina stats, with style. And that gap was never really as big as it appeared... In other words, if Barca-Messi and Argentina-Messi were two different people, even based solely on the stats recorded since 2010, there’s a good chance they’d be the two best players in the world.
Getting the right people together with the right technology poses a major challenge. Collaboration among those with different skillsets is crucial—alignment around clear goals can incent this collaboration. Executive sponsorship is essential. Because they were born from data, Internet businesses have a built-in advantage when it comes to employing it. Big data and analytics will continue to evolve, but they are here to stay.
Failure to take privacy seriously can put you at risk of fines or litigation, but the worst-case scenario involves negative publicity equating your company with a lack of concern over personal information (e.g. Target and Neiman Marcus). The Google ECJ case is an opportunity to strengthen relationships between clients and consumers by reconsidering how their data is managed. The winners of the digital age will be those who see privacy as an investment that secures profits and opens up privacy markets across the globe.
"eRegulations Insights builds on the success of organizations such as the Sunlight Foundation to free open data, promote public participation and help agencies uncover constructive feedback on specific issues," said Stephen Sorkin, chief strategy officer, Splunk. "As governments grapple with how best to enhance citizen participation and improve transparency, providing similar access to other open data sets could have a transformative impact on government at all levels. Regulators and citizens alike can empower themselves with data-driven debate, and we hope eRegulations Insights can help show just how powerful a little more transparency can really be."
Big data is no longer the sole domain of big companies. As the perception of big data moves from futuristic hype to real-world opportunity, the promise of improved decision making, increased operational efficiency and new revenue streams has more organizations actively engaging in data analysis projects than ever before. That no longer just means more enterprise organizations, either. Midmarket companies are jumping on the big data bandwagon in a big way. In fact, a recent survey by Competitive...
[The] objective (to use deep learning to wrestle the practically-unknowables down to knowables) seems to be the impetus behind a two-year-old US DoD Advanced Research Projects Agency initiative called Deep Exploration and Filtering of Text (DEFT)... DEFT aims to "analyze textual data at a scale beyond what humans could do by themselves....[DEFT is designed to enable] more efficient…processing [of] text information and…[greater] understanding [of] connections in text that might not be readily apparent to humans....[D]efense analysts [would be able] to efficiently investigate…more documents, which would enable discovery of implicitly expressed, actionable information within those documents."
DARPA's ability to deliver on this grand promise is still unproven. However, the range of deep-learning ML approaches included under DEFT is truly impressive. A partial list includes separate functional modules to detect anomalies, disfluency, ambiguity, vagueness, causal relations, person-relations, semantic equivalences, entailments and redundancies in textual corpora.
A data-driven approach to business is mandatory to stay competitive in your industry. How are companies harnessing this power to disrupt their industries? From data models to marketing, to applying big data to saving the environment, here are our favorite examples of using data to innovate and some tips to build a data-driven strategy for your company.