A concise look at the differences between how Spark and MapReduce manage cluster resources under YARN
The most popular Apache YARN application after MapReduce itself is Apache Spark. At Cloudera, we have worked hard to stabilize Spark-on-YARN (SPARK-1101), and CDH 5.0.0 added support for Spark on YARN clusters.
In this post, you’ll learn about the differences between the Spark and MapReduce architectures, why you should care, and how they run on the YARN cluster ResourceManager.
A recent paper with an innocent-sounding title is probably the biggest news in neural networks since the invention of the backpropagation algorithm. But what exactly does it all mean?
A recent paper, "Intriguing properties of neural networks" by Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow and Rob Fergus, a team that includes authors from Google's deep learning research project, outlines two pieces of news about the way neural networks behave that run counter to what we believed, and one of them is frankly astonishing.
Embracing the widely used JSON data-exchange format, the new version of the PostgreSQL open-source database takes aim at the growing NoSQL market of nonrelational data stores, notably the popular MongoDB. The first beta version of PostgreSQL 9.4, released Thursday, includes a number of new features that address the rapidly growing...
OpenBTS is a Unix application that uses a software radio to present a GSM air interface to standard 2G GSM handsets and uses a SIP softswitch or PBX to connect calls. (You might even say that OpenBTS is a simplified form of IMS that works with 2G feature-phone handsets.) The combination of the global-standard GSM air interface with low-cost VoIP backhaul forms the basis of a new type of cellular network that can be deployed and operated at substantially lower cost than existing technologies in many applications, including rural cellular deployments and private cellular networks in remote areas.
A unique Graphite dashboard that aims to be a general-purpose dashboard that looks nice and makes it easy to construct and edit dashboards through the UI. It also contains an advanced and unique graph editor and Graphite target expression / function editor. Other notable features are fast client-side rendering, select to zoom in, multiple y-axes and graph templating.
Cluster Monkey is an exclusive content-based site that speaks directly to the high performance computing (HPC) cluster market and community. We focus on benchmarks, tutorials, case studies, and how-to information that is useful to cluster users, administrators, purchasers, and designers.
We have released a Linux distribution of Neo4j 2.0.1 community on Windows Azure’s VM Depot website. Users of Windows Azure are now able to copy a platform image of Neo4j 2.0.1 directly from the VM Depot. Once provisioned, a fresh …
As we build products to eventually power Promoted Pins, it’s vital to maintain a no-fail reliable data infrastructure. Today we’re open sourcing Secor, a zero data loss log persistence service whose initial use case was to save logs produced by our monetization pipeline.
Deep learning made its name in 2012 when machine learning gurus used it to win a Kaggle competition. Recent success in such competitions suggests that deep learning is the most accurate machine learning method currently in use.
Spark 1.0.0 is a major release marking the start of the 1.X line. This release brings both a variety of new features and strong API compatibility guarantees throughout the 1.X line. Spark 1.0 adds a new major component, Spark SQL, for loading and manipulating structured data in Spark. It includes major extensions to all of Spark’s existing standard libraries (ML, Streaming, and GraphX) while also enhancing language support in Java and Python. Finally, Spark 1.0 brings operational improvements including full support for the Hadoop/YARN security model and a unified submission process for all supported cluster managers.
High Energy Physics (HEP) has been using Machine Learning (ML) techniques such as boosted decision trees and neural nets since the 90s. These techniques are now routinely used for difficult tasks such as the Higgs boson search. Nevertheless, formal connections between the two research fields are rather scarce, with some exceptions such as the AppStat group at LAL, founded in 2006. In collaboration with INRIA, AppStat promotes interdisciplinary research on machine learning, computational statistics, and high-energy particle and astroparticle physics.
When I was younger I enjoyed watching a television show called I’ve Got a Secret. The premise of the show was to have celebrity panel members guess the identity, occupation or accomplishment of a guest by asking a series of questions. The guests would oftentimes attempt to mislead or confuse [...]
After some delay (and because of helpful prompting by Giles Heywood and code contributions by John Harrison) d3Network now plays nicely with Shiny web apps. This means you can fully integrate R/D3.js network graphs into your web apps. Here is what one ...
I’m a pretty heavy Unix user and I tend to prefer doing things the Unix Way™, which is to say, composing many small command line oriented utilities. With composability comes power and with specialization comes simplicity. Sometimes, though, if two utilities are used together all the time, it makes sense to have either a utility that specializes in a very common use-case, or one utility that provides basic functionality from another utility.
For example, one thing that I find myself doing a lot is searching a directory recursively for files that contain an expression. Despite the fact that you can do this, specialized utilities such as ack have come up to simplify this style of querying. Turns out, there’s also power in not having to consult the man pages all the time. Another example is the interaction between uniq and sort. uniq presumes sorted data.…
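The remark that uniq presumes sorted data can be illustrated with a minimal Python sketch. It is not from the original post: `uniq_adjacent` is a hypothetical helper that mimics uniq's behaviour of collapsing only adjacent duplicate lines, using `itertools.groupby` from the standard library.

```python
from itertools import groupby


def uniq_adjacent(lines):
    """Collapse runs of adjacent equal lines, as uniq does (hypothetical helper)."""
    return [key for key, _ in groupby(lines)]


lines = ["b", "a", "b", "a", "a"]

# Unsorted input: only the adjacent pair of "a" lines is collapsed.
unsorted_result = uniq_adjacent(lines)

# Sorted input: equal lines become adjacent, so every duplicate is removed.
sorted_result = uniq_adjacent(sorted(lines))

print(unsorted_result)
print(sorted_result)
```

Sorting first is what makes the deduplication complete, which is why the two utilities are so often chained together.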