Version 8.0 of the Unicode Standard is now available. It includes 41 new emoji characters (including five modifiers for diversity), 5,771 new ideographs for Chinese, Japanese, and Korean, the new Georgian lari currency symbol, and 86 lowercase Cherokee syllables. It also adds letters to existing scripts to support Arwi (the Tamil language written in the Arabic script), the Ik language in Uganda, Kulango in the Côte d’Ivoire, and other languages of Africa. In total, this version adds 7,716 new characters and six new scripts.
Today I’m excited to announce the general availability of Spark 1.4! Spark 1.4 introduces SparkR, an R API targeted towards data scientists. It also evolves Spark’s DataFrame API with a large number of new features. Spark’s ML pipelines API first introduced in Spark 1.3 graduates from an alpha component. Finally, Spark Streaming and Core add visualization and monitoring to aid in production debugging. We’ll be publishing in-depth posts covering Spark’s new features over the coming weeks. Here I’ll briefly outline some of the major themes and features in this release.
Distributed systems are not strictly an engineering problem. It’s far too easy to assume a “backend” development concern, but the reality is there are implications at every point in the stack. Often the trade-offs we make lower in the stack in order to buy responsiveness bubble up to the top—so much, in fact, that it rarely doesn’t impact the application in some way. Distributed systems affect the user. We need to shift the focus from system properties and guarantees to business rules and application behavior. We need to understand the limitations and trade-offs at each level in the stack and why they exist. We need to assume failure and plan for recovery. We need to start thinking of distributed systems as a UX problem.
Spark is gaining wide industry adoption due to its superior performance, simple interfaces, and a rich library for analysis and calculation. Like many projects in the big data ecosystem, Spark runs on the Java Virtual Machine (JVM). Because Spark can store large amounts of data in memory, it has a major reliance on Java’s memory management and garbage collection (GC). New initiatives like Project Tungsten will simplify and optimize memory management in future Spark versions. But today, users who understand Java’s GC options and parameters can tune them to eek out the best the performance of their Spark applications. This article describes how to configure the JVM’s garbage collector for Spark, and gives actual use cases that explain how to tune GC in order to improve Spark’s performance. We look at key considerations when tuning GC, such as collection throughput and latency.
Have you ever wondered how Airbnb’s price tips for hosts works? In this dynamic pricing feature, we show hosts the probability of getting a booking (green for a higher chance, red for a lower chance), or predicted demand, and allow them to easily price their listings dynamically with a click of a button. Many features …
Artificial Neural Networks have spurred remarkable recent progress in image classification and speech recognition. But even though these are very useful tools based on well-known mathematical methods, we actually understand surprisingly little of why certain models work and others don’t. So let’s take a look at some simple techniques for peeking inside these networks.
An artificial intelligence system has for the first time reverse-engineered the regeneration mechanism of planaria—the small worms whose extraordinary power to regrow body parts has made them a research model in human regenerative medicine.
By David Smith I was on a panel back in 2009 where Bow Cowgill said, "The best thing about R is that it was written by statisticians. The worst thing about R is that it was written by statisticians." R is undeniably quirky — especially to computer scientists — and yet it has attracted a huge following for a domain-specific language, with more than two million users wordwide. So why has R become so successful, despite being outside the mainstream of programming languages? John Cook adeptly tackles that question in a 2013 lecture, "The R Language: The Good The Bad...
I’m pleased to announce that Microsoft has completed the acquisition of BlueStripe Software, a leading provider of application management technology. BlueStripe’s solution helps map, monitor and troubleshoot distributed applications across heterogeneous operating systems and multiple datacenter and cloud environments. BlueStripe is commonly used today by customers to extend the value of Microsoft System Center by adding application-aware infrastructure performance monitoring. More and more, busi
I’ve written a package for image processing in R, with the goal of providing a fast API in R that lets you do things in C++ if you need to. The package is called imager, and it’ on Github. The whole thing is based on CImg, a very nice C++ library for image processing by […]
On Monday, we compared the performance of several different ways of calculating a distance matrix in R. Now there's another method to add to the list: using GPU acceleration in R. A GPU is a dedicated, high-performance chip available on many computers today. Unlike the CPU, it's not used for general computations, but rather for specialized tasks that benefit from a massively multi-threaded architecture. Video-game graphics is the usual target for GPUs, but in recent years they've been used for certain high-performance computing tasks as well. The problem is that GPUs require specialized programming, and because they have limited access...
Sharing your scoops to your social media accounts is a must to distribute your curated content. Not only will it drive traffic and leads through your content, but it will help show your expertise with your followers.
How to integrate my topics' content to my website?
Integrating your curated content to your website or blog will allow you to increase your website visitors’ engagement, boost SEO and acquire new visitors. By redirecting your social media traffic to your website, Scoop.it will also help you generate more qualified traffic and leads from your curation work.
Distributing your curated content through a newsletter is a great way to nurture and engage your email subscribers will developing your traffic and visibility.
Creating engaging newsletters with your curated content is really easy.