The team behind Pivotal's GemFire in-memory transactional data store recently unveiled a new database solution powered by GemFire and Apache Spark, called SnappyData.
SnappyData is another recent example of Spark employed as a component in a larger database solution, with or without other pieces from Apache Hadoop.
Amazon Kinesis Firehose, the easiest way to load streaming data into AWS, now supports Amazon Elasticsearch Service as a data delivery destination. You can now use Amazon Kinesis Firehose to stream data to your Amazon Elasticsearch domains continuously and in near real time. Amazon Kinesis Firehose automatically scales to match the throughput of your data and handles all the underlying stream management. For more information, see the Amazon Kinesis Firehose website and developer guide.
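For readers who want to try this, here is a minimal sketch using boto3; it assumes a delivery stream (named "my-es-stream" here as a placeholder) has already been created with an Amazon Elasticsearch Service domain configured as its destination.

import json

import boto3

firehose = boto3.client("firehose", region_name="us-east-1")

event = {"user_id": 42, "action": "click", "ts": "2016-04-20T12:00:00Z"}

# Firehose buffers records and delivers them to the Elasticsearch domain
# in near real time; each record payload must be bytes.
firehose.put_record(
    DeliveryStreamName="my-es-stream",  # placeholder stream name
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)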
Cascading overload failures are widely found in large-scale parallel systems and remain a major threat to system reliability; therefore, they are of great concern to maintainers and managers of different systems. Accurate cascading failure prediction can provide useful information to help control networks. However, for a large, gradually growing network with increasing complexity, it is often impractical to explore the behavior of a single node from the perspective of failure propagation. Fortunately, overload failures that propagate through a network exhibit certain spatial-temporal correlations, which allows the study of a group of nodes that share common spatial and temporal characteristics. Therefore, in this study, we seek to predict the failure rates of nodes in a given group using machine-learning methods.
We simulated overload failure propagation in a weighted lattice network, starting from an attack on the center, and predicted the failure percentages of different groups of nodes separated from it by a given distance. The experimental results of a feedforward neural network (FNN), a recurrent neural network (RNN), and support vector regression (SVR) all show that these different models can accurately predict the shared behavior of nodes in a given group during cascading overload propagation.
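As a rough illustration of the regression setup, here is a minimal SVR sketch on synthetic data; the features and target below are invented stand-ins, since the study's exact inputs are not reproduced here.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 500
# Invented features: distance of the group from the attack center,
# mean node load, and mean link weight within the group.
X = np.column_stack([
    rng.integers(1, 20, n),      # group distance d
    rng.uniform(0.5, 2.0, n),    # mean load
    rng.uniform(0.1, 1.0, n),    # mean link weight
])
# Toy target: closer, more heavily loaded groups fail more often.
y = np.clip(1.0 / X[:, 0] * X[:, 1] + rng.normal(0, 0.05, n), 0, 1)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = SVR(kernel="rbf", C=10.0).fit(X_tr, y_tr)
print("R^2 on held-out groups:", model.score(X_te, y_te))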
(This is part 2 of a two-part series of blog posts about doing data science and engineering in a containerized world; see part 1 here.) Let's admit it: data scientists are developing some pretty sweet (and potentially valuable) models, optimizations, visualizations, etc. Unfortunately, many of these models will never make it into production.
EY and Adobe have today announced a new strategic alliance that will expand digital experience and web content services to help clients with their digital transformations. Adobe, a leader in digital marketing solutions, will team with EY to help companies improve cost efficiency and gain competitive advantage through digital transformation programs.
For the past three years, our smartest engineers at Databricks have been working on a stealth project. Today, we are unveiling DeepSpark, a major new milestone in Apache Spark. DeepSpark uses cutting-edge neural networks to automate the many manual processes of software development, including writing test cases, fixing bugs, implementing features according to specs, and reviewing pull requests (PRs) for their correctness, simplicity, and style.
XGBoost is a library designed and optimized for tree boosting. The gradient boosting trees model was originally proposed by Friedman et al. By embracing multi-threading and introducing regularization, XGBoost delivers higher computational power and more accurate predictions. More than half of the winning solutions in machine learning challenges hosted at Kaggle have adopted XGBoost (incomplete list). XGBoost provides native interfaces for C++, R, Python, Julia, and Java users. It is used in both data exploration and production scenarios to solve real-world machine learning problems.
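A minimal example of the native Python interface, using scikit-learn's built-in breast cancer dataset purely for illustration:

import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

dtrain = xgb.DMatrix(X_tr, label=y_tr)
dtest = xgb.DMatrix(X_te, label=y_te)

params = {
    "objective": "binary:logistic",
    "max_depth": 4,
    "eta": 0.1,        # learning rate
    "lambda": 1.0,     # L2 regularization, one of the additions noted above
    "nthread": 4,      # multi-threaded tree construction
}
bst = xgb.train(params, dtrain, num_boost_round=100,
                evals=[(dtest, "test")], verbose_eval=False)
preds = bst.predict(dtest)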
Pigeons in London have a bad reputation. Some people call them flying rats. And many blame them for causing pollution with their droppings. But now the birds are being used to fight another kind of pollution in this city of 8.5 million.
"The problem for air pollution is that it's been largely ignored as an issue for a long time," says Andrea Lee, with the London-based environmental organization ClientEarth. "People don't realize how bad it is, and how it actually affects their health."
London's poor air quality is linked to nearly 10,000 early deaths a year, Lee says, citing a report released by the mayor last year. If people were better informed about the pollution they're breathing, she says, they could pressure the government to do something about it.
Understanding the extremely variable, complex shape and venation characters of angiosperm leaves is one of the most challenging problems in botany. Machine learning offers opportunities to analyze large numbers of specimens, to discover novel leaf features of angiosperm clades that may have phylogenetic significance, and to use those characters to classify unknowns. Previous computer vision approaches have primarily focused on leaf identification at the species level. It remains an open question whether learning and classification are possible among major evolutionary groups such as families and orders, which usually contain hundreds to thousands of species each and exhibit many times the foliar variation of individual species. Here, we tested whether a computer vision algorithm could use a database of 7,597 leaf images from 2,001 genera to learn features of botanical families and orders, then classify novel images. The images are of cleared leaves, specimens that are chemically bleached, then stained to reveal venation. Machine learning was used to learn a codebook of visual elements representing leaf shape and venation patterns. The resulting automated system learned to classify images into families and orders with a success rate many times greater than chance. Of direct botanical interest, the responses of diagnostic features can be visualized on leaf images as heat maps, which are likely to prompt recognition and evolutionary interpretation of a wealth of novel morphological characters. With assistance from computer vision, leaves are poised to make numerous new contributions to systematic and paleobotanical studies.
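To make the codebook idea concrete, here is a schematic bag-of-visual-words pipeline in the same spirit (patch extraction, a learned codebook, histogram encoding, linear classification). It is a generic sketch with stand-in random arrays in place of cleared-leaf images, not the authors' implementation.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
images = [rng.rand(64, 64) for _ in range(40)]   # stand-in leaf images
labels = rng.randint(0, 4, size=40)              # stand-in family labels

# 1. Learn a codebook of visual elements from small image patches.
patches = np.vstack([
    extract_patches_2d(img, (8, 8), max_patches=50,
                       random_state=0).reshape(-1, 64)
    for img in images
])
codebook = KMeans(n_clusters=32, random_state=0, n_init=10).fit(patches)

# 2. Represent each image as a histogram over codebook elements.
def encode(img):
    p = extract_patches_2d(img, (8, 8), max_patches=50,
                           random_state=0).reshape(-1, 64)
    words = codebook.predict(p)
    return np.bincount(words, minlength=32) / len(words)

X = np.array([encode(img) for img in images])

# 3. Classify into families/orders with a linear classifier.
clf = LinearSVC().fit(X, labels)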
Anomaly detection is the process of identifying unexpected items or events in datasets that differ from the norm. In contrast to standard classification tasks, anomaly detection is often applied to unlabeled data, taking only the internal structure of the dataset into account. This challenge is known as unsupervised anomaly detection and arises in many practical applications, for example in network intrusion detection, fraud detection, and the life sciences and medical domains. Dozens of algorithms have been proposed in this area, but the research community still lacks a comparative universal evaluation as well as common publicly available datasets. These shortcomings are addressed in this study, in which 19 different unsupervised anomaly detection algorithms are evaluated on 10 different datasets from multiple application domains. By publishing the source code and the datasets, this paper aims to provide a new, well-founded basis for unsupervised anomaly detection research. Additionally, this evaluation reveals the strengths and weaknesses of the different approaches for the first time. Besides anomaly detection performance, computational effort, the impact of parameter settings, and global/local anomaly detection behavior are outlined. In conclusion, we give advice on algorithm selection for typical real-world tasks.
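For a flavour of what unsupervised detection looks like in practice, the sketch below scores synthetic, unlabeled data with two widely used detectors from scikit-learn, one local and density-based, one global and isolation-based. It illustrates the task, not the study's own benchmark code.

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)
normal = rng.normal(0, 1, size=(300, 2))
outliers = rng.uniform(-6, 6, size=(10, 2))
X = np.vstack([normal, outliers])   # no labels are used below

# Local Outlier Factor: a local, density-based detector.
lof_scores = LocalOutlierFactor(n_neighbors=20).fit(X).negative_outlier_factor_

# Isolation Forest: a global, ensemble-based detector.
iso_scores = IsolationForest(random_state=0).fit(X).score_samples(X)

# For both detectors, lower scores mean more anomalous.
print("most anomalous by LOF:", np.argsort(lof_scores)[:5])
print("most anomalous by iForest:", np.argsort(iso_scores)[:5])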
A South African team of neuromarketers and neuroscientists have announced the launch of the world’s first ever NeuroWine, a wine that was developed by taking the tools and technologies that are traditionally used in neuroscience and applying them to the art of wine-making.
Neural Sense, a local neuromarketing consultancy, partnered with Pieter Walser, a Cape winemaker from the BLANKBottle label, and, using neuroscience and biometric technologies, tested 21 different white and 20 different red wine varietals from a number of vineyards across the country. They assessed Walser's emotional and cognitive responses to each tasting experience to create the world's first NeuroWine (one bottle of red and one of white).
Dr David Rosenstein, from Neural Sense, explains. “One of the pieces of technology we used – known as electroencephalography or EEG – is a device which fits around the head and picks up the electrical activity on the surface of one’s scalp. It looks at how the brain is functioning and the associated brain waves, which in turn tells us various things about brain activity.
After three years of research into how it might accelerate its Bing search engine using field programmable gate arrays (FPGAs), Microsoft came up with a scheme that would let it lash Stratix V devices from Altera to the two-socket server nodes in the minimalist Open Cloud Servers it designed expressly for its hyperscale datacenters. These CPU-FPGA hybrids were rolled out into production earlier this year to accelerate Bing page rank functions, and Microsoft started hunting around for other workloads it could juice with FPGAs.
Deep learning was the next big job that Microsoft was pretty sure could benefit from FPGAs and, importantly, could do so within the constraints of its hyperscale infrastructure. Microsoft's systems have unique demands: the company builds systems, storage, and networks that must support many different kinds of workloads, all within specific power, thermal, and budget envelopes.
Pretty soon, any messaging app that doesn’t have a platform for bots will be seriously left behind. “Messengers are the new browsers and bots are the new websites,” as Kik‘s Mike Roberts puts it to me.
With this in mind, the messaging app that’s big with America’s youth has today launched a bot store and developer platform to support it.
I'm sure you have been hearing at least some of the hype over "containers" and Docker this past year. In fact, Bryan Cantrill (CTO at Joyent) and Ben Hindman (founder of Mesosphere) recently declared that 2015 was the "year of the container" (see their webinar here). So what's all the hype, and how does this relate to what's happening in the data science and engineering world?
If you have been living in a hole this past year, here is an introduction to containers along with some advantages of using them. Here, however, I am going to provide some resources for those wishing to containerize their data pipelines.
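As a small taste of what that looks like from the orchestration side, here is a sketch that drives one containerized pipeline step from Python with the docker SDK (pip install docker); the image name and paths are placeholders, not a specific project's setup.

import docker

client = docker.from_env()

# Run a one-off scoring job in an isolated container, mounting the data
# directory read-only so the job sees the same inputs everywhere it runs.
logs = client.containers.run(
    image="myorg/score-model:latest",                      # placeholder image
    command=["python", "score.py", "--input", "/data/batch.csv"],
    volumes={"/srv/data": {"bind": "/data", "mode": "ro"}},
    remove=True,   # clean up the container once the step finishes
)
print(logs.decode("utf-8"))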
Mathematical thinking dominates our understanding of the universe. Now network theorists have discovered the tipping points in the evolution of ideas that have shaped the modern mathematical landscape.
If the marketer's goal is to reach customers with the right message, in the right place, at the right time, it stands to reason that deeper insight into any of those dimensions could only be a good thing. Enter Adobe, which just rolled out a raft of new data-science tools designed to help make that happen.
Scheduled to be introduced Tuesday at the company's Adobe Summit event in Las Vegas, the services bring new algorithms to the Adobe Marketing Cloud with the goal of helping brands deliver optimal customer experiences.
In the Marketing Cloud's Adobe Experience Manager, for example, a new Smart Tag feature taps machine learning to help marketers find Creative Cloud assets such as photos or videos. Smart Tag is available now.
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
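One practical consequence of the sparsity-aware algorithm is that missing values need no imputation: XGBoost learns a default split direction for them. A minimal sketch on synthetic data, with NaNs standing in for missing entries:

import numpy as np
import xgboost as xgb

rng = np.random.RandomState(0)
X = rng.rand(200, 10)
X[rng.rand(200, 10) < 0.3] = np.nan        # 30% of entries missing
y = (np.nansum(X, axis=1) > 3.5).astype(int)

# Declaring the missing marker lets the sparsity-aware split finding
# route absent values along a learned default direction.
dtrain = xgb.DMatrix(X, label=y, missing=np.nan)
bst = xgb.train({"objective": "binary:logistic", "max_depth": 3},
                dtrain, num_boost_round=50)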
The Databox envisions an open-source personal networked device, augmented by cloud-hosted services, that collates, curates, and mediates access to an individual's personal data by verified and audited third party applications and services. The Databox will form the heart of an individual's personal data processing ecosystem, providing a platform for managing secure access to data and enabling authorised third parties to provide the owner with authenticated services, including services that may be accessed while roaming outside the home environment.
The Databox project will run for three years, starting in September 2016. It is funded under EPSRC's Trust, Identity, Privacy and Security in the Digital Economy theme. Industry partners include the BBC, BT, Microsoft Research, and Telefonica. The project is also supported by the Internet Society, Cornell Tech, and the Horizon Digital Economy Research Institute. The open source platform and app ecosystem code will appear in a repository on GitHub.
We propose there is a need for a technical platform enabling people to engage with the collection, management, and consumption of personal data, and that this platform should itself be personal, under the direct control of the individual whose data it holds. In what follows, we refer to this platform as the Databox, a personal, networked service that collates personal data and can be used to make those data available. While your Databox is likely to be a virtual platform, in that it will involve multiple devices and services, at least one instance of it will exist in physical form: a computing device with associated storage and networking, such as a home hub.
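To make the mediation idea concrete, here is a purely illustrative sketch of an owner-controlled access check; every name in it (Grant, Databox.read, and so on) is hypothetical and not the project's actual API.

from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    app_id: str          # a verified third-party application
    source: str          # e.g. "smart-meter" or "browsing-history"
    purpose: str         # what the owner authorised the data for


class Databox:
    def __init__(self):
        self._stores: dict[str, list] = {}   # source -> records
        self._grants: set[Grant] = set()

    def authorise(self, grant: Grant) -> None:
        """The owner explicitly grants an app access to one source."""
        self._grants.add(grant)

    def read(self, app_id: str, source: str, purpose: str) -> list:
        """Apps never touch stores directly; every read is checked, so it can be audited."""
        if Grant(app_id, source, purpose) not in self._grants:
            raise PermissionError(f"{app_id} has no grant for {source}/{purpose}")
        return list(self._stores.get(source, []))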
Recent innovations such as optogenetics and neuroimaging enable us to characterise the relationship between the activity of neurons and their system-level behaviour. These technical innovations call for theories that describe neuronal interactions and reveal the underlying principles. In physics, electromagnetism, the general theory of relativity, and quantum field theory have been consolidated within the rigorous and holistic framework of gauge theory. Here, we propose that a gauge-theoretic formalism in neuroscience might not only provide a quantitative framework for modelling neural activity but also show that neuronal dynamics across scales, from single neurons to population activity, are described by the same principle. This paper suggests that if we could formulate a gauge theory for the brain, or cast an existing theory as a gauge theory, then many aspects of neurobiology could be seen as consequences of fundamental invariance properties. This approach could clarify the intimate relationship between apparently distinct phenomena (e.g., action and perception) and, potentially, offer new tools for computational neuroscience and modelling.
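As a concrete illustration of the invariance principle the authors invoke (standard textbook electromagnetism, not material from the paper): under a local U(1) transformation

\[
  \psi(x) \to e^{iq\lambda(x)}\psi(x), \qquad
  A_\mu(x) \to A_\mu(x) - \partial_\mu \lambda(x),
\]

the Lagrangian built from the covariant derivative $D_\mu = \partial_\mu + iqA_\mu$,

\[
  \mathcal{L} = \bar{\psi}\left(i\gamma^\mu D_\mu - m\right)\psi
                - \tfrac{1}{4}F_{\mu\nu}F^{\mu\nu}, \qquad
  F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu,
\]

is left unchanged. The dynamics follow from the symmetry, which is the sense in which the authors hope that neuronal dynamics across scales might be consequences of a shared invariance.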