A universal need in understanding complex networks is the identification of individual information channels and their mutual interactions under different conditions. In neuroscience, our premier example, networks made up of billions of nodes dynamically interact to bring about thought and action. Granger causality is a powerful tool for identifying linear interactions, but handling nonlinear interactions remains an unmet challenge. We present a nonlinear multidimensional hidden state (NMHS) approach that achieves interaction strength analysis and decoding of networks with nonlinear interactions by including latent state variables for each node in the network. We compare NMHS to Granger causality in analyzing neural circuit recordings and simulations, improvised music, and sociodemographic data. We conclude that NMHS significantly extends the scope of analyses of multidimensional, nonlinear networks, notably in coping with the complexity of the brain.
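The linear baseline that NMHS is compared against can be sketched in a few lines: a pairwise Granger-style test asks whether the past of one series improves a linear prediction of another. The following plain-Python sketch is illustrative only (synthetic data, no F-test or lag selection, and it says nothing about NMHS itself, which adds latent states per node to capture nonlinear interactions):

```python
import random

def _lstsq(X, y):
    """Solve ordinary least squares via normal equations (Gaussian elimination)."""
    k = len(X[0])
    A = [[sum(row[r] * row[c] for row in X) for c in range(k)] for r in range(k)]
    b = [sum(row[r] * yi for row, yi in zip(X, y)) for r in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, k))) / A[r][r]
    return beta

def granger_gain(x, y):
    """Fractional drop in squared error from adding lagged x to an AR(1) model of y."""
    target = y[1:]
    def rss(rows):
        beta = _lstsq(rows, target)
        return sum((t - sum(b * v for b, v in zip(beta, row))) ** 2
                   for row, t in zip(rows, target))
    restricted = rss([[1.0, y[t - 1]] for t in range(1, len(y))])
    unrestricted = rss([[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))])
    return (restricted - unrestricted) / restricted  # near 0: no linear influence

# Synthetic check: x drives y with a one-step lag; z is unrelated noise.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(500)]
y = [0.0]
for t in range(1, 500):
    y.append(0.5 * y[-1] + 0.8 * x[t - 1] + random.gauss(0, 0.1))
z = [random.gauss(0, 1) for _ in range(500)]
print(granger_gain(x, y) > granger_gain(z, y))  # x explains y; z does not
```

The gain for the driving series should be large, and near zero for the unrelated one; nonlinear couplings, the case NMHS targets, are exactly where this linear score breaks down.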
Error rates across one of Facebook’s sites were spiking. The problem had first shown up through an automated alert triggered by an in-memory time-series database called Gorilla a few minutes after the problem started. One set of engineers mitigated the immediate issue. A second group set out to find the root cause. They fired up Facebook’s time series correlation engine built on top of Gorilla, and searched for metrics showing a correlation with the errors. This showed that copying a release binary to Facebook’s web servers (a routine event) caused an anomalous drop in memory used across the site…
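The core of such a correlation search is easy to sketch: score every candidate metric by the magnitude of its correlation with the error-rate series and rank. A minimal Python sketch with hypothetical metric names (Facebook's actual Gorilla-backed engine is proprietary and built for vastly larger scale):

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sqrt(sum((x - ma) ** 2 for x in a))
    vb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

def most_correlated(errors, metrics):
    """Metric names sorted by |correlation| with the error series, strongest first."""
    return sorted(metrics, key=lambda name: -abs(pearson(errors, metrics[name])))

errors = [1, 2, 8, 9, 9, 8, 2, 1]
metrics = {
    "free_memory":   [9, 8, 2, 1, 1, 2, 8, 9],   # drops exactly when errors spike
    "request_count": [5, 5, 5, 6, 5, 5, 5, 5],   # roughly flat
}
print(most_correlated(errors, metrics)[0])  # free_memory tracks the errors best
```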
An important problem in econometrics and marketing is to infer the causal impact that a designed market intervention has exerted on an outcome metric over time. In order to allocate a given budget optimally, for example, an advertiser must assess to what extent different campaigns have contributed to an incremental lift in web searches, product installs, or sales. This paper proposes to infer causal impact on the basis of a diffusion-regression state-space model that predicts the counterfactual market response that would have occurred had no intervention taken place. In contrast to classical difference-in-differences schemes, state-space models make it possible to (i) infer the temporal evolution of attributable impact, (ii) incorporate empirical priors on the parameters in a fully Bayesian treatment, and (iii) flexibly accommodate multiple sources of variation, including the time-varying influence of contemporaneous covariates, i.e., synthetic controls. Using a Markov chain Monte Carlo algorithm for model inversion, we illustrate the statistical properties of our approach on synthetic data. We then demonstrate its practical utility by evaluating the effect of an online advertising campaign on search-related site visits. We discuss the strengths and limitations of state-space models in enabling causal attribution in those settings where a randomised experiment is unavailable. The CausalImpact R package provides an implementation of our approach.
The team behind Pivotal's GemFire in-memory transactional data store recently unveiled a new database solution powered by GemFire and Apache Spark, called SnappyData.
SnappyData is another recent example of Spark employed as a component in a larger database solution, with or without other pieces from Apache Hadoop.
Amazon Kinesis Firehose, the easiest way to load streaming data into AWS, now supports Amazon Elasticsearch Service as a data delivery destination. You can now use Amazon Kinesis Firehose to stream data to your Amazon Elasticsearch domains continuously and in near real time. Amazon Kinesis Firehose automatically scales to match the throughput of your data and handles all the underlying stream management. For more information, see the Amazon Kinesis Firehose website and developer guide.
Cascading overload failures are widely found in large-scale parallel systems and remain a major threat to system reliability; therefore, they are of great concern to maintainers and managers of different systems. Accurate cascading failure prediction can provide useful information to help control networks. However, for a large, gradually growing network with increasing complexity, it is often impractical to explore the behavior of a single node from the perspective of failure propagation. Fortunately, overload failures that propagate through a network exhibit certain spatial-temporal correlations, which allows the study of a group of nodes that share common spatial and temporal characteristics. Therefore, in this study, we seek to predict the failure rates of nodes in a given group using machine-learning methods.
We simulated overload failure propagations in a weighted lattice network that start with a center attack and predicted the failure percentages of different groups of nodes that are separated by a given distance. The experimental results of a feedforward neural network (FNN), a recurrent neural network (RNN) and support vector regression (SVR) all show that these different models can accurately predict the similar behavior of nodes in a given group during cascading overload propagation.
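The pipeline above can be sketched end to end in miniature: generate (distance, tolerance) to group-failure-rate samples from a toy decay law standing in for the lattice cascade, then fit a small regressor to predict the failure rate of an unseen group. A k-nearest-neighbour regressor stands in here for the paper's FNN/RNN/SVR models purely to keep the sketch dependency-free; the decay law and all constants are invented for illustration:

```python
import math
import random

random.seed(0)

def simulate_group_failure(distance, tolerance):
    """Stand-in for the lattice cascade: the failure rate of a node group decays
    with its distance from the attacked center and with capacity tolerance."""
    rate = math.exp(-0.3 * distance) * (1 - tolerance) + random.gauss(0, 0.02)
    return min(1.0, max(0.0, rate))

# Training set: failure rates for groups at distances 1..9 under three tolerances.
train = [((d, a), simulate_group_failure(d, a))
         for d in range(1, 10) for a in (0.1, 0.3, 0.5)]

def predict(d, a, k=3):
    """k-NN regression over the (distance, tolerance) feature space."""
    nearest = sorted(train, key=lambda s: (s[0][0] - d) ** 2 + (s[0][1] - a) ** 2)[:k]
    return sum(rate for _, rate in nearest) / k

print(round(predict(4, 0.3), 2))  # prediction tracks the toy decay law
```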
(This is part 2 of a two-part series of blog posts about doing data science and engineering in a containerized world; see part 1 here.) Let's admit it: data scientists are developing some pretty sweet (and potentially valuable) models, optimizations, visualizations, etc. Unfortunately, many of these models will never…
EY and Adobe have today announced a new strategic alliance that will expand digital experience and web content services to help clients with their digital transformations. Adobe, a leader in digital marketing solutions, will team with EY to help companies improve cost efficiency and gain competitive advantage through digital transformation programs.
For the past three years, our smartest engineers at Databricks have been working on a stealth project. Today, we are unveiling DeepSpark, a major new milestone in Apache Spark. DeepSpark uses cutting-edge neural networks to automate the many manual processes of software development, including writing test cases, fixing bugs, implementing features according to specs, and reviewing pull requests (PRs) for their correctness, simplicity, and style.
XGBoost is a library designed and optimized for tree boosting. The gradient boosted trees model was originally proposed by Friedman et al. By embracing multi-threading and introducing regularization, XGBoost delivers higher computational power and more accurate predictions. More than half of the winning solutions in machine learning challenges hosted at Kaggle adopt XGBoost (incomplete list). XGBoost provides native interfaces for C++, R, Python, Julia and Java users. It is used in both data exploration and production scenarios to solve real-world machine learning problems.
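The boosted-trees idea behind XGBoost can be shown in miniature: repeatedly fit a regression stump to the residuals of the current ensemble, shrink it by a learning rate, and add it in. This is a deliberately bare sketch of gradient boosting for squared error, not XGBoost itself, which adds regularized objectives, second-order gradients, and multi-threading:

```python
def fit_stump(xs, residuals):
    """Best single threshold split on one feature, minimizing squared error."""
    best = None
    for split in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= split]
        right = [r for x, r in zip(xs, residuals) if x > split]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, split, lmean, rmean)
    _, split, lmean, rmean = best
    return lambda x: lmean if x <= split else rmean

def boost(xs, ys, rounds=50, lr=0.3):
    """Gradient boosting for squared error: the residuals are the gradients."""
    stumps = []
    pred = [0.0] * len(xs)
    for _ in range(rounds):
        stump = fit_stump(xs, [y - p for y, p in zip(ys, pred)])
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 1, 1, 4, 4, 4]   # a step-like target
model = boost(xs, ys)
print(round(model(6), 2))       # close to 4
```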
You might ask what the difference is between most artificial intelligence (AI) companies and SparkCognition. Here it is: at other firms, humans build the models; at SparkCognition, algorithms assemble them. Rather than roughing out one model and then doing a bunch of testing, SparkCognition continually tests and fits models against data accumulating in real time, an architecture that allows it to cope with big data.
The CausalImpact R package implements an approach to estimating the causal effect of a designed intervention on a time series. For example, how many additional daily clicks were generated by an advertising campaign? Answering a question like this can be difficult when a randomized experiment is not available. The package aims to address this difficulty using a structural Bayesian time-series model to estimate how the response metric might have evolved after the intervention if the intervention had not occurred.
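The counterfactual idea behind the package can be stripped to its simplest form: learn the pre-intervention relationship between the response and a control series, project it through the post-period, and read the impact off the gap. The Python sketch below uses invented numbers and plain least squares; the actual CausalImpact R package fits a Bayesian structural time-series model and reports posterior uncertainty intervals, which this sketch entirely lacks:

```python
def fit_line(xs, ys):
    """OLS slope and intercept for y ~ a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Hypothetical data: a control region the campaign never ran in, and the
# response series; the campaign starts at t = 6.
control  = [100, 102, 98, 101, 99, 103, 100, 102, 101, 99]
response = [ 50,  51, 49,  50, 50,  52,  60,  61,  62, 60]
PRE = 6

a, b = fit_line(control[:PRE], response[:PRE])        # pre-period relationship
counterfactual = [a + b * x for x in control[PRE:]]   # what "no campaign" predicts
impact = [y - c for y, c in zip(response[PRE:], counterfactual)]
print(round(sum(impact) / len(impact), 1))            # average post-period lift
```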
Author Summary Shape plays an important role in object recognition. Despite years of research, no models of vision could account for shape understanding as found in human vision of natural images. Given recent successes of deep neural networks (DNNs) in object recognition, we hypothesized that DNNs might in fact learn to capture perceptually salient shape dimensions. Using a variety of stimulus sets, we demonstrate here that the output layers of several DNNs develop representations that relate closely to human perceptual shape judgments. Surprisingly, such sensitivity to shape develops in these models even though they were never explicitly trained for shape processing. Moreover, we show that these models also represent categorical object similarity that follows human semantic judgments, albeit to a lesser extent. Taken together, our results bring forward the exciting idea that DNNs capture not only objective dimensions of stimuli, such as their category, but also their subjective, or perceptual, aspects, such as shape and semantic similarity as judged by humans.
Anomaly detection is the process of identifying unexpected items or events in datasets, which differ from the norm. In contrast to standard classification tasks, anomaly detection is often applied on unlabeled data, taking only the internal structure of the dataset into account. This challenge is known as unsupervised anomaly detection and is addressed in many practical applications, for example in network intrusion detection and fraud detection, as well as in the life science and medical domains. Dozens of algorithms have been proposed in this area, but unfortunately the research community still lacks a comparative universal evaluation as well as common publicly available datasets. These shortcomings are addressed in this study, where 19 different unsupervised anomaly detection algorithms are evaluated on 10 different datasets from multiple application domains. By publishing the source code and the datasets, this paper aims to provide a new well-founded basis for unsupervised anomaly detection research. Additionally, this evaluation reveals the strengths and weaknesses of the different approaches for the first time. Besides anomaly detection performance, computational effort, the impact of parameter settings, and global versus local anomaly detection behavior are outlined. In conclusion, we offer advice on algorithm selection for typical real-world tasks.
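A minimal instance of one family covered by such evaluations is a global k-nearest-neighbour detector: score each point by its mean distance to its k nearest neighbours, with no labels involved. The sketch below uses a tiny invented dataset; the benchmarked algorithms are generally far more sophisticated:

```python
from math import dist  # Euclidean distance, Python 3.8+

def knn_scores(points, k=3):
    """Higher score = more anomalous (farther from its k nearest neighbours)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

data = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (1.0, 0.9), (8.0, 8.0)]  # one outlier
scores = knn_scores(data)
print(data[max(range(len(data)), key=scores.__getitem__)])  # the outlier wins
```

The global/local distinction the study outlines shows up even here: this global score would miss a point that is anomalous only relative to its own dense neighbourhood, which is what local methods such as LOF address.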
A South African team of neuromarketers and neuroscientists have announced the launch of the world’s first ever NeuroWine, a wine that was developed by taking the tools and technologies that are traditionally used in neuroscience and applying them to the art of wine-making.
Neural Sense, a local neuromarketing consultancy, partnered with Pieter Walser, a Cape winemaker from the BLANKBottle label, and, using neuroscience and biometric technologies, tested 21 different white and 20 different red wine varietals from a number of different vineyards across the country. They assessed Walser's emotional and cognitive responses to each taste-testing experience to create the world's first NeuroWine (one bottle of red and one of white).
Dr David Rosenstein, from Neural Sense, explains. “One of the pieces of technology we used – known as electroencephalography or EEG – is a device which fits around the head and picks up the electrical activity on the surface of one’s scalp. It looks at how the brain is functioning and the associated brain waves, which in turn tells us various things about brain activity.
After three years of research into how it might accelerate its Bing search engine using field programmable gate arrays (FPGAs), Microsoft came up with a scheme that would let it lash Stratix V devices from Altera to the two-socket server nodes in the minimalist Open Cloud Servers that it has designed expressly for its hyperscale datacenters. These CPU-FPGA hybrids were rolled out into production earlier this year to accelerate Bing page rank functions, and Microsoft started hunting around for other workloads to juice with FPGAs.
Deep learning is the next big job that Microsoft is pretty sure can benefit from FPGAs and, importantly, can do so within the constraints of its hyperscale infrastructure. Microsoft's systems have unique demands given that Microsoft is building systems, storage, and networks that have to support many different kinds of workloads, all within specific power, thermal, and budget envelopes.
Pretty soon, any messaging app that doesn’t have a platform for bots will be seriously left behind. “Messengers are the new browsers and bots are the new websites,” as Kik‘s Mike Roberts puts it to me.
With this in mind, the messaging app that’s big with America’s youth has today launched a bot store and developer platform to support it.
I'm sure you have been hearing at least some of the hype over "containers" and Docker this past year. In fact, Bryan Cantrill (CTO at Joyent) and Ben Hindman (founder of Mesosphere) recently declared that 2015 was the "year of the container" (see their webinar here). So what's all the hype, and how does it relate to what's happening in the data science and engineering world?
If you have been living in a hole this past year, here is an introduction to containers along with some advantages of using them. Here, however, I am going to provide some resources for those wishing to containerize their data pipelines.
Mathematical thinking dominates our understanding of the universe. Now network theorists have discovered the tipping points in the evolution of ideas that have shaped the modern mathematical landscape.
If the marketer's goal is to reach customers with the right message, in the right place, at the right time, it stands to reason that deeper insight into any of those dimensions could only be a good thing. Enter Adobe, which just rolled out a raft of new data-science tools designed to help make that happen.
Scheduled to be introduced Tuesday at the company's Adobe Summit event in Las Vegas, the services bring new algorithms to the Adobe Marketing Cloud with the goal of helping brands deliver optimal customer experiences.
In the Marketing Cloud's Adobe Experience Manager, for example, a new Smart Tag feature taps machine learning to help marketers find Creative Cloud assets such as photos or videos. Smart Tag is available now.