Although no one can quite agree how to define big data, the general idea is to find datasets so enormous that they can reveal patterns invisible to conventional inquiry. The data are often generated by millions of real-world user actions, such as tweets or credit-card purchases, and they can take thousands of computers to collect, store, and analyze. To many companies and researchers, though, the investment is worth it because the patterns can unlock information about anything from genetic disorders to tomorrow’s stock prices.
But there’s a problem. It’s tempting to think that with such an incredible volume of data behind them, studies relying on big data couldn’t be wrong, yet the sheer size of the data can imbue the results with a false sense of certainty. Many of them are probably bogus, and the reasons why should give us pause about any research that blindly trusts big data.
Hacking Creativity, led by the Red Bull High Performance group, is the largest study of creative style in history. And now everyone can participate in this groundbreaking effort to unlock the secrets of the creative process. Take our quick survey: It’s not a test of how creative you are, but rather a profile of how you create.
Discover which creative habits you share with a diverse group of 500 global innovators. Are you a multitasker or do you prefer to focus on one project at a time? Do you work best alone or in a more collaborative environment? Find inspiration from those trailblazers following similar paths to creativity and learn how to boost your own creative firepower.
The human gut harbors a large and complex community of beneficial microbes that remain stable over long periods. This stability is considered critical for good health but is poorly understood. Here we develop a body of ecological theory to help us understand microbiome stability. Although cooperating networks of microbes can be efficient, we find that they are often unstable. Counterintuitively, this finding indicates that hosts can benefit from microbial competition when this competition dampens cooperative networks and increases stability. More generally, stability is promoted by limiting positive feedbacks and weakening ecological interactions. We have analyzed host mechanisms for maintaining stability—including immune suppression, spatial structuring, and feeding of community members—and support our key predictions with recent data.
The ecology of the microbiome: Networks, competition, and stability Katharine Z. Coyte, Jonas Schluter, Kevin R. Foster
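A toy illustration of the stability argument, not the authors' model: assume a community in which every species regulates itself at rate 1 and every pair interacts with the same strength (positive for cooperation, negative for competition). The uniform matrix, the parameter values, and all function names below are assumptions made for this sketch. For such a matrix the leading eigenvalue is known in closed form, and a simple Euler integration of the linearized dynamics checks whether a small perturbation dies out.

```python
def community_matrix(n, strength, self_regulation=1.0):
    """Uniform toy community: -self_regulation on the diagonal, `strength` elsewhere."""
    return [[-self_regulation if i == j else strength for j in range(n)]
            for i in range(n)]


def leading_eigenvalue(n, strength, self_regulation=1.0):
    """Largest eigenvalue of the uniform matrix above.

    Its eigenvalues are -s + strength*(n-1) (once) and -s - strength (n-1 times).
    """
    return max(-self_regulation + strength * (n - 1),
               -self_regulation - strength)


def perturbation_decays(matrix, steps=2000, dt=0.01):
    """Euler-integrate dx/dt = A x and report whether the perturbation shrank."""
    n = len(matrix)
    x = [0.01 * (i + 1) for i in range(n)]  # small, non-uniform perturbation
    start = sum(v * v for v in x)
    for _ in range(steps):
        dx = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + dt * dx[i] for i in range(n)]
    return sum(v * v for v in x) < start


if __name__ == "__main__":
    for label, strength in [("strong cooperation", 0.2),
                            ("weak cooperation", 0.05),
                            ("competition", -0.2)]:
        stable = perturbation_decays(community_matrix(10, strength))
        print(label, "-> leading eigenvalue",
              round(leading_eigenvalue(10, strength), 3), "stable:", stable)
```

In this toy setting, strong mutual benefit creates a positive feedback that grows with the number of species, while weakening the interactions or making them competitive keeps the leading eigenvalue negative.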
An important challenge in several disciplines is to understand how sudden changes can propagate among coupled systems. Examples include the synchronization of business cycles, population collapse in patchy ecosystems, markets shifting to a new technology platform, collapses in prices and in confidence in financial markets, and protests erupting in multiple countries. A number of mathematical models of these phenomena have multiple equilibria separated by saddle-node bifurcations. We study this behaviour in its normal form as fast–slow ordinary differential equations. In our model, a system consists of multiple subsystems, such as countries in the global economy or patches of an ecosystem. Each subsystem is described by a scalar quantity, such as economic output or population, that undergoes sudden changes via saddle-node bifurcations. The subsystems are coupled via their scalar quantity (e.g. trade couples economic output; diffusion couples populations); that coupling moves the locations of their bifurcations. The model demonstrates two ways in which sudden changes can propagate: they can cascade (one causing the next), or they can hop over subsystems. The latter is absent from classic models of cascades. For an application, we study the Arab Spring protests. After connecting the model to sociological theories that have bistability, we use socioeconomic data to estimate relative proximities to tipping points and Facebook data to estimate couplings among countries. We find that although protests tend to spread locally, they also seem to ‘hop’ over countries, as in the stylized model; this result highlights a new class of temporal motifs in longitudinal network datasets.
Coupled catastrophes: sudden shifts cascade and hop among interdependent systems
Charles D. Brummitt, George Barnett, Raissa M. D'Souza
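The cascade mechanism described in the abstract can be sketched numerically. The example below is a minimal illustration under stated assumptions, not the paper's exact equations: each subsystem follows a cubic bistable form, dx/dt = -x^3 + x + a, whose two folds are saddle-node bifurcations; the first subsystem is pushed slowly past its tipping point; and a linear coupling term shifts the second subsystem's tipping point. The parameter values and the function name `simulate` are hypothetical.

```python
def simulate(coupling, b=0.2, a_max=0.5, ramp_time=200.0, hold_time=100.0, dt=0.01):
    """Euler-integrate two bistable subsystems and return their final states (x, y).

    x is driven by a slowly ramped parameter a; y has a fixed parameter b
    (below its own tipping threshold) plus the coupling term coupling * x.
    """
    x, y = -1.0, -1.0  # both subsystems start on the lower branch
    steps = int((ramp_time + hold_time) / dt)
    for step in range(steps):
        t = step * dt
        a = a_max * min(t / ramp_time, 1.0)  # slow ramp, then hold at a_max
        dx = -x**3 + x + a
        dy = -y**3 + y + b + coupling * x
        x += dt * dx
        y += dt * dy
    return x, y


if __name__ == "__main__":
    for c in (0.0, 0.3):
        x_final, y_final = simulate(c)
        print("coupling", c, "-> x tipped:", x_final > 0, "y tipped:", y_final > 0)
```

With no coupling, only the driven subsystem tips; with coupling, its jump pushes the second subsystem past its own fold, which is the "one causing the next" cascade. The hopping behaviour needs at least three subsystems and is not shown here.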
Natural ecosystems are icons of complexity because of the myriad ways in which species depend on, and influence, one another - from feeding relationships, to competition for space, to helping each other by providing habitat or shelter. With Sonia Kefi at the helm, we just published what is, to our knowledge, the first comprehensive map of all known ecological interactions for an entire ecosystem: the coastal marine intertidal of central Chile. The result? The incredible diversity of dependencies that characterizes natural history is not random. The links realized are a very small and predictable subset of what is possible.
Plants have a hard time finding mates -- their inability to get up and move around tends to inhibit them. Luckily for plants, bees and other pollinator species (including butterflies, moths and birds) help matchmake these lonely plants in exchange for food. Fernanda S. Valdovinos explains how these intricate pollination networks work and how it can all change from one season to the next.
Eric L Berlow's insight:
This is the second lesson in our Ecology Series in collaboration with TED Ed - focused on ecological networks. This lesson describes the architecture of pollination networks in real ecosystems.
Computational creativity is an emerging branch of artificial intelligence that places computers in the center of the creative process. Broadly, creativity involves a generative step to produce many ideas and a selective step to determine the ones that are the best. Many previous attempts at computational creativity, however, have not been able to achieve a valid selective step. This work shows how bringing data sources from the creative domain and from hedonic psychophysics together with big data analytics techniques can overcome this shortcoming to yield a system that can produce novel and high-quality creative artifacts. Our data-driven approach is demonstrated through a computational creativity system for culinary recipes and menus we developed and deployed, which can operate either autonomously or semi-autonomously with human interaction. We also comment on the volume, velocity, variety, and veracity of data in computational creativity.
Abstract: While many recently proposed methods aim to detect network communities in large datasets, such as those generated by social media and telecommunications services, most evaluation (i.e. benchmarking) of this research is based on small, hand-curated datasets. We argue that these two types of networks differ so significantly that, by evaluating algorithms solely on the smaller networks, we know little about how well they perform on the larger datasets. Recent work addresses this problem by introducing social network datasets annotated with meta-data that is believed to approximately indicate a ‘ground truth’ set of network communities. While such efforts are a step in the right direction, we find this meta-data problematic for two reasons. First, in practice, the groups contained in such meta-data may only be a subset of a network's communities. Second, while it is often reasonable to assume that meta-data is related to network communities in some way, we must be cautious about assuming that these groups correspond closely to network communities. Here, we consider these difficulties and propose an evaluation scheme based on a classification task that is tailored to deal with them.
Some communities have agreed to share online — geneticists, for example, post DNA sequences at the GenBank repository, and astronomers are accustomed to accessing images of galaxies and stars from, say, the Sloan Digital Sky Survey, a telescope that has observed some 500 million objects — but these remain the exception, not the rule. Historically, scientists have objected to sharing for many reasons: it is a lot of work; until recently, good databases did not exist; grant funders were not pushing for sharing; it has been difficult to agree on standards for formatting data and the contextual information called metadata; and there is no agreed way to assign credit for data. But the barriers are disappearing, in part because journals and funding agencies worldwide are encouraging scientists to make their data public.
An ecological network synthesizing all known interactions between species exhibits a clear pattern of organization that reflects evolutionary and ecological constraints operating in this entangled bank of species.
Eric L Berlow's insight:
Species are linked to each other by a myriad of positive and negative interactions. This complex spectrum of interactions constitutes a network of links that mediates ecological communities’ response to perturbations, such as exploitation and climate change. In the last decades, there have been great advances in the study of intricate ecological networks. We have, nonetheless, lacked both the data and the tools to more rigorously understand the patterning of multiple interaction types between species (i.e., “multiplex networks”), as well as their consequences for community dynamics. Using network statistical modeling applied to a comprehensive ecological network, which includes trophic and diverse non-trophic links, we provide a first glimpse at what the full “entangled bank” of species looks like. The community exhibits clear multidimensional structure, which is taxonomically coherent and broadly predictable from species traits. Moreover, dynamic simulations suggest that this non-random patterning of how diverse non-trophic interactions map onto the food web could allow for higher species persistence and higher total biomass than expected by chance and tends to promote a higher robustness to extinctions.
Misuse of the P value — a common test for judging the strength of scientific evidence — is contributing to the number of research findings that cannot be reproduced, the American Statistical Association (ASA) warns in a statement released today. The group has taken the unusual step of issuing principles to guide use of the P value, which it says cannot determine whether a hypothesis is true or whether results are important.
Scientists perform a tiny subset of all possible experiments. What characterizes the experiments they choose? And what are the consequences of those choices for the pace of scientific discovery? We model scientific knowledge as a network and science as a sequence of experiments designed to gradually uncover it. By analyzing millions of biomedical articles published over 30 years, we find that biomedical scientists pursue conservative research strategies exploring the local neighborhood of central, important molecules. Although such strategies probably serve scientific careers, we show that they slow scientific advance, especially in mature fields, where more risk and less redundant experimentation would accelerate discovery of the network. We also consider institutional arrangements that could help science pursue these more efficient strategies.
Choosing experiments to accelerate collective discovery Andrey Rzhetsky, Jacob G. Foster, Ian T. Foster, and James A. Evans
An attempt to reconcile the effects of temperature on economic productivity at the micro and macro levels produces predictions of global economic losses due to climate change that are much higher than previous estimates.
This study leveraged a network-enriched species distribution model to filter out the environmental "noise" and infer causality from observational data.
Statistical models often use observational data to predict phenomena; however, interpreting model terms to understand their influence can be problematic. This issue poses a challenge in species conservation where setting priorities requires estimating influences of potential stressors using observational data. We present a novel approach for inferring influence of a rare stressor on a rare species by blending predictive models with nonparametric permutation tests. We illustrate the approach with two case studies involving rare amphibians in Yosemite National Park, USA. The endangered frog, Rana sierrae, is known to be negatively impacted by non-native fish, while the threatened toad, Anaxyrus canorus, is potentially affected by packstock. Both stressors and amphibians are rare, occurring in ~10% of potential habitat patches. We first predict amphibian occupancy with a statistical model that includes all predictors but the stressor to stratify potential habitat by predicted suitability. A stratified permutation test then evaluates the association between stressor and amphibian, all else equal. Our approach confirms the known negative relationship between fish and R. sierrae, but finds no evidence of a negative relationship between current packstock use and A. canorus breeding. Our statistical approach has potential broad application for deriving understanding (not just prediction) from observational data.
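A minimal sketch of a stratified permutation test in the spirit of the approach above. It assumes the habitat patches have already been binned into suitability strata by some predictive model; the test statistic (number of patches where stressor and amphibian co-occur), the one-sided p-value, the synthetic data, and all names are assumptions made for illustration, not the authors' implementation.

```python
import random
from collections import defaultdict


def co_occurrences(stressor, occupied):
    """Count patches where the stressor and the species are both present."""
    return sum(1 for s, o in zip(stressor, occupied) if s and o)


def stratified_permutation_test(strata, stressor, occupied, n_perm=2000, seed=0):
    """One-sided test for a negative stressor-species association.

    Stressor labels are shuffled only within each suitability stratum, so
    habitat quality is held roughly equal. Returns (observed, p_value).
    """
    rng = random.Random(seed)
    observed = co_occurrences(stressor, occupied)

    groups = defaultdict(list)  # stratum -> patch indices
    for idx, stratum in enumerate(strata):
        groups[stratum].append(idx)

    at_most_observed = 0
    for _ in range(n_perm):
        shuffled = list(stressor)
        for indices in groups.values():
            labels = [stressor[i] for i in indices]
            rng.shuffle(labels)
            for i, label in zip(indices, labels):
                shuffled[i] = label
        if co_occurrences(shuffled, occupied) <= observed:
            at_most_observed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (at_most_observed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Synthetic data: 20 high-suitability and 20 low-suitability patches.
    strata = ["high"] * 20 + ["low"] * 20
    occupied = [1] * 10 + [0] * 10 + [1] * 2 + [0] * 18
    stressor = [0] * 10 + [1] * 5 + [0] * 5 + [0] * 2 + [1] * 3 + [0] * 15
    print(stratified_permutation_test(strata, stressor, occupied))
```

Shuffling within strata rather than across all patches is the design choice that matters: it prevents a stressor that simply occurs in poor habitat from looking like a cause of absence.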
When you picture the lowest levels of the food chain, you might imagine herbivores happily munching on lush, living green plants. But this idyllic image leaves out a huge (and slightly less appetizing) source of nourishment: dead stuff. John C. Moore details the "brown food chain," explaining how such unlikely delicacies as pond scum and animal poop contribute enormous amounts of energy to our ecosystems.
Eric L Berlow's insight:
This is the first lesson in our Ecology Series in collaboration with TED Ed. The series focuses on networks in ecology. This one by John Moore explores the wonderful world of detritus. Most of us know that nature 'recycles' dead parts - but this lesson highlights that when you scale this process to the entire ecosystem, dead stuff, and 'brown' food chains, are an unexpectedly huge source of energy that fuels most ecosystems.
While feedback loops are a bummer at band practice, they are essential in nature. What does nature’s feedback look like, and how does it build the resilience of our world? Anje-Margriet Neutel describes some common positive and negative feedback loops, examining how an ecosystem’s many loops come together to make its ‘trademark sound.’
Eric L Berlow's insight:
Positive and negative feedback loops are confusing to many people. In this animated information-rich 5-minute TED Ed lesson, Anje-Margriet Neutel - a world expert in the loopiness of natural ecosystems - nails it with an interesting analogy to music. Her landmark Science paper in 2002 describes the role of weak links in long loops for stabilizing ecosystems.
Effective point-of-use devices for providing safe drinking water are urgently needed to reduce the global burden of waterborne disease. Here we show that plant xylem from the sapwood of coniferous trees – a readily available, inexpensive, biodegradable, and disposable material – can remove bacteria from water by simple pressure-driven filtration. Approximately 3 cm³ of sapwood can filter water at the rate of several liters per day, sufficient to meet the clean drinking water needs of one person. The results demonstrate the potential of plant xylem to address the need for pathogen-free drinking water in developing countries and resource-limited settings.
Many complex networks show signs of modular structure, uncovered by community detection. Although many methods succeed in revealing various partitions, it remains difficult to detect at what scale some partition is significant.
Models for the topology or dynamics of various networks abound, but until now, there has been no single universal framework for complex networks that can separate factors contributing to the topology and dynamics of networks.
Recognizing direct relationships between variables connected in a network is a pervasive problem in biological, social and information sciences as correlation-based networks contain numerous indirect relationships. Here we present a general method for inferring direct effects from an observed correlation matrix containing both direct and indirect effects. We formulate the problem as the inverse of network convolution, and introduce an algorithm that removes the combined effect of all indirect paths of arbitrary length in a closed-form solution by exploiting eigen-decomposition and infinite-series sums. We demonstrate the effectiveness of our approach in several network applications: distinguishing direct targets in gene expression regulatory networks; recognizing directly interacting amino-acid residues for protein structure prediction from sequence alignments; and distinguishing strong collaborations in co-authorship social networks using connectivity information alone. In addition to its theoretical impact as a foundational graph theoretic tool, our results suggest network deconvolution is widely applicable for computing direct dependencies in network science across diverse disciplines.
Network deconvolution as a general method to distinguish direct dependencies in networks Soheil Feizi, Daniel Marbach, Muriel Médard & Manolis Kellis
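The "inverse of network convolution" can be illustrated on a three-node chain. The sketch assumes the observed matrix is the sum of all path powers of the direct matrix, G_obs = G_dir + G_dir^2 + G_dir^3 + ..., which has the closed form G_obs = G_dir (I - G_dir)^-1 when the series converges, so that G_dir = G_obs (I + G_obs)^-1. The abstract describes an eigen-decomposition; this sketch uses a plain matrix inverse instead, and it omits any preprocessing or scaling of real data. The helper names are hypothetical.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b, sign=1.0):
    """Element-wise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (fine for small matrices)."""
    n = len(m)
    aug = [list(row) + unit for row, unit in zip(m, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def convolve(g_dir):
    """Add all indirect paths: G_obs = G_dir (I - G_dir)^-1."""
    return matmul(g_dir, inverse(add(identity(len(g_dir)), g_dir, -1.0)))


def deconvolve(g_obs):
    """Remove all indirect paths: G_dir = G_obs (I + G_obs)^-1."""
    return matmul(g_obs, inverse(add(identity(len(g_obs)), g_obs)))


if __name__ == "__main__":
    # Chain 0 - 1 - 2: nodes 0 and 2 are linked only through node 1.
    direct = [[0.0, 0.4, 0.0], [0.4, 0.0, 0.4], [0.0, 0.4, 0.0]]
    observed = convolve(direct)
    recovered = deconvolve(observed)
    print("observed 0-2 weight:", round(observed[0][2], 4))
    print("recovered 0-2 weight:", round(recovered[0][2], 4))
```

The observed matrix shows a spurious link between nodes 0 and 2; deconvolution returns it to zero while keeping the two true edges.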