Information is a precise concept that can be defined mathematically, but its relationship to what we call "knowledge" is not always made clear. Furthermore, the concepts "entropy" and "information", while deeply related, are distinct and must be used with care, something that is not always achieved in the literature. In this elementary introduction, the concepts of entropy and information are laid out one by one, explained intuitively, but defined rigorously. I argue that a proper understanding of information in terms of prediction is key to a number of disciplines beyond engineering, such as physics and biology.
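The prediction-centred view of information described above rests on Shannon entropy, H(p) = -Σ p_i log2 p_i: the average surprise of an outcome drawn from distribution p. A minimal Python illustration (a sketch for intuition, not taken from the text):

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H(p) = -sum p_i * log2(p_i), in bits.

    Zero-probability outcomes contribute nothing (0 * log 0 := 0).
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin carries exactly one bit of uncertainty per toss.
print(shannon_entropy([0.5, 0.5]))   # 1.0 bit
# A biased coin is more predictable, hence lower entropy.
print(shannon_entropy([0.9, 0.1]))
```

The second value is well below one bit, reflecting the point that information quantifies how hard outcomes are to predict, not how much "knowledge" they contain.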
Human impacts on the planet, including anthropogenic climate change, are reshaping ecosystems in unprecedented ways. To meet the challenge of conserving biodiversity in this rapidly changing world, we must understand how ecological assemblages respond to novel conditions (1). However, species in ecosystems are not fixed entities, even without human-induced change. All ecosystems experience natural turnover in species presence and abundance. Taking account of this baseline turnover in conservation planning could play an important role in protecting biodiversity.
In peer recommendation systems, social signals affect item popularity about half as much as position and content do, and further create a "herding" effect that biases people's judgments about the content.
Since its introduction in the 1960s, the theory of innovation diffusion has contributed to the advancement of several research fields, such as marketing management and consumer behavior. The seminal 1969 paper by Bass [F.M. Bass, Manag. Sci. 15, 215 (1969)] introduced a model of product growth for consumer durables, which has been extensively used to predict innovation diffusion across a range of applications. Here, we propose a novel approach to study innovation diffusion, where interactions among individuals are mediated by the dynamics of a time-varying network. Our approach is based on the Bass model and overcomes key limitations of previous studies, which assumed timescale separation between the individual dynamics and the evolution of the connectivity patterns. Thus, we do not hypothesize homogeneous mixing among individuals or the existence of a fixed interaction network. We formulate our approach in the framework of activity-driven networks to enable the analysis of the concurrent evolution of the interaction and individual dynamics. Numerical simulations offer a systematic analysis of the model's behavior and highlight the role of individual activity on market penetration when targeted advertisement campaigns are designed or when two different products compete.
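The classical Bass model on which this work builds describes the adopter fraction F(t) through dF/dt = (p + qF)(1 - F), with innovation coefficient p (external influence such as advertising) and imitation coefficient q (word of mouth). A simple forward-Euler sketch (parameter values are illustrative, not taken from the paper):

```python
def bass_adoption(p=0.03, q=0.38, dt=0.01, t_max=30.0):
    """Integrate dF/dt = (p + q*F) * (1 - F) with forward Euler.

    p: innovation coefficient (external influence)
    q: imitation coefficient (internal influence)
    Returns the adopter fraction F sampled at each time step.
    """
    F, traj = 0.0, [0.0]
    for _ in range(int(t_max / dt)):
        F += dt * (p + q * F) * (1.0 - F)
        traj.append(F)
    return traj

traj = bass_adoption()
# Adoption grows monotonically and saturates near full penetration.
```

This mean-field curve is exactly the homogeneous-mixing baseline that the time-varying-network formulation in the paper relaxes.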
The new digital revolution of big data is deeply changing our capability of understanding society and forecasting the outcome of many social and economic systems. Unfortunately, information can be very heterogeneous in the importance, relevance, and surprise it conveys, severely affecting the predictive power of semantic and statistical methods. Here we show that the aggregated behavior of web users can be harnessed to overcome this problem in a hard-to-predict complex system, namely the financial market. Specifically, our in-sample analysis shows that the combined use of sentiment analysis of news and browsing activity of users of Yahoo! Finance greatly helps in forecasting intra-day and daily price changes of a set of 100 highly capitalized US stocks traded in the period 2012–2013. Sentiment analysis or browsing activity, when taken alone, has very small or no predictive power. Conversely, when considering a news signal that, in a given time interval, averages the sentiment of the clicked news weighted by the number of clicks, we show that for nearly 50% of the companies such a signal Granger-causes hourly price returns. Our result indicates a “wisdom-of-the-crowd” effect that allows us to exploit users’ activity to identify and properly weight the relevant and surprising news, considerably enhancing the forecasting power of the news sentiment.
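The notion of Granger causality used in this work can be illustrated with a toy bivariate test: x "Granger-causes" y if adding lagged x to an autoregression of y reduces the residual variance. A minimal numpy sketch on synthetic data (the data and function names are illustrative, not the paper's signals; a formal test would add an F-statistic on the two residual sums of squares):

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares of an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def granger_rss_ratio(x, y, lag=1):
    """Compare y's autoregression with and without lagged x.

    A ratio well below 1 means lagged x helps predict y
    (the Granger sense of 'causation').
    """
    y_t = y[lag:]
    ones = np.ones(len(y_t))
    X_restricted = np.column_stack([ones, y[:-lag]])
    X_full = np.column_stack([ones, y[:-lag], x[:-lag]])
    return rss(y_t, X_full) / rss(y_t, X_restricted)

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):            # y is driven by yesterday's x
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()
```

With this construction, `granger_rss_ratio(x, y)` falls far below one while `granger_rss_ratio(y, x)` stays near one, the asymmetry the method is designed to detect.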
Ranco G, Bordino I, Bormetti G, Caldarelli G, Lillo F, Treccani M (2016) Coupling News Sentiment with Web Browsing Data Improves Prediction of Intra-Day Price Dynamics. PLoS ONE 11(1): e0146576. http://dx.doi.org/10.1371/journal.pone.0146576
Women are dramatically underrepresented in computer science at all levels in academia and account for just 15% of tenure-track faculty. Understanding the causes of this gender imbalance would inform both policies intended to rectify it and employment decisions by departments and individuals. Progress in this direction, however, is complicated by the complexity and decentralized nature of faculty hiring and the non-independence of hires. Using comprehensive data on both hiring outcomes and scholarly productivity for 2659 tenure-track faculty across 205 Ph.D.-granting departments in North America, we investigate the multi-dimensional nature of gender inequality in computer science faculty hiring through a network model of the hiring process. Overall, we find that hiring outcomes are most directly affected by (i) the relative prestige between hiring and placing institutions and (ii) the scholarly productivity of the candidates. After including these and other features, the addition of gender did not significantly reduce modeling error. However, gender differences do exist, e.g., in scholarly productivity, postdoctoral training rates, and career movements up the rankings of universities, suggesting that the effects of gender are indirectly incorporated into hiring decisions through gender's covariates. Furthermore, we find evidence that more highly ranked departments recruit female faculty at higher than expected rates, which appears to inhibit similar efforts by lower-ranked departments. These findings illustrate the subtle nature of gender inequality in faculty hiring networks and provide new insights into the underrepresentation of women in computer science.
Gender, Productivity, and Prestige in Computer Science Faculty Hiring Networks Samuel F. Way, Daniel B. Larremore, Aaron Clauset
Loss of cortical integration and changes in the dynamics of electrophysiological brain signals characterize the transition from wakefulness towards unconsciousness. In this study, we arrive at a basic model explaining these observations based on the theory of phase transitions in complex systems. We studied the link between spatial and temporal correlations of large-scale brain activity recorded with functional magnetic resonance imaging during wakefulness, propofol-induced sedation and loss of consciousness, and during the subsequent recovery. We observed that during unconsciousness, activity in frontothalamic regions exhibited a reduction of long-range temporal correlations and a departure of functional connectivity from anatomical constraints. A model of a system exhibiting a phase transition reproduced our findings, as well as the diminished sensitivity of the cortex to external perturbations during unconsciousness. This framework unifies different observations about brain activity during unconsciousness and predicts that the principles we identified are universal and independent of its causes.
Large-scale signatures of unconsciousness are consistent with a departure from critical dynamics Enzo Tagliazucchi, Dante R. Chialvo, Michael Siniatchkin, Enrico Amico, Jean-Francois Brichant, Vincent Bonhomme, Quentin Noirhomme, Helmut Laufs, Steven Laureys
The recent availability of geo-localized data capturing individual human activity, together with statistical data on international migration, has opened up unprecedented opportunities for the study of global mobility. In this paper we consider it from the perspective of a multi-layer complex network, built from a combination of three datasets: Twitter, Flickr, and official migration data. These datasets provide different but equally important insights into global mobility: while the first two highlight short-term visits of people from one country to another, the last one, migration, captures the long-term perspective, when people relocate for good. The main purpose of the paper is to emphasize the importance of this multi-layer approach, which captures both aspects of human mobility at the same time. We start from a comparative study of the network layers, comparing short- and long-term mobility through the statistical properties of the corresponding networks, such as the parameters of their degree centrality distributions or the parameters of the corresponding gravity model fit to each network. We also focus on the differences in country rankings by short- and long-term attractiveness, discussing the most noticeable outliers. Finally, we apply this multi-layered human mobility network to infer the structure of the global society through a community detection approach, and demonstrate that considering mobility from a multi-layer perspective reveals important global spatial patterns in a way more consistent with other available relevant sources of international connections than the spatial structure inferred from each network layer taken separately.
Global multi-layer network of human mobility Alexander Belyi, Iva Bojic, Stanislav Sobolevsky, Izabela Sitko, Bartosz Hawelka, Lada Rudikova, Alexander Kurbatski, Carlo Ratti
Consensus dynamics in decentralised multiagent systems are the subject of intense study, and several different models have been proposed and analysed. Among these, the naming game stands out for its simplicity and applicability to a wide range of phenomena and applications, from semiotics to engineering. Despite the wide range of studies available, the implementation of theoretical models in real distributed systems is not always straightforward, as the physical platform imposes several constraints that may have a bearing on the consensus dynamics. In this paper, we investigate the effects of an implementation of the naming game for the kilobot robotic platform, in which we consider concurrent execution of games and physical interferences. Consensus dynamics are analysed in the light of the continuously evolving communication network created by the robots, highlighting how the different regimes crucially depend on the robot density and on the robots' ability to spread widely in the experimental arena. We find that physical interferences reduce the benefits resulting from robot mobility in terms of consensus time, but also result in a lower cognitive load for individual agents.
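The minimal naming game referenced above can be sketched in a few lines: at each step a random speaker utters a word from its inventory to a random hearer; on success both collapse their inventories to that word, on failure the hearer adds it. This is the standard mean-field formulation, assumed here for illustration, not the kilobot-specific protocol:

```python
import random

def naming_game(n_agents=50, max_steps=200_000, seed=1):
    """Run the minimal naming game until global consensus.
    Returns the number of pairwise interactions needed,
    or None if consensus is not reached within max_steps."""
    rng = random.Random(seed)
    inventories = [set() for _ in range(n_agents)]
    for step in range(1, max_steps + 1):
        speaker, hearer = rng.sample(range(n_agents), 2)
        inv_s, inv_h = inventories[speaker], inventories[hearer]
        if not inv_s:                       # empty inventory: invent a word
            inv_s.add(f"word-{step}")
        word = rng.choice(sorted(inv_s))
        if word in inv_h:                   # success: both collapse
            inv_s.clear(); inv_s.add(word)
            inv_h.clear(); inv_h.add(word)
        else:                               # failure: hearer learns the word
            inv_h.add(word)
        if all(len(inv) == 1 for inv in inventories) and \
           len({next(iter(inv)) for inv in inventories}) == 1:
            return step
    return None

steps = naming_game()
```

In the well-mixed case consensus time grows roughly as N^1.5 interactions; the robotic setting studied in the paper departs from this baseline precisely because the communication network evolves with robot motion and interference.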
Emergence of Consensus in a Multi-Robot Network: from Abstract Models to Empirical Validation Vito Trianni, Daniele De Simone, Andreagiovanni Reina, Andrea Baronchelli
To find useful work to do for their colony, individual eusocial animals have to move, somehow staying attentive to relevant social information. Recent research on individual Temnothorax albipennis ants moving inside their colony’s nest found a power-law relationship between a movement’s duration and its average speed; and a universal speed profile for movements showing that they mostly fluctuate around a constant average speed. From this predictability it was inferred that movement durations are somehow determined before the movement itself. Here, we find similar results in lone T. albipennis ants exploring a large arena outside the nest, both when the arena is clean and when it contains chemical information left by previous nest-mates. This implies that these movement characteristics originate from the same individual neural and/or physiological mechanism(s), operating without immediate regard to social influences. However, the presence of pheromones and/or other cues was found to affect the inter-event speed correlations. Hence we suggest that ants’ motor planning results in intermittent response to the social environment: movement duration is adjusted in response to social information only between movements, not during them. This environmentally flexible, intermittently responsive movement behaviour points towards a spatially allocated division of labour in this species. It also prompts more general questions on collective animal movement and the role of intermittent causation from higher to lower organizational levels in the stability of complex systems.
Human computation, a term introduced by Luis von Ahn (1), refers to distributed systems that combine the strengths of humans and computers to accomplish tasks that neither can do alone (2). The seminal example is reCAPTCHA, a Web widget used by 100 million people a day when they transcribe distorted text into a box to prove they are human. This free cognitive labor provides users with access to Web content and keeps websites safe from spam attacks, while feeding into a massive, crowd-powered transcription engine that has digitized 13 million articles from The New York Times archives (3). But perhaps the best known example of human computation is Wikipedia. Despite initial concerns about accuracy (4), it has become the key resource for all kinds of basic information. Information science has begun to build on these early successes, demonstrating the potential to evolve human computation systems that can model and address wicked problems (those that defy traditional problem-solving methods) at the intersection of economic, environmental, and sociopolitical systems.
The power of crowds Pietro Michelucci, Janis L. Dickinson
P-values are widely used in both the social and natural sciences to quantify the statistical significance of observed results. The recent surge of big data research has made the p-value an even more popular tool for testing the significance of a study. However, a substantial literature has been produced critiquing how p-values are used and understood. In this paper we review this recent critical literature, much of which is rooted in the life sciences, and consider its implications for social scientific research. We provide a coherent picture of the main criticisms, and draw together and disambiguate common themes. In particular, we explain how the False Discovery Rate is calculated, and how this differs from a p-value. We also make explicit the Bayesian nature of many recent criticisms, a dimension that is often underplayed or ignored. Finally, we identify practical steps to help remediate some of the concerns identified, and argue that p-values need to be contextualised within (i) the specific study and (ii) the broader field of inquiry.
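The difference between a per-test p-value threshold and False Discovery Rate control can be made concrete with the Benjamini-Hochberg procedure, which bounds the expected proportion of false discoveries among all rejected hypotheses. A toy sketch (the p-values are made up for illustration, not drawn from the paper):

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return indices of hypotheses rejected at FDR level alpha.

    Sort p-values ascending; find the largest k such that
    p_(k) <= (k / m) * alpha, and reject the k smallest.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    return sorted(order[:k_max])

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.74, 0.9]
# A naive p < 0.05 cut would reject five hypotheses; BH rejects
# fewer, because each p-value must clear an increasingly strict bar.
print(benjamini_hochberg(pvals))   # → [0, 1]
```

This is exactly the multiple-testing concern underlying many of the criticisms the paper reviews: an isolated p < 0.05 says little when many hypotheses are tested at once.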
P-values: misunderstood and misused Bertie Vidgen, Taha Yasseri
There is a huge amount of content produced online by amateur authors, covering a large variety of topics. Sentiment analysis (SA) extracts and aggregates users’ sentiments towards a target entity. Machine learning (ML) techniques are frequently used because natural language data is abundant and has definite patterns. ML techniques adapt to domain-specific solutions with high accuracy, depending upon the feature set used. Lexicon-based techniques, which use an external dictionary, are independent of the data and so avoid overfitting, but they also miss context in specialized domains. Corpus-based statistical techniques require large amounts of data to stabilize. Complex-network-based techniques are highly resourceful, preserving order, proximity, context, and relationships. Recent applications incorporate platform-specific structural information, i.e., meta-data. New sub-domains have been introduced, such as influence analysis, bias analysis, and data leakage analysis. The nature of the data is also evolving: transcribed customer-agent phone conversations are now also used for sentiment analysis. This paper reviews sentiment analysis techniques and highlights the need to address open challenges specific to natural language processing (NLP). Without resolving the complex NLP challenges, ML techniques cannot make considerable advancements. The open issues and challenges in the area are discussed, stressing the need for standard datasets and evaluation methodology. The paper also emphasizes the need for better language models that could capture context and proximity.
Sentiment analysis and the complex natural language Muhammad Taimoor Khan, Mehr Durrani, Armughan Ali, Irum Inayat, Shehzad Khalid and Kamran Habib Khan
The tensile strength of a chain is determined by its weakest link. Does this idea apply to more complex systems too? For instance, does the weakest thread of a spider web initiate cascading failure when a strong wind gust stretches the web to its limit? What happens to a computer when both the supply voltage and the ambient temperature are more than 20% outside its normal range of operation? Climate change, an increasingly densely populated world, and the rapid change of technology seem to put more systems under large stress. Engineering sustainable systems with a more favorable response to large stress appears to be an urgent societal need. Emergency evacuations of hospitals after Hurricanes Katrina and Sandy, and after the May 22, 2011 tornado in Joplin, illustrate the urgent need for modeling the adaptive capacity of hospitals during an extended loss of infrastructure. Presidential Policy Directive 21 and the U.S. Department of Homeland Security National Infrastructure Protection Plan (NIPP) call for increasing the resilience of the nation’s critical infrastructure.
System under large stress: Prediction and management of catastrophic failures Alfred Hübler
A computer has beaten a human professional for the first time at Go — an ancient board game that has long been viewed as one of the greatest challenges for artificial intelligence (AI). The best human players of chess, draughts and backgammon have all been outplayed by computers. But a hefty handicap was needed for computers to win at Go. Now Google’s London-based AI company, DeepMind, claims that its machine has mastered the game.
Mastering the game of Go with deep neural networks and tree search David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, et al.
We study a large data set of protein structure ensembles of very diverse sizes determined by nuclear magnetic resonance. By examining the distance-dependent correlations in the displacements of residue pairs and conducting finite-size scaling analysis, we find that the correlations and susceptibility behave as in systems near a critical point, implying that, at the native state, the motion of each amino acid residue is felt by every other residue up to the size of the protein molecule. Furthermore, certain protein shapes corresponding to maximum susceptibility were found to be more probable than others. Overall, the results suggest that the protein's native state is critical, implying that despite being poised near the minimum of the energy landscape, proteins still preserve their dynamic flexibility.
Critical fluctuations in proteins' native states Qian-Yuan Tang, Yang-Yang Zhang, Jun Wang, Wei Wang, Dante R. Chialvo
Understanding cities is central to addressing major global challenges, from climate change to economic resilience. Although increasingly perceived as fundamental socio-economic units, the detailed fabric of urban economic activities has only recently become accessible to comprehensive analysis with the availability of large datasets. Here, we study the abundances of business categories across US metropolitan statistical areas and provide a framework for measuring the intrinsic diversity of economic activities that transcends the scales of the classification scheme. A universal structure common to all cities is revealed, manifesting self-similarity in internal economic structure as well as in aggregated metrics (GDP, patents, crime). We present a simple mathematical derivation of this universality and provide a model for understanding the observed empirical distribution, together with its economic implications for the open-ended diversity created by urbanization. Given the universal distribution, scaling analyses for individual business categories enable us to determine their relative abundances as a function of city size. These results shed light on the processes of economic differentiation with scale, suggesting a general structure for the growth of national economies as integrated urban systems.
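Urban scaling analyses of the kind described typically relate a quantity Y to city size N through Y ∝ N^β, so the exponent β is the slope of a log-log least-squares fit. A toy sketch with synthetic data (the populations and counts are illustrative, not the paper's data):

```python
import math

def scaling_exponent(populations, counts):
    """Least-squares slope of log(counts) vs log(populations),
    i.e. the exponent beta in counts ~ populations**beta."""
    xs = [math.log(p) for p in populations]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic cities in which a business category grows superlinearly
pops = [1e5, 3e5, 1e6, 3e6, 1e7]
counts = [p ** 1.2 / 1e4 for p in pops]   # built with beta = 1.2
```

Here `scaling_exponent(pops, counts)` recovers the built-in exponent 1.2; β > 1 (superlinear) and β < 1 (sublinear) correspond to categories that become relatively more or less abundant with city size.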
Scaling and universality in urban economic diversification Hyejin Youn, Luís M. A. Bettencourt, José Lobo, Deborah Strumsky, Horacio Samaniego, Geoffrey B. West
The dynamic drivers of interfirm interactions across space have rarely been explored in the context of disaster recovery; therefore, the mechanism through which shocks propagate is unclear. This paper uses stochastic actor-oriented modeling to examine how trade networks among the 500 largest Japanese companies evolved during 2010 and 2011, i.e., before and after the Great East Japan Earthquake, in order to identify sources of vulnerability in the system. In contrast to previous reports on broken supply chains, the network displayed only modest change even in the directly affected areas. Controlling for distance and for firm size, we find that when firms changed their partners, they preferred firms that were popular among other firms, that had partners in common with them, and that also bought some products or services from them. These findings concur with a criticism that Japanese firms avoid external actors and exhibit inflexibility in reorganizing their networks in times of need, which contrasts with the non-cliquish network structures observed in high-performing economic sectors. The results also highlight the role of energy firms in disaster resilience. Unlike other large Japanese companies that cluster in major urban centers, energy firms are distributed across Japan. However, despite their peripheral physical locations, energy firms are centrally located in trade networks. Thus, while a disaster in any region may affect some energy firms and lead to large-scale temporary shocks, the entire network is unlikely to be disconnected by any region-specific disaster because of the spatial distribution of the topological network core formed by energy companies.
Energy and resilience: The effects of endogenous interdependencies on trade network formation across space among major Japanese firms Petr Matous and Yasuyuki Todo
Much recent research aims to identify evidence for Drug-Drug Interactions (DDI) and Adverse Drug Reactions (ADR) from the biomedical scientific literature. In addition to this "Bibliome", the universe of social media provides a very promising source of large-scale data that can help identify DDI and ADR in ways that have not been hitherto possible. Given the large number of users, analysis of social media data may be useful to identify under-reported, population-level pathology associated with DDI, thus further contributing to improvements in population health. Moreover, tapping into this data allows us to infer drug interactions with natural products - including cannabis - which constitute an array of DDI very poorly explored by biomedical research thus far. Our goal is to determine the potential of Instagram for public health monitoring and surveillance of DDI, ADR, and behavioral pathology at large. Most social media analysis focuses on Twitter and Facebook, but Instagram is an increasingly important platform, especially among teens, with unrestricted access to public posts, high availability of posts with geolocation coordinates, and images to supplement textual analysis. Using drug, symptom, and natural product dictionaries for identification of the various types of DDI and ADR evidence, we have collected close to 7000 user timelines spanning from October 2010 to June 2015.
We report on 1) the development of a monitoring tool to easily observe user-level timelines associated with drug and symptom terms of interest, and 2) population-level behavior via the analysis of co-occurrence networks computed from user timelines at three different scales: monthly, weekly, and daily occurrences. Analysis of these networks further reveals 3) direct and indirect drug and symptom associations with greater support in user timelines, as well as 4) clusters of symptoms and drugs revealed by the collective behavior of the observed population. This demonstrates that Instagram contains much drug- and pathology-specific data for public health monitoring of DDI and ADR, and that complex network analysis provides an important toolbox to extract health-related associations and their support from large-scale social media data.
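Co-occurrence networks of the kind computed from user timelines can be sketched simply: per time window, count which terms appear together, and weight each edge by its co-occurrence count. A toy example (the term names and timelines are made up, not the study's data):

```python
from collections import Counter
from itertools import combinations

def cooccurrence_network(timelines):
    """Build weighted edges (term_a, term_b) -> count from
    per-window term sets in user timelines."""
    edges = Counter()
    for timeline in timelines:
        for window_terms in timeline:
            for a, b in combinations(sorted(set(window_terms)), 2):
                edges[(a, b)] += 1
    return edges

# Two users, monthly windows of drug/symptom mentions (invented).
timelines = [
    [{"cannabis", "anxiety"}, {"cannabis", "insomnia", "anxiety"}],
    [{"ibuprofen", "headache"}, {"cannabis", "anxiety"}],
]
net = cooccurrence_network(timelines)
# ("anxiety", "cannabis") is the best-supported association here.
```

Edge weights are the "support" referred to in the abstract; changing the window from monthly to weekly or daily simply changes how the term sets are grouped before counting.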
Monitoring Potential Drug Interactions and Reactions via Network Analysis of Instagram User Timelines. Correia RB, Li L, Rocha LM.
A number of organizations ranging from terrorist groups such as ISIS to politicians and nation states reportedly conduct explicit campaigns to influence opinion on social media, posing a risk to democratic processes. There is thus a growing need to identify and eliminate "influence bots" - realistic, automated identities that illicitly shape discussion on sites like Twitter and Facebook - before they get too influential. Spurred by such events, DARPA held a 4-week competition in February/March 2015 in which multiple teams supported by the DARPA Social Media in Strategic Communications program competed to identify a set of previously identified "influence bots" serving as ground truth on a specific topic within Twitter. Past work regarding influence bots often has difficulty supporting claims about accuracy, since there is limited ground truth (though some exceptions do exist [3,7]). Moreover, with one exception, no past work has looked specifically at identifying influence bots on a specific topic. This paper describes the DARPA Challenge and the methods used by the three top-ranked teams.
The DARPA Twitter Bot Challenge V.S. Subrahmanian, Amos Azaria, Skylar Durst, Vadim Kagan, Aram Galstyan, Kristina Lerman, Linhong Zhu, Emilio Ferrara, Alessandro Flammini, Filippo Menczer, Rand Waltzman, Andrew Stevens, Alexander Dekhtyar, Shuyang Gao, Tad Hogg, Farshad Kooti, Yan Liu, Onur Varol, Prashant Shiralkar, Vinod Vydiswaran, Qiaozhu Mei, Tim Huang
Building resilience into today’s complex infrastructures is critical to the daily functioning of society and its ability to withstand and recover from natural disasters, epidemics, and cyber-threats. This study proposes quantitative measures that capture and implement the definition of engineering resilience advanced by the National Academy of Sciences. The approach is applicable across physical, information, and social domains. It evaluates the critical functionality, defined as a performance function of time set by the stakeholders. Critical functionality is a source of valuable information, such as the integrated system resilience over a time interval, and its robustness. The paper demonstrates the formulation on two classes of models: 1) multi-level directed acyclic graphs, and 2) interdependent coupled networks. For both models synthetic case studies are used to explore trends. For the first class, the approach is also applied to the Linux operating system. Results indicate that desired resilience and robustness levels are achievable by trading off different design parameters, such as redundancy, node recovery time, and backup supply available. The nonlinear relationship between network parameters and resilience levels confirms the utility of the proposed approach, which is of benefit to analysts and designers of complex systems and networks.
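The critical functionality described here is a performance function of time set by the stakeholders, and system resilience over an interval is obtained by integrating it. A minimal sketch of that integration (the recovery curve and function names are illustrative, not the authors' code):

```python
def resilience(times, K):
    """Normalized area under the critical functionality K(t),
    integrated by the trapezoidal rule over [t0, tN].
    A value of 1 means no loss of functionality at all.
    """
    area = sum((K[i] + K[i + 1]) / 2 * (times[i + 1] - times[i])
               for i in range(len(times) - 1))
    return area / (times[-1] - times[0])

# Toy disruption-and-recovery profile: full function, a drop
# after the shock, then linear recovery back to normal.
t = [0, 1, 2, 3, 4, 5]
K = [1.0, 1.0, 0.4, 0.6, 0.8, 1.0]
print(resilience(t, K))   # ≈ 0.76
```

Design parameters such as redundancy or node recovery time shift the shape of K(t), and hence this single resilience number, which is what makes the trade-offs in the paper quantifiable.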
Operational resilience: concepts, design and analysis Alexander A. Ganin, Emanuele Massaro, Alexander Gutfraind, Nicolas Steen, Jeffrey M. Keisler, Alexander Kott, Rami Mangoubi & Igor Linkov Scientific Reports 6, Article number: 19540 (2016) http://dx.doi.org/10.1038/srep19540