
# Big Data, Cloud and Social everything

Big because of the volume, velocity and variety of the data that are processed in the cloud. Economic, social and cultural effects.
Curated by Pierre Levy

 Scooped by Pierre Levy

## Quantifying Wikipedia Usage Patterns Before Stock Market Moves

Financial crises result from a catastrophic combination of actions. Vast stock market datasets offer us a window into some of the actions that have led to these crises.
Carlos Fosca's curator insight:

An example of applying big data analysis: this article presents evidence of how stock market movements can be predicted by analyzing visit data for certain finance-related Wikipedia pages.


## Data science - Wikipedia, the free encyclopedia

Data science incorporates varying elements and builds on techniques and theories from many fields, including math, statistics, data engineering, pattern recognition and learning, advanced computing, visualization, uncertainty modeling, data warehousing, and high performance computing, with the goal of extracting meaning from data and creating data products. Data science is a relatively new term, though one in increasingly common use, and it is often used interchangeably with competitive intelligence or business analytics. Data science seeks to use all available and relevant data to effectively tell a story that can be easily understood by non-practitioners.

A practitioner of data science is called a data scientist. The term was coined by DJ Patil and Jeff Hammerbacher.[1] Data scientists solve complex data problems by applying deep expertise in some scientific discipline. It is generally expected that data scientists are able to work with various elements of mathematics, statistics and computer science, although expertise in these subjects is not required. However, a data scientist is most likely to be an expert in only one or two of these disciplines and proficient in another two or three. Probably no living person is an expert in all of these disciplines; any such person would be extremely rare. This means that data science must be practiced as a team, where expertise and proficiency across all the disciplines are spread among the team's members.

Good data scientists are able to apply their skills to achieve a broad spectrum of end results. These include the ability to find and interpret rich data sources, manage large amounts of data despite hardware, software and bandwidth constraints, merge data sources together, ensure consistency of data sets, create visualizations to aid in understanding data, and build rich tools that enable others to work effectively. The skill sets and competencies that data scientists employ vary widely. Data scientists are an integral part of competitive intelligence, a newly emerging field that encompasses a number of activities, such as data mining and analysis, that can help businesses gain a competitive edge.[2]


## Wikipedia is now drawing facts from the Wikidata repository, and so can you

The Wikimedia Foundation’s first major new project in 7 years is now feeding the biggest project in that stable, Wikipedia itself. But anyone can take structured data from Wikidata, due to its open license.
Renato P. dos Santos's curator insight:

The Wikidata subproject not only feeds Wikipedia but is also the embryo of the Semantic Web.


## Bayesian network - Wikipedia, the free encyclopedia

A Bayesian network, Bayes network, belief network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.
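
The disease and symptom example can be made concrete with the smallest possible network: one edge, Disease -> Symptom. The sketch below is illustrative only. The function name and all three probabilities are hypothetical, chosen for the example and not taken from the article. It computes the probability of the disease given that the symptom was observed by summing the two joint terms and normalizing (Bayes' rule).

```python
def posterior_disease_given_symptom(p_disease,
                                    p_symptom_given_disease,
                                    p_symptom_given_healthy):
    """Return P(disease | symptom observed) for the network Disease -> Symptom."""
    # Joint probability of (disease, symptom) and of (no disease, symptom).
    joint_disease = p_disease * p_symptom_given_disease
    joint_healthy = (1.0 - p_disease) * p_symptom_given_healthy
    # Normalize by the total probability of seeing the symptom.
    return joint_disease / (joint_disease + joint_healthy)


# Hypothetical numbers: 1% prevalence, 90% of patients show the symptom,
# 5% of healthy people show it too.
posterior = posterior_disease_given_symptom(0.01, 0.9, 0.05)
print(f"P(disease | symptom) = {posterior:.3f}")
```

With these assumed numbers the posterior is about 0.154: the symptom raises the probability of the disease well above the 1% prior, but most people showing it are still healthy. Larger networks need the same sum-and-normalize step over many more variables, which is where dedicated inference algorithms come in.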

Formally, Bayesian networks are directed acyclic graphs whose nodes represent random variables in the Bayesian sense: they may be observable quantities, latent variables, unknown parameters or hypotheses. Edges represent conditional dependencies; nodes which are not connected represent variables which are conditionally independent of each other. Each node is associated with a probability function that takes as input a particular set of values for the node's parent variables and gives the probability of the variable represented by the node. For example, if the parents are $m$ Boolean variables then the probability function could be represented by a table of $2^m$ entries, one entry for each of the $2^m$ possible combinations of its parents being true or false. Similar ideas may be applied to undirected, and possibly cyclic, graphs; such are called Markov networks.
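
To illustrate the table with $2^m$ entries, the following sketch enumerates every true/false combination of a node's Boolean parents and stores one probability per combination. The helper names (`build_cpt`, `noisy_or`), the parent names, and the `leak` and `strength` values are assumptions made for this example. The noisy-OR rule is just one convenient way to fill the table, not something the article prescribes.

```python
from itertools import product


def build_cpt(parent_names, prob_fn):
    """Build a conditional probability table for a Boolean node.

    Keys are tuples of parent truth values, in the order of parent_names.
    Values are P(node is true | that parent assignment).
    """
    cpt = {}
    for values in product([False, True], repeat=len(parent_names)):
        cpt[values] = prob_fn(dict(zip(parent_names, values)))
    return cpt


def noisy_or(assignment, leak=0.01, strength=0.8):
    """Hypothetical rule: each true parent independently triggers the node."""
    p_off = 1.0 - leak
    for is_true in assignment.values():
        if is_true:
            p_off *= 1.0 - strength
    return 1.0 - p_off


parents = ["flu", "cold", "allergy"]  # m = 3 Boolean parents
cpt = build_cpt(parents, noisy_or)
print(len(cpt))  # one row per parent combination: 2 ** 3
for row, p in cpt.items():
    print(row, round(p, 4))
```

The table doubles in size with each added parent, which is why nodes with many parents are often given a compact parametric form such as this one instead of a fully hand-specified table.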

Efficient algorithms exist that perform inference and learning in Bayesian networks. Bayesian networks that model sequences of variables (e.g. speech signals or protein sequences) are called dynamic Bayesian networks. Generalizations of Bayesian networks that can represent and solve decision problems under uncertainty are called influence diagrams.
