e-Xploration

 Scooped by luiy onto e-Xploration

An introduction to #machinelearning with scikit-learn - #datamining #algorithms

Machine learning: the problem setting

In general, a learning problem considers a set of n samples of data and then tries to predict properties of unknown data. If each sample is more than a single number and is, for instance, a multi-dimensional entry (aka multivariate data), it is said to have several attributes or features.

luiy's insight:

We can separate learning problems into a few large categories:

- Supervised learning: in which the data comes with additional attributes that we want to predict (click here to go to the scikit-learn supervised learning page). This problem can be either:

- Classification: samples belong to two or more classes and we want to learn from already labeled data how to predict the class of unlabeled data. An example of a classification problem would be the handwritten digit recognition example, in which the aim is to assign each input vector to one of a finite number of discrete categories. Another way to think of classification is as a discrete (as opposed to continuous) form of supervised learning, in which one has a limited number of categories and, for each of the n samples provided, one tries to label it with the correct category or class.

- Regression: if the desired output consists of one or more continuous variables, then the task is called regression. An example of a regression problem would be the prediction of the length of a salmon as a function of its age and weight.

- Unsupervised learning, in which the training data consists of a set of input vectors x without any corresponding target values. The goal in such problems may be to discover groups of similar examples within the data, where it is called clustering, or to determine the distribution of data within the input space, known as density estimation, or to project the data from a high-dimensional space down to two or three dimensions for the purpose of visualization.
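The two supervised categories above can be sketched in a few lines of scikit-learn. The tiny datasets here are invented purely for illustration:

```python
# Toy illustration of classification vs. regression with scikit-learn.
# Both datasets below are made up for the example.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LinearRegression

# Classification: predict a discrete class from already labeled samples.
X = [[0], [1], [2], [10], [11], [12]]   # n samples, one feature each
y = [0, 0, 0, 1, 1, 1]                  # known class labels
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict([[1.5], [10.5]]))     # -> [0 1]

# Regression: predict a continuous value, e.g. length from (age, weight),
# echoing the salmon example above.
ages_weights = [[1, 2.0], [2, 3.5], [3, 5.0], [4, 6.5]]
lengths = [30.0, 45.0, 60.0, 75.0]
reg = LinearRegression().fit(ages_weights, lengths)
print(round(reg.predict([[5, 8.0]])[0], 1))  # -> 90.0
```

Both estimators follow scikit-learn's uniform fit/predict interface, which is what makes swapping one algorithm for another so cheap.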


e-Xploration

antropologo.net, dataviz, collective intelligence, algorithms, social learning, social change, digital humanities
Curated by luiy


 Scooped by luiy

Oycib ::: Collective Intelligence. "Kaan". Network Visualisation.

luiy's insight:

Beginning with the origins: Oycib means "the place of honey" in the Mayan language. In this project, Oycib is an e-Research infrastructure for collective intelligence analysis.

With the Oycib infrastructure we propose an analysis model, based on digital practices and collaboration profiles, for developing social learning and context awareness in the collective intelligence process.

The infrastructure design and the profiles proposed here are based on historical studies of social organization glyphs in Mayan culture by Montgomery (2002) and Calvin (2012).

Initially we worked with four collaboration profiles: the "Itzaat", the "Pitziil", the "Ayuxul" and the "Sajal", but others can be found depending on the organizational context. It is important to mention that each profile is identified through the e-Xploración model; the profiles are qualitative and quantitative interpretations of collaborative practices. On this basis, we propose methods grounded in Social Network Analysis for learning and knowledge management.

Thus, the network in Oycib is called "Kaan" ("sky" or "network" in the Mayan language). In the "Kaan" we present a visualization of subjects and objects, such as persons, forums, blogs, files and groups, and all the interactions among them. Additionally, each profile and its interactions are presented.

... you can interact with "Kaan" here.

http://viz.oycib.org/net_all_3/network/index.html

 Scooped by luiy

Data work and data workers - #Algopol | #datascience #methods

The scientific question driving the ALGOPOL project is to understand the structure of social ties within egocentric networks, based on the content of exchanges and links shared on Facebook. Do interactions on this platform unfold differently, with a different style of expression and around different shared content, depending on which segments of the social network are mobilized? Do we have different conversations with "strong" ties and "weak" ties? Are the informational objects being shared the same, depending on the form and structure of individuals' digital sociability? Answering these questions requires fine-grained, precise data that traditional survey methods have great difficulty providing [11].
 Scooped by luiy

#Apocalypse when? Infographic guide to Doomsday threats | #nano #bioterrorism

Which apocalyptic threats are most likely to wipe out Earth’s population and when? Our infographic reveals all.
 Scooped by luiy

#memex : Human Traffickers Caught on Hidden Internet | #deepWeb

A new set of search tools called Memex, developed by DARPA, peers into the “deep Web” to reveal illegal activity
luiy's insight:

DARPA has said very little about Memex and its use by law enforcement and prosecutors to investigate suspected criminals.

According to published reports, including one from Carnegie Mellon University, the NYDA’s Office is one of several law enforcement agencies that have used early versions of Memex software over the past year to find and prosecute human traffickers, who coerce or abduct people—typically women and children—for the purposes of exploitation, sexual or otherwise. “Memex”—a combination of the words “memory” and “index” first coined in a 1945 article for The Atlantic—currently includes eight open-source, browser-based search, analysis and data-visualization programs as well as back-end server software that perform complex computations and data analysis.

Such capabilities could become a crucial component of fighting human trafficking, a crime with low conviction rates, primarily because of strategies that traffickers use to disguise their victims’ identities (pdf). The United Nations Office on Drugs and Crime estimates there are about 2.5 million human trafficking victims worldwide at any given time, yet putting the criminals who press them into service behind bars is difficult. In its 2014 study on human trafficking (pdf) the U.N. agency found that 40 percent of countries surveyed reported fewer than 10 convictions per year between 2010 and 2012. About 15 percent of the 128 countries covered in the report did not record any convictions.

http://www.scientificamerican.com/slideshow/scientific-american-exclusive-darpa-memex-data-maps/

http://www.cmu.edu/news/stories/archives/2015/january/detecting-sex-traffickers.html

http://www.darpa.mil/NewsEvents/Releases/2014/02/09.aspx

 Scooped by luiy

#DataMining Reveals a Global Link Between #Corruption and #Wealth | #dataviz

Social scientists have never understood why some countries are more corrupt than others. But the first study that links corruption with wealth could help change that.

One question that social scientists and economists have long puzzled over is how corruption arises in different cultures and why it is more prevalent in some countries than others. But it has always been difficult to find correlations between corruption and other measures of economic or social activity.

Michal Paulus and Ladislav Kristoufek at Charles University in Prague, Czech Republic, have for the first time found a correlation between the perception of corruption in different countries and their economic development.

The data they use comes from Transparency International, a nonprofit campaigning organisation based in Berlin, Germany, which defines corruption as the misuse of public power for private benefit. Each year, this organization publishes a global list of countries ranked according to their perceived levels of corruption. The list is compiled using at least three sources of information but does not directly measure corruption, because of the difficulties in gathering such data.

Instead, it gathers information from a wide range of sources such as the African Development Bank and the Economist Intelligence Unit. But it also places significant weight on the opinions of experts who are asked to assess corruption levels.

The result is the Corruption Perceptions Index, ranking countries from 0 (highly corrupt) to 100 (very clean). In 2014, Denmark occupied the top spot as the world’s least corrupt nation, while Somalia and North Korea propped up the table in an unenviable tie as the most corrupt countries on the planet.

 Scooped by luiy

artoo.js · The client-side #scraping companion | #ddj

luiy's insight:

Features

- Scrape everything, everywhere: invoke artoo in the JavaScript context of any web page.

- Loaded with helpers: Scrape data quick & easy with powerful methods such as artoo.scrape.

- Spiders: Crawl pages through ajax and retrieve accumulated data with artoo's spiders.

- Content expansion: Expand pages' content programmatically thanks to artoo.autoExpand utilities.

- Store: stash persistent data in the localStorage with artoo's handy abstraction.

- Sniffers: hook on XHR requests to retrieve circulating data with a variety of tools.

- Instructions: record the instructions typed into the console and save them for later use.

- jQuery: jQuery is injected alongside artoo in the pages you visit so you can handle the DOM easily.

- Custom bookmarklets: you can use artoo as a framework and easily create custom bookmarklets to execute your code.

- User Interfaces: build parasitic user interfaces easily with a creative usage of Shadow DOM.

- Chrome extension: trying to scrape a nasty page abiding by some sneaky HTML5 rules? Here, have a Chrome extension.

 Scooped by luiy

Abridged List of #MachineLearning Topics. #Resources #tools #datascience

luiy's insight:

- Deep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using model architectures composed of multiple non-linear transformations.

- Online machine learning is a model of induction that learns one instance at a time thus reducing the amount of memory required.

- Natural Language Toolkit (NLTK) - a leading tool for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.

- Computer Vision: OpenCV – a popular computer vision library designed for computational efficiency, with a strong focus on real-time applications.
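The tokenization and stemming steps that NLTK provides can be sketched in plain Python. This is a toy stand-in, not NLTK's actual algorithms (its real stemmers, such as PorterStemmer, are far more sophisticated):

```python
# Toy tokenizer and suffix-stripping "stemmer" illustrating the
# tokenize -> stem pipeline mentioned above. A crude approximation
# of what NLTK does properly.
import re

def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def stem(token):
    """Strip a few common English suffixes (very rough heuristic)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

tokens = tokenize("The parsers parsed nested clauses.")
print([stem(t) for t in tokens])
```

With NLTK itself, the equivalent calls would be `nltk.word_tokenize` followed by a stemmer's `.stem` method, plus access to corpora such as WordNet.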

 Scooped by luiy

Overview of #Python Visualization Tools | #dataviz #datascience

Overview of common python visualization tools
luiy's insight:

Introduction

In the python world, there are multiple options for visualizing your data. Because of this variety, it can be really challenging to figure out which one to use when. This article contains a sample of some of the more popular ones and illustrates how to use them to create a simple bar chart. I will create examples of plotting data with:

- Pandas

- Seaborn

- ggplot

- Bokeh

- pygal

- Plotly
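As a sketch of the simplest option in the list above, here is a minimal bar chart with pandas (the traffic figures are invented for illustration):

```python
# Minimal bar chart with pandas + matplotlib.
# The visit counts below are made up for the example.
import matplotlib
matplotlib.use("Agg")          # render off-screen; no display needed
import pandas as pd

df = pd.DataFrame({
    "channel": ["search", "social", "email", "direct"],
    "visits":  [120, 75, 40, 90],
}).sort_values("visits", ascending=False)

ax = df.plot(kind="bar", x="channel", y="visits", legend=False)
ax.set_ylabel("visits")
ax.figure.savefig("visits.png")
print(df["visits"].tolist())   # -> [120, 90, 75, 40]
```

Seaborn, ggplot, Bokeh, pygal and Plotly all produce the same chart from the same DataFrame; the differences are mostly in styling, interactivity and output targets.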

 Rescooped by luiy from ESN - RSE & SocBiz

The success factors of an enterprise social network | #CI #analytics #RSE

The performance of the internal networks of some twenty companies was compared by the consulting firm Lecko to determine what leads them to success... or not.

Via Eric Laurent
luiy's insight:

Many companies have enterprise social networks (ESN), but not all of them meet with the same success on these projects. To try to determine the success factors of these collaborative spaces, the consulting firm Lecko carried out a benchmark for the second consecutive year. Some twenty large companies were compared using the Lecko RSE Analytics tool, which returns metrics on the social activity recorded on the platforms (creating a profile, adding a comment or a "like"). To round things out, more than 90 community managers were surveyed to compare their practices. The importance of community managers is undeniable: 71% of high-performing spaces grew out of a community manager's initiative (see volume 7 of Lecko's state-of-the-art study on enterprise social networks).

 Scooped by luiy

OpenGraphiti : Data Visualization Framework | #SNA #open #dataviz

luiy's insight:
Description

OpenGraphiti is a free and open source 3D data visualization engine for data scientists to visualize semantic networks and to work with them. It offers an easy-to-use API with several associated libraries to create custom-made datasets. It leverages the power of GPUs to process and explore the data and sits on a homemade 3D engine.

 Scooped by luiy

The Emerging Science of Human-Data Interaction | #bigdata #HDI

The rapidly evolving ecosystems associated with personal data are creating an entirely new field of scientific study, say computer scientists. And this requires a much more powerful ethics-based infrastructure.
luiy's insight:

... Richard Mortier at the University of Nottingham in the UK and a few pals say the increasingly complex, invasive and opaque use of data should be a call to arms to change the way we study data, interact with it and control its use. Today, they publish a manifesto describing how a new science of human-data interaction is emerging from this “data ecosystem” and say that it combines disciplines such as computer science, statistics, sociology, psychology and behavioural economics.

They start by pointing out that the long-standing discipline of human-computer interaction research has always focused on computers as devices to be interacted with. But our interaction with the cyber world has become more sophisticated as computing power has become ubiquitous, a phenomenon driven by the Internet but also through mobile devices such as smartphones. Consequently, humans are constantly producing and revealing data in all kinds of different ways.

Mortier and co say there is an important distinction between data that is consciously created and released such as a Facebook profile; observed data such as online shopping behaviour; and inferred data that is created by other organisations about us, such as preferences based on friends’ preferences.

Original Article : http://arxiv.org/abs/1412.6159

 Scooped by luiy

What Is “ #OpenAccess ”? | #OpenScience

Imagine the progress that can happen—in health, science, education—when scholarly research is made freely available among scientists, patients, inventors, and others.
luiy's insight:

Before the open access model existed, almost all peer-reviewed articles based on scholarly research were published in corporate-owned print journals, whose subscription fees were often prohibitively expensive—despite the fact that authors are not paid for their articles. Publishers rarely invest in the actual research and typically provide little added value in the articles’ preparation and distribution.

These journals were available to the general public only at university libraries in wealthy countries. This meant that doctors treating patients with HIV and AIDS in remote regions of Africa, for instance, could not access complete articles describing the results of the latest medical research on treatments, even when the research upon which these articles were based was undertaken in their remote regions.

 Scooped by luiy

Ethnography for the Internet | #Anthropology #CyberEthnography

The internet has become embedded into our daily lives, no longer an esoteric phenomenon, but instead an unremarkable way of carrying out our interactions with one another. Online and offline are interwoven in everyday experience. Using the internet has become accepted as a way of being present in the world, rather than a means of accessing some discrete virtual domain. Ethnographers of these contemporary Internet-infused societies consequently find themselves facing serious methodological dilemmas: where should they go, what should they do there and how can they acquire robust knowledge about what people do in, through and with the internet?

This book presents an overview of the challenges faced by ethnographers who wish to understand activities that involve the internet. Suitable for both new and experienced ethnographers, it explores both methodological principles and practical strategies for coming to terms with the definition of field sites, the connections between online and offline and the changing nature of embodied experience. Examples are drawn from a wide range of settings, including ethnographies of scientific institutions, television, social media and locally based gift-giving networks. - See more at: http://www.bloomsbury.com/uk/ethnography-for-the-internet-9780857855701/

luiy's insight:

1 Introduction
2 The E3 Internet: The Embedded, Embodied, Everyday Internet
3 Ethnographic Strategies for the Embedded, Embodied, Everyday Internet
4 Observing and Experiencing Online/Offline Connections
5 Connective Ethnography in Complex Institutional Landscapes
6 The Internet in Ethnographies of the Everyday
7 Conclusion
References

 Scooped by luiy

#Python Packages For #DataMining

Just because you have a “hammer”, doesn’t mean that every problem you come across will be a “nail”.

The clever thing is using that same hammer to solve whatever problem you come across.
luiy's insight:

Topics:

- Why Python?

- Libraries --> NumPy, SciPy, Pandas, Matplotlib, IPython, scikit-learn

 Rescooped by luiy from Politique des algorithmes

Connecting the Dots Behind the 2016 Candidates | #ddj #politics

How the teams behind some likely and announced 2016 candidates are connected to previous campaigns, administrations and organizations.

Via Dominique Cardon
 Scooped by luiy

District Data Labs - How to Transition from Excel to #R | #datascience

How to Transition from Excel to R - An Intro to R for Microsoft Excel Users
luiy's insight:

In today's increasingly data-driven world, business people are constantly talking about how they want more powerful and flexible analytical tools, but are usually intimidated by the programming knowledge these tools require and the learning curve they must overcome just to be able to reproduce what they already know how to do in the programs they've become accustomed to using. For most business people, the go-to tool for doing anything analytical is Microsoft Excel.

 Scooped by luiy

Introducing the #streamgraph htmlwidget #R Package | #datascience

We were looking for a different type of visualization for a project at work this past week and my thoughts immediately gravitated towards streamgraphs. The TLDR on streamgraphs is that they are generalized versions of stacked area graphs with free baselines across the x axis. They are somewhat controversial but have a “draw you in” […]
luiy's insight:

Streamgraphs require a continuous variable for the x axis, and the streamgraph widget/package works with years or dates (support for xts objects and POSIXct types coming soon). Since they display categorical values in the area regions, the data in R needs to be in long format, which is easy to do with dplyr & tidyr.

The package recognizes when years are being used and does all the necessary conversions for you. It also uses a technique similar to expand.grid to ensure all categories are represented at every observation (not doing so makes d3.stack unhappy).
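"Long format" here means one row per (x, category, value) triple. The R post does this with dplyr & tidyr; for readers more familiar with Python, the same wide-to-long reshape looks like this in pandas (the data is invented):

```python
# Wide -> long reshape, the shape streamgraphs need: one row per
# (year, category, value). This is the pandas analogue of tidyr's
# gather/pivot_longer used in the R post.
import pandas as pd

wide = pd.DataFrame({
    "year": [2000, 2001],
    "jazz": [10, 12],
    "rock": [20, 18],
})
long = wide.melt(id_vars="year", var_name="genre", value_name="n")
print(long)
```

Each of the two years now contributes one row per genre, so a stacking layout like d3.stack sees every category at every observation.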

 Scooped by luiy

How MIT Visualizes Supply Chain Risk | #SNA #predictive

MIT Supply Chain Management Director Bruce Arntzen discusses risk visualization and Sourcemap [Video]

How does a company keep tabs on thousands of suppliers? That’s the question Bruce Arntzen tried to answer when he started the Hi-Viz Research Project. As Executive Director of MIT’s Supply Chain Management Program, Arntzen works with corporations to find innovative solutions to supply chain problems. The idea for the Hi-Viz project came during a 2011 meeting of the Supply Chain Risk Leadership Council. A survey of attendees listed Supply Chain Visibility as the top concern. Why? With thousands of suppliers and sub-suppliers, it can be very time-consuming to find the weakest link in a supply chain. Arntzen’s solution: an automatic visualization of the end-to-end supply chain where the weakest links could be seen in real time. Watch his interview to learn how MIT and Sourcemap developed the first automated risk visualization [more details below the fold].

In 2015, the Hi-Viz project is partnering with actuarial data providers to provide predictive risk analytics. Sourcemap is making available inventory risk mapping as part of its enterprise software-as-a-service. Want to get involved? Learn more about the Hi-Viz project, or contact Sourcemap for a demo.

 Scooped by luiy

The SHOGUN #MachineLearning #Toolbox | #datascience

The Shogun Machine Learning toolbox provides a wide range of unified and efficient Machine Learning (ML) methods. The toolbox makes it easy to seamlessly combine multiple data representations, algorithm classes, and general-purpose tools. This enables both rapid prototyping of data pipelines and extensibility in terms of new algorithms. We combine modern software architecture in C++ with both efficient low-level computing backends and cutting-edge algorithm implementations to solve large-scale Machine Learning problems (yet) on single machines.

One of Shogun's most exciting features is that you can use the toolbox through a unified interface from C++, Python, Octave, R, Java, Lua, C#, etc. This not only means that we are independent of trends in computing languages, but it also lets you use Shogun as a vehicle to expose your algorithm to multiple communities. We use SWIG to enable bidirectional communication between C++ and target languages. Shogun runs under Linux/Unix, MacOS, Windows.

 Scooped by luiy

Reporters love Twitter and geeks love coding. Today, I’m merging the best of both worlds! On the menu: Python scripts to use Twitter to its full potential!

luiy's insight:

When my friend @TerraCiolfe showed me the @WeAreTheDeads project, I said to myself that I really needed to learn how to control Twitter through Python. @WeAreTheDeads is a Twitter account publishing the name of a fallen soldier at the 11th minute of each hour.

Of course, nobody is working behind the screen. A program chooses the soldier in a database and publishes his name, hour after hour. With 119,000 names to publish, the script will run until 2023, according to the author of this great idea, the reporter @GlenMcGregor from the Ottawa Citizen.

With a little bit of research (my sources are at the end of the article), I learnt how to work with Twitter from a Python script. Actually, we can do way more than automatically publish tweets! It’s also possible to extract a lot of data about users and their tweets. For example, you can research specific tweets in a specific location. I created a nice animated map at the end. You’ll see!
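The scheduling logic described above, one name from a database per hour, can be sketched without touching the Twitter API at all. The start date and names below are placeholders; actually posting the tweet would go through a Twitter library such as tweepy with API credentials:

```python
# Sketch of the "one name per hour" logic behind @WeAreTheDeads.
# START and NAMES are hypothetical stand-ins; the real account works
# from a database of ~119,000 names and posts via the Twitter API.
from datetime import datetime

START = datetime(2014, 11, 11, 0, 11)            # first tweet (assumed)
NAMES = ["Soldier A", "Soldier B", "Soldier C"]  # placeholder list

def name_for(now):
    """Return the name due at the most recent 11th-minute mark, or None."""
    hours_elapsed = int((now - START).total_seconds() // 3600)
    if hours_elapsed < 0 or hours_elapsed >= len(NAMES):
        return None                              # before start, or exhausted
    return NAMES[hours_elapsed]

print(name_for(datetime(2014, 11, 11, 2, 30)))   # -> Soldier C
```

A real deployment would wrap `name_for` in a cron job or scheduler that fires at minute 11 of every hour and hands the result to the posting call.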

 Scooped by luiy

TULIP : Data Visualization Software | #SNA #dataviz

Tulip is a software system dedicated to the visualization of huge graphs. It enables 3D visualizations, 3D modifications, plugin support, support for clusters and navigation, and automatic graph drawing.
luiy's insight:

Tulip is an information visualization framework dedicated to the analysis and visualization of relational data. Tulip aims to provide the developer with a complete library, supporting the design of interactive information visualization applications for relational data that can be tailored to the problems he or she is addressing.

Written in C++, the framework enables the development of algorithms, visual encodings, interaction techniques, data models, and domain-specific visualizations. One of the goals of Tulip is to facilitate the reuse of components and allow developers to focus on programming their application. This development pipeline makes the framework efficient for research prototyping as well as the development of end-user applications.

 Rescooped by luiy from Social Network Analysis #sna

Davos on Twitter: who do the attendees follow? | #dataviz #SNA

Via ukituki
luiy's insight:

Every year, the World Economic Forum brings together the most recognisable figures of business and politics. With all eyes on Davos, we decided to turn the optics upside down and see who the twitterati gathered in Switzerland follow on social media.

The inner ring of circles represents the 20 most-followed accounts among Davos attendees, while the outer circles are individual attendees.

ukituki's curator insight:

Network visualization by the Financial Times

 Scooped by luiy

After Ayotzinapa's Disappeared, Locals Are Taking Power In Tecoanapa | #democracy #socialchange

luiy's insight:

On Sept. 26, 2014, municipal police attacked a group of students from Ayotzinapa school in Mexico’s Guerrero state. Of the 43 disappeared students, eight came from Tecoanapa. Now their fellow citizens have shut down the local government buildings and set up a people’s council. It’s a movement that is gathering momentum across Guerrero.

 Scooped by luiy

#BigData, new epistemologies and paradigm shifts | #socialscience #DH

luiy's insight:

Whilst Jim Gray envisages the fourth paradigm of science to be data-intensive and a radically new extension of the established scientific method, others suggest that Big Data ushers in a new era of empiricism, wherein the volume of data, accompanied by techniques that can reveal their inherent truth, enables data to speak for themselves free of theory. The empiricist view has gained credence outside of the academy, especially within business circles, but its ideas have also taken root in the new field of data science and other sciences. In contrast, a new mode of data-driven science is emerging within traditional disciplines in the academy. In this section, the epistemological claims of both approaches are critically examined, mindful of the different drivers and aspirations of business and the academy, with the former preoccupied with employing data analytics to identify new products, markets and opportunities rather than advance knowledge per se, and the latter focused on how best to make sense of the world and to determine explanations as to phenomena and processes.

http://bds.sagepub.com/content/1/1/2053951714528481.full

 Scooped by luiy

How Diversity Makes Us Smarter | #IntelligenceCollective

luiy's insight:

Information and Innovation

The key to understanding the positive influence of diversity is the concept of informational diversity. When people are brought together to solve problems in groups, they bring different information, opinions and perspectives. This makes obvious sense when we talk about diversity of disciplinary backgrounds—think again of the interdisciplinary team building a car. The same logic applies to social diversity. People who are different from one another in race, gender and other dimensions bring unique information and experiences to bear on the task at hand. A male and a female engineer might have perspectives as different from one another as an engineer and a physicist—and that is a good thing.

 Scooped by luiy

Your Digital Image: Factors Behind #Demographic and #Psychometric #Predictions from Social Network Profiles | #identity

luiy's insight:

Our system allows users to examine the factors influencing the predictions, so users can determine how “Liking” a certain item changes the predictions regarding their intelligence, or how changing the number of friends they have affects the predictions regarding their personality. Clearly, these factors are under the control of the user, and users may modify their behavior on Facebook to be perceived in a positive manner. As people can form judgments on others based on their social media profiles [4], this phenomenon is not new. However, we believe an automated tool can allow people to easily determine how others may perceive them based on their behavior on social networks.
