e-Xploration
antropologiaNet, dataviz, collective intelligence, algorithms, social learning, social change, digital humanities
Curated by luiy
Scooped by luiy

SORTING - Visualizing sorting #algorithms | #dataviz #datascience

A visualization of the most famous Sorting Algorithms.
luiy's insight:
ABOUT SORTING

Sorting a sequence of items is one of the pillars of computer science.


A sorting algorithm is an algorithm that organizes the elements of a sequence in a certain order. Since the early days of computing, the sorting problem has been one of the main battlefields for researchers. The reason is not only the need to solve a very common task but also the challenge of solving a complex problem in the most efficient way.

SORTING is an attempt to visualize, and help people understand, how some of the most famous sorting algorithms work. This project provides two standpoints from which to look at algorithms: one is more artistic (apologies to any real artist out there), the other is more analytical and aims to explain each algorithm step by step.

This project does not aim to teach the theory of sorting algorithms; there are amazing resources, books and courses for that purpose. SORTING is for those who want to see these algorithms in a different light and, hopefully, appreciate the processing and brain power behind these pieces of genius that in many ways have changed the way we live.
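
To make the "step by step" view concrete, here is a minimal Python sketch that records the state of a list after every pass of insertion sort. It is only an illustration of the idea, not code from the SORTING project; the function name `insertion_sort_steps` is made up for this note.

```python
def insertion_sort_steps(items):
    """Sort a copy of `items` and keep a snapshot of the list after each pass."""
    data = list(items)
    snapshots = [list(data)]  # snapshot 0 is the unsorted input
    for i in range(1, len(data)):
        current = data[i]
        j = i - 1
        # Shift larger elements one slot to the right to make room for `current`.
        while j >= 0 and data[j] > current:
            data[j + 1] = data[j]
            j -= 1
        data[j + 1] = current
        snapshots.append(list(data))
    return data, snapshots


if __name__ == "__main__":
    result, steps = insertion_sort_steps([5, 2, 4, 1, 3])
    for number, state in enumerate(steps):
        print(number, state)
```

Each snapshot is one "frame"; a visualization could draw the frames one after another.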

 

Data work and data workers - #Algopol | #datascience #methods

The scientific question driving the ALGOPOL project is to understand the structure of the social ties that exist within egocentric networks, based on the content of exchanges and the links shared on Facebook. Do interactions on this platform unfold differently, with a different style of expression and around different shared content, depending on which segments of the social network are involved? Do we have different conversations with "strong" ties and "weak" ties? Are the informational objects being shared the same across the different forms and structures of individuals' digital sociability? Trying to answer these questions requires fine-grained, precise data that traditional survey methods have great difficulty providing [11].

Introducing the #streamgraph htmlwidget #R Package | #datascience

We were looking for a different type of visualization for a project at work this past week and my thoughts immediately gravitated towards streamgraphs. The TL;DR on streamgraphs is that they are generalized versions of stacked area graphs with free baselines across the x axis. They are somewhat controversial but have a “draw you in” […]
luiy's insight:

Streamgraphs require a continuous variable for the x axis, and the streamgraph widget/package works with years or dates (support for xts objects and POSIXct types coming soon). Since they display categorical values in the area regions, the data in R needs to be in long format, which is easy to do with dplyr & tidyr.

The package recognizes when years are being used and does all the necessary conversions for you. It also uses a technique similar to expand.grid to ensure all categories are represented at every observation (not doing so makes d3.stack unhappy).
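
The "every category at every observation" idea can be sketched in a few lines. The package itself is written in R; the Python below is only an analogy to what expand.grid achieves, and the function name `complete_long_format` and the zero fill value are assumptions made for this illustration.

```python
from itertools import product


def complete_long_format(rows, fill=0):
    """Given long-format rows (year, category, value), return one row for
    every (year, category) pair, filling unobserved pairs with `fill`."""
    observed = {(year, category): value for year, category, value in rows}
    years = sorted({year for year, _ in observed})
    categories = sorted({category for _, category in observed})
    return [
        (year, category, observed.get((year, category), fill))
        for year, category in product(years, categories)
    ]


if __name__ == "__main__":
    sparse = [(2000, "a", 1), (2001, "b", 2)]
    for row in complete_long_format(sparse):
        print(row)
```

With the gaps filled, every layer of the stack has a value at every x position, which is the condition a stacking layout needs.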


@WeAreTheDead : When #Twitter meets #Python! | #datascience


Reporters love Twitter and geeks love coding. Today, I’m merging the best of both worlds! On the menu: Python scripts to use Twitter to its full potential!

luiy's insight:

When my friend @TerraCiolfe showed me the @WeAreTheDead project, I said to myself that I really needed to learn how to control Twitter through Python. @WeAreTheDead is a Twitter account that publishes the name of a fallen soldier at the 11th minute of each hour.

Of course, nobody is working behind the screen. A program chooses the soldier in a database and publishes his name, hour after hour. With 119,000 names to publish, the script will run until 2023, according to the author of this great idea, the reporter @GlenMcGregor from the Ottawa Citizen.

With a little bit of research (my sources are at the end of the article), I learnt how to work with Twitter from a Python script. Actually, we can do way more than automatically publish tweets! It’s also possible to extract a lot of data about users and their tweets. For example, you can search for specific tweets in a specific location. I created a nice animated map at the end. You’ll see!
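
The scheduling part of such a bot (one post at the 11th minute of each hour) can be sketched with the standard library alone. This is a guess at the logic, not the actual @WeAreTheDead script; `next_post_time` is a hypothetical helper, and the Twitter call itself is deliberately left out.

```python
from datetime import datetime, timedelta


def next_post_time(now, minute=11):
    """Return the first moment strictly after `now` whose minute equals `minute`."""
    candidate = now.replace(minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # This hour's slot has already passed (or is exactly now): use the next hour.
        candidate += timedelta(hours=1)
    return candidate


if __name__ == "__main__":
    now = datetime(2014, 11, 11, 10, 5)
    print(next_post_time(now))
```

A real bot would sleep until the returned time, pick the next name from its database, and publish it through an authenticated Twitter client.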


Text Visualization Browser | #dataviz #sna #datascience

luiy's insight:
Text Visualization Browser
Developed by Kostiantyn Kucher and Andreas Kerren
ISOVIS group, Linnaeus University, Växjö, Sweden
Check out our IEEE VIS 2014 poster abstract
DareDo's curator insight, October 30, 2014 6:07 AM

So many ways to visualize texts...

We should probably think about simple ways to organize our own texts and resources.

Definitely worth digging into...

Stephen Dale's curator insight, November 7, 2014 11:23 AM

A Visual Survey of Text Visualization Techniques. Excellent resource.


What's (technically) in your tweets? | #datascience #API #twitter

Just because you only see 140 characters doesn't mean that Twitter isn't getting complicated behind the scenes. Here's how status objects are evolving.
luiy's insight:

.. an interesting map of what's going on behind your Twitter stream. As it turns out, there is quite a bit of data associated with not just you as a user, but also with every tweet that you post to the service.
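
To give a feel for how much sits behind a single tweet, here is a small Python sketch that parses a status object and pulls out a few fields. The field names (`id_str`, `created_at`, `text`, `user`, `entities`, `coordinates`) follow the classic REST API v1.1 layout as I understand it; the sample values are invented, and the article's map may cover a different version of the object.

```python
import json

# Invented sample; a real status object carries many more fields.
SAMPLE_STATUS = """
{
  "id_str": "1234567890",
  "created_at": "Mon Nov 10 12:00:00 +0000 2014",
  "text": "Hello #dataviz",
  "user": {"screen_name": "example_user", "followers_count": 42},
  "entities": {"hashtags": [{"text": "dataviz"}], "urls": []},
  "coordinates": null
}
"""


def summarize_status(raw_json):
    """Extract a handful of fields from one status object given as a JSON string."""
    status = json.loads(raw_json)
    hashtags = status.get("entities", {}).get("hashtags", [])
    return {
        "id": status["id_str"],
        "author": status["user"]["screen_name"],
        "hashtags": [tag["text"] for tag in hashtags],
        "has_coordinates": status.get("coordinates") is not None,
        "top_level_fields": sorted(status),
    }


if __name__ == "__main__":
    print(summarize_status(SAMPLE_STATUS))
```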


#Taxonomy of Data Scientists | #datascience #skills

This is a first attempt at classifying data scientists. I invite you to produce a more comprehensive, better solution.

The 10 pioneering data scientists liste…
luiy's insight:

Who is the purest data scientist?

 

Vincent Granville compares the 4-skill mix of each of these 10 data scientists (as found in the above table), with the generic data science skill mix identified in the previous article (Data Science = 0.24 * Data Mining + 0.15 * Machine Learning + 0.14 * Analytics + 0.11 * Big Data). In short, I computed 10 correlations (one per data scientist) to determine who best represents data science.
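
The comparison described above can be sketched as follows. Only the four generic weights come from the text; the two profiles are invented placeholders (the original table is not reproduced here), and using Pearson correlation is an assumption, since the text only says "correlations".

```python
from math import sqrt

# Generic mix from the text: Data Mining, Machine Learning, Analytics, Big Data.
GENERIC_MIX = [0.24, 0.15, 0.14, 0.11]

# Hypothetical skill mixes, for illustration only.
HYPOTHETICAL_PROFILES = {
    "scientist_a": [0.30, 0.20, 0.10, 0.05],
    "scientist_b": [0.05, 0.10, 0.20, 0.30],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_by_similarity(profiles, reference=GENERIC_MIX):
    """Sort profiles by their correlation with the reference mix, highest first."""
    scores = {name: pearson(mix, reference) for name, mix in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_by_similarity(HYPOTHETICAL_PROFILES):
        print(name, round(score, 3))
```

With only four skills per person, each correlation rests on four data points, so the resulting ranking should be read as a rough indication at best.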


Automatic #sentiments analysis: why is it so complicated? | #datascience #lexicometric

At a time when big data is one of the major technological and economic challenges, many analysis tools are positioning themselves on the market to give companies deeper knowledge of their customers.

luiy's insight:

Some current techniques rely on lexicometric processing, which combines linguistic and statistical analysis. Others rely on machine learning techniques to automatically improve the performance of the analysis programs the more they are used.

 

Whatever the method, not all the subtleties of language can be recast as algorithms that a computer system can recognize. Language has several levels of articulation, and each level brings its own difficulties:

 

- Lexical level

- Syntactic level

- Semantic level

- Pragmatic level
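
A toy example shows why the levels listed above matter. The Python sketch below counts words from a tiny hand-made lexicon, which only touches the lexical level; the word lists and the function name `naive_sentiment` are invented for illustration and do not come from the article.

```python
import re

# Tiny made-up lexicon; real lexicons contain thousands of entries.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "terrible", "hate"}


def naive_sentiment(text):
    """Score = number of positive words minus number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


if __name__ == "__main__":
    for sentence in ("I love this phone",
                     "This is not good",
                     "Great, another delay"):
        print(sentence, "->", naive_sentiment(sentence))
```

All three sentences receive the same positive score of 1, although the second is negated (a syntactic and semantic matter) and the third is most likely sarcastic (a pragmatic one).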


The Question to Ask Before Hiring a Data Scientist | #datascience #basics

Will your new hire produce analysis for humans or machines?
luiy's insight:

Is your data scientist producing analytics for machines or humans?

 

This distinction is important across organizations, industries, and job titles (our fellows are being placed at jobs with titles that range from Quant to Data Scientist to Analyst to Statistician). Unfortunately, most hiring managers conflate the types of talent and temperament necessary for these roles.


The Impact Cycle – how to think of actionable insights | #datascience #methods

luiy's insight:

I. Identify the question. In a non-intrusive way, help your business partner identify the critical business question(s) he or she needs help in answering. Then set a clear expectation of the time and the work involved to get an answer.

 

M. Master the data. This is the analyst’s sweet spot—assemble, analyze, and synthesize all available information that will help in answering the critical business question. Create simple and clear visual presentations (charts, graphs, tables, interactive data environments, and so on) of that data that are easy to comprehend.

 

P. Provide the meaning. Articulate clear and concise interpretations of the data and visuals in the context of the critical business questions that were identified.

 

A. Actionable recommendations. Provide thoughtful business recommendations based on your interpretation of the data. Even if they are off-base, it’s easier to react to a suggestion than to generate one. Where possible, tie a rough dollar figure to any revenue improvements or cost savings associated with your recommendations.

 

C. Communicate insights. Focus on a multi-pronged communication strategy that will get your insights as far and as wide into the organization as possible. Maybe it’s in the form of an interactive tool others can use, a recorded WebEx of your insights, a lunch and learn, or even just a thoughtful executive memo that can be passed around.

 

T. Track outcomes. Set up a way to track the impact of your insights. Make sure there is future follow-up with your business partners on the outcome of any actions. What was done, what was the impact, and what are the new critical questions that need your help as a result?

Rescooped by luiy from Social Network Analysis #sna

Cluster your Twitter Data with #R and #k-means | #datascience

Hello everybody! Today I want to show you how you can get deeper insights into your Twitter followers with the help of R.


Via ukituki

K-CORE DECOMPOSITION OF INTERNET GRAPHS: HIERARCHIES, SELF-SIMILARITY AND MEASUREMENT BIASES | #datascience #SNA


Is the Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose | #datascience


#Python Packages For #DataMining

Just because you have a “hammer” doesn’t mean that every problem you come across will be a “nail”.

The intelligent key thing is when you use the same hammer to solve whatever problem
luiy's insight:

Topics: 

 

- Why Python?

 

- Libraries: NumPy, SciPy, pandas, Matplotlib, IPython, scikit-learn

 


District Data Labs - How to Transition from Excel to #R | #datascience

How to Transition from Excel to R - An Intro to R for Microsoft Excel Users
luiy's insight:

In today's increasingly data-driven world, business people are constantly talking about how they want more powerful and flexible analytical tools, but are usually intimidated by the programming knowledge these tools require and the learning curve they must overcome just to be able to reproduce what they already know how to do in the programs they've become accustomed to using. For most business people, the go-to tool for doing anything analytical is Microsoft Excel.


Abridged List of #MachineLearning Topics. #Resources #tools #datascience

luiy's insight:

- Deep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using model architectures composed of multiple non-linear transformations.

 

- Online machine learning is a model of induction that learns one instance at a time thus reducing the amount of memory required.

 

- Natural Language Toolkit (NLTK) - a leading tool for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.

 

- Computer vision. OpenCV – a popular computer vision library designed for computational efficiency, with a strong focus on real-time applications.
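
The "one instance at a time" idea behind online machine learning can be sketched with a one-variable linear model updated by stochastic gradient descent. This is a generic textbook-style illustration, not taken from the linked list; the function names, the learning rate, and the synthetic stream (points on the line y = 2x + 1) are all assumptions.

```python
def online_linear_fit(stream, learning_rate=0.05):
    """Fit y ~ weight * x + bias, seeing each (x, y) pair exactly once, in order.
    Only the two parameters are kept in memory, never the data set itself."""
    weight, bias = 0.0, 0.0
    for x, y in stream:
        error = (weight * x + bias) - y
        # Gradient step for the squared error of this single instance.
        weight -= learning_rate * error * x
        bias -= learning_rate * error
    return weight, bias


def noiseless_stream(repeats=500):
    """Synthetic stream: points on y = 2x + 1, generated lazily."""
    for _ in range(repeats):
        for x in (0.0, 1.0, 2.0, 3.0):
            yield x, 2.0 * x + 1.0


if __name__ == "__main__":
    print(online_linear_fit(noiseless_stream()))
```

Because the stream is a generator, each instance is discarded right after its update, which is where the memory saving comes from.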

 


Overview of #Python Visualization Tools | #dataviz #datascience

Overview of common python visualization tools
luiy's insight:

Introduction

 

In the Python world, there are multiple options for visualizing your data. Because of this variety, it can be really challenging to figure out which one to use when. This article contains a sample of some of the more popular ones and illustrates how to use them to create a simple bar chart. I will create examples of plotting data with:

 

- Pandas

- Seaborn

- ggplot

- Bokeh

- pygal

- Plotly

 

Rescooped by luiy from Data is big

Machine Learning #WorkFlow | #datascience #bigdata

So far, I am planning to write a series of posts explaining a basic Machine Learning workflow (mostly supervised). In this post, my aim is to give a bird's-eye view; I'll delve into each of the components in detail in later posts. I decided to write this series for two reasons: the first is self-education (to get all my bits and pieces together after a period of theoretical research and industrial practice); the second is to present a naive guide to beginn

Via ukituki
luiy's insight:

Each box has a color tone from YELLOW to RED. The yellower the box, the more the component relies on a statistics knowledge base. As the box turns red (gets darker), the component depends more heavily on a machine learning knowledge base. By saying this, I also imply that, without a good statistical understanding, we are not able to construct a suitable machine learning pipeline. As a footnote, this schema is being changed by the post-modernism of Representation Learning algorithms, and I'll touch on this in later posts.


#Mining the Social Web, 2nd-Edition | #datascience #SNA #tools

Mining-the-Social-Web-2nd-Edition - The official online compendium for Mining the Social Web, 2nd Edition (O'Reilly, 2013)
luiy's insight:

Chapter 0 - Preface

 

Chapter 1 - Mining Twitter: Exploring Trending Topics, Discovering What People Are Talking About, and More

 

Chapter 2 - Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More

 

Chapter 3 - Mining LinkedIn: Faceting Job Titles, Clustering Colleagues, and More

 

Chapter 4 - Mining Google+: Computing Document Similarity, Extracting Collocations, and More

 

Chapter 5 - Mining Web Pages: Using Natural Language Processing to Understand Human Language, Summarize Blog Posts and More

 

Chapter 6 - Mining Mailboxes: Analyzing Who's Talking To Whom About What, How Often, and More

 

Chapter 7 - Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More

 

Chapter 8 - Mining the Semantically Marked-Up Web: Extracting Microformats, Inferencing Over RDF, and More

 

Chapter 9 - Twitter Cookbook

 

Appendix A - Virtual Machine Experience

Appendix B - OAuth Primer

Appendix C - Python & IPython Notebook Tips


Chorus Project : #Twitter #analytics tool suite | #bigdata

Twitter data retrieval and visual analytics. Designed for social research. GUI based for easy access and fast productivity.
luiy's insight:

The Chorus package currently comprises two distinct programs:

Tweetcatcher

Firstly, we have Chorus-TCD (TweetCatcher Desktop). Tweetcatcher allows users to sift Twitter for relevant data in two distinct ways: either by topical keywords appearing in Twitter conversation widely (i.e. semantically-driven data) or by identifying a network of Twitter users and following their daily ‘Twitter lives’ (i.e. user-driven data).

Tweetvis

Secondly, we have Chorus-TV (TweetVis), which is a visual analytic suite for facilitating both quantitative and qualitative approaches to social media data in social science. Visual analytics (VA) is an interdisciplinary computing methodology combining methods from data mining, information visualization, human-computer interaction and cognitive psychology. The VA approach is highly relevant to the aims of Chorus, enabling exploratory analysis of social media data in an intuitive and user-friendly fashion. Two main views are available within Chorus-TV. The Timeline Explorer gives users an opportunity to analyse Twitter data across time and visualize the unfolding Twitter conversation according to various metrics (including tweet frequency, sentiment, semantic novelty and homogeneity, collocated words, and so on).


Mapping the global #Twitter heartbeat: The geography of Twitter | #datascience

luiy's insight:

In just under seven years, Twitter has grown to count nearly three percent of the entire global population among its active users who have sent more than 170 billion 140–character messages. Today the service plays such a significant role in American culture that the Library of Congress has assembled a permanent archive of the site back to its first tweet, updated daily. With its open API, Twitter has become one of the most popular data sources for social research, yet the majority of the literature has focused on it as a text or network graph source, with only limited efforts to date focusing exclusively on the geography of Twitter, assessing the various sources of geographic information on the service and their accuracy. More than three percent of all tweets are found to have native location information available, while a naive geocoder based on a simple major cities gazetteer and relying on the user–provided Location and Profile fields is able to geolocate more than a third of all tweets with high accuracy when measured against the GPS–based baseline. Geographic proximity is found to play a minimal role both in who users communicate with and what they communicate about, providing evidence that social media is shifting the communicative landscape
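
The "naive geocoder" mentioned above can be pictured roughly like this. The sketch is a guess at the general approach (look for a known city name in the user-provided Location field, then in the Profile field), not the paper's actual method; the three-city gazetteer, its approximate coordinates, and the function name `naive_geocode` are illustrative assumptions.

```python
# Tiny illustrative gazetteer: city name -> approximate (latitude, longitude).
GAZETTEER = {
    "paris": (48.8566, 2.3522),
    "new york": (40.7128, -74.0060),
    "tokyo": (35.6762, 139.6503),
}


def naive_geocode(location_field, profile_field="", gazetteer=GAZETTEER):
    """Return (city, coordinates) for the first gazetteer city mentioned in the
    Location field or, failing that, the Profile field; None if nothing matches."""
    for field in (location_field, profile_field):
        text = (field or "").lower()
        for city, coordinates in gazetteer.items():
            if city in text:
                return city, coordinates
    return None


if __name__ == "__main__":
    print(naive_geocode("Paris, France"))
    print(naive_geocode("", "Writer based in Tokyo"))
    print(naive_geocode("the moon"))
```

Plain substring matching is crude: it cannot tell cities with the same name apart and can match inside unrelated words, which is one reason accuracy has to be checked against a GPS-based baseline.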


Morph : Get structured #data out of the web | #crawlers #datascience

luiy's insight:

Morph: a Heroku for scrapers

 

Get structured data out of the web

 

- All code and collaboration through GitHub

- Write your scrapers in Ruby, Python, PHP or Perl

- Simple API to grab data

- Schedule scrapers or run manually

- Process isolation via Docker

- Trivial to move scraper code and data from ScraperWiki Classic


visone : analysis & visualization of social networks | #SNA #datascience

luiy's insight:

On the applications page in the visone wiki we list research projects in which visone has been applied, as well as datasets on which the usage of visone can be illustrated. We are planning to release a demonstration video soon, too. In the meantime, you might want to jump directly into the basic and advanced tutorials, which focus on different aspects of the software.


Which social media network type is your topic? Which did you want it to be? | #SNA #datascience

There are at least six different types of social media network structures present in systems like Twitter and other services in which people are able to reply to one another. Each of the six patter...
luiy's insight:

This table describes each of the six patterns in terms of the difference between that pattern and the other five patterns.

Rescooped by luiy from Big Data Analysis in the Clouds

Intro materials: network analysis software (UCINET, NodeXL, Gephi, Statnet, ERGM, RSiena) | #datascience #SNA

Introductory materials, handouts and R scripts for network analysis and visualization.

 

In the fall of 2012, I got to design & lead the weekly labs for a network seminar at USC. I also worked on the methods portion of the syllabus for the class. COMM 645: Communication Networks is a PhD-level course taught by Peter Monge. The labs cover a range of network tools – from the classic UCINET program through NodeXL and Gephi, to an R introduction, Statnet, exponential random graph and actor-based modeling. Since the handouts & script examples may be useful for people outside the course, I’m sharing them here.


Via Pierre Levy