e-Xploration
antropologo.net, dataviz, collective intelligence, algorithms, social learning, social change, digital humanities
Curated by luiy
Scooped by luiy

Text Visualization Browser | #dataviz #sna #datascience

luiy's insight:
Text Visualization Browser
Developed by Kostiantyn Kucher and Andreas Kerren
ISOVIS group, Linnaeus University, Växjö, Sweden
Check out our IEEE VIS 2014 poster abstract
DareDo's curator insight, October 30, 6:07 AM

Many different ways of visualizing texts...

We should probably think about simple ways of organizing our own texts and resources.

Certainly worth digging into...

Stephen Dale's curator insight, November 7, 11:23 AM

A Visual Survey of Text Visualization Techniques. Excellent resource.

Rescooped by luiy from Social Network Analysis #sna

Cluster your Twitter Data with #R and #k-means | #datascience


Hello everybody! Today I want to show you how you can get deeper insights into your Twitter followers with the help of R.
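The linked post works in R. As a rough illustration of the k-means idea only, here is a minimal Python sketch of Lloyd's algorithm. The follower features used below (follower count, tweets per day) are hypothetical and are not taken from the original post.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as starting centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (squared distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical follower features: (follower count, tweets per day).
followers = [(10, 1), (12, 2), (11, 1), (500, 40), (520, 38), (510, 42)]
labels, centroids = kmeans(followers, k=2)
```

With real data the features would normally be scaled first, since k-means is sensitive to differences in magnitude between columns.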


Via ukituki
Scooped by luiy

Open source diagramming framework for Java | Datagr4m | #SNA #clustering

luiy's insight:

Assigning layouts to structural data patterns generates diagrams close to the domain model's conventions.

There are two ways of analysing data topologies:

1- Top-down analysis: compute the largest super-structures first, then refine the content of each structure by computing its internal sub-structures.

2- Bottom-up analysis: compute the smallest sub-structures first, then generate super-structures based on sub-structure patterns until no more super-structures are generated.

Rescooped by luiy from Complex Networks

Visualization techniques for categorical analysis of social networks with multiple edge sets | #SNA


Via Becheru Alexandru
luiy's insight:

The node-link graph on the left runs into a limitation when comparing multiple properties, since only one property can be mapped to color at a time. This makes it hard for the user to look at both gender and grade level. In the radial layout on the right, we group by grade and map color to gender. The visualization starts with 8th grade at the top and continues counter-clockwise, with 12th grade at the bottom right and unknown at the top right. The radial layout shows that gender plays less of a role as kids get older (there is more mixing of genders in higher grades).

Scooped by luiy

A smart local moving #algorithm for large-scale modularity-based community detection | #SNA #clustering

luiy's insight:

Our smart local moving (SLM) algorithm is an algorithm for community detection (or clustering) in large networks. The SLM algorithm maximizes a so-called modularity function. The algorithm has been successfully applied to networks with tens of millions of nodes and hundreds of millions of edges. The details of the algorithm are documented in a paper (preprint available here).

 

The SLM algorithm has been implemented in the Modularity Optimizer, a simple command-line computer program written in Java. The Modularity Optimizer can be freely downloaded. The program can be run on any system that supports Java version 1.6 or higher. In addition to the SLM algorithm, the Modularity Optimizer also provides an implementation of the well-known Louvain algorithm for large-scale community detection developed by Vincent Blondel and colleagues. An extension of the Louvain algorithm with a multilevel refinement procedure, as proposed by Randolf Rotta and Andreas Noack, is implemented as well. All algorithms implemented in the Modularity Optimizer support the use of a resolution parameter to determine the granularity level at which communities are detected.
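To make the "modularity function" and the "resolution parameter" a little more concrete, here is a small Python sketch that scores a given partition. It assumes the common textbook form of modularity for an undirected, unweighted simple graph, with the resolution multiplying the expected-edges term. The Modularity Optimizer's exact formulation may differ, and this code only evaluates a partition; it does not implement SLM or Louvain.

```python
from collections import defaultdict

def modularity(edges, community, resolution=1.0):
    """Modularity of a partition of an undirected, unweighted simple graph.

    edges: list of (u, v) pairs, each undirected edge listed once, no self-loops.
    community: dict mapping node -> community label.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # number of edges with both ends in a community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    total_degree = defaultdict(int)  # sum of node degrees per community
    for node, d in degree.items():
        total_degree[community[node]] += d
    # Per community: observed fraction of edges minus (scaled) expected fraction.
    return sum(
        internal[c] / m - resolution * (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )

# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {n: "a" for n in range(6)}
q_split = modularity(edges, split)
q_merged = modularity(edges, merged)
```

In this form, a larger resolution value penalizes big communities more heavily, which is one way to read "granularity level" in the description above.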

Jean-Michel Livowsky's curator insight, November 16, 2013 8:38 AM

SLM algorithm. A very nice move in this complex approach to collective intelligence.

Scooped by luiy

How To Detect #Communities Using Social Network Analysis | #SNA

luiy's insight:

Think of communities as very similar to the segments identified in a brand’s customer segmentation model. (With demographics analysis layered on, you might even find that they’re the same.)

While direct marketing communication is often customized by segment, historically this hasn’t been something brands have done in social media. But by combining social network analysis with Twitter and Facebook ad targeting, it’s possible to send specific messages to specific groups of people.

 

Powered by Pulsar TRAC, these could be people engaging in a specific conversation, individuals sharing a piece of content online, or the followers of an account on Twitter. Any group of people, in essence, as long as we can define that audience through some property of its behaviour in social media – such as keyword, user bio, or location.

 

Community analysis allows brands to really understand the behavior of their audiences in a way they can’t replicate with offline, non-social data.

Scooped by luiy

Data Mining #Algorithms In R/Clustering/K-Cores | #datascience #SNA

luiy's insight:
Cores

The notion of a core is presented in Butts (2010) as follows:

 

Let G = (V, E) be a graph, and let f (v, S, G) for v ∈ V, S ⊆ V be a real-valued vertex property function (in the language of Batagelj and Zaversnik). Then some set H ⊆ V is a generalized k-core for f if H is a maximal set such that f (v, H, G) ≥ k for all v ∈ H. Typically, f is chosen to be a degree measure with respect to S (e.g., the number of ties to vertices in S). In this case, the resulting k-cores have the intuitive property of being maximal sets such that every set member is tied (in the appropriate manner) to at least k others within the set.

 

Degree-based k-cores are a simple tool for identifying well-connected structures within large graphs. Let the core number of vertex v be the value of the highest-value core containing v. Then, intuitively, vertices with high core numbers belong to relatively well-connected sets (in the sense of sets with high minimum internal degree). It is important to note that, while a given k-core need not be connected, it is composed of subsets which are themselves well-connected; thus, the k-cores can be thought of as unions of relatively cohesive subgroups.

 

As k-cores are nested, it is also natural to think of each k-core as representing a “slice” through a hypothetical “cohesion surface” on G. (Indeed, k-cores are often visualized in exactly this manner.)

The kcores function produces degree-based k-cores, for various degree measures (with or without edge values). The return value is the vector of core numbers for V, based on the selected degree measure. Missing (i.e., NA) edges are removed for purposes of the degree calculation.
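The definition of core numbers above can be illustrated with a short Python sketch. This is not the kcores function itself: it assumes an undirected simple graph with plain degree as the vertex property, ignores edge values, and uses the standard peeling idea of repeatedly removing a vertex of minimum remaining degree.

```python
def core_numbers(adjacency):
    """Core number of every vertex of an undirected simple graph.

    adjacency: dict mapping vertex -> set of neighbouring vertices.
    """
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel off a vertex with the smallest remaining degree.
        v = min(remaining, key=degree.get)
        # The core level never decreases as peeling proceeds.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adjacency[v]:
            if w in remaining:
                degree[w] -= 1
    return core

# A triangle (a, b, c), a pendant vertex d attached to a, and an isolated vertex e.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": set(),
}
cores = core_numbers(graph)
```

Here the triangle forms the 2-core, the pendant vertex only reaches the 1-core, and the isolated vertex has core number 0, which matches the nesting of k-cores described above.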

 
Rescooped by luiy from Data is big

#Open Software for Text Analysis, Text Mining, Text Analytics | #clustering #patterns

Review of Top 11 Free Software for Text Analysis, Text Mining, Text Analytics: KH Coder, Carrot2, GATE, tm, Gensim, Natural Language Toolkit, RapidMiner, Unstructured Information Management Architecture, OpenNLP, KNIME, Orange-Textable and LPU are some of the key vendors providing text analytics software.

Via ukituki
Scooped by luiy

VOSviewer. #Dataviz | #SNA #clustering

VOSviewer is a freely available computer program for creating, visualizing, and exploring bibliometric maps of science.
luiy's insight:

VOSviewer is a freely available computer program that can be used for the following purposes:

 

1- VOSviewer can be used to create maps based on network data. Maps are created using the VOS mapping technique and the VOS clustering technique.

2- VOSviewer can be used to view and explore maps. It can show a map in various different ways, each emphasizing a different aspect of the map. It offers functionality such as zooming, scrolling, and searching, which facilitates the detailed examination of a map.

VOSviewer is primarily intended to be used for analyzing bibliometric networks. The program can for instance be used to create maps of publications, authors, or journals based on a co-citation network or to create maps of keywords based on a co-occurrence network. Various examples of maps created using VOSviewer are available here.
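As a rough idea of what a keyword co-occurrence network looks like as data, here is a small Python sketch that counts how often two keywords appear on the same publication. The publications and keywords are made up for illustration, and this covers only the raw counting step; it is not the VOS mapping or VOS clustering technique.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(keyword_lists):
    """Count, for every keyword pair, the number of lists containing both."""
    counts = Counter()
    for keywords in keyword_lists:
        # Sort and deduplicate so each pair has one canonical (a, b) ordering.
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical publications, each described by its keywords.
publications = [
    ["clustering", "networks", "visualization"],
    ["clustering", "networks"],
    ["bibliometrics", "visualization"],
]
edges = cooccurrence_counts(publications)
```

Each key of `edges` is a weighted edge between two keywords, which is the kind of network data a mapping tool would then lay out and cluster.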

 

VOSviewer has been written in the Java programming language and runs on most hardware and operating system platforms. VOSviewer can be downloaded here. The program can also be started directly by clicking the Launch button below.

 

- See more at: http://www.vosviewer.com/#!

Rescooped by luiy from Social Network Analysis #sna

#Clustering Memes in Social Media | #datascience #SNA_indatcom | @jabawack


Via ukituki
luiy's insight:

The increasing pervasiveness of social media creates new opportunities to study human social behavior, while challenging our capability to analyze their massive data streams. One of the emerging tasks is to distinguish between different kinds of activities, for example engineered misinformation campaigns versus spontaneous communication. Such detection problems require a formal definition of meme, or unit of information that can spread from person to person through the social network. Once a meme is identified, supervised learning methods can be applied to classify different types of communication. The appropriate granularity of a meme, however, is hardly captured from existing entities such as tags and keywords. Here we present a framework for the novel task of detecting memes by clustering messages from large streams of social data. We evaluate various similarity measures that leverage content, metadata, network features, and their combinations. We also explore the idea of pre-clustering on the basis of existing entities. A systematic evaluation is carried out using a manually curated dataset as ground truth. Our analysis shows that pre-clustering and a combination of heterogeneous features yield the best trade-off between number of clusters and their quality, demonstrating that a simple combination based on pairwise maximization of similarity is as effective as a non-trivial optimization of parameters. Our approach is fully automatic, unsupervised, and scalable for real-time detection of memes in streaming data.
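One possible reading of "a simple combination based on pairwise maximization of similarity" is sketched below in Python: for each pair of messages, compute one similarity per feature type and keep the largest. The message fields (`words`, `hashtags`) and the use of Jaccard similarity are assumptions made for illustration, not details confirmed by the abstract.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def combined_similarity(msg1, msg2):
    """Combine per-feature similarities by taking their maximum for this pair."""
    return max(
        jaccard(msg1["words"], msg2["words"]),        # content similarity
        jaccard(msg1["hashtags"], msg2["hashtags"]),  # metadata similarity
    )

# Hypothetical messages.
m1 = {"words": ["vote", "today", "polls"], "hashtags": ["election"]}
m2 = {"words": ["polls", "open", "now"], "hashtags": ["election"]}
m3 = {"words": ["cat", "video"], "hashtags": ["cute"]}

sim_12 = combined_similarity(m1, m2)
sim_13 = combined_similarity(m1, m3)
```

The first two messages share little text but the same hashtag, so the maximum lets the strongest signal decide. A clustering step would then group messages whose combined similarity is high.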
