e-Xploration
antropologo.net, dataviz, collective intelligence, algorithms, social learning, social change, digital humanities
Curated by luiy
Scooped by luiy

#DataMining Reveals a Global Link Between #Corruption and #Wealth | #dataviz

Social scientists have never understood why some countries are more corrupt than others. But the first study that links corruption with wealth could help change that.

 

One question that social scientists and economists have long puzzled over is how corruption arises in different cultures and why it is more prevalent in some countries than others. But it has always been difficult to find correlations between corruption and other measures of economic or social activity.

 

Michal Paulus and Ladislav Kristoufek at Charles University in Prague, Czech Republic, have for the first time found a correlation between the perception of corruption in different countries and their economic development.

 

The data they use comes from Transparency International, a nonprofit campaigning organization based in Berlin, Germany, which defines corruption as the misuse of public power for private benefit. Each year, this organization publishes a global list of countries ranked according to their perceived levels of corruption. The list is compiled using at least three sources of information but does not directly measure corruption, because of the difficulties in gathering such data.

 

Instead, it gathers information from a wide range of sources such as the African Development Bank and the Economist Intelligence Unit. But it also places significant weight on the opinions of experts who are asked to assess corruption levels.

 

The result is the Corruption Perceptions Index, which ranks countries from 0 (highly corrupt) to 100 (very clean). In 2014, Denmark occupied the top spot as the world’s least corrupt nation, while Somalia and North Korea propped up the table in an unenviable tie for the most corrupt countries on the planet.

Scooped by luiy

What is Chat, Twitter, text messaging and instant messaging abbreviations? - Definition I #semantic #cyberculture

This is a long list of abbreviations used in e-mail and online chatting. Chat abbreviations are commonly used in e-mail, online chatting, online discussion forum postings, instant messaging, and in text messaging, especially between cell phone users.
luiy's insight:
Abbreviation  Meaning
<3            heart
404           I haven't a clue
A3            Anyplace, anywhere, anytime
ADN           Any day now
AFAIK         As far as I know
AFK           Away from keyboard
ARE           Acronym-rich environment
ASAP          As soon as possible
Scooped by luiy

An introduction to #machinelearning with scikit-learn - #datamining #algorithms


Machine learning: the problem setting 

 

In general, a learning problem considers a set of n samples of data and then tries to predict properties of unknown data. If each sample is more than a single number and, for instance, a multi-dimensional entry (aka multivariate data), it is said to have several attributes or features.

luiy's insight:

We can separate learning problems into a few large categories:

 

- Supervised learning: in which the data comes with additional attributes that we want to predict. This problem can be either:

 

- Classification: samples belong to two or more classes and we want to learn from already labeled data how to predict the class of unlabeled data. An example of a classification problem would be handwritten digit recognition, in which the aim is to assign each input vector to one of a finite number of discrete categories. Another way to think of classification is as a discrete (as opposed to continuous) form of supervised learning where one has a limited number of categories and, for each of the n samples provided, one tries to label it with the correct category or class.

 

- Regression: if the desired output consists of one or more continuous variables, then the task is called regression. An example of a regression problem would be the prediction of the length of a salmon as a function of its age and weight.

 

- Unsupervised learning, in which the training data consists of a set of input vectors x without any corresponding target values. The goal in such problems may be to discover groups of similar examples within the data, where it is called clustering, or to determine the distribution of data within the input space, known as density estimation, or to project the data from a high-dimensional space down to two or three dimensions for the purpose of visualization.
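The two supervised settings described above can be sketched in a few lines. This is a minimal, dependency-free illustration and not how scikit-learn implements them: the function names `nearest_neighbor_predict` and `fit_line`, and all of the sample values, are hypothetical. For brevity the salmon example predicts length from age only, whereas the text mentions both age and weight.

```python
import math


def nearest_neighbor_predict(train_X, train_y, x):
    """Classification: return the label of the training sample closest to x."""
    distances = [math.dist(sample, x) for sample in train_X]
    return train_y[distances.index(min(distances))]


def fit_line(xs, ys):
    """Regression: ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical labeled samples: two features each, two discrete classes.
train_X = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_y = ["small", "small", "large", "large"]
predicted_class = nearest_neighbor_predict(train_X, train_y, (5.5, 5.0))

# Hypothetical salmon data: age in years mapped to length in cm.
ages = [1.0, 2.0, 3.0, 4.0]
lengths = [30.0, 50.0, 70.0, 90.0]
slope, intercept = fit_line(ages, lengths)
predicted_length = slope * 5.0 + intercept

print(predicted_class, predicted_length)
```

The classifier returns one of a fixed set of labels, while the regression returns a continuous number; that difference in output type is the distinction the list draws.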

Scooped by luiy

#Taxonomy of Data Scientists | #datascience #skills

This is a first attempt at classifying data scientists. I invite you to produce a more comprehensive, better solution.

The 10 pioneering data scientists liste…
luiy's insight:

Who is the purest data scientist?

 

Vincent Granville compares the 4-skill mix of each of these 10 data scientists (as found in the above table) with the generic data science skill mix identified in the previous article (Data Science = 0.24 * Data Mining + 0.15 * Machine Learning + 0.14 * Analytics + 0.11 * Big Data). In short, he computed 10 correlations (one per data scientist) to determine who best represents data science.
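The comparison described here can be sketched as a correlation between each person's 4-skill mix and the generic mix. The excerpt does not say which correlation measure was used or reproduce the per-person table, so this sketch assumes Pearson correlation, and the two profiles below are hypothetical placeholders rather than data from the article.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Generic skill mix from the text, in the order:
# Data Mining, Machine Learning, Analytics, Big Data.
generic_mix = [0.24, 0.15, 0.14, 0.11]

# Hypothetical skill mixes (NOT taken from the article's table).
profiles = {
    "scientist_a": [0.30, 0.20, 0.15, 0.10],
    "scientist_b": [0.05, 0.10, 0.20, 0.40],
}

# One correlation per data scientist; the highest marks the closest match.
scores = {name: pearson(mix, generic_mix) for name, mix in profiles.items()}
best_match = max(scores, key=scores.get)

print(scores, best_match)
```

With only four skills per profile, each correlation rests on four data points, so the resulting ranking is best read as a rough indication.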

Scooped by luiy

scikit-learn: machine learning in #Python — #datascience

scikit-learn: Machine Learning in Python
- Simple and efficient tools for data mining and data analysis
- Accessible to everybody, and reusable in various contexts
- Built on NumPy, SciPy, and matplotlib
- Open source, commercially usable - BSD license
luiy's insight:

Classification

Identifying which category a new observation belongs to.

Applications: Spam detection, Image recognition.

Algorithms: SVM, nearest neighbors, random forest, ...

 

Regression

Predicting a continuous value for a new example.

Applications: Drug response, Stock prices.
Algorithms: SVR, ridge regression, Lasso, ...
...etc.
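To make one of the listed regression algorithms concrete, here is a rough single-feature sketch of ridge regression, which adds a penalty on the slope to ordinary least squares. It uses only the standard library and is not scikit-learn's implementation; the helper name `ridge_fit`, the penalty value, and the data are all hypothetical.

```python
def ridge_fit(xs, ys, alpha):
    """Fit y = slope * x + intercept, penalizing slope ** 2 by alpha.

    With one feature and an unpenalized intercept, the closed form is
    slope = Sxy / (Sxx + alpha). Setting alpha = 0 gives ordinary least squares.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / (sxx + alpha)
    return slope, mean_y - slope * mean_x


# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

ols_slope, ols_intercept = ridge_fit(xs, ys, alpha=0.0)
shrunk_slope, shrunk_intercept = ridge_fit(xs, ys, alpha=5.0)

print(ols_slope, shrunk_slope)
```

A larger `alpha` pulls the slope toward zero, trading some fit on the training data for a less extreme model.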