Public Datasets - Open Data -
Scooped by luiy

TubeKit: A Youtube #Crawling Toolkit | #datascience #tools #bigdata


luiy's insight:

TubeKit is a toolkit for creating YouTube crawlers. It lets you build your own crawler that crawls YouTube from a set of seed queries and collects up to 16 different attributes.

TubeKit assists in all phases of this process, from database creation to giving access to the collected data through browsing and searching interfaces. In addition to creating crawlers, TubeKit also provides several tools to collect a variety of data from YouTube, including video details and user profiles.

Scooped by luiy

Data Repositories - Mother's Milk for Data Scientists | #datasets #opendata

Mothers are life givers, giving the milk of life. While there are so very few analogies so apropos, data is often considered the Mother's Milk of Corporate Valuation. So, as a data scientist, we sh...
luiy's insight:

Here are a few repositories from KDnuggets that are worth taking a look at:

Rescooped by luiy from dataInnovation

From Data Ownership to #Data Usage: Personal Data Marketplaces | #PersonalData #myData

In a world where more data is created than ever before, data ownership is gaining in importance. What are the possibilities for organisations and consumers?

Via Vanrijmenam, Fàtima Galan
luiy's insight:

Personal Data Marketplaces

 

This is just the beginning. There are also Big Data startups that are developing personal data marketplaces. These young companies are taking a different approach regarding Big Data and are empowering consumers to determine what’s done with their data and receive monetary rewards for the usage of their data.

 

One such company is Handshake. They are working hard to cut out data brokers such as Experian or Acxiom and give consumers power over their personal data. End users are given monetary rewards in exchange for their data and a bit of their time. Users can share the usual personal information as well as more detailed information about their hobbies and life. The more data they share and the more time they spend with Handshake, the more money they can make. According to Duncan White, CEO of Handshake, an individual can make up to $24,000 per year through the platform. Of course, this requires substantial time and dedication to the platform, but it is an interesting business model.

 

Another new startup is Ctrlio. This company is developing a platform for individuals to become more in control of their own data, decide what to do with the data and save money too via personalized offers. The advantage for brands is that they can make very relevant offers based on rich personal profiles, resulting in higher conversion rates.

A third Big Data startup targeting personal data is Datacoup. They are currently running a beta in which they offer users $8 a month in return for their data.

 

___________________

 

Links

https://datacoup.com/ (personal data marketplace)
http://handshake.uk.com/hs/index.html
http://thenextweb.com/insider/2013/09/17/whats-the-true-value-of-your-personal-data-meet-the-people-who-want-to-help-you-sell-it/
http://ctrlio.com/


Fàtima Galan's curator insight, May 30, 2014 7:26 AM

"There are also Big Data startups that are developing personal data marketplaces. These young companies are taking a different approach regarding Big Data and are empowering consumers to determine what’s done with their data and receive monetary rewards for the usage of their data."

 

"Another new startup is Ctrlio. This company is developing a platform for individuals to become more in control of their own data, decide what to do with the data and save money too via personalized offers. The advantage for brands is that they can make very relevant offers based on rich personal profiles, resulting in higher conversion rates."

Rescooped by luiy from Homo Agilis (Collective Intelligence, Agility and Sustainability : The Future is already here)

Quantified Self Ideology: #Personal Data becomes #BigData | #sensors #predictive

A key contemporary trend emerging in big data science is the quantified self: individuals engaged in the deliberate self-tracking of any kind of biological, ...

Via Claude Emond
Rescooped by luiy from Big Data : Digital Assets to Evaluate, Protect and Value

7 ways #BigData could revolutionize life | #health #hadoop #algorithms

7 ways Big Data could revolutionize life by 2020 #infographic

Via C. CHAMBET-FALQUET
Rescooped by luiy from Personal data and technology

Beware the Big Errors of #Big Data | #controverses

We’re more fooled by noise than ever before, and it’s because of a nasty phenomenon called “big data”. Big data may mean more information, but it also means more false information.

Via C.I.L. CONSULTING
luiy's insight:

Big-data researchers have the option to stop doing their research once they have the right result. In options language: The researcher gets the “upside” and truth gets the “downside.” It makes him antifragile, that is, capable of benefiting from complexity and uncertainty — and at the expense of others.

 

But beyond that, big data means anyone can find fake statistical relationships, since the spurious rises to the surface. This is because in large data sets, large deviations are vastly more attributable to variance (or noise) than to information (or signal). It’s a property of sampling: In real life there is no cherry-picking, but on the researcher’s computer, there is. Large deviations are likely to be bogus.
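
The sampling argument above can be checked with a small simulation. The sketch below is not from the original article; it assumes pure Gaussian noise, 20 observations per variable, and a fixed seed, all chosen only for illustration. It correlates one noise "target" with a growing number of noise variables and reports the strongest correlation it finds.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def max_spurious_correlation(n_variables, n_samples=20, seed=0):
    """Largest |correlation| between a noise target and n_variables noise columns.

    Every value is random noise, so any correlation found is spurious.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_variables):
        column = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(target, column)))
    return best


few = max_spurious_correlation(10)
many = max_spurious_correlation(2000)
print(f"best of 10 noise variables:   {few:.2f}")
print(f"best of 2000 noise variables: {many:.2f}")
```

Because both runs share a seed, the 2000-variable search includes the 10-variable one, so its best correlation can only be equal or higher. The more variables a researcher is free to search, the more impressive the best "relationship" looks, even though every column is noise.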

Rescooped by luiy from Big Data, Cloud and Social everything

Center for Data #Innovation » #Data Innovation 101


Via Pierre Levy
luiy's insight:

A conversation about data-driven innovation is possible now because new technologies have made it easier and cheaper to collect, store, analyze, use, and disseminate data. But while the potential for vastly more data-driven innovation exists, many organizations have been slow to adopt these technologies. Policymakers around the world should do more to spur data-driven innovation in both the public and private sectors, including by supporting the development of human capital, encouraging the advancement of innovative technology, and promoting the availability of data itself for use and reuse.

Arent van 't Spijker's curator insight, February 13, 2014 7:09 AM

They must be very busy: the Center for Data Innovation formulates and promotes pragmatic public policies designed to enable data-driven innovation in the public and private sector, create new economic opportunities, and improve quality of life.

Scooped by luiy

#BigData vs #OpenData - Mapping It Out | #OpenGob

Confused by Open Data, Big Data, and Open Government? This Venn diagram explains it all.
luiy's insight:

As I’ve worked on my upcoming book, Open Data Now – to be published by McGraw-Hill on January 10 – I’ve had to think through and explain how Open Data, Big Data, and Open Government are related to each other. Lately I’ve seen a number of others, like the authors of the new McKinsey Open Data report (see page 4), try to map the territory in similar ways. The Open Data community is producing a lot of Venn diagrams these days, with a lot of colorful overlapping circles. (Some also deal with the use of Personal Data, but that’s one circle too many for me.)

 

For my own contribution to the discussion, I’m proposing the model shown here. To all Open Data wonks: Please take a look, comment, and add your own ideas. We’re at a stage where we need to define more precisely what we’re talking about. This may help.

 

My starting point is the evolving understanding of these three areas. Big Data essentially describes very large datasets, but that’s a somewhat subjective judgment that depends on technology: today’s Big Data may not seem so big in a few years when data analysis and computing technology improve. Open Government is a combination of ideas: it includes collaborative strategies to engage citizens in government; government releasing data about its own operations, like federal spending data; and government releasing data that it collects on issues of public interest, such as health, environment, and different industries.

Scooped by luiy

X-RIME: #Hadoop based large scale social network analysis | #bigdata #SNA

luiy's insight:

Today's Internet-based social network sites have huge user communities. They hold large amounts of data about their users and want to build core competencies from that data. A key enabler for this is a cost-efficient solution for social data management and social network analysis (SNA).

Such a solution faces a few challenges. The most important one is that it must handle massive and heterogeneous data sets. Against this challenge, traditional data-warehouse-based solutions are usually not cost efficient enough. Existing SNA tools, on the other hand, are mostly used in single-workstation mode and are not scalable enough. To this end, low-cost and highly scalable data management and processing technologies from the cloud computing community should be brought in to help.

However, most existing cloud-based data analysis solutions try to provide SQL-like general-purpose query languages and do not directly support social network analysis. This makes them hard to optimize and hard to use for SNA users. So we came up with X-RIME to fill this gap.

Briefly, X-RIME aims to provide a few value-added layers on top of existing cloud infrastructure, to support smart decision loops based on massive data sets and SNA. To end users, X-RIME is a library consisting of Map-Reduce programs, which are used for raw data pre-processing, transformation, calculation of SNA metrics and structures, and graph / network visualization. The library can be integrated with other Hadoop-based data warehouses (e.g., HIVE) to build more comprehensive solutions.
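
To give a feel for how an SNA metric can be phrased as a Map-Reduce program, here is a minimal in-memory sketch that computes vertex degree from an edge list. It is not X-RIME code and does not use the Hadoop API; the function names (`map_edge`, `shuffle`, `reduce_degree`) and the sample edges are hypothetical and only mimic the map, shuffle, and reduce phases in plain Python.

```python
from collections import defaultdict
from itertools import chain


def map_edge(edge):
    """Map step: emit (vertex, 1) for both endpoints of an undirected edge."""
    u, v = edge
    return [(u, 1), (v, 1)]


def shuffle(pairs):
    """Shuffle step: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_degree(vertex, counts):
    """Reduce step: sum the ones to obtain the degree of the vertex."""
    return vertex, sum(counts)


def vertex_degrees(edges):
    """Run the three phases in sequence over an in-memory edge list."""
    mapped = chain.from_iterable(map_edge(e) for e in edges)
    return dict(reduce_degree(k, vs) for k, vs in shuffle(mapped).items())


# Hypothetical toy graph, for illustration only.
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "dan")]
degrees = vertex_degrees(edges)
print(degrees)
```

On a real cluster the map and reduce functions would run in parallel over partitions of the edge list, and the grouping would be done by the framework rather than by a local dictionary.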

Rescooped by luiy from Social Network Analysis #sna

Making sense of #BigData : mining Twitter names I #dataviz #dh


Millions of geo tweets in various languages, discussing anything from 'hey, I'm here' to finance, geopolitics or marketing. How do you make sense of them? 


Via ukituki
luiy's insight:

Given a person's name, our name recognition software can predict its cultural and linguistic classification, country of origin, gender and spelling variants.

 

Our onomastics blog presents a few examples of data visualizations, prepared using NamSor™ Onomastics software (NomTri™).

To know more about what we do, visit our website at http://namsor.com/ or email us at contact@namsor.com

ukituki's curator insight, December 21, 2013 8:27 AM

We’ve used name recognition (applied onomastics) to filter information and produce unique maps of the e-Diasporas. Where are the digitally connected Italians, Turks and Russians today? They may be migrants, tourists, business travellers, students, visiting scientists…

Rescooped by luiy from Communication à l'ère du numérique

#BigData: why is our metadata more personal than our fingerprints? I #privacy #emotions

At the conference "The politics of personal data: Big Data or individual control", organised by the Institut des systèmes complexes and the Ecole normale supérieure de Lyon and held on 21 November, Yves-Alexandre de Montjoye (@yvesalexandre) came to present his work and, through it, the Media Lab's work on this subject (cf. "Other tools and rules for better control of data"). Yves-Alexandre de Montjoye is a doctoral student at MIT. He works in the Human Dynamics laboratory of the Media Lab alongside Sandy Pentland, whose work we have reported on several times.

Our mobility data is even more personal than our fingerprints

Matching fingerprints is not that simple, Yves-Alexandre de Montjoye points out. In Les preuves de l’identité, Edmond Locard, the founder of forensic science, explains that 12 reference points are enough to be certain of identifying an individual's fingerprints.


Via Andrea Naranjo
luiy's insight:

The BFI (Big Five Inventory), an inventory of the five major personality factors, is a psychological test developed by the psychologists John, Donahue and Kentle in 1991 (see Wikipedia). From around a hundred questions, it describes five broad character types that correlate with characteristics such as job performance or the ability to make purchasing decisions. For each person who takes the test, the model distinguishes five broad psychological traits: openness to experience (that is, appreciation of art, emotion, adventure and unusual ideas, curiosity and imagination); conscientiousness (that is, self-discipline, respect for obligations, organisation rather than spontaneity); extraversion (energy, the tendency to seek stimulation and the company of others); agreeableness (a tendency to be compassionate and cooperative rather than suspicious and antagonistic towards others); and finally neuroticism (that is, the opposite of emotional stability, namely the tendency to easily experience unpleasant emotions such as anger, worry, depression or vulnerability). For psychologists who have long used these tests, our answers make it possible to assess our psychological profile against these criteria, which in turn allow a large number of characteristics to be inferred, such as job performance or the ability to make purchasing decisions…

Scooped by luiy

Apache #Mahout: Scalable #MachineLearning and #DataMining I #bigdata

luiy's insight:

Mahout currently has:

Collaborative Filtering
User and Item based recommenders
K-Means, Fuzzy K-Means clustering
Mean Shift clustering
Dirichlet process clustering
Latent Dirichlet Allocation
Singular value decomposition
Parallel Frequent Pattern mining
Complementary Naive Bayes classifier
Random forest decision tree based classifier
High performance Java collections (previously Colt collections)
A vibrant community

and much more cool stuff to come by this summer, thanks to Google Summer of Code.

Rescooped by luiy from Hot Trends in Business Intelligence

How Soon Will #BigData Yield Big Profits?

Big Data is “the next frontier for innovation, competition, and productivity,” says McKinsey & Company. But companies and executives rushing into data collection and analysis expecting immediate payoffs are bound to be disappointed.

Via Yves Mulkers
Scooped by luiy

Project #BigData. Expanding on Project C to look at a different use case | #datascience #opendata

luiy's insight:

Project Big Data is an interactive tool that lets you visualize and explore the funding patterns of over 600 companies in the Big Data ecosystem. It is based on the work I did for Project C (which you can see and read about here). The list of companies and their classification into categories is based on a dozen published sources and rough text analytics of the Crunchbase database. Crunchbase is a curated, crowdsourced database of over 285k companies.

As for the data, there are 645 public and private companies in the data set, from Teradata and IBM to Actuate and Zoomdata. I began by harvesting data from Crunchbase using their free API with Python. As of September, Crunchbase had 1250 funding events for 410 of the companies on my list. I've grouped these companies into 18 categories, allowing you to compare peers as well as trends across categories. Some of the categories are broken down further. For example, the tool lets you differentiate between cloud-based and on-premises solutions, or SQL vs. NoSQL databases. I gathered additional data from a variety of sources. For example, LinkedIn was used to find the number of employees.
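
The grouping step described above can be pictured with a short Python sketch. This is not the author's script and it does not call the Crunchbase API; the record fields (`company`, `category`, `amount_usd`), the company names, and the amounts are invented placeholders, used only to show how funding events might be rolled up per category once they have been harvested.

```python
from collections import defaultdict

# Invented placeholder records; not real Crunchbase data or field names.
funding_events = [
    {"company": "ExampleCo A", "category": "NoSQL databases", "amount_usd": 5_000_000},
    {"company": "ExampleCo A", "category": "NoSQL databases", "amount_usd": 12_000_000},
    {"company": "ExampleCo B", "category": "NoSQL databases", "amount_usd": 3_000_000},
    {"company": "ExampleCo C", "category": "Visualization", "amount_usd": 8_000_000},
]


def summarize_by_category(events):
    """Count events, distinct companies, and total funding per category."""
    buckets = defaultdict(lambda: {"events": 0, "companies": set(), "total_usd": 0})
    for event in events:
        bucket = buckets[event["category"]]
        bucket["events"] += 1
        bucket["companies"].add(event["company"])
        bucket["total_usd"] += event["amount_usd"]
    # Replace each company set by its size so the summary is easy to print.
    return {
        category: {
            "events": b["events"],
            "companies": len(b["companies"]),
            "total_usd": b["total_usd"],
        }
        for category, b in buckets.items()
    }


summary = summarize_by_category(funding_events)
for category, stats in summary.items():
    print(category, stats)
```

A per-category summary of this kind is what makes it possible to compare peers within a category and trends across categories.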

 

 

OPENACCESS Workbook: Project Big Data v1.0 

https://public.tableausoftware.com/download/workbooks/ProjectBigDatav1_0?format=html

 

Rescooped by luiy from Data Visualization & Open data

10 Great Places to Find #Datasets for Infographics | #opendata


“Creating an infographic is an excellent way to break down complex information and statistics into an easy-to-follow visual that is designed with your target audience in mind. Infographics have grown in popularity because they are easy to share and a simple tactic for promoting a business. Perhaps you’ve considered incorporating infographics into your content marketing…”


Via massimo facchinetti, Jesse Soininen
luiy's insight:

1. Government Websites
2. Google’s Public Data Directory 
3. Reddit.com 
4. Infochimps
5. UNData 
6. Visual.ly
7. DataMarket
8. Number Of
9. Gallup Polls
10. Get the Data

Rescooped by luiy from The urban.NET

Watch_Dogs WeAreData | #smartcities #opendata

Discover how data controls the cities of Paris, London and Berlin in these hyperconnected times.
luiy's curator insight, May 28, 2014 6:28 AM

Watch_Dogs WeAreData gathers available geolocated data in a non-exhaustive way: we only display the information for which we have been given authorization by the sources. Yet it is already a huge amount of data. You may even watch what other users are looking at on the website through Facebook Connect.

Emeric Nectoux's curator insight, June 3, 2014 2:50 PM

Good visualization of geo-located streaming data.

Scooped by luiy

The Big Data #open source #tools | #bigdata

The Big Data open source tools landscape is growing rapidly. Check it out here.
luiy's insight:

There are already many open source tools related to Big Data. Check out the figure below to find the most important open source tools for big data. In the near future we will describe each open source tool in more detail. For now, you can click the logos to open the respective websites.

Scooped by luiy

Now available: Planning for #BigData | #opendata #freebook

Planning for Big Data is a new book that helps you understand what big data is, why it matters, and where to get started.
luiy's insight:

" thanks to an open source project called Hadoop, commodity Linux hardware and cloud computing, this power is in reach for everyone. A data revolution is sweeping business, government and science, with consequences as far reaching and long lasting as the web itself. "


Our aim with Strata is to help you understand what big data is, why it matters, and where to get started. In the wake of the recent conference, we’re delighted to announce the publication of our “Planning for Big Data” book. Available as a free download, the book contains the best insights from O’Reilly Radar authors over the past three months, including myself, Alistair Croll, Julie Steele and Mike Loukides.

Mlik Sahib's curator insight, March 4, 2014 9:17 PM
Where to start?

"Every revolution has to start somewhere, and the question for many is “how can data science and big data help my organization?” After years of data processing choices being straightforward, there’s now a diverse landscape to negotiate. What’s more, to become data driven, you must grapple with changes that are cultural as well as technological.

Our aim with Strata is to help you understand what big data is, why it matters, and where to get started. In the wake of the recent conference, we’re delighted to announce the publication of our “Planning for Big Data” book. Available as a free download, the book contains the best insights from O’Reilly Radar authors over the past three months, including myself, Alistair Croll, Julie Steele and Mike Loukides.

“Planning for Big Data” is for anybody looking to get a concise overview of the opportunity and technologies associated with big data. If you’re already working with big data, hand this book to your colleagues or executives to help them better appreciate the issues and possibilities."

Scooped by luiy

Mining of Massive Datasets | #bigdata #datascience #freebook

Jay Ratcliff's curator insight, February 20, 2014 11:24 AM

I like this book. I took a class at Coursera where we used this text. One of the things it helped me with was the mechanics of clustering and the different ways to measure distance between objects in a Euclidean or non-Euclidean space. There is also a lot of material on Map-Reduce.
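
As a small illustration of the distinction mentioned in this comment, the sketch below contrasts a Euclidean distance between points with the Jaccard distance between sets, a common non-Euclidean measure. The code is not taken from the book; the function names and sample inputs are made up for this example.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points with numeric coordinates."""
    return math.dist(p, q)  # available in Python 3.8 and later


def jaccard_distance(a, b):
    """1 minus the share of elements two sets have in common.

    Defined on sets rather than coordinates, so there is no notion of an
    'average point' between two objects.
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention: two empty sets are identical
    return 1 - len(a & b) / len(a | b)


print(euclidean_distance((0, 0), (3, 4)))
print(jaccard_distance({"a", "b", "c"}, {"b", "c", "d"}))
```

The choice matters for clustering: with a Euclidean distance a cluster can be summarized by its centroid, while with a set-based distance such as Jaccard a representative member has to be picked instead.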

Scooped by luiy

Data innovations report | #bigdata #opendata

Arent van 't Spijker's curator insight, February 13, 2014 7:15 AM

Literally a list of 100 data innovations. I particularly like IBM's content analysis of ingredients and cooking methods to create new and exotic recipes.

Rescooped by luiy from Personal data and technology

A White House report will investigate how #BigData affect everyday #privacy | #NSA

A new Presidential report will investigate how data collection and analysis affect everyday privacy.

Via C.I.L. CONSULTING
luiy's insight:

A presidential working group is to examine how large-scale data collection and analysis by the private and public sectors for purposes outside of intelligence or law enforcement are affecting privacy. The goal is to identify areas in which new policies might be needed to restrain the technology and business of “big data” (a term that the White House post acknowledges is somewhat nebulous by linking to this MIT Technology Review post).

 

Snowden’s leaks didn’t say much directly about the creeping influence of such techniques in everyday life. But they certainly caused many people and businesses to be more aware of privacy issues of all kinds. It seems that similar thinking led President Obama to ask for this review, which he first mentioned publicly in last week’s speech about NSA surveillance (see “Obama Promises Reform of Phone Data Collection Program”).

Rescooped by luiy from Digital Humanities for beginners

#BigData & Society | #journals #openaccess


Via Pierre Levy
luiy's insight:

Big Data & Society (BD&S) is an open access peer-reviewed scholarly journal that publishes interdisciplinary work principally in the social sciences, humanities and computing and their intersections with the arts and natural sciences about the implications of Big Data for societies.

The Journal's key purpose is to provide a space for connecting debates about the emerging field of Big Data practices and how they are reconfiguring academic, social, industry, business and government relations, expertise, methods, concepts and knowledge.

 

BD&S moves beyond usual notions of Big Data and treats it as an emerging field of practices that is not defined by but generative of (sometimes) novel data qualities such as high volume and granularity and complex analytics such as data linking and mining. It thus attends to digital content generated through online and offline practices in social, commercial, scientific, and government domains. This includes, for instance, content generated on the Internet through social media and search engines but also that which is generated in closed networks (commercial or government transactions) and open networks such as digital archives, open government and crowdsourced data.  Critically, rather than settling on a definition the Journal makes this an object of interdisciplinary inquiries and debates explored through studies of a variety of topics and themes.

Scooped by luiy

Help your doctor; #share your data I #bigdata #health

How do we fix our broken healthcare system? Susan Desmond-Hellmann sees a future in which consumers drive treatments and innovation, starting by sharin...
Scooped by luiy

Internet Archaeologists Reconstruct Lost Web Pages I #DataAnalysis #bigdata

Two Internet archaeologists have found a way to reconstruct lost web pages and deleted material, using traces found online on Twitter, blogs and elsewhere.
luiy's insight:

First, some background. The pair began their work by studying the thousands of tweets, blog posts and other resources that were published during the 18 days of uprising in the Egyptian revolution in 2011. These resources were important, they say, because they provide a valuable record of a historic event.

 

However, they also discovered that some of these posts and others on the web were disappearing and began to measure the rate at which they were vanishing. Hence the numbers given above.

The new work is their attempt to reconstruct these missing posts and resources, at least in part, from the clues they leave behind on the web.

 

SalahEldeen and Nelson began by attempting to confirm the earlier results, and that threw up a surprise.

“An interesting phenomena occurred as several of the resources that were previously declared as missing became available again,” they say.

 

That’s possible if the original disappearance was the result of a disrupted domain or archive that was later restored, or a user account that had been suspended and later reinstated.
