Public Datasets - Open Data -
Scooped by luiy

TubeKit: A Youtube #Crawling Toolkit | #datascience #tools #bigdata


luiy's insight:

TubeKit is a toolkit for creating YouTube crawlers. It lets you build your own crawler, which crawls YouTube starting from a set of seed queries and collects up to 16 different attributes.

 

TubeKit assists in all phases of this process, from database creation to giving access to the collected data through browsing and searching interfaces. In addition to creating crawlers, TubeKit also provides several tools to collect a variety of data from YouTube, including video details and user profiles.
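
The seed-query workflow described above can be sketched roughly as follows. This is not TubeKit's own code: the `crawl` function, the `search_fn` callable, the four attribute names, and the SQLite table layout are all assumptions made for illustration, and a real crawler would plug in an actual YouTube search client in place of `search_fn`.

```python
import sqlite3
from typing import Callable, Iterable, Mapping

# Hypothetical subset of per-video attributes (assumption: the actual
# list of up to 16 attributes is not given in the text above).
ATTRIBUTES = ("video_id", "title", "uploader", "view_count")


def crawl(
    seed_queries: Iterable[str],
    search_fn: Callable[[str], Iterable[Mapping[str, object]]],
    conn: sqlite3.Connection,
) -> int:
    """Run each seed query, store the results, and return the number of stored videos."""
    # Phase 1: create the database table if it does not exist yet.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS videos ("
        "video_id TEXT PRIMARY KEY, title TEXT, uploader TEXT, "
        "view_count INTEGER, query TEXT)"
    )
    # Phase 2: collect the chosen attributes for every result of every seed query.
    for query in seed_queries:
        for record in search_fn(query):
            row = tuple(record.get(name) for name in ATTRIBUTES) + (query,)
            # A video already stored by an earlier query is skipped.
            conn.execute("INSERT OR IGNORE INTO videos VALUES (?, ?, ?, ?, ?)", row)
    conn.commit()
    # Phase 3: the stored rows can now be browsed or searched with plain SQL.
    return conn.execute("SELECT COUNT(*) FROM videos").fetchone()[0]
```

Passing the search step in as a function keeps the sketch independent of any particular YouTube client, so it can be exercised with a stub before a real one is wired in.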


Hyphe #Crawler | #Medialab Tools | #dataviz #datamining #SNA_indatcom

luiy's insight:
Welcome

Hyphe does not manage different corpora or users at the moment. All the data is stored as a single corpus summarized here.

 

We used:

- Web interface: Domino.js, Sigma.js, Bootstrap, jQuery, Modernizr, Initializr
- Crawl & storage server API: Lucene, Scrapy, Twisted, JsonRPC, MongoDB, Thrift
