e-Xploration
antropologiaNet, dataviz, collective intelligence, algorithms, social learning, social change, digital humanities
Curated by luiy
Scooped by luiy

Morph: Get structured #data out of the web | #crawlers #datascience

luiy's insight:

Morph: A Heroku for Scrapers

Get structured data out of the web

- All code and collaboration through GitHub

- Write your scrapers in Ruby, Python, PHP or Perl

- Simple API to grab data

- Schedule scrapers or run manually

- Process isolation via Docker

- Trivial to move scraper code and data from ScraperWiki Classic
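As a rough illustration of the "simple API" bullet, the sketch below builds a URL that asks morph.io to run an SQL query over a scraper's stored data. It assumes the URL pattern morph.io documented, `https://api.morph.io/<owner>/<scraper>/data.json?key=<api key>&query=<SQL>`; the owner, scraper name and key used here are hypothetical placeholders, so check the current morph.io API documentation before relying on it.

```python
from urllib.parse import urlencode

# Assumption: URL pattern as documented by morph.io (may have changed):
#   https://api.morph.io/<owner>/<scraper>/data.<format>?key=<api key>&query=<SQL>
API_BASE = "https://api.morph.io"


def morph_data_url(owner: str, scraper: str, api_key: str, sql: str, fmt: str = "json") -> str:
    """Build a query URL for a scraper's data; nothing is sent over the network."""
    # urlencode escapes the SQL (spaces become "+", "*" becomes "%2A", etc.)
    params = urlencode({"key": api_key, "query": sql})
    return f"{API_BASE}/{owner}/{scraper}/data.{fmt}?{params}"


# Hypothetical owner/scraper/key, for illustration only.
url = morph_data_url("someone", "some_scraper", "KEY", "select * from data limit 10")
print(url)
```

The resulting URL could then be fetched with `urllib.request.urlopen` and decoded with `json.loads`; keeping URL construction in one function keeps the SQL escaping in one place.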

Scooped by luiy

Twitter Archiving Google Spreadsheet TAGS v5 MASHe | #dataviz #extracting #SNA_indatcom

For a couple of years I've been sharing a Google Sheet template for archiving searches from Twitter. In September 2012 Twitter announced the release of a new version of their API (the spreadsheet uses this to request data from Twitter).
luiy's insight:
Twitter Archiving Google Spreadsheet TAGS v5

For a couple of years now, to support my research in Twitter community analysis/visualisation, I’ve been developing my Twitter Archiving Google Spreadsheet (TAGS). To allow others to explore the possibilities of data generated by Twitter, I’ve released copies of this template to the community.

In September 2012 Twitter announced the release of a new version of their API (the spreadsheet uses this to request data from Twitter). Around the same time Twitter also announced that the old version of their API would be switched off in March 2013. This has required some modification of TAGS to work with the new API. The biggest change for TAGS is that all requests now need authenticated access.

So here it is:

*** Twitter Archive Google Spreadsheet – TAGS v5.0 ***


[If the first link doesn't work, try opening this spreadsheet and choosing File > Make a copy]
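The post doesn't show how TAGS itself authenticates, so the following is only a sketch of one documented way to meet the "all requests now need authenticated access" requirement: Twitter's application-only authentication for API v1.1, in which a consumer key and secret are exchanged for a bearer token. The endpoint and header values are as Twitter documented them for v1.1 and may no longer be current; the function names are hypothetical, and nothing is sent over the network.

```python
import base64
from urllib.parse import quote

# Assumption: v1.1-era token endpoint from Twitter's application-only auth docs.
TOKEN_URL = "https://api.twitter.com/oauth2/token"


def bearer_credentials(consumer_key: str, consumer_secret: str) -> str:
    """URL-encode key and secret, join them with ':', then Base64-encode the result."""
    joined = f"{quote(consumer_key, safe='')}:{quote(consumer_secret, safe='')}"
    return base64.b64encode(joined.encode("utf-8")).decode("ascii")


def token_request(consumer_key: str, consumer_secret: str):
    """Return (url, headers, body) for the POST that obtains a bearer token."""
    headers = {
        "Authorization": "Basic " + bearer_credentials(consumer_key, consumer_secret),
        "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
    }
    body = "grant_type=client_credentials"
    return TOKEN_URL, headers, body


def search_headers(bearer_token: str) -> dict:
    """Headers to attach to each subsequent (e.g. search) request."""
    return {"Authorization": "Bearer " + bearer_token}
```

A real client would POST the returned body to `TOKEN_URL` with those headers, read the `access_token` field from the JSON reply, and then send `search_headers(token)` with each search request.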

Scooped by luiy

Google #crawlers: See which #robots Google uses to crawl the web


See which robots Google uses to crawl the web.

luiy's insight:

"Crawler" is a generic term for any program (such as a robot or spider) used to automatically discover and scan websites by following links from one webpage to another. Google's main crawler is called Googlebot. This table lists information about the common Google crawlers you may see in your referrer logs, and how they should be specified in robots.txt, the robots meta tags, and the X-Robots-Tag HTTP directives.
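To see how robots.txt groups apply to different crawler tokens, one option is Python's standard `urllib.robotparser`. The robots.txt below is a made-up example. Python's parser is not Google's: it uses the first group whose user-agent token occurs in the crawler name, whereas Google documents that a crawler follows the most specific matching group. Listing the more specific `Googlebot-Image` group first makes both readings agree here.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: one group per crawler token plus a catch-all group.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /private-images/

User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /tmp/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rp = build_parser(ROBOTS_TXT)

# Pass the robots.txt token (e.g. "Googlebot"), not the full User-Agent header:
# the parser only compares the part of the name before the first "/".
for agent, url in [
    ("Googlebot-Image", "https://example.com/private-images/a.png"),
    ("Googlebot", "https://example.com/drafts/post.html"),
    ("Googlebot", "https://example.com/private-images/a.png"),
    ("SomeOtherBot", "https://example.com/tmp/cache.html"),
]:
    print(f"{agent:16} {url:45} allowed={rp.can_fetch(agent, url)}")
```

With these rules, Googlebot-Image is blocked only from /private-images/, Googlebot only from /drafts/, and any other crawler only from /tmp/.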

Scooped by luiy

Tweet Archivist Desktop | #dataviz #socialmedia

Tweet Archivist, a desktop application to archive, analyze, visualize, save and export tweets.
luiy's insight:
Tweet Archivist Desktop is a Windows application that helps you archive tweets for later data-mining and analysis. Start a search with Tweet Archivist and it will get as many results as it can. Then, leave Tweet Archivist running and it will poll Twitter for that search as frequently as once every five minutes.
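Tweet Archivist's internals aren't described here, so the following is only a sketch of the general poll-and-archive pattern the description implies: run the same search repeatedly, wait between rounds (300 seconds, matching the five-minute figure above), and keep each tweet once by its ID. The `fetch` callable and the tweet dictionaries are hypothetical stand-ins for a real Twitter client; passing `sleep` in lets the loop be exercised without actually waiting.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical tweet shape: {"id": <int>, "text": <str>, ...}
Tweet = Dict[str, object]

# Matches the "once every five minutes" figure quoted above.
POLL_INTERVAL_SECONDS = 5 * 60


def poll_search(
    fetch: Callable[[str, Optional[int]], Iterable[Tweet]],
    query: str,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
    interval: float = POLL_INTERVAL_SECONDS,
) -> List[Tweet]:
    """Poll a search `rounds` times and return archived tweets, lowest ID first.

    `fetch(query, since_id)` is a hypothetical client call returning tweets
    newer than `since_id` (or the latest results when `since_id` is None).
    """
    archive: Dict[int, Tweet] = {}
    since_id: Optional[int] = None
    for round_no in range(rounds):
        for tweet in fetch(query, since_id):
            tweet_id = int(tweet["id"])
            archive.setdefault(tweet_id, tweet)  # keep the first copy, skip repeats
            if since_id is None or tweet_id > since_id:
                since_id = tweet_id  # remember the newest ID seen so far
        if round_no < rounds - 1:
            sleep(interval)  # wait before the next poll (no wait after the last)
    return [archive[tweet_id] for tweet_id in sorted(archive)]
```

Tracking the newest ID lets each poll ask only for tweets it hasn't seen, while the ID-keyed dictionary still guards against overlapping results.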