OpenIMAJ is an award-winning set of libraries and tools for multimedia content analysis and content generation. OpenIMAJ is very broad and contains everything from state-of-the-art computer vision (e.g. SIFT descriptors, salient region detection, face detection, etc.) and advanced data clustering, through to software that performs analysis on the content, layout and structure of webpages.
YASGUI (Yet Another Sparql GUI) is a web application to query any SPARQL endpoint. I work on this tool as a pet project, as I could not find any other SPARQL graphical user interface that fits all my requirements:
- Work on all endpoints (not just the CORS-enabled ones)
- Multi-platform (i.e. a web application)
- Easy-to-use user interface (i.e. prefix fetching, syntax highlighting/checking, storing queries)
Wandora is a general-purpose information management application written in the Java programming language. Its internal data model is based on Topic Maps. Wandora is a desktop application with a graphical user interface, layered data storage, a huge collection of data extraction, import and export options, and an embedded HTTP server. Wandora's license is GNU GPL version 3.
An interesting tool for combining and visualizing data. It has extractors for many sources (Twitter, CMSs, ...)
To provide that help, we are releasing the Wikilinks Corpus: 40 million total disambiguated mentions within over 10 million web pages -- over 100 times bigger than the next largest corpus (about 100,000 documents, see the table below for mention and entity counts). The mentions are found by looking for links to Wikipedia pages where the anchor text of the link closely matches the title of the target Wikipedia page. If we think of each page on Wikipedia as an entity (an idea we’ve discussed before), then the anchor text can be thought of as a mention of the corresponding entity.
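The matching idea described above can be sketched in a few lines of Python. This is only an illustration: the corpus description does not say how "closely matches" is measured, so the string-similarity ratio, the 0.8 threshold and the helper names `title_from_url` and `is_mention` are all assumptions made here.

```python
from difflib import SequenceMatcher
from urllib.parse import unquote, urlparse


def title_from_url(wiki_url: str) -> str:
    """Derive a page title from a URL such as .../wiki/Barack_Obama."""
    slug = urlparse(wiki_url).path.rsplit("/", 1)[-1]
    return unquote(slug).replace("_", " ")


def is_mention(anchor_text: str, wiki_url: str, threshold: float = 0.8) -> bool:
    """Return True when the anchor text is similar enough to the target title.

    The similarity measure and threshold are assumptions, not the
    criterion used to build the corpus.
    """
    title = title_from_url(wiki_url).casefold()
    anchor = " ".join(anchor_text.split()).casefold()
    return SequenceMatcher(None, anchor, title).ratio() >= threshold
```

Under these assumptions, a link whose anchor text is "Barack Obama" and whose target is the Barack_Obama page counts as a mention, while a link with the anchor text "click here" to the same page does not.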
The data set contains triples, each consisting of (i) text, a short, raw natural language string; (ii) url, a related concept, represented by an English Wikipedia article's canonical location; and (iii) count, an integer indicating the number of times text has been observed connected with the concept's url. Our database thus includes weights that measure degrees of association. For example, the top two entries for football indicate that it is an ambiguous term, which is almost twice as likely to refer to what we in the US call soccer:
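One plausible way to turn the counts into association weights is to normalize them per text string. The sketch below does this for a different, invented term; the triples and their counts are made up for illustration and are not entries from the data set, and the function name `association_weights` is hypothetical.

```python
from collections import defaultdict

# Hypothetical (text, url, count) triples; the counts are invented.
TRIPLES = [
    ("jaguar", "https://en.wikipedia.org/wiki/Jaguar", 60),
    ("jaguar", "https://en.wikipedia.org/wiki/Jaguar_Cars", 30),
    ("jaguar", "https://en.wikipedia.org/wiki/Jacksonville_Jaguars", 10),
]


def association_weights(triples):
    """Map each text to {url: count / total count observed for that text}."""
    totals = defaultdict(int)
    for text, _url, count in triples:
        totals[text] += count
    weights = defaultdict(dict)
    for text, url, count in triples:
        weights[text][url] = count / totals[text]
    return {text: dict(urls) for text, urls in weights.items()}
```

With these made-up numbers, the first concept receives twice the weight of the second, which is the kind of comparison the counts are meant to support.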
20 years ago, Tim Berners-Lee invented the World Wide Web. For his next project, he's building a web for open, linked data that could do for numbers what the Web did for words, pictures, video: unlock our data and reframe the way we use it together.
ReVerb is a program that automatically identifies and extracts binary relationships from English sentences. ReVerb is designed for Web-scale information extraction, where the target relations cannot be specified in advance and speed is important.
TweetUM is the Twitter-based User Modeling Framework developed by the Web Information Systems group at TU Delft. It features a set of strategies that allow for semantic enrichment of Twitter messages (tweets) and relates tweets to external (Semantic) Web resources. TweetUM offers functionality such as entity recognition or discovery of relations between entities. Based on enriched tweets, it provides a variety of user modeling strategies that are applied for various applications such as personalized news recommendations. Some TweetUM highlights:
- semantic enrichment by linking tweets with (Semantic) Web resources
- user modeling with temporal dynamics
- TUMS Web service makes TweetUM features available to other Web applications
- personalization based on TweetUM: news recommendations, personalized faceted search
WS4J (WordNet Similarity for Java) provides a pure Java API for several published Semantic Relatedness/Similarity algorithms listed in the table below. Download a jar file from the "Downloads" tab and you can immediately use WS4J on Princeton's English WordNet 3.0 & NICT's Japanese WordNet 0.9, from your Java program. The codebase is mostly a Java re-implementation of WordNet-Similarity-2.05 (Perl), with some test cases for verifying the same logic. WS4J is designed to be thread-safe.
With the Nodebox English Linguistics library you can do grammar inflection and semantic operations on English content. You can use the library to conjugate verbs, pluralize nouns, write out numbers, find dictionary descriptions and synonyms for words, summarise texts and parse grammatical structure from sentences.
The library bundles WordNet (using Oliver Steele's PyWordNet), NLTK, Damian Conway's pluralisation rules, Bermi Ferrer's singularization rules, Jason Wiener's Brill tagger, several algorithms adopted from Michael Granger's Ruby Linguistics module, John Wiseman's implementation of the Regressive Imagery Dictionary, Charles K. Ogden's list of basic English words, and Peter Norvig's spelling corrector.
Yelp is a fun and easy way to find, recommend and talk about what's great and not so great in San Francisco and beyond.
If you are a student and come up with an appealing project, you’ll have the opportunity to win one of ten Yelp Dataset Challenge awards for $5,000. Yes, that’s $5,000 for showing us how you use our data in insightful, unique, and compelling ways.
When you type in a search query -- perhaps Plato -- are you interested in the string of letters you typed? Or the concept or entity represented by that string? But knowing that the string represents something real and meaningful only gets you so far in computational linguistics or information retrieval -- you have to know what the string actually refers to. The Knowledge Graph and Freebase are databases of things, not strings, and references to them let you operate in the realm of concepts and entities rather than strings and n-grams.
Ollie is software that automatically identifies and extracts binary relationships from English sentences. Ollie is designed for information extraction, where the target relations cannot be specified in advance and speed is important.
The goal of this project is to perform sentiment analysis on textual data that people generally post on websites such as social networks and movie review sites. At the moment, the project performs sentiment analysis on tweets (from twitter.com). It has two modes of operation
Semantic technologies, and in the first place linked data, promise further automation by turning the Web of information into a Web of interconnected and machine-processable data sources. Although these technologies have reached an acceptably mature state, they are not broadly used in commercial and public Web applications. In our opinion this is mainly caused by the lack of user-tailored and easy-to-use tool support for creating and publishing semantic content.
With the One Click Annotator (OCA) we address the issue of missing tool support for creating semantic content by non-expert users. The OCA is an editor for Web browsers for annotating words and phrases with references to ontology concepts. Our main design goal is to simplify the annotation process and provide a tool that non-expert users can easily use to create semantic content:
- clear and intuitive user interface
- simple and well-described vocabularies
- let users focus on the task of writing text.
This is a Perl module that implements a variety of semantic similarity and relatedness measures based on information found in the lexical database WordNet. In particular, it supports the measures of Resnik, Lin, Jiang-Conrath, Leacock-Chodorow, Hirst-St.Onge, Wu-Palmer, Banerjee-Pedersen, and Patwardhan-Pedersen.
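To give a feel for what such a measure computes, here is a minimal Python sketch of the Wu-Palmer score, 2 * depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest ancestor shared by the two concepts. It runs on a tiny hand-made is-a hierarchy rather than WordNet; the hierarchy and the function names are assumptions made for illustration, and the code is not the module's own implementation.

```python
# Toy is-a hierarchy (child -> parent); an invented stand-in for WordNet.
PARENT = {
    "animal": "entity",
    "mammal": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "bird": "animal",
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def wu_palmer(a, b):
    """Wu-Palmer similarity of two nodes in the toy hierarchy."""
    ancestors_a = set(path_to_root(a))
    # Walking up from b, the first node that is also an ancestor of a
    # (or a itself) is the deepest shared ancestor in a tree.
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))
```

In this toy hierarchy "dog" and "cat" share the ancestor "mammal" and score higher than "dog" and "bird", which only share "animal". Real WordNet concepts can have several parents, which this single-parent sketch ignores.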