Data storage, indexing and querying
Technical information about different data storage, indexing and querying technologies
Scooped by Benjamin Habegger

Embedded Elasticsearch Server for Tests - Cup of Java

When developing with elasticsearch one of the first problems is how to get tests in place that use a fast server instance. It should be easily …

Searching With Shingles | Blog | Elasticsearch

In this article, I want to introduce Shingles. Shingles are effectively word-nGrams. Given a stream of tokens, the shingle filter will create new tokens by concatenating adjacent terms. For example, given the phrase “Shingles is a viral disease”, a shingle filter might produce: “Shingles is”, “is a”, “a viral”, “viral disease”. The shingle filter allows you to [...]
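The idea in the excerpt above can be sketched in a few lines of plain Python. This is only an illustration of word n-grams, not the Elasticsearch or Lucene implementation: the function name `shingles` and its parameters `min_size`, `max_size` and `separator` are made up for this example, and the real filter has further options (such as whether single tokens are also emitted) that are not modeled here.

```python
def shingles(tokens, min_size=2, max_size=2, separator=" "):
    """Build word n-grams (shingles) by joining adjacent tokens.

    Hypothetical helper for illustration only; it is not part of any
    Elasticsearch or Lucene API.
    """
    result = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            end = start + size
            if end > len(tokens):
                # Not enough tokens left for a shingle of this size.
                break
            result.append(separator.join(tokens[start:end]))
    return result


# Naive whitespace tokenization of the phrase used in the excerpt.
tokens = "Shingles is a viral disease".split()
print(shingles(tokens))
# ['Shingles is', 'is a', 'a viral', 'viral disease']
```

Calling `shingles(tokens, max_size=3)` would additionally yield three-word shingles such as "Shingles is a", which gives a rough feel for how larger shingle sizes increase the number of indexed terms.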

Open Knowledge Foundation Labs

We're a community of civic hackers, data wranglers and ordinary citizens intrigued and excited by the possibilities of combining technology and information for good – making government more accountable, culture more accessible and science more efficient.

Apache Solr Parallel Indexing | drupal.org

Benjamin Habegger's insight:

This article gives an indication of the indexing rates you can expect using Apache Solr (and therefore Lucene).

Scaling Lucene for Indexing a Billion Documents

Today we are sharing our experience on improving Lucene's indexing performance. When we started working on this project, we didn't have much exposure in scaling Lucene other than having some introd...

Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison :: Software architect Kristof Kovacs

Fun with elasticsearch's children and nested documents - Space Vatican

When you’re indexing data, the world is rarely as simple as each document existing in isolation. Sometimes, you’re better off …

How to crawl billions of pages?

Is it possible to crawl billions of pages on a single server?

Getting Started with Payloads

Mark Miller recently posted a brief intro to Span Queries, so I thought I would piggyback on top of his work and show how to get started with Payloads (see also [1]). Introduction Like Spans, paylo...

Scaling Lucene and Solr

While many Lucene/Solr applications will never outgrow a single, well-configured machine, the fact is, more and more applications are pushing beyond the single machine limit due to either index siz...

Lucene in 5 minutes - Lucene Tutorial.com

Benjamin Habegger's insight:

Really cool starter for Apache Lucene. Both complete and concise at the same time!

Premiers pas avec ElasticSearch (Partie 1) - Zenika

ElasticSearch is an open source search engine that is getting a lot of attention. And with good reason, as it has one major advantage: it takes only a few minutes to …