Mnemonist
Inverted Index
An InvertedIndex
is an index that considers the inserted documents as a set of tokens that are all keys that one can use to retrieve the documents.
const InvertedIndex = require('mnemonist/inverted-index');
Use case
Usually, InvertedIndex
are used to build full-text search engines.
Here is how it works:
- Transform the inserted document into a set of tokens.
- For each token, add an entry to an internal map with the key being the token & the value being the document.
Let’s see how we could search for documents using a very naive word tokenizer:
// The `words` function from lodash is pretty good, for instance
const words = require('lodash/words');
const documents = [
'The mouse is a gentle animal',
'The cats eats the mouse',
'The mouse eats cheese'
];
// Creating our inverted index with our `words` tokenizer function
const index = new InvertedIndex(words);
// Now we can query it
index.get('mouse');
>>> [
'The mouse is a gentle animal',
'The mouse eats cheese'
]
index.union('cats cheese');
>>> [
'The cats eats the mouse',
'The mouse eats cheese'
]
Constructor
The InvertedIndex
either takes a single argument being a tokenizer function that will process both inserted items or keys & the queries; or a tuple containing two tokenizer functions, one for the inserted items or keys and the second one for the queries.
Example with one tokenizer function
// Let's create an index using a single hash function:
const index = new InvertedIndex((value) => words(value));
index.add('The mouse likes cheese.');
index.query('cheese');
Example with two tokenizer functions
// Let's create an index using two different hash functions:
const index = new InvertedIndex([
// Tokenizer function for inserted documents:
(doc) => words(doc.text),
// Tokenizer function for queries
(query) => words(query),
]);
// Then you'll probably use #.add to insert items
index.add({text: 'The mouse likes cheese.'});
index.query('mouse');
Static #.from
Alternatively, one can build an InvertedIndex
from an arbitrary JavaScript iterable likewise:
const index = InvertedIndex.from(list, tokenizer);
const index = InvertedIndex.from(list, tokenizers);
Members
Methods
Mutation
Read
Iteration
#.size
Number of documents stored in the index.
const index = new InvertedIndex(words);
index.add('The cat eats the mouse.');
index.size
>>> 1
#.dimension
Number of distinct tokens stored in the index (size of the dictionary, if you will).
const index = new InvertedIndex(words);
index.add('The cat eats the mouse.');
index.dimension
>>> 4
#.add
Tokenize the given document using the relevant function and adds it to the index.
O(t)
, t being the number of tokens.
const index = new InvertedIndex(words);
index.add('The cat eats the mouse.');
#.clear
Completely clears the index of its documents.
const index = new InvertedIndex(words);
index.add('The cat eats the mouse.');
index.clear();
index.size
>>> 0
#.get
Tokenize the query using the relevant function, then retrieves the intersection of documents containing the resulting tokens.
O(t)
, t being the number of tokens.
const index = new InvertedIndex(words);
index.add('The cat eats the mouse.');
index.add('The mouse eats cheese.');
index.get('mouse');
>>> [
'The cat eats the mouse.',
'The mouse eats cheese.'
]
index.get('cat mouse');
>>> [
'The cat eats the mouse.'
]
#.forEach
Iterates over the index by applying the callback to every stored document.
const index = new InvertedIndex(words);
index.add('The cat eats the mouse.');
index.add('The mouse eats cheese.');
index.forEach((doc) => {
console.log(doc);
});
#.documents
Returns an iterator over the index’s documents.
const index = new InvertedIndex(words);
index.add('The cat eats the mouse.');
index.add('The mouse eats cheese.');
const iterator = index.documents();
iteraror.next().value
>>> 'The cat eats the mouse.'
#.tokens
Returns an iterator over the index’s tokens.
const index = new InvertedIndex(words);
index.add('The cat eats the mouse.');
index.add('The mouse eats cheese.');
const iterator = index.tokens();
iterator.next().value
>>> 'The'
Iterable
Alternatively, you can iterate over an index’s documents using ES2015 for...of
protocol:
const index = new InvertedIndex(words);
index.add('The cat eats the mouse.');
index.add('The mouse eats cheese.');
for (const doc of index) {
console.log(doc);
}