Ngrams
Reference: https://en.wikipedia.org/wiki/N-gram
The tokenizers/ngrams module gathers functions used to compute n-grams from a given sequence.
The n-grams of a sequence are its contiguous subsequences of size n, taken in order.
import ngrams from 'talisman/tokenizers/ngrams';
// Alternatively, you can use these convenient shortcuts
import {
bigrams,
trigrams,
quadrigrams
} from 'talisman/tokenizers/ngrams';
ngrams(2, ['The', 'cat', 'is', 'happy']);
>>> [
['The', 'cat'],
['cat', 'is'],
['is', 'happy']
]
trigrams(['The', 'cat', 'is', 'happy']);
>>> [
['The', 'cat', 'is'],
['cat', 'is', 'happy']
]
Arguments
- n number - size of the subsequences.
- sequence array - the target sequence.
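To make the role of the two arguments concrete, here is a minimal sliding-window sketch of how n-grams can be computed. It is an illustration only, not the library's implementation: the names slidingNgrams and slidingBigrams are hypothetical, and throwing on a non-positive n is an assumption made for this example.

```javascript
// Hypothetical helper, not part of talisman: collects every contiguous
// window of `n` items from `sequence`.
function slidingNgrams(n, sequence) {
  // Assumption for this sketch: reject sizes that are not positive integers.
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError('n should be a positive integer.');
  }

  const result = [];

  // The last full window starts at index `sequence.length - n`.
  for (let i = 0; i + n <= sequence.length; i++) {
    result.push(sequence.slice(i, i + n));
  }

  return result;
}

// A shortcut in the spirit of `bigrams`: fix n and only pass the sequence.
const slidingBigrams = sequence => slidingNgrams(2, sequence);

console.log(slidingNgrams(3, ['The', 'cat', 'is', 'happy']));
// [['The', 'cat', 'is'], ['cat', 'is', 'happy']]

console.log(slidingBigrams(['The', 'cat']));
// [['The', 'cat']]

console.log(slidingNgrams(5, ['The', 'cat', 'is', 'happy']));
// []
```

In this sketch, a sequence of length L yields L - n + 1 n-grams when L is at least n, and none otherwise.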