Tweets Tokenizers

The tokenizers/tweets module gathers the library’s tweet tokenizers.

The aim of these functions is to split tweets into tokens suited to further analysis.

Summary

casual

Reference: http://www.nltk.org/api/nltk.tokenize.html#module-nltk.tokenize.casual

Authors:
Christopher Potts
Ewan Klein
Pierpaolo Pantone

JavaScript implementation of NLTK’s tweet tokenizer.

This tokenizer is aware of URLs, handles, hashtags, some emoticons, etc.
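To illustrate how a tokenizer can be "aware" of such token types, here is a minimal sketch, assuming an approach in the spirit of NLTK's casual module: several regular expressions are joined into one ordered alternation and matched globally over the text. The `simpleCasual` function and its patterns are hypothetical and much narrower than the real rule set (ASCII-oriented words, a handful of Western emoticons, naive URLs). This is not talisman's implementation.

```javascript
// Hypothetical, simplified casual-style tokenizer (NOT talisman's code).
// At each position the alternatives are tried left to right, so the more
// specific token types (URLs, emoticons, handles, hashtags) win over plain
// words and the single-character fallback.
const PATTERNS = [
  /https?:\/\/\S+/,                         // URLs (naive: runs until whitespace)
  /[<>]?[:;=][\-o*']?[)\](\[dDpP\/:}{@|\\]/, // a few emoticons, e.g. :-) ;P :D
  /@\w+/,                                   // handles, e.g. @alice
  /#+\w+/,                                  // hashtags, e.g. #dummysmiley
  /\w+(?:['\-]\w+)*/,                       // words, keeping inner ' and - (don't)
  /\S/                                      // any other single non-space character
];

// Combine every pattern into one global regex, each wrapped in a non-capturing group.
const TOKEN_RE = new RegExp(PATTERNS.map(p => `(?:${p.source})`).join('|'), 'g');

function simpleCasual(text) {
  // String.prototype.match with the g flag returns every match, or null if none.
  return text.match(TOKEN_RE) || [];
}

console.log(simpleCasual('This is a cooool #dummysmiley: :-)'));
// → [ 'This', 'is', 'a', 'cooool', '#dummysmiley', ':', ':-)' ]
```

Because alternatives are tried in order, listing the emoticon pattern before the catch-all `\S` is what keeps `:-)` in one piece, while a lone `:` followed by a space still falls through to single-character punctuation.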

import casual from 'talisman/tokenizers/tweets/casual';

casual('This is a cooool #dummysmiley: :-)');
>>> [
  'This',
  'is',
  'a',
  'cooool',
  '#dummysmiley',
  ':',
  ':-)'
]
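Note that `cooool` is kept as-is in the output above. NLTK's `TweetTokenizer` offers an optional `reduce_len` setting that shortens such elongated words; whether talisman's `casual` exposes an equivalent is not stated here. The sketch below assumes the NLTK behaviour (any character repeated three or more times in a row is cut to exactly three) and uses a hypothetical `reduceLengthening` helper that could be applied to the text before tokenizing.

```javascript
// Hypothetical helper mirroring NLTK's optional length-reduction step:
// a run of 3+ identical characters is replaced by exactly 3 of them,
// so 'cooool' and 'coooooool' both normalize to 'coool'.
function reduceLengthening(text) {
  // (.) captures a character, \1{2,} requires at least two more copies of it.
  return text.replace(/(.)\1{2,}/g, '$1$1$1');
}

console.log(reduceLengthening('This is a cooool #dummysmiley: :-)'));
// prints: This is a coool #dummysmiley: :-)
```

Capping runs at three rather than two preserves a hint of emphasis while still mapping the many spelling variants of an elongated word to a single token.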