Paragraphs Tokenizers

The tokenizers/paragraphs module gathers the library's paragraph tokenizers, which split raw text into lists of paragraphs.

Summary

naive

Naive function using a simple regular expression to split raw text into paragraphs.

This function considers a paragraph break to occur whenever it finds at least two consecutive line breaks, ignoring any intervening lines that contain only spaces or tabs.
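The splitting rule above can be sketched with a single regular expression. This is not Talisman's actual implementation, just a minimal illustration of the technique: split on runs of two or more line breaks, where blank lines may still contain spaces or tabs.

```javascript
// Hypothetical sketch of a naive paragraph splitter (not Talisman's code).
// Splits on two or more consecutive line breaks, tolerating blank lines
// made of spaces or tabs, then drops empty results.
function naiveParagraphs(text) {
  return text
    .split(/(?:\r?\n[ \t]*){2,}/)
    .map(function (p) { return p.trim(); })
    .filter(function (p) { return p.length > 0; });
}

naiveParagraphs('First paragraph.\n \t\nSecond paragraph.');
// → ['First paragraph.', 'Second paragraph.']
```

Note that the trailing `[ \t]*` in the pattern also consumes any leading indentation of the following paragraph, which is usually acceptable for a naive splitter.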

import paragraphs from 'talisman/tokenizers/paragraphs';

paragraphs('First paragraph.\n\nSecond Paragraph.');
>>> [
  'First paragraph.',
  'Second Paragraph.'
]