Paragraphs Tokenizers
The tokenizers/paragraphs module gathers the library's paragraph tokenizers, whose goal is to split raw text into lists of paragraphs.
Summary
naive
Naive function using a simple regular expression to split raw text into paragraphs.
This function considers a paragraph break to occur whenever it finds at least two consecutive line breaks, ignoring any line in between that contains only spaces or tabs.
import paragraphs from 'talisman/tokenizers/paragraphs';
paragraphs('First paragraph.\n\nSecond paragraph.');
>>> [
'First paragraph.',
'Second paragraph.'
]
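To illustrate the idea, here is a minimal sketch of such a regex-based splitter. This is an assumption about how a naive tokenizer could work, not Talisman's actual implementation; the function name `naiveParagraphs` is hypothetical.

```javascript
// Hypothetical sketch, not Talisman's actual code.
function naiveParagraphs(text) {
  return text
    // Split on two or more line breaks, where the lines in
    // between may contain only spaces or tabs.
    .split(/(?:\r?\n[ \t]*){2,}/)
    // Trim surrounding whitespace and drop empty fragments.
    .map(p => p.trim())
    .filter(p => p.length > 0);
}

naiveParagraphs('First paragraph.\n \t\nSecond paragraph.');
```

Note that a single line break does not trigger a split, so hard-wrapped lines within one paragraph stay together.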