Hyphenation Tokenizers

The tokenizers/hyphenation module gathers the library’s various hyphenation algorithms.

Hyphenation algorithms take raw words and split them into parts that can be separated by hyphens when justifying text.

Summary

liang

Reference: https://tug.org/docs/liang/

Liang, Franklin Mark. “Word Hy-phen-a-tion by Com-put-er”. PhD dissertation, Stanford University Department of Computer Science. Report number STAN-CS-83-977, August 1983.

JavaScript implementation of Liang’s hyphenation algorithm.

Note that this version stores patterns targeting the English language, so you might want to avoid using it on text in other languages.
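
To give an intuition for the method, here is a simplified sketch of Liang’s pattern scheme. This is not Talisman’s implementation: the function names and the pattern list below are illustrative assumptions, and the handful of patterns was hand-picked just so the single demo word works. In a pattern such as hen5at, each digit scores the gap it sits in; odd digits allow a break there, even digits forbid one, and the highest digit seen at each gap wins.

// Illustrative sketch only — NOT Talisman's API.
const PATTERNS = ['hy3ph', 'he2n', 'hena4', 'hen5at', '1na', 'n2at', '1tio', '2io', 'o2n'];

function parsePattern(pattern) {
  // Split 'hen5at' into its letters ('henat') and a digit per gap.
  const letters = pattern.replace(/\d/g, '');
  const points = new Array(letters.length + 1).fill(0);
  let i = 0;
  for (const ch of pattern) {
    if (ch >= '0' && ch <= '9') points[i] = +ch;
    else i++;
  }
  return {letters, points};
}

function hyphenateSketch(word) {
  // '.' marks word boundaries, as in Liang's dissertation.
  const w = '.' + word.toLowerCase() + '.';

  // scores[k] is the strength assigned to the gap before w[k].
  const scores = new Array(w.length + 1).fill(0);

  for (const {letters, points} of PATTERNS.map(parsePattern)) {
    for (let s = 0; s + letters.length <= w.length; s++) {
      if (w.startsWith(letters, s)) {
        points.forEach((value, offset) => {
          scores[s + offset] = Math.max(scores[s + offset], value);
        });
      }
    }
  }

  // Break where the score is odd, keeping TeX's usual margins of
  // two letters on the left and three on the right.
  const parts = [];
  let start = 0;
  for (let g = 2; g <= word.length - 3; g++) {
    if (scores[g + 1] % 2 === 1) {
      parts.push(word.slice(start, g));
      start = g;
    }
  }
  parts.push(word.slice(start));
  return parts;
}

hyphenateSketch('hyphenation');
>>> ['hy', 'phen', 'ation']

Real pattern sets, such as the one Liang generated for TeX from a hyphenated dictionary, contain thousands of such patterns rather than this toy handful.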

import liang from 'talisman/tokenizers/hyphenation/liang';

liang('university');
>>> ['uni', 'ver', 'si', 'ty']
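
The returned parts can then be joined however the rendering context requires. For instance, joining them on a soft hyphen (U+00AD) lets a browser break the word at the computed points; the line below uses only standard JavaScript on top of the function above:

liang('university').join('\u00AD');
>>> 'uni\u00ADver\u00ADsi\u00ADty'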