A tokenizer that takes a document as input and splits it into words, sentences, and paragraphs. It is implemented without NLTK. Its output can be used to build language models such as n-gram models.
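
As a rough illustration of what NLTK-free tokenization at these three levels might look like, here is a minimal sketch using only Python's standard `re` module. It is not taken from this repository: the function names (`split_paragraphs`, `split_sentences`, `split_words`, `bigrams`) and the splitting rules are assumptions chosen for the example, and the sentence rule in particular is naive (it will break on abbreviations such as "Dr.").

```python
import re


def split_paragraphs(text):
    # Assumption: paragraphs are separated by one or more blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def split_words(sentence):
    # Words are runs of word characters, optionally joined by apostrophes.
    return re.findall(r"\w+(?:'\w+)*", sentence)


def bigrams(words):
    # Adjacent word pairs, the basic unit of a 2-gram model.
    return list(zip(words, words[1:]))


if __name__ == "__main__":
    doc = "Hello world. How are you?\n\nSecond paragraph here!"
    for paragraph in split_paragraphs(doc):
        for sentence in split_sentences(paragraph):
            words = split_words(sentence)
            print(words, bigrams(words))
```

The word lists produced this way can be passed directly to a counting step for an n-gram model; a real tokenizer would likely need extra handling for abbreviations, numbers, and punctuation tokens.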
bansalanurag/Tokenizer