public final class OpenNlpTokenizer extends Object implements TextTokenizer
A TextTokenizer implementation based on Apache OpenNLP. OpenNLP
provides several different tokenizers, ranging from simple, rule-based ones to learnable tokenizers relying on a
trained model. For more information, see the documentation section on
tokenization in the OpenNLP Developer Documentation.
| Constructor and Description |
|---|
| OpenNlpTokenizer() Create a new OpenNlpTokenizer using a SimpleTokenizer, which treats each run of characters from the same character class as a token. |
| OpenNlpTokenizer(File modelFile) Create a new OpenNlpTokenizer based on a learned model. |
| OpenNlpTokenizer(opennlp.tools.tokenize.Tokenizer tokenizer) Create a new OpenNlpTokenizer using an arbitrary implementation of Tokenizer. |
public OpenNlpTokenizer()
Create a new OpenNlpTokenizer using a SimpleTokenizer, which treats each run of characters from the same
character class as a token.
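To illustrate what tokenizing by character class means, the following is a minimal, self-contained sketch. It is not OpenNLP's implementation: the class name CharClassTokenizerSketch and its tokenize method are hypothetical, and the handling of punctuation is simplified (any run of non-letter, non-digit, non-whitespace characters becomes a single token, which may differ from SimpleTokenizer's actual behavior).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of OpenNLP or of this API.
final class CharClassTokenizerSketch {

    private enum CharClass { WHITESPACE, ALPHABETIC, NUMERIC, OTHER }

    // Assign a character to one of four coarse classes.
    private static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.ALPHABETIC;
        if (Character.isDigit(c)) return CharClass.NUMERIC;
        return CharClass.OTHER;
    }

    // Split text into maximal runs of characters sharing a class;
    // whitespace runs separate tokens and are never emitted.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start index of the current token, -1 if none
        CharClass current = CharClass.WHITESPACE;
        for (int i = 0; i < text.length(); i++) {
            CharClass next = classify(text.charAt(i));
            if (next != current) {
                // The class changed: close the running token, if any.
                if (current != CharClass.WHITESPACE) {
                    tokens.add(text.substring(start, i));
                }
                start = (next == CharClass.WHITESPACE) ? -1 : i;
                current = next;
            }
        }
        // Close a token that runs to the end of the input.
        if (current != CharClass.WHITESPACE) {
            tokens.add(text.substring(start));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello, world 42!"));
    }
}
```

Under these assumptions, the input "Hello, world 42!" yields the tokens Hello, a comma, world, 42, and an exclamation mark; no dictionary or trained model is involved, which is what distinguishes this approach from the model-based constructor.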
public OpenNlpTokenizer(opennlp.tools.tokenize.Tokenizer tokenizer)
Create a new OpenNlpTokenizer using an arbitrary implementation of Tokenizer.
tokenizer -
public OpenNlpTokenizer(File modelFile)
Create a new OpenNlpTokenizer based on a learned model. Such models are available, for example, on
the OpenNLP Tools Models web page.
modelFile - Path to the model file, must not be null.
public Iterator<Token> iterateTokens(String text)
iterateTokens in interface TextTokenizer
Copyright © 2018. All rights reserved.