org.languagetool.tokenizers.ca
Class CatalanWordTokenizer
java.lang.Object
org.languagetool.tokenizers.ca.CatalanWordTokenizer
- All Implemented Interfaces:
- Tokenizer
public class CatalanWordTokenizer
- extends Object
- implements Tokenizer
Tokenizes a sentence into words. Punctuation and whitespace get their own tokens.
Hyphens and apostrophes receive special treatment in Catalan.
- Author:
- Jaume Ortolà
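The basic contract, where every punctuation mark and whitespace character becomes a separate token, can be illustrated with a minimal, self-contained sketch. It is not the LanguageTool implementation: the class name NaiveWordTokenizer and the delimiter set are hypothetical, and java.util.StringTokenizer with returnDelims set to true is used only as one simple way to emit delimiters as tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical illustration only; not the actual CatalanWordTokenizer code.
public class NaiveWordTokenizer {
    // Hypothetical delimiter set: whitespace, common punctuation, apostrophe and hyphen.
    private static final String DELIMITERS = " \t\n,.;:!?()\"'-";

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = true: each delimiter character is returned as its own token.
        StringTokenizer st = new StringTokenizer(text, DELIMITERS, true);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Bon dia, amic!"));
        System.out.println(tokenize("l'home"));
    }
}
```

The second call splits the Catalan form l'home into "l", "'" and "home", because the apostrophe is treated like any other delimiter. That blind splitting of apostrophes and hyphens is what the special treatment described above is meant to avoid.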
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CatalanWordTokenizer
public CatalanWordTokenizer()
tokenize
public List<String> tokenize(String text)
- Specified by:
tokenize in interface Tokenizer
- Parameters:
text - Text to tokenize
- Returns:
- List of tokens.
Note: the special strings ##CA_APOS## and ##CA_HYPHEN## are used to replace
apostrophes and hyphens, respectively.
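One way such placeholders can be used is sketched below, under stated assumptions: word-internal apostrophes and hyphens (those between two letters) are hidden behind ##CA_APOS## and ##CA_HYPHEN## before splitting, then restored inside each token. The class name PlaceholderTokenizer, the delimiter set, and the "between two letters" rule are hypothetical; this is not presented as the actual logic of CatalanWordTokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

// Hypothetical sketch of placeholder-based tokenization; not LanguageTool's code.
public class PlaceholderTokenizer {
    private static final String APOS = "##CA_APOS##";
    private static final String HYPHEN = "##CA_HYPHEN##";
    // Hypothetical delimiter set; note it contains neither '#' nor '_'.
    private static final String DELIMITERS = " \t\n,.;:!?()\"'-";

    // Assumption: an apostrophe or hyphen is word-internal only when it sits between
    // two letters. \p{L} matches any Unicode letter, including accented ones.
    private static final Pattern INNER_APOS = Pattern.compile("(?<=\\p{L})'(?=\\p{L})");
    private static final Pattern INNER_HYPHEN = Pattern.compile("(?<=\\p{L})-(?=\\p{L})");

    public static List<String> tokenize(String text) {
        // 1. Hide word-internal apostrophes and hyphens behind placeholders
        //    that contain no delimiter characters.
        String masked = INNER_APOS.matcher(text).replaceAll(APOS);
        masked = INNER_HYPHEN.matcher(masked).replaceAll(HYPHEN);

        // 2. Split as usual: every remaining delimiter becomes its own token.
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(masked, DELIMITERS, true);
        while (st.hasMoreTokens()) {
            // 3. Restore the original characters inside each token.
            tokens.add(st.nextToken().replace(APOS, "'").replace(HYPHEN, "-"));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("l'home, dona-li!"));
        System.out.println(tokenize("'hola'"));
    }
}
```

Here l'home and dona-li each survive as a single token, while the quote-like apostrophes around 'hola' are still split off. The placeholders must contain no delimiter characters and must not occur in ordinary text, or restoration would corrupt the input. The sketch handles only the ASCII apostrophe, not the typographic ’ common in Catalan text, and does not attempt any further Catalan-specific splitting the real class may perform.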
Copyright © 2013. All Rights Reserved.