org.languagetool.tokenizers.ca
Class CatalanWordTokenizer

java.lang.Object
  extended by org.languagetool.tokenizers.ca.CatalanWordTokenizer
All Implemented Interfaces:
Tokenizer

public class CatalanWordTokenizer
extends Object
implements Tokenizer

Tokenizes a sentence into words. Punctuation and whitespace get their own tokens. Hyphens and apostrophes receive special treatment in Catalan.
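The following is a minimal, hypothetical sketch of the behaviour described above. It is not LanguageTool's implementation. Every punctuation or whitespace character becomes its own token, while a hyphen or apostrophe with a letter on both sides stays inside the word. The class name SimpleCatalanTokenizerSketch and the "letter on both sides" rule are assumptions. The real tokenizer applies its own Catalan-specific rules, and this simplification does not attempt to reproduce them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not LanguageTool's CatalanWordTokenizer.
public class SimpleCatalanTokenizerSketch {

    // Splits text into words plus one token per punctuation/whitespace character.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isWordChar(text, i)) {
                word.append(c);            // extend the current word
            } else {
                if (word.length() > 0) {   // flush the pending word first
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                tokens.add(String.valueOf(c)); // delimiter is its own token
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    // Assumed rule: letters and digits belong to words; a hyphen or ASCII
    // apostrophe belongs to a word only if a letter sits on both sides.
    private static boolean isWordChar(String text, int i) {
        char c = text.charAt(i);
        if (Character.isLetterOrDigit(c)) {
            return true;
        }
        boolean joiner = c == '-' || c == '\'';
        return joiner
                && i > 0 && i < text.length() - 1
                && Character.isLetter(text.charAt(i - 1))
                && Character.isLetter(text.charAt(i + 1));
    }
}
```

Under this rule, "Dóna-m'ho, l'home." yields the tokens "Dóna-m'ho", ",", " ", "l'home" and ".". A free-standing hyphen, as in "a - b", is still emitted as a separate token.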

Author:
Jaume Ortolà

Constructor Summary
CatalanWordTokenizer()
           
Method Summary
 List<String> tokenize(String text)
           
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

CatalanWordTokenizer

public CatalanWordTokenizer()
Method Detail

tokenize

public List<String> tokenize(String text)
Specified by:
tokenize in interface Tokenizer
Parameters:
text - Text to tokenize
Returns:
List of tokens. Note: the special strings ##CA_APOS## and ##CA_HYPHEN## are used to replace apostrophes and hyphens, respectively.
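One reading of the placeholder note above is a protect-split-restore technique. Word-internal apostrophes and hyphens are first swapped for placeholder strings, so a delimiter-based splitter cannot break the word. The original characters are then put back into each token. The sketch below assumes exactly that. The class name PlaceholderTokenizerSketch, the delimiter set and the restore step are hypothetical and are not taken from LanguageTool's source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical sketch of placeholder-based tokenization.
public class PlaceholderTokenizerSketch {

    // Placeholder strings as named in the Javadoc note.
    static final String APOS = "##CA_APOS##";
    static final String HYPHEN = "##CA_HYPHEN##";

    // Assumed delimiter set: each of these characters becomes its own token.
    private static final String DELIMITERS = " \t\n,.;:!?()[]{}\"'-";

    public static List<String> tokenize(String text) {
        // 1. Protect: replace apostrophes and hyphens that have a letter on both sides.
        String protectedText = text
                .replaceAll("(?<=\\p{L})'(?=\\p{L})", APOS)
                .replaceAll("(?<=\\p{L})-(?=\\p{L})", HYPHEN);

        // 2. Split: unprotected apostrophes and hyphens, whitespace and punctuation
        //    are delimiters; returnDelims=true keeps each delimiter as a token.
        StringTokenizer st = new StringTokenizer(protectedText, DELIMITERS, true);

        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            // 3. Restore: put the original characters back inside each token.
            tokens.add(st.nextToken().replace(APOS, "'").replace(HYPHEN, "-"));
        }
        return tokens;
    }
}
```

This design keeps the splitting step trivial, because java.util.StringTokenizer with returnDelims set to true emits every delimiter as its own token. It relies on one assumption: the placeholder strings must never occur in the input text. Otherwise they would be turned into apostrophes and hyphens in the output.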


Copyright © 2013. All Rights Reserved.