Package com.ibm.icu.impl
Class PatternTokenizer
- java.lang.Object
  - com.ibm.icu.impl.PatternTokenizer
public class PatternTokenizer extends Object
A simple parsing class for patterns and rules. Handles '...' quotations, \uxxxx and \Uxxxxxxxx, and simple syntax. The '' (two quotes) is treated as a single quote, inside or outside a quote.
- Any ignorable characters are ignored in parsing.
- Any syntax characters are broken into separate tokens
- Quote characters can be specified: '...', "...", and \x
- Other characters are treated as literals
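The quoting rule above can be illustrated with a small stand-alone sketch. This is not the ICU implementation: the class `QuoteRuleSketch` and its `unquote` method are hypothetical names, and the sketch covers only the '...' and '' rules, ignoring backslash escapes, syntax characters, and ignorable characters.

```java
public class QuoteRuleSketch {
    // Hypothetical helper (not part of ICU): removes '...' quoting and maps
    // '' to one literal apostrophe, inside or outside a quoted run.
    public static String unquote(String pattern) {
        StringBuilder out = new StringBuilder();
        boolean inQuote = false;
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\'') {
                if (i + 1 < pattern.length() && pattern.charAt(i + 1) == '\'') {
                    out.append('\''); // two quotes stand for one literal quote
                    i++;              // skip the second quote of the pair
                } else {
                    inQuote = !inQuote; // a lone quote opens or closes a quoted run
                }
            } else {
                out.append(c);
            }
        }
        if (inQuote) {
            // Assumption of this sketch: an unterminated quote is an error.
            throw new IllegalArgumentException("unterminated quote");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unquote("a'{b}'c")); // prints a{b}c
        System.out.println(unquote("it''s"));   // prints it's
        System.out.println(unquote("'it''s'")); // prints it's
    }
}
```

The last two calls show the doubled quote resolving to a single apostrophe both outside and inside a quoted run.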
Field Summary
- static char BACK_SLASH
- static int BROKEN_ESCAPE
- static int BROKEN_QUOTE
- static int DONE
- static int LITERAL
- static char SINGLE_QUOTE
- static int SYNTAX
- static int UNKNOWN
-
Constructor Summary
- PatternTokenizer()
-
Method Summary
- UnicodeSet getEscapeCharacters()
- UnicodeSet getExtraQuotingCharacters()
- UnicodeSet getIgnorableCharacters()
- int getLimit()
- int getStart()
- UnicodeSet getSyntaxCharacters()
- boolean isUsingQuote()
- boolean isUsingSlash()
- int next(StringBuffer buffer)
- String normalize()
- String quoteLiteral(CharSequence string)
- String quoteLiteral(String string): Quote a literal string, using the available settings.
- PatternTokenizer setEscapeCharacters(UnicodeSet escapeCharacters): Sets the characters to be escaped in literals, in quoteLiteral and normalize, e.g. new UnicodeSet("[^\\u0020-\\u007E]").
- PatternTokenizer setExtraQuotingCharacters(UnicodeSet syntaxCharacters): Sets the extra characters to be quoted in literals.
- PatternTokenizer setIgnorableCharacters(UnicodeSet ignorableCharacters): Sets the characters to be ignored in parsing, e.g. new UnicodeSet("[:pattern_whitespace:]").
- PatternTokenizer setLimit(int limit)
- PatternTokenizer setPattern(CharSequence pattern)
- PatternTokenizer setPattern(String pattern)
- PatternTokenizer setStart(int start)
- PatternTokenizer setSyntaxCharacters(UnicodeSet syntaxCharacters): Sets the characters to be interpreted as syntax characters in parsing, e.g. new UnicodeSet("[:pattern_syntax:]").
- PatternTokenizer setUsingQuote(boolean usingQuote)
- PatternTokenizer setUsingSlash(boolean usingSlash)
Field Detail
-
SINGLE_QUOTE
public static final char SINGLE_QUOTE
- See Also:
- Constant Field Values
-
BACK_SLASH
public static final char BACK_SLASH
- See Also:
- Constant Field Values
-
DONE
public static final int DONE
- See Also:
- Constant Field Values
-
SYNTAX
public static final int SYNTAX
- See Also:
- Constant Field Values
-
LITERAL
public static final int LITERAL
- See Also:
- Constant Field Values
-
BROKEN_QUOTE
public static final int BROKEN_QUOTE
- See Also:
- Constant Field Values
-
BROKEN_ESCAPE
public static final int BROKEN_ESCAPE
- See Also:
- Constant Field Values
-
UNKNOWN
public static final int UNKNOWN
- See Also:
- Constant Field Values
-
-
Method Detail
-
getIgnorableCharacters
public UnicodeSet getIgnorableCharacters()
-
setIgnorableCharacters
public PatternTokenizer setIgnorableCharacters(UnicodeSet ignorableCharacters)
Sets the characters to be ignored in parsing, e.g. new UnicodeSet("[:pattern_whitespace:]").
- Parameters:
- ignorableCharacters - Characters to be ignored.
- Returns:
- A PatternTokenizer object in which the given characters are set as ignorable characters.
-
getSyntaxCharacters
public UnicodeSet getSyntaxCharacters()
-
getExtraQuotingCharacters
public UnicodeSet getExtraQuotingCharacters()
-
setSyntaxCharacters
public PatternTokenizer setSyntaxCharacters(UnicodeSet syntaxCharacters)
Sets the characters to be interpreted as syntax characters in parsing, e.g. new UnicodeSet("[:pattern_syntax:]").
- Parameters:
- syntaxCharacters - Characters to be set as syntax characters.
- Returns:
- A PatternTokenizer object in which the given characters are set as syntax characters.
-
setExtraQuotingCharacters
public PatternTokenizer setExtraQuotingCharacters(UnicodeSet syntaxCharacters)
Sets the extra characters to be quoted in literals.
- Parameters:
- syntaxCharacters - Characters to be set as extra quoting characters.
- Returns:
- A PatternTokenizer object in which the given characters are set as extra quoting characters.
-
getEscapeCharacters
public UnicodeSet getEscapeCharacters()
-
setEscapeCharacters
public PatternTokenizer setEscapeCharacters(UnicodeSet escapeCharacters)
Sets the characters to be escaped in literals, in quoteLiteral and normalize, e.g. new UnicodeSet("[^\\u0020-\\u007E]").
- Parameters:
- escapeCharacters - Characters to be set as escape characters.
- Returns:
- A PatternTokenizer object in which the given characters are set as escape characters.
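To make the example set concrete, here is a rough stand-alone sketch of what escaping everything outside U+0020..U+007E could look like, using the \uxxxx and \Uxxxxxxxx forms named in the class description. It does not call ICU; `EscapeSketch` and `escape` are hypothetical names, and the exact output format of the real quoteLiteral and normalize methods is not stated on this page, so treat the formatting below as an assumption.

```java
public class EscapeSketch {
    // Hypothetical helper (not part of ICU): escapes every code point outside
    // the printable ASCII range 0x20..0x7E. BMP code points become a
    // backslash, a lowercase u, and four hex digits; supplementary code points
    // become a backslash, an uppercase U, and eight hex digits.
    public static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            i += Character.charCount(cp); // advance by one or two UTF-16 units
            if (cp >= 0x20 && cp <= 0x7E) {
                out.appendCodePoint(cp);  // printable ASCII passes through
            } else if (cp <= 0xFFFF) {
                out.append(String.format("\\u%04X", cp));
            } else {
                out.append(String.format("\\U%08X", cp));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("abc ~")); // unchanged: all printable ASCII
        System.out.println(escape("caf\u00e9"));
    }
}
```

The sketch leaves the backslash itself untouched; a complete implementation would also have to decide how a literal backslash is represented.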
-
isUsingQuote
public boolean isUsingQuote()
-
setUsingQuote
public PatternTokenizer setUsingQuote(boolean usingQuote)
-
isUsingSlash
public boolean isUsingSlash()
-
setUsingSlash
public PatternTokenizer setUsingSlash(boolean usingSlash)
-
getLimit
public int getLimit()
-
setLimit
public PatternTokenizer setLimit(int limit)
-
getStart
public int getStart()
-
setStart
public PatternTokenizer setStart(int start)
-
setPattern
public PatternTokenizer setPattern(CharSequence pattern)
-
setPattern
public PatternTokenizer setPattern(String pattern)
-
quoteLiteral
public String quoteLiteral(CharSequence string)
-
quoteLiteral
public String quoteLiteral(String string)
Quote a literal string, using the available settings. Thus syntax characters, quote characters, and ignorable characters will be put into quotes.
- Parameters:
- string - The literal string to quote.
- Returns:
- The string with any syntax, quote, or ignorable characters placed in quotes, according to the available settings.
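As a rough illustration of the quoting described above, the following stand-alone sketch wraps runs of "special" characters in single quotes and doubles any apostrophe. It is a simplification under stated assumptions, not the ICU implementation: `QuoteLiteralSketch`, `quote`, and the `special` parameter (a plain string standing in for the configured syntax and ignorable sets) are hypothetical, and backslash escaping is left out.

```java
public class QuoteLiteralSketch {
    // Hypothetical helper (not part of ICU): quotes each run of characters
    // found in `special` with '...', and writes every apostrophe as ''.
    public static String quote(String literal, String special) {
        StringBuilder out = new StringBuilder();
        boolean inQuote = false;
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (c == '\'') {
                out.append("''"); // doubled quote is valid inside or outside a run
            } else if (special.indexOf(c) >= 0) {
                if (!inQuote) {
                    out.append('\''); // open a quoted run before a special character
                    inQuote = true;
                }
                out.append(c);
            } else {
                if (inQuote) {
                    out.append('\''); // close the run before an ordinary character
                    inQuote = false;
                }
                out.append(c);
            }
        }
        if (inQuote) {
            out.append('\''); // close a run left open at the end of the input
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(quote("a{b}c", "{}")); // prints a'{'b'}'c
        System.out.println(quote("it's", "{}"));  // prints it''s
    }
}
```

Doubling apostrophes rather than quoting them keeps the output readable under the rule that '' means a single quote both inside and outside a quoted run.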
-
normalize
public String normalize()
-
next
public int next(StringBuffer buffer)
-
-