Class RuleBasedBreakIterator
- java.lang.Object
  - com.ibm.icu.text.BreakIterator
    - com.ibm.icu.text.RuleBasedBreakIterator
- All Implemented Interfaces:
Cloneable
public class RuleBasedBreakIterator extends BreakIterator
A rule-based break iterator. This is a port of the C++ class RuleBasedBreakIterator from ICU4C.
-
-
Field Summary
-
Fields inherited from class com.ibm.icu.text.BreakIterator
DONE, KIND_CHARACTER, KIND_LINE, KIND_SENTENCE, KIND_TITLE, KIND_WORD, WORD_IDEO, WORD_IDEO_LIMIT, WORD_KANA, WORD_KANA_LIMIT, WORD_LETTER, WORD_LETTER_LIMIT, WORD_NONE, WORD_NONE_LIMIT, WORD_NUMBER, WORD_NUMBER_LIMIT
-
-
Constructor Summary
Constructors
- RuleBasedBreakIterator(String rules): Construct a RuleBasedBreakIterator from a set of rules supplied as a string.
-
Method Summary
- protected static void checkOffset(int offset, CharacterIterator text): Throw IllegalArgumentException unless begin <= offset < end.
- Object clone(): Clones this iterator.
- static void compileRules(String rules, OutputStream ruleBinary): Compile a set of source break rules into the binary state tables used by the break iterator engine.
- int current(): Returns the current iteration position.
- void dump(): Deprecated. This API is ICU internal only.
- boolean equals(Object that): Returns true if both BreakIterators are of the same class, have the same rules, and iterate over the same text.
- int first(): Sets the current iteration position to the beginning of the text.
- int following(int offset): Sets the iterator to refer to the first boundary position following the specified position.
- static RuleBasedBreakIterator getInstanceFromCompiledRules(InputStream is): Create a break iterator from a precompiled set of break rules.
- static RuleBasedBreakIterator getInstanceFromCompiledRules(ByteBuffer bytes): Deprecated. This API is ICU internal only.
- int getRuleStatus(): Return the status tag from the break rule that determined the most recently returned break position.
- int getRuleStatusVec(int[] fillInArray): Get the status (tag) values from the break rule(s) that determined the most recently returned break position.
- CharacterIterator getText(): Return a CharacterIterator over the text being analyzed.
- int hashCode(): Compute a hashcode for this BreakIterator.
- boolean isBoundary(int offset): Returns true if the specified position is a boundary position.
- int last(): Sets the current iteration position to the end of the text.
- int next(): Advances the iterator to the next boundary position.
- int next(int n): Advances the iterator either forward or backward the specified number of steps.
- int preceding(int offset): Sets the iterator to refer to the last boundary position before the specified position.
- int previous(): Moves the iterator backwards, to the last boundary preceding this one.
- void setText(CharacterIterator newText): Set the iterator to analyze a new piece of text.
- String toString(): Returns the description (rules) used to create this iterator.
Methods inherited from class com.ibm.icu.text.BreakIterator
getAvailableLocales, getAvailableULocales, getBreakInstance, getCharacterInstance, getCharacterInstance, getCharacterInstance, getLineInstance, getLineInstance, getLineInstance, getLocale, getSentenceInstance, getSentenceInstance, getSentenceInstance, getTitleInstance, getTitleInstance, getTitleInstance, getWordInstance, getWordInstance, getWordInstance, registerInstance, registerInstance, setText, unregister
-
Constructor Detail
-
RuleBasedBreakIterator
public RuleBasedBreakIterator(String rules)
Construct a RuleBasedBreakIterator from a set of rules supplied as a string.
- Parameters:
rules - The break rules to be used.
-
-
Method Detail
-
getInstanceFromCompiledRules
public static RuleBasedBreakIterator getInstanceFromCompiledRules(InputStream is) throws IOException
Create a break iterator from a precompiled set of break rules. Creating a break iterator from the binary rules is much faster than creating one from source rules. The binary rules are generated by the RuleBasedBreakIterator.compileRules() function. Binary break iterator rules are not guaranteed to be compatible between different versions of ICU.
- Parameters:
is - An input stream supplying the compiled binary rules.
- Throws:
IOException - If there is an error while reading the rules from the InputStream.
- See Also:
compileRules(String, OutputStream)
-
getInstanceFromCompiledRules
@Deprecated public static RuleBasedBreakIterator getInstanceFromCompiledRules(ByteBuffer bytes) throws IOException
Deprecated. This API is ICU internal only.
Create a break iterator from a precompiled set of break rules. Creating a break iterator from the binary rules is much faster than creating one from source rules. The binary rules are generated by the RuleBasedBreakIterator.compileRules() function. Binary break iterator rules are not guaranteed to be compatible between different versions of ICU.
- Parameters:
bytes - A buffer supplying the compiled binary rules.
- Throws:
IOException - If there is an error while reading the rules from the buffer.
- See Also:
compileRules(String, OutputStream)
-
clone
public Object clone()
Clones this iterator.
- Overrides:
clone in class BreakIterator
- Returns:
A newly-constructed RuleBasedBreakIterator with the same behavior as this one.
-
equals
public boolean equals(Object that)
Returns true if both BreakIterators are of the same class, have the same rules, and iterate over the same text.
- Overrides:
equals in class Object
- Parameters:
that - The object to compare this instance with.
- Returns:
true if the specified object is equal to this Object; false otherwise.
- See Also:
Object.hashCode()
-
toString
public String toString()
Returns the description (rules) used to create this iterator. (In ICU4C, the same function is RuleBasedBreakIterator::getRules())
-
hashCode
public int hashCode()
Compute a hashcode for this BreakIterator.
- Overrides:
hashCode in class Object
- Returns:
A hash code.
- See Also:
Object.equals(java.lang.Object)
-
dump
@Deprecated public void dump()
Deprecated. This API is ICU internal only.
Dump the contents of the state table and character classes for this break iterator. For debugging only.
-
compileRules
public static void compileRules(String rules, OutputStream ruleBinary) throws IOException
Compile a set of source break rules into the binary state tables used by the break iterator engine. Creating a break iterator from precompiled rules is much faster than creating one from source rules. Binary break rules are not guaranteed to be compatible between different versions of ICU.
- Parameters:
rules - The source form of the break rules.
ruleBinary - An output stream to receive the compiled rules.
- Throws:
IOException - If there is an error writing the output.
- See Also:
getInstanceFromCompiledRules(InputStream)
-
first
public int first()
Sets the current iteration position to the beginning of the text (i.e., the CharacterIterator's starting offset).
- Specified by:
first in class BreakIterator
- Returns:
The offset of the beginning of the text.
-
last
public int last()
Sets the current iteration position to the end of the text (i.e., the CharacterIterator's ending offset).
- Specified by:
last in class BreakIterator
- Returns:
The text's past-the-end offset.
-
next
public int next(int n)
Advances the iterator either forward or backward the specified number of steps. Negative values move backward, and positive values move forward. This is equivalent to repeatedly calling next() or previous().
- Specified by:
next in class BreakIterator
- Parameters:
n - The number of steps to move. The sign indicates the direction (negative is backwards, and positive is forwards).
- Returns:
The character offset of the boundary position n boundaries away from the current one.
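The stated equivalence between next(n) and repeated single steps can be checked with a small sketch. It uses the JDK's java.text.BreakIterator as a stand-in (an assumption made only so the code runs without ICU4J on the classpath); the ICU class declares the same next(int), next(), previous(), first(), and current() signatures, though boundary positions may differ between the two implementations. The class and helper names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Sketch only: java.text.BreakIterator stands in for com.ibm.icu.text.BreakIterator.
final class NextNSketch {

    // Hypothetical helper: a single next(n) call, starting from the first boundary.
    static int viaNextN(String text, int n) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        it.first();
        return it.next(n);
    }

    // Hypothetical helper: the same move made one boundary at a time.
    static int viaSingleSteps(String text, int n) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        int pos = it.first();
        for (int i = 0; i < Math.abs(n) && pos != BreakIterator.DONE; i++) {
            // Positive n walks forward, negative n walks backward.
            pos = (n > 0) ? it.next() : it.previous();
        }
        return pos;
    }

    public static void main(String[] args) {
        String text = "one two three";
        System.out.println(viaNextN(text, 3) + " vs " + viaSingleSteps(text, 3));
    }
}
```

Both helpers return BreakIterator.DONE once a step runs past either end of the text, so callers should test for DONE before using the result as an offset.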
-
next
public int next()
Advances the iterator to the next boundary position.
- Specified by:
next in class BreakIterator
- Returns:
The position of the first boundary after this one.
-
previous
public int previous()
Moves the iterator backwards, to the last boundary preceding this one.
- Specified by:
previous in class BreakIterator
- Returns:
The position of the last boundary preceding this one.
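A common way to combine first(), next(), last(), and previous() is to walk the text in either direction and slice out the pieces between boundaries. The sketch below does this with the JDK's java.text.BreakIterator as a stand-in, an assumption chosen so the example runs without ICU4J; the ICU methods share these names and signatures, but may place boundaries differently. The helper names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch only: java.text.BreakIterator stands in for com.ibm.icu.text.BreakIterator.
final class IterationSketch {

    // Walk from first() with next(), collecting the text between boundaries.
    static List<String> forward(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> pieces = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            pieces.add(text.substring(start, end));
        }
        return pieces;
    }

    // Walk from last() with previous(), collecting the same pieces in reverse order.
    static List<String> backward(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> pieces = new ArrayList<>();
        int end = it.last();
        for (int start = it.previous(); start != BreakIterator.DONE; end = start, start = it.previous()) {
            pieces.add(text.substring(start, end));
        }
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(forward("one two"));
        System.out.println(backward("one two"));
    }
}
```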
-
following
public int following(int offset)
Sets the iterator to refer to the first boundary position following the specified position.
- Specified by:
following in class BreakIterator
- Parameters:
offset - The position from which to begin searching for a break position.
- Returns:
The position of the first break after the specified offset.
-
preceding
public int preceding(int offset)
Sets the iterator to refer to the last boundary position before the specified position.
- Overrides:
preceding in class BreakIterator
- Parameters:
offset - The position to begin searching for a break from.
- Returns:
The position of the last boundary before the starting position.
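following() and preceding() together can locate the segment that contains an arbitrary offset. The sketch below is one way to do that under stated assumptions: it uses the JDK's java.text.BreakIterator as a stand-in so it runs without ICU4J (the ICU class has the same following(int) and preceding(int) signatures, but may place boundaries differently), and segmentAround is a hypothetical helper, not part of either API.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Sketch only: java.text.BreakIterator stands in for com.ibm.icu.text.BreakIterator.
final class SegmentSketch {

    // Hypothetical helper: return the word-level segment containing the character at offset.
    static String segmentAround(String text, int offset) {
        if (offset < 0 || offset >= text.length()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        // First boundary strictly after offset: the end of the segment.
        int end = it.following(offset);
        // Last boundary strictly before offset + 1, i.e. at or before offset: the start.
        int start = it.preceding(offset + 1);
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println("[" + segmentAround("one two three", 5) + "]");
    }
}
```

Passing offset + 1 to preceding() matters because preceding() excludes the offset itself; without the adjustment, an offset that sits exactly on a boundary would select the previous segment.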
-
checkOffset
protected static final void checkOffset(int offset, CharacterIterator text)
Throw IllegalArgumentException unless begin <= offset < end.
-
isBoundary
public boolean isBoundary(int offset)
Returns true if the specified position is a boundary position. As a side effect, leaves the iterator pointing to the first boundary position at or after "offset".
- Overrides:
isBoundary in class BreakIterator
- Parameters:
offset - The offset to check.
- Returns:
True if "offset" is a boundary position.
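The side effect described above is easy to overlook, so here is a small sketch that makes it visible. It uses the JDK's java.text.BreakIterator as a stand-in (an assumption, so the code runs without ICU4J); its isBoundary(int) documents the same side effect for offsets past the start of the text, though the two libraries may place boundaries differently. The helper names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Sketch only: java.text.BreakIterator stands in for com.ibm.icu.text.BreakIterator.
final class IsBoundarySketch {

    // Hypothetical helper: is there a word boundary at offset?
    static boolean boundaryAt(String text, int offset) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        return it.isBoundary(offset);
    }

    // Hypothetical helper: where does the iterator sit after the isBoundary() query?
    static int positionAfterQuery(String text, int offset) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        it.isBoundary(offset);   // result ignored; only the side effect is of interest
        return it.current();     // first boundary at or after offset
    }

    public static void main(String[] args) {
        String text = "one two three";
        System.out.println(boundaryAt(text, 5) + ", now at " + positionAfterQuery(text, 5));
    }
}
```

Code that interleaves isBoundary() with next() or previous() should therefore re-establish its position afterwards, for example by saving current() before the query and calling following() or preceding() to return to it.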
-
current
public int current()
Returns the current iteration position.
- Specified by:
current in class BreakIterator
- Returns:
The current iteration position.
-
getRuleStatus
public int getRuleStatus()
Return the status tag from the break rule that determined the most recently returned break position. The values appear in the rule source within brackets, {123}, for example. For rules that do not specify a status, a default value of 0 is returned. If more than one rule applies, the numerically largest of the possible status values is returned.
Of the standard types of ICU break iterators, only the word break iterator provides status values. The values are defined as constants in class BreakIterator (the WORD_* fields inherited by RuleBasedBreakIterator), and allow distinguishing between words that contain alphabetic letters, "words" that appear to be numbers, punctuation and spaces, words containing ideographic characters, and more.
Call getRuleStatus after obtaining a boundary position from next(), previous(), or any other break iterator function that returns a boundary position.
- Overrides:
getRuleStatus in class BreakIterator
- Returns:
The status from the break rule that determined the most recently returned break position.
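Word-break status values fall into half-open ranges bounded by the WORD_* and WORD_*_LIMIT constants listed in the Field Summary. The sketch below shows one way to map a status to a category. The constant values are copied from com.ibm.icu.text.BreakIterator so that the example compiles without ICU4J; real code should reference the library constants directly. The describe helper and its labels are hypothetical.

```java
// Sketch only: constants mirror BreakIterator.WORD_* so the example needs no ICU4J dependency.
final class RuleStatusSketch {
    static final int WORD_NONE = 0,     WORD_NONE_LIMIT = 100;
    static final int WORD_NUMBER = 100, WORD_NUMBER_LIMIT = 200;
    static final int WORD_LETTER = 200, WORD_LETTER_LIMIT = 300;
    static final int WORD_KANA = 300,   WORD_KANA_LIMIT = 400;
    static final int WORD_IDEO = 400,   WORD_IDEO_LIMIT = 500;

    // Hypothetical helper: classify a value as returned by getRuleStatus().
    static String describe(int status) {
        if (status >= WORD_NONE && status < WORD_NONE_LIMIT) return "none";
        if (status >= WORD_NUMBER && status < WORD_NUMBER_LIMIT) return "number";
        if (status >= WORD_LETTER && status < WORD_LETTER_LIMIT) return "letter";
        if (status >= WORD_KANA && status < WORD_KANA_LIMIT) return "kana";
        if (status >= WORD_IDEO && status < WORD_IDEO_LIMIT) return "ideographic";
        return "unknown"; // for example, a tag from custom rules outside the standard ranges
    }

    public static void main(String[] args) {
        System.out.println(describe(200));
    }
}
```

With an ICU4J word iterator, a typical call site would pass the result of getRuleStatus() immediately after next() and skip segments classified as "none" when only real words are wanted.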
-
getRuleStatusVec
public int getRuleStatusVec(int[] fillInArray)
Get the status (tag) values from the break rule(s) that determined the most recently returned break position. The values appear in the rule source within brackets, {123}, for example. The default status value for rules that do not explicitly provide one is zero.
The status values used by the standard ICU break rules are defined as public constants in class BreakIterator (the WORD_* fields inherited by RuleBasedBreakIterator).
If the size of the output array is insufficient to hold the data, the output will be truncated to the available length. No exception will be thrown.
- Overrides:
getRuleStatusVec in class BreakIterator
- Parameters:
fillInArray - An array to be filled in with the status values.
- Returns:
The number of rule status values from rules that determined the most recent boundary returned by the break iterator. In the event that the array is too small, the return value is the total number of status values that were available, not the reduced number that were actually returned.
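Because the return value reports the total number of available values even when the array was too small, a caller can detect truncation and retry with a buffer of the right size. The sketch below illustrates that pattern without ICU4J: fill is a hypothetical stand-in that follows the contract described above, and allStatuses accepts any function with the shape of getRuleStatusVec (with ICU4J available, a method reference such as `iterator::getRuleStatusVec` would fit).

```java
import java.util.Arrays;
import java.util.function.ToIntFunction;

// Sketch only: simulates the documented getRuleStatusVec contract; no ICU4J calls are made.
final class RuleStatusVecSketch {

    // Hypothetical stand-in: copy what fits, but report how many values were available.
    static int fill(int[] available, int[] fillInArray) {
        int copied = Math.min(available.length, fillInArray.length);
        System.arraycopy(available, 0, fillInArray, 0, copied);
        return available.length;
    }

    // Caller-side pattern: try a small buffer, then retry once with the reported size.
    static int[] allStatuses(ToIntFunction<int[]> getRuleStatusVec) {
        int[] buffer = new int[1];
        int total = getRuleStatusVec.applyAsInt(buffer);
        if (total > buffer.length) {
            buffer = new int[total];              // the first call was truncated
            total = getRuleStatusVec.applyAsInt(buffer);
        }
        return Arrays.copyOf(buffer, total);
    }

    public static void main(String[] args) {
        int[] simulated = {100, 200};             // made-up status values for illustration
        System.out.println(Arrays.toString(allStatuses(b -> fill(simulated, b))));
    }
}
```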
-
getText
public CharacterIterator getText()
Return a CharacterIterator over the text being analyzed. This method returns the actual CharacterIterator that the break iterator uses internally. Changing the state of this iterator can have undefined consequences. If you need to change it, clone it first.
- Specified by:
getText in class BreakIterator
- Returns:
An iterator over the text being analyzed.
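The advice to clone before changing the returned iterator can be followed as in the sketch below, which reads the full text through a clone and leaves the break iterator's own position untouched. The JDK's java.text.BreakIterator is used as a stand-in so the code runs without ICU4J (an assumption; the ICU getText() also returns a java.text.CharacterIterator). The helper names are hypothetical.

```java
import java.text.BreakIterator;
import java.text.CharacterIterator;
import java.util.Locale;

// Sketch only: java.text.BreakIterator stands in for com.ibm.icu.text.BreakIterator.
final class GetTextSketch {

    // Read all characters through a clone, never through the shared internal iterator.
    static String textOf(BreakIterator it) {
        CharacterIterator copy = (CharacterIterator) it.getText().clone();
        StringBuilder sb = new StringBuilder();
        for (char c = copy.first(); c != CharacterIterator.DONE; c = copy.next()) {
            sb.append(c);
        }
        return sb.toString();
    }

    // Hypothetical helper: recover the text that was handed to a fresh iterator.
    static String roundTrip(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        return textOf(it);
    }

    // Hypothetical check: the break iterator's position survives the read.
    static boolean readLeavesPositionAlone(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        it.first();
        it.next();
        int before = it.current();
        textOf(it);
        return it.current() == before;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("one two") + " / " + readLeavesPositionAlone("one two"));
    }
}
```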
-
setText
public void setText(CharacterIterator newText)
Set the iterator to analyze a new piece of text. This function resets the current iteration position to the beginning of the text.
- Specified by:
setText in class BreakIterator
- Parameters:
newText - An iterator over the text to analyze.
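Since setText() resets the position to the beginning, one iterator can be reused across many texts without calling first() in between. The sketch below shows that reuse with the JDK's java.text.BreakIterator as a stand-in, an assumption made so the example runs without ICU4J; the ICU setText(CharacterIterator) has the same signature. The helper names are hypothetical.

```java
import java.text.BreakIterator;
import java.text.StringCharacterIterator;
import java.util.Locale;

// Sketch only: java.text.BreakIterator stands in for com.ibm.icu.text.BreakIterator.
final class SetTextSketch {

    // Count boundaries in text, relying on setText() to reset the position to the start.
    static int countBoundaries(BreakIterator it, String text) {
        it.setText(new StringCharacterIterator(text));
        int count = 0;
        // current() is the beginning of the new text right after setText().
        for (int b = it.current(); b != BreakIterator.DONE; b = it.next()) {
            count++;
        }
        return count;
    }

    // Hypothetical helper: reuse a single character-level iterator for several texts.
    static int[] countsFor(String... texts) {
        BreakIterator it = BreakIterator.getCharacterInstance(Locale.ROOT);
        int[] counts = new int[texts.length];
        for (int i = 0; i < texts.length; i++) {
            counts[i] = countBoundaries(it, texts[i]);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(countsFor("abc", "hello")));
    }
}
```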
-
-