Class CollationIterator

  • Direct Known Subclasses:
    IterCollationIterator, UTF16CollationIterator

    public abstract class CollationIterator
    extends Object
    Collation element iterator and abstract character iterator. When a method returns a code point value, the value must be in the range 0..10FFFF, or negative when it serves as a sentinel.
    • Constructor Detail

      • CollationIterator

        public CollationIterator​(CollationData d)
        Partially constructs the iterator. In Java, we cache partially constructed iterators and finish their setup when starting to work on text (via reset(boolean) and the setText(numeric, ...) methods of subclasses). This avoids memory allocations for iterators that remain unused.

        In C++, there is only one constructor, and iterators are stack-allocated as needed.
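
        The deferred-setup pattern described above can be sketched as follows. This is only an illustration of the idea, not the ICU implementation: the class CachedIteratorSketch, its fields, the String stand-in for CollationData, and the buffer size are all hypothetical.

```java
// Hypothetical sketch of "construct cheaply, finish setup when text arrives".
final class CachedIteratorSketch {
    private final String data;   // hypothetical stand-in for CollationData
    private long[] ceBuffer;     // not allocated by the constructor
    private boolean numeric;
    private CharSequence text;

    // Partial construction: keeps only the shared data reference.
    CachedIteratorSketch(String data) {
        this.data = data;
    }

    // Completes the setup the first time the iterator is actually used.
    void setText(boolean numeric, CharSequence text) {
        if (ceBuffer == null) {
            ceBuffer = new long[40]; // arbitrary size chosen for the sketch
        }
        this.numeric = numeric;
        this.text = text;
    }

    boolean isFullyInitialized() {
        return ceBuffer != null;
    }

    public static void main(String[] args) {
        CachedIteratorSketch iter = new CachedIteratorSketch("root");
        System.out.println(iter.isFullyInitialized()); // false
        iter.setText(false, "abc");
        System.out.println(iter.isFullyInitialized()); // true
    }
}
```

        An instance that is cached but never given text never pays for the buffer allocation, which is the saving the constructor comment refers to.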

      • CollationIterator

        public CollationIterator​(CollationData d,
                                 boolean numeric)
    • Method Detail

      • equals

        public boolean equals​(Object other)
        Description copied from class: Object
        Compares this instance with the specified object and indicates if they are equal. In order to be equal, other must represent the same object as this instance using a class-specific comparison. The general contract is that this comparison should be reflexive, symmetric, and transitive. Also, no object reference other than null is equal to null.

        The default implementation returns true only if this == other. See "Writing a correct equals method" if you intend to implement your own equals method.

        The general contract for the equals and Object.hashCode() methods is that if equals returns true for any two objects, then hashCode() must return the same value for these objects. This means that subclasses of Object usually override either both methods or neither of them.

        Overrides:
        equals in class Object
        Parameters:
        other - the object to compare this instance with.
        Returns:
        true if the specified object is equal to this Object; false otherwise.
        See Also:
        Object.hashCode()
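
        As a general illustration of the equals/hashCode contract stated above, here is a minimal sketch of a class that overrides both methods together. The class OffsetKey and its fields are hypothetical and are not part of CollationIterator.

```java
import java.util.Objects;

// Hypothetical value class: equal objects must report equal hash codes.
final class OffsetKey {
    private final String text;
    private final int offset;

    OffsetKey(String text, int offset) {
        this.text = Objects.requireNonNull(text);
        this.offset = offset;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;                       // reflexive
        }
        if (!(other instanceof OffsetKey)) {
            return false;                      // also covers other == null
        }
        OffsetKey o = (OffsetKey) other;
        return offset == o.offset && text.equals(o.text);
    }

    @Override
    public int hashCode() {
        // Built from the same fields that equals compares.
        return Objects.hash(text, offset);
    }
}
```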
      • resetToOffset

        public abstract void resetToOffset​(int newOffset)
        Resets the iterator state and sets the position to the specified offset. Subclasses must implement this method, and the implementation must call either the parent class method or CollationIterator.reset().
      • getOffset

        public abstract int getOffset()
      • nextCE

        public final long nextCE()
        Returns the next collation element.
      • fetchCEs

        public final int fetchCEs()
        Fetches all CEs.
        Returns:
        getCEsLength()
      • previousCE

        public final long previousCE​(UVector32 offsets)
        Returns the previous collation element.
      • getCEsLength

        public final int getCEsLength()
      • getCE

        public final long getCE​(int i)
      • getCEs

        public final long[] getCEs()
      • clearCEsIfNoneRemaining

        public final void clearCEsIfNoneRemaining()
      • nextCodePoint

        public abstract int nextCodePoint()
        Returns the next code point (with post-increment). Public for identical-level comparison and for testing.
      • previousCodePoint

        public abstract int previousCodePoint()
        Returns the previous code point (with pre-decrement). Public for identical-level comparison and for testing.
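
        The post-increment and pre-decrement behavior of nextCodePoint() and previousCodePoint() can be sketched over a plain CharSequence. This is a standalone illustration, not a real subclass: the class CodePointCursor is hypothetical, and returning -1 at either end is an assumption based on the "negative as a sentinel" rule in the class description.

```java
// Hypothetical cursor over UTF-16 text; not an actual CollationIterator subclass.
final class CodePointCursor {
    private final CharSequence text;
    private int pos;

    CodePointCursor(CharSequence text) {
        this.text = text;
    }

    int getOffset() {
        return pos;
    }

    // Reads the code point at the current position, then advances past it.
    int nextCodePoint() {
        if (pos == text.length()) {
            return -1; // assumed end-of-text sentinel
        }
        int c = Character.codePointAt(text, pos);
        pos += Character.charCount(c);
        return c;
    }

    // Moves back over one code point, then returns it.
    int previousCodePoint() {
        if (pos == 0) {
            return -1; // assumed start-of-text sentinel
        }
        int c = Character.codePointBefore(text, pos);
        pos -= Character.charCount(c);
        return c;
    }
}
```

        For the text "a" followed by U+1F600, two calls to nextCodePoint() return 0x61 and 0x1F600 and leave the offset at 3, because the supplementary code point occupies two code units.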
      • reset

        protected final void reset()
      • reset

        protected final void reset​(boolean numeric)
        Resets the state as well as the numeric setting, and completes the initialization. This method exists only in Java, where cached CollationIterator instances are reset instead of stack-allocating temporary ones. (See also the constructor comments.)
      • handleNextCE32

        protected long handleNextCE32()
        Returns the next code point and its local CE32 value. Returns Collation.FALLBACK_CE32 at the end of the text (c<0) or when c's CE32 value is to be looked up in the base data (fallback). The code point is used for fallbacks, context and implicit weights. It is ignored when the returned CE32 is not special (e.g., FFFD_CE32). Returns the code point in bits 63..32 (signed) and the CE32 in bits 31..0.
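
        The bit layout described above (signed code point in bits 63..32, CE32 in bits 31..0) can be sketched as follows. The class CodePointCe32Pair and its method names are hypothetical, and the CE32 values used are arbitrary sample numbers, not real collation data.

```java
// Hypothetical helpers showing one way to pack and unpack the 64-bit pair.
final class CodePointCe32Pair {
    private CodePointCe32Pair() {}

    // Code point goes to the upper 32 bits, CE32 to the lower 32 bits.
    static long pack(int c, int ce32) {
        // Mask the CE32 so a negative int does not sign-extend into the upper half.
        return ((long) c << 32) | (ce32 & 0xffffffffL);
    }

    // Arithmetic shift preserves the sign, so a negative sentinel survives.
    static int codePoint(long pair) {
        return (int) (pair >> 32);
    }

    // Narrowing cast keeps exactly bits 31..0.
    static int ce32(long pair) {
        return (int) pair;
    }

    public static void main(String[] args) {
        long pair = pack(0x1F600, 0xC0000105);
        System.out.println(Integer.toHexString(codePoint(pair))); // 1f600
        System.out.println(Integer.toHexString(ce32(pair)));      // c0000105
        System.out.println(codePoint(pack(-1, 0xC0)));            // -1
    }
}
```

        Without the mask in pack(), a CE32 with its top bit set would overwrite the code point half with ones, so the mask is the detail to get right.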
      • makeCodePointAndCE32Pair

        protected long makeCodePointAndCE32Pair​(int c,
                                                int ce32)
      • handleGetTrailSurrogate

        protected char handleGetTrailSurrogate()
        Called when handleNextCE32() returns a LEAD_SURROGATE_TAG for a lead surrogate code unit. Returns the trail surrogate in that case and advances past it, if a trail surrogate follows the lead surrogate. Otherwise returns any other code unit and does not advance.
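
        For UTF-16 text, the behavior described above could look like the sketch below. The class TrailSurrogateSketch is hypothetical, and returning 0 at the end of the text is an assumption; the description only requires "any other code unit" when no trail surrogate follows.

```java
// Hypothetical sketch; pos is assumed to point just past the lead surrogate.
final class TrailSurrogateSketch {
    private final CharSequence text;
    private int pos;

    TrailSurrogateSketch(CharSequence text, int pos) {
        this.text = text;
        this.pos = pos;
    }

    int getOffset() {
        return pos;
    }

    char handleGetTrailSurrogate() {
        if (pos == text.length()) {
            return 0; // assumed: any non-surrogate code unit works here
        }
        char unit = text.charAt(pos);
        if (Character.isLowSurrogate(unit)) {
            ++pos; // advance only past a real trail surrogate
        }
        return unit;
    }
}
```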
      • forbidSurrogateCodePoints

        protected boolean forbidSurrogateCodePoints()
        Returns:
        false if surrogate code points U+D800..U+DFFF map to their own implicit primary weights (for UTF-16), or true if they map to CE(U+FFFD) (for UTF-8)
      • forwardNumCodePoints

        protected abstract void forwardNumCodePoints​(int num)
      • backwardNumCodePoints

        protected abstract void backwardNumCodePoints​(int num)
      • getDataCE32

        protected int getDataCE32​(int c)
        Returns the CE32 from the data trie. Normally the same as data.getCE32(), but overridden in the builder. Call this only when the faster data.getCE32() cannot be used.
      • getCE32FromBuilderData

        protected int getCE32FromBuilderData​(int ce32)
      • appendCEsFromCE32

        protected final void appendCEsFromCE32​(CollationData d,
                                               int c,
                                               int ce32,
                                               boolean forward)
      • isLeadSurrogate

        protected static final boolean isLeadSurrogate​(int c)
      • isTrailSurrogate

        protected static final boolean isTrailSurrogate​(int c)
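
        Neither surrogate test is described on this page. One common way to implement such checks for an int code point is a single mask-and-compare, sketched below in the hypothetical class SurrogateChecks; this is an assumption about a typical technique, not a statement of how these methods are written.

```java
// Hypothetical helpers; lead surrogates are U+D800..U+DBFF, trail are U+DC00..U+DFFF.
final class SurrogateChecks {
    private SurrogateChecks() {}

    // Clearing the low 10 bits maps every lead surrogate to 0xD800.
    static boolean isLeadSurrogate(int c) {
        return (c & 0xfffffc00) == 0xd800;
    }

    // Clearing the low 10 bits maps every trail surrogate to 0xDC00.
    static boolean isTrailSurrogate(int c) {
        return (c & 0xfffffc00) == 0xdc00;
    }
}
```

        Because the mask keeps all upper bits, supplementary code points and negative sentinel values are rejected by both checks.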