Class UCharacterProperty


  • public final class UCharacterProperty
    extends Object

    Internal class used for Unicode character property database.

    This class stores binary data read from uprops.icu. It cannot parse the data into higher-level information; it only returns bytes of information when required.

    Because of the form most commonly used for retrieval, the binary data is stored in an array of char.

    UCharacterPropertyDB also contains information on accessing indexes to significant points in the binary data.

    Responsibility for molding the binary data into a more meaningful form lies with UCharacter.

    Since:
    release 2.1, February 1st 2002
    • Field Detail

      • m_trie_

        public Trie2_16 m_trie_
        Trie data
      • m_unicodeVersion_

        public VersionInfo m_unicodeVersion_
        Unicode version
      • LATIN_CAPITAL_LETTER_I_WITH_DOT_ABOVE_

        public static final char LATIN_CAPITAL_LETTER_I_WITH_DOT_ABOVE_
        Latin capital letter I with dot above
        See Also:
        Constant Field Values
      • LATIN_SMALL_LETTER_DOTLESS_I_

        public static final char LATIN_SMALL_LETTER_DOTLESS_I_
        Latin small letter dotless i
        See Also:
        Constant Field Values
      • LATIN_SMALL_LETTER_I_

        public static final char LATIN_SMALL_LETTER_I_
        Latin lowercase i
        See Also:
        Constant Field Values
      • SRC_NONE

        public static final int SRC_NONE
        No source, not a supported property.
        See Also:
        Constant Field Values
      • SRC_CHAR

        public static final int SRC_CHAR
        From uchar.c/uprops.icu main trie
        See Also:
        Constant Field Values
      • SRC_PROPSVEC

        public static final int SRC_PROPSVEC
        From uchar.c/uprops.icu properties vectors trie
        See Also:
        Constant Field Values
      • SRC_BIDI

        public static final int SRC_BIDI
        From ubidi_props.c/ubidi.icu
        See Also:
        Constant Field Values
      • SRC_CHAR_AND_PROPSVEC

        public static final int SRC_CHAR_AND_PROPSVEC
        From uchar.c/uprops.icu main trie as well as properties vectors trie
        See Also:
        Constant Field Values
      • SRC_CASE_AND_NORM

        public static final int SRC_CASE_AND_NORM
        From ucase.c/ucase.icu as well as unorm.cpp/unorm.icu
        See Also:
        Constant Field Values
      • SRC_NFC

        public static final int SRC_NFC
        From normalizer2impl.cpp/nfc.nrm
        See Also:
        Constant Field Values
      • SRC_NFKC

        public static final int SRC_NFKC
        From normalizer2impl.cpp/nfkc.nrm
        See Also:
        Constant Field Values
      • SRC_NFKC_CF

        public static final int SRC_NFKC_CF
        From normalizer2impl.cpp/nfkc_cf.nrm
        See Also:
        Constant Field Values
      • SRC_NFC_CANON_ITER

        public static final int SRC_NFC_CANON_ITER
        From normalizer2impl.cpp/nfc.nrm canonical iterator data
        See Also:
        Constant Field Values
      • SRC_COUNT

        public static final int SRC_COUNT
        One more than the highest UPropertySource (SRC_) constant.
        See Also:
        Constant Field Values
      • m_scriptExtensions_

        public char[] m_scriptExtensions_
        Script_Extensions data
      • SCRIPT_X_MASK

        public static final int SCRIPT_X_MASK
        Script_Extensions: mask includes Script
        See Also:
        Constant Field Values
      • SCRIPT_MASK_

        public static final int SCRIPT_MASK_
        Integer properties mask and shift values for scripts. Equivalent to icu4c UPROPS_SHIFT_MASK
        See Also:
        Constant Field Values
      • SCRIPT_X_WITH_INHERITED

        public static final int SCRIPT_X_WITH_INHERITED
        See Also:
        Constant Field Values
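        The script masks above select bit fields from a packed int properties word. The following is a minimal sketch of that idea, not the class's actual code: the constant values, the sample word, and the helper names scriptField and extensionFlags are hypothetical and chosen only for illustration. The real values are listed under Constant Field Values.

```java
final class ScriptMaskSketch {
    // Hypothetical values for illustration only; not taken from this class.
    static final int SCRIPT_MASK = 0x000000ff;             // low byte: script field
    static final int SCRIPT_X_MASK = 0x00c000ff;           // script field plus two flag bits
    static final int SCRIPT_X_WITH_INHERITED = 0x00800000; // one possible flag pattern

    /** Extracts the script field from a packed properties word. */
    static int scriptField(int propsWord) {
        return propsWord & SCRIPT_MASK;
    }

    /** Returns only the bits that SCRIPT_X_MASK adds on top of SCRIPT_MASK. */
    static int extensionFlags(int propsWord) {
        return propsWord & SCRIPT_X_MASK & ~SCRIPT_MASK;
    }

    public static void main(String[] args) {
        int word = 0x12800025; // made-up packed word
        System.out.println(scriptField(word)); // prints 37 (0x25)
        System.out.println(extensionFlags(word) == SCRIPT_X_WITH_INHERITED); // prints true
    }
}
```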
    • Method Detail

      • getProperty

        public final int getProperty(int ch)
        Gets the main property value for code point ch.
        Parameters:
        ch - code point whose property value is to be retrieved
        Returns:
        property value of code point
      • getAdditional

        public int getAdditional(int codepoint,
                                 int column)
        Gets the additional Unicode properties. Java version of C u_getUnicodeProperties().
        Parameters:
        codepoint - code point whose additional properties are to be retrieved
        column - The column index.
        Returns:
        Unicode properties
      • getAge

        public VersionInfo getAge(int codepoint)

        Get the "age" of the code point.

        The "age" is the Unicode version when the code point was first designated (as a non-character or for Private Use) or assigned a character.

        This can be useful to avoid emitting code points to receiving processes that do not accept newer characters.

        The data is from the UCD file DerivedAge.txt.

        This API does not check the validity of the codepoint.

        Parameters:
        codepoint - The code point.
        Returns:
        the Unicode version number
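        The use case mentioned above, withholding code points that a receiver may be too old to accept, can be sketched as follows. This is only an illustration under stated assumptions: the age lookup is passed in as a hypothetical function returning {major, minor}, and toyAge is a made-up three-entry table standing in for real DerivedAge.txt data. It does not call this class.

```java
import java.util.function.IntFunction;

final class AgeFilterSketch {
    /** Compares two {major, minor} version pairs. */
    static int compare(int[] a, int[] b) {
        return a[0] != b[0] ? Integer.compare(a[0], b[0]) : Integer.compare(a[1], b[1]);
    }

    /** Keeps only the code points whose age is not newer than maxVersion. */
    static String keepUpTo(String text, int[] maxVersion, IntFunction<int[]> ageOf) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (compare(ageOf.apply(cp), maxVersion) <= 0) {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    /** Hypothetical toy age table; a real lookup would consult the property data. */
    static int[] toyAge(int cp) {
        switch (cp) {
            case 'A':    return new int[] {1, 1};
            case 0x20AC: return new int[] {2, 1};
            default:     return new int[] {6, 1};
        }
    }

    public static void main(String[] args) {
        // Input is 'A', U+20AC, U+1F600; with a limit of 3.0 the last one is dropped.
        String filtered = keepUpTo("A\u20AC\uD83D\uDE00", new int[] {3, 0}, AgeFilterSketch::toyAge);
        System.out.println(filtered.codePointCount(0, filtered.length())); // prints 2
    }
}
```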
      • hasBinaryProperty

        public boolean hasBinaryProperty(int c,
                                         int which)
      • getType

        public int getType(int c)
      • getIntPropertyValue

        public int getIntPropertyValue(int c,
                                       int which)
      • getIntPropertyMaxValue

        public int getIntPropertyMaxValue(int which)
      • getSource

        public final int getSource(int which)
      • getMaxValues

        public int getMaxValues(int column)
        Gets the maximum values for some enum/int properties.
        Returns:
        maximum values for the integer properties.
      • getMask

        public static final int getMask(int type)
        Gets the type mask
        Parameters:
        type - character type
        Returns:
        mask
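        A type mask makes it cheap to test a character against several categories at once. The sketch below assumes the mask is a single bit at position type (1 << type), which this page does not state, and uses the category constants of java.lang.Character as a stand-in for ICU's own; LETTER_MASK and isLetter are hypothetical names.

```java
final class TypeMaskSketch {
    /** Assumed behaviour: one bit per character type. */
    static int getMask(int type) {
        return 1 << type;
    }

    // Union of the five letter categories, built from the per-type masks.
    static final int LETTER_MASK =
            getMask(Character.UPPERCASE_LETTER)
          | getMask(Character.LOWERCASE_LETTER)
          | getMask(Character.TITLECASE_LETTER)
          | getMask(Character.MODIFIER_LETTER)
          | getMask(Character.OTHER_LETTER);

    /** Tests membership in any letter category with a single AND. */
    static boolean isLetter(int codePoint) {
        return (getMask(Character.getType(codePoint)) & LETTER_MASK) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isLetter('A')); // prints true
        System.out.println(isLetter('5')); // prints false
    }
}
```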
      • getEuropeanDigit

        public static int getEuropeanDigit(int ch)
        Returns the digit values of characters like 'A' - 'Z', normal, half-width and full-width. This method assumes that the other digit characters are checked by the calling method.
        Parameters:
        ch - character to test
        Returns:
        -1 if ch is not a character of the form 'A' - 'Z', otherwise its corresponding digit will be returned.
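        The mapping described above can be sketched as follows. This is an illustration, not the method's source: it assumes that 'A' maps to 10 and 'Z' to 35 (as in radix-36 digits), that lowercase letters are treated like uppercase, and that the full-width forms are U+FF21..U+FF3A and U+FF41..U+FF5A.

```java
final class EuropeanDigitSketch {
    /** Returns 10..35 for Latin letters (ASCII or full-width), otherwise -1. */
    static int getEuropeanDigit(int ch) {
        if (ch >= 'A' && ch <= 'Z') {
            return ch - 'A' + 10;       // ASCII uppercase
        }
        if (ch >= 'a' && ch <= 'z') {
            return ch - 'a' + 10;       // ASCII lowercase
        }
        if (ch >= 0xFF21 && ch <= 0xFF3A) {
            return ch - 0xFF21 + 10;    // full-width uppercase
        }
        if (ch >= 0xFF41 && ch <= 0xFF5A) {
            return ch - 0xFF41 + 10;    // full-width lowercase
        }
        return -1;                      // decimal digits are left to the caller
    }

    public static void main(String[] args) {
        System.out.println(getEuropeanDigit('A'));    // prints 10
        System.out.println(getEuropeanDigit('z'));    // prints 35
        System.out.println(getEuropeanDigit(0xFF21)); // prints 10
        System.out.println(getEuropeanDigit('5'));    // prints -1
    }
}
```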
      • digit

        public int digit(int c)
      • getNumericValue

        public int getNumericValue(int c)
      • getUnicodeNumericValue

        public double getUnicodeNumericValue(int c)
      • upropsvec_addPropertyStarts

        public void upropsvec_addPropertyStarts(UnicodeSet set)