Class UCharacterName


  • public final class UCharacterName
    extends Object
    Internal class to manage character names. Since the data for names is stored in an array of char, indexes used in this class refer by default to a 2-byte count, unless otherwise stated. Where an index refers to a byte count, the index is halved and, depending on whether the index is even or odd, the MSB or LSB of the char at the halved index is returned. For indexes into an array of int, the index is multiplied by 2, and the char at the multiplied index together with the following char is returned as an int. UCharacter acts as a public facade for this class. Note: 0 - 0x1F are control characters without names in Unicode 3.0.
    Since:
    nov0700
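    The indexing convention described above can be sketched roughly as follows. This is only an illustration, not the class's actual code: the class name CharArrayIndexing and both method names are hypothetical, and two details are assumptions the description does not spell out, namely that an even byte index selects the high byte and that the first char of an int supplies the upper 16 bits.

```java
/** Hypothetical helper illustrating byte and int addressing over a char[]. */
final class CharArrayIndexing {
    private CharArrayIndexing() {}

    /** Reads one byte addressed by a byte index (assumption: even index = high byte). */
    static int byteAt(char[] data, int byteIndex) {
        char unit = data[byteIndex >> 1];      // halve the index to find the char
        return (byteIndex & 1) == 0
                ? (unit >> 8) & 0xFF           // even index: most significant byte
                : unit & 0xFF;                 // odd index: least significant byte
    }

    /** Reads one int addressed by an int index (assumption: first char = upper 16 bits). */
    static int intAt(char[] data, int intIndex) {
        int charIndex = intIndex * 2;          // two chars per int
        return (data[charIndex] << 16) | data[charIndex + 1];
    }

    public static void main(String[] args) {
        char[] data = {0x1234, 0xABCD};
        // prints "12 CD 1234ABCD"
        System.out.printf("%02X %02X %08X%n",
                byteAt(data, 0), byteAt(data, 3), intAt(data, 0));
    }
}
```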
    • Field Detail

      • LINES_PER_GROUP_

        public static final int LINES_PER_GROUP_
        Number of lines per group, equal to 1 << GROUP_SHIFT_.
        See Also:
        Constant Field Values
      • m_groupcount_

        public int m_groupcount_
        Maximum number of groups
    • Method Detail

      • getName

        public String getName​(int ch,
                              int choice)
        Retrieves the name of a Unicode code point. Depending on choice, the character name returned is the "modern" name or the name that was defined in Unicode version 1.0. The name contains only "invariant" characters like A-Z, 0-9, space, and '-'.
        Parameters:
        ch - the code point for which to get the name.
        choice - Selector for which name to get.
        Returns:
        the name of the code point; null is returned if the code point is above 0x1fff
      • getCharFromName

        public int getCharFromName​(int choice,
                                   String name)
        Find a character by its name and return its code point value
        Parameters:
        choice - selector indicating whether the argument name is a Unicode 1.0 name or a name from the most current version
        name - the name to search for
        Returns:
        code point
      • getGroupLengths

        public int getGroupLengths​(int index,
                                   char[] offsets,
                                   char[] lengths)
        Reads a block of compressed lengths of 32 strings and expands them into offsets and lengths for each string. Lengths are stored with a variable-width encoding in consecutive nibbles: If a nibble<0xc, then it is the length itself (0 = empty string). If a nibble>=0xc, then it forms a length value with the following nibble. The offsets and lengths arrays must be at least 33 (one more) long because there is no check here at the end if the last nibble is still used.
        Parameters:
        index - of group string object in array
        offsets - array to store the value of the string offsets
        lengths - array to store the value of the string length
        Returns:
        next index of the data string immediately after the lengths in terms of byte address
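        The nibble encoding described above can be illustrated with a simplified decoder. This is a sketch under stated assumptions, not the method's real implementation: the class NibbleLengths and its methods are hypothetical, the input is a plain byte[] with the high nibble read first, and the formula used to combine a lead nibble >= 0xc with its follower is borrowed from memory of ICU4C's expandGroupLengths rather than taken from this page.

```java
/** Hypothetical, simplified decoder for nibble-encoded string lengths. */
final class NibbleLengths {
    private NibbleLengths() {}

    static final int SINGLE_NIBBLE_LIMIT = 0xC;

    /** Returns the nibble at nibbleIndex (assumption: high nibble of each byte first). */
    static int nibble(byte[] data, int nibbleIndex) {
        int b = data[nibbleIndex >> 1] & 0xFF;
        return (nibbleIndex & 1) == 0 ? b >> 4 : b & 0xF;
    }

    /**
     * Expands count lengths starting at byte index start into offsets and lengths.
     * Returns the byte index immediately after the last byte that was read.
     */
    static int expand(byte[] data, int start, int count, char[] offsets, char[] lengths) {
        int nibbleIndex = start * 2;
        int offset = 0;
        for (int i = 0; i < count; i++) {
            int value = nibble(data, nibbleIndex++);
            if (value >= SINGLE_NIBBLE_LIMIT) {
                // Assumed two-nibble form: low 2 bits of the lead nibble, then the
                // following nibble, biased by 12 so it continues after 0..11.
                value = (((value & 0x3) << 4) | nibble(data, nibbleIndex++))
                        + SINGLE_NIBBLE_LIMIT;
            }
            offsets[i] = (char) offset;    // running sum of the previous lengths
            lengths[i] = (char) value;
            offset += value;
        }
        return (nibbleIndex + 1) >> 1;     // round up to a whole byte
    }
}
```

        Because this sketch stops exactly after count entries, it does not itself need the extra 33rd slot that the description requires of the real method.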
      • getGroupName

        public String getGroupName​(int index,
                                   int length,
                                   int choice)
        Gets the name of the argument group index. UnicodeData.txt uses ';' as a field separator, so no field can contain ';' as part of its contents. In unames.icu, it is marked as token[';'] == -1 only if the semicolon is used in the data file, which is the case if and only if Unicode 1.0 names, ISO comments, or aliases are stored. So token[';'] == -1 whenever U1.0 names, ISO comments, or aliases are stored, even though the semicolon will never be part of a name. Equivalent to ICU4C's expandName.
        Parameters:
        index - of the group name string in byte count
        length - of the group name string
        choice - of Unicode 1.0 name or the most current name
        Returns:
        name of the group
      • getExtendedName

        public String getExtendedName​(int ch)
        Retrieves the extended name
      • getGroup

        public int getGroup​(int codepoint)
        Gets the group index for the codepoint, or the group before it.
        Parameters:
        codepoint - The codepoint index.
        Returns:
        group index containing codepoint or the group before it.
      • getExtendedOr10Name

        public String getExtendedOr10Name​(int ch)
        Gets the extended name or the Unicode 1.0 name when no most current Unicode name is available
        Parameters:
        ch - codepoint
        Returns:
        name of codepoint extended or 1.0
      • getGroupMSB

        public int getGroupMSB​(int gindex)
        Gets the MSB from the group index
        Parameters:
        gindex - group index
        Returns:
        the MSB of the group if gindex is valid, -1 otherwise
      • getCodepointMSB

        public static int getCodepointMSB​(int codepoint)
        Gets the MSB of the codepoint
        Parameters:
        codepoint - The codepoint value.
        Returns:
        the MSB of the codepoint
      • getGroupLimit

        public static int getGroupLimit​(int msb)
        Gets the maximum codepoint + 1 of the group
        Parameters:
        msb - most significant byte of the group
        Returns:
        limit codepoint of the group
      • getGroupMin

        public static int getGroupMin​(int msb)
        Gets the minimum codepoint of the group
        Parameters:
        msb - most significant byte of the group
        Returns:
        minimum codepoint of the group
      • getGroupOffset

        public static int getGroupOffset​(int codepoint)
        Gets the offset to a group
        Parameters:
        codepoint - The codepoint value.
        Returns:
        offset to a group
      • getGroupMinFromCodepoint

        public static int getGroupMinFromCodepoint​(int codepoint)
        Gets the minimum codepoint of a group
        Parameters:
        codepoint - The codepoint value.
        Returns:
        minimum codepoint in the group which codepoint belongs to
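        The relationship between the group helpers above can be sketched with plain bit arithmetic. This is an illustrative guess at the arithmetic, not the class's source: the class GroupMath is hypothetical, and the shift of 5 is an assumption inferred from LINES_PER_GROUP_ being 1 << GROUP_SHIFT_ and groups holding 32 strings. The "MSB" of a code point is taken here to mean the code point shifted right by the group shift.

```java
/** Hypothetical sketch of the group arithmetic; constants are assumptions. */
final class GroupMath {
    private GroupMath() {}

    static final int GROUP_SHIFT = 5;                     // assumed: 32 lines per group
    static final int LINES_PER_GROUP = 1 << GROUP_SHIFT;  // 32
    static final int GROUP_MASK = LINES_PER_GROUP - 1;    // 0x1F

    /** High bits identifying the group that contains the code point. */
    static int codepointMSB(int codepoint) { return codepoint >> GROUP_SHIFT; }

    /** First code point of the group. */
    static int groupMin(int msb) { return msb << GROUP_SHIFT; }

    /** One past the last code point of the group. */
    static int groupLimit(int msb) { return groupMin(msb) + LINES_PER_GROUP; }

    /** Position of the code point within its group. */
    static int groupOffset(int codepoint) { return codepoint & GROUP_MASK; }

    /** First code point of the group containing the code point. */
    static int groupMinFromCodepoint(int codepoint) { return codepoint & ~GROUP_MASK; }

    public static void main(String[] args) {
        int cp = 0x41; // LATIN CAPITAL LETTER A
        // prints "msb=2 min=40 limit=60 offset=1"
        System.out.printf("msb=%d min=%X limit=%X offset=%d%n",
                codepointMSB(cp), groupMin(codepointMSB(cp)),
                groupLimit(codepointMSB(cp)), groupOffset(cp));
    }
}
```

        Under these assumptions, a code point always satisfies groupMin(msb) + groupOffset(cp) == cp for its own msb.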
      • getAlgorithmLength

        public int getAlgorithmLength()
        Gets the number of algorithmic ranges
        Returns:
        number of algorithmic ranges
      • getAlgorithmStart

        public int getAlgorithmStart​(int index)
        Gets the start of the range
        Parameters:
        index - algorithm index
        Returns:
        algorithm range start
      • getAlgorithmEnd

        public int getAlgorithmEnd​(int index)
        Gets the end of the range
        Parameters:
        index - algorithm index
        Returns:
        algorithm range end
      • getAlgorithmName

        public String getAlgorithmName​(int index,
                                       int codepoint)
        Gets the Algorithmic name of the codepoint
        Parameters:
        index - algorithmic range index
        codepoint - The codepoint value.
        Returns:
        algorithmic name of codepoint
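        One plausible way to combine the four algorithm accessors is a linear scan over the ranges. The sketch below is hedged accordingly: the interface AlgorithmRanges, the class AlgorithmLookup, and the stub are hypothetical stand-ins that only mirror the documented signatures, and it assumes that getAlgorithmLength returns the number of ranges and that the range end is inclusive, neither of which this page states.

```java
/** Hypothetical view of the four range accessors documented above. */
interface AlgorithmRanges {
    int getAlgorithmLength();                  // assumed: number of ranges
    int getAlgorithmStart(int index);
    int getAlgorithmEnd(int index);            // assumed: inclusive end
    String getAlgorithmName(int index, int codepoint);
}

final class AlgorithmLookup {
    private AlgorithmLookup() {}

    /** Returns the algorithmic name for codepoint, or null if no range contains it. */
    static String nameOrNull(AlgorithmRanges ranges, int codepoint) {
        for (int i = 0; i < ranges.getAlgorithmLength(); i++) {
            if (ranges.getAlgorithmStart(i) <= codepoint
                    && codepoint <= ranges.getAlgorithmEnd(i)) {
                return ranges.getAlgorithmName(i, codepoint);
            }
        }
        return null;
    }

    /** Made-up stub with a single range 0x100..0x1FF; for demonstration only. */
    static AlgorithmRanges stub() {
        return new AlgorithmRanges() {
            @Override public int getAlgorithmLength() { return 1; }
            @Override public int getAlgorithmStart(int index) { return 0x100; }
            @Override public int getAlgorithmEnd(int index) { return 0x1FF; }
            @Override public String getAlgorithmName(int index, int codepoint) {
                return "STUB-" + Integer.toHexString(codepoint)
                        .toUpperCase(java.util.Locale.ROOT);
            }
        };
    }
}
```

        With the stub, AlgorithmLookup.nameOrNull(AlgorithmLookup.stub(), 0x1AB) yields "STUB-1AB", while a code point outside the stub range yields null.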
      • getGroupName

        public String getGroupName​(int ch,
                                   int choice)
        Gets the group name of the character
        Parameters:
        ch - character whose group name is to be retrieved
        choice - name choice selector to choose a Unicode 1.0 or newer name
      • getMaxCharNameLength

        public int getMaxCharNameLength()
        Gets the maximum length of any codepoint name. Equivalent to uprv_getMaxCharNameLength.
        Returns:
        the maximum length of any codepoint name
      • getMaxISOCommentLength

        public int getMaxISOCommentLength()
        Gets the maximum length of any ISO comment. Equivalent to uprv_getMaxISOCommentLength.
        Returns:
        the maximum length of any ISO comment
      • getCharNameCharacters

        public void getCharNameCharacters​(UnicodeSet set)
        Fills set with characters that are used in Unicode character names. Equivalent to uprv_getCharNameCharacters.
        Parameters:
        set - UnicodeSet to receive characters. Existing contents are deleted.
      • getISOCommentCharacters

        public void getISOCommentCharacters​(UnicodeSet set)
        Fills set with characters that are used in ISO comments. Equivalent to uprv_getISOCommentCharacters.
        Parameters:
        set - UnicodeSet to receive characters. Existing contents are deleted.