Class ModifiedUtf8


  • public class ModifiedUtf8
    extends Object
    Encoding and decoding methods for Modified UTF-8.

    Modified UTF-8 is a simple variation of UTF-8 in which U+0000 is encoded as the two bytes 0xc0 0x80 instead of a single 0x00 byte. This keeps zero bytes out of the output.
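    The JDK's java.io.DataOutputStream.writeUTF also writes modified UTF-8 (after a two-byte length prefix), so it can be used to see the U+0000 difference concretely. A minimal sketch; the class NulEncodingDemo and its method names are hypothetical and are not part of ModifiedUtf8:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class (not part of ModifiedUtf8): compares standard UTF-8
// with the modified UTF-8 written by java.io.DataOutputStream.writeUTF.
public class NulEncodingDemo {
    // Standard UTF-8: U+0000 becomes a single 0x00 byte.
    public static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Modified UTF-8 payload: writeUTF prepends a 2-byte length, which is dropped here.
    public static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = bytes.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        String s = "a\u0000b";
        System.out.println(Arrays.toString(standardUtf8(s))); // [97, 0, 98]
        System.out.println(Arrays.toString(modifiedUtf8(s))); // [97, -64, -128, 98]
    }
}
```

    The -64, -128 pair is 0xc0 0x80 printed as signed Java bytes; no 0x00 byte appears in the modified output.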

    • Constructor Detail

      • ModifiedUtf8

        public ModifiedUtf8()
    • Method Detail

      • countBytes

        public static long countBytes(String s,
                                      boolean shortLength)
                               throws UTFDataFormatException
        Count the number of bytes in the modified UTF-8 representation of s.

        Additionally, if shortLength is true, throw a UTFDataFormatException if the size cannot be represented in an unsigned Java short (that is, if it exceeds 65535 bytes).

        Throws:
        UTFDataFormatException - if shortLength is true and the byte count exceeds 65535
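        The counting rule can be sketched per char: U+0001..U+007F take one byte, U+0000 and U+0080..U+07FF take two, and everything else (including each half of a surrogate pair) takes three. This is a hypothetical re-implementation for illustration, assuming that per-char rule; the class name CountBytesSketch is invented and this is not the library's source:

```java
import java.io.UTFDataFormatException;

// Hypothetical sketch of countBytes; not the library's actual source.
public final class CountBytesSketch {
    public static long countBytes(String s, boolean shortLength)
            throws UTFDataFormatException {
        long result = 0;
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch != 0 && ch <= 0x7f) {
                result += 1; // U+0001..U+007F: one byte
            } else if (ch <= 0x7ff) {
                result += 2; // U+0000 and U+0080..U+07FF: two bytes
            } else {
                result += 3; // U+0800..U+FFFF, each surrogate counted separately
            }
            // Fail as soon as the count no longer fits in an unsigned short.
            if (shortLength && result > 0xffff) {
                throw new UTFDataFormatException("encoded string too long");
            }
        }
        return result;
    }

    public static void main(String[] args) throws UTFDataFormatException {
        System.out.println(countBytes("a\u0000\u00e9\u20ac", false)); // 8 (1 + 2 + 2 + 3)
    }
}
```

    Because surrogates are counted one char at a time, a supplementary character such as U+1F600 counts as 6 bytes here rather than the 4 bytes standard UTF-8 would use.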
      • encode

        public static void encode(byte[] dst,
                                  int offset,
                                  String s)
        Encode s into dst starting at offset offset.

        The caller must ensure that dst has enough space from offset onward, for example by sizing it with countBytes.
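        The per-char bit layout can be sketched as follows: one byte 0xxxxxxx, two bytes 110xxxxx 10xxxxxx, or three bytes 1110xxxx 10xxxxxx 10xxxxxx, with U+0000 forced into the two-byte form. This is a hypothetical sketch (the class EncodeSketch is invented); like the documented method, it assumes the caller has sized dst correctly, and otherwise fails with ArrayIndexOutOfBoundsException:

```java
import java.util.Arrays;

// Hypothetical sketch of encode(byte[], int, String); not the library's source.
public final class EncodeSketch {
    public static void encode(byte[] dst, int offset, String s) {
        int pos = offset;
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch != 0 && ch <= 0x7f) {
                dst[pos++] = (byte) ch;                            // 0xxxxxxx
            } else if (ch <= 0x7ff) {
                dst[pos++] = (byte) (0xc0 | (0x1f & (ch >> 6)));  // 110xxxxx
                dst[pos++] = (byte) (0x80 | (0x3f & ch));         // 10xxxxxx
            } else {
                dst[pos++] = (byte) (0xe0 | (0x0f & (ch >> 12))); // 1110xxxx
                dst[pos++] = (byte) (0x80 | (0x3f & (ch >> 6)));  // 10xxxxxx
                dst[pos++] = (byte) (0x80 | (0x3f & ch));         // 10xxxxxx
            }
        }
    }

    public static void main(String[] args) {
        byte[] buf = new byte[4];
        encode(buf, 1, "\u0000"); // writes 0xc0 0x80 at indices 1 and 2
        System.out.println(Arrays.toString(buf)); // [0, -64, -128, 0]
    }
}
```

    For strings with no U+0000 and no surrogate chars, the bytes produced this way match s.getBytes(StandardCharsets.UTF_8).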

      • encode

        public static byte[] encode(String s)
                             throws UTFDataFormatException
        Encodes s into a buffer with the following format:

        - the first two bytes of the buffer hold the length of the modified UTF-8 output, as a big-endian unsigned short. A UTFDataFormatException is thrown if the encoded size cannot be represented in an unsigned short (that is, if it exceeds 65535 bytes).

        - the remainder of the buffer contains the modified UTF-8 output (equivalent to encode(buf, 2, s)).

        Throws:
        UTFDataFormatException - if the encoded size exceeds 65535 bytes
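        The length-prefixed layout can be sketched as [length high byte, length low byte, payload...]. This hypothetical sketch (the class LengthPrefixedSketch and its readLength helper are invented for illustration) builds the payload in a growable buffer for brevity; the documented method may instead count first and encode in place. The same layout is what java.io.DataOutputStream.writeUTF emits, which makes a convenient cross-check:

```java
import java.io.ByteArrayOutputStream;
import java.io.UTFDataFormatException;

// Hypothetical sketch of the length-prefixed encode(String); not the library's source.
public final class LengthPrefixedSketch {
    // Appends the modified UTF-8 bytes of a single char (write(int) keeps the low 8 bits).
    private static void encodeChar(ByteArrayOutputStream out, char ch) {
        if (ch != 0 && ch <= 0x7f) {
            out.write(ch);
        } else if (ch <= 0x7ff) {
            out.write(0xc0 | (ch >> 6));
            out.write(0x80 | (ch & 0x3f));
        } else {
            out.write(0xe0 | (ch >> 12));
            out.write(0x80 | ((ch >> 6) & 0x3f));
            out.write(0x80 | (ch & 0x3f));
        }
    }

    public static byte[] encode(String s) throws UTFDataFormatException {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            encodeChar(payload, s.charAt(i));
        }
        int n = payload.size();
        if (n > 0xffff) { // does not fit in an unsigned 16-bit length
            throw new UTFDataFormatException("encoded string too long: " + n + " bytes");
        }
        byte[] buf = new byte[n + 2];
        buf[0] = (byte) (n >>> 8); // high byte first (big-endian)
        buf[1] = (byte) n;         // low byte
        System.arraycopy(payload.toByteArray(), 0, buf, 2, n);
        return buf;
    }

    // Reads the prefix back as an unsigned value in 0..65535.
    public static int readLength(byte[] buf) {
        return ((buf[0] & 0xff) << 8) | (buf[1] & 0xff);
    }
}
```

    Masking with 0xff in readLength matters: lengths from 32768 to 65535 would otherwise come back negative when read as a signed Java short.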
      • decode

        public static String decode(byte[] in,
                                    char[] out,
                                    int offset,
                                    int length)
                             throws UTFDataFormatException
        Decodes length bytes of modified UTF-8 from in, starting at offset offset, into out.

        At most length chars are written to out, starting at index 0. out is assumed to have enough space for the output (a standard ArrayIndexOutOfBoundsException is thrown otherwise).

        If a zero byte (0x00) is encountered, it is decoded as U+0000.

        Returns:
        the decoded characters, as a String
        Throws:
        UTFDataFormatException - if the input is not valid modified UTF-8
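        Decoding reverses the byte patterns: a lead byte below 0x80 is one char (so a raw 0x00 also yields U+0000), 110xxxxx starts a two-byte sequence, and 1110xxxx starts a three-byte one. Anything else, a truncated sequence, or a continuation byte not of the form 10xxxxxx is rejected. A hypothetical sketch assuming exactly those checks (the class DecodeSketch is invented, and the library's error messages and checks may differ):

```java
import java.io.UTFDataFormatException;

// Hypothetical sketch of decode(byte[], char[], int, int); not the library's source.
public final class DecodeSketch {
    public static String decode(byte[] in, char[] out, int offset, int length)
            throws UTFDataFormatException {
        int count = 0; // bytes consumed, relative to offset
        int s = 0;     // chars written to out
        while (count < length) {
            int a = in[offset + count++] & 0xff;
            if (a < 0x80) {
                out[s++] = (char) a; // one byte; a raw 0x00 becomes U+0000
            } else if ((a & 0xe0) == 0xc0) {
                if (count >= length) {
                    throw new UTFDataFormatException("truncated sequence at " + (count - 1));
                }
                int b = in[offset + count++] & 0xff;
                if ((b & 0xc0) != 0x80) {
                    throw new UTFDataFormatException("bad second byte at " + (count - 1));
                }
                out[s++] = (char) (((a & 0x1f) << 6) | (b & 0x3f));
            } else if ((a & 0xf0) == 0xe0) {
                if (count + 1 >= length) {
                    throw new UTFDataFormatException("truncated sequence at " + (count - 1));
                }
                int b = in[offset + count++] & 0xff;
                int c = in[offset + count++] & 0xff;
                if ((b & 0xc0) != 0x80 || (c & 0xc0) != 0x80) {
                    throw new UTFDataFormatException("bad continuation byte near " + (count - 2));
                }
                out[s++] = (char) (((a & 0x0f) << 12) | ((b & 0x3f) << 6) | (c & 0x3f));
            } else {
                // Stray continuation bytes and 4-byte lead bytes: modified UTF-8
                // writes supplementary characters as two 3-byte surrogates instead.
                throw new UTFDataFormatException("bad byte at " + (count - 1));
            }
        }
        return new String(out, 0, s);
    }
}
```

    For example, decoding the four bytes 0x61 0xc0 0x80 0x62 with a char[4] buffer yields the three-char string "a", U+0000, "b".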