Class OffsetSimultaneousEncoder

java.lang.Object
io.confluent.parallelconsumer.OffsetSimultaneousEncoder

public class OffsetSimultaneousEncoder
extends java.lang.Object
Encode with multiple strategies at the same time.

Results are held in an accessible structure, making it easy to select the highest compression.

See Also:
invoke()
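The core idea, encoding the same completion state so it can be compared across strategies, can be sketched with a plain java.util.BitSet, one of the encodings this class uses. The class and method names below are invented for the sketch, not the library's API; it maps each offset above the low water mark to one bit:

```java
import java.util.BitSet;
import java.util.Set;

public class BitSetOffsetDemo {
    // Bit i is set when offset (base + i) is complete; incomplete offsets stay 0.
    static byte[] encode(long base, long nextExpected, Set<Long> incompletes) {
        int length = (int) (nextExpected - base);
        BitSet bits = new BitSet(length);
        for (int i = 0; i < length; i++) {
            if (!incompletes.contains(base + i)) {
                bits.set(i);
            }
        }
        return bits.toByteArray();
    }

    public static void main(String[] args) {
        // Offsets 100..107, with 101 and 105 still incomplete.
        byte[] payload = encode(100L, 108L, Set.of(101L, 105L));
        System.out.println(payload.length); // 8 offsets fit in a single byte
    }
}
```

Because each offset costs one bit regardless of its value, this representation is compact for dense ranges but wasteful for sparse ones, which is why comparing several strategies per commit can pay off.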
  • Field Summary

    Fields 
    Modifier and Type Field Description
    static boolean compressionForced
    Force the encoder to also add the compressed versions.
    static int LARGE_INPUT_MAP_SIZE_THRESHOLD
Size threshold, in bytes, of the source array beyond which compressed versions of the encodings are also computed and compared, as the extra compression step typically pays off above this size.
  • Constructor Summary

    Constructors 
    Constructor Description
    OffsetSimultaneousEncoder​(long lowWaterMark, java.lang.Long nextExpectedOffset, java.util.Set<java.lang.Long> incompleteOffsets)  
  • Method Summary

    Modifier and Type Method Description
    java.util.Map<OffsetEncoding,​byte[]> getEncodingMap()
Map of the different encoding types for the same offset data, used to retrieve the encoded data for a given encoding type
    java.util.Set<java.lang.Long> getIncompleteOffsets()
The offsets which have not yet been fully completed, and so cannot yet have their offset committed
    java.util.PriorityQueue<io.confluent.parallelconsumer.EncodedOffsetPair> getSortedEncodings()
Ordered set of the different encodings, used to quickly retrieve the most compressed encoding
    OffsetSimultaneousEncoder invoke()
The high water mark is already encoded in the metadata string by OffsetMapCodecManager.makeOffsetMetadataPayload(long, org.apache.kafka.common.TopicPartition, java.util.Set<java.lang.Long>), so encoding it again in the BitSet or run length may not be needed, or could be swapped
    byte[] packSmallest()
    Select the smallest encoding, and pack it.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • LARGE_INPUT_MAP_SIZE_THRESHOLD

      public static final int LARGE_INPUT_MAP_SIZE_THRESHOLD
Size threshold, in bytes, of the source array beyond which compressed versions of the encodings are also computed and compared, as the extra compression step typically pays off above this size.
      See Also:
      Constant Field Values
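The trade-off behind this threshold can be illustrated with a standalone sketch: only compute a Deflate-compressed variant when the input is large, then keep whichever form is smaller. The class name and the threshold value below are hypothetical, not the library's constant:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

public class CompressionThresholdDemo {
    // Hypothetical threshold for illustration; the real constant's value is
    // listed on the Constant Field Values page.
    static final int THRESHOLD = 200;

    // Deflate-compress an encoding's bytes.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[64];
        while (!deflater.finished()) {
            out.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Only bother comparing the compressed variant for large inputs.
    static byte[] smaller(byte[] encoded) {
        if (encoded.length < THRESHOLD) return encoded;
        byte[] zipped = compress(encoded);
        return zipped.length < encoded.length ? zipped : encoded;
    }

    public static void main(String[] args) {
        byte[] sparse = new byte[1000]; // long zero runs compress extremely well
        System.out.println(smaller(sparse).length < sparse.length);
    }
}
```

Below the threshold the deflate header and dictionary overhead can exceed any savings, so skipping the comparison for small inputs avoids wasted work.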
    • compressionForced

      public static boolean compressionForced
      Force the encoder to also add the compressed versions. Useful for testing.

      Visible for testing.

  • Constructor Details

    • OffsetSimultaneousEncoder

      public OffsetSimultaneousEncoder​(long lowWaterMark, java.lang.Long nextExpectedOffset, java.util.Set<java.lang.Long> incompleteOffsets)
  • Method Details

    • invoke

      public OffsetSimultaneousEncoder invoke()
The high water mark is already encoded in the metadata string by OffsetMapCodecManager.makeOffsetMetadataPayload(long, org.apache.kafka.common.TopicPartition, java.util.Set<java.lang.Long>), so encoding it again in the BitSet or run length may not be needed, or could be swapped

Simultaneously encodes the offset data with multiple strategies.

Conditionally encodes compression variants. Currently commented out is OffsetEncoding.ByteArray, as there doesn't seem to be an advantage over BitSet encoding.

      TODO: optimisation - inline this into the partition iteration loop in WorkManager

TODO: optimisation - could double the run-length range from Short.MAX_VALUE (~33,000) to Short.MAX_VALUE * 2 (~66,000) by using unsigned shorts instead (the highest representable relative offset is Short.MAX_VALUE, because each run-length entry is a Short)

TODO: VERY large offset ranges (Integer.MAX_VALUE) are slow - the encoding scans could be avoided by passing in a map of incompletes, which should already be known
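The unsigned-short idea in the second TODO can be sketched with java.nio and Short.toUnsignedInt: the same 16 bits go on the wire, but reading them back without sign extension raises the maximum run length from 32,767 to 65,535. The helper names are invented for this sketch:

```java
import java.nio.ByteBuffer;

public class UnsignedRunLengthDemo {
    // Write a run length as an unsigned 16-bit value (0..65535),
    // doubling the range of a signed short.
    static void writeRun(ByteBuffer buf, int runLength) {
        if (runLength < 0 || runLength > 0xFFFF) {
            throw new IllegalArgumentException("run too long: " + runLength);
        }
        buf.putShort((short) runLength); // stores the low 16 bits
    }

    // Read the 16 bits back without sign extension.
    static int readRun(ByteBuffer buf) {
        return Short.toUnsignedInt(buf.getShort());
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(2);
        writeRun(buf, 65000); // well beyond Short.MAX_VALUE (32767)
        buf.flip();
        System.out.println(readRun(buf)); // round-trips intact
    }
}
```

The payload format is unchanged; only the reader's interpretation of the high bit differs, which is why such a switch would need both encoder and decoder to agree on a version.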

    • packSmallest

      public byte[] packSmallest() throws EncodingNotSupportedException
      Select the smallest encoding, and pack it.
      Throws:
      EncodingNotSupportedException
      See Also:
      packEncoding(EncodedOffsetPair)
    • getIncompleteOffsets

      public java.util.Set<java.lang.Long> getIncompleteOffsets()
The offsets which have not yet been fully completed, and so cannot yet have their offset committed
    • getEncodingMap

      public java.util.Map<OffsetEncoding,​byte[]> getEncodingMap()
Map of the different encoding types for the same offset data, used to retrieve the encoded data for a given encoding type
    • getSortedEncodings

      public java.util.PriorityQueue<io.confluent.parallelconsumer.EncodedOffsetPair> getSortedEncodings()
Ordered set of the different encodings, used to quickly retrieve the most compressed encoding
      See Also:
      packSmallest()
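The "ordered by size" behaviour that lets packSmallest() pick a winner cheaply can be sketched with a plain PriorityQueue keyed on encoded length. The demo class, method, and byte sizes below are made up for illustration:

```java
import java.util.Comparator;
import java.util.Map;
import java.util.PriorityQueue;

public class SmallestEncodingDemo {
    // Order candidate encodings by serialized size so the head of the
    // queue is always the most compressed one, ready to be packed.
    static String smallest(Map<String, byte[]> encodings) {
        Comparator<Map.Entry<String, byte[]>> bySize =
                Comparator.comparingInt(e -> e.getValue().length);
        PriorityQueue<Map.Entry<String, byte[]>> sorted = new PriorityQueue<>(bySize);
        sorted.addAll(encodings.entrySet());
        return sorted.peek().getKey();
    }

    public static void main(String[] args) {
        System.out.println(smallest(Map.of(
                "BitSet", new byte[13],
                "RunLength", new byte[4],
                "BitSetCompressed", new byte[9]))); // smallest candidate wins
    }
}
```

A heap keeps selection O(1) at the head while insertion stays O(log n), which suits the "encode everything, then take the best" flow described above.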