Package io.confluent.parallelconsumer
Class OffsetSimultaneousEncoder
java.lang.Object
io.confluent.parallelconsumer.OffsetSimultaneousEncoder
public class OffsetSimultaneousEncoder
extends java.lang.Object
Encode with multiple strategies at the same time.
Results are stored in an accessible structure, making it easy to select the highest compression.
- See Also:
invoke()
-
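The class's two-step idea (encode the same completion state with several strategies, then select the smallest result) can be sketched in isolation. This is an illustrative, self-contained sketch, not the library's actual encoders: the method names, the little-endian short packing, and the two strategies shown (a completion bitmap and alternating run lengths) are assumptions for demonstration only.

```java
import java.io.ByteArrayOutputStream;
import java.util.BitSet;
import java.util.Set;

class MultiEncodeSketch {
    // Strategy 1: a bitmap of completion state, relative to the low water mark.
    static byte[] bitSetEncode(long lowWaterMark, long nextExpected, Set<Long> incompletes) {
        int length = (int) (nextExpected - lowWaterMark);
        BitSet bits = new BitSet(length);
        for (int i = 0; i < length; i++) {
            if (!incompletes.contains(lowWaterMark + i)) bits.set(i); // set = completed
        }
        return bits.toByteArray();
    }

    // Strategy 2: alternating run lengths of completed/incomplete offsets,
    // each run stored as a two-byte (short) little-endian value.
    static byte[] runLengthEncode(long lowWaterMark, long nextExpected, Set<Long> incompletes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean currentComplete = !incompletes.contains(lowWaterMark);
        int run = 0;
        for (long o = lowWaterMark; o < nextExpected; o++) {
            boolean complete = !incompletes.contains(o);
            if (complete == currentComplete) {
                run++;
            } else {
                out.write(run & 0xFF);
                out.write((run >> 8) & 0xFF);
                currentComplete = complete;
                run = 1;
            }
        }
        out.write(run & 0xFF);
        out.write((run >> 8) & 0xFF);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        long low = 100, next = 200;
        Set<Long> incompletes = Set.of(150L, 151L, 152L);
        byte[] bitset = bitSetEncode(low, next, incompletes);
        byte[] rle = runLengthEncode(low, next, incompletes);
        // With only one short gap, run-length wins; with scattered gaps, the bitmap would.
        System.out.println("bitset=" + bitset.length + " rle=" + rle.length);
    }
}
```

Which strategy wins depends on the shape of the data: few long runs favour run-length encoding, while many scattered incompletes favour the bitmap, which is why encoding both and comparing sizes is worthwhile.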
Field Summary
Fields
- static boolean compressionForced
  Force the encoder to also add the compressed versions.
- static int LARGE_INPUT_MAP_SIZE_THRESHOLD
  Size threshold in bytes after which compressing the encodings will be compared, as it is typically worth the extra compression step when the source array is beyond this size.
-
Constructor Summary
Constructors
- OffsetSimultaneousEncoder(long lowWaterMark, java.lang.Long nextExpectedOffset, java.util.Set<java.lang.Long> incompleteOffsets)
-
Method Summary
Methods
- java.util.Map<OffsetEncoding,byte[]> getEncodingMap()
  Map of different encoding types for the same offset data, used for retrieving the data for the encoding type.
- java.util.Set<java.lang.Long> getIncompleteOffsets()
  The offsets which have not yet been fully completed and so cannot have their offset committed.
- java.util.PriorityQueue<io.confluent.parallelconsumer.EncodedOffsetPair> getSortedEncodings()
  Ordered set of the different encodings, used to quickly retrieve the most compressed encoding.
- OffsetSimultaneousEncoder invoke()
  The highwater mark is already encoded in the string by OffsetMapCodecManager.makeOffsetMetadataPayload(long, org.apache.kafka.common.TopicPartition, java.util.Set<java.lang.Long>), so encoding the BitSet run length may not be needed, or could be swapped.
- byte[] packSmallest()
  Select the smallest encoding, and pack it.
-
Field Details
-
LARGE_INPUT_MAP_SIZE_THRESHOLD
public static final int LARGE_INPUT_MAP_SIZE_THRESHOLD
Size threshold in bytes after which compressing the encodings will be compared, as it is typically worth the extra compression step when the source array is beyond this size.
- See Also:
Constant Field Values
-
compressionForced
public static boolean compressionForced
Force the encoder to also add the compressed versions. Useful for testing (visible for testing).
-
-
Constructor Details
-
OffsetSimultaneousEncoder
public OffsetSimultaneousEncoder(long lowWaterMark, java.lang.Long nextExpectedOffset, java.util.Set<java.lang.Long> incompleteOffsets)
-
-
Method Details
-
invoke
public OffsetSimultaneousEncoder invoke()
The highwater mark is already encoded in the string by OffsetMapCodecManager.makeOffsetMetadataPayload(long, org.apache.kafka.common.TopicPartition, java.util.Set<java.lang.Long>), so encoding the BitSet run length may not be needed, or could be swapped.
Simultaneously encodes with each strategy, and conditionally encodes compressed variants. OffsetEncoding.ByteArray is currently commented out, as there doesn't seem to be an advantage over BitSet encoding.
TODO: optimisation - inline this into the partition iteration loop in WorkManager.
TODO: optimisation - could double the run-length range from Short.MAX_VALUE (~33,000) to Short.MAX_VALUE * 2 (~66,000) by using unsigned shorts instead (the highest representable relative offset is Short.MAX_VALUE, because each run-length entry is a Short).
TODO: VERY large offset ranges are slow (Integer.MAX_VALUE) - encoding scans could be avoided by passing in a map of incompletes, which should already be known.
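The unsigned-short trick in the TODO above can be sketched on its own. The helper names below are hypothetical, but the bit manipulation is standard Java: the run length is stored in the same two bytes either way, and decoding with Short.toUnsignedInt lifts the ceiling from 32,767 to 65,535.

```java
class UnsignedRunLength {
    // Pack a run length (0..65535) into a short; reject values that cannot fit.
    static short pack(int runLength) {
        if (runLength < 0 || runLength > 0xFFFF)
            throw new IllegalArgumentException("run too long: " + runLength);
        return (short) runLength; // narrowing keeps the low 16 bits
    }

    // Recover the original run length by reading the same bits as unsigned.
    static int unpack(short packed) {
        return Short.toUnsignedInt(packed);
    }

    public static void main(String[] args) {
        int run = 40_000; // larger than Short.MAX_VALUE (32,767)
        short stored = pack(run);
        System.out.println(unpack(stored)); // prints 40000
    }
}
```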
-
packSmallest
public byte[] packSmallest() throws EncodingNotSupportedException
Select the smallest encoding, and pack it.
- Throws:
EncodingNotSupportedException
- See Also:
packEncoding(EncodedOffsetPair)
-
getIncompleteOffsets
public java.util.Set<java.lang.Long> getIncompleteOffsets()
The offsets which have not yet been fully completed and so cannot have their offset committed.
-
getEncodingMap
public java.util.Map<OffsetEncoding,byte[]> getEncodingMap()
Map of different encoding types for the same offset data, used for retrieving the data for the encoding type.
-
getSortedEncodings
public java.util.PriorityQueue<io.confluent.parallelconsumer.EncodedOffsetPair> getSortedEncodings()
Ordered set of the different encodings, used to quickly retrieve the most compressed encoding.
- See Also:
packSmallest()
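The queue's behaviour can be illustrated with a small stand-in. Pair below is a hypothetical substitute for EncodedOffsetPair, and the comparator ordering by encoded size is an assumption; the point is only that a size-ordered PriorityQueue makes the most compact encoding available at the head in constant time.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class SortedEncodingsSketch {
    // Hypothetical stand-in for EncodedOffsetPair: an encoding name plus its bytes.
    record Pair(String encoding, byte[] data) {}

    // The head of a size-ordered queue is the most compressed encoding.
    static Pair smallest(PriorityQueue<Pair> sorted) {
        return sorted.peek();
    }

    public static void main(String[] args) {
        PriorityQueue<Pair> sorted =
                new PriorityQueue<>(Comparator.comparingInt((Pair p) -> p.data().length));
        sorted.add(new Pair("BitSet", new byte[13]));
        sorted.add(new Pair("RunLength", new byte[6]));
        System.out.println(smallest(sorted).encoding()); // prints RunLength
    }
}
```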
-