Class StringTracker


  • public final class StringTracker
    extends java.lang.Object
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static org.apache.datasketches.ArrayOfStringsSerDe ARRAY_OF_STRINGS_SER_DE  
      static int MAX_FREQUENT_ITEM_SIZE  
      static java.util.function.Function<java.lang.String,​java.util.List<java.lang.String>> TOKENIZER  
    • Constructor Summary

      Constructors 
      Constructor Description
      StringTracker()  
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      static StringTracker fromProtobuf​(com.whylogs.core.message.StringsMessage message)  
      StringTracker merge​(StringTracker other)
      Merge this StringTracker object with another.
      com.whylogs.core.message.StringsMessage.Builder toProtobuf()  
      void update​(java.lang.String value)
      Track statistical properties of characters in a string.
      void update​(java.lang.String value, java.lang.String charString)
      Track statistical properties of just the characters from a given character set.
      void update​(java.lang.String value, java.lang.String charString, java.util.function.Function<java.lang.String,​java.util.List<java.lang.String>> tokenizer)
      Track statistical properties of a string.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • TOKENIZER

        public static java.util.function.Function<java.lang.String,​java.util.List<java.lang.String>> TOKENIZER
      • ARRAY_OF_STRINGS_SER_DE

        public static final org.apache.datasketches.ArrayOfStringsSerDe ARRAY_OF_STRINGS_SER_DE
    • Constructor Detail

      • StringTracker

        public StringTracker()
    • Method Detail

      • update

        public void update​(java.lang.String value)
        Track statistical properties of characters in a string.

        `value` is a Unicode string. It is tokenized, and the tokens are passed to CharPosTracker, which tracks the position and frequency of Unicode code points within each token.

        Variants of this method allow the tokenizer and the tracked character set to be changed during updates. Unless overridden by one of the other update variants, this method uses a tokenizer that breaks strings at spaces and tracks lowercase alphanumeric characters.

        Parameters:
        value - Unicode string to be tracked
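
The default tokenizer is described above only as one that breaks strings at spaces. The following is a minimal sketch of a function with the same type as `TOKENIZER`, assuming a plain split on single space characters. The class name `SpaceTokenizerSketch` and the constant `SPACE_TOKENIZER` are hypothetical and are not part of the library; the actual default may treat repeated or leading spaces differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class SpaceTokenizerSketch {
    // Hypothetical stand-in for a space-based tokenizer (assumption, not the library's code).
    static final Function<String, List<String>> SPACE_TOKENIZER =
        s -> Arrays.asList(s.split(" "));

    public static void main(String[] args) {
        // Each space-separated word becomes one token.
        System.out.println(SPACE_TOKENIZER.apply("hello whylogs world"));
        // prints [hello, whylogs, world]
    }
}
```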
      • update

        public void update​(java.lang.String value,
                           java.lang.String charString)
        Track statistical properties of just the characters from a given character set.

        `value` is tokenized, and the position and frequency of Unicode code points within tokens are tracked if they appear in `charString`. If set, `charString` is also applied to subsequent calls to `update`, overriding the default character set.

        Parameters:
        value - Unicode string to be tracked
        charString - set of characters that should be tracked. All other characters are tracked as 'NITL'.
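
To illustrate the effect of `charString`, here is a minimal sketch that counts the code points of a single token and pools every character outside the given set under the key 'NITL'. It is not the CharPosTracker implementation: the class `CharSetSketch` and the method `countCodePoints` are hypothetical, and position tracking is left out.

```java
import java.util.Map;
import java.util.TreeMap;

class CharSetSketch {
    // Count code points of a token; characters not in charString are pooled under "NITL".
    static Map<String, Integer> countCodePoints(String token, String charString) {
        Map<String, Integer> counts = new TreeMap<>();
        token.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            String key = charString.contains(ch) ? ch : "NITL";
            counts.merge(key, 1, Integer::sum);
        });
        return counts;
    }

    public static void main(String[] args) {
        // Only 'a' and 'b' are in the tracked set; the digits fall into NITL.
        System.out.println(countCodePoints("a1b2a", "ab"));
        // prints {NITL=2, a=2, b=1}
    }
}
```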
      • update

        public void update​(java.lang.String value,
                           java.lang.String charString,
                           java.util.function.Function<java.lang.String,​java.util.List<java.lang.String>> tokenizer)
        Track statistical properties of a string. Allows control over characters to be tracked and tokenizer function.

        `value` is tokenized according to `tokenizer`. The position and frequency of Unicode code points within tokens are tracked if they appear in `charString`. If set, `charString` and/or `tokenizer` are also used for subsequent calls to `update`.

        Parameters:
        value - Unicode string to be tracked
        charString - set of characters that should be tracked. All other characters are tracked as 'NITL'.
        tokenizer - function taking a string and returning a list of strings
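
The `tokenizer` parameter accepts any `Function<String, List<String>>`. As a minimal sketch, a comma-based tokenizer might look like the following. The class `CustomTokenizerSketch` and the constant `COMMA_TOKENIZER` are hypothetical names chosen for illustration, and the choice to trim whitespace and drop empty tokens is an assumption, not a requirement of the API.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CustomTokenizerSketch {
    // Hypothetical tokenizer: split on commas, trim whitespace, drop empty tokens.
    static final Function<String, List<String>> COMMA_TOKENIZER =
        s -> Stream.of(s.split(","))
                   .map(String::trim)
                   .filter(t -> !t.isEmpty())
                   .collect(Collectors.toList());

    public static void main(String[] args) {
        System.out.println(COMMA_TOKENIZER.apply("red, green,,blue "));
        // prints [red, green, blue]
    }
}
```

A function of this shape has the type expected by the third argument of `update`.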
      • merge

        public StringTracker merge​(StringTracker other)
        Merge this StringTracker object with another. This merges the sketches as well.
        Parameters:
        other - the other StringTracker to merge
        Returns:
        a new StringTracker object
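
Because `merge` returns a new object, a natural reading is that neither input is modified, although the page does not say so explicitly. The sketch below illustrates that pattern with plain count maps in place of the sketches that StringTracker holds. `MergeSketch` and its `merge` method are hypothetical stand-ins, not the library's implementation.

```java
import java.util.HashMap;
import java.util.Map;

class MergeSketch {
    // Combine two count maps into a new map; both inputs are left unchanged.
    static Map<String, Long> merge(Map<String, Long> left, Map<String, Long> right) {
        Map<String, Long> merged = new HashMap<>(left);          // copy of left
        right.forEach((k, v) -> merged.merge(k, v, Long::sum));  // add counts from right
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Long> a = new HashMap<>();
        a.put("x", 1L);
        Map<String, Long> b = new HashMap<>();
        b.put("x", 2L);
        b.put("y", 5L);
        Map<String, Long> m = merge(a, b);
        System.out.println(m.get("x") + " " + m.get("y") + " " + a.get("x"));
        // prints 3 5 1
    }
}
```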
      • toProtobuf

        public com.whylogs.core.message.StringsMessage.Builder toProtobuf()
      • fromProtobuf

        public static StringTracker fromProtobuf​(com.whylogs.core.message.StringsMessage message)