Class StringTracker
- java.lang.Object
  - com.whylogs.core.statistics.datatypes.StringTracker
public final class StringTracker extends java.lang.Object
Field Summary
Fields (modifier and type, field):
- static org.apache.datasketches.ArrayOfStringsSerDe ARRAY_OF_STRINGS_SER_DE
- static int MAX_FREQUENT_ITEM_SIZE
- static java.util.function.Function<java.lang.String,java.util.List<java.lang.String>> TOKENIZER
-
Constructor Summary
Constructors:
- StringTracker()
-
Method Summary
Methods (modifier and type, method, description):
- static StringTracker fromProtobuf(com.whylogs.core.message.StringsMessage message)
- StringTracker merge(StringTracker other): Merge this StringTracker object with another.
- com.whylogs.core.message.StringsMessage.Builder toProtobuf()
- void update(java.lang.String value): Track statistical properties of characters in a string.
- void update(java.lang.String value, java.lang.String charString): Track statistical properties of just the characters from a given character set.
- void update(java.lang.String value, java.lang.String charString, java.util.function.Function<java.lang.String,java.util.List<java.lang.String>> tokenizer): Track statistical properties of a string.
-
Field Detail
-
TOKENIZER
public static java.util.function.Function<java.lang.String,java.util.List<java.lang.String>> TOKENIZER
-
ARRAY_OF_STRINGS_SER_DE
public static final org.apache.datasketches.ArrayOfStringsSerDe ARRAY_OF_STRINGS_SER_DE
-
MAX_FREQUENT_ITEM_SIZE
public static final int MAX_FREQUENT_ITEM_SIZE
- See Also:
- Constant Field Values
-
-
Method Detail
-
update
public void update(java.lang.String value)
Track statistical properties of characters in a string. `value` is a Unicode string. It is tokenized, and the tokens are passed to CharPosTracker, which tracks the position and frequency of Unicode code points in each token.
Variants of this method allow the tokenizer and the tracked character set to be changed during updates. Unless overridden by one of the other `update` methods, this method uses a tokenizer that breaks strings at spaces and tracks lowercase alphanumeric characters.
- Parameters:
value - string
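The default behaviour described above can be sketched with the standard library alone. This is an illustrative stand-in, not the StringTracker implementation: the class name `DefaultUpdateSketch`, the exact character set, and the choice to record raw positions per code point are all assumptions, and CharPosTracker's real aggregation is not reproduced. Characters outside the set are simply skipped here for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical stand-in for the default update(value) behaviour.
final class DefaultUpdateSketch {
    // Assumed default character set: lowercase ASCII letters and digits.
    static final String DEFAULT_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789";

    // Assumed default tokenizer: break the string at spaces.
    static final Function<String, List<String>> SPACE_TOKENIZER =
        s -> Arrays.asList(s.split(" "));

    // Code point -> positions (within a token) at which it was seen.
    final Map<Integer, List<Integer>> positions = new TreeMap<>();

    void update(String value) {
        for (String token : SPACE_TOKENIZER.apply(value)) {
            // Iterate by code point, not by char, so surrogate pairs count once.
            int[] codePoints = token.codePoints().toArray();
            for (int pos = 0; pos < codePoints.length; pos++) {
                int cp = codePoints[pos];
                if (DEFAULT_CHARS.indexOf(cp) >= 0) {
                    positions.computeIfAbsent(cp, k -> new ArrayList<>()).add(pos);
                }
            }
        }
    }
}
```

Positions restart at zero for every token, which is why the tokenizer choice changes the recorded statistics.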
-
update
public void update(java.lang.String value, java.lang.String charString)
Track statistical properties of just the characters from a given character set. `value` is tokenized, and the position and frequency of Unicode code points within tokens are tracked if they appear in `charString`. Once set, `charString` is applied to subsequent calls to `update`, overriding the default character set.
- Parameters:
value - Unicode string to be tracked
charString - set of characters that should be tracked; all others are tracked as 'NITL'
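A minimal sketch of the character-set filter and its persistence across calls, using only the standard library. The class `CharSetSketch`, its default set, and the plain frequency map are assumptions made for illustration; the real class tracks positions as well and may bucket 'NITL' differently.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for update(value, charString).
final class CharSetSketch {
    // Assumed default set; replaced whenever update(value, charString) is called.
    private String charString = "abcdefghijklmnopqrstuvwxyz0123456789";

    // Label (the character itself, or "NITL") -> number of occurrences.
    final Map<String, Integer> counts = new TreeMap<>();

    void update(String value) {
        for (String token : value.split(" ")) {
            token.codePoints().forEach(cp -> {
                // Code points outside the tracked set share the "NITL" bucket.
                String label = charString.indexOf(cp) >= 0
                    ? new String(Character.toChars(cp))
                    : "NITL";
                counts.merge(label, 1, Integer::sum);
            });
        }
    }

    void update(String value, String charString) {
        this.charString = charString; // stays in effect for later calls
        update(value);
    }
}
```

After `update("abc xyz", "ab")`, a later one-argument `update("ba")` in this sketch still tracks only `a` and `b`, mirroring the "applied to subsequent calls" rule stated above.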
-
update
public void update(java.lang.String value, java.lang.String charString, java.util.function.Function<java.lang.String,java.util.List<java.lang.String>> tokenizer)
Track statistical properties of a string. Allows control over the characters to be tracked and the tokenizer function. `value` is tokenized according to `tokenizer`. The position and frequency of Unicode code points within tokens are tracked if they appear in `charString`. Once set, `charString` and/or `tokenizer` are used for subsequent calls to `update`.
- Parameters:
value - string
charString - set of characters that should be tracked; all others are tracked as 'NITL'
tokenizer - function taking a string and returning a list of strings
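The `tokenizer` parameter is an ordinary `java.util.function.Function<String, List<String>>`, so any lambda of that shape fits. The two tokenizers below are hypothetical examples, not part of the library, and `TokenizerSketch` is an illustrative holder class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical tokenizers matching the Function<String, List<String>> shape.
final class TokenizerSketch {
    // Split on commas, trim each piece, and drop empty pieces.
    static final Function<String, List<String>> COMMA_TOKENIZER =
        s -> Arrays.stream(s.split(","))
                   .map(String::trim)
                   .filter(t -> !t.isEmpty())
                   .collect(Collectors.toList());

    // Split on runs of any whitespace rather than on single spaces.
    static final Function<String, List<String>> WHITESPACE_TOKENIZER =
        s -> Arrays.asList(s.trim().split("\\s+"));
}
```

Going by the signature above, such a function would be passed as the third argument of `update`, and would then remain in effect for later calls.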
-
merge
public StringTracker merge(StringTracker other)
Merge this StringTracker object with another. This merges the sketches as well.
- Parameters:
other - the other StringTracker to merge
- Returns:
a new StringTracker object
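The "returns a new object" contract can be illustrated with a plain frequency map standing in for the underlying sketches. `MergeSketch` is a hypothetical class; the real merge combines sketch structures rather than exact counts, and whether the two inputs are left unmodified is an assumption here.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in showing a merge that returns a fresh object.
final class MergeSketch {
    // Exact counts used in place of the real approximate sketches.
    final Map<String, Long> counts = new TreeMap<>();

    void update(String value) {
        counts.merge(value, 1L, Long::sum);
    }

    MergeSketch merge(MergeSketch other) {
        MergeSketch merged = new MergeSketch();
        merged.counts.putAll(this.counts);
        // Add the other side's counts; neither input map is modified.
        other.counts.forEach((k, v) -> merged.counts.merge(k, v, Long::sum));
        return merged;
    }
}
```

Returning a new object lets per-partition trackers be combined without invalidating the originals.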
-
toProtobuf
public com.whylogs.core.message.StringsMessage.Builder toProtobuf()
-
fromProtobuf
public static StringTracker fromProtobuf(com.whylogs.core.message.StringsMessage message)
-
-