Class CSVSampler
java.lang.Object
io.nosqlbench.virtdata.library.basics.shared.distributions.CSVSampler
- All Implemented Interfaces:
java.util.function.LongFunction<java.lang.String>
public class CSVSampler
extends java.lang.Object
implements java.util.function.LongFunction<java.lang.String>
This function is a toolkit version of the
WeightedStringsFromCSV function.
It is more capable and should be the preferred function for alias sampling over any CSV data.
This sampler uses a named column in the CSV data as the value. This is also referred to as the
labelColumn. The frequency of this label depends on the weight assigned to it in another named
CSV column, known as the weightColumn.
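To make the idea of alias sampling over labels and weights concrete, the following is a minimal sketch of an alias table (Vose's construction). It is not the CSVSampler implementation. The class name AliasTableSketch and its members are hypothetical, and the labels and weights are passed in directly rather than read from CSV columns.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration of alias sampling; not the library's code.
class AliasTableSketch {
    private final String[] labels;
    private final double[] prob;  // chance of keeping the label in slot i
    private final int[] alias;    // label index used when slot i is not kept

    AliasTableSketch(String[] labels, double[] weights) {
        int n = labels.length;
        this.labels = labels.clone();
        this.prob = new double[n];
        this.alias = new int[n];

        double total = 0.0;
        for (double w : weights) {
            total += w;
        }

        // Scale weights so that the average slot has weight 1.0.
        double[] scaled = new double[n];
        Deque<Integer> small = new ArrayDeque<>();
        Deque<Integer> large = new ArrayDeque<>();
        for (int i = 0; i < n; i++) {
            scaled[i] = weights[i] * n / total;
            if (scaled[i] < 1.0) small.push(i); else large.push(i);
        }

        // Pair each under-full slot with an over-full one.
        while (!small.isEmpty() && !large.isEmpty()) {
            int s = small.pop();
            int l = large.pop();
            prob[s] = scaled[s];
            alias[s] = l;
            scaled[l] = scaled[l] + scaled[s] - 1.0;
            if (scaled[l] < 1.0) small.push(l); else large.push(l);
        }

        // Leftover slots are exactly full (up to rounding) and alias to themselves.
        while (!large.isEmpty()) { int i = large.pop(); prob[i] = 1.0; alias[i] = i; }
        while (!small.isEmpty()) { int i = small.pop(); prob[i] = 1.0; alias[i] = i; }
    }

    // Constant-time lookup for a point on the unit interval [0.0, 1.0].
    String sample(double unit) {
        double scaledUnit = unit * prob.length;
        int slot = Math.min((int) scaledUnit, prob.length - 1);
        double fraction = scaledUnit - slot;
        return fraction < prob[slot] ? labels[slot] : labels[alias[slot]];
    }
}
```

With labels {"a", "b"} and weights {1.0, 3.0}, a sweep across the unit interval returns "a" for about a quarter of the points and "b" for the rest, and each lookup costs one array index plus one comparison regardless of how many labels there are.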
Combining duplicate labels
When your CSV data is not organized around the specific identifier that you want to sample by, you can use a combining function to tabulate the rows prior to sampling. In that case, you can use any of "sum", "avg", "count", "min", or "max" as the reducing function on the value in the weight column. If none is specified, then "sum" is used by default. All modes except "count" and "name" require a valid weight column to be specified.
- sum, avg, min, max - takes the given stat for the weight of each distinct label
- count - takes the number of occurrences of a given label as the weight
- name - sets the weight of all distinct labels to 1.0d
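The combining modes listed above can be sketched as a small reduction over (label, weight) pairs. This is an illustration only, assuming the rows have already been parsed. The class WeightReducerSketch and its reduce method are hypothetical names and are not part of the library.

```java
import java.util.DoubleSummaryStatistics;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the combining modes; not the library's code.
class WeightReducerSketch {

    // labels[i] and weights[i] together stand for one parsed CSV row.
    static Map<String, Double> reduce(String[] labels, double[] weights, String mode) {
        // Collect per-label statistics in first-seen order.
        Map<String, DoubleSummaryStatistics> stats = new LinkedHashMap<>();
        for (int i = 0; i < labels.length; i++) {
            stats.computeIfAbsent(labels[i], k -> new DoubleSummaryStatistics())
                 .accept(weights[i]);
        }

        // Turn each label's statistics into a single weight.
        Map<String, Double> reduced = new LinkedHashMap<>();
        for (Map.Entry<String, DoubleSummaryStatistics> e : stats.entrySet()) {
            DoubleSummaryStatistics s = e.getValue();
            double w;
            switch (mode) {
                case "sum":   w = s.getSum();     break;
                case "avg":   w = s.getAverage(); break;
                case "min":   w = s.getMin();     break;
                case "max":   w = s.getMax();     break;
                case "count": w = s.getCount();   break;
                case "name":  w = 1.0d;           break;
                default: throw new IllegalArgumentException("unknown mode: " + mode);
            }
            reduced.put(e.getKey(), w);
        }
        return reduced;
    }
}
```

For rows (a, 1.0), (b, 2.0), (a, 3.0), the label a reduces to 4.0 under "sum", 2.0 under "avg" and "count", and 1.0 under "name".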
Map vs Hash mode
As with some of the other statistical functions, you can use this one to pick through the sample values by using the map mode. This is distinct from the default hash mode. When map mode is used, the values will appear monotonically as you scan through the unit interval of all long values. (Specifically, 0L represents 0.0d on the unit interval, and Long.MAX_VALUE represents 1.0d.) This mode is only recommended for advanced scenarios and should otherwise be avoided. You will know if you need this mode.
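The difference between the two modes can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the class UnitIntervalSketch is hypothetical, the bit mixer in hashToUnit is a generic stand-in (the SplitMix64 finalizer) rather than whatever hash the library uses, and pick is a plain cumulative-weight lookup chosen only because it makes the monotonic behavior easy to see.

```java
// Hypothetical illustration of map mode vs hash mode; not the library's code.
class UnitIntervalSketch {

    // Map mode: scale a non-negative long directly onto [0.0, 1.0].
    static double mapToUnit(long value) {
        return (double) value / (double) Long.MAX_VALUE;
    }

    // Hash mode stand-in: scramble the bits first, then scale.
    static double hashToUnit(long value) {
        long z = value + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        z = z ^ (z >>> 31);
        return mapToUnit(z & Long.MAX_VALUE); // clear the sign bit
    }

    // Cumulative-weight lookup: labels appear in order as unit grows.
    static String pick(double unit, String[] labels, double[] weights) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        double cumulative = 0.0;
        for (int i = 0; i < labels.length; i++) {
            cumulative += weights[i] / total;
            if (unit < cumulative) {
                return labels[i];
            }
        }
        return labels[labels.length - 1]; // unit == 1.0 or rounding slack
    }
}
```

In this sketch, feeding increasing inputs through mapToUnit walks the labels in order, while hashToUnit scatters consecutive inputs across the interval, which is the behavior most workloads want by default.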
Constructor Summary
Constructors
Constructor: CSVSampler(java.lang.String labelColumn, java.lang.String weightColumn, java.lang.String... data)
Description: Build an efficient O(1) sampler for the given column values with respect to the weights, combining equal values by summing the weights.
Method Summary
Modifier and Type: java.lang.String
Method: apply(long value)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Details
CSVSampler
public CSVSampler(java.lang.String labelColumn, java.lang.String weightColumn, java.lang.String... data)
Build an efficient O(1) sampler for the given column values with respect to the weights, combining equal values by summing the weights.
- Parameters:
labelColumn - The CSV column name containing the value
weightColumn - The CSV column name containing a double weight
data - Sampling modes or file names. Any of map, hash, sum, avg, count are taken as configuration modes, and all others are taken as CSV filenames.
Method Details
apply
public java.lang.String apply(long value)
- Specified by:
apply in interface java.util.function.LongFunction<java.lang.String>
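Because the class implements java.util.function.LongFunction<java.lang.String>, an instance can be passed anywhere that interface is accepted. The sketch below shows this with the standard LongStream API. The class LongFunctionUsageSketch and its drive method are hypothetical helpers, and any LongFunction<String> (a lambda, for instance) can stand in for a real sampler when trying it out.

```java
import java.util.List;
import java.util.function.LongFunction;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Hypothetical helper showing how a LongFunction<String> can be driven.
class LongFunctionUsageSketch {

    // Applies the function to each long in [startInclusive, endExclusive).
    static List<String> drive(LongFunction<String> sampler,
                              long startInclusive, long endExclusive) {
        return LongStream.range(startInclusive, endExclusive)
                .mapToObj(sampler)
                .collect(Collectors.toList());
    }
}
```

A call such as drive(sampler, 0L, 1000L) would collect one sampled label per input value in input order.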