package aggregate


Type Members

  1. case class ApproxPercentileFromTDigestExpr(child: Expression, percentiles: Either[Double, Array[Double]], finalDataType: DataType) extends Expression with GpuExpression with ShimExpression with Product with Serializable

    This expression computes an approximate percentile using a t-digest as input.

    child

    Expression that produces the t-digests.

    percentiles

    Percentile scalar, or percentiles array to evaluate.

    finalDataType

    Data type for the results.

  2. trait CpuToGpuAggregateBufferConverter extends AnyRef
  3. trait CpuToGpuBufferTransition extends UnaryExpression with ShimUnaryExpression with CodegenFallback
  4. class CpuToGpuCollectBufferConverter extends CpuToGpuAggregateBufferConverter
  5. case class CpuToGpuCollectBufferTransition(child: Expression, elementType: DataType) extends UnaryExpression with CpuToGpuBufferTransition with Product with Serializable
  6. case class CpuToGpuPercentileBufferConverter(elementType: DataType) extends CpuToGpuAggregateBufferConverter with Product with Serializable

    Convert the incoming byte stream received from the Spark CPU into the internal histogram buffer format.

  7. case class CpuToGpuPercentileBufferTransition(child: Expression, elementType: DataType) extends UnaryExpression with CpuToGpuBufferTransition with Product with Serializable
  8. trait CudfAggregate extends Serializable
  9. class CudfCollectList extends CudfAggregate
  10. class CudfCollectSet extends CudfAggregate

    Spark handles NaN equality differently for non-nested float/double and for float/double nested in other types. For non-nested floats and doubles, NaN values are considered unequal, but when collecting sets of nested versions, NaNs are considered equal on the CPU. So we set NaNEquality dynamically in CudfCollectSet and CudfMergeSets. Note that dataType is ArrayType(child.dataType) here.

  11. class CudfCount extends CudfAggregate
  12. case class CudfHistogram(dataType: DataType) extends CudfAggregate with Product with Serializable
  13. class CudfM2 extends CudfAggregate
  14. class CudfMax extends CudfAggregate
  15. class CudfMaxBy extends CudfMaxMinByAggregate
  16. abstract class CudfMaxMinByAggregate extends CudfAggregate
  17. class CudfMean extends CudfAggregate

    This class is only used by the M2 class aggregates; do not confuse it with GpuAverage. In the future, this aggregate class should be removed and the mean values should be generated in the output of libcudf's M2 aggregate.

  18. case class CudfMergeHistogram(dataType: DataType) extends CudfAggregate with Product with Serializable
  19. class CudfMergeLists extends CudfAggregate
  20. class CudfMergeM2 extends CudfAggregate
  21. class CudfMergeSets extends CudfAggregate
  22. class CudfMin extends CudfAggregate
  23. class CudfMinBy extends CudfMaxMinByAggregate
  24. class CudfNthLikeAggregate extends CudfAggregate
  25. class CudfSum extends CudfAggregate
  26. class CudfTDigestMerge extends CudfAggregate
  27. class CudfTDigestUpdate extends CudfAggregate
  28. case class GpuAggregateExpression(origAggregateFunction: GpuAggregateFunction, mode: AggregateMode, isDistinct: Boolean, filter: Option[Expression], resultId: ExprId) extends Expression with GpuExpression with ShimExpression with GpuUnevaluable with Product with Serializable
  29. trait GpuAggregateFunction extends Expression with GpuExpression with ShimExpression with GpuUnevaluable

    Trait that all aggregate functions implement.

    Aggregates start with some input from the child plan or from another aggregate (or from itself if the aggregate is merging several batches).

    In general terms, an aggregate function can be in one of two modes of operation: update or merge. Either the function is aggregating raw input, or it is merging previously aggregated data. Normally, Spark breaks up the processing of the aggregate into two exec nodes (a partial aggregate and a final aggregate), and they are separated by a shuffle boundary. That is not true for all aggregates, especially when looking at other flavors of Spark. What doesn't change is the core function of updating or merging. Note that an aggregate can merge right after an update is performed, as we have cases where input batches are update-aggregated and then a bigger batch is built by merging together those pre-aggregated inputs.

    Aggregates have an interface to Spark and that is defined by aggBufferAttributes. This collection of attributes must match the Spark equivalent of the aggregate, so that if half of the aggregate (update or merge) executes on the CPU, we can be compatible. The GpuAggregateFunction adds special steps to ensure that it can produce (and consume) batches in the shape of aggBufferAttributes.

    The general transitions that are implemented in the aggregate function are as follows:

    1) inputProjection -> updateAggregates: inputProjection creates a sequence of values that are operated on by the updateAggregates. The length of inputProjection must be the same as that of updateAggregates, and the updateAggregates (cuDF aggregates) should be able to work with the product of the inputProjection (i.e. the types are compatible).

    2) updateAggregates -> postUpdate: after the cuDF update aggregate, a post-processing step can (optionally) be performed. postUpdate takes the output of updateAggregates and produces a batch whose column order and types match aggBufferAttributes.

    3) postUpdate -> preMerge: preMerge prepares batches before they go into the mergeAggregate. The preMerge step binds to aggBufferAttributes, so it can be used to transform a Spark-compatible batch into a batch that the cuDF merge aggregate expects. Its input has the same shape as that produced by postUpdate.

    4) mergeAggregates -> postMerge: postMerge optionally transforms the output of the cuDF merge aggregate in two situations: 1) it is used to match the aggBufferAttributes references for partial aggregates, where each partially aggregated batch is merged with AggHelper(merge=true); 2) in a final aggregate, the merged batches are transformed to what evaluateExpression expects. For simple aggregates like sum or count, evaluateExpression is just aggBufferAttributes, but for more complex aggregates it is an expression (see the GpuAverage and GpuM2 subclasses) that relies on the merge step producing columns in the shape of aggBufferAttributes.
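
The transitions above can be sketched end to end. The following is an illustrative Python model (hypothetical helper names, not the plugin's Scala API) that mirrors the update/merge stages for a simple average aggregate, with plain lists standing in for columnar batches.

```python
def input_projection(batch):
    # Project raw input into the values the update aggregate consumes.
    return [float(v) for v in batch]

def update_aggregate(projected):
    # cuDF-style update: reduce one batch to a partial result.
    return {"count": len(projected), "sum": sum(projected)}

def post_update(partial):
    # Reshape the update output to the agg-buffer layout (a no-op here).
    return partial

def pre_merge(buffer):
    # Prepare a Spark-shaped buffer for the cuDF merge aggregate (a no-op here).
    return buffer

def merge_aggregate(buffers):
    # Merge several pre-aggregated buffers into one.
    return {"count": sum(b["count"] for b in buffers),
            "sum": sum(b["sum"] for b in buffers)}

def evaluate(buffer):
    # Final evaluateExpression step, e.g. the mean for an average aggregate.
    return buffer["sum"] / buffer["count"]

batches = [[1, 2, 3], [4, 5]]
partials = [post_update(update_aggregate(input_projection(b))) for b in batches]
merged = merge_aggregate([pre_merge(p) for p in partials])
print(evaluate(merged))  # 3.0
```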

  30. case class GpuApproximatePercentile(child: Expression, percentageExpression: GpuLiteral, accuracyExpression: GpuLiteral = ...) extends Expression with GpuAggregateFunction with Product with Serializable

    The ApproximatePercentile function returns the approximate percentile(s) of a column at the given percentage(s). A percentile is a watermark value below which a given percentage of the column values fall. For example, the percentile of column col at percentage 50% is the median of column col.

    This function supports partial aggregation.

    The GPU implementation uses t-digest to perform the initial aggregation (see updateExpressions / mergeExpressions) and then applies the ApproxPercentileFromTDigestExpr expression to compute percentiles from the final t-digest (see evaluateExpression).

    There are two different data types involved here. The t-digests are a map of centroids (Map[mean: Double -> weight: Double]) represented as List[Struct[Double, Double]] and the final output is either a single double or an array of doubles, depending on whether the percentageExpression parameter is a single value or an array.

    child

    Child expression that can produce a column value with child.eval().

    percentageExpression

    Expression that represents a single percentage value or an array of percentage values. Each percentage value must be between 0.0 and 1.0.

    accuracyExpression

    Integer literal expression of approximation accuracy. A higher value yields better accuracy; the default value is DEFAULT_PERCENTILE_ACCURACY.
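
As a rough illustration of the final evaluation step, the sketch below (Python, hypothetical and much simpler than the real t-digest kernel, which interpolates between centroids) treats the digest as a list of (mean, weight) centroids sorted by mean and returns the mean of the centroid whose cumulative weight first reaches the requested percentile.

```python
def approx_percentile(centroids, percentile):
    # centroids: list of (mean, weight) pairs, sorted by mean.
    total = sum(w for _, w in centroids)
    target = percentile * total
    cumulative = 0.0
    for mean, weight in centroids:
        cumulative += weight
        if cumulative >= target:
            return mean
    return centroids[-1][0]

digest = [(1.0, 10.0), (5.0, 10.0), (9.0, 10.0)]
print(approx_percentile(digest, 0.5))  # 5.0
```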

  31. case class GpuAssembleSumChunks(chunkAttrs: Seq[AttributeReference], dataType: DecimalType, nullOnOverflow: Boolean) extends Expression with GpuExpression with ShimExpression with Product with Serializable

    Reassembles a 128-bit value from four separate 64-bit sum results.

    chunkAttrs

    attributes for the four 64-bit sum chunks ordered from least significant to most significant

    dataType

    output type of the reconstructed 128-bit value

    nullOnOverflow

    whether to produce null on overflows

  32. abstract class GpuAverage extends Expression with GpuAggregateFunction with GpuReplaceWindowFunction with Serializable
  33. case class GpuBasicAverage(child: Expression, dt: DataType) extends GpuAverage with Product with Serializable
  34. case class GpuBasicDecimalAverage(child: Expression, dt: DecimalType) extends GpuDecimalAverage with Product with Serializable
  35. case class GpuBasicDecimalSum(child: Expression, dt: DecimalType, failOnErrorOverride: Boolean) extends GpuDecimalSum with Product with Serializable

    Sum aggregations for decimals up to and including DECIMAL64

  36. case class GpuBasicMax(child: Expression) extends GpuMax with Product with Serializable

    Max aggregation without NaN handling.

  37. case class GpuBasicMin(child: Expression) extends GpuMin with Product with Serializable

    Min aggregation without NaN handling.

  38. case class GpuBasicSum(child: Expression, resultType: DataType, failOnErrorOverride: Boolean) extends GpuSum with Product with Serializable

    Sum aggregation for non-decimal types

  39. case class GpuCheckOverflowAfterSum(data: Expression, isEmpty: Expression, dataType: DecimalType, nullOnOverflow: Boolean) extends Expression with GpuExpression with ShimExpression with Product with Serializable

    This is equivalent to what Spark does after a sum to check for overflow: If(isEmpty, Literal.create(null, resultType), CheckOverflowInSum(sum, d, !SQLConf.get.ansiEnabled)).

    But we rename it to avoid confusion with the overflow detection we do as part of the sum itself, which takes the place of the overflow checking that happens with add.

  40. trait GpuCollectBase extends Expression with GpuAggregateFunction with GpuDeterministicFirstLastCollectShim with GpuAggregateWindowFunction
  41. case class GpuCollectList(child: Expression, mutableAggBufferOffset: Int = 0, inputAggBufferOffset: Int = 0) extends Expression with GpuCollectBase with Product with Serializable

    Collects and returns a list of non-unique elements.

    The two 'offset' parameters are not used by the GPU version, but are here for compatibility with the CPU version and automated checks.

  42. case class GpuCollectSet(child: Expression, mutableAggBufferOffset: Int = 0, inputAggBufferOffset: Int = 0) extends Expression with GpuCollectBase with GpuUnboundedToUnboundedWindowAgg with Product with Serializable

    Collects and returns a set of unique elements.

    The two 'offset' parameters are not used by the GPU version, but are here for compatibility with the CPU version and automated checks.

  43. case class GpuCount(children: Seq[Expression], failOnError: Boolean = SQLConf.get.ansiEnabled) extends Expression with GpuAggregateFunction with GpuBatchedRunningWindowWithFixer with GpuUnboundToUnboundWindowWithFixer with GpuAggregateWindowFunction with GpuRunningWindowFunction with Product with Serializable
  44. case class GpuCreateHistogramIfValid(valuesExpr: Expression, frequenciesExpr: Expression, isReduction: Boolean, outputType: DataType) extends Expression with GpuExpression with ShimExpression with Product with Serializable

    Create a histogram buffer from the input values and frequencies.

    The frequencies are also checked to ensure that they are non-negative. If a negative frequency exists, an exception will be thrown.

  45. case class GpuDecimal128Average(child: Expression, dt: DecimalType) extends GpuDecimalAverage with Product with Serializable

    Average aggregations for DECIMAL128.

    To avoid the significantly slower sort-based aggregations in cudf for DECIMAL128 columns, the incoming DECIMAL128 values are split into four 32-bit chunks which are summed separately into 64-bit intermediate results and then recombined into a 128-bit result with overflow checking. See GpuDecimal128Sum for more details.

  46. case class GpuDecimal128Sum(child: Expression, dt: DecimalType, failOnErrorOverride: Boolean, forceWindowSumToNotBeReplaced: Boolean) extends GpuDecimalSum with GpuReplaceWindowFunction with Product with Serializable

    Sum aggregations for DECIMAL128.

    The sum aggregation is performed by splitting the original 128-bit values into 32-bit "chunks" and summing those. The chunking accomplishes two things. First, it helps avoid cudf resorting to a much slower aggregation since currently DECIMAL128 sums are only implemented for sort-based aggregations. Second, chunking allows detection of overflows.

    The chunked approach to sum aggregation works as follows. The 128-bit value is split into its four 32-bit chunks, with the most significant chunk being an INT32 and the remaining three chunks being UINT32. When these are sum aggregated, cudf will implicitly upscale the accumulated result to a 64-bit value. Since cudf only allows up to 2**31 rows to be aggregated at a time, the "extra" upper 32-bits of the upscaled 64-bit accumulation values will be enough to hold the worst-case "carry" bits from summing each 32-bit chunk.

    After the cudf aggregation has completed, the four 64-bit chunks are reassembled into a 128-bit value. The lowest 32-bits of the least significant 64-bit chunk are used directly as the lowest 32-bits of the final value, and the remaining 32-bits are added to the next most significant 64-bit chunk. The lowest 32-bits of that chunk then become the next 32-bits of the 128-bit value and the remaining 32-bits are added to the next 64-bit chunk, and so on. Finally after the 128-bit value is constructed, the remaining "carry" bits of the most significant chunk after reconstruction are checked against the sign bit of the 128-bit result to see if there was an overflow.
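
The chunking and reassembly described above can be modeled in a few lines. This Python sketch is an assumption-laden model, not the GPU code (which operates column-wise on whole batches): it splits signed 128-bit integers into 32-bit chunks, sums each chunk position into a wide accumulator, reassembles the result with carry propagation, and flags overflow when the final carry disagrees with the sign of the reassembled value.

```python
MASK32 = (1 << 32) - 1

def split_chunks(value):
    # Two's-complement 128-bit representation, least significant chunk first.
    u = value & ((1 << 128) - 1)
    return [(u >> (32 * i)) & MASK32 for i in range(4)]

def chunked_sum(values):
    # Sum each chunk position separately (64-bit accumulators in the real code).
    acc = [0, 0, 0, 0]
    for v in values:
        chunks = split_chunks(v)
        # The most significant chunk is signed: sign-extend before accumulating.
        chunks[3] = chunks[3] - (1 << 32) if chunks[3] >> 31 else chunks[3]
        for i in range(4):
            acc[i] += chunks[i]
    # Reassemble: the low 32 bits of each accumulator feed the result and the
    # remaining bits carry into the next more significant accumulator.
    result, carry = 0, 0
    for i in range(4):
        total = acc[i] + carry
        result |= (total & MASK32) << (32 * i)
        carry = total >> 32
    # Interpret the 128 bits as signed; overflow if the leftover carry does
    # not match the sign of the result.
    signed = result - (1 << 128) if result >> 127 else result
    overflowed = carry != (-1 if signed < 0 else 0)
    return signed, overflowed

print(chunked_sum([1, 2, 3]))        # (6, False)
print(chunked_sum([2**127 - 1, 1]))  # overflow: second element is True
```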

  47. abstract class GpuDecimalAverage extends GpuDecimalAverageBase
  48. abstract class GpuDecimalAverageBase extends GpuAverage
  49. abstract class GpuDecimalSum extends GpuSum
  50. case class GpuDecimalSumHighDigits(input: Expression, originalInputType: DecimalType) extends Expression with GpuExpression with ShimExpression with Product with Serializable

    This extracts the highest digits from a Decimal value as a part of doing a SUM.

  51. case class GpuExtractChunk32(data: Expression, chunkIdx: Int, replaceNullsWithZero: Boolean) extends Expression with GpuExpression with ShimExpression with Product with Serializable

    Extracts a 32-bit chunk from a 128-bit value.

    data

    expression producing 128-bit values

    chunkIdx

    index of chunk to extract (0-3)

    replaceNullsWithZero

    whether to replace nulls with zero

  52. case class GpuFirst(child: Expression, ignoreNulls: Boolean) extends Expression with GpuAggregateFunction with GpuBatchedRunningWindowWithFixer with GpuAggregateWindowFunction with GpuDeterministicFirstLastCollectShim with ImplicitCastInputTypes with Serializable with Product
  53. case class GpuFloatMax(child: Expression) extends GpuMax with GpuReplaceWindowFunction with Product with Serializable

    Max aggregation for FloatType and DoubleType that handles NaNs.

    In Spark, NaN is the largest float value, but in cuDF the result of a calculation involving NaN is undefined. We use a workaround here to match Spark's behaviour. The high-level idea is that, in the projection stage, we create another column, isNan. If any value in this column is true, we return NaN; else, we return what GpuBasicMax returns.
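
The workaround can be sketched as follows (illustrative Python, with None standing in for null, not the columnar GPU implementation):

```python
import math

def gpu_float_max(values):
    # Ignore nulls, as Spark's max does.
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None
    # The "isNan" projection: any NaN wins, since Spark treats NaN as largest.
    if any(math.isnan(v) for v in non_null):
        return math.nan
    # Otherwise fall back to the basic (NaN-free) max.
    return max(non_null)

print(gpu_float_max([1.0, 3.0, 2.0]))       # 3.0
print(gpu_float_max([1.0, math.nan, 2.0]))  # nan
```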

  54. case class GpuFloatMin(child: Expression) extends GpuMin with GpuReplaceWindowFunction with Product with Serializable

    GpuMin for FloatType and DoubleType that handles NaNs.

    In Spark, NaN is the largest float value, but in cuDF the result of a calculation involving NaN is undefined. We use a workaround here to match Spark's behaviour. The high-level idea is:

    1. If the column contains only NaNs or nulls: if it contains a NaN, return NaN; else, return null.
    2. Otherwise: replace all NaNs with nulls, then use the cuDF kernel to find the min value.
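
A corresponding sketch of the min workaround (illustrative Python, with None standing in for null):

```python
import math

def gpu_float_min(values):
    non_null = [v for v in values if v is not None]
    # "Replace all NaNs with nulls" step: keep only non-NaN values.
    finite = [v for v in non_null if not math.isnan(v)]
    if not finite:
        # Column held only NaNs and/or nulls.
        if any(math.isnan(v) for v in non_null):
            return math.nan
        return None
    # NaN is the largest value in Spark, so it never wins a min.
    return min(finite)

print(gpu_float_min([3.0, math.nan, 1.0]))  # 1.0
print(gpu_float_min([math.nan, None]))      # nan
```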

  55. case class GpuLast(child: Expression, ignoreNulls: Boolean) extends Expression with GpuAggregateFunction with GpuBatchedRunningWindowWithFixer with GpuAggregateWindowFunction with GpuDeterministicFirstLastCollectShim with ImplicitCastInputTypes with Serializable with Product
  56. abstract class GpuM2 extends Expression with GpuAggregateFunction with ImplicitCastInputTypes with Serializable

    Base class for overriding standard deviation and variance aggregations. This is also a GPU-based implementation of 'CentralMomentAgg' aggregation class in Spark with the fixed 'momentOrder' variable set to '2'.

  57. abstract class GpuMax extends Expression with GpuAggregateFunction with GpuBatchedRunningWindowWithFixer with GpuUnboundToUnboundWindowWithFixer with GpuAggregateWindowFunction with GpuRunningWindowFunction with Serializable
  58. case class GpuMaxBy(valueExpr: Expression, orderingExpr: Expression) extends GpuMaxMinByBase with Product with Serializable
  59. abstract class GpuMaxMinByBase extends Expression with GpuAggregateFunction with Serializable
  60. abstract class GpuMin extends Expression with GpuAggregateFunction with GpuBatchedRunningWindowWithFixer with GpuUnboundToUnboundWindowWithFixer with GpuAggregateWindowFunction with GpuRunningWindowFunction with Serializable
  61. case class GpuMinBy(valueExpr: Expression, orderingExpr: Expression) extends GpuMaxMinByBase with Product with Serializable
  62. case class GpuNthValue(child: Expression, offset: Expression, ignoreNulls: Boolean) extends Expression with GpuAggregateWindowFunction with GpuBatchedRunningWindowWithFixer with ImplicitCastInputTypes with Serializable with Product
  63. abstract class GpuPercentile extends Expression with GpuAggregateFunction with Serializable
  64. case class GpuPercentileDefault(childExpr: Expression, percentage: GpuLiteral, isReduction: Boolean) extends GpuPercentile with Product with Serializable

    Compute percentiles from just the input values.

  65. case class GpuPercentileEvaluation(childExpr: Expression, percentage: Either[Double, Array[Double]], outputType: DataType, isReduction: Boolean) extends Expression with GpuExpression with ShimExpression with Product with Serializable

    Perform the final evaluation step to compute percentiles from histograms.

  66. case class GpuPercentileWithFrequency(childExpr: Expression, percentage: GpuLiteral, frequencyExpr: Expression, isReduction: Boolean) extends GpuPercentile with Product with Serializable

    Compute percentiles from the input values associated with frequencies.

  67. case class GpuPivotFirst(pivotColumn: Expression, valueColumn: Expression, pivotColumnValues: Seq[Any]) extends Expression with GpuAggregateFunction with Product with Serializable
  68. case class GpuReplaceNullmask(input: Expression, mask: Expression) extends Expression with GpuExpression with ShimExpression with Product with Serializable
  69. case class GpuStddevPop(child: Expression, nullOnDivideByZero: Boolean) extends GpuM2 with Product with Serializable
  70. case class GpuStddevSamp(child: Expression, nullOnDivideByZero: Boolean) extends GpuM2 with GpuReplaceWindowFunction with Product with Serializable
  71. abstract class GpuSum extends Expression with GpuAggregateFunction with ImplicitCastInputTypes with GpuBatchedRunningWindowWithFixer with GpuAggregateWindowFunction with GpuRunningWindowFunction with Serializable
  72. trait GpuToCpuAggregateBufferConverter extends AnyRef
  73. trait GpuToCpuBufferTransition extends UnaryExpression with ShimUnaryExpression with CodegenFallback
  74. class GpuToCpuCollectBufferConverter extends GpuToCpuAggregateBufferConverter
  75. case class GpuToCpuCollectBufferTransition(child: Expression) extends UnaryExpression with GpuToCpuBufferTransition with Product with Serializable
  76. case class GpuToCpuPercentileBufferConverter(elementType: DataType) extends GpuToCpuAggregateBufferConverter with Product with Serializable

    Convert the internal histogram buffer into a byte stream that can be deserialized by Spark CPU.

  77. case class GpuToCpuPercentileBufferTransition(child: Expression, elementType: DataType) extends UnaryExpression with GpuToCpuBufferTransition with Product with Serializable
  78. case class GpuVariancePop(child: Expression, nullOnDivideByZero: Boolean) extends GpuM2 with Product with Serializable
  79. case class GpuVarianceSamp(child: Expression, nullOnDivideByZero: Boolean) extends GpuM2 with Product with Serializable
  80. case class WindowStddevSamp(child: Expression, nullOnDivideByZero: Boolean) extends Expression with GpuAggregateWindowFunction with Product with Serializable
  81. case class WrappedAggFunction(aggregateFunction: GpuAggregateFunction, filter: Expression) extends Expression with GpuAggregateFunction with Product with Serializable

Value Members

  1. object CudfAll

    Check if all values in a boolean column are true. The cuDF all aggregation does not work for reductions or group-by aggregations, so we use Min as a workaround.

  2. object CudfAny

    Check if there is a true value in a boolean column. The cuDF any aggregation does not work for reductions or group-by aggregations, so we use Max as a workaround.
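
The Min/Max workaround used by both CudfAll and CudfAny falls out of boolean ordering: with false < true, min over a boolean column behaves like "all" and max like "any". A quick Python illustration:

```python
col = [True, False, True]

# min is False exactly when some value is False, i.e. min == all.
print(min(col), all(col))  # False False

# max is True exactly when some value is True, i.e. max == any.
print(max(col), any(col))  # True True
```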

  3. object CudfMaxMinBy
  4. object CudfNthLikeAggregate extends Serializable
  5. object CudfTDigest
  6. object GpuAverage extends Serializable
  7. object GpuDecimalSumOverflow

    All decimal processing in Spark has overflow detection as a part of it. Either it replaces the value with a null in non-ANSI mode, or it throws an exception in ANSI mode. Spark will also do the processing for larger values as Decimal values which are based on BigDecimal and have unbounded precision. So in most cases it is impossible to overflow/underflow so much that an incorrect value is returned. Spark will just use more and more memory to hold the value and then check for overflow at some point when the result needs to be turned back into a 128-bit value.

    We cannot do the same thing. Instead we take the following strategies to detect overflow.

    1. For decimal values with a precision of 8 or under, we follow Spark and do the SUM on the unscaled value as a long, and then bit-cast the result back to a Decimal value. This means that we can SUM 174,467,442,481 maximum or minimum decimal values with a precision of 8 before overflow can no longer be detected. It is much higher for decimal values with a smaller precision.

    2. For decimal values with a precision from 9 to 20 inclusive, we sum them as 128-bit values. This is very similar to the first strategy. The main differences are that we use a 128-bit value when doing the sum, and we check for overflow after processing each batch. In the case of group-by and reduction, that happens after the update stage and also after each merge stage. This gives us enough room that we can always detect overflow when summing a single batch, even on a merge where we could be doing the aggregation on a batch that has all max output values in it.

    3. For values with a precision from 21 to 28 inclusive, we have enough room to not check for overflow on the update aggregation, but for the merge aggregation we need to do some extra checks. This is done by taking the digits above 28 and summing them separately. We then check to see if they would have overflowed the original limits. This lets us detect overflow in cases where the original value would have wrapped around. The reason this works is that we have a hard limit on the maximum number of values in a single batch being processed: Int.MaxValue, or about 2.2 billion values. So we use a precision on the higher values that is large enough to handle 2.2 billion values and still detect overflow. This equates to a precision of about 10 more than is needed to hold the higher digits. This effectively gives us unlimited overflow detection.

    4. For anything with a precision larger than 28, we do the same overflow detection as strategy 3, but also do it on the update aggregation. This lets us fully detect overflows in any stage of an aggregation.

    Note that for Window operations either there is no merge stage or it only has a single value being merged into a batch instead of an entire batch being merged together. This lets us handle the overflow detection with what is built into GpuAdd.
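
Strategy 1 above can be sketched as follows (illustrative Python; the real code bit-casts the long result back to a Decimal, and the values here are just for the example):

```python
# Signed 64-bit range that the unscaled long sum must stay within.
INT64_MIN, INT64_MAX = -(1 << 63), (1 << 63) - 1

def sum_unscaled(unscaled_values):
    """Sum small-precision decimals on their unscaled long representation,
    returning None (null in non-ANSI mode) once the 64-bit range is left."""
    total = 0
    for v in unscaled_values:
        total += v
        if not (INT64_MIN <= total <= INT64_MAX):
            return None  # overflow detected
    return total

# Three decimal(8, 2) values of 123456.78, stored unscaled as 12345678.
print(sum_unscaled([12345678] * 3))  # 37037034
print(sum_unscaled([INT64_MAX, 1]))  # None (overflow)
```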

  8. object GpuMax extends Serializable
  9. object GpuMin extends Serializable
  10. object GpuPercentile extends Serializable
  11. object GpuSum extends Serializable
