org.apache.spark.sql.rapids.aggregate

GpuAggregateFunction

trait GpuAggregateFunction extends Expression with GpuExpression with ShimExpression with GpuUnevaluable

Trait that all aggregate functions implement.

Aggregates start with some input from the child plan or from another aggregate (or from itself if the aggregate is merging several batches).

In general terms, an aggregate function operates in one of two modes: update or merge. Either the function is aggregating raw input, or it is merging previously aggregated data. Normally, Spark splits the processing of an aggregate across two exec nodes (a partial aggregate and a final aggregate), and these are separated by a shuffle boundary. That is not true for all aggregates, especially when looking at other flavors of Spark. What doesn't change is the core function of updating or merging. Note that an aggregate can merge right after an update is performed, because there are cases where input batches are update-aggregated and then a bigger batch is built by merging those pre-aggregated inputs together.

Aggregates have an interface to Spark and that is defined by aggBufferAttributes. This collection of attributes must match the Spark equivalent of the aggregate, so that if half of the aggregate (update or merge) executes on the CPU, we can be compatible. The GpuAggregateFunction adds special steps to ensure that it can produce (and consume) batches in the shape of aggBufferAttributes.

The general transitions that are implemented in the aggregate function are as follows:

1) inputProjection -> updateAggregates: inputProjection creates a sequence of values that are operated on by updateAggregates. The length of inputProjection must be the same as that of updateAggregates, and updateAggregates (cuDF aggregates) should be able to work with the product of inputProjection (i.e. the types are compatible).

2) updateAggregates -> postUpdate: after the cuDF update aggregate, a post-processing step can (optionally) be performed. postUpdate takes the output of updateAggregates and must produce columns that match the order and types specified in aggBufferAttributes.

3) postUpdate -> preMerge: preMerge prepares batches before they go into mergeAggregates. The preMerge step binds to aggBufferAttributes, so it can be used to transform a Spark-compatible batch into a batch that the cuDF merge aggregate expects. Its input has the same shape as that produced by postUpdate.

4) mergeAggregates -> postMerge: postMerge optionally transforms the output of the cuDF merge aggregate in two situations: (1) in partial aggregates, where each partially aggregated batch is merged with AggHelper(merge=true), the step is used to match the aggBufferAttributes references; (2) in a final aggregate, the merged batches are transformed to what evaluateExpression expects. For simple aggregates like sum or count, evaluateExpression is just aggBufferAttributes, but for more complex aggregates it is an expression (see the GpuAverage and GpuM2 subclasses) that relies on the merge step producing columns in the shape of aggBufferAttributes.
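
The four transitions above can be illustrated with a minimal sketch that models an average using plain Scala collections. This is not the spark-rapids API: `Buffer`, `update`, `merge`, `evaluate`, and `run` are hypothetical names chosen to mirror the terms in the list, and the (sum, count) buffer layout is an assumption made for illustration.

```scala
// Plain-Scala model of the update -> merge -> evaluate flow for an average.
// Nothing here is the real spark-rapids API; names only mirror the doc's terms.
object AggregatePipelineSketch {
  // Stand-in for a batch in the shape of aggBufferAttributes: (sum, count).
  final case class Buffer(sum: Double, count: Long)

  // inputProjection: derive one input value per update aggregate from a raw row.
  def inputProjection(row: Double): (Double, Long) = (row, 1L)

  // updateAggregates: reduce the projected raw input of a single batch.
  def update(batch: Seq[Double]): Buffer = {
    val projected = batch.map(inputProjection)
    Buffer(projected.map(_._1).sum, projected.map(_._2).sum)
  }

  // postUpdate and preMerge are pass-throughs in this sketch.
  def postUpdate(b: Buffer): Buffer = b
  def preMerge(b: Buffer): Buffer = b

  // mergeAggregates: combine previously aggregated buffers.
  def merge(buffers: Seq[Buffer]): Buffer =
    Buffer(buffers.map(_.sum).sum, buffers.map(_.count).sum)

  // postMerge is also a pass-through here.
  def postMerge(b: Buffer): Buffer = b

  // evaluateExpression: final result computed from the merged buffer.
  def evaluate(b: Buffer): Option[Double] =
    if (b.count == 0L) None else Some(b.sum / b.count)

  // Update each batch independently, then merge the partial results.
  def run(batches: Seq[Seq[Double]]): Option[Double] = {
    val partials = batches.map(b => preMerge(postUpdate(update(b))))
    evaluate(postMerge(merge(partials)))
  }
}
```

In this sketch `merge` consumes buffers of the same shape that `update` produces, which is why the same merge step can be reused to combine pre-aggregated batches before the final evaluation.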

Linear Supertypes
GpuUnevaluable, ShimExpression, GpuExpression, Expression, TreeNode[Expression], TreePatternBits, Product, Equals, AnyRef, Any

Abstract Value Members

  1. abstract def aggBufferAttributes: Seq[AttributeReference]

    This is the contract with the outside world. It describes what the output of postUpdate should look like, and what the input to preMerge looks like. It also describes what the output of postMerge must look like.

  2. abstract def canEqual(that: Any): Boolean
    Definition Classes
    Equals
  3. abstract def children: Seq[Expression]
    Definition Classes
    TreeNode
  4. abstract def dataType: DataType
    Definition Classes
    Expression
  5. abstract val evaluateExpression: Expression

    This takes the output of postMerge and computes the final result of the aggregation.

    Note

    evaluateExpression is bound to aggBufferAttributes, so the references used in evaluateExpression must also be used in aggBufferAttributes.

  6. abstract val initialValues: Seq[Expression]

    These are values that Spark calls initial because it uses them to initialize the aggregation buffer, and returns them in case of an empty aggregate when there are no expressions.

    In our case they are only used in a very specific case: the empty input reduction case. In this case we don't have input to reduce, but we do have reduction functions, so each reduction function's initialValues is invoked to populate a single row of output.
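
The empty-input reduction case can be sketched in plain Scala as follows. `Reduction` and `reduceAll` are hypothetical names, not part of the plugin, and the use of `Option[Long]` to stand in for a nullable value is an assumption made for illustration.

```scala
// Sketch of the empty-input reduction: with no rows to reduce, each
// reduction's initialValues populate a single output row.
object EmptyReductionSketch {
  // Hypothetical stand-in for a reduction function; None models a null result.
  final case class Reduction(
      name: String,
      initialValues: Seq[Option[Long]],
      reduce: Seq[Long] => Seq[Option[Long]])

  // count of no rows is 0; sum of no rows is null in SQL semantics.
  val count: Reduction = Reduction("count", Seq(Some(0L)), in => Seq(Some(in.size.toLong)))
  val sum: Reduction = Reduction("sum", Seq(None), in => Seq(Some(in.sum)))

  // With empty input, build the single output row from initialValues;
  // otherwise run each reduction over the input.
  def reduceAll(input: Seq[Long], reductions: Seq[Reduction]): Seq[Option[Long]] =
    if (input.isEmpty) reductions.flatMap(_.initialValues)
    else reductions.flatMap(r => r.reduce(input))
}
```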

  7. abstract val inputProjection: Seq[Expression]

    Using the child reference, define the shape of input batches sent to the update expressions.

    Note

    This can be thought of as a "pre-update" step, since update consumes its output in order.

  8. abstract val mergeAggregates: Seq[CudfAggregate]

    merge: second half of the aggregation. Also used to merge multiple batches in the update or merge stages. These cuDF aggregates consume the output of preMerge. The sequence of CudfAggregate must match the shape of aggBufferAttributes, and care must be taken to ensure that each cuDF aggregate is able to work with the corresponding input (i.e. aggBufferAttributes[i] is the input to mergeAggregates[i]). If a transformation is required, preMerge can be used to mutate the batches before they arrive at mergeAggregates.

  9. abstract def nullable: Boolean
    Definition Classes
    Expression
  10. abstract def productArity: Int
    Definition Classes
    Product
  11. abstract def productElement(n: Int): Any
    Definition Classes
    Product
  12. abstract val updateAggregates: Seq[CudfAggregate]

    update: first half of the aggregation. The sequence of CudfAggregate must match the shape of inputProjection, and care must be taken to ensure that each cuDF aggregate is able to work with the corresponding inputProjection (i.e. inputProjection[i] is the input to updateAggregates[i]).
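
The positional contract between inputProjection and updateAggregates can be pictured as a simple validation. This is a sketch under stated assumptions: `ColType`, `Projection`, `Aggregate`, and `validate` are hypothetical and do not exist in the plugin, which does not necessarily perform such a check.

```scala
// Sketch of the rule "inputProjection[i] is the input to updateAggregates[i]".
object UpdateAlignmentSketch {
  sealed trait ColType
  case object IntCol extends ColType
  case object DoubleCol extends ColType
  case object StringCol extends ColType

  // Hypothetical stand-ins: a projected input column, and an aggregate
  // that declares which input types it can consume.
  final case class Projection(name: String, colType: ColType)
  final case class Aggregate(name: String, accepts: Set[ColType])

  // Returns a list of problems; an empty list means every projection at
  // position i feeds a compatible aggregate at position i.
  def validate(proj: Seq[Projection], aggs: Seq[Aggregate]): Seq[String] =
    if (proj.length != aggs.length)
      Seq(s"length mismatch: ${proj.length} projections, ${aggs.length} aggregates")
    else
      proj.zip(aggs).zipWithIndex.collect {
        case ((p, a), i) if !a.accepts.contains(p.colType) =>
          s"position $i: ${a.name} cannot consume ${p.name} of type ${p.colType}"
      }
}
```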

Concrete Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def apply(number: Int): TreeNode[_]
    Definition Classes
    TreeNode
  5. def argString(maxFields: Int): String
    Definition Classes
    TreeNode
  6. def asCode: String
    Definition Classes
    TreeNode
  7. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  8. lazy val canonicalized: Expression
    Definition Classes
    GpuExpression → Expression
  9. def checkInputDataTypes(): TypeCheckResult
    Definition Classes
    Expression
  10. def childrenResolved: Boolean
    Definition Classes
    Expression
  11. def clone(): Expression
    Definition Classes
    TreeNode → AnyRef
  12. def collect[B](pf: PartialFunction[Expression, B]): Seq[B]
    Definition Classes
    TreeNode
  13. def collectFirst[B](pf: PartialFunction[Expression, B]): Option[B]
    Definition Classes
    TreeNode
  14. def collectLeaves(): Seq[Expression]
    Definition Classes
    TreeNode
  15. final def columnarEval(batch: ColumnarBatch): GpuColumnVector

    Returns the result of evaluating this expression on the entire ColumnarBatch. The result of calling this is a GpuColumnVector.

    By convention any GpuColumnVector returned by columnarEval is owned by the caller and will need to be closed by them. This can happen by putting it into a ColumnarBatch and closing the batch or by closing the vector directly if it is a temporary value.

    Definition Classes
    GpuUnevaluable → GpuExpression
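
The ownership convention described above (the caller closes whatever the eval call returns) can be sketched without any GPU types. `FakeColumn`, `evalDouble`, and `sumOfDoubled` are hypothetical stand-ins, not plugin classes; the sketch only shows the close-after-use pattern for a temporary value.

```scala
// Sketch of "the caller owns the returned vector and must close it".
object OwnershipSketch {
  // Hypothetical stand-in for a GPU-backed column; it only tracks closure.
  final class FakeColumn(val values: Vector[Int]) extends AutoCloseable {
    private var closed = false
    def isClosed: Boolean = closed
    override def close(): Unit = { closed = true }
  }

  // Stand-in for an eval call: the returned column belongs to the caller.
  def evalDouble(input: Vector[Int]): FakeColumn = new FakeColumn(input.map(_ * 2))

  // The caller closes the temporary result once it has been consumed.
  // The column is returned only so the closed state can be inspected.
  def sumOfDoubled(input: Vector[Int]): (Int, FakeColumn) = {
    val col = evalDouble(input)
    try {
      (col.values.sum, col)
    } finally {
      col.close()
    }
  }
}
```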
  16. final def columnarEvalAny(batch: ColumnarBatch): Any

    Returns the result of evaluating this expression on the entire ColumnarBatch. The result of calling this may be a single GpuColumnVector or a scalar value. Scalar values typically happen if they are a part of the expression, e.g. col("a") + 100. In this case the 100 is a literal that Add would have to be able to handle.

    By convention any AutoCloseable returned by columnarEvalAny is owned by the caller and will need to be closed by them.

    Definition Classes
    GpuUnevaluable → GpuExpression
  17. final def containsAllPatterns(patterns: TreePattern*): Boolean
    Definition Classes
    TreePatternBits
  18. final def containsAnyPattern(patterns: TreePattern*): Boolean
    Definition Classes
    TreePatternBits
  19. lazy val containsChild: Set[TreeNode[_]]
    Definition Classes
    TreeNode
  20. final def containsPattern(t: TreePattern): Boolean
    Definition Classes
    TreePatternBits
    Annotations
    @inline()
  21. def convertToAst(numFirstTableColumns: Int): AstExpression

    Build an equivalent representation of this expression in a cudf AST.

    numFirstTableColumns

    number of columns in the leftmost input table. Spark places the columns of all inputs in a single sequence, while cudf AST uses an explicit table reference to make column indices unique. This parameter helps translate input column references from Spark's single sequence into cudf's separate sequences.

    returns

    top node of the equivalent AST

    Definition Classes
    GpuExpression
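
The role of numFirstTableColumns can be pictured with a small index translation. This is a sketch assuming exactly two input tables with the leftmost table's columns listed first; `TableRef`, `LeftTable`, `RightTable`, and `translate` are hypothetical names and not part of the cudf AST API.

```scala
// Sketch: map an index in Spark's single column sequence to a
// (table, index-within-table) pair as an AST with table references needs.
object AstColumnIndexSketch {
  sealed trait TableRef
  case object LeftTable extends TableRef
  case object RightTable extends TableRef

  // Assumes two input tables, with the left table's columns placed first.
  def translate(flatIndex: Int, numFirstTableColumns: Int): (TableRef, Int) = {
    require(flatIndex >= 0 && numFirstTableColumns >= 0, "indices must be non-negative")
    if (flatIndex < numFirstTableColumns) (LeftTable, flatIndex)
    else (RightTable, flatIndex - numFirstTableColumns)
  }
}
```

For example, with three columns in the left table, flat index 4 refers to the second column (index 1) of the right table.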
  22. def copyTagsFrom(other: Expression): Unit
    Definition Classes
    TreeNode
  23. lazy val deterministic: Boolean
    Definition Classes
    Expression
  24. def disableCoalesceUntilInput(): Boolean

    Override this if your expression cannot allow combining of data from multiple files into a single batch before it operates on them. This is for things like getting the input file name, which Spark stores in a thread-local variable, which means we have to jump through some hoops to make this work.

    Definition Classes
    GpuExpression
  25. def disableTieredProjectCombine: Boolean

    If this returns true then tiered project will stop looking to combine expressions when this is seen.

    Definition Classes
    GpuExpression
  26. final def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode
    Definition Classes
    GpuExpression → Expression
  27. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  28. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  29. final def eval(input: InternalRow = null): Any
    Definition Classes
    GpuExpression → Expression
  30. def fastEquals(other: TreeNode[_]): Boolean
    Definition Classes
    TreeNode
  31. def filteredInputProjection(filter: Expression): Seq[Expression]
  32. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  33. def find(f: (Expression) ⇒ Boolean): Option[Expression]
    Definition Classes
    TreeNode
  34. def flatArguments: Iterator[Any]
    Attributes
    protected
    Definition Classes
    Expression
  35. def flatMap[A](f: (Expression) ⇒ TraversableOnce[A]): Seq[A]
    Definition Classes
    TreeNode
  36. final def foldable: Boolean

    An aggregate function is not foldable.

    Definition Classes
    GpuAggregateFunction → Expression
  37. def foreach(f: (Expression) ⇒ Unit): Unit
    Definition Classes
    TreeNode
  38. def foreachUp(f: (Expression) ⇒ Unit): Unit
    Definition Classes
    TreeNode
  39. def genCode(ctx: CodegenContext): ExprCode
    Definition Classes
    Expression
  40. def generateTreeString(depth: Int, lastChildren: Seq[Boolean], append: (String) ⇒ Unit, verbose: Boolean, prefix: String, addSuffix: Boolean, maxFields: Int, printNodeId: Boolean, indent: Int): Unit
    Definition Classes
    TreeNode
  41. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  42. def getDefaultTreePatternBits: BitSet
    Attributes
    protected
    Definition Classes
    TreeNode
  43. def getTagValue[T](tag: TreeNodeTag[T]): Option[T]
    Definition Classes
    TreeNode
  44. def hasSideEffects: Boolean

    Could evaluating this expression cause side-effects, such as throwing an exception?

    Definition Classes
    GpuExpression
  45. def hashCode(): Int
    Definition Classes
    TreeNode → AnyRef → Any
  46. def innerChildren: Seq[TreeNode[_]]
    Definition Classes
    TreeNode
  47. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  48. def isRuleIneffective(ruleId: RuleId): Boolean
    Attributes
    protected
    Definition Classes
    TreeNode
  49. def jsonFields: List[JField]
    Attributes
    protected
    Definition Classes
    TreeNode
  50. final def legacyWithNewChildren(newChildren: Seq[Expression]): Expression
    Attributes
    protected
    Definition Classes
    TreeNode
  51. def makeCopy(newArgs: Array[AnyRef]): Expression
    Definition Classes
    TreeNode
  52. def map[A](f: (Expression) ⇒ A): Seq[A]
    Definition Classes
    TreeNode
  53. def mapChildren(f: (Expression) ⇒ Expression): Expression
    Definition Classes
    TreeNode
  54. def mapProductIterator[B](f: (Any) ⇒ B)(implicit arg0: ClassTag[B]): Array[B]
    Attributes
    protected
    Definition Classes
    TreeNode
  55. def markRuleAsIneffective(ruleId: RuleId): Unit
    Attributes
    protected
    Definition Classes
    TreeNode
  56. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  57. def nodeName: String
    Definition Classes
    TreeNode
  58. val nodePatterns: Seq[TreePattern]
    Attributes
    protected
    Definition Classes
    TreeNode
  59. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  60. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  61. def numberedTreeString: String
    Definition Classes
    TreeNode
  62. val origin: Origin
    Definition Classes
    TreeNode
  63. def otherCopyArgs: Seq[AnyRef]
    Attributes
    protected
    Definition Classes
    TreeNode
  64. def p(number: Int): Expression
    Definition Classes
    TreeNode
  65. lazy val postMerge: Seq[Expression]
  66. final lazy val postMergeAttr: Seq[AttributeReference]

    This is the last aggregation step, which optionally changes the result of mergeAggregates. postMergeAttr matches the order (and types) of mergeAggregates. postMerge binds to postMergeAttr and defines an expression that results in what Spark expects from the merge. It is set to postMergeAttr by default, for the pass-through case (as in postUpdate). GpuM2 is the exception, where postMerge mutates the result of mergeAggregates to output what Spark expects.

  67. lazy val postUpdate: Seq[Expression]
  68. final lazy val postUpdateAttr: Seq[AttributeReference]

    This is the last step in the update phase. It can optionally modify the result of the cuDF update aggregates, or be a pass-through. postUpdateAttr matches the order (and types) of updateAggregates. postUpdate binds to postUpdateAttr and defines an expression that results in what Spark expects from the update. By default this is postUpdateAttr, as it should match the shape of the Spark agg buffer leaving cuDF, but in the M2 and Count cases it is overridden, because the cuDF shape isn't what Spark expects.

  69. lazy val preMerge: Seq[Expression]

    This step is the first step of the merge phase. It can optionally modify the result of postUpdate before it goes into the cuDF merge aggregation, changing a partial batch to match the input required by a merge aggregate.

    This always binds to aggBufferAttributes, as that is the inbound schema for this aggregate from Spark. It is set to aggBufferAttributes by default, so the bind behaves like a pass-through in most cases.

  70. def prettyJson: String
    Definition Classes
    TreeNode
  71. def prettyName: String
    Definition Classes
    Expression
  72. def productIterator: Iterator[Any]
    Definition Classes
    Product
  73. def productPrefix: String
    Definition Classes
    Product
  74. def references: AttributeSet
    Definition Classes
    Expression
  75. lazy val resolved: Boolean
    Definition Classes
    Expression
  76. lazy val retryable: Boolean

    true means this expression can be used inside a retry block, otherwise false. An expression is retryable when

    • it is deterministic, or
    • when being non-deterministic, it is a Retryable and its children are all retryable.
    Definition Classes
    GpuExpression
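
The retryable rule stated above can be sketched on a toy expression tree. `Node`, `selfDeterministic`, and `implementsRetryable` are hypothetical names used only to model the two bullet points; the actual implementation in the plugin may differ.

```scala
// Sketch of the retryable rule on a toy expression tree.
object RetryableSketch {
  // selfDeterministic describes the node alone; implementsRetryable models
  // a node that mixes in the Retryable marker. Both are illustrative only.
  final case class Node(
      selfDeterministic: Boolean,
      implementsRetryable: Boolean,
      children: Seq[Node] = Nil) {

    // A tree is deterministic only if the node and all descendants are.
    def deterministic: Boolean =
      selfDeterministic && children.forall(_.deterministic)

    // Deterministic trees are retryable; otherwise the node must be a
    // Retryable and every child must itself be retryable.
    def retryable: Boolean =
      deterministic || (implementsRetryable && children.forall(_.retryable))
  }
}
```

Under this reading, a deterministic node that wraps a non-deterministic child is not retryable unless the wrapping node is itself a Retryable, even if the child is.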
  77. val selfNonDeterministic: Boolean

    Whether an expression itself is non-deterministic when its "deterministic" is false, regardless of whether it has any non-deterministic children. An expression is actually a tree, and deterministic being false means that at least one tree node is non-deterministic, but we need to know exactly which nodes are non-deterministic to check whether they implement Retryable.

    Defaults to false because Spark checks only children by default in Expression, so an expression is non-deterministic iff it has non-deterministic children.

    NOTE: When overriding "deterministic", this value should be taken into account as well.

    Definition Classes
    GpuExpression
  78. final def semanticEquals(other: Expression): Boolean
    Definition Classes
    Expression
  79. def semanticHash(): Int
    Definition Classes
    Expression
  80. def setTagValue[T](tag: TreeNodeTag[T], value: T): Unit
    Definition Classes
    TreeNode
  81. def simpleString(maxFields: Int): String
    Definition Classes
    Expression → TreeNode
  82. def simpleStringWithNodeId(): String
    Definition Classes
    Expression → TreeNode
  83. def sql(isDistinct: Boolean): String
  84. def sql: String
    Definition Classes
    Expression
  85. def stringArgs: Iterator[Any]
    Attributes
    protected
    Definition Classes
    TreeNode
  86. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  87. def toAggString(isDistinct: Boolean): String

    String representation used in explain plans.

  88. def toJSON: String
    Definition Classes
    TreeNode
  89. def toString(): String
    Definition Classes
    Expression → TreeNode → AnyRef → Any
  90. def transform(rule: PartialFunction[Expression, Expression]): Expression
    Definition Classes
    TreeNode
  91. def transformDown(rule: PartialFunction[Expression, Expression]): Expression
    Definition Classes
    TreeNode
  92. def transformDownWithPruning(cond: (TreePatternBits) ⇒ Boolean, ruleId: RuleId)(rule: PartialFunction[Expression, Expression]): Expression
    Definition Classes
    TreeNode
  93. def transformUp(rule: PartialFunction[Expression, Expression]): Expression
    Definition Classes
    TreeNode
  94. def transformUpWithBeforeAndAfterRuleOnChildren(cond: (Expression) ⇒ Boolean, ruleId: RuleId)(rule: PartialFunction[(Expression, Expression), Expression]): Expression
    Definition Classes
    TreeNode
  95. def transformUpWithPruning(cond: (TreePatternBits) ⇒ Boolean, ruleId: RuleId)(rule: PartialFunction[Expression, Expression]): Expression
    Definition Classes
    TreeNode
  96. def transformWithPruning(cond: (TreePatternBits) ⇒ Boolean, ruleId: RuleId)(rule: PartialFunction[Expression, Expression]): Expression
    Definition Classes
    TreeNode
  97. lazy val treePatternBits: BitSet
    Definition Classes
    TreeNode → TreePatternBits
  98. def treeString(append: (String) ⇒ Unit, verbose: Boolean, addSuffix: Boolean, maxFields: Int, printOperatorId: Boolean): Unit
    Definition Classes
    TreeNode
  99. final def treeString(verbose: Boolean, addSuffix: Boolean, maxFields: Int, printOperatorId: Boolean): String
    Definition Classes
    TreeNode
  100. final def treeString: String
    Definition Classes
    TreeNode
  101. def unsetTagValue[T](tag: TreeNodeTag[T]): Unit
    Definition Classes
    TreeNode
  102. final def verboseString(maxFields: Int): String
    Definition Classes
    Expression → TreeNode
  103. def verboseStringWithSuffix(maxFields: Int): String
    Definition Classes
    TreeNode
  104. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  105. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  106. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  107. final def withNewChildren(newChildren: Seq[Expression]): Expression
    Definition Classes
    TreeNode
  108. def withNewChildrenInternal(newChildren: IndexedSeq[Expression]): Expression
    Definition Classes
    ShimExpression → TreeNode
