class GpuSorter extends Serializable
A class that provides convenience methods for sorting batches of data. A Spark SortOrder typically just references a single column using an AttributeReference. This is the simplest situation, so we just need to bind the attribute references to where they go. It is also possible for a SortOrder to require some computation, for example sorting strings by their length instead of in lexicographical order. Because cudf does not support this directly, we go through the SortOrder instances that are a part of this sorter and find the ones that require computation. We then do the sort in a few stages. First we compute any needed columns from the SortOrder instances that require some computation and add them to the original batch. The method appendProjectedColumns does this. This class then provides a number of methods that operate on a batch that has these new columns added to it. These include sorting, merge sorting, and finding bounds, and they can be combined in various ways to implement different algorithms. When you are done with these operations, you can drop the temporary columns, which were added just for computation, using removeProjectedColumns.
Sometimes you may want to pull data back to the CPU and sort rows there too. We provide cpuOrdering, which lets you do this on rows that have had the extra ordering columns added to them.
This class also provides fullySortBatch as an optimization. If all you want to do is sort a batch, you don't want to have to sort the temporary columns too, and fullySortBatch avoids that.
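The staged approach described above can be illustrated with a minimal sketch in plain Scala. It does not use the GpuSorter API: a Vector of strings stands in for a batch, the string length stands in for a computed sort key, and the names StagedSortSketch, appendProjected, sortProjected, removeProjected, and fullySort are hypothetical.

```scala
// Plain-Scala model of the staged sort. This is NOT the GpuSorter API;
// every name here is hypothetical and chosen only to mirror the description.
object StagedSortSketch {
  // Stage 1: append the computed sort key (the string length) as an extra column.
  def appendProjected(batch: Vector[String]): Vector[(String, Int)] =
    batch.map(s => (s, s.length))

  // Stage 2: sort on the appended key column. sortBy is stable, so rows with
  // equal keys keep their original relative order.
  def sortProjected(batch: Vector[(String, Int)]): Vector[(String, Int)] =
    batch.sortBy(_._2)

  // Stage 3: drop the temporary key column again.
  def removeProjected(batch: Vector[(String, Int)]): Vector[String] =
    batch.map(_._1)

  // All three stages chained together, analogous in spirit to a full sort.
  def fullySort(batch: Vector[String]): Vector[String] =
    removeProjected(sortProjected(appendProjected(batch)))

  def main(args: Array[String]): Unit =
    println(fullySort(Vector("pear", "fig", "banana"))) // Vector(fig, pear, banana)
}
```

Keeping the stages separate is what allows the intermediate operations (sorting, merging, bounds) to be combined on batches that still carry the temporary key column.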
Instance Constructors
-
new
GpuSorter(sortOrder: Seq[SortOrder], inputSchema: Seq[Attribute])
A class that provides convenience methods for sorting batches of data
- sortOrder
The unbound sorting order requested (should be converted to the GPU)
- inputSchema
The schema of the input data
-
new
GpuSorter(sortOrder: Seq[SortOrder], inputSchema: Array[Attribute])
- sortOrder
The unbound sorting order requested (should be converted to the GPU)
- inputSchema
The schema of the input data
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
appendProjectedAndSort(inputBatch: ColumnarBatch, sortTime: GpuMetric): Table
Append any columns needed for sorting the batch and sort it. Be careful: a batch that has rows but no columns will cause errors and should be special-cased.
- inputBatch
the batch to sort
- sortTime
metric for the sort time
- returns
a sorted table.
-
final
def
appendProjectedColumns(inputBatch: ColumnarBatch): ColumnarBatch
Append any columns to the batch that need to be materialized for sorting to work.
- inputBatch
the batch to add columns to
- returns
the batch with columns added
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
computeSortOrder(inputBatch: ColumnarBatch, sortTime: GpuMetric): ColumnVector
Get the sort order for a batch of data that is the output of appendProjectedColumns. Be careful: a batch that has rows but no columns will cause errors and should be special-cased.
- inputBatch
the batch to sort
- sortTime
metric for the sort time (really the sort order time here)
- returns
a gather map column
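As an illustration of what a gather map is, here is a minimal sketch in plain Scala. It assumes the usual meaning of the term (a list of row indices in sorted order) and does not use cudf or the GpuSorter API; GatherMapSketch, computeSortOrder, and gather are hypothetical names.

```scala
// Plain-Scala model of a gather map. This is NOT the GpuSorter or cudf API.
object GatherMapSketch {
  // Row indices listed in the order that would sort `keys` ascending.
  def computeSortOrder(keys: Vector[Int]): Vector[Int] =
    keys.indices.sortBy(i => keys(i)).toVector

  // Apply a gather map to a column: output row n is input row gatherMap(n).
  def gather[A](column: Vector[A], gatherMap: Vector[Int]): Vector[A] =
    gatherMap.map(i => column(i))

  def main(args: Array[String]): Unit = {
    val order = computeSortOrder(Vector(30, 10, 20))
    println(order)                              // Vector(1, 2, 0)
    println(gather(Vector("c", "a", "b"), order)) // Vector(a, b, c)
  }
}
```

Computing the order separately from applying it means the same gather map can be reused to reorder several columns.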
-
def
cpuOrdering: Seq[SortOrder]
A sort order that the CPU can use to sort data that is the output of appendProjectedColumns. You cannot use the regular sort order directly because it has been translated to the GPU when computation is needed.
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
fullySortBatch(inputBatch: ColumnarBatch, sortTime: GpuMetric): ColumnarBatch
Sort a batch start to finish. Add any projected columns that are needed to sort, sort the data, and drop the added columns.
- inputBatch
the batch to sort
- sortTime
metric for the amount of time taken to sort.
- returns
the sorted batch
-
final
def
fullySortBatchAndCloseWithRetry(inputSbBatch: SpillableColumnarBatch, sortTime: GpuMetric, opTime: GpuMetric): ColumnarBatch
Similar to fullySortBatch, but with retry support. The input inputSbBatch will be closed after the call.
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
lowerBound(findIn: Table, find: Table): ColumnVector
Find the lower bounds on data that is the output of appendProjectedColumns. Be careful: a batch that has rows but no columns will cause errors and should be special-cased.
- findIn
the data to look in for lower bounds
- find
the data to look for and get the lower bound for
- returns
the rows where the insertions would happen.
-
def
lowerBound(findIn: ColumnarBatch, find: ColumnarBatch): ColumnVector
Find the lower bounds on data that is the output of appendProjectedColumns. Be careful: a batch that has rows but no columns will cause errors and should be special-cased.
- findIn
the data to look in for lower bounds
- find
the data to look for and get the lower bound for
- returns
the rows where the insertions would happen.
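To make "the rows where the insertions would happen" concrete, here is a minimal sketch in plain Scala. It assumes the conventional lower-bound definition (the first position at which a value could be inserted while keeping the data sorted) and uses a single Int column in place of a batch; LowerBoundSketch and its methods are hypothetical names, not part of GpuSorter.

```scala
// Plain-Scala model of a lower-bound search. This is NOT the GpuSorter API.
object LowerBoundSketch {
  // First index in `sorted` whose value is >= `value` (binary search).
  def lowerBound(sorted: Vector[Int], value: Int): Int = {
    var lo = 0
    var hi = sorted.length
    while (lo < hi) {
      val mid = lo + (hi - lo) / 2
      if (sorted(mid) < value) lo = mid + 1 else hi = mid
    }
    lo
  }

  // One result row per row of `find`, mirroring the findIn/find parameters.
  def lowerBounds(findIn: Vector[Int], find: Vector[Int]): Vector[Int] =
    find.map(v => lowerBound(findIn, v))

  def main(args: Array[String]): Unit =
    println(lowerBounds(Vector(10, 20, 20, 30), Vector(5, 20, 25, 40))) // Vector(0, 1, 3, 4)
}
```

Note that for a value already present in the data (20 here), the lower bound points at the first matching row.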
-
final
def
mergeSortAndCloseWithRetry(spillableBatches: RapidsStack[SpillableColumnarBatch], sortTime: GpuMetric): SpillableColumnarBatch
Merge multiple batches together. All of these batches should be the output of appendProjectedColumns, and the output of this will also be in that same format. After this function is called, the argument spillableBatches should not be used.
- spillableBatches
the spillable batches to sort
- sortTime
metric for the time spent doing the merge sort
- returns
the sorted data.
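The idea of merging already-sorted batches can be sketched in plain Scala as follows. This is only a model under stated assumptions: each batch is a sorted Vector of Int keys, batches are merged pairwise, and there is no spilling or retry handling. MergeSketch, mergeTwo, and mergeAll are hypothetical names, and nothing here reflects how GpuSorter performs the merge internally.

```scala
// Plain-Scala model of merging sorted batches. This is NOT the GpuSorter API.
object MergeSketch {
  // Standard two-way merge of two ascending sequences.
  def mergeTwo(a: Vector[Int], b: Vector[Int]): Vector[Int] = {
    val out = Vector.newBuilder[Int]
    var i = 0
    var j = 0
    while (i < a.length && j < b.length) {
      if (a(i) <= b(j)) { out += a(i); i += 1 }
      else { out += b(j); j += 1 }
    }
    // Copy whatever is left in either input.
    while (i < a.length) { out += a(i); i += 1 }
    while (j < b.length) { out += b(j); j += 1 }
    out.result()
  }

  // Merge any number of sorted batches by folding the two-way merge over them.
  def mergeAll(batches: Seq[Vector[Int]]): Vector[Int] =
    batches.foldLeft(Vector.empty[Int])((acc, next) => mergeTwo(acc, next))

  def main(args: Array[String]): Unit =
    println(mergeAll(Seq(Vector(1, 4, 9), Vector(2, 3), Vector(0, 10)))) // Vector(0, 1, 2, 3, 4, 9, 10)
}
```

A merge only produces sorted output when every input is itself sorted in the same order, which matches the requirement that all inputs share the appendProjectedColumns format.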
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
lazy val
originalTypes: Array[DataType]
The original input types without any temporary columns added to them needed for sorting.
-
lazy val
projectedBatchSchema: Seq[Attribute]
Some SortOrder instances require adding temporary columns, which is done as a part of the appendProjectedColumns method. This is the schema for the result of that method.
-
lazy val
projectedBatchTypes: Array[DataType]
The types and order for the columns returned by appendProjectedColumns.
-
final
def
removeProjectedColumns(input: Table): ColumnarBatch
Convert a sorted table into a ColumnarBatch and drop any columns added by appendProjectedColumns
- input
the table to convert
- returns
the ColumnarBatch
-
final
def
sort(inputBatch: ColumnarBatch, sortTime: GpuMetric): Table
Sort a batch of data that is the output of appendProjectedColumns. Be careful: a batch that has rows but no columns will cause errors and should be special-cased.
- inputBatch
the batch to sort
- sortTime
metric for the sort time
- returns
a sorted table.
-
val
sortOrder: Seq[SortOrder]
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
def
upperBound(findIn: ColumnarBatch, find: ColumnarBatch): ColumnVector
Find the upper bounds on data that is the output of appendProjectedColumns. Be careful: a batch that has rows but no columns will cause errors and should be special-cased.
- findIn
the data to look in for upper bounds
- find
the data to look for and get the upper bound for
- returns
the rows where the insertions would happen.
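A minimal plain-Scala sketch of an upper-bound search follows, assuming the conventional definition (the last position at which a value could be inserted while keeping the data sorted). A single Int column stands in for a batch, and UpperBoundSketch and its methods are hypothetical names, not part of GpuSorter.

```scala
// Plain-Scala model of an upper-bound search. This is NOT the GpuSorter API.
object UpperBoundSketch {
  // First index in `sorted` whose value is strictly greater than `value`.
  def upperBound(sorted: Vector[Int], value: Int): Int = {
    var lo = 0
    var hi = sorted.length
    while (lo < hi) {
      val mid = lo + (hi - lo) / 2
      if (sorted(mid) <= value) lo = mid + 1 else hi = mid
    }
    lo
  }

  // One result row per row of `find`, mirroring the findIn/find parameters.
  def upperBounds(findIn: Vector[Int], find: Vector[Int]): Vector[Int] =
    find.map(v => upperBound(findIn, v))

  def main(args: Array[String]): Unit =
    println(upperBounds(Vector(10, 20, 20, 30), Vector(5, 20, 25, 40))) // Vector(0, 3, 3, 4)
}
```

For a value that occurs in the data (20 here), the upper bound points just past the last matching row, so it differs from a lower bound only when there are ties.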
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()