class GpuDynamicPartitionDataConcurrentWriter extends GpuDynamicPartitionDataSingleWriter with Logging
Dynamic partition writer that keeps multiple output writers open concurrently.
The process has the following steps:
- Step 1: Maintain a map of output writers, one per combination of partition column values, and keep all writers open. Cache the input batches by splitting them into sub-groups, so that each partition holds a list of spillable sub-groups. If the total cache size exceeds the limit, find the partition with the most pending data and write it out.
- Step 2: If the number of concurrent writers exceeds the limit, fall back to the sort-based write (GpuDynamicPartitionDataSingleWriter): sort the rest of the batches on the partition columns, write batch by batch, and eagerly close each writer when it finishes.
Callers are expected to call writeWithIterator() instead of write() to write records.
Note: when falling back to GpuDynamicPartitionDataSingleWriter, the single writer should restore the un-closed writers and handle the un-flushed spillable caches.
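The two steps above can be illustrated with a minimal sketch in plain Scala. It is not the plugin's implementation: the object name ConcurrentWriteSketch, the Row type, and the maxWriters parameter are hypothetical, rows are modeled as (partition key, value) pairs, and caching and spilling of sub-groups are omitted entirely. The check for too many writers is made once per batch here, which is a simplifying assumption.

```scala
import scala.collection.mutable

object ConcurrentWriteSketch {
  // Hypothetical simplification: a row is (partition key, value), a batch is a Seq of rows.
  type Row = (String, Int)

  /** Returns the values written per partition and whether the fallback was triggered. */
  def writeWithIterator(
      batches: Iterator[Seq[Row]],
      maxWriters: Int): (Map[String, Vector[Int]], Boolean) = {
    val written = mutable.Map.empty[String, Vector[Int]]
    val openWriters = mutable.Set.empty[String]
    var fellBack = false

    def append(key: String, values: Seq[Int]): Unit =
      written(key) = written.getOrElse(key, Vector.empty[Int]) ++ values

    // Step 1: split each batch by partition and write through per-partition
    // writers that all stay open.
    while (!fellBack && batches.hasNext) {
      batches.next().groupBy(_._1).foreach { case (key, rows) =>
        openWriters += key
        append(key, rows.map(_._2))
      }
      // Too many writers are open at once: switch to the sort-based path.
      if (openWriters.size > maxWriters) fellBack = true
    }

    // Step 2: sort the remaining rows by partition, write one partition at a
    // time, and close each writer as soon as its partition is finished.
    if (fellBack) {
      var current: Option[String] = None
      batches.flatMap(b => b).toVector.sortBy(_._1).foreach { case (key, value) =>
        if (!current.contains(key)) {
          current.foreach(openWriters -= _) // eagerly close the finished writer
          openWriters += key                // reuse a still-open writer or open a new one
          current = Some(key)
        }
        append(key, Seq(value))
      }
    }
    (written.toMap, fellBack)
  }
}
```

In this sketch, writers left open from Step 1 are reused in Step 2 when their partition comes up again, which loosely mirrors the note about restoring un-closed writers after the fallback.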
Instance Constructors
- new GpuDynamicPartitionDataConcurrentWriter(description: GpuWriteJobDescription, taskAttemptContext: TaskAttemptContext, committer: FileCommitProtocol, spec: GpuConcurrentOutputWriterSpec)
Type Members
-
class
WriterIndex extends Product2[Option[String], Option[Int]]
Wrapper class to index a unique concurrent output writer.
- Attributes
- protected
- Definition Classes
- GpuDynamicPartitionDataSingleWriter
-
class
WriterAndStatus extends AnyRef
- Attributes
- protected
- Definition Classes
- GpuFileFormatDataWriter
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
val
MAX_FILE_COUNTER: Int
Max number of files a single task writes out due to file size. In most cases the number of files written should be very small. This is just a safeguard against really bad settings, e.g. maxRecordsPerFile = 1.
- Attributes
- protected
- Definition Classes
- GpuFileFormatDataWriter
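To illustrate why such a safeguard matters, the following sketch shows how a small maxRecordsPerFile inflates the file count and how a counter check could stop it. The object name FileCounterGuardSketch, both helper methods, and the limit of 1000 * 1000 are assumptions made for illustration only; the actual value and check live in GpuFileFormatDataWriter and may differ.

```scala
object FileCounterGuardSketch {
  // Assumed limit, for illustration only.
  val MaxFileCounter: Int = 1000 * 1000

  /** Files needed to hold `rows` rows when each file holds at most `maxRecordsPerFile` rows. */
  def filesNeeded(rows: Long, maxRecordsPerFile: Long): Long = {
    require(maxRecordsPerFile > 0, "maxRecordsPerFile must be positive")
    (rows + maxRecordsPerFile - 1) / maxRecordsPerFile // ceiling division
  }

  /** Advances the per-task file counter, failing once the safeguard limit is reached. */
  def nextFileCounter(fileCounter: Int): Int = {
    val next = fileCounter + 1
    require(next < MaxFileCounter,
      s"File counter $next is beyond max value $MaxFileCounter")
    next
  }
}
```

With maxRecordsPerFile = 1, filesNeeded returns one file per row, so a task writing millions of rows would hit the limit in nextFileCounter rather than silently creating millions of files.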
-
def
abort(): Unit
- Definition Classes
- GpuFileFormatDataWriter → DataWriter
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
close(): Unit
- Definition Classes
- GpuFileFormatDataWriter → Closeable → AutoCloseable
-
def
commit(): WriteTaskResult
Returns a summary of the relevant information, including the list of partition strings written out. The list of partitions is sent back to the driver and used to update the catalog. Other information is also sent back to the driver and used to, e.g., update the metrics in the UI.
- Definition Classes
- GpuFileFormatDataWriter → DataWriter
-
def
copyToHostAsBatch(input: Table, colTypes: Array[DataType]): ColumnarBatch
- Attributes
- protected
- Definition Classes
- GpuDynamicPartitionDataSingleWriter
-
def
currentMetricsValues(): Array[CustomTaskMetric]
- Definition Classes
- DataWriter
-
var
currentWriterStatus: WriterAndStatus
- Attributes
- protected
- Definition Classes
- GpuFileFormatDataWriter
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
def
genGetBucketIdFunc(keyHostCb: ColumnarBatch): (Int) ⇒ Option[Int]
- Attributes
- protected
- Definition Classes
- GpuDynamicPartitionDataSingleWriter
-
def
genGetPartitionPathFunc(keyHostCb: ColumnarBatch): (Int) ⇒ Option[String]
- Attributes
- protected
- Definition Classes
- GpuDynamicPartitionDataSingleWriter
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
lazy val
getDataColumnsAsBatch: (ColumnarBatch) ⇒ ColumnarBatch
Extracts the output values of an input batch.
- Attributes
- protected
- Definition Classes
- GpuDynamicPartitionDataSingleWriter
-
def
getKeysBatch(cb: ColumnarBatch): ColumnarBatch
- Attributes
- protected
- Definition Classes
- GpuDynamicPartitionDataSingleWriter
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
val
isBucketed: Boolean
Flag saying whether or not the data to be written out is bucketed.
- Attributes
- protected
- Definition Classes
- GpuDynamicPartitionDataSingleWriter
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
val
isPartitioned: Boolean
Flag saying whether or not the data to be written out is partitioned.
- Attributes
- protected
- Definition Classes
- GpuDynamicPartitionDataSingleWriter
-
def
isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
-
def
log: Logger
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logDebug(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logError(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logInfo(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logName: String
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logTrace(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
def
logWarning(msg: ⇒ String): Unit
- Attributes
- protected
- Definition Classes
- Logging
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
newWriter(partDir: Option[String], bucketId: Option[Int], fileCounter: Int): ColumnarOutputWriter
Opens a new OutputWriter given a partition key and/or a bucket id. If a bucket id is specified, it is appended to the end of the file name, but before the file extension, e.g. part-r-00009-ea518ad4-455a-4431-b471-d24e03814677-00002.gz.parquet
- partDir
the partition directory
- bucketId
the bucket to which all tuples written by this OutputWriter belong; bucketId is not currently supported and is always None
- fileCounter
integer indicating the number of files to be written to partDir
- Definition Classes
- GpuDynamicPartitionDataSingleWriter
- Annotations
- @nowarn()
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
def
preUpdateCurrentWriterStatus(curWriterId: WriterIndex): Unit
This is for the fallback case, used to clean the writers map.
- curWriterId
the current writer index
- Definition Classes
- GpuDynamicPartitionDataConcurrentWriter → GpuDynamicPartitionDataSingleWriter
-
final
def
releaseOutWriter(status: WriterAndStatus): Unit
Release the resources of a WriterAndStatus.
- Attributes
- protected
- Definition Classes
- GpuFileFormatDataWriter
-
def
releaseResources(): Unit
Release all resources.
- Definition Classes
- GpuDynamicPartitionDataConcurrentWriter → GpuFileFormatDataWriter
-
final
def
renewOutWriter(newWriterId: WriterIndex, curWriterStatus: WriterAndStatus, closeOldWriter: Boolean = true): Unit
Create a new writer according to the given writer id, and update the given writer status. It also closes the old writer in the writer status by default.
- Attributes
- protected
- Definition Classes
- GpuDynamicPartitionDataSingleWriter
-
def
setupCurrentWriter(newWriterId: WriterIndex, writerStatus: WriterAndStatus, closeOldWriter: Boolean): Unit
Used in the fallback case; tries to find the writer in the cache first.
- Definition Classes
- GpuDynamicPartitionDataConcurrentWriter → GpuDynamicPartitionDataSingleWriter
-
val
statsTrackers: Seq[ColumnarWriteTaskStatsTracker]
Trackers for computing various statistics on the data as it's being written out.
- Attributes
- protected
- Definition Classes
- GpuFileFormatDataWriter
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
val
updatedPartitions: Set[String]
- Attributes
- protected
- Definition Classes
- GpuFileFormatDataWriter
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
write(cb: ColumnarBatch): Unit
The write path of concurrent writers
- cb
the columnar batch to be written
- Definition Classes
- GpuDynamicPartitionDataConcurrentWriter → GpuDynamicPartitionDataSingleWriter → GpuFileFormatDataWriter → DataWriter
-
final
def
writeBatchPerMaxRecordsAndClose(scb: SpillableColumnarBatch, writerId: WriterIndex, writerStatus: WriterAndStatus): Unit
- Attributes
- protected
- Definition Classes
- GpuDynamicPartitionDataSingleWriter
-
final
def
writeUpdateMetricsAndClose(scb: SpillableColumnarBatch, writerStatus: WriterAndStatus): Unit
- Attributes
- protected
- Definition Classes
- GpuFileFormatDataWriter
-
def
writeWithIterator(iterator: Iterator[ColumnarBatch]): Unit
Writes an iterator of columnar batches.
- Definition Classes
- GpuDynamicPartitionDataConcurrentWriter → GpuFileFormatDataWriter