case class DeltaSource(spark: SparkSession, deltaLog: DeltaLog, options: DeltaOptions, filters: Seq[Expression] = Nil) extends DeltaSourceBase with DeltaSourceCDCSupport with Product with Serializable
A streaming source for a Delta table.
When a new stream is started, Delta first constructs an org.apache.spark.sql.delta.Snapshot at the current version of the table. This snapshot is broken up into batches until all existing data has been processed. Subsequent processing is done by tailing the change log for new data. As a result, at any given point the streaming query returns the same answer as a batch query that had processed the entire dataset.
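The two phases described above (initial snapshot split into batches, then tailing the log) can be illustrated with a small, self-contained model. This is only a sketch under simplifying assumptions: the table is append-only, a "file" is just a string, and the names `StreamingModel`, `snapshotAt`, and `microBatches` are hypothetical and not part of Delta's API.

```scala
object StreamingModel {
  // Hypothetical stand-in for a Delta log: commit version -> files added by that commit.
  type Log = Map[Long, Seq[String]]

  /** Files visible in a snapshot at `version` (append-only model, no removes). */
  def snapshotAt(log: Log, version: Long): List[String] =
    log.toList.filter(_._1 <= version).sortBy(_._1).flatMap(_._2)

  /**
   * Micro-batches produced by a stream started at `startVersion`:
   * first the snapshot in chunks of at most `maxFiles` files,
   * then one batch per later commit (the "tailing" phase).
   */
  def microBatches(log: Log, startVersion: Long, maxFiles: Int): List[List[String]] = {
    val initial = snapshotAt(log, startVersion).grouped(maxFiles).toList
    val tail = log.toList.filter(_._1 > startVersion).sortBy(_._1).map(_._2.toList)
    initial ++ tail
  }
}
```

In this model, concatenating all micro-batches yields the same files as a snapshot taken at the latest version, which is the equivalence the description states for the real source.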
- DeltaSource
- Serializable
- Product
- Equals
- DeltaSourceCDCSupport
- DeltaSourceBase
- DeltaLogging
- DatabricksLogging
- DeltaProgressReporter
- Logging
- SupportsAdmissionControl
- Source
- SparkDataStream
- AnyRef
- Any
Instance Constructors
- new DeltaSource(spark: SparkSession, deltaLog: DeltaLog, options: DeltaOptions, filters: Seq[Expression] = Nil)
Type Members
- class AdmissionLimits extends DeltaSourceAdmissionBase
Class that helps control how much data a single micro-batch should process.
- trait DeltaSourceAdmissionBase extends AnyRef
- class IndexedChangeFileSeq extends AnyRef
This class represents an iterator of change metadata (AddFile, RemoveFile, AddCDCFile) for a particular version.
- Definition Classes
- DeltaSourceCDCSupport
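To make the role of AdmissionLimits listed above more concrete, here is a minimal sketch of per-micro-batch admission control. It assumes a file budget and a byte budget and stops taking files once either is used up. `AdmissionSketch`, `FileInfo`, `Limits`, and `takeBatch` are hypothetical names for illustration, and the actual class may behave differently.

```scala
object AdmissionSketch {
  // Hypothetical file descriptor; not Delta's IndexedFile.
  final case class FileInfo(path: String, sizeBytes: Long)

  /** Tracks how many files and bytes the current micro-batch may still take. */
  final class Limits(maxFiles: Int, maxBytes: Long) {
    private var filesLeft: Int = maxFiles
    private var bytesLeft: Long = maxBytes

    /** Admit the file if both budgets are still positive, then charge it to the budgets. */
    def admit(f: FileInfo): Boolean = {
      val ok = filesLeft > 0 && bytesLeft > 0
      if (ok) {
        filesLeft -= 1
        bytesLeft -= f.sizeBytes
      }
      ok
    }
  }

  /** Take files in order until the limits refuse one. */
  def takeBatch(files: List[FileInfo], limits: Limits): List[FileInfo] =
    files.takeWhile(f => limits.admit(f))
}
```

In this sketch a file is admitted whenever some budget remains, so the byte budget can be overshot by the last admitted file. That keeps a single oversized file from stalling the stream, at the cost of a soft rather than hard byte limit.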
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def cleanUpSnapshotResources(): Unit
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def commit(end: Offset): Unit
- Definition Classes
- Source → SparkDataStream
- def commit(end: Offset): Unit
- Definition Classes
- Source
- def createDataFrame(indexedFiles: Iterator[IndexedFile]): DataFrame
Given an iterator of file actions, create a DataFrame representing the files added to a table. Only AddFile actions will be used to create the DataFrame.
- indexedFiles
actions iterator from which to generate the DataFrame.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def createDataFrameBetweenOffsets(startVersion: Long, startIndex: Long, isStartingVersion: Boolean, startSourceVersion: Option[Long], startOffsetOption: Option[Offset], endOffset: DeltaSourceOffset): DataFrame
Return the DataFrame between start and end offset.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- val deltaLog: DeltaLog
- def deserializeOffset(json: String): Offset
- Definition Classes
- Source → SparkDataStream
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- val excludeRegex: Option[Regex]
- Attributes
- protected
- val filters: Seq[Expression]
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def getBatch(startOffsetOption: Option[Offset], end: Offset): DataFrame
- Definition Classes
- DeltaSource → Source
- def getCDCFileChangesAndCreateDataFrame(startVersion: Long, startIndex: Long, isStartingVersion: Boolean, endOffset: DeltaSourceOffset): DataFrame
Get the changes from startVersion, startIndex to the end for CDC case. We need to call CDCReader to get the CDC DataFrame.
- startVersion
- calculated starting version
- startIndex
- calculated starting index
- isStartingVersion
- whether the stream has to return the initial snapshot or not
- endOffset
- Offset that signifies the end of the stream.
- returns
the DataFrame containing the file changes (AddFile, RemoveFile, AddCDCFile)
- Attributes
- protected
- Definition Classes
- DeltaSourceCDCSupport
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def getDefaultReadLimit(): ReadLimit
- Definition Classes
- DeltaSource → SupportsAdmissionControl
- def getFileChanges(fromVersion: Long, fromIndex: Long, isStartingVersion: Boolean): ClosableIterator[IndexedFile]
Get the changes starting from (fromVersion, fromIndex). The start point should not be included in the result.
- Attributes
- protected
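The exclusive start point of getFileChanges can be pictured as a filter over (version, index) pairs ordered lexicographically. The following is a minimal sketch of that ordering only. `OffsetSketch`, `Indexed`, and `changesAfter` are hypothetical names, and the real method reads from the Delta log and returns a ClosableIterator[IndexedFile] rather than working on an in-memory list.

```scala
object OffsetSketch {
  // Hypothetical stand-in for an indexed file action.
  final case class Indexed(version: Long, index: Long, path: String)

  /**
   * Return every entry strictly after (fromVersion, fromIndex),
   * ordered by version and then by index within the version.
   */
  def changesAfter(all: List[Indexed], fromVersion: Long, fromIndex: Long): List[Indexed] =
    all
      .sortBy(f => (f.version, f.index))
      .filter { f =>
        f.version > fromVersion || (f.version == fromVersion && f.index > fromIndex)
      }
}
```

Excluding the start point means a caller can pass the position of the last processed entry and receive only entries it has not seen yet.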
- def getFileChangesAndCreateDataFrame(startVersion: Long, startIndex: Long, isStartingVersion: Boolean, endOffset: DeltaSourceOffset): DataFrame
Get the changes from startVersion, startIndex to the end.
- startVersion
- calculated starting version
- startIndex
- calculated starting index
- isStartingVersion
- whether the stream has to return the initial snapshot or not
- endOffset
- Offset that signifies the end of the stream.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def getFileChangesForCDC(fromVersion: Long, fromIndex: Long, isStartingVersion: Boolean, limits: Option[AdmissionLimits], endOffset: Option[DeltaSourceOffset]): Iterator[(Long, Iterator[IndexedFile])]
Get the changes starting from (fromVersion, fromIndex). fromVersion is included. It returns an iterator of (log_version, fileActions).
- Attributes
- protected
- Definition Classes
- DeltaSourceCDCSupport
- def getFileChangesWithRateLimit(fromVersion: Long, fromIndex: Long, isStartingVersion: Boolean, limits: Option[AdmissionLimits] = Some(new AdmissionLimits())): ClosableIterator[IndexedFile]
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def getNextOffsetFromPreviousOffset(previousOffset: DeltaSourceOffset, limits: Option[AdmissionLimits]): Option[Offset]
Return the next offset when previous offset exists.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def getOffset: Option[Offset]
- Definition Classes
- DeltaSource → Source
- def getSnapshotAt(version: Long): Iterator[IndexedFile]
- Attributes
- protected
- def getSnapshotFromDeltaLog(version: Long): Snapshot
- Attributes
- protected
- def getStartingOffsetFromSpecificDeltaVersion(fromVersion: Long, isStartingVersion: Boolean, limits: Option[AdmissionLimits]): Option[Offset]
Returns the offset that starts from a specific delta table version. This function is called when starting a new stream query.
- fromVersion
The version of the delta table to calculate the offset from.
- isStartingVersion
Whether the delta version is for the initial snapshot or not.
- limits
Indicates how much data can be processed by a micro batch.
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- lazy val getStartingVersion: Option[Long]
Extracts whether users provided the option to time travel a relation. If a query restarts from a checkpoint and the checkpoint has recorded the offset, this method should never be called.
- Attributes
- protected
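As a rough illustration of extracting a user-supplied starting point, the sketch below parses a `startingVersion` reader option, which the Delta streaming documentation describes as accepting either a version number or `latest`. The names `StartingVersionSketch`, `Start`, `FromLatest`, `FromVersion`, and `parse` are hypothetical. This does not reproduce how getStartingVersion resolves the option, and the timestamp-based option is not modelled.

```scala
object StartingVersionSketch {
  // Hypothetical result type: either "start at the latest version" or a specific version.
  sealed trait Start
  case object FromLatest extends Start
  final case class FromVersion(version: Long) extends Start

  /** Parse the "startingVersion" option if present; None means the user did not set it. */
  def parse(options: Map[String, String]): Option[Start] =
    options.get("startingVersion").map(_.trim).map {
      case v if v.equalsIgnoreCase("latest") => FromLatest
      case v => FromVersion(v.toLong) // throws NumberFormatException on non-numeric input
    }
}
```

Returning None when the option is absent mirrors the Option[Long] shape of the member above: the caller can then fall back to the initial-snapshot path.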
- def initialOffset(): Offset
- Definition Classes
- Source → SparkDataStream
- var initialState: DeltaSourceSnapshot
- Attributes
- protected
- var initialStateVersion: Long
- Attributes
- protected
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def iteratorLast[T](iter: ClosableIterator[T]): Option[T]
- Attributes
- protected
- val lastOffsetForTriggerAvailableNow: DeltaSourceOffset
- Attributes
- protected
- Definition Classes
- DeltaSourceBase
- def latestOffset(startOffset: Offset, limit: ReadLimit): Offset
- Definition Classes
- DeltaSource → SupportsAdmissionControl
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logConsole(line: String): Unit
- Definition Classes
- DatabricksLogging
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- val options: DeltaOptions
- def productElementNames: Iterator[String]
- Definition Classes
- Product
- def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit
Used to record the occurrence of a single event or report detailed, operation specific statistics.
- path
Used to log the path of the delta table when deltaLog is null.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration as well as the success or failure of an operation on a deltaLog.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration as well as the success or failure of an operation on a tahoePath.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordFrameProfile[T](group: String, name: String)(thunk: => T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = null, silent: Boolean = true)(thunk: => S): S
- Definition Classes
- DatabricksLogging
- def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- def reportLatestOffset(): Offset
- Definition Classes
- SupportsAdmissionControl
- val schema: StructType
- Definition Classes
- DeltaSourceBase → Source
- val spark: SparkSession
- def stop(): Unit
- Definition Classes
- DeltaSource → SparkDataStream
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- val tableId: String
- Attributes
- protected
- def toString(): String
- Definition Classes
- DeltaSource → AnyRef → Any
- def verifyStreamHygiene(actions: Iterator[Action], version: Long): Unit
- Attributes
- protected
- def verifyStreamHygieneAndFilterAddFiles(actions: Seq[Action], version: Long): Seq[Action]
- Attributes
- protected
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- def withDmqTag[T](thunk: => T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: => T): T
Report a log to indicate some command is running.
- Definition Classes
- DeltaProgressReporter
- object AdmissionLimits