
case class DeltaSource(spark: SparkSession, deltaLog: DeltaLog, options: DeltaOptions, filters: Seq[Expression] = Nil) extends DeltaSourceBase with DeltaSourceCDCSupport with Product with Serializable

A streaming source for a Delta table.

When a new stream is started, Delta first constructs an org.apache.spark.sql.delta.Snapshot at the current version of the table. This snapshot is broken up into batches until all existing data has been processed. Subsequent processing is done by tailing the change log for new data. As a result, at any given point the streaming query returns the same answer as a batch query that had processed the entire dataset.
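
The snapshot-then-tail behaviour described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: the table is append-only, a commit is modelled as a plain list of rows, and the names `SnapshotThenTailSketch`, `batchRead`, and `streamRead` are hypothetical and not part of Delta. It shows why concatenating the micro-batches gives the same rows as a batch read of the latest version.

```scala
// Simplified, hypothetical model: a "table" is a sequence of commits,
// and each commit only adds rows. These are not Delta's real data structures.
object SnapshotThenTailSketch {
  type Commit = Seq[String]

  /** What a batch query sees at `version`: all rows from commits 0..version. */
  def batchRead(log: Seq[Commit], version: Int): Seq[String] =
    log.take(version + 1).flatten

  /**
   * What the stream emits: the snapshot at `startVersion`, split into
   * micro-batches of at most `batchSize` rows, followed by one micro-batch
   * per later non-empty commit (the "tailing" phase).
   */
  def streamRead(log: Seq[Commit], startVersion: Int, batchSize: Int): Seq[Seq[String]] = {
    val snapshotBatches = batchRead(log, startVersion).grouped(batchSize).toSeq
    val tailBatches = log.drop(startVersion + 1).filter(_.nonEmpty)
    snapshotBatches ++ tailBatches
  }

  def main(args: Array[String]): Unit = {
    val log = Seq(Seq("a", "b", "c"), Seq("d"), Seq("e", "f"))
    val batches = streamRead(log, startVersion = 1, batchSize = 2)
    batches.zipWithIndex.foreach { case (b, i) => println(s"batch $i: ${b.mkString(", ")}") }
    // Concatenating every micro-batch reproduces the batch query result.
    assert(batches.flatten == batchRead(log, log.length - 1))
  }
}
```

The model ignores removals, schema changes, and rate limits; it only captures the ordering of the two phases.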

Linear Supertypes
Serializable, Product, Equals, DeltaSourceCDCSupport, DeltaSourceBase, DeltaLogging, DatabricksLogging, DeltaProgressReporter, Logging, SupportsAdmissionControl, Source, SparkDataStream, AnyRef, Any

Instance Constructors

  1. new DeltaSource(spark: SparkSession, deltaLog: DeltaLog, options: DeltaOptions, filters: Seq[Expression] = Nil)

Type Members

  1. class AdmissionLimits extends DeltaSourceAdmissionBase

    Class that helps control how much data is processed by a single micro-batch.

  2. trait DeltaSourceAdmissionBase extends AnyRef
  3. class IndexedChangeFileSeq extends AnyRef

    This class represents an iterator of change metadata (AddFile, RemoveFile, AddCDCFile) for a particular version.

    Definition Classes
    DeltaSourceCDCSupport
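
To make the role of AdmissionLimits more concrete, here is a minimal sketch of a per-micro-batch limit on file count and total bytes. It is an assumption-laden stand-in, not Delta's implementation: `FileInfo`, `SimpleAdmissionLimits`, `AdmissionSketch`, and `nextBatch` are hypothetical names, and the rule that a file is admitted whenever some capacity remained before it is a design choice of this sketch.

```scala
// Hypothetical, simplified stand-in for a per-micro-batch admission limit.
// The names below are illustrative and are not Delta's actual members.
final case class FileInfo(path: String, sizeInBytes: Long)

final class SimpleAdmissionLimits(maxFiles: Int, maxBytes: Long) {
  private var filesLeft: Int = maxFiles
  private var bytesLeft: Long = maxBytes

  /**
   * Admit the file if capacity remained before seeing it, then charge it
   * against both budgets. A file larger than the remaining byte budget is
   * still admitted when capacity remains, so a batch always makes progress.
   */
  def admit(file: FileInfo): Boolean = {
    val hadCapacity = filesLeft > 0 && bytesLeft > 0
    filesLeft -= 1
    bytesLeft -= file.sizeInBytes
    hadCapacity
  }
}

object AdmissionSketch {
  /** Take files in order until the limits are exhausted. */
  def nextBatch(files: Seq[FileInfo], maxFiles: Int, maxBytes: Long): Seq[FileInfo] = {
    val limits = new SimpleAdmissionLimits(maxFiles, maxBytes)
    files.takeWhile(limits.admit)
  }
}
```

Checking capacity before charging the file means a single oversized file cannot stall the stream; a stricter policy would reject it and would need another way to guarantee progress.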

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def cleanUpSnapshotResources(): Unit
    Attributes
    protected
    Definition Classes
    DeltaSourceBase
  6. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  7. def commit(end: Offset): Unit
    Definition Classes
    Source → SparkDataStream
  8. def commit(end: Offset): Unit
    Definition Classes
    Source
  9. def createDataFrame(indexedFiles: Iterator[IndexedFile]): DataFrame

    Given an iterator of file actions, create a DataFrame representing the files added to a table. Only AddFile actions will be used to create the DataFrame.

    indexedFiles

    actions iterator from which to generate the DataFrame.

    Attributes
    protected
    Definition Classes
    DeltaSourceBase
  10. def createDataFrameBetweenOffsets(startVersion: Long, startIndex: Long, isStartingVersion: Boolean, startSourceVersion: Option[Long], startOffsetOption: Option[Offset], endOffset: DeltaSourceOffset): DataFrame

    Return the DataFrame between the start and end offsets.

    Attributes
    protected
    Definition Classes
    DeltaSourceBase
  11. val deltaLog: DeltaLog
  12. def deserializeOffset(json: String): Offset
    Definition Classes
    Source → SparkDataStream
  13. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  14. val excludeRegex: Option[Regex]
    Attributes
    protected
  15. val filters: Seq[Expression]
  16. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  17. def getBatch(startOffsetOption: Option[Offset], end: Offset): DataFrame
    Definition Classes
    DeltaSource → Source
  18. def getCDCFileChangesAndCreateDataFrame(startVersion: Long, startIndex: Long, isStartingVersion: Boolean, endOffset: DeltaSourceOffset): DataFrame

    Get the changes from (startVersion, startIndex) to the end for the CDC case. We need to call CDCReader to get the CDC DataFrame.

    startVersion

    - calculated starting version

    startIndex

    - calculated starting index

    isStartingVersion

    - whether the stream has to return the initial snapshot or not

    endOffset

    - Offset that signifies the end of the stream.

    returns

    the DataFrame containing the file changes (AddFile, RemoveFile, AddCDCFile)

    Attributes
    protected
    Definition Classes
    DeltaSourceCDCSupport
  19. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  20. def getDefaultReadLimit(): ReadLimit
    Definition Classes
    DeltaSource → SupportsAdmissionControl
  21. def getFileChanges(fromVersion: Long, fromIndex: Long, isStartingVersion: Boolean): ClosableIterator[IndexedFile]

    Get the changes starting from (fromVersion, fromIndex). The start point should not be included in the result.

    Attributes
    protected
  22. def getFileChangesAndCreateDataFrame(startVersion: Long, startIndex: Long, isStartingVersion: Boolean, endOffset: DeltaSourceOffset): DataFrame

    Get the changes from (startVersion, startIndex) to the end.

    startVersion

    - calculated starting version

    startIndex

    - calculated starting index

    isStartingVersion

    - whether the stream has to return the initial snapshot or not

    endOffset

    - Offset that signifies the end of the stream.

    Attributes
    protected
    Definition Classes
    DeltaSourceBase
  23. def getFileChangesForCDC(fromVersion: Long, fromIndex: Long, isStartingVersion: Boolean, limits: Option[AdmissionLimits], endOffset: Option[DeltaSourceOffset]): Iterator[(Long, Iterator[IndexedFile])]

    Get the changes starting from (fromVersion, fromIndex). fromVersion is included. Returns an iterator of (log_version, fileActions) pairs.

    Attributes
    protected
    Definition Classes
    DeltaSourceCDCSupport
  24. def getFileChangesWithRateLimit(fromVersion: Long, fromIndex: Long, isStartingVersion: Boolean, limits: Option[AdmissionLimits] = Some(new AdmissionLimits())): ClosableIterator[IndexedFile]
    Attributes
    protected
    Definition Classes
    DeltaSourceBase
  25. def getNextOffsetFromPreviousOffset(previousOffset: DeltaSourceOffset, limits: Option[AdmissionLimits]): Option[Offset]

    Return the next offset when a previous offset exists.

    Attributes
    protected
    Definition Classes
    DeltaSourceBase
  26. def getOffset: Option[Offset]
    Definition Classes
    DeltaSource → Source
  27. def getSnapshotAt(version: Long): Iterator[IndexedFile]
    Attributes
    protected
  28. def getSnapshotFromDeltaLog(version: Long): Snapshot
    Attributes
    protected
  29. def getStartingOffsetFromSpecificDeltaVersion(fromVersion: Long, isStartingVersion: Boolean, limits: Option[AdmissionLimits]): Option[Offset]

    Returns the offset that starts from a specific Delta table version. This function is called when starting a new stream query.

    fromVersion

    The version of the delta table to calculate the offset from.

    isStartingVersion

    Whether the delta version is for the initial snapshot or not.

    limits

    Indicates how much data can be processed by a micro batch.

    Attributes
    protected
    Definition Classes
    DeltaSourceBase
  30. lazy val getStartingVersion: Option[Long]

    Extracts whether users provided the option to time travel a relation. If a query restarts from a checkpoint and the checkpoint has recorded the offset, this method should never be called.

    Attributes
    protected
  31. def initialOffset(): Offset
    Definition Classes
    Source → SparkDataStream
  32. var initialState: DeltaSourceSnapshot
    Attributes
    protected
  33. var initialStateVersion: Long
    Attributes
    protected
  34. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  35. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  36. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  37. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  38. def iteratorLast[T](iter: ClosableIterator[T]): Option[T]
    Attributes
    protected
  39. val lastOffsetForTriggerAvailableNow: DeltaSourceOffset
    Attributes
    protected
    Definition Classes
    DeltaSourceBase
  40. def latestOffset(startOffset: Offset, limit: ReadLimit): Offset
    Definition Classes
    DeltaSource → SupportsAdmissionControl
  41. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  42. def logConsole(line: String): Unit
    Definition Classes
    DatabricksLogging
  43. def logDebug(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  44. def logDebug(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  45. def logError(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  46. def logError(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  47. def logInfo(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  48. def logInfo(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  49. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  50. def logTrace(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  51. def logTrace(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  52. def logWarning(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  53. def logWarning(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  54. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  55. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  56. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  57. val options: DeltaOptions
  58. def productElementNames: Iterator[String]
    Definition Classes
    Product
  59. def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit

    Used to record the occurrence of a single event or report detailed, operation-specific statistics.

    path

    Used to log the path of the delta table when deltaLog is null.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  60. def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A

    Used to report the duration as well as the success or failure of an operation on a deltaLog.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  61. def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A

    Used to report the duration as well as the success or failure of an operation on a tablePath.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  62. def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  63. def recordFrameProfile[T](group: String, name: String)(thunk: => T): T
    Attributes
    protected
    Definition Classes
    DeltaLogging
  64. def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = null, silent: Boolean = true)(thunk: => S): S
    Definition Classes
    DatabricksLogging
  65. def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  66. def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  67. def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  68. def reportLatestOffset(): Offset
    Definition Classes
    SupportsAdmissionControl
  69. val schema: StructType
    Definition Classes
    DeltaSourceBase → Source
  70. val spark: SparkSession
  71. def stop(): Unit
    Definition Classes
    DeltaSource → SparkDataStream
  72. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  73. val tableId: String
    Attributes
    protected
  74. def toString(): String
    Definition Classes
    DeltaSource → AnyRef → Any
  75. def verifyStreamHygiene(actions: Iterator[Action], version: Long): Unit
    Attributes
    protected
  76. def verifyStreamHygieneAndFilterAddFiles(actions: Seq[Action], version: Long): Seq[Action]
    Attributes
    protected
  77. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  78. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  79. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  80. def withDmqTag[T](thunk: => T): T
    Attributes
    protected
    Definition Classes
    DeltaLogging
  81. def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: => T): T

    Report a log to indicate that some command is running.

    Definition Classes
    DeltaProgressReporter
  82. object AdmissionLimits
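
Several of the members above (getFileChanges, getFileChangesForCDC) position the stream with a (version, index) pair. The sketch below illustrates that ordering with a hypothetical model: `IndexedEntry`, `FileChangesSketch`, `changesAfter`, and `changesByVersion` are illustrative names, not Delta classes or methods. It shows an exclusive start point, as documented for getFileChanges, and a grouping into (version, actions) pairs of the shape documented for getFileChangesForCDC; how the real CDC method treats fromIndex within fromVersion is not stated on this page and is an assumption here.

```scala
// Hypothetical model of indexed file actions; not Delta's IndexedFile class.
final case class IndexedEntry(version: Long, index: Long, path: String)

object FileChangesSketch {
  /** True when (version, index) sorts strictly after (fromVersion, fromIndex). */
  private def isAfter(e: IndexedEntry, fromVersion: Long, fromIndex: Long): Boolean =
    e.version > fromVersion || (e.version == fromVersion && e.index > fromIndex)

  /** Exclusive start: the entry at (fromVersion, fromIndex) itself is skipped. */
  def changesAfter(entries: Seq[IndexedEntry], fromVersion: Long, fromIndex: Long): Seq[IndexedEntry] =
    entries.filter(isAfter(_, fromVersion, fromIndex))

  /** Group the remaining entries per log version, as (version, actions) pairs. */
  def changesByVersion(
      entries: Seq[IndexedEntry],
      fromVersion: Long,
      fromIndex: Long): Seq[(Long, Seq[IndexedEntry])] =
    changesAfter(entries, fromVersion, fromIndex)
      .groupBy(_.version)
      .toSeq
      .sortBy(_._1)
}
```

In this model, passing an index of -1 returns every entry of fromVersion, since all real indices sort after it.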
