
object CDCReader extends DeltaLogging

The API for reading change data between two versions of a table.

The basic abstraction here is the CDC type column defined by CDCReader.CDC_TYPE_COLUMN_NAME. When CDC is enabled, our writer will treat this column as a special partition column even though it's not part of the table. Writers should generate a query that has two types of rows in it: the main data in partition CDC_TYPE_NOT_CDC and the CDC data with the appropriate CDC type value.

org.apache.spark.sql.delta.files.DelayedCommitProtocol does special handling for this column, dispatching the main data to its normal location while the CDC data is sent to AddCDCFile entries.
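The dispatch described above can be sketched with a simplified model. The `Row` type, the `dispatch` helper, and the concrete sentinel values below are illustrative stand-ins, not the actual `DelayedCommitProtocol` code:

```scala
// Simplified sketch of the dispatch DelayedCommitProtocol performs: rows whose
// CDC type column holds the "not CDC" sentinel go to the table's normal data
// files, everything else becomes change-data (AddCDCFile) content.
// The column name and sentinel value are assumptions for this sketch.
val CDC_TYPE_COLUMN_NAME = "_change_type"
val CDC_TYPE_NOT_CDC     = "not_cdc"

case class Row(values: Map[String, String])

// Partition a writer's output into (main data, CDC data) by the type column.
// Rows without the column at all are treated as main data.
def dispatch(rows: Seq[Row]): (Seq[Row], Seq[Row]) =
  rows.partition(r => r.values.get(CDC_TYPE_COLUMN_NAME).forall(_ == CDC_TYPE_NOT_CDC))

val out = dispatch(Seq(
  Row(Map("id" -> "1", CDC_TYPE_COLUMN_NAME -> CDC_TYPE_NOT_CDC)), // main data
  Row(Map("id" -> "1", CDC_TYPE_COLUMN_NAME -> "insert"))          // CDC data
))
```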

Linear Supertypes

DeltaLogging, DatabricksLogging, DeltaProgressReporter, Logging, AnyRef, Any

Type Members

  1. case class CDCDataSpec[T <: FileAction](version: Long, timestamp: Timestamp, actions: Seq[T]) extends Product with Serializable
  2. case class CDCVersionDiffInfo(fileChangeDf: DataFrame, numFiles: Long, numBytes: Long) extends Product with Serializable

    Represents the changes between some start and end version of a Delta table.

    fileChangeDf

    contains all of the file changes (AddFile, RemoveFile, AddCDCFile)

    numFiles

    the number of AddFile + RemoveFile + AddCDCFile actions in the df

    numBytes

    the total size of the AddFile + RemoveFile + AddCDCFile actions in the df

  3. case class DeltaCDFRelation(schema: StructType, sqlContext: SQLContext, deltaLog: DeltaLog, startingVersion: Option[Long], endingVersion: Option[Long]) extends BaseRelation with PrunedFilteredScan with Product with Serializable

    A special BaseRelation wrapper for CDF reads.

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val CDC_COLUMNS_IN_DATA: Seq[String]
  5. val CDC_COMMIT_TIMESTAMP: String
  6. val CDC_COMMIT_VERSION: String
  7. val CDC_LOCATION: String
  8. val CDC_PARTITION_COL: String
  9. val CDC_TYPE_COLUMN_NAME: String
  10. val CDC_TYPE_DELETE: String
  11. val CDC_TYPE_INSERT: String
  12. val CDC_TYPE_NOT_CDC: String
  13. val CDC_TYPE_UPDATE_POSTIMAGE: String
  14. val CDC_TYPE_UPDATE_PREIMAGE: String
  15. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  16. def cdcReadSchema(deltaSchema: StructType): StructType

    Append CDC metadata columns to the provided schema.
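As a sketch of what appending CDC metadata columns looks like, the following uses a toy `(name, type)` schema model in place of Spark's `StructType`. The column names mirror Delta's documented CDF output columns (`_change_type`, `_commit_version`, `_commit_timestamp`); the schema representation itself is ours:

```scala
// Illustrative model of cdcReadSchema: take the table schema and append the
// CDC metadata columns. The (name, type) pair model stands in for StructType.
type Field = (String, String)

def cdcReadSchemaSketch(deltaSchema: Seq[Field]): Seq[Field] =
  deltaSchema ++ Seq(
    "_change_type"      -> "string",
    "_commit_version"   -> "long",
    "_commit_timestamp" -> "timestamp"
  )

val enriched = cdcReadSchemaSketch(Seq("id" -> "long", "name" -> "string"))
```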

  17. def changesToBatchDF(deltaLog: DeltaLog, start: Long, end: Long, spark: SparkSession): DataFrame

    Get the block of change data from start to end Delta log versions (both sides inclusive). The returned DataFrame has isStreaming set to false.

  18. def changesToDF(deltaLog: DeltaLog, start: Long, end: Long, changes: Iterator[(Long, Seq[Action])], spark: SparkSession, isStreaming: Boolean = false): CDCVersionDiffInfo

    For a sequence of changes (AddFile, RemoveFile, AddCDCFile), create a DataFrame that represents the captured change data between start and end inclusive.

    Builds the DataFrame using the following logic: for each change of type (Long, Seq[Action]) in changes, iterates over the actions and handles two cases.
    - If there are any CDC actions, the AddFile and RemoveFile actions in that version are ignored and the AddCDCFile actions are used instead.
    - If there are no CDC actions, the CDC data must be inferred from the AddFile and RemoveFile actions, taking only those with dataChange = true.

    These buffers of AddFile, RemoveFile, and AddCDCFile actions are then used to create corresponding FileIndexes (e.g. TahoeChangeFileIndex), each suited to reading CDC data from its action type. These FileIndexes are then unioned to produce the final DataFrame.

    deltaLog

    DeltaLog of the table for which we are creating a CDC DataFrame

    start

    starting version of the changes

    end

    ending version of the changes

    changes

    an iterator of all FileActions for a particular commit version

    spark

    SparkSession

    isStreaming

    indicates whether the returned DataFrame is a streaming DataFrame

    returns

    CDCVersionDiffInfo, which contains the DataFrame of the changes as well as the statistics related to the changes
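The per-version selection logic described above can be sketched over a toy action model. The case classes below stand in for Delta's FileAction hierarchy; only the selection rule is taken from the description:

```scala
// Toy stand-ins for Delta's FileAction hierarchy.
sealed trait FileActionSketch
case class AddFileSketch(path: String, dataChange: Boolean) extends FileActionSketch
case class RemoveFileSketch(path: String, dataChange: Boolean) extends FileActionSketch
case class AddCDCFileSketch(path: String) extends FileActionSketch

// For one commit version: if explicit CDC files exist, use only those;
// otherwise infer CDC data from Add/Remove actions with dataChange = true.
def selectCdcActions(actions: Seq[FileActionSketch]): Seq[FileActionSketch] = {
  val cdc = actions.collect { case a: AddCDCFileSketch => a }
  if (cdc.nonEmpty) cdc
  else actions.filter {
    case AddFileSketch(_, dc)    => dc
    case RemoveFileSketch(_, dc) => dc
    case _                       => false
  }
}

// A version with explicit CDC files keeps only those; one without infers
// from the data-changing Add/Remove actions.
val withCdc  = selectCdcActions(Seq(AddFileSketch("a", true), AddCDCFileSketch("c")))
val inferred = selectCdcActions(Seq(AddFileSketch("a", true), RemoveFileSketch("b", false)))
```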

  19. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  20. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  21. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  22. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  23. def getCDCRelation(spark: SparkSession, deltaLog: DeltaLog, snapshotToUse: Snapshot, partitionFilters: Seq[Expression], conf: SQLConf, options: CaseInsensitiveStringMap): BaseRelation

    Get a Relation that represents change data between two snapshots of the table.

  24. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  25. def getTimestampsByVersion(deltaLog: DeltaLog, start: Long, end: Long, spark: SparkSession): Map[Long, Timestamp]

    Builds a map from commit versions to their associated commit timestamps.

    start

    start commit version

    end

    end commit version
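A minimal sketch of building such a map, modeling commits as plain `(version, timestampMillis)` pairs (in Delta the timestamps come from the commit log; the flat list here is hypothetical input):

```scala
// Build a version -> timestamp map restricted to commits in [start, end].
def timestampsByVersionSketch(
    commits: Seq[(Long, Long)], start: Long, end: Long): Map[Long, Long] =
  commits.filter { case (v, _) => v >= start && v <= end }.toMap

val m = timestampsByVersionSketch(Seq((0L, 100L), (1L, 200L), (2L, 300L)), 1L, 2L)
```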

  26. def getVersionForCDC(spark: SparkSession, deltaLog: DeltaLog, conf: SQLConf, options: CaseInsensitiveStringMap, versionKey: String, timestampKey: String): Option[Long]

    Given a timestamp or a version, returns the corresponding version for that timestamp, or the version itself.
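Timestamp resolution can be sketched as picking the latest commit at or before the requested timestamp. That boundary convention is an assumption for this sketch; the method's exact rules (and its error cases) are not spelled out here:

```scala
// Resolve a requested timestamp to the latest commit version whose commit
// timestamp is <= the request. Returns None if no commit qualifies.
// The <= convention is an assumption of this sketch.
def versionForTimestampSketch(commits: Map[Long, Long], ts: Long): Option[Long] =
  commits.filter { case (_, t) => t <= ts }.keys.reduceOption(_ max _)
```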

  27. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  28. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  29. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  30. def isCDCEnabledOnTable(metadata: Metadata): Boolean

    Determines whether the provided metadata has CDC enabled.
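A sketch of such a check, driven by the table property that enables the change data feed. The property key `delta.enableChangeDataFeed` matches Delta's documented setting; the `Map`-based metadata model is illustrative:

```scala
// CDC is considered enabled when the table property is set to "true".
def isCDCEnabledSketch(configuration: Map[String, String]): Boolean =
  configuration.get("delta.enableChangeDataFeed").exists(_.equalsIgnoreCase("true"))
```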

  31. def isCDCRead(options: CaseInsensitiveStringMap): Boolean

    Based on the read options passed, indicates whether the read is a CDC read.
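A sketch of the option check: a read is treated as a CDC read when a change-feed option is set. `readChangeFeed` (and the legacy `readChangeData`) match Delta's documented reader options; treating keys case-insensitively mirrors `CaseInsensitiveStringMap`, but the exact set of accepted keys is an assumption here:

```scala
// A read is a CDC read when a change-feed option is set to "true".
// Keys are compared case-insensitively, mirroring CaseInsensitiveStringMap.
def isCDCReadSketch(options: Map[String, String]): Boolean = {
  val lower = options.map { case (k, v) => (k.toLowerCase, v) }
  Seq("readchangefeed", "readchangedata")
    .exists(k => lower.get(k).exists(_.equalsIgnoreCase("true")))
}
```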

  32. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  33. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  34. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  35. def logConsole(line: String): Unit
    Definition Classes
    DatabricksLogging
  36. def logDebug(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  37. def logDebug(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  38. def logError(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  39. def logError(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  40. def logInfo(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  41. def logInfo(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  42. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  43. def logTrace(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  44. def logTrace(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  45. def logWarning(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  46. def logWarning(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  47. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  48. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  49. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  50. def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit

    Used to record the occurrence of a single event or report detailed, operation-specific statistics.

    path

    Used to log the path of the delta table when deltaLog is null.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  51. def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A

    Used to report the duration as well as the success or failure of an operation on a deltaLog.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  52. def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A

    Used to report the duration as well as the success or failure of an operation on a tahoePath.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  53. def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  54. def recordFrameProfile[T](group: String, name: String)(thunk: => T): T
    Attributes
    protected
    Definition Classes
    DeltaLogging
  55. def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = null, silent: Boolean = true)(thunk: => S): S
    Definition Classes
    DatabricksLogging
  56. def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  57. def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  58. def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  59. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  60. def toString(): String
    Definition Classes
    AnyRef → Any
  61. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  62. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  63. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  64. def withDmqTag[T](thunk: => T): T
    Attributes
    protected
    Definition Classes
    DeltaLogging
  65. def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: => T): T

    Report a log to indicate some command is running.

    Definition Classes
    DeltaProgressReporter
