class Snapshot extends StateCache with StatisticsCollection with DataSkippingReader with DeltaLogging

An immutable snapshot of the state of the log at some delta version. Internally this class manages the replay of actions stored in checkpoint or delta files.

After resolving any new actions, it caches the result and collects the following basic information to the driver:

  • Protocol Version
  • Metadata
  • Transaction state
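
The replay described above can be pictured with a minimal, plain-Scala sketch. It is not the class's actual implementation: the action types below are simplified, hypothetical stand-ins (the real actions carry many more fields), Spark is left out entirely, and removed files are simply dropped rather than retained as tombstones.

```scala
object LogReplaySketch {
  // Simplified, hypothetical action types used only for this illustration.
  sealed trait Action
  final case class Protocol(minReaderVersion: Int, minWriterVersion: Int) extends Action
  final case class Metadata(schemaString: String) extends Action
  final case class SetTransaction(appId: String, version: Long) extends Action
  final case class AddFile(path: String, size: Long) extends Action
  final case class RemoveFile(path: String) extends Action

  // The reconciled state: latest protocol and metadata, transaction versions, live files.
  final case class State(
      protocol: Option[Protocol],
      metadata: Option[Metadata],
      transactions: Map[String, Long],
      activeFiles: Map[String, AddFile])

  val empty: State = State(None, None, Map.empty, Map.empty)

  // Replays actions in commit order (oldest first); a later action overrides an earlier one.
  def replay(actions: Seq[Action]): State =
    actions.foldLeft(empty) {
      case (s, p: Protocol)       => s.copy(protocol = Some(p))
      case (s, m: Metadata)       => s.copy(metadata = Some(m))
      case (s, t: SetTransaction) => s.copy(transactions = s.transactions + (t.appId -> t.version))
      case (s, a: AddFile)        => s.copy(activeFiles = s.activeFiles + (a.path -> a))
      case (s, r: RemoveFile)     => s.copy(activeFiles = s.activeFiles - r.path)
    }
}
```

Replaying an add followed by a remove of the same path leaves that path out of `activeFiles`, while the last `Protocol`, `Metadata`, and per-`appId` transaction version seen are the ones kept.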

Instance Constructors

  1. new Snapshot(path: Path, version: Long, logSegment: LogSegment, minFileRetentionTimestamp: Long, deltaLog: DeltaLog, timestamp: Long, checksumOpt: Option[VersionChecksum], minSetTransactionRetentionTimestamp: Option[Long] = None, checkpointMetadataOpt: Option[CheckpointMetaData] = None)

    timestamp

    The timestamp of the latest commit in milliseconds. Can also be set to -1 if the timestamp of the commit is unknown or the table has not been initialized, i.e. version = -1.

Type Members

  1. class CachedDS[A] extends AnyRef
    Definition Classes
    StateCache

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final val MAX: String("maxValues")
    Definition Classes
    UsesMetadataFields
  5. final val MIN: String("minValues")
    Definition Classes
    UsesMetadataFields
  6. final val NULL_COUNT: String("nullCount")
    Definition Classes
    UsesMetadataFields
  7. final val NUM_RECORDS: String("numRecords")
    Definition Classes
    UsesMetadataFields
  8. def aggregationsToComputeState: Map[String, Column]

    A map from alias to the aggregations that must be computed to produce the computedState.

    Attributes
    protected
  9. def allFiles: Dataset[AddFile]

    All of the files present in this Snapshot.

    Definition Classes
    Snapshot → DataSkippingReaderBase
  10. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  11. def cacheDS[A](ds: Dataset[A], name: String): CachedDS[A]

    Create a CachedDS instance for the given Dataset and the name.

    Definition Classes
    StateCache
  12. lazy val checkpointFileIndexOpt: Option[DeltaLogFileIndex]
    Attributes
    protected
  13. def checkpointSizeInBytes(): Long
  14. val checksumOpt: Option[VersionChecksum]
  15. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  16. val columnMappingMode: DeltaColumnMappingMode
    Definition Classes
    DataSkippingReaderBase
  17. def computeChecksum: VersionChecksum

    Computes all the information that is needed by the checksum for the current snapshot. May kick off state reconstruction if needed by any of the underlying fields. Note that it's safe to set txnId to none, since the snapshot doesn't always have a txn attached. E.g. if a snapshot is created by reading a checkpoint, then no txnId is present.

  18. lazy val computedState: State

    Computes statistics about the transaction log, and therefore about the actions applied to this Delta table.

    Attributes
    protected
  19. def constructPartitionFilters(filters: Seq[Expression]): Column

    Given the partition filters on the data, rewrite these filters by pointing to the metadata columns.

    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
  20. def dataSchema: StructType

    Returns the data schema of the table, the schema of the columns written out to file.

    Definition Classes
    Snapshot → StatisticsCollection
  21. lazy val deltaFileIndexOpt: Option[DeltaLogFileIndex]
    Attributes
    protected
  22. def deltaFileSizeInBytes(): Long
  23. val deltaLog: DeltaLog
    Definition Classes
    Snapshot → DataSkippingReaderBase
  24. def emptyDF: DataFrame
    Attributes
    protected
  25. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  26. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  27. lazy val fileIndices: Seq[DeltaLogFileIndex]
    Attributes
    protected
  28. def fileSizeHistogram: Option[FileSizeHistogram]
  29. def filesForScan(projection: Seq[Attribute], filters: Seq[Expression], keepNumRecords: Boolean): DeltaScan
    Definition Classes
    DataSkippingReaderBase
  30. def filesForScan(projection: Seq[Attribute], filters: Seq[Expression]): DeltaScan

    Gathers files that should be included in a scan based on the given predicates. Statistics about the amount of data that will be read are gathered and returned.

    Definition Classes
    DataSkippingReaderBase → DeltaScanGeneratorBase
  31. def filesWithStatsForScan(partitionFilters: Seq[Expression]): DataFrame

    Returns a DataFrame for the given partition filters. The schema of the returned DataFrame is nearly the same as AddFile, except that the stats field is parsed from a JSON string into a struct.

    Definition Classes
    DataSkippingReaderBase → DeltaScanGeneratorBase
  32. def filterOnPartitions(partitionFilters: Seq[Expression], keepNumRecords: Boolean): (Seq[AddFile], DataSize)

    Get all the files in this table given the partition filter and the corresponding size of the scan.

    keepNumRecords

    Also select stats.numRecords in the query. This may slow down the query because the stats JSON has to be parsed.

    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
  33. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  34. def getAllFiles(keepNumRecords: Boolean): Seq[AddFile]

    Get all the files in this table.

    keepNumRecords

    Also select stats.numRecords in the query. This may slow down the query because the stats JSON has to be parsed.

    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
  35. def getBaseStatsColumn: Column

    Returns a Column that references the stats field data skipping should use

    Definition Classes
    ReadsMetadataFields
  36. def getCheckpointMetadataOpt: Option[CheckpointMetaData]
  37. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  38. def getDataSkippedFiles(partitionFilters: Column, dataFilters: DataSkippingPredicate, keepNumRecords: Boolean): (Seq[AddFile], Seq[DataSize])

    Given the partition and data filters, leverage data skipping statistics to find the set of files that need to be queried. Returns a tuple of the files and, optionally, the scan sizes for three cases: no filters, partition filters only, and partition and data filters combined, in that order.

    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
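
To make the idea of statistics-based skipping concrete, here is a small plain-Scala sketch. It is only an illustration under stated assumptions: `FileEntry`, `ScanSize`, `mightContain`, and `skip` are hypothetical names, each file has a single partition value and min/max statistics for one numeric column, and the only data filter is an equality test. The real method works on Spark columns and `AddFile` actions.

```scala
object DataSkippingSketch {
  // Hypothetical per-file record: one partition value plus optional min/max stats for one column.
  final case class FileEntry(
      path: String,
      partition: String,
      size: Long,
      minValue: Option[Long],
      maxValue: Option[Long])

  // Hypothetical summary of how much data a scan would touch.
  final case class ScanSize(bytes: Long, files: Long)

  private def sizeOf(files: Seq[FileEntry]): ScanSize =
    ScanSize(files.map(_.size).sum, files.size.toLong)

  // A file may contain `value` unless its stats prove otherwise; missing stats keep the file.
  def mightContain(f: FileEntry, value: Long): Boolean =
    (f.minValue, f.maxValue) match {
      case (Some(lo), Some(hi)) => lo <= value && value <= hi
      case _                    => true
    }

  // Returns the surviving files plus three sizes: unfiltered, partition-only, partition and data.
  def skip(all: Seq[FileEntry], partition: String, value: Long): (Seq[FileEntry], Seq[ScanSize]) = {
    val byPartition = all.filter(_.partition == partition)
    val byData = byPartition.filter(mightContain(_, value))
    (byData, Seq(sizeOf(all), sizeOf(byPartition), sizeOf(byData)))
  }
}
```

Comparing the three sizes shows how much of the reduction comes from partition pruning and how much from the min/max statistics.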
  39. def getNumPartitions: Int
    Attributes
    protected
  40. def getProperties: HashMap[String, String]

    Return the set of properties of the table.

  41. def getSpecificFilesWithStats(paths: Seq[String]): Seq[AddFile]

    Get AddFile (with stats) actions corresponding to given set of paths in the Snapshot. If a path doesn't exist in snapshot, it will be ignored and no AddFile will be returned for it.

    paths

    Sequence of paths for which we want to get AddFile action

    returns

    a sequence of addFiles for the given paths

    Definition Classes
    DataSkippingReaderBase
  42. final def getStatsColumnOpt(stat: StatsColumn): Option[Column]

    Overload for convenience working with StatsColumn helpers

    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
  43. final def getStatsColumnOpt(statType: String, pathToColumn: Seq[String] = Nil): Option[Column]

    Returns an expression to access the given statistics for a specific column, or None if that stats column does not exist.

    statType

    One of the fields declared by trait UsesMetadataFields

    pathToColumn

    The components of the nested column name to get stats for.

    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
  44. final def getStatsColumnOrNullLiteral(stat: StatsColumn): Column

    Overload for convenience working with StatsColumn helpers

    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
  45. final def getStatsColumnOrNullLiteral(statType: String, pathToColumn: Seq[String] = Nil): Column

    Returns an expression to access the given statistics for a specific column, or a NULL literal expression if that column does not exist.

    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
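
The lookup behaviour of the two `getStatsColumn...` variants (statistic type first, then the components of the nested column name, with either `None` or a NULL fallback when something is missing) can be sketched without Spark. Everything below is hypothetical: the stats schema is modelled as a small tree, and column references and the NULL literal are plain strings rather than `Column` values.

```scala
object StatsLookupSketch {
  // Hypothetical stand-in for a stats schema: a tree whose leaves are column references.
  sealed trait Node
  final case class Struct(fields: Map[String, Node]) extends Node
  final case class Leaf(reference: String) extends Node

  // Walks statType, then each component of the nested column name; None if any step is missing.
  def statsColumnOpt(
      schema: Struct,
      statType: String,
      pathToColumn: Seq[String] = Nil): Option[String] = {
    val start: Option[Node] = schema.fields.get(statType)
    val found = pathToColumn.foldLeft(start) {
      case (Some(Struct(fields)), name) => fields.get(name)
      case _                            => None
    }
    found.collect { case Leaf(ref) => ref }
  }

  // Falls back to a NULL placeholder when the stats column does not exist.
  def statsColumnOrNull(
      schema: Struct,
      statType: String,
      pathToColumn: Seq[String] = Nil): String =
    statsColumnOpt(schema, statType, pathToColumn).getOrElse("NULL")
}
```

An empty `pathToColumn` addresses a table-level statistic such as `numRecords`, which is why the path defaults to `Nil` in this sketch as it does in the signatures above.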
  46. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  47. def indexToRelation(index: DeltaLogFileIndex, schema: StructType = logSchema): LogicalRelation

    Creates a LogicalRelation with the given schema from a DeltaLogFileIndex.

    Attributes
    protected
  48. def init(): Unit

    Performs validations during initialization

    Attributes
    protected
  49. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  50. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  51. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  52. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  53. def loadActions: DataFrame

    Loads the file indices into a DataFrame that can be used for LogReplay.

    In addition to the usual nested columns provided by the SingleAction schema, it should provide two additional columns to simplify the log replay process: ACTION_SORT_COL_NAME (which, when sorted in ascending order, will order older actions before newer ones, as required by InMemoryLogReplay); and ADD_STATS_TO_USE_COL_NAME (to handle certain combinations of config settings for delta.checkpoint.writeStatsAsJson and delta.checkpoint.writeStatsAsStruct).

    Attributes
    protected
  54. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  55. def logConsole(line: String): Unit
    Definition Classes
    DatabricksLogging
  56. def logDebug(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  57. def logDebug(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  58. def logError(msg: => String, throwable: Throwable): Unit
    Definition Classes
    Snapshot → Logging
  59. def logError(msg: => String): Unit
    Definition Classes
    Snapshot → Logging
  60. def logInfo(msg: => String): Unit
    Definition Classes
    Snapshot → Logging
  61. def logInfo(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  62. def logMissingActionWarning(action: String): Unit

    Helper method to log missing actions when state reconstruction checks are not enabled

    Attributes
    protected
  63. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  64. val logSegment: LogSegment
  65. def logTrace(msg: => String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  66. def logTrace(msg: => String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  67. def logWarning(msg: => String, throwable: Throwable): Unit
    Definition Classes
    Snapshot → Logging
  68. def logWarning(msg: => String): Unit
    Definition Classes
    Snapshot → Logging
  69. def metadata: Metadata
    Definition Classes
    Snapshot → DataSkippingReaderBase
  70. val minFileRetentionTimestamp: Long
  71. val minSetTransactionRetentionTimestamp: Option[Long]
  72. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  73. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  74. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  75. lazy val numIndexedCols: Int

    Number of columns to collect stats on for data skipping

    Definition Classes
    Snapshot → StatisticsCollection
  76. def numOfFiles: Long
    Definition Classes
    Snapshot → DataSkippingReaderBase
  77. def numOfMetadata: Long
  78. def numOfProtocol: Long
  79. def numOfRemoves: Long
  80. def numOfSetTransactions: Long
  81. val path: Path
    Definition Classes
    Snapshot → DataSkippingReaderBase
  82. def protocol: Protocol
  83. def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit

    Used to record the occurrence of a single event or report detailed, operation specific statistics.

    path

    Used to log the path of the delta table when deltaLog is null.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  84. def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A

    Used to report the duration as well as the success or failure of an operation on a deltaLog.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  85. def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A

    Used to report the duration as well as the success or failure of an operation on a tahoePath.

    Attributes
    protected
    Definition Classes
    DeltaLogging
  86. def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  87. def recordFrameProfile[T](group: String, name: String)(thunk: => T): T
    Attributes
    protected
    Definition Classes
    DeltaLogging
  88. def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = null, silent: Boolean = true)(thunk: => S): S
    Definition Classes
    DatabricksLogging
  89. def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
    Definition Classes
    DatabricksLogging
  90. def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  91. def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
    Definition Classes
    DatabricksLogging
  92. def redactedPath: String
    Definition Classes
    Snapshot → DataSkippingReaderBase
  93. def schema: StructType

    Returns the schema of the table.

    Definition Classes
    Snapshot → DataSkippingReaderBase
  94. def setTransactions: Seq[SetTransaction]
  95. def sizeInBytes: Long
    Definition Classes
    Snapshot → DataSkippingReaderBase
  96. val snapshotToScan: Snapshot

    Snapshot to scan by the DeltaScanGenerator for metadata query optimizations

    Definition Classes
    Snapshot → DeltaScanGeneratorBase
  97. def spark: SparkSession
    Attributes
    protected
    Definition Classes
    Snapshot → StatisticsCollection → StateCache
  98. lazy val statCollectionSchema: StructType

    statCollectionSchema is the schema that is composed of all the columns that have the stats collected with our current table configuration.

    Definition Classes
    StatisticsCollection
  99. def stateDF: DataFrame

    The current set of actions in this Snapshot as plain Rows

  100. def stateDS: Dataset[SingleAction]

    The current set of actions in this Snapshot as a typed Dataset.

  101. lazy val statsCollector: Column

    Returns a struct column that can be used to collect statistics for the current schema of the table. The types we keep stats on must be consistent with DataSkippingReader.SkippingEligibleLiteral.

    Definition Classes
    StatisticsCollection
  102. lazy val statsSchema: StructType

    Returns schema of the statistics collected.

    Definition Classes
    StatisticsCollection
  103. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  104. val timestamp: Long
  105. def toString(): String
    Definition Classes
    Snapshot → AnyRef → Any
  106. def tombstones: Dataset[RemoveFile]

    All unexpired tombstones.

  107. lazy val transactions: Map[String, Long]

    A map to look up transaction version by appId.

  108. def uncache(): Unit

    Drop any cached data for this Snapshot.

    Definition Classes
    StateCache
  109. def verifyStatsForFilter(referencedStats: Set[StatsColumn]): Column

    Returns an expression that can be used to check that the required statistics are present for a given file. If any required statistics are missing we must include the corresponding file.

    NOTE: We intentionally choose to disable skipping for any file if any required stat is missing, because doing it that way allows us to check each stat only once (rather than once per use). Checking per-use would anyway only help for tables where the number of indexed columns has changed over time, producing add.stats_parsed records with differing schemas. That should be a rare enough case to not worry about optimizing for, given that the fix requires more complex skipping predicates that would penalize the common case.

    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
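
The rule in the note above, checking all required statistics once and disabling skipping for a file if any of them is missing, can be sketched as follows. This is a hedged illustration only: `StatKey`, `statsVerified`, and `keepFile` are hypothetical names, and plain Booleans stand in for the Spark expressions the real method builds.

```scala
object VerifyStatsSketch {
  // Hypothetical identifier for a statistic: its type (e.g. "minValues") plus a column path.
  final case class StatKey(statType: String, pathToColumn: Seq[String])

  // True when every referenced statistic is present, so the skipping predicate can be trusted.
  def statsVerified(referenced: Set[StatKey], present: Set[StatKey]): Boolean =
    referenced.subsetOf(present)

  // Keep a file when its stats are incomplete, or when the skipping predicate says it may match.
  // `mayMatch` is by-name, so it is only evaluated once the stats have been verified.
  def keepFile(referenced: Set[StatKey], present: Set[StatKey], mayMatch: => Boolean): Boolean =
    !statsVerified(referenced, present) || mayMatch
}
```

Because verification happens once per file rather than once per use of a statistic, the skipping predicate itself can stay simple, at the cost of never skipping a file that lacks any one of the referenced statistics.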
  110. val version: Long
    Definition Classes
    Snapshot → DataSkippingReaderBase
  111. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  112. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  113. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  114. def withDmqTag[T](thunk: => T): T
    Attributes
    protected
    Definition Classes
    DeltaLogging
  115. def withNoStats: DataFrame

    All files with the statistics column dropped completely.

    Definition Classes
    DataSkippingReaderBase
  116. final def withStats: DataFrame

    Returns a parsed and cached representation of files with statistics.

    returns

    cached DataFrame

    Definition Classes
    DataSkippingReaderBase
  117. def withStatsInternal: DataFrame
    Attributes
    protected
    Definition Classes
    DataSkippingReaderBase
  118. def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: => T): T

    Report a log to indicate some command is running.

    Definition Classes
    DeltaProgressReporter
