class DeltaLog extends Checkpoints with MetadataCleanup with LogStoreProvider with SnapshotManagement with DeltaFileFormat with ReadChecksum
Used to query the current state of the log as well as modify it by adding new atomic collections of actions.
Internally, this class implements an optimistic concurrency control algorithm to handle multiple readers or writers. Any single read is guaranteed to see a consistent snapshot of the table.
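The typical entry point is the companion object's forTable factory. A minimal sketch (the table path is illustrative, and DeltaLog is an internal API; stable user-facing access goes through io.delta.tables.DeltaTable):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.delta.DeltaLog

val spark = SparkSession.builder().appName("delta-log-demo").master("local[*]").getOrCreate()

// Obtain the DeltaLog for an existing Delta table.
val deltaLog = DeltaLog.forTable(spark, "/tmp/delta/events")

// Any single read sees one consistent snapshot of the table.
val snapshot = deltaLog.snapshot
println(s"current version = ${snapshot.version}")
```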
Linear Supertypes
- DeltaLog
- ReadChecksum
- DeltaFileFormat
- SnapshotManagement
- LogStoreProvider
- MetadataCleanup
- Checkpoints
- DeltaLogging
- DatabricksLogging
- DeltaProgressReporter
- Logging
- AnyRef
- Any
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- val LAST_CHECKPOINT: Path
The path to the file that holds metadata about the most recent checkpoint.
- Definition Classes
- Checkpoints
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def assertRemovable(): Unit
Checks whether this table only accepts appends. If so, it will throw an error in operations that can remove data, such as DELETE/UPDATE/MERGE.
- def checkLogStoreConfConflicts(sparkConf: SparkConf): Unit
- Definition Classes
- LogStoreProvider
- def checkpoint(snapshotToCheckpoint: Snapshot): Unit
Creates a checkpoint using snapshotToCheckpoint. By default it uses the current log version. Note that this function captures and logs all exceptions, since the checkpoint shouldn't fail the overall commit operation.
- Definition Classes
- Checkpoints
- def checkpoint(): Unit
Creates a checkpoint using the default snapshot.
- Definition Classes
- Checkpoints
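Checkpoints are normally written automatically every checkpointInterval commits, but the no-arg overload can force one. A hedged sketch (table path is illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.delta.DeltaLog

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val deltaLog = DeltaLog.forTable(spark, "/tmp/delta/events")

// Force a checkpoint of the current (default) snapshot; exceptions are
// captured and logged rather than propagated, per the contract above.
deltaLog.checkpoint()
println(s"checkpoint interval = ${deltaLog.checkpointInterval}")
```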
- def checkpointAndCleanUpDeltaLog(snapshotToCheckpoint: Snapshot): Unit
- Attributes
- protected
- Definition Classes
- Checkpoints
- def checkpointInterval: Int
Returns the checkpoint interval for this log. Not transactional.
- Definition Classes
- Checkpoints
- val clock: Clock
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def createDataFrame(snapshot: Snapshot, addFiles: Seq[AddFile], isStreaming: Boolean = false, actionTypeOpt: Option[String] = None): DataFrame
Returns an org.apache.spark.sql.DataFrame containing the new files within the specified version range.
- def createLogDirectory(): Unit
Create the log directory. Unlike ensureLogDirectoryExist, this method doesn't check whether the log directory exists, and it ignores the return value of mkdirs.
- def createLogStore(sparkConf: SparkConf, hadoopConf: Configuration): LogStore
- Definition Classes
- LogStoreProvider
- def createLogStore(spark: SparkSession): LogStore
- Definition Classes
- LogStoreProvider
- def createRelation(partitionFilters: Seq[Expression] = Nil, snapshotToUseOpt: Option[Snapshot] = None, isTimeTravelQuery: Boolean = false, cdcOptions: CaseInsensitiveStringMap = CaseInsensitiveStringMap.empty): BaseRelation
Returns a BaseRelation that contains all of the data present in the table. This relation will be continually updated as files are added or removed from the table. However, a new BaseRelation must be requested in order to see changes to the schema.
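A BaseRelation can be turned back into a DataFrame via SparkSession. A hedged sketch using the defaulted parameters (table path is illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.delta.DeltaLog

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val deltaLog = DeltaLog.forTable(spark, "/tmp/delta/events")

// No partition filters, latest snapshot, no time travel, no CDC options.
val relation = deltaLog.createRelation()
val df = spark.baseRelationToDataFrame(relation)
df.printSchema()
```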
- def createSnapshot(initSegment: LogSegment, minFileRetentionTimestamp: Long, checkpointMetadataOptHint: Option[CheckpointMetaData]): Snapshot
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def createSnapshotFromGivenOrEquivalentLogSegment(initSegment: LogSegment)(snapshotCreator: (LogSegment) => Snapshot): Snapshot
Create a Snapshot from the given LogSegment. If we fail to create the snapshot, we will search for an equivalent LogSegment using a different checkpoint and retry up to DeltaSQLConf.DELTA_SNAPSHOT_LOADING_MAX_RETRIES times.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- val currentSnapshot: CapturedSnapshot
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- Annotations
- @volatile()
- val dataPath: Path
- Definition Classes
- DeltaLog → Checkpoints
- val defaultLogStoreClass: String
- Definition Classes
- LogStoreProvider
- val deltaLogLock: ReentrantLock
Use a ReentrantLock so that we can call lockInterruptibly.
- Attributes
- protected
- def deltaRetentionMillis: Long
Returns the duration in millis for how long to keep around obsolete logs. We may keep logs beyond this duration until the next calendar day to avoid constantly creating checkpoints.
- Definition Classes
- MetadataCleanup
- def doLogCleanup(): Unit
- Definition Classes
- MetadataCleanup
- def enableExpiredLogCleanup: Boolean
Whether to clean up expired log files and checkpoints.
- Definition Classes
- MetadataCleanup
- def ensureLogDirectoryExist(): Unit
Creates the log directory if it does not exist.
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def fileFormat(metadata: Metadata = metadata): FileFormat
Build the underlying Spark FileFormat of the Delta table with the specified metadata. With column mapping, some properties of the underlying file format might change during a transaction, so if possible, we should always pass in the latest transaction's metadata instead of one from a past snapshot.
- Definition Classes
- DeltaFileFormat
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def findLastCompleteCheckpoint(cv: CheckpointInstance): Option[CheckpointInstance]
Finds the first verified, complete checkpoint before the given version.
- cv
The CheckpointInstance to compare against
- Attributes
- protected
- Definition Classes
- Checkpoints
- def getChangeLogFiles(startVersion: Long, failOnDataLoss: Boolean = false): Iterator[(Long, FileStatus)]
Get access to all actions starting from "startVersion" (inclusive) via FileStatus. If startVersion doesn't exist, returns an empty Iterator.
- def getChanges(startVersion: Long, failOnDataLoss: Boolean = false): Iterator[(Long, Seq[Action])]
Get all actions starting from "startVersion" (inclusive). If startVersion doesn't exist, returns an empty Iterator.
- def getCheckpointMetadataForSegment(segment: LogSegment, checkpointMetadataOptHint: Option[CheckpointMetaData]): Option[CheckpointMetaData]
Returns the CheckpointMetaData for the given LogSegment. If the passed checkpointMetadataOptHint matches the segment, then it is returned directly.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
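The getChanges API above can be used to tail the transaction log version by version. A hedged sketch (table path is illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.delta.DeltaLog
import org.apache.spark.sql.delta.actions.AddFile

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val deltaLog = DeltaLog.forTable(spark, "/tmp/delta/events")

// Iterator of (version, actions) pairs, starting at version 0 inclusive;
// empty if that version doesn't exist.
deltaLog.getChanges(startVersion = 0L).foreach { case (version, actions) =>
  val added = actions.collect { case a: AddFile => a.path }
  println(s"v$version added ${added.size} file(s)")
}
```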
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def getLatestCompleteCheckpointFromList(instances: Array[CheckpointInstance], notLaterThan: CheckpointInstance): Option[CheckpointInstance]
Given a list of checkpoint files, pick the latest complete checkpoint instance that is not later than notLaterThan.
- Attributes
- protected
- Definition Classes
- Checkpoints
- def getLogSegmentForVersion(startCheckpoint: Option[Long], versionToLoad: Option[Long], files: Option[Array[FileStatus]]): Option[LogSegment]
Helper function for the getLogSegmentForVersion overload below. Called with a provided files list; it will then try to construct a new LogSegment using that.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def getLogSegmentForVersion(startCheckpoint: Option[Long], versionToLoad: Option[Long] = None): Option[LogSegment]
Get a list of files that can be used to compute a Snapshot at version versionToLoad. If versionToLoad is not provided, generates the list of files needed to load the latest version of the Delta table. This method also performs checks to ensure that the delta files are contiguous.
- startCheckpoint
A potential start version to perform the listing of the DeltaLog, typically that of a known checkpoint. If this version is not provided, we will start listing from version 0.
- versionToLoad
A specific version to load. Typically used with time travel and the Delta streaming source. If not provided, we will try to load the latest version of the table.
- returns
Some LogSegment to build a Snapshot if files do exist after the given startCheckpoint. None, if the directory was missing or empty.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def getLogSegmentFrom(startingCheckpoint: Option[CheckpointMetaData]): Option[LogSegment]
Get the LogSegment that will help in computing the Snapshot of the table at DeltaLog initialization, or None if the directory was empty/missing.
- startingCheckpoint
A checkpoint that we can start our listing from
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def getLogStoreConfValue(key: String, sparkConf: SparkConf): Option[String]
We accept keys both with and without the spark. prefix to maintain compatibility across the Delta ecosystem.
- key
the spark-prefixed key to access
- Definition Classes
- LogStoreProvider
- def getSnapshotAt(version: Long, commitTimestamp: Option[Long] = None, lastCheckpointHint: Option[CheckpointInstance] = None): Snapshot
Get the snapshot at version.
- Definition Classes
- SnapshotManagement
- def getSnapshotAtInit(lastCheckpointOpt: Option[CheckpointMetaData]): CapturedSnapshot
Load the Snapshot for this Delta table at initialization. This method uses the lastCheckpoint file as a hint on where to start listing the transaction log directory. If the _delta_log directory doesn't exist, this method returns an InitialSnapshot.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- lazy val history: DeltaHistoryManager
Delta History Manager containing version and commit history.
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def installSnapshotInternal(previousSnapshot: Snapshot, segmentOpt: Option[LogSegment], updateTimestamp: Long, isAsync: Boolean): Snapshot
Install the provided segmentOpt as the currentSnapshot on the cluster.
- Definition Classes
- SnapshotManagement
- def isDeltaCommitOrCheckpointFile(path: Path): Boolean
Returns true if the path is a delta log file. Delta log files can be delta commit files (e.g., 000000000.json) or checkpoint files (e.g., 000000001.checkpoint.00001.00003.parquet).
- path
Path of a file
- returns
Whether the file is a delta log file
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isSameLogAs(otherLog: DeltaLog): Boolean
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def listFrom(startVersion: Long): Iterator[FileStatus]
Get an iterator of files in the _delta_log directory starting with the startVersion.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def listFromOrNone(startVersion: Long): Option[Iterator[FileStatus]]
Returns an iterator containing a list of files found from the provided path.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def lockInterruptibly[T](body: => T): T
Run body inside the deltaLogLock lock using lockInterruptibly so that the thread can be interrupted while waiting for the lock.
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logConsole(line: String): Unit
- Definition Classes
- DatabricksLogging
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- val logPath: Path
- Definition Classes
- DeltaLog → ReadChecksum → Checkpoints
- val logStoreClassConfKey: String
- Definition Classes
- LogStoreProvider
- def logStoreSchemeConfKey(scheme: String): String
- Definition Classes
- LogStoreProvider
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def manuallyLoadCheckpoint(cv: CheckpointInstance): CheckpointMetaData
Loads the given checkpoint manually to come up with the CheckpointMetaData.
- Attributes
- protected
- Definition Classes
- Checkpoints
- def maxSnapshotLineageLength: Int
The max lineage length of a Snapshot before Delta forces building a Snapshot from scratch. Delta builds a Snapshot on top of the previous one if it doesn't see a checkpoint. However, there is a race condition: when two writers are writing at the same time, one writer may fail to pick up checkpoints written by the other, so the lineage will grow and eventually cause a StackOverflowError. Hence we force building a Snapshot from scratch when the lineage length is too large.
- def metadata: Metadata
Return the current metadata for preparing this file format.
- Attributes
- protected
- Definition Classes
- DeltaLog → DeltaFileFormat → Checkpoints
- def minFileRetentionTimestamp: Long
Tombstones before this timestamp will be dropped from the state and the files can be garbage collected.
- def minSetTransactionRetentionTimestamp: Option[Long]
SetTransactions before this timestamp will be considered expired and dropped from the state, but no files will be deleted.
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def newDeltaHadoopConf(): Configuration
Returns the Hadoop Configuration object which can be used to access the file system. All Delta code should use this method to create the Hadoop Configuration object, so that the Hadoop file system configurations specified in DataFrame options will come into effect.
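A hedged sketch of honoring per-DataFrame filesystem options when touching the log directory (table path is illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.delta.DeltaLog

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val deltaLog = DeltaLog.forTable(spark, "/tmp/delta/events")

// Use the log's Hadoop conf, not a raw one, so filesystem options
// supplied via DataFrame options (e.g. credentials) are applied.
val hadoopConf = deltaLog.newDeltaHadoopConf()
val fs = deltaLog.logPath.getFileSystem(hadoopConf)
println(s"_delta_log exists: ${fs.exists(deltaLog.logPath)}")
```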
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- val options: Map[String, String]
- def protocolRead(protocol: Protocol): Unit
Asserts that the client is up to date with the protocol and allowed to read the table that is using the given protocol.
- def protocolWrite(protocol: Protocol, logUpgradeMessage: Boolean = true): Unit
Asserts that the client is up to date with the protocol and allowed to write to the table that is using the given protocol.
- def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit
Used to record the occurrence of a single event or report detailed, operation-specific statistics.
- path
Used to log the path of the delta table when deltaLog is null.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration as well as the success or failure of an operation on a deltaLog.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration as well as the success or failure of an operation on a tahoePath.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordFrameProfile[T](group: String, name: String)(thunk: => T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = null, silent: Boolean = true)(thunk: => S): S
- Definition Classes
- DatabricksLogging
- def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- def replaceSnapshot(newSnapshot: Snapshot, updateTimestamp: Long): Unit
Replace the given snapshot with the provided one.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def snapshot: Snapshot
Returns the current snapshot. Note this does not automatically call update().
- Definition Classes
- SnapshotManagement
- def spark: SparkSession
Return the current Spark session used.
- Attributes
- protected
- Definition Classes
- DeltaLog → DeltaFileFormat
- def startTransaction(): OptimisticTransaction
Returns a new OptimisticTransaction that can be used to read the current state of the log and then commit updates. The reads and updates will be checked for logical conflicts with any concurrent writes to the log.
Note that all reads in a transaction must go through the returned transaction object, and not directly to the DeltaLog; otherwise they will not be checked for conflicts.
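A hedged sketch of the optimistic-concurrency protocol: read through the transaction so the reads are recorded, then commit; conflicts with concurrent writers surface at commit time. ManualUpdate and the table path are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.delta.{DeltaLog, DeltaOperations}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val deltaLog = DeltaLog.forTable(spark, "/tmp/delta/events")

val txn = deltaLog.startTransaction()
// Reads must go through the transaction to be conflict-checked.
val currentFiles = txn.filterFiles()
// Commit a (here empty) set of actions; throws on logical conflicts.
txn.commit(Nil, DeltaOperations.ManualUpdate)
```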
- lazy val store: LogStore
Used to read and write physical log files and checkpoints.
- Definition Classes
- DeltaLog → ReadChecksum → Checkpoints
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def tableExists: Boolean
Whether a Delta table exists at this directory.
- def tableId: String
The unique identifier for this table.
- def toString(): String
- Definition Classes
- AnyRef → Any
- def update(stalenessAcceptable: Boolean = false, checkIfUpdatedSinceTs: Option[Long] = None): Snapshot
Update ActionLog by applying the new delta files if any.
- stalenessAcceptable
Whether we can accept working with a stale version of the table. If the table has surpassed our staleness tolerance, we will update to the latest state of the table synchronously. If staleness is acceptable, and the table hasn't passed the staleness tolerance, we will kick off a job in the background to update the table state, and can return a stale snapshot in the meantime.
- checkIfUpdatedSinceTs
Skip the update if we've already updated the snapshot since the specified timestamp.
- Definition Classes
- SnapshotManagement
- def updateInternal(isAsync: Boolean): Snapshot
Queries the store for new delta files and applies them to the current state. Note: the caller should hold deltaLogLock before calling this method.
- Attributes
- protected
- Definition Classes
- SnapshotManagement
- def upgradeProtocol(newVersion: Protocol = Protocol()): Unit
Upgrade the table's protocol version, by default to the maximum recognized reader and writer versions in this DBR release.
- def verifyLogStoreConfs(sparkConf: SparkConf): Unit
Check for conflicting LogStore configs in the spark configuration.
To maintain compatibility across the Delta ecosystem, we accept keys both with and without the "spark." prefix. This means for setting the class conf, we accept both "spark.delta.logStore.class" and "delta.logStore.class" and for scheme confs we accept both "spark.delta.logStore.${scheme}.impl" and "delta.logStore.${scheme}.impl"
If a conf is set both with and without the spark prefix, it must be set to the same value, otherwise we throw an error.
- Definition Classes
- LogStoreProvider
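A hedged sketch of the two accepted spellings of the LogStore class conf; if both are set they must agree, otherwise verifyLogStoreConfs throws (the LogStore class name is one real example implementation):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.delta.logStore.class",
    "org.apache.spark.sql.delta.storage.S3SingleDriverLogStore")
  // Equivalent key without the "spark." prefix; must match if both are set:
  // .config("delta.logStore.class",
  //   "org.apache.spark.sql.delta.storage.S3SingleDriverLogStore")
  .getOrCreate()
```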
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- def withDmqTag[T](thunk: => T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def withNewTransaction[T](thunk: (OptimisticTransaction) => T): T
Execute a piece of code within a new OptimisticTransaction. Read/write sets will be recorded for this table, and all other tables will be read at a snapshot that is pinned on the first access.
- Note
This uses a thread-local variable to make the active transaction visible, so do not use multi-threaded code in the provided thunk.
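The thunk-based form above manages the transaction lifecycle for you. A hedged sketch (ManualUpdate and the table path are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.delta.{DeltaLog, DeltaOperations}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
val deltaLog = DeltaLog.forTable(spark, "/tmp/delta/events")

deltaLog.withNewTransaction { txn =>
  // The active transaction is thread-local: keep all work on this thread.
  val files = txn.filterFiles()
  txn.commit(Nil, DeltaOperations.ManualUpdate)
}
```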
- def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: => T): T
Report a log to indicate some command is running.
- Definition Classes
- DeltaProgressReporter
- def writeCheckpointFiles(snapshotToCheckpoint: Snapshot): CheckpointMetaData
- Attributes
- protected
- Definition Classes
- Checkpoints