package delta
Type Members
- case class CapturedSnapshot(snapshot: Snapshot, updateTimestamp: Long) extends Product with Serializable
Wraps the most recently updated snapshot along with the timestamp at which the update was started. Defined outside the class since it is used in tests.
- case class CheckpointInstance(version: Long, numParts: Option[Int]) extends Ordered[CheckpointInstance] with Product with Serializable
A class to help with comparing checkpoints with each other, where we may have had concurrent writers that checkpoint with different numbers of parts.
- case class CheckpointMetaData(version: Long, size: Long, parts: Option[Int], sizeInBytes: Option[Long], numOfAddFiles: Option[Long], checkpointSchema: Option[StructType], checksum: Option[String] = None) extends Product with Serializable
Records information about a checkpoint.
This class provides the checksum validation logic needed to ensure that the content of the LAST_CHECKPOINT file points to valid JSON. Readers might read some part from an old file and some part from a new file (if the file is read across multiple requests). In some rare scenarios, the split read might produce valid JSON that readers can parse into a CheckpointMetaData object containing invalid data. To prevent using it, we do a checksum match on the read JSON to validate that it is consistent.
For old Delta versions, which do not have the checksum logic, we want to make sure that the old fields (i.e. version, size, parts) are together at the beginning of the last_checkpoint JSON. All these fields together are less than 50 bytes, so even in a split-read scenario, old Delta readers that do not have checksum validation logic get all three fields from one read request. For this reason, we use JsonPropertyOrder to force them together at the beginning.
- version
the version of this checkpoint
- size
the number of actions in the checkpoint, -1 if the information is unavailable.
- parts
the number of parts when the checkpoint has multiple parts. None if this is a singular checkpoint
- sizeInBytes
the number of bytes of the checkpoint
- numOfAddFiles
the number of AddFile actions in the checkpoint
- checkpointSchema
the schema of the underlying checkpoint files
- checksum
the checksum of the CheckpointMetaData.
- Annotations
- @JsonPropertyOrder()
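The split-read protection described above can be illustrated with a small sketch: recompute a checksum over the JSON (minus the stored checksum field) and compare it against the stored value. The helper names below are hypothetical, and the real canonicalization in Delta differs; this only shows the idea.

```scala
import java.security.MessageDigest

// Hypothetical helpers, not the actual Delta implementation.
def md5Hex(json: String): String =
  MessageDigest.getInstance("MD5")
    .digest(json.getBytes("UTF-8"))
    .map("%02x".format(_)).mkString

// A torn or split read would produce JSON whose recomputed checksum
// no longer matches the stored one, so the metadata is rejected.
def isConsistent(jsonWithoutChecksum: String, storedChecksum: Option[String]): Boolean =
  storedChecksum.forall(_ == md5Hex(jsonWithoutChecksum))
```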
- trait Checkpoints extends DeltaLogging
- case class ColumnMappingException(msg: String, mode: DeltaColumnMappingMode) extends AnalysisException with Product with Serializable
- class ColumnMappingUnsupportedException extends UnsupportedOperationException
Errors thrown around column mapping.
- case class CommitStats(startVersion: Long, commitVersion: Long, readVersion: Long, txnDurationMs: Long, commitDurationMs: Long, fsWriteDurationMs: Long, stateReconstructionDurationMs: Long, numAdd: Int, numRemove: Int, bytesNew: Long, numFilesTotal: Long, sizeInBytesTotal: Long, numCdcFiles: Long, cdcBytesNew: Long, protocol: Protocol, commitSizeBytes: Long, checkpointSizeBytes: Long, totalCommitsSizeSinceLastCheckpoint: Long, checkpointAttempt: Boolean, info: CommitInfo, newMetadata: Option[Metadata], numAbsolutePathsInAdd: Int, numDistinctPartitionsInAdd: Int, numPartitionColumnsInTable: Int, isolationLevel: String, fileSizeHistogram: Option[FileSizeHistogram] = None, addFilesHistogram: Option[FileSizeHistogram] = None, removeFilesHistogram: Option[FileSizeHistogram] = None, txnId: Option[String] = None) extends Product with Serializable
Record metrics about a successful commit.
- class ConcurrentAppendException extends io.delta.exceptions.DeltaConcurrentModificationException
This class is kept for backward compatibility. Use io.delta.exceptions.ConcurrentAppendException instead.
- class ConcurrentDeleteDeleteException extends io.delta.exceptions.DeltaConcurrentModificationException
This class is kept for backward compatibility. Use io.delta.exceptions.ConcurrentDeleteDeleteException instead.
- class ConcurrentDeleteReadException extends io.delta.exceptions.DeltaConcurrentModificationException
This class is kept for backward compatibility. Use io.delta.exceptions.ConcurrentDeleteReadException instead.
- class ConcurrentTransactionException extends io.delta.exceptions.DeltaConcurrentModificationException
This class is kept for backward compatibility. Use io.delta.exceptions.ConcurrentTransactionException instead.
- class ConcurrentWriteException extends io.delta.exceptions.DeltaConcurrentModificationException
This class is kept for backward compatibility. Use io.delta.exceptions.ConcurrentWriteException instead.
- case class DateFormatPartitionExpr(partitionColumn: String, format: String) extends OptimizablePartitionExpression with Product with Serializable
The rules for the generation expression DATE_FORMAT(col, format), such as DATE_FORMAT(timestamp, 'yyyy-MM') and DATE_FORMAT(timestamp, 'yyyy-MM-dd-HH').
- partitionColumn
the partition column name using DATE_FORMAT in its generation expression.
- format
the format parameter of DATE_FORMAT in the generation expression. Behavior of unix_timestamp under different time parser policies:

            unix_timestamp('12345-12', 'yyyy-MM') | unix_timestamp('+12345-12', 'yyyy-MM')
EXCEPTION   fail                                  | 327432240000
CORRECTED   null                                  | 327432240000
LEGACY      327432240000                          | null
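As an illustration of the kind of table this rule targets, a partition column can be generated with DATE_FORMAT as follows. This is a hedged sketch: the table and column names are hypothetical, and a SparkSession named spark is assumed.

```scala
// Partitioning on a month string derived from a timestamp column;
// data filters on eventTime can then be converted to partition filters.
spark.sql("""
  CREATE TABLE events(
    eventTime timestamp,
    month string GENERATED ALWAYS AS ( DATE_FORMAT(eventTime, 'yyyy-MM') )
  )
  USING delta
  PARTITIONED BY (month)
""")
```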
- case class DatePartitionExpr(partitionColumn: String) extends OptimizablePartitionExpression with Product with Serializable
The rules for the generation expression CAST(col AS DATE).
- case class DayPartitionExpr(dayPart: String) extends OptimizablePartitionExpression with Product with Serializable
This is a placeholder to catch day(col) so that we can merge YearPartitionExpr, MonthPartitionExpr and DayPartitionExpr into YearMonthDayPartitionExpr.
- dayPart
the day partition column name.
- class DeltaAnalysis extends Rule[LogicalPlan] with AnalysisHelper with DeltaLogging
Analysis rules for Delta. Currently, these rules enable schema enforcement / evolution with INSERT INTO.
- class DeltaAnalysisException extends AnalysisException with DeltaThrowable
- trait DeltaColumnMappingBase extends DeltaLogging
- sealed trait DeltaColumnMappingMode extends AnyRef
A trait for Delta column mapping modes.
- class DeltaColumnMappingUnsupportedException extends ColumnMappingUnsupportedException with DeltaThrowable
- abstract class DeltaConcurrentModificationException extends ConcurrentModificationException
The base class for all Tahoe commit conflict exceptions.
- case class DeltaConfig[T](key: String, defaultValue: String, fromString: (String) => T, validationFunction: (T) => Boolean, helpMessage: String, minimumProtocolVersion: Option[Protocol] = None, editable: Boolean = true, alternateKeys: Seq[String] = Seq.empty) extends Product with Serializable
- trait DeltaConfigsBase extends DeltaLogging
Contains list of reservoir configs and validation checks.
- case class DeltaDynamicPartitionOverwriteCommand(table: NamedRelation, deltaTable: DeltaTableV2, query: LogicalPlan, writeOptions: Map[String, String], isByName: Boolean) extends LogicalPlan with RunnableCommand with V2WriteCommand with Product with Serializable
A RunnableCommand that will execute dynamic partition overwrite using WriteIntoDelta.
This is a workaround for Spark not supporting V1 fallback for dynamic partition overwrite. Note the following details:
- Extends V2WriteCommand so that Spark can transform this plan in the same way as other commands like AppendData.
- Exposes the query as a child so that the Spark optimizer can optimize it.
- trait DeltaErrorsBase extends DocsPath with DeltaLogging
A holder object for Delta errors.
IMPORTANT: Any time you add a test that references the docs, add it to the Seq defined in DeltaErrorsSuite so that the generated doc links can be verified to work in docs.delta.io.
- class DeltaFileAlreadyExistsException extends FileAlreadyExistsException with DeltaThrowable
- trait DeltaFileFormat extends AnyRef
- class DeltaFileNotFoundException extends FileNotFoundException with DeltaThrowable
- case class DeltaHistory(version: Option[Long], timestamp: Timestamp, userId: Option[String], userName: Option[String], operation: String, operationParameters: Map[String, String], job: Option[JobInfo], notebook: Option[NotebookInfo], clusterId: Option[String], readVersion: Option[Long], isolationLevel: Option[String], isBlindAppend: Option[Boolean], operationMetrics: Option[Map[String, String]], userMetadata: Option[String], engineInfo: Option[String]) extends CommitMarker with Product with Serializable
Class describing the output schema of org.apache.spark.sql.delta.commands.DescribeDeltaHistoryCommand.
- class DeltaHistoryManager extends DeltaLogging
This class keeps track of the versions of commits and their timestamps for a Delta table to help with operations like describing the history of a table.
- class DeltaIOException extends IOException with DeltaThrowable
- class DeltaIllegalArgumentException extends IllegalArgumentException with DeltaThrowable
- class DeltaIllegalStateException extends IllegalStateException with DeltaThrowable
- class DeltaIndexOutOfBoundsException extends IndexOutOfBoundsException with DeltaThrowable
- class DeltaLog extends Checkpoints with MetadataCleanup with LogStoreProvider with SnapshotManagement with DeltaFileFormat with ReadChecksum
Used to query the current state of the log as well as modify it by adding new atomic collections of actions.
Internally, this class implements an optimistic concurrency control algorithm to handle multiple readers or writers. Any single read is guaranteed to see a consistent snapshot of the table.
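A minimal usage sketch, assuming a running SparkSession and an existing Delta table at the path shown (the path is hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.delta.DeltaLog

val spark = SparkSession.builder().appName("delta-log-demo").getOrCreate()

// Obtain the transaction log for a table and refresh to its latest state.
val deltaLog = DeltaLog.forTable(spark, "/tmp/delta/events")
val snapshot = deltaLog.update()
println(s"version=${snapshot.version}, files=${snapshot.allFiles.count()}")
```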
- case class DeltaLogFileIndex extends FileIndex with Logging with Product with Serializable
A specialized file index for files found in the _delta_log directory. By using this file index, we avoid any additional file listing, partitioning inference, and file existence checks when computing the state of a Delta table.
- trait DeltaOptionParser extends AnyRef
- class DeltaOptions extends DeltaWriteOptions with DeltaReadOptions with Serializable
Options for the Delta data source.
- class DeltaParquetFileFormat extends ParquetFileFormat
A thin wrapper over the Parquet file format to support column names without restrictions.
- trait DeltaReadOptions extends DeltaOptionParser
- class DeltaRuntimeException extends RuntimeException with DeltaThrowable
- class DeltaSparkException extends SparkException with DeltaThrowable
- sealed trait DeltaStartingVersion extends AnyRef
Definitions for the starting version of a Delta stream.
- case class DeltaTableIdentifier(path: Option[String] = None, table: Option[TableIdentifier] = None) extends Product with Serializable
An identifier for a Delta table containing either the path or the table identifier.
- trait DeltaThrowable extends SparkThrowable
The base trait for all exceptions in the Delta code path.
- case class DeltaTimeTravelSpec(timestamp: Option[Expression], version: Option[Long], creationSource: Option[String]) extends DeltaLogging with Product with Serializable
The specification to time travel a Delta table to the given timestamp or version.
- timestamp
An expression that can be evaluated into a timestamp. The expression cannot be a subquery.
- version
The version of the table to time travel to. Must be >= 0.
- creationSource
The API used to perform time travel, e.g. atSyntax, dfReader or SQL.
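Time travel can be expressed through the DataFrame reader options. A sketch, assuming a SparkSession named spark and an existing Delta table at the (hypothetical) path:

```scala
// Read the table as of a specific version...
val asOfVersion = spark.read.format("delta")
  .option("versionAsOf", 0)
  .load("/tmp/delta/events")

// ...or as of a timestamp.
val asOfTime = spark.read.format("delta")
  .option("timestampAsOf", "2021-01-01")
  .load("/tmp/delta/events")
```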
- class DeltaUnsupportedOperationException extends UnsupportedOperationException with DeltaThrowable
- case class DeltaUnsupportedOperationsCheck(spark: SparkSession) extends (LogicalPlan) => Unit with DeltaLogging with Product with Serializable
A rule to add helpful error messages when Delta is being used with unsupported Hive operations or if an unsupported operation is being made, e.g. a DML operation like INSERT/UPDATE/DELETE/MERGE when a table doesn't exist.
- trait DeltaWriteOptions extends DeltaWriteOptionsImpl with DeltaOptionParser
- trait DeltaWriteOptionsImpl extends DeltaOptionParser
- trait DocsPath extends AnyRef
- case class HourPartitionExpr(hourPart: String) extends OptimizablePartitionExpression with Product with Serializable
This is a placeholder to catch hour(col) so that we can merge YearPartitionExpr, MonthPartitionExpr, DayPartitionExpr and HourPartitionExpr into YearMonthDayHourPartitionExpr.
- case class IdentityPartitionExpr(partitionColumn: String) extends OptimizablePartitionExpression with Product with Serializable
The rules for the generation of identity expressions, used for partitioning on a nested column.
The rules for the generation of identity expressions, used for partitioning on a nested column. Note: writing an empty string to a partition column would become null (SPARK-24438), so generated partition filters always pick up the null partition for safety.
- partitionColumn
the partition column name used in the generation expression.
- class InitialSnapshot extends Snapshot
An initial snapshot with only metadata specified. Useful for creating a DataFrame from an existing parquet table during its conversion to delta.
- sealed trait IsolationLevel extends AnyRef
Trait that defines the level of consistency guarantee that is going to be provided by OptimisticTransaction.commit(). Serializable is the most strict level and SnapshotIsolation is the least strict one.
- See also
IsolationLevel.allLevelsInDescOrder for all the levels in the descending order of strictness and IsolationLevel.DEFAULT for the default table isolation level.
- case class LogSegment(logPath: Path, version: Long, deltas: Seq[FileStatus], checkpoint: Seq[FileStatus], checkpointVersionOpt: Option[Long], lastCommitTimestamp: Long) extends Product with Serializable
Provides information around which files in the transaction log need to be read to create the given version of the log.
- logPath
The path to the _delta_log directory
- version
The Snapshot version to generate
- deltas
The delta commit files (.json) to read
- checkpoint
The checkpoint file to read
- checkpointVersionOpt
The checkpoint version used to start replay
- lastCommitTimestamp
The "unadjusted" timestamp of the last commit within this segment. By unadjusted, we mean that the commit timestamps may not necessarily be monotonically increasing for the commits within this segment.
- class MetadataChangedException extends io.delta.exceptions.DeltaConcurrentModificationException
This class is kept for backward compatibility. Use io.delta.exceptions.MetadataChangedException instead.
- trait MetadataCleanup extends DeltaLogging
Cleans up expired Delta table metadata.
- class MetadataMismatchErrorBuilder extends AnyRef
A helper class for building a helpful error message in case of metadata mismatches.
- case class MonthPartitionExpr(monthPart: String) extends OptimizablePartitionExpression with Product with Serializable
This is a placeholder to catch month(col) so that we can merge YearPartitionExpr and MonthPartitionExpr into YearMonthDayPartitionExpr.
- monthPart
the month partition column name.
- class OptimisticTransaction extends OptimisticTransactionImpl with DeltaLogging
Used to perform a set of reads in a transaction and then commit a set of updates to the state of the log. All reads from the DeltaLog MUST go through this instance rather than directly to the DeltaLog; otherwise they will not be checked for logical conflicts with concurrent updates.
This class is not thread-safe.
- trait OptimisticTransactionImpl extends TransactionalWrite with SQLMetricsReporting with DeltaScanGenerator with DeltaLogging
Used to perform a set of reads in a transaction and then commit a set of updates to the state of the log. All reads from the DeltaLog MUST go through this instance rather than directly to the DeltaLog; otherwise they will not be checked for logical conflicts with concurrent updates.
This trait is not thread-safe.
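The read-then-commit pattern can be sketched as follows. This is illustrative only: it assumes a SparkSession named spark, an existing table at a hypothetical path, and commits no actions.

```scala
import org.apache.spark.sql.delta.{DeltaLog, DeltaOperations}

val log = DeltaLog.forTable(spark, "/tmp/delta/events")
log.withNewTransaction { txn =>
  // Reads performed through the transaction are tracked so the commit
  // can be checked for logical conflicts with concurrent updates.
  val files = txn.filterFiles()
  // Add/remove actions derived from `files` would be committed here;
  // an empty commit with a manual-update operation is shown as a placeholder.
  txn.commit(Seq.empty, DeltaOperations.ManualUpdate)
}
```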
- sealed trait OptimizablePartitionExpression extends AnyRef
Defines rules to convert a data filter to a partition filter for a special generation expression of a partition column.
Note:
- This may be shared across multiple SparkSessions; implementations should not store any state (such as expressions) referring to a specific SparkSession.
- Partition columns may have different behaviors than data columns. For example, writing an empty string to a partition column would become null (SPARK-24438). We need to pay attention to these slight behavior differences and make sure applying the auto-generated partition filters would still return the same result as if they were not applied.
- case class PreprocessTableDelete(sqlConf: SQLConf) extends Rule[LogicalPlan] with Product with Serializable
Preprocess the DeltaDelete plan to convert to DeleteCommand.
- case class PreprocessTableMerge(conf: SQLConf) extends Rule[LogicalPlan] with UpdateExpressionsSupport with Product with Serializable
- case class PreprocessTableRestore(sparkSession: SparkSession) extends Rule[LogicalPlan] with Product with Serializable
Preprocesses the RestoreTableStatement logical plan before converting it to RestoreTableCommand. Resolves the UnresolvedRelation in RestoreTableStatement's child TimeTravel. Currently Delta depends on Spark 3.2, which does not resolve the UnresolvedRelation in TimeTravel; once Delta upgrades to Spark 3.3, this code can be removed.
- case class PreprocessTableUpdate(sqlConf: SQLConf) extends Rule[LogicalPlan] with UpdateExpressionsSupport with Product with Serializable
Preprocesses the DeltaUpdateTable logical plan before converting it to UpdateCommand.
- Adjusts the column order, which could be out of order, based on the destination table.
- Generates expressions to compute the value of all target columns in the Delta table, taking into account that the specified SET clause may only update some columns or nested fields of columns.
- class ProtocolChangedException extends io.delta.exceptions.DeltaConcurrentModificationException
This class is kept for backward compatibility. Use io.delta.exceptions.ProtocolChangedException instead.
- trait ReadChecksum extends DeltaLogging
Read checksum files.
- trait RecordChecksum extends DeltaLogging
Record the state of the table as a checksum file along with a commit.
- class Snapshot extends StateCache with StatisticsCollection with DataSkippingReader with DeltaLogging
An immutable snapshot of the state of the log at some delta version. Internally this class manages the replay of actions stored in checkpoint or delta files.
After resolving any new actions, it caches the result and collects the following basic information to the driver:
- Protocol Version
- Metadata
- Transaction state
- trait SnapshotManagement extends AnyRef
Manages the creation, computation, and access of Snapshots for Delta tables. Responsibilities include:
- Figuring out the set of files that are required to compute a specific version of a table
- Updating and exposing the latest snapshot of the Delta table in a thread-safe manner
- case class StartingVersion(version: Long) extends DeltaStartingVersion with Product with Serializable
- case class SubstringPartitionExpr(partitionColumn: String, substringPos: Int, substringLen: Int) extends OptimizablePartitionExpression with Product with Serializable
The rules for the generation expression SUBSTRING(col, pos, len). Note:
- Writing an empty string to a partition column would become null (SPARK-24438), so generated partition filters always pick up the null partition for safety.
- When pos is 0, we also support optimizations for comparison operators. When pos is not 0, we only support optimizations for EqualTo.
- partitionColumn
the partition column name using SUBSTRING in its generation expression.
- substringPos
the pos parameter of SUBSTRING in the generation expression.
- substringLen
the len parameter of SUBSTRING in the generation expression.
- trait UpdateExpressionsSupport extends CastSupport with SQLConfHelper with AnalysisHelper
Trait with helper functions to generate expressions to update target columns, even if they are nested fields.
- trait ValidateChecksum extends DeltaLogging
Verify the state of the table using the checksum information.
- case class VersionChecksum(tableSizeBytes: Long, numFiles: Long, numMetadata: Long, numProtocol: Long, protocol: Protocol, metadata: Metadata, histogramOpt: Option[FileSizeHistogram], txnId: Option[String]) extends Product with Serializable
Stats calculated within a snapshot, which we store alongside individual transactions for verification.
- tableSizeBytes
The size of the table in bytes
- numFiles
Number of AddFile actions in the snapshot
- numMetadata
Number of Metadata actions in the snapshot
- numProtocol
Number of Protocol actions in the snapshot
- histogramOpt
Optional file size histogram
- txnId
Optional transaction identifier
- case class VersionNotFoundException(userVersion: Long, earliest: Long, latest: Long) extends AnalysisException with Product with Serializable
Thrown when time travelling to a version that does not exist in the Delta Log.
- userVersion
the version being time travelled to
- earliest
the earliest version available in the Delta Log
- latest
the latest version available in the Delta Log
- case class YearMonthDayHourPartitionExpr(yearPart: String, monthPart: String, dayPart: String, hourPart: String) extends OptimizablePartitionExpression with Product with Serializable
Optimizes the case that four partition columns use YEAR, MONTH, DAY and HOUR on the same column, such as YEAR(eventTime), MONTH(eventTime), DAY(eventTime), HOUR(eventTime).
- yearPart
the year partition column name
- monthPart
the month partition column name
- dayPart
the day partition column name
- hourPart
the hour partition column name
- case class YearMonthDayPartitionExpr(yearPart: String, monthPart: String, dayPart: String) extends OptimizablePartitionExpression with Product with Serializable
Optimizes the case that three partition columns use YEAR, MONTH and DAY on the same column, such as YEAR(eventTime), MONTH(eventTime) and DAY(eventTime).
- yearPart
the year partition column name
- monthPart
the month partition column name
- dayPart
the day partition column name
- case class YearMonthPartitionExpr(yearPart: String, monthPart: String) extends OptimizablePartitionExpression with Product with Serializable
Optimizes the case that two partition columns use YEAR and MONTH on the same column, such as YEAR(eventTime) and MONTH(eventTime).
- yearPart
the year partition column name
- monthPart
the month partition column name
- case class YearPartitionExpr(yearPart: String) extends OptimizablePartitionExpression with Product with Serializable
The rules for the generation expression YEAR(col).
- yearPart
the year partition column name.
Value Members
- object AppendDelta
- object CheckpointInstance extends Serializable
- object CheckpointMetaData extends Serializable
- object CheckpointV2
Utility methods for generating and using V2 checkpoints. V2 checkpoints have partition values and statistics as struct fields of the add column.
- object Checkpoints extends DeltaLogging
- object ColumnWithDefaultExprUtils extends DeltaLogging
Provide utilities to handle columns with default expressions.
- object DeltaAnalysisException extends Serializable
- object DeltaColumnMapping extends DeltaColumnMappingBase
- object DeltaColumnMappingMode
- object DeltaConfigs extends DeltaConfigsBase
- object DeltaErrors extends DeltaErrorsBase
- object DeltaFullTable
Extractor Object for pulling out the full table scan of a Delta table.
- object DeltaHistory extends Serializable
- object DeltaHistoryManager extends DeltaLogging
Contains many utility methods that can also be executed on Spark executors.
- object DeltaLog extends DeltaLogging
- object DeltaLogFileIndex extends Serializable
- object DeltaOperations
Exhaustive list of operations that can be performed on a Delta table. These operations are tracked as the first line in delta logs, and power DESCRIBE HISTORY for Delta tables.
- object DeltaOptions extends DeltaLogging with Serializable
- object DeltaRelation extends DeltaLogging
Matchers for dealing with a Delta table.
- object DeltaTable
Extractor Object for pulling out the table scan of a Delta table. It could be a full scan or a partial scan.
- object DeltaTableIdentifier extends Logging with Serializable
Utilities for DeltaTableIdentifier. TODO(burak): Get rid of these utilities. DeltaCatalog should be the skinny-waist for figuring these things out.
- object DeltaTableUtils extends PredicateHelper with DeltaLogging
- object DeltaThrowableHelper
The helper object for the Delta code base to pick the error class template and compile the exception message.
- object DeltaTimeTravelSpec extends Serializable
- object DeltaUDF
- object DeltaViewHelper
- object DynamicPartitionOverwriteDelta
- object ExtractBaseColumn
Finds the full dot-separated path to a field and the data type of the field. This unifies handling of nested and non-nested fields, and allows pattern matching on the data type.
- object GeneratedColumn extends DeltaLogging with AnalysisHelper
Provide utility methods to implement Generated Columns for Delta. Users can use the following SQL syntax to create a table with generated columns:

CREATE TABLE table_identifier(
  column_name column_type,
  column_name column_type GENERATED ALWAYS AS ( generation_expr ),
  ...
)
USING delta
[ PARTITIONED BY (partition_column_name, ...) ]

This is an example:

CREATE TABLE foo(
  id bigint,
  type string,
  subType string GENERATED ALWAYS AS ( SUBSTRING(type FROM 0 FOR 4) ),
  data string,
  eventTime timestamp,
  day date GENERATED ALWAYS AS ( days(eventTime) )
)
USING delta
PARTITIONED BY (type, day)

When writing to a table, for these generated columns:
- If the output is missing a generated column, we will add an expression to generate it.
- If a generated column exists in the output, we will add a constraint to ensure the given value doesn't violate the generation expression.
- case object IdMapping extends DeltaColumnMappingMode with Product with Serializable
Id Mapping uses column ID as the true identifier of a column. Column IDs are stored as StructField metadata in the schema and will be used when reading and writing Parquet files. The Parquet files in this mode will also have corresponding field Ids for each column in their file schema.
This mode is used for tables converted from Iceberg.
- object IsolationLevel
- object LogSegment extends Serializable
- case object NameMapping extends DeltaColumnMappingMode with Product with Serializable
Name Mapping uses the physical column name as the true identifier of a column. The physical name is stored as part of StructField metadata in the schema and will be used when reading and writing Parquet files. Even if id mapping can be used for reading the physical files, name mapping is used for reading statistics and partition values in the DeltaLog.
- case object NoMapping extends DeltaColumnMappingMode with Product with Serializable
No mapping mode uses a column's display name as its true identifier to read and write data.
This is the default mode and is the same mode as Delta always has been.
- object NodeWithOnlyDeterministicProjectAndFilter
- object OptimisticTransaction
- object OptimizablePartitionExpression
- object OverwriteDelta
- case object Serializable extends IsolationLevel with Product with Serializable
This isolation level will ensure serializability between all read and write operations. Specifically, for write operations, this mode will ensure that the result of the table will be perfectly consistent with the visible history of operations, that is, as if all the operations were executed sequentially one by one.
- object Snapshot extends DeltaLogging
- case object SnapshotIsolation extends IsolationLevel with Product with Serializable
This isolation level will ensure that all reads will see a consistent snapshot of the table and any transactional write will successfully commit only if the values updated by the transaction have not been changed externally since the snapshot was read by the transaction.
This provides a lower consistency guarantee than WriteSerializable but a higher availability than that. For example, unlike WriteSerializable, this level allows two concurrent UPDATE operations reading the same data to be committed successfully as long as they don't modify the same data.
Note that for operations that do not modify data in the table, Snapshot isolation is the same as Serializability. Hence, such operations can be safely committed with the Snapshot isolation level.
- object SnapshotManagement
- case object StartingVersionLatest extends DeltaStartingVersion with Product with Serializable
- object SupportedGenerationExpressions
This class defines the list of expressions that can be used in a generated column.
- case object WriteSerializable extends IsolationLevel with Product with Serializable
This isolation level will ensure snapshot isolation consistency guarantee between write operations only. In other words, if only the write operations are considered, then there exists a serializable sequence between them that would produce the same result as seen in the table. However, if both read and write operations are considered, then there may not exist a serializable sequence that would explain all the observed reads.
This provides a lower consistency guarantee than Serializable but a higher availability than that. For example, unlike Serializable, this level allows an UPDATE operation to be committed even if there was a concurrent INSERT operation that has already added data that should have been read by the UPDATE. It will be as if the UPDATE was executed before the INSERT even if the former was committed after the latter. As a side effect, the visible history of operations may not be consistent with the result expected if these operations were executed sequentially one by one.
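A table's isolation level for writes can be configured through the delta.isolationLevel table property. A sketch, assuming a SparkSession named spark and a hypothetical table named events:

```scala
// Switch the table from the default (WriteSerializable) to the strictest
// level, Serializable. The table name here is hypothetical.
spark.sql(
  "ALTER TABLE events SET TBLPROPERTIES ('delta.isolationLevel' = 'Serializable')")
```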