package commands
Type Members
- trait AlterDeltaTableCommand extends DeltaCommand
A super trait for alter table commands that modify Delta tables.
- case class AlterTableAddColumnsDeltaCommand(table: DeltaTableV2, colsToAddWithPosition: Seq[QualifiedColType]) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable
A command that adds columns to a Delta table. The syntax of using this command in SQL is:
ALTER TABLE table_identifier ADD COLUMNS (col_name data_type [COMMENT col_comment], ...);
- case class AlterTableAddConstraintDeltaCommand(table: DeltaTableV2, name: String, exprText: String) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable
Command to add a constraint to a Delta table. Currently only CHECK constraints are supported.
Adding a constraint will scan all data in the table to verify the constraint currently holds.
- table
The table to which the constraint should be added.
- name
The name of the new constraint.
- exprText
The contents of the new CHECK constraint, to be parsed and evaluated.
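As an illustrative sketch (the table name events, constraint name validDate, and column eventDate are hypothetical), adding a CHECK constraint in Spark SQL might look like:

```scala
// Hypothetical example; assumes an active SparkSession `spark` with Delta Lake configured.
// Adding the constraint scans the existing data to verify it already holds before committing.
spark.sql("ALTER TABLE events ADD CONSTRAINT validDate CHECK (eventDate >= '2020-01-01')")
```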
- case class AlterTableChangeColumnDeltaCommand(table: DeltaTableV2, columnPath: Seq[String], columnName: String, newColumn: StructField, colPosition: Option[ColumnPosition], syncIdentity: Boolean) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable
A command to change a column of a Delta table, supporting changes to a column's comment and reordering of columns.
The syntax of using this command in SQL is:
ALTER TABLE table_identifier CHANGE [COLUMN] column_old_name column_new_name column_dataType [COMMENT column_comment] [FIRST | AFTER column_name];
- case class AlterTableDropColumnsDeltaCommand(table: DeltaTableV2, columnsToDrop: Seq[Seq[String]]) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable
A command that drops columns from a Delta table. The syntax of using this command in SQL is:
ALTER TABLE table_identifier DROP COLUMN(S) (col_name_1, col_name_2, ...);
- case class AlterTableDropConstraintDeltaCommand(table: DeltaTableV2, name: String, ifExists: Boolean) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable
Command to drop a constraint from a Delta table. No-op if a constraint with the given name doesn't exist.
Currently only CHECK constraints are supported.
- table
The table from which the constraint should be dropped
- name
The name of the constraint to drop
- case class AlterTableReplaceColumnsDeltaCommand(table: DeltaTableV2, columns: Seq[StructField]) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable
A command to replace columns in a Delta table, supporting changes to column comments, reordering of columns, and loosening of nullability.
The syntax of using this command in SQL is:
ALTER TABLE table_identifier REPLACE COLUMNS (col_spec[, col_spec ...]);
- case class AlterTableSetLocationDeltaCommand(table: DeltaTableV2, location: String) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable
A command to change the location of a Delta table. Effectively, this only changes the symlink in the Hive MetaStore from one Delta table to another.
This command errors out if the new location is not a Delta table. By default, the new Delta table must have the same schema as the old table, but we have a SQL conf that allows users to bypass this schema check.
The syntax of using this command in SQL is:
ALTER TABLE table_identifier SET LOCATION 'path/to/new/delta/table';
- case class AlterTableSetPropertiesDeltaCommand(table: DeltaTableV2, configuration: Map[String, String]) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable
A command that sets Delta table configuration.
The syntax of this command is:
ALTER TABLE table1 SET TBLPROPERTIES ('key1' = 'val1', 'key2' = 'val2', ...);
- case class AlterTableUnsetPropertiesDeltaCommand(table: DeltaTableV2, propKeys: Seq[String], ifExists: Boolean) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable
A command that unsets Delta table configuration. If ifExists is false, each key is checked for existence individually (a one-by-one operation, not an all-or-nothing check); otherwise, non-existent keys are ignored.
The syntax of this command is:
ALTER TABLE table1 UNSET TBLPROPERTIES [IF EXISTS] ('key1', 'key2', ...);
- case class ConvertTargetFile(fileStatus: SerializableFileStatus, partitionValues: Option[Map[String, String]] = None) extends Product with Serializable
An interface for the file to be included during conversion.
- fileStatus
the file info
- partitionValues
partition values of this file that may be available from the source table format. If none, the converter will infer partition values from the file path, assuming the Hive directory format.
- trait ConvertTargetFileManifest extends Closeable
An interface for providing an iterator of files for a table.
- trait ConvertTargetTable extends AnyRef
An interface for the table to be converted to Delta.
- case class ConvertToDeltaCommand(tableIdentifier: TableIdentifier, partitionSchema: Option[StructType], deltaPath: Option[String]) extends ConvertToDeltaCommandBase with Product with Serializable
- abstract class ConvertToDeltaCommandBase extends LogicalPlan with LeafRunnableCommand with DeltaCommand
Convert an existing parquet table to a delta table by creating delta logs based on existing files. Here are the main components:
- File Listing: Launch a spark job to list files from a given directory in parallel.
- Schema Inference: Given an iterator on the file list result, we group the iterator into sequential batches and launch a spark job to infer schema for each batch, and finally merge schemas from all batches.
- Stats collection: Again, we group the iterator on file list results into sequential batches and launch a spark job to collect stats for each batch.
- Commit the files: We take the iterator of files with stats and write out a delta log file as the first commit. This bypasses the transaction protocol, but it's ok as this would be the very first commit.
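The conversion described above is typically driven through the DeltaTable API; a minimal sketch, assuming an active SparkSession spark and a hypothetical partitioned Parquet table at /data/events:

```scala
import io.delta.tables.DeltaTable

// Hypothetical path and partition schema; the call lists files, infers the schema,
// collects stats, and writes the first delta log commit as described above.
DeltaTable.convertToDelta(spark, "parquet.`/data/events`", "date DATE")
```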
- case class CreateDeltaTableCommand(table: CatalogTable, existingTableOpt: Option[CatalogTable], mode: SaveMode, query: Option[LogicalPlan], operation: CreationMode = TableCreationModes.Create, tableByPath: Boolean = false, output: Seq[Attribute] = Nil) extends LogicalPlan with LeafRunnableCommand with DeltaLogging with Product with Serializable
Single entry point for all write or declaration operations for Delta tables accessed through the table name.
- table
The table identifier for the Delta table
- existingTableOpt
The existing table for the same identifier if exists
- mode
The save mode when writing data. Relevant when the query is empty or set to Ignore with CREATE TABLE IF NOT EXISTS.
- query
The query to commit into the Delta table if it exists. This can come from
- CTAS
- saveAsTable
- case class DeleteCommand(deltaLog: DeltaLog, target: LogicalPlan, condition: Option[Expression]) extends LogicalPlan with LeafRunnableCommand with DeltaCommand with DeleteCommandMetrics with Product with Serializable
Performs a Delete based on the search condition.
Algorithm:
1. Scan all the files and determine which files contain rows that need to be deleted.
2. Traverse the affected files and rebuild the touched files.
3. Use the Delta protocol to atomically write the remaining rows to new files and remove the affected files identified in step 1.
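This command is usually reached through the DeltaTable API or SQL; a minimal sketch, assuming an active SparkSession spark and a hypothetical table path and predicate:

```scala
import io.delta.tables.DeltaTable

// Hypothetical path and predicate; only files containing matching rows are rewritten.
val deltaTable = DeltaTable.forPath(spark, "/data/events")
deltaTable.delete("eventDate < '2020-01-01'")
```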
- trait DeleteCommandMetrics extends AnyRef
- case class DeleteMetric(condition: String, numFilesTotal: Long, numTouchedFiles: Long, numRewrittenFiles: Long, numRemovedFiles: Long, numAddedFiles: Long, numAddedChangeFiles: Long, numFilesBeforeSkipping: Long, numBytesBeforeSkipping: Long, numFilesAfterSkipping: Long, numBytesAfterSkipping: Long, numPartitionsAfterSkipping: Option[Long], numPartitionsAddedTo: Option[Long], numPartitionsRemovedFrom: Option[Long], numCopiedRows: Option[Long], numDeletedRows: Option[Long], numBytesAdded: Long, numBytesRemoved: Long, changeFileBytes: Long, scanTimeMs: Long, rewriteTimeMs: Long) extends Product with Serializable
Used to report details about delete.
- Note
All the time units are milliseconds.
- trait DeltaCommand extends DeltaLogging
Helper trait for all delta commands.
- case class DeltaGenerateCommand(modeName: String, tableId: TableIdentifier) extends LogicalPlan with LeafRunnableCommand with Product with Serializable
- case class DeltaVacuumStats(isDryRun: Boolean, specifiedRetentionMillis: Option[Long], defaultRetentionMillis: Long, minRetainedTimestamp: Long, dirsPresentBeforeDelete: Long, objectsDeleted: Long) extends Product with Serializable
- case class DescribeDeltaDetailCommand(path: Option[String], tableIdentifier: Option[TableIdentifier]) extends LogicalPlan with LeafRunnableCommand with DeltaLogging with Product with Serializable
A command for describing the details of a table such as the format, name, and size.
- case class DescribeDeltaHistory(path: Option[String], tableIdentifier: Option[TableIdentifier], limit: Option[Int], output: Seq[Attribute] = ExpressionEncoder[DeltaHistory]().schema.toAttributes) extends LogicalPlan with LeafNode with MultiInstanceRelation with Product with Serializable
A logical placeholder for describing a Delta table's history, so that the history can be leveraged in subqueries. Replaced with DescribeDeltaHistoryCommand during planning.
- case class DescribeDeltaHistoryCommand(path: Option[String], tableIdentifier: Option[TableIdentifier], limit: Option[Int], output: Seq[Attribute] = ExpressionEncoder[DeltaHistory]().schema.toAttributes) extends LogicalPlan with LeafRunnableCommand with DeltaLogging with Product with Serializable
A command for describing the history of a Delta table.
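A minimal sketch of invoking this through the DeltaTable API, assuming an active SparkSession spark and a hypothetical table name:

```scala
import io.delta.tables.DeltaTable

// Hypothetical table name; returns a DataFrame with one row per commit.
DeltaTable.forName(spark, "events").history(5).show()
```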
- class ManualListingFileManifest extends ConvertTargetFileManifest
A file manifest generated through recursively listing a base path.
- case class MergeClauseStats(condition: Option[String], actionType: String, actionExpr: Seq[String]) extends Product with Serializable
Represents the state of a single merge clause:
- the merge clause's (optional) predicate
- the action type (insert, update, delete)
- the action's expressions
- case class MergeDataSizes(rows: Option[Long] = None, files: Option[Long] = None, bytes: Option[Long] = None, partitions: Option[Long] = None) extends Product with Serializable
- case class MergeIntoCommand(source: LogicalPlan, target: LogicalPlan, targetFileIndex: TahoeFileIndex, condition: Expression, matchedClauses: Seq[DeltaMergeIntoMatchedClause], notMatchedClauses: Seq[DeltaMergeIntoInsertClause], migratedSchema: Option[StructType]) extends LogicalPlan with LeafRunnableCommand with DeltaCommand with PredicateHelper with AnalysisHelper with ImplicitMetadataOperation with Product with Serializable
Performs a merge of a source query/table into a Delta table.
Issues an error message when the ON search_condition of the MERGE statement can match a single row from the target table with multiple rows of the source table-reference.
Algorithm:
Phase 1: Find the input files in target that are touched by the rows that satisfy the condition and verify that no two source rows match with the same target row. This is implemented as an inner-join using the given condition. See findTouchedFiles for more details.
Phase 2: Read the touched files again and write new files with updated and/or inserted rows.
Phase 3: Use the Delta protocol to atomically remove the touched files and add the new files.
- source
Source data to merge from
- target
Target table to merge into
- targetFileIndex
TahoeFileIndex of the target table
- condition
Condition for a source row to match with a target row
- matchedClauses
All info related to matched clauses.
- notMatchedClauses
All info related to not matched clauses.
- migratedSchema
The final schema of the target - may be changed by schema evolution.
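The three phases above are driven by a MERGE statement or the DeltaTable merge builder; a minimal sketch, assuming an active SparkSession spark, a hypothetical target path, and a source DataFrame updatesDf with a hypothetical join key:

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.DataFrame

// Hypothetical target path, source DataFrame, and join key.
def upsert(updatesDf: DataFrame): Unit = {
  val target = DeltaTable.forPath(spark, "/data/events")
  target.as("t")
    .merge(updatesDf.as("s"), "t.id = s.id") // the ON condition used in Phase 1
    .whenMatched().updateAll()
    .whenNotMatched().insertAll()
    .execute()
}
```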
- case class MergeStats(conditionExpr: String, updateConditionExpr: String, updateExprs: Seq[String], insertConditionExpr: String, insertExprs: Seq[String], deleteConditionExpr: String, matchedStats: Seq[MergeClauseStats], notMatchedStats: Seq[MergeClauseStats], source: MergeDataSizes, targetBeforeSkipping: MergeDataSizes, targetAfterSkipping: MergeDataSizes, sourceRowsInSecondScan: Option[Long], targetFilesRemoved: Long, targetFilesAdded: Long, targetChangeFilesAdded: Option[Long], targetChangeFileBytes: Option[Long], targetBytesRemoved: Option[Long], targetBytesAdded: Option[Long], targetPartitionsRemovedFrom: Option[Long], targetPartitionsAddedTo: Option[Long], targetRowsCopied: Long, targetRowsUpdated: Long, targetRowsInserted: Long, targetRowsDeleted: Long) extends Product with Serializable
State for a merge operation
- class MetadataLogFileManifest extends ConvertTargetFileManifest
A file manifest generated from pre-existing parquet MetadataLog.
- class OptimizeExecutor extends DeltaCommand with SQLMetricsReporting with Serializable
Optimize job which compacts small files into larger files to reduce the number of files and potentially allow more efficient reads.
- case class OptimizeTableCommand(path: Option[String], tableId: Option[TableIdentifier], partitionPredicate: Option[String])(zOrderBy: Seq[UnresolvedAttribute]) extends OptimizeTableCommandBase with LeafRunnableCommand with Product with Serializable
The optimize command implementation for Spark SQL. Example SQL:
OPTIMIZE ('/path/to/dir' | delta.table) [WHERE part = 25];
- abstract class OptimizeTableCommandBase extends LogicalPlan with RunnableCommand with DeltaCommand
Base class defining abstract optimize command
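A minimal sketch of issuing the optimize command via SQL, assuming an active SparkSession spark; the table name, predicate, and Z-order column are hypothetical:

```scala
// Compacts small files in the selected partitions; the ZORDER BY clause is optional.
spark.sql("OPTIMIZE events WHERE date >= '2021-01-01' ZORDER BY (eventType)")
```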
- class ParquetTable extends ConvertTargetTable with DeltaLogging
- case class RestoreTableCommand(sourceTable: DeltaTableV2, targetIdent: TableIdentifier) extends LogicalPlan with LeafRunnableCommand with DeltaCommand with RestoreTableCommandBase with Product with Serializable
Perform restore of a Delta table to a specified version or timestamp.
Algorithm:
1. Read the latest snapshot of the table.
2. Read the snapshot for the version or timestamp to restore.
3. Compute files available in the snapshot to restore (files removed by some later commit) but missing in the latest snapshot; add these files to the commit as AddFile actions.
4. Compute files available in the latest snapshot (files added after the version to restore) but missing in the snapshot to restore; add these files to the commit as RemoveFile actions.
5. If the SQLConf.IGNORE_MISSING_FILES option is false (the default), check the availability of the AddFile entries in the file system.
6. Commit the Metadata, Protocol, and all RemoveFile and AddFile actions to the delta log using commitLarge (the commit will fail in case of a parallel transaction).
7. If the table was modified in parallel, ignore the restore and raise an exception.
- trait RestoreTableCommandBase extends AnyRef
Base trait for RESTORE. Defines command output schema and metrics.
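A minimal sketch of issuing RESTORE via SQL, assuming an active SparkSession spark; the table name and version are hypothetical:

```scala
// Restores the table to an earlier version by committing the file-level diff
// (AddFile/RemoveFile actions) computed against the latest snapshot.
spark.sql("RESTORE TABLE events TO VERSION AS OF 3")
```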
- case class TableDetail(format: String, id: String, name: String, description: String, location: String, createdAt: Timestamp, lastModified: Timestamp, partitionColumns: Seq[String], numFiles: Long, sizeInBytes: Long, properties: Map[String, String], minReaderVersion: Integer, minWriterVersion: Integer) extends Product with Serializable
The result returned by the describe detail command.
- case class UpdateCommand(tahoeFileIndex: TahoeFileIndex, target: LogicalPlan, updateExpressions: Seq[Expression], condition: Option[Expression]) extends LogicalPlan with LeafRunnableCommand with DeltaCommand with Product with Serializable
Performs an update using updateExpressions on the rows that match condition.
Algorithm:
1. Identify the affected files, i.e., the files that may contain rows to be updated.
2. Scan the affected files, apply the updates, and generate a new DataFrame with the updated rows.
3. Use the Delta protocol to atomically write the new DataFrame as new files and remove the affected files identified in step 1.
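A minimal sketch via the DeltaTable API, assuming an active SparkSession spark; the path, predicate, and set expression are hypothetical:

```scala
import io.delta.tables.DeltaTable

// Rewrites only the files that may contain rows matching the predicate.
DeltaTable.forPath(spark, "/data/events")
  .updateExpr("eventType = 'click'", Map("clicks" -> "clicks + 1"))
```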
- case class UpdateMetric(condition: String, numFilesTotal: Long, numTouchedFiles: Long, numRewrittenFiles: Long, numAddedChangeFiles: Long, changeFileBytes: Long, scanTimeMs: Long, rewriteTimeMs: Long) extends Product with Serializable
Used to report details about update.
- Note
All the time units are milliseconds.
- trait VacuumCommandImpl extends DeltaCommand
- case class WriteIntoDelta(deltaLog: DeltaLog, mode: SaveMode, options: DeltaOptions, partitionColumns: Seq[String], configuration: Map[String, String], data: DataFrame, schemaInCatalog: Option[StructType] = None) extends LogicalPlan with LeafRunnableCommand with ImplicitMetadataOperation with DeltaCommand with Product with Serializable
Used to write a DataFrame into a delta table.
New Table Semantics
- The schema of the DataFrame is used to initialize the table.
- The partition columns will be used to partition the table.
Existing Table Semantics
- The save mode will control how existing data is handled (e.g. overwrite, append, etc.)
- The schema of the DataFrame will be checked; any new columns present will be added to the table's schema. Conflicting columns (e.g. an INT and a STRING) will result in an exception.
- The partition columns, if present, are validated against the existing metadata. If not present, the partitioning of the table is respected.
In combination with Overwrite, a replaceWhere option can be used to transactionally replace data that matches a predicate.
In combination with Overwrite, dynamic partition overwrite mode (option partitionOverwriteMode set to dynamic, or Spark conf spark.sql.sources.partitionOverwriteMode set to dynamic) is also supported.
Dynamic partition overwrite mode conflicts with replaceWhere:
- If a replaceWhere option is provided and dynamic partition overwrite mode is enabled in the DataFrameWriter options, an error will be thrown.
- If a replaceWhere option is provided and dynamic partition overwrite mode is enabled in the Spark conf, data will be overwritten according to the replaceWhere expression.
- schemaInCatalog
The schema created in the catalog. We will use this schema to update the metadata when it is set (in the CTAS code path), and otherwise use the schema from data.
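A minimal sketch of an overwrite with replaceWhere through the DataFrameWriter, assuming a DataFrame df and a hypothetical path; only rows matching the predicate are transactionally replaced:

```scala
// Hypothetical DataFrame `df` and path; mode must be "overwrite" for replaceWhere.
df.write
  .format("delta")
  .mode("overwrite")
  .option("replaceWhere", "date >= '2021-01-01'")
  .save("/data/events")
```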
Value Members
- object ConvertToDeltaCommand extends Serializable
- object DeleteCommand extends Serializable
- object DeltaGenerateCommand extends Serializable
- object MergeClauseStats extends Serializable
- object MergeIntoCommand extends Serializable
- object MergeStats extends Serializable
- object TableCreationModes
- object TableDetail extends Serializable
- object UpdateCommand extends Serializable
- object VacuumCommand extends VacuumCommandImpl with Serializable
Vacuums the table by clearing all untracked files and folders within this table. First lists all the files and directories in the table, and gets the relative paths with respect to the base of the table. Then it gets the list of all tracked files for this table, which may or may not be within the table base path, and gets the relative paths of all the tracked files with respect to the base of the table. Files outside of the table path will be ignored. Then we take a diff of the files and delete directories that were already empty, and all files that are within the table that are no longer tracked.
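A minimal sketch via the DeltaTable API, assuming an active SparkSession spark and a hypothetical path; the retention window is given in hours:

```scala
import io.delta.tables.DeltaTable

// Deletes untracked files older than the 7-day (168-hour) retention window.
DeltaTable.forPath(spark, "/data/events").vacuum(168)
```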