Packages

package commands


Package Members

  1. package cdc
  2. package optimize

Type Members

  1. trait AlterDeltaTableCommand extends DeltaCommand

    A super trait for alter table commands that modify Delta tables.

  2. case class AlterTableAddColumnsDeltaCommand(table: DeltaTableV2, colsToAddWithPosition: Seq[QualifiedColType]) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable

    A command that adds columns to a Delta table. The syntax of this command in SQL is:

    ALTER TABLE table_identifier
    ADD COLUMNS (col_name data_type [COMMENT col_comment], ...);
  3. case class AlterTableAddConstraintDeltaCommand(table: DeltaTableV2, name: String, exprText: String) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable

    Command to add a constraint to a Delta table. Currently only CHECK constraints are supported.

    Adding a constraint will scan all data in the table to verify the constraint currently holds.

    table

    The table to which the constraint should be added.

    name

    The name of the new constraint.

    exprText

    The contents of the new CHECK constraint, to be parsed and evaluated.
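
    For illustration, the SQL form of this command (table, constraint, and column names are placeholders):

    ALTER TABLE table_identifier ADD CONSTRAINT constraint_name CHECK (condition);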

  4. case class AlterTableChangeColumnDeltaCommand(table: DeltaTableV2, columnPath: Seq[String], columnName: String, newColumn: StructField, colPosition: Option[ColumnPosition], syncIdentity: Boolean) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable

    A command to change a column of a Delta table. It supports changing the comment of a column and reordering columns.

    The syntax of using this command in SQL is:

    ALTER TABLE table_identifier
    CHANGE [COLUMN] column_old_name column_new_name column_dataType [COMMENT column_comment]
    [FIRST | AFTER column_name];
  5. case class AlterTableDropColumnsDeltaCommand(table: DeltaTableV2, columnsToDrop: Seq[Seq[String]]) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable

    A command that drops columns from a Delta table. The syntax of this command in SQL is:

    ALTER TABLE table_identifier
    DROP COLUMN(S) (col_name_1, col_name_2, ...);
  6. case class AlterTableDropConstraintDeltaCommand(table: DeltaTableV2, name: String, ifExists: Boolean) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable

    Command to drop a constraint from a Delta table. This is a no-op if a constraint with the given name doesn't exist.

    Currently only CHECK constraints are supported.

    table

    The table from which the constraint should be dropped.

    name

    The name of the constraint to drop.
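
    For illustration, the SQL form of this command (table and constraint names are placeholders):

    ALTER TABLE table_identifier DROP CONSTRAINT [IF EXISTS] constraint_name;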

  7. case class AlterTableReplaceColumnsDeltaCommand(table: DeltaTableV2, columns: Seq[StructField]) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable

    A command to replace the columns of a Delta table. It supports changing the comment of a column, reordering columns, and loosening nullabilities.

    The syntax of using this command in SQL is:

    ALTER TABLE table_identifier REPLACE COLUMNS (col_spec[, col_spec ...]);
  8. case class AlterTableSetLocationDeltaCommand(table: DeltaTableV2, location: String) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable

    A command to change the location of a Delta table. Effectively, this only changes the symlink in the Hive MetaStore from one Delta table to another.

    This command errors out if the new location is not a Delta table. By default, the new Delta table must have the same schema as the old table, but we have a SQL conf that allows users to bypass this schema check.

    The syntax of using this command in SQL is:

    ALTER TABLE table_identifier SET LOCATION 'path/to/new/delta/table';
  9. case class AlterTableSetPropertiesDeltaCommand(table: DeltaTableV2, configuration: Map[String, String]) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable

    A command that sets Delta table configuration.

    The syntax of this command is:

    ALTER TABLE table1 SET TBLPROPERTIES ('key1' = 'val1', 'key2' = 'val2', ...);
  10. case class AlterTableUnsetPropertiesDeltaCommand(table: DeltaTableV2, propKeys: Seq[String], ifExists: Boolean) extends LogicalPlan with LeafRunnableCommand with AlterDeltaTableCommand with IgnoreCachedData with Product with Serializable

    A command that unsets Delta table configuration. If ifExists is false, each key is checked for existence individually (a one-by-one operation, not an all-or-nothing check); otherwise, non-existent keys are ignored.

    The syntax of this command is:

    ALTER TABLE table1 UNSET TBLPROPERTIES [IF EXISTS] ('key1', 'key2', ...);
  11. case class ConvertTargetFile(fileStatus: SerializableFileStatus, partitionValues: Option[Map[String, String]] = None) extends Product with Serializable

    An interface for the file to be included during conversion.

    fileStatus

    the file info

    partitionValues

    partition values of this file that may be available from the source table format. If none, the converter will infer partition values from the file path, assuming the Hive directory format.

  12. trait ConvertTargetFileManifest extends Closeable

    An interface for providing an iterator of files for a table.

  13. trait ConvertTargetTable extends AnyRef

    An interface for the table to be converted to Delta.

  14. case class ConvertToDeltaCommand(tableIdentifier: TableIdentifier, partitionSchema: Option[StructType], deltaPath: Option[String]) extends ConvertToDeltaCommandBase with Product with Serializable
  15. abstract class ConvertToDeltaCommandBase extends LogicalPlan with LeafRunnableCommand with DeltaCommand

    Convert an existing parquet table to a delta table by creating delta logs based on existing files. Here are the main components:

    • File Listing: Launch a spark job to list files from a given directory in parallel.
    • Schema Inference: Given an iterator on the file list result, we group the iterator into sequential batches and launch a spark job to infer schema for each batch, and finally merge schemas from all batches.
    • Stats collection: Again, we group the iterator on file list results into sequential batches and launch a spark job to collect stats for each batch.
    • Commit the files: We take the iterator of files with stats and write out a delta log file as the first commit. This bypasses the transaction protocol, but it's ok as this would be the very first commit.
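
    For illustration, the SQL form of this command (the path and partition column are placeholders):

    CONVERT TO DELTA parquet.`/path/to/table` [PARTITIONED BY (part_col data_type)];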
  16. case class CreateDeltaTableCommand(table: CatalogTable, existingTableOpt: Option[CatalogTable], mode: SaveMode, query: Option[LogicalPlan], operation: CreationMode = TableCreationModes.Create, tableByPath: Boolean = false, output: Seq[Attribute] = Nil) extends LogicalPlan with LeafRunnableCommand with DeltaLogging with Product with Serializable

    Single entry point for all write or declaration operations for Delta tables accessed through the table name.

    table

    The table identifier for the Delta table

    existingTableOpt

    The existing table for the same identifier, if it exists.

    mode

    The save mode when writing data. Relevant when the query is empty or set to Ignore with CREATE TABLE IF NOT EXISTS.

    query

    The query to commit into the Delta table if it exists. This can come from

    • CTAS
    • saveAsTable
  17. case class DeleteCommand(deltaLog: DeltaLog, target: LogicalPlan, condition: Option[Expression]) extends LogicalPlan with LeafRunnableCommand with DeltaCommand with DeleteCommandMetrics with Product with Serializable

    Performs a Delete based on the search condition.

    Algorithm:

    1) Scan all the files and determine which files have the rows that need to be deleted.
    2) Traverse the affected files and rebuild the touched files.
    3) Use the Delta protocol to atomically write the remaining rows to new files and remove the affected files that are identified in step 1.
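
    For illustration, the SQL form of this command (the predicate is a placeholder):

    DELETE FROM table_identifier [WHERE predicate];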

  18. trait DeleteCommandMetrics extends AnyRef
  19. case class DeleteMetric(condition: String, numFilesTotal: Long, numTouchedFiles: Long, numRewrittenFiles: Long, numRemovedFiles: Long, numAddedFiles: Long, numAddedChangeFiles: Long, numFilesBeforeSkipping: Long, numBytesBeforeSkipping: Long, numFilesAfterSkipping: Long, numBytesAfterSkipping: Long, numPartitionsAfterSkipping: Option[Long], numPartitionsAddedTo: Option[Long], numPartitionsRemovedFrom: Option[Long], numCopiedRows: Option[Long], numDeletedRows: Option[Long], numBytesAdded: Long, numBytesRemoved: Long, changeFileBytes: Long, scanTimeMs: Long, rewriteTimeMs: Long) extends Product with Serializable

    Used to report details about delete.

    Note

    All the time units are milliseconds.

  20. trait DeltaCommand extends DeltaLogging

    Helper trait for all delta commands.

  21. case class DeltaGenerateCommand(modeName: String, tableId: TableIdentifier) extends LogicalPlan with LeafRunnableCommand with Product with Serializable
  22. case class DeltaVacuumStats(isDryRun: Boolean, specifiedRetentionMillis: Option[Long], defaultRetentionMillis: Long, minRetainedTimestamp: Long, dirsPresentBeforeDelete: Long, objectsDeleted: Long) extends Product with Serializable
  23. case class DescribeDeltaDetailCommand(path: Option[String], tableIdentifier: Option[TableIdentifier]) extends LogicalPlan with LeafRunnableCommand with DeltaLogging with Product with Serializable

    A command for describing the details of a table such as the format, name, and size.

  24. case class DescribeDeltaHistory(path: Option[String], tableIdentifier: Option[TableIdentifier], limit: Option[Int], output: Seq[Attribute] = ExpressionEncoder[DeltaHistory]().schema.toAttributes) extends LogicalPlan with LeafNode with MultiInstanceRelation with Product with Serializable

    A logical placeholder for describing a Delta table's history, so that the history can be leveraged in subqueries. Replaced with DescribeDeltaHistoryCommand during planning.

  25. case class DescribeDeltaHistoryCommand(path: Option[String], tableIdentifier: Option[TableIdentifier], limit: Option[Int], output: Seq[Attribute] = ExpressionEncoder[DeltaHistory]().schema.toAttributes) extends LogicalPlan with LeafRunnableCommand with DeltaLogging with Product with Serializable

    A command for describing the history of a Delta table.

  26. class ManualListingFileManifest extends ConvertTargetFileManifest

    A file manifest generated through recursively listing a base path.

  27. case class MergeClauseStats(condition: Option[String], actionType: String, actionExpr: Seq[String]) extends Product with Serializable

    Represents the state of a single merge clause:

    • merge clause's (optional) predicate
    • action type (insert, update, delete)
    • action's expressions

  28. case class MergeDataSizes(rows: Option[Long] = None, files: Option[Long] = None, bytes: Option[Long] = None, partitions: Option[Long] = None) extends Product with Serializable
  29. case class MergeIntoCommand(source: LogicalPlan, target: LogicalPlan, targetFileIndex: TahoeFileIndex, condition: Expression, matchedClauses: Seq[DeltaMergeIntoMatchedClause], notMatchedClauses: Seq[DeltaMergeIntoInsertClause], migratedSchema: Option[StructType]) extends LogicalPlan with LeafRunnableCommand with DeltaCommand with PredicateHelper with AnalysisHelper with ImplicitMetadataOperation with Product with Serializable

    Performs a merge of a source query/table into a Delta table.

    Issues an error message when the ON search_condition of the MERGE statement can match a single row from the target table with multiple rows of the source table-reference.

    Algorithm:

    Phase 1: Find the input files in target that are touched by the rows that satisfy the condition and verify that no two source rows match with the same target row. This is implemented as an inner-join using the given condition. See findTouchedFiles for more details.

    Phase 2: Read the touched files again and write new files with updated and/or inserted rows.

    Phase 3: Use the Delta protocol to atomically remove the touched files and add the new files.

    source

    Source data to merge from

    target

    Target table to merge into

    targetFileIndex

    TahoeFileIndex of the target table

    condition

    Condition for a source row to match with a target row

    matchedClauses

    All info related to matched clauses.

    notMatchedClauses

    All info related to not matched clauses.

    migratedSchema

    The final schema of the target - may be changed by schema evolution.
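
    The three phases above implement the standard MERGE INTO SQL statement; the table, source, and column names below are illustrative:

    MERGE INTO target t
    USING source s
    ON t.key = s.key
    WHEN MATCHED THEN
      UPDATE SET t.value = s.value
    WHEN NOT MATCHED THEN
      INSERT (key, value) VALUES (s.key, s.value);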

  30. case class MergeStats(conditionExpr: String, updateConditionExpr: String, updateExprs: Seq[String], insertConditionExpr: String, insertExprs: Seq[String], deleteConditionExpr: String, matchedStats: Seq[MergeClauseStats], notMatchedStats: Seq[MergeClauseStats], source: MergeDataSizes, targetBeforeSkipping: MergeDataSizes, targetAfterSkipping: MergeDataSizes, sourceRowsInSecondScan: Option[Long], targetFilesRemoved: Long, targetFilesAdded: Long, targetChangeFilesAdded: Option[Long], targetChangeFileBytes: Option[Long], targetBytesRemoved: Option[Long], targetBytesAdded: Option[Long], targetPartitionsRemovedFrom: Option[Long], targetPartitionsAddedTo: Option[Long], targetRowsCopied: Long, targetRowsUpdated: Long, targetRowsInserted: Long, targetRowsDeleted: Long) extends Product with Serializable

    State for a merge operation

  31. class MetadataLogFileManifest extends ConvertTargetFileManifest

    A file manifest generated from pre-existing parquet MetadataLog.

  32. class OptimizeExecutor extends DeltaCommand with SQLMetricsReporting with Serializable

    Optimize job which compacts small files into larger files to reduce the number of files and potentially allow more efficient reads.

  33. case class OptimizeTableCommand(path: Option[String], tableId: Option[TableIdentifier], partitionPredicate: Option[String])(zOrderBy: Seq[UnresolvedAttribute]) extends OptimizeTableCommandBase with LeafRunnableCommand with Product with Serializable

    The optimize command implementation for Spark SQL. Example SQL:

    OPTIMIZE ('/path/to/dir' | delta.table) [WHERE part = 25];
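
    The zOrderBy argument in the signature corresponds to an optional ZORDER BY clause (column name illustrative):

    OPTIMIZE ('/path/to/dir' | delta.table) [WHERE part = 25] ZORDER BY (col_name);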
  34. abstract class OptimizeTableCommandBase extends LogicalPlan with RunnableCommand with DeltaCommand

    Base class defining abstract optimize command

  35. class ParquetTable extends ConvertTargetTable with DeltaLogging
  36. case class RestoreTableCommand(sourceTable: DeltaTableV2, targetIdent: TableIdentifier) extends LogicalPlan with LeafRunnableCommand with DeltaCommand with RestoreTableCommandBase with Product with Serializable

    Perform restore of a delta table to a specified version or timestamp.

    Algorithm:

    1) Read the latest snapshot of the table.
    2) Read the snapshot for the version or timestamp to restore.
    3) Compute files available in the snapshot to restore (files that were removed by some commit) but missing from the latest snapshot. Add these files into the commit as AddFile actions.
    4) Compute files available in the latest snapshot (files that were added after the version to restore) but missing from the snapshot to restore. Add these files into the commit as RemoveFile actions.
    5) If the SQLConf.IGNORE_MISSING_FILES option is false (the default), check the availability of the AddFile files in the file system.
    6) Commit the metadata, Protocol, and all RemoveFile and AddFile actions into the delta log using commitLarge (the commit will fail in case of a parallel transaction).
    7) If the table was modified in parallel, ignore the restore and raise an exception.
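
    For illustration, the SQL forms of this command (table name and version/timestamp are placeholders):

    RESTORE TABLE table_identifier TO VERSION AS OF version;
    RESTORE TABLE table_identifier TO TIMESTAMP AS OF timestamp_expression;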

  37. trait RestoreTableCommandBase extends AnyRef

    Base trait for RESTORE. Defines the command output schema and metrics.

  38. case class TableDetail(format: String, id: String, name: String, description: String, location: String, createdAt: Timestamp, lastModified: Timestamp, partitionColumns: Seq[String], numFiles: Long, sizeInBytes: Long, properties: Map[String, String], minReaderVersion: Integer, minWriterVersion: Integer) extends Product with Serializable

    The result returned by the describe detail command.

  39. case class UpdateCommand(tahoeFileIndex: TahoeFileIndex, target: LogicalPlan, updateExpressions: Seq[Expression], condition: Option[Expression]) extends LogicalPlan with LeafRunnableCommand with DeltaCommand with Product with Serializable

    Performs an Update using updateExpressions on the rows that match condition.

    Algorithm:

    1) Identify the affected files, i.e., the files that may have the rows to be updated.
    2) Scan the affected files, apply the updates, and generate a new DF with the updated rows.
    3) Use the Delta protocol to atomically write the new DF as new files and remove the affected files that are identified in step 1.
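
    For illustration, the SQL form of this command (column names and predicate are placeholders):

    UPDATE table_identifier SET col_name = expr [, ...] [WHERE predicate];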

  40. case class UpdateMetric(condition: String, numFilesTotal: Long, numTouchedFiles: Long, numRewrittenFiles: Long, numAddedChangeFiles: Long, changeFileBytes: Long, scanTimeMs: Long, rewriteTimeMs: Long) extends Product with Serializable

    Used to report details about update.

    Note

    All the time units are milliseconds.

  41. trait VacuumCommandImpl extends DeltaCommand
  42. case class WriteIntoDelta(deltaLog: DeltaLog, mode: SaveMode, options: DeltaOptions, partitionColumns: Seq[String], configuration: Map[String, String], data: DataFrame, schemaInCatalog: Option[StructType] = None) extends LogicalPlan with LeafRunnableCommand with ImplicitMetadataOperation with DeltaCommand with Product with Serializable

    Used to write a DataFrame into a delta table.

    New Table Semantics

    • The schema of the DataFrame is used to initialize the table.
    • The partition columns will be used to partition the table.

    Existing Table Semantics

    • The save mode will control how existing data is handled (i.e. overwrite, append, etc.).
    • The schema of the DataFrame will be checked, and if there are new columns present they will be added to the table's schema. Conflicting columns (i.e. an INT and a STRING) will result in an exception.
    • The partition columns, if present, are validated against the existing metadata. If not present, then the partitioning of the table is respected.

    In combination with Overwrite, a replaceWhere option can be used to transactionally replace data that matches a predicate.

    In combination with Overwrite, dynamic partition overwrite mode (option partitionOverwriteMode set to dynamic, or in spark conf spark.sql.sources.partitionOverwriteMode set to dynamic) is also supported.

    Dynamic partition overwrite mode conflicts with replaceWhere:

    • If a replaceWhere option is provided, and dynamic partition overwrite mode is enabled in the DataFrameWriter options, an error will be thrown.
    • If a replaceWhere option is provided, and dynamic partition overwrite mode is enabled in the spark conf, data will be overwritten according to the replaceWhere expression.

    schemaInCatalog

    The schema created in Catalog. We will use this schema to update metadata when it is set (in CTAS code path), and otherwise use schema from data.

Value Members

  1. object ConvertToDeltaCommand extends Serializable
  2. object DeleteCommand extends Serializable
  3. object DeltaGenerateCommand extends Serializable
  4. object MergeClauseStats extends Serializable
  5. object MergeIntoCommand extends Serializable
  6. object MergeStats extends Serializable
  7. object TableCreationModes
  8. object TableDetail extends Serializable
  9. object UpdateCommand extends Serializable
  10. object VacuumCommand extends VacuumCommandImpl with Serializable

    Vacuums the table by clearing all untracked files and folders within this table. First lists all the files and directories in the table, and gets the relative paths with respect to the base of the table. Then it gets the list of all tracked files for this table, which may or may not be within the table base path, and gets the relative paths of all the tracked files with respect to the base of the table. Files outside of the table path will be ignored. Then we take a diff of the files and delete directories that were already empty, and all files that are within the table that are no longer tracked.
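
    For illustration, the SQL form of this command; RETAIN overrides the default retention period, and DRY RUN lists the files to be deleted without deleting them:

    VACUUM table_identifier [RETAIN num HOURS] [DRY RUN];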
