object GeneratedColumn extends DeltaLogging with AnalysisHelper
Provides utility methods for implementing generated columns in Delta. Users can create a table with generated columns using the following SQL syntax:
CREATE TABLE table_identifier(
column_name column_type,
column_name column_type GENERATED ALWAYS AS ( generation_expr ),
...
)
USING delta
[ PARTITIONED BY (partition_column_name, ...) ]
This is an example:
CREATE TABLE foo(
id bigint,
type string,
subType string GENERATED ALWAYS AS ( SUBSTRING(type FROM 0 FOR 4) ),
data string,
eventTime timestamp,
day date GENERATED ALWAYS AS ( days(eventTime) )
)
USING delta
PARTITIONED BY (type, day)
When writing to a table with generated columns:
- If the output is missing a generated column, an expression is added to generate it.
- If a generated column exists in the output (in other words, the writer supplied a value for it), a constraint is added to ensure the given value doesn't violate the generation expression.
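The two write-time cases above can be sketched without Spark. This is only an illustrative model, not Delta's implementation: `WriteRuleSketch`, `GeneratedCol`, and `applyOnWrite` are hypothetical names, rows are plain maps, and a Scala function stands in for the generation expression.

```scala
// Simplified, Spark-free model of the write-time rule for generated columns.
// All names in this object are hypothetical and exist only for illustration.
object WriteRuleSketch {
  type Row = Map[String, Any]

  // A generated column: its name plus a function standing in for the generation expression.
  final case class GeneratedCol(name: String, generate: Row => Any)

  // Returns Right(row with missing generated values filled in) or Left(error message).
  def applyOnWrite(row: Row, generated: Seq[GeneratedCol]): Either[String, Row] =
    generated.foldLeft[Either[String, Row]](Right(row)) { (acc, col) =>
      acc.flatMap { current =>
        // Evaluate against the input row: generated columns may not refer to each other.
        val expected = col.generate(row)
        current.get(col.name) match {
          // Case 1: the column is missing from the output, so generate it.
          case None => Right(current + (col.name -> expected))
          // Case 2: the column is present, so the supplied value must match the expression.
          case Some(supplied) if supplied == expected => Right(current)
          case Some(supplied) =>
            Left(s"Value '$supplied' for generated column '${col.name}' " +
              s"violates its generation expression (expected '$expected')")
        }
      }
    }

  // Stand-in for the subType column of the example table: first four characters of `type`.
  val subType: GeneratedCol = GeneratedCol("subType", r => r("type").toString.take(4))
}
```

Calling `WriteRuleSketch.applyOnWrite(Map("type" -> "clickstream"), Seq(WriteRuleSketch.subType))` fills in `subType`, while supplying a conflicting `subType` value yields a `Left` describing the violation.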
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- val MIN_WRITER_VERSION: Int
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def enforcesGeneratedColumns(protocol: Protocol, metadata: Metadata): Boolean
Whether the table has generated columns. A table has generated columns only if its minWriterVersion >= GeneratedColumn.MIN_WRITER_VERSION and some of the columns in the table schema contain generation expressions.
As Spark propagates column metadata storing the generation expression through the entire plan, old versions that don't support generated columns may create tables whose schema contains generation expressions. However, since these old versions have a lower writer version, we can use the table's minWriterVersion to identify such tables and treat them as normal tables.
- protocol
the table protocol.
- metadata
the table metadata.
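The distinction between a schema that merely carries generation expressions and a table that enforces them can be sketched as follows. This is a minimal model under stated assumptions: `EnforcementSketch`, `Field`, and `TableProtocol` are hypothetical stand-ins for the real `StructField`, `Protocol`, and `Metadata` types, and the writer version `4` is an assumed value; consult `GeneratedColumn.MIN_WRITER_VERSION` for the actual one.

```scala
// Spark-free sketch of the "protocol version AND generation expressions" rule.
// Types and names are hypothetical simplifications of Protocol, Metadata, and StructField.
object EnforcementSketch {
  // Assumed value, for illustration only; see GeneratedColumn.MIN_WRITER_VERSION.
  val MinWriterVersion: Int = 4

  final case class Field(name: String, generationExpression: Option[String] = None)
  final case class TableProtocol(minWriterVersion: Int)

  // Mirrors the idea of hasGeneratedColumns: only inspects the schema.
  def hasGenerationExpressions(schema: Seq[Field]): Boolean =
    schema.exists(_.generationExpression.isDefined)

  // Mirrors the idea of enforcesGeneratedColumns: the protocol check comes first, so
  // tables written by old versions are treated as normal tables.
  def enforces(protocol: TableProtocol, schema: Seq[Field]): Boolean =
    protocol.minWriterVersion >= MinWriterVersion && hasGenerationExpressions(schema)
}
```

With a schema containing one generation expression, `enforces` is true for `TableProtocol(4)` and false for `TableProtocol(2)`, even though `hasGenerationExpressions` is true in both cases.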
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def generatePartitionFilters(spark: SparkSession, snapshot: Snapshot, dataFilters: Seq[Expression], delta: LogicalPlan): Seq[Expression]
Try to generate partition filters from data filters if possible.
- delta
the logical plan that outputs the same attributes as the table schema. This will be used to resolve auto generated expressions.
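To illustrate the idea of deriving a partition filter from a data filter, the sketch below assumes a partition column that holds the calendar date of a timestamp column. It does not use Catalyst expressions or OptimizablePartitionExpression; `PartitionFilterSketch`, `TimestampFilter`, `DateFilter`, and `derive` are hypothetical names chosen for this example.

```scala
import java.time.{LocalDate, LocalDateTime}

// Spark-free sketch: derive a filter on a date partition column from a filter on the
// timestamp column it is generated from. All names here are hypothetical.
object PartitionFilterSketch {
  sealed trait Op
  case object Less extends Op
  case object LessOrEqual extends Op
  case object Equal extends Op
  case object GreaterOrEqual extends Op
  case object Greater extends Op

  // A data filter of the form `column <op> <timestamp literal>`.
  final case class TimestampFilter(column: String, op: Op, value: LocalDateTime)
  // A derived partition filter of the form `column <op> <date literal>`.
  final case class DateFilter(column: String, op: Op, value: LocalDate)

  // Assumes `partitionCol` is generated as the calendar date of `sourceCol`.
  // Returns None when the filter is on some other column.
  def derive(filter: TimestampFilter, sourceCol: String, partitionCol: String): Option[DateFilter] =
    if (filter.column != sourceCol) None
    else {
      val op: Op = filter.op match {
        // Strict bounds are weakened: other rows from the boundary day may still match.
        case Less | LessOrEqual => LessOrEqual
        case Greater | GreaterOrEqual => GreaterOrEqual
        case Equal => Equal
      }
      Some(DateFilter(partitionCol, op, filter.value.toLocalDate))
    }
}
```

The derived filter is a necessary condition only: it prunes partitions, and the original data filter still has to be applied to the remaining rows.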
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def getGeneratedColumns(snapshot: Snapshot): Seq[StructField]
Returns the generated columns of a table. A column is a generated column only if:
- The table writer protocol >= GeneratedColumn.MIN_WRITER_VERSION;
- It has a generation expression in the column metadata.
- def getGeneratedColumnsAndColumnsUsedByGeneratedColumns(schema: StructType): Set[String]
- def getGenerationExpression(field: StructField): Option[Expression]
Return the generation expression from a field if any. This method doesn't check the protocol. The caller should make sure the table writer protocol meets satisfyGeneratedColumnProtocol before calling this method.
- def getGenerationExpressionStr(metadata: Metadata): Option[String]
Return the generation expression from a field metadata if any.
- def getOptimizablePartitionExpressions(schema: StructType, partitionSchema: StructType): Map[String, Seq[OptimizablePartitionExpression]]
Try to get OptimizablePartitionExpressions of a data column when a partition column is defined as a generated column and refers to this data column.
- schema
the table schema
- partitionSchema
the partition schema. If a partition column is defined as a generated column, its column metadata should contain the generation expression.
- def hasGeneratedColumns(schema: StructType): Boolean
Whether any generation expressions exist in the schema. Note: this doesn't mean the table contains generated columns. A table has generated columns only if its minWriterVersion >= GeneratedColumn.MIN_WRITER_VERSION and some of the columns in the table schema contain generation expressions. Use enforcesGeneratedColumns to check generated column tables instead.
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def improveUnsupportedOpError(f: => Unit): Unit
- Attributes
- protected
- Definition Classes
- AnalysisHelper
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def isGeneratedColumn(protocol: Protocol, field: StructField): Boolean
Whether a column is a generated column.
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logConsole(line: String): Unit
- Definition Classes
- DatabricksLogging
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def partitionFilterOptimizationEnabled(spark: SparkSession): Boolean
- def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit
Used to record the occurrence of a single event or report detailed, operation-specific statistics.
- path
Used to log the path of the delta table when deltaLog is null.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration as well as the success or failure of an operation on a deltaLog.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration as well as the success or failure of an operation on a tahoePath.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordFrameProfile[T](group: String, name: String)(thunk: => T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = null, silent: Boolean = true)(thunk: => S): S
- Definition Classes
- DatabricksLogging
- def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- def resolveReferencesForExpressions(sparkSession: SparkSession, exprs: Seq[Expression], planProvidingAttrs: LogicalPlan): Seq[Expression]
Resolve expressions using the attributes provided by planProvidingAttrs. Throw an error if failing to resolve any expressions.
- Attributes
- protected
- Definition Classes
- AnalysisHelper
- def satisfyGeneratedColumnProtocol(protocol: Protocol): Boolean
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toDataset(sparkSession: SparkSession, logicalPlan: LogicalPlan): Dataset[Row]
- Attributes
- protected
- Definition Classes
- AnalysisHelper
- def toString(): String
- Definition Classes
- AnyRef → Any
- def tryResolveReferences(sparkSession: SparkSession)(expr: Expression, planContainingExpr: LogicalPlan): Expression
- Attributes
- protected
- Definition Classes
- AnalysisHelper
- def tryResolveReferencesForExpressions(sparkSession: SparkSession, exprs: Seq[Expression], planContainingExpr: LogicalPlan): Seq[Expression]
- Attributes
- protected
- Definition Classes
- AnalysisHelper
- def validateGeneratedColumns(spark: SparkSession, schema: StructType): Unit
If the schema contains generated columns, check the following unsupported cases:
- Refer to a non-existent column or another generated column.
- Use an unsupported expression.
- The expression type is not the same as the column type.
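The three unsupported cases can be sketched on a simplified schema model. This is an illustration only: `ValidationSketch`, `Column`, `Generation`, and the `supportedFunctions` allow-list are hypothetical, types are plain strings, and errors are collected into a list rather than raised, which may differ from how the real method reports failures.

```scala
// Spark-free sketch of the three validation checks for generated columns.
// All names and the allow-list below are hypothetical.
object ValidationSketch {
  // A generation expression reduced to: referenced columns, top-level function, result type.
  final case class Generation(references: Set[String], function: String, resultType: String)
  final case class Column(name: String, dataType: String, generation: Option[Generation] = None)

  // Hypothetical allow-list standing in for "supported expressions".
  val supportedFunctions: Set[String] = Set("substring", "cast", "year")

  // Returns one message per violation; an empty result means the schema is valid.
  def validate(schema: Seq[Column]): Seq[String] = {
    val byName = schema.map(c => c.name -> c).toMap
    for {
      col <- schema
      gen <- col.generation.toSeq
      error <- checks(col, gen, byName)
    } yield error
  }

  private def checks(col: Column, gen: Generation, byName: Map[String, Column]): Seq[String] = {
    // Check 1: references must exist and must not be generated columns themselves.
    val refErrors = gen.references.toSeq.sorted.flatMap { ref =>
      byName.get(ref) match {
        case None => Some(s"${col.name}: refers to non-existent column $ref")
        case Some(target) if target.generation.isDefined =>
          Some(s"${col.name}: refers to generated column $ref")
        case _ => None
      }
    }
    // Check 2: the expression must be supported.
    val fnErrors =
      if (supportedFunctions(gen.function)) Nil
      else Seq(s"${col.name}: unsupported expression ${gen.function}")
    // Check 3: the expression type must equal the column type.
    val typeErrors =
      if (gen.resultType == col.dataType) Nil
      else Seq(s"${col.name}: expression type ${gen.resultType} differs from column type ${col.dataType}")
    refErrors ++ fnErrors ++ typeErrors
  }
}
```

A schema whose only generated column is `subType`, derived from `type` with a supported function and a matching type, validates with no errors.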
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- def withDmqTag[T](thunk: => T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: => T): T
Report a log to indicate some command is running.
- Definition Classes
- DeltaProgressReporter