object SchemaUtils extends DeltaLogging
- SchemaUtils
- DeltaLogging
- DatabricksLogging
- DeltaProgressReporter
- Logging
- AnyRef
- Any
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- val DELTA_COL_RESOLVER: (String, String) => Boolean
- def addColumn(schema: StructType, column: StructField, position: Seq[Int]): StructType
Add `column` at the specified `position` in `schema`.
- position
A Seq of ordinals describing where this column should go. It is a Seq to denote positions in nested columns (0-based). For example, given tableSchema: <a:STRUCT<a1,a2,a3>, b, c:STRUCT<c1,c3>>, adding column c2 at position Seq(2, 1) will return <a:STRUCT<a1,a2,a3>, b, c:STRUCT<c1,**c2**,c3>>.
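The position semantics can be sketched over a minimal hypothetical tree type; `Node`, `Leaf`, `Struct`, and `addColumnAt` below are illustrative stand-ins, not Spark's `StructType` or the Delta implementation:

```scala
// Hypothetical minimal schema tree, standing in for StructType.
sealed trait Node
case class Leaf(name: String) extends Node
case class Struct(name: String, fields: Vector[Node]) extends Node

def addColumnAt(node: Node, column: Node, position: Seq[Int]): Node =
  (node, position.toList) match {
    case (Struct(n, fs), i :: Nil) =>
      // Base case: insert the new column at ordinal i of this struct.
      Struct(n, (fs.take(i) :+ column) ++ fs.drop(i))
    case (Struct(n, fs), i :: rest) =>
      // Recursive case: descend into the i-th field.
      Struct(n, fs.updated(i, addColumnAt(fs(i), column, rest)))
    case _ =>
      throw new IllegalArgumentException("position does not resolve to a struct")
  }
```

Replaying the example from the doc: inserting `c2` at `Seq(2, 1)` descends into the third top-level field (`c`) and inserts at ordinal 1, yielding `c1, c2, c3`.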
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def canChangeDataType(from: DataType, to: DataType, resolver: Resolver, columnMappingMode: DeltaColumnMappingMode, columnPath: Seq[String] = Seq.empty): Option[String]
Check whether the data type can be changed from `from` to `to`.
- returns
None if the data types can be changed, otherwise Some(err) containing the reason.
- def changeDataType(from: DataType, to: DataType, resolver: Resolver): DataType
Copy the nested data type between two data types.
- def checkFieldNames(names: Seq[String]): Unit
Verifies that the column names are acceptable to Parquet, and hence to Delta. Parquet does not accept the characters ' ,;{}()\n\t='. We ensure that neither the data columns nor the partition columns contain these characters.
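The check can be sketched as follows; the invalid character set is taken from the description above, while the function name and error message are illustrative, not the actual Delta implementation:

```scala
// Characters Parquet rejects in column names, per the doc above.
val invalidParquetChars: Set[Char] = " ,;{}()\n\t=".toSet

// Hypothetical validator: throws on the first offending character found.
def validateFieldNames(names: Seq[String]): Unit =
  names.foreach { name =>
    name.find(invalidParquetChars.contains).foreach { bad =>
      throw new IllegalArgumentException(
        s"""Attribute name "$name" contains invalid character '$bad'""")
    }
  }
```

For example, `validateFieldNames(Seq("id", "event_time"))` passes, while a name such as `"bad name"` or `"a=b"` is rejected.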
- def checkSchemaFieldNames(schema: StructType, columnMappingMode: DeltaColumnMappingMode): Unit
Check whether the schema contains invalid characters in its column names, depending on the column mapping mode.
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def containsDependentExpression(spark: SparkSession, columnToChange: Seq[String], exprString: String, resolver: Resolver): Boolean
Determines whether a column change (e.g., a rename) needs to be propagated to the expression. This is true when the column to change, or any of its descendant columns, is referenced by the expression. For example:
- a, length(a) -> true
- b, (b.c + 1) -> true, because renaming b to b1 would require changing the expression to (b1.c + 1).
- b.c, (cast b as string) -> false, because b.c can be changed to b.c1 without affecting b.
- def dropColumn(schema: StructType, position: Seq[Int]): (StructType, StructField)
Drop the column at the specified `position` in `schema` and return the new schema together with the dropped column.
- position
A Seq of ordinals identifying the column to drop. It is a Seq to denote positions in nested columns (0-based). For example, given tableSchema: <a:STRUCT<a1,a2,a3>, b, c:STRUCT<c1,c2,c3>>, position Seq(2, 1) will return <a:STRUCT<a1,a2,a3>, b, c:STRUCT<c1,c3>>.
- def dropNullTypeColumns(schema: StructType): StructType
Drops null types from the schema if they exist. We do not recurse into Array and Map types, because we do not expect null types to exist in those columns, as Delta doesn't allow it during writes.
- def dropNullTypeColumns(df: DataFrame): DataFrame
Drops null types from the DataFrame if they exist. We don't have easy ways of generating types such as MapType and ArrayType, therefore if these types contain NullType in their elements, we throw an AnalysisException.
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def fieldNameToColumn(field: String): Column
Converts a field name to a Column, quoting it with back-ticks.
- def fieldToColumn(field: StructField): Column
- def filterRecursively(schema: StructType, checkComplexTypes: Boolean)(f: (StructField) => Boolean): Seq[(Seq[String], StructField)]
Finds StructFields that match a given check `f`. Returns the path to the column, and the field.
- checkComplexTypes
Although StructType is itself a complex type, we always recurse into StructTypes, since we return StructFields. This flag defines whether we should also recurse into ArrayType and MapType.
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def findColumnPosition(column: Seq[String], schema: StructType, resolver: Resolver = DELTA_COL_RESOLVER): (Seq[Int], Int)
Returns the given column's ordinal position within the given `schema`, together with the size of the innermost struct containing it. The length of the returned position Seq reflects how deeply nested the column is.
For ArrayType: accessing the array's element adds a 0 to the position list, e.g. accessing a.element.y yields Seq(..., positionOfA, 0, positionOfY).
For MapType: accessing the map's key adds a 0 to the position list, e.g. accessing m.key.y yields Seq(..., positionOfM, 0, positionOfY).
For MapType: accessing the map's value adds a 1 to the position list, e.g. accessing m.value.y yields Seq(..., positionOfM, 1, positionOfY).
- column
The column to search for in the given struct. If the length of `column` is greater than 1, we expect to enter a nested field.
- schema
The current struct we are looking at.
- resolver
The resolver to find the column.
- def findDependentGeneratedColumns(sparkSession: SparkSession, targetColumn: Seq[String], protocol: Protocol, schema: StructType): Seq[StructField]
Find all the generated columns that depend on the given target column.
- def findNestedFieldIgnoreCase(schema: StructType, fieldNames: Seq[String], includeCollections: Boolean = false): Option[StructField]
Copied verbatim from Apache Spark.
Returns a field in this struct and its child structs, case-insensitively. This is slightly less performant than the case-sensitive version.
If includeCollections is true, this will also return fields nested in maps and arrays.
- fieldNames
The path to the field, in order from the root. For example, the column nested.a.b.c would be Seq("nested", "a", "b", "c").
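The case-insensitive path lookup can be sketched over nested Maps standing in for StructType; `findNestedIgnoreCase` is a hypothetical helper, not the Spark implementation:

```scala
// Hypothetical case-insensitive nested lookup. A struct is modeled as a
// Map[String, Any] whose values are either leaf descriptors or nested Maps.
def findNestedIgnoreCase(struct: Map[String, Any], path: Seq[String]): Option[Any] =
  path.toList match {
    case Nil => None
    case head :: rest =>
      // Match the current path segment against field names, ignoring case.
      struct.collectFirst { case (k, v) if k.equalsIgnoreCase(head) => v }.flatMap {
        case v if rest.isEmpty => Some(v)
        case m: Map[_, _] => findNestedIgnoreCase(m.asInstanceOf[Map[String, Any]], rest)
        case _ => None // path continues, but the value is not a struct
      }
  }
```

For example, looking up `Seq("nested", "a", "b", "c")` succeeds even when the stored field names are capitalized differently.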
- def findNullTypeColumn(schema: StructType): Option[String]
Returns the name of the first column/field that has null type (void).
- def findUndefinedTypes(dt: DataType): Seq[DataType]
Recursively find all types not defined in the Delta protocol but used in `dt`.
- def findUnsupportedDataTypes(schema: StructType): Seq[UnsupportedDataTypeInfo]
Find the unsupported data types in a table schema and return all columns that use them. For example, findUnsupportedDataType(struct<a: struct<b: unsupported_type>>) will return Some(unsupported_type, Some("a.b")).
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def initializeLogIfNecessary(isInterpreter: Boolean): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def isReadCompatible(existingSchema: StructType, readSchema: StructType): Boolean
As Delta snapshots update, the schema may change as well. This method defines whether the new schema of a Delta table can be used with a previously analyzed LogicalPlan. Our rules are to return false when:
- dropping any column that was present in the DataFrame schema
- converting nullable=false to nullable=true for any column
- any change of datatype
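The three rules can be sketched over a flat schema model; `Field` and the method body below are illustrative assumptions, since the real method recurses into nested StructTypes:

```scala
// Simplified flat-schema stand-in for StructField.
case class Field(name: String, dataType: String, nullable: Boolean)

// existing: the new snapshot schema; read: what the analyzed plan expects.
def isReadCompatible(existing: Seq[Field], read: Seq[Field]): Boolean =
  read.forall { r =>
    existing.find(_.name == r.name) match {
      case None => false                  // rule 1: column was dropped
      case Some(e) =>
        e.dataType == r.dataType &&       // rule 3: no datatype change
        (r.nullable || !e.nullable)       // rule 2: no nullable=false -> true change
    }
  }
```

Rule 2 reads as: if the plan expects a non-nullable column, the new schema must still guarantee non-nullability.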
- def isTraceEnabled(): Boolean
- Attributes
- protected
- Definition Classes
- Logging
- def log: Logger
- Attributes
- protected
- Definition Classes
- Logging
- def logConsole(line: String): Unit
- Definition Classes
- DatabricksLogging
- def logDebug(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logDebug(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logError(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logInfo(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logName: String
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logTrace(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String, throwable: Throwable): Unit
- Attributes
- protected
- Definition Classes
- Logging
- def logWarning(msg: => String): Unit
- Attributes
- protected
- Definition Classes
- Logging
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def normalizeColumnNames(baseSchema: StructType, data: Dataset[_]): DataFrame
Rewrite the query field names according to the table schema. This method assumes that all schema validation checks have been made and that this is the last operation before writing into Delta.
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def prettyFieldName(columnPath: Seq[String]): String
Pretty print the column path passed in.
- def quoteIdentifier(part: String): String
- def recordDeltaEvent(deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty, data: AnyRef = null, path: Option[Path] = None): Unit
Used to record the occurrence of a single event or to report detailed, operation-specific statistics.
- path
Used to log the path of the Delta table when `deltaLog` is null.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordDeltaOperation[A](deltaLog: DeltaLog, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration, as well as the success or failure, of an operation on a `deltaLog`.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordDeltaOperationForTablePath[A](tablePath: String, opType: String, tags: Map[TagDefinition, String] = Map.empty)(thunk: => A): A
Used to report the duration, as well as the success or failure, of an operation on a `tahoePath`.
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordEvent(metric: MetricDefinition, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordFrameProfile[T](group: String, name: String)(thunk: => T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def recordOperation[S](opType: OpType, opTarget: String = null, extraTags: Map[TagDefinition, String], isSynchronous: Boolean = true, alwaysRecordStats: Boolean = false, allowAuthTags: Boolean = false, killJvmIfStuck: Boolean = false, outputMetric: MetricDefinition = null, silent: Boolean = true)(thunk: => S): S
- Definition Classes
- DatabricksLogging
- def recordProductEvent(metric: MetricDefinition with CentralizableMetric, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, trimBlob: Boolean = true): Unit
- Definition Classes
- DatabricksLogging
- def recordProductUsage(metric: MetricDefinition with CentralizableMetric, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- def recordUndefinedTypes(deltaLog: DeltaLog, schema: StructType): Unit
Record all types not defined in the Delta protocol but used in the `schema`.
- def recordUsage(metric: MetricDefinition, quantity: Double, additionalTags: Map[TagDefinition, String] = Map.empty, blob: String = null, forceSample: Boolean = false, trimBlob: Boolean = true, silent: Boolean = false): Unit
- Definition Classes
- DatabricksLogging
- def removeUnenforceableNotNullConstraints(schema: StructType, conf: SQLConf): StructType
Go through the schema to look for unenforceable NOT NULL constraints. By default we throw when they're encountered, but if this is suppressed through SQLConf they are silently removed.
Note that this should only be applied to schemas created from explicit user DDL - in other scenarios, the nullability information may be inaccurate and Delta should always coerce the nullability flag to true.
- def reportDifferences(existingSchema: StructType, specifiedSchema: StructType): Seq[String]
Compare an existing schema to a specified new schema and return a message describing the first difference found, if any:
- different field name or datatype
- different metadata
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- def transformColumns[E](schema: StructType, input: Seq[(Seq[String], E)])(tf: (Seq[String], StructField, Seq[(Seq[String], E)]) => StructField): StructType
Transform (nested) columns in a schema using the given path and parameter pairs. The transform function is only invoked when a field's path matches one of the input paths.
- E
the type of the payload used for transforming fields.
- schema
to transform
- input
paths and parameter pairs. The paths point to fields we want to transform. The parameters will be passed to the transform function for a matching field.
- tf
function to apply per matched field. This function takes the field path, the field itself and the input names and payload pairs that matched the field name. It should return a new field.
- returns
the transformed schema.
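The path-matching behavior can be sketched over a flat list of columns; `Col` and the body below are simplified stand-ins for StructField and the real nested recursion:

```scala
// Simplified stand-in for a (possibly nested) column, identified by its path.
case class Col(path: Seq[String], dataType: String)

def transformCols[E](cols: Seq[Col], input: Seq[(Seq[String], E)])(
    tf: (Seq[String], Col, Seq[(Seq[String], E)]) => Col): Seq[Col] =
  cols.map { col =>
    // The transform function is only invoked when the field's path matches
    // one of the input paths; the matched payloads are passed through.
    val matches = input.filter { case (path, _) => path == col.path }
    if (matches.nonEmpty) tf(col.path, col, matches) else col
  }
```

For example, passing `Seq((Seq("a", "b"), "long"))` as input retypes only the column at path `a.b`, leaving all other columns untouched.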
- def transformColumnsStructs(schema: StructType, colName: Option[String] = None)(tf: (Seq[String], StructType, Resolver) => Seq[StructField]): StructType
Transform (nested) columns in a schema. Runs the transform function on all nested StructTypes.
If `colName` is defined, we also check whether the struct to process contains the column name.
- schema
to transform.
- colName
Optional column name to match for.
- tf
function to apply on the StructType.
- returns
the transformed schema.
- def typeAsNullable(dt: DataType): DataType
Recursively turns the data type into its nullable form, including nested columns.
- def typeExistsRecursively(dt: DataType)(f: (DataType) => Boolean): Boolean
Copied over from DataType for visibility reasons.
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- def withDmqTag[T](thunk: => T): T
- Attributes
- protected
- Definition Classes
- DeltaLogging
- def withStatusCode[T](statusCode: String, defaultMessage: String, data: Map[String, Any] = Map.empty)(body: => T): T
Report a log to indicate that some command is running.
- Definition Classes
- DeltaProgressReporter