io.smartdatalake.workflow.action

package sparktransformer


Type Members

  1. case class AdditionalColumnsTransformer(name: String = "additionalColumns", description: Option[String] = None, additionalColumns: Map[String, String]) extends ParsableDfTransformer with Product with Serializable

    Add additional columns to the DataFrame by extracting information from the context.

    name

    name of the transformer

    description

    Optional description of the transformer

    additionalColumns

Optional tuples of [column name, Spark SQL expression] to be added as additional columns to the DataFrame. The Spark SQL expressions are evaluated against an instance of DefaultExpressionData.

    Annotations
    @Scaladoc()
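    A hypothetical HOCON sketch, assuming transformers are configured as a list under an action as in typical SmartDataLakeBuilder configs; the target column names are illustrative, and runId/runStartTime are assumed to be fields of DefaultExpressionData:

    ```hocon
    transformers = [{
      type = AdditionalColumnsTransformer
      additionalColumns {
        run_id = "runId"
        load_ts = "runStartTime"
      }
    }]
    ```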
  2. case class BlacklistTransformer(name: String = "blacklist", description: Option[String] = None, columnBlacklist: Seq[String]) extends ParsableDfTransformer with Product with Serializable

    Apply a column blacklist to a DataFrame.

    name

    name of the transformer

    description

    Optional description of the transformer

    columnBlacklist

    List of columns to exclude from DataFrame

    Annotations
    @Scaladoc()
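    A minimal HOCON sketch, assuming the transformer sits in an action's transformers list; the column names are hypothetical:

    ```hocon
    transformers = [{
      type = BlacklistTransformer
      columnBlacklist = ["internal_flag", "audit_comment"]
    }]
    ```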
  3. case class DataValidationTransformer(name: String = "dataValidation", description: Option[String] = None, rules: Seq[ValidationRule], errorsColumn: String = "errors") extends ParsableDfTransformer with Product with Serializable

    Apply validation rules to a DataFrame and collect potential violation error messages in a new column.

    name

    name of the transformer

    description

    Optional description of the transformer

    rules

    list of validation rules to apply to the DataFrame

    errorsColumn

    Optional column name for the list of error messages. Default is "errors".

    Annotations
    @Scaladoc()
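    A hedged HOCON sketch, assuming rules are parsed from config with a type discriminator (as for other parsable elements) and that the transformer sits in an action's transformers list; the condition and message are illustrative:

    ```hocon
    transformers = [{
      type = DataValidationTransformer
      rules = [{
        type = RowLevelValidationRule
        condition = "amount >= 0"
        errorMsg = "amount must not be negative"
      }]
      errorsColumn = "errors"
    }]
    ```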
  4. trait DfTransformer extends PartitionValueTransformer

    Interface to implement Spark-DataFrame transformers working with one input and one output (1:1)

    Annotations
    @Scaladoc()
  5. case class DfTransformerFunctionWrapper(name: String, fn: (DataFrame) ⇒ DataFrame) extends DfTransformer with Product with Serializable

    Legacy wrapper for a pure DataFrame transformation function.

    Annotations
    @Scaladoc()
  6. case class DfTransformerWrapperDfsTransformer(transformer: ParsableDfTransformer, subFeedsToApply: Seq[String]) extends ParsableDfsTransformer with Product with Serializable

    A Transformer to use single DataFrame Transformers as multiple DataFrame Transformers. This works by selecting the SubFeeds (DataFrames) the single DataFrame Transformer should be applied to. All other SubFeeds will be passed through without transformation.

    transformer

    Configuration for a DfTransformer to be applied

    subFeedsToApply

    Names of SubFeeds the transformation should be applied to.

    Annotations
    @Scaladoc()
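    A hypothetical HOCON sketch showing a single-DataFrame FilterTransformer applied to only one SubFeed of a multi-DataFrame action; the SubFeed name and filter are illustrative:

    ```hocon
    transformers = [{
      type = DfTransformerWrapperDfsTransformer
      subFeedsToApply = ["stg-orders"]
      transformer {
        type = FilterTransformer
        filterClause = "status = 'ACTIVE'"
      }
    }]
    ```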
  7. trait DfsTransformer extends PartitionValueTransformer

    Interface to implement Spark-DataFrame transformers working with many inputs and many outputs (n:m)

    Annotations
    @Scaladoc()
  8. case class FilterTransformer(name: String = "filter", description: Option[String] = None, filterClause: String) extends ParsableDfTransformer with Product with Serializable

    Apply a filter condition to a DataFrame.

    name

    name of the transformer

    description

    Optional description of the transformer

    filterClause

    Spark SQL expression to filter the DataFrame

    Annotations
    @Scaladoc()
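    A minimal HOCON sketch, assuming the transformer is listed under an action's transformers; the filter expression is illustrative:

    ```hocon
    transformers = [{
      type = FilterTransformer
      filterClause = "status = 'ACTIVE' and amount > 0"
    }]
    ```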
  9. trait OptionsDfTransformer extends ParsableDfTransformer

    Interface to implement Spark-DataFrame transformers working with one input and one output (1:1). This trait extends ParsableDfTransformer to pass a map of options as parameter to the transform function. This is mainly used by custom transformers.

    Annotations
    @Scaladoc()
  10. trait OptionsDfsTransformer extends ParsableDfsTransformer

    Interface to implement Spark-DataFrame transformers working with many inputs and many outputs (n:m). This trait extends ParsableDfsTransformer to pass a map of options as parameter to the transform function. This is mainly used by custom transformers.

    Annotations
    @Scaladoc()
  11. trait ParsableDfTransformer extends DfTransformer with ParsableFromConfig[ParsableDfTransformer]
  12. trait ParsableDfsTransformer extends DfsTransformer with ParsableFromConfig[ParsableDfsTransformer]
  13. trait PartitionValueTransformer extends AnyRef
  14. case class PythonCodeDfTransformer(name: String = "pythonTransform", description: Option[String] = None, code: Option[String] = None, file: Option[String] = None, options: Map[String, String] = Map(), runtimeOptions: Map[String, String] = Map()) extends OptionsDfTransformer with Product with Serializable

    Configuration of a custom Spark-DataFrame transformation between one input and one output (1:1) as Python/PySpark code. Note that this transformer needs a Python and PySpark environment installed. The PySpark session is initialized and available under the variables sc, session and sqlContext. Other variables available are: inputDf (the input DataFrame), options (transformation options as Map[String,String]) and dataObjectId (the id of the input DataObject as String). The output DataFrame must be set with setOutputDf(df).

    name

    name of the transformer

    description

    Optional description of the transformer

    code

    Optional Python code to use for the transformation. The Python code can use the variables inputDf, dataObjectId and options. The transformed DataFrame has to be set with setOutputDf.

    file

    Optional file with Python code to use for the transformation. The Python code can use the variables inputDf, dataObjectId and options. The transformed DataFrame has to be set with setOutputDf.

    options

    Options to pass to the transformation

    runtimeOptions

    Optional tuples of [key, Spark SQL expression] to be added as additional options when executing the transformation. The Spark SQL expressions are evaluated against an instance of DefaultExpressionData.

    Annotations
    @Scaladoc()
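    A hedged HOCON sketch of an inline PySpark transformation, assuming the transformer is listed under an action's transformers; the filter logic is illustrative:

    ```hocon
    transformers = [{
      type = PythonCodeDfTransformer
      code = "setOutputDf(inputDf.filter(inputDf.status == 'ACTIVE'))"
    }]
    ```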
  15. case class RepartitionTransformer(name: String = "repartition", description: Option[String] = None, numberOfTasksPerPartition: Int, keyCols: Seq[String] = Seq()) extends ParsableDfTransformer with Product with Serializable

    Repartition a DataFrame. For a detailed description of repartitioning DataFrames, see also SparkRepartitionDef.

    name

    name of the transformer

    description

    Optional description of the transformer

    numberOfTasksPerPartition

    Number of Spark tasks to create per partition value by repartitioning the DataFrame.

    keyCols

    Optional key columns to distribute records over Spark tasks inside a partition value.

    Annotations
    @Scaladoc()
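    A minimal HOCON sketch, assuming the transformer is listed under an action's transformers; the key column is hypothetical:

    ```hocon
    transformers = [{
      type = RepartitionTransformer
      numberOfTasksPerPartition = 4
      keyCols = ["customer_id"]
    }]
    ```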
  16. case class RowLevelValidationRule(condition: String, errorMsg: Option[String] = None) extends ValidationRule with Product with Serializable

    Definition for a row level data validation rule.

    condition

    a Spark SQL expression defining the condition to be tested.

    errorMsg

    Optional error message to be created if the condition fails. Default is to use a text representation of the condition.

    Annotations
    @Scaladoc()
  17. case class SQLDfTransformer(name: String = "sqlTransform", description: Option[String] = None, code: String, options: Map[String, String] = Map(), runtimeOptions: Map[String, String] = Map()) extends OptionsDfTransformer with Product with Serializable

    Configuration of a custom Spark-DataFrame transformation between one input and one output (1:1) as SQL code. The input data is available as a temporary view in SQL. As name for the temporary view, the input DataObjectId is used (special characters are replaced by underscores). A special token '%{inputViewName}' will be replaced with the name of the temporary view at runtime.

    name

    name of the transformer

    description

    Optional description of the transformer

    code

    SQL code for the transformation. Use tokens %{<key>} that are replaced with runtimeOptions values in the SQL code, e.g. "select * from test where run = %{runId}". A special token %{inputViewName} can be used to insert the name of the temporary view.

    options

    Options to pass to the transformation

    runtimeOptions

    Optional tuples of [key, Spark SQL expression] to be added as additional options when executing the transformation. The Spark SQL expressions are evaluated against an instance of DefaultExpressionData.

    Annotations
    @Scaladoc()
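    A hedged HOCON sketch, assuming the transformer is listed under an action's transformers and that "runId" is a valid expression against DefaultExpressionData; the query is illustrative:

    ```hocon
    transformers = [{
      type = SQLDfTransformer
      code = "select * from %{inputViewName} where run = %{runId}"
      runtimeOptions {
        runId = "runId"
      }
    }]
    ```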
  18. case class SQLDfsTransformer(name: String = "sqlTransform", description: Option[String] = None, code: Map[DataObjectId, String], options: Map[String, String] = Map(), runtimeOptions: Map[String, String] = Map()) extends OptionsDfsTransformer with Product with Serializable

    Configuration of a custom Spark-DataFrame transformation between many inputs and many outputs (n:m) as SQL code. The input data is available as temporary views in SQL. As names for the temporary views, the input DataObjectIds are used (special characters are replaced by underscores).

    name

    name of the transformer

    description

    Optional description of the transformer

    code

    SQL code for the transformation. Use tokens %{<key>} that are replaced with runtimeOptions values in the SQL code, e.g. "select * from test where run = %{runId}". A special token %{inputViewName} can be used to insert the name of the temporary view.

    options

    Options to pass to the transformation

    runtimeOptions

    Optional tuples of [key, Spark SQL expression] to be added as additional options when executing the transformation. The Spark SQL expressions are evaluated against an instance of DefaultExpressionData.

    Annotations
    @Scaladoc()
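    A hypothetical HOCON sketch: the code map is keyed by output DataObjectId, and the SQL references input views whose names are the input DataObjectIds with special characters replaced by underscores (e.g. stg-orders becomes stg_orders). All ids here are illustrative:

    ```hocon
    transformers = [{
      type = SQLDfsTransformer
      code {
        "int-orders" = "select o.*, c.name from stg_orders o join stg_customers c on o.customer_id = c.id"
      }
    }]
    ```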
  19. case class ScalaClassDfTransformer(name: String = "scalaTransform", description: Option[String] = None, className: String, options: Map[String, String] = Map(), runtimeOptions: Map[String, String] = Map()) extends OptionsDfTransformer with Product with Serializable

    Configuration of a custom Spark-DataFrame transformation between one input and one output (1:1) as Java/Scala Class. Define a transform function which receives a DataObjectId, a DataFrame and a map of options and has to return a DataFrame. The Java/Scala class has to implement interface CustomDfTransformer.

    name

    name of the transformer

    description

    Optional description of the transformer

    className

    class name implementing trait CustomDfTransformer

    options

    Options to pass to the transformation

    runtimeOptions

    Optional tuples of [key, Spark SQL expression] to be added as additional options when executing the transformation. The Spark SQL expressions are evaluated against an instance of DefaultExpressionData.

    Annotations
    @Scaladoc()
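    A minimal HOCON sketch, assuming the transformer is listed under an action's transformers; the class name and option are hypothetical, and the referenced class must implement CustomDfTransformer:

    ```hocon
    transformers = [{
      type = ScalaClassDfTransformer
      className = "com.example.MyTransformer"
      options {
        targetColumn = "amount"
      }
    }]
    ```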
  20. case class ScalaClassDfsTransformer(name: String = "scalaTransform", description: Option[String] = None, className: String, options: Map[String, String] = Map(), runtimeOptions: Map[String, String] = Map()) extends OptionsDfsTransformer with Product with Serializable

    Configuration of a custom Spark-DataFrame transformation between many inputs and many outputs (n:m). Define a transform function which receives a map of input DataObjectIds with DataFrames and a map of options and has to return a map of output DataObjectIds with DataFrames, see also trait CustomDfsTransformer.

    name

    name of the transformer

    description

    Optional description of the transformer

    className

    class name implementing trait CustomDfsTransformer

    options

    Options to pass to the transformation

    runtimeOptions

    Optional tuples of [key, Spark SQL expression] to be added as additional options when executing the transformation. The Spark SQL expressions are evaluated against an instance of DefaultExpressionData.

    Annotations
    @Scaladoc()
  21. case class ScalaCodeDfTransformer(name: String = "scalaTransform", description: Option[String] = None, code: Option[String] = None, file: Option[String] = None, options: Map[String, String] = Map(), runtimeOptions: Map[String, String] = Map()) extends OptionsDfTransformer with Product with Serializable

    Configuration of a custom Spark-DataFrame transformation between one input and one output (1:1) as Scala code which is compiled at runtime. Define a transform function which receives a DataObjectId, a DataFrame and a map of options and has to return a DataFrame. The Scala code has to implement a function of type fnTransformType.

    name

    name of the transformer

    description

    Optional description of the transformer

    code

    Scala code for the transformation. The Scala code needs to be a function of type fnTransformType.

    file

    File to load the Scala code for the transformation from. The Scala code in the file needs to be a function of type fnTransformType.

    options

    Options to pass to the transformation

    runtimeOptions

    Optional tuples of [key, Spark SQL expression] to be added as additional options when executing the transformation. The Spark SQL expressions are evaluated against an instance of DefaultExpressionData.

    Annotations
    @Scaladoc()
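    A minimal HOCON sketch using the file variant (the path is hypothetical; the file must contain a function of type fnTransformType, which is compiled at runtime):

    ```hocon
    transformers = [{
      type = ScalaCodeDfTransformer
      file = "transformations/deduplicate.scala"
    }]
    ```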
  22. case class ScalaCodeDfsTransformer(name: String = "scalaTransform", description: Option[String] = None, code: Option[String] = None, file: Option[String] = None, options: Map[String, String] = Map(), runtimeOptions: Map[String, String] = Map()) extends OptionsDfsTransformer with Product with Serializable

    Configuration of a custom Spark-DataFrame transformation between many inputs and many outputs (n:m) as Scala code which is compiled at runtime. Define a transform function which receives a map of input DataObjectIds with DataFrames and a map of options and has to return a map of output DataObjectIds with DataFrames. The Scala code has to implement a function of type fnTransformType.

    name

    name of the transformer

    description

    Optional description of the transformer

    code

    Scala code for the transformation. The Scala code needs to be a function of type fnTransformType.

    file

    File to load the Scala code for the transformation from. The Scala code in the file needs to be a function of type fnTransformType.

    options

    Options to pass to the transformation

    runtimeOptions

    Optional tuples of [key, Spark SQL expression] to be added as additional options when executing the transformation. The Spark SQL expressions are evaluated against an instance of DefaultExpressionData.

    Annotations
    @Scaladoc()
  23. case class ScalaNotebookDfTransformer(name: String = "scalaTransform", description: Option[String] = None, url: String, functionName: String, authMode: Option[AuthMode] = None, options: Map[String, String] = Map(), runtimeOptions: Map[String, String] = Map()) extends OptionsDfTransformer with Product with Serializable

    Configuration of a custom Spark-DataFrame transformation between one input and one output (1:1) as Scala code which is compiled at runtime. The code is loaded from a notebook. It should define a transform function with a configurable name, which receives a DataObjectId, a DataFrame and a map of options and has to return a DataFrame, see also fnTransformType. Notebook cells starting with "//!IGNORE" will be ignored.

    name

    name of the transformer

    description

    Optional description of the transformer

    url

    URL to download the notebook in IPYNB format, which defines the transformation.

    functionName

    The notebook needs to contain a Scala-function with this name and type fnTransformType.

    authMode

    Optional authentication information for the webservice, e.g. BasicAuthMode for user/password authentication.

    options

    Options to pass to the transformation

    runtimeOptions

    Optional tuples of [key, Spark SQL expression] to be added as additional options when executing the transformation. The Spark SQL expressions are evaluated against an instance of DefaultExpressionData.

    Annotations
    @Scaladoc()
  24. case class StandardizeDatatypesTransformer(name: String = "standardizeDatatypes", description: Option[String] = None) extends ParsableDfTransformer with Product with Serializable

    Standardize datatypes of a DataFrame. The current implementation converts all decimal datatypes to a corresponding integral or floating-point datatype.

    name

    name of the transformer

    description

    Optional description of the transformer

    Annotations
    @Scaladoc()
  25. sealed trait ValidationRule extends AnyRef
  26. case class WhitelistTransformer(name: String = "whitelist", description: Option[String] = None, columnWhitelist: Seq[String]) extends ParsableDfTransformer with Product with Serializable

    Apply a column whitelist to a DataFrame.

    name

    name of the transformer

    description

    Optional description of the transformer

    columnWhitelist

    List of columns to keep from DataFrame

    Annotations
    @Scaladoc()
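    A minimal HOCON sketch, assuming the transformer is listed under an action's transformers; the column names are hypothetical:

    ```hocon
    transformers = [{
      type = WhitelistTransformer
      columnWhitelist = ["id", "name", "created_at"]
    }]
    ```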

Value Members

  1. object AdditionalColumnsTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
  2. object BlacklistTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
  3. object DataValidationTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
  4. object DfTransformerWrapperDfsTransformer extends FromConfigFactory[ParsableDfsTransformer] with Serializable
  5. object FilterTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
  6. object PythonCodeDfTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
  7. object RepartitionTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
  8. object SQLDfTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
  9. object SQLDfsTransformer extends FromConfigFactory[ParsableDfsTransformer] with Serializable
  10. object ScalaClassDfTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
  11. object ScalaClassDfsTransformer extends FromConfigFactory[ParsableDfsTransformer] with Serializable
  12. object ScalaCodeDfTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
  13. object ScalaCodeDfsTransformer extends FromConfigFactory[ParsableDfsTransformer] with Serializable
  14. object ScalaNotebookDfTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
  15. object StandardizeDatatypesTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
  16. object WhitelistTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
