package sparktransformer
Type Members
- case class AdditionalColumnsTransformer(name: String = "additionalColumns", description: Option[String] = None, additionalColumns: Map[String, String]) extends ParsableDfTransformer with Product with Serializable
Add additional columns to the DataFrame by extracting information from the context.
- name
name of the transformer
- description
Optional description of the transformer
- additionalColumns
optional tuples of [column name, spark sql expression] to be added as additional columns to the dataframe. The spark sql expressions are evaluated against an instance of DefaultExpressionData.
- Annotations
- @Scaladoc()
- case class BlacklistTransformer(name: String = "blacklist", description: Option[String] = None, columnBlacklist: Seq[String]) extends ParsableDfTransformer with Product with Serializable
Apply a column blacklist to a DataFrame.
- name
name of the transformer
- description
Optional description of the transformer
- columnBlacklist
List of columns to exclude from DataFrame
- Annotations
- @Scaladoc()
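In Spark terms, a column blacklist amounts to dropping the listed columns. A minimal, Spark-free sketch of the selection logic (column names modeled as plain strings; `applyBlacklist` is a hypothetical helper, not part of the API):

```scala
// Hypothetical sketch: model a DataFrame's schema as just its column names.
def applyBlacklist(columns: Seq[String], columnBlacklist: Seq[String]): Seq[String] = {
  // Case-insensitive comparison mirrors Spark's default resolver;
  // this is an assumption of the sketch, not a documented guarantee.
  val excluded = columnBlacklist.map(_.toLowerCase).toSet
  columns.filterNot(c => excluded.contains(c.toLowerCase))
}

val remaining = applyBlacklist(Seq("id", "name", "internal_flag"), Seq("internal_flag"))
// remaining == Seq("id", "name")
```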
- case class DataValidationTransformer(name: String = "dataValidation", description: Option[String] = None, rules: Seq[ValidationRule], errorsColumn: String = "errors") extends ParsableDfTransformer with Product with Serializable
Apply validation rules to a DataFrame and collect potential violation error messages in a new column.
- name
name of the transformer
- description
Optional description of the transformer
- rules
list of validation rules to apply to the DataFrame
- errorsColumn
Optional column name for the list of error messages. Default is "errors".
- Annotations
- @Scaladoc()
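The core idea is that every rule is checked per row and the messages of all failing rules end up in the errors column. A Spark-free sketch under stated assumptions (rows modeled as `Map[String, Int]`, rules as predicates; the real transformer evaluates Spark SQL conditions from `ValidationRule` instances):

```scala
// Hypothetical simplification: a rule is a predicate over a row plus a message.
case class Rule(condition: Map[String, Int] => Boolean, errorMsg: String)

// Collect the messages of all rules the row violates; an empty result means
// the row passed all checks (the list would be stored in the errors column).
def validate(row: Map[String, Int], rules: Seq[Rule]): Seq[String] =
  rules.collect { case r if !r.condition(row) => r.errorMsg }

val rules = Seq(
  Rule(row => row("age") >= 0, "age must be non-negative"),
  Rule(row => row("qty") > 0, "qty must be positive")
)
val errors = validate(Map("age" -> -1, "qty" -> 5), rules)
// errors == Seq("age must be non-negative")
```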
- trait DfTransformer extends PartitionValueTransformer
Interface to implement Spark-DataFrame transformers working with one input and one output (1:1)
- Annotations
- @Scaladoc()
- case class DfTransformerFunctionWrapper(name: String, fn: (DataFrame) ⇒ DataFrame) extends DfTransformer with Product with Serializable
Legacy wrapper for pure DataFrame transformation function
- Annotations
- @Scaladoc()
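The wrapper pattern itself is straightforward: a trait declares a single transform method, and a case class adapts a plain function to it. A Spark-free sketch (the names `SimpleTransformer` and `FunctionWrapper` are illustrative stand-ins; `List[Map[String, Int]]` stands in for Spark's DataFrame):

```scala
// Simplified stand-in for the DfTransformer contract.
trait SimpleTransformer {
  def transform(df: List[Map[String, Int]]): List[Map[String, Int]]
}

// Adapt a pure function to the transformer interface.
case class FunctionWrapper(name: String, fn: List[Map[String, Int]] => List[Map[String, Int]])
  extends SimpleTransformer {
  override def transform(df: List[Map[String, Int]]): List[Map[String, Int]] = fn(df)
}

val doubler = FunctionWrapper("double", df => df.map(_.map { case (k, v) => (k, v * 2) }))
val out = doubler.transform(List(Map("x" -> 1)))
// out == List(Map("x" -> 2))
```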
- case class DfTransformerWrapperDfsTransformer(transformer: ParsableDfTransformer, subFeedsToApply: Seq[String]) extends ParsableDfsTransformer with Product with Serializable
A Transformer to use single DataFrame Transformers as multiple DataFrame Transformers. This works by selecting the SubFeeds (DataFrames) the single DataFrame Transformer should be applied to. All other SubFeeds will be passed through without transformation.
- transformer
Configuration for a DfTransformer to be applied
- subFeedsToApply
Names of SubFeeds the transformation should be applied to.
- Annotations
- @Scaladoc()
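The selection/pass-through behavior can be sketched without Spark: the wrapped transformation is applied only to the named SubFeeds, everything else is returned unchanged (SubFeeds modeled here as a map from name to a list of values; `applyToSelected` is a hypothetical helper):

```scala
// Apply fn only to the SubFeeds listed in subFeedsToApply; pass the rest through.
def applyToSelected(
    subFeeds: Map[String, List[Int]],
    subFeedsToApply: Seq[String]
)(fn: List[Int] => List[Int]): Map[String, List[Int]] =
  subFeeds.map { case (name, df) =>
    if (subFeedsToApply.contains(name)) name -> fn(df) else name -> df
  }

val result = applyToSelected(Map("a" -> List(1, 2), "b" -> List(3)), Seq("a"))(_.map(_ * 10))
// result("a") == List(10, 20); result("b") == List(3)
```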
- trait DfsTransformer extends PartitionValueTransformer
Interface to implement Spark-DataFrame transformers working with many inputs and many outputs (n:m)
- Annotations
- @Scaladoc()
- case class FilterTransformer(name: String = "filter", description: Option[String] = None, filterClause: String) extends ParsableDfTransformer with Product with Serializable
Apply a filter condition to a DataFrame.
- name
name of the transformer
- description
Optional description of the transformer
- filterClause
Spark SQL expression to filter the DataFrame
- Annotations
- @Scaladoc()
- trait OptionsDfTransformer extends ParsableDfTransformer
Interface to implement Spark-DataFrame transformers working with one input and one output (1:1). This trait extends DfSparkTransformer to pass a map of options as parameter to the transform function. This is mainly used by custom transformers.
- Annotations
- @Scaladoc()
- trait OptionsDfsTransformer extends ParsableDfsTransformer
Interface to implement Spark-DataFrame transformers working with many inputs and many outputs (n:m). This trait extends DfSparkTransformer to pass a map of options as parameter to the transform function. This is mainly used by custom transformers.
- Annotations
- @Scaladoc()
- trait ParsableDfTransformer extends DfTransformer with ParsableFromConfig[ParsableDfTransformer]
- trait ParsableDfsTransformer extends DfsTransformer with ParsableFromConfig[ParsableDfsTransformer]
- trait PartitionValueTransformer extends AnyRef
- case class PythonCodeDfTransformer(name: String = "pythonTransform", description: Option[String] = None, code: Option[String] = None, file: Option[String] = None, options: Map[String, String] = Map(), runtimeOptions: Map[String, String] = Map()) extends OptionsDfTransformer with Product with Serializable
Configuration of a custom Spark-DataFrame transformation between one input and one output (1:1) as Python/PySpark code. Note that this transformer needs a Python and PySpark environment installed. The PySpark session is initialized and available under the variables sc, session and sqlContext. Other variables available are inputDf (the input DataFrame), options (transformation options as Map[String,String]) and dataObjectId (id of the input DataObject as String). The output DataFrame must be set with setOutputDf(df).
- name
name of the transformer
- description
Optional description of the transformer
- code
Optional python code to use for the python transformation. The python code can use the variables inputDf, dataObjectId and options. The transformed DataFrame has to be set with setOutputDf.
- file
Optional file with python code to use for python transformation. The python code can use variables inputDf, dataObjectId and options. The transformed DataFrame has to be set with setOutputDf.
- options
Options to pass to the transformation
- runtimeOptions
optional tuples of [key, spark sql expression] to be added as additional options when executing transformation. The spark sql expressions are evaluated against an instance of DefaultExpressionData.
- Annotations
- @Scaladoc()
- case class RepartitionTransformer(name: String = "repartition", description: Option[String] = None, numberOfTasksPerPartition: Int, keyCols: Seq[String] = Seq()) extends ParsableDfTransformer with Product with Serializable
Repartition a DataFrame. For a detailed description of repartitioning DataFrames see also SparkRepartitionDef.
- name
name of the transformer
- description
Optional description of the transformer
- numberOfTasksPerPartition
Number of Spark tasks to create per partition value by repartitioning the DataFrame.
- keyCols
Optional key columns to distribute records over Spark tasks inside a partition value.
- Annotations
- @Scaladoc()
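Distributing records over a fixed number of tasks inside one partition value is, conceptually, a hash of the key columns modulo the task count. A Spark-free sketch under that assumption (the hashing scheme is illustrative, not the library's actual partitioner):

```scala
// Map a record's key-column values to one of numberOfTasksPerPartition task
// indices. Math.floorMod keeps the result non-negative even for negative hashes.
def taskIndex(keyValues: Seq[String], numberOfTasksPerPartition: Int): Int =
  Math.floorMod(keyValues.hashCode, numberOfTasksPerPartition)

val idx = taskIndex(Seq("customer42"), 4)
// idx is always in [0, 4)
```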
- case class RowLevelValidationRule(condition: String, errorMsg: Option[String] = None) extends ValidationRule with Product with Serializable
Definition for a row level data validation rule.
- condition
a Spark SQL expression defining the condition to be tested.
- errorMsg
Optional error message to be created if the condition fails. Default is to use a text representation of the condition.
- Annotations
- @Scaladoc()
- case class SQLDfTransformer(name: String = "sqlTransform", description: Option[String] = None, code: String, options: Map[String, String] = Map(), runtimeOptions: Map[String, String] = Map()) extends OptionsDfTransformer with Product with Serializable
Configuration of a custom Spark-DataFrame transformation between one input and one output (1:1) as SQL code. The input data is available as a temporary view in SQL; the name of the temporary view is derived from the input DataObjectId (special characters are replaced by underscores). A special token '%{inputViewName}' will be replaced with the name of the temporary view at runtime.
- name
name of the transformer
- description
Optional description of the transformer
- code
SQL code for transformation. Use tokens %{<key>} to replace with runtimeOptions in SQL code. Example: "select * from test where run = %{runId}" A special token %{inputViewName} can be used to insert the temporary view name.
- options
Options to pass to the transformation
- runtimeOptions
optional tuples of [key, spark sql expression] to be added as additional options when executing transformation. The spark sql expressions are evaluated against an instance of DefaultExpressionData.
- Annotations
- @Scaladoc()
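The `%{key}` token substitution described above can be sketched as a simple string replacement of each runtime option (plus the special inputViewName token) into the SQL code. This literal-replacement sketch is an illustration of the idea, not the library's actual resolver:

```scala
// Splice option values into %{key} tokens in the SQL code.
def resolveTokens(code: String, options: Map[String, String]): String =
  options.foldLeft(code) { case (sql, (key, value)) =>
    sql.replace(s"%{$key}", value)
  }

val sql = resolveTokens(
  "select * from %{inputViewName} where run = %{runId}",
  Map("inputViewName" -> "stg_test", "runId" -> "17")
)
// sql == "select * from stg_test where run = 17"
```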
- case class SQLDfsTransformer(name: String = "sqlTransform", description: Option[String] = None, code: Map[DataObjectId, String], options: Map[String, String] = Map(), runtimeOptions: Map[String, String] = Map()) extends OptionsDfsTransformer with Product with Serializable
Configuration of a custom Spark-DataFrame transformation between many inputs and many outputs (n:m) as SQL code. The input data is available as temporary views in SQL; the names of the temporary views are derived from the input DataObjectIds (special characters are replaced by underscores).
- name
name of the transformer
- description
Optional description of the transformer
- code
SQL code for transformation. Use tokens %{<key>} to replace with runtimeOptions in SQL code. Example: "select * from test where run = %{runId}" A special token %{inputViewName} can be used to insert the temporary view name.
- options
Options to pass to the transformation
- runtimeOptions
optional tuples of [key, spark sql expression] to be added as additional options when executing transformation. The spark sql expressions are evaluated against an instance of DefaultExpressionData.
- Annotations
- @Scaladoc()
- case class ScalaClassDfTransformer(name: String = "scalaTransform", description: Option[String] = None, className: String, options: Map[String, String] = Map(), runtimeOptions: Map[String, String] = Map()) extends OptionsDfTransformer with Product with Serializable
Configuration of a custom Spark-DataFrame transformation between one input and one output (1:1) as Java/Scala Class. Define a transform function which receives a DataObjectId, a DataFrame and a map of options and has to return a DataFrame. The Java/Scala class has to implement interface CustomDfTransformer.
- name
name of the transformer
- description
Optional description of the transformer
- className
class name implementing trait CustomDfTransformer
- options
Options to pass to the transformation
- runtimeOptions
optional tuples of [key, spark sql expression] to be added as additional options when executing transformation. The spark sql expressions are evaluated against an instance of DefaultExpressionData.
- Annotations
- @Scaladoc()
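The class-based customization pattern can be sketched without Spark: a trait fixes the transform contract and a named class implements it, which the framework then instantiates from `className`. The trait below is a simplified stand-in for CustomDfTransformer (whose real signature takes a SparkSession and a Spark DataFrame); `List[Int]` stands in for the DataFrame:

```scala
// Simplified stand-in for the CustomDfTransformer contract.
trait MyCustomTransformer {
  def transform(dataObjectId: String, df: List[Int], options: Map[String, String]): List[Int]
}

// Example implementation: shift every value by a configurable offset.
class AddOffsetTransformer extends MyCustomTransformer {
  override def transform(dataObjectId: String, df: List[Int], options: Map[String, String]): List[Int] = {
    val offset = options.getOrElse("offset", "0").toInt
    df.map(_ + offset)
  }
}

// The framework would instantiate the class via reflection from the configured
// className; here we construct it directly for illustration.
val t: MyCustomTransformer = new AddOffsetTransformer
val out2 = t.transform("src1", List(1, 2), Map("offset" -> "10"))
// out2 == List(11, 12)
```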
- case class ScalaClassDfsTransformer(name: String = "scalaTransform", description: Option[String] = None, className: String, options: Map[String, String] = Map(), runtimeOptions: Map[String, String] = Map()) extends OptionsDfsTransformer with Product with Serializable
Configuration of a custom Spark-DataFrame transformation between many inputs and many outputs (n:m). Define a transform function which receives a map of input DataObjectIds with DataFrames and a map of options and has to return a map of output DataObjectIds with DataFrames, see also trait CustomDfsTransformer.
- name
name of the transformer
- description
Optional description of the transformer
- className
class name implementing trait CustomDfsTransformer
- options
Options to pass to the transformation
- runtimeOptions
optional tuples of [key, spark sql expression] to be added as additional options when executing transformation. The spark sql expressions are evaluated against an instance of DefaultExpressionData.
- Annotations
- @Scaladoc()
- case class ScalaCodeDfTransformer(name: String = "scalaTransform", description: Option[String] = None, code: Option[String] = None, file: Option[String] = None, options: Map[String, String] = Map(), runtimeOptions: Map[String, String] = Map()) extends OptionsDfTransformer with Product with Serializable
Configuration of a custom Spark-DataFrame transformation between one input and one output (1:1) as Scala code which is compiled at runtime. Define a transform function which receives a DataObjectId, a DataFrame and a map of options and has to return a DataFrame. The scala code has to implement a function of type fnTransformType.
- name
name of the transformer
- description
Optional description of the transformer
- code
Scala code for transformation. The scala code needs to be a function of type fnTransformType.
- file
File where scala code for transformation is loaded from. The scala code in the file needs to be a function of type fnTransformType.
- options
Options to pass to the transformation
- runtimeOptions
optional tuples of [key, spark sql expression] to be added as additional options when executing transformation. The spark sql expressions are evaluated against an instance of DefaultExpressionData.
- Annotations
- @Scaladoc()
- case class ScalaCodeDfsTransformer(name: String = "scalaTransform", description: Option[String] = None, code: Option[String] = None, file: Option[String] = None, options: Map[String, String] = Map(), runtimeOptions: Map[String, String] = Map()) extends OptionsDfsTransformer with Product with Serializable
Configuration of a custom Spark-DataFrame transformation between many inputs and many outputs (n:m) as Scala code which is compiled at runtime. Define a transform function which receives a map of input DataObjectIds with DataFrames and a map of options and has to return a map of output DataObjectIds with DataFrames. The scala code has to implement a function of type fnTransformType.
- name
name of the transformer
- description
Optional description of the transformer
- code
Scala code for transformation. The scala code needs to be a function of type fnTransformType.
- file
File where scala code for transformation is loaded from. The scala code in the file needs to be a function of type fnTransformType.
- options
Options to pass to the transformation
- runtimeOptions
optional tuples of [key, spark sql expression] to be added as additional options when executing transformation. The spark sql expressions are evaluated against an instance of DefaultExpressionData.
- Annotations
- @Scaladoc()
- case class ScalaNotebookDfTransformer(name: String = "scalaTransform", description: Option[String] = None, url: String, functionName: String, authMode: Option[AuthMode] = None, options: Map[String, String] = Map(), runtimeOptions: Map[String, String] = Map()) extends OptionsDfTransformer with Product with Serializable
Configuration of a custom Spark-DataFrame transformation between one input and one output (1:1) as Scala code which is compiled at runtime. The code is loaded from a Notebook. It should define a transform function with a configurable name, which receives a DataObjectId, a DataFrame and a map of options and has to return a DataFrame, see also (fnTransformType). Notebook-cells starting with "//!IGNORE" will be ignored.
- name
name of the transformer
- description
Optional description of the transformer
- url
Url to download notebook in IPYNB-format, which defines transformation.
- functionName
The notebook needs to contain a Scala-function with this name and type fnTransformType.
- authMode
optional authentication information for webservice, e.g. BasicAuthMode for user/pw authentication
- options
Options to pass to the transformation
- runtimeOptions
optional tuples of [key, spark sql expression] to be added as additional options when executing transformation. The spark sql expressions are evaluated against an instance of DefaultExpressionData.
- Annotations
- @Scaladoc()
- case class StandardizeDatatypesTransformer(name: String = "standardizeDatatypes", description: Option[String] = None) extends ParsableDfTransformer with Product with Serializable
Standardize datatypes of a DataFrame. The current implementation converts all decimal datatypes to a corresponding integral or float datatype.
- name
name of the transformer
- description
Optional description of the transformer
- Annotations
- @Scaladoc()
- sealed trait ValidationRule extends AnyRef
- case class WhitelistTransformer(name: String = "whitelist", description: Option[String] = None, columnWhitelist: Seq[String]) extends ParsableDfTransformer with Product with Serializable
Apply a column whitelist to a DataFrame.
- name
name of the transformer
- description
Optional description of the transformer
- columnWhitelist
List of columns to keep from DataFrame
- Annotations
- @Scaladoc()
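The whitelist is the complement of the blacklist above: only the listed columns survive. A Spark-free sketch of the idea (rows modeled as `Map[String, Int]`; `applyWhitelist` is a hypothetical helper, not part of the API):

```scala
// Keep only the whitelisted keys of a row; unknown whitelist entries are ignored.
def applyWhitelist(row: Map[String, Int], columnWhitelist: Seq[String]): Map[String, Int] =
  columnWhitelist.flatMap(c => row.get(c).map(c -> _)).toMap

val kept = applyWhitelist(Map("id" -> 1, "tmp" -> 9), Seq("id"))
// kept == Map("id" -> 1)
```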
Value Members
- object AdditionalColumnsTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
- object BlacklistTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
- object DataValidationTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
- object DfTransformerWrapperDfsTransformer extends FromConfigFactory[ParsableDfsTransformer] with Serializable
- object FilterTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
- object PythonCodeDfTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
- object RepartitionTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
- object SQLDfTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
- object SQLDfsTransformer extends FromConfigFactory[ParsableDfsTransformer] with Serializable
- object ScalaClassDfTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
- object ScalaClassDfsTransformer extends FromConfigFactory[ParsableDfsTransformer] with Serializable
- object ScalaCodeDfTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
- object ScalaCodeDfsTransformer extends FromConfigFactory[ParsableDfsTransformer] with Serializable
- object ScalaNotebookDfTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
- object StandardizeDatatypesTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable
- object WhitelistTransformer extends FromConfigFactory[ParsableDfTransformer] with Serializable