package action
Type Members
- case class ActionMetadata(name: Option[String] = None, description: Option[String] = None, feed: Option[String] = None, tags: Seq[String] = Seq()) extends Product with Serializable
Additional metadata for an Action. A construction sketch follows the parameter list.
- name
Readable name of the Action
- description
Description of the content of the Action
- feed
Name of the feed this Action belongs to
- tags
Optional custom tags for this object
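Example: a minimal construction sketch in Scala (values are hypothetical and the import path is assumed, not taken from this page):
  import io.smartdatalake.workflow.action.ActionMetadata  // assumed import path

  // All fields are optional; the values below are hypothetical.
  val metadata = ActionMetadata(
    name = Some("Copy orders"),
    description = Some("Copies order files from stage to integration"),
    feed = Some("orders"),
    tags = Seq("batch", "orders")
  )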
- abstract class ActionSubFeedsImpl[S <: SubFeed] extends Action
Implementation of SubFeed handling. This is a generic implementation that supports many input and output SubFeeds.
- S
SubFeed type this Action is designed for.
- case class CopyAction(id: ActionId, inputId: DataObjectId, outputId: DataObjectId, deleteDataAfterRead: Boolean = false, transformer: Option[CustomDfTransformerConfig] = None, transformers: Seq[ParsableDfTransformer] = Seq(), columnBlacklist: Option[Seq[String]] = None, columnWhitelist: Option[Seq[String]] = None, additionalColumns: Option[Map[String, String]] = None, filterClause: Option[String] = None, standardizeDatatypes: Boolean = false, breakDataFrameLineage: Boolean = false, persist: Boolean = false, executionMode: Option[ExecutionMode] = None, executionCondition: Option[Condition] = None, metricsFailCondition: Option[String] = None, saveModeOptions: Option[SaveModeOptions] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends SparkOneToOneActionImpl with Product with Serializable
Action to copy files (i.e. from stage to integration). A construction sketch follows the parameter list.
- inputId
input DataObject
- outputId
output DataObject
- deleteDataAfterRead
a flag to enable deletion of input partitions after copying.
- transformer
optional custom transformation to apply.
- transformers
optional list of transformations to apply. See sparktransformer for a list of included Transformers. The transformations are applied in the order of the list.
- columnBlacklist
Remove all columns on the blacklist from the DataFrame
- columnWhitelist
Keep only the columns on the whitelist in the DataFrame
- additionalColumns
optional tuples of [column name, Spark SQL expression] to be added as additional columns to the DataFrame. The Spark SQL expressions are evaluated against an instance of DefaultExpressionData.
- executionMode
optional execution mode for this Action
- executionCondition
optional Spark SQL expression evaluated against SubFeedsExpressionData. If true, the Action is executed, otherwise it is skipped. See Condition for details.
- metricsFailCondition
optional Spark SQL expression evaluated as a where-clause against the DataFrame of metrics. Available columns are dataObjectId, key and value. If any rows pass the where-clause, a MetricCheckFailed exception is thrown.
- saveModeOptions
override and parametrize the saveMode set in the output DataObject's configuration when writing.
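Example: a minimal construction sketch in Scala (ids, import paths and the no-argument InstanceRegistry constructor are assumptions; the referenced DataObjects must already be registered, and in practice Actions are usually defined in the HOCON configuration instead):
  import io.smartdatalake.config.InstanceRegistry
  import io.smartdatalake.config.SdlConfigObject.{ActionId, DataObjectId}
  import io.smartdatalake.workflow.action.CopyAction

  implicit val registry: InstanceRegistry = new InstanceRegistry  // must contain the DataObjects below

  val copy = CopyAction(
    id = ActionId("copy-orders"),           // hypothetical action id
    inputId = DataObjectId("stg-orders"),   // hypothetical stage DataObject
    outputId = DataObjectId("int-orders"),  // hypothetical integration DataObject
    deleteDataAfterRead = true              // delete input partitions after copying
  )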
- case class CustomFileAction(id: ActionId, inputId: DataObjectId, outputId: DataObjectId, transformer: CustomFileTransformerConfig, filesPerPartition: Int = 10, breakFileRefLineage: Boolean = false, executionMode: Option[ExecutionMode] = None, executionCondition: Option[Condition] = None, metricsFailCondition: Option[String] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends FileOneToOneActionImpl with SmartDataLakeLogger with Product with Serializable
Action to transform files between two Hadoop DataObjects. The transformation is executed in distributed mode on the Spark executors. A custom file transformer must be given, which reads a file from Hadoop and writes it back to Hadoop. A construction sketch follows the parameter list.
- inputId
input DataObject
- outputId
output DataObject
- transformer
a custom file transformer, which reads a file from HadoopFileDataObject and writes it back to another HadoopFileDataObject
- filesPerPartition
number of files per Spark partition
- executionMode
optional execution mode for this Action
- executionCondition
optional Spark SQL expression evaluated against SubFeedsExpressionData. If true, the Action is executed, otherwise it is skipped. See Condition for details.
- metricsFailCondition
optional Spark SQL expression evaluated as a where-clause against the DataFrame of metrics. Available columns are dataObjectId, key and value. If any rows pass the where-clause, a MetricCheckFailed exception is thrown.
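Example: a construction sketch in Scala (ids and import paths are assumptions, as is the className parameter of CustomFileTransformerConfig; the referenced class would implement the custom file transformer interface):
  import io.smartdatalake.config.InstanceRegistry
  import io.smartdatalake.config.SdlConfigObject.{ActionId, DataObjectId}
  import io.smartdatalake.workflow.action.CustomFileAction
  import io.smartdatalake.workflow.action.customlogic.CustomFileTransformerConfig  // assumed import path

  implicit val registry: InstanceRegistry = new InstanceRegistry  // must contain the DataObjects below

  val convert = CustomFileAction(
    id = ActionId("convert-files"),              // hypothetical action id
    inputId = DataObjectId("raw-files"),         // hypothetical HadoopFileDataObject
    outputId = DataObjectId("converted-files"),  // hypothetical HadoopFileDataObject
    // className is an assumed parameter referencing a custom transformer implementation
    transformer = CustomFileTransformerConfig(className = Some("com.example.MyFileTransformer")),
    filesPerPartition = 20                       // tune distribution over Spark partitions
  )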
- case class CustomScriptAction(id: ActionId, inputIds: Seq[DataObjectId], outputIds: Seq[DataObjectId], scripts: Seq[ParsableScriptDef] = Seq(), executionCondition: Option[Condition] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends ScriptActionImpl with Product with Serializable
Action to execute a script after multiple input DataObjects are ready, notifying multiple output DataObjects when the script has succeeded. A construction sketch follows the parameter list.
- inputIds
input DataObjects
- outputIds
output DataObjects
- scripts
definition of scripts to execute
- executionCondition
optional Spark SQL expression evaluated against SubFeedsExpressionData. If true, the Action is executed, otherwise it is skipped. See Condition for details.
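Example: a construction sketch in Scala (ids and import paths are assumptions; CmdScript and its linuxCmd parameter are assumed to be one available ParsableScriptDef implementation):
  import io.smartdatalake.config.InstanceRegistry
  import io.smartdatalake.config.SdlConfigObject.{ActionId, DataObjectId}
  import io.smartdatalake.workflow.action.CustomScriptAction
  import io.smartdatalake.workflow.action.script.CmdScript  // assumed ParsableScriptDef implementation

  implicit val registry: InstanceRegistry = new InstanceRegistry  // must contain the DataObjects below

  val runScript = CustomScriptAction(
    id = ActionId("run-export-script"),          // hypothetical action id
    inputIds = Seq(DataObjectId("int-orders")),  // hypothetical input DataObject
    outputIds = Seq(DataObjectId("ext-orders")), // hypothetical output DataObject
    // CmdScript and its linuxCmd parameter are assumptions; check the script
    // package for the actual ParsableScriptDef implementations.
    scripts = Seq(CmdScript(linuxCmd = Some("./export-orders.sh")))
  )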
- case class CustomSparkAction(id: ActionId, inputIds: Seq[DataObjectId], outputIds: Seq[DataObjectId], transformer: Option[CustomDfsTransformerConfig] = None, transformers: Seq[ParsableDfsTransformer] = Seq(), breakDataFrameLineage: Boolean = false, persist: Boolean = false, mainInputId: Option[DataObjectId] = None, mainOutputId: Option[DataObjectId] = None, executionMode: Option[ExecutionMode] = None, executionCondition: Option[Condition] = None, metricsFailCondition: Option[String] = None, metadata: Option[ActionMetadata] = None, recursiveInputIds: Seq[DataObjectId] = Seq(), inputIdsToIgnoreFilter: Seq[DataObjectId] = Seq())(implicit instanceRegistry: InstanceRegistry) extends SparkActionImpl with Product with Serializable
Action to transform data according to a custom transformer. Supports transforming multiple input and output DataFrames. A construction sketch follows the parameter list.
- inputIds
input DataObjects
- outputIds
output DataObjects
- transformer
custom transformation to apply to multiple DataFrames
- mainInputId
optional selection of the main inputId, used for execution mode and partition values propagation. Only needed if there are multiple input DataObjects.
- mainOutputId
optional selection of the main outputId, used for execution mode and partition values propagation. Only needed if there are multiple output DataObjects.
- executionMode
optional execution mode for this Action
- executionCondition
optional Spark SQL expression evaluated against SubFeedsExpressionData. If true, the Action is executed, otherwise it is skipped. See Condition for details.
- metricsFailCondition
optional Spark SQL expression evaluated as a where-clause against the DataFrame of metrics. Available columns are dataObjectId, key and value. If any rows pass the where-clause, a MetricCheckFailed exception is thrown.
- recursiveInputIds
outputs of this action that are used as inputs of the same action
- inputIdsToIgnoreFilter
optional list of input ids for which filters (partition values & filter clause) are not applied
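Example: a construction sketch in Scala (ids and import paths are assumptions; SQLDfsTransformer and the shape of its code parameter are also assumptions, based on the Transformers referenced under sparktransformer):
  import io.smartdatalake.config.InstanceRegistry
  import io.smartdatalake.config.SdlConfigObject.{ActionId, DataObjectId}
  import io.smartdatalake.workflow.action.CustomSparkAction
  import io.smartdatalake.workflow.action.sparktransformer.SQLDfsTransformer  // assumed transformer

  implicit val registry: InstanceRegistry = new InstanceRegistry  // must contain the DataObjects below

  val join = CustomSparkAction(
    id = ActionId("join-orders-customers"),                             // hypothetical action id
    inputIds = Seq(DataObjectId("orders"), DataObjectId("customers")),  // hypothetical inputs
    outputIds = Seq(DataObjectId("orders-enriched")),                   // hypothetical output
    // SQLDfsTransformer and its code parameter are assumptions; the SQL is
    // assumed to reference the input DataObjects as temporary views.
    transformers = Seq(SQLDfsTransformer(code = Map(
      DataObjectId("orders-enriched") ->
        "select o.*, c.name from orders o join customers c on o.customer_id = c.id"
    ))),
    mainInputId = Some(DataObjectId("orders"))  // propagate execution mode & partition values from orders
  )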
- case class DeduplicateAction(id: ActionId, inputId: DataObjectId, outputId: DataObjectId, transformer: Option[CustomDfTransformerConfig] = None, transformers: Seq[ParsableDfTransformer] = Seq(), columnBlacklist: Option[Seq[String]] = None, columnWhitelist: Option[Seq[String]] = None, additionalColumns: Option[Map[String, String]] = None, filterClause: Option[String] = None, standardizeDatatypes: Boolean = false, ignoreOldDeletedColumns: Boolean = false, ignoreOldDeletedNestedColumns: Boolean = true, updateCapturedColumnOnlyWhenChanged: Boolean = false, mergeModeEnable: Boolean = false, mergeModeAdditionalJoinPredicate: Option[String] = None, breakDataFrameLineage: Boolean = false, persist: Boolean = false, executionMode: Option[ExecutionMode] = None, executionCondition: Option[Condition] = None, metricsFailCondition: Option[String] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends SparkOneToOneActionImpl with Product with Serializable
Action to deduplicate a subfeed. Deduplication keeps the last record for every key, even after it has been deleted in the source. DeduplicateAction adds the additional column TechnicalTableColumn.captured, containing the timestamp of the last occurrence of the record in the source. This creates many updates; especially when using saveMode.Merge, it is better to set TechnicalTableColumn.captured to the last change of the record in the source instead. Use updateCapturedColumnOnlyWhenChanged = true to enable this optimization.
DeduplicateAction needs a transactional table (e.g. TransactionalSparkTableDataObject) as output, with primary keys defined. If the output implements CanMergeDataFrame, saveMode.Merge can be enabled by setting mergeModeEnable = true. This allows for much better performance. A construction sketch follows the parameter list.
- inputId
input DataObject
- outputId
output DataObject
- transformer
optional custom transformation to apply
- transformers
optional list of transformations to apply before deduplication. See sparktransformer for a list of included Transformers. The transformations are applied in the order of the list.
- columnBlacklist
Remove all columns on the blacklist from the DataFrame
- columnWhitelist
Keep only the columns on the whitelist in the DataFrame
- additionalColumns
optional tuples of [column name, Spark SQL expression] to be added as additional columns to the DataFrame. The Spark SQL expressions are evaluated against an instance of io.smartdatalake.util.misc.DefaultExpressionData.
- ignoreOldDeletedColumns
if true, columns which no longer exist are removed during schema evolution
- ignoreOldDeletedNestedColumns
if true, columns which no longer exist are removed from nested data types during schema evolution. Keeping deleted columns in complex data types has a performance impact, as all new data has to be converted by a complex function in the future.
- updateCapturedColumnOnlyWhenChanged
set to true to update the column TechnicalTableColumn.captured only when the record has changed in the source, instead of updating it with every execution (default = false). This results in far fewer updated records with saveMode.Merge.
- mergeModeEnable
Set to true to use saveMode.Merge for much better performance. Output DataObject must implement CanMergeDataFrame if enabled (default = false).
- mergeModeAdditionalJoinPredicate
To optimize performance, it can be beneficial to limit the records read from the existing table data, e.g. it might be sufficient to use only the last 7 days. Specify a condition to select the existing data to be used in the transformation as a Spark SQL expression. Use the table alias 'existing' to reference columns of the existing table data.
- executionMode
optional execution mode for this Action
- executionCondition
optional Spark SQL expression evaluated against SubFeedsExpressionData. If true, the Action is executed, otherwise it is skipped. See Condition for details.
- metricsFailCondition
optional Spark SQL expression evaluated as a where-clause against the DataFrame of metrics. Available columns are dataObjectId, key and value. If any rows pass the where-clause, a MetricCheckFailed exception is thrown.
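Example: a construction sketch in Scala (ids and import paths are assumptions; the referenced DataObjects must already be registered):
  import io.smartdatalake.config.InstanceRegistry
  import io.smartdatalake.config.SdlConfigObject.{ActionId, DataObjectId}
  import io.smartdatalake.workflow.action.DeduplicateAction

  implicit val registry: InstanceRegistry = new InstanceRegistry  // must contain the DataObjects below

  // The output must be a transactional table DataObject with a primary key defined;
  // mergeModeEnable additionally requires it to implement CanMergeDataFrame.
  val dedup = DeduplicateAction(
    id = ActionId("dedup-orders"),          // hypothetical action id
    inputId = DataObjectId("stg-orders"),   // hypothetical input DataObject
    outputId = DataObjectId("btl-orders"),  // hypothetical transactional table DataObject
    mergeModeEnable = true,                 // use saveMode.Merge for better performance
    updateCapturedColumnOnlyWhenChanged = true  // only touch the captured timestamp on real changes
  )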
- abstract class FileOneToOneActionImpl extends ActionSubFeedsImpl[FileSubFeed]
Implementation of logic needed to use FileSubFeeds with only one input and one output SubFeed.
- case class FileTransferAction(id: ActionId, inputId: DataObjectId, outputId: DataObjectId, overwrite: Boolean = true, breakFileRefLineage: Boolean = false, executionMode: Option[ExecutionMode] = None, executionCondition: Option[Condition] = None, metricsFailCondition: Option[String] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends FileOneToOneActionImpl with Product with Serializable
Action to transfer files between SFTP, Hadoop and the local filesystem. A construction sketch follows the parameter list.
- inputId
input DataObject
- outputId
output DataObject
- breakFileRefLineage
If set to true, file references passed on from the previous action are ignored by this action. The action will detect on its own which files it is going to process.
- executionMode
optional execution mode for this Action
- executionCondition
optional Spark SQL expression evaluated against SubFeedsExpressionData. If true, the Action is executed, otherwise it is skipped. See Condition for details.
- metricsFailCondition
optional Spark SQL expression evaluated as a where-clause against the DataFrame of metrics. Available columns are dataObjectId, key and value. If any rows pass the where-clause, a MetricCheckFailed exception is thrown.
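Example: a construction sketch in Scala (ids and import paths are assumptions; the referenced DataObjects must already be registered):
  import io.smartdatalake.config.InstanceRegistry
  import io.smartdatalake.config.SdlConfigObject.{ActionId, DataObjectId}
  import io.smartdatalake.workflow.action.FileTransferAction

  implicit val registry: InstanceRegistry = new InstanceRegistry  // must contain the DataObjects below

  val download = FileTransferAction(
    id = ActionId("download-orders"),       // hypothetical action id
    inputId = DataObjectId("sftp-orders"),  // hypothetical SFTP file DataObject
    outputId = DataObjectId("stg-orders"),  // hypothetical Hadoop file DataObject
    overwrite = true                        // replace existing files in the output
  )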
- case class HistorizeAction(id: ActionId, inputId: DataObjectId, outputId: DataObjectId, transformer: Option[CustomDfTransformerConfig] = None, transformers: Seq[ParsableDfTransformer] = Seq(), columnBlacklist: Option[Seq[String]] = None, columnWhitelist: Option[Seq[String]] = None, additionalColumns: Option[Map[String, String]] = None, standardizeDatatypes: Boolean = false, filterClause: Option[String] = None, historizeBlacklist: Option[Seq[String]] = None, historizeWhitelist: Option[Seq[String]] = None, ignoreOldDeletedColumns: Boolean = false, ignoreOldDeletedNestedColumns: Boolean = true, mergeModeEnable: Boolean = false, mergeModeAdditionalJoinPredicate: Option[String] = None, breakDataFrameLineage: Boolean = false, persist: Boolean = false, executionMode: Option[ExecutionMode] = None, executionCondition: Option[Condition] = None, metricsFailCondition: Option[String] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends SparkOneToOneActionImpl with Product with Serializable
Action to historize a subfeed. Historization creates a technical history of data by creating valid-from/valid-to columns. It needs a transactional table as output, with primary keys defined. A construction sketch follows the parameter list.
- inputId
input DataObject
- outputId
output DataObject
- transformer
optional custom transformation to apply
- transformers
optional list of transformations to apply before historization. See sparktransformer for a list of included Transformers. The transformations are applied in the order of the list.
- columnBlacklist
Remove all columns on the blacklist from the DataFrame
- columnWhitelist
Keep only the columns on the whitelist in the DataFrame
- additionalColumns
optional tuples of [column name, Spark SQL expression] to be added as additional columns to the DataFrame. The Spark SQL expressions are evaluated against an instance of DefaultExpressionData.
- filterClause
Filter of data to be processed by historization. It can be used to exclude historical data not needed to create new history, for performance reasons. Note that filterClause is only applied if mergeModeEnable=false. Use mergeModeAdditionalJoinPredicate if mergeModeEnable=true to achieve a similar performance tuning.
- historizeBlacklist
optional list of columns to ignore when comparing two records in historization. Cannot be used together with historizeWhitelist.
- historizeWhitelist
optional final list of columns to use when comparing two records in historization. Cannot be used together with historizeBlacklist.
- ignoreOldDeletedColumns
if true, columns which no longer exist are removed during schema evolution
- ignoreOldDeletedNestedColumns
if true, columns which no longer exist are removed from nested data types during schema evolution. Keeping deleted columns in complex data types has a performance impact, as all new data has to be converted by a complex function in the future.
- mergeModeEnable
Set to true to use saveMode.Merge for much better performance. Output DataObject must implement CanMergeDataFrame if enabled (default = false).
- mergeModeAdditionalJoinPredicate
To optimize performance, it can be beneficial to limit the records read from the existing table data, e.g. it might be sufficient to use only the last 7 days. Specify a condition to select the existing data to be used in the transformation as a Spark SQL expression. Use the table alias 'existing' to reference columns of the existing table data.
- executionMode
optional execution mode for this Action
- executionCondition
optional Spark SQL expression evaluated against SubFeedsExpressionData. If true, the Action is executed, otherwise it is skipped. See Condition for details.
- metricsFailCondition
optional Spark SQL expression evaluated as a where-clause against the DataFrame of metrics. Available columns are dataObjectId, key and value. If any rows pass the where-clause, a MetricCheckFailed exception is thrown.
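Example: a construction sketch in Scala (ids, import paths and the technical column name are assumptions; the referenced DataObjects must already be registered):
  import io.smartdatalake.config.InstanceRegistry
  import io.smartdatalake.config.SdlConfigObject.{ActionId, DataObjectId}
  import io.smartdatalake.workflow.action.HistorizeAction

  implicit val registry: InstanceRegistry = new InstanceRegistry  // must contain the DataObjects below

  // The output must be a transactional table with a primary key; technical columns
  // can be excluded from the change comparison via historizeBlacklist.
  val hist = HistorizeAction(
    id = ActionId("historize-customers"),              // hypothetical action id
    inputId = DataObjectId("int-customers"),           // hypothetical input DataObject
    outputId = DataObjectId("btl-customers-history"),  // hypothetical transactional table DataObject
    historizeBlacklist = Some(Seq("load_ts")),         // hypothetical technical column to ignore
    mergeModeEnable = true                             // use saveMode.Merge for better performance
  )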
- case class Metric(dataObjectId: String, key: Option[String], value: Option[String]) extends Product with Serializable
- case class NoDataToProcessWarning(actionId: NodeId, msg: String, results: Option[Seq[SubFeed]] = None) extends TaskSkippedDontStopWarning[SubFeed] with Product with Serializable
Execution modes can throw this exception to indicate that there is no data to process.
- results
SDL might add fake results to this exception to allow further execution of the DAG. When creating the exception, results should be set to None.
- Annotations
- @DeveloperApi()
- case class ResultRuntimeInfo(subFeed: SubFeed, mainMetrics: Map[String, Any]) extends Product with Serializable
- case class RuntimeEvent(tstmp: LocalDateTime, phase: ExecutionPhase, state: RuntimeEventState, msg: Option[String], results: Seq[SubFeed]) extends Product with Serializable
A structure to collect runtime event information.
- case class RuntimeInfo(executionId: ExecutionId = SDLExecutionId(-1, -1), state: RuntimeEventState, startTstmp: Option[LocalDateTime] = None, duration: Option[Duration] = None, msg: Option[String] = None, results: Seq[ResultRuntimeInfo] = Seq(), dataObjectsState: Seq[DataObjectState] = Seq()) extends Product with Serializable
Summarized runtime information.
- case class SDLExecutionId(runId: Int, attemptId: Int = 1) extends ExecutionId with Product with Serializable
Standard execution id for actions that are executed synchronously by SDL.
- abstract class ScriptActionImpl extends ActionSubFeedsImpl[ScriptSubFeed]
Implementation of logic needed for Script Actions.
- abstract class SparkOneToOneActionImpl extends SparkActionImpl
Implementation of logic needed to use SparkAction with only one input and one output SubFeed.
- case class SparkStreamingExecutionId(batchId: Long) extends ExecutionId with Product with Serializable
Execution id for Spark Streaming jobs. They need a different execution id, as they are executed asynchronously.
- case class SubFeedExpressionData(partitionValues: Seq[Map[String, String]], isDAGStart: Boolean, isSkipped: Boolean) extends Product with Serializable
- case class SubFeedsExpressionData(inputSubFeeds: Map[String, SubFeedExpressionData]) extends Product with Serializable
- case class WriteSubFeedResult(noData: Option[Boolean], metrics: Option[Map[String, Any]] = None) extends Product with Serializable
Return value of writing a SubFeed.
- noData
true if there was no data to write, otherwise false. If unknown, set to None.
- metrics
Depending on the engine, metrics are received by a listener (SparkSubFeed) or can be returned directly by filling this attribute (FileSubFeed).
Value Members
- object CopyAction extends FromConfigFactory[Action] with Serializable
- object CustomFileAction extends FromConfigFactory[Action] with Serializable
- object CustomScriptAction extends FromConfigFactory[Action] with Serializable
- object CustomSparkAction extends FromConfigFactory[Action] with Serializable
- object DeduplicateAction extends FromConfigFactory[Action] with Serializable
- object FileTransferAction extends FromConfigFactory[Action] with Serializable
- object HistorizeAction extends FromConfigFactory[Action] with Serializable
- object SDLExecutionId extends Serializable
- object SubFeedsExpressionData extends Serializable