case class HistorizeAction(id: ActionId, inputId: DataObjectId, outputId: DataObjectId, transformer: Option[CustomDfTransformerConfig] = None, transformers: Seq[ParsableDfTransformer] = Seq(), columnBlacklist: Option[Seq[String]] = None, columnWhitelist: Option[Seq[String]] = None, additionalColumns: Option[Map[String, String]] = None, standardizeDatatypes: Boolean = false, filterClause: Option[String] = None, historizeBlacklist: Option[Seq[String]] = None, historizeWhitelist: Option[Seq[String]] = None, ignoreOldDeletedColumns: Boolean = false, ignoreOldDeletedNestedColumns: Boolean = true, mergeModeEnable: Boolean = false, mergeModeAdditionalJoinPredicate: Option[String] = None, breakDataFrameLineage: Boolean = false, persist: Boolean = false, executionMode: Option[ExecutionMode] = None, executionCondition: Option[Condition] = None, metricsFailCondition: Option[String] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends SparkOneToOneActionImpl with Product with Serializable

Action to historize a subfeed. Historization creates a technical history of data by adding valid-from/valid-to columns. It requires a transactional table with defined primary keys as output.
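The valid-from/valid-to mechanics can be sketched with plain Scala collections. This is an illustrative simplification, not the actual Spark implementation; the column names and the open-ended high timestamp are assumptions.

```scala
// Illustrative sketch of historization semantics on plain Scala collections.
case class HistRecord(id: Int, value: String, validFrom: String, validTo: String)

val highTs = "9999-12-31" // assumed marker for "currently valid"

// Delimit changed records, keep unchanged ones, insert new ones.
def historize(existing: Seq[HistRecord], incoming: Map[Int, String], refTs: String): Seq[HistRecord] = {
  val (current, closed) = existing.partition(_.validTo == highTs)
  val updated = current.flatMap { rec =>
    incoming.get(rec.id) match {
      case Some(v) if v != rec.value =>
        // value changed: close the old version, open a new one at refTs
        Seq(rec.copy(validTo = refTs), HistRecord(rec.id, v, refTs, highTs))
      case _ => Seq(rec) // unchanged (deletion handling omitted for brevity)
    }
  }
  val newIds = incoming.keySet -- existing.map(_.id)
  closed ++ updated ++ newIds.toSeq.map(id => HistRecord(id, incoming(id), refTs, highTs))
}
```

A changed value thus produces two rows for the same key: the delimited old version and a new, currently valid version.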

inputId

input DataObject

outputId

output DataObject

transformer

optional custom transformation to apply

transformers

optional list of transformations to apply before historization. See sparktransformer for a list of included Transformers. The transformations are applied according to the list's ordering.

columnBlacklist

Remove all columns on blacklist from dataframe

columnWhitelist

Keep only columns on whitelist in dataframe

additionalColumns

optional tuples of [column name, Spark SQL expression] to be added as additional columns to the dataframe. The Spark SQL expressions are evaluated against an instance of DefaultExpressionData.
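For illustration, a hypothetical additionalColumns value; the expression strings are Spark SQL, and the DefaultExpressionData fields referenced are assumptions:

```scala
// Hypothetical additionalColumns value: column name -> Spark SQL expression.
// "runId" and "feed" are assumed fields of DefaultExpressionData;
// current_date() is a standard Spark SQL function.
val additionalColumns: Map[String, String] = Map(
  "dl_run_id"    -> "runId",
  "dl_feed"      -> "feed",
  "dl_load_date" -> "current_date()"
)
```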

filterClause

Filter for the data to be processed by historization. It can be used, for performance reasons, to exclude historical data that is not needed to create the new history. Note that filterClause is only applied if mergeModeEnable=false; use mergeModeAdditionalJoinPredicate if mergeModeEnable=true to achieve a similar performance tuning.

historizeBlacklist

optional list of columns to ignore when comparing two records in historization. Cannot be used together with historizeWhitelist.

historizeWhitelist

optional final list of columns to use when comparing two records in historization. Cannot be used together with historizeBlacklist.
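The effect of the two lists can be sketched in plain Scala. This is a simplification of the actual column-wise comparison; recordsDiffer is an illustrative helper, not part of the API:

```scala
// Sketch: only the selected columns decide whether two versions of a record
// differ. Whitelist and blacklist are mutually exclusive, as in the Action.
def recordsDiffer(oldRec: Map[String, Any], newRec: Map[String, Any],
                  whitelist: Option[Seq[String]] = None,
                  blacklist: Option[Seq[String]] = None): Boolean = {
  require(whitelist.isEmpty || blacklist.isEmpty, "cannot combine whitelist and blacklist")
  val cols = whitelist.getOrElse {
    val all = (oldRec.keySet ++ newRec.keySet).toSeq
    blacklist.map(bl => all.filterNot(bl.contains)).getOrElse(all)
  }
  cols.exists(c => oldRec.get(c) != newRec.get(c))
}
```

Blacklisting a technical timestamp column, for example, prevents every load from creating a new history version when only that timestamp changed.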

ignoreOldDeletedColumns

if true, removes columns that no longer exist during schema evolution

ignoreOldDeletedNestedColumns

if true, removes columns that no longer exist from nested data types during schema evolution. Keeping deleted columns in complex data types has a performance impact, as all future data has to be converted by a complex function.

mergeModeEnable

Set to true to use saveMode.Merge for much better performance. Output DataObject must implement CanMergeDataFrame if enabled (default = false).

mergeModeAdditionalJoinPredicate

To optimize performance it can be beneficial to limit the records read from the existing table data; e.g. it might be sufficient to consider only the last 7 days. Specify a Spark SQL expression to select the existing data to be used in the transformation. Use the table alias 'existing' to reference columns of the existing table data.

executionMode

optional execution mode for this Action

executionCondition

optional Spark SQL expression evaluated against SubFeedsExpressionData. If true, the Action is executed, otherwise skipped. For details see Condition.

metricsFailCondition

optional Spark SQL expression evaluated as a where-clause against the dataframe of metrics. Available columns are dataObjectId, key, value. If any rows pass the where-clause, a MetricCheckFailed exception is thrown.
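A sketch of this check in plain Scala; the where-clause encoded below is an assumed example, evaluated here without Spark:

```scala
// Metrics rows expose dataObjectId, key and value; if any row matches the
// configured where-clause, a MetricCheckFailed exception would be thrown.
case class MetricRow(dataObjectId: String, key: String, value: String)

// corresponds to the assumed clause:
//   "dataObjectId = 'tgt' and key = 'records_written' and value = '0'"
def metricsFail(rows: Seq[MetricRow]): Boolean =
  rows.exists(r => r.dataObjectId == "tgt" && r.key == "records_written" && r.value == "0")
```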

Annotations
@Scaladoc()
Linear Supertypes
Serializable, Serializable, Product, Equals, SparkOneToOneActionImpl, SparkActionImpl, ActionSubFeedsImpl[SparkSubFeed], Action, AtlasExportable, SmartDataLakeLogger, DAGNode, ParsableFromConfig[Action], SdlConfigObject, AnyRef, Any

Instance Constructors

  1. new HistorizeAction(id: ActionId, inputId: DataObjectId, outputId: DataObjectId, transformer: Option[CustomDfTransformerConfig] = None, transformers: Seq[ParsableDfTransformer] = Seq(), columnBlacklist: Option[Seq[String]] = None, columnWhitelist: Option[Seq[String]] = None, additionalColumns: Option[Map[String, String]] = None, standardizeDatatypes: Boolean = false, filterClause: Option[String] = None, historizeBlacklist: Option[Seq[String]] = None, historizeWhitelist: Option[Seq[String]] = None, ignoreOldDeletedColumns: Boolean = false, ignoreOldDeletedNestedColumns: Boolean = true, mergeModeEnable: Boolean = false, mergeModeAdditionalJoinPredicate: Option[String] = None, breakDataFrameLineage: Boolean = false, persist: Boolean = false, executionMode: Option[ExecutionMode] = None, executionCondition: Option[Condition] = None, metricsFailCondition: Option[String] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry)


Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def addRuntimeEvent(executionId: ExecutionId, phase: ExecutionPhase, state: RuntimeEventState, msg: Option[String] = None, results: Seq[SubFeed] = Seq(), tstmp: LocalDateTime = LocalDateTime.now): Unit

Adds a runtime event for this Action

    Definition Classes
    Action
    Annotations
    @Scaladoc()
  5. def addRuntimeMetrics(executionId: Option[ExecutionId], dataObjectId: Option[DataObjectId], metric: ActionMetrics): Unit

Adds a runtime metric for this Action

    Definition Classes
    Action
    Annotations
    @Scaladoc()
  6. def applyExecutionMode(mainInput: DataObject, mainOutput: DataObject, subFeed: SubFeed, partitionValuesTransform: (Seq[PartitionValues]) ⇒ Map[PartitionValues, PartitionValues])(implicit context: ActionPipelineContext): Unit

Applies the executionMode and stores result in executionModeResult variable

    Attributes
    protected
    Definition Classes
    Action
    Annotations
    @Scaladoc()
  7. def applyTransformers(transformers: Seq[DfTransformer], inputSubFeed: SparkSubFeed, outputSubFeed: SparkSubFeed)(implicit context: ActionPipelineContext): SparkSubFeed

apply transformer to SubFeed

    Attributes
    protected
    Definition Classes
    SparkOneToOneActionImpl
    Annotations
    @Scaladoc()
  8. def applyTransformers(transformers: Seq[PartitionValueTransformer], partitionValues: Seq[PartitionValues])(implicit context: ActionPipelineContext): Map[PartitionValues, PartitionValues]

apply transformer to partition values

    Attributes
    protected
    Definition Classes
    SparkActionImpl
    Annotations
    @Scaladoc()
  9. def applyTransformers(transformers: Seq[DfsTransformer], inputPartitionValues: Seq[PartitionValues], inputSubFeeds: Seq[SparkSubFeed], outputSubFeeds: Seq[SparkSubFeed])(implicit context: ActionPipelineContext): Seq[SparkSubFeed]

apply transformer to SubFeeds

    Attributes
    protected
    Definition Classes
    SparkActionImpl
    Annotations
    @Scaladoc()
  10. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  11. def atlasName: String
    Definition Classes
    Action → AtlasExportable
  12. def atlasQualifiedName(prefix: String): String
    Definition Classes
    AtlasExportable
  13. val breakDataFrameLineage: Boolean

Stop propagating input DataFrame through action and instead get a new DataFrame from DataObject. This can help to save memory and performance if the input DataFrame includes many transformations from previous Actions. The new DataFrame will be initialized according to the SubFeed's partitionValues.

    Definition Classes
    HistorizeAction → SparkActionImpl
  14. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native() @HotSpotIntrinsicCandidate()
  15. def createEmptyDataFrame(dataObject: DataObject with CanCreateDataFrame, subFeed: SparkSubFeed)(implicit context: ActionPipelineContext): DataFrame
    Definition Classes
    SparkActionImpl
  16. def enrichSubFeedDataFrame(input: DataObject with CanCreateDataFrame, subFeed: SparkSubFeed, phase: ExecutionPhase, isRecursive: Boolean = false)(implicit context: ActionPipelineContext): SparkSubFeed

Enriches SparkSubFeed with DataFrame if not existing

    input

    input data object.

    subFeed

    input SubFeed.

    phase

    current execution phase

    isRecursive

    true if this input is a recursive input

    Definition Classes
    SparkActionImpl
    Annotations
    @Scaladoc()
  17. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  18. final def exec(subFeeds: Seq[SubFeed])(implicit context: ActionPipelineContext): Seq[SubFeed]

Executes the main task of an action. In this step the data of the SubFeeds is moved from Input- to Output-DataObjects.

    subFeeds

    SparkSubFeed's to be processed

    returns

    processed SparkSubFeed's

    Definition Classes
    ActionSubFeedsImpl → Action
  19. val executionCondition: Option[Condition]

execution condition for this action.

    Definition Classes
    HistorizeAction → Action
  20. val executionConditionResult: Option[(Boolean, Option[String])]
    Attributes
    protected
    Definition Classes
    Action
  21. val executionMode: Option[ExecutionMode]

execution mode for this action.

    Definition Classes
    HistorizeAction → Action
  22. val executionModeResult: Option[Try[Option[ExecutionModeResult]]]
    Attributes
    protected
    Definition Classes
    Action
  23. def factory: FromConfigFactory[Action]

Returns the factory that can parse this type (that is, type CO).

    Typically, implementations of this method should return the companion object of the implementing class. The companion object in turn should implement FromConfigFactory.

    returns

    the factory (object) for this class.

    Definition Classes
    HistorizeAction → ParsableFromConfig
  24. def filterDataFrame(df: DataFrame, partitionValues: Seq[PartitionValues], genericFilter: Option[Column]): DataFrame

Filter DataFrame with given partition values

    df

    DataFrame to filter

    partitionValues

    partition values to use as filter condition

    genericFilter

    filter expression to apply

    returns

    filtered DataFrame

    Definition Classes
    SparkActionImpl
    Annotations
    @Scaladoc()
  25. def fullHistorizeDataFrame(existingDf: Option[DataFrame], pks: Seq[String], refTimestamp: LocalDateTime)(newDf: DataFrame)(implicit context: ActionPipelineContext): DataFrame
    Attributes
    protected
  26. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  27. def getDataObjectsState: Seq[DataObjectState]

Get potential state of input DataObjects when executionMode is DataObjectStateIncrementalMode.

    Definition Classes
    Action
    Annotations
    @Scaladoc()
  28. def getInputDataObject[T <: DataObject](id: DataObjectId)(implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], registry: InstanceRegistry): T
    Attributes
    protected
    Definition Classes
    Action
  29. def getLatestRuntimeEventState: Option[RuntimeEventState]

Get latest runtime state

    Definition Classes
    Action
    Annotations
    @Scaladoc()
  30. def getMainInput(inputSubFeeds: Seq[SubFeed])(implicit context: ActionPipelineContext): DataObject
    Attributes
    protected
    Definition Classes
    ActionSubFeedsImpl
  31. def getMainPartitionValues(inputSubFeeds: Seq[SubFeed])(implicit context: ActionPipelineContext): Seq[PartitionValues]
    Attributes
    protected
    Definition Classes
    ActionSubFeedsImpl
  32. def getOutputDataObject[T <: DataObject](id: DataObjectId)(implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], registry: InstanceRegistry): T
    Attributes
    protected
    Definition Classes
    Action
  33. def getRuntimeDataImpl: RuntimeData
    Definition Classes
    SparkActionImpl → Action
  34. def getRuntimeInfo(executionId: Option[ExecutionId] = None): Option[RuntimeInfo]

Get summarized runtime information for a given ExecutionId.

    executionId

ExecutionId to get runtime information for. If empty, runtime information for the last ExecutionId is returned.

    Definition Classes
    Action
    Annotations
    @Scaladoc()
  35. def getRuntimeMetrics(executionId: Option[ExecutionId] = None): Map[DataObjectId, Option[ActionMetrics]]

Get the latest metrics for all DataObjects and a given SDLExecutionId.

    executionId

ExecutionId to get metrics for. If empty, metrics for the last ExecutionId are returned.

    Definition Classes
    Action
    Annotations
    @Scaladoc()
  36. def getTransformers(transformation: Option[CustomDfTransformerConfig], columnBlacklist: Option[Seq[String]], columnWhitelist: Option[Seq[String]], additionalColumns: Option[Map[String, String]], standardizeDatatypes: Boolean, additionalTransformers: Seq[DfTransformer], filterClauseExpr: Option[Column] = None)(implicit context: ActionPipelineContext): Seq[DfTransformer]

Combines all transformations into a list of DfTransformers

    Definition Classes
    SparkOneToOneActionImpl
    Annotations
    @Scaladoc()
  37. val id: ActionId

A unique identifier for this instance.

    Definition Classes
    HistorizeAction → Action → SdlConfigObject
  38. val ignoreOldDeletedColumns: Boolean
  39. val ignoreOldDeletedNestedColumns: Boolean
  40. def incrementalHistorizeDataFrame(existingDf: Option[DataFrame], pks: Seq[String], refTimestamp: LocalDateTime)(newDf: DataFrame)(implicit context: ActionPipelineContext): DataFrame
    Attributes
    protected
  41. final def init(subFeeds: Seq[SubFeed])(implicit context: ActionPipelineContext): Seq[SubFeed]

Initialize Action with SubFeeds to be processed. In this step the execution mode is evaluated and the result stored for the exec phase. If successful, the DAG can be built and the Spark DataFrame lineage can be established.

    subFeeds

    SparkSubFeed's to be processed

    returns

    processed SparkSubFeed's

    Definition Classes
    ActionSubFeedsImpl → Action
  42. val input: DataObject with CanCreateDataFrame

Input DataObject implementing CanCreateDataFrame

    Definition Classes
HistorizeAction → SparkOneToOneActionImpl
  43. val inputId: DataObjectId
  44. def inputIdsToIgnoreFilter: Seq[DataObjectId]
    Definition Classes
    ActionSubFeedsImpl
  45. val inputs: Seq[DataObject with CanCreateDataFrame]

Input DataObjects. To be implemented by subclasses.

    Definition Classes
    HistorizeAction → SparkActionImpl → Action
  46. def isAsynchronous: Boolean

If this Action should be run as asynchronous streaming process

    Definition Classes
    SparkActionImpl → Action
  47. def isAsynchronousProcessStarted: Boolean
    Definition Classes
    SparkActionImpl → Action
  48. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  49. def logWritingFinished(subFeed: SparkSubFeed, noData: Option[Boolean], duration: Duration)(implicit context: ActionPipelineContext): Unit
    Attributes
    protected
    Definition Classes
    ActionSubFeedsImpl
  50. def logWritingStarted(subFeed: SparkSubFeed)(implicit context: ActionPipelineContext): Unit
    Attributes
    protected
    Definition Classes
    ActionSubFeedsImpl
  51. lazy val logger: Logger
    Attributes
    protected
    Definition Classes
    SmartDataLakeLogger
    Annotations
    @transient()
  52. def mainInputId: Option[DataObjectId]
    Definition Classes
    ActionSubFeedsImpl
  53. lazy val mainOutput: DataObject
    Attributes
    protected
    Definition Classes
    ActionSubFeedsImpl
  54. def mainOutputId: Option[DataObjectId]
    Definition Classes
    ActionSubFeedsImpl
  55. val mergeModeAdditionalJoinPredicate: Option[String]
  56. val mergeModeEnable: Boolean
  57. val metadata: Option[ActionMetadata]

Additional metadata for the Action

    Definition Classes
    HistorizeAction → Action
  58. val metricsFailCondition: Option[String]

Spark SQL condition evaluated as where-clause against dataframe of metrics. Available columns are dataObjectId, key, value. If there are any rows passing the where clause, a MetricCheckFailed exception is thrown.

    Definition Classes
    HistorizeAction → Action
  59. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  60. def nodeId: String

provide an implementation of the DAG node id

    Definition Classes
    Action → DAGNode
    Annotations
    @Scaladoc()
  61. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  62. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  63. val output: TransactionalSparkTableDataObject

Output DataObject implementing CanWriteDataFrame

    Definition Classes
HistorizeAction → SparkOneToOneActionImpl
  64. val outputId: DataObjectId
  65. val outputs: Seq[TransactionalSparkTableDataObject]

Output DataObjects. To be implemented by subclasses.

    Definition Classes
    HistorizeAction → SparkActionImpl → Action
  66. val persist: Boolean

Force persisting input DataFrames on disk. This improves performance if a DataFrame is used multiple times in the transformation and can serve as a recovery point in case a task gets lost. Note that DataFrames are persisted automatically by the previous Action if later Actions need the same data. To avoid this behaviour set breakDataFrameLineage=false.

    Definition Classes
    HistorizeAction → SparkActionImpl
  67. final def postExec(inputSubFeeds: Seq[SubFeed], outputSubFeeds: Seq[SubFeed])(implicit context: ActionPipelineContext): Unit

Executes operations needed after executing an action. In this step any task on Input- or Output-DataObjects needed after the main task is executed, e.g. JdbcTableDataObject's postWriteSql or CopyAction's deleteInputData.

    Definition Classes
    SparkOneToOneActionImpl → SparkActionImpl → ActionSubFeedsImpl → Action
  68. def postExecFailed(implicit context: ActionPipelineContext): Unit

Executes operations needed to clean up after an action execution failed.

    Definition Classes
    SparkActionImpl → Action
  69. def postExecSubFeed(inputSubFeed: SubFeed, outputSubFeed: SubFeed)(implicit context: ActionPipelineContext): Unit

Executes operations needed after executing an action for the SubFeed. Can be implemented by subclasses.

    Definition Classes
    SparkOneToOneActionImpl
    Annotations
    @Scaladoc()
  70. def postprocessOutputSubFeedCustomized(subFeed: SparkSubFeed)(implicit context: ActionPipelineContext): SparkSubFeed

Implement additional processing logic for SubFeeds after transformation. Can be implemented by subclass.

    Definition Classes
    SparkActionImpl → ActionSubFeedsImpl
  71. def postprocessOutputSubFeeds(subFeeds: Seq[SparkSubFeed])(implicit context: ActionPipelineContext): Seq[SparkSubFeed]
    Definition Classes
    ActionSubFeedsImpl
  72. def preExec(subFeeds: Seq[SubFeed])(implicit context: ActionPipelineContext): Unit

Executes operations needed before executing an action. In this step any phase on Input- or Output-DataObjects needed before the main task is executed, e.g. JdbcTableDataObject's preWriteSql.

    Definition Classes
    SparkActionImpl → Action
  73. def preInit(subFeeds: Seq[SubFeed], dataObjectsState: Seq[DataObjectState])(implicit context: ActionPipelineContext): Unit

Checks before initialization of Action. In this step the execution condition is evaluated and Action init is skipped if the result is false.

    Definition Classes
    Action
    Annotations
    @Scaladoc()
  74. def prepare(implicit context: ActionPipelineContext): Unit

Prepare DataObjects prerequisites. In this step preconditions are prepared and tested: connections can be created; needed structures exist, e.g. Kafka topic or JDBC table.

    This runs during the "prepare" phase of the DAG.

    Definition Classes
    ActionSubFeedsImpl → Action
  75. def prepareInputSubFeed(input: DataObject with CanCreateDataFrame, subFeed: SparkSubFeed, ignoreFilters: Boolean = false)(implicit context: ActionPipelineContext): SparkSubFeed

Applies changes to a SubFeed from a previous action in order to be used as input for this action's transformation.

    Definition Classes
    SparkActionImpl
    Annotations
    @Scaladoc()
  76. def prepareInputSubFeeds(subFeeds: Seq[SubFeed])(implicit context: ActionPipelineContext): (Seq[SparkSubFeed], Seq[SparkSubFeed])
    Definition Classes
    ActionSubFeedsImpl
  77. def preprocessInputSubFeedCustomized(subFeed: SparkSubFeed, ignoreFilters: Boolean, isRecursive: Boolean)(implicit context: ActionPipelineContext): SparkSubFeed

Implement additional preprocess logic for SubFeeds before transformation. Can be implemented by subclass.

    isRecursive

    If subfeed is recursive (input & output)

    Attributes
    protected
    Definition Classes
    SparkActionImpl → ActionSubFeedsImpl
  78. lazy val prioritizedMainInputCandidates: Seq[DataObject]
    Attributes
    protected
    Definition Classes
    ActionSubFeedsImpl
  79. val recursiveInputs: Seq[TransactionalSparkTableDataObject]

Recursive Inputs are DataObjects that are used as Output and Input in the same action. This is usually prohibited as it creates loops in the DAG. In special cases this makes sense, i.e. when building a complex comparison/update logic.

    Usage: add DataObjects used as Output and Input as outputIds and recursiveInputIds, but not as inputIds.

    Definition Classes
    HistorizeAction → SparkActionImpl → Action
  80. def saveModeOptions: Option[SaveModeOptions]

Override and parametrize saveMode in output DataObject configurations when writing to DataObjects.

    Definition Classes
    HistorizeAction → SparkActionImpl
  81. def setSparkJobMetadata(operation: Option[String] = None)(implicit context: ActionPipelineContext): Unit

Sets the util job description for better traceability in the Spark UI

    Note: This sets Spark local properties, which are propagated to the respective executor tasks. We rely on this to match metrics back to Actions and DataObjects. As writing to a DataObject on the Driver happens uninterrupted in the same exclusive thread, this is suitable.

    operation

    phase description (be short...)

    Definition Classes
    Action
    Annotations
    @Scaladoc()
  82. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  83. final def toString(executionId: Option[ExecutionId]): String
    Definition Classes
    Action
  84. final def toString(): String

This is displayed in the ASCII graph visualization

    Definition Classes
    Action → AnyRef → Any
    Annotations
    @Scaladoc()
  85. def toStringMedium: String
    Definition Classes
    Action
  86. def toStringShort: String
    Definition Classes
    Action
  87. def transform(inputSubFeed: SparkSubFeed, outputSubFeed: SparkSubFeed)(implicit context: ActionPipelineContext): SparkSubFeed

Transform a SparkSubFeed. To be implemented by subclasses.

    inputSubFeed

    SparkSubFeed to be transformed

    outputSubFeed

    SparkSubFeed to be enriched with transformed result

    returns

    transformed output SparkSubFeed

    Definition Classes
HistorizeAction → SparkOneToOneActionImpl
  88. final def transform(inputSubFeeds: Seq[SparkSubFeed], outputSubFeeds: Seq[SparkSubFeed])(implicit context: ActionPipelineContext): Seq[SparkSubFeed]

Transform subfeed content. To be implemented by subclass.

    Definition Classes
SparkOneToOneActionImpl → ActionSubFeedsImpl
  89. def transformPartitionValues(partitionValues: Seq[PartitionValues])(implicit context: ActionPipelineContext): Map[PartitionValues, PartitionValues]

Transform partition values. Can be implemented by subclass.

    Definition Classes
HistorizeAction → ActionSubFeedsImpl
  90. val transformers: Seq[ParsableDfTransformer]
  91. def validateAndUpdateSubFeedCustomized(output: DataObject, subFeed: SparkSubFeed)(implicit context: ActionPipelineContext): SparkSubFeed

The transformed DataFrame is validated to have the output's partition columns included; partition columns are moved to the end and the SubFeed's partition values updated.

    output

    output DataObject

    subFeed

    SubFeed with transformed DataFrame

    returns

    validated and updated SubFeed

    Definition Classes
    SparkActionImpl
    Annotations
    @Scaladoc()
  92. def validateConfig(): Unit

put configuration validation checks here

    Definition Classes
    ActionSubFeedsImpl → Action
    Annotations
    @Scaladoc()
  93. def validateDataFrameContainsCols(df: DataFrame, columns: Seq[String], debugName: String): Unit

Validate that DataFrame contains a given list of columns, throwing an exception otherwise.

    df

    DataFrame to validate

    columns

    Columns that must exist in DataFrame

    debugName

    name to mention in exception

    Definition Classes
    SparkActionImpl
    Annotations
    @Scaladoc()
  94. def validatePartitionValuesExisting(dataObject: DataObject with CanHandlePartitions, subFeed: SubFeed)(implicit context: ActionPipelineContext): Unit
    Attributes
    protected
    Definition Classes
    ActionSubFeedsImpl
  95. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  96. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  97. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  98. def writeOutputSubFeeds(subFeeds: Seq[SparkSubFeed])(implicit context: ActionPipelineContext): Unit
    Definition Classes
    ActionSubFeedsImpl
  99. def writeSubFeed(subFeed: SparkSubFeed, output: DataObject with CanWriteDataFrame, isRecursiveInput: Boolean = false)(implicit context: ActionPipelineContext): Option[Boolean]

writes subfeed to output respecting given execution mode

    returns

    true if no data was transferred, otherwise false. None if unknown.

    Definition Classes
    SparkActionImpl
    Annotations
    @Scaladoc()
  100. def writeSubFeed(subFeed: SparkSubFeed, isRecursive: Boolean)(implicit context: ActionPipelineContext): WriteSubFeedResult

Write subfeed data to output. To be implemented by subclass.

    isRecursive

    If subfeed is recursive (input & output)

    returns

    false if there was no data to process, otherwise true.

    Attributes
    protected
    Definition Classes
    SparkActionImpl → ActionSubFeedsImpl

Deprecated Value Members

  1. val additionalColumns: Option[Map[String, String]]
    Annotations
    @deprecated
    Deprecated

    (Since version 2.0.5) Use transformers instead.

  2. val columnBlacklist: Option[Seq[String]]
    Annotations
    @deprecated
    Deprecated

    (Since version 2.0.5) Use transformers instead.

  3. val columnWhitelist: Option[Seq[String]]
    Annotations
    @deprecated
    Deprecated

    (Since version 2.0.5) Use transformers instead.

  4. val filterClause: Option[String]
    Annotations
    @deprecated
    Deprecated

    (Since version 2.0.5) Use transformers instead.

  5. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] ) @Deprecated
    Deprecated
  6. val historizeBlacklist: Option[Seq[String]]
    Annotations
    @deprecated
    Deprecated

    (Since version 2.0.5) Use transformers instead.

  7. val historizeWhitelist: Option[Seq[String]]
    Annotations
    @deprecated
    Deprecated

    (Since version 2.0.5) Use transformers instead.

  8. val standardizeDatatypes: Boolean
    Annotations
    @deprecated
    Deprecated

    (Since version 2.0.5) Use transformers instead.

  9. val transformer: Option[CustomDfTransformerConfig]
    Annotations
    @deprecated
    Deprecated

    (Since version 2.0.5) Use transformers instead.
