package workflow


Type Members

  1. case class ActionDAGRunState(appConfig: SmartDataLakeBuilderConfig, runId: Int, attemptId: Int, runStartTime: LocalDateTime, attemptStartTime: LocalDateTime, actionsState: Map[ActionId, RuntimeInfo], isFinal: Boolean) extends Product with Serializable

    ActionDAGRunState contains all configuration and state of an ActionDAGRun needed to start a recovery run in case of failure.

    Annotations
    @Scaladoc()
  2. case class ActionPipelineContext(feed: String, application: String, executionId: SDLExecutionId, instanceRegistry: InstanceRegistry, referenceTimestamp: Option[LocalDateTime] = None, appConfig: SmartDataLakeBuilderConfig, runStartTime: LocalDateTime = LocalDateTime.now(), attemptStartTime: LocalDateTime = LocalDateTime.now(), simulation: Boolean = false, phase: ExecutionPhase = ExecutionPhase.Prepare, dataFrameReuseStatistics: Map[(DataObjectId, Seq[PartitionValues]), Seq[ActionId]] = mutable.Map(), actionsSelected: Seq[ActionId] = Seq(), actionsSkipped: Seq[ActionId] = Seq(), serializableHadoopConf: SerializableHadoopConfiguration, globalConfig: GlobalConfig) extends SmartDataLakeLogger with Product with Serializable

    ActionPipelineContext contains start and runtime information about a SmartDataLake run.

    feed: feed selector of the run
    application: application name of the run
    executionId: SDLExecutionId of this run; contains runId and attemptId. Both stay 1 if state is not enabled.
    instanceRegistry: registry of all SmartDataLake objects parsed from the config
    referenceTimestamp: timestamp used as reference in certain actions (e.g. HistorizeAction)
    appConfig: the command line parameters parsed into a SmartDataLakeBuilderConfig object
    runStartTime: start time of the run
    attemptStartTime: start time of the attempt
    simulation: true if this is a simulation run
    phase: current execution phase
    dataFrameReuseStatistics: counts how many times the DataFrame of a SparkSubFeed is reused by an Action later in the pipeline. The counter is increased during ExecutionPhase.Init when preparing the SubFeeds for an Action, and decreased in ExecutionPhase.Exec so the DataFrame can be unpersisted once it is no longer needed.
    actionsSelected: actions selected for execution by the command line parameter --feed-sel
    actionsSkipped: actions selected but skipped in the current attempt because they already succeeded in a previous attempt

    Annotations
    @Scaladoc() @DeveloperApi()
  3. trait AtlasExportable extends AnyRef
  4. case class DataObjectState(dataObjectId: DataObjectId, state: String) extends Product with Serializable
  5. case class FileRefMapping(src: FileRef, tgt: FileRef) extends Product with Serializable

    Src/Tgt tuple representing the mapping of a file reference

    Annotations
    @Scaladoc()
  6. case class FileSubFeed(fileRefs: Option[Seq[FileRef]], dataObjectId: DataObjectId, partitionValues: Seq[PartitionValues], isDAGStart: Boolean = false, isSkipped: Boolean = false, fileRefMapping: Option[Seq[FileRefMapping]] = None) extends SubFeed with Product with Serializable

    A FileSubFeed is used to transport references to files between Actions.

    fileRefs: paths of the files to be processed
    dataObjectId: id of the DataObject this SubFeed corresponds to
    partitionValues: values of the partitions transported by this SubFeed
    isDAGStart: true if this SubFeed is a start node of the DAG
    isSkipped: true if this SubFeed is the result of a skipped action
    fileRefMapping: mapping of input to output file references, also used for post-processing (e.g. delete after read)

    Annotations
    @Scaladoc()
  7. case class HadoopFileStateId(path: Path, appName: String, runId: Int, attemptId: Int) extends StateId with Product with Serializable
  8. case class InitSubFeed(dataObjectId: DataObjectId, partitionValues: Seq[PartitionValues], isSkipped: Boolean = false) extends SubFeed with Product with Serializable

    An InitSubFeed is used to initialize the first nodes of a DAG.

    dataObjectId: id of the DataObject this SubFeed corresponds to
    partitionValues: values of the partitions transported by this SubFeed
    isSkipped: true if this SubFeed is the result of a skipped action

    Annotations
    @Scaladoc()
  9. class PrimaryKeyConstraintViolationException extends RuntimeException
  10. class ProcessingLogicException extends RuntimeException

    Exception to signal that a configured pipeline can't be executed properly

    Annotations
    @Scaladoc()
  11. case class ScriptSubFeed(parameters: Option[Map[String, String]] = None, dataObjectId: DataObjectId, partitionValues: Seq[PartitionValues], isDAGStart: Boolean = false, isSkipped: Boolean = false) extends SubFeed with Product with Serializable

    A ScriptSubFeed is used to notify DataObjects and subsequent Actions about the completion of a script. It allows passing on arbitrary information as key/value pairs.

    parameters: arbitrary information to pass on as key/value pairs
    dataObjectId: id of the DataObject this SubFeed corresponds to
    partitionValues: values of the partitions transported by this SubFeed
    isDAGStart: true if this SubFeed is a start node of the DAG
    isSkipped: true if this SubFeed is the result of a skipped action

    Annotations
    @Scaladoc()
  12. class SimplifiedAnalysisException extends Exception with Serializable

    AnalysisException with reduced logical plan output: the logical plan is truncated to at most 5 lines.

    Annotations
    @Scaladoc()
  13. case class SparkSubFeed(dataFrame: Option[DataFrame], dataObjectId: DataObjectId, partitionValues: Seq[PartitionValues], isDAGStart: Boolean = false, isSkipped: Boolean = false, isDummy: Boolean = false, filter: Option[String] = None) extends SubFeed with Product with Serializable

    A SparkSubFeed is used to transport DataFrames between Actions.

    dataFrame: Spark DataFrame to be processed. The DataFrame should not be saved to state (@transient).
    dataObjectId: id of the DataObject this SubFeed corresponds to
    partitionValues: values of the partitions transported by this SubFeed
    isDAGStart: true if this SubFeed is a start node of the DAG
    isSkipped: true if this SubFeed is the result of a skipped action
    isDummy: true if this SubFeed only contains a dummy DataFrame. Dummy DataFrames can be used to validate the lineage in the init phase, but not in the exec phase.
    filter: a Spark SQL filter expression; used by SparkIncrementalMode

    Annotations
    @Scaladoc()
  14. trait SubFeed extends DAGResult with SmartDataLakeLogger

    A SubFeed transports references to data between Actions. The data can be represented by different technologies like files or DataFrames.

    Note: SubFeed implements ParsableFromConfig to persist to

    Annotations
    @Scaladoc()
  15. trait SubFeedConverter[S <: SubFeed] extends AnyRef

    An interface to be implemented by SubFeed companion objects for subfeed conversion

    Annotations
    @Scaladoc()
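The recovery behaviour described for ActionDAGRunState (item 1) can be sketched as follows. This is a hypothetical, reduced stand-in, not the real SDL class: RuntimeInfo is shrunk to a bare state string, ids are plain strings, and the method name actionsToSkipOnRecovery is invented for illustration.

```scala
// Simplified stand-ins for the SDL classes above (fields reduced, names hypothetical).
case class RuntimeInfo(state: String) // e.g. "SUCCEEDED", "FAILED"
case class ActionDAGRunState(runId: Int, attemptId: Int,
                             actionsState: Map[String, RuntimeInfo], isFinal: Boolean) {
  // A recovery run can skip the actions that already succeeded in the failed attempt.
  def actionsToSkipOnRecovery: Seq[String] =
    if (isFinal) Seq.empty // a final state needs no recovery
    else actionsState.collect { case (id, info) if info.state == "SUCCEEDED" => id }.toSeq
}
```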
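The dataFrameReuseStatistics bookkeeping in ActionPipelineContext (item 2) can be sketched like this. All names and types here are simplified assumptions: the key identifies a DataFrame (DataObject id plus partition values, both reduced to strings), and the value lists the Actions still waiting to reuse it.

```scala
import scala.collection.mutable

object ReuseStats {
  val reuse: mutable.Map[(String, Seq[String]), Seq[String]] = mutable.Map.empty

  // ExecutionPhase.Init: register a later consumer of the DataFrame.
  def registerReuse(dataObject: String, partitions: Seq[String], action: String): Unit = {
    val key = (dataObject, partitions)
    reuse(key) = reuse.getOrElse(key, Seq.empty) :+ action
  }

  // ExecutionPhase.Exec: remove one consumer; returns true when no consumers
  // remain and the DataFrame could be unpersisted.
  def consumeReuse(dataObject: String, partitions: Seq[String], action: String): Boolean = {
    val key = (dataObject, partitions)
    val remaining = reuse.getOrElse(key, Seq.empty).filterNot(_ == action)
    reuse(key) = remaining
    remaining.isEmpty
  }
}
```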
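The post-processing use of FileRefMapping mentioned for FileSubFeed (item 6) can be illustrated with a minimal sketch; FileRef is reduced to a bare path here (the real class carries more information), and deleteAfterRead is a hypothetical helper, not an SDL API.

```scala
// Reduced stand-ins: a src/tgt pair kept by an Action after copying files.
case class FileRef(fullPath: String)
case class FileRefMapping(src: FileRef, tgt: FileRef)

// With the mapping retained, the inputs can be deleted after a successful read.
def deleteAfterRead(mappings: Seq[FileRefMapping], delete: String => Unit): Unit =
  mappings.foreach(m => delete(m.src.fullPath))
```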

Value Members

  1. object ExecutionPhase extends Enumeration
  2. object FileSubFeed extends SubFeedConverter[FileSubFeed] with Serializable
  3. object ScriptSubFeed extends SubFeedConverter[ScriptSubFeed] with Serializable
  4. object SparkSubFeed extends SubFeedConverter[SparkSubFeed] with Serializable
  5. object SubFeed
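The SubFeedConverter pattern implemented by the companion objects above (items 2 to 4) can be sketched as follows: each companion converts a generic SubFeed into its own subtype, keeping the common fields and dropping type-specific payloads. All types here are simplified stand-ins with reduced fields, not the real SDL signatures.

```scala
trait SubFeed { def dataObjectId: String; def isSkipped: Boolean }
case class InitSubFeed(dataObjectId: String, isSkipped: Boolean = false) extends SubFeed
case class ScriptSubFeed(parameters: Option[Map[String, String]] = None,
                         dataObjectId: String = "", isSkipped: Boolean = false) extends SubFeed

trait SubFeedConverter[S <: SubFeed] { def fromSubFeed(subFeed: SubFeed): S }

object ScriptSubFeed extends SubFeedConverter[ScriptSubFeed] {
  override def fromSubFeed(subFeed: SubFeed): ScriptSubFeed = subFeed match {
    case s: ScriptSubFeed => s // already the right type: keep its payload
    case other => ScriptSubFeed(None, other.dataObjectId, other.isSkipped) // keep common fields only
  }
}
```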
