package workflow
Type Members
-
case class
ActionDAGRunState(appConfig: SmartDataLakeBuilderConfig, runId: Int, attemptId: Int, runStartTime: LocalDateTime, attemptStartTime: LocalDateTime, actionsState: Map[ActionId, RuntimeInfo], isFinal: Boolean) extends Product with Serializable
ActionDAGRunState contains all configuration and state of an ActionDAGRun needed to start a recovery run in case of failure.
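The recovery mechanism can be illustrated with a simplified stand-in for the run state. This is a sketch only: the real ActionDAGRunState carries the full SmartDataLakeBuilderConfig and a Map of per-action RuntimeInfo, while the types below are reduced for illustration.

```scala
// Simplified stand-in for ActionDAGRunState: field names mirror the real class,
// but types are reduced for illustration.
case class SimpleRunState(runId: Int, attemptId: Int, succeededActions: Set[String], isFinal: Boolean) {
  // A recovery run keeps the same runId and increments the attemptId;
  // actions that already succeeded can be skipped in the new attempt.
  def nextAttempt: SimpleRunState = copy(attemptId = attemptId + 1)
  // A state that is not final belongs to a run that did not complete.
  def needsRecovery: Boolean = !isFinal
}

val failed = SimpleRunState(runId = 1, attemptId = 1, succeededActions = Set("copy-a"), isFinal = false)
val recovery = failed.nextAttempt
```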
-
case class
ActionPipelineContext(feed: String, application: String, executionId: SDLExecutionId, instanceRegistry: InstanceRegistry, referenceTimestamp: Option[LocalDateTime] = None, appConfig: SmartDataLakeBuilderConfig, runStartTime: LocalDateTime = LocalDateTime.now(), attemptStartTime: LocalDateTime = LocalDateTime.now(), simulation: Boolean = false, phase: ExecutionPhase = ExecutionPhase.Prepare, dataFrameReuseStatistics: Map[(DataObjectId, Seq[PartitionValues]), Seq[ActionId]] = mutable.Map(), actionsSelected: Seq[ActionId] = Seq(), actionsSkipped: Seq[ActionId] = Seq(), serializableHadoopConf: SerializableHadoopConfiguration, globalConfig: GlobalConfig) extends SmartDataLakeLogger with Product with Serializable
ActionPipelineContext contains start and runtime information about a SmartDataLake run.
- feed
feed selector of the run
- application
application name of the run
- executionId
SDLExecutionId of this run. Contains runId and attemptId. Both stay 1 if state is not enabled.
- instanceRegistry
registry of all SmartDataLake objects parsed from the config
- referenceTimestamp
timestamp used as reference in certain actions (e.g. HistorizeAction)
- appConfig
the command line parameters parsed into a SmartDataLakeBuilderConfig object
- runStartTime
start time of the run
- attemptStartTime
start time of attempt
- simulation
true if this is a simulation run
- phase
current execution phase
- dataFrameReuseStatistics
Counter how many times a DataFrame of a SparkSubFeed is reused by an Action later in the pipeline. The counter is increased during ExecutionPhase.Init when preparing the SubFeeds for an Action and it is decreased in ExecutionPhase.Exec to unpersist the DataFrame after there is no need for it anymore.
- actionsSelected
actions selected for execution by command line parameter --feed-sel
- actionsSkipped
actions selected but skipped in current attempt because they already succeeded in a previous attempt.
- Annotations
- @DeveloperApi()
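The dataFrameReuseStatistics bookkeeping described above can be sketched with a plain mutable map. This is a simplified illustration of the increment-in-Init / decrement-in-Exec pattern, not the actual SDL implementation; the key and helper names are assumptions.

```scala
import scala.collection.mutable

// Key: (dataObjectId, partition values); value: number of downstream actions
// still expected to reuse the cached DataFrame.
val reuse = mutable.Map[(String, Seq[String]), Int]().withDefaultValue(0)

// Init phase: every consuming action registers its interest in the DataFrame.
def registerReuse(key: (String, Seq[String])): Unit = reuse(key) += 1

// Exec phase: each consumer releases its reference; when the counter reaches
// zero, the DataFrame is no longer needed and can be unpersisted.
def releaseReuse(key: (String, Seq[String])): Boolean = {
  reuse(key) -= 1
  reuse(key) <= 0 // true => safe to unpersist
}

val key = ("tableA", Seq("dt=2024-01-01"))
registerReuse(key); registerReuse(key)
val afterFirst  = releaseReuse(key) // one consumer still pending
val afterSecond = releaseReuse(key) // last consumer done
```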
- trait AtlasExportable extends AnyRef
- case class DataObjectState(dataObjectId: DataObjectId, state: String) extends Product with Serializable
-
case class
FileRefMapping(src: FileRef, tgt: FileRef) extends Product with Serializable
Src/Tgt tuple representing the mapping of a file reference
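The src/tgt pairing can be sketched with simplified stand-in types (the real FileRef also carries partition values and a DataObject-relative file name, which are omitted here):

```scala
// Simplified stand-ins for FileRef and FileRefMapping.
case class FileRef(fullPath: String, fileName: String)
case class FileRefMapping(src: FileRef, tgt: FileRef)

// An action that copies files produces one mapping per processed file;
// the src side can later be used for post-processing such as delete-after-read.
val mapping = FileRefMapping(
  src = FileRef("/landing/in/data1.csv", "data1.csv"),
  tgt = FileRef("/stage/out/data1.csv", "data1.csv")
)
```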
-
case class
FileSubFeed(fileRefs: Option[Seq[FileRef]], dataObjectId: DataObjectId, partitionValues: Seq[PartitionValues], isDAGStart: Boolean = false, isSkipped: Boolean = false, fileRefMapping: Option[Seq[FileRefMapping]] = None) extends SubFeed with Product with Serializable
A FileSubFeed is used to transport references to files between Actions.
- fileRefs
path to files to be processed
- dataObjectId
id of the DataObject this SubFeed corresponds to
- partitionValues
Values of Partitions transported by this SubFeed
- isDAGStart
true if this subfeed is a start node of the dag
- isSkipped
true if this subfeed is the result of a skipped action
- fileRefMapping
store mapping of input to output file references. This is also used for post processing (e.g. delete after read).
- case class HadoopFileStateId(path: Path, appName: String, runId: Int, attemptId: Int) extends StateId with Product with Serializable
-
case class
InitSubFeed(dataObjectId: DataObjectId, partitionValues: Seq[PartitionValues], isSkipped: Boolean = false) extends SubFeed with Product with Serializable
An InitSubFeed is used to initialize first Nodes of a DAG.
- dataObjectId
id of the DataObject this SubFeed corresponds to
- partitionValues
Values of Partitions transported by this SubFeed
- isSkipped
true if this subfeed is the result of a skipped action
- class PrimaryKeyConstraintViolationException extends RuntimeException
-
class
ProcessingLogicException extends RuntimeException
Exception to signal that a configured pipeline can't be executed properly
-
case class
ScriptSubFeed(parameters: Option[Map[String, String]] = None, dataObjectId: DataObjectId, partitionValues: Seq[PartitionValues], isDAGStart: Boolean = false, isSkipped: Boolean = false) extends SubFeed with Product with Serializable
A ScriptSubFeed is used to notify DataObjects and subsequent actions about the completion of a script. It allows passing on arbitrary information as key/value pairs.
- parameters
arbitrary information to pass on as key/value pairs
- dataObjectId
id of the DataObject this SubFeed corresponds to
- partitionValues
Values of Partitions transported by this SubFeed
- isDAGStart
true if this subfeed is a start node of the dag
- isSkipped
true if this subfeed is the result of a skipped action
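Passing script results downstream can be sketched with a simplified stand-in (the real ScriptSubFeed also carries partitionValues, isDAGStart and isSkipped; the parameter keys below are purely illustrative):

```scala
// Simplified stand-in for ScriptSubFeed: a script action finishes and passes
// arbitrary key/value information on to subsequent actions.
case class SimpleScriptSubFeed(dataObjectId: String, parameters: Option[Map[String, String]] = None)

val result = SimpleScriptSubFeed(
  dataObjectId = "export-script",
  parameters = Some(Map("exitCode" -> "0", "rowsExported" -> "1250"))
)

// A downstream action can safely look up a value even if parameters are absent.
val exitCode = result.parameters.flatMap(_.get("exitCode"))
```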
-
class
SimplifiedAnalysisException extends Exception with Serializable
AnalysisException with reduced logical plan output. The logical plan output is truncated to at most 5 lines.
-
case class
SparkSubFeed(dataFrame: Option[DataFrame], dataObjectId: DataObjectId, partitionValues: Seq[PartitionValues], isDAGStart: Boolean = false, isSkipped: Boolean = false, isDummy: Boolean = false, filter: Option[String] = None) extends SubFeed with Product with Serializable
A SparkSubFeed is used to transport DataFrames between Actions.
- dataFrame
Spark DataFrame to be processed. DataFrame should not be saved to state (@transient).
- dataObjectId
id of the DataObject this SubFeed corresponds to
- partitionValues
Values of Partitions transported by this SubFeed
- isDAGStart
true if this subfeed is a start node of the dag
- isSkipped
true if this subfeed is the result of a skipped action
- isDummy
true if this subfeed only contains a dummy DataFrame. Dummy DataFrames can be used for validating the lineage in init phase, but not for the exec phase.
- filter
a spark sql filter expression. This is used by SparkIncrementalMode.
-
trait
SubFeed extends DAGResult with SmartDataLakeLogger
A SubFeed transports references to data between Actions. Data can be represented by different technologies like Files or DataFrames.
Note: SubFeed implements ParsableFromConfig to persist to
-
trait
SubFeedConverter[S <: SubFeed] extends AnyRef
An interface to be implemented by SubFeed companion objects for subfeed conversion
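The converter pattern can be sketched with simplified stand-in types. This is an assumption-laden illustration of the interface's role, not the actual SDL code: the real companion objects convert between the full SubFeed hierarchy, while here the conversion logic is reduced to dropping information that the target type cannot carry.

```scala
// Simplified sketch of the SubFeedConverter pattern: each SubFeed companion
// object knows how to convert a generic SubFeed into its own type, so an
// Action can request the representation it needs.
trait SubFeed { def dataObjectId: String }
case class FileSubFeed(dataObjectId: String, fileRefs: Option[Seq[String]]) extends SubFeed
case class InitSubFeed(dataObjectId: String) extends SubFeed

trait SubFeedConverter[S <: SubFeed] {
  def fromSubFeed(subFeed: SubFeed): S
}

object FileSubFeed extends SubFeedConverter[FileSubFeed] {
  // An InitSubFeed at a DAG start carries no file refs yet; an existing
  // FileSubFeed passes through unchanged.
  override def fromSubFeed(subFeed: SubFeed): FileSubFeed = subFeed match {
    case f: FileSubFeed => f
    case other          => FileSubFeed(other.dataObjectId, fileRefs = None)
  }
}

val converted = FileSubFeed.fromSubFeed(InitSubFeed("rawFiles"))
```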
Value Members
- object ExecutionPhase extends Enumeration
- object FileSubFeed extends SubFeedConverter[FileSubFeed] with Serializable
- object ScriptSubFeed extends SubFeedConverter[ScriptSubFeed] with Serializable
- object SparkSubFeed extends SubFeedConverter[SparkSubFeed] with Serializable
- object SubFeed
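The ExecutionPhase enumeration can be sketched as a standard Scala Enumeration. The Prepare, Init and Exec values appear elsewhere in this page (the context default is ExecutionPhase.Prepare; reuse counting happens in Init and Exec); whether the real object declares further values is not shown here and is left open.

```scala
// Sketch of an Enumeration in the style of ExecutionPhase; the value set
// is limited to the phases mentioned on this page.
object ExecutionPhase extends Enumeration {
  type ExecutionPhase = Value
  val Prepare, Init, Exec = Value
}

// Enumeration values are ordered by declaration, which lets code
// check how far a run has progressed.
val current = ExecutionPhase.Init
val isExecuting = current == ExecutionPhase.Exec
```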