case class FileTransferAction(id: ActionId, inputId: DataObjectId, outputId: DataObjectId, overwrite: Boolean = true, breakFileRefLineage: Boolean = false, executionMode: Option[ExecutionMode] = None, executionCondition: Option[Condition] = None, metricsFailCondition: Option[String] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends FileOneToOneActionImpl with Product with Serializable
Action to transfer files between SFTP, Hadoop and local filesystems.
- inputId
input DataObject
- outputId
output DataObject
- breakFileRefLineage
If set to true, file references passed on from a previous action are ignored by this action. The action will detect on its own which files it is going to process.
- executionMode
optional execution mode for this Action
- executionCondition
optional Spark SQL expression evaluated against SubFeedsExpressionData. If true, the Action is executed, otherwise skipped. See Condition for details.
- metricsFailCondition
optional Spark SQL expression evaluated as a where-clause against the dataframe of metrics. Available columns are dataObjectId, key and value. If any rows pass the where-clause, a MetricCheckFailed exception is thrown.
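These parameters are typically supplied through the SDL HOCON configuration rather than by instantiating the case class directly. A minimal sketch (the action name and DataObject ids are placeholders, not part of this API; the referenced DataObjects must be defined separately in the dataObjects section):

```hocon
actions {
  transfer-files {
    type = FileTransferAction
    inputId = sftp-src      # a FileRefDataObject that can create input streams
    outputId = hdfs-dst     # a FileRefDataObject that can create output streams
    overwrite = true
    breakFileRefLineage = false
  }
}
```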
- Annotations
- @Scaladoc()
Linear Supertypes
- FileTransferAction
- Serializable
- Serializable
- Product
- Equals
- FileOneToOneActionImpl
- ActionSubFeedsImpl
- Action
- AtlasExportable
- SmartDataLakeLogger
- DAGNode
- ParsableFromConfig
- SdlConfigObject
- AnyRef
- Any
Instance Constructors
-
new
FileTransferAction(id: ActionId, inputId: DataObjectId, outputId: DataObjectId, overwrite: Boolean = true, breakFileRefLineage: Boolean = false, executionMode: Option[ExecutionMode] = None, executionCondition: Option[Condition] = None, metricsFailCondition: Option[String] = None, metadata: Option[ActionMetadata] = None)(implicit instanceRegistry: InstanceRegistry)
- inputId
input DataObject
- outputId
output DataObject
- breakFileRefLineage
If set to true, file references passed on from a previous action are ignored by this action. The action will detect on its own which files it is going to process.
- executionMode
optional execution mode for this Action
- executionCondition
optional Spark SQL expression evaluated against SubFeedsExpressionData. If true, the Action is executed, otherwise skipped. See Condition for details.
- metricsFailCondition
optional Spark SQL expression evaluated as a where-clause against the dataframe of metrics. Available columns are dataObjectId, key and value. If any rows pass the where-clause, a MetricCheckFailed exception is thrown.
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
addRuntimeEvent(executionId: ExecutionId, phase: ExecutionPhase, state: RuntimeEventState, msg: Option[String] = None, results: Seq[SubFeed] = Seq(), tstmp: LocalDateTime = LocalDateTime.now): Unit
Adds a runtime event for this Action
- Definition Classes
- Action
- Annotations
- @Scaladoc()
-
def
addRuntimeMetrics(executionId: Option[ExecutionId], dataObjectId: Option[DataObjectId], metric: ActionMetrics): Unit
Adds a runtime metric for this Action
- Definition Classes
- Action
- Annotations
- @Scaladoc()
-
def
applyExecutionMode(mainInput: DataObject, mainOutput: DataObject, subFeed: SubFeed, partitionValuesTransform: (Seq[PartitionValues]) ⇒ Map[PartitionValues, PartitionValues])(implicit context: ActionPipelineContext): Unit
Applies the executionMode and stores the result in the executionModeResult variable
- Attributes
- protected
- Definition Classes
- Action
- Annotations
- @Scaladoc()
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
atlasName: String
- Definition Classes
- Action → AtlasExportable
-
def
atlasQualifiedName(prefix: String): String
- Definition Classes
- AtlasExportable
-
val
breakFileRefLineage: Boolean
Stop propagating input FileRefs through action and instead get new FileRefs from DataObject according to the SubFeed's partitionValue. This is needed to reprocess all files of a path/partition instead of the FileRefs passed from the previous Action.
- Definition Classes
- FileTransferAction → FileOneToOneActionImpl
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @HotSpotIntrinsicCandidate()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
exec(subFeeds: Seq[SubFeed])(implicit context: ActionPipelineContext): Seq[SubFeed]
Executes the main task of an action. In this step the data of the SubFeeds is moved from Input- to Output-DataObjects.
- subFeeds
SparkSubFeed's to be processed
- returns
processed SparkSubFeed's
- Definition Classes
- ActionSubFeedsImpl → Action
-
val
executionCondition: Option[Condition]
execution condition for this action.
- Definition Classes
- FileTransferAction → Action
-
val
executionConditionResult: Option[(Boolean, Option[String])]
- Attributes
- protected
- Definition Classes
- Action
-
val
executionMode: Option[ExecutionMode]
execution mode for this action.
- Definition Classes
- FileTransferAction → Action
-
val
executionModeResult: Option[Try[Option[ExecutionModeResult]]]
- Attributes
- protected
- Definition Classes
- Action
-
def
factory: FromConfigFactory[Action]
Returns the factory that can parse this type (that is, type CO). Typically, implementations of this method should return the companion object of the implementing class. The companion object in turn should implement FromConfigFactory.
- returns
the factory (object) for this class.
- Definition Classes
- FileTransferAction → ParsableFromConfig
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
getDataObjectsState: Seq[DataObjectState]
Get potential state of input DataObjects when executionMode is DataObjectStateIncrementalMode.
- Definition Classes
- Action
- Annotations
- @Scaladoc()
-
def
getInputDataObject[T <: DataObject](id: DataObjectId)(implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], registry: InstanceRegistry): T
- Attributes
- protected
- Definition Classes
- Action
-
def
getLatestRuntimeEventState: Option[RuntimeEventState]
Get latest runtime state
- Definition Classes
- Action
- Annotations
- @Scaladoc()
-
def
getMainInput(inputSubFeeds: Seq[SubFeed])(implicit context: ActionPipelineContext): DataObject
- Attributes
- protected
- Definition Classes
- ActionSubFeedsImpl
-
def
getMainPartitionValues(inputSubFeeds: Seq[SubFeed])(implicit context: ActionPipelineContext): Seq[PartitionValues]
- Attributes
- protected
- Definition Classes
- ActionSubFeedsImpl
-
def
getOutputDataObject[T <: DataObject](id: DataObjectId)(implicit arg0: ClassTag[T], arg1: scala.reflect.api.JavaUniverse.TypeTag[T], registry: InstanceRegistry): T
- Attributes
- protected
- Definition Classes
- Action
-
def
getRuntimeDataImpl: RuntimeData
- Attributes
- protected
- Definition Classes
- Action
-
def
getRuntimeInfo(executionId: Option[ExecutionId] = None): Option[RuntimeInfo]
Get summarized runtime information for a given ExecutionId.
- executionId
ExecutionId to get runtime information for. If empty, runtime information for the last ExecutionId is returned.
- Definition Classes
- Action
- Annotations
- @Scaladoc()
-
def
getRuntimeMetrics(executionId: Option[ExecutionId] = None): Map[DataObjectId, Option[ActionMetrics]]
Get the latest metrics for all DataObjects and a given SDLExecutionId.
- executionId
ExecutionId to get metrics for. If empty, metrics for the last ExecutionId are returned.
- Definition Classes
- Action
- Annotations
- @Scaladoc()
-
val
id: ActionId
A unique identifier for this instance.
- Definition Classes
- FileTransferAction → Action → SdlConfigObject
-
final
def
init(subFeeds: Seq[SubFeed])(implicit context: ActionPipelineContext): Seq[SubFeed]
Initialize Action with SubFeeds to be processed. In this step the execution mode is evaluated and the result stored for the exec phase. If successful, the DAG can be built and Spark DataFrame lineage can be built.
- subFeeds
SparkSubFeed's to be processed
- returns
processed SparkSubFeed's
- Definition Classes
- ActionSubFeedsImpl → Action
-
val
input: FileRefDataObject with CanCreateInputStream
Input FileRefDataObject which implements CanCreateInputStream
- Definition Classes
- FileTransferAction → FileOneToOneActionImpl
- val inputId: DataObjectId
-
def
inputIdsToIgnoreFilter: Seq[DataObjectId]
- Definition Classes
- ActionSubFeedsImpl
-
val
inputs: Seq[FileRefDataObject]
Input DataObjects. To be implemented by subclasses.
- Definition Classes
- FileTransferAction → Action
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
logWritingFinished(subFeed: FileSubFeed, noData: Option[Boolean], duration: Duration)(implicit context: ActionPipelineContext): Unit
- Attributes
- protected
- Definition Classes
- ActionSubFeedsImpl
-
def
logWritingStarted(subFeed: FileSubFeed)(implicit context: ActionPipelineContext): Unit
- Attributes
- protected
- Definition Classes
- ActionSubFeedsImpl
-
lazy val
logger: Logger
- Attributes
- protected
- Definition Classes
- SmartDataLakeLogger
- Annotations
- @transient()
-
def
mainInputId: Option[DataObjectId]
- Definition Classes
- ActionSubFeedsImpl
-
lazy val
mainOutput: DataObject
- Attributes
- protected
- Definition Classes
- ActionSubFeedsImpl
-
def
mainOutputId: Option[DataObjectId]
- Definition Classes
- ActionSubFeedsImpl
-
val
metadata: Option[ActionMetadata]
Additional metadata for the Action
- Definition Classes
- FileTransferAction → Action
-
val
metricsFailCondition: Option[String]
Spark SQL condition evaluated as where-clause against dataframe of metrics. Available columns are dataObjectId, key, value. If there are any rows passing the where clause, a MetricCheckFailed exception is thrown.
- Definition Classes
- FileTransferAction → Action
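For illustration, a fail condition over the documented metric columns (dataObjectId, key, value) could look like the following; the metric key records_written and the DataObject id are assumptions for this sketch, not guaranteed metric names:

```hocon
# Fail the action if the output DataObject reports zero written records
metricsFailCondition = "dataObjectId = 'hdfs-dst' and key = 'records_written' and value = 0"
```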
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
nodeId: String
provide an implementation of the DAG node id
- Definition Classes
- Action → DAGNode
- Annotations
- @Scaladoc()
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
val
output: FileRefDataObject with CanCreateOutputStream
Output FileRefDataObject which implements CanCreateOutputStream
- Definition Classes
- FileTransferAction → FileOneToOneActionImpl
- val outputId: DataObjectId
-
val
outputs: Seq[FileRefDataObject]
Output DataObjects. To be implemented by subclasses.
- Definition Classes
- FileTransferAction → Action
- val overwrite: Boolean
-
def
postExec(inputSubFeeds: Seq[SubFeed], outputSubFeeds: Seq[SubFeed])(implicit context: ActionPipelineContext): Unit
Executes operations needed after executing an action. In this step any task on Input- or Output-DataObjects needed after the main task is executed, e.g. JdbcTableDataObject's postWriteSql or CopyAction's deleteInputData.
- Definition Classes
- ActionSubFeedsImpl → Action
-
def
postExecFailed(implicit context: ActionPipelineContext): Unit
Executes operations needed to clean up after a failed action execution.
- Definition Classes
- Action
- Annotations
- @Scaladoc()
-
def
postprocessOutputSubFeedCustomized(subFeed: FileSubFeed)(implicit context: ActionPipelineContext): FileSubFeed
Implement additional processing logic for SubFeeds after transformation. Can be implemented by subclass.
- Attributes
- protected
- Definition Classes
- ActionSubFeedsImpl
- Annotations
- @Scaladoc()
-
def
postprocessOutputSubFeeds(subFeeds: Seq[FileSubFeed])(implicit context: ActionPipelineContext): Seq[FileSubFeed]
- Definition Classes
- ActionSubFeedsImpl
-
def
preExec(subFeeds: Seq[SubFeed])(implicit context: ActionPipelineContext): Unit
Executes operations needed before executing an action. In this step any task on Input- or Output-DataObjects needed before the main task is executed, e.g. JdbcTableDataObject's preWriteSql.
- Definition Classes
- Action
- Annotations
- @Scaladoc()
-
def
preInit(subFeeds: Seq[SubFeed], dataObjectsState: Seq[DataObjectState])(implicit context: ActionPipelineContext): Unit
Checks before initialization of Action. In this step the execution condition is evaluated and Action init is skipped if the result is false.
- Definition Classes
- Action
- Annotations
- @Scaladoc()
-
def
prepare(implicit context: ActionPipelineContext): Unit
Prepare DataObjects prerequisites. In this step preconditions are prepared & tested: connections can be created, and needed structures exist, e.g. a Kafka topic or JDBC table.
This runs during the "prepare" phase of the DAG.
- Definition Classes
- ActionSubFeedsImpl → Action
-
def
prepareInputSubFeeds(subFeeds: Seq[SubFeed])(implicit context: ActionPipelineContext): (Seq[FileSubFeed], Seq[FileSubFeed])
- Definition Classes
- ActionSubFeedsImpl
-
def
preprocessInputSubFeedCustomized(subFeed: FileSubFeed, ignoreFilter: Boolean, isRecursive: Boolean)(implicit context: ActionPipelineContext): FileSubFeed
Implement additional preprocess logic for SubFeeds before transformation. Can be implemented by subclass.
- ignoreFilter
If filters should be ignored for this feed
- isRecursive
If subfeed is recursive (input & output)
- Definition Classes
- FileOneToOneActionImpl → ActionSubFeedsImpl
-
lazy val
prioritizedMainInputCandidates: Seq[DataObject]
- Attributes
- protected
- Definition Classes
- ActionSubFeedsImpl
-
def
recursiveInputs: Seq[FileRefDataObject with CanCreateInputStream]
Recursive inputs on FileSubFeeds are not supported, so an empty Seq is set.
- Definition Classes
- FileOneToOneActionImpl → Action
- Annotations
- @Scaladoc()
-
def
setSparkJobMetadata(operation: Option[String] = None)(implicit context: ActionPipelineContext): Unit
Sets the Spark job description for better traceability in the Spark UI.
Note: This sets Spark local properties, which are propagated to the respective executor tasks. We rely on this to match metrics back to Actions and DataObjects. As writing to a DataObject on the Driver happens uninterrupted in the same exclusive thread, this is suitable.
- operation
phase description (be short...)
- Definition Classes
- Action
- Annotations
- @Scaladoc()
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
final
def
toString(executionId: Option[ExecutionId]): String
- Definition Classes
- Action
-
final
def
toString(): String
This is displayed in ASCII graph visualization.
- Definition Classes
- Action → AnyRef → Any
- Annotations
- @Scaladoc()
-
def
toStringMedium: String
- Definition Classes
- Action
-
def
toStringShort: String
- Definition Classes
- Action
-
def
transform(inputSubFeed: FileSubFeed, outputSubFeed: FileSubFeed)(implicit context: ActionPipelineContext): FileSubFeed
Transform a SparkSubFeed. To be implemented by subclasses.
- inputSubFeed
SparkSubFeed to be transformed
- outputSubFeed
SparkSubFeed to be enriched with transformed result
- returns
transformed output SparkSubFeed
- Definition Classes
- FileTransferAction → FileOneToOneActionImpl
-
def
transform(inputSubFeeds: Seq[FileSubFeed], outputSubFeeds: Seq[FileSubFeed])(implicit context: ActionPipelineContext): Seq[FileSubFeed]
Transform subfeed content. To be implemented by subclass.
- Attributes
- protected
- Definition Classes
- FileOneToOneActionImpl → ActionSubFeedsImpl
-
def
transformPartitionValues(partitionValues: Seq[PartitionValues])(implicit context: ActionPipelineContext): Map[PartitionValues, PartitionValues]
Transform partition values. Can be implemented by subclass.
- Attributes
- protected
- Definition Classes
- ActionSubFeedsImpl
- Annotations
- @Scaladoc()
-
def
validateConfig(): Unit
Put configuration validation checks here.
- Definition Classes
- FileOneToOneActionImpl → ActionSubFeedsImpl → Action
-
def
validatePartitionValuesExisting(dataObject: DataObject with CanHandlePartitions, subFeed: SubFeed)(implicit context: ActionPipelineContext): Unit
- Attributes
- protected
- Definition Classes
- ActionSubFeedsImpl
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
def
writeOutputSubFeeds(subFeeds: Seq[FileSubFeed])(implicit context: ActionPipelineContext): Unit
- Definition Classes
- ActionSubFeedsImpl
-
def
writeSubFeed(subFeed: FileSubFeed, isRecursive: Boolean)(implicit context: ActionPipelineContext): WriteSubFeedResult
Write subfeed data to output. To be implemented by subclass.
- isRecursive
If subfeed is recursive (input & output)
- returns
false if there was no data to process, otherwise true.
- Definition Classes
- FileTransferAction → ActionSubFeedsImpl
Deprecated Value Members
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] ) @Deprecated
- Deprecated