io.smartdatalake.workflow.dataobject
WebserviceFileDataObject
Companion object WebserviceFileDataObject
case class WebserviceFileDataObject(id: DataObjectId, url: String, additionalHeaders: Map[String, String] = Map(), timeouts: Option[HttpTimeoutConfig] = None, readTimeoutMs: Option[Int] = None, authMode: Option[AuthMode] = None, mimeType: Option[String] = None, writeMethod: WebserviceMethod = WebserviceMethod.Post, proxy: Option[HttpProxyConfig] = None, followRedirects: Boolean = false, partitionDefs: Seq[WebservicePartitionDefinition] = Seq(), partitionLayout: Option[String] = None, metadata: Option[DataObjectMetadata] = None)(implicit instanceRegistry: InstanceRegistry) extends FileRefDataObject with CanCreateInputStream with CanCreateOutputStream with SmartDataLakeLogger with Product with Serializable
DataObject to call a webservice and return the response as an InputStream. This is implemented as a FileRefDataObject because the response is treated as file content. FileRefDataObjects support partitioned data. For a WebserviceFileDataObject, partitions are mapped to query parameters to create the query string. All possible query parameter values must be given in the configuration.
- partitionDefs
list of partitions with list of possible values for every entry
- partitionLayout
definition of partitions in query string. Use %<partitionColName>% as placeholder for partition column value in layout.
- Annotations
- @Scaladoc()
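As a rough illustration of how a partition layout with %&lt;partitionColName&gt;% placeholders can be resolved into a query string from partition values, here is a minimal sketch in plain Scala (the helper name `resolvePartitionLayout` is hypothetical, not part of the SDL API):

```scala
// Hypothetical sketch: substitute %<col>% placeholders in a partition
// layout with concrete partition values to build a query string.
def resolvePartitionLayout(layout: String, partitionValues: Map[String, String]): String =
  partitionValues.foldLeft(layout) { case (acc, (col, value)) =>
    acc.replace(s"%$col%", value)
  }

val query = resolvePartitionLayout(
  "department=%department%&year=%year%",
  Map("department" -> "hr", "year" -> "2020")
)
// query == "department=hr&year=2020"
```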
Linear Supertypes
- WebserviceFileDataObject
- Serializable
- Serializable
- Product
- Equals
- CanCreateOutputStream
- CanCreateInputStream
- FileRefDataObject
- FileDataObject
- CanHandlePartitions
- DataObject
- AtlasExportable
- SmartDataLakeLogger
- ParsableFromConfig
- SdlConfigObject
- AnyRef
- Any
Instance Constructors
-
new
WebserviceFileDataObject(id: DataObjectId, url: String, additionalHeaders: Map[String, String] = Map(), timeouts: Option[HttpTimeoutConfig] = None, readTimeoutMs: Option[Int] = None, authMode: Option[AuthMode] = None, mimeType: Option[String] = None, writeMethod: WebserviceMethod = WebserviceMethod.Post, proxy: Option[HttpProxyConfig] = None, followRedirects: Boolean = false, partitionDefs: Seq[WebservicePartitionDefinition] = Seq(), partitionLayout: Option[String] = None, metadata: Option[DataObjectMetadata] = None)(implicit instanceRegistry: InstanceRegistry)
- partitionDefs
list of partitions with list of possible values for every entry
- partitionLayout
definition of partitions in query string. Use %<partitionColName>% as placeholder for partition column value in layout.
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- val additionalHeaders: Map[String, String]
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
atlasName: String
- Definition Classes
- DataObject → AtlasExportable
-
def
atlasQualifiedName(prefix: String): String
- Definition Classes
- AtlasExportable
- val authMode: Option[AuthMode]
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @HotSpotIntrinsicCandidate()
-
def
createInputStream(query: String)(implicit context: ActionPipelineContext): InputStream
Same as getResponse, but returns the response as an InputStream.
- query
it should be possible to define the partition to read as query string, but this is not yet implemented
- Definition Classes
- WebserviceFileDataObject → CanCreateInputStream
- Annotations
- @Scaladoc()
-
def
createOutputStream(path: String, overwrite: Boolean)(implicit context: ActionPipelineContext): OutputStream
- path
is ignored for webservices
- overwrite
is ignored for webservices
- returns
output stream that writes to the webservice once it is closed
- Definition Classes
- WebserviceFileDataObject → CanCreateOutputStream
- Annotations
- @Scaladoc()
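The returned stream only triggers the webservice call when it is closed. A minimal sketch of this buffer-until-close pattern (class and names hypothetical, not SDL's actual implementation):

```scala
import java.io.ByteArrayOutputStream

// Sketch of the buffer-until-close pattern: bytes are collected in memory
// and handed to a send function only when the stream is closed.
class BufferedPostOutputStream(send: Array[Byte] => Unit) extends ByteArrayOutputStream {
  override def close(): Unit = {
    super.close()
    send(toByteArray) // the webservice call would happen here, on close
  }
}

var posted: Array[Byte] = Array.empty
val out = new BufferedPostOutputStream(bytes => posted = bytes)
out.write("payload".getBytes("UTF-8"))
out.close()
// posted now holds the bytes of "payload"
```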
-
def
deleteAll(implicit context: ActionPipelineContext): Unit
Delete all data. This is used to implement SaveMode.Overwrite.
- Definition Classes
- FileRefDataObject
- Annotations
- @Scaladoc()
-
def
deleteFileRefs(fileRefs: Seq[FileRef])(implicit context: ActionPipelineContext): Unit
Delete given files. This is used to clean up files after they are processed.
- Definition Classes
- FileRefDataObject
- Annotations
- @Scaladoc()
-
def
endWritingOutputStreams(partitionValues: Seq[PartitionValues])(implicit context: ActionPipelineContext): Unit
This is called after all output streams have been written. It is used for e.g. making sure empty partitions are created as well.
- Definition Classes
- WebserviceFileDataObject → CanCreateOutputStream
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
expectedPartitionsCondition: Option[String]
Definition of partitions that are expected to exist. This is used to validate that partitions being read exist and do not return an empty result. Define a Spark SQL expression that is evaluated against a PartitionValues instance and returns true or false. Example: "elements['yourColName'] > 2017"
- returns
true if partition is expected to exist.
- Definition Classes
- WebserviceFileDataObject → CanHandlePartitions
-
def
extractPartitionValuesFromPath(filePath: String)(implicit context: ActionPipelineContext): PartitionValues
Extract partition values from a given file path.
- Attributes
- protected
- Definition Classes
- FileRefDataObject
- Annotations
- @Scaladoc()
-
def
factory: FromConfigFactory[DataObject]
Returns the factory that can parse this type (that is, type CO). Typically, implementations of this method should return the companion object of the implementing class. The companion object in turn should implement FromConfigFactory.
- returns
the factory (object) for this class.
- Definition Classes
- WebserviceFileDataObject → ParsableFromConfig
-
val
fileName: String
Definition of fileName. Default is an asterisk to match everything. This is concatenated with the partition layout to search for files.
- Definition Classes
- FileRefDataObject
- Annotations
- @Scaladoc()
- val followRedirects: Boolean
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
getConnection[T <: Connection](connectionId: ConnectionId)(implicit registry: InstanceRegistry, ct: ClassTag[T], tt: scala.reflect.api.JavaUniverse.TypeTag[T]): T
Handle class cast exception when getting objects from instance registry.
- Attributes
- protected
- Definition Classes
- DataObject
- Annotations
- @Scaladoc()
-
def
getConnectionReg[T <: Connection](connectionId: ConnectionId, registry: InstanceRegistry)(implicit ct: ClassTag[T], tt: scala.reflect.api.JavaUniverse.TypeTag[T]): T
- Attributes
- protected
- Definition Classes
- DataObject
-
def
getFileRefs(partitionValues: Seq[PartitionValues])(implicit context: ActionPipelineContext): Seq[FileRef]
For WebserviceFileDataObject, every partition is mapped to one FileRef.
- partitionValues
List of partition values to be filtered. If empty all files in root path of DataObject will be listed.
- returns
List of FileRefs
- Definition Classes
- WebserviceFileDataObject → FileRefDataObject
- Annotations
- @Scaladoc()
-
def
getPartitionString(partitionValues: PartitionValues)(implicit context: ActionPipelineContext): Option[String]
Get partition values formatted by the partition layout.
- Definition Classes
- FileRefDataObject
- Annotations
- @Scaladoc()
-
def
getPath(implicit context: ActionPipelineContext): String
Method for subclasses to override the base path for this DataObject. This is for instance needed if pathPrefix is defined in a connection.
- Definition Classes
- FileRefDataObject
- Annotations
- @Scaladoc()
-
def
getResponse(query: Option[String] = None): Array[Byte]
Calls the webservice and returns the response.
- query
optional URL with replaced placeholders to call
- returns
Response as Array[Byte]
- Annotations
- @Scaladoc()
-
def
getSearchPaths(partitionValues: Seq[PartitionValues])(implicit context: ActionPipelineContext): Seq[(PartitionValues, String)]
Prepare the paths to be searched.
- Attributes
- protected
- Definition Classes
- FileRefDataObject
- Annotations
- @Scaladoc()
-
def
housekeepingMode: Option[HousekeepingMode]
Configure a housekeeping mode to e.g. clean up, archive and compact partitions. Default is None.
- Definition Classes
- DataObject
- Annotations
- @Scaladoc()
-
val
id: DataObjectId
A unique identifier for this instance.
- Definition Classes
- WebserviceFileDataObject → DataObject → SdlConfigObject
- implicit val instanceRegistry: InstanceRegistry
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
listPartitions(implicit context: ActionPipelineContext): Seq[PartitionValues]
List partition values defined for this web service. Note that this is a fixed list.
- Definition Classes
- WebserviceFileDataObject → CanHandlePartitions
- Annotations
- @Scaladoc()
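Since the partition values come entirely from the configured partitionDefs, listing them amounts to enumerating all combinations of the configured values. A simplified sketch (the `PartitionDef` stand-in and `listPartitions` helper are illustrative, not SDL's actual types):

```scala
// Simplified stand-in for WebservicePartitionDefinition: a column name
// plus its configured list of possible values.
case class PartitionDef(name: String, values: Seq[String])

// Enumerate all combinations of configured partition values
// (cartesian product across partition definitions).
def listPartitions(defs: Seq[PartitionDef]): Seq[Map[String, String]] =
  defs.foldLeft(Seq(Map.empty[String, String])) { (acc, d) =>
    for (m <- acc; v <- d.values) yield m + (d.name -> v)
  }

val partitions = listPartitions(Seq(
  PartitionDef("department", Seq("hr", "it")),
  PartitionDef("year", Seq("2020"))
))
// 2 departments x 1 year = 2 combinations
```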
-
lazy val
logger: Logger
- Attributes
- protected
- Definition Classes
- SmartDataLakeLogger
- Annotations
- @transient()
-
val
metadata: Option[DataObjectMetadata]
Additional metadata for the DataObject.
- Definition Classes
- WebserviceFileDataObject → DataObject
- val mimeType: Option[String]
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
- val partitionDefs: Seq[WebservicePartitionDefinition]
-
val
partitionLayout: Option[String]
Definition of the partition layout. Use %&lt;partitionColName&gt;% as placeholder and * for globs in the layout. Note: if there are globs in the partition layout, it is not possible to write files to this DataObject. Note: if this is a directory, you must add a final slash to the partition layout.
- Definition Classes
- WebserviceFileDataObject → FileRefDataObject
-
def
partitions: Seq[String]
Definition of partition columns.
- Definition Classes
- WebserviceFileDataObject → CanHandlePartitions
-
def
path: String
No root path needed for Webservice. It can be included in webserviceOptions.url.
- Definition Classes
- WebserviceFileDataObject → FileDataObject
- Annotations
- @Scaladoc()
-
def
postResponse(body: Array[Byte], query: Option[String] = None): Array[Byte]
Calls webservice POST method with binary data as body.
- body
post body as byte array; the content type is determined by Tika
- query
optional URL with replaced placeholders to call
- returns
Response as Array[Byte]
- Annotations
- @Scaladoc()
-
def
postWrite(partitionValues: Seq[PartitionValues])(implicit context: ActionPipelineContext): Unit
Runs operations after writing to the DataObject.
- Definition Classes
- WebserviceFileDataObject → DataObject
-
def
prepare(implicit context: ActionPipelineContext): Unit
Prepare & test the DataObject's prerequisites.
This runs during the "prepare" operation of the DAG.
- Definition Classes
- WebserviceFileDataObject → FileDataObject → DataObject
- val proxy: Option[HttpProxyConfig]
- val readTimeoutMs: Option[Int]
-
def
relativizePath(filePath: String)(implicit context: ActionPipelineContext): String
Make a given path relative to this DataObject's base path.
- Definition Classes
- WebserviceFileDataObject → FileDataObject
-
val
saveMode: SDLSaveMode
Overwrite or Append new data. When writing partitioned data, this applies only to partitions concerned.
- Definition Classes
- WebserviceFileDataObject → FileRefDataObject
-
val
separator: Char
default separator for paths
- Attributes
- protected
- Definition Classes
- FileDataObject
- Annotations
- @Scaladoc()
-
def
startWritingOutputStreams(partitionValues: Seq[PartitionValues])(implicit context: ActionPipelineContext): Unit
This is called before any output stream is created to initialize writing. It is used to apply SaveMode, e.g. deleting existing partitions.
- Definition Classes
- WebserviceFileDataObject → CanCreateOutputStream
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- val tika: Tika
- val timeouts: Option[HttpTimeoutConfig]
-
def
toStringShort: String
- Definition Classes
- DataObject
-
def
translateFileRefs(fileRefs: Seq[FileRef])(implicit context: ActionPipelineContext): Seq[FileRefMapping]
Given some FileRef for another DataObject, translate the paths to the root path of this DataObject.
- Definition Classes
- FileRefDataObject
- Annotations
- @Scaladoc()
- val url: String
-
def
validateSchemaHasPartitionCols(df: DataFrame, role: String): Unit
Validate that the schema of a given Spark DataFrame df contains the specified partition columns.
- df
The data frame to validate.
- role
role used in exception message. Set to read or write.
- Definition Classes
- CanHandlePartitions
- Annotations
- @Scaladoc()
- Exceptions thrown
SchemaViolationException if the partition columns are not included.
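Conceptually, this check is a set-containment test on column names. A minimal sketch using plain column-name lists instead of a Spark DataFrame schema (helper name and exception type are illustrative; SDL throws a SchemaViolationException here):

```scala
// Sketch of the validation logic: fail if any required partition column
// is missing from the schema's column names.
def validateHasPartitionCols(schemaCols: Seq[String], partitionCols: Seq[String], role: String): Unit = {
  val missing = partitionCols.filterNot(schemaCols.contains)
  if (missing.nonEmpty)
    throw new IllegalArgumentException( // stand-in for SchemaViolationException
      s"($role) DataFrame is missing partition columns: ${missing.mkString(", ")}")
}

// Passes: all partition columns are present in the schema.
validateHasPartitionCols(Seq("id", "department", "year"), Seq("department", "year"), "read")
```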
-
def
validateSchemaHasPrimaryKeyCols(df: DataFrame, primaryKeyCols: Seq[String], role: String): Unit
Validate that the schema of a given Spark DataFrame df contains the specified primary key columns.
- df
The data frame to validate.
- role
role used in exception message. Set to read or write.
- Definition Classes
- CanHandlePartitions
- Annotations
- @Scaladoc()
- Exceptions thrown
SchemaViolationException if the primary key columns are not included.
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- val writeMethod: WebserviceMethod
Deprecated Value Members
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] ) @Deprecated
- Deprecated