class RawFileToNormalizedETL extends RawToNormalizedETL
Inheritance hierarchy:
- RawFileToNormalizedETL
- RawToNormalizedETL
- ETLSingleDestination
- ETL
- Runnable
- AnyRef
- Any
Instance Constructors
- new RawFileToNormalizedETL(source: DatasetConf, mainDestination: DatasetConf, transformations: List[Transformation])(implicit conf: Configuration)
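As a hedged sketch of typical usage (the dataset configurations, Spark session setup, and the library's import paths are assumptions, since this page does not show how a Configuration or DatasetConf is obtained):

```scala
// Sketch only: how `conf`, `rawConf` and `normalizedConf` are obtained is
// hypothetical; library imports are omitted because this page does not
// show the package paths.
import org.apache.spark.sql.SparkSession

implicit val spark: SparkSession = SparkSession.builder()
  .appName("raw-to-normalized")
  .getOrCreate()

implicit val conf: Configuration = ???        // load the datalake configuration

val rawConf: DatasetConf = ???                // the source (raw file) dataset
val normalizedConf: DatasetConf = ???         // the normalized destination dataset

val etl = new RawFileToNormalizedETL(
  source = rawConf,
  mainDestination = normalizedConf,
  transformations = List.empty                // the cleaning steps for this dataset
)

// Entry point: per the run() signature below, all parameters have defaults,
// so a plain call runs the whole extract/transform/load pipeline.
etl.run()
```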
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- def clone(): AnyRef
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native() @HotSpotIntrinsicCandidate()
- implicit val conf: Configuration
  - Definition Classes: RawFileToNormalizedETL → RawToNormalizedETL → ETL
- def defaultRepartition: (DataFrame) ⇒ DataFrame
  - Definition Classes: ETL
- def defaultSampling: PartialFunction[String, (DataFrame) ⇒ DataFrame]
  - Definition Classes: ETL
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def extract(lastRunDateTime: LocalDateTime, currentRunDateTime: LocalDateTime)(implicit spark: SparkSession): Map[String, DataFrame]
  Reads data from a file system and produces a Map[DatasetConf, DataFrame]. This method should avoid transformations and joins, but can implement filters in order to make the ETL more efficient.
  - spark: an instance of SparkSession
  - returns: all the data needed to pass to the transform method and produce the desired output.
  - Definition Classes: RawToNormalizedETL → ETL
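A hedged sketch of what an extract override in a subclass might look like, following the contract above (the file path, format, column name, and the use of source.id as the map key are all assumptions, not part of this API):

```scala
// Sketch of an extract override: read raw files, apply only filters
// (no joins or transformations), and return the map expected by transform.
import java.time.LocalDateTime
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

override def extract(lastRunDateTime: LocalDateTime,
                     currentRunDateTime: LocalDateTime)(implicit spark: SparkSession): Map[String, DataFrame] = {
  // Hypothetical path and format; the real ones come from the DatasetConf.
  val raw: DataFrame = spark.read.json("/landing/patients")

  // Filters are allowed when they make the ETL more efficient,
  // e.g. keeping only records newer than the last run.
  val filtered = raw.where(col("updated_at") > lastRunDateTime.toString)

  Map(source.id -> filtered) // `source.id` as the key is an assumption
}
```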
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
  - Annotations: @native() @HotSpotIntrinsicCandidate()
- def getLastRunDateFor(ds: DatasetConf)(implicit spark: SparkSession): LocalDateTime
  If possible, fetches the last run date time from the dataset passed as argument.
  - ds: the dataset
  - spark: a spark session
  - returns: the last run date, or minDateTime
  - Definition Classes: ETL
- def hashCode(): Int
  - Definition Classes: AnyRef → Any
  - Annotations: @native() @HotSpotIntrinsicCandidate()
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- final def load(data: Map[String, DataFrame], lastRunDateTime: LocalDateTime, currentRunDateTime: LocalDateTime, repartition: (DataFrame) ⇒ DataFrame)(implicit spark: SparkSession): Map[String, DataFrame]
  Loads the output data into persistent storage. The output destination can be an object store, a database, flat files, etc.
  - data: output data produced by the transform method.
  - spark: an instance of SparkSession
  - Definition Classes: ETLSingleDestination → ETL
- def loadDataset(df: DataFrame, ds: DatasetConf, repartition: (DataFrame) ⇒ DataFrame)(implicit spark: SparkSession): DataFrame
  - Definition Classes: ETL
- def loadSingle(data: DataFrame, lastRunDateTime: LocalDateTime = minDateTime, currentRunDateTime: LocalDateTime = LocalDateTime.now(), repartition: (DataFrame) ⇒ DataFrame = defaultRepartition)(implicit spark: SparkSession): DataFrame
  - Definition Classes: ETLSingleDestination
- val log: Logger
  - Definition Classes: ETL
- val mainDestination: DatasetConf
  - Definition Classes: RawFileToNormalizedETL → RawToNormalizedETL → ETL
- val maxDateTime: LocalDateTime
  - Definition Classes: ETL
- val minDateTime: LocalDateTime
  - Definition Classes: ETL
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- final def notify(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native() @HotSpotIntrinsicCandidate()
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native() @HotSpotIntrinsicCandidate()
- def publish()(implicit spark: SparkSession): Unit
  OPTIONAL - Contains any actions needed to make the data available to users, such as creating a view over the data.
  - spark: an instance of SparkSession
  - Definition Classes: RawFileToNormalizedETL → ETL
- def replaceWhere: Option[String]
  replaceWhere is used for an OverWriteStaticPartition load. It avoids computing the dataframe to infer which partitions to replace. Most of the time, these partitions can be inferred statically; always prefer that to dynamically overwriting partitions.
  - Definition Classes: ETL
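A minimal sketch of pinning the overwritten partition statically, as the description above recommends (the partition column name and value are hypothetical):

```scala
// Sketch: statically declare which partition an OverWriteStaticPartition
// load should replace, instead of computing the dataframe to infer it.
override def replaceWhere: Option[String] =
  Some("ingestion_date = '2024-01-01'") // hypothetical partition column/value
```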
- def reset()(implicit spark: SparkSession): Unit
  Resets the ETL by removing the destination dataset.
  - Definition Classes: RawFileToNormalizedETL → ETL
- def run(runSteps: Seq[RunStep] = RunStep.default_load, lastRunDateTime: Option[LocalDateTime] = None, currentRunDateTime: Option[LocalDateTime] = None)(implicit spark: SparkSession): Map[String, DataFrame]
  Entry point of the ETL - execute this method in order to run the whole ETL.
- def sampling: PartialFunction[String, (DataFrame) ⇒ DataFrame]
  Logic used when the ETL is run as a SAMPLE_LOAD.
  - Definition Classes: ETL
- val source: DatasetConf
  - Definition Classes: RawFileToNormalizedETL → RawToNormalizedETL
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- def toMain(df: ⇒ DataFrame): Map[String, DataFrame]
  - Definition Classes: ETL
- def toString(): String
  - Definition Classes: AnyRef → Any
- final def transform(data: Map[String, DataFrame], lastRunDateTime: LocalDateTime = minDateTime, currentRunDateTime: LocalDateTime = LocalDateTime.now())(implicit spark: SparkSession): Map[String, DataFrame]
  Takes a Map[DatasetConf, DataFrame] as input and applies a set of transformations to it to produce the ETL output. It is recommended not to read any additional data here, but to use the extract() method instead to inject input data.
  - data: input data
  - spark: an instance of SparkSession
  - Definition Classes: ETLSingleDestination → ETL
- def transformSingle(data: Map[String, DataFrame], lastRunDateTime: LocalDateTime, currentRunDateTime: LocalDateTime)(implicit spark: SparkSession): DataFrame
  Takes a Map[DataSource, DataFrame] as input and applies a set of transformations to it to produce the ETL output. It is recommended not to read any additional data here, but to use the extract() method instead to inject input data.
  - data: input data
  - spark: an instance of SparkSession
  - Definition Classes: RawFileToNormalizedETL → RawToNormalizedETL → ETLSingleDestination
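To illustrate the contract above, a hedged sketch of a transformSingle override (the map key, column names, and cleaning steps are hypothetical; real input data should come from extract(), not be read here):

```scala
// Sketch of a transformSingle override: transform the extracted dataframes
// into the single output dataframe, without reading any additional data.
import java.time.LocalDateTime
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, trim}

override def transformSingle(data: Map[String, DataFrame],
                             lastRunDateTime: LocalDateTime,
                             currentRunDateTime: LocalDateTime)(implicit spark: SparkSession): DataFrame = {
  val input = data(source.id) // `source.id` as the map key is an assumption
  input
    .withColumn("name", trim(col("name"))) // illustrative cleaning step
    .dropDuplicates("patient_id")          // hypothetical key column
}
```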
- val transformations: List[Transformation]
  - Definition Classes: RawFileToNormalizedETL → RawToNormalizedETL
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
Deprecated Value Members
- def finalize(): Unit
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] ) @Deprecated
  - Deprecated