class RawToNormalizedETL extends ETL
- RawToNormalizedETL
- ETL
- Runnable
- AnyRef
- Any
Instance Constructors
- new RawToNormalizedETL(source: DatasetConf, mainDestination: DatasetConf, transformations: List[Transformation])(implicit conf: Configuration)
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @HotSpotIntrinsicCandidate()
-
implicit
val
conf: Configuration
- Definition Classes
- RawToNormalizedETL → ETL
-
def
defaultRepartition: (DataFrame) ⇒ DataFrame
- Definition Classes
- ETL
-
val
defaultRowPerPartition: Int
- Definition Classes
- ETL
-
def
defaultSampling: PartialFunction[String, (DataFrame) ⇒ DataFrame]
- Definition Classes
- ETL
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
extract(lastRunDateTime: LocalDateTime, currentRunDateTime: LocalDateTime)(implicit spark: SparkSession): Map[String, DataFrame]
Reads data from a file system and produces a Map[String, DataFrame]. This method should avoid transformations and joins, but it can apply filters to make the ETL more efficient.
- spark
an instance of SparkSession
- returns
all the data needed to pass to the transform method and produce the desired output.
- Definition Classes
- RawToNormalizedETL → ETL
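The "filter but do not transform" guidance for extract can be illustrated with a minimal sketch. It does not use Spark: a plain Scala `Seq` stands in for a DataFrame, and the `RawRow` type, the `ingestedOn` column, the `raw_events` key, and the choice of window bounds are all hypothetical, not part of the library.

```scala
import java.time.LocalDateTime

object ExtractSketch {
  // Stand-in for a DataFrame row; `id` and `ingestedOn` are hypothetical columns.
  final case class RawRow(id: String, ingestedOn: LocalDateTime)

  // Stand-in for the raw source that a real extract would read from the file system.
  val rawSource: Seq[RawRow] = Seq(
    RawRow("a", LocalDateTime.of(2023, 1, 1, 0, 0)),
    RawRow("b", LocalDateTime.of(2023, 6, 1, 0, 0)),
    RawRow("c", LocalDateTime.of(2024, 1, 1, 0, 0))
  )

  // Filter only, no joins or column changes: keep rows ingested after the
  // last run and no later than the current run (an assumed window definition).
  def extract(lastRunDateTime: LocalDateTime,
              currentRunDateTime: LocalDateTime): Map[String, Seq[RawRow]] = {
    val fresh = rawSource.filter { r =>
      r.ingestedOn.isAfter(lastRunDateTime) && !r.ingestedOn.isAfter(currentRunDateTime)
    }
    Map("raw_events" -> fresh)
  }

  // Example call: only row "b" falls inside the window.
  val result: Map[String, Seq[RawRow]] =
    extract(LocalDateTime.of(2023, 3, 1, 0, 0), LocalDateTime.of(2023, 12, 31, 0, 0))
}
```

Filtering on the run window at this stage keeps the data handed to transform small, which is the efficiency gain the description refers to.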
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
getLastRunDateFor(ds: DatasetConf)(implicit spark: SparkSession): LocalDateTime
Fetches the last run date-time of the dataset passed as argument, if one is available.
- ds
dataset
- spark
a spark session
- returns
the last run date-time, or minDateTime if none is available
- Definition Classes
- ETL
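The fallback described for getLastRunDateFor (last run date-time if known, otherwise minDateTime) might look like the sketch below. This is not the library's implementation: the `runHistory` map, the dataset id, and the concrete `minDateTime` value are assumptions made for illustration only.

```scala
import java.time.LocalDateTime

object LastRunSketch {
  // Hypothetical sentinel; the real `minDateTime` is defined on ETL and its value is not shown here.
  val minDateTime: LocalDateTime = LocalDateTime.of(1900, 1, 1, 0, 0)

  // Hypothetical record of previous runs, keyed by dataset id.
  val runHistory: Map[String, Seq[LocalDateTime]] = Map(
    "normalized_events" -> Seq(
      LocalDateTime.of(2023, 1, 1, 0, 0),
      LocalDateTime.of(2023, 6, 1, 0, 0)
    )
  )

  // Latest recorded run for the dataset, or minDateTime when there is none.
  def getLastRunDateFor(datasetId: String): LocalDateTime =
    runHistory
      .getOrElse(datasetId, Seq.empty[LocalDateTime])
      .reduceOption((a, b) => if (a.isAfter(b)) a else b)
      .getOrElse(minDateTime)
}
```

Returning a sentinel instead of an `Option` means a first run can use the same time-window filter as later runs, with a window that starts far enough in the past to include everything.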
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
def
load(data: Map[String, DataFrame], lastRunDateTime: LocalDateTime = minDateTime, currentRunDateTime: LocalDateTime = LocalDateTime.now(), repartition: (DataFrame) ⇒ DataFrame = defaultRepartition)(implicit spark: SparkSession): Map[String, DataFrame]
Loads the output data into persistent storage. The output destination can be an object store, a database, flat files, and so on.
- data
output data produced by the transform method.
- spark
an instance of SparkSession
- Definition Classes
- ETL
-
val
log: Logger
- Definition Classes
- ETL
-
val
mainDestination: DatasetConf
- Definition Classes
- RawToNormalizedETL → ETL
-
val
maxDateTime: LocalDateTime
- Definition Classes
- ETL
-
val
minDateTime: LocalDateTime
- Definition Classes
- ETL
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
publish()(implicit spark: SparkSession): Unit
OPTIONAL - Contains all the actions needed to make the data available to users, such as creating a view over the data.
- spark
an instance of SparkSession
- Definition Classes
- ETL
-
def
reset()(implicit spark: SparkSession): Unit
Resets the ETL by removing the destination dataset.
- Definition Classes
- ETL
-
def
run(runSteps: Seq[RunStep] = RunStep.default_load, lastRunDateTime: Option[LocalDateTime] = None, currentRunDateTime: Option[LocalDateTime] = None)(implicit spark: SparkSession): Map[String, DataFrame]
Entry point of the ETL. Execute this method to run the whole ETL.
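As a rough illustration of how an entry point like run could chain the lifecycle methods listed on this page, here is a Spark-free sketch. The ordering (reset, extract, transform, load, publish), the `Step` values, the `minDateTime` value, and the way the optional date-times are defaulted are assumptions; the actual RunStep values and the behavior of run are not documented in this section.

```scala
import java.time.LocalDateTime
import scala.collection.mutable.ListBuffer

object RunSketch {
  type Frame = Seq[Int] // stand-in for DataFrame

  // Hypothetical run steps; the real RunStep values are not listed on this page.
  sealed trait Step
  case object Reset extends Step
  case object Publish extends Step

  // Hypothetical sentinel for "never run before".
  val minDateTime: LocalDateTime = LocalDateTime.of(1900, 1, 1, 0, 0)

  // Records the order in which the lifecycle methods are called.
  val calls: ListBuffer[String] = ListBuffer.empty[String]

  def reset(): Unit = { calls += "reset"; () }
  def extract(last: LocalDateTime, current: LocalDateTime): Map[String, Frame] = {
    calls += "extract"
    Map("raw" -> Seq(1, 2, 3))
  }
  def transform(data: Map[String, Frame]): Map[String, Frame] = {
    calls += "transform"
    data.map { case (_, df) => "normalized" -> df.map(_ * 10) }
  }
  def load(data: Map[String, Frame]): Map[String, Frame] = {
    calls += "load"
    data
  }
  def publish(): Unit = { calls += "publish"; () }

  // Assumed orchestration: default the run window, then chain the steps.
  def run(steps: Seq[Step] = Seq(Publish),
          lastRunDateTime: Option[LocalDateTime] = None,
          currentRunDateTime: Option[LocalDateTime] = None): Map[String, Frame] = {
    val last = lastRunDateTime.getOrElse(minDateTime)
    val current = currentRunDateTime.getOrElse(LocalDateTime.now())
    if (steps.contains(Reset)) reset()
    val loaded = load(transform(extract(last, current)))
    if (steps.contains(Publish)) publish()
    loaded
  }
}
```

Calling `RunSketch.run(Seq(RunSketch.Reset, RunSketch.Publish))` in this sketch invokes every stage once and returns the loaded data keyed by destination.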
-
def
sampling: PartialFunction[String, (DataFrame) ⇒ DataFrame]
Logic used when the ETL is run as a SAMPLE_LOAD.
- Definition Classes
- ETL
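The type of sampling, PartialFunction[String, (DataFrame) ⇒ DataFrame], suggests a per-dataset sampling rule looked up by key. The sketch below shows one way such a partial function could be applied, leaving datasets without a rule untouched. A `Seq[Int]` stands in for a DataFrame, and the `raw_events` key and the "first two rows" rule are hypothetical.

```scala
object SamplingSketch {
  type Frame = Seq[Int] // stand-in for DataFrame

  // Hypothetical rule: keep the first two rows of the dataset keyed "raw_events".
  val sampling: PartialFunction[String, Frame => Frame] = {
    case "raw_events" => (df: Frame) => df.take(2)
  }

  // Apply the sampling function when one is defined for the key;
  // otherwise pass the data through unchanged.
  def sample(data: Map[String, Frame]): Map[String, Frame] =
    data.map { case (key, df) =>
      key -> sampling.lift(key).map(f => f(df)).getOrElse(df)
    }
}
```

Using `lift` turns the partial function into one that returns an `Option`, so keys without a sampling rule need no special handling.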
-
val
source: DatasetConf
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
def
transform(data: Map[String, DataFrame], lastRunDateTime: LocalDateTime, currentRunDateTime: LocalDateTime)(implicit spark: SparkSession): Map[String, DataFrame]
Takes a Map[String, DataFrame] as input and applies a set of transformations to it to produce the ETL output. It is recommended not to read any additional data here, but to use the extract() method to inject input data.
- data
input data
- spark
an instance of SparkSession
- Definition Classes
- RawToNormalizedETL → ETL
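Since the constructor takes a List[Transformation], one plausible reading of transform is that it applies those transformations in order to the source data. The sketch below shows that idea with a left fold. The `Transformation` trait here is a hypothetical stand-in, not the library's type, and `Trim`, `Upper`, and the dataset ids are invented for illustration; a `Seq` of string maps stands in for a DataFrame.

```scala
object TransformSketch {
  type Frame = Seq[Map[String, String]] // stand-in for DataFrame

  // Hypothetical stand-in for the library's Transformation type.
  trait Transformation { def transform: Frame => Frame }

  // Two hypothetical normalization steps; both assume `column` exists in every row.
  final case class Trim(column: String) extends Transformation {
    val transform: Frame => Frame =
      _.map(row => row.updated(column, row(column).trim))
  }
  final case class Upper(column: String) extends Transformation {
    val transform: Frame => Frame =
      _.map(row => row.updated(column, row(column).toUpperCase))
  }

  // Apply each transformation in list order, feeding the result of one into the next.
  def applyAll(df: Frame, transformations: List[Transformation]): Frame =
    transformations.foldLeft(df)((acc, t) => t.transform(acc))

  // Read the source entry from the extracted data and emit it under the destination id.
  def transform(data: Map[String, Frame],
                sourceId: String,
                destinationId: String,
                transformations: List[Transformation]): Map[String, Frame] =
    Map(destinationId -> applyAll(data(sourceId), transformations))
}
```

Because the fold only uses the data passed in, the sketch follows the recommendation above: nothing is read inside transform, and all inputs come from the map produced by extract.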
-
val
transformations: List[Transformation]
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
Deprecated Value Members
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] ) @Deprecated
- Deprecated