class Genes extends ETL
Instance Constructors
- new Genes()(implicit conf: Configuration)
Type Members
- implicit class DataFrameOps extends AnyRef
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @HotSpotIntrinsicCandidate()
- implicit val conf: Configuration
- Definition Classes
- ETL
- val cosmic_gene_set: DatasetConf
- val ddd_gene_set: DatasetConf
- val destination: DatasetConf
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
extract(lastRunDateTime: LocalDateTime, currentRunDateTime: LocalDateTime)(implicit spark: SparkSession): Map[String, DataFrame]
Reads data from a file system and produce a Map[DatasetConf, DataFrame].
Reads data from a file system and produce a Map[DatasetConf, DataFrame]. This method should avoid transformation and joins but can implement filters in order to make the ETL more efficient.
- spark
an instance of SparkSession
- returns
all the data needed to pass to the transform method and produce the desired output.
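The contract of extract can be illustrated without Spark. The sketch below is only a model under stated assumptions: `Row`, `storage`, the `updatedOn` field and the date-window filter are hypothetical stand-ins, and a `Seq[Row]` takes the place of a DataFrame. This page does not say which filters, if any, Genes applies.

```scala
import java.time.LocalDateTime

object ExtractSketch {
  // Hypothetical stand-in for one row of a DataFrame.
  final case class Row(symbol: String, updatedOn: LocalDateTime)

  // Hypothetical in-memory "file system": dataset id -> rows.
  val storage: Map[String, Seq[Row]] = Map(
    "human_genes" -> Seq(
      Row("BRCA1", LocalDateTime.of(2020, 1, 1, 0, 0)),
      Row("TP53", LocalDateTime.of(2022, 6, 1, 0, 0))
    ),
    "omim_gene_set" -> Seq(
      Row("BRCA1", LocalDateTime.of(2022, 3, 1, 0, 0))
    )
  )

  // Reads every source and keeps only rows updated after lastRunDateTime
  // and no later than currentRunDateTime. Filtering is done here;
  // joins and other transformations are left to transform.
  def extract(
      lastRunDateTime: LocalDateTime,
      currentRunDateTime: LocalDateTime
  ): Map[String, Seq[Row]] =
    storage.map { case (id, rows) =>
      id -> rows.filter { r =>
        r.updatedOn.isAfter(lastRunDateTime) &&
        !r.updatedOn.isAfter(currentRunDateTime)
      }
    }
}
```

Each source stays under its own key, so the transform step can pick the inputs it needs by name.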
- val fs: FileSystem
Default file system
- Definition Classes
- ETL
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
- def getLastRunDateFor(ds: DatasetConf)(implicit spark: SparkSession): LocalDateTime
If possible, fetches the last run date time from the dataset passed as an argument.
- ds
dataset
- spark
a spark session
- returns
the last run date or the minDateTime
- Definition Classes
- ETL
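The fallback described above ("the last run date or the minDateTime") can be sketched as a lookup with a default. This is an illustration only: the `history` map and the string dataset id are hypothetical, and the value used for `minDateTime` is an assumed placeholder, since this page does not give the real one or say how the last run date is stored.

```scala
import java.time.LocalDateTime

object LastRunSketch {
  // Assumed placeholder; the real value of ETL.minDateTime is not shown on this page.
  val minDateTime: LocalDateTime = LocalDateTime.of(1900, 1, 1, 0, 0)

  // Hypothetical record of past runs: dataset id -> last recorded run time.
  // Returns the recorded time if there is one, otherwise minDateTime.
  def getLastRunDateFor(
      datasetId: String,
      history: Map[String, LocalDateTime]
  ): LocalDateTime =
    history.getOrElse(datasetId, minDateTime)
}
```

Falling back to the earliest possible date means a first run treats all existing data as new.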
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
- val hpo_gene_set: DatasetConf
- val human_genes: DatasetConf
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def load(data: DataFrame, lastRunDateTime: LocalDateTime, currentRunDateTime: LocalDateTime)(implicit spark: SparkSession): DataFrame
Loads the output data into persistent storage.
- val maxDateTime: LocalDateTime
- Definition Classes
- ETL
- val minDateTime: LocalDateTime
- Definition Classes
- ETL
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
- val omim_gene_set: DatasetConf
- val orphanet_gene_set: DatasetConf
- def publish()(implicit spark: SparkSession): Unit
OPTIONAL - Contains all the actions needed to make the data available to users, such as creating a view on the data.
- spark
an instance of SparkSession
- Definition Classes
- ETL
- def reset(): Unit
Resets the ETL by removing the destination dataset.
- Definition Classes
- ETL
- def run()(implicit spark: SparkSession): DataFrame
Entry point of the ETL. Execute this method to run the whole ETL.
- spark
an instance of SparkSession
- Definition Classes
- ETL
- def run(lastRunDateTime: LocalDateTime = minDateTime, currentRunDateTime: LocalDateTime = LocalDateTime.now())(implicit spark: SparkSession): DataFrame
Entry point of the ETL. Execute this method to run the whole ETL for a specific date.
- lastRunDateTime
the last time this ETL was run. Defaults to minDateTime.
- currentRunDateTime
the time at which the ETL needs to be run, usually now().
- spark
an instance of SparkSession
- Definition Classes
- ETL
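The following is a minimal sketch of how an entry point with these defaults might tie the steps together. It assumes run calls extract, transform and load in that order, which this page does not confirm. `MiniETL` and `UpperCaseGenes` are hypothetical, `Seq[String]` stands in for DataFrame, the implicit SparkSession is left out, and the `minDateTime` value is a placeholder.

```scala
import java.time.LocalDateTime

object RunSketch {
  // Assumed placeholder; the real value of ETL.minDateTime is not shown on this page.
  val minDateTime: LocalDateTime = LocalDateTime.of(1900, 1, 1, 0, 0)

  // Hypothetical miniature of the ETL shape, with Seq[String] in place of DataFrame.
  trait MiniETL {
    def extract(lastRun: LocalDateTime, currentRun: LocalDateTime): Map[String, Seq[String]]
    def transform(data: Map[String, Seq[String]], lastRun: LocalDateTime, currentRun: LocalDateTime): Seq[String]
    def load(data: Seq[String], lastRun: LocalDateTime, currentRun: LocalDateTime): Seq[String]

    // Assumed orchestration: extract, then transform, then load.
    def run(
        lastRun: LocalDateTime = minDateTime,
        currentRun: LocalDateTime = LocalDateTime.now()
    ): Seq[String] = {
      val inputs = extract(lastRun, currentRun)
      val output = transform(inputs, lastRun, currentRun)
      load(output, lastRun, currentRun)
    }
  }

  // Toy implementation: upper-cases gene symbols and keeps the result in memory.
  object UpperCaseGenes extends MiniETL {
    var stored: Seq[String] = Seq.empty

    def extract(lastRun: LocalDateTime, currentRun: LocalDateTime): Map[String, Seq[String]] =
      Map("human_genes" -> Seq("brca1", "tp53"))

    def transform(data: Map[String, Seq[String]], lastRun: LocalDateTime, currentRun: LocalDateTime): Seq[String] =
      data("human_genes").map(_.toUpperCase)

    def load(data: Seq[String], lastRun: LocalDateTime, currentRun: LocalDateTime): Seq[String] = {
      stored = data
      data
    }
  }
}
```

Calling `RunSketch.UpperCaseGenes.run()` uses both defaults, so the run covers everything from `minDateTime` up to the current time. Passing an explicit `lastRun` narrows the window.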
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- def transform(data: Map[String, DataFrame], lastRunDateTime: LocalDateTime, currentRunDateTime: LocalDateTime)(implicit spark: SparkSession): DataFrame
Takes a Map[String, DataFrame] as input and applies a set of transformations to it to produce the ETL output. It is recommended not to read any additional data here, but to use the extract() method to inject input data instead.
- data
input data
- spark
an instance of SparkSession
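As an illustration of a transform step that works only on the map it receives, the sketch below joins two inputs in memory. It is not the Genes implementation. The `Rows` alias stands in for DataFrame, the column names `symbol`, `omim_id` and `omim_ids` are hypothetical, and the map keys reuse the `human_genes` and `omim_gene_set` member names on the assumption that inputs are keyed by dataset id.

```scala
object TransformSketch {
  // Hypothetical stand-in for a DataFrame: a sequence of column -> value rows.
  type Rows = Seq[Map[String, String]]

  // Left-joins gene rows with OMIM rows on "symbol".
  // Everything it needs comes from `data`; nothing else is read.
  def transform(data: Map[String, Rows]): Rows = {
    // Index the OMIM ids by gene symbol.
    val omimBySymbol: Map[String, Seq[String]] =
      data("omim_gene_set")
        .groupBy(row => row("symbol"))
        .map { case (symbol, rows) => symbol -> rows.map(row => row("omim_id")) }

    // Keep every gene; genes without a match get an empty "omim_ids" value.
    data("human_genes").map { gene =>
      val ids = omimBySymbol.getOrElse(gene("symbol"), Seq.empty)
      gene + ("omim_ids" -> ids.mkString(","))
    }
  }
}
```

Because the function depends only on its argument, it can be exercised with small hand-built inputs, which is one practical reason to keep reads inside extract.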
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] ) @Deprecated
- Deprecated