
object VcfLoader extends Loader

Linear Supertypes
Loader, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native() @HotSpotIntrinsicCandidate()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  8. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  9. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  10. def insert(location: String, databaseName: String, tableName: String, updates: DataFrame, partitioning: List[String], format: String, options: Map[String, String])(implicit spark: SparkSession): DataFrame

    Insert or append data into a table. Does not resolve duplicates.

    location

    full path of where the data will be located

    databaseName

    database name

    tableName

    the name of the updated/created table

    updates

    new data to be merged with existing data

    partitioning

    how the data should be partitioned

    format

    spark format

    options

    write options

    spark

    a valid spark session

    returns

    the data as a dataframe

    Definition Classes
    VcfLoader → Loader
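    For illustration, an append-only load might look like the sketch below. The path, database, table, and VCF-like column names are assumptions, not part of the API, and an active SparkSession is assumed:

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}

    implicit val spark: SparkSession = SparkSession.builder()
      .appName("vcf-insert-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical new records to append.
    val updates: DataFrame = Seq(
      ("chr1", 12345L, "A", "G"),
      ("chr2", 67890L, "C", "T")
    ).toDF("chromosome", "position", "reference", "alternate")

    // Append the records; duplicates are NOT resolved.
    val result = VcfLoader.insert(
      location     = "/data/raw/variants", // hypothetical path
      databaseName = "genomics",           // hypothetical database
      tableName    = "variants",
      updates      = updates,
      partitioning = List("chromosome"),
      format       = "delta",
      options      = Map.empty[String, String]
    )
    ```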
  11. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  12. val log: Logger
    Definition Classes
    Loader
  13. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  14. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  15. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  16. def overwritePartition(location: String, databaseName: String, tableName: String, df: DataFrame, partitioning: List[String], format: String, options: Map[String, String])(implicit spark: SparkSession): DataFrame

    Keeps existing partitions and overwrites only the partitions present in the new data.

    location

    where to write the data

    databaseName

    database name

    tableName

    table name

    df

    new data to write into the table

    partitioning

    how the data is partitioned

    format

    spark format

    options

    write options

    spark

    a spark session

    returns

    updated data

    Definition Classes
    VcfLoader → Loader
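    A sketch of a partition-level refresh; the path, database, table, and column names below are hypothetical:

    ```scala
    import org.apache.spark.sql.SparkSession

    implicit val spark: SparkSession = SparkSession.builder()
      .appName("overwrite-partition-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Recomputed data for the chromosome=chr1 partition only.
    val df = Seq(("chr1", 12345L, "A", "G"))
      .toDF("chromosome", "position", "reference", "alternate")

    // Only the partitions present in df (here chromosome=chr1) are
    // overwritten; all other existing partitions are kept as-is.
    VcfLoader.overwritePartition(
      location     = "/data/curated/variants", // hypothetical path
      databaseName = "genomics",               // hypothetical database
      tableName    = "variants",
      df           = df,
      partitioning = List("chromosome"),
      format       = "delta",
      options      = Map.empty
    )
    ```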
  17. def read(location: String, format: String, readOptions: Map[String, String], databaseName: Option[String], tableName: Option[String])(implicit spark: SparkSession): DataFrame

    Default read logic for a loader.

    location

    absolute path of where the data is

    format

    string representing the format

    readOptions

    read options

    databaseName

    Optional database name

    tableName

    Optional table name

    spark

    spark session

    returns

    the data as a dataframe

    Definition Classes
    VcfLoader → Loader
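    A minimal read sketch; the location and database/table names are hypothetical:

    ```scala
    import org.apache.spark.sql.SparkSession

    implicit val spark: SparkSession = SparkSession.builder()
      .appName("read-example")
      .master("local[*]")
      .getOrCreate()

    // Read the dataset back; database and table names are optional.
    val variants = VcfLoader.read(
      location     = "/data/curated/variants", // hypothetical path
      format       = "delta",
      readOptions  = Map.empty,
      databaseName = Some("genomics"),         // hypothetical database
      tableName    = Some("variants")
    )
    variants.show(5)
    ```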
  18. def scd1(location: String, databaseName: String, tableName: String, updates: DataFrame, primaryKeys: Seq[String], oidName: String, createdOnName: String, updatedOnName: String, partitioning: List[String], format: String, options: Map[String, String])(implicit spark: SparkSession): DataFrame

    Updates the data only if it has changed and inserts new data. Maintains updatedOn and createdOn timestamps for each record. Usually used for dimension tables for which keeping the full history is not required.

    location

    full path of where the data will be located

    databaseName

    database name

    tableName

    the name of the updated/created table

    updates

    new data to be merged with existing data

    primaryKeys

    name of the columns holding the unique id

    oidName

    name of the column holding the hash of the column that can change over time (or version number)

    createdOnName

    name of the column holding the creation timestamp

    updatedOnName

    name of the column holding the last update timestamp

    partitioning

    column(s) used for partitioning

    format

    spark format

    options

    write options

    spark

    a valid spark session

    returns

    the data as a dataframe

    Definition Classes
    VcfLoader → Loader
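    A call sketch for a type-1 slowly changing dimension; all paths, database, table, and column names are assumptions:

    ```scala
    import org.apache.spark.sql.SparkSession

    implicit val spark: SparkSession = SparkSession.builder()
      .appName("scd1-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Dimension updates; "oid" is assumed to be a precomputed hash of
    // the mutable columns.
    val updates = Seq(("s1", "hash-1", "Sample One"))
      .toDF("sample_id", "oid", "label")

    // Rows whose oid changed are updated in place (updatedOn refreshed);
    // unseen primary keys are inserted (createdOn set). No history kept.
    VcfLoader.scd1(
      location      = "/data/dim/samples", // hypothetical path
      databaseName  = "genomics",          // hypothetical database
      tableName     = "dim_samples",
      updates       = updates,
      primaryKeys   = Seq("sample_id"),
      oidName       = "oid",
      createdOnName = "createdOn",
      updatedOnName = "updatedOn",
      partitioning  = List.empty,
      format        = "delta",
      options       = Map.empty
    )
    ```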
  19. def scd2(location: String, databaseName: String, tableName: String, updates: DataFrame, primaryKeys: Seq[String], buidName: String, oidName: String, isCurrentName: String, partitioning: List[String], format: String, validFromName: String, validToName: String, options: Map[String, String], minValidFromDate: LocalDate, maxValidToDate: LocalDate)(implicit spark: SparkSession): DataFrame

    Updates the data only if it has changed and inserts new data. When a record has changed, a new line is created while the old line is kept. Usually used for dimension tables for which keeping the full history is required.

    location

    full path of where the data will be located

    databaseName

    database name

    tableName

    the name of the updated/created table

    updates

    new data to be merged with existing data

    primaryKeys

    name of the columns holding the unique id

    buidName

    name of the column holding the hash of the column that can change over time (or version number)

    oidName

    name of the column holding the hash of the column that can change over time (or version number)

    isCurrentName

    name of the column for the current version flag

    partitioning

    list of columns used for partition

    format

    spark format

    validFromName

    name of the column holding the start of the validity period

    validToName

    name of the column holding the end of the validity period

    options

    write options

    minValidFromDate

    lower bound used for the validity period

    maxValidToDate

    upper bound used for the validity period

    spark

    a spark session

    Definition Classes
    VcfLoader → Loader
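    A call sketch for a type-2 slowly changing dimension; all paths, database, table, and column names (and the sentinel dates) are assumptions:

    ```scala
    import java.time.LocalDate
    import org.apache.spark.sql.SparkSession

    implicit val spark: SparkSession = SparkSession.builder()
      .appName("scd2-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val updates = Seq(("s1", "buid-1", "hash-2", "Sample One v2"))
      .toDF("sample_id", "buid", "oid", "label")

    // When a record changes, the previous version is kept (isCurrent set
    // to false, validTo closed) and a new current version is inserted.
    VcfLoader.scd2(
      location         = "/data/dim/samples", // hypothetical path
      databaseName     = "genomics",          // hypothetical database
      tableName        = "dim_samples",
      updates          = updates,
      primaryKeys      = Seq("sample_id"),
      buidName         = "buid",
      oidName          = "oid",
      isCurrentName    = "isCurrent",
      partitioning     = List.empty,
      format           = "delta",
      validFromName    = "validFrom",
      validToName      = "validTo",
      options          = Map.empty,
      minValidFromDate = LocalDate.of(1900, 1, 1),
      maxValidToDate   = LocalDate.of(9999, 12, 31)
    )
    ```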
  20. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  21. def toString(): String
    Definition Classes
    AnyRef → Any
  22. def upsert(location: String, databaseName: String, tableName: String, updates: DataFrame, primaryKeys: Seq[String], partitioning: List[String], format: String, options: Map[String, String])(implicit spark: SparkSession): DataFrame

    Updates or inserts data into a table. Resolves duplicates using the list of primary keys passed as argument.

    location

    full path of where the data will be located

    databaseName

    database name

    tableName

    the name of the updated/created table

    updates

    new data to be merged with existing data

    primaryKeys

    name of the columns holding the unique id

    partitioning

    how the data should be partitioned

    format

    spark format

    options

    write options

    spark

    a valid spark session

    returns

    the data as a dataframe

    Definition Classes
    VcfLoader → Loader
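    An upsert call sketch; the path, database, table, and column names are hypothetical:

    ```scala
    import org.apache.spark.sql.SparkSession

    implicit val spark: SparkSession = SparkSession.builder()
      .appName("upsert-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val updates = Seq(("chr1", 12345L, "A", "G"))
      .toDF("chromosome", "position", "reference", "alternate")

    // Rows matching on the primary keys are updated; the rest are inserted.
    VcfLoader.upsert(
      location     = "/data/curated/variants", // hypothetical path
      databaseName = "genomics",               // hypothetical database
      tableName    = "variants",
      updates      = updates,
      primaryKeys  = Seq("chromosome", "position"),
      partitioning = List("chromosome"),
      format       = "delta",
      options      = Map.empty
    )
    ```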
  23. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  24. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  25. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  26. def writeOnce(location: String, databaseName: String, tableName: String, df: DataFrame, partitioning: List[String], format: String, options: Map[String, String])(implicit spark: SparkSession): DataFrame

    Overwrites the data located in output/tableName. Usually used for small/test tables.

    location

    where to write the data

    databaseName

    database name

    tableName

    table name

    df

    new data to write into the table

    partitioning

    how the data is partitioned

    format

    spark format

    options

    write options

    spark

    a spark session

    returns

    updated data

    Definition Classes
    VcfLoader → Loader
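    A full-overwrite call sketch for a small or test table; the path, database, table, and column names are hypothetical:

    ```scala
    import org.apache.spark.sql.SparkSession

    implicit val spark: SparkSession = SparkSession.builder()
      .appName("write-once-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("chr1", 12345L, "A", "G"))
      .toDF("chromosome", "position", "reference", "alternate")

    // Fully replaces the table contents with df.
    VcfLoader.writeOnce(
      location     = "/tmp/test/variants", // hypothetical path
      databaseName = "test",               // hypothetical database
      tableName    = "variants",
      df           = df,
      partitioning = List.empty,
      format       = "delta",
      options      = Map.empty
    )
    ```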

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] ) @Deprecated
    Deprecated
