object GenericLoader extends Loader
- By Inheritance
- GenericLoader
- Loader
- AnyRef
- Any
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @HotSpotIntrinsicCandidate()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
insert(location: String, databaseName: String, tableName: String, updates: DataFrame, partitioning: List[String], format: String, options: Map[String, String])(implicit spark: SparkSession): DataFrame
Inserts or appends data into a table. Does not resolve duplicates.
- location
full path of where the data will be located
- databaseName
database name
- tableName
the name of the updated/created table
- updates
new data to be merged with existing data
- partitioning
how the data should be partitioned
- format
spark format
- options
write options
- spark
a valid spark session
- returns
the data as a dataframe
- Definition Classes
- GenericLoader → Loader
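The append semantics above (no duplicate resolution) can be sketched with plain Scala collections; this is a hypothetical illustration, not the Spark implementation, and `Rec`/`appendRows` are made-up names:

```scala
// Hypothetical sketch of insert/append semantics: incoming rows are appended
// as-is, so a row whose id already exists is NOT replaced (contrast with upsert).
case class Rec(id: Int, value: String)

def appendRows(existing: Seq[Rec], updates: Seq[Rec]): Seq[Rec] =
  existing ++ updates
```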
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
val
log: Logger
- Definition Classes
- Loader
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
-
def
overwritePartition(location: String, databaseName: String, tableName: String, df: DataFrame, partitioning: List[String], format: String, options: Map[String, String])(implicit spark: SparkSession): DataFrame
Keeps existing partitions and overwrites only the partitions present in the new data.
- location
where to write the data
- databaseName
database name
- tableName
table name
- df
new data to write into the table
- partitioning
how the data is partitioned
- format
format
- options
write options
- spark
a spark session
- returns
updated data
- Definition Classes
- GenericLoader → Loader
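The partition-overwrite semantics can be sketched with a plain Scala `Map` keyed by partition value; `overwritePartitions` is a hypothetical helper assuming dynamic-partition-overwrite behaviour, not the actual implementation:

```scala
// Hypothetical sketch: only partitions present in the incoming data are
// replaced; partitions absent from it keep their existing rows.
def overwritePartitions[K, V](existing: Map[K, V], incoming: Map[K, V]): Map[K, V] =
  existing ++ incoming
```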
-
def
read(location: String, format: String, readOptions: Map[String, String], databaseName: Option[String], tableName: Option[String])(implicit spark: SparkSession): DataFrame
Default read logic for a loader
Default read logic for a loader
- location
absolute path of where the data is
- format
string representing the format
- readOptions
read options
- databaseName
Optional database name
- tableName
Optional table name
- spark
spark session
- returns
the data as a dataframe
- Definition Classes
- GenericLoader → Loader
-
def
scd1(location: String, databaseName: String, tableName: String, updates: DataFrame, primaryKeys: Seq[String], oidName: String, createdOnName: String, updatedOnName: String, partitioning: List[String], format: String, options: Map[String, String])(implicit spark: SparkSession): DataFrame
Updates the data only if it has changed and inserts new data. Maintains createdOn and updatedOn timestamps for each record. Usually used for dimension tables for which keeping the full history is not required.
- location
full path of where the data will be located
- databaseName
database name
- tableName
the name of the updated/created table
- updates
new data to be merged with existing data
- primaryKeys
name of the columns holding the unique id
- oidName
name of the column holding the hash of the column that can change over time (or version number)
- createdOnName
name of the column holding the creation timestamp
- updatedOnName
name of the column holding the last update timestamp
- partitioning
column(s) used for partitioning
- format
spark format
- options
write options
- spark
a valid spark session
- returns
the data as a dataframe
- Definition Classes
- GenericLoader → Loader
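The SCD type 1 merge described above can be sketched with plain Scala collections; `Dim` and `scd1Merge` are hypothetical names, and timestamps are simplified to `Int`s:

```scala
// Hypothetical sketch of SCD1 merge: a record whose oid (change hash) differs
// is overwritten in place, createdOn is preserved and updatedOn refreshed;
// unchanged records are kept as-is and no history is retained.
case class Dim(id: Int, oid: String, value: String, createdOn: Int, updatedOn: Int)

def scd1Merge(existing: Seq[Dim], updates: Seq[Dim], now: Int): Seq[Dim] = {
  val byId = existing.map(d => d.id -> d).toMap
  val merged = updates.foldLeft(byId) { (acc, u) =>
    acc.get(u.id) match {
      case Some(old) if old.oid == u.oid => acc                                            // unchanged: keep old row
      case Some(old) => acc + (u.id -> u.copy(createdOn = old.createdOn, updatedOn = now)) // changed: overwrite
      case None      => acc + (u.id -> u.copy(createdOn = now, updatedOn = now))           // new: insert
    }
  }
  merged.values.toSeq.sortBy(_.id)
}
```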
-
def
scd2(location: String, databaseName: String, tableName: String, updates: DataFrame, primaryKeys: Seq[String], buidName: String, oidName: String, isCurrentName: String, partitioning: List[String], format: String, validFromName: String, validToName: String, options: Map[String, String], minValidFromDate: LocalDate, maxValidToDate: LocalDate)(implicit spark: SparkSession): DataFrame
Updates the data only if it has changed and inserts new data. Keeps the full history of each record (slowly changing dimension, type 2): the previous version is closed (its validTo and isCurrent columns are updated) and the new version is inserted as the current one.
- location
full path of where the data will be located
- databaseName
database name
- tableName
the name of the updated/created table
- updates
new data to be merged with existing data
- primaryKeys
name of the columns holding the unique id
- buidName
name of the column holding the business unique id
- oidName
name of the column holding the hash of the column that can change over time (or version number)
- isCurrentName
name of the column for the current version flag
- partitioning
list of columns used for partitioning
- format
spark format
- validFromName
name of the column holding the validity start date
- validToName
name of the column holding the validity end date
- options
write options
- minValidFromDate
minimum date used for the validFrom column
- maxValidToDate
maximum date used for the validTo column
- spark
a valid spark session
- returns
the data as a dataframe
- Definition Classes
- GenericLoader → Loader
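The SCD type 2 merge for a single changed record can be sketched with plain Scala collections; `Version` and `scd2Merge` are hypothetical names, and dates are simplified to `Int`s:

```scala
// Hypothetical sketch of SCD2 merge: when the oid (change hash) differs, the
// current version is closed (validTo set to now, isCurrent cleared) and the
// new version is inserted as current, valid until maxValidTo.
case class Version(id: Int, oid: String, value: String,
                   validFrom: Int, validTo: Int, isCurrent: Boolean)

def scd2Merge(history: Seq[Version], update: Version, now: Int, maxValidTo: Int): Seq[Version] = {
  val (current, rest) = history.partition(v => v.id == update.id && v.isCurrent)
  current.headOption match {
    case Some(cur) if cur.oid == update.oid => history // unchanged: keep history as-is
    case Some(cur) =>
      rest ++ Seq(cur.copy(validTo = now, isCurrent = false),
                  update.copy(validFrom = now, validTo = maxValidTo, isCurrent = true))
    case None =>
      history :+ update.copy(validFrom = now, validTo = maxValidTo, isCurrent = true)
  }
}
```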
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
def
upsert(location: String, databaseName: String, tableName: String, updates: DataFrame, primaryKeys: Seq[String], partitioning: List[String], format: String, options: Map[String, String])(implicit spark: SparkSession): DataFrame
Updates or inserts data into a table. Resolves duplicates using the list of primary keys passed as argument.
- location
full path of where the data will be located
- databaseName
database name
- tableName
the name of the updated/created table
- updates
new data to be merged with existing data
- primaryKeys
name of the columns holding the unique id
- partitioning
how the data should be partitioned
- format
spark format
- options
write options
- spark
a valid spark session
- returns
the data as a dataframe
- Definition Classes
- GenericLoader → Loader
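The duplicate-resolution semantics of upsert can be sketched with plain Scala collections, keying rows by their primary key; `Row` and `upsertByKey` are hypothetical names, not part of the API:

```scala
// Hypothetical sketch of upsert semantics: incoming rows win on primary-key
// collision; existing rows with no matching update are kept unchanged.
case class Row(id: Int, value: String)

def upsertByKey(existing: Seq[Row], updates: Seq[Row]): Seq[Row] =
  (existing.map(r => r.id -> r).toMap ++ updates.map(r => r.id -> r).toMap)
    .values.toSeq.sortBy(_.id)
```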
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
def
write(df: DataFrame, format: String, mode: SaveMode, partitioning: List[String], databaseName: String, tableName: String, location: String, options: Map[String, String]): DataFrame
-
def
writeOnce(location: String, databaseName: String, tableName: String, df: DataFrame, partitioning: List[String], format: String, options: Map[String, String])(implicit spark: SparkSession): DataFrame
Overwrites the data located in output/tableName. Usually used for small or test tables.
- location
where to write the data
- databaseName
database name
- tableName
table name
- df
new data to write into the table
- partitioning
how the data is partitioned
- format
format
- options
write options
- spark
a spark session
- returns
updated data
- Definition Classes
- GenericLoader → Loader
Deprecated Value Members
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] ) @Deprecated
- Deprecated