object DeltaLoader extends Loader
Linear Supertypes: Loader, AnyRef, Any
Value Members
- def insert(location: String, databaseName: String, tableName: String, updates: DataFrame, partitioning: List[String], format: String)(implicit spark: SparkSession): DataFrame
Insert or append data into a table. Does not resolve duplicates.
- location
full path of where the data will be located
- databaseName
the name of the database containing the table
- tableName
the name of the updated/created table
- updates
new data to be merged with existing data
- partitioning
the columns used to partition the table
- format
the storage format (e.g. "delta")
- spark
a valid spark session
- returns
the data as a dataframe
- Definition Classes
- DeltaLoader → Loader
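A minimal usage sketch of `insert`, assuming a local SparkSession; the path, database, table, and column names below are illustrative, not part of the API:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical setup: a local session and a few new rows to append.
implicit val spark: SparkSession = SparkSession.builder()
  .appName("delta-loader-insert")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

val updates: DataFrame = Seq((1, "a"), (2, "b")).toDF("id", "value")

// Appends the rows; duplicates against existing data are NOT resolved.
val result: DataFrame = DeltaLoader.insert(
  location     = "/data/lake/sales/events",  // hypothetical path
  databaseName = "sales",
  tableName    = "events",
  updates      = updates,
  partitioning = List("id"),
  format       = "delta"
)
```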
- def read(location: String, format: String, readOptions: Map[String, String])(implicit spark: SparkSession): DataFrame
Default read logic for a loader.
- location
absolute path of where the data is
- format
string representing the format
- readOptions
read options passed to the underlying reader
- spark
a valid spark session
- returns
the data as a dataframe
- Definition Classes
- DeltaLoader → Loader
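A usage sketch of `read`, assuming an implicit SparkSession is already in scope; the path and option shown are illustrative:

```scala
// Assumes: implicit val spark: SparkSession is in scope.
// Reads the table back as a DataFrame; "versionAsOf" is Delta Lake's
// time-travel read option, shown here only as an example option.
val df = DeltaLoader.read(
  location    = "/data/lake/sales/events",  // hypothetical path
  format      = "delta",
  readOptions = Map("versionAsOf" -> "0")
)
```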
- def scd1(location: String, databaseName: String, tableName: String, updates: DataFrame, primaryKeys: Seq[String], oidName: String, createdOnName: String, updatedOnName: String, partitioning: List[String], format: String)(implicit spark: SparkSession): DataFrame
Updates existing records only if the data has changed and inserts new records, maintaining updatedOn and createdOn timestamps for each record. Usually used for dimension tables for which keeping the full history is not required.
- location
full path of where the data will be located
- databaseName
the name of the database containing the table
- tableName
the name of the updated/created table
- updates
new data to be merged with existing data
- primaryKeys
name of the columns holding the unique id
- oidName
name of the column holding the hash of the columns that can change over time (or version number)
- createdOnName
name of the column holding the creation timestamp
- updatedOnName
name of the column holding the last update timestamp
- partitioning
the columns used to partition the table
- format
the storage format (e.g. "delta")
- spark
a valid spark session
- returns
the data as a dataframe
- Definition Classes
- DeltaLoader → Loader
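A usage sketch of `scd1` (type-1 slowly changing dimension: changed rows are overwritten in place). It assumes an implicit SparkSession and a `customerUpdates` DataFrame; all names below are illustrative:

```scala
// Assumes: implicit spark session in scope and a `customerUpdates`
// DataFrame carrying customer_id, row_hash, created_on and updated_on.
val dim = DeltaLoader.scd1(
  location      = "/data/lake/sales/dim_customer", // hypothetical path
  databaseName  = "sales",
  tableName     = "dim_customer",
  updates       = customerUpdates,
  primaryKeys   = Seq("customer_id"),
  oidName       = "row_hash",      // hash of the mutable columns
  createdOnName = "created_on",
  updatedOnName = "updated_on",
  partitioning  = List.empty,
  format        = "delta"
)
```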
- def scd2(location: String, databaseName: String, tableName: String, updates: DataFrame, primaryKeys: Seq[String], oidName: String, createdOnName: String, updatedOnName: String, partitioning: List[String], format: String, validFromName: String, validToName: String, minValidFromDate: LocalDate, maxValidToDate: LocalDate)(implicit spark: SparkSession): DataFrame
Updates existing records only if the data has changed and inserts new records, maintaining updatedOn and createdOn timestamps for each record. Usually used for dimension tables for which keeping the full history is required.
- location
full path of where the data will be located
- databaseName
the name of the database containing the table
- tableName
the name of the updated/created table
- updates
new data to be merged with existing data
- primaryKeys
name of the columns holding the unique id
- oidName
name of the column holding the hash of the columns that can change over time (or version number)
- createdOnName
name of the column holding the creation timestamp
- updatedOnName
name of the column holding the last update timestamp
- partitioning
the columns used to partition the table
- format
the storage format (e.g. "delta")
- validFromName
name of the column holding the start of the record's validity period
- validToName
name of the column holding the end of the record's validity period
- minValidFromDate
lower bound used for the validity start date
- maxValidToDate
upper bound used for the validity end date
- spark
a valid spark session
- returns
the data as a dataframe
- Definition Classes
- DeltaLoader → Loader
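A usage sketch of `scd2` (type-2 slowly changing dimension: each change closes the old row's validity window and opens a new one). It assumes an implicit SparkSession and a `customerUpdates` DataFrame; paths, names, and the sentinel dates are illustrative:

```scala
import java.time.LocalDate

// Assumes: implicit spark session in scope and a `customerUpdates`
// DataFrame with the dimension columns named below.
val dim = DeltaLoader.scd2(
  location         = "/data/lake/sales/dim_customer", // hypothetical path
  databaseName     = "sales",
  tableName        = "dim_customer",
  updates          = customerUpdates,
  primaryKeys      = Seq("customer_id"),
  oidName          = "row_hash",
  createdOnName    = "created_on",
  updatedOnName    = "updated_on",
  partitioning     = List.empty,
  format           = "delta",
  validFromName    = "valid_from",
  validToName      = "valid_to",
  // Illustrative sentinel bounds for the validity window:
  minValidFromDate = LocalDate.of(1900, 1, 1),
  maxValidToDate   = LocalDate.of(9999, 12, 31)
)
```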
- def upsert(location: String, databaseName: String, tableName: String, updates: DataFrame, primaryKeys: Seq[String], partitioning: List[String], format: String)(implicit spark: SparkSession): DataFrame
Updates or inserts data into a table. Resolves duplicates using the list of primary keys passed as argument.
- location
full path of where the data will be located
- databaseName
the name of the database containing the table
- tableName
the name of the updated/created table
- updates
new data to be merged with existing data
- primaryKeys
name of the columns holding the unique id
- partitioning
the columns used to partition the table
- format
the storage format (e.g. "delta")
- spark
a valid spark session
- returns
the data as a dataframe
- Definition Classes
- DeltaLoader → Loader
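A usage sketch of `upsert`, assuming an implicit SparkSession and an `updates` DataFrame; the path and names are illustrative:

```scala
// Assumes: implicit spark session in scope and an `updates` DataFrame.
// Rows whose primary keys already exist are updated; new keys are inserted.
val merged = DeltaLoader.upsert(
  location     = "/data/lake/sales/events",  // hypothetical path
  databaseName = "sales",
  tableName    = "events",
  updates      = updates,
  primaryKeys  = Seq("id"),
  partitioning = List("id"),
  format       = "delta"
)
```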
- def writeOnce(location: String, databaseName: String, tableName: String, df: DataFrame, partitioning: List[String], format: String, dataChange: Boolean)(implicit spark: SparkSession): DataFrame
Overwrites the data located in output/tableName. Usually used for small/test tables.
- location
full path of where the data will be located
- databaseName
the name of the database containing the table
- tableName
the name of the table
- df
the data to write
- partitioning
the columns used to partition the table
- format
the storage format (e.g. "delta")
- dataChange
whether the data is expected to differ from the data already written
- spark
a valid spark session
- returns
the data as a dataframe
- Definition Classes
- DeltaLoader → Loader
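A usage sketch of `writeOnce`, assuming an implicit SparkSession and a small `df` DataFrame; the path and names are illustrative:

```scala
// Assumes: implicit spark session in scope and a small `df` DataFrame.
// Fully overwrites the target table; suited to small or test tables.
val written = DeltaLoader.writeOnce(
  location     = "/data/lake/sales/lookup",  // hypothetical path
  databaseName = "sales",
  tableName    = "lookup",
  df           = df,
  partitioning = List.empty,
  format       = "delta",
  dataChange   = true  // the new data differs from what is already written
)
```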