case class DatasetConf(id: String, storageid: String, path: String, format: Format, loadtype: LoadType, table: Option[TableConf] = None, keys: List[String] = List(), partitionby: List[String] = List(), readoptions: Map[String, String] = Map(), writeoptions: Map[String, String] = WriteOptions.DEFAULT_OPTIONS, documentationpath: Option[String] = None, view: Option[TableConf] = None) extends Product with Serializable
Abstraction over a dataset configuration.
- storageid
An alias designating where the data is stored. In the configuration this can point to an object store URL such as s3://my-bucket/.
- path
The relative path from the root of the storage to the dataset, e.g. /raw/my-system/my-source.
- format
The data format.
- loadtype
How the data is written.
- table
OPTIONAL - configuration of a table associated with the dataset
- readoptions
OPTIONAL - read options to pass to Spark when reading the data into a DataFrame
- writeoptions
OPTIONAL - write options to pass to Spark when writing the data to files
- documentationpath
OPTIONAL - where the documentation is located
- view
OPTIONAL - schema of the view pointing to the concrete table
Linear Supertypes
- Serializable
- Serializable
- Product
- Equals
- AnyRef
- Any
Instance Constructors
- new DatasetConf(id: String, storageid: String, path: String, format: Format, loadtype: LoadType, table: Option[TableConf] = None, keys: List[String] = List(), partitionby: List[String] = List(), readoptions: Map[String, String] = Map(), writeoptions: Map[String, String] = WriteOptions.DEFAULT_OPTIONS, documentationpath: Option[String] = None, view: Option[TableConf] = None)
- storageid
An alias designating where the data is stored. In the configuration this can point to an object store URL such as s3://my-bucket/.
- path
The relative path from the root of the storage to the dataset, e.g. /raw/my-system/my-source.
- format
The data format.
- loadtype
How the data is written.
- table
OPTIONAL - configuration of a table associated with the dataset
- readoptions
OPTIONAL - read options to pass to Spark when reading the data into a DataFrame
- writeoptions
OPTIONAL - write options to pass to Spark when writing the data to files
- documentationpath
OPTIONAL - where the documentation is located
- view
OPTIONAL - schema of the view pointing to the concrete table
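As a usage illustration, a configuration for a raw dataset might look like the sketch below. It is self-contained only because stub types stand in for the library's Format, LoadType, TableConf and WriteOptions; the concrete values (DELTA, OverWrite, the TableConf argument shape) are assumptions, not confirmed by this page.

```scala
// Self-contained sketch: the stub types below stand in for the library's
// Format, LoadType, TableConf and WriteOptions. DELTA / OverWrite and the
// TableConf(database, name) shape are assumptions for illustration only.
sealed trait Format; case object DELTA extends Format
sealed trait LoadType; case object OverWrite extends LoadType
final case class TableConf(database: String, name: String)
object WriteOptions { val DEFAULT_OPTIONS: Map[String, String] = Map() }

// Signature copied from the class documented above.
final case class DatasetConf(
  id: String, storageid: String, path: String, format: Format, loadtype: LoadType,
  table: Option[TableConf] = None, keys: List[String] = List(),
  partitionby: List[String] = List(), readoptions: Map[String, String] = Map(),
  writeoptions: Map[String, String] = WriteOptions.DEFAULT_OPTIONS,
  documentationpath: Option[String] = None, view: Option[TableConf] = None)

val rawMySource = DatasetConf(
  id          = "raw_my_source",
  storageid   = "raw-store",                // alias resolved through the loaded Configuration
  path        = "/raw/my-system/my-source", // relative to the storage root
  format      = DELTA,
  loadtype    = OverWrite,
  table       = Some(TableConf("my_db", "my_table")),
  keys        = List("source_id"),
  partitionby = List("ingestion_date"),
  readoptions = Map("mergeSchema" -> "true"))

println(rawMySource.path)  // /raw/my-system/my-source
```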
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def buid: String
A dataset BUID is the column representing the business identifier. If the dataset defines a table, the buid column name is the table name followed by '_buid'.
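The naming rule above can be sketched as a small function. This is an illustrative model, not the library's code; in particular, the bare "buid" fallback used when no table is defined is an assumption.

```scala
// Simplified model of the BUID naming rule described above.
// TableConf here is a stand-in; the bare "buid" fallback is an assumption.
final case class TableConf(database: String, name: String)

def buidColumn(table: Option[TableConf]): String =
  table.map(t => s"${t.name}_buid").getOrElse("buid")

println(buidColumn(Some(TableConf("my_db", "my_table"))))  // my_table_buid
println(buidColumn(None))                                  // buid
```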
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @HotSpotIntrinsicCandidate()
- val documentationpath: Option[String]
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- val format: Format
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
- val id: String
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- val keys: List[String]
- val loadtype: LoadType
- def location(implicit config: Configuration): String
The absolute path where the dataset is stored.
- config
configuration currently loaded
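The resolution described above (storage alias plus relative path) can be sketched as follows. The Map-based lookup stands in for the library's Configuration object; the real resolution logic may differ.

```scala
// Illustrative sketch only: location = storage root (resolved from the
// storageid alias) + the dataset's relative path. The Map lookup stands in
// for the library's Configuration.
val storages: Map[String, String] = Map("raw-store" -> "s3://my-bucket")

def location(storageid: String, path: String): String =
  storages(storageid).stripSuffix("/") + "/" + path.stripPrefix("/")

println(location("raw-store", "/raw/my-system/my-source"))
// s3://my-bucket/raw/my-system/my-source
```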
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
- def oid: String
A dataset OID is the column representing the object identifier; in other words, it is a hash of all the columns except the BUID. If the dataset defines a table, the oid column name is the table name followed by '_oid'.
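A hypothetical model of the OID computation: hash every column value except the BUID. MD5 over sorted "col=value" pairs is an assumption; the library may use a different hash function and encoding.

```scala
// Hypothetical model of the OID: a hash over every column except the BUID.
// MD5 over sorted "col=value" pairs is an assumption about the scheme.
import java.security.MessageDigest

def oid(row: Map[String, String], buidColumn: String): String = {
  val payload = row.toSeq
    .filter { case (col, _) => col != buidColumn }
    .sortBy(_._1)                         // deterministic column ordering
    .map { case (col, v) => s"$col=$v" }
    .mkString("|")
  MessageDigest.getInstance("MD5")
    .digest(payload.getBytes("UTF-8"))
    .map("%02x".format(_))
    .mkString
}

val a = Map("my_table_buid" -> "b-1", "name" -> "alice", "age" -> "42")
val b = a.updated("my_table_buid", "b-2")            // only the BUID differs
println(oid(a, "my_table_buid") == oid(b, "my_table_buid"))  // true
```

Because the BUID is excluded, two rows that differ only in their business identifier share the same OID, while any change to another column changes it.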
- val partitionby: List[String]
- val path: String
- def read(implicit config: Configuration, spark: SparkSession): DataFrame
Using an instance of Spark and the current configuration, reads the dataset from either the tableName or the location.
- config
configuration currently loaded
- spark
instance of SparkSession
- val readoptions: Map[String, String]
- def rootPath(implicit config: Configuration): String
The absolute path of the root of the storage where the dataset is stored.
- config
configuration currently loaded
- val storageid: String
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- val table: Option[TableConf]
- def uid: String
A dataset UID is the column representing the unique identifier. If the dataset defines a table, the uid column name is the table name followed by '_uid'. In most cases the uid is the hash of the keys.
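"The uid is the hash of the keys" can be sketched as below. MD5 and the "|" separator are assumptions about the scheme, not the library's actual implementation.

```scala
// Sketch of the UID rule: hash the values of the key columns, in key order.
// MD5 and the "|" separator are assumptions about the scheme.
import java.security.MessageDigest

def uid(row: Map[String, String], keys: List[String]): String = {
  val payload = keys.map(row(_)).mkString("|")
  MessageDigest.getInstance("MD5")
    .digest(payload.getBytes("UTF-8"))
    .map("%02x".format(_))
    .mkString
}

val row = Map("source_id" -> "42", "name" -> "alice")
println(uid(row, List("source_id")).length)  // 32 (hex characters of an MD5 digest)
```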
- val view: Option[TableConf]
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- val writeoptions: Map[String, String]
Deprecated Value Members
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] ) @Deprecated
- Deprecated