object Storage extends Logging
Contains methods to create and open tables.
Created by Alexei Perelighin on 2018/04/11
Value Members
- final def !=(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def ##(): Int
  - Definition Classes: AnyRef → Any
- final def ==(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- final def asInstanceOf[T0]: T0
  - Definition Classes: Any
- def clone(): AnyRef
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()
- def createFileTable(sparkSession: SparkSession, basePath: Path, tableInfo: AuditTableInfo): Try[AuditTable]
  Creates a file table with default configurations; region ids will be Longs, zero-padded on the left to make them 20 characters.
  - basePath: parent folder which contains folders with table names
  - tableInfo: table metadata
  - Exceptions thrown: StorageException when: 1) primary keys are not specified; 2) the folder already exists
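As a minimal sketch of the region-id convention described above (the helper name `padRegionId` is illustrative, not part of the Storage API), a Long id zero-padded on the left to 20 characters can be produced with the `f` interpolator:

```scala
// Illustrative only: how a Long region id maps to a fixed-width,
// 20-character, left-zero-padded string, as createFileTable's
// default configuration describes.
def padRegionId(id: Long): String = f"$id%020d"
```

Fixed-width padding keeps region folder names lexicographically ordered by id, which is the usual motivation for this scheme.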
- def createFops(sparkSession: SparkSession, basePath: Path): FileStorageOps
  Creates a File Operations object that bridges the process actions and the actual storage, handling writes to a temporary location followed by moves to the permanent location.
  - basePath: parent folder which contains folders with table names; the .tmp and .Trash folders will be underneath it.
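The write-to-temp-then-move pattern mentioned above can be sketched with plain `java.nio.file` calls (this is an illustration of the general technique, not the actual FileStorageOps implementation):

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

// Sketch of the write-to-temp-then-move pattern: write the file under a
// temporary directory first, then atomically move it into the permanent
// location, so readers never observe a half-written file.
def writeViaTemp(tmpDir: Path, destDir: Path, name: String, bytes: Array[Byte]): Path = {
  val tmp = tmpDir.resolve(name)
  Files.write(tmp, bytes) // stage the full contents under the temp dir
  val dest = destDir.resolve(name)
  Files.move(tmp, dest, StandardCopyOption.ATOMIC_MOVE) // publish in one step
  dest
}
```

Atomic moves require source and destination to be on the same filesystem, which is why the temporary directory is kept under the same base path.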
- final def eq(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- def equals(arg0: Any): Boolean
  - Definition Classes: AnyRef → Any
- def finalize(): Unit
  - Attributes: protected[lang]
  - Definition Classes: AnyRef
  - Annotations: @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- def getOrCreateFileTables(sparkSession: SparkSession, basePath: Path, tableNames: Seq[String], metadataRetrieval: Option[(String) ⇒ AuditTableInfo], updateTableMetadata: ⇒ Boolean, includeHot: Boolean = true): Seq[AuditTable]
  Opens or creates storage layer tables. Creates a table if it does not already exist in the storage layer and the optional metadataRetrieval function is given. Fails if a table does not exist in the storage layer and the optional metadataRetrieval function is not given.
  - sparkSession: Spark Session object
  - basePath: base path of the storage directory
  - tableNames: the tables we want to open in the storage layer
  - metadataRetrieval: an optional function that generates table metadata from a table name. This function is used during table creation if a table does not exist in the storage layer, or to update the metadata if updateTableMetadata is set to true
  - updateTableMetadata: whether or not to update the table metadata
  - includeHot: whether or not to include hot partitions in the read
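The open-or-create decision described above can be sketched with stand-in types (the names `openOrCreate` and the String stand-ins for AuditTableInfo/AuditTable are illustrative, not Waimak's internals):

```scala
import scala.util.{Failure, Success, Try}

// Sketch of the open-or-create logic: open the table when it already
// exists; otherwise create it only if a metadataRetrieval function was
// supplied, and fail if it was not.
def openOrCreate(tableName: String,
                 existingTables: Set[String],
                 metadataRetrieval: Option[String => String]): Try[String] =
  if (existingTables.contains(tableName)) Success(s"opened:$tableName")
  else metadataRetrieval match {
    case Some(retrieve) => Success(s"created:${retrieve(tableName)}")
    case None => Failure(new IllegalStateException(
      s"Table $tableName does not exist and no metadataRetrieval function was given"))
  }
```

Passing `None` for metadataRetrieval therefore makes the call strictly open-only, which matches the documented failure mode.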
- def hashCode(): Int
  - Definition Classes: AnyRef → Any
  - Annotations: @native()
- final def isInstanceOf[T0]: Boolean
  - Definition Classes: Any
- def isTraceEnabled(): Boolean
  - Attributes: protected
  - Definition Classes: Logging
- def logAndReturn[A](a: A, msg: String, level: Level): A
  - Definition Classes: Logging
- def logAndReturn[A](a: A, message: (A) ⇒ String, level: Level): A
  - Definition Classes: Logging
- def logDebug(msg: ⇒ String, throwable: Throwable): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logDebug(msg: ⇒ String): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logError(msg: ⇒ String, throwable: Throwable): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logError(msg: ⇒ String): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logInfo(msg: ⇒ String, throwable: Throwable): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logInfo(msg: ⇒ String): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logName: String
  - Attributes: protected
  - Definition Classes: Logging
- def logTrace(msg: ⇒ String, throwable: Throwable): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logTrace(msg: ⇒ String): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logWarning(msg: ⇒ String, throwable: Throwable): Unit
  - Attributes: protected
  - Definition Classes: Logging
- def logWarning(msg: ⇒ String): Unit
  - Attributes: protected
  - Definition Classes: Logging
- final def ne(arg0: AnyRef): Boolean
  - Definition Classes: AnyRef
- final def notify(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- final def notifyAll(): Unit
  - Definition Classes: AnyRef
  - Annotations: @native()
- def openFileTables(sparkSession: SparkSession, basePath: Path, tableNames: Seq[String], includeHot: Boolean = true): (Map[String, Try[AuditTable]], Seq[String])
  Opens multiple tables, reads their metadata and reads the state of the table regions (ids, count of records per region, values of each region's latest timestamp).
  You can choose to read only the cold (compacted) state of the tables. This is useful to avoid consistency issues during development against tables into which data is currently flowing.
  - basePath: parent folder which contains folders with table names
  - tableNames: list of tables to open
  - includeHot: include freshly appended data that has not been compacted yet. Useful for using production data in development.
  - returns: (Map[TABLE NAME, AuditTable], Seq[MISSING TABLES]) - the audit table objects that exist, and the table names that were not found under the basePath
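Handling the paired return value can be sketched as follows, with Strings standing in for AuditTable objects (the helper `splitResult` is illustrative, not part of the API):

```scala
import scala.util.{Failure, Success, Try}

// Sketch of consuming openFileTables' result shape: a map of table name
// to Try[table], plus the list of names missing under basePath. Splits
// the map into successfully opened tables and tables that failed to open.
def splitResult(result: (Map[String, Try[String]], Seq[String])): (Seq[String], Seq[String], Seq[String]) = {
  val (found, missing) = result
  val opened = found.collect { case (name, Success(_)) => name }.toSeq.sorted
  val failed = found.collect { case (name, t) if t.isFailure => name }.toSeq.sorted
  (opened, failed, missing)
}
```

Keeping missing tables separate from tables that failed to open lets a caller distinguish "never created" from "created but unreadable".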
- final def synchronized[T0](arg0: ⇒ T0): T0
  - Definition Classes: AnyRef
- def toString(): String
  - Definition Classes: AnyRef → Any
- final def wait(): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... )
- final def wait(arg0: Long): Unit
  - Definition Classes: AnyRef
  - Annotations: @throws( ... ) @native()
- def writeToFileTable(table: AuditTable, toAppend: Dataset[_], lastUpdatedCol: String, appendDateTime: ZonedDateTime, doCompaction: CompactionDecision, recompactAll: Boolean, trashMaxAge: Duration, smallRegionRowThreshold: Long, compactionPartitioner: CompactionPartitioner): Unit
  Writes a Dataset to the storage layer. This function lets you set various parameters directly (e.g. whether to recompact all regions, the compactor to use, the trash max age, and the region row threshold) and is called by an identical function that takes this information from a Spark context. Use this specific function if you want to set these configuration parameters directly in the API yourself; use the less specific function if you want these parameters handled for you.
  - table: the table to append data to
  - toAppend: dataset to append to the table
  - lastUpdatedCol: the last updated column in the Dataset
  - appendDateTime: timestamp of the append, zoned to a timezone
  - doCompaction: a lambda used to decide whether a compaction should happen after an append. Takes the list of table regions, the count of records added in this batch, and the compaction zoned date time.
  - recompactAll: whether to force a recompaction of all regions (expensive)
  - trashMaxAge: maximum age of a trashed region before it is deleted
  - smallRegionRowThreshold: row-count threshold above which a region is no longer considered for compaction
  - compactionPartitioner: the compaction partitioner to use when performing a compaction
- def writeToFileTable(flowContext: FlowContext, table: AuditTable, toAppend: Dataset[_], lastUpdatedCol: String, appendDateTime: ZonedDateTime, doCompaction: CompactionDecision): Unit
  Writes a Dataset to the storage layer. This function uses various configuration parameters given in the flowContext (e.g. whether to recompact all regions, the compactor to use, the trash max age, and the region row threshold) and calls the more specific overload. Use the more specific overload if you want to set these configuration parameters directly in the API yourself.
  - flowContext: flow context object
  - table: the table to append data to
  - toAppend: dataset to append to the table
  - lastUpdatedCol: the last updated column in the Dataset
  - appendDateTime: timestamp of the append, zoned to a timezone
  - doCompaction: a lambda used to decide whether a compaction should happen after an append. Takes the list of table regions, the count of records added in this batch, and the compaction zoned date time.
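The shape of a doCompaction decision lambda can be sketched with stand-in types. Everything here is an assumption for illustration: `(Boolean, Long)` pairs stand in for the real region descriptors, and the policy itself (compact when several small uncompacted regions accumulate, or when a large batch was just appended) is just one plausible choice, not Waimak's built-in behaviour.

```scala
// Illustrative compaction-decision lambda. A region is modelled as
// (isCompacted, rowCount); the real API passes richer region objects.
def shouldCompact(regions: Seq[(Boolean, Long)],
                  appendedCount: Long,
                  smallRegionRowThreshold: Long): Boolean = {
  // count uncompacted ("hot") regions that fall under the row threshold
  val smallHotRegions = regions.count {
    case (compacted, rows) => !compacted && rows < smallRegionRowThreshold
  }
  // compact once small hot regions pile up, or after a large append
  smallHotRegions > 1 || appendedCount >= smallRegionRowThreshold
}
```

A decision function like this keeps compaction policy out of the write path itself, so callers can tune it per table.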