package storage
Type Members
-
trait
AuditTable extends AnyRef
Main abstraction for an audit table that a client application must use to store records with a timestamp. It hides all details of the physical storage, so that client apps can use various file systems (e.g. HDFS, ADLS, S3, local) or key-value stores (e.g. HBase).
This abstraction can also produce a snapshot of the data, de-duplicated on the primary key and true to a specified moment in time.
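As a rough illustration of what such a snapshot means, the sketch below works on plain Scala collections rather than on the library's own types. The `Record` case class and the `snapshot` function are hypothetical names introduced only for this example; the real implementation may work quite differently.

```scala
import java.sql.Timestamp

// Hypothetical record shape: a primary key, a payload and the time of the last update.
final case class Record(id: String, value: String, lastUpdated: Timestamp)

object SnapshotSketch {
  // For every primary key, keep the latest record written at or before `asOf`.
  def snapshot(records: Seq[Record], asOf: Timestamp): Seq[Record] =
    records
      .filter(r => !r.lastUpdated.after(asOf)) // ignore anything newer than the requested moment
      .groupBy(_.id)                           // one group per primary key
      .values
      .map(_.maxBy(_.lastUpdated.getTime))     // latest surviving version of each key
      .toSeq
      .sortBy(_.id)                            // stable order for readability
}
```

Records written after `asOf` are dropped before de-duplication, so the result reflects the table as it looked at that moment rather than its current state.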
It also surfaces custom attributes initialised during table creation, so that client applications do not need to keep the relevant metadata in a separate store. This also simplifies backup, restore and sharing of data between environments.
Some storage layers can be quite inefficient at storing many appends in multiple files, and storage optimisation (compaction) should not interfere with the normal operation of the application. The application should therefore be able to control when compaction takes place.
An instance of AuditTable represents a functional state; once the data has been modified, do not use that instance again.
There are 2 types of operations on the table:
1. data extraction: operations that do not modify the state of the table, so the same instance of AuditTable can be used for multiple data extraction operations;
2. data mutators: operations that add data to the table or optimise storage. These lead to a new state of the underlying storage, and the same instance of AuditTable cannot be used for data mutators again.
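The functional-state idea described above can be sketched with an immutable value whose mutators return a new value. `TableState`, `append` and `snapshot` here are hypothetical stand-ins, not the AuditTable API, and the assumption that a mutator hands back a fresh instance is ours rather than something this page states.

```scala
// Hypothetical stand-in for an audit table handle; not the library's API.
final case class TableState(version: Int, rows: Vector[String]) {
  // Data extraction: reads the state and leaves it untouched, so it can be called repeatedly.
  def snapshot: Vector[String] = rows.distinct

  // Data mutator: never changes `this`; the caller continues with the returned state.
  def append(newRows: Seq[String]): TableState =
    TableState(version + 1, rows ++ newRows)
}

object FunctionalStateSketch {
  val initial: TableState = TableState(0, Vector.empty)
  // After a mutation, only `afterAppend` should be used for further work.
  val afterAppend: TableState = initial.append(Seq("a", "b", "a"))
}
```

Because `initial` is never modified, extraction calls on it stay valid, but any further mutation should start from `afterAppend`.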
Created by Alexei Perelighin on 2018/03/03
-
class
AuditTableFile extends AuditTable with Logging
Implementation of the AuditTable which is backed by append-only block storage like HDFS.
Created by Alexei Perelighin on 2018/03/03
-
case class
AuditTableInfo(table_name: String, primary_keys: Seq[String], meta: Map[String, String], retain_history: Boolean) extends Product with Serializable
Static information about the table that is persisted when the audit table is initialised.
- table_name
name of the table
- primary_keys
list of columns that make up the primary key; these will be used for snapshot generation and record deduplication
- meta
application/custom metadata that will not be used in this library.
- retain_history
whether to retain history for this table. If set to false, the table will be deduplicated on every compaction
-
case class
AuditTableRegionInfo(table_name: String, store_type: String, store_region: String, created_on: Timestamp, is_deprecated: Boolean, count: Long, max_last_updated: Timestamp) extends Product with Serializable
- table_name
name of the table
- store_type
cold or hot. Appended regions are added to hot and move into cold after compaction. Cold regions can also be compacted
- store_region
id of the region; for simplicity, at least for now, it is a GUID
- created_on
timestamp when region was created as a result of an append or compact operation
- is_deprecated
true - its data was compacted into another region, false - it was not compacted
- count
number of records in the region, can be used for optimisation and compaction decisions
- max_last_updated
all records in the audit table contain a column that shows the last updated time; this will be used to generate ingestion queries
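The fields above could feed compaction and ingestion decisions along the following lines. This is only a sketch: the case class is redeclared locally from the documented signature so that the example compiles on its own, the `RegionSketch` helpers are hypothetical, and the literal values "hot" and "cold" for store_type are an assumption.

```scala
import java.sql.Timestamp

// Local copy of the documented signature, so that the sketch is self-contained.
final case class AuditTableRegionInfo(table_name: String, store_type: String, store_region: String,
                                      created_on: Timestamp, is_deprecated: Boolean, count: Long,
                                      max_last_updated: Timestamp)

object RegionSketch {
  // Assumption: store_type holds the literal strings "hot" and "cold".
  def activeHot(regions: Seq[AuditTableRegionInfo]): Seq[AuditTableRegionInfo] =
    regions.filter(r => r.store_type == "hot" && !r.is_deprecated)

  // Number of records sitting in hot regions; a caller could compare this with a threshold.
  def pendingCount(regions: Seq[AuditTableRegionInfo]): Long =
    activeHot(regions).map(_.count).sum

  // Latest update time across non-deprecated regions, e.g. as a lower bound for the next ingestion query.
  def latestUpdate(regions: Seq[AuditTableRegionInfo]): Option[Timestamp] = {
    val live = regions.filterNot(_.is_deprecated)
    if (live.isEmpty) None else Some(live.map(_.max_last_updated).maxBy(_.getTime))
  }
}
```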
- trait CompactionPartitionerGenerator extends AnyRef
-
trait
FileStorageOps extends AnyRef
Contains operations that interact with physical storage. Will also handle commits to the file system.
Created by Alexei Perelighin on 2018/03/05
-
class
FileStorageOpsWithStaging extends FileStorageOps with Logging
Implementation around FileSystem and SparkSession with temporary and trash folders.
-
case class
StorageException(text: String, cause: Throwable = null) extends RuntimeException with Product with Serializable
Thrown by the storage layer.
Created by Alexei Perelighin on 2018/03/04
Value Members
- object AuditTable
- object AuditTableFile extends Logging
- object CompactionPartitionerGenerator
-
object
Storage extends Logging
Contains methods to create and open tables.
Created by Alexei Perelighin on 2018/04/11
-
object
StorageActions extends Logging
Created by Vicky Avison on 11/05/18.
-
object
TotalBytesPartitioner extends CompactionPartitionerGenerator
A compaction partitioner that partitions on the approximate maximum number of bytes to be in each partition file
-
object
TotalCellsPartitioner extends CompactionPartitionerGenerator
A compaction partitioner that partitions on the approximate maximum number of cells (numRows * numColumns) to be in each partition file
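Both partitioners come down to dividing a total size by an approximate per-partition maximum. The sketch below shows that arithmetic only; `PartitionCountSketch` and its method names are hypothetical, and how the real partitioners measure bytes or cells is not shown on this page.

```scala
object PartitionCountSketch {
  // Ceiling division, with at least one partition even for empty input.
  def partitions(total: Long, maxPerPartition: Long): Int = {
    require(maxPerPartition > 0, "maxPerPartition must be positive")
    math.max(1L, (total + maxPerPartition - 1) / maxPerPartition).toInt
  }

  // Partition on the approximate maximum number of bytes per partition file.
  def byTotalBytes(totalBytes: Long, maxBytesPerPartition: Long): Int =
    partitions(totalBytes, maxBytesPerPartition)

  // Partition on the approximate maximum number of cells (numRows * numColumns) per partition file.
  def byTotalCells(numRows: Long, numColumns: Int, maxCellsPerPartition: Long): Int =
    partitions(numRows * numColumns, maxCellsPerPartition)
}
```

Counting cells rather than rows lets the same limit behave sensibly for both narrow and wide tables.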