trait AuditTable extends AnyRef
Main abstraction for an audit table that a client application must use to store records with a timestamp. It hides all details of the physical storage, so that client applications can use various file systems (e.g. HDFS, ADLS, S3, local) or key-value stores (e.g. HBase).
This abstraction can also produce a snapshot of the data, de-duplicated on the primary key and true to a specified moment in time.
It also surfaces custom attributes initialised during table creation, so that client applications do not need to store the relevant metadata in separate storage. This also simplifies backup, restore and sharing of data between environments.
Some storage layers can be quite inefficient at storing many appends in multiple files, and storage optimisation (compaction) should not interfere with the normal operation of the application. The application should therefore be able to control when compaction takes place.
An instance of AuditTable represents a functional state: once data has been modified through it, do not use that instance again.
There are 2 types of operations on the table:
1. data extraction - operations that do not modify the state of the table, so the same instance of AuditTable can be used for multiple data extraction operations;
2. data mutators - adding data to the table or optimising storage. These lead to a new state of the underlying storage, and the same instance of AuditTable cannot be used for data mutators again.
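The "functional state" rule above can be illustrated with a small in-memory model. This is only a sketch: `ToyTable` and its members are hypothetical stand-ins that use plain Scala collections instead of a Dataset and a storage layer, and they only mimic the documented behaviour (extraction is repeatable, a mutator returns a new state, and a second mutation on the same instance fails).

```scala
import scala.util.{Failure, Success, Try}

object FunctionalStateSketch {
  // Hypothetical stand-in for AuditTable: rows are plain strings and the
  // "storage" is an immutable Vector carried by each state.
  final class ToyTable private (val rows: Vector[String]) {
    private var mutated = false // flips once a mutator has been called

    // Data extraction: does not change the state, so it can be repeated.
    def all: Vector[String] = rows

    // Data mutator: returns the next state and the number of appended rows,
    // and fails if this instance has already been used for a mutation.
    def append(newRows: Seq[String]): Try[(ToyTable, Long)] =
      if (mutated)
        Failure(new IllegalStateException("state already mutated, use the returned instance"))
      else {
        mutated = true
        Success((new ToyTable(rows ++ newRows), newRows.size.toLong))
      }
  }

  object ToyTable {
    def empty: ToyTable = new ToyTable(Vector.empty)
  }

  val t0: ToyTable = ToyTable.empty

  // Thread each returned state into the next mutation.
  val result: Try[(ToyTable, Long)] =
    t0.append(Seq("a", "b")).flatMap { case (t1, n1) =>
      t1.append(Seq("c")).map { case (t2, n2) => (t2, n1 + n2) }
    }

  // Reusing t0 for a second mutation fails.
  val reused: Try[(ToyTable, Long)] = t0.append(Seq("d"))
}
```

With the real trait the same pattern would apply: keep the AuditTable returned inside the `Try` and discard the instance the mutator was called on.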
Created by Alexei Perelighin on 2018/03/03
Abstract Value Members
- abstract def allBetween(from: Option[Timestamp], to: Option[Timestamp]): Option[Dataset[_]]
Include all records between the given timestamps.
- returns
if no data in storage layer, return None
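The optional bounds of allBetween can be pictured with plain collections. The sketch below is not the library's implementation: `Rec` is a hypothetical record type, a `Seq` stands in for the Dataset, and it assumes that both bounds are inclusive, that a missing bound means "unbounded" on that side, and that an empty match on non-empty storage yields `Some` of an empty sequence. None of these three points is stated in the documentation above.

```scala
import java.sql.Timestamp

object AllBetweenSketch {
  // Hypothetical record: a key, a payload and the timestamp used for filtering.
  final case class Rec(key: String, value: String, ts: Timestamp)

  // Assumption: bounds are inclusive and None means "no bound on this side".
  def allBetween(stored: Seq[Rec],
                 from: Option[Timestamp],
                 to: Option[Timestamp]): Option[Seq[Rec]] =
    if (stored.isEmpty) None // nothing in the storage layer
    else Some(stored.filter { r =>
      from.forall(f => !r.ts.before(f)) && to.forall(t => !r.ts.after(t))
    })
}
```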
- abstract def append(ds: Dataset[_], lastUpdated: Column, appendTS: Timestamp): Try[(AuditTable, Long)]
Appends a new set of records to the audit table.
Fails when called a second time on the same instance.
- ds
records to append
- lastUpdated
column that returns the java.sql.Timestamp used for de-duplication on the primary keys
- appendTS
timestamp of when the append happened. It is not used for de-duplication
- returns
(new state of the AuditTable, count of appended records) or error
- abstract def compact(compactTS: Timestamp, trashMaxAge: Duration, smallRegionRowThreshold: Long, compactionPartitioner: CompactionPartitioner, recompactAll: Boolean = false): Try[AuditTable]
Request optimisation of the storage layer.
Fails when called a second time on the same instance.
- compactTS
timestamp of when the compaction is requested. It is not used for any filtering of the data
- trashMaxAge
maximum age of old region files kept in the .Trash folder after a compaction has happened
- smallRegionRowThreshold
the row count threshold used to determine which small regions are compacted
- compactionPartitioner
a partitioner function that dictates how many partitions should be generated for a given region
- recompactAll
whether to recompact all regions regardless of size (i.e. ignore smallRegionRowThreshold)
- returns
new state of the AuditTable
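How smallRegionRowThreshold and recompactAll interact can be sketched as a simple selection over region sizes. This is an illustration only: `Region` is a hypothetical descriptor (the real AuditTableRegionInfo may look different), and treating a region as "small" when its row count is strictly below the threshold is an assumption, since the documentation does not say whether the comparison is strict.

```scala
object CompactionSelectionSketch {
  // Hypothetical region descriptor, not the library's AuditTableRegionInfo.
  final case class Region(id: String, rowCount: Long)

  // Assumption: "small" means rowCount strictly below the threshold.
  // recompactAll = true ignores the threshold and selects every region.
  def regionsToCompact(regions: Seq[Region],
                       smallRegionRowThreshold: Long,
                       recompactAll: Boolean = false): Seq[Region] =
    if (recompactAll) regions
    else regions.filter(_.rowCount < smallRegionRowThreshold)
}
```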
- abstract def getLatestTimestamp(): Option[Timestamp]
Returns the latest timestamp of the records stored in the audit table.
- abstract def initNewTable(): Try[AuditTable]
Initializes the audit table in the storage layer. It also persists all of the metadata (name, primary keys, custom meta) to the storage layer.
- returns
new state of the table or error
- abstract def meta: Map[String, String]
Custom attributes assigned by the client application during table creation.
- abstract def regions: Seq[AuditTableRegionInfo]
- abstract def snapshot(ts: Timestamp): Option[Dataset[_]]
Generates a snapshot that contains only the latest records for the given timestamp. De-duplication happens on the primary keys.
- ts
use records that are closest to this timestamp
- returns
if no data in storage layer, return None
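The de-duplication described for snapshot can be modelled on plain collections. In this sketch `Rec` is a hypothetical record with a single-column primary key, a `Seq` stands in for the Dataset, and "closest to this timestamp" is assumed to mean the most recent record per key whose lastUpdated value is at or before `ts`. The real implementation works on Spark Datasets and may differ in these details.

```scala
import java.sql.Timestamp

object SnapshotSketch {
  // Hypothetical record: `key` plays the role of the primary key.
  final case class Rec(key: String, value: String, lastUpdated: Timestamp)

  // Assumption: keep records with lastUpdated <= ts, then keep the most
  // recent record for each key. Output is sorted by key for readability.
  def snapshot(stored: Seq[Rec], ts: Timestamp): Option[Seq[Rec]] =
    if (stored.isEmpty) None // nothing in the storage layer
    else Some(
      stored
        .filter(r => !r.lastUpdated.after(ts))
        .groupBy(_.key)
        .values
        .map(_.maxBy(_.lastUpdated.getTime))
        .toSeq
        .sortBy(_.key)
    )
}
```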
- abstract def tableName: String
Name of the table.
- abstract def updateTableInfo(tableInfo: AuditTableInfo): Try[AuditTable]
Updates the metadata for this table.
- tableInfo
the new metadata
- returns
new state of the AuditTable
Concrete Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##(): Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
- final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()