com.coxautodata.waimak.storage.StorageActions
Opens a storage layer table and adds the AuditTable object to the flow with a given label.
Opens a storage layer table and adds the AuditTable object to the flow with a given label. This can then be used with the writeToStorage action. Fails if the table does not exist in the storage layer.
the base path of the storage layer
optionally prefix the output label for the AuditTable.
If set, the label of the AuditTable will be s"${labelPrefix}_$table"
whether or not to include hot partitions in the read
the tables we want to open in the storage layer
a new SparkDataFlow with the get action added
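A usage sketch, assuming the action is invoked on a SparkDataFlow via the StorageActions implicit syntax; the parameter names follow the descriptions above and should be checked against the actual signature:

```scala
import com.coxautodata.waimak.dataflow.Waimak
import com.coxautodata.waimak.storage.StorageActions._

// Assumes a SparkSession named `spark` is already in scope.
val flow = Waimak.sparkFlow(spark)
  // Open two existing tables; fails if either is missing from the storage layer.
  // With labelPrefix = Some("audit"), the AuditTables are added to the flow
  // under the labels "audit_customer" and "audit_order".
  .getAuditTable(
    storageBasePath = "/data/storage",
    labelPrefix = Some("audit"),
    includeHot = true
  )("customer", "order")
```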
Opens or creates a storage layer table and adds the AuditTable object to the flow with a given label.
Opens or creates a storage layer table and adds the AuditTable object to the flow with a given label.
This can then be used with the writeToStorage action.
Creates the table if it does not already exist in the storage layer and the optional metadataRetrieval
function is given.
Fails if the table does not exist in the storage layer and the optional metadataRetrieval
function is not given.
the base path of the storage layer
an optional function that generates table metadata from a table name. This function is used during table creation if a table does not exist in the storage layer, or to update the metadata if updateTableMetadata is set to true
optionally prefix the output label for the AuditTable.
If set, the label of the AuditTable will be s"${labelPrefix}_$table"
whether or not to include hot partitions in the read
whether or not to update the table metadata. Uses spark.waimak.storage.updateMetadata by default (which defaults to false)
the tables we want to open in the storage layer
a new SparkDataFlow with the get action added
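A sketch of creating tables on first run; the shape of the metadata object (here an AuditTableInfo keyed on a hypothetical "id" column) and the parameter names are assumptions drawn from the descriptions above:

```scala
import com.coxautodata.waimak.dataflow.Waimak
import com.coxautodata.waimak.storage.AuditTableInfo
import com.coxautodata.waimak.storage.StorageActions._

// Hypothetical metadata generator: each new table is keyed on "id" and
// retains full history. It is only invoked when a table does not yet exist
// in the storage layer (or when updateTableMetadata is true).
val metadataRetrieval: String => AuditTableInfo = tableName =>
  AuditTableInfo(tableName, Seq("id"), Map.empty, retain_history = true)

// Assumes a SparkSession named `spark` is already in scope.
val flow = Waimak.sparkFlow(spark)
  .getOrCreateAuditTable(
    storageBasePath = "/data/storage",
    metadataRetrieval = Some(metadataRetrieval)
  )("customer")
```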
Load everything between two timestamps for the given tables
Load everything between two timestamps for the given tables
NB: this will not give you a snapshot of the tables at a given time; it will give you the entire history of events that occurred between the provided timestamps for each table. To get a snapshot, use snapshotFromStorage
the base path of the storage layer
Optionally, the lower bound last updated timestamp (if undefined, it will read from the beginning of time)
Optionally, the upper bound last updated timestamp (if undefined, it will read up until the most recent events)
the tables to load
a new SparkDataFlow with the read actions added
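For example, to read every event for a table from a fixed point up to the present; the action name loadBetween and its parameter names are assumptions based on the summary above:

```scala
import java.time.{ZoneOffset, ZonedDateTime}
import com.coxautodata.waimak.dataflow.Waimak
import com.coxautodata.waimak.storage.StorageActions._

// Assumes a SparkSession named `spark` is already in scope.
val flow = Waimak.sparkFlow(spark)
  // Full history of events since 2023-01-01 UTC; `to = None` reads up to
  // the most recent events. This is a history of changes, not a snapshot.
  .loadBetween(
    storageBasePath = "/data/storage",
    from = Some(ZonedDateTime.of(2023, 1, 1, 0, 0, 0, 0, ZoneOffset.UTC)),
    to = None
  )("customer")
```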
Get a snapshot of tables in the storage layer for a given timestamp
Get a snapshot of tables in the storage layer for a given timestamp
the base path of the storage layer
the snapshot timestamp
whether or not to include hot partitions in the read
optionally prefix the output label for the Dataset.
If set, the label of the snapshot Dataset will be s"${outputPrefix}_$table"
the tables we want to snapshot
a new SparkDataFlow with the snapshot actions added
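A sketch of taking a point-in-time snapshot; the timestamp type (java.sql.Timestamp) and parameter names are assumptions and should be verified against the actual signature:

```scala
import java.sql.Timestamp
import com.coxautodata.waimak.dataflow.Waimak
import com.coxautodata.waimak.storage.StorageActions._

// Assumes a SparkSession named `spark` is already in scope.
val flow = Waimak.sparkFlow(spark)
  // The state of each table as of midnight 2023-06-01. With
  // outputPrefix = Some("snap"), the snapshot Datasets are labelled
  // "snap_customer" and "snap_order".
  .snapshotFromStorage(
    storageBasePath = "/data/storage",
    snapshotTimestamp = Timestamp.valueOf("2023-06-01 00:00:00"),
    includeHot = true,
    outputPrefix = Some("snap")
  )("customer", "order")
```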
Writes a Dataset to the storage layer.
Writes a Dataset to the storage layer. The table must already have been opened on the flow using either the getOrCreateAuditTable or getAuditTable actions.
the label whose Dataset we wish to write
the last updated column in the Dataset
the zoned timestamp of the append
a lambda used to decide whether a compaction should happen after an append. Takes the list of table regions, the count of records added in this batch, and the compaction zoned date time. The default is not to trigger a compaction.
the prefix of the audit table entity on the flow. The AuditTable will be
found with s"${auditTableLabelPrefix}_$labelName"
a new SparkDataFlow with the write action added
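Putting it together, a write sketch: open the audit table first, then append the labelled Dataset. Parameter names follow the descriptions above and may differ from the actual signature; the compaction trigger is left at its default (no compaction):

```scala
import com.coxautodata.waimak.dataflow.Waimak
import com.coxautodata.waimak.storage.StorageActions._

// Assumes a SparkSession named `spark` is already in scope.
val flow = Waimak.sparkFlow(spark)
  // writeToStorage requires the audit table to be open on the flow.
  .getAuditTable(storageBasePath = "/data/storage")("customer")
  // Append the Dataset registered under the label "customer";
  // "last_updated" is the column used to order records in time.
  .writeToStorage(
    labelName = "customer",
    lastUpdatedCol = "last_updated"
  )
```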