object Storage extends Logging

Contains methods to create and open tables.

Created by Alexei Perelighin on 2018/04/11

Linear Supertypes
Logging, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  6. def createFileTable(sparkSession: SparkSession, basePath: Path, tableInfo: AuditTableInfo): Try[AuditTable]

    Creates a file table with default configurations; region ids will be Longs, zero-padded on the left to 20 characters.

    basePath

    parent folder which contains folders with table names

    tableInfo

    table metadata

    Exceptions thrown

    StorageException : Storage exceptions when: 1) primary keys are not specified; 2) the folder already exists

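    A minimal sketch of a call, assuming an example base path and a pre-built `AuditTableInfo` named `info` (its construction is omitted here):

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession
import scala.util.{Failure, Success}

val spark = SparkSession.builder().getOrCreate()
val basePath = new Path("/data/storage")

// Table metadata, including primary keys; assumed to be built elsewhere
val info: AuditTableInfo = ???

Storage.createFileTable(spark, basePath, info) match {
  case Success(table) => println(s"created table under $basePath")
  case Failure(e)     => println(s"creation failed (e.g. no primary keys, folder exists): $e")
}
```
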
  7. def createFops(sparkSession: SparkSession, basePath: Path): FileStorageOps

    Creates a File Operations object that is a bridge between the process actions and the actual storage, and handles write-to-temporary followed by move-to-permanent operations.

    basePath

    parent folder which contains folders with table names, .tmp and .Trash folders will be underneath.

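    A usage sketch, assuming an example base path; the `.tmp` and `.Trash` folders will be created underneath it:

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Bridge object used by the storage layer for temp-then-move writes
val fops: FileStorageOps = Storage.createFops(spark, new Path("/data/storage"))
```
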
  8. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  12. def getOrCreateFileTables(sparkSession: SparkSession, basePath: Path, tableNames: Seq[String], metadataRetrieval: Option[(String) ⇒ AuditTableInfo], updateTableMetadata: ⇒ Boolean, includeHot: Boolean = true): Seq[AuditTable]

    Opens or creates a storage layer table. Creates a table if it does not already exist in the storage layer and the optional metadataRetrieval function is given. Fails if the table does not exist in the storage layer and the optional metadataRetrieval function is not given.

    sparkSession

    Spark Session object

    basePath

    Base path of the storage directory

    tableNames

    the tables we want to open in the storage layer

    metadataRetrieval

    an optional function that generates table metadata from a table name. This function is used during table creation if a table does not exist in the storage layer or to update the metadata if updateTableMetadata is set to true

    updateTableMetadata

    whether or not to update the table metadata

    includeHot

    whether or not to include hot partitions in the read

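    A sketch of a call that can create missing tables, assuming a caller-supplied `metadataFor` function and example table names:

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Assumed: supplies AuditTableInfo for any table that may need creating
val metadataFor: String => AuditTableInfo = ???

val tables: Seq[AuditTable] = Storage.getOrCreateFileTables(
  spark,
  new Path("/data/storage"),
  Seq("customers", "orders"),
  Some(metadataFor),
  updateTableMetadata = false
)
```

    Passing `None` for the retrieval function would instead fail for any table that does not already exist in the storage layer.
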
  13. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  14. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  15. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  16. def logAndReturn[A](a: A, msg: String, level: Level): A
    Definition Classes
    Logging
  17. def logAndReturn[A](a: A, message: (A) ⇒ String, level: Level): A
    Definition Classes
    Logging
  18. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  19. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  20. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  21. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  22. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  23. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  24. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  25. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  26. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  27. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  28. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  29. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  30. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  31. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  32. def openFileTables(sparkSession: SparkSession, basePath: Path, tableNames: Seq[String], includeHot: Boolean = true): (Map[String, Try[AuditTable]], Seq[String])

    Opens multiple tables, reads their metadata, and reads the state of the table regions (ids, count of records per region, values of each region's latest timestamp).

    You can specify to only read the cold (compacted) state of the table. This is useful to avoid consistency issues during development on tables into which data is currently flowing.

    basePath

    parent folder which contains folders with table names

    tableNames

    list of tables to open

    includeHot

    include freshly appended data that has not been compacted yet. Useful for using production data in development.

    returns

    (Map[TABLE NAME, AuditTable], Seq[MISSING TABLES]) - the audit table objects that exist, and the table names that were not found under the basePath

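    A sketch of opening only the cold state of two example tables; the base path and table names are assumptions:

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession
import scala.util.Success

val spark = SparkSession.builder().getOrCreate()

// includeHot = false reads only the compacted (cold) state
val (opened, missing) =
  Storage.openFileTables(spark, new Path("/data/storage"), Seq("customers", "orders"), includeHot = false)

missing.foreach(name => println(s"table not found: $name"))
val ok: Map[String, AuditTable] = opened.collect { case (name, Success(t)) => name -> t }
```
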
  33. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  34. def toString(): String
    Definition Classes
    AnyRef → Any
  35. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  36. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  37. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  38. def writeToFileTable(table: AuditTable, toAppend: Dataset[_], lastUpdatedCol: String, appendDateTime: ZonedDateTime, doCompaction: CompactionDecision, recompactAll: Boolean, trashMaxAge: Duration, smallRegionRowThreshold: Long, compactionPartitioner: CompactionPartitioner): Unit

    Writes a Dataset to the storage layer. This function allows you to set various parameters directly (e.g. recompact all, compactor to use, trash max age, region row threshold) and is called by an identical function that takes this information from a Spark context. Use this function if you want to set these configuration parameters in the API yourself; use the less specific function if you wish for these parameters to be handled for you.

    table

    to append data to

    toAppend

    dataset to append to the table

    lastUpdatedCol

    the last updated column in the Dataset

    appendDateTime

    timestamp of the append, zoned to a timezone

    doCompaction

    a lambda used to decide whether a compaction should happen after an append. It takes the list of table regions, the count of records added in this batch, and the compaction zoned date time.

    recompactAll

    Whether to force a recompaction of all regions (expensive)

    trashMaxAge

    Maximum age of trashed region before it is deleted

    smallRegionRowThreshold

    Threshold of a region before it is no longer considered for compaction

    compactionPartitioner

    The compaction partitioner to use when performing a compaction

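    A sketch of the fully parameterised call; `table`, `ds`, `decision` and `partitioner` are assumed to exist already, and the `Duration` flavour (`scala.concurrent.duration`) and threshold value are also assumptions:

```scala
import java.time.ZonedDateTime
import scala.concurrent.duration._

Storage.writeToFileTable(
  table = table,                      // AuditTable to append to
  toAppend = ds,                      // Dataset[_] to append
  lastUpdatedCol = "last_updated",    // assumed column name
  appendDateTime = ZonedDateTime.now(),
  doCompaction = decision,            // CompactionDecision lambda
  recompactAll = false,               // forcing a full recompaction is expensive
  trashMaxAge = 24.hours,
  smallRegionRowThreshold = 1000000L,
  compactionPartitioner = partitioner
)
```

    The overload below that takes a FlowContext fills these parameters in from configuration instead.
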
  39. def writeToFileTable(flowContext: FlowContext, table: AuditTable, toAppend: Dataset[_], lastUpdatedCol: String, appendDateTime: ZonedDateTime, doCompaction: CompactionDecision): Unit

    Writes a Dataset to the storage layer. This function will use various configuration parameters given in the flowContext (e.g. recompaction all, compactor to use, trash max age, region row threshold) and call a more specific function below. The more specific function should be used if you want to directly set these configuration parameters in the API yourself.

    flowContext

    flow context object

    table

    to append data to

    toAppend

    dataset to append to the table

    lastUpdatedCol

    the last updated column in the Dataset

    appendDateTime

    timestamp of the append, zoned to a timezone

    doCompaction

    a lambda used to decide whether a compaction should happen after an append. It takes the list of table regions, the count of records added in this batch, and the compaction zoned date time.
