io.smartdatalake.workflow.dataobject

PartitionArchiveCompactionMode

case class PartitionArchiveCompactionMode(archivePartitionExpression: Option[String] = None, compactPartitionExpression: Option[String] = None, description: Option[String] = None) extends HousekeepingMode with SmartDataLakeLogger with Product with Serializable

Archives and compacts old partitions. Archiving reduces the number of partitions in the past by moving older partitions into special "archive partitions". Compaction reduces the number of files in a partition by rewriting them with Spark. Example: archive and compact a table with partition layout run_id=<integer>:

  • archive partitions that are more than 1000 runs old into the "archive partition" equal to floor(run_id/1000)
  • compact an "archive partition" once it is full
housekeepingMode = {
  type = PartitionArchiveCompactionMode
  archivePartitionExpression = "if( elements['run_id'] < runId - 1000, map('run_id', elements['run_id'] div 1000), elements)"
  compactPartitionExpression = "elements['run_id'] % 1000 = 0 and elements['run_id'] <= runId - 2000"
}
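The arithmetic behind the two expressions above can be traced in plain Scala. This is only an illustrative sketch, not the framework's evaluation path: the object and function names are hypothetical, `elements` is assumed to be a string-keyed map of partition values, and run ids are assumed to be non-negative so that integer division matches floor.

```scala
object HousekeepingExampleSketch {
  // Partition values as a string-keyed map, e.g. Map("run_id" -> "1234") (assumed shape).
  type Elements = Map[String, String]

  // Mirrors: if(elements['run_id'] < runId - 1000,
  //             map('run_id', elements['run_id'] div 1000), elements)
  def archivePartition(elements: Elements, runId: Long): Elements = {
    val id = elements("run_id").toLong
    if (id < runId - 1000) Map("run_id" -> (id / 1000).toString) else elements
  }

  // Mirrors: elements['run_id'] % 1000 = 0 and elements['run_id'] <= runId - 2000
  def shouldCompact(elements: Elements, runId: Long): Boolean = {
    val id = elements("run_id").toLong
    id % 1000 == 0 && id <= runId - 2000
  }
}
```

With runId = 5000, the partition run_id=1234 maps to run_id=1, while run_id=4500 is returned unchanged because it is not yet more than 1000 runs old.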
archivePartitionExpression

Expression that defines the archive partition for a given partition. Define a Spark SQL expression that works with the attributes of PartitionExpressionData and returns the archive partition values as Map[String,String]. If the return value equals the input elements, the partition is not touched; otherwise all files of the partition are moved to the returned partition definition. Be aware that the values of the partition columns change for these files/records.
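The "unchanged means untouched" rule can be pictured as a planning step that keeps only the partitions whose target differs from their source. The sketch below is an assumption-laden illustration: `planMoves` and `sourcesPerTarget` are hypothetical helpers, and the archive expression is modelled as an ordinary Scala function rather than a Spark SQL expression.

```scala
object ArchivePlanSketch {
  type Elements = Map[String, String]

  // Hypothetical helper: pairs each partition that must move with its archive target.
  // Partitions for which the expression returns the input unchanged are left out.
  def planMoves(partitions: Seq[Elements],
                archiveExpr: Elements => Elements): Seq[(Elements, Elements)] =
    partitions.map(p => p -> archiveExpr(p)).filter { case (src, tgt) => src != tgt }

  // Counts how many source partitions collapse into each archive partition.
  def sourcesPerTarget(moves: Seq[(Elements, Elements)]): Map[Elements, Int] =
    moves.groupBy(_._2).map { case (tgt, ms) => tgt -> ms.size }
}
```

Because several source partitions can share one target, the files that are moved take on the archive partition's column values, which is the change in partition column values noted above.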

compactPartitionExpression

Expression that defines which partitions should be compacted. Define a Spark SQL expression that works with the attributes of PartitionExpressionData and returns true when the partition should be compacted. Once a partition has been compacted, it is marked as compacted and will not be compacted again. It is therefore safe to return true for every partition that should be compacted, regardless of whether it has already been compacted.
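A minimal sketch of why repeated true results are harmless: the selection below drops any partition that is already marked. The `alreadyCompacted` set is only a stand-in, since how the marker is actually stored is not described here, and `toCompact` is a hypothetical name.

```scala
object CompactionSelectionSketch {
  type Elements = Map[String, String]

  // Hypothetical selection step: a partition is compacted only if the expression
  // returns true AND it has not been marked as compacted before.
  def toCompact(partitions: Seq[Elements],
                compactExpr: Elements => Boolean,
                alreadyCompacted: Set[Elements]): Seq[Elements] =
    partitions.filter(p => compactExpr(p) && !alreadyCompacted.contains(p))
}
```

Under this model the expression does not need to track compaction state itself; it only has to describe which partitions are eligible.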

Annotations
@Scaladoc()
Linear Supertypes
Serializable, Serializable, Product, Equals, SmartDataLakeLogger, HousekeepingMode, AnyRef, Any

Instance Constructors

  1. new PartitionArchiveCompactionMode(archivePartitionExpression: Option[String] = None, compactPartitionExpression: Option[String] = None, description: Option[String] = None)

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. val archivePartitionExpression: Option[String]
  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native() @HotSpotIntrinsicCandidate()
  7. val compactPartitionExpression: Option[String]
  8. val description: Option[String]
  9. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  10. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  11. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  12. lazy val logger: Logger
    Attributes
    protected
    Definition Classes
    SmartDataLakeLogger
    Annotations
    @transient()
  13. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  14. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  15. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  16. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  17. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  18. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  19. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] ) @Deprecated
    Deprecated
