io.smartdatalake.util.misc

CompactionUtil

object CompactionUtil extends SmartDataLakeLogger

Linear Supertypes
SmartDataLakeLogger, AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native() @HotSpotIntrinsicCandidate()
  6. def compactHadoopStandardPartitions(dataObject: DataObject with CanHandlePartitions with CanCreateDataFrame with CanWriteDataFrame with HasHadoopStandardFilestore, partitionValues: Seq[PartitionValues])(implicit context: ActionPipelineContext): Seq[PartitionValues]

    Hadoop does not support compacting partitions out of the box, because the files must be read in the correct format and written again. The following steps are used to compact partitions with Spark:

    1. Check whether a compaction is already in progress by looking for the special file "_SDL_COMPACTING" in the data object's root Hadoop path. If the file exists and is not older than 12h, abort the compaction with an exception. Otherwise create or update the file "_SDL_COMPACTING"; if the existing file is older than 12h, the previous compaction process is assumed to have crashed.
    2. Because step 5 is not atomic (delete and move are two operations), check for incomplete compactions left behind by previously crashed runs and fix them. Incomplete compactions are marked with the special file "_SDL_MOVING" in the temporary path. Incompletely compacted partitions must be moved from the temporary path to the Hadoop path and marked as compacted (see step 5).
    3. Filter already compacted partitions out of the given partitions by looking for the "_SDL_COMPACTED" file (see step 5).
    4. Rewrite the data of the partitions to be compacted into a temporary path under the data object's Hadoop path.
    5. Delete the partitions to be compacted from the Hadoop path and move them from the temporary path to the Hadoop path. This should be done one partition at a time to reduce the risk of data loss. To allow recovery after an unexpected abort between delete and move, the special file "_SDL_MOVING" is created in the temporary path before the Hadoop path is deleted, and is deleted again after the temporary path has been moved. Finally, mark each compacted partition by creating the special file "_SDL_COMPACTED".
    6. Delete the "_SDL_COMPACTING" file created in step 1.
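
    The marker-file protocol described above can be sketched on a local file system. This is not the Smart Data Lake implementation: it uses java.nio instead of the Hadoop file system API, and the object CompactionSketch, its method names, the "_temp" directory name and the rewrite callback (standing in for the Spark read and write of step 4) are all hypothetical. Placing "_SDL_MOVING" inside the temporary partition directory, releasing the lock in a finally block, and returning the names of the newly compacted partitions are likewise assumptions of this sketch.

```scala
import java.nio.file.{Files, Path}
import java.nio.file.attribute.FileTime
import java.time.{Duration, Instant}
import java.util.Comparator

// Illustrative sketch only: local file system instead of Hadoop, hypothetical names throughout.
object CompactionSketch {
  // Marker file names taken from the description above.
  val CompactingMarker = "_SDL_COMPACTING"
  val MovingMarker = "_SDL_MOVING"
  val CompactedMarker = "_SDL_COMPACTED"
  val LockTimeout: Duration = Duration.ofHours(12)

  // Step 1: fail on a fresh lock file, otherwise create it or refresh its timestamp.
  def acquireLock(root: Path, now: Instant = Instant.now()): Unit = {
    val lock = root.resolve(CompactingMarker)
    if (!Files.exists(lock)) Files.createFile(lock)
    else {
      val age = Duration.between(Files.getLastModifiedTime(lock).toInstant, now)
      if (age.compareTo(LockTimeout) <= 0)
        throw new IllegalStateException(s"Compaction already in progress: $lock")
      Files.setLastModifiedTime(lock, FileTime.from(now)) // stale lock: assume a crashed run
    }
  }

  // Step 6: remove the lock file created in step 1.
  def releaseLock(root: Path): Unit = Files.deleteIfExists(root.resolve(CompactingMarker))

  // Deletes a file or directory tree, children before parents; no-op if it does not exist.
  private def deleteRecursively(p: Path): Unit =
    if (Files.exists(p)) {
      val stream = Files.walk(p)
      try stream.sorted(Comparator.reverseOrder[Path]()).forEach((f: Path) => Files.delete(f))
      finally stream.close()
    }

  // Lists the direct subdirectories of dir; empty if dir does not exist.
  private def subDirs(dir: Path): Seq[Path] =
    if (!Files.isDirectory(dir)) Seq.empty
    else {
      val stream = Files.list(dir)
      try stream.toArray.toSeq.map(_.asInstanceOf[Path]).filter(p => Files.isDirectory(p))
      finally stream.close()
    }

  // End of step 5: drop the "moving" marker and mark the partition as compacted.
  private def markCompacted(target: Path): Unit = {
    Files.deleteIfExists(target.resolve(MovingMarker))
    val done = target.resolve(CompactedMarker)
    if (!Files.exists(done)) Files.createFile(done)
  }

  // Step 5 for one partition: replace target with the rewritten data in temp.
  def swapPartition(target: Path, temp: Path): Unit = {
    val moving = temp.resolve(MovingMarker)
    if (!Files.exists(moving)) Files.createFile(moving) // written before the delete
    deleteRecursively(target)
    Files.move(temp, target) // the marker travels with the directory
    markCompacted(target)
  }

  // Step 2: finish swaps that a crashed run left incomplete.
  def recover(root: Path, tempRoot: Path): Unit = {
    // Crash between delete and move: rewritten data still sits in the temporary path.
    subDirs(tempRoot).filter(t => Files.exists(t.resolve(MovingMarker)))
      .foreach(t => swapPartition(root.resolve(t.getFileName), t))
    // Crash after the move: only the markers are left to fix.
    subDirs(root).filter(p => Files.exists(p.resolve(MovingMarker))).foreach(markCompacted)
  }

  // Runs steps 1 to 6; rewrite(source, temp) must write the compacted data into temp.
  def compact(root: Path, partitionNames: Seq[String])(rewrite: (Path, Path) => Unit): Seq[String] = {
    val tempRoot = root.resolve("_temp") // hypothetical name for the temporary path
    acquireLock(root)
    try {
      recover(root, tempRoot)
      // Step 3: skip partitions that already carry the "compacted" marker.
      val todo = partitionNames.filterNot(n => Files.exists(root.resolve(n).resolve(CompactedMarker)))
      todo.foreach { name =>
        val temp = tempRoot.resolve(name)
        deleteRecursively(temp) // leftovers of an aborted rewrite; the original is untouched
        Files.createDirectories(temp)
        rewrite(root.resolve(name), temp) // step 4
        swapPartition(root.resolve(name), temp) // step 5, one partition at a time
      }
      todo
    } finally releaseLock(root)
  }
}
```

    In this sketch a call such as CompactionSketch.compact(root, Seq("dt=20240101"))(rewrite) returns the partition names it compacted, and a second call with the same names returns an empty sequence because step 3 filters them out. Writing "_SDL_MOVING" before the delete is what makes the non-atomic delete-and-move recoverable: as long as the marker exists, the rewritten copy is known to be complete and can safely replace whatever is left in the target path.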

    Annotations
    @Scaladoc()
  7. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  8. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  9. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  10. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  11. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  12. lazy val logger: Logger
    Attributes
    protected
    Definition Classes
    SmartDataLakeLogger
    Annotations
    @transient()
  13. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  14. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  15. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  16. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  17. def toString(): String
    Definition Classes
    AnyRef → Any
  18. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  19. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  20. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] ) @Deprecated
    Deprecated


Ungrouped