object CompactionUtil extends SmartDataLakeLogger
Value Members
final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
final def ##(): Int
- Definition Classes
- AnyRef → Any
final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
final def asInstanceOf[T0]: T0
- Definition Classes
- Any
def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native() @HotSpotIntrinsicCandidate()
def compactHadoopStandardPartitions(dataObject: DataObject with CanHandlePartitions with CanCreateDataFrame with CanWriteDataFrame with HasHadoopStandardFilestore, partitionValues: Seq[PartitionValues])(implicit context: ActionPipelineContext): Seq[PartitionValues]
Hadoop does not support compacting partitions out of the box, because files must be read in the correct format and written again. The following steps are used to compact partitions with Spark:
1. Check whether a compaction is already in progress by looking for the special file "_SDL_COMPACTING" in the data object's root Hadoop path. If it exists and is not older than 12h, abort the compaction with an exception. Otherwise create or update the "_SDL_COMPACTING" file; if the file is older than 12h, the previous compaction process is assumed to have crashed.
2. Because step 5 is not atomic (delete and move are two operations), check for incomplete compactions left behind by previously crashed runs and fix them. Incomplete compactions are marked by the special file "_SDL_MOVING" in the temporary path. Such partitions must be moved from the temporary path to the Hadoop path and marked as compacted (see step 5).
3. Filter already compacted partitions out of the given partitions by looking for the "_SDL_COMPACTED" file (see step 5).
4. Rewrite the data of the partitions to be compacted into a temporary path under this data object's Hadoop path.
5. Delete the partitions to be compacted from the Hadoop path and move them from the temporary path to the Hadoop path. This should be done one by one to reduce the risk of data loss. To allow recovery after an unexpected abort between delete and move, the special file "_SDL_MOVING" is created in the temporary path before the Hadoop path is deleted; it is deleted again after the temporary path has been moved. Finally, mark the compacted partition by creating the special file "_SDL_COMPACTED".
6. Delete the "_SDL_COMPACTING" file created in step 1.
- Annotations
- @Scaladoc()
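The marker-file protocol described for compactHadoopStandardPartitions can be illustrated with a small sketch. This is not the library's implementation: it uses java.nio.file on a local directory as a stand-in for the Hadoop filesystem, and the object name CompactionSketch, its method names, and the assumption that the temporary path mirrors the partition directory names are all hypothetical. Only the marker file names and the 12h threshold come from the description above.

```scala
import java.nio.file.{Files, Path}
import java.nio.file.attribute.FileTime
import java.time.{Duration, Instant}

// Hypothetical sketch of the marker-file protocol on a local filesystem.
object CompactionSketch {
  val CompactingMarker = "_SDL_COMPACTING"
  val MovingMarker = "_SDL_MOVING"
  val CompactedMarker = "_SDL_COMPACTED"
  val MaxLockAge: Duration = Duration.ofHours(12)

  /** Step 1: fail if a fresh lock file exists, otherwise create or refresh it. */
  def acquireLock(root: Path, now: Instant = Instant.now()): Unit = {
    val lock = root.resolve(CompactingMarker)
    if (Files.exists(lock)) {
      val age = Duration.between(Files.getLastModifiedTime(lock).toInstant, now)
      if (age.compareTo(MaxLockAge) <= 0)
        throw new IllegalStateException(s"Compaction already in progress: $lock")
      // Lock is older than 12h: assume the previous run crashed and take it over.
      Files.setLastModifiedTime(lock, FileTime.from(now))
    } else {
      Files.createFile(lock)
    }
    ()
  }

  /** Step 6: remove the lock file created in step 1. */
  def releaseLock(root: Path): Unit = {
    Files.deleteIfExists(root.resolve(CompactingMarker))
    ()
  }

  /** Deletes a file or directory tree; does nothing if the path is absent. */
  private def deleteRecursively(p: Path): Unit = {
    if (Files.isDirectory(p)) {
      val children = p.toFile.listFiles()
      if (children != null) children.foreach(c => deleteRecursively(c.toPath))
    }
    Files.deleteIfExists(p)
    ()
  }

  /** Step 5 for one partition: replace partitionDir with tempDir, guarded by _SDL_MOVING. */
  def swapPartition(partitionDir: Path, tempDir: Path): Unit = {
    val moving = tempDir.resolve(MovingMarker)
    // The marker must exist before the delete so a crash in between is detectable.
    if (!Files.exists(moving)) Files.createFile(moving)
    deleteRecursively(partitionDir)
    Files.move(tempDir, partitionDir)
    // The marker travelled with the directory; remove it and mark the partition as compacted.
    Files.deleteIfExists(partitionDir.resolve(MovingMarker))
    Files.createFile(partitionDir.resolve(CompactedMarker))
    ()
  }

  /** Step 2: finish swaps that a crashed run left behind; returns the recovered partition names. */
  def recoverIncomplete(root: Path, tempRoot: Path): Seq[String] = {
    val dirs = Option(tempRoot.toFile.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
    val pending = dirs.filter(d => d.isDirectory && new java.io.File(d, MovingMarker).exists())
    pending.map { d =>
      swapPartition(root.resolve(d.getName), d.toPath)
      d.getName
    }
  }
}
```

In this sketch the lock check and the lock creation are two separate calls, so two processes starting at the same instant could both pass the check; the sketch only illustrates the ordering of the markers. A real implementation would go through the Hadoop filesystem API rather than java.nio.file, where rename and delete semantics can differ between storage backends.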
final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
def equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
final def getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native() @HotSpotIntrinsicCandidate()
final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
lazy val logger: Logger
- Attributes
- protected
- Definition Classes
- SmartDataLakeLogger
- Annotations
- @transient()
final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native() @HotSpotIntrinsicCandidate()
final def synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
def toString(): String
- Definition Classes
- AnyRef → Any
final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
Deprecated Value Members
def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] ) @Deprecated
- Deprecated