package app


Type Members

  1. class DefaultSmartDataLakeBuilder extends SmartDataLakeBuilder

    Default Smart Data Lake Command Line Application.

    Implementation Note: This must be a class and not an object in order to be found by reflection in DatabricksSmartDataLakeBuilder

    Annotations
    @Scaladoc()
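The implementation note above can be illustrated with plain JVM reflection: `Class.forName(...).getDeclaredConstructor().newInstance()` requires a class with a public no-argument constructor, which a Scala `object` does not provide in the same way. A minimal, self-contained sketch — `AppLike` is a hypothetical stand-in for DefaultSmartDataLakeBuilder, not the real class:

```scala
// Hypothetical stand-in for DefaultSmartDataLakeBuilder: a class (not an
// object) with a public no-arg constructor, so it can be found and
// instantiated by name via reflection.
class AppLike {
  def run(): String = "started"
}

object ReflectionDemo {
  def main(args: Array[String]): Unit = {
    // This mirrors how a caller like DatabricksSmartDataLakeBuilder could
    // locate and instantiate the class by name at runtime.
    val clazz = Class.forName("AppLike")
    val instance = clazz.getDeclaredConstructor().newInstance().asInstanceOf[AppLike]
    println(instance.run())
  }
}
```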
  2. case class GlobalConfig(kryoClasses: Option[Seq[String]] = None, sparkOptions: Option[Map[String, String]] = None, statusInfo: Option[StatusInfoConfig] = None, enableHive: Boolean = true, memoryLogTimer: Option[MemoryLogTimerConfig] = None, shutdownHookLogger: Boolean = false, stateListeners: Seq[StateListenerConfig] = Seq(), sparkUDFs: Option[Map[String, SparkUDFCreatorConfig]] = None, pythonUDFs: Option[Map[String, PythonUDFCreatorConfig]] = None, secretProviders: Option[Map[String, SecretProviderConfig]] = None, allowOverwriteAllPartitionsWithoutPartitionValues: Seq[DataObjectId] = Seq(), synchronousStreamingTriggerIntervalSec: Int = 60) extends SmartDataLakeLogger with Product with Serializable

    Global configuration options

    Note that the global configuration is responsible for holding the SparkSession, so that it is created once and only once per SDLB job. This is especially important if the JVM is shared between different SDL jobs (e.g. on a Databricks cluster), because a SparkSession shared in object Environment survives the current SDLB job.

    kryoClasses

    classes to register for spark kryo serialization

    sparkOptions

    spark options

    statusInfo

    enable a REST API providing live status info, see detailed configuration StatusInfoConfig

    enableHive

    enable hive for spark session

    memoryLogTimer

    enable periodic memory usage logging, see detailed configuration MemoryLogTimerConfig

    shutdownHookLogger

    enable shutdown hook logger to trace shutdown cause

    stateListeners

    Define state listeners to be registered to receive events of the execution of a SmartDataLake job

    sparkUDFs

    Define UDFs to be registered in spark session. The registered UDFs are available in Spark SQL transformations and expression evaluation, e.g. configuration of ExecutionModes.

    pythonUDFs

    Define UDFs in python to be registered in spark session. The registered UDFs are available in Spark SQL transformations but not for expression evaluation.

    secretProviders

    Define SecretProvider's to be registered.

    allowOverwriteAllPartitionsWithoutPartitionValues

    Configure a list of exceptions for partitioned DataObject id's which are allowed to overwrite all partitions of a table if no partition values are set. This is used to override/avoid a protective error when using SDLSaveMode.OverwriteOptimized or SDLSaveMode.OverwritePreserveDirectories. Define it as a list of DataObject id's.

    synchronousStreamingTriggerIntervalSec

    Trigger interval in seconds for synchronous actions in streaming mode (default = 60 seconds). The synchronous actions of the DAG will be executed with this interval if possible. Note that asynchronous actions have separate settings, e.g. SparkStreamingMode.triggerInterval.

    Annotations
    @Scaladoc()
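These options are typically set in SDLB's HOCON configuration files under a global section. A hedged sketch — the key names mirror the case-class parameters above, while the example class name and option values are hypothetical:

```hocon
global {
  # classes to register for Spark kryo serialization (hypothetical class name)
  kryoClasses = ["org.example.MyEvent"]
  sparkOptions {
    "spark.sql.shuffle.partitions" = "16"
  }
  enableHive = true
  synchronousStreamingTriggerIntervalSec = 60
}
```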
  3. case class MemoryLogTimerConfig(intervalSec: Int, logLinuxMem: Boolean = true, logLinuxCGroupMem: Boolean = false, logBuffers: Boolean = false) extends Product with Serializable

    Configuration for periodic memory usage logging

    intervalSec

    interval in seconds between memory usage logs

    logLinuxMem

    enable logging linux memory

    logLinuxCGroupMem

    enable logging details about linux cgroup memory

    logBuffers

    enable logging details about different jvm buffers

    Annotations
    @Scaladoc()
  4. trait ModulePlugin extends AnyRef

    Hooks for modules to interact with sdl-core

    Annotations
    @Scaladoc()
  5. trait SDLPlugin extends AnyRef

    SDL Plugin defines an interface to execute custom code on SDL startup and shutdown. Configure it by setting the java system property "sdl.pluginClassName" to the name of a class implementing the SDLPlugin interface. The class needs to have a constructor without any parameters.

    Annotations
    @Scaladoc() @DeveloperApi()
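A minimal sketch of the wiring described above. `SdlPluginLike` is a hypothetical stand-in for the real SDLPlugin trait (whose exact method names are not shown in this listing and are assumed here to be `startup()` and `shutdown()`); only the system property name `sdl.pluginClassName` comes from the documentation:

```scala
// Hypothetical stand-in for the SDLPlugin trait; the actual SDL interface
// may differ in method names and signatures.
trait SdlPluginLike {
  def startup(): Unit
  def shutdown(): Unit
}

// A plugin implementation needs a no-arg constructor so it can be
// instantiated reflectively from the configured class name.
class LoggingPlugin extends SdlPluginLike {
  var events: List[String] = Nil
  def startup(): Unit = { events = events :+ "startup" }
  def shutdown(): Unit = { events = events :+ "shutdown" }
}

object PluginDemo {
  def main(args: Array[String]): Unit = {
    // SDL reads the plugin class name from this system property.
    System.setProperty("sdl.pluginClassName", "LoggingPlugin")
    val className = System.getProperty("sdl.pluginClassName")
    val plugin = Class.forName(className).getDeclaredConstructor()
      .newInstance().asInstanceOf[SdlPluginLike]
    plugin.startup()
    plugin.shutdown()
    println(plugin.asInstanceOf[LoggingPlugin].events.mkString(","))
  }
}
```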
  6. abstract class SmartDataLakeBuilder extends SmartDataLakeLogger

    Abstract Smart Data Lake Command Line Application.

    Annotations
    @Scaladoc()
  7. case class SmartDataLakeBuilderConfig(feedSel: String = null, applicationName: Option[String] = None, configuration: Option[Seq[String]] = None, master: Option[String] = None, deployMode: Option[String] = None, username: Option[String] = None, kerberosDomain: Option[String] = None, keytabPath: Option[File] = None, partitionValues: Option[Seq[PartitionValues]] = None, multiPartitionValues: Option[Seq[PartitionValues]] = None, parallelism: Int = 1, statePath: Option[String] = None, overrideJars: Option[Seq[String]] = None, test: Option[TestMode.Value] = None, streaming: Boolean = false) extends Product with Serializable

    This case class represents a default configuration for the App. It is populated by parsing command-line arguments. It also specifies default values.

    feedSel

    Expressions to select the actions to execute. See AppUtil.filterActionList() or commandline help for syntax description.

    applicationName

    Application name.

    configuration

    One or multiple configuration files or directories containing configuration files, separated by comma.

    master

    The Spark master URL passed to SparkContext when in local mode.

    deployMode

    The Spark deploy mode passed to SparkContext when in local mode.

    username

    Kerberos user name (username@kerberosDomain) for local mode.

    kerberosDomain

    Kerberos domain (username@kerberosDomain) for local mode.

    keytabPath

    Path to Kerberos keytab file for local mode.

    test

    Run in test mode:

    • "config": validate configuration
    • "dry-run": execute the "prepare" and "init" phases to check the environment

    Annotations
    @Scaladoc()
  8. trait StateListener extends AnyRef

    Interface to notify interested parties about action results & metrics

    Annotations
    @Scaladoc()
  9. case class StateListenerConfig(className: String, options: Option[Map[String, String]] = None) extends Product with Serializable

    Configuration to notify interested parties about action results & metrics

    className

    fully qualified class name of class implementing StateListener interface. The class needs a constructor with one parameter options: Map[String,String].

    options

    Options are passed to StateListener constructor.

    Annotations
    @Scaladoc()
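The reflective instantiation described above can be sketched as follows. The constructor signature `options: Map[String, String]` is stated in the documentation; `StateListenerLike` and its `onEvent` method are hypothetical stand-ins for the real StateListener interface:

```scala
// Hypothetical stand-in for the StateListener trait; the real interface has
// more specific notification methods.
trait StateListenerLike {
  def onEvent(event: String): Unit
}

// As documented, the class needs a constructor with one parameter
// options: Map[String, String].
class PrintingListener(options: Map[String, String]) extends StateListenerLike {
  val prefix: String = options.getOrElse("prefix", "")
  var received: List[String] = Nil
  def onEvent(event: String): Unit = { received = received :+ (prefix + event) }
}

object ListenerDemo {
  def main(args: Array[String]): Unit = {
    // Mirrors how StateListenerConfig(className, options) could instantiate
    // the listener reflectively, passing the options map to the constructor.
    val className = "PrintingListener"
    val options = Map("prefix" -> "sdl:")
    val listener = Class.forName(className)
      .getConstructor(classOf[Map[String, String]])
      .newInstance(options).asInstanceOf[StateListenerLike]
    listener.onEvent("actionFinished")
    println(listener.asInstanceOf[PrintingListener].received.mkString(","))
  }
}
```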
  10. case class StatusInfoConfig(port: Int = 4440, maxPortRetries: Int = 10, stopOnEnd: Boolean = true) extends Product with Serializable

    Configuration for the Server that provides live status info of the current DAG Execution

    port

    port with which the first connection attempt is made

    maxPortRetries

    If the port is already in use, the port number is incremented by one and the attempt is repeated. maxPortRetries describes how many times this should be attempted. If set to 0, no retries are attempted. Values below 0 are not allowed.

    stopOnEnd

    Set to false if the server should remain online even after SDL has finished its execution. In that case, the application needs to be stopped manually. Useful for debugging.

    Annotations
    @Scaladoc()
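The port-retry semantics of `port` and `maxPortRetries` described above can be sketched in a few lines. This is an illustrative model, not the actual server code; `tryBind` is a hypothetical probe injected as a function:

```scala
object PortRetry {
  // Try `port` first; on failure increment by one and retry, up to
  // maxPortRetries additional attempts. With maxPortRetries = 0 only the
  // initial port is tried; negative values are rejected.
  def findPort(port: Int, maxPortRetries: Int)(tryBind: Int => Boolean): Option[Int] = {
    require(maxPortRetries >= 0, "values below 0 are not allowed")
    (port to port + maxPortRetries).find(tryBind)
  }

  def main(args: Array[String]): Unit = {
    // Simulate ports 4440 and 4441 being already in use.
    val busy = Set(4440, 4441)
    println(findPort(4440, 10)(p => !busy.contains(p))) // → Some(4442)
  }
}
```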

Value Members

  1. object DatabricksSmartDataLakeBuilder extends SmartDataLakeBuilder

    Databricks Smart Data Lake Command Line Application.

    As there is an old version of config-*.jar deployed on Databricks, this special App uses a ChildFirstClassLoader to override it in the classpath.

  2. object DefaultSmartDataLakeBuilder
  3. object GlobalConfig extends ConfigImplicits with Serializable
  4. object LocalSmartDataLakeBuilder extends SmartDataLakeBuilder

    Smart Data Lake Builder application for local mode.

    Sets master to local[*] and deployMode to client by default.

  5. object ModulePlugin
  6. object TestMode extends Enumeration
