
tech.sourced.engine

Engine

class Engine extends Logging

Engine is the main entry point to the source{d} spark-engine. It provides methods to set every configurable option and to start analysing repositories of code.

import tech.sourced.engine._

val engine = Engine(sparkSession, "/path/to/repositories")

NOTE: Keep in mind that you will need to register the UDFs in the session manually if you choose to instantiate this class directly instead of using the companion object.

import tech.sourced.engine.{Engine, SessionFunctions}

val engine = new Engine(sparkSession, "/path/to/repositories", "siva")
sparkSession.registerUDFs()

The usual starting point is getRepositories, which generates a DataFrame of repositories. This is the first thing you need to analyse repositories of code.

Linear Supertypes
Logging, AnyRef, Any

Instance Constructors

  1. new Engine(session: SparkSession, repositoriesPath: String, repositoriesFormat: String)

    Creates an Engine instance with the given Spark session.

    session

    Spark session to be used

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  7. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. def fromMetadata(dbPath: String, dbName: String = MetadataSource.DefaultDbName): Engine

    Registers in the current session the views of the MetadataSource so the data is obtained from the metadata database instead of reading the repositories with the DefaultSource.

    dbPath

    path to the folder that contains the database.

    dbName

    name of the database file (engine_metadata.db by default).

    returns

    the same instance of the engine

  10. def fromRepositories(): Engine

    Registers in the current session the views of the DefaultSource so the data is obtained by reading the repositories instead of reading from the MetadataSource. This has no effect if Engine#fromMetadata has not been called before.

    returns

    the same instance of the engine

  11. def getBlobs(repositoryIds: Seq[String] = Seq(), referenceNames: Seq[String] = Seq(), commitHashes: Seq[String] = Seq()): DataFrame

    Retrieves the blobs of a list of repositories, reference names and commit hashes. The result is an org.apache.spark.sql.DataFrame of all the blobs in the given commits that are in the given references and belong to the given repositories.

    val blobsDf = engine.getBlobs(repoIds, refNames, hashes)

    Calling this function with no arguments is the same as:

    engine.getRepositories.getReferences.getCommits.getTreeEntries.getBlobs
    repositoryIds

    List of the repository ids to filter by (optional)

    referenceNames

    List of reference names to filter by (optional)

    commitHashes

    List of commit hashes to filter by (optional)

    returns

    org.apache.spark.sql.DataFrame with blobs of the given commits, refs and repos.
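
    The way the three optional filters combine can be illustrated without a Spark session. The sketch below is a plain-Scala model, not the engine's implementation: BlobRow, keeps, filterBlobs and the sample rows are hypothetical names introduced for illustration. It assumes, based on the description above, that an empty filter list leaves that column unfiltered and that the non-empty filters must all match.

```scala
object BlobFilterModel {
  // Simplified stand-in for a blob row; the real DataFrame has more columns.
  final case class BlobRow(repositoryId: String, referenceName: String, commitHash: String)

  // Assumption: an empty filter list means "do not filter on this column".
  private def keeps(filter: Seq[String], value: String): Boolean =
    filter.isEmpty || filter.contains(value)

  // Mirrors the parameter order and defaults of getBlobs on an in-memory Seq.
  def filterBlobs(
      rows: Seq[BlobRow],
      repositoryIds: Seq[String] = Seq(),
      referenceNames: Seq[String] = Seq(),
      commitHashes: Seq[String] = Seq()): Seq[BlobRow] =
    rows.filter { row =>
      keeps(repositoryIds, row.repositoryId) &&
      keeps(referenceNames, row.referenceName) &&
      keeps(commitHashes, row.commitHash)
    }

  // Hypothetical sample data.
  val sample: Seq[BlobRow] = Seq(
    BlobRow("repo-a", "refs/heads/master", "c1"),
    BlobRow("repo-a", "refs/heads/dev", "c2"),
    BlobRow("repo-b", "refs/heads/master", "c3"))
}
```

    With this model, filterBlobs(sample) returns every row, while passing Seq("repo-a") and Seq("refs/heads/master") keeps only the row that satisfies both filters.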

  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. def getRepositories: DataFrame

    Returns a DataFrame with the data about the repositories found at the specified repositories path in the form of siva files. Before calling this method you must set the repositories path, either by calling setRepositoriesPath or, preferably, by instantiating the Engine using the companion object.

    val reposDf = engine.getRepositories
    returns

    DataFrame

  14. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  15. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  16. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  17. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  18. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  19. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  20. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  21. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  22. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  23. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  24. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  25. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  26. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  27. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  28. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  29. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  30. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  31. final def notify(): Unit

    Definition Classes
    AnyRef
  32. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  33. def saveMetadata(path: String, dbName: String = MetadataSource.DefaultDbName): Unit

    Saves all the metadata in a SQLite database in the given path, named "engine_metadata.db" by default. If the database already exists, it will be overwritten. The given path must exist and must be a directory; otherwise a SparkException is thrown. The saved tables are repositories, references, commits and tree_entries. Blobs are not saved.

    path

    where database with the metadata will be stored.

    dbName

    name of the database file

    Exceptions thrown

    SparkException when the given path is not a folder or does not exist.
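
    Because saveMetadata throws when the path is missing or is not a directory, and silently overwrites an existing database, it can help to check both conditions first. The following is a minimal sketch using only java.nio.file. MetadataPaths, checkTargetDirectory and wouldOverwrite are hypothetical helpers, not part of the engine, and the default file name is taken from the description above.

```scala
import java.nio.file.{Files, Paths}

object MetadataPaths {
  // Default database file name, as stated in the documentation above.
  val DefaultDbName = "engine_metadata.db"

  // Hypothetical helper: Right(path) when the path is an existing directory,
  // Left(reason) otherwise.
  def checkTargetDirectory(path: String): Either[String, String] = {
    val dir = Paths.get(path)
    if (!Files.exists(dir)) Left(s"$path does not exist")
    else if (!Files.isDirectory(dir)) Left(s"$path is not a directory")
    else Right(path)
  }

  // Hypothetical helper: true when a database file from an earlier run
  // is already present and would be overwritten.
  def wouldOverwrite(path: String, dbName: String = DefaultDbName): Boolean =
    Files.exists(Paths.get(path, dbName))
}

// Possible usage with an existing engine instance (not executed here):
// MetadataPaths.checkTargetDirectory("/path/to/metadata")
//   .foreach(dir => engine.saveMetadata(dir))
```

    Returning an Either keeps the check separate from the call itself, so the caller decides whether a bad path should be logged, reported or turned into an exception.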

  34. val session: SparkSession


    Spark session to be used

  35. def setRepositoriesFormat(format: String): Engine

    Sets the format of the stored repositories on the specified path.

    Currently supported formats are:

    - siva: to read siva files
    - bare: to read bare repositories
    - standard: to read standard git repositories (with workspace)

    format

    format of the repositories.

    returns

    instance of the engine itself
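
    Since the format is passed as a free-form string, a small guard can catch typos before they reach the engine. This is a sketch only: RepositoryFormats and validate are hypothetical names, the set of formats is copied from the list above, and trimming and lower-casing the input is an assumption made for convenience, not documented engine behaviour.

```scala
object RepositoryFormats {
  // Formats listed in the documentation above.
  val Supported: Set[String] = Set("siva", "bare", "standard")

  // Hypothetical helper: Right(normalised format) for a supported value,
  // Left(error message) for anything else.
  def validate(format: String): Either[String, String] = {
    val normalised = format.trim.toLowerCase
    if (Supported.contains(normalised)) Right(normalised)
    else Left(
      s"Unsupported repositories format '$format'; expected one of: " +
        Supported.toSeq.sorted.mkString(", "))
  }
}

// Possible usage with an existing engine instance (not executed here):
// RepositoryFormats.validate("bare").foreach(f => engine.setRepositoriesFormat(f))
```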

  36. def setRepositoriesPath(path: String): Engine

    Sets the path where the siva files of the repositories are stored. Although this method can be called directly, the proper way to use Engine is to instantiate it using the Engine companion object, which already asks for the path in its apply method. If you instantiated the Engine using the companion object, you don't need to call this unless you want to change the repositories path. Note that setting this affects the session, so any other use of the session outside the Engine instance will also have that config set.

    engine.setRepositoriesPath("/path/to/repositories")
    path

    path of the repositories.

    returns

    instance of the engine itself

  37. def skipCleanup(skip: Boolean): Engine

    Configures the Engine so it won't clean up the unpacked siva files after it's done with them, which avoids having to unpack them again later.

    // disable cleanup
    engine.skipCleanup(true)
    
    // enable cleanup again
    engine.skipCleanup(false)
    skip

    whether to skip cleanup or not

    returns

    instance of the engine itself

  38. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  39. def toString(): String

    Definition Classes
    AnyRef → Any
  40. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  41. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  42. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
