package tech.sourced.engine

Provides the tech.sourced.engine.Engine class, the main entry point for any analysis done with this library, along with some implicits that make it easier to use. In particular, the implicits add methods for joining with other "tables" directly from any org.apache.spark.sql.DataFrame.

import tech.sourced.engine._

val engine = Engine(sparkSession, "/path/to/repositories")

If you prefer not to import everything in the package (it only exposes what is strictly needed, so as not to pollute the user namespace), import just the tech.sourced.engine.Engine class and the tech.sourced.engine.EngineDataFrame implicit class.

import tech.sourced.engine.{Engine, EngineDataFrame}

val engine = Engine(sparkSession, "/path/to/repositories")
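
The EngineDataFrame import matters because Scala only applies an implicit class where that class is in scope. The following is a minimal, library-free sketch of that mechanism. Table, TableOps and joinedWith are hypothetical names invented for illustration; they are not part of tech.sourced.engine or Spark.

```scala
object ImplicitSketch {
  // Hypothetical stand-in for a DataFrame; not a Spark or engine type.
  final case class Table(name: String)

  object syntax {
    // Same pattern as an implicit class such as EngineDataFrame: wrapping a
    // value makes extra methods available on it, but only where this class
    // has been imported.
    implicit class TableOps(table: Table) {
      // Hypothetical method, invented for this sketch.
      def joinedWith(other: String): Table =
        Table(s"${table.name} JOIN $other")
    }
  }
}

object ImplicitSketchUsage {
  import ImplicitSketch.Table
  import ImplicitSketch.syntax._ // without this import, joinedWith does not compile

  val joined: Table = Table("repositories").joinedWith("references")
}
```

Removing the syntax import in ImplicitSketchUsage turns the joinedWith call into a compile error, which is why the selective import shown earlier lists the implicit class explicitly.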
Linear Supertypes
AnyRef, Any

Type Members

  1. class DefaultSource extends RelationProvider with DataSourceRegister

    Default source to provide new git relations.

  2. class Engine extends Logging

    Engine is the main entry point to all usage of the source{d} spark-engine. It has methods to set every configurable option, as well as the methods used to start analysing repositories of code.

    import tech.sourced.engine._
    
    val engine = Engine(sparkSession, "/path/to/repositories")

    NOTE: Keep in mind that you will need to register the UDFs in the session manually if you choose to instantiate this class directly instead of using the companion object.

    import tech.sourced.engine.{Engine, SessionFunctions}
    
    val engine = new Engine(sparkSession)
    sparkSession.registerUDFs()

    The only method available as of now is getRepositories, which generates a DataFrame of repositories, the starting point for analysing repositories of code.

  3. implicit class EngineDataFrame extends AnyRef

    Adds some utility methods to the org.apache.spark.sql.DataFrame class so you can, for example, get the references, commits, etc. from a DataFrame containing repositories.

  4. case class GitRelation(session: SparkSession, schema: StructType, joinConditions: Option[Expression] = None, tableSource: Option[String] = None) extends BaseRelation with CatalystScan with Product with Serializable

    A relation based on git data from rooted repositories in siva files. The data this relation offers depends on the given tableSource, which controls the table that will be accessed. The tech.sourced.engine.rule.GitOptimizer might also merge some table sources into one by squashing joins, in which case the result is the resulting table chained with the previous one through chained iterators.

    session: Spark session
    schema: schema of the relation
    joinConditions: join conditions, if any
    tableSource: source table, if any

  5. case class Join(left: String, right: String, conditions: Seq[JoinCondition]) extends Product with Serializable

  6. case class JoinCondition(leftTable: String, leftCol: String, rightTable: String, rightCol: String) extends Product with Serializable

  7. case class MetadataRelation(session: SparkSession, schema: StructType, dbPath: String, joinConditions: Option[Expression] = None, tableSource: Option[String] = None) extends BaseRelation with CatalystScan with Product with Serializable

  8. class MetadataSource extends RelationProvider with DataSourceRegister

    Data source to provide new metadata relations.

  9. implicit class SessionFunctions extends AnyRef

    Implicit class that adds some functions to the org.apache.spark.sql.SparkSession.
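
Join and JoinCondition are listed above without a description. As a rough illustration, and assuming each JoinCondition pairs a column of the left table with a column of the right table for equality, the sketch below uses local stand-ins with the same fields as the listed signatures. The describe helper, the table names and the column names are hypothetical and only serve the example; how the library itself consumes these classes is not documented here.

```scala
object JoinSketch {
  // Local stand-ins mirroring the fields of the signatures listed above.
  final case class JoinCondition(
      leftTable: String,
      leftCol: String,
      rightTable: String,
      rightCol: String
  )

  final case class Join(left: String, right: String, conditions: Seq[JoinCondition])

  // Hypothetical helper: renders a join as SQL-like text for inspection.
  def describe(join: Join): String = {
    val on = join.conditions
      .map(c => s"${c.leftTable}.${c.leftCol} = ${c.rightTable}.${c.rightCol}")
      .mkString(" AND ")
    s"${join.left} JOIN ${join.right} ON $on"
  }

  // Assumed table and column names, chosen only for the example.
  val example: Join = Join(
    "repositories",
    "references",
    Seq(JoinCondition("repositories", "id", "references", "repository_id"))
  )
}
```

Calling JoinSketch.describe(JoinSketch.example) renders the single condition as an equality between the two qualified column names.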

Value Members

  1. val BlobsTable: String

  2. val CommitsTable: String

  3. object DefaultSource

    Contains constants used by the DefaultSource class.

  4. val DefaultSourceName: String

  5. object Engine

    Factory for tech.sourced.engine.Engine instances.

  6. object MetadataSource

  7. val MetadataSourceName: String

  8. val ReferencesTable: String

  9. val RepositoriesTable: String

  10. val RepositoryHasCommitsTable: String

  11. object Sources

    Defines the hierarchy between data sources.

  12. object Tables

  13. val TreeEntriesTable: String

  14. package iterator

  15. def parseUASTNode(data: Array[Byte]): Node

    Create a new Node from a binary-encoded node as a byte array.

    data: binary-encoded node as byte array
    returns: parsed Node

  16. package provider

  17. package rule

  18. package udf

  19. package util

