Registers in the current session the views of the MetadataSource so the data is obtained from the metadata database instead of reading the repositories with the DefaultSource.
path to the folder that contains the database.
name of the database file ("engine_metadata.db" by default).
the same instance of the engine
Registers in the current session the views of the DefaultSource so the data is obtained by reading the repositories instead of reading from the MetadataSource. This has no effect if Engine#fromMetadata has not been called before.
the same instance of the engine
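As a sketch of how these two calls interact (assuming the engine was built with the companion object, that the metadata database was saved earlier, and that the method that reverts to the DefaultSource is named fromRepositories, which is an assumption not stated in this doc):

```scala
// Hypothetical sketch: switch the session to read from the metadata
// database, then back to reading the repositories directly.
// The path and the method name `fromRepositories` are assumptions.
val metadataEngine = engine.fromMetadata("/path/to/metadata-folder")
val reposFromDb = metadataEngine.getRepositories // served by the MetadataSource

val defaultEngine = metadataEngine.fromRepositories
val reposFromSiva = defaultEngine.getRepositories // read from the siva files again
```

Both calls return the same engine instance, so they can be chained with the rest of the API.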
Retrieves the blobs of a list of repositories, reference names and commit hashes. The result will be an org.apache.spark.sql.DataFrame of all the blobs in the given commits that are in the given references and belong to the given repositories.
val blobsDf = engine.getBlobs(repoIds, refNames, hashes)

Calling this function with no arguments is the same as:
engine.getRepositories.getReferences.getCommits.getTreeEntries.getBlobs
List of the repository ids to filter by (optional)
List of reference names to filter by (optional)
List of commit hashes to filter by (optional)
org.apache.spark.sql.DataFrame with blobs of the given commits, refs and repos.
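A hedged sketch of a filtered call, where the repository id, reference name and commit hash below are placeholders rather than real data, and passing the filters as plain lists assumes the parameter order described above:

```scala
// Sketch: only blobs reachable from master of one repository at one commit.
// All three filter values are placeholders, not real data.
val blobsDf = engine.getBlobs(
  List("github.com/foo/bar"),  // repository ids
  List("refs/heads/master"),   // reference names
  List("0123456789abcdef...")  // commit hashes
)
blobsDf.show()
```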
Returns a DataFrame with the data about the repositories found at the specified repositories path in the form of siva files. To call this method you need to have set the repositories path beforehand; you can do so by calling setRepositoriesPath or, preferably, by instantiating the Engine using the companion object.
val reposDf = engine.getRepositoriesDataFrame
Saves all the metadata in a SQLite database on the given path as "engine_metadata.db".
Saves all the metadata in a SQLite database on the given path as "engine_metadata.db". If the database already exists, it will be overwritten. The given path must exist and must be a directory, otherwise it will throw a SparkException. Saved tables are repositories, references, commits and tree_entries. Blobs are not saved.
where the database with the metadata will be stored.
name of the database file
SparkException when the given path is not a folder or does not exist.
Spark session to be used
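A minimal sketch of saving the metadata, assuming the method behind this entry is named saveMetadata and takes the folder path; the folder itself is hypothetical and must exist beforehand, since the method throws otherwise:

```scala
import java.nio.file.{Files, Paths}

// Hypothetical folder; saveMetadata requires it to exist and be a directory.
val metadataDir = "/tmp/engine-metadata"
Files.createDirectories(Paths.get(metadataDir))

// Writes engine_metadata.db with the repositories, references,
// commits and tree_entries tables (blobs are not saved).
engine.saveMetadata(metadataDir)
```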
Sets the format of the stored repositories on the specified path.
Currently compatible formats are:
- siva: to read siva files
- bare: to read bare repositories
- standard: to read standard git repositories (with workspace)
format of the repositories.
instance of the engine itself
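A sketch of setting the format, assuming the setter behind this entry is named setRepositoriesFormat and, since it returns the engine itself, can be chained with setRepositoriesPath; the method names and path are illustrative assumptions:

```scala
// Read standard git repositories (with workspace) instead of siva files.
// Method names and the path are assumptions for illustration.
engine
  .setRepositoriesFormat("standard")
  .setRepositoriesPath("/path/to/repositories")
```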
Sets the path where the siva files of the repositories are stored. Although this can be called directly, the proper way to use Engine is to instantiate it using the Engine companion object, which already asks for the path in its apply method. If you already instantiated the API instance using the Engine companion object, you don't need to call this unless you want to change the repositories path. Note that setting this will affect the session, so any other uses of the session outside the Engine instance will also have that config set.
engine.setRepositoriesPath("/path/to/repositories")of the repositories.
instance of the engine itself
Configures the Engine so it won't clean up the unpacked siva files after it's done with them, to avoid having to unpack them again afterwards.
// disable cleanup
engine.skipCleanup(true)

// enable cleanup again
engine.skipCleanup(false)
whether to skip cleanup or not
instance of the engine itself
Engine is the main entry point to all usage of the source{d} spark-engine. It has methods to configure all possible configurable options as well as the available methods to start analysing repositories of code.
NOTE: Keep in mind that you will need to register the UDFs in the session manually if you choose to instantiate this class directly instead of using the companion object.
The only method available as of now is getRepositories, which will generate a DataFrame of repositories, the very first thing you need in order to analyse repositories of code.
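A sketch of the recommended instantiation through the companion object, which registers the UDFs and sets the repositories path for you; the package name and the apply signature (session, path, format) are assumptions based on the published spark-engine artifacts, not stated in this doc:

```scala
import org.apache.spark.sql.SparkSession
import tech.sourced.engine._

val spark = SparkSession.builder()
  .appName("engine-example")
  .master("local[*]")
  .getOrCreate()

// The companion object's apply registers the UDFs and session config,
// so there is no need to do it manually. "siva" is the repositories format.
val engine = Engine(spark, "/path/to/siva-files", "siva")

// The first step of any analysis: a DataFrame of repositories.
engine.getRepositories.show()
```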