io.projectglow
object functions

Functions provided by Glow. These functions can be used with Spark's DataFrame API.

Linear Supertypes
AnyRef, Any
Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def add_struct_fields(struct: Column, fields: Column*): Column

    Adds fields to a struct.

    struct

    The struct to which fields will be added

    fields

    The new fields to add. The arguments must alternate between string-typed literal field names and field values.

    returns

    A struct consisting of the input struct and the added fields

    Since

    0.3.0

  5. def aggregate_by_index(arr: Column, initialValue: Column, update: (Column, Column) ⇒ Column, merge: (Column, Column) ⇒ Column): Column
  6. def aggregate_by_index(arr: Column, initialValue: Column, update: (Column, Column) ⇒ Column, merge: (Column, Column) ⇒ Column, evaluate: (Column) ⇒ Column): Column

    Computes custom per-sample aggregates.

    arr

    array of values.

    initialValue

    the initial value

    update

    update function

    merge

    merge function

    evaluate

    evaluate function

    returns

    An array of aggregated values. The number of elements in the array is equal to the number of samples.

    Since

    0.3.0
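The per-sample fold can be sketched in plain Python (no Spark); the single list of rows stands in for one partition, so the merge function, which combines partial aggregates across partitions, never fires in this sketch:

```python
def aggregate_by_index(rows, initial_value, update, merge, evaluate=lambda acc: acc):
    # rows: equal-length arrays; element i of each array belongs to sample i.
    # merge combines partial aggregates from different partitions; with a
    # single "partition" (this list), it is never invoked here.
    accumulators = [initial_value] * len(rows[0])
    for row in rows:
        accumulators = [update(acc, value) for acc, value in zip(accumulators, row)]
    return [evaluate(acc) for acc in accumulators]

# Per-sample sums over two rows of values
per_sample_sums = aggregate_by_index(
    [[1, 2, 3], [10, 20, 30]],
    0,
    update=lambda acc, v: acc + v,
    merge=lambda a, b: a + b,
)
```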

  7. def array_summary_stats(arr: Column): Column

    Computes the minimum, maximum, mean, and standard deviation for an array of numerics.

    arr

    An array of any numeric type

    returns

    A struct containing double mean, stdDev, min, and max fields

    Since

    0.3.0
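A plain-Python sketch of the returned statistics (assuming, as with Spark's stddev, the sample standard deviation with an n - 1 denominator):

```python
import math

def array_summary_stats(arr):
    # Sketch of the fields in the returned struct for one array of numerics.
    n = len(arr)
    mean = sum(arr) / n
    # Sample standard deviation (n - 1 denominator), as Spark's stddev computes.
    std_dev = math.sqrt(sum((x - mean) ** 2 for x in arr) / (n - 1)) if n > 1 else float("nan")
    return {"mean": mean, "stdDev": std_dev, "min": min(arr), "max": max(arr)}

stats = array_summary_stats([1.0, 2.0, 3.0, 4.0])
```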

  8. def array_to_dense_vector(arr: Column): Column

    Converts an array of numerics into a spark.ml DenseVector.

    arr

    The array of numerics

    returns

    A spark.ml DenseVector

    Since

    0.3.0

  9. def array_to_sparse_vector(arr: Column): Column

    Converts an array of numerics into a spark.ml SparseVector.

    arr

    The array of numerics

    returns

    A spark.ml SparseVector

    Since

    0.3.0
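The sparse representation can be sketched without Spark as the (size, indices, values) triplet that spark.ml's SparseVector stores:

```python
def array_to_sparse_vector(arr):
    # Keep only nonzero entries, as spark.ml's SparseVector does.
    # Returns (size, indices, values), mirroring SparseVector's components.
    indices = [i for i, v in enumerate(arr) if v != 0.0]
    return (len(arr), indices, [arr[i] for i in indices])

size, indices, values = array_to_sparse_vector([0.0, 3.0, 0.0, 1.5])
```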

  10. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  11. def call_summary_stats(genotypes: Column): Column

    Computes call summary statistics for an array of genotype structs. See :ref:`variant-qc` for more details.

    genotypes

    The array of genotype structs with calls field

    returns

    A struct containing callRate, nCalled, nUncalled, nHet, nHomozygous, nNonRef, nAllelesCalled, alleleCounts, alleleFrequencies fields. See :ref:`variant-qc`.

    Since

    0.3.0
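A plain-Python sketch of a few of the returned fields for a single variant; treating a genotype with any -1 in its calls array as uncalled is an assumption of this sketch:

```python
from collections import Counter

def call_summary_stats(calls_arrays, n_alleles):
    # calls_arrays: one calls array per genotype; -1 marks a no-call.
    # n_alleles: reference allele plus alternate alleles.
    counts = Counter()
    n_called = n_uncalled = 0
    for calls in calls_arrays:
        if -1 in calls:
            n_uncalled += 1
        else:
            n_called += 1
            counts.update(calls)
    n_alleles_called = sum(counts.values())
    allele_counts = [counts.get(i, 0) for i in range(n_alleles)]
    return {
        "callRate": n_called / (n_called + n_uncalled),
        "nCalled": n_called,
        "nUncalled": n_uncalled,
        "nAllelesCalled": n_alleles_called,
        "alleleCounts": allele_counts,
        "alleleFrequencies": [c / n_alleles_called for c in allele_counts],
    }

stats = call_summary_stats([[0, 1], [1, 1], [-1, -1]], n_alleles=2)
```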

  12. def clone(): AnyRef
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @native() @throws( ... )
  13. def dp_summary_stats(genotypes: Column): Column

    Computes summary statistics for the depth field from an array of genotype structs. See :ref:`variant-qc`.

    genotypes

    An array of genotype structs with depth field

    returns

    A struct containing mean, stdDev, min, and max of genotype depths

    Since

    0.3.0

  14. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  15. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  16. def expand_struct(struct: Column): Column

    Promotes fields of a nested struct to top-level columns, similar to using struct.* from SQL, but can be used in more contexts.

    struct

    The struct to expand

    returns

    Columns corresponding to fields of the input struct

    Since

    0.3.0

  17. def explode_matrix(matrix: Column): Column

    Explodes a spark.ml Matrix (sparse or dense) into multiple arrays, one per row of the matrix.

    matrix

    The spark.ml Matrix to explode

    returns

    An array column in which each row is a row of the input matrix

    Since

    0.3.0
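A plain-Python sketch for the dense case, assuming the column-major value layout that spark.ml's DenseMatrix uses by default:

```python
def explode_matrix(values, num_rows, num_cols):
    # Split a column-major dense matrix into one array per row.
    return [
        [values[col * num_rows + row] for col in range(num_cols)]
        for row in range(num_rows)
    ]

# 2x2 matrix [[1, 2], [3, 4]] stored column-major as [1, 3, 2, 4]
rows = explode_matrix([1.0, 3.0, 2.0, 4.0], num_rows=2, num_cols=2)
```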

  18. def finalize(): Unit
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  19. def genotype_states(genotypes: Column): Column

    Gets the number of alternate alleles for an array of genotype structs. Returns -1 if there are any -1s (no-calls) in the calls array.

    genotypes

    An array of genotype structs with calls field

    returns

    An array of integers containing the number of alternate alleles in each call array

    Since

    0.3.0
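The per-genotype logic can be sketched in plain Python; counting every nonzero allele index as alternate (the multiallelic case) is an assumption of this sketch:

```python
def genotype_states(calls_arrays):
    # For each calls array: -1 if it contains a no-call, otherwise the
    # number of alternate (nonzero) alleles.
    states = []
    for calls in calls_arrays:
        if -1 in calls:
            states.append(-1)
        else:
            states.append(sum(1 for allele in calls if allele != 0))
    return states

states = genotype_states([[0, 0], [0, 1], [1, 1], [-1, 1]])
```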

  20. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  21. def gq_summary_stats(genotypes: Column): Column

    Computes summary statistics about the genotype quality field for an array of genotype structs. See :ref:`variant-qc`.

    genotypes

    The array of genotype structs with conditionalQuality field

    returns

    A struct containing mean, stdDev, min, and max of genotype qualities

    Since

    0.3.0

  22. def hard_calls(probabilities: Column, numAlts: Column, phased: Column): Column
  23. def hard_calls(probabilities: Column, numAlts: Column, phased: Column, threshold: Double): Column

    Converts an array of probabilities to hard calls. The probabilities are assumed to be diploid. See :ref:`variant-data-transformations` for more details.

    probabilities

    The array of probabilities to convert

    numAlts

    The number of alternate alleles

    phased

    Whether the probabilities are phased. If phased, we expect 2 * numAlts values in the probabilities array. If unphased, we expect one probability per possible genotype.

    threshold

    The minimum probability to make a call. If no probability falls into the range of [0, 1 - threshold] or [threshold, 1], a no-call (represented by -1s) will be emitted. If not provided, this parameter defaults to 0.9.

    returns

    An array of hard calls

    Since

    0.3.0
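A plain-Python sketch of the thresholding for the unphased, biallelic, diploid case only; the function name and genotype ordering are illustrative assumptions:

```python
def hard_calls_biallelic_unphased(probabilities, threshold=0.9):
    # probabilities = [P(0/0), P(0/1), P(1/1)]. Emit the genotype whose
    # probability meets the threshold; otherwise a no-call ([-1, -1]).
    genotypes = [[0, 0], [0, 1], [1, 1]]
    for genotype, p in zip(genotypes, probabilities):
        if p >= threshold:
            return genotype
    return [-1, -1]

confident = hard_calls_biallelic_unphased([0.02, 0.95, 0.03])
ambiguous = hard_calls_biallelic_unphased([0.40, 0.35, 0.25])
```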

  24. def hardy_weinberg(genotypes: Column): Column

    Computes statistics relating to the Hardy-Weinberg equilibrium. See :ref:`variant-qc` for more details.

    genotypes

    The array of genotype structs with calls field

    returns

    A struct containing two fields, hetFreqHwe (the expected heterozygous frequency according to Hardy-Weinberg equilibrium) and pValueHwe (the associated p-value)

    Since

    0.3.0
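The hetFreqHwe field is the expected heterozygote frequency under equilibrium; below is a sketch of the textbook 2pq computation from genotype counts (Glow's implementation, and the exact test behind pValueHwe, may apply finite-sample corrections not shown here):

```python
def het_freq_hwe(n_hom_ref, n_het, n_hom_alt):
    # Expected heterozygote frequency 2pq under Hardy-Weinberg equilibrium,
    # from observed diploid genotype counts at a biallelic site.
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    return 2 * p * q

expected_het = het_freq_hwe(n_hom_ref=25, n_het=50, n_hom_alt=25)
```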

  25. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  26. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  27. def lift_over_coordinates(contigName: Column, start: Column, end: Column, chainFile: String): Column
  28. def lift_over_coordinates(contigName: Column, start: Column, end: Column, chainFile: String, minMatchRatio: Double): Column

    Performs liftover for the coordinates of a variant. To perform liftover of alleles and add additional metadata, see :ref:`liftover`.

    contigName

    The current contig name

    start

    The current start

    end

    The current end

    chainFile

    Location of the chain file on each node in the cluster

    minMatchRatio

    Minimum fraction of bases that must remap to do liftover successfully. If not provided, defaults to 0.95.

    returns

    A struct containing contigName, start, and end fields after liftover

    Since

    0.3.0

  29. def linear_regression_gwas(genotypes: Column, phenotypes: Column, covariates: Column): Column

    Performs a linear regression association test optimized for performance in a GWAS setting. See :ref:`linear-regression` for details.

    genotypes

    A numeric array of genotypes

    phenotypes

    A numeric array of phenotypes

    covariates

    A spark.ml Matrix of covariates

    returns

    A struct containing beta, standardError, and pValue fields. See :ref:`linear-regression`.

    Since

    0.3.0
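For intuition, a plain-Python sketch of the effect-size estimate with no covariates; Glow additionally projects out the covariate matrix and reports standardError and pValue, which this sketch omits:

```python
def simple_linreg_beta(genotypes, phenotypes):
    # Ordinary least squares of phenotype on genotype, no covariates:
    # beta = cov(g, y) / var(g).
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    covariance = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    variance = sum((g - mean_g) ** 2 for g in genotypes)
    return covariance / variance

beta = simple_linreg_beta([0, 1, 2, 1], [0.1, 2.0, 4.1, 1.8])
```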

  30. def logistic_regression_gwas(genotypes: Column, phenotypes: Column, covariates: Column, test: String): Column

    Performs a logistic regression association test optimized for performance in a GWAS setting. See :ref:`logistic-regression` for more details.

    genotypes

    A numeric array of genotypes

    phenotypes

    A double array of phenotype values

    covariates

    A spark.ml Matrix of covariates

    test

    Which logistic regression test to use. Can be LRT or Firth

    returns

    A struct containing beta, oddsRatio, waldConfidenceInterval, and pValue fields. See :ref:`logistic-regression`.

    Since

    0.3.0

  31. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  32. def normalize_variant(contigName: Column, start: Column, end: Column, refAllele: Column, altAlleles: Column, refGenomePathString: String): Column

    Normalizes the variant with a behavior similar to vt normalize or bcftools norm. Creates a StructType column including the normalized start, end, referenceAllele and alternateAlleles fields (whether they are changed or unchanged as the result of normalization) as well as a StructType field called normalizationStatus that contains the following fields:

    changed: A boolean field indicating whether the variant data was changed as a result of normalization

    errorMessage: An error message in case the attempt at normalizing the row hit an error. In this case, the changed field will be set to false. If no errors occur, this field will be null.

    In case of an error, the start, end, referenceAllele and alternateAlleles fields in the generated struct will be null.

    contigName

    The current contig name

    start

    The current start

    end

    The current end

    refAllele

    The current reference allele

    altAlleles

    The current array of alternate alleles

    refGenomePathString

    A path to the reference genome .fasta file. The .fasta file must be accompanied by a .fai index file in the same folder.

    returns

    A struct as explained above

    Since

    0.3.0
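The parsimony (trimming) step of vt-style normalization can be sketched without the reference genome; the left-extension step against the .fasta, which full normalization also requires, is omitted:

```python
def trim_alleles(start, ref_allele, alt_alleles):
    # Strip bases shared by the ends of all alleles: right first, then left
    # (advancing start), stopping while every allele still has at least one base.
    alleles = [ref_allele] + list(alt_alleles)
    while all(len(a) > 1 for a in alleles) and len({a[-1] for a in alleles}) == 1:
        alleles = [a[:-1] for a in alleles]
    while all(len(a) > 1 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        start += 1
    return start, alleles[0], alleles[1:]

# GCAC > GTAC at position 100 reduces to the SNP C > T at position 101
start, ref, alts = trim_alleles(100, "GCAC", ["GTAC"])
```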

  33. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  34. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  35. def sample_call_summary_stats(genotypes: Column, refAllele: Column, alternateAlleles: Column): Column

    Computes per-sample call summary statistics. See :ref:`sample-qc` for more details.

    genotypes

    An array of genotype structs with calls field

    refAllele

    The reference allele

    alternateAlleles

    An array of alternate alleles

    returns

    A struct containing sampleId, callRate, nCalled, nUncalled, nHomRef, nHet, nHomVar, nSnp, nInsertion, nDeletion, nTransition, nTransversion, nSpanningDeletion, rTiTv, rInsertionDeletion, rHetHomVar fields. See :ref:`sample-qc`.

    Since

    0.3.0

  36. def sample_dp_summary_stats(genotypes: Column): Column

    Computes per-sample summary statistics about the depth field in an array of genotype structs.

    genotypes

    An array of genotype structs with depth field

    returns

    An array of structs where each struct contains mean, stDev, min, and max of the genotype depths for a sample. If sampleId is present in a genotype, it will be propagated to the resulting struct as an extra field.

    Since

    0.3.0

  37. def sample_gq_summary_stats(genotypes: Column): Column

    Computes per-sample summary statistics about the genotype quality field in an array of genotype structs.

    genotypes

    An array of genotype structs with conditionalQuality field

    returns

    An array of structs where each struct contains mean, stDev, min, and max of the genotype qualities for a sample. If sampleId is present in a genotype, it will be propagated to the resulting struct as an extra field.

    Since

    0.3.0

  38. def subset_struct(struct: Column, fields: String*): Column

    Selects fields from a struct.

    struct

    Struct from which to select fields

    fields

    Fields to select

    returns

    A struct containing only the indicated fields

    Since

    0.3.0

  39. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  40. def toString(): String
    Definition Classes
    AnyRef → Any
  41. def vector_to_array(vector: Column): Column

    Converts a spark.ml Vector (sparse or dense) to an array of doubles.

    vector

    Vector to convert

    returns

    An array of doubles

    Since

    0.3.0
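A plain-Python sketch for the sparse case; a DenseVector's values are already the dense array:

```python
def sparse_vector_to_array(size, indices, values):
    # Scatter the stored values into a dense list of doubles;
    # positions not listed in indices are 0.0.
    arr = [0.0] * size
    for i, v in zip(indices, values):
        arr[i] = v
    return arr

arr = sparse_vector_to_array(4, [1, 3], [3.0, 1.5])
```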

  42. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  43. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  44. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @throws( ... )
