package expressions
Type Members
- case class AddStructFields(struct: Expression, newFields: Seq[Expression]) extends Expression with RewriteAfterResolution with Product with Serializable
Expression that adds fields to an existing struct.
At optimization time, this expression is rewritten as the creation of a new struct containing all the fields of the existing struct as well as the new fields. See io.projectglow.sql.optimizer.ReplaceExpressionsRule for more details.
- trait AggregateByIndex extends DeclarativeAggregate with HigherOrderFunction
An expression that allows users to aggregate over all array elements at a specific index in an array column. For example, this expression can be used to compute per-sample summary statistics from a genotypes column.
The user must provide the following arguments:
  - The array for aggregation
  - The initialValue for each element in the per-index buffer
  - An update function to update the buffer with a new element
  - A merge function to combine two buffers
The user may optionally provide an evaluate function. If it's not provided, the identity function is used.
Example usage to calculate average depth across all sites for a sample:
  aggregate_by_index(
    genotypes,
    named_struct('sum', 0l, 'count', 0l),
    (buf, genotype) -> named_struct('sum', buf.sum + genotype.depth, 'count', buf.count + 1),
    (buf1, buf2) -> named_struct('sum', buf1.sum + buf2.sum, 'count', buf1.count + buf2.count),
    buf -> buf.sum / buf.count)
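The contract described above (initial value, update, merge, optional evaluate) can be modeled with plain Scala collections. This is only a sketch of the semantics, not Glow's Catalyst implementation: the object name, the `Buf` class, and the representation of partitions as nested sequences are all assumptions made for illustration.

```scala
// Plain-Scala model of per-index aggregation (hypothetical names throughout).
object AggregateByIndexSketch {
  // Running buffer kept for each array index, i.e. one per sample.
  final case class Buf(sum: Long, count: Long)

  /** Aggregates element i of every row into buffer i, then merges the
    * per-partition buffers index by index and applies `evaluate`. */
  def aggregateByIndex[A, B, R](
      partitions: Seq[Seq[Seq[A]]],
      initialValue: B,
      update: (B, A) => B,
      merge: (B, B) => B,
      evaluate: B => R): Seq[R] = {
    // One vector of partial buffers per non-empty partition.
    val partials: Seq[Vector[B]] = partitions.filter(_.nonEmpty).map { rows =>
      val init = Vector.fill(rows.head.length)(initialValue)
      rows.foldLeft(init) { (bufs, row) =>
        bufs.zip(row).map { case (buf, elem) => update(buf, elem) }
      }
    }
    partials
      .reduceOption((l, r) => l.zip(r).map { case (x, y) => merge(x, y) })
      .getOrElse(Vector.empty[B])
      .map(evaluate)
  }

  // Mean depth per sample: each inner Seq is one site, each element one sample.
  val meanDepths: Seq[Double] = aggregateByIndex[Int, Buf, Double](
    Seq(Seq(Seq(10, 20), Seq(30, 40)), Seq(Seq(20, 30))),
    Buf(0L, 0L),
    (buf: Buf, depth: Int) => Buf(buf.sum + depth, buf.count + 1),
    (b1: Buf, b2: Buf) => Buf(b1.sum + b2.sum, b1.count + b2.count),
    (buf: Buf) => buf.sum.toDouble / buf.count)
}
```

The merge function is what lets partial buffers computed on different partitions be combined, which is why it is required even though a single-partition fold would not need it.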
- case class ArrayStatsSummary(array: Expression) extends UnaryExpression with ImplicitCastInputTypes with Product with Serializable
- case class ArrayToDenseVector(child: Expression) extends UnaryExpression with ImplicitCastInputTypes with Product with Serializable
- case class ArrayToSparseVector(child: Expression) extends UnaryExpression with ImplicitCastInputTypes with Product with Serializable
- case class CallStats(genotypes: Expression) extends UnaryExpression with ExpectsGenotypeFields with Product with Serializable
- case class CallStatsStruct(callRate: Double, nCalled: Int, nUncalled: Int, nHet: Int, nHomozygous: Array[Int], nNonRef: Int, nAllelesCalled: Int, alleleCounts: Array[Int], alleleFrequencies: Array[Double]) extends Product with Serializable
- case class CallSummaryStats(genotypes: Expression, refAllele: Expression, altAlleles: Expression, mutableAggBufferOffset: Int, inputAggBufferOffset: Int) extends TypedImperativeAggregate[ArrayBuffer[SampleCallStats]] with ExpectsGenotypeFields with GlowLogging with Product with Serializable
Computes summary statistics per-sample in a genomic cohort. These statistics include the call rate and the number of different types of variants.
The return type is an array of summary statistics. If sample ids are included in the input schema, they'll be propagated to the results.
- case class CovariateQRContext(covQt: DenseMatrix[Double], degreesOfFreedom: Int) extends Product with Serializable
Context that can be computed once for all variant sites for a linear regression GWAS analysis.
- case class DpSummaryStats(child: Expression) extends Expression with Rewrite with Product with Serializable
- case class ExpandStruct(struct: Expression) extends Expression with Unevaluable with Product with Serializable
Expands all the fields of a potentially unnamed struct.
- case class ExplodeMatrix(matrixExpr: Expression) extends Expression with Generator with CodegenFallback with ExpectsInputTypes with Product with Serializable
Explodes a matrix by row. Each row of the input matrix will be output as an array of doubles.
If the input expression is null or has 0 rows, the output will be empty.
- matrixExpr
The matrix to explode. May be dense or sparse.
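The row-wise explode behavior can be illustrated without Spark. The sketch below uses a hypothetical `ColMajorMatrix` case class as a stand-in for a dense matrix stored in column-major order, and models a null input as `None`; neither is part of Glow's API.

```scala
object ExplodeMatrixSketch {
  // Hypothetical stand-in for a dense matrix with column-major storage.
  final case class ColMajorMatrix(numRows: Int, numCols: Int, values: Array[Double])

  /** Emits one array of doubles per matrix row; a missing or 0-row matrix yields nothing. */
  def explodeByRow(matrix: Option[ColMajorMatrix]): Seq[Array[Double]] = matrix match {
    case None => Seq.empty
    case Some(m) =>
      (0 until m.numRows).map { r =>
        // Element (r, c) lives at offset c * numRows + r in column-major layout.
        Array.tabulate(m.numCols)(c => m.values(c * m.numRows + r))
      }
  }
}
```

For a 2 x 3 matrix with rows (1, 2, 3) and (4, 5, 6), the column-major values are 1, 4, 2, 5, 3, 6, and the sketch returns the two rows in order.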
- case class FirthFit(fitState: FirthNewtonArgs, logLkhd: Double, converged: Boolean, exploded: Boolean) extends Product with Serializable
- case class FirthFitState(x: DenseMatrix[Double], nullFitArgs: FirthNewtonArgs, fullFitArgs: FirthNewtonArgs) extends Product with Serializable
- class FirthNewtonArgs extends AnyRef
- case class GenotypeStates(genotypes: Expression) extends UnaryExpression with ExpectsGenotypeFields with Product with Serializable
Converts a complex genotype array into an array of ints, where each element is the sum of the calls array for the sample at that position if no calls are missing, or -1 if any calls are missing.
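As a rough illustration of that rule, the following sketch works on plain sequences of calls rather than Spark rows. It assumes a missing call is encoded as -1 in each sample's calls array; the object and method names are hypothetical.

```scala
object GenotypeStatesSketch {
  /** Sum of the calls for one sample, or -1 if any call is missing.
    * Assumption: a missing call is encoded as -1. */
  def genotypeState(calls: Seq[Int]): Int =
    if (calls.contains(-1)) -1 else calls.sum

  /** Applies the rule to every sample in a genotypes array. */
  def genotypeStates(genotypes: Seq[Seq[Int]]): Seq[Int] =
    genotypes.map(genotypeState)
}
```

For diploid biallelic calls this maps 0/0 to 0, 0/1 to 1, and 1/1 to 2, which is the usual number-of-alternate-alleles encoding.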
- case class GqSummaryStats(child: Expression) extends Expression with Rewrite with Product with Serializable
- case class HardCalls(probabilities: Expression, numAlts: Expression, phased: Expression, threshold: Expression) extends Expression with CodegenFallback with ImplicitCastInputTypes with Product with Serializable
Converts an array of probabilities (most likely the genotype probabilities from a BGEN file) into hard calls. The input probabilities are assumed to be diploid.
If the input probabilities are phased, each haplotype is called separately by finding the maximum probability greater than the threshold (0.9 by default, following PLINK). If no probability is greater than the threshold, the call is -1 (missing).
If the input probabilities are unphased, the probabilities refer to the complete genotype. In this case, we find the maximum probability greater than the threshold and then convert that value to a genotype call.
If any of the required parameters (probabilities, numAlts, phased) are null, the expression returns null.
- probabilities
The probabilities to convert to hard calls. The algorithm does not check that they sum to 1. If the probabilities are unphased, they are assumed to correspond to the genotypes in colex order, which is standard for both BGEN and VCF files.
- numAlts
The number of alternate alleles at this site.
- phased
Whether the probabilities are phased (per haplotype) or unphased (whole genotype).
- threshold
Calls are only generated if at least one probability is above this threshold.
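The calling rules above can be sketched in plain Scala for the diploid case. The layout assumed for phased input (the first numAlts + 1 probabilities belong to the first haplotype, the rest to the second) and all names below are assumptions for illustration; null handling is omitted.

```scala
object HardCallsSketch {
  /** Index of the largest probability if it exceeds `threshold`, otherwise -1 (missing). */
  private def callIndex(probs: IndexedSeq[Double], threshold: Double): Int = {
    val best = probs.indices.foldLeft(-1) { (b, i) =>
      if (b < 0 || probs(i) > probs(b)) i else b
    }
    if (best >= 0 && probs(best) > threshold) best else -1
  }

  /** Decodes a colex-ordered diploid genotype index into (a1, a2) with a1 <= a2.
    * Order: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ... */
  def colexToAlleles(idx: Int): (Int, Int) = {
    var a2 = 0
    while ((a2 + 1) * (a2 + 2) / 2 <= idx) a2 += 1
    (idx - a2 * (a2 + 1) / 2, a2)
  }

  /** Diploid hard calls. `numAlts` is only used here to split phased input per haplotype. */
  def hardCalls(
      probs: IndexedSeq[Double],
      numAlts: Int,
      phased: Boolean,
      threshold: Double = 0.9): Seq[Int] = {
    if (phased) {
      // One independent call per haplotype.
      probs.grouped(numAlts + 1).map(callIndex(_, threshold)).toList
    } else {
      // One call for the whole genotype, then decode it into two alleles.
      val idx = callIndex(probs, threshold)
      if (idx < 0) Seq(-1, -1)
      else {
        val (a1, a2) = colexToAlleles(idx)
        Seq(a1, a2)
      }
    }
  }
}
```

As in the description, the sketch never checks that the probabilities sum to 1.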
- case class HardyWeinberg(genotypes: Expression) extends UnaryExpression with ExpectsGenotypeFields with Product with Serializable
- case class HardyWeinbergStruct(hetFreqHwe: Double, pValueHwe: Double) extends Product with Serializable
- case class LRTFitState(x: DenseMatrix[Double], hessian: DenseMatrix[Double], nullFit: NewtonResult, newtonState: NewtonIterationsState) extends Product with Serializable
- case class LiftOverCoordinatesExpr(contigName: Expression, start: Expression, end: Expression, chainFile: Expression, minMatchRatio: Expression) extends Expression with CodegenFallback with ImplicitCastInputTypes with Product with Serializable
Performs lift over from the specified 0-start, half-open interval (contigName, start, end) on the reference sequence to a query sequence, using the specified chain file and minimum fraction of bases that must remap.
We assume the chain file is a constant value so that the LiftOver object can be reused between rows.
If any of the required parameters (contigName, start, end) are null, the expression returns null. If minMatchRatioOpt contains null, the expression returns null; if it is empty, we use 0.95 to match LiftOver.DEFAULT_LIFTOVER_MINMATCH.
- contigName
Chromosome name on the reference sequence.
- start
Start position (0-start) on the reference sequence.
- end
End position on the reference sequence.
- chainFile
UCSC chain format file mapping blocks from the reference sequence to the query sequence.
- minMatchRatio
The minimum fraction of bases that must remap to lift over successfully.
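To make the coordinate convention and the match ratio concrete, here is a small sketch that does not perform any actual lift over. It shows how a 0-start, half-open interval relates to a 1-based closed one, and one plausible reading of the minimum match check. The names and the exact comparison are assumptions, not the behavior of the underlying LiftOver library.

```scala
object LiftOverIntervalSketch {
  // Default mentioned in the description above.
  val DefaultMinMatchRatio: Double = 0.95

  /** A 0-start, half-open interval [start, end) on a contig. */
  final case class ZeroBasedInterval(contigName: String, start: Long, end: Long) {
    require(start >= 0 && end >= start, "expected 0 <= start <= end")

    /** Number of bases covered. */
    def length: Long = end - start

    /** The same bases expressed as a 1-based, closed interval [start + 1, end]. */
    def toOneBasedClosed: (Long, Long) = (start + 1, end)
  }

  /** Assumed check: enough of the interval's bases were remapped. */
  def passesMinMatch(
      interval: ZeroBasedInterval,
      remappedBases: Long,
      minMatchRatio: Double = DefaultMinMatchRatio): Boolean =
    remappedBases.toDouble >= minMatchRatio * interval.length
}
```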
- case class LinearRegressionExpr(genotypes: Expression, phenotypes: Expression, covariates: Expression) extends TernaryExpression with ImplicitCastInputTypes with Product with Serializable
- case class LogisticRegressionExpr(genotypes: Expression, phenotypes: Expression, covariates: Expression, test: Expression) extends QuaternaryExpression with ImplicitCastInputTypes with Product with Serializable
- class LogisticRegressionState extends AnyRef
- trait LogitTest extends Serializable
Base trait for logistic regression tests.
- case class LogitTestResults(beta: Double, oddsRatio: Double, waldConfidenceInterval: Seq[Double], pValue: Double) extends Product with Serializable
Statistics returned upon performing a logit test.
- beta
Log-odds associated with the genotype, NaN if the null/full model fit failed
- oddsRatio
Odds ratio associated with the genotype, NaN if the null/full model fit failed
- waldConfidenceInterval
Wald 95% confidence interval of the odds ratio, NaN if the null/full model fit failed
- pValue
P-value for the specified test, NaN if the null/full model fit failed. Determined using the profile likelihood method.
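The relationship between beta, the odds ratio, and a Wald interval can be written out with the standard textbook formulas. The sketch below assumes a standard error for beta is available (it is not a field of LogitTestResults) and does not claim to reproduce how Glow computes these values.

```scala
object WaldSketch {
  // 97.5th percentile of the standard normal distribution, for a 95% two-sided interval.
  private val Z975 = 1.959963984540054

  /** The odds ratio is the exponential of the log-odds coefficient. */
  def oddsRatio(beta: Double): Double = math.exp(beta)

  /** Wald 95% interval on the odds-ratio scale: exp(beta -/+ z * standardError).
    * `standardError` is an assumed input for this illustration. */
  def waldInterval(beta: Double, standardError: Double): Seq[Double] =
    Seq(
      math.exp(beta - Z975 * standardError),
      math.exp(beta + Z975 * standardError))
}
```

A NaN beta propagates through both functions, which is consistent with the NaN results described for a failed model fit.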
- case class MomentAggState(count: Long = 0, min: Double = 0, max: Double = 0, mean: Double = 0, m2: Double = 0) extends Product with Serializable
The state necessary for maintaining moment based aggregations, currently only supported up to m2.
This functionality is based on the org.apache.spark.sql.catalyst.expressions.aggregate.CentralMomentAgg implementation in Spark and is used to compute summary statistics on arrays as well as across many rows for sample-based aggregations.
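A state with these fields is typically maintained with a streaming update plus a pairwise merge of partial states. The sketch below shows the usual Welford-style update and merge formulas on a hypothetical `Moments` class with the same fields; it is not Glow's code, and the choice of a sample (n - 1) standard deviation is an assumption.

```scala
object MomentSketch {
  // Hypothetical mirror of the fields listed above.
  final case class Moments(
      count: Long = 0,
      min: Double = 0,
      max: Double = 0,
      mean: Double = 0,
      m2: Double = 0) {

    /** Folds one new value into the state. */
    def update(x: Double): Moments = {
      val n = count + 1
      val delta = x - mean
      val newMean = mean + delta / n
      Moments(
        n,
        if (count == 0) x else math.min(min, x),
        if (count == 0) x else math.max(max, x),
        newMean,
        m2 + delta * (x - newMean))
    }

    /** Combines two partial states, e.g. from different partitions. */
    def merge(other: Moments): Moments =
      if (count == 0) other
      else if (other.count == 0) this
      else {
        val n = count + other.count
        val delta = other.mean - mean
        Moments(
          n,
          math.min(min, other.min),
          math.max(max, other.max),
          mean + delta * other.count / n,
          m2 + other.m2 + delta * delta * count * other.count / n)
      }

    /** Sample standard deviation (assumption: n - 1 denominator). */
    def sampleStdev: Double =
      if (count > 1) math.sqrt(m2 / (count - 1)) else Double.NaN
  }
}
```

Because merging two partial states gives the same result as updating one state with all values, the same structure works both within a single array and across rows.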
- class NewtonIterationsState extends AnyRef
- case class NewtonResult(args: NewtonIterationsState, logLkhd: Double, nIter: Int, converged: Boolean, exploded: Boolean) extends Product with Serializable
- case class NormalizeVariantExpr(contigName: Expression, start: Expression, end: Expression, refAllele: Expression, altAlleles: Expression, refGenomePathString: Expression) extends SenaryExpression with ImplicitCastInputTypes with Product with Serializable
- case class PerSampleSummaryStatistics(genotypes: Expression, field: Expression, mutableAggBufferOffset: Int = 0, inputAggBufferOffset: Int = 0) extends TypedImperativeAggregate[ArrayBuffer[SampleSummaryStatsState]] with ExpectsGenotypeFields with GlowLogging with Product with Serializable
Computes summary statistics (count, min, max, mean, stdev) for a numeric genotype field for each sample in a cohort. The field is determined by the provided StructField. If the field does not exist in the genotype struct, an analysis error will be thrown.
The return type is an array of summary statistics. If sample ids are included in the input, they'll be propagated to the results.
- case class RegressionStats(beta: Double, standardError: Double, pValue: Double) extends Product with Serializable
- case class SampleCallStats(sampleId: String = null, nCalled: Long = 0, nUncalled: Long = 0, nHomRef: Long = 0, nHet: Long = 0, nHomVar: Long = 0, nInsertion: Long = 0, nDeletion: Long = 0, nTransversion: Long = 0, nTransition: Long = 0, nSpanningDeletion: Long = 0) extends Product with Serializable
- case class SampleDpSummaryStatistics(child: Expression) extends Expression with Rewrite with Product with Serializable
- case class SampleGqSummaryStatistics(child: Expression) extends Expression with Rewrite with Product with Serializable
- case class SampleSummaryStatsState(sampleId: String, momentAggState: MomentAggState) extends Product with Serializable
- case class SubsetStruct(struct: Expression, fields: Seq[Expression]) extends Expression with Rewrite with Product with Serializable
- case class UnwrappedAggregateByIndex(arr: Expression, initialValue: Expression, update: Expression, merge: Expression, evaluate: Expression) extends DeclarativeAggregate with AggregateByIndex with UnwrappedAggregateFunction with Product with Serializable
- trait UnwrappedAggregateFunction extends AggregateFunction
A hack to make Spark SQL recognize AggregateByIndex as an aggregate expression.
See io.projectglow.sql.optimizer.ResolveAggregateFunctionsRule for details.
- trait VariantType extends AnyRef
- case class VectorToArray(child: Expression) extends UnaryExpression with ImplicitCastInputTypes with Product with Serializable
- case class WrappedAggregateByIndex(arr: Expression, initialValue: Expression, update: Expression, merge: Expression, evaluate: Expression = LambdaFunction.identity) extends DeclarativeAggregate with AggregateByIndex with Product with Serializable
Value Members
- object ArrayToDenseVector extends Serializable
- object ArrayToSparseVector extends Serializable
- object CallStats extends Serializable
- object CovariateQRContext extends GlowLogging with Serializable
- object FirthTest extends LogitTest
- object HardyWeinberg extends Serializable
- object LikelihoodRatioTest extends LogitTest
- object LinearRegressionExpr extends Serializable
- object LinearRegressionGwas extends GlowLogging
- object LogisticRegressionExpr extends Serializable
- object LogisticRegressionGwas extends GlowLogging
Some of the logic used for logistic regression is from the Hail project. The Hail project can be found on GitHub: https://github.com/hail-is/hail. The Hail project is under an MIT license: https://github.com/hail-is/hail/blob/master/LICENSE.
- object LogitTestResults extends Serializable
- object MomentAggState extends GlowLogging with Serializable
- object NormalizeVariantExpr extends Serializable
- object SampleCallStats extends GlowLogging with Serializable
- object VariantQcExprs extends GlowLogging
Contains implementations of QC functions. These implementations are called during both whole-stage codegen and interpreted execution.
The functions are exposed to the user as Catalyst expressions.
- object VariantType
- object VariantUtilExprs
Implementations of utility functions for transforming variant representations. These implementations are called during both whole-stage codegen and interpreted execution.
The functions are exposed to the user as Catalyst expressions.
- object VectorToArray extends Serializable