Runs base quality score recalibration on a set of reads. Uses a table of known SNPs to mask true variation during the recalibration process.
A table of known SNPs to mask valid variants.
An optional local path to dump recalibration observations to.
Returns an RDD of recalibrated reads.
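The recalibration idea can be illustrated on local data. The following is a simplified sketch, not ADAM's implementation. It makes several assumptions: reads are hypothetical dicts with ungapped alignments, reported quality is the only covariate, positions in the known-SNP set are skipped, and each bin's mismatch rate uses +1/+2 pseudocounts before it is converted back to a Phred score.

```python
import math
from collections import defaultdict


def phred(error_rate):
    """Convert an error probability to a Phred-scaled quality, rounded to an int."""
    return int(round(-10 * math.log10(error_rate)))


def recalibrate(reads, reference, known_snps):
    """Toy base quality recalibration over ungapped alignments (hypothetical data model).

    reads: list of dicts with 'contig', 'start' (0-based), 'bases', 'quals'.
    reference: dict mapping contig name -> reference sequence string.
    known_snps: set of (contig, 0-based position) pairs whose sites are masked.
    """
    observed = defaultdict(int)
    mismatched = defaultdict(int)
    # Pass 1: tally observations and mismatches per reported quality,
    # skipping positions that fall on a known SNP so true variation is not counted as error.
    for read in reads:
        ref = reference[read["contig"]]
        for i, (base, qual) in enumerate(zip(read["bases"], read["quals"])):
            pos = read["start"] + i
            if (read["contig"], pos) in known_snps:
                continue
            observed[qual] += 1
            if base != ref[pos]:
                mismatched[qual] += 1
    # Empirical quality per bin; the +1/+2 pseudocounts are an assumption of this sketch.
    table = {q: phred((mismatched[q] + 1) / (observed[q] + 2)) for q in observed}
    # Pass 2: rewrite every quality; bins never observed keep their original value.
    return [dict(read, quals=[table.get(q, q) for q in read["quals"]]) for read in reads]
```

If a mismatch falls on a masked known SNP it is not treated as a sequencing error, so the recalibrated scores come out higher than they would without the mask.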
Calculates the set of unique attribute values that occur for the given tag, and the number of times each value occurs.
The name of the optional field whose values are to be counted.
A Map whose keys are the values of the tag, and whose values are the number of times each tag value occurs.
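As an illustration, the counting step amounts to a histogram over tag values. This sketch is not ADAM's API: it assumes each record is a hypothetical dict whose 'attributes' entry maps two-character tag names to values, and records without the tag are skipped.

```python
from collections import Counter


def count_tag_values(records, tag):
    """Count how many times each value of `tag` appears across records.

    Each record is a hypothetical dict with an 'attributes' dict mapping
    tag names to values; records lacking the tag contribute nothing.
    """
    return Counter(rec["attributes"][tag] for rec in records if tag in rec["attributes"])
```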
Converts a set of records into an RDD of pairs, each containing a unique tagString found within the records and the number of records that have that attribute.
An RDD of attribute name / count pairs.
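A similar local sketch, using the same hypothetical dict representation rather than ADAM's record type, counts how many records carry each tag name and returns the pairs sorted by tag.

```python
from collections import Counter


def count_attribute_names(records):
    """Return sorted (tag name, number of records carrying that tag) pairs."""
    counts = Counter()
    for rec in records:
        # .keys() is iterated element by element, so each tag in a record adds 1.
        counts.update(rec["attributes"].keys())
    return sorted(counts.items())
```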
Converts an RDD of ADAM read records into SAM records.
Returns a SAM/BAM formatted RDD of reads, as well as the file header.
Cuts reads into _k_-mers, and then counts the number of occurrences of each _k_-mer.
The value of _k_ to use for cutting _k_-mers.
Returns an RDD containing _k_-mer/count pairs.
See also: adamCountQmers
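A minimal local sketch of _k_-mer counting over plain strings (not ADAM's RDD implementation): every overlapping window of length k is counted once, and sequences shorter than k contribute nothing.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Cut each sequence into overlapping k-mers and count their occurrences."""
    counts = Counter()
    for seq in sequences:
        # range() is empty when len(seq) < k, so short sequences are skipped.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts
```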
Returns the subset of the ADAMRecords which have an attribute with the given name.
The name of the attribute to filter on (should be two characters long).
An RDD[Read] containing the subset of records with a tag that matches the given name.
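Filtering by tag reduces to a membership test on each record's attributes. This is a sketch over the same hypothetical dict representation; it does not check that the tag is two characters long.

```python
def filter_records_by_tag(records, tag):
    """Keep only records whose attributes include `tag` (a hypothetical data model).

    SAM tag names are two characters; this sketch does not validate that.
    """
    return [rec for rec in records if tag in rec["attributes"]]
```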
Collects a dictionary summarizing the read groups in an RDD of ADAMRecords.
A dictionary describing the read groups in this RDD.
Aggregates a sequence dictionary from the individual reference sequences used in this dataset.
A sequence dictionary describing the reference contigs in this dataset.
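One way to picture the aggregation is to collect each record's contig name and length, deduplicate them, and reject contradictory entries. The dict fields 'contig' and 'contig_length' are hypothetical. Raising on conflicting lengths is an assumption about how such a merge might behave, not documented ADAM behavior.

```python
def build_sequence_dictionary(records):
    """Merge per-record (contig name, contig length) pairs into one dictionary.

    Conflicting lengths for the same contig raise ValueError (an assumption).
    """
    dictionary = {}
    for rec in records:
        name, length = rec["contig"], rec["contig_length"]
        # setdefault stores the first length seen and returns the stored value.
        if dictionary.setdefault(name, length) != length:
            raise ValueError(f"conflicting lengths for contig {name}")
    return dictionary
```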
Reassembles read pairs from two sets of unpaired reads. The assumption is that the two sets were _originally_ paired together.
The RDD containing the second read from the pairs.
How stringently to validate the reads.
Returns an RDD with the pair information recomputed.
The RDD that this is called on should be the RDD with the first read from the pair.
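A sketch of re-pairing under stated assumptions. Mates are matched by an identical read name, although real data may carry suffixes such as /1 and /2. Validation stringency is modeled as a single hypothetical `strict` flag: strict mode raises on any unmatched read, while lenient mode drops it.

```python
def repair_reads(first_reads, second_reads, strict=False):
    """Re-pair reads by matching names across two unpaired sets (hypothetical data model).

    Assumes mates share an identical 'name'. With strict=True any read
    without a mate raises ValueError; otherwise unmatched reads are dropped.
    """
    firsts = {r["name"]: r for r in first_reads}
    seconds = {r["name"]: r for r in second_reads}
    if strict and firsts.keys() != seconds.keys():
        raise ValueError("some reads have no mate")
    paired = []
    for name in sorted(firsts.keys() & seconds.keys()):
        # Copy each mate and record its pairing information.
        paired.append(dict(firsts[name], paired=True, first_of_pair=True))
        paired.append(dict(seconds[name], paired=True, first_of_pair=False))
    return paired
```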
Realigns indels using a consensus-based heuristic.
If the input data is sorted, setting this parameter to true avoids a second sort.
The size of the largest indel to use for realignment.
The maximum number of consensus sequences to realign against per target region.
Log-odds threshold to use when realigning; realignments are only finalized if the log-odds threshold is exceeded.
The maximum width of a single target region for realignment.
Returns an RDD of mapped reads which have been realigned.
See also: RealignIndels
Saves an RDD of ADAM read data into the SAM/BAM format.
Path to save files to.
Selects whether to save as SAM or BAM. The default value is true (save in SAM format).
Saves reads in FASTQ format.
Path to save files at.
Whether to sort the FASTQ files by read name or not. Defaults to false. Sorting the output will recover pair order, if desired.
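The FASTQ record layout can be sketched as follows. This version returns text instead of writing to a path and uses hypothetical dict fields. Qualities are encoded with the standard Phred+33 offset, and sorting by name lets mates line up again.

```python
def to_fastq(reads, sort=False):
    """Render reads as FASTQ text; optionally sort by read name first."""
    if sort:
        reads = sorted(reads, key=lambda r: r["name"])
    lines = []
    for r in reads:
        # Phred scores are stored as ASCII characters offset by 33.
        qual_string = "".join(chr(q + 33) for q in r["quals"])
        lines.extend([f"@{r['name']}", r["bases"], "+", qual_string])
    return "\n".join(lines) + "\n"
```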
Saves these AlignmentRecords to two FASTQ files: one for the first mate in each pair, and the other for the second.
Path at which to save a FASTQ file containing the first mate of each pair.
Path at which to save a FASTQ file containing the second mate of each pair.
Iff strict, throw an exception if any read in this RDD is not accompanied by its mate.
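A sketch of the split into two mate files, using the hypothetical fields 'name' and 'first_of_pair'. Both lists are sorted by name, so record i in one file is the mate of record i in the other. When `strict=False`, unmatched reads are simply dropped, which is an assumption.

```python
from collections import Counter


def split_pairs(reads, strict=True):
    """Split reads into first-mate and second-mate lists for two FASTQ files.

    Assumes each complete pair contributes exactly two records with the same name.
    """
    names = Counter(r["name"] for r in reads)
    unpaired = [n for n, c in names.items() if c != 2]
    if strict and unpaired:
        raise ValueError(f"reads without a mate: {unpaired}")
    mates = [r for r in reads if names[r["name"]] == 2]
    # Sorting both lists by name keeps mates at the same index in each output.
    firsts = sorted((r for r in mates if r["first_of_pair"]), key=lambda r: r["name"])
    seconds = sorted((r for r in mates if not r["first_of_pair"]), key=lambda r: r["name"])
    return firsts, seconds
```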
Groups all reads by record group and read name.
SingleReadBuckets containing the primary, secondary, and unmapped reads.
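The grouping can be sketched with plain dicts. Each key pairs the record group with the read name, and each bucket keeps unmapped, primary, and secondary alignments apart. The field names 'read_group', 'mapped', and 'primary' are hypothetical, and the dict buckets only stand in for ADAM's SingleReadBucket.

```python
from collections import defaultdict


def group_reads(reads):
    """Bucket reads by (record group, read name), split by alignment kind."""
    buckets = defaultdict(lambda: {"primary": [], "secondary": [], "unmapped": []})
    for r in reads:
        key = (r["read_group"], r["name"])
        if not r["mapped"]:
            kind = "unmapped"
        elif r["primary"]:
            kind = "primary"
        else:
            kind = "secondary"
        buckets[key][kind].append(r)
    return dict(buckets)
```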
Calculates the subset of the RDD whose AlignmentRecords overlap the corresponding query ReferenceRegion. Reference sequences (to which these records are aligned) are matched by string equality of their names. AlignmentRecords whose 'getReadMapped' method returns 'false' are ignored.
The end position of each record on the reference sequence is calculated from its CIGAR string using the ADAMContext.referenceLengthFromCigar method.
The query region; only records that overlap this region are returned.
The subset of AlignmentRecords (corresponding to either primary or secondary alignments) that overlap the query region.
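The overlap test can be sketched as follows. The reference span is computed from the CIGAR string by summing the operations that consume reference bases (M, D, N, =, X, per the SAM specification). The sketch assumes 0-based, half-open coordinates for both reads and the query region. The helper names are hypothetical stand-ins for ADAMContext.referenceLengthFromCigar and ReferenceRegion.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
_REF_CONSUMING = set("MDN=X")


def reference_length_from_cigar(cigar):
    """Sum the lengths of reference-consuming CIGAR operations."""
    return sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar) if op in _REF_CONSUMING)


def filter_by_overlap(reads, contig, start, end):
    """Keep mapped reads overlapping the half-open region [start, end) on `contig`."""
    hits = []
    for r in reads:
        # Unmapped reads and reads on other contigs (by name equality) are ignored.
        if not r["mapped"] or r["contig"] != contig:
            continue
        read_end = r["start"] + reference_length_from_cigar(r["cigar"])
        if r["start"] < end and read_end > start:
            hits.append(r)
    return hits
```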
For a single RDD element, returns 0+ sequence record elements.
Element from which to extract sequence records.
A seq of sequence records.