All Classes and Interfaces
Class
Description
Abstract class that coordinates the general task of taking in a set of alignment information,
possibly in SAM format, possibly in other formats, and merging that with the set of all reads
for which alignment was attempted, stored in an unmapped SAM file.
The position files of Illumina are nearly the same form: Pos files consist of text based tabbed
x-y coordinate float pairs, locs files are binary x-y float pairs, clocs are compressed binary
x-y float pairs.
Class for parsing text files where each line consists of fields separated by whitespace.
Abstract class that holds parameters and methods common to classes that perform duplicate
detection and/or marking within SAM/BAM/CRAM files.
Little class used to package up a header and an iterable/iterator.
Abstract class that holds parameters and methods common to classes that perform optical duplicate detection.
Class for collecting data on reference coverage, base qualities and excluded bases from one AbstractLocusInfo object for
CollectWgsMetrics.
Combines multiple Picard QualityYieldMetrics files into a single file.
Combines multiple Variant Calling Metrics files into a single file.
Store one or more AdapterPairs to use to mark adapter sequence of SAMRecords.
A utility class for matching reads to adapters.
A tool to add comments to a BAM file header.
Assigns all the reads in a file to a single new read-group.
High level metrics about the alignment of reads within a SAM file, produced by
the CollectAlignmentSummaryMetrics program and usually stored in a file with
the extension ".alignment_summary_metrics".
Filters out a record if the allele balance for heterozygotes is out of a defined range across all samples.
Utilities class containing methods for restricting VariantContext and GenotypesContext objects to a
reduced set of alleles, as well as for choosing the best set of alleles to keep and for cleaning up annotations and
genotypes after subsetting.
Exception thrown when loading gene annotations.
A simple class to store names and counts for the Control Information fields that are stored in an Illumina GTC file.
Wrapper around a CloseableIterator that reads in a separate thread, for cases in which that might be
efficient.
Describes
Designs baits for hybrid selection!
Set of possible design strategies for bait design.
Command line program to print statistics from a BAM index (.bai) file.
Statistics include count of aligned and unaligned reads for each reference sequence
and a count of all records with no start coordinate.
Deprecated.
A class for finding the distance between multiple (matched) barcodes and multiple barcode reads.
BarcodeExtractor is used to match barcodes and collect barcode match metrics.
Utility class to hang onto data about the best match for a given barcode
Created by jcarey on 3/13/14.
Reads a single barcode file line by line and returns the barcode if there was a match or NULL otherwise.
Metrics produced by the ExtractIlluminaBarcodes program that is used to parse data in
the basecalls directory and determine to which barcode each read should be assigned.
An interface that can take a collection of bases (provided as SamLocusIterator.RecordAndOffset
and SamLocusAndReferenceIterator.SAMLocusAndReference) and generates an ErrorMetric from them.
Tools that process sequencing machine data, e.g.
BasecallsConverter utilizes an underlying IlluminaDataProvider to convert parsed and decoded sequencing data
from standard Illumina formats to specific output records (FASTA records/SAM records).
Interface that defines a converter that takes ClusterData and returns OUTPUT_RECORD type objects.
Interface that defines a writer that will write out OUTPUT_RECORD type objects.
BasecallsConverterBuilder creates and configures BasecallsConverter objects.
An interface and implementations for classes that apply a RecordAndOffsetStratifier
to put bases into various "bins" and then compute an ErrorMetric on these bases using a BaseErrorCalculator.
An error metric for the errors in bases.
Parse various formats and versions of Illumina Basecall files, and use them to populate
ClusterData objects.
TextFileParser which reads a single text file.
Created by jcarey on 3/14/14.
A class that implements the IlluminaData interfaces provided by this parser.
One BclData object is returned to IlluminaDataProvider per cluster and each
first level array in bases and qualities represents a single read in that
cluster.
Annoyingly, there are two different files with extension .bci in NextSeq output.
Describes a mechanism for revising and evaluating qualities read from a BCL file.
BCL Files are base call and quality score binary files containing a (base,quality) pair for successive clusters.
For an aligner that aligns each end independently, select the alignment for each end with the best MAPQ, and
make that the primary.
This strategy was designed for TopHat output, but could be of general utility.
A simple program to convert an Illumina bpm (bead pool manifest file) into a normalization manifest (bpm.csv) file.
The normalization manifest (bpm.csv) is a simple text file generated by Illumina tools - it has a specific format
and is used by ZCall.
A class to represent an 'Extended' Illumina Manifest file.
A class to represent a record (line) from an Extended Illumina Manifest [Assay] entry
Command line program to generate a BAM index (.bai) file from a BAM (.bam) file
Takes a VCFFileReader and an IntervalList and provides a single iterator over all variants in all the intervals.
Calculates various metrics on a sample fingerprint, indicating whether the fingerprint satisfies the assumptions we have.
Collects variants and generates metrics about them.
A read name encoder conforming to the standard described by Illumina Casava 1.8.
This class provides the data structure for cbcls.
------------------------------------- CBCL Header -----------------------------------
Bytes 0 - 1 Version number, current version is 1 unsigned 16 bits little endian integer
Bytes 2 - 5 Header size unsigned 32 bits little endian integer
Byte 6 Number of bits per basecall unsigned
Byte 7 Number of bits per q-score unsigned
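The four header fields listed above can be read with a little-endian ByteBuffer. The following is a minimal sketch, not Picard's own parser: the class name CbclHeaderPrefix and its field names are hypothetical, and only the first 8 bytes described in this entry are decoded (any later header fields are ignored).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical holder for the first four CBCL header fields listed in the entry above. */
final class CbclHeaderPrefix {
    final int version;         // bytes 0-1, unsigned 16-bit little-endian
    final long headerSize;     // bytes 2-5, unsigned 32-bit little-endian
    final int bitsPerBasecall; // byte 6, unsigned
    final int bitsPerQScore;   // byte 7, unsigned

    private CbclHeaderPrefix(int version, long headerSize, int bitsPerBasecall, int bitsPerQScore) {
        this.version = version;
        this.headerSize = headerSize;
        this.bitsPerBasecall = bitsPerBasecall;
        this.bitsPerQScore = bitsPerQScore;
    }

    /** Decodes the first 8 bytes of a CBCL header; widening avoids Java's signed types. */
    static CbclHeaderPrefix parse(byte[] raw) {
        if (raw.length < 8) {
            throw new IllegalArgumentException("Need at least 8 header bytes, got " + raw.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        int version = Short.toUnsignedInt(buf.getShort());
        long headerSize = Integer.toUnsignedLong(buf.getInt());
        int bitsPerBasecall = Byte.toUnsignedInt(buf.get());
        int bitsPerQScore = Byte.toUnsignedInt(buf.get());
        return new CbclHeaderPrefix(version, headerSize, bitsPerBasecall, bitsPerQScore);
    }
}
```

Widening each field to the next larger Java type is a design choice of this sketch, made so that values above the signed range are not reported as negative numbers.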
Checks the sample identity of the sequence/genotype data in the provided file (SAM/BAM or VCF)
against a set of known genotypes in the supplied genotype file (in VCF format).
Program to check a lane of an Illumina output directory.
Simple class to check the terminator block of a SAM file.
Implementation of a circular byte buffer that uses a large byte[] internally and supports basic
read/write operations from/to other byte[]s passed as arguments.
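The idea of a circular buffer backed by a single byte[] can be sketched as follows. This is an illustrative, single-threaded version under stated assumptions: the class name RingBufferSketch and its method signatures are hypothetical and do not reproduce Picard's implementation (which may, for example, block or synchronize between threads).

```java
/** Hypothetical single-threaded circular byte buffer backed by one byte[]. */
final class RingBufferSketch {
    private final byte[] data;
    private int readPos = 0; // index of the next byte to read
    private int size = 0;    // number of bytes currently stored

    RingBufferSketch(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        data = new byte[capacity];
    }

    /** Copies up to len bytes from src into the buffer; returns how many were written. */
    int write(byte[] src, int off, int len) {
        int n = Math.min(len, data.length - size);
        int writePos = (readPos + size) % data.length;
        int first = Math.min(n, data.length - writePos); // bytes that fit before wrapping
        System.arraycopy(src, off, data, writePos, first);
        System.arraycopy(src, off + first, data, 0, n - first); // wrapped remainder
        size += n;
        return n;
    }

    /** Copies up to len bytes out of the buffer into dst; returns how many were read. */
    int read(byte[] dst, int off, int len) {
        int n = Math.min(len, size);
        int first = Math.min(n, data.length - readPos); // bytes available before wrapping
        System.arraycopy(data, readPos, dst, off, first);
        System.arraycopy(data, 0, dst, off + first, n - first); // wrapped remainder
        readPos = (readPos + n) % data.length;
        size -= n;
        return n;
    }
}
```

Each operation needs at most two array copies, one up to the end of the backing array and one from its start, which is what makes the wrap-around cheap.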
Utilities to clip the adapter sequence from a SAMRecord read
The clocs file format is one of 3 Illumina formats (pos, locs, and clocs) that store position data exclusively.
Summary
Store the information from Illumina files for a single cluster with one or more reads.
Converts ClusterData provided by an IlluminaDataProvider into one or two SAMRecords,
as appropriate, optionally marking adapter sequence.
A metric class to hold the result of ClusterCrosscheckMetrics fingerprints.
A command line tool to read a BAM file and produce standard alignment metrics that would be applicable to any alignment.
Collects summary and per-sample metrics about variant calls in a VCF file.
Collect DuplicateMark'ing metrics from an input file that was already Duplicate-Marked.
Tool to collect information about GC bias in the reads in a given BAM file.
Collect metrics regarding the reason for reads (sequenced by HiSeqX) not passing the Illumina PF Filter.
A metric class for describing PF-failing reads from an Illumina HiSeqX lane.
Metrics produced by the GetHiSeqXPFFailMetrics program.
This tool takes a SAM/BAM file input and collects metrics that are specific for sequence
datasets generated through hybrid-selection.
A Command line tool to collect Illumina Basecalling metrics for a sequencing run
Requires a Lane and an input file of Barcodes to expect.
Command-line wrapper around CollectIlluminaLaneMetrics.IlluminaLaneMetricsCollector.
Utility for collating Tile records from the Illumina TileMetrics file into lane-level and phasing-level metrics.
A CLP that, given a BAM and a VCF with genotypes of the same sample, estimates the rate of independent replication of reads within the bam.
Command line program to read non-duplicate insert sizes, create a Histogram
and report distribution statistics.
Command-line program to compute metrics about outward-facing pairs, inward-facing
pairs, and chimeras in a jumping library.
Class that is designed to instantiate and execute multiple metrics programs that extend
SinglePassSamProgram while making only a single pass through the SAM file and supplying
each program with the records as it goes.
Class for trying to quantify the CpCG->CpCA error rate.
Metrics class for outputs.
Command line program to calculate quality yield metrics
A set of metrics used to describe the general quality of a BAM file
Command line program to calculate quality yield metrics for flow based read files
A set of metrics used to describe the general quality of a BAM file
Command line program to calculate SNV quality yield metrics for read files
A set of metrics used to describe the general quality of a BAM file
Computes a number of metrics that are useful for evaluating coverage and performance of whole genome sequencing
experiments, same implementation as CollectWgsMetrics, with different defaults: lacks baseQ and mappingQ filters
and has much higher coverage cap.
Calculates and reports QC metrics for RRBS data based on the methylation status at individual C/G bases as well
as CpG sites across all reads in the input BAM/SAM file.
Program to collect error metrics on bases stratified in various ways.
Quantify substitution errors caused by mismatched base pairings during various
stages of sample / library prep.
Both CollectTargetedPCRMetrics and CollectHsSelection share virtually identical program structures except
for the name of their targeting mechanisms (e.g.
This tool calculates a set of PCR-related metrics from an aligned SAM or
BAM file containing targeted sequencing data.
Collects summary and per-sample metrics about variant calls in a VCF file.
A collection of metrics relating to snps and indels within a variant-calling file (VCF) for a given sample.
A collection of metrics relating to snps and indels within a variant-calling file (VCF).
Computes a number of metrics that are useful for evaluating coverage and performance of whole genome sequencing experiments.
Metrics for evaluating the performance of whole genome sequencing experiments.
A simple program to combine multiple genotyping array VCFs into one VCF
The input VCFs must have the same sequence dictionary and same list of variant loci.
Embodies defaults for global values that affect how the Picard Command Line operates.
Abstract class to facilitate writing command-line programs.
Class for handling translation of Picard-style command line argument syntax to POSIX-style argument syntax;
used for running tests written with Picard style syntax against the Barclay command line parser.
A simple tool to compare two Illumina GTC files.
Compare two metrics files.
Rudimentary SAM comparer.
Class for managing a list of Counters of integer,
provides methods to access data from Counters with respect to an offset.
Counting filter that discards reads that are unaligned or aligned with MQ==0 and whose 5' ends look like adapter
sequence.
Counting filter that discards reads that have been marked as duplicates.
A SamRecordFilter that counts the number of bases in the reads which it filters out.
Counting filter that discards reads below a configurable mapping quality threshold.
Counting filter that discards reads that are unpaired in sequencing and paired reads whose mates are not mapped.
A simple program to create a standard picard metrics file
from the output of bafRegress
Create an Extended Illumina Manifest by performing a liftover to Build 37.
Create a SAM/BAM file from a fasta containing reference sequence.
A simple program to create a standard picard metrics file
from the output of VerifyIDIntensity
Checks that all data in the set of input files appear to come from the same
individual.
A class to hold the result of crosschecking fingerprints.
The data type.
Deprecated.
Utility class to use with DbSnp files to determine whether a locus is
a dbSnp site.
Little tuple class to contain one bitset for SNPs and another for Indels.
Iterate through a delimited text file in which columns are found by looking at a header line rather than by position.
Filters out a record if all variant samples have depth lower than the given value.
Tools that collect sequencing quality-related and comparative metrics
A genotype produced by one of the concrete implementations of AbstractAlleleCaller.
Simple enum to represent the three possible combinations of major/major, major/minor
and minor/minor haplotypes for a diploid individual.
Disk-based implementation of ReadEndsForMarkDuplicatesMap.
Summary
Metrics that are calculated during the process of marking duplicates
within a stream of SAMRecords.
Factory class that creates either regular or flow-based duplication metrics.
When it is necessary to pick a primary alignment from a group of alignments for a read, pick the one that maps
the earliest base in the read.
Created by farjoun on 6/26/18.
Summary metrics produced by CollectSequencingArtifactMetrics as a roll up of the
context-specific error rates, to provide global error rates per type of base substitution.
Attempts to estimate library complexity from sequence alone.
Program to create a fingerprint for the contaminating sample when the level of contamination is both known and
uniform in the genome.
Determine the barcode for each read in an Illumina lane.
Extracts barcodes and accumulates metrics for an entire tile.
Simple command line program that allows sub-sequences represented by an interval
list to be extracted from a reference sequence file.
Converts a FASTQ file to an unaligned BAM or SAM file.
Class that represents a fast algorithm for collecting data from AbstractLocusInfo
with a list of aligned EdgingRecordAndOffset objects.
Summary
Iterator that dynamically applies filter strings to VariantContext records supplied by an underlying
iterator.
Created by jcarey on 3/13/14.
Illumina uses an algorithm described in "Theory of RTA" that determines whether or not a cluster passes filter ("PF").
Summary
Applies a set of hard filters to Variants and to Genotypes within a VCF.
Summary
class to represent a genetic fingerprint as a set of HaplotypeProbabilities
objects that give the relative probabilities of each of the possible haplotypes
at a locus.
Major class that coordinates the activities involved in comparing genetic fingerprint
data whether the source is from a genotyping platform or derived from sequence data.
Class to hold the details of an element of the fingerprinting PU tag.
Detailed metrics about an individual SNP/Haplotype comparison within a fingerprint comparison.
Summary fingerprinting metrics and statistics about the comparison of the sequence data
from a single read group (lane or index within a lane) vs.
Class for holding metrics on a single fingerprint.
Class that is used to represent the results of comparing a read group within a SAM file, or a sample
within a VCF, against one or more sets of fingerprint genotypes.
A set of utilities used in the fingerprinting environment
A class that holds VariantContexts sorted by genomic position
Filters records based on the phred scaled p-value from the Fisher Strand test stored in
the FS attribute.
Summary
Tool for replacing or fixing up a VCF header.
Utility class for working with flow-based reads.
The main member static class is ReadGroupInfo that contains methods that allow
working with headers of flow based reads and extracting the flow order and the maximal hmer class called.
The scheme is defined in the constructor.
The default scheme is derived from the GA4GH Benchmarking Work Group's proposed evaluation scheme.
Efficiently concatenates BAM files that resulted from a scattered parallel analysis.
Simple little class that combines multiple VCFs that have exactly the same set of samples
and nonoverlapping sets of loci.
Class that holds detailed metrics about reads that fall within windows of a certain
GC bin on the reference genome.
Calculates GC Bias Metrics on multiple levels
Created by kbergin on 3/23/15.
High level metrics that capture how biased the coverage in a certain lane is.
Utilities to calculate GC Bias
Created by kbergin on 9/23/15.
Holds annotation of a gene for storage in an OverlapDetector.
Load gene annotations into an OverlapDetector of Gene objects.
Summary
A simple structure to return the results of getAlleles.
Class that holds metrics about the Genotype Concordance contingency tables.
A class to store the counts for various truth and call state classifications relative to a reference.
Class that holds detail metrics about Genotype Concordance
This defines, for each valid TruthState and CallState tuple, the set of contingency table entries to which the tuple should contribute.
Created by kbergin on 6/19/15.
Created by kbergin on 7/30/15.
A class to store the various classifications for:
1.
These states represent the relationship between the call genotype and the truth genotype relative to
a reference sequence.
A specific state for a 2x2 contingency table.
A minute class to store the truth and call state respectively.
These states represent the relationship between a truth genotype and the reference sequence.
Class that holds summary metrics about Genotype Concordance
An interface for classes that perform Genotype filtration.
Genotype filter that filters out genotypes below a given quality threshold.
Miscellaneous tools, e.g.
Created by farjoun on 11/2/16.
Class to convert an Illumina GTC file into a VCF file.
An accumulator for collecting metrics about a single-sample GVCF.
Represents information about a group of SNPs that form a haplotype in perfect LD
with one another.
A collection of metadata about Haplotype Blocks including multiple in memory "indices" of the data
to make it easy to query the correct HaplotypeBlock or Snp by snp names, positions etc.
Abstract class for storing and calculating various likelihoods and probabilities
for haplotype alleles given evidence.
Log10(P(evidence| haplotype)) for the 3 different possible haplotypes
{aa, ab, bb}
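To illustrate how three log10 likelihoods for {aa, ab, bb} relate to normalized haplotype probabilities, here is a minimal sketch using Bayes' rule. The class and method names are hypothetical, the priors are supplied by the caller, and this is not claimed to be how Picard's HaplotypeProbabilities classes compute their values.

```java
/** Hypothetical helper: turns log10 likelihoods for {aa, ab, bb} into posterior probabilities. */
final class HaplotypePosteriorSketch {

    /**
     * Applies Bayes' rule: posterior[i] is proportional to prior[i] * 10^log10Likelihood[i].
     * The maximum log10 likelihood is subtracted first so that very negative values
     * do not underflow to zero before normalization.
     */
    static double[] posteriors(double[] log10Likelihoods, double[] priors) {
        if (log10Likelihoods.length != priors.length) {
            throw new IllegalArgumentException("likelihoods and priors must have the same length");
        }
        double max = Double.NEGATIVE_INFINITY;
        for (double v : log10Likelihoods) {
            max = Math.max(max, v);
        }
        double[] out = new double[log10Likelihoods.length];
        double sum = 0.0;
        for (int i = 0; i < out.length; i++) {
            out[i] = priors[i] * Math.pow(10.0, log10Likelihoods[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum; // normalize so the probabilities sum to 1
        }
        return out;
    }
}
```

With flat priors the result depends only on the differences between the log10 likelihoods, so shifting all three by the same constant leaves the posteriors unchanged.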
Represents the probability of the underlying haplotype of the contaminating sample given the data.
Represents a set of HaplotypeProbabilities that were derived from a single SNP
genotype at a point in time.
Represents the likelihood of the HaplotypeBlock given the GenotypeLikelihoods (GL field from a VCF, which is actually a log10-likelihood)
for each of the SNPs in that block.
Represents the probability of the underlying haplotype given the data.
A wrapper class for any HaplotypeProbabilities instance that will assume that the given evidence is that of a tumor sample and
provide an hp for the normal sample that tumor came from.
Calculates HS metrics for a given SAM or BAM file.
Metrics generated by CollectHsMetrics for the analysis of target-capture sequencing experiments.
Program to create a fingerprint for the contaminating sample when the level of contamination is both known and
uniform in the genome.
A class to encompass writing an Illumina adpc.bin file.
Metric for Illumina Basecalling that stores means and standard deviations on a per-barcode per-lane basis.
Simple switch to control the read name format to emit.
IlluminaBasecallsToSam transforms a lane of Illumina data file formats (bcl, locs, clocs, qseqs, etc.) into
SAM, BAM or CRAM file format.
A class to parse the contents of an Illumina Bead Pool Manifest (BPM) file.
A BPM file contains metadata (including the alleles, mapping and normalization information) on an Illumina Genotyping Array.
Each type of genotyping array has a specific BPM.
A simple class to represent a locus entry in an Illumina Bead Pool Manifest (BPM) file
IlluminaDataProviderFactory accepts options for parsing Illumina data files for a lane and creates an
IlluminaDataProvider, an iterator over the ClusterData for that lane, which utilizes these options.
List of data types of interest when parsing Illumina data.
General utils for dealing with IlluminaFiles as well as utils for specific, supported formats.
Embodies characteristics that describe a lane.
A class to represent an Illumina Manifest file.
A class to represent a record (line) from an Illumina Manifest [Assay] entry
Illumina's TileMetricsOut.bin file codes various metrics, both concrete (all density id's are code 100) or as a base code
(e.g.
Metrics for Illumina Basecalling that stores median phasing and prephasing percentages on a per-template-read, per-lane basis.
A read name encoder following the encoding initially produced by picard fastq writers.
Misc utilities for working with Illumina specific files and data
Describes adapters used on each pair of strands
A calculator that estimates the error rate of the bases it observes for indels only.
Metric to be used for InDel errors
A class to store information relevant for biological rate estimation
A class to provide methods for accessing Illumina Infinium Data Files.
A class to parse the contents of an Illumina Infinium cluster (EGT) file
A cluster file contains information about the clustering information used in mapping red / green intensity information
to genotype calls
A class to encapsulate the table of contents for an Illumina Infinium Data Files.
A class to parse the contents of an Illumina Infinium genotype (GTC) file
A GTC file is the output of Illumina's genotype calling software (either Autocall or Autoconvert) and
contains genotype calls, confidence scores, basecalls and raw intensities for all calls made on the chip.
A class to parse the contents of an Illumina Infinium Normalization Manifest file.
An Illumina Infinium Normalization Manifest file contains a subset of the information contained in the Illumina
Manifest file in addition to the normalization ID which is needed for normalizing intensities in GtcToVcf.
A class to store fields that are specific to a VCF generated from an Illumina GTC file.
Metrics about the insert size distribution of a paired-end library, created by the
CollectInsertSizeMetrics program and usually written to a file with the extension
".insert_size_metrics".
Collects InsertSizeMetrics on the specified accumulationLevels using
The channels in a FourChannelIntensityData object, and the channels produced by a ClusterIntensityFileReader,
for cases in which it is desirable to handle these abstractly rather than having the specific names
in the source code.
Base interface for an interval argument collection.
An interface for a class that scatters IntervalLists.
A base class for scatterers that scatter by uniqued base count.
Scatters IntervalList by interval count so that the resulting IntervalLists have the same number of intervals in them.
Scatters IntervalList into `interval count` shards so that the resulting IntervalLists have
approximately the same number of intervals in them.
A BaseCount Scatterer that avoids breaking up intervals.
Like IntervalListScattererWithoutSubdivision but will overflow the current list if the projected size of the
remaining lists is bigger than the "ideal".
An IntervalListScatterer that attempts to place the same number of (uniquified) bases in each output interval list.
An enum to control the creation of the various IntervalListScatter objects
Trivially simple command line program to convert an IntervalList file to a BED file.
Performs various IntervalList manipulations.
Tools that process genomic intervals in various formats.
High level metrics about the presence of outward- and inward-facing pairs
within a SAM file generated with a jumping library, produced by
the CollectJumpingLibraryMetrics program and usually stored in a file with
the extension ".jump_metrics".
Helper class used to transform tile data for a lane into a collection of IlluminaPhasingMetrics
A class to generate library Ids and keep duplication metrics by library IDs.
Liftover SNPs in HaplotypeMaps from one reference to another
This tool adjusts the coordinates in an interval list on one reference to its homologous interval list on another
reference, based on a chain file that describes the correspondence between the two references.
Summary
Created by jcarey on 3/13/14.
The locs file format is one of 3 Illumina formats (pos, locs, and clocs) that store position data exclusively.
Describes the behavior of a locus relative to a gene.
Creates a VCF that contains all the site-level information for all records in the input VCF but no genotype information.
Creates a TSV from sample name to VCF/GVCF path, with one line per input.
A better duplication marking algorithm that handles all cases including clipped
and gapped alignments.
Enum used to control how duplicates are flagged in the DT optional tag on each read.
Enum for the possible values that a duplicate read can be tagged with in the DT attribute.
MarkDuplicates calculation helper class for flow based mode
The class extends the behavior of MarkDuplicates which contains the complete
code for the non-flow based mode.
An even better duplication marking algorithm that handles all cases including clipped
and gapped alignments.
This will iterate through a coordinate sorted SAM file (iterator) and either mark or
remove duplicates as appropriate.
Command line program to mark the location of adapter sequences.
This is the mark queue.
Represents the results of a fingerprint comparison between one dataset and a specific
fingerprint file.
General math utilities
A collection of common math operations that work with log values.
Program to generate a data table and chart of mean quality by cycle from a
BAM file.
Map from String to ReadEnds object.
Describes the type and number of mendelian violations found within a Trio.
Created by farjoun on 6/25/16.
An extension of MetricBase that knows how to merge-by-adding fields that are appropriately annotated (MergeByAdding).
Metrics whose values can be merged by adding.
Metrics whose values should be equal when merging.
Metrics that are merged manually in MergeableMetricBase.merge(MergeableMetricBase).
Metrics that are not merged, but are subsequently derived from other metrics, for example by
MergeableMetricBase.calculateDerivedFields().
Metrics that are not merged.
Summary
Class to take genotype calls from a ped file output from zCall and merge them into a vcf from autocall.
This tool is used for combining SAM and/or BAM files from different runs or read groups into a single file, similar
to the "merge" function of Samtools (http://www.htslib.org/doc/samtools.html).
Combines multiple variant files into a single variant file.
For use with Picard metrics programs that may output metrics for multiple levels
of aggregation with an analysis.
MMapBackedIteratorFactory is a file reader that takes a header size and a binary file, maps the file to
a read-only byte buffer and provides methods to retrieve the header as its own ByteBuffer and create
iterators of different data types over the values of the file (starting after the end of the header).
For a paired-end aligner that aligns each end independently, select the pair of alignments that result
in the largest insert size.
MultiLevelCollector handles accumulating Metrics at different MetricAccumulationLevels (ALL_READS, SAMPLE, LIBRARY, READ_GROUP).
Created by jcarey on 3/13/14.
NextSeq-style bcl's have all tiles for a cycle in a single file.
Parse .bcl.bgzf files that contain multiple tiles in a single file.
For file types for which there is one file per lane, with fixed record size, and all the tiles in it,
so the s_.bci file can be used to figure out where each tile starts and ends.
Read filter file that contains multiple tiles in a single file.
Created by jcarey on 3/13/14.
Read locs file that contains multiple tiles in a single file.
Abstract class for files with fixed-length records for multiple tiles, e.g.
A tool to count the number of non-N bases in a fasta file
Little program to "normalize" a fasta file to ensure that all lines of sequence are the
same length, and are a reasonable length!
Contains methods for finding optical/co-localized/sequencing duplicates.
Picard default argument collection for an optional reference.
Miscellaneous tools, e.g.
Base interface for an output argument collection.
In multiple locations we need to know what cycles are output, as of now we output all non-skip cycles, but rather than sprinkle
this knowledge throughout the parser code, instead OutputMapping provides all the data a client might want about the
cycles to be output including what ReadType they are.
An error metric for the errors involving bases in the overlapping region of a read-pair.
A calculator that estimates the error rate of the bases it observes, assuming that the reference is truth.
Simple Pair class.
An iterator that takes a pair of iterators over VariantContexts and iterates over them in tandem.
Little class to hold a pair of VariantContexts that are in sync with one another.
A base class for Metrics for targeted panels.
A class whose purpose is to initialize the various plugins that provide Path support.
Represents a .ped file of family information as documented here:
http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml
Stores the information in memory as a map of individualId -> Pedigree information for that individual
Abstract base class for Parsers that open a single tile file at a time and iterate through them.
PerRecordCollector - An interface for classes that collect data in order to generate one or more metrics.
Argument Collection which holds parameters common to classes that want to add PG tags to reads in SAM/BAM files
Small interface that provides access to the physical location information about a cluster.
Stores the minimal information needed for optical duplicate detection.
This stores records that are comparable for detecting optical duplicates.
Small class that provides access to the physical location information about a cluster.
Small class that provides access to the physical location information about a cluster.
Derived from BucketUtils.java in GATK
This is the main class of Picard and is the way of executing individual command line programs.
Basic Picard runtime exception that, for now, does nothing much
Ported from GATKIOUtils.java
Created by jcarey on 3/13/14.
The pos file format is one of 3 Illumina formats (pos, locs, and clocs) that store position data exclusively.
Summary
PosParser parses multiple files formatted as one of the three file formats that contain position information
only (pos, locs, and clocs).
Performs on-the-fly filtering of the provided VariantContext Iterator such that only variants that satisfy
all predicates are emitted.
It is useful to define a key such that the key will occur at most once among the primary alignments in a given file
(assuming the file is valid).
Given a set of alignments for a read or read pair, mark one alignment as primary, according to whatever
strategy is appropriate.
Utility for loading properties files from resources.
Filters out sites that have a QD annotation applied to them and where the QD value is lower than a
lower limit.
Charts quality score distribution within a BAM file.
A collection of helper utilities for iterating through reads that are in query-name sorted
read order as pairs
While structurally identical to CompositeIndex, this class is maintained as it makes code more readable when the two are used together (see QSeqParser)
Classes, methods, and enums that deal with the stratification of read bases and reference information.
Stratifies into quintiles of read cycle.
Stratifies according to the number of matching cigar operators (from CIGAR string) that the read has.
A CollectionStratifier is a stratifier that uses a collection of stratifiers to inform the stratification.
Types of consensus reads as determined by the number of duplicates used from
first and second strands.
Stratify by tags used during duplex and single index consensus calling.
An enum designed to hold a binned version of any probability-like number (between 0 and 1)
in quintiles
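The quintile binning described above can be sketched as follows. This is a minimal illustration, not Picard's enum: the name QuintileSketch, the constants Q1 to Q5, and the choice of five equal-width bins with 1.0 assigned to the top bin are all assumptions.

```java
// Hypothetical sketch; Picard's own enum may use different names and bin edges.
enum QuintileSketch {
    Q1, Q2, Q3, Q4, Q5;

    /**
     * Maps a probability-like value in [0, 1] to one of five equal-width bins.
     * ASSUMPTION: bins are [0, 0.2), [0.2, 0.4), ..., [0.8, 1.0], so 1.0 lands in Q5.
     */
    static QuintileSketch of(double p) {
        if (Double.isNaN(p) || p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("expected a value in [0, 1], got " + p);
        }
        // Clamp the index so that p == 1.0 falls into the last bin instead of overflowing.
        int index = Math.min((int) (p * 5), 4);
        return values()[index];
    }
}
```

For example, QuintileSketch.of(0.5) yields the middle bin, and values outside [0, 1] are rejected instead of being silently clamped.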
Stratifies bases by their read's tile, which is parsed from the read name.
Stratifies bases by their read's X coordinate, which is parsed from the read name.
Stratifies bases by their read's Y coordinate, which is parsed from the read name.
A stratifier that uses GC (of the read) to stratify.
Stratifies according to the length of an insertion or deletion.
Stratifies according to the number of indel bases (from CIGAR string) that the read has.
Stratify bases according to the type of Homopolymer that they belong to (repeating element, final reference base and
whether the length is "long" or not).
Stratifies according to the overall mismatches (from SAMTag.NM) that the read has against the reference, NOT including the current base.
Stratify by the number of Ns found in the read.
An enum for holding the Orientation of a read's read pair (i.e.
A PairStratifier is a stratifier that uses two other stratifiers to inform the stratification.
An enum to hold information about the "properness" of a read pair
An enum for holding the direction for a read (positive strand or negative strand)
An enum to hold the ordinality of a read
The main interface for a stratifier.
Data for a single end of a paired-end read, a barcode read, or for the entire read if not paired end.
Tools that manipulate read data in SAM, BAM or CRAM format
Represents one set of cycles in a ReadStructure (e.g.
Little struct-like class to hold read pair (and fragment) end data for duplicate marking.
Little struct-like class to hold read pair (and fragment) end data for MarkDuplicatesWithMateCigar
Codec for ReadEnds that just outputs the primitive fields and reads them back.
Interface for storing and retrieving ReadEnds objects.
Created by nhomer on 9/13/15.
A class to store individual records for MarkDuplicatesWithMateCigar.
Provides access to the physical location information about a cluster.
Describes the intended logical output structure of clusters of an Illumina run.
A read type describes a stretch of cycles in a ReadStructure (e.g.
Base interface for a reference argument collection.
Tools that analyze and manipulate FASTA format references
Loads gene annotations from a refFlat file into an OverlapDetector.
Class which contains utility functions that use reflection.
Renames a sample within a VCF or BCF.
Reorders a SAM/BAM input file according to the order of contigs in a second reference file.
Little struct-like class to hold a record index, the index of the corresponding representative read, and duplicate set size information.
Codec for read names and integers that outputs the primitive fields and reads them back.
Argument collection for references that are required (and not common).
This tool reverts the original base qualities (if specified) and adds the mate cigar tag to mapped SAM, BAM or CRAM files.
Used as a return for the canSkipSAMFile function.
Reverts a SAM file by optionally restoring original quality scores and by removing
all alignment information.
Util class for executing R scripts.
Metrics about the alignment of RNA-seq reads within a SAM file to genes, produced by the CollectRnaSeqMetrics
program and usually stored in a file with the extension ".rna_metrics".
Holds information about CpG sites encountered for RRBS processing QC
Holds summary statistics from RRBS processing QC
Class that takes in a set of alignment information in SAM format and merges it with the set
of all reads for which alignment was attempted, stored in an unmapped SAM file.
Compare two SAM/BAM files.
Argument collection for SAM comparison
Metric for results of SamComparison.
Converts a BAM file to human-readable SAM output or vice versa
Defines a MultilevelPerRecordCollector using the argument type of SAMRecord so that this doesn't have to be redefined for each subclass of MultilevelPerRecordCollector
This class sets the duplicate read flag as the result state when examining sets of records.
Class to take unmapped reads in SAM/BAM/CRAM file format and create Maq binary fastq format file(s) --
one or two of them, depending on whether it's a paired-end read.
Extracts read sequences and qualities from the input SAM/BAM file and writes them into
the output file in Sanger FASTQ format.
Extracts read sequences and qualities from the input SAM/BAM file and SAM/BAM tags and writes them into
output files in Sanger FASTQ format.
A Tool for breaking up a reference into intervals of alternating regions of N and ACGT bases.
Class with helper methods for generating and writing SequenceDictionary objects.
Bait bias artifacts broken down by context.
Summary analysis of a single bait bias artifact, also known as a reference bias artifact.
Pre-adapter artifacts broken down by context.
Summary analysis of a single pre-adapter artifact.
Deprecated.
Fixes the NM, MD, and UQ tags in a SAM or BAM file.
Represents the sex of an individual.
A calculator that estimates the error rate of the bases it observes, assuming that the reference is truth.
This is a simple tool to mark duplicates using the DuplicateSetIterator, DuplicateSet, and SAMRecordDuplicateComparator.
A class for finding the distance between a single barcode and a barcode-read (with base qualities)
Super class that is designed to provide some consistent structure between subclasses that
simply iterate once over a coordinate sorted BAM and collect information from the records
as they go in order to produce some kind of output.
Class to represent a SNP in context of a haplotype block that is used in fingerprinting.
SortedBasecallsConverter utilizes an underlying IlluminaDataProvider to convert parsed and decoded sequencing data
from standard Illumina formats to specific output records (FASTA records/SAM records).
Summary
Sorts a SAM or BAM file.
Sorts one or more VCF files according to the order of the contigs in the header/sequence dictionary and then
by coordinate.
Command-line program to split a SAM/BAM/CRAM file into separate files based on
library name.
Splits the input queryname sorted or query-grouped SAM/BAM/CRAM file and writes it into
multiple BAM files, each with an approximately equal number of reads.
Splits the input VCF file into two, one for indels and one for SNPs.
A set of String constants in which the name of the constant (minus the _SHORT_NAME suffix)
is the standard long Option name, and the value of the constant is the standard shortName.
Parser for tab-delimited files
Parse a tabbed text file in which columns are found by looking at a header line rather than by position.
Metrics class for the analysis of reads obtained from targeted PCR experiments, e.g.
Calculates HS metrics for a given SAM or BAM file.
TargetMetrics are metrics that measure how well specific targets (or baits) are hit when using a targeted sequencing process such as hybrid selection
or Targeted PCR Techniques (TSCA).
TargetMetrics are metrics that measure how well specific targets (or baits) are hit when using a targeted sequencing process such as hybrid selection
or Targeted PCR Techniques (TSCA).
TargetMetrics are metrics that measure how well specific targets (or baits) are hit when using a targeted sequencing process such as hybrid selection
or Targeted PCR Techniques (TSCA).
A simple class that is used to store the coverage information about an interval.
For internal test purposes only.
Created by David Benjamin on 5/13/15.
TheoreticalSensitivityMetrics are metrics calculated from TheoreticalSensitivity and the parameters used in
the calculation.
This version of the thread pool executor will throw an exception if any of the internal jobs have thrown exceptions
while executing
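One way such an executor could be built is sketched below. This is an illustration under stated assumptions, not Picard's implementation: the class name FailFastExecutorSketch and the method shutdownAndCheck are hypothetical, and the sketch records only the first failure. It relies on ThreadPoolExecutor.afterExecute, which receives the Throwable only for tasks started with execute(); tasks started with submit() keep their exception inside the returned Future.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; names and behaviour are assumptions, not Picard's class.
final class FailFastExecutorSketch extends ThreadPoolExecutor {
    // Remembers the first exception thrown by any job.
    private final AtomicReference<Throwable> firstFailure = new AtomicReference<>();

    FailFastExecutorSketch(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        // t is non-null only when a task started via execute() threw.
        if (t != null) {
            firstFailure.compareAndSet(null, t);
        }
    }

    /** Shuts down, waits for the jobs to finish, then rethrows the first recorded failure. */
    void shutdownAndCheck(long timeout, TimeUnit unit) {
        shutdown();
        try {
            if (!awaitTermination(timeout, unit)) {
                throw new IllegalStateException("jobs did not finish within the timeout");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        }
        Throwable failure = firstFailure.get();
        if (failure != null) {
            throw new RuntimeException("a job threw an exception while executing", failure);
        }
    }
}
```

A caller would queue work with execute(...) and finish with shutdownAndCheck(...), so that a failure in any job surfaces on the calling thread instead of being lost in a worker thread.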
Represents a tile from TileMetricsOut.bin.
Load a file containing 8-byte records like this:
tile number: 4-byte int
number of clusters in tile: 4-byte int
Number of records to read is determined by reaching EOF.
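The record layout above can be read with a loop like the following. This is a minimal sketch, not Picard's loader: the names TileRecordSketch, TileRecord, and readAll are hypothetical, and little-endian byte order is an assumption, since the description does not state the byte order.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; class and method names are not Picard's.
final class TileRecordSketch {

    /** One 8-byte record: tile number, then number of clusters in that tile. */
    static final class TileRecord {
        final int tile;
        final int clusters;

        TileRecord(int tile, int clusters) {
            this.tile = tile;
            this.clusters = clusters;
        }
    }

    /**
     * Reads 8-byte records until EOF; there is no record count in the file.
     * ASSUMPTION: both 4-byte ints are little-endian.
     */
    static List<TileRecord> readAll(InputStream in) {
        List<TileRecord> records = new ArrayList<>();
        byte[] buf = new byte[8];
        try {
            DataInputStream data = new DataInputStream(in);
            int first;
            // A clean EOF is only legal on a record boundary.
            while ((first = data.read()) >= 0) {
                buf[0] = (byte) first;
                // Throws EOFException (an IOException) if the record is truncated.
                data.readFully(buf, 1, 7);
                ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
                records.add(new TileRecord(bb.getInt(), bb.getInt()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }
}
```

Because the file carries no header or count, a trailing partial record is treated as an error here instead of being silently dropped.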
Reads a TileMetricsOut file commonly found in the InterOp directory of an Illumina Run Folder.
Helper class which captures the combination of a lane, tile & metric code
IlluminaPhasingMetrics corresponds to a single record in a TileMetricsOut file
Utility for reading the tile data from an Illumina run directory's TileMetricsOut.bin file
Captures information about a phasing value - Which read it corresponds to, which phasing type and a median value
Defines the first or second template read for a tile
Enum representation of a transition from one base to any other.
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are
defined as originating from a single fragment of DNA.
UmiGraph is used to identify UMIs that come from the same original source molecule.
Metrics that are calculated during the process of marking duplicates
within a stream of SAMRecords using the UmiAwareDuplicateSetIterator.
A utility class for dealing with unsigned types.
UnsortedBasecallsConverter utilizes an underlying IlluminaDataProvider to convert parsed and decoded sequencing data
from standard Illumina formats to specific output records (FASTA records/SAM records).
Takes a VCF file and a Sequence Dictionary (from a variety of file types) and updates the Sequence Dictionary in VCF.
This tool reports on the validity of a SAM or BAM file relative to the SAM format
specification.
Describes the functionality for an executor that manages the delegation of work to VariantProcessor.Accumulators.
A VariantAccumulatorExecutor that breaks down work into chunks described by the provided VariantIteratorProducer
and spreads them over the indicated number of threads.
Tools that evaluate and refine variant calls, e.g.
Interface for classes that can generate filters for VariantContexts.
Tools that filter variants
A mechanism for iterating over CloseableIterator of VariantContexts in some fashion, given VCF files and optionally
an interval list.
Tools that manipulate variant call format (VCF) data
Describes an object that processes variants and produces a result.
Handles VariantContexts, and accumulates their data in some fashion internally.
Generates instances of VariantProcessor.Accumulators.
Simple builder of VariantProcessors.
Takes a collection of results produced by VariantProcessor.Accumulator.result() and merges them into a single RESULT.
Enum to hold the possible types of dbSnps.
Deprecated.
from 2022-03-17, Use VcfPathSegment
Deprecated.
from 2022-03-17, Use VcfPathSegmentGenerator
Converts an ASCII VCF file to a binary BCF or vice versa.
Describes a segment of a particular VCF file.
Describes a mechanism for producing VcfPathSegments from a VCF file path.
A simple program to convert a Genotyping Arrays VCF to an ADPC file (Illumina intensity data file).
Converts a VCF or BCF file to a Picard Interval List.
Created by farjoun on 4/1/17.
Prints a SAM or BAM file to the screen.
Metrics for evaluating the performance of whole genome sequencing experiments.
Interface for processing data and generating results for CollectWgsMetrics
Implementation of WgsMetricsProcessor that gets input data from a given iterator
and processes it with the help of a collector
CrosscheckFingerprints instead.