Class DiskBasedReadEndsForMarkDuplicatesMap

java.lang.Object
picard.sam.markduplicates.util.DiskBasedReadEndsForMarkDuplicatesMap
All Implemented Interfaces:
ReadEndsForMarkDuplicatesMap

public class DiskBasedReadEndsForMarkDuplicatesMap extends Object implements ReadEndsForMarkDuplicatesMap
Disk-based implementation of ReadEndsForMarkDuplicatesMap. A subdirectory of the system tmpdir is created to store files, one for each reference sequence. The reference sequence that is currently being queried (i.e. the sequence for which remove() has been most recently called) is stored in RAM. ReadEnds for all other sequences are stored on disk.

When put() is called for a sequence that is the current one in RAM, the ReadEnds object is merely put into the in-memory map. If put() is called for a sequence ID that is not the current RAM one, the ReadEnds object is appended to the file for that sequence, creating the file if necessary.

When remove() is called for a sequence that is the current one in RAM, remove() is called on the in-memory map. If remove() is called for a sequence other than the current RAM sequence, then the current RAM sequence is written to disk, the new sequence is read from disk into the in-memory map, and the file for the new sequence is deleted.

If things work properly, and reads are processed in genomic order, records will be written for mates that are in a later sequence. When the mate is reached in the input SAM file, the file that was written will be deleted. This should result in all temporary files being deleted by the time all the reads are processed. The temp directory is marked to be deleted on exit so everything should get cleaned up.
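The put()/remove() behavior described above can be illustrated with a minimal, self-contained sketch. This is not the Picard implementation: the class name ToySpillMap, the String values, and the tab-separated file format are all assumptions made for illustration, and the real class encodes ReadEndsForMarkDuplicates objects with a codec and limits the number of open files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy class, not Picard code. Values are plain strings and each
// record is stored as one "key<TAB>value" line, so keys must not contain tabs.
class ToySpillMap {
    private final Path dir;                                   // one file per sequence lives here
    private final Map<String, String> inRam = new HashMap<>();
    private int ramSequence = -1;                             // sequence currently held in RAM
    private int onDisk = 0;                                   // records currently stored in files

    ToySpillMap() {
        try {
            dir = Files.createTempDirectory("toy-spill");
            dir.toFile().deleteOnExit();                      // only succeeds if the directory is empty at exit
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // In-RAM sequence: plain map put. Any other sequence: append to its file.
    void put(int sequence, String key, String value) {
        if (sequence == ramSequence) {
            inRam.put(key, value);
        } else {
            append(sequence, key, value);
        }
    }

    // Makes the requested sequence the in-RAM one if needed, then removes from the map.
    String remove(int sequence, String key) {
        if (sequence != ramSequence) {
            swapTo(sequence);
        }
        return inRam.remove(key);                             // null if the key is not found
    }

    int size() { return inRam.size() + onDisk; }

    int sizeInRam() { return inRam.size(); }

    private Path fileFor(int sequence) { return dir.resolve(sequence + ".tsv"); }

    private void append(int sequence, String key, String value) {
        try {
            Files.write(fileFor(sequence), Collections.singletonList(key + "\t" + value),
                    StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            onDisk++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void swapTo(int sequence) {
        // Write the current RAM sequence to disk.
        for (Map.Entry<String, String> e : inRam.entrySet()) {
            append(ramSequence, e.getKey(), e.getValue());
        }
        inRam.clear();
        ramSequence = sequence;
        // Read the new sequence into RAM and delete its file.
        Path file = fileFor(sequence);
        if (!Files.exists(file)) {
            return;
        }
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                int tab = line.indexOf('\t');
                inRam.put(line.substring(0, tab), line.substring(tab + 1));
                onDisk--;
            }
            Files.delete(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, records put for sequences 2 and 3 both go to disk at first. The first remove() for sequence 2 loads that sequence's file into RAM and deletes it, so once every mate has been removed no per-sequence files remain.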

  • Constructor Details

    • DiskBasedReadEndsForMarkDuplicatesMap

      public DiskBasedReadEndsForMarkDuplicatesMap(int maxOpenFiles, ReadEndsForMarkDuplicatesCodec readEndsForMarkDuplicatesCodec)
  • Method Details

    • remove

      public ReadEndsForMarkDuplicates remove(int mateSequenceIndex, String key)
      Description copied from interface: ReadEndsForMarkDuplicatesMap
      Remove the element with the given key from the map. Because an implementation may be disk-based, the object returned may not be the same object that was put into the map.
      Specified by:
      remove in interface ReadEndsForMarkDuplicatesMap
      Parameters:
      mateSequenceIndex - must agree with the value used when the object was put into the map
      key - typically, concatenation of read group ID and read name
      Returns:
      null if the key is not found, otherwise the object removed.
    • put

      public void put(int mateSequenceIndex, String key, ReadEndsForMarkDuplicates readEnds)
      Description copied from interface: ReadEndsForMarkDuplicatesMap
      Store the element in the map with the given key. It is assumed that the element is not already present in the map.
      Specified by:
      put in interface ReadEndsForMarkDuplicatesMap
      Parameters:
      mateSequenceIndex - used to optimize storage and retrieval. The same value must be used when trying to remove this element. It is not valid to store the same key with two different mateSequenceIndexes.
      key - typically, concatenation of read group ID and read name
      readEnds - the object to be stored
    • size

      public int size()
      Specified by:
      size in interface ReadEndsForMarkDuplicatesMap
      Returns:
      number of elements stored in map
    • sizeInRam

      public int sizeInRam()
      Specified by:
      sizeInRam in interface ReadEndsForMarkDuplicatesMap
      Returns:
      number of elements stored in RAM. Always <= size()