Class MultiTileBclParser

java.lang.Object
picard.illumina.parser.MultiTileBclParser
All Implemented Interfaces:
Iterator<BclData>

public class MultiTileBclParser extends Object
Parse .bcl.bgzf files that contain multiple tiles in a single file. This requires an index file that gives the BGZF virtual file offset of the start of each tile in the block-compressed BCL file.
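      A BGZF virtual file offset packs two numbers into one 64-bit value: the file position of a compressed block (upper 48 bits) and a position inside that block's uncompressed data (lower 16 bits). The sketch below shows that packing together with a tile-to-offset table like the one such an index provides. The class `TileOffsets`, its methods, and the offsets in `main` are hypothetical illustrations, not Picard's `TileIndex` API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: BGZF virtual file offsets plus a tile -> offset table.
// TileOffsets is an illustration only, not Picard's TileIndex API.
public class TileOffsets {
    private static final long LOW_16_BITS = 0xFFFFL;

    /** Packs a compressed-block start (48 bits) and an offset in its uncompressed data (16 bits). */
    public static long makeVirtualOffset(long blockOffset, int offsetInBlock) {
        if (blockOffset < 0 || blockOffset >= (1L << 48)) {
            throw new IllegalArgumentException("block offset must fit in 48 bits");
        }
        if (offsetInBlock < 0 || offsetInBlock > 0xFFFF) {
            throw new IllegalArgumentException("offset in block must fit in 16 bits");
        }
        return (blockOffset << 16) | offsetInBlock;
    }

    /** File position of the compressed block that holds the record. */
    public static long blockOffset(long virtualOffset) {
        return virtualOffset >>> 16;
    }

    /** Position of the record within the block once it is decompressed. */
    public static int offsetInBlock(long virtualOffset) {
        return (int) (virtualOffset & LOW_16_BITS);
    }

    // Hypothetical in-memory index: tile number -> virtual offset of the tile's first cluster.
    private final Map<Integer, Long> tileStarts = new TreeMap<>();

    public void addTile(int tile, long virtualOffset) {
        tileStarts.put(tile, virtualOffset);
    }

    public long startOf(int tile) {
        Long offset = tileStarts.get(tile);
        if (offset == null) {
            throw new IllegalArgumentException("tile not in index: " + tile);
        }
        return offset;
    }

    public static void main(String[] args) {
        TileOffsets index = new TileOffsets();
        // Made-up offsets, purely for illustration.
        index.addTile(1101, makeVirtualOffset(0, 0));
        index.addTile(1102, makeVirtualOffset(65_536, 1_200));
        long v = index.startOf(1102);
        System.out.println(blockOffset(v) + " " + offsetInBlock(v)); // prints "65536 1200"
    }
}
```

      With htsjdk, a virtual offset like this is what BlockCompressedInputStream.seek(long) expects, so seeking to a tile would amount to looking up its offset and seeking there.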
  • Field Details

    • MASKING_QUALITY

      public static final byte MASKING_QUALITY
      The quality value (2) that EAMSS masking assigns to masked bases.
    • bclQualityEvaluationStrategy

      protected final BclQualityEvaluationStrategy bclQualityEvaluationStrategy
    • currentTile

      protected int currentTile
      The current tile number
  • Constructor Details

    • MultiTileBclParser

      public MultiTileBclParser(File directory, int lane, picard.illumina.parser.CycleIlluminaFileMap tilesToCycleFiles, OutputMapping outputMapping, boolean applyEamssFilter, BclQualityEvaluationStrategy bclQualityEvaluationStrategy, TileIndex tileIndex)
  • Method Details

    • initialize

      public void initialize()
    • makeCycleFileParser

      protected picard.illumina.parser.PerTileCycleParser.CycleFilesParser<BclData> makeCycleFileParser(List<File> files, picard.illumina.parser.PerTileCycleParser.CycleFilesParser<BclData> cycleFilesParser)
      For a given cycle, return a CycleFilesParser. It closes cycleFilesParser if that is not null.
      Parameters:
      files - The files to parse
      cycleFilesParser - The previous cycle's file parser, or null if there is none.
      Returns:
      A CycleFilesParser that will populate the correct position in the IlluminaData object with that cycle's data.
    • makeCycleFileParser

      protected picard.illumina.parser.PerTileCycleParser.CycleFilesParser<BclData> makeCycleFileParser(List<File> files)
      Create a Bcl parser for an individual cycle and wrap it with the CycleFilesParser interface which populates the correct cycle in BclData.
      Parameters:
      files - The files to parse.
      Returns:
      A CycleFilesParser that populates a BclData object with data for a single cycle
    • supportedTypes

      public Set<IlluminaDataType> supportedTypes()
    • next

      public BclData next()
      Return the data for the next cluster by:
        1. Advancing to the next tile if the end of the current tile has been reached.
        2. For each cycle, getting the appropriate parser and having it populate its data into the IlluminaData object.
      Specified by:
      next in interface Iterator<BclData>
      Returns:
      The IlluminaData object for the next cluster
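      The two steps of next() can be pictured with a toy, in-memory model. In this sketch, `ToyTileCycleIterator`, its `tileOfNextCluster()` method, and the `data[cycle][cluster]` layout are all hypothetical; the real parser reads BCL data through per-cycle parsers rather than arrays.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Toy model (hypothetical, not Picard code) of the two steps in next():
// 1. advance to the next tile when the current one is exhausted;
// 2. for each cycle, read that cycle's base call for the current cluster.
public class ToyTileCycleIterator implements Iterator<String> {
    // data.get(tile)[cycle][cluster] is the base call for one cluster in one cycle
    private final Map<Integer, char[][]> data;
    private final Deque<Integer> remainingTiles;
    private int currentTile = -1;
    private int clustersInTile = 0;
    private int nextCluster = 0;

    public ToyTileCycleIterator(Map<Integer, char[][]> data, List<Integer> tiles) {
        this.data = data;
        this.remainingTiles = new ArrayDeque<>(tiles);
    }

    // Step 1: move forward until a tile with unread clusters is found (or tiles run out).
    private void advanceTileIfNeeded() {
        while (nextCluster >= clustersInTile && !remainingTiles.isEmpty()) {
            currentTile = remainingTiles.poll();
            char[][] cycles = data.get(currentTile);
            clustersInTile = cycles.length == 0 ? 0 : cycles[0].length;
            nextCluster = 0;
        }
    }

    @Override
    public boolean hasNext() {
        advanceTileIfNeeded();
        return nextCluster < clustersInTile;
    }

    // Analogue of getTileOfNextCluster(): call before next() to learn the tile it will come from.
    public int tileOfNextCluster() {
        if (!hasNext()) throw new NoSuchElementException();
        return currentTile;
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        char[][] cycles = data.get(currentTile);
        StringBuilder read = new StringBuilder(cycles.length);
        // Step 2: each cycle contributes one base call for this cluster.
        for (char[] cycle : cycles) {
            read.append(cycle[nextCluster]);
        }
        nextCluster++;
        return read.toString();
    }

    public static void main(String[] args) {
        Map<Integer, char[][]> data = Map.of(
            1101, new char[][] {{'A', 'C'}, {'G', 'T'}, {'T', 'A'}},
            1102, new char[][] {{'G'}, {'G'}, {'C'}});
        ToyTileCycleIterator it = new ToyTileCycleIterator(data, List.of(1101, 1102));
        while (it.hasNext()) {
            int tile = it.tileOfNextCluster();
            System.out.println(tile + " " + it.next());
        }
    }
}
```

      Running main prints "1101 AGT", "1101 CTA", and "1102 GGC": tiles are consumed in order, and each read is assembled one cycle at a time.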
    • runEamssForReadInPlace

      protected static void runEamssForReadInPlace(byte[] bases, byte[] qualities)
      EAMSS is an Illumina-developed algorithm for detecting reads whose quality has deteriorated toward their end and, in that case, revising those qualities to the masking quality (2). The algorithm works as follows (with one exception, described below):

      Start at the end of the read (the high indices, on the right in the examples below) and calculate an EAMSS tally at each position as follows:
        if (quality[i] < 15)                      tally += 1
        if (quality[i] >= 15 && quality[i] < 30)  tally unchanged
        if (quality[i] >= 30)                     tally -= 2

      For each position, keep track of this tally. For example (the read starts at the left end, cycle 1):

        Cycle:          1   2   3   4   5   6   7   8   9
        Bases:          A   C   T   G   G   G   T   C   A
        Qualities:     32  32  16  15   8  10  32   2   2
        Cycle score:   -2  -2   0   0   1   1  -2   1   1
        EAMSS tally:   -2   0   2   2   2   1   0   2   1

      The cycle score is the EAMSS score for that cycle alone, and the tally sums the cycle scores from the last cycle down to the current one. The earliest instance of the maximum tally is at cycle 3.
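      The tally rule can be checked with a short sketch. It uses int qualities for readability (the documented method takes byte[]), and the class and method names are hypothetical. Applied to the qualities of the first example, it yields tallies -2, 0, 2, 2, 2, 1, 0, 2, 1 and an earliest maximum at cycle 3.

```java
import java.util.Arrays;

// Sketch of the EAMSS tally rule: scan from the last cycle toward the first,
// adding +1 for quality < 15, 0 for 15..29, and -2 for quality >= 30.
public class EamssTally {
    static int cycleScore(int quality) {
        if (quality < 15) return 1;
        if (quality < 30) return 0;
        return -2;
    }

    /** tallies[i] is the running tally once cycle i is included (scanning right to left). */
    public static int[] tallies(int[] qualities) {
        int[] tallies = new int[qualities.length];
        int tally = 0;
        for (int i = qualities.length - 1; i >= 0; i--) {
            tally += cycleScore(qualities[i]);
            tallies[i] = tally;
        }
        return tallies;
    }

    /** 0-based index of the earliest (lowest) cycle holding the maximum tally; -1 if empty. */
    public static int earliestMaxIndex(int[] tallies) {
        int best = -1;
        for (int i = 0; i < tallies.length; i++) {
            // strict ">" keeps the first (lowest) index on ties
            if (best < 0 || tallies[i] > tallies[best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        int[] t = tallies(new int[] {32, 32, 16, 15, 8, 10, 32, 2, 2});
        System.out.println(Arrays.toString(t));       // [-2, 0, 2, 2, 2, 1, 0, 2, 1]
        System.out.println(earliestMaxIndex(t) + 1);  // 3 (the cycle number)
    }
}
```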

      Keep track of the maximum EAMSS tally (2 in this case) and the earliest (lowest) cycle at which it occurs. If, and only if, the maximum tally is >= 1, reassign the qualities from that cycle to the end (highest cycle) of the read to 2, the masking quality. The qualities would therefore be transformed as follows:

        Original qualities:  32  32  16  15   8  10  32   2   2
        Final qualities:     32  32   2   2   2   2   2   2   2

      Masking starts at cycle 3, the earliest instance of the maximum tally, and runs to the end of the read.
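      The basic masking step, before the G-run exception is considered, fits in one pass. This is a sketch of the rule as stated, not Picard's implementation; the class name is hypothetical and the constant 2 mirrors the masking quality named in the text.

```java
import java.util.Arrays;

// Sketch of EAMSS masking WITHOUT the G-run exception.
public class EamssBasicMask {
    static final byte MASK = 2; // the masking quality

    public static void maskInPlace(byte[] qualities) {
        int tally = 0;
        int maxTally = Integer.MIN_VALUE;
        int indexOfMax = -1;
        for (int i = qualities.length - 1; i >= 0; i--) {
            int q = qualities[i] & 0xFF; // treat the byte as an unsigned quality
            if (q < 15) tally += 1;
            else if (q >= 30) tally -= 2;
            // ">=" while scanning right to left keeps the earliest (lowest) cycle on ties
            if (tally >= maxTally) {
                maxTally = tally;
                indexOfMax = i;
            }
        }
        if (maxTally >= 1) {
            Arrays.fill(qualities, indexOfMax, qualities.length, MASK);
        }
    }

    public static void main(String[] args) {
        byte[] q = {32, 32, 16, 15, 8, 10, 32, 2, 2};
        maskInPlace(q);
        System.out.println(Arrays.toString(q)); // [32, 32, 2, 2, 2, 2, 2, 2, 2]
    }
}
```

      Using ">=" during the right-to-left scan is what makes ties resolve to the earliest cycle, so no second pass is needed to find it.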

      IMPORTANT: The one exception is that if the maximum EAMSS tally is preceded by a long string of G base calls (10 or more, with a single base-call exception allowed per 10 bases), the masking continues to the beginning of that string of G's. For example:

        Cycle:          1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
        Bases:          C   T   A   C   A   G   A   G   G   G   G   G   G   G   G   C   A   T
        Qualities:     30  22  26  27  28  30   7  34  20  19  38  15  32  32  10   4   2   5
        Cycle score:   -2   0   0   0   0  -2   1  -2   0   0  -2   0  -2  -2   1   1   1   1
        EAMSS tally:   -7  -5  -5  -5  -5  -5  -3  -4  -2  -2  -2   0   0   2   4   3   2   1

      The earliest instance of the maximum tally (4) is at cycle 15.

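      Resulting transformation:

        Bases:                C   T   A   C   A   G   A   G   G   G   G   G   G   G   G   C   A   T
        Original qualities:  30  22  26  27  28  30   7  34  20  19  38  15  32  32  10   4   2   5
        Final qualities:     30  22  26  27  28   2   2   2   2   2   2   2   2   2   2   2   2   2

      The G's in cycles 8 to 15, the A in cycle 7 (the single allowed exception), and the G in cycle 6 form a 10-base run, so EAMSS masking starts at cycle 6 rather than at cycle 15, the earliest instance of the maximum tally.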

      To further clarify the exception rule, here are a few examples:

        A C G A C G G G G G G G G G G G G G G G G G G G G A C T

      Because the run already contains 20 bases when it reaches the A C, it can jump over those two base calls, so EAMSS masking starts at the G in cycle 3.

        T T G G A G G G G G G G G G G G G G G G G G G A G A C T

      Here the A inside the run near its right end can be skipped, as can the earlier A (cycle 5), because the run already contains 20 or more bases; EAMSS masking therefore starts at the beginning of the G run.

        T T G G G A A G G G G G G G G G G G G G G G G G G T T A T

      Both A's can be skipped: the first A counts as the first skip, and as far as the length of the string of G's is concerned, both are counted like G's. One of these A's is the 20th base in the string of G's and can therefore be skipped. Note that the A's before the G's are included only because there are G's further on within the number of allowable exceptions (2 in this instance). If there were NO G's after the A's, the A's could NOT be counted as part of the string of G's, even if no exceptions had previously occurred. In other words, the string of G's MUST end in a G, not in an exception.

      However, if the maximum tally occurs to the right of the run of G's, that position is still part of the string of G's but counts toward the number of allowable exceptions. For example:

        T T G G G G G G G G G G A C G

      Here the first index scanned (the earliest instance of the maximum tally) can be considered an exception, and the masking extends back to the start of the run of G's.

      To sum up the final points, a string of G's CAN START with an exception but CANNOT END in an exception.

      Parameters:
      bases - Bases for a single read in the cluster (not the entire cluster)
      qualities - Qualities for a single read in the cluster (not the entire cluster)
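      The G-run exception can be expressed as a helper that moves the masking start leftward. This is one possible reading of the rules above, not Picard's code; the class name and these specifics are assumptions: a run may jump over k consecutive non-G bases only if a G lies just past them and the total exceptions stay within max(1, (run length + k) / 10), the run always ends on a G, and it must reach 10 bases (G's plus exceptions) to extend the masking.

```java
import java.nio.charset.StandardCharsets;

// One possible reading of the EAMSS G-run exception; a sketch under stated assumptions, not Picard's code.
public class GRunExtension {
    /** Returns the 0-based index at which EAMSS masking should start. */
    public static int maskStart(byte[] bases, int indexOfMax) {
        int runLength = 0;   // G's plus skipped exceptions counted so far
        int exceptions = 0;  // non-G bases jumped over so far
        int leftmostG = -1;  // leftmost G included in the run
        int i = indexOfMax;
        while (i >= 0) {
            if (bases[i] == 'G') {
                runLength++;
                leftmostG = i;
                i--;
                continue;
            }
            // Look for the nearest G further left; the run may not end in an exception.
            int g = i - 1;
            while (g >= 0 && bases[g] != 'G') {
                g--;
            }
            if (g < 0) {
                break;
            }
            int k = i - g; // number of non-G bases to jump over
            int limit = Math.max(1, (runLength + k) / 10); // assumed: one exception per 10 bases
            if (exceptions + k > limit) {
                break;
            }
            exceptions += k;
            runLength += k;
            i = g; // the G at index g is counted on the next iteration
        }
        return runLength >= 10 ? leftmostG : indexOfMax;
    }

    public static void main(String[] args) {
        byte[] bases = "CTACAGAGGGGGGGGCAT".getBytes(StandardCharsets.US_ASCII);
        // Index 14 (cycle 15) holds the earliest maximum tally in the 18-cycle example.
        System.out.println(maskStart(bases, 14) + 1); // prints 6 (masking starts at cycle 6)
    }
}
```

      On the 18-cycle example this reproduces masking from cycle 6; the finer points of the longer examples may be handled differently by the actual implementation.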
    • seekToTile

      public void seekToTile(int tile)
      Clear the current set of cycleFileParsers and replace them with the ones for the requested tile.
      Parameters:
      tile - requested tile
    • hasNext

      public boolean hasNext()
      Specified by:
      hasNext in interface Iterator<ILLUMINA_DATA extends picard.illumina.parser.IlluminaData>
    • getTileOfNextCluster

      public int getTileOfNextCluster()
      Returns the tile of the next cluster that will be returned by PerTilePerCycleParser. Call it before next() if you want to know the tile of the data that next() returns.
      Returns:
      The tile number of the next ILLUMINA_DATA object to be returned
    • verifyData

      public void verifyData(List<Integer> tiles, int[] cycles)
    • remove

      public void remove()
      Specified by:
      remove in interface Iterator<ILLUMINA_DATA extends picard.illumina.parser.IlluminaData>
    • close

      public void close()