Package picard.illumina.parser
Class MultiTileBclParser
java.lang.Object
picard.illumina.parser.MultiTileBclParser
Parse .bcl.bgzf files that contain multiple tiles in a single file. This requires an index file that tells
the bgzf virtual file offset of the start of each tile in the block-compressed bcl file.
-
Field Summary
Fields
protected final BclQualityEvaluationStrategy bclQualityEvaluationStrategy
protected int currentTile
    The current tile number
static final byte MASKING_QUALITY
-
Constructor Summary
Constructors
MultiTileBclParser(File directory, int lane, picard.illumina.parser.CycleIlluminaFileMap tilesToCycleFiles, OutputMapping outputMapping, boolean applyEamssFilter, BclQualityEvaluationStrategy bclQualityEvaluationStrategy, TileIndex tileIndex)
-
Method Summary
void close()
int getTileOfNextCluster()
    Returns the tile of the next cluster that will be returned by PerTilePerCycleParser and therefore should be called before next() if you want to know the tile for the data returned by next()
boolean hasNext()
void initialize()
protected picard.illumina.parser.PerTileCycleParser.CycleFilesParser<BclData> makeCycleFileParser(List<File> files)
    Create a Bcl parser for an individual cycle and wrap it with the CycleFilesParser interface which populates the correct cycle in BclData.
protected picard.illumina.parser.PerTileCycleParser.CycleFilesParser<BclData> makeCycleFileParser(List<File> files, picard.illumina.parser.PerTileCycleParser.CycleFilesParser<BclData> cycleFilesParser)
    For a given cycle, return a CycleFilesParser.
next()
    Return the data for the next cluster by: 1.
void remove()
protected static void runEamssForReadInPlace(byte[] bases, byte[] qualities)
    EAMSS is an Illumina Developed Algorithm for detecting reads whose quality has deteriorated towards their end and revising the quality to the masking quality (2) if this is the case.
void seekToTile(int tile)
    Clear the current set of cycleFileParsers and replace them with the ones for the tile indicated by oneBasedTileNumber
void verifyData(List<Integer> tiles, int[] cycles)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.util.Iterator
forEachRemaining
-
Field Details
-
MASKING_QUALITY
public static final byte MASKING_QUALITY
-
bclQualityEvaluationStrategy
-
currentTile
protected int currentTile
The current tile number
-
-
Constructor Details
-
MultiTileBclParser
public MultiTileBclParser(File directory, int lane, picard.illumina.parser.CycleIlluminaFileMap tilesToCycleFiles, OutputMapping outputMapping, boolean applyEamssFilter, BclQualityEvaluationStrategy bclQualityEvaluationStrategy, TileIndex tileIndex)
-
-
Method Details
-
initialize
public void initialize()
-
makeCycleFileParser
protected picard.illumina.parser.PerTileCycleParser.CycleFilesParser<BclData> makeCycleFileParser(List<File> files, picard.illumina.parser.PerTileCycleParser.CycleFilesParser<BclData> cycleFilesParser)
For a given cycle, return a CycleFilesParser. If the supplied cycleFilesParser is not null, it is closed.
Parameters:
files - The files to parse
cycleFilesParser - The previous cycle file parser, or null if there is none.
Returns:
A CycleFilesParser that will populate the correct position in the IlluminaData object with that cycle's data.
-
makeCycleFileParser
protected picard.illumina.parser.PerTileCycleParser.CycleFilesParser<BclData> makeCycleFileParser(List<File> files)
Create a Bcl parser for an individual cycle and wrap it with the CycleFilesParser interface which populates the correct cycle in BclData.
Parameters:
files - The files to parse.
Returns:
A CycleFilesParser that populates a BclData object with data for a single cycle
-
supportedTypes
-
next
Return the data for the next cluster by:
1. Advancing tiles if we reached the end of the current tile.
2. For each cycle, getting the appropriate parser and having it populate its data into the IlluminaData object.
-
runEamssForReadInPlace
protected static void runEamssForReadInPlace(byte[] bases, byte[] qualities)
EAMSS is an Illumina-developed algorithm for detecting reads whose quality has deteriorated towards their end and revising the quality to the masking quality (2) if this is the case. This algorithm works as follows (with one exception):
Start at the end (high indices, at the right below) of the read and calculate an EAMSS tally at each location as follows:
if (quality[i] < 15) tally += 1
if (quality[i] >= 15 and < 30) tally = tally
if (quality[i] >= 30) tally -= 2
For each location, keep track of this tally (e.g.):
Read starts at <- this end
Cycle:         1   2   3   4   5   6   7   8   9
Bases:         A   C   T   G   G   G   T   C   A
Qualities:    32  32  16  15   8  10  32   2   2
Cycle Score:  -2  -2   0   0   1   1  -2   1   1    // The EAMSS score determined for this cycle alone
EAMSS TALLY:  -2   0   2   2   2   1   0   2   1
                       X - Earliest instance of Max-Score
You must keep track of the maximum EAMSS tally (in this case 2) and the earliest (lowest) cycle at which it occurs. If and only if the max EAMSS tally >= 1, then from there until the end (highest cycle) of the read reassign these qualities as 2 (the masking quality). The output qualities would therefore be transformed from:
Original Qualities:  32  32  16  15   8  10  32   2   2   to
Final Qualities:     32  32   2   2   2   2   2   2   2
                              X - Earliest instance of max-tally/end of masking
IMPORTANT: The one exception is: if the max EAMSS tally is preceded by a long string of G basecalls (10 or more, with a single basecall exception per 10 bases) then the masking continues to the beginning of that string of G's.
E.g.:
Cycle:         1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
Bases:         C   T   A   C   A   G   A   G   G   G   G   G   G   G   G   C   A   T
Qualities:    30  22  26  27  28  30   7  34  20  19  38  15  32  32  10   4   2   5
Cycle Score:  -2   0   0   0   0  -2   1  -2   0   0  -2   0  -2  -2   1   1   1   1
EAMSS TALLY:  -7  -5  -5  -5  -5  -5  -3  -4  -2  -2  -2   0   0   2   4   3   2   1
                                                                       X - Earliest instance of Max-Tally
Resulting Transformation:
Bases:                C   T   A   C   A   G   A   G   G   G   G   G   G   G   G   C   A   T
Original Qualities:  30  22  26  27  28  30   7  34  20  19  38  15  32  32  10   4   2   5
Final Qualities:     30  22  26  27  28   2   2   2   2   2   2   2   2   2   2   2   2   2
                                                                              X - Earliest instance of Max-Tally
                                          X - Start of EAMSS masking due to G-Run
To further clarify the exception rule here are a few examples:
A C G A C G G G G G G G G G G G G G G G G G G G G A C T
                                                X - Earliest instance of Max-Tally
    X - Start of EAMSS masking (with a two base call jump because we have 20 bases in the run already)
T T G G A G G G G G G G G G G G G G G G G G G A G A C T
                                                X - Earliest instance of Max-Tally
        X - We can skip this A as well as the earlier A because we have 20 or more bases in the run already
    X - Start of EAMSS masking (with a two base call jump because we have 20 bases in the run)
T T G G G A A G G G G G G G G G G G G G G G G G G T T A T
                                                X - Earliest instance of Max-Tally
          X X - We can skip these bases because the first A counts as the first skip and, as far as the length of the string of G's is concerned, these are both counted like G's
          X - This A is the 20th base in the string of G's and therefore can be skipped
Note that the A's previous to the G's are only included because there are G's further on that are within the number of allowable exceptions away (i.e. 2 in this instance). If there were NO G's after the A's you CANNOT count the A's as part of the G strings (even if no exceptions have previously occurred). In other words, the end of the string of G's MUST end in a G, not an "exception".
However, if the max-tally occurs to the right of the run of G's then this is still part of the string of G's but does count towards the number of exceptions allowable (e.g.):
T T G G G G G G G G G G A C G
                        X - Earliest instance of Max-tally
The first index CAN be considered as an exception; the above would be masked to the following point:
T T G G G G G G G G G G A C G
    X - End of EAMSS masking due to G-Run
To sum up the final points, a string of G's CAN START with an exception but CANNOT END in an exception.
Parameters:
bases - Bases for a single read in the cluster (not the entire cluster)
qualities - Qualities for a single read in the cluster (not the entire cluster)
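The tally-and-mask procedure described above can be sketched in a few lines of standalone Java. This is an illustrative reading of the description, not Picard's actual implementation: the class name EamssSketch and its helper methods are hypothetical, and the exception budget for G runs (one non-G base per 10 bases counted so far, with a minimum of one) is an assumption about how "a single basecall exception per 10 bases" is meant to be applied.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the EAMSS description; not the Picard source.
class EamssSketch {
    static final byte MASKING_QUALITY = 2;

    // Per-cycle score: +1 below Q15, 0 for Q15..Q29, -2 at Q30 or above.
    static int cycleScore(int quality) {
        if (quality < 15) return 1;
        if (quality >= 30) return -2;
        return 0;
    }

    // Accumulates the tally from the last cycle to the first. Returns the 0-based
    // index of the earliest cycle holding the maximum tally, or -1 if that
    // maximum is below 1 (meaning nothing is masked).
    static int earliestMaxTallyIndex(byte[] qualities) {
        int tally = 0;
        int maxTally = Integer.MIN_VALUE;
        int indexOfMax = -1;
        for (int i = qualities.length - 1; i >= 0; i--) {
            tally += cycleScore(qualities[i] & 0xff);
            if (tally >= maxTally) { // ">=" moves the index left on ties
                maxTally = tally;
                indexOfMax = i;
            }
        }
        return maxTally >= 1 ? indexOfMax : -1;
    }

    // Extends the mask start leftwards over a G run. Assumed exception budget:
    // max(runLength / 10, 1) non-G bases, and a skipped base must lead to a G.
    static int extendOverGRun(byte[] bases, int indexOfMax) {
        int runLength = 0;  // bases counted in the run, exceptions included
        int exceptions = 0; // non-G bases accepted so far
        int i = indexOfMax;
        while (i >= 0) {
            if (bases[i] == 'G') {
                runLength++;
                i--;
                continue;
            }
            int skip = 0;
            for (int back = 1; back <= i; back++) {
                int allowed = Math.max((runLength + back) / 10, 1);
                if (exceptions + back > allowed) break;
                if (bases[i - back] == 'G') {
                    skip = back;
                    break;
                }
            }
            if (skip == 0) break; // the run may not end in an exception
            exceptions += skip;
            runLength += skip;
            i -= skip;
        }
        return runLength >= 10 ? indexOfMax + 1 - runLength : indexOfMax;
    }

    // Masks qualities in place from the computed start to the end of the read.
    static void runEamssSketch(byte[] bases, byte[] qualities) {
        int start = earliestMaxTallyIndex(qualities);
        if (start < 0) return;
        start = extendOverGRun(bases, start);
        Arrays.fill(qualities, start, qualities.length, MASKING_QUALITY);
    }

    public static void main(String[] args) {
        byte[] bases = "ACTGGGTCA".getBytes(StandardCharsets.US_ASCII);
        byte[] qualities = {32, 32, 16, 15, 8, 10, 32, 2, 2};
        runEamssSketch(bases, qualities);
        System.out.println(Arrays.toString(qualities)); // [32, 32, 2, 2, 2, 2, 2, 2, 2]
    }
}
```

Running main on the first worked example masks everything from cycle 3 onward, matching the "Final Qualities" row. The same sketch moves the mask start of the 18-cycle example back to cycle 6, because the G run there is 10 bases long with one exception.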
-
seekToTile
public void seekToTile(int tile)
Clear the current set of cycleFileParsers and replace them with the ones for the tile indicated by oneBasedTileNumber
Parameters:
tile - requested tile
-
hasNext
public boolean hasNext()
-
getTileOfNextCluster
public int getTileOfNextCluster()
Returns the tile of the next cluster that will be returned by PerTilePerCycleParser. Call it before next() if you want to know the tile for the data returned by next().
Returns:
The tile number of the next ILLUMINA_DATA object to be returned
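The required call order (read the tile first, then call next()) can be shown with a small self-contained stand-in. ToyTileParser below is a hypothetical class written only for this illustration; it mimics the hasNext(), getTileOfNextCluster() and next() contract described here and does not use any Picard types.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class TilePeekSketch {
    // Hypothetical stand-in for a parser that yields clusters tile by tile.
    static final class ToyTileParser implements Iterator<String> {
        private final int[] tiles;       // tile number of each cluster, in output order
        private final String[] clusters; // cluster payloads
        private int nextIndex = 0;

        ToyTileParser(int[] tiles, String[] clusters) {
            if (tiles.length != clusters.length) {
                throw new IllegalArgumentException("tiles and clusters must have equal length");
            }
            this.tiles = tiles;
            this.clusters = clusters;
        }

        // Reports the tile of the cluster that the following next() call will return.
        int getTileOfNextCluster() {
            if (!hasNext()) throw new NoSuchElementException();
            return tiles[nextIndex];
        }

        @Override
        public boolean hasNext() {
            return nextIndex < clusters.length;
        }

        @Override
        public String next() {
            if (!hasNext()) throw new NoSuchElementException();
            return clusters[nextIndex++];
        }
    }

    // Pairs every cluster with its tile by querying the tile before advancing.
    static List<String> labelClusters(ToyTileParser parser) {
        List<String> labelled = new ArrayList<>();
        while (parser.hasNext()) {
            int tile = parser.getTileOfNextCluster(); // read before next()
            String cluster = parser.next();
            labelled.add(tile + ":" + cluster);
        }
        return labelled;
    }

    public static void main(String[] args) {
        ToyTileParser parser = new ToyTileParser(
                new int[]{1101, 1101, 1102},
                new String[]{"ACG", "TTG", "GGA"});
        System.out.println(labelClusters(parser)); // [1101:ACG, 1101:TTG, 1102:GGA]
    }
}
```

Querying the tile after next() would instead report the tile of the following cluster, which differs at every tile boundary.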
-
verifyData
-
remove
public void remove()
-
close
public void close()
-