=====================================================
Illumina BCL2FASTQ2 DEMUX DELIVERY OPTION FOR ZAP001:
=====================================================

These are `.fastq.gz' demux'd files that carry essentially the same information as the QSEQ+64 files (including conformance to the delivery spec.). The difference is that they are expressed in FASTQ+33 form and partitioned into (generally many) separate output piles based on the index reads.  The index reads are only partially folded into the read names (their quality scores are lost), and they are not included in this delivery option as separate files.  The partitioning into output piles is determined by how the index reads compare against a list of expected indices (or index pairs) contained in a run-specific, lane-specific ``sample sheet'' file.

Illumina's BCL2FASTQ2 tool has little flexibility in how it deals with mismatches to the expected indices.  A fifth delivery option, demux'd with SJC's own, more flexible pipeline, will be available soon.  For many lanes, it should yield additional (sometimes substantially more) successfully demux'd reads/read pairs compared to BCL2FASTQ2.

387G    L1  (smaller... e.g., better PhiX compression)
378G    L2  (smaller... e.g., better PhiX compression)
303G    L3  (smaller... e.g., better PhiX compression)
347G    L4  (smaller... e.g., better PhiX compression)
1.4T    total

IluDmxFQsZAP001L1:Na3qH4QT8gw3
IluDmxFQsZAP001L2:dU8Er7Td8RD8
IluDmxFQsZAP001L3:MU3Qr6me8Wr7
IluDmxFQsZAP001L4:dG4XR4HW4Tv6

Example `rsync' commands to retrieve the data (replace `/tmp/.../' with the local directory you want to put the data into).  NOTE THAT THE SERVER HAS CHANGED (it is no longer `pan'):

   rsync -rtWi --stats --progress rsync://IluDmxFQsZAP001L1@quercus.pellegrini.mcdb.ucla.edu/IluDmxFQsZAP001L1/ /tmp/IluDmxFQsZAP001L1/
   rsync -rtWi --stats --progress rsync://IluDmxFQsZAP001L2@quercus.pellegrini.mcdb.ucla.edu/IluDmxFQsZAP001L2/ /tmp/IluDmxFQsZAP001L2/
   rsync -rtWi --stats --progress rsync://IluDmxFQsZAP001L3@quercus.pellegrini.mcdb.ucla.edu/IluDmxFQsZAP001L3/ /tmp/IluDmxFQsZAP001L3/
   rsync -rtWi --stats --progress rsync://IluDmxFQsZAP001L4@quercus.pellegrini.mcdb.ucla.edu/IluDmxFQsZAP001L4/ /tmp/IluDmxFQsZAP001L4/

The DNS name `quercus.pellegrini.mcdb.ucla.edu' is temporary; the new server will probably end up being called something like `sequencing.stemcell.ucla.edu'.

IMPORTANT NOTES FOR NovaSeq 6000 S4 VERSION 1 FLOWCELLS:

* SEE ALL THE NOTES ALREADY SENT OUT FOR THE TRADITIONAL QSEQ DELIVERY OPTION, plus ALL THE NOTES ALREADY SENT OUT FOR THE NON-DEMUX BCL2FASTQ2 DELIVERY OPTION, plus the additional notes below:

* See file `SJCsampleSheet-${RUN}L${LANE}.csv' for the actual sample sheet (list of target canonical indices/index pairs) used as input to Illumina BCL2FASTQ2.  Note that sample names, for example, may have been adjusted to avoid impermissible characters (only `A-Za-z0-9', hyphens, and underscores are allowed).  Additional indices/index pairs may also have been added to capture much of the PhiX spike-in, or unexpected, empirically discovered components present in substantial quantity.  Note that `Sample_ID' and `Sample_Name' have the same contents.
  Also note that if PhiX is a concern for you, you should not rely on demux to isolate it.  Instead, run alignments to pull such reads out of each of the FASTQ outputs (at least until we have more experience with the NovaSeq 6000 platform).

  Successive rows in the sample sheet are what BCL2FASTQ2 calls ``Sample #1'', ``Sample #2'', etc.  ``Sample #0'' is inserted at the front internally by BCL2FASTQ2 and represents demux failure (reads/read pairs not assignable to anything listed explicitly in the sample sheet under the mismatch rules in effect).  NOTE THAT IF YOUR SAMPLE NAMES ARE INTEGERS (or have integers embedded in them), THIS NUMBERING IS COMPLETELY SEPARATE.  The BCL2FASTQ2 sample numbers are embedded into output filenames as `S<#>' (see below).  IN THE SAMPLE SHEET WE USE, LINES MAY NOT BE IN THE SAME ORDER AS IN THE SAMPLE SHEETS ORIGINALLY SENT TO US.

* See `--barcode-mismatches ${BCmm}' near the top of file `BCL2FASTQ2.stderr.txt' for how many BCL2FASTQ2 index mismatches (`${BCmm}' = `0', `1', or `2') were allowed.  With dual indexing, this limit is presumably applied separately to the 1st and 2nd index; the BCL2FASTQ2 documentation describes it as a per-index limit.  The choice and consequences of this parameter will be discussed later, when the fifth delivery option (SJC pipeline demux) is ready.  It sometimes has to be chosen suboptimally so that Illumina BCL2FASTQ2 will run at all; for example, BCL2FASTQ2 refuses to run when the allowed mismatches would make two expected indices indistinguishable.

* Reads or read pairs for which demux fails go into file(s)

     Undetermined_S0_L00${LANE}_R${END}_001.fastq.gz

  where `${END}' is `1'/`2' for the 1st/2nd main end.  As with the non-demux'd Illumina FASTQ delivery, the `_001' is a part number.  If BCL2FASTQ2 decided to split output over multiple parts, the next part would have `_002', then `_003', etc.

* Reads or read pairs that are assigned to a unique entry in the sample sheet go into file(s)

     ${Sample_Name}_S<#>_L00${LANE}_R${END}_001.fastq.gz

  inside a directory named `${Sample_Project}'.  Again, the `_001' is a part number.
* Entries inside these FASTQ files are almost identical to the entries in the already-described FASTQ files for the non-demux BCL2FASTQ2 delivery option.  The only change concerns the final `0' in read names (described as ``index'' in that delivery option).  It is replaced with the string of `A'/`C'/`G'/`T'/`N' discrete basecalls for the index read(s) of that read/read pair.  For dual indices, this string is the first index, followed by a `+' character, followed by the second index.  Note that there are no `I1'/`I2' FASTQ files for the demux delivery option; the per-base quality scores for the indices are not available in this delivery option.

* Each and every PF1 read/read pair goes into exactly one of the output FASTQ piles.  PF0 reads/read pairs are omitted for the reason described in earlier delivery options (such data are no longer available in the raw runfolder for cycles >= 26).

* If you are interested in the number of reads/read pairs in each of the output FASTQs, see the plain-text file `DemultiplexingStats.SJC-xformed.txt'.  It has tab-separated columns, and its first line holds the column titles.  This file holds the relevant parts of file `DemultiplexingStats.xml' (generated by BCL2FASTQ2) in the `Stats' directory, reformatted with the XSLT stylesheet `DemultiplexingStats.SJC-xform.xslt'.  The stylesheet is applied with the `xsltproc' command-line tool ( http://xmlsoft.org/XSLT/xsltproc2.html ), using a command line such as

     xsltproc -o DemultiplexingStats.SJC-xformed.txt DemultiplexingStats.SJC-xform.xslt Stats/DemultiplexingStats.xml

  Columns `ProjectName' and `SampleName' are as in the sample sheet (the values `default'+`Undetermined' correspond to Sample #0), and column `BarcodeCount' gives the number of reads/read pairs placed into that output pile.  Columns `0mm', `1mm', ..., `5mm' give the number of reads/read pairs with the stated number of index mismatches placed into that output pile (their sum should equal `BarcodeCount').
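The four per-lane `rsync' commands listed earlier can also be scripted.  The following Python sketch rebuilds exactly those command lines and runs one lane at a time.  It assumes that the text after the colon in each `IluDmxFQsZAP001L<lane>:...' line is the rsync password for that module, and passes it through rsync's `RSYNC_PASSWORD' environment variable to avoid the interactive prompt.  `rsync_command' and `fetch_lane' are illustrative helper names, not part of the delivery.

```python
import os
import subprocess
from pathlib import Path

SERVER = "quercus.pellegrini.mcdb.ucla.edu"  # temporary DNS name, may change
RUN = "ZAP001"


def rsync_command(lane: int, dest_root: Path) -> list[str]:
    """Rebuild the example rsync command line for one lane."""
    module = f"IluDmxFQs{RUN}L{lane}"  # module name doubles as the user name
    source = f"rsync://{module}@{SERVER}/{module}/"
    dest = f"{dest_root / module}/"  # trailing slash: copy the module's contents
    return ["rsync", "-rtWi", "--stats", "--progress", source, dest]


def fetch_lane(lane: int, password: str, dest_root: Path) -> int:
    """Run rsync for one lane and return its exit status.

    Assumption: `password` is the text after the colon in the matching
    IluDmxFQsZAP001L<lane>:... line; rsync reads it from RSYNC_PASSWORD.
    """
    # rsync creates only the last destination directory, so make its parents.
    dest_root.mkdir(parents=True, exist_ok=True)
    env = dict(os.environ, RSYNC_PASSWORD=password)
    return subprocess.run(rsync_command(lane, dest_root), env=env).returncode
```

For example, `fetch_lane(1, "<password from the L1 line>", Path("/data/ZAP001"))' would fetch lane 1 into `/data/ZAP001/IluDmxFQsZAP001L1/'.  Rerunning it only transfers what changed, as with the plain commands.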
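The sample-name restriction mentioned in the sample sheet note (only `A-Za-z0-9', hyphens, and underscores) can be checked before a sample sheet is sent.  The following is a minimal Python sketch.  The `sanitize' helper and its replace-with-underscore rule are hypothetical; they are not necessarily how SJC adjusted any names.

```python
import re

# Only A-Za-z0-9, hyphen, and underscore are permitted in sample names.
PERMITTED_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_permitted(name: str) -> bool:
    """True if the name is non-empty and every character is permitted."""
    return PERMITTED_NAME.fullmatch(name) is not None


def sanitize(name: str) -> str:
    """Hypothetical fix-up: replace each impermissible character with '_'."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", name)
```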
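The sample-numbering and filename rules described above can be turned into a list of expected output files.  Under those rules, Sample #0 holds demux failures, and successive sample-sheet rows become Sample #1, #2, and so on.  Each sample's files are named `${Sample_Name}_S<#>_L00${LANE}_R${END}_001.fastq.gz' and placed inside `${Sample_Project}'.  This Python sketch makes several assumptions:
  - `SJCsampleSheet-${RUN}L${LANE}.csv' follows the usual BCL2FASTQ2 layout, with a `[Data]' section whose header row includes `Sample_Name' and `Sample_Project';
  - the `Undetermined' files sit at the top level of the delivery directory;
  - only part `_001' is considered.
The function names are illustrative.

```python
import csv
import io
from pathlib import PurePosixPath


def read_data_section(text: str) -> list[dict[str, str]]:
    """Return the [Data] rows of a BCL2FASTQ2-style sample sheet, in file order."""
    lines = text.splitlines()
    start = next(i for i, line in enumerate(lines) if line.startswith("[Data]")) + 1
    body = []
    for line in lines[start:]:
        if line.startswith("["):  # a following section ends [Data]
            break
        if line.strip(", \t"):  # skip blank or all-comma lines
            body.append(line)
    return list(csv.DictReader(io.StringIO("\n".join(body))))


def expected_fastqs(rows, lane: int, ends=(1, 2)):
    """Map each BCL2FASTQ2 sample number to its expected part-001 FASTQ paths."""
    def name(prefix: str, s_num: int, end: int) -> str:
        return f"{prefix}_S{s_num}_L00{lane}_R{end}_001.fastq.gz"

    # Sample #0 (demux failure) is prepended internally by BCL2FASTQ2.
    out = {0: [PurePosixPath(name("Undetermined", 0, e)) for e in ends]}
    # Successive sample-sheet rows are Sample #1, #2, ...
    for s_num, row in enumerate(rows, start=1):
        out[s_num] = [
            PurePosixPath(row["Sample_Project"], name(row["Sample_Name"], s_num, e))
            for e in ends
        ]
    return out
```

Because the rows are numbered in the order they appear in the sample sheet actually used, the `S<#>' values should be derived from `SJCsampleSheet-${RUN}L${LANE}.csv' rather than from any sample sheet originally sent.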
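The following Python sketch makes the mismatch rule concrete.  It is a simplified model of mismatch-tolerant index assignment, not BCL2FASTQ2's actual implementation.  In this model, a read is assigned to a sample-sheet entry only if two conditions hold:
  - each of its index reads is within `max_mm' mismatches of that entry's corresponding index (Hamming distance, with `N' counted as a mismatch);
  - exactly one entry qualifies.
Otherwise the read falls to Sample #0.  `ambiguous_pairs' shows, under the same model, why a larger `${BCmm}' can be unusable: two entries whose indices each lie within `2 * max_mm' of each other could both claim the same read.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of differing positions; an `N` basecall counts as a mismatch."""
    if len(a) != len(b):
        raise ValueError(f"index lengths differ: {a!r} vs {b!r}")
    return sum(x != y for x, y in zip(a, b))


def within(observed, expected, max_mm: int) -> bool:
    """True if every index read is within max_mm mismatches of its counterpart."""
    if len(observed) != len(expected):
        raise ValueError("single vs dual index mismatch")
    return all(hamming(o, e) <= max_mm for o, e in zip(observed, expected))


def assign(observed, entries, max_mm: int) -> int:
    """Return the 1-based sample number of the unique matching entry, else 0."""
    hits = [n for n, e in enumerate(entries, start=1) if within(observed, e, max_mm)]
    return hits[0] if len(hits) == 1 else 0


def ambiguous_pairs(entries, max_mm: int):
    """Entry pairs (by sample number) that a single read could match at once."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(entries, start=1), 2)
        if all(hamming(x, y) <= 2 * max_mm for x, y in zip(a, b))
    ]
```

For instance, with single indices `ACGTACGT' and `ACGTACCA' (2 mismatches apart), one allowed mismatch makes the pair ambiguous.  In that case a read with index `ACGTACCT' matches both entries and would go to Sample #0 in this model.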
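BCL2FASTQ2 may split a pile over `_001', `_002', ... parts, so it can help to group the delivered files by sample and end before processing them.  The following is a minimal Python sketch that relies only on the filename patterns given above.  The regular expression and helper names are illustrative.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# <name>_S<#>_L<3-digit lane>_R<end>_<3-digit part>.fastq.gz
FASTQ_NAME = re.compile(
    r"(?P<sample>.+)_S(?P<snum>\d+)_L(?P<lane>\d{3})_R(?P<end>[12])_(?P<part>\d{3})\.fastq\.gz"
)


def parse_fastq_name(path: str) -> dict:
    """Split a delivered FASTQ filename into sample, S#, lane, end, and part."""
    m = FASTQ_NAME.fullmatch(PurePosixPath(path).name)
    if m is None:
        raise ValueError(f"unexpected FASTQ filename: {path}")
    fields = {k: int(v) for k, v in m.groupdict().items() if k != "sample"}
    fields["sample"] = m["sample"]
    return fields


def group_parts(paths):
    """Group files by (directory, sample, S#, end); each list is in part order."""
    groups = defaultdict(list)
    for path in paths:
        f = parse_fastq_name(path)
        key = (str(PurePosixPath(path).parent), f["sample"], f["snum"], f["end"])
        groups[key].append((f["part"], path))
    return {key: [p for _, p in sorted(parts)] for key, parts in groups.items()}
```

A gzip file may consist of several concatenated members, so the parts of a pile can be joined in this order with plain `cat' if a single file is needed.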
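In this delivery option the index basecalls survive only in the read names.  They can therefore be tallied directly from the FASTQ files, for example to see which unexpected indices dominate an `Undetermined' pile.  This Python sketch assumes two things:
  - the usual Illumina read-name layout, in which the index string is the last `:'-separated field of the header line (`<index1>+<index2>' for dual indices);
  - plain 4-line FASTQ records.
The helper names are illustrative.

```python
import gzip
from collections import Counter
from itertools import islice


def index_reads_from_header(header: str) -> tuple[str, ...]:
    """Return the index basecall string(s) from the last ':' field of a read name."""
    last_field = header.rstrip().rsplit(":", 1)[-1]
    return tuple(last_field.split("+"))  # one element if single, two if dual


def count_indices(fastq_lines, max_reads=None) -> Counter:
    """Tally index strings over the header lines (every 4th line) of a FASTQ stream."""
    stop = None if max_reads is None else 4 * max_reads
    return Counter(index_reads_from_header(h) for h in islice(fastq_lines, 0, stop, 4))


def top_indices(fastq_gz_path: str, max_reads=1_000_000, top=20):
    """Most common index strings among the first max_reads reads of a .fastq.gz file."""
    with gzip.open(fastq_gz_path, "rt") as fh:
        return count_indices(fh, max_reads).most_common(top)
```

For example, `top_indices("Undetermined_S0_L001_R1_001.fastq.gz")' lists the most frequent index strings among the first million undetermined reads of lane 1.  Note that only the basecalls are available, not their quality scores.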
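The stats notes above suggest a quick sanity check on `DemultiplexingStats.SJC-xformed.txt'.  The `0mm' ... `5mm' columns should sum to `BarcodeCount', and every PF1 read/read pair lands in exactly one pile.  The following Python sketch relies only on the column titles named above and assumes the file covers a single lane.  Blank cells are treated as 0 (also an assumption), and `summarize_stats' is an illustrative name.

```python
import csv

MM_COLUMNS = [f"{k}mm" for k in range(6)]  # `0mm` .. `5mm`


def summarize_stats(lines):
    """Check each pile and compute lane totals from the tab-separated stats file.

    Returns (rows, total, undetermined_fraction), where each row is
    (ProjectName, SampleName, BarcodeCount, mismatch_columns_sum_ok).
    """
    rows = []
    for rec in csv.DictReader(lines, delimiter="\t"):
        count = int(rec["BarcodeCount"])
        mm_sum = sum(int(rec[c] or 0) for c in MM_COLUMNS)  # blank -> 0 (assumed)
        rows.append((rec["ProjectName"], rec["SampleName"], count, mm_sum == count))
    # Every PF1 read/read pair is in exactly one pile, so this is the PF1 total.
    total = sum(r[2] for r in rows)
    undetermined = sum(r[2] for r in rows if (r[0], r[1]) == ("default", "Undetermined"))
    return rows, total, (undetermined / total if total else 0.0)
```

Typical use is `with open("DemultiplexingStats.SJC-xformed.txt", newline="") as fh: rows, total, frac = summarize_stats(fh)'.  Any row whose last element is False deserves a closer look.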