=====================================================
Illumina BCL2FASTQ2 DEMUX DELIVERY OPTION FOR ZAP001:
=====================================================

These are demux'd `.fastq.gz' files that contain essentially
the same information as the QSEQ+64 files (including
conforming to delivery spec.), but expressed in FASTQ+33
form and then partitioned into (generally many) separate
output piles based on index reads.  The index reads are
partially folded into read names (their quality scores are
lost) and are not included in this delivery option as separate
files.
The partitioning into output piles is according to how the
index reads compare against a list of expected indices
(or index pairs) contained in a run-specific, lane-specific
``sample sheet'' file.

Illumina's BCL2FASTQ2 tool does not have much flexibility
in terms of how it deals with mismatches to expected
indices.  A fifth delivery option, coming soon, will demux
with SJC's pipeline, which is more flexible;
for many lanes, this should result in additional
(sometimes substantially more) successfully demux'd
reads/read pairs compared to BCL2FASTQ2.

387G    L1  (smaller... e.g., better PhiX compression)
378G    L2  (smaller... e.g., better PhiX compression)
303G    L3  (smaller... e.g., better PhiX compression)
347G    L4  (smaller... e.g., better PhiX compression)
1.4T    total

IluDmxFQsZAP001L1:Na3qH4QT8gw3
IluDmxFQsZAP001L2:dU8Er7Td8RD8
IluDmxFQsZAP001L3:MU3Qr6me8Wr7
IluDmxFQsZAP001L4:dG4XR4HW4Tv6


Example `rsync' commands to retrieve (replace `/tmp/.../' with
the local directory you want to put data into) --- NOTE THAT
THE SERVER HAS CHANGED (it is no longer `pan'):

rsync -rtWi --stats --progress rsync://IluDmxFQsZAP001L1@quercus.pellegrini.mcdb.ucla.edu/IluDmxFQsZAP001L1/ /tmp/IluDmxFQsZAP001L1/

rsync -rtWi --stats --progress rsync://IluDmxFQsZAP001L2@quercus.pellegrini.mcdb.ucla.edu/IluDmxFQsZAP001L2/ /tmp/IluDmxFQsZAP001L2/

rsync -rtWi --stats --progress rsync://IluDmxFQsZAP001L3@quercus.pellegrini.mcdb.ucla.edu/IluDmxFQsZAP001L3/ /tmp/IluDmxFQsZAP001L3/

rsync -rtWi --stats --progress rsync://IluDmxFQsZAP001L4@quercus.pellegrini.mcdb.ucla.edu/IluDmxFQsZAP001L4/ /tmp/IluDmxFQsZAP001L4/

The DNS name `quercus.pellegrini.mcdb.ucla.edu' is temporary
--- the new server will probably eventually end up being called
something like `sequencing.stemcell.ucla.edu'.
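
The four `rsync' commands above differ only in the lane number,
so they can be generated rather than typed.  The following is a
minimal Python sketch, not part of the delivery itself; the
function name `rsync_command' and the `dest_root' parameter are
made up for illustration, and the host name is the temporary one
quoted above.

```python
import shlex

HOST = "quercus.pellegrini.mcdb.ucla.edu"  # temporary DNS name (see note above)
RUN = "ZAP001"


def rsync_command(lane, dest_root="/tmp"):
    """Build the rsync argument list for one lane.

    The rsync module name and the user name are the same string,
    as in the example commands above.
    """
    module = f"IluDmxFQs{RUN}L{lane}"
    return [
        "rsync", "-rtWi", "--stats", "--progress",
        f"rsync://{module}@{HOST}/{module}/",
        f"{dest_root.rstrip('/')}/{module}/",
    ]


def rsync_command_lines(lanes=(1, 2, 3, 4), dest_root="/tmp"):
    """Return one shell-quoted command line per lane."""
    return [shlex.join(rsync_command(lane, dest_root)) for lane in lanes]
```

Printing the result of `rsync_command_lines()' reproduces the
four command lines shown above; pass a different `dest_root' to
change the local directory.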


IMPORTANT NOTES FOR NovaSeq 6000 S4 VERSION 1 FLOWCELLS:

* SEE ALL THE NOTES ALREADY SENT OUT FOR THE TRADITIONAL
QSEQ DELIVERY OPTION, plus ALL THE NOTES ALREADY SENT OUT
FOR THE NON-DEMUX BCL2FASTQ2 DELIVERY OPTION, plus the
additional notes below:

* See file `SJCsampleSheet-${RUN}L${LANE}.csv' for the actual
sample sheet (list of target canonical indices/index pairs)
used as input to Illumina BCL2FASTQ2.  Note that, e.g., sample
names may have been adjusted so as not to use impermissible
characters (only `A-Za-z0-9', hyphens, and underscores are
allowed), and additional indices/index pairs may have been added
to capture much of the PhiX spike-in or unexpected, empirically
discovered components present in substantial quantity.  Note
that `Sample_ID' and `Sample_Name' have the same contents.
Also note that if PhiX is of concern for you, you should
not rely on demux to isolate it, but run alignments to pull
out such reads from each of the FASTQ outputs (at least until
we have more experience on the NovaSeq 6000 platform).

Successive rows in the sample sheet are what BCL2FASTQ2 calls
``Sample #1'', ``Sample #2'', etc.  ``Sample #0'' is inserted
at the front internally to BCL2FASTQ2 and represents demux
failure (reads/read pairs not assignable to anything listed
explicitly in the sample sheet under the mismatch rules in
effect).  NOTE THAT IF YOUR SAMPLE NAMES ARE INTEGERS (or have
integers embedded in them), THIS NUMBERING IS COMPLETELY
SEPARATE.  The BCL2FASTQ2 sample numbers get embedded into
output filenames as `S<#>' (see below).  IN THE SAMPLE SHEET
WE USE, LINES MAY NOT BE IN THE SAME ORDER AS SAMPLE SHEETS
ORIGINALLY SENT TO US.
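
To make the row-order numbering concrete, here is a rough Python
sketch that lists the sample number for each sample sheet row.
The exact layout of `SJCsampleSheet-${RUN}L${LANE}.csv' is not
described in these notes; the sketch ASSUMES the usual Illumina
layout (an optional `[Data]' marker line followed by a CSV header
row containing `Sample_Name').  The function name and the example
rows are made up.

```python
import csv
import io


def sample_numbers(sample_sheet_text):
    """Map sample number -> Sample_Name, following row order.

    Assumption: data rows follow a "[Data]" marker line if one is
    present; otherwise the whole text is treated as the CSV table.
    """
    lines = sample_sheet_text.splitlines()
    start = 0
    for i, line in enumerate(lines):
        if line.strip().startswith("[Data]"):
            start = i + 1
            break
    rows = csv.DictReader(io.StringIO("\n".join(lines[start:])))
    numbering = {0: "Undetermined"}  # Sample #0 = demux failure
    for number, row in enumerate(rows, start=1):
        numbering[number] = row["Sample_Name"]
    return numbering


# Made-up example: integer sample names do not match S<#> numbers.
EXAMPLE = """[Data]
Sample_ID,Sample_Name,Sample_Project,index
12,12,ProjA,ACGTACGT
7,7,ProjA,TGCATGCA
"""
```

With the made-up `EXAMPLE', the sample named `12' is Sample #1
and the sample named `7' is Sample #2.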

* See `--barcode-mismatches ${BCmm}' near the top of file
`BCL2FASTQ2.stderr.txt' for how many BCL2FASTQ2 index mismatches
(`${BCmm}' = `0', `1', or `2') were allowed (applied separately
by BCL2FASTQ2 for 1st and 2nd index when using dual indexing?).

Choice and consequences of this parameter (which sometimes
has to be suboptimally chosen so that Illumina BCL2FASTQ2
will run at all) will be discussed later, when the fifth
delivery option (SJC pipeline demux) is ready.
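
As a toy illustration of what an index mismatch allowance means
(this is NOT BCL2FASTQ2's actual code), the Python sketch below
assigns an observed index or index pair to a sample number.  It
ASSUMES the allowance is applied to each index separately, which
is the open question raised above, and it simply treats a read
that matches more than one entry as a demux failure.  All names
are made up.

```python
def count_mismatches(observed, expected):
    """Count positions at which two equal-length index strings differ."""
    if len(observed) != len(expected):
        raise ValueError("index reads must have equal length")
    return sum(1 for o, e in zip(observed, expected) if o != e)


def assign_sample_number(observed, sample_sheet, max_mismatches):
    """Return the 1-based sample number, or 0 for demux failure.

    observed:      tuple of index strings, e.g. ("ACGT",) or ("ACGT", "TTTT")
    sample_sheet:  list of expected tuples, in sample sheet row order
    Assumption: max_mismatches applies to each index separately.
    """
    matches = [
        number
        for number, expected in enumerate(sample_sheet, start=1)
        if all(
            count_mismatches(obs, exp) <= max_mismatches
            for obs, exp in zip(observed, expected)
        )
    ]
    # Exactly one matching entry is required; anything else is Sample #0.
    return matches[0] if len(matches) == 1 else 0
```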


* Reads or read pairs for which demux fails go into file(s)

  Undetermined_S0_L00${LANE}_R${END}_001.fastq.gz

where `${END}' is `1'/`2' for 1st/2nd main end.  As with
non-demux'd Illumina FASTQ delivery, the `_001' is a part
number; if BCL2FASTQ2 decided to split output over multiple
parts, the next part would have `_002', then `_003', etc.

* Reads or read pairs that get assigned to a unique entry
in the sample sheet go into file(s)

  ${Sample_Name}_S<#>_L00${LANE}_R${END}_001.fastq.gz

inside a directory named `${Sample_Project}'.  Again, the
`_001' is a part number.
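
If you need to recover the pieces of these filenames in a script,
a regular expression along the following lines should work.  This
is a sketch based only on the two filename patterns given above;
the function name and dictionary keys are made up.

```python
import re

# Sample names may contain only A-Za-z0-9, hyphens, and underscores.
FASTQ_NAME = re.compile(
    r"^(?P<sample_name>[A-Za-z0-9_-]+)"
    r"_S(?P<sample_number>\d+)"
    r"_L00(?P<lane>\d)"
    r"_R(?P<end>[12])"
    r"_(?P<part>\d{3})\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Split a demux FASTQ filename into its components."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a recognized demux FASTQ name: {filename!r}")
    fields = match.groupdict()
    for key in ("sample_number", "lane", "end", "part"):
        fields[key] = int(fields[key])
    return fields
```

Because the sample name is matched greedily, a sample name that
itself contains something like `_S2_' is still handled: the last
`_S<#>' before `_L00' is taken as the BCL2FASTQ2 sample number.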

* Entries inside these FASTQ files are almost the same as
entries inside the already-described FASTQ files for the
non-demux BCL2FASTQ2 delivery option.  The only change is
that the final `0' in read names (described as ``index''
in that delivery option) is replaced with the actual `A'/
`C'/`G'/`T'/`N' discrete basecalls string for the actual
index read(s) for that read/read pair --- for dual indices,
it is the first index followed by a `+' character, followed
by the second index.  Note that there are no `I1'/`I2'
FASTQ files for the demux delivery option; the per-base
quality scores for the indices are not available in this
delivery option.
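
For example, the index basecalls can be pulled back out of a read
name with a few lines of Python.  This is a sketch that assumes
the index string is everything after the last `:' on the FASTQ
header line, as described above; the function name is made up,
and the header in the usage comment is invented.

```python
VALID_BASES = set("ACGTN")


def index_reads_from_read_name(header_line):
    """Return the index basecalls embedded in a FASTQ header line.

    Gives a 1-tuple for single indexing and a 2-tuple (first index,
    second index) for dual indexing.
    """
    last_field = header_line.strip().rsplit(":", 1)[-1]
    indices = tuple(last_field.split("+"))
    for index in indices:
        # Reject anything that is not a plain basecall string, e.g. the
        # `0' placeholder used in the non-demux delivery option.
        if not index or not set(index) <= VALID_BASES:
            raise ValueError(f"unexpected index field: {last_field!r}")
    return indices


# Usage with an invented header line:
#   index_reads_from_read_name(
#       "@INSTR:1:FLOWCELL:1:1101:1000:2000 1:N:0:ACGTNCGT+TTGCAATG")
```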

* Each and every PF1 read/read pair goes into exactly one
of the output FASTQ piles.  PF0 reads/read pairs are omitted
for the reason described in earlier delivery options (such
data no longer available in raw runfolder for cycles >= 26).


* If you are interested in the number of reads/read pairs
in each of the output FASTQs, you can look at the plain text
file `DemultiplexingStats.SJC-xformed.txt', which has
tab-separated columns (the first line gives column titles).
It contains the relevant parts of file `DemultiplexingStats.xml'
(generated by BCL2FASTQ2) in the `Stats' directory,
reformatted with the XSLT stylesheet
`DemultiplexingStats.SJC-xform.xslt'.  To apply the stylesheet,
the `xsltproc' command line tool
( http://xmlsoft.org/XSLT/xsltproc2.html ) is used,
with a command line such as

  xsltproc -o DemultiplexingStats.SJC-xformed.txt DemultiplexingStats.SJC-xform.xslt Stats/DemultiplexingStats.xml

Columns `ProjectName' and `SampleName' are as in the
sample sheet (values `default'+`Undetermined' correspond
to Sample #0), and column `BarcodeCount' gives the number
of reads/read pairs placed into that output pile.  `0mm',
`1mm', ..., `5mm' give the number of reads/read pairs
with the stated number of index mismatches placed into that
output pile (and the sum of these should equal BarcodeCount).
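
The sum check mentioned above is easy to script.  The following
Python sketch reads the tab-separated text and reports any row
whose per-mismatch counts do not add up to `BarcodeCount'.  It
relies only on the column titles named above, finds the mismatch
columns by pattern (any title of the form `<digits>mm'), and
ASSUMES that an empty cell means zero.  The function name is made
up.

```python
import csv
import io
import re


def check_demux_stats(tsv_text):
    """Return (ProjectName, SampleName, BarcodeCount, mm_sum) for each
    row where the `<n>mm' columns do not sum to BarcodeCount."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    mm_columns = [
        name for name in (reader.fieldnames or [])
        if re.fullmatch(r"\d+mm", name)
    ]
    problems = []
    for row in reader:
        barcode_count = int(row["BarcodeCount"])
        # Assumption: an empty cell counts as zero.
        mm_sum = sum(int(row[name] or 0) for name in mm_columns)
        if mm_sum != barcode_count:
            problems.append(
                (row["ProjectName"], row["SampleName"], barcode_count, mm_sum)
            )
    return problems
```

To use it on the delivered file, read the file contents into a
string (e.g., with `open(...).read()') and pass that string in; an
empty list means every row is consistent.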