The GenBank BioProject accession number for these data is PRJNA350692. All samples were obtained from station 136 (106.543W, 17.043N; cast 136) or station BB2 (107.148W, 16.527N; cast 141) on cruise TN278 between April 8 and 9, 2012.
Water for all samples was collected from Niskin bottles on the CTD rosette. At station 136 we sampled 10 depths spanning the oxycline and anoxic zone; two liters of Niskin water were vacuum filtered onto a 0.2 µm SUPOR filter. At station BB2, a nearby station, approximately four liters of water from each of 100 m, 120 m, and 150 m were prefiltered through 30 µm filters and subsequently filtered onto 0.2 µm SUPOR filters. All filters were stored at -80 °C until further processing on shore. The GenBank BioSample accession numbers are SAMN05944799-SAMN05944812.
DNA was extracted from filters by freeze-thaw, followed by incubation with lysozyme and proteinase K and phenol/chloroform extraction. Libraries were prepared with a Rubicon ThruPLEX kit using 50 ng of DNA per sample. Four libraries were sequenced on an Illumina HiSeq 2500 in rapid mode (25 million 150 bp paired-end reads per sample) at Michigan State University. The other 10 libraries were sequenced on an Illumina HiSeq 2500 in high output mode (40-70 million 125 bp paired-end reads per sample) at the University of Utah. Reads were quality checked and trimmed, and residual adapter sequences were removed, using Trimmomatic (Bolger et al., 2014). Overlapping paired reads were merged with FLASH (Magoc and Salzberg, 2011). FASTQ files of the raw sequence reads are deposited in the Sequence Read Archive under study SRP092212 and the individual file accession numbers SRS1765745-SRS1765758.
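The quality-control step described above can be sketched as follows for one sample. The adapter file, thread count, and quality thresholds shown here are illustrative assumptions (the study's exact Trimmomatic settings are not recorded in this section); only the tools themselves come from the text.

```shell
# Trimmomatic paired-end QC: adapter removal plus quality trimming.
# Adapter file, threads, and cutoffs are assumed values, not the study's.
trimmomatic PE -threads 8 \
    sample_R1.fastq.gz sample_R2.fastq.gz \
    sample_R1.trimmed.fastq.gz sample_R1.unpaired.fastq.gz \
    sample_R2.trimmed.fastq.gz sample_R2.unpaired.fastq.gz \
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 \
    SLIDINGWINDOW:4:15 MINLEN:36

# Merge overlapping read pairs with FLASH; -o sets the output prefix.
flash sample_R1.trimmed.fastq.gz sample_R2.trimmed.fastq.gz -o sample_merged
```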
Metagenomic sequences from each sample were assembled independently into larger contigs using two approaches. The resulting assemblies have been deposited in the Whole Genome Shotgun archive under the accession numbers:
MOXS00000000
MPLT00000000
MPLU00000000
MPLV00000000
MPLW00000000
MPLX00000000
MPLY00000000
MPLZ00000000
MPMA00000000
MPMB00000000
MPMC00000000
MPMD00000000
MPME00000000
MPMF00000000
First, for de novo assembly, we pre-processed reads with the khmer software package (Crusoe et al., 2015): we first ran normalize-by-median.py, which implements the digital normalization algorithm (Brown et al., 2012), to reduce high-coverage reads to 20x; we then used filter-abund.py to trim reads of k-mers with an abundance below 2; and finally we used filter-below-abund.py to trim k-mers with counts above 50 (Zhang et al., 2015). We assembled the khmer-processed reads with the Velvet (1.2.10) assembler (Zerbino, 2010), using a k-mer size of 45. Velvet assemblies have been released as version 1 in the Whole Genome Shotgun archive (accession numbers XXXX01000000).
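The khmer and Velvet steps above might look like the following for one sample. File names, the khmer counting k-mer size, and memory settings are assumptions; the normalization cutoff (20x), abundance cutoffs (2 and 50), and Velvet k-mer size (45) are from the text. Note that script arguments vary somewhat between khmer releases.

```shell
# Digital normalization to 20x coverage, saving the count table for reuse.
# -k and -M (counting k-mer size, memory) are assumed values.
normalize-by-median.py -p -C 20 -k 20 -M 8e9 -s counts.ct \
    -o sample.norm.fq sample.pe.fq

# Trim reads at k-mers with abundance below 2.
filter-abund.py -C 2 -o sample.abundfilt.fq counts.ct sample.norm.fq

# Trim high-abundance k-mers (counts above 50) with the sandbox script.
filter-below-abund.py counts.ct sample.abundfilt.fq

# Velvet assembly at k = 45 (value from the text).
velveth sample_velvet 45 -fastq -shortPaired sample.abundfilt.fq.below
velvetg sample_velvet -exp_cov auto -cov_cutoff auto
```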
Second, we generated an independent de novo assembly for each sample using MEGAHIT v1.1.2 (Li et al., 2015) in paired-end mode with a minimum contig length of 500 bp. MEGAHIT assemblies have been released as version 2 in the Whole Genome Shotgun archive (accession numbers XXXX02000000).
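A minimal sketch of the MEGAHIT run, assuming trimmed paired-end input files; file names and thread count are assumptions, while the 500 bp minimum contig length is from the text.

```shell
# MEGAHIT paired-end assembly; --min-contig-len 500 matches the text,
# input names and -t are placeholders.
megahit -1 sample_R1.trimmed.fastq.gz -2 sample_R2.trimmed.fastq.gz \
    --min-contig-len 500 -t 8 -o sample_megahit
```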