Samples were collected from Sites U1382, U1383, and U1384 during IODP Expedition 336.
In the home laboratory, rock samples were crushed in an ethanol- and UV-sterilized steel impact mortar and pestle (Chemplex, Palm City, FL, USA). The mass of rock used for DNA extraction ranged from 7 to 86 g. Rock powders were split into 2 ml Lysing Matrix E tubes containing ceramic, silica, and glass beads (MP Biomedicals, Santa Ana, CA, USA). Each tube was filled with 978 µl sodium phosphate buffer and 122 µl MT buffer according to the manufacturer's protocol for the FastDNA Spin Kit (MP Biomedicals). The tubes were shaken twice in a FastPrep 24 instrument (MP Biomedicals) at a speed of 5.5 for 30 seconds to mechanically extract DNA and homogenize the sample, and the DNA was then recovered according to the manufacturer's instructions. Replicate extracts of the same sample were combined and concentrated using an Eppendorf 5301 Vacufuge. To account for possible contamination of the low-biomass samples during handling, "blank" negative controls were carried through all of the steps described above for the rock samples. The resulting DNA was quantified using the Qubit dsDNA HS Assay Kit on a Qubit 2.0 Fluorometer (Invitrogen, Carlsbad, CA, USA).
The V4 hypervariable region of the 16S rRNA gene was amplified from the DNA extracts and sequenced by a commercial sequencing facility (Mr. DNA, Shallowater, TX) on the Illumina MiSeq platform. The 2 × 300 bp kit was used with the Earth Microbiome Project primers 515f (5'-GTG CCA GCM GCC GCG GTA A) and 806r (5'-GGA CTA CHV GGG TWT CTA AT) (Caporaso et al. 2012) to generate paired-end reads.
Illumina tag data were processed using mothur v.1.34.4 (Schloss et al. 2009) following the mothur Illumina MiSeq Standard Operating Procedure (Kozich et al. 2013). Briefly, paired-end reads were joined into contigs, and any contigs with ambiguous base calls were removed. The remaining contigs were then aligned to the mothur-recreated SILVA SEED database from release v119 (Yarza et al. 2008).
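The screening step described above, in which contigs containing ambiguous base calls are discarded, can be illustrated with a minimal Python sketch. This is not the mothur implementation: the function names and the example contigs are hypothetical, and the sketch assumes that any character other than A, C, G, or T counts as ambiguous.

```python
def has_ambiguous_base(seq: str) -> bool:
    """Return True if seq contains any character other than A, C, G, or T.

    Assumption: every non-ACGT character (N or any IUPAC degenerate code)
    is treated as an ambiguous base call.
    """
    return any(base not in "ACGT" for base in seq.upper())


def drop_ambiguous(contigs: dict[str, str]) -> dict[str, str]:
    """Keep only the contigs that contain no ambiguous base calls."""
    return {
        name: seq
        for name, seq in contigs.items()
        if not has_ambiguous_base(seq)
    }


# Hypothetical contigs, for illustration only.
example_contigs = {
    "contig_1": "GTGCCAGCAGCCGCGGTAA",   # unambiguous, retained
    "contig_2": "GTGCCAGCNGCCGCGGTAA",   # contains N, removed
    "contig_3": "GGACTACHVGGGTWTCTAAT",  # contains H, V, and W, removed
}

if __name__ == "__main__":
    print(sorted(drop_ambiguous(example_contigs)))
```

In this sketch only `contig_1` passes the screen.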
Sequences were then pre-clustered at the 1% dissimilarity level to reduce the number of spurious sequences, as recommended elsewhere (Kozich et al. 2013). Chimeras were identified with UCHIME in de novo mode (Edgar et al. 2011) and removed from further processing and analysis. Sequences were clustered into operational taxonomic units (OTUs) at 3% sequence dissimilarity using the average neighbor method. A conservative OTU abundance cutoff of 0.005% of total reads was applied to the full dataset before any downstream analysis, as previously suggested (Bokulich et al. 2013). The remaining OTUs were classified using the SILVA 16S rRNA gene database (Quast et al. 2013). The closest environmental sequences to the OTUs were identified in the NCBI database using the BLAST algorithm (Altschul et al. 1997). OTUs recovered from the two protocol blanks, which may reflect contaminant DNA introduced during sample handling or sequencing, were removed from the dataset to provide the most conservative estimate of sequences from the deep biosphere, as has been done elsewhere (Inagaki et al. 2015).
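The two OTU filtering steps described above (the 0.005% abundance cutoff and the removal of OTUs detected in the protocol blanks) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the procedure as run in mothur: the function name, sample names, and read counts are hypothetical; total reads are assumed to include reads from the blanks; an OTU is assumed to be dropped when its read count falls strictly below the cutoff; and any OTU with at least one read in a blank is treated as a contaminant.

```python
def filter_otus(
    counts: dict[str, dict[str, int]],
    blank_samples: set[str],
    min_fraction: float = 0.00005,  # 0.005% of total reads
) -> dict[str, dict[str, int]]:
    """Apply an abundance cutoff, then drop OTUs observed in any blank.

    counts maps OTU name -> {sample name -> read count}.
    Assumption: total reads are summed over all samples, blanks included.
    """
    total_reads = sum(sum(per_sample.values()) for per_sample in counts.values())
    min_reads = min_fraction * total_reads

    kept = {}
    for otu, per_sample in counts.items():
        # Step 1: abundance cutoff on the full dataset.
        if sum(per_sample.values()) < min_reads:
            continue
        # Step 2: drop OTUs with any reads in a protocol blank.
        if any(per_sample.get(blank, 0) > 0 for blank in blank_samples):
            continue
        # Report counts for the rock samples only.
        kept[otu] = {
            sample: n
            for sample, n in per_sample.items()
            if sample not in blank_samples
        }
    return kept


# Hypothetical OTU table (100,000 reads in total, so the cutoff is 5 reads).
example_counts = {
    "Otu1": {"rock_a": 60000, "rock_b": 39000, "blank_1": 0},
    "Otu2": {"rock_a": 3, "rock_b": 0, "blank_1": 0},      # below cutoff
    "Otu3": {"rock_a": 500, "rock_b": 0, "blank_1": 497},  # found in blank
}

if __name__ == "__main__":
    print(filter_otus(example_counts, blank_samples={"blank_1"}))
```

With these hypothetical counts only `Otu1` is retained: `Otu2` falls below the abundance cutoff and `Otu3` is also present in the blank.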