Dataset: IODP360 - FPKM values
Deployment: IODP-360

View Data: For data, See Dataset Metadata Page: https://osprey.bco-dmo.org/dataset/812936

Principal Investigator:

Virginia P. Edgcomb (Woods Hole Oceanographic Institution, WHOI)

Contact:

Virginia P. Edgcomb (Woods Hole Oceanographic Institution, WHOI)

BCO-DMO Data Manager:

Karen Soenen (Woods Hole Oceanographic Institution, WHOI BCO-DMO)

Project:

Collaborative Research: Delineating The Microbial Diversity and Cross-domain Interactions in The Uncharted Subseafloor Lower Crust Using Meta-omics and Culturing Approaches (Subseafloor Lower Crust Microbiology)

Version:

Deployment Synonyms:

JOIDES Resolution, International Ocean Discovery Program (IODP) Expedition 360 (X360)

Expand/Collapse All

Description

Supplementary Table 4A: Metatranscriptome data summary for cellular activities presented and statistics on sequencing and removal of potential contaminant sequences: FPKM values. Samples taken on board of the R/V JOIDES Resolution between November 30, 2015 and January 30, 2016.

Methods & Sampling

Dataset acquisition description

Frozen rock material was crushed as above, and then ground quickly into a fine powder using a precooled sterilized mortar and pestle, and then RNA extraction started immediately. The jaw crusher was cleaned and rinsed with 70% ethanol and RNaseZap™ RNase Decontamination Solution (Invitrogen, USA) between samples. About 40 g of material was extracted for each sample using the RNeasy PowerSoil Total RNA Isolation Kit (Qiagen, USA) according to the manufacturer’s protocol with the following modifications.

Each sample was evenly divided into 8 Bead Tubes (Qiagen, USA) and then 2.5 mL of Bead Solution were added into the Bead Tube followed by 0.25 mL of Solution SR1 and 0.8 mL of Solution SR2. Bead Tubes were frozen in liquid nitrogen and then thawed at 65°C in a water bath three times. RNA was purified using the MEGAclear Transcription Clean-up Kit (Ambion, USA) and concentrated with an overnight isopropanol precipitation at 4 °C. Trace amounts of contaminating DNA were removed from the RNA extracts using TURBO DNA free™ (Invitrogen, USA) as directed by the manufacturer. To ensure DNA was removed thoroughly, each RNA extract was treated twice with TURBO DNase (Invitrogen, USA). A nested PCR reaction (2 x 35 cycles) using bacterial primers was used to confirm the absence of DNA in our RNA solutions. RNA was converted to cDNA using the Ovation® RNA-Seq System V2 kit (NuGEN, USA) according to the manufacturer’s protocol to preferentially prime non-rRNA sequences.

The cDNA was purified with the MinElute Reaction Cleanup Kit (Qiagen, USA) and eluted into 20 μL elution buffer. Extracts were quantified using a Qubit Fluorometer (Life Technologies, USA) and cDNAs were stored at -80 °C until sequencing using 150 bp paired-end Illumina NextSeq 550.

To control for potential contaminants introduced during drilling, sample handling, and laboratory kit reagents, we sequenced a number of control samples as above. Two samples controlled for potential nucleic acid contamination; a “method” control to monitor possible contamination from our laboratory extractions, which included ~ 40 g sterilized glass beads processed through the entire protocol in place of rock, and a “kit” control to account for any signal coming from trace contaminants in kit reagents, which received no addition. In addition, 3 more controls were extracted: a sample of the drilling mud (Sepiolite), and two drilling seawater samples collected during the first and third weeks of drilling. cDNA obtained from these controls were sequenced together with the rock samples and co-assembled. Trimmomatic (v. 0.32) was used to trim adapter sequences (leading=20, trailing=20, sliding window=04:24, minlen=50).

Paired reads were further quality checked and trimmed using FastQC (v. 0.11.7) and FASTX-toolkit (v. 0.014). Downstream analyses utilized paired reads. After co-assembling reads with Trinity (v. 2.4.0) from all controls (min length 150 bp), Bowtie2 (v. 2.3.4.1, 50) was used (with the parameter ‘un-conc’) to align all sample reads to this co-assembly. Reads that mapped to our control co-assembly allowing 1 mismatch were removed from further analysis (23.5-68.5% of sequences remained in sample data sets, see Supplementary Table 4). Trinity (v. 2.4.0) was used for de novo assembly of the remaining reads in sample data sets (min. length 150 bp). Bowtie aligner was used to align reads to assembled contigs, RSEM was used to estimate the expression level of these reads, and TMM was used to perform cross sample normalization and to generate a TMM-normalized expression matrix. Within the Trinotate suite, TransDecoder (v. 3.0.1) was used to identify coding regions within contigs and functional and taxonomic annotation was made 622 by BLASTx and BLASTp against UniProt, Swissprot (release 2018_02) and RefSeq non-redundant protein sequence (nr) databases (e-value threshold of 1e-5). BLASTp was used to look for sequence homologies with the same e-values. HMMER (v. 3.1b2) was used to identify conserved domains by searching against the Pfam (v31.0) database. SignalP (v. 4.1) and TMHMM (2.0c) were used to predict signal peptides and transmembrane domains. RNAMMER (v.1.2) was used to identify rRNA homologies of archaea, bacteria and eukaryotes.

Because the Swissprot database does not have extensive representation of protein sequences from environmental samples, particularly deep-sea and deep biosphere samples, annotations of contigs utilized for analyses of selected processes were manually cross checked by BLASTx against GenBank nr database. Aside from removing any reads that mapped well to our control co-assembly (1 mismatch), as an extra precaution, any sequence that exhibited ≥ 95% sequence identity over ≥ 80% of the sequence length to suspected contaminants (e.g., human pathogens, plants, or taxa known to be common molecular kit reagent contaminants, and not described from the marine environment) as in Salter et al. and Glassing et al. were removed.

This conservative approach potentially removed environmentally relevant data that were annotated to suspected contaminants due to poor taxonomic representation from environmental taxa in public databases, however it affords the highest possible confidence about any transcripts discussed. Additional functional annotations of contigs were obtained by BLAST against the KEGG, COG, SEED, and MetaCyc databases using MetaPathways (v. 2.0) to gain insights into particular cellularprocesses, and to provide overviews of metabolic functions across samples based on comparisons of FPKM-normalized data. All annotations were integrated into a SQLite database for further analysis.

Data Processing Description

Dataset Processing Description

BCO-DMO processing notes:

Reformatted table structure
Adjusted column header names to comply with database requirements
Replaced , with ; in data table to comply with database requirements

More information about this dataset deployment

Funding

Award Number	Funding Source
OCE-1658031	NSF Division of Ocean Sciences

Instruments

Automated DNA Sequencer

Supplied Name: Illumina NextSeq 550 platform

Supplied Description:

RNA sequencing was performed using the Illumina NextSeq 550 platform (Univ. of Georgia).v

Instrument Type

Generic Name: Automated DNA Sequencer

Acronym: Automated Sequencer

Generic Description:

General term for a laboratory instrument used for deciphering the order of bases in a strand of DNA. Sanger sequencers detect fluorescence from different dyes that are used to identify the A, C, G, and T extension reactions. Contemporary or Pyrosequencer methods are based on detecting the activity of DNA polymerase (a DNA synthesizing enzyme) with another chemoluminescent enzyme. Essentially, the method allows sequencing of a single strand of DNA by synthesizing the complementary strand along it, one base pair at a time, and detecting which base was actually added at each step.

Parameters

Supplied Name	Supplied description	Supplied Units	Standard Name
Cycle	Cycle of the biosynthetic pathway	unitless	no_bcodmo_term
Biosynthetic_pathway	Name of biosynthetic pathway	unitless	no_bcodmo_term
ID_2R1	FPKM values per pathway for sample 2R1	Fragments per Kilobase of transcript per Million mapped reads (FPKM)	no_bcodmo_term
ID_19R1	FPKM values per pathway for sample 19R1	Fragments per Kilobase of transcript per Million mapped reads (FPKM)	no_bcodmo_term
ID_26R2	FPKM values per pathway for sample 26R2	Fragments per Kilobase of transcript per Million mapped reads (FPKM)	no_bcodmo_term
ID_31R1	FPKM values per pathway for sample 31R1	Fragments per Kilobase of transcript per Million mapped reads (FPKM)	no_bcodmo_term
ID_42R2	FPKM values per pathway for sample 42R2	Fragments per Kilobase of transcript per Million mapped reads (FPKM)	no_bcodmo_term
ID_51R3	FPKM values per pathway for sample 51R3	Fragments per Kilobase of transcript per Million mapped reads (FPKM)	no_bcodmo_term
ID_62R1	FPKM values per pathway for sample 62R1	Fragments per Kilobase of transcript per Million mapped reads (FPKM)	no_bcodmo_term
ID_68R4	FPKM values per pathway for sample 68R4	Fragments per Kilobase of transcript per Million mapped reads (FPKM)	no_bcodmo_term
ID_71R1	FPKM values per pathway for sample 71R1	Fragments per Kilobase of transcript per Million mapped reads (FPKM)	no_bcodmo_term
ID_81R2	FPKM values per pathway for sample 81R2	Fragments per Kilobase of transcript per Million mapped reads (FPKM)	no_bcodmo_term
ID_84R6	FPKM values per pathway for sample 84R6	Fragments per Kilobase of transcript per Million mapped reads (FPKM)	no_bcodmo_term

Database

Contribute Data

Dataset: IODP360 - FPKM values
Deployment: IODP-360

Dataset acquisition description

Dataset Processing Description

Database

Contribute Data

Dataset: IODP360 - FPKM valuesDeployment: IODP-360

Dataset acquisition description

Dataset Processing Description

Dataset: IODP360 - FPKM values
Deployment: IODP-360