Dataset: AE1913 Protein Spectral Counts
Deployment: AE1913

View Data: See the Dataset Metadata Page: https://osprey.bco-dmo.org/dataset/934706

Principal Investigator:

Mak A. Saito (Woods Hole Oceanographic Institution, WHOI)

Scientist:

Natalie Cohen (Woods Hole Oceanographic Institution, WHOI)

BCO-DMO Data Manager:

Amber D. York (Woods Hole Oceanographic Institution, WHOI BCO-DMO)

Project:

Collaborative Research: Direct Characterization of Adaptive Nutrient Stress Responses in the Sargasso Sea using Protein Biomarkers and a Biogeochemical AUV (Nutrient Stress Responses and AUV Clio)

Version:

Deployment Synonyms:

Clio-BATS-WHOI, AE1913


Description

Related data table and dataset descriptions:

The primary data table for this dataset is provided under the "Data Files" section and contains total protein spectral counts, while the table under "Supplemental Files" provides the exclusive protein spectral counts.

Total spectral counts refer to the total number of spectra with peptide-to-spectrum matches (PSMs) that match each entry in the FASTA sequence database. This approach allows each peptide to map to multiple closely related sequences. In contrast, exclusive spectral counts allow each peptide to map to only one sequence in the FASTA database; when a peptide is found in multiple database sequences, the sequence with the most peptides mapping to it is selected (parsimony). Each approach has pros and cons: total spectral counts double-count shared peptides when two similar proteins are compared, whereas exclusive spectral counts underrepresent less abundant proteins with shared peptides, favoring the homolog with the most shared peptides. Alternative approaches that could be constructed from these results include considering protein groups with shared peptides or focusing on peptide-level analyses.
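The difference between the two counting rules can be illustrated with a toy example. The sketch below is a simplification, not Scaffold's implementation: it assumes a hypothetical mapping of observed peptides to proteins and resolves each shared peptide with a simple parsimony rule (credit the protein supported by the most distinct peptides, with ties broken alphabetically as an arbitrary assumption).

```python
from collections import Counter


def total_counts(psms, peptide_to_proteins):
    """Total spectral counts: every PSM is credited to each protein its peptide maps to."""
    counts = Counter()
    for peptide in psms:
        for protein in peptide_to_proteins[peptide]:
            counts[protein] += 1
    return counts


def exclusive_counts(psms, peptide_to_proteins):
    """Exclusive spectral counts: every PSM is credited to exactly one protein,
    the one supported by the most distinct peptides (simplified parsimony)."""
    support = Counter()
    for proteins in peptide_to_proteins.values():
        for protein in proteins:
            support[protein] += 1
    counts = Counter()
    for peptide in psms:
        # sorted() makes ties resolve alphabetically, since max() keeps the first maximum
        winner = max(sorted(peptide_to_proteins[peptide]), key=lambda p: support[p])
        counts[winner] += 1
    return counts


# Hypothetical homologs A and B that share peptide p1.
peptide_to_proteins = {"p1": {"A", "B"}, "p2": {"A"}, "p3": {"A"}, "p4": {"B"}}
psms = ["p1"] * 4 + ["p2", "p3"] + ["p4"] * 2  # 8 spectra in total

print(sorted(total_counts(psms, peptide_to_proteins).items()))      # [('A', 6), ('B', 6)]
print(sorted(exclusive_counts(psms, peptide_to_proteins).items()))  # [('A', 6), ('B', 2)]
```

The 8 spectra yield 12 total counts because the four p1 spectra are credited to both homologs, while exclusive counting assigns all of them to A and leaves B with only the spectra from its unique peptide p4.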

See "Related Datasets" section for:
* "AE1913 Peptide Spectral Counts" which includes the individual peptides associated with these proteins (includes total spectral counts for each peptide).
* "AE1913 Protein Identification FASTA"

CTD and other data from the same cruise are listed on deployment page AE1913: https://www.bco-dmo.org/deployment/916412

These data will become part of the Ocean Protein Portal (https://proteinportal.whoi.edu/; Saito et al., 2020).

The assembly, annotations, metatranscriptomic assembly products, the same exclusive protein spectral counts, and other information associated with this multi-omic analysis were published as a package on Zenodo (doi: 10.5281/zenodo.8287779).

Methods & Sampling

Dataset acquisition description

Methods are reported in Cohen et al. 2023 (bioRxiv preprint doi: 10.1101/2023.11.20.567900) and are summarized below.
* This section describes how this and related datasets were generated (see "Related Datasets" section).

One half of the 142 mm filters (0.2-51 μm) collected by Clio were processed for metaproteomics. Proteins were extracted in a 1% SDS-based detergent in 50 mM HEPES at pH 8.5, reduced with dithiothreitol, alkylated with iodoacetamide, and purified using a polyacrylamide electrophoresis tube gel method. Protein quantification was performed using a BSA assay. Trypsin was added to the protein-bead mixture in a 1:20 trypsin:protein ratio. Peptides were purified using C18 tips and diluted to a concentration of 0.1 μg μL−1.
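The 1:20 trypsin:protein ratio and the 0.1 µg µL−1 target concentration translate into simple mass and volume calculations. The sketch below assumes the ratio is by mass (w/w) and uses a hypothetical 100 µg of extracted protein; the function names are illustrative only.

```python
def trypsin_needed_ug(protein_ug: float, protein_per_trypsin: float = 20.0) -> float:
    """Mass of trypsin (µg) for a 1:20 trypsin:protein digest (assumed w/w)."""
    return protein_ug / protein_per_trypsin


def volume_for_concentration_ul(peptide_ug: float, conc_ug_per_ul: float = 0.1) -> float:
    """Volume (µL) that holds peptide_ug at the given concentration (µg/µL)."""
    return peptide_ug / conc_ug_per_ul


protein_ug = 100.0  # hypothetical amount of extracted protein
print(trypsin_needed_ug(protein_ug))  # 5.0 (µg of trypsin)

# At 0.1 µg/µL, injecting 2 to 5 µg of peptides corresponds to 20 to 50 µL.
print(volume_for_concentration_ul(2.0), volume_for_concentration_ul(5.0))
```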

Approximately 2-5 µg of purified peptides were injected onto a Dionex UltiMate 3000 RSLCnano LC system with an additional RSLCnano pump, run in online 2D active modulation mode and interfaced with a Thermo Fusion mass spectrometer. The mass spectrometer acquired MS1 scans from 380 to 1,580 m/z at 240K resolution in the Orbitrap. MS2 scans were collected in data-dependent mode in the ion trap with a 2 s cycle time, selecting precursors with charge states 2 to 10. MS2 scans used a 1.6 m/z isolation window, a 50 ms maximum injection time, and a 5 s dynamic exclusion time.
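Because the MS1 range is defined in m/z rather than mass, the neutral peptide masses it covers depend on charge state. A minimal sketch, assuming protonated precursor ions [M+zH]z+ and a proton mass of 1.007276 Da, converts the 380 to 1,580 m/z window into neutral-mass ranges for a few of the acquired charge states.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass M (Da) of an [M+zH]z+ ion observed at the given m/z."""
    return charge * (mz - PROTON_MASS_DA)


MZ_LOW, MZ_HIGH = 380.0, 1580.0  # MS1 scan range from the method
for z in (2, 3, 10):
    low, high = neutral_mass(MZ_LOW, z), neutral_mass(MZ_HIGH, z)
    print(f"z={z}: {low:.1f} to {high:.1f} Da")
# z=2: 758.0 to 3158.0 Da
# z=3: 1137.0 to 4737.0 Da
# z=10: 3789.9 to 15789.9 Da
```

A doubly charged precursor at the low end of the window therefore corresponds to a peptide of about 758 Da, while a 10+ ion at the high end corresponds to roughly 15.8 kDa.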

Note: This dataset contains two different missing-data identifiers, "NA" and "-". Where an entry had a partial match to the functional annotation database, the missing fields were denoted with "-". Where there was no match at all, the empty columns produced when the data frames were merged were denoted with "NA".

Example lines in opp_TOTAL_spectralcounts.csv:

"6","megahit_HN001_k141_101642.p1","-","-","-","SBP_bac_1,SBP_bac_8"...
vs
"4","megahit_HN001_k141_100671.p1",NA,NA,NA,NA,NA,NA,"X1_30_0.2"...
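When loading the file, the two identifiers can be kept distinct rather than collapsed into a single missing value. This is a sketch using Python's standard csv module; it assumes the columns follow the order of the Parameters table (the COLUMNS list is an assumption, not read from the file) and uses only the visible, truncated fields of the two example lines. Note that pandas.read_csv treats NA as missing by default but reads "-" as an ordinary string.

```python
import csv
import io

# Assumed column order, taken from the Parameters table rather than the file header.
COLUMNS = ["row_id", "protein_id", "kegg_id", "enzyme_comm_id", "protein_name",
           "pfams_id", "supergroup", "classification", "sample_id"]

PARTIAL_MISSING = "-"  # annotation matched, but this field had no value
NO_MATCH = "NA"        # no annotation match at all (introduced by the merge)


def missing_status(value: str) -> str:
    """Classify a field as 'no_match', 'partial', or 'present'."""
    if value == NO_MATCH:
        return "no_match"
    if value == PARTIAL_MISSING:
        return "partial"
    return "present"


# The two example lines, truncated where the originals show "...".
snippet = '''"6","megahit_HN001_k141_101642.p1","-","-","-","SBP_bac_1,SBP_bac_8"
"4","megahit_HN001_k141_100671.p1",NA,NA,NA,NA,NA,NA,"X1_30_0.2"
'''

# csv.reader drops the quotes, so a quoted "NA" would look the same as a bare NA.
rows = [dict(zip(COLUMNS, fields)) for fields in csv.reader(io.StringIO(snippet))]
for row in rows:
    print(row["protein_id"], missing_status(row["kegg_id"]))
# megahit_HN001_k141_101642.p1 partial
# megahit_HN001_k141_100671.p1 no_match
```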

Data Processing Description

Dataset Processing Description

The metatranscriptomic ORFs were used as the protein database, and peptide-spectrum matching was performed using the Sequest algorithm within IseNode Proteome Discoverer 2.2.0.388 with a parent ion tolerance of 10 ppm, a fragment tolerance of 0.6 Da, and a maximum of 0 missed cleavages. Identification criteria consisted of a peptide threshold of 98% (false discovery rate [FDR] = 0.1%) and a protein threshold of 99% (1 peptide minimum, FDR = 1.5%) in Scaffold 5.1.2 (Proteome Software), resulting in 77,438 proteins and 3,155,061 exclusive spectral counts.
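The two search tolerances behave differently: the 10 ppm parent ion tolerance scales with m/z, whereas the 0.6 Da fragment tolerance is a fixed absolute window. The helper functions below are hypothetical and only illustrate that arithmetic; they are not how Proteome Discoverer implements matching, and they apply the ppm tolerance directly to m/z as a simplification.

```python
def ppm_window(mz, ppm=10.0):
    """Lower and upper m/z bounds for a relative (ppm) tolerance."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def matches_ppm(observed, theoretical, ppm=10.0):
    """True if observed lies within ppm parts per million of theoretical."""
    return abs(observed - theoretical) / theoretical * 1e6 <= ppm


def matches_da(observed, theoretical, tol_da=0.6):
    """True if observed lies within an absolute tolerance (Da) of theoretical."""
    return abs(observed - theoretical) <= tol_da


# The half-width of the 10 ppm window grows with m/z.
for mz in (500.0, 1500.0):
    low, high = ppm_window(mz)
    print(mz, round((high - low) / 2, 6))  # 0.005 at 500 m/z, 0.015 at 1500 m/z

print(matches_ppm(1000.005, 1000.0), matches_ppm(1000.02, 1000.0))  # True False
print(matches_da(700.5, 700.0), matches_da(700.7, 700.0))           # True False
```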

More information about this dataset deployment

Funding

Award Number	Funding Source
OCE-1658030	NSF Division of Ocean Sciences

Instruments

AUV Clio

Supplied Name:

Supplied Description:

Instrument Type

Generic Name: AUV Clio

Generic Description:

Clio is an autonomous underwater vehicle (AUV) created to accomplish the dual goals of global ocean mapping and biochemistry sampling. The ability to sample dissolved and particulate seawater biochemistry across ocean basins while capturing fine-scale biogeochemical processes sets it apart from other AUVs. Clio is designed to efficiently and precisely move vertically through the ocean, drift laterally to observe water masses, and integrate with research vessel operations to map large horizontal scales up to a depth of 6,000 meters. More information is available at https://www2.whoi.edu/site/deepsubmergencelab/clio/

Mass Spectrometer

Supplied Name: Thermo Fusion mass spectrometer

Supplied Description:

Instrument Type

Generic Name: Mass Spectrometer

Acronym: Mass Spec

Community Identifier: http://vocab.nerc.ac.uk/collection/L05/current/LAB16/

Generic Description:

General term for instruments used to measure the mass-to-charge ratio of ions; generally used to find the composition of a sample by generating a mass spectrum representing the masses of sample components.

Ultra high-performance liquid chromatography

Supplied Name: Dionex UltiMate 3000 RSLCnano LC system

Supplied Description:

Instrument Type

Generic Name: Ultra high-performance liquid chromatography

Acronym: UHPLC

Generic Description:

Ultra high-performance liquid chromatography: Column chromatography where the mobile phase is a liquid, the stationary phase consists of very small (< 2 µm) particles and the inlet pressure is relatively high.

Parameters

Supplied Name	Supplied description	Supplied Units	Standard Name
row_id	sequential row identifier	unitless	no_bcodmo_term
protein_id	Protein identifier. Uniquely identifies a protein within the dataset and FASTA file	unitless	protein_ID
kegg_id	KEGG identifier	unitless	accession_number
enzyme_comm_id	Enzyme Commission identifier	unitless	accession_number
protein_name	Protein descriptive name	unitless	sample_descrip
pfams_id	Protein family ID number	unitless	accession_number
supergroup	Supergroup	unitless	sample_descrip
classification	Classification	unitless	sample_descrip
sample_id	Identifies the sample associated with this annotation	unitless	sample
spectral_count	Spectral count	unitless	spectral_counts
cruise_id	Cruise identifier	unitless	cruise_id
station_id	Station identifier where sample was taken	unitless	station
depth_m	The depth in meters at which the sample was taken	meters	depth
minimum_filter_size_microns	Minimum size of the collection filter	microns (um)	filter_size
maximum_filter_size_microns	Maximum size of the collection filter	microns (um)	filter_size
date_y_m_d	The date of sample collection	unitless	date
latitude_dd	The latitude at the station in decimal degrees (-90 to 90)	decimal degrees	lat
longitude_dd	The longitude at the station in decimal degrees (-180 to 180)	decimal degrees	lon
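The Parameters table suggests a long layout, with one row per protein and sample pair. Assuming that layout, a sketch like the following pivots rows into a protein-by-sample matrix of spectral counts, summing any repeated pairs. The records are hypothetical, and treating a protein absent from a sample as a count of 0 is an assumption rather than something stated by the dataset.

```python
from collections import defaultdict


def to_matrix(records):
    """Pivot long-format rows into ({protein_id: {sample_id: count}}, samples)."""
    samples = sorted({r["sample_id"] for r in records})
    # Assumption: a protein with no row for a sample had 0 spectral counts there.
    matrix = defaultdict(lambda: dict.fromkeys(samples, 0))
    for r in records:
        matrix[r["protein_id"]][r["sample_id"]] += int(r["spectral_count"])
    return dict(matrix), samples


# Hypothetical rows using column names from the Parameters table.
records = [
    {"protein_id": "protein_A", "sample_id": "sample_1", "spectral_count": "12"},
    {"protein_id": "protein_A", "sample_id": "sample_2", "spectral_count": "3"},
    {"protein_id": "protein_B", "sample_id": "sample_2", "spectral_count": "7"},
]
matrix, samples = to_matrix(records)
for protein_id, row in matrix.items():
    print(protein_id, [row[s] for s in samples])
# protein_A [12, 3]
# protein_B [0, 7]
```

In practice the records could come from csv.DictReader over the downloaded file, which yields one dict per row keyed by the header names.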
