The initial assembly of Atribacteria bacterium SCGC AD-561-N23 is publicly available within the IMG system (taxon ID 2588254308), and the sequence of the 16S rRNA gene is available within the IMG system and in the Supplementary Materials. A detailed assembly procedure (QC.finalReport.pdf) can be downloaded from: http://genome.jgi.doe.gov/CandivSCAD561N23/CandivSCAD561N23.download.html.
Briefly, single-cell amplified genomic (SAG) DNA was sequenced, assembled and annotated at the United States Department of Energy’s Joint Genome Institute (JGI) following their standard pipeline for Illumina HiSeq 2000 platform sequencing. Illumina reads were screened using JGI’s in-house DUK filtering program (Mingkun et al., unpublished). Trimmed reads were assembled using SPAdes (version 3.0.0) with the following parameters: -t 8 -m 40 --sc --careful --12 (Bankevich et al., 2012). Once the assembly was released to the Integrated Microbial Genomes (IMG) system, potential contaminant sequences were manually screened and removed according to JGI’s single cell data decontamination protocol (Clingenpeel, 2015). Scaffolds whose GC content differed from the genome average by more than 10% and that clustered as a distinct group in a k-mer analysis (IMG; fragment window 5000 bp, fragment step 500 bp, oligomer size 5, minimum variation 10) were identified as potential contaminants and removed from the de novo assembly, with the exception of scaffolds that contained ribosomal DNA. This screened genome was submitted to the IMG database as GOLD project Gp0087948, titled "Candidate division JS 1 bacterium SCGC AD-560-N23 (manually screened)". Gene annotations were performed using both the IMG and the Rapid Annotation using Subsystem Technology (RAST) platforms (Aziz et al., 2008; Overbeek et al., 2013; Markowitz et al., 2014). Discrepancies between annotations were investigated by comparing the coding sequences of genes against the GenBank non-redundant protein sequence and Swiss-Prot databases using BLASTP (Altschul et al., 1990). Genome completeness was estimated by comparing the annotated genome sequence against a list of conserved single-copy bacterial genes.
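The GC-based part of the contaminant screen described above can be illustrated with a minimal Python sketch. It is not the JGI or IMG implementation: the function names (`gc_percent`, `flag_gc_outliers`) and the toy scaffold names are hypothetical, the 10% threshold is assumed to mean 10 percentage points of GC content, and the genome average is assumed to be length-weighted across all scaffolds. The k-mer clustering criterion, which the protocol applies in addition to the GC criterion, is not reproduced here.

```python
def gc_percent(seq: str) -> float:
    """Return GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def flag_gc_outliers(scaffolds, rrna_scaffolds=frozenset(), max_deviation=10.0):
    """Return names of scaffolds whose GC content deviates from the genome average.

    scaffolds      -- dict mapping scaffold name to nucleotide sequence
    rrna_scaffolds -- names of scaffolds carrying ribosomal DNA; never flagged
    max_deviation  -- threshold in GC percentage points (assumed reading of "10%")
    """
    # Length-weighted genome average: GC content of all scaffolds concatenated.
    genome_gc = gc_percent("".join(scaffolds.values()))
    flagged = []
    for name, seq in scaffolds.items():
        if name in rrna_scaffolds:
            continue  # scaffolds containing ribosomal DNA are retained
        if abs(gc_percent(seq) - genome_gc) > max_deviation:
            flagged.append(name)
    return flagged


# Toy data only; these sequences are made up for illustration.
toy_scaffolds = {
    "scaffold_1": "AAGT" * 100,  # 25% GC
    "scaffold_2": "AACT" * 100,  # 25% GC
    "scaffold_3": "GGCC" * 10,   # 100% GC, candidate contaminant
    "scaffold_4": "GGCA" * 10,   # 75% GC, but carries rDNA in this toy example
}
candidates = flag_gc_outliers(toy_scaffolds, rrna_scaffolds={"scaffold_4"})
```

In this sketch a flagged scaffold is only a candidate; under the protocol described above it would be removed only if it also clustered as a distinct group in the k-mer analysis.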
The screened genome (GOLD project Gp0087948) has an IMG taxon ID of 2626541500 and was also submitted to MG-RAST under accession numbers 4624791.3–4634830.3.
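The completeness estimate mentioned in the methods can be sketched as the fraction of a marker-gene list that is found among the annotated genes. This is a simplified illustration under stated assumptions: the function name `estimate_completeness` is hypothetical, the four gene names are placeholders rather than the list of conserved single-copy genes used in this study, and genes are matched by exact name, whereas real pipelines typically detect markers by sequence or profile similarity.

```python
def estimate_completeness(annotated_genes, marker_genes):
    """Return (percent of marker genes found, sorted list of missing markers)."""
    markers = set(marker_genes)
    if not markers:
        raise ValueError("marker_genes must not be empty")
    found = markers & set(annotated_genes)
    percent = 100.0 * len(found) / len(markers)
    return percent, sorted(markers - found)


# Placeholder marker list, for illustration only (not the list used in the study).
toy_markers = ["rpoB", "recA", "gyrB", "rpsC"]
# Placeholder annotation, for illustration only.
toy_annotation = ["rpoB", "recA", "hypothetical_protein_1"]

percent_complete, missing = estimate_completeness(toy_annotation, toy_markers)
```

Because a single-cell amplified genome is usually partial, the list of missing markers is often as informative as the percentage itself.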