A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://github.com/marbl/canu/issues/1806 below:

Bogart failure in Canu 2.1

Hello,
I am performing a metagenome assembly with PacBio HiFi data and have run into a repeatable error. I am using canu 2.1 with:
canu -d STD -p Zymo6331-STD -pacbio-hifi Zymo6331-STD.fastq genomeSize=100m maxInputCoverage=1000 batMemory=200

I am running on SGE with the same configuration I have used to run other assemblies successfully (gridEngineResourceOption="-pe smp THREADS -l mem_free=MEMORY" and gridOptions="-V -S /bin/bash -q bigmem").

I am working with three samples, each run using the same set of arguments as above (but with different -d, -p, and fastq names). Two finished without any issues. For one sample, the run eventually fails. I scanned the error message and it suggested trying to run it again. I did, and it produced the same error again. It suggests the issue occurs when running Bogart:

Found perl:
   /bin/perl
   This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

Found java:
   /mnt/software/j/jre/1.8.0_144/bin/java
   java version "1.8.0_144"

Found canu:
   /mnt/software/c/canu/2.1/bin/canu
   canu 2.1

-- canu 2.1
--
-- CITATIONS
--
-- For assemblies of PacBio HiFi reads:
--   Nurk S, Walenz BP, Rhiea A, Vollger MR, Logsdon GA, Grothe R, Miga KH, Eichler EE, Phillippy AM, Koren S.
--   HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads.
--   biorXiv. 2020.
--   https://doi.org/10.1101/2020.03.14.992248
-- 
-- Read and contig alignments during correction and consensus use:
--   Šošic M, Šikic M.
--   Edlib: a C/C ++ library for fast, exact sequence alignment using edit distance.
--   Bioinformatics. 2017 May 1;33(9):1394-1395.
--   http://doi.org/10.1093/bioinformatics/btw753
-- 
-- Overlaps are generated using:
--   Berlin K, et al.
--   Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.
--   Nat Biotechnol. 2015 Jun;33(6):623-30.
--   http://doi.org/10.1038/nbt.3238
-- 
--   Myers EW, et al.
--   A Whole-Genome Assembly of Drosophila.
--   Science. 2000 Mar 24;287(5461):2196-204.
--   http://doi.org/10.1126/science.287.5461.2196
-- 
-- Contig consensus sequences are generated using an algorithm derived from pbdagcon:
--   Chin CS, et al.
--   Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
--   Nat Methods. 2013 Jun;10(6):563-9
--   http://doi.org/10.1038/nmeth.2474
-- 
-- CONFIGURE CANU
--
-- Detected Java(TM) Runtime Environment '1.8.0_144' (from '/mnt/software/j/jre/1.8.0_144/bin/java') with -d64 support.
-- Detected gnuplot version '4.6 patchlevel 2   ' (from 'gnuplot') and image format 'png'.
-- Detected 72 CPUs and 504 gigabytes of memory.
-- Detected Sun Grid Engine in '/usr/share/gridengine/default'.
-- User supplied Parallel Environment 'smp'.
-- User supplied Memory Resource      'mem_free'.
-- 
-- Found  17 hosts with  80 cores and  754 GB memory under Sun Grid Engine control.
-- Found   1 host  with  48 cores and  440 GB memory under Sun Grid Engine control.
-- Found   4 hosts with  24 cores and   47 GB memory under Sun Grid Engine control.
-- Found  18 hosts with  48 cores and  251 GB memory under Sun Grid Engine control.
-- Found  14 hosts with  16 cores and   46 GB memory under Sun Grid Engine control.
-- Found  14 hosts with  80 cores and  755 GB memory under Sun Grid Engine control.
-- Found   1 host  with  24 cores and   39 GB memory under Sun Grid Engine control.
-- Found  14 hosts with  56 cores and  503 GB memory under Sun Grid Engine control.
-- Found   2 hosts with  24 cores and   31 GB memory under Sun Grid Engine control.
-- Found  14 hosts with  72 cores and  503 GB memory under Sun Grid Engine control.
-- Found  13 hosts with  24 cores and   94 GB memory under Sun Grid Engine control.
-- Found  26 hosts with  48 cores and  503 GB memory under Sun Grid Engine control.
-- Found   4 hosts with 256 cores and  503 GB memory under Sun Grid Engine control.
-- Found   2 hosts with  16 cores and   94 GB memory under Sun Grid Engine control.
-- Found   1 host  with  48 cores and  188 GB memory under Sun Grid Engine control.
-- Found   1 host  with  24 cores and  141 GB memory under Sun Grid Engine control.
--
--                         (tag)Threads
--                (tag)Memory         |
--        (tag)             |         |  algorithm
--        -------  ----------  --------  -----------------------------
-- Grid:  meryl     13.000 GB    8 CPUs  (k-mer counting)
-- Grid:  hap       10.000 GB    8 CPUs  (read-to-haplotype assignment)
-- Grid:  cormhap   10.000 GB    8 CPUs  (overlap detection with mhap)
-- Grid:  obtovl     8.000 GB    8 CPUs  (overlap detection)
-- Grid:  utgovl     8.000 GB    8 CPUs  (overlap detection)
-- Grid:  cor       16.000 GB    4 CPUs  (read correction)
-- Grid:  ovb        4.000 GB    1 CPU   (overlap store bucketizer)
-- Grid:  ovs        8.000 GB    1 CPU   (overlap store sorting)
-- Grid:  red        8.000 GB    4 CPUs  (read error detection)
-- Grid:  oea        8.000 GB    1 CPU   (overlap error adjustment)
-- Grid:  bat      200.000 GB    8 CPUs  (contig construction with bogart)
-- Grid:  cns        -.--- GB    8 CPUs  (consensus)
--
-- In 'Zymo6331-ULI.seqStore', found PacBio HiFi reads:
--   PacBio HiFi:              1
--
--   Corrected:                1
--   Corrected and Trimmed:    1
--
-- Generating assembly 'Zymo6331-ULI' in '/dept/appslab/projects/old/2020/dp_Zymo/HiCanu_Assembly/ULI':
--    - assemble HiFi reads.
--
-- Parameters:
--
--  genomeSize        100000000
--
--  Overlap Generation Limits:
--    corOvlErrorRate 0.0000 (  0.00%)
--    obtOvlErrorRate 0.0250 (  2.50%)
--    utgOvlErrorRate 0.0100 (  1.00%)
--
--  Overlap Processing Limits:
--    corErrorRate    0.0000 (  0.00%)
--    obtErrorRate    0.0250 (  2.50%)
--    utgErrorRate    0.0100 (  1.00%)
--    cnsErrorRate    0.0500 (  5.00%)
--
--
-- BEGIN ASSEMBLY
--
--
-- Bogart failed, tried 2 times, giving up.
--

ABORT:
ABORT: canu 2.1
ABORT: Don't panic, but a mostly harmless error occurred and Canu stopped.
ABORT: Try restarting.  If that doesn't work, ask for help.
ABORT:
ABORT: Disk space available:  91264.777 GB
ABORT:
ABORT: Last 50 lines of the relevant log file (unitigging/4-unitigger/unitigger.err):
ABORT:
ABORT:   mergeOrphans()-- ignored       0               tigs with 0 reads; failed to place
ABORT:   mergeOrphans()--
ABORT:   
ABORT:   ==> MARK SIMPLE BUBBLES.
ABORT:       using 0.010000 user-specified threshold
ABORT:   
ABORT:   
ABORT:   findPotentialOrphans()-- working on 4954 tigs.
ABORT:   findPotentialOrphans()-- found 3866 potential orphans.
ABORT:   mergeOrphans()-- flagged    3691        bubble tigs with 550991 reads
ABORT:   mergeOrphans()-- placed        0 unique orphan tigs with 0 reads
ABORT:   mergeOrphans()-- shattered     0 repeat orphan tigs with 0 reads
ABORT:   mergeOrphans()-- ignored       1               tigs with 38 reads; failed to place
ABORT:   mergeOrphans()--
ABORT:   classifyAsUnassembled()--      0 tigs           0 bases -- singleton
ABORT:   classifyAsUnassembled()--      0 tigs           0 bases -- too few reads        (< 2 reads)
ABORT:   classifyAsUnassembled()--      0 tigs           0 bases -- too short            (< 0 bp)
ABORT:   classifyAsUnassembled()--      0 tigs           0 bases -- single spanning read (> 1.000000 tig length)
ABORT:   classifyAsUnassembled()--     58 tigs      687914 bases -- low coverage         (> 0.500000 tig length at < 3 coverage)
ABORT:   classifyAsUnassembled()--   4758 tigs   111950257 bases -- acceptable contigs
ABORT:   
ABORT:   
ABORT:   ==> GENERATING ASSEMBLY GRAPH.
ABORT:   
ABORT:   computeErrorProfiles()-- Computing error profiles for 4954 tigs, with 8 threads.
ABORT:   computeErrorProfiles()-- Finished.
ABORT:   
ABORT:   AssemblyGraph()-- allocating vectors for placements, 117.061MB
ABORT:   AssemblyGraph()-- finding edges for 885032 reads (821084 contained), ignoring 1672196 unplaced reads, with 8 threads.
ABORT:   AssemblyGraph()-- building reverse edges.
ABORT:   AssemblyGraph()-- build complete.
ABORT:   
ABORT:   ==> BREAK REPEATS.
ABORT:   
ABORT:   computeErrorProfiles()-- Computing error profiles for 4954 tigs, with 8 threads.
ABORT:   computeErrorProfiles()-- Finished.
ABORT:   bogart: bogart/AS_BAT_MarkRepeatReads.C:887: std::vector<breakReadEnd> buildBreakPoints(TigVector&, Unitig*, intervalList<int>&, intervalList<int>&, std::vector<confusedEdge>&): Assertion `isRepeat == true' failed.
ABORT:   
ABORT:   Failed with 'Aborted'; backtrace (libbacktrace):
ABORT:   utility/src/utility/system-stackTrace.C::83 in _Z17AS_UTL_catchCrashiP7siginfoPv()
ABORT:   (null)::0 in (null)()
ABORT:   (null)::0 in (null)()
ABORT:   (null)::0 in (null)()
ABORT:   (null)::0 in (null)()
ABORT:   (null)::0 in (null)()
ABORT:   bogart/AS_BAT_MarkRepeatReads.C::887 in _Z16buildBreakPointsR9TigVectorP6UnitigR12intervalListIiES5_RSt6vectorI12confusedEdgeSaIS7_EE()
ABORT:   bogart/AS_BAT_MarkRepeatReads.C::1051 in _Z15markRepeatReadsP13AssemblyGraphR9TigVectordjdRSt6vectorI12confusedEdgeSaIS4_EE()
ABORT:   bogart/bogart.C::656 in main()
ABORT:   (null)::0 in (null)()
ABORT:   (null)::0 in (null)()
ABORT:

Any suggestions as to what might be going wrong here? I have attached the unitigger.err file here too (unitigger.err.txt), in case that helps.

Thanks!
Dan


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4