Showing content from https://github.com/marbl/canu/issues/1924 below:

overlapInCorePartition generates 67GB large shellscript full of just if and fi · Issue #1924 · marbl/canu · GitHub

Hi,
it happened to me that overlapInCorePartition generated far too many partitions and a 67 GB correction/1-overlapper/overlap.sh shell script. The script is practically one huge sequence of if/fi blocks, with a call to a perl utility afterwards. Merely interpreting this file keeps several CPU cores busy.

$ canu/build/bin/overlapInCorePartition
ERROR:  Hash length (-hl) must be specified.
ERROR:  Reference length (-rl) must be specified.
ERROR:  seqStore (-S) must be supplied.
usage: /storage/plzen1/home/mmokrejs/apps/canu/build/bin/overlapInCorePartition [opts]
  Someone should write the command line help.
  But this is only used interally to canu, so...
$

When canu configured itself, it logged:

--                                (tag)Concurrency
--                         (tag)Threads          |
--                (tag)Memory         |          |
--        (tag)             |         |          |       total usage      algorithm
--        -------  ----------  --------   --------  --------------------  -----------------------------
-- Local: meryl     64.000 GB    8 CPUs x  16 jobs  1024.000 GB 128 CPUs  (k-mer counting)
-- Local: hap       16.000 GB   64 CPUs x   2 jobs    32.000 GB 128 CPUs  (read-to-haplotype assignment)
-- Local: corovl     8.000 GB    1 CPU  x 128 jobs  1024.000 GB 128 CPUs  (overlap detection)
-- Local: obtovl    24.000 GB   16 CPUs x   8 jobs   192.000 GB 128 CPUs  (overlap detection)
-- Local: utgovl    24.000 GB   16 CPUs x   8 jobs   192.000 GB 128 CPUs  (overlap detection)
-- Local: cor       24.000 GB    4 CPUs x  32 jobs   768.000 GB 128 CPUs  (read correction)
-- Local: ovb        4.000 GB    1 CPU  x 128 jobs   512.000 GB 128 CPUs  (overlap store bucketizer)
-- Local: ovs       32.000 GB    1 CPU  x  62 jobs  1984.000 GB  62 CPUs  (overlap store sorting)
-- Local: red       64.000 GB    8 CPUs x  16 jobs  1024.000 GB 128 CPUs  (read error detection)
-- Local: oea        8.000 GB    1 CPU  x 128 jobs  1024.000 GB 128 CPUs  (overlap error adjustment)
-- Local: bat      1024.000 GB   64 CPUs x   1 job   1024.000 GB  64 CPUs  (contig construction with bogart)
-- Local: cns        -.--- GB    8 CPUs x   - jobs     -.--- GB   - CPUs  (consensus)

Then, it did its math:

-- OVERLAPPER (normal) (correction) erate=0.32
--
----------------------------------------
-- Starting command on Sat Feb 13 23:39:51 2021 with 270611.71 GB free disk space

    cd correction/1-overlapper
    /auto/plzen1/home/mmokrejs/apps/canu-2.1.1/build/bin/overlapInCorePartition \
     -S  ../../my_genome.seqStore \
     -hl 2500000 \
     -rl 2000000 \
     -ol 500 \
     -o  ./my_genome.partition \
    > ./my_genome.partition.err 2>&1

-- Finished on Sat Feb 13 23:56:51 2021 (1020 seconds) with 270368.935 GB free disk space
----------------------------------------
--
-- Configured 492959931 overlapInCore jobs.
-- Finished stage 'cor-overlapConfigure', reset canuIteration.
--
-- Running jobs.  First attempt out of 2.
----------------------------------------
-- Starting 'corovl' concurrent execution on Sun Feb 14 04:54:43 2021 with 269805.128 GB free disk space (492959931 processes; 128 concurrently)
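
For scale, the two numbers reported here (the 67 GB script and the 492959931 configured jobs) can be related with a bit of shell arithmetic. This is only a rough estimate: it assumes 1 GB = 10^9 bytes, and the variable names are purely illustrative.

```shell
#!/bin/sh
# Rough estimate only; assumes decimal gigabytes (1 GB = 10^9 bytes).
script_bytes=$(( 67 * 1000000000 ))   # size of overlap.sh as reported above
njobs=492959931                       # "Configured 492959931 overlapInCore jobs."

# Integer division: average number of script bytes spent on each job.
bytes_per_job=$(( script_bytes / njobs ))
echo "about $bytes_per_job bytes of script per job"
```

Under that assumption the result is a little over a hundred bytes per job, which is about the size of one of the per-job blocks in the script, so essentially the whole file is per-job boilerplate.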

I think a roadblock should definitely be installed somewhere so that such a huge script is never passed on for further processing. Then we can think about what to change in the code (dunno where).
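
A minimal sketch of what such a roadblock could look like, written as a standalone shell function. Everything here is hypothetical: the function name check_job_count, the MAX_JOBS variable and its default of 100000 are assumptions made for illustration and are not part of canu.

```shell
#!/bin/sh
# Hypothetical guard; neither the names nor the limit come from canu.
MAX_JOBS=${MAX_JOBS:-100000}   # assumed ceiling, adjust to what the site can handle

# check_job_count N: succeed if N jobs is acceptable, fail with a message otherwise.
check_job_count() {
  njobs=$1
  if [ "$njobs" -gt "$MAX_JOBS" ] ; then
    echo "ERROR: $njobs overlap jobs configured, limit is $MAX_JOBS; refusing to continue." >&2
    return 1
  fi
  return 0
}
```

Called as `check_job_count 492959931 || exit 1` right after the partitioning step, a check like this would stop the run before the 67 GB script is ever written or interpreted.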

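BTW, the separate if ... fi blocks should at least be turned into a single if ... elif chain, so the shell can stop testing once the job ID matches. This is what the script looks like now: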

if [ $jobid -eq 90 ] ; then
  bat="001"
  job="001/000090"
  opt="-h 5595-5812 -r 752-1503 --hashdatalen 2525793"
fi

if [ $jobid -eq 91 ] ; then
  bat="001"
  job="001/000091"
  opt="-h 5595-5812 -r 1504-2250 --hashdatalen 2525793"
fi

if [ $jobid -eq 92 ] ; then
  bat="001"
  job="001/000092"
  opt="-h 5595-5812 -r 2251-2770 --hashdatalen 2525793"
fi

if [ $jobid -eq 93 ] ; then
  bat="001"
  job="001/000093"
  opt="-h 5595-5812 -r 2771-3197 --hashdatalen 2525793"
fi

if [ $jobid -eq 94 ] ; then
  bat="001"
  job="001/000094"
  opt="-h 5595-5812 -r 3198-3562 --hashdatalen 2525793"
fi
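
For comparison, here is a sketch of the same five jobs written as one if ... elif chain. The values are copied from the excerpt above; the wrapping function select_job and the final else branch are additions made only so the fragment can be run on its own.

```shell
#!/bin/sh
# Sketch only: select_job is a hypothetical wrapper, not canu code.
# With elif, the shell skips all remaining tests after the first match.
select_job() {
  jobid=$1
  if [ "$jobid" -eq 90 ] ; then
    bat="001" ; job="001/000090"
    opt="-h 5595-5812 -r 752-1503 --hashdatalen 2525793"
  elif [ "$jobid" -eq 91 ] ; then
    bat="001" ; job="001/000091"
    opt="-h 5595-5812 -r 1504-2250 --hashdatalen 2525793"
  elif [ "$jobid" -eq 92 ] ; then
    bat="001" ; job="001/000092"
    opt="-h 5595-5812 -r 2251-2770 --hashdatalen 2525793"
  elif [ "$jobid" -eq 93 ] ; then
    bat="001" ; job="001/000093"
    opt="-h 5595-5812 -r 2771-3197 --hashdatalen 2525793"
  elif [ "$jobid" -eq 94 ] ; then
    bat="001" ; job="001/000094"
    opt="-h 5595-5812 -r 3198-3562 --hashdatalen 2525793"
  else
    # Added for the sketch: report job IDs that match no branch.
    echo "Invalid job ID $jobid" >&2
    return 1
  fi
}
```

This only shortens the search for low job IDs; the script itself stays just as large, so it does not replace a limit on the number of jobs.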
