Hello,
I've been using (Hi)canu to assemble HiFi reads with good success. I have a question about the bat unitigging stage.
The overall command I'm running is: canu -p asm -d out_dir genomeSize=3g -pacbio-hifi <input>
I've tested this on different datasets and with two versions (each the tip at the time):
canu snapshot v2.0-development +612 changes (r10105 90065fd)
canu snapshot v2.2-development +15 changes (r10124 2a31172)
Everything works smoothly up until the 4-unitigger/unitigger.sh script, which requests a bit over 500 GB of memory. Some nodes on our grid have that much, so the job has run correctly, but it queues for a long time. The log files show that all three assemblies used only between 5 GB and 16 GB of maximum memory during this stage.
All other stages request reasonable amounts of memory and threads, so I was surprised by such a divergence here. I found the relevant line in Configure.pm, which derives this memory requirement from the 3g genome size estimate.
I wasn't sure whether this is an artefact of the estimates being designed for shorter/noisier reads than HiFi, so that canu overestimates the resources needed for this stage. I appreciate it is easier to err on the side of caution, and presumably this can be adjusted manually with the batMemory option, but I thought it was worth sharing since canu has otherwise been better than me at predicting resources.
I've included the LSF resource summary for the unitigger job below.
CPU time : 3567.08 sec.
Max Memory : 15621 MB
Average Memory : 12243.67 MB
Total Requested Memory : 524288.00 MB
Delta Memory : 508667.00 MB
Run time : 697 sec.
Turnaround time : 11703 sec.
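To put numbers on the gap, here is a small sketch that parses the LSF summary above and compares requested and peak memory. The `suggest_bat_memory_gb` helper and its 2x headroom factor are a hypothetical rule of thumb, not something canu does; it also assumes batMemory is given in gigabytes.

```python
import math
import re

# LSF resource summary from the unitigger job, copied from above.
SUMMARY = """\
CPU time : 3567.08 sec.
Max Memory : 15621 MB
Average Memory : 12243.67 MB
Total Requested Memory : 524288.00 MB
Delta Memory : 508667.00 MB
Run time : 697 sec.
Turnaround time : 11703 sec.
"""


def parse_lsf_summary(text):
    """Parse 'Key : value unit' lines into a {key: float} dict."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"\s*(.+?)\s*:\s*([\d.]+)", line)
        if m:
            fields[m.group(1)] = float(m.group(2))
    return fields


def suggest_bat_memory_gb(peak_mb, headroom=2.0):
    """Hypothetical heuristic: observed peak (MB) times a safety factor,
    rounded up to whole gigabytes."""
    return math.ceil(peak_mb * headroom / 1024)


stats = parse_lsf_summary(SUMMARY)
requested = stats["Total Requested Memory"]
peak = stats["Max Memory"]

print(f"over-request factor: {requested / peak:.1f}x")  # ~33.6x
print(f"unused: {requested - peak:.0f} MB")             # equals Delta Memory
print(f"suggested batMemory: {suggest_bat_memory_gb(peak)} GB")
```

With these numbers the request is roughly 33.6 times the observed peak, and the unused amount matches LSF's Delta Memory (508667 MB).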
Thanks,
Alex