rush is a tool similar to GNU parallel and gargs. rush borrows some ideas from them and has some unique features, e.g., support for custom defined variables, resuming multi-line commands, and more advanced embedded replacement strings.
These features make rush suitable for easily and flexibly parallelizing complex workflows in fields like Bioinformatics (see example 18).
Major:

- Avoiding mixed lines in the output of multiple processes, without loss of performance. (--line-buffer in GNU parallel)
- Timeout (-t). (--timeout in GNU parallel)
- Retry (-r). (--retry-failed --joblog in GNU parallel)
- Continue (-c). (--resume --joblog in GNU parallel, but it does not support multi-line commands, which are common in workflows)
- awk -v like custom defined variables (-v). (Using shell variables in GNU parallel)
- Keeping output in the order of input (-k). (Same as -k/--keep-order in GNU parallel)
- Exit on first error(s) (-e). (Not perfect: you may stop it by typing Ctrl-C or closing the terminal.) (--halt 2 in GNU parallel)
- Settable record delimiter (-D, default \n). (--recstart and --recend in GNU parallel)
- Settable number of records sent to every command (-n, default 1). (-n/--max-args in GNU parallel)
- Settable field delimiter (-d, default \s+). (Same as -d/--delimiter in GNU parallel)
- Escaping curly brackets: {{}} for {} itself, {{1,}} for {1,}.
- Practical replacement strings:
    - {#}, job ID. (Same in GNU parallel)
    - {}, full data. (Same in GNU parallel)
    - {n}, the nth field in delimiter-delimited data. (Same in GNU parallel)
    - {/}, dirname. ({//} in GNU parallel)
    - {%}, basename. ({/} in GNU parallel)
    - {.}, remove the last file extension. (Same in GNU parallel)
    - {:}, remove all file extensions. (Not directly supported in GNU parallel)
    - {^suffix}, remove suffix. (Not directly supported in GNU parallel)
    - {@regexp}, capture a submatch using a regular expression. (Not directly supported in GNU parallel) Limitation: curly brackets can't be used in the regular expression.
    - Combinations: {%.} and {%:}, basename without extension; {2.}, {2/}, and {2%.}, manipulate the nth field.
- Preset variables (macros), e.g., rush -v p={^suffix} 'echo {p}_new_suffix', where {p} is replaced with {^suffix}. (Using shell variables in GNU parallel)

Minor:

- Dry run (--dry-run). (Same in GNU parallel)
- Trim input data (--trim). (Same in GNU parallel)
- Verbose output (--verbose). (Same in GNU parallel)

See also: Differences between rush and GNU parallel, on the GNU parallel site.
Performance of rush is similar to that of gargs; both are slightly faster than parallel (Perl) and slower than Rust parallel (discussion).
Note that speed is not the top priority, especially for long-running processes.
rush is implemented in the Go programming language; executable binary files for most popular operating systems are freely available on the release page.
Install conda, then run
conda install -c conda-forge rush
Or use mamba, which is faster.
mamba install -c conda-forge rush
Method 1: Download binaries
Tip: run rush -V to check for updates!
Just download the compressed executable file for your operating system and decompress it with the tar -zxvf *.tar.gz command or other tools. Then:
For Linux-like systems
If you have root privilege, simply copy it to /usr/local/bin:
sudo cp rush /usr/local/bin/
Or copy it to anywhere in the environment variable PATH:
mkdir -p $HOME/bin/; cp rush $HOME/bin/
For Windows, just copy rush.exe to C:\WINDOWS\system32.
Method 2: Install with go

go install github.com/shenwei356/rush@latest
Method 3: Compiling from source
# download Go from https://go.dev/dl
wget https://go.dev/dl/go1.24.4.linux-amd64.tar.gz
tar -zxf go1.24.4.linux-amd64.tar.gz -C $HOME/

export PATH=$PATH:$HOME/go/bin
# or make the change permanent:
# echo "export PATH=$PATH:$HOME/go/bin" >> ~/.bashrc
# source ~/.bashrc
git clone https://github.com/shenwei356/rush
cd rush
go build
# or statically-linked binary
CGO_ENABLED=0 go build -tags netgo -ldflags '-w -s'
# or cross compile for other operating systems and architectures
CGO_ENABLED=0 GOOS=openbsd GOARCH=amd64 go build -tags netgo -ldflags '-w -s'
rush -- a cross-platform command-line tool for executing jobs in parallel
Version: 0.7.0
Author: Wei Shen <shenwei356@gmail.com>
Homepage: https://github.com/shenwei356/rush
Input:
- Input could be a list of strings or numbers, e.g., file paths.
- Input can be given either from the STDIN or file(s) via the option -i/--infile.
- Some options can be used to define how the input records are parsed:
-d, --field-delimiter field delimiter in records (default "\s+")
-D, --record-delimiter record delimiter (default "\n")
-n, --nrecords number of records sent to a command (default 1)
-J, --records-join-sep record separator for joining multi-records (default "\n")
-T, --trim trim white space (" \t\r\n") in input
Output:
- Outputs of all commands are written to STDOUT by default;
  you can also use -o/--out-file to specify an output file.
- Outputs of all commands appear in random order; you can use the flag -k/--keep-order
  to keep output in the order of input.
- Outputs of all commands are buffered; you can use the flag -I/--immediate-output
  to print output immediately and interleaved.
Replacement strings in commands:
{} full data
{#} job ID
{n} nth field in delimiter-delimited data
{/} dirname
{%} basename
{.} remove the last file extension
{:} remove all file extensions.
{^suffix} remove suffix
{@regexp} capture submatch using regular expression
Limitation: curly brackets can't be used in the regexp.
Escaping curly brackets "{}":
{{}} {}
{{1}} {1}
{{1,}} {1,}
{{a}} {a}
Combinations:
{%.}, {%:} basename without extension
{2.}, {2/}, {2%.} manipulate nth field
Preset variable (macro):
1. You can pass variables to the command like awk via the option -v. E.g.,
$ seq 3 | rush -v p=prefix_ -v s=_suffix 'echo {p}{}{s}'
prefix_3_suffix
prefix_1_suffix
prefix_2_suffix
2. The value could also contain replacement strings.
# {p} will be replaced with {%:}, which computes the basename and removes all file extensions.
$ echo a/b/c.txt.gz | rush -v 'p={%:}' 'echo {p} {p}.csv'
c c.csv
Usage:
rush [flags] [command]
Examples:
1. simple run, quoting is not necessary
$ seq 1 10 | rush echo {}
2. keep order
$ seq 1 10 | rush 'echo {}' -k
3. timeout
$ seq 1 | rush 'sleep 2; echo {}' -t 1
4. retry
$ seq 1 | rush 'python script.py' -r 3
5. dirname & basename & remove suffix
$ echo dir/file_1.txt.gz | rush 'echo {/} {%} {^_1.txt.gz}'
dir file_1.txt.gz dir/file
6. basename without the last or any extension
$ echo dir.d/file.txt.gz | rush 'echo {.} {:} {%.} {%:}'
dir.d/file.txt dir.d/file file.txt file
7. job ID, combine fields and other replacement strings
$ echo 12 file.txt dir/s_1.fq.gz | rush 'echo job {#}: {2} {2.} {3%:^_1}'
job 1: file.txt file s
8. capture submatch using regular expression
$ echo read_1.fq.gz | rush 'echo {@(.+)_\d}'
read
9. custom field delimiter
$ echo a=b=c | rush 'echo {1} {2} {3}' -d =
a b c
10. custom record delimiter
$ echo a=b=c | rush -D "=" -k 'echo {}'
a
b
c
$ echo abc | rush -D "" -k 'echo {}'
a
b
c
11. assign value to variable, like "awk -v"
# seq 1 | rush 'echo Hello, {fname} {lname}!' -v fname=Wei,lname=Shen
$ seq 1 | rush 'echo Hello, {fname} {lname}!' -v fname=Wei -v lname=Shen
Hello, Wei Shen!
12. preset variable (Macro)
# equal to: echo sample_1.fq.gz | rush 'echo {:^_1} {} {:^_1}_2.fq.gz'
$ echo sample_1.fq.gz | rush -v p={:^_1} 'echo {p} {} {p}_2.fq.gz'
sample sample_1.fq.gz sample_2.fq.gz
13. save successful commands to continue in NEXT run
$ seq 1 3 | rush 'sleep {}; echo {}' -c -t 2
[INFO] ignore cmd #1: sleep 1; echo 1
[ERRO] run cmd #1: sleep 2; echo 2: time out
[ERRO] run cmd #2: sleep 3; echo 3: time out
14. escape special symbols
$ seq 1 | rush 'echo -e "a\tb" | awk "{print $1}"' -q
a
15. escape curly brackets "{}"
$ echo aaa bbb ccc | sed -E "s/(\S){3,}/\1/g"
a b c
$ echo 1 | rush 'echo aaa bbb ccc | sed -E "s/(\S){{3,}}/\1/g"' --dry-run
echo aaa bbb ccc | sed -E "s/(\S){3,}/\1/g"
16. run a command with relative paths in Windows; use backslash as the separator
# "brename -l -R" is used to search paths recursively
$ brename -l -q -R -i -p "\.go$" | rush "bin\app.exe {}"
More examples: https://github.com/shenwei356/rush
Flags:
-v, --assign strings assign the value val to the variable var (format: var=val, val also
supports replacement strings)
--cleanup-time int time to allow child processes to clean up between stop / kill signals
(unit: seconds, 0 for no time) (default 3)
-c, --continue continue jobs. NOTES: 1) successful commands are saved in a file (given
by flag -C/--succ-cmd-file); 2) if the file does not exist, rush saves
data so we can continue jobs next time; 3) if the file exists, rush
ignores jobs in it and updates the file
--dry-run print command but not run
-q, --escape escape special symbols like $ which you can customize by flag
-Q/--escape-symbols
-Q, --escape-symbols string symbols to escape (default "$#&`")
--eta show ETA progress bar
-d, --field-delimiter string field delimiter in records, support regular expression (default "\\s+")
-h, --help help for rush
-I, --immediate-output print output immediately and interleaved, to aid debugging
-i, --infile strings input data file, multi-values supported
-j, --jobs int run n jobs in parallel (default value depends on your device) (default 16)
-k, --keep-order keep output in order of input
--no-kill-exes strings exe names to exclude from kill signal, example: mspdbsrv.exe; or use
all for all exes (default none)
--no-stop-exes strings exe names to exclude from stop signal, example: mspdbsrv.exe; or use
all for all exes (default none)
-n, --nrecords int number of records sent to a command (default 1)
-o, --out-file string out file ("-" for stdout) (default "-")
--print-retry-output print output from retry commands (default true)
--propagate-exit-status propagate child exit status up to the exit status of rush (default true)
-D, --record-delimiter string record delimiter (default "\n")
-J, --records-join-sep string record separator for joining multi-records (default "\n")
-r, --retries int maximum retries (default 0)
--retry-interval int retry interval (unit: second) (default 0)
-e, --stop-on-error stop child processes on first error (not perfect, you may stop it by
typing ctrl-c or closing terminal)
-C, --succ-cmd-file string file for saving successful commands (default "successful_cmds.rush")
-t, --timeout int timeout of a command (unit: seconds, 0 for no timeout) (default 0)
-T, --trim string trim white space (" \t\r\n") in input (available values: "l" for left,
"r" for right, "lr", "rl", "b" for both side)
--verbose print verbose information
-V, --version print version information and check for update
Simple run, quoting is not necessary
# seq 1 3 | rush 'echo {}'
$ seq 1 3 | rush echo {}
3
1
2
Read data from file (-i)
$ rush echo {} -i data1.txt -i data2.txt
Keep output order (-k)
$ seq 1 3 | rush 'echo {}' -k
1
2
3
Timeout (-t)
$ time seq 1 | rush 'sleep 2; echo {}' -t 1
[ERRO] run command #1: sleep 2; echo 1: time out
real 0m1.010s
user 0m0.005s
sys 0m0.007s
Retry (-r)
$ seq 1 | rush 'python unexisted_script.py' -r 1
python: can't open file 'unexisted_script.py': [Errno 2] No such file or directory
[WARN] wait command: python unexisted_script.py: exit status 2
python: can't open file 'unexisted_script.py': [Errno 2] No such file or directory
[ERRO] wait command: python unexisted_script.py: exit status 2
Input containing {} (since v0.7.0)
$ echo "a attr{href}"="h4 text{}" | rush -T b -k -D "=" 'echo "{}"'
a attr{href}
h4 text{}
$ echo -ne "a{},b{{}},c{d}" | rush -D , -k "echo {}"
a{}
b{{}}
c{d}
Output {} itself (since v0.7.0)
$ echo abc | rush 'echo "{} {{}}"'
abc {}
Dirname ({/}), basename ({%}), and remove a custom suffix ({^suffix})
$ echo dir/file_1.txt.gz | rush 'echo {/} {%} {^_1.txt.gz}'
dir file_1.txt.gz dir/file
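For readers who want to see what these three replacement strings compute, the same transforms can be expressed with standard shell tools (an illustrative equivalent, not rush syntax):

```shell
f=dir/file_1.txt.gz
dirname "$f"            # dir            -> what {/} yields
basename "$f"           # file_1.txt.gz  -> what {%} yields
echo "${f%_1.txt.gz}"   # dir/file       -> what {^_1.txt.gz} yields
```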
Get basename, and remove the last ({.}) or any ({:}) extension
$ echo dir.d/file.txt.gz | rush 'echo {.} {:} {%.} {%:}'
dir.d/file.txt dir.d/file file.txt file
Job ID, combine fields index and other replacement strings
$ echo 12 file.txt dir/s_1.fq.gz | rush 'echo job {#}: {2} {2.} {3%:^_1}'
job 1: file.txt file s
Capture submatch using regular expression ({@regexp})
$ echo read_1.fq.gz | rush 'echo {@(.+)_\d}'
read
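For comparison, the same capture can be approximated with sed alone (an illustrative equivalent; it is the same idea, with \d written as [0-9]):

```shell
echo read_1.fq.gz | sed -E 's/^(.+)_[0-9].*$/\1/'   # prints: read
```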
Custom field delimiter (-d)
$ echo a=b=c | rush 'echo {1} {2} {3}' -d =
a b c
Send multiple lines to every command (-n)
$ seq 5 | rush -n 2 -k 'echo "{}"; echo'
1
2
3
4
5
# Multiple records are joined with the separator given by `-J/--records-join-sep` (default `"\n"`)
$ seq 5 | rush -n 2 -k 'echo "{}"; echo' -J ' '
1 2
3 4
5
$ seq 5 | rush -n 2 -k -j 3 'echo {1}'
1
3
5
Custom record delimiter (-D); note that empty records are not used.
$ echo a b c d | rush -D " " -k 'echo {}'
a
b
c
d
$ echo abcd | rush -D "" -k 'echo {}'
a
b
c
d
# FASTA format
$ echo -ne ">seq1\nactg\n>seq2\nAAAA\n>seq3\nCCCC"
>seq1
actg
>seq2
AAAA
>seq3
CCCC
$ echo -ne ">seq1\nactg\n>seq2\nAAAA\n>seq3\nCCCC" | rush -D ">" 'echo FASTA record {#}: name: {1} sequence: {2}' -k -d "\n"
FASTA record 1: name: seq1 sequence: actg
FASTA record 2: name: seq2 sequence: AAAA
FASTA record 3: name: seq3 sequence: CCCC
Assign value to variable, like awk -v (-v)
$ seq 1 | rush 'echo Hello, {fname} {lname}!' -v fname=Wei -v lname=Shen
Hello, Wei Shen!
$ seq 1 | rush 'echo Hello, {fname} {lname}!' -v fname=Wei,lname=Shen
Hello, Wei Shen!
$ for var in a b; do \
$ seq 1 3 | rush -k -v var=$var 'echo var: {var}, data: {}'; \
$ done
var: a, data: 1
var: a, data: 2
var: a, data: 3
var: b, data: 1
var: b, data: 2
var: b, data: 3
Preset variable (-v), to avoid repeatedly writing verbose replacement strings
# naive way
$ echo read_1.fq.gz | rush 'echo {:^_1} {:^_1}_2.fq.gz'
read read_2.fq.gz
# macro + removing suffix
$ echo read_1.fq.gz | rush -v p='{:^_1}' 'echo {p} {p}_2.fq.gz'
# macro + regular expression
$ echo read_1.fq.gz | rush -v p='{@(.+?)_\d}' 'echo {p} {p}_2.fq.gz'
Escape special symbols
$ seq 1 | rush 'echo "I have $100"'
I have 00
$ seq 1 | rush 'echo "I have $100"' -q
I have $100
$ seq 1 | rush 'echo "I have $100"' -q --dry-run
echo "I have \$100"
$ seq 1 | rush 'echo -e "a\tb" | awk "{print $1}"'
a b
$ seq 1 | rush 'echo -e "a\tb" | awk "{print $1}"' -q
a
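The unescaped case above is explained by ordinary shell expansion: $1 is an unset (empty) positional parameter, so "$100" collapses to "00" before echo ever runs. The same effect can be reproduced without rush:

```shell
# $1 is unset inside sh -c, so "$100" expands to "" followed by "00"
sh -c 'echo "I have $100"'    # prints: I have 00
```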
Interrupt jobs with Ctrl-C; rush will stop unfinished commands and exit.
$ seq 1 20 | rush 'sleep 1; echo {}'
^C[CRIT] received an interrupt, stopping unfinished commands...
[ERRO] wait cmd #7: sleep 1; echo 7: signal: interrupt
[ERRO] wait cmd #5: sleep 1; echo 5: signal: killed
[ERRO] wait cmd #6: sleep 1; echo 6: signal: killed
[ERRO] wait cmd #8: sleep 1; echo 8: signal: killed
[ERRO] wait cmd #9: sleep 1; echo 9: signal: killed
1
3
4
2
Continue/resume jobs (-c). When some jobs fail (due to execution failure, timeout, or cancellation by the user with Ctrl-C), switch on the flag -c/--continue and run again, so that rush can save the successful commands and ignore them in the NEXT run.
$ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
1
2
[ERRO] run cmd #3: sleep 3; echo 3: time out
# successful commands:
$ cat successful_cmds.rush
sleep 1; echo 1__CMD__
sleep 2; echo 2__CMD__
# run again
$ seq 1 3 | rush 'sleep {}; echo {}' -t 3 -c
[INFO] ignore cmd #1: sleep 1; echo 1
[INFO] ignore cmd #2: sleep 2; echo 2
[ERRO] run cmd #1: sleep 3; echo 3: time out
Commands of multi-lines (Not supported in GNU parallel)
$ seq 1 3 | rush 'sleep {}; echo {}; \
echo finish {}' -t 3 -c -C finished.rush
1
finish 1
2
finish 2
[ERRO] run cmd #3: sleep 3; echo 3; \
echo finish 3: time out
$ cat finished.rush
sleep 1; echo 1; \
echo finish 1__CMD__
sleep 2; echo 2; \
echo finish 2__CMD__
# run again
$ seq 1 3 | rush 'sleep {}; echo {}; \
echo finish {}' -t 3 -c -C finished.rush
[INFO] ignore cmd #1: sleep 1; echo 1; \
echo finish 1
[INFO] ignore cmd #2: sleep 2; echo 2; \
echo finish 2
[ERRO] run cmd #1: sleep 3; echo 3; \
echo finish 3: time out
Commands are saved to the file (-C) right after they finish, so we can check the number of finished jobs:

grep -c __CMD__ successful_cmds.rush
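Since each finished command is terminated by the __CMD__ marker, counting markers counts finished jobs. A self-contained illustration with a mock file (the file contents mimic the earlier -c example):

```shell
# mock a successful_cmds.rush file with two finished commands
printf 'sleep 1; echo 1__CMD__\nsleep 2; echo 2__CMD__\n' > successful_cmds.rush
grep -c __CMD__ successful_cmds.rush    # prints: 2
```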
A comprehensive example: downloading 1K+ pages given by three URL list files using phantomjs save_page.js (some page contents are dynamically generated by JavaScript, so wget does not work). Here I set the max number of jobs (-j) to 20; each job has a max running time (-t) of 60 seconds and 3 retry chances (-r). The continue flag -c is also switched on, so we can continue unfinished jobs. Luckily, it's accomplished in one run 😄
$ for f in $(seq 2014 2016); do \
$ /bin/rm -rf $f; mkdir -p $f; \
$ cat $f.html.txt | rush -v d=$f -d = 'phantomjs save_page.js "{}" > {d}/{3}.html' -j 20 -t 60 -r 3 -c; \
$ done
A bioinformatics example: mapping with bwa, and processing the results with samtools:
$ tree raw.cluster.clean.mapping
raw.cluster.clean.mapping
├── M1
│ ├── M1_1.fq.gz -> ../../raw.cluster.clean/M1/M1_1.fq.gz
│ ├── M1_2.fq.gz -> ../../raw.cluster.clean/M1/M1_2.fq.gz
...
$ ref=ref/xxx.fa
$ threads=25
$ ls -d raw.cluster.clean.mapping/* \
| rush -v ref=$ref -v j=$threads \
'bwa mem -t {j} -M -a {ref} {}/{%}_1.fq.gz {}/{%}_2.fq.gz > {}/{%}.sam; \
samtools view -bS {}/{%}.sam > {}/{%}.bam; \
samtools sort -T {}/{%}.tmp -@ {j} {}/{%}.bam -o {}/{%}.sorted.bam; \
samtools index {}/{%}.sorted.bam; \
samtools flagstat {}/{%}.sorted.bam > {}/{%}.sorted.bam.flagstat; \
/bin/rm {}/{%}.bam {}/{%}.sam;' \
-j 2 --verbose -c -C mapping.rush
Since {}/{%} appears many times, we can use a preset variable (macro) to simplify it:
$ ls -d raw.cluster.clean.mapping/* \
| rush -v ref=$ref -v j=$threads -v p='{}/{%}' \
'bwa mem -t {j} -M -a {ref} {p}_1.fq.gz {p}_2.fq.gz > {p}.sam; \
samtools view -bS {p}.sam > {p}.bam; \
samtools sort -T {p}.tmp -@ {j} {p}.bam -o {p}.sorted.bam; \
samtools index {p}.sorted.bam; \
samtools flagstat {p}.sorted.bam > {p}.sorted.bam.flagstat; \
/bin/rm {p}.bam {p}.sam;' \
-j 2 --verbose -c -C mapping.rush
Shell grep returns exit code 1 when no match is found, which rush interprets as a failed command. Please use grep foo bar || true instead of grep foo bar.
$ seq 1 | rush 'echo abc | grep 123'
[ERRO] wait cmd #1: echo abc | grep 123: exit status 1
$ seq 1 | rush 'echo abc | grep 123 || true'
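The exit-status behaviour itself is easy to verify in a plain shell, independent of rush:

```shell
echo abc | grep 123 && st=$? || st=$?
echo "without || true: exit status $st"   # 1 -> rush reports a failed command
echo abc | grep 123 || true
echo "with || true: exit status $?"       # 0 -> rush treats the job as successful
```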
Main contributors:
Special thanks to @brentp and his gargs, from which rush borrows some ideas.
Thanks to @bburgin for his contributions to improving child process management.
Create an issue to report bugs, propose new functions or ask for help.