Ever wished to use Python in Bash? Would you choose the Python syntax over sed, awk, ...? If you know exactly which command you would use in Python but end up querying man again and again, read on. The utility allows you to pythonize the shell: pipe arbitrary contents through pz, loaded with your tiny Python script.
How? Simply meddle with the s variable. Example: appending '.com' to every line.
$ echo -e "example\nwikipedia" | pz 's += ".com"'
example.com
wikipedia.com
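The core idea can be pictured in a few lines of plain Python. The sketch below is an assumption about the general mechanism, not pz's actual source. Each input line is bound to s, the command is executed, and whatever s holds afterwards is emitted. The function name pythonize is hypothetical. This sketch uses a fresh scope per line, whereas pz keeps global variables such as S or count across lines.

```python
def pythonize(command: str, lines):
    """Run command once per line with the line bound to s; yield the final s (sketch)."""
    for line in lines:
        scope = {"s": line}   # fresh scope per line in this simplified sketch
        exec(command, scope)  # the command may rebind s
        yield scope["s"]


for out in pythonize('s += ".com"', ["example", "wikipedia"]):
    print(out)
```

Run as is, it prints the same two lines as the shell example.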
- Installation
- Examples
  - Extract a substring
  - Prepend to every line in a stream
  - Converting to lowercase
  - Reversing lines
  - Parsing numbers
  - Find out all URLs in a text
  - Sum numbers
  - Keep unique lines
  - Counting words
  - Aggregating suffixes in a directory
  - Fetching web content
  - Handling nested quotes
  - Computing factorial
  - Read CSV
  - Generate random number
  - Average a stream value
  - Multiline statements
  - Simple progress bar
- Docs
  - Scope variables
    - s: current line
    - n: current line converted to an int (or float) if possible
    - b: current line as a byte-string
    - count: current line number
    - text: whole text, all lines together
    - lines: list of lines processed so far
    - numbers: list of numbers processed so far
    - skip line
    - i, S, L, D, C: other global variables
  - Auto-import
  - Output
  - CLI flags
Install with a single command from PyPI.
pip3 install pz
Or download and launch the pz file from here.
How does your data look when pythonized via pz? Which Bash programs can the utility substitute?
Just use the [:] notation.
echo "hello world" | pz s[6:] # world
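Note that omitting the quotes around the argument may not work (Zsh) or may lead to unexpected behaviour, because the shell treats s[1] as a glob pattern. With a file named s1 present, touch s1 && echo "hello" | pz s[1] → Exception: <class 'NameError'>, since the shell expands s[1] to s1 before pz sees it. Use echo "hello" | pz 's[1]' instead.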
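An unquoted s[1] is a shell glob. Its bracket expression [1] matches the single character 1, so Bash replaces it with s1 when such a file exists. Zsh, with its default nomatch option, reports an error when no file matches. Python's fnmatch module follows the same bracket-expression rules, which makes it handy for a quick check. The helper shell_would_expand is hypothetical and ignores other shell subtleties such as hidden files.

```python
from fnmatch import fnmatchcase


def shell_would_expand(pattern: str, filenames: list[str]) -> list[str]:
    """Return the file names an unquoted glob pattern would match (simplified sketch)."""
    # [1] is a bracket expression matching exactly the character "1"
    return [name for name in filenames if fnmatchcase(name, pattern)]


print(shell_would_expand("s[1]", ["s1", "s2", "notes.txt"]))  # ['s1']
```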
We prepend each line with its length.
# let's use the f-string `--format` flag
tail -f /var/log/syslog | pz -f '{len(s)}: {s}'
# or do it the long way, explicitly setting the `s` variable
tail -f /var/log/syslog | pz 's = str(len(s)) + ": " + s'
Replacing | tr '[:upper:]' '[:lower:]'.
echo "HELLO" | pz s.lower # "hello"
Replacing | tac or | tail -r (on some systems only) or | sed '1!G;h;$!d' (for cool guys only).
$ echo -e "1\n2\n3" | pz -E 'lines[::-1]'
3
2
1
Replacing cut. Note that you can chain multiple pz calls. Split by a comma ',', then use n to access the line converted to a number.
echo "hello,5" | pz 's.split(",")[1]' | pz n+7 # 12
Replacing sed. All functions from the re library are already available, e.g. findall.
# either use the `--findall` flag
pz --findall "(https?://[^\s]+)" < file.log
# or spell out the full command that the `--findall` flag is equivalent to
pz "findall(r'(https?://[^\s]+)', s)" < file.log
When chained, the command can open all the URLs in the current web browser. Note that the function webbrowser.open gets auto-imported from the standard library.
pz --findall "(https?://[^\s]+)" < file.log | pz webbrowser.open
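For comparison, here is a plain-Python sketch of what the --findall command does with each line, using the pattern (https?://[^\s]+). The pattern is deliberately simple: it also swallows trailing punctuation, such as a closing parenthesis attached to a URL. find_urls is a hypothetical helper.

```python
import re

# "http" or "https", then "://", then everything up to the next whitespace
URL_PATTERN = re.compile(r"(https?://[^\s]+)")


def find_urls(text: str) -> list[str]:
    """Return every URL-like token found in text."""
    return URL_PATTERN.findall(text)


print(find_urls("GET https://example.com/a ok; retry http://example.org/b"))
# ['https://example.com/a', 'http://example.org/b']
```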
Replacing | awk '{count+=$1} END{print count}' or | paste -sd+ | bc. Just use sum in the --end clause.
# internally changed to --end `s = sum(numbers)`
echo -e "1\n2\n3\n4" | pz --end sum # 10
Replacing | sort | uniq makes little sense here, but the demonstration gives you the idea. We initialize a set c (like a collection). When processing a line, skip is set to True if the line has already been seen.
$ echo -e "1\n2\n2\n3" | pz "skip = s in c; c.add(s)" --setup "c=set()"
1
2
3
However, an advantage over | sort | uniq shows when handling a stream. You see unique lines instantly, without waiting for the stream to finish. This is useful in combination with tail --follow.
Alternatively, to ensure the values are sorted, we can use the --end flag, which produces the output after processing has finished.
echo -e "1\n2\n2\n3" | pz "S.add(s)" --end "sorted(S)" -0
Note that we used the variable S, which is initialized to an empty set by default (so we do not need --setup at all), and the flag -0 to suppress the per-line output (so we do not need the skip variable either).
(Strictly speaking, we could omit -0 too. With the verbose -v flag, you would see the command internally changed to s = S.add(s). Since set.add returns None, nothing is output, just as if the line were skipped.)
We can omit (s) in the main clause and thus get rid of the quotes altogether.
echo -e "1\n2\n2\n3" | pz S.add --end "sorted(S)"
Nevertheless, the most straightforward approach uses the lines variable, available in the --end clause.
echo -e "1\n2\n2\n3" | pz --end "sorted(set(lines))"
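The streaming trick above is the classic seen-set idiom. The plain-Python sketch below contrasts it with the sorted(set(lines)) variant, which has to wait for the end of the input. The generator name unique_stream is ours, not part of pz.

```python
from typing import Iterable, Iterator


def unique_stream(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line the first time it is seen (like pz "skip = s in c; c.add(s)")."""
    seen: set[str] = set()
    for line in lines:
        if line not in seen:  # emitted immediately, no need to wait for the end of input
            seen.add(line)
            yield line


lines = ["1", "2", "2", "3"]
print(list(unique_stream(lines)))  # ['1', '2', '3']
print(sorted(set(lines)))          # ['1', '2', '3'], but only once all input has been read
```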
We split the line to get the words and put them in S, a global instance of set. Then we print the set length to get the number of unique words.
echo -e "red green\nblue red green" | pz 'S.update(s.split())' --end 'len(S)' # 3
But what if we want the most common words and how often each is used? Let's use C, a global instance of collections.Counter. The output shows that red and green are the most common words, each used 2 times.
$ echo -e "red green\nblue red green" | pz 'C.update(s.split())' --end C.most_common
red 2
green 2
blue 1
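Here is the same word count in plain Python, as a sketch of what C.update and C.most_common do with collections.Counter. Counter.most_common orders elements with equal counts by first occurrence, which is why red is listed before green.

```python
from collections import Counter

text_lines = ["red green", "blue red green"]
counts = Counter()
for line in text_lines:
    counts.update(line.split())  # same as pz 'C.update(s.split())'

for word, n in counts.most_common():
    print(word, n, sep="\t")
```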
To get a quick overview of the file extensions in a directory, first convert the file names to their suffixes, then feed them to the collections.Counter constructor.
$ ls
a.txt b.txt c.txt v1.mp4 v2.mp4
$ ls | pz 'Path(s).suffix' | pz --end 'Counter(lines).most_common'
.txt 3
.mp4 2
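Here is a plain-Python sketch of the same aggregation, assuming a list of file names instead of ls output. suffix_counts is a hypothetical helper. For a real directory, you could pass something like (p.name for p in Path('.').iterdir() if p.is_file()). Names without an extension produce the empty suffix ''.

```python
from collections import Counter
from pathlib import Path


def suffix_counts(names):
    """Count file extensions, most common first."""
    return Counter(Path(name).suffix for name in names).most_common()


print(suffix_counts(["a.txt", "b.txt", "c.txt", "v1.mp4", "v2.mp4"]))
# [('.txt', 3), ('.mp4', 2)]
```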
Accessing the internet is easy thanks to the requests library. Here, we fetch example.com, grep it for all lines containing "href" and print them out with surrounding spaces stripped.
$ echo "https://example.com" | pz 'requests.get(s).content' | grep href | pz s.strip
<p><a href="https://www.iana.org/domains/example">More information...</a></p>
To see how auto-imports are resolved, use the verbose mode. (Notice the line Importing requests.)
$ echo "https://example.com" | pz 'requests.get(s).content' -v | grep href | pz s.strip
Changing the command clause to: s = requests.get(s).content
Importing requests
<p><a href="https://www.iana.org/domains/example">More information...</a></p>
To match every line that contains a quoted expression and print out the quoted contents, you can make use of Python triple quotes. In the example below, apostrophes delimit the COMMAND argument, so the Python string cannot contain an apostrophe without escaping it. The pattern itself contains double quotes, so a triple-quoted Python string lets it include " without escaping and keeps it readable.
echo -e 'hello "world".' | pz 'match(r"""[^"]*"(.*)".""", s)' # world
In that case, the --match flag is even better, as it removes most of the quoting.
echo -e 'hello "world".' | pz --match '[^"]*"(.*)"' # world
There are multiple ways. The simplest is to use the factorial function.
echo 5 | pz factorial # 120
What happens in the background? factorial is available from math.factorial. Since it is callable, pz tries to pass the current line as the parameter: factorial(s). Since s = "5" is a string, this fails. It then tries factorial(n), where n is the current line automatically converted to a number. That works.
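The fallback described above can be sketched as follows. This is a simplified assumption that covers only two attempts: s, then its numeric form n. The Output section lists further attempts (numbers and the byte-encoded line). The helper call_with_fallback is hypothetical.

```python
import math


def call_with_fallback(func, s: str):
    """Call func(s); on TypeError, retry with s converted to a number (sketch)."""
    try:
        return func(s)
    except TypeError:
        pass
    # Mirror the n variable: int if possible, else float
    try:
        n = int(s)
    except ValueError:
        n = float(s)
    return func(n)


print(call_with_fallback(math.factorial, "5"))  # 120
print(call_with_fallback(str.upper, "hello"))   # HELLO (succeeds on the first attempt)
```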
A harder way? Let's use math.prod then.
echo 5 | pz 'prod(i for i in range(1,n+1))' # 120
Without any built-in library? Let's just use a for loop. Multiply n (which is 5) by every number from 1 to n-1, accumulating the product in n itself. Finally, assign n to s, which gets output.
echo 5 | pz 'for c in range(1,n): n*= c ; s = n' # 120
Using the generator prints the factorial of every number from 1 to the number given to -g.
$ pz factorial -g5
1
2
6
24
120
As csv is one of the auto-imported libraries, we can directly instantiate the reader object. In the following example, we output the second element of every line, either progressively or all at once when processing has finished.
# output line by line
echo '"a","b1,b2,b3","c"' | pz "(x[1] for x in csv.reader([s]))" # "b1,b2,b3"
# output at the end
echo '"a","b1,b2,b3","c"' | pz --end "(x[1] for x in csv.reader(lines))" # "b1,b2,b3"
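In plain Python, csv.reader accepts any iterable of strings. Wrapping a single line in a list is therefore enough to parse it while respecting quotes. second_field is a hypothetical helper used for illustration.

```python
import csv


def second_field(line: str) -> str:
    """Parse one CSV line (quoted commas respected) and return its second field."""
    row = next(csv.reader([line]))
    return row[1]


print(second_field('"a","b1,b2,b3","c"'))  # b1,b2,b3
```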
First, take a look at how to stream random numbers from 1 to 100 in Bash.
while :; do echo $((1+$RANDOM%100)); done
Now examine a pure Python solution, without pz involved.
python3 -c "while True: from random import randint; print(randint(1,100))"
Using pz, the command is relieved of the loop handling and importing burden.
pz "randint(1,100)" --generate=0
Let's generate a few random strings of variable length from 1 to 30. When the generator flag is used without a number, it cycles five times.
pz "''.join(random.choice(string.ascii_letters) for _ in range(randint(1,30)))" -S "import string" -g
Let's take a stream and output its running average.
# print out current line `count` and current average `sum/count`
$ while :; do echo $((1 + $RANDOM % 100)) ; sleep 0.1; done | pz 'sum+=n;s=count, sum/count' --setup "sum=0"
1 38.0
2 67.0
3 62.0
4 49.75
# print out every 10,000th line
# (thanks to the `not count % 10000` expression)
$ while :; do echo $((1 + $RANDOM % 100)) ; done | pz 'sum+=n;s=sum/count; s = (count,s) if not count % 10000 else ""' --setup "sum=0"
10000 50.9058
20000 50.7344
30000 50.693466666666666
40000 50.5904
How can this be simplified? Let's use an infinite generator, -g0. The generator sets n to the current line number. i is implicitly initialized to 0 by default, so we use it to hold the sum. No setup clause needed. No Bash loop needed.
$ pz "i+=randint(1,100); s = (n,i/n) if not n % 10000 else ''" -g0
10000 49.9488
20000 50.5399
30000 50.39906666666667
40000 50.494425
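The running-average logic can also be sketched in plain Python. running_average is a hypothetical generator that keeps only a running total, so it works on an infinite stream without storing the values. The printed averages vary from run to run, so none are shown here.

```python
import random
from itertools import count, islice


def running_average(values, every: int):
    """Yield (count, average) after every `every` values, keeping only a running total."""
    total = 0
    for i, value in enumerate(values, start=1):
        total += value
        if i % every == 0:
            yield i, total / i


# Infinite stream of random integers 1..100, the counterpart of pz "randint(1,100)" -g0
stream = (random.randint(1, 100) for _ in count())
for i, avg in islice(running_average(stream, every=10_000), 3):
    print(i, avg)
```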
Should you need to evaluate a short multiline statement, use a standard multiline string, which Bash supports. Mind the Python indentation inside it.
$ echo -e "1\n2\n3" | pz "if n > 2:
  s = 'bigger'
else:
  s = 'smaller'
"
smaller
smaller
bigger
Simulate lengthy processing by generating a long sequence of numbers. As they are not needed, we throw them away with 1>/dev/null.
On every 100th line, we move the cursor up (\033[1A), clear the line (\033[K) and print the current status to STDERR.
$ seq 1 100000 | pz 's = f"\033[1A\033[K ... {count} ..." if count % 100 == 0 else None ' --stderr 1>/dev/null
... 100 ... # replaced by ... 200 ...
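Here is a plain-Python sketch of the same status-line technique, assuming a terminal that understands the ANSI escape sequences above. report_progress is a hypothetical helper. Lines pass through untouched while the status goes to a separate stream, sys.stderr by default. The demo writes to an in-memory buffer instead, so nothing reaches the terminal.

```python
import io
import sys


def report_progress(lines, every: int = 100, stream=sys.stderr):
    """Pass lines through unchanged; every `every` lines, overwrite a status line on stream."""
    for i, line in enumerate(lines, start=1):
        if i % every == 0:
            # \033[1A moves the cursor up one line, \033[K clears it
            stream.write(f"\033[1A\033[K ... {i} ...\n")
        yield line


buffer = io.StringIO()
passed = list(report_progress(range(1, 301), every=100, stream=buffer))
```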
In the script scope, you have access to the following variables:
s: current line
Change it according to your needs.
echo 5 | pz 's += "4"' # 54
n: current line converted to an int (or float) if possible
echo 5 | pz n+2 # 7
echo 5.2 | pz n+2 # 7.2
Sometimes the input cannot be converted to str easily. A warning is output, however, you can still operate with raw bytes.
echo -e '\x80 invalid line' | pz s
Cannot parse line correctly: b'\x80 invalid line'
οΏ½ invalid line
# use the `--quiet` flag to suppress the warning, then decode the bytes
echo -e '\x80 invalid line' | pz 'b.decode("cp1250")' --quiet
β¬ invalid line
count: current line number
# display every 1,000th line
$ pz -g0 n*3 | pz "n if not count % 1000 else None"
3000
6000
9000
# the same, using the `--filter` flag
$ pz -g0 n*3 | pz -F "not count % 1000"
text: whole text, all lines together
Not available with the --overflow-safe flag set, nor in the main clause unless the --whole flag is set.
Ex: get the character count (an alternative to | wc -c, which would also count the trailing newline).
echo -e "hello\nworld" | pz --end 'len(text)' # 11
When used in the main clause, an error appears.
$ echo -e "1\n2\n3" | pz 'len(text)'
Did not you forget to use --text?
Exception: <class 'NameError'> name 'text' is not defined on line: 1
Appending --whole helps, but the statement is evaluated for every line.
$ echo -e "1\n2\n3" | pz 'len(text)' -w
5
5
5
Appending -1 makes sure the statement gets computed only once.
$ echo -e "1\n2\n3" | pz 'len(text)' -w1
5
lines: list of lines processed so far
Not available with the --overflow-safe flag set.
Ex: returning the last line
echo -e "hello\nworld" | pz --end lines[-1] # "world"
numbers: list of numbers processed so far
Not available with the --overflow-safe flag set.
Ex: show the current average of the stream. More specifically, we output tuples: line count, current line, average.
$ echo -e "20\n40\n25\n28" | pz 's = count, s, sum(numbers)/count'
1 20 20.0
2 40 30.0
3 25 28.333333333333332
4 28 28.25
skip line
If set to True, the current line will not be output. If set to False when using the -0 flag, the line will be output regardless.
i, S, L, D, C: other global variables
Some variables are initialized and ready to be used globally. They are shared across all lines.
i = 0
S = set()
L = list()
D = dict()
C = Counter()
It is true that using uppercase does not conform to the naming convention. However, in these tiny scripts readability is the chief principle, and every character counts.
Using a set S. In the example, we add every line to the set and, at the end, print it out sorted.
$ echo -e "2\n1\n2\n3\n1" | pz "S.add(s)" --end "sorted(S)"
1
2
3
Using a list L. Append lines that contain a number bigger than one and finally print their count. As only the final count matters, suppress the line output with the flag -0.
$ echo -e "2\n1\n2\n3\n1" | pz "if n > 1: L.append(s)" --end "len(L)" -0
3
- You can always import the libraries you need manually (put an import statement into the command).
- Some libraries are ready to be used: re.* (match, search, findall), math.* (sqrt, ...), defaultdict
- Some others are auto-imported whenever their use has been detected. In such a case, the line is reprocessed.
  - Functions: b64decode, b64encode, datetime, (requests).get, glob, iglob, Path, randint, sleep, time, ZipFile
  - Modules: base64, collections, csv, humanize, itertools, jsonpickle, pathlib, random, requests, time, webbrowser, zipfile
Caveat: when a name is accessed for the first time, the auto-import causes the row to be reprocessed. This may affect your global variables. Use verbose output to see whether something has been auto-imported.
$ echo -e "hey\nbuddy" | pz 'a+=1; sleep(1); b+=1; s = a,b ' --setup "a=0;b=0;" -v
Importing sleep from time
2 1
3 2
As seen, a was incremented 3 times and b only twice, because the first line had to be processed twice in order to auto-import sleep. In the first run, the processing raised an exception because sleep was not yet known. To prevent that, explicitly append from time import sleep to the --setup flag.
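The reprocessing caveat can be reproduced with a small sketch of an auto-import loop. It executes the command, and on a NameError for a known name it imports that name and runs the whole command again. The AUTO_IMPORTS table and run_line are hypothetical stand-ins, not pz's implementation. sqrt replaces sleep so the example does not pause. The counters end up exactly as in the shell session.

```python
import importlib
import re

# Hypothetical stand-in for an auto-import table: name -> module providing it
AUTO_IMPORTS = {"sleep": "time", "sqrt": "math"}


def run_line(command: str, scope: dict) -> None:
    """Execute command; on an unknown auto-importable name, import it and rerun the WHOLE command."""
    while True:
        try:
            exec(command, scope)
            return
        except NameError as exc:
            match = re.search(r"name '(\w+)' is not defined", str(exc))
            name = match.group(1) if match else None
            if name not in AUTO_IMPORTS:
                raise
            module = importlib.import_module(AUTO_IMPORTS[name])
            scope[name] = getattr(module, name)  # side effects before the failure stay applied


scope = {"a": 0, "b": 0}
for line in ("hey", "buddy"):
    scope["s"] = line
    run_line("a += 1; r = sqrt(16); b += 1", scope)
print(scope["a"], scope["b"])  # 3 2
```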
- Explicit assignment: By default, we output the s variable.
  echo "5" | pz 's = len(s)' # 1
- Single expression: If s is not set explicitly, we assign the expression to s automatically.
  echo "5" | pz 'len(s)' # 1 (command internally changed to `s = len(s)`)
- Tuple, generator: If s ends up as a tuple, its elements get joined by tabs.
  $ echo "5" | pz 's, len(s)'
  5 1
  Consider piping two lines, 'hey' and 'buddy'. We return three elements: the original text, the reversed text and its length.
  $ echo -e "hey\nbuddy" | pz 's,s[::-1],len(s)'
  hey yeh 3
  buddy yddub 5
- List: When s ends up as a list, its elements are printed on separate lines.
  $ echo "5" | pz '[s, len(s)]'
  5
  1
- Regular match: All groups are treated as a tuple. If no group is used, we print the entire matched string.
  # no group → print the entire matched string
  echo "hello world" | pz 'search(r"\s.*", s)' # " world"
  # single matched group
  echo "hello world" | pz 'search(r"\s(.*)", s)' # "world"
  # matched groups treated as a tuple
  echo "hello world" | pz 'search(r"(.*)\s(.*)", s)' # "hello world"
- Callable: It gets called. This is very useful for simple functions: since there is no need to add parentheses to call the function, we can omit quoting in Bash (the expression s.lower() would have to be quoted). Use the verbose flag -v to inspect the internal change of the command.
  # internally changed to `s = s.lower()`
  echo "HEllO" | pz s.lower # "hello"
  # internally changed to `s = len(s)`
  echo "HEllO" | pz len # "5"
  # internally changed to `s = base64.b64encode(s.encode('utf-8'))`
  echo "HEllO" | pz b64encode # "SEVsbE8="
  # internally changed to `s = math.sqrt(n)`
  # and then to `s = round(n)`
  echo "25" | pz sqrt | pz round # "5"
  # internally changed to `s = sum(numbers)`
  echo -e "1\n2\n3\n4" | pz sum
  1
  3
  6
  10
  # internally changed to `' - '.join(lines)`
  echo -e "1\n2\n3\n4" | pz --end "' - '.join"
  1 - 2 - 3 - 4
As you see in the examples, if a TypeError is raised, we try to reprocess the row, passing the current line as the argument:
- either in its basic form s,
- as numbers, if available,
- using its numeral representation n, if available,
- encoded to bytes: s.encode('utf-8').
In the --end clause, we additionally try lines.
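The output rules listed above can be summarized as a function that turns the final value of s into output lines. The sketch follows only what is documented here: tuples are joined by tabs, lists are split into lines, regex matches are reduced to their groups, and None, False and empty strings are skipped. render is a hypothetical name, and pz's own handling of other types may differ.

```python
import re


def render(value) -> list[str]:
    """Turn the final value of s into output lines, per the documented rules (sketch)."""
    if isinstance(value, re.Match):
        groups = value.groups()
        if not groups:
            value = value.group(0)  # no group: the entire matched string
        elif len(groups) == 1:
            value = groups[0]       # single group: just that group
        else:
            value = groups          # several groups: treated as a tuple
    if value is None or value is False or value == "":
        return []                   # not output by default
    if isinstance(value, tuple):
        return ["\t".join(str(v) for v in value)]
    if isinstance(value, list):
        return [str(v) for v in value]
    return [str(value)]
```

For example, render(("hey", "yeh", 3)) gives a single tab-separated line, while render(["5", 1]) gives two lines.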
- -v, --verbose: See what happens under the hood. Show automatic imports and internal command modifications (attempts to make it callable, and prepending s = if omitted).
  $ echo -e "hello" | pz 'invalid command'
  Exception: <class 'SyntaxError'> invalid syntax (<string>, line 1) on line: hello
  $ echo -e "hello" | pz 'sleep(1)' --verbose
  Importing sleep from time
- -q, --quiet: See errors and values only. Suppress command exceptions.
  echo -e "hello" | pz 'invalid command' --quiet # empty result
- COMMAND: The main clause, any Python script executed on every line (multiple statements allowed).
- -E COMMAND, --end COMMAND: Any Python script, executed after processing. Useful for the final output. The variable text is available by default here.
  $ echo -e "1\n2\n3\n4" | pz --end sum
  10
  $ echo -e "1\n2\n3\n4" | pz s --end sum
  1 # output of the `main` clause
  2
  3
  4
  10 # output of the `end` clause
  $ echo -e "1\n2\n3\n4" | pz sum --end sum
  1 # output of the `main` clause
  3
  6
  10
  10 # output of the `end` clause
- -S COMMAND, --setup COMMAND: Any Python script, executed before processing. Useful for initializing variables. Ex: prepend line numbers by incrementing a variable count.
  $ echo -e "row\nanother row" | pz 'count+=1;s = f"{count}: {s}"' --setup 'count=0'
  1: row
  2: another row
  # the same using the globally available variable `count` instead of `--setup`, together with the `--format` flag
  $ echo -e "row\nanother row" | pz -f '{count}: {s}'
- -I, --insecure: If set, any Python script in the environment variable PZ_SETUP will be executed just before the --setup clause. Useful for imports. Since the user might launch unintended code if an attacker tampers with the variable, its evaluation is conditioned on this flag for the moment.
  $ echo -e "1\n2\n3" | PZ_SETUP='from hashlib import sha3_256' pz -I 'sha3_256(b).hexdigest'
  # equivalent to:
  $ echo -e "1\n2\n3" | pz --setup 'from hashlib import sha3_256' 'sha3_256(b).hexdigest'
  67b176705b46206614219f47a05aee7ae6a3edbe850bbbe214c536b989aea4d2
  b1b1bd1ed240b1496c81ccf19ceccf2af6fd24fac10ae42023628abbe2687310
  1bf0b26eb2090599dd68cbb42c86a674cb07ab7adc103ad3ccdf521bb79056b9
- -F, --filter: The line is piped out unchanged, but only if the expression evaluates to True. When piping in the numbers 1 to 5, we pass only those bigger than 3.
  $ echo -e "1\n2\n3\n4\n5" | pz "n > 3" --filter
  4
  5
  The statement is equivalent to using skip (without --filter).
  $ echo -e "1\n2\n3\n4\n5" | pz "skip = not n > 3"
  4
  5
  When not using the filter, s evaluates to True/False. By default, False or empty values are not output.
  $ echo -e "1\n2\n3\n4\n5" | pz "n > 3"
  True
  True
- -f, --format: Main and end clauses are considered f-strings. Internally, the clause is inserted between triple apostrophes: f'''COMMAND'''.
- -n NUM: Process only this number of lines. Roughly equivalent to head -n.
- -1: Process just the first line.
- -0: Suppress the output of all lines. (Useful in combination with --end.)
- --empty: Output even empty lines. (By default, they are skipped.)
  Consider shortening the text by its last 3 letters. The first line, hey, then disappears completely.
  $ echo -e "hey\nbuddy" | pz 's[:-3]'
  bu
  Should we insist on displaying it, we now see an empty line.
  $ echo -e "hey\nbuddy" | pz 's[:-3]' --empty

  bu
- -g [NUM], --generate [NUM]: Generate lines while ignoring the input pipe. The line will correspond to the iteration cycle count (unless the --overflow-safe flag is on with an infinite generator; in that case, lines will equal '1'). If NUM is not specified, 5 lines will be produced by default. Putting NUM == 0 means an infinite generator. If no main clause is set, the number is piped out.
  $ pz -g2
  1
  2
  $ pz 'i=i+5' -g -v
  Changing the main clause to: s = i=i+5
  Generating s = 1 .. 5
  5
  10
  15
  20
  25
- --stderr: Print the clauses' output to STDERR, while leaving the original line piped to STDOUT intact. Useful for generating reports during a long operation. In the following example, every third line makes STDERR receive a message.
  $ pz -g=9 s | pz "s = 'Processed next few lines' if count % 3 == 0 else None" --stderr
  1
  2
  3
  Processed next few lines
  4
  5
  6
  Processed next few lines
  7
  8
  9
  Processed next few lines
  Demonstrate the different pipes by writing STDOUT to a file and leaving STDERR in the terminal.
  $ pz -g=9 s | pz "s = 'Processed next few lines' if count % 3 == 0 else None" --stderr > /tmp/example
  Processed next few lines
  Processed next few lines
  Processed next few lines
  $ cat /tmp/example
  1
  2
  3
  ...
- --overflow-safe: Prevent the lines, numbers and text variables from being available. Useful when handling an infinite input.
  # prevent `text` from being populated by default
  echo -e "1\n2\n2\n3" | pz --end "len(text)" --overflow-safe
  Did you not forget to use --whole to access `text`?
  Exception: <class 'NameError'> name 'text' is not defined in the --end clause
  # force `text` to be populated
  echo -e "1\n2\n2\n3" | pz --end "len(text)" --overflow-safe --whole
  7
- --search: Equivalent to search(COMMAND, s)
  $ echo -e "hello world\nanother words" | pz --search ".*\s"
  hello
  another
- --match: Equivalent to match(COMMAND, s)
- --findall: Equivalent to findall(COMMAND, s)
- --sub SUBSTITUTION: Equivalent to sub(COMMAND, SUBSTITUTION, s)
  $ echo -e "hello world\nanother words" | pz ".*\s" --sub ":"
  :world
  :words
  Using groups:
  $ echo -e "hello world\nanother words" | pz "(.*)\s" --sub "\1"
  helloworld
  anotherwords
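The regex flags map directly onto functions of Python's re module. The hypothetical dispatcher apply_flag below makes these equivalences explicit. It is an illustration, not pz's code. Substitution templates such as \1 refer to groups, just as in re.sub.

```python
import re


def apply_flag(flag: str, pattern: str, s: str, substitution: str = ""):
    """Map pz's regex flags onto the re calls they are documented as equivalent to."""
    if flag == "--search":
        return re.search(pattern, s)
    if flag == "--match":
        return re.match(pattern, s)
    if flag == "--findall":
        return re.findall(pattern, s)
    if flag == "--sub":
        return re.sub(pattern, substitution, s)
    raise ValueError(f"unknown flag: {flag}")


print(repr(apply_flag("--search", r".*\s", "hello world").group(0)))  # 'hello ' (trailing space)
print(apply_flag("--findall", r"\w+", "hello world"))                 # ['hello', 'world']
print(apply_flag("--sub", r".*\s", "hello world", ":"))               # :world
print(apply_flag("--sub", r"(.*)\s", "hello world", r"\1"))           # helloworld
```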
- Run: apt-get install bash-completion jq
- Copy extra/pz-autocompletion.bash to /etc/bash_completion.d/
- Restart the terminal