- Edit 4th and 6th columns in the CSV file
$ cat file.csv | teip -d, -f 4,6 -- sed 's/./@/g'
- Convert timestamps in /var/log/secure to UNIX time
$ cat /var/log/secure | teip -c 1-15 -- date -f- +%s
- Percent-encode only the minimum necessary range of the file
$ cat file | teip -og '[^-a-zA-Z0-9@:%._\+~#=/]+' -- php -R 'echo urlencode($argn)."\n";'
teip allows a command to focus on its own task.
Here is a comparison of the processing time needed to replace approximately 761,000 IP addresses with dummy ones in a 100 MiB text file.
See the wiki > Benchmark for details.
- Allows any command to "ignore unwanted input", which most commands cannot do
  - The targeted command just handles selected parts of the standard input
  - Unselected parts are bypassed by teip
  - Flexible methods for selecting a range (select like AWK, the cut command, or a regular expression)
- High performer
  - The targeted command's standard input/output are intercepted by multiple teip threads asynchronously.
  - If general UNIX commands in your environment can process a few hundred MB files in a few seconds, then teip can deliver the same or better performance.
Using Homebrew
$ brew install greymd/tools/teip
$ wget https://git.io/teip-1.2.0.x86_64.deb
$ sudo dpkg -i ./teip*.deb
$ sudo dnf install https://git.io/teip-1.2.0.x86_64.rpm
$ sudo yum install https://git.io/teip-1.2.0.x86_64.rpm
$ docker build -t teip .
$ echo "100 200 300 400" | docker run --rm -i teip -f 3 -- sed 's/./@/g'
Pre-built binaries are not provided for now.
Build with cargo, and make sure the libclang shared library is available in your environment.
### Example for Ubuntu
$ sudo apt install cargo clang
$ cargo install teip
### Example for RHEL
$ sudo dnf install cargo clang
$ cargo install teip
Unfortunately, teip does not work in non-UNIX environments for technical reasons.
Usage:
teip -g <pattern> [-oGsvz] [--] [<command>...]
teip -f <list> [-d <delimiter> | -D <pattern>] [-svz] [--] [<command>...]
teip -c <list> [-svz] [--] [<command>...]
teip -l <list> [-svz] [--] [<command>...]
teip --help | --version
Options:
--help Display this help and exit
--version Show version and exit
-g <pattern> Select lines that match the regular expression <pattern>
-o -g selects only matched parts.
-G -g adopts Oniguruma regular expressions
-f <list> Select only these white-space separated fields
-d <delimiter> Use <delimiter> for field delimiter of -f
-D <pattern> Use regular expression <pattern> for field delimiter of -f
-c <list> Select only these characters
-l <list> Select only these lines
-s Execute command for each selected part
-v Invert the sense of selecting
-z Line delimiter is NUL instead of newline
Try this first.
$ echo "100 200 300 400" | teip -f 3
The result is almost the same as the input, but "300" is highlighted and surrounded by [...], because -f 3 selects the 3rd field of the space-separated input.
100 200 [300] 400
Next, put sed and its arguments at the end.
$ echo "100 200 300 400" | teip -f 3 sed 's/./@/g'
The result is shown below.
The highlight and [...] are now gone.
100 200 @@@ 400
As you can see, teip passes only the highlighted part to sed and replaces it with the result of sed.
Of course, any command you like can be specified. It is called the targeted command in this article.
Let's try cut as the targeted command to extract only the first character.
$ echo "100 200 300 400" | teip -f 3 cut -c 1
teip: Invalid arguments.
Oops? Why did it fail?
This is because cut uses the -c option.
teip also provides an option of the same name, which causes the confusion.
When entering a targeted command with teip, it is better to enter it after --.
Then, teip interprets the arguments after -- as the targeted command and its arguments.
$ echo "100 200 300 400" | teip -f 3 -- cut -c 1
100 200 3 400
Great, the first character 3 is extracted from 300!
Although -- is not always necessary, it is always better to use it.
So, -- is used in all the examples from here on.
Now let's double this number with awk.
The command looks like the following (note that the variable to be doubled is $1, not $3).
$ echo "100 200 300 400" | teip -f 3 -- awk '{print $1*2}'
100 200 600 400
OK, the result went from 300 to 600.
Now, let's change -f 3 to -f 3,4 and run it.
$ echo "100 200 300 400" | teip -f 3,4 -- awk '{print $1*2}'
100 200 600 800
The numbers in the 3rd and 4th fields were doubled!
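Why $1 rather than $3 or $4? As a rough mental model (the Tokenization part of this document notes that it is not technically accurate), the targeted command receives each selected field on its own line, so the field is always the first one awk sees. The sketch below reproduces that input by hand with printf; the variable name doubled is only for illustration.

```shell
# Rough model of what awk receives for `teip -f 3,4`:
# each selected field arrives on its own line, so it is always $1.
doubled=$(printf '300\n400\n' | awk '{print $1*2}')
printf '%s\n' "$doubled"
# prints:
# 600
# 800
```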
As some of you may have noticed, the argument of -f is compatible with the LIST of cut.
Refer to cut --help to see how it works.
$ echo "100 200 300 400" | teip -f -3 -- sed 's/./@/g'
@@@ @@@ @@@ 400
$ echo "100 200 300 400" | teip -f 2-4 -- sed 's/./@/g'
100 @@@ @@@ @@@
$ echo "100 200 300 400" | teip -f 1- -- sed 's/./@/g'
@@@ @@@ @@@ @@@
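For comparison, here are the same three LIST notations given to cut itself, with a space as the delimiter. This is only a sketch to show how the notation is read; unlike teip, cut discards the unselected fields. The variable names are illustrative.

```shell
# The same LIST notations, interpreted by cut (space-delimited input).
line='100 200 300 400'
first_three=$(printf '%s\n' "$line" | cut -d ' ' -f -3)   # fields 1 to 3
two_to_four=$(printf '%s\n' "$line" | cut -d ' ' -f 2-4)  # fields 2 to 4
from_first=$(printf '%s\n' "$line" | cut -d ' ' -f 1-)    # field 1 to the end
printf '%s\n' "$first_three" "$two_to_four" "$from_first"
# prints:
# 100 200 300
# 200 300 400
# 100 200 300 400
```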
The -c option allows you to select a range by character position.
The example below selects the 1st, 3rd, 5th, and 7th characters and applies the sed command to them.
$ echo ABCDEFG | teip -c 1,3,5,7
[A]B[C]D[E]F[G]
$ echo ABCDEFG | teip -c 1,3,5,7 -- sed 's/./@/'
@B@D@F@
As with -f, the argument of -c is compatible with cut's LIST.
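As a quick illustration of that notation, the sketch below passes a comma-separated list and a pair of open-ended ranges to cut -c. It only shows which character positions a LIST names; the variable names are illustrative.

```shell
# Character LISTs as cut reads them.
picked=$(echo ABCDEFG | cut -c 1,3,5,7)  # 1st, 3rd, 5th and 7th characters
ranged=$(echo ABCDEFG | cut -c -2,5-)    # up to the 2nd, and from the 5th on
printf '%s\n' "$picked" "$ranged"
# prints:
# ACEG
# ABEFG
```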
The -f option recognizes delimited fields like awk by default.
Consecutive whitespace characters (all forms of whitespace categorized by Unicode) are interpreted as a single delimiter.
$ printf "A B \t\t\t\ C \t D" | teip -f 3 -- sed s/./@@@@/
A B @@@@ C D
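The awk behavior being imitated can be checked directly. In the sketch below, awk's default field splitting treats each run of spaces and tabs as one delimiter, so C is the third field regardless of how much whitespace surrounds it. Note that awk's default covers spaces, tabs, and newlines, while teip is described above as covering all Unicode whitespace.

```shell
# awk's default field splitting collapses runs of spaces and tabs.
third=$(printf 'A   B \t\t C \t  D\n' | awk '{print $3}')
printf '%s\n' "$third"
# prints:
# C
```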
This behavior might be inconvenient for processing CSV and TSV.
However, the -d option can be used in conjunction with -f to specify a delimiter.
Now you can process a CSV file like this.
$ echo "100,200,300,400" | teip -f 3 -d , -- sed 's/./@/g'
100,200,@@@,400
In order to process TSV, the TAB character needs to be typed.
If you are using Bash, type $'\t', which is a form of ANSI-C Quoting.
$ printf "100\t200\t300\t400\n" | teip -f 3 -d $'\t' -- sed 's/./@/g'
100 200 @@@ 400
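In a shell without ANSI-C Quoting, one possible workaround is to capture a TAB character with printf and pass the variable as the delimiter. This is an assumption on my part, not something this manual prescribes, and it is shown with cut so that it runs without teip; the variable names tab and second are illustrative.

```shell
# Portable way to obtain a literal TAB without $'\t'.
tab=$(printf '\t')
second=$(printf '100\t200\t300\n' | cut -d "$tab" -f 2)
printf '%s\n' "$second"
# prints:
# 200
```

The same "$tab" variable could presumably be given to -d in place of $'\t'.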
teip also provides the -D option to specify an extended regular expression as the delimiter.
This is useful when you want to ignore consecutive delimiters, or when there are multiple types of delimiters.
$ echo 'A,,,,,B,,,,C' | teip -f 2 -D ',+'
A,,,,,[B],,,,C
$ echo "1970-01-02 03:04:05" | teip -f 2-5 -D '[-: ]'
1970-[01]-[02] [03]:[04]:05
The regular expression for the TAB character (\t) can also be specified with the -D option, but -d has slightly better performance.
For the available regular expression notations, refer to the documentation of Rust's regular expressions.
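To see how the two delimiter patterns above split their input, the sketch below gives them to awk -F, which also accepts a regular expression as the field separator. awk uses POSIX extended regular expressions rather than Rust's syntax, but these two simple patterns should behave the same in both. The variable names are illustrative.

```shell
# Regular expressions as field delimiters, checked with awk -F.
second=$(echo 'A,,,,,B,,,,C' | awk -F ',+' '{print $2}')
parts=$(echo '1970-01-02 03:04:05' | awk -F '[-: ]' '{print $2, $3, $4, $5}')
printf '%s\n' "$second" "$parts"
# prints:
# B
# 01 02 03 04
```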
You can also select particular lines that match a regular expression with -g.
$ echo -e "ABC1\nEFG2\nHIJ3" | teip -g '[GJ]\d'
ABC1
[EFG2]
[HIJ3]
By default, the whole line containing the given pattern is selected, like the grep command.
With the -o option, only the matched parts are selected.
$ echo -e "ABC1\nEFG2\nHIJ3" | teip -og '[GJ]\d'
ABC1
EF[G2]
HI[J3]
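The selections above correspond to what grep prints for the same input, which can be a handy way to preview a pattern. This sketch assumes a grep that supports -o (GNU and BSD grep do, although POSIX does not require it) and writes [0-9] instead of \d because extended regular expressions have no \d.

```shell
# Whole matching lines versus matched parts only.
whole=$(printf 'ABC1\nEFG2\nHIJ3\n' | grep -E '[GJ][0-9]')
parts=$(printf 'ABC1\nEFG2\nHIJ3\n' | grep -oE '[GJ][0-9]')
printf '%s\n' "$whole" "$parts"
# prints:
# EFG2
# HIJ3
# G2
# J3
```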
Note that -og is a useful idiom that is frequently used in this manual.
Here is an example using \d, which matches a digit.
$ echo ABC100EFG200 | teip -og '\d+'
ABC[100]EFG[200]
$ echo ABC100EFG200 | teip -og '\d+' -- sed 's/.*/@@@/g'
ABC@@@EFG@@@
This feature is quite versatile and can be useful for handling files that have no fixed form, such as logs, markdown, etc.
However, you should be careful when using it.
The example below is almost the same as the one above, but \d+ is replaced with \d.
$ echo ABC100EFG200 | teip -og '\d' -- sed 's/.*/@@@/g'
ABC@@@@@@@@@EFG@@@@@@@@@
Although the selected characters are the same, the result is different.
To understand this behavior, you need to know how teip performs "Tokenization".
teip divides the standard input into tokens.
A token that does not match the pattern is written to the standard output as is. A matched token, on the other hand, is passed to the standard input of the targeted command.
After that, the matched token is replaced with the result of the targeted command.
In the next example, the standard input is divided into four tokens as follows.
echo ABC100EFG200 | teip -og '\d+' -- sed 's/.*/@@@/g'
ABC => Token(1)
100 => Token(2) -- Matched
EFG => Token(3)
200 => Token(4) -- Matched
By default, the matched tokens are joined with line breaks and used as the new standard input for the targeted command.
Imagine that teip executes the following command internally.
$ printf "100\n200\n" | sed 's/.*/@@@/g'
@@@ # => Result of Token(2)
@@@ # => Result of Token(4)
(This is not technically accurate, but you can now see why $1, not $3, is used in one of the examples in "Getting Started".)
After that, each matched token is replaced with the corresponding line of the result.
ABC => Token(1)
@@@ => Token(2) -- Replaced
EFG => Token(3)
@@@ => Token(4) -- Replaced
Finally, all the tokens are concatenated and the following result is printed.
ABC@@@EFG@@@
In practice, the above process is performed asynchronously: tokens are printed sequentially as they become available.
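The splitting step can be imitated with a few lines of awk. The sketch below is not how teip is implemented; tokenize is a hypothetical helper, and the M (matched) and U (unmatched) markers are invented for this illustration. It writes [0-9]+ instead of \d+ because awk has no \d.

```shell
# Hypothetical helper: print one token per line,
# marking matched tokens with "M" and unmatched ones with "U".
tokenize() {
  awk '{
    s = $0
    while (match(s, /[0-9]+/)) {
      # Text before the match is an unmatched token
      if (RSTART > 1) print "U " substr(s, 1, RSTART - 1)
      # The match itself is a matched token
      print "M " substr(s, RSTART, RLENGTH)
      # Continue with the rest of the line
      s = substr(s, RSTART + RLENGTH)
    }
    if (length(s)) print "U " s
  }'
}

tokens=$(printf 'ABC100EFG200\n' | tokenize)
printf '%s\n' "$tokens"
# prints:
# U ABC
# M 100
# U EFG
# M 200
```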
Back to the story: the reason why so many @ were printed in the \d example is that the input is broken up into many tokens, as shown below.
$ echo ABC100EFG200 | teip -og '\d'
ABC[1][0][0]EFG[2][0][0]
teip recognizes each part of the input that matches the entire regular expression as a single token.
\d matches a single digit, which results in many tokens.
ABC => Token(1)
1 => Token(2) -- Matched
0 => Token(3) -- Matched
0 => Token(4) -- Matched
EFG => Token(5)
2 => Token(6) -- Matched
0 => Token(7) -- Matched
0 => Token(8) -- Matched
Therefore, sed receives many lines, one per matched digit, and replaces each of them with @@@.
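Following the same simplified model as the printf example earlier in this section, the input that sed receives in the \d case can be reproduced by hand. Six one-digit lines go in, so six @@@ lines come out, which accounts for the 18 @ characters in the output. The variable names are illustrative.

```shell
# Simplified model of sed's input for -og '\d': one line per digit.
result=$(printf '1\n0\n0\n2\n0\n0\n' | sed 's/.*/@@@/g')
count=$(printf '%s\n' "$result" | grep -c '@@@')
printf '%s\n' "$count"
# prints:
# 6
```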