python: Add 2/3 compat wrappers for byte strings
Introduce some helpers for managing bytes/unicode objects in a way that
bridges the gap from python2 to 3.

1. Add printb() helper for writing bytes output directly to stdout. This
avoids complaints from print() in python3, which expects a unicode
str(). Since python 3.5, `b"" % bytes()` style format strings should
work and we can write tools with common code, once we convert format
strings to bytes.
https://legacy.python.org/dev/peps/pep-0461/

2. Add a class for wrapping command line arguments that are intended for
comparing to debugged memory, for instance running process COMM or
kernel pathname data. The approach takes some of the discussion from
https://legacy.python.org/dev/peps/pep-0383/ into account, though
unfortunately the python2-future implementation of "surrogateescape" is
buggy, therefore this iteration is partial.

The object instance should only ever be coerced into a bytes object.
This silently invokes encode(sys.getfilesystemencoding()), which if it
fails implies that the tool was passed junk characters on the command
line. Thereafter the tool should implement only bytes-bytes comparisons
(for instance re.search(b"", b"")) and bytes stdout printing (see
printb).

3. Add an _assert_is_bytes helper to check for improper usage of str
objects in python arguments. The behavior of the assertion can be
tweaked by changing the bcc.utils._strict_bytes bool.
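
As a rough illustration of item 1, the sketch below formats a line with bytes `%`-formatting (PEP 461, Python 3.5+) and writes it with a local copy of printb, so it runs without bcc installed. The `format_event` helper and the pid/comm values are hypothetical and not part of this commit.

```python
import io
import sys

def printb(s, file=sys.stdout):
    """Local copy of the helper from this commit: write bytes, a newline, then flush."""
    # Text streams such as sys.stdout expose the underlying byte stream as .buffer
    buf = file.buffer if hasattr(file, "buffer") else file
    buf.write(s)
    buf.write(b"\n")
    file.flush()

def format_event(pid, comm):
    # Hypothetical helper: %d takes an int, %s takes a bytes object
    return b"%-6d %s" % (pid, comm)

line = format_event(1234, b"bash")

# Write to an in-memory byte stream so the result can be inspected
out = io.BytesIO()
printb(line, file=out)
```

Calling `printb(line)` without a `file` argument writes the same bytes to stdout, going through `sys.stdout.buffer` on Python 3.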

Going forward, one should never invoke decode() on a bpf data stream,
e.g. the result of a table lookup or perf ring output. Leave that data
in the native bytes() representation.
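
A minimal sketch of that rule, assuming a made-up event layout: the `Event` structure and `matches` helper are hypothetical stand-ins for data read from a table or perf buffer, and the comm value deliberately contains a byte that is not valid UTF-8.

```python
import ctypes as ct
import re

class Event(ct.Structure):
    # Hypothetical event layout, for illustration only
    _fields_ = [("pid", ct.c_uint), ("comm", ct.c_char * 16)]

def matches(event, pattern):
    # event.comm is already a bytes object; match bytes against bytes
    return re.search(pattern, event.comm) is not None

ev = Event(42, b"caf\xe9-daemon")

# Filtering and formatting both stay in bytes, so no decode() is needed
hit = matches(ev, b"daemon")
line = b"%d %s" % (ev.pid, ev.comm)
```

Calling `ev.comm.decode("utf-8")` on this value would raise UnicodeDecodeError, which is the failure that staying in bytes avoids.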

Signed-off-by: Brenden Blanco <[email protected]>
drzaeus77 committed Feb 8, 2018
1 parent c28f6e8 commit e663541
Showing 1 changed file with 57 additions and 0 deletions.
57 changes: 57 additions & 0 deletions src/python/bcc/utils.py
@@ -12,6 +12,9 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import ctypes as ct
import sys
import traceback
import warnings

from .libbcc import lib

@@ -39,3 +42,57 @@ def detect_language(candidates, pid):
    res = lib.bcc_procutils_language(pid)
    language = ct.cast(res, ct.c_char_p).value.decode()
    return language if language in candidates else None

FILESYSTEMENCODING = sys.getfilesystemencoding()

def printb(s, file=sys.stdout):
    """
    printb(s)
    print a bytes object to stdout and flush
    """
    buf = file.buffer if hasattr(file, "buffer") else file

    buf.write(s)
    buf.write(b"\n")
    file.flush()

class ArgString(object):
    """
    ArgString(arg)
    encapsulate a system argument that can be easily coerced to a bytes()
    object, which is better for comparing to kernel or probe data (which should
    never be en/decode()'ed).
    """
    def __init__(self, arg):
        if sys.version_info[0] >= 3:
            self.s = arg
        else:
            self.s = arg.decode(FILESYSTEMENCODING)

    def __bytes__(self):
        return self.s.encode(FILESYSTEMENCODING)

    def __str__(self):
        return self.__bytes__()

def warn_with_traceback(message, category, filename, lineno, file=None, line=None):
    log = file if hasattr(file, "write") else sys.stderr
    traceback.print_stack(f=sys._getframe(2), file=log)
    log.write(warnings.formatwarning(message, category, filename, lineno, line))

# uncomment to get full tracebacks for invalid uses of python3+str in arguments
#warnings.showwarning = warn_with_traceback

_strict_bytes = False
def _assert_is_bytes(arg):
    if arg is None:
        return arg
    if _strict_bytes:
        assert type(arg) is bytes, "not a bytes object: %r" % arg
    elif type(arg) is not bytes:
        warnings.warn("not a bytes object: %r" % arg, DeprecationWarning, 2)
        return ArgString(arg).__bytes__()
    return arg
