support .format for bytes #48232

benjaminp · 2008-09-27T15:50:41Z

BPO	3982
Nosy	@loewis, @warsaw, @brettcannon, @terryjreedy, @gpshead, @ncoghlan, @pitrou, @vstinner, @ericvsmith, @tiran, @benjaminp, @glyph, @ezio-melotti, @florentx, @vadmium, @serhiy-storchaka
Files	byte_format.py: Imitate str.format with bytes function

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2014-07-26.07:00:02.695>
created_at = <Date 2008-09-27.15:50:40.624>
labels = ['interpreter-core', 'type-feature']
title = 'support .format for bytes'
updated_at = <Date 2016-06-10.22:00:08.951>
user = 'https://github.com/benjaminp'

bugs.python.org fields:

activity = <Date 2016-06-10.22:00:08.951>
actor = 'ncoghlan'
assignee = 'none'
closed = True
closed_date = <Date 2014-07-26.07:00:02.695>
closer = 'ncoghlan'
components = ['Interpreter Core']
creation = <Date 2008-09-27.15:50:40.624>
creator = 'benjamin.peterson'
dependencies = []
files = ['32009']
hgrepos = []
issue_num = 3982
keywords = []
message_count = 95.0
messages = ['73931', '73935', '73936', '73937', '73938', '73939', '74019', '74021', '74022', '74050', '84121', '84123', '90421', '90423', '90425', '90428', '127210', '130215', '130253', '130284', '163369', '163379', '171791', '171795', '171796', '171799', '171800', '171801', '171803', '171804', '171806', '171815', '171816', '171821', '171824', '180414', '180415', '180416', '180419', '180420', '180423', '180426', '180427', '180430', '180431', '180432', '180433', '180436', '180437', '180439', '180441', '180442', '180445', '180446', '180447', '180448', '180449', '180452', '180453', '180454', '180466', '180489', '180490', '180491', '180492', '180493', '180500', '198112', '199181', '199199', '199203', '199204', '199206', '199207', '199251', '199253', '199254', '199258', '199260', '199264', '199265', '199266', '199267', '199268', '199270', '199271', '199432', '199438', '223976', '223979', '224022', '224023', '266568', '268157', '268160']
nosy_count = 26.0
nosy_names = ['loewis', 'barry', 'brett.cannon', 'terry.reedy', 'gregory.p.smith', 'exarkun', 'ncoghlan', 'pitrou', 'vstinner', 'eric.smith', 'christian.heimes', 'benjamin.peterson', 'glyph', 'ezio.melotti', 'durin42', 'Arfrever', 'arjennienhuis', 'flox', 'ecir.hana', 'uau', 'tshepang', 'underrun', 'martin.panter', 'serhiy.storchaka', '[email protected]', 'stendec']
pr_nums = []
priority = 'normal'
resolution = 'wont fix'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue3982'
versions = ['Python 3.5']

benjaminp · 2008-09-27T15:50:40Z

I'm just working on porting some networking code from 2.x to 3.x and it
heavily uses string formatting. Since bytes don't support any kind of
formatting, it's becoming tedious and inelegant to do it with "+". Can
.format be supported in bytes?

[I understand format is implemented with stringlib so shouldn't it be
fairly easy to implement?]

ericvsmith · 2008-09-27T17:33:52Z

Yes, it would be easy to add. Maybe bring this up on python-dev (or
python-3000) to get consensus?

Are we in feature freeze for 3.0?

benjaminp · 2008-09-27T17:35:10Z

On Sat, Sep 27, 2008 at 12:33 PM, Eric Smith <[email protected]> wrote:

Eric Smith <[email protected]> added the comment:

Yes, it would be easy to add. Maybe bring this up on python-dev (or
python-3000) to get consensus?

Yes, that will have to be done.

Are we in feature freeze for 3.0?

Unfortunately, yes.

loewis · 2008-09-27T17:35:37Z

I'm skeptical. What networking code specifically are you using, and what
specifically does it use string formatting for?

benjaminp · 2008-09-27T17:39:01Z

On Sat, Sep 27, 2008 at 12:35 PM, Martin v. Löwis
<[email protected]> wrote:

Martin v. Löwis <[email protected]> added the comment:

I'm skeptical. What networking code specifically are you using, and what
specifically does it use string formatting for?

I'm working on the tests for ftplib. [1] The dummy server uses string
formatting to build responses.

[1] https://svn.python.org/view/python/trunk/Lib/test/test_ftplib.py?view=markup

loewis · 2008-09-27T18:42:02Z

I'm working on the tests for ftplib. [1] The dummy server uses string
formatting to build responses.

I see. I propose to add a method push_string, defined as

  def push_string(self, s):
      self.push(s.encode("ascii"))

In FTP, the responses are, by definition, ASCII-encoded strings.
The proper way to generate them is to make a string, then encode it.
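
A minimal sketch of that suggestion, assuming a handler whose push method accepts only bytes. The class name DummyHandler and its sent list are hypothetical stand-ins for illustration, not the actual test_ftplib code:

```python
class DummyHandler:
    """Hypothetical stand-in for the dummy FTP handler in the test suite."""

    def __init__(self):
        self.sent = []  # collects the bytes that would go to the socket

    def push(self, data):
        # push() accepts bytes only, like a socket-level send.
        if not isinstance(data, bytes):
            raise TypeError("push() requires bytes")
        self.sent.append(data)

    def push_string(self, s):
        # Format as text first, then encode once at the boundary.
        self.push(s.encode("ascii"))


handler = DummyHandler()
handler.push_string("{code} {message}\r\n".format(code=220, message="welcome"))
```

With this shape the formatting stays in str and the single encode call lives in one place, which only works when every response is pure ASCII text.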

vstinner · 2008-09-29T10:22:20Z

I don't think that b'...'.format() is a good idea. Programmers will
continue to mix characters and bytes since the target of .format() is
characters.

ericvsmith · 2008-09-29T10:50:58Z

I don't think that b'...'.format() is a good idea. Programmers
will continue to mix characters and bytes since the target of
.format() is characters.

b''.format() would return bytes, not a string. This is also how it works
in 2.6.

I'm also not sold on implementing it, although it would be easy and I
can see a few uses for it. I think Martin's suggestion of encoding back
to ascii might be the best thing to do (that is, don't implement
b''.format()).

vstinner · 2008-09-29T10:56:38Z

I think Martin's suggestion of encoding back to ascii might be
the best thing to do

As I understand, you would like to use bytes as characters, like
b'{code} {message}'.format(code=100, message='OK'). So why not use an
explicit conversion to ASCII? ftp='{code} {message}'.format(code=100,
message='OK').encode('ASCII').

If you need to work on bytes, it means that you will use the full
range 0..255 whereas ASCII rejects bytes in 128..255.

loewis · 2008-09-29T21:33:09Z

> I think Martin's suggesting of encoding back to ascii might be
> the best thing to do

As I understand, you would like to use bytes as characters, like
b'{code} {message}'.format(code=100, message='OK'). So why not use an
explicit conversion to ASCII? ftp='{code} {message}'.format(code=100,
message='OK').encode('ASCII').

That's indeed exactly what I had proposed - only that you shouldn't
repeat the .encode('ascii') all over the place, but instead wrap that
into a function (which I proposed to call push_string, along with the
existing .push function).

vstinner · 2009-03-24T23:28:28Z

loewis> That's indeed exactly what I had proposed
loewis> - only that you shouldn't repeat the .encode('ascii')
loewis> all over the place, (...)

If you can only use bytes 0..127, it cannot be used for binary protocols
and so I don't think that it's really useful. If your protocol is
ASCII text, use explicit conversion to ASCII.

I'm also not a fan of functions having different result types
(format->bytes or str, it depends...).

ericvsmith · 2009-03-24T23:37:06Z

I'm also not a fan of functions having different result types
(format->bytes or str, it depends...).

In 3.x, str.format() and bytes.format() would be two different methods
on two different objects. I don't think there's any expectation that
they have the same return type. There's no such expectation for
str.strip() and bytes.strip() either.

Similarly, in 2.6, str.format() has a different return type than
unicode.format().

Now the builtin format() function is another issue. In 2.6 the return
type does depend on the types of the arguments. In 3.x, I'd suggest
leaving it as unicode and you won't be allowed to pass in bytes.

arjennienhuis · 2009-07-11T13:54:38Z

There are many binary formats that use ASCII numbers.

'HTTP chunking' uses ASCII mixed with binary (octets).

With 2.6 you could write:

def chunk(block):
    return b'{0:x}\r\n{1}\r\n'.format(len(block), block)

With 3.0 you'd have to write this:

def chunk(block):
    return format(len(block), 'x').encode('ascii') + b'\r\n' + block + b'\r\n'

You cannot convert to ascii at the end of the pipeline as there are
bytes > 127 in the data blocks.

loewis · 2009-07-11T15:52:16Z

def chunk(block):
    return format(len(block), 'x').encode('ascii') + b'\r\n' + block + b'\r\n'

You cannot convert to ascii at the end of the pipeline as there are
bytes > 127 in the data blocks.

I wouldn't write it in such a complicated way. Instead, use

def chunk(block):
   return hex(len(block)).encode('ascii') + b'\r\n' + block + b'\r\n'

This doesn't need any format call, and describes adequately how the
protocol works: send an ASCII-encoded hex length, send CRLF, send
the block, then send another CRLF. Of course, I would probably write
that into the socket right away, rather than copying it into a different
bytes object first.

arjennienhuis · 2009-07-11T16:28:44Z

def chunk(block):
    return hex(len(block)).encode('ascii') + b'\r\n' + block + b'\r\n'

hex(10) returns '0xa' instead of 'a'.

This doesn't need any format call, and describes adequately how the
protocol works: send an ASCII-encoded hex length, send CRLF, send
the block, then send another CRLF. Of course, I would probably write
that into the socket right away, rather than copying it into a different
bytes object first.

The point is that you need to convert to ascii for each int that you send.
You cannot just wrap the socket with an encoding. This makes porting
difficult.

loewis · 2009-07-11T16:47:34Z

hex(10) returns '0xa' instead of 'a'.

Ah, right. So I would still use

'{0:x}'.format(100).encode("ascii")

rather than the builtin format function. Actually, I would
probably use

('%x' % len(bytes)).encode("ascii")

The point is that you need to convert to ascii for each int that you send.
You cannot just wrap the socket with an encoding. This makes porting
difficult.

This I don't understand. What porting becomes more difficult?
From 2.x to 3.x? Why do you have any .format calls in your code that you
want to port - .format was only added in 2.6, so if you want to support
2.x, you surely are not using .format, are you?
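
The exchange above can be sketched as a runnable Python 3 function. This is only an illustration of the approach discussed, encoding just the length and leaving the payload untouched; the sample payload is made up:

```python
def chunk(block):
    # HTTP chunked encoding: ASCII hex length, CRLF, raw payload, CRLF.
    # Only the length is encoded; the payload may hold bytes above 127.
    size = ("%x" % len(block)).encode("ascii")
    return size + b"\r\n" + block + b"\r\n"


payload = bytes(range(246, 256))  # ten bytes, all above 127
encoded = chunk(payload)

# hex() keeps the "0x" prefix, which is why it is not used for the length.
prefixed = hex(len(payload))
plain = format(len(payload), "x")
```

Encoding the finished message as a whole would fail here, because the payload bytes are not ASCII text.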

uau · 2011-01-27T18:54:02Z

This kind of formatting is needed quite often when working on network protocols or file formats, and I think the replies here fail to address important issues. In general you can't encode after formatting, as that doesn't work with binary data, and often it's not appropriate for the low-level routines doing the formatting to know what charset the data is in even if it is text (so it should be fed in already encoded as bytes). The replies from Martin v. Löwis seem to argue that you could use methods other than formatting; that would work almost as well as an argument to remove formatting support from text strings, and IMO cases where formatting is the best option are common.

Here's an example (based on real use but simplified):

template = b"""
stuff here
header1: {}
header2: {}
more stuff
"""

def lowlevel_send(s, b1, b2):  # s socket, b1 and b2 bytes
    s.send(template.format(b1, b2))

To clarify the requirements a bit, the issue is not so much about having a .format method on byte string objects (that's just the most natural-looking way of solving it); the core requirement is to have a formatting operator that can take byte strings as *arguments* and produce byte string *output* where the arguments can be placed unchanged.

terryjreedy · 2011-03-07T00:47:09Z

For future reference, struct.pack, not mentioned here, is a binary bytes formatting function. It can mix ascii bytes with binary octets. It works the same in Python 2 and 3.

Str.format does two things: convert objects to strings according to the contents of field specifiers; interpolate the resulting strings into a template string according to the locations of the field specifiers. If desired bytes represent encoded text, then encoding computed text is the obvious Py3 solution.

For some mixed ascii-binary uses, struct.pack is not as elegant as a bytes.format might be. But I think such a method should use struct format codes within field specifiers to convert objects into binary bytes rather than text.

arjennienhuis · 2011-03-07T12:34:55Z

struct.pack does not work with variable length data. Something like:

b'{0:x}\r\n{1}\r\n'.format(len(block), block)

or

b'%x\r\n%s\r\n' % (len(block), block)

is not possible with struct.pack

terryjreedy · 2011-03-07T19:09:19Z

You are right, I misinterpreted the meaning of 's' without a count (and opened bpo-11436 to clarify). However, for the fairly common case where a variable-length binary block is preceded by a 4 byte *binary* count, one can do something which is not too bad:

>>> block = b'lsfjdlksaj'
>>> n=len(block)
>>> struct.pack('I%ds'%n, n, block)
b'\n\x00\x00\x00lsfjdlksaj'

If leading blanks are acceptable for your example with count as ascii hex digits, one can do something that I admit is worse:

>>> struct.pack('10s%ds2s'%n, ('%8x\r\n'%n).encode(), block, b'\r\n')
b'       a\r\nlsfjdlksaj\r\n'

Of course, for either of these in isolation, I would probably only use .pack for the binary conversion and otherwise use '+' or b''.join(...).
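
A minimal sketch of that last point, using struct.pack only for the fixed-size binary count and plain concatenation for the variable-length block. The function name and the explicit little-endian "<I" layout are assumptions chosen for the example:

```python
import struct


def length_prefixed(block):
    # struct.pack handles only the fixed-size part: "<I" is a 4-byte
    # little-endian unsigned int. The block itself is appended as-is.
    return b"".join([struct.pack("<I", len(block)), block])


framed = length_prefixed(b"lsfjdlksaj")
```

Pinning the byte order with "<" makes the result the same on every platform, whereas a bare 'I' uses the native order and alignment.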

uau · 2012-06-21T21:21:05Z

I've hit this limitation a couple more times, and none of the proposed workarounds are adequate. Working with protocols and file formats that use human-readable markup is significantly clumsier than it was with Python 2 (using either the % operator, which also lost its support for byte strings in Python 3, or .format()).

This bug report was closed by its original creator, after early posts where IMO nobody made as good a case for the feature as they could have. Is it possible to reopen this bug or is it necessary to file a new one?

Is there any clear argument AGAINST having .format() for bytes, other than work needed to implement it? Some posts mention "mixing characters and bytes", but I see no reason why this would be much of a real practical concern if it's a method on bytes objects producing bytes output.

terryjreedy · 2012-06-21T23:41:28Z

If you want to discuss this issue further, I think you should post to the python-ideas list with concrete examples.

exarkun · 2012-10-02T12:05:43Z

Since Benjamin originally requested this feature, and then decided that he could accomplish his desired goal (ftplib porting, as far as I can tell) without it, I think that the "rejected" status is actually incorrect. I think that Benjamin just wanted to indicate that he no longer needed the feature. This doesn't mean that no one else will need the feature, and as it turns out the comments seem to reveal that other people do need the feature (also, I need the feature).

So, adjusting the ticket metadata to reflect that this is a valid feature request just waiting for someone to implement it, not a rejected idea that is not welcome in Python.

tiran · 2012-10-02T12:40:11Z

The proposal sounds like a good idea to me.

Benjamin, what needs to be done to implement the feature?

serhiy-storchaka · 2012-10-02T13:08:50Z

Formatting is a very complicated part of Python (especially after Victor's optimizations). I think no one wants to maintain this code for a long time. The price of maintaining exceeds the potential very limited benefits from the use.

ericvsmith · 2012-10-02T13:16:13Z

I was just logging in to make this point, but Serhiy beat me to it. When I wrote several years ago that this was "easy", it was before the (awesome) PEP-393 work. I suspect, but have not verified, that having a bytes version of this code would now require an implementation that shared very little with the str version.

So I think Martin's advice to just encode to ascii is the best course of action.

pitrou · 2013-10-08T08:53:47Z

I'd like to put a nudge towards supporting the __mod__ interface on bytes -
for Mercurial this is the single biggest impediment to even getting our
testrunner working, much less starting the porting process.

Given a spec hasn't been written (bytes.__mod__ can't support the same things as str.__mod__), and nobody seems to step up to write it, I'd say this is unlikely to appear in 3.4.

durin42 · 2013-10-08T12:55:55Z

Is there any chance we could just have it work for bytes, ints, and floats? That'd solve the immediate need, and it'd be obviously correct how to have those behave.

Punting this to 3.5 basically means we'll have to either wait for 3.5, or do something awful like use cffi to grab sprintf to port Mercurial.

ericvsmith · 2013-10-08T13:35:50Z

If you could write up a concrete proposal, including which format specifiers would be supported, that would be helpful.

Would it be extensible with something like __bformat__?

There's really quite a bit of work to be done to specify how this would work.

ericvsmith · 2013-10-08T13:38:09Z

Also, with the PEP-393 changes, the implementation will be much more difficult. Sharing code with str (unicode) will likely be impossible, or require much refactoring of the existing code.

pitrou · 2013-10-08T15:08:37Z

Is there any chance we could just have it work for bytes, ints, and
floats? That'd solve the immediate need, and it'd be obviously
correct how to have those behave.

You mean "%s" and "%d"?

Punting this to 3.5 basically means we'll have to either wait for
3.5, or do something awful like use cffi to grab sprintf to port
Mercurial.

Or write a pure Python implementation.

durin42 · 2013-10-08T15:10:00Z

On Tue, Oct 8, 2013 at 11:08 AM, Antoine Pitrou <[email protected]>wrote:

> Is there any chance we could just have it work for bytes, ints, and
> floats? That'd solve the immediate need, and it'd be obviously
> correct how to have those behave.

You mean "%s" and "%d"?

Basically, yes.

> Punting this to 3.5 basically means we'll have to either wait for
> 3.5, or do something awful like use cffi to grab sprintf to port
> Mercurial.

Or write a pure Python implementation.

Hah. Probably too slow for anything beyond a proof of concept, no?

glyph · 2013-10-08T21:10:13Z

On Oct 8, 2013, at 8:10 AM, Augie Fackler <[email protected]> wrote:

Hah. Probably too slow for anything beyond a proof of concept, no?

It should perform acceptably on PyPy ;-).

pitrou · 2013-10-08T21:11:40Z

> > Punting this to 3.5 basically means we'll have to either wait for
> > 3.5, or do something awful like use cffi to grab sprintf to port
> > Mercurial.
>
> Or write a pure Python implementation.

Hah. Probably too slow for anything beyond a proof of concept, no?

If it's only for the Mercurial test suite, that shouldn't be a problem?

durin42 · 2013-10-08T21:17:11Z

On Tue, Oct 8, 2013 at 5:11 PM, Antoine Pitrou <[email protected]>wrote:

Antoine Pitrou added the comment:

> > > Punting this to 3.5 basically means we'll have to either wait for
> > > 3.5, or do something awful like use cffi to grab sprintf to port
> > > Mercurial.
> >
> > Or write a pure Python implementation.
>
> Hah. Probably too slow for anything beyond a proof of concept, no?

If it's only for the Mercurial test suite, that shouldn't be a problem?

It's not just the testsuite though: we do this _all over_ hg itself. For
example, status needs to do something like this:

sys.stdout.write('%(state)s %(path)s\n' % {'state': 'M', 'path':
'some/filesystem/path'})

except we don't know the encoding of the filesystem path (Hi unix!) so we
have to treat the whole thing as opaque bytes. It's even more fun for
'log', because then it's got localized strings in it as well.

vstinner · 2013-10-08T21:24:53Z

2013/10/8 Augie Fackler <[email protected]>:

sys.stdout.write('%(state)s %(path)s\n' % {'state': 'M', 'path':
'some/filesystem/path'})

except we don't know the encoding of the filesystem path (Hi unix!) so we
have to treat the whole thing as opaque bytes.

You are doing it wrong. In Python 3, you "should" store filenames as
Unicode (str type). If Python fails to decode a filename, undecodable
bytes are stored as surrogate characters (see the PEP-383).

The Unicode type became natural in Python 3, as byte string (old "str"
type) was natural in Python 2.

sys.stdout.write() expects a Unicode string, not a byte string.

Does it mean that Mercurial is moving to Python 3? Cool :-)

ericvsmith · 2013-10-08T21:35:39Z

I've lost track what we were talking about. I thought we were trying to support b'<something>'.format() in 3.4, for a restricted set of arguments.

I don't see how a third-party package is going to help, if the goal is to allow 3.4 to be source compatible with 2.7. And the recent example uses %-formatting, which is not the subject of this ticket.

What proposal is actually on the table here?

glyph · 2013-10-08T22:19:14Z

On Oct 8, 2013, at 2:35 PM, Eric V. Smith wrote:

What proposal is actually on the table here?

Sorry Eric, you're right, there is too much discussion here. This issue ought to be about .format, like the title says. There should be a separate ticket for %-formatting, since it seems to be an almost wholly unrelated task. While I'm sympathetic to Mercurial's issues, they're somewhat different from Twisted's, in that we're willing to adopt the "one new way" to do things in order to achieve compatibility whereas that would be too hard for Mercurial.

durin42 · 2013-10-08T22:19:42Z

On Oct 8, 2013, at 5:24 PM, STINNER Victor <[email protected]> wrote:

STINNER Victor added the comment:

2013/10/8 Augie Fackler <[email protected]>:
> sys.stdout.write('%(state)s %(path)s\n' % {'state': 'M', 'path':
> 'some/filesystem/path'})
>
> except we don't know the encoding of the filesystem path (Hi unix!) so we
> have to treat the whole thing as opaque bytes.

You are doing it wrong. In Python 3, you "should" store filenames as
Unicode (str type). If Python fails to decode a filename, undecodable
bytes are stored as surrogate characters (see the PEP-383).

No, I'm not. In Mercurial, all end-user data is OPAQUE BYTES, and must remain that way. We're not able to change either our on-disk data format OR our stdout format, even to support a newer version of Python. I don't know the encoding of the filename's bytes, but I _must_ faithfully reproduce them exactly as they are or I'll break tools like make(1) and patch(1). Similarly, if a file goes from ISO-8859-1 to UTF-8, I have to emit a diff that has some ISO bytes and some UTF bytes - it's not in *any* valid encoding. Changing that is a showstopper regression.

The Unicode type became natural in Python 3, as byte string (old "str"
type) was natural in Python 2.

sys.stdout.write() expects a Unicode string, not a byte string.

Ouch. Is there any way to write things to stderr and stdout without decoding and hopelessly breaking user data?

Does it mean that Mercurial is moving to Python 3? Cool :-)

Not likely, honestly. I tackle this when I've got some spare cycles and my ability to handle pain is high. As it stands, I have the test-runner barely working, but it's making wrong assumptions to get there. The best estimate is that it's a year of work to upgrade to Python 3.


durin42 · 2013-10-08T22:20:51Z

On Oct 8, 2013, at 6:19 PM, Glyph Lefkowitz <[email protected]> wrote:

Glyph Lefkowitz added the comment:

On Oct 8, 2013, at 2:35 PM, Eric V. Smith wrote:

> What proposal is actually on the table here?

Sorry Eric, you're right, there is too much discussion here. This issue ought to be about .format, like the title says. There should be a separate ticket for %-formatting, since it seems to be an almost wholly unrelated task. While I'm sympathetic to Mercurial's issues, they're somewhat different from Twisted's, in that we're willing to adopt the "one new way" to do things in order to achieve compatibility whereas that would be too hard for Mercurial.

Yeah, my bad too. I suppose I should add a new bug for %-formatting on bytes objects?

Note that for hg, we can't drop Python 2.6 or so (we'll only drop *2.4* if we can do 2.6 and some 3.x from a single source tree) for a while, due to supporting the system interpreter on a variety of LTS platforms.

terryjreedy · 2013-10-08T22:28:02Z

Augie, to understand what Victor meant, I suggest reading
https://www.python.org/dev/peps/pep-0383/
One point of the pep is round-trip filenames without loss on all systems, which is just what you say you need.
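
A short sketch of the round trip that PEP 383 describes, assuming the same codec is used for decoding and encoding. The sample bytes are made up and deliberately not valid UTF-8:

```python
raw = b"caf\xe9/\xff\xfe.txt"  # not valid UTF-8

# Undecodable bytes become lone surrogates U+DC80..U+DCFF.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same codec and handler restores the exact bytes.
restored = name.encode("utf-8", "surrogateescape")

# Text formatting can happen in between without disturbing them.
line = "M {}\n".format(name).encode("utf-8", "surrogateescape")
```

The guarantee depends on decoding and encoding with one ASCII-compatible codec; it says nothing about data whose encoding changes between input and output.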

durin42 · 2013-10-08T22:31:18Z

On Oct 8, 2013, at 6:28 PM, "Terry J. Reedy" <[email protected]> wrote:

https://www.python.org/dev/peps/pep-0383/
One point of the pep is round-trip filenames without loss on all systems, which is just what you say you need.

At a quick skim, likely not good enough, because https://en.wikipedia.org/wiki/Shift_JIS isn't completely ASCII-compatible, and we've got a fair number of users on weird Shift-JIS using platforms.

glyph · 2013-10-08T22:45:40Z

On Oct 8, 2013, at 3:19 PM, Augie Fackler wrote:

No, I'm not. In Mercurial, all end-user data is OPAQUE BYTES, and must remain that way.

The PEP-383 technique for handling file names is completely capable of round-tripping exact bytes, given one encoding for both input and output. You can still handle file names this way internally in Mercurial and not risk disturbing any observable output. You do not need to change that in order to do what Victor suggests.

We should get together in some other forum and discuss file-name handling though, since you can't actually round-trip "opaque bytes" through a *filesystem* and not disturb your output.

Ouch. Is there any way to write things to stderr and stdout without decoding and hopelessly breaking user data?

You can use sys.stdout.buffer.write.

terryjreedy · 2013-10-09T00:13:58Z

Here is a proof of concept Python function, with a minimal test. It is similar to how str.format could be coded in Python, with re.split and ''.join, except that it does not allow anything before : in the format specification. By default (no format spec given), it copies bytes objects without change. If a format specification *is* given, it does not restrict the object, as this code simply uses builtin format sandwiched between decode and encode.
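
The attached file is not reproduced in this thread. The following is a hypothetical sketch of a function matching the description above, not the attached byte_format.py; the names bformat and _FIELD are invented, and it assumes only positional "{}" and "{:spec}" fields with no "{{" escapes:

```python
import re

# Matches "{}" or "{:spec}"; nothing is allowed before the colon.
_FIELD = re.compile(rb"\{(?::([^{}]*))?\}")


def bformat(template, args):
    """Hypothetical bytes analogue of str.format for positional fields."""
    values = iter(args)
    parts = []
    pos = 0
    for match in _FIELD.finditer(template):
        parts.append(template[pos:match.start()])  # literal bytes
        value = next(values)
        spec = match.group(1)
        if not spec and isinstance(value, bytes):
            parts.append(value)  # no spec: copy bytes unchanged
        else:
            # Round-trip through text: decode the spec, format, encode.
            text = format(value, (spec or b"").decode("ascii"))
            parts.append(text.encode("ascii"))
        pos = match.end()
    parts.append(template[pos:])
    return b"".join(parts)


result = bformat(b"{:x}\r\n{}\r\n", [10, b"\xff\x00"])
```

Because the formatted branch goes through the text-based format() builtin, it is only suitable for values such as ints whose formatted form is ASCII.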

ezio-melotti · 2013-10-11T01:18:51Z

You can use sys.stdout.buffer.write.

Note that there's no guarantee that sys.stdout.buffer exists, e.g. if sys.stdout has been replaced with a StringIO.
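
One way to cope with both cases is sketched below. The helper name write_bytes is hypothetical, and the fallback of decoding with surrogateescape is an assumption about what a caller might want when no binary layer exists:

```python
import io


def write_bytes(stream, data):
    buffer = getattr(stream, "buffer", None)
    if buffer is not None:
        stream.flush()  # keep text already written in order
        buffer.write(data)
    else:
        # No binary layer (for example io.StringIO): decode losslessly.
        stream.write(data.decode("utf-8", "surrogateescape"))


binary = io.BytesIO()
wrapped = io.TextIOWrapper(binary, encoding="utf-8")
write_bytes(wrapped, b"M some/\xff/path\n")

text_only = io.StringIO()
write_bytes(text_only, b"M some/\xff/path\n")
```

In real use the stream argument would be sys.stdout; the in-memory streams here just make both branches easy to observe.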

glyph · 2013-10-11T02:01:25Z

Tempting as it is to reply to the comment about 'buffer' not existing, we're way off topic here. Let's please keep further comments on this bug to issues about a 'format' method on the 'bytes' object.

underrun · 2014-07-25T17:31:11Z

First off, +1 for this feature. It's not just for twisted, but anyone doing anything with binary data (storage, compression, encryption and networking for me) with python since 2.6 will very likely have been using .format for building messages. I know I have and obviously others have been doing so as well.

The advantages of .format to me are:

compatible with 2.6 (porting and single code base support easier)
ease of composition (the format language makes it easy to build complex data structures out of bytes)
readability (named fields make complex formats obvious)
consistency (manipulating a block of bytes or characters can be done in a similar way)

Specific comments on the patch supplied by terry.reedy:

it doesn't support named fields
it doesn't handle padding
it doesn't handle nested formats (like '{0:{1}>{2}}'.format(data,pad_char,pad_width))
formatting byte strings with a width embeds the repr of the byte string ( bf(b'{:>10}', [b'test']) == b" b'test'" )

Really this isn't a good way to solve the problem.

Has a PEP been created for this? If not how can I help make that happen?

Including this in 3.5 would be so helpful for us low level systems programmers out there who have lots of code using .format for binary interfaces in python 2.6/2.7 already.

Also, not to add to derailment, but if we're adding a .format for python3 bytes it would be great if .format could pad with the null byte ('\0') which it currently converts to spaces internally (which is strange). Since this unexpected conversion is bad (so padding with null doesn't happen in python2) it's more like a bug fix... actually - maybe that's a separate bug to file on the current .format for text...

underrun · 2014-07-25T17:42:30Z

sorry, terry's patch does handle padding - just with the caveats i listed later. i should have removed that bullet.

terryjreedy · 2014-07-26T06:51:41Z

https://legacy.python.org/dev/peps/pep-0461/
adds % formatting for bytes and bytearray.

Nick, I have the impression that there was a decision to not add bytes.format. Correct? If so, this issue should be closed. If not, what, if anything, has been decided?

ncoghlan · 2014-07-26T07:00:02Z

Right, bytes.format was considered as part of the PEP-461 discussions, and rejected as an operation that only made sense in the text domain: https://www.python.org/dev/peps/pep-0461/#proposed-variations

With PEP-461 accepted, and PEP-460 withdrawn, that means we won't be adding bytes.format and bytearray.format.

bpo-20284 covers the implementation of PEP-461.
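
For reference, a short sketch of what PEP-461 formatting looks like, assuming Python 3.5 or later. The sample values reuse the chunking and status examples from earlier in the thread:

```python
def chunk(block):
    # %x formats the int as ASCII hex; %b (or %s) inserts bytes unchanged.
    return b"%x\r\n%b\r\n" % (len(block), block)


framed = chunk(b"\xff" * 10)

# Mapping keys must themselves be bytes when the template is bytes.
status = b"%(state)s %(path)s\n" % {b"state": b"M", b"path": b"some/\xff/path"}
```

Passing a str where bytes are expected raises TypeError, so text has to be encoded explicitly before interpolation.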

gpshead · 2016-05-28T18:33:11Z

This came up in the language summit today when discussing twisted. .format() is still not supported on bytes though % is in 3.5.

realistically it sounded like twisted needs to support python 3.4 for many years so they can't rely on bytes having a .format() method that also works on 2.7 anyways... but assuming .format() is only useful for text may still have been an oversight. (i'll have to go re-read PEP-460 and 461 and discussion before commenting further)

underrun · 2016-06-10T21:02:58Z

Gregory - I'm glad that you're willing to consider this again. It still is a constant issue for me, and .format with variable width fields in binary protocols is so the right tool for the job. If there is anything I can do to help get this added to 3.6 let me know. The forward/backward compatibility issue is secondary to me to the flexibility gained from having .format available for bytes.

Also padding with null bytes that don't get converted would be awesome.

ncoghlan · 2016-06-10T22:00:08Z

The core problem with the idea of adding bytes.format to Python 3 is that the real power of str.format actually lies in the extensible __format__ protocol and the associated format() builtin, as those rely heavily on text-specific assumptions.

I interpreted Amber's comments at the language summit as referring more to our changing tune regarding mod formatting from:

mod formatting is deprecated, use brace formatting instead; to
they're both fully supported, neither is deprecated; to
use brace formatting for text data, mod formatting for binary data

Folks that followed our original "stop using mod formatting" guidance thus needed to change course when it became our recommended technique for formatting binary data.

Since we now know format() and __format__ aren't suitable for binary data (PEP-461 originally included it, and it got dropped as we kept finding awkward corner cases), that means any new binary formatting proposal needs to explain:

how it compares to existing serialisation techniques (mod-formatting, the struct module, text-formatting+encoding, etc)
why it needs to be a builtin method or function rather than a new serialisation module

sosi-deadeye · 2024-01-16T10:26:04Z

I could have sworn that bytes.format had been implemented. When I needed it once, I came to the realization that this method never existed in Python 3.0, but it did in Python 2.7.

I also remember that bytes.format triggered an error if the input was of data type str.

Who else has this false memory?
Is this the Mandela effect?

benjaminp added the interpreter-core (Objects, Python, Grammar, and Parser dirs) and type-feature (A feature request or enhancement) labels Sep 27, 2008

benjaminp closed this as completed Feb 19, 2010

exarkun mannequin reopened this Oct 2, 2012

ncoghlan closed this as completed Jul 26, 2014

ezio-melotti transferred this issue from another repository Apr 10, 2022
