-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
encoding/xml: proposed fixes for namespaces #14407
Comments
If 4 is implemented, it would be nice if the null value could be on the namespace or on the local element name. Eg. if I want to unmarshal either of the following two XMPP errors: <defined-condition xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/>
<invalid-xml xmlns='urn:ietf:params:xml:ns:xmpp-stanzas'/> I might do something like the following: `xml:"urn:ietf:params:xml:ns:xmpp-stanzas _"` This could also use the extra space formatting, but that makes things unreadable at a glance, so I definitely wouldn't recommend it. |
An interesting use case. Unfortunately, I think it risks creating confusion: (4) suggests using |
Ah, I think I was misunderstanding what 4 was proposing. I've filed #14433 as a separate issue instead. Sorry for the noise. |
I am working on this issue now as my company needs to have this functionality gap covered ASAP. I would like to send some patch soon (i.e. this week). |
Is there any patches proposed for this yet, or a branch in progress? |
|
#7113 has a proposed fix. This fix keeps the tracking of depth active even when no indent is required. The request to expose the namespace prefixes (i.e. the existing maps) of the tags is fairly simple but there is loss of information on the actual structure of the XML document, i.e. usage might be confusing. |
Any likely progress on this issue ? I am currently having to 'fix' namespaces/bindings in XML documents by passing them through libxml for namespace cleanup. This is less than ideal. |
I played on a fork of
It does not address all of the issues addressed above, but at least relieves my own pain in my particular use-case. |
For someone just looking a way to compose valid request for certain server, there is working workaround: #9519 (comment) |
Issue #13400 lists a number of issues related to namespace handling in encoding/xml. It's been noted that this area needs a bit of a rethink. This issue documents a set of proposed fixes to address the problems currently seen. I've grouped the current set of bugs into 7 separate topics to be addressed:
1. Lack of control over prefixes
Encoder currently provides no mechanism for the user to specify namespace/prefix bindings. In theory, this shouldn't matter: documents using different prefixes for the same namespaces are equivalent. In practice, people often do care.
For namespaced attributes, Encoder generates namespace prefixes based on the last part of the URI, and for elements, it redeclares the default namespace as required. This results in documents that are technically correct, but not what the user wants, and the generated prefixes may be cumbersome.
This raises the question of how much control we should give over prefixes.
XML allows quite a lot of complexity: prefixes can be rebound to different namespace URIs within a document, and the same namespace URI can simultaneously be bound to multiple prefixes. There's
a very old post on xml-dev that make a plea to produce "sane" documents, that is, ones where a given prefix is only ever used to refer to a single namespace URI, and a given namespace URI is only ever represented by a single prefix:
https://lists.xml.org/archives/xml-dev/200204/msg00170.html
I suggest that it is sufficient that we support the creation of "sane" documents, noting that we should allow a single URI to be represented by both a prefix and the default namespace within a document (this is effectively what Encoder does currently).
I think that the current approach of using the default namespace for elements, and generated prefixes for attributes is a good default: generated prefixes may be ugly, so we should use them only where needed (i.e. attributes), but should provide a mechanism for users to specify their own.
Proposed approach:
Notes/questions:
Issues addressed:
#11496: Serializing XML with namespace prefix
#9519: support for XML namespace prefixes
I think #9519 is based on a misunderstanding of how the current system works, but it seems likely that the user actually wants control over prefix names. I'm not sure if the reporter realises that a prefix is meaningless (and illegal) without a namespace URI binding.
2. Inability to access/set namespace declaration (handling QName values)
Namespace bindings are sometimes used by element and attribute values. For example:
In order to correctly understand "a:blort" you need to know the currently effective namespace bindings. The same problem exists in reverse when encoding: you need to make sure that necessary namespace declarations are in place in the document.
Proposed approach:
We need to allow Unmarshalers and Marshalers to obtain and insert namespace bindings respectively. This means:
Issues addressed:
#12406: support QName values / expose namespace bindings
3. Specifying namespaces in tags is cumbersome
Currently namespaces for elements may only be specified by including the full namespace URI, e.g.:
Aside from being verbose and repetitive, it means URIs can't be changed at runtime. It's not uncommon to want to use the same struct for different namespaces, for example, where version number in the namespace has changed, or as per #12624, to cope with documents using a subtlely
wrong namespace.
Proposed solution:
Given the mechanism in (1) to allow the user to specify namespaces/prefix mappings, it makes it possible for a struct to unambiguously use prefixes to reference namespaces. The obvious notation
is QName notation:
Under this proposal it would be an error to use a prefix that hadn't been explicitly specified for the user (i.e. it won't use prefixes picked up from the document when decoding). Users might be surprised that the above wouldn't match the following document unless they'd
explicitly set the prefix "nsv1" on the Decoder:
but doing so would be inherently fragile, as it wouldn't work with the entirely equivalent:
Notes:
This proposal changes the behaviour of Encoder/Decoder for tags with a colon in them, which it's possible that existing code relies on. On the other hand the current behaviour of such tags is clearly a source of confusion and bugs and doesn't work for Decoding anyway (see #11496)
Issues addressed:
#9775: Unmarshal does not properly handle NCName in XML namespaces
I think the bug as described is invalid: it's not clear what you'd expect to happen given that the namespace being used is undeclared.
#12624: brittle support for matching a namespace by identifier or url
The exact requirement behind this bug is not totally clear: it appears that the user wants unmarshaled elements that have one of a number of namespaces. I don't understand "Xmlns wasn't defined, but the namespace was used (ie. for mRSS with media namespace)" - that sounds like invalid
XML.
4. "No namespace" indistinguishable from "any namespace" in struct tags
When decoding, the tag
xml:"foo"
means element "foo" in any namespace. There's no way to say that you want foo in the null namespace. i.e.This is a problem if a namespaced sibling of the same localname also exists. #8535 demonstrates this quite clearly.
Proposed approach:
Introduce a way of explicitly referencing the null namespace, e.g.
We could go for the logical, but horribly subtle:
(note the space before foo)
Issues addressed:
#8535: failure to handle conflicting tags in different namespaces
#11724: namespaced and non-namespaced attributes conflict
5. Bug: default namespace not set to null on un-namespaced children
It's not currently possible to produce the following XML:
If you produce
<c>
with a tag of:No xmlns declaration will be added, so
<c>
will inherit the namespace of it's parent<a>
. This is related to issue (4): we don't currently distinguish between "any namespace" and "no namespace".I can see two possible solutions here:
xml:"c"
as meaning "no namespace" and insert xmlns="" as required to make that so.xml:"c"
as meaning "any namespace" and make it inherit the namespace of its parent. If you really want no namespace, usexml:"_ c"
(or whatever notation we settle on for (4))I can see arguments both ways.
Issues addressed
#7113: encoding/xml: missing nested namespace not handled
6. Bug: xmlns attributes not removed from Token (Decode/Encode not idempotent)
Decoder includes xmlns attributes in start element tokens. For example, an attribute of xmlns:foo="bar" would be included as an attribute with a name of {Space: "xmlns", Local: "foo"}. This is very dubious. xmlns attributes are special, but which ever way you look at it "xmlns" is not
a namespace URI - if anything, it's a prefix.
This creates problems if you feed the output of a Decoder into an Encoder, as it treats "xmlns" as a namespace URI, and introduces namespace declarations for it.
There's no good reason to include these attributes. It's reasonable to expose the current set of namespace bindings (see point 2), but the attributes themselves are not needed. If a user really wants to do their own namespace processing, they should use RawToken.
Proposed solution:
Issues addressed:
#7535 Encoder duplicates namespace tags
7. Specifying xmlns attributes manually: allow or disallow?
Should we allow users to manually insert xmlns:* or xmlns="..." attributes?
#8167: disallow attributes named xmlns:*
#11431: encoding/xml: loss of xmlns= in encoding since Go 1.4
I don't think we need to support this, given the mechanism introduced under (1) and (3) above. One of the reasons why you might want to do it, is because namespace URIs are otherwise hard-coded into struct tags. The solution to (3) gives us a mechanism to avoid this.
That said, I'm struggling to see why we couldn't treat this as a call to add a preferred prefix - although there's a question of whether it should force the creation of the xmlns declaration if it's already in
scope.
8. Other issues
#11735: empty namespace conventions are badly documented
Yes, this should be clearer.
#8068: encoding/xml: empty namespace prefix definitions should be illegal
It sounds like this should be resolved as invalid.
The text was updated successfully, but these errors were encountered: