Failing domain names #61
Comments
I'll start by saying that this is working as intended. Those domains are illegal IDNs. The few zones that have emoji Unicode characters encoded in the punycode are typically registries that perform no registry-side validation whatsoever, like Samoa (.ws), or legacy registrations from before the standards were finalized. Clients that support emoji do so because of transitional strategies for moving away from the old version of IDNA, but as software matures those domains will be supported less and less.
See https://www.icann.org/en/system/files/files/sac-095-en.pdf for recent discussion of the issue. The ICANN board also resolved late last year to direct policy making to ensure adherence to the latest version of IDNA (i.e. not allowing emoji): https://features.icann.org/ssac-advisory-use-emoji-domain-names
There may be an argument, for debugging reasons or otherwise, for skipping the validation component of IDNA (see #18 (comment) for related discussion). But it should certainly not be part of the default logic, as it opens clients to all sorts of security issues that IDNA is designed to prevent, such as the homograph attacks that were demonstrated with the earlier version.
Based on your feedback, the domains you identified, after removing duplicates, appear only in a small number of zones. Here are the zones with 5 or more domains in the list:
Without knowing the source or comprehensiveness of the list this doesn't strike me as widespread use. |
Hi everyone, I've run into a domain that is registered but fails to decode: xn--irland-jc1c.com
You can check whois or click the link. :) So this domain totally exists – is this a registry fail? What should I do if I have to deal with this? Since this is not about emoji, I wasn't sure whether to recycle this issue or open a new one? |
@hynek IDNA2008 disallows symbols and punctuation, and € is in the category "Sc" (Symbol, Currency). If you have to deal with it, you could use IDNA 2003:
>>> b"xn--irland-jc1c.com".decode("idna")
'ir€land.com'
It seems like a registry not following standards. :) |
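For anyone who has to handle such registered-but-invalid names, the suggestion above can be sketched as an opt-in fallback. This is only an illustration, not something the idna package provides: `to_ascii` and its `allow_idna2003` flag are hypothetical names, and the sketch assumes `idna.encode` and `idna.IDNAError` from the package plus Python's built-in `idna` codec, which implements IDNA 2003. The fallback is off by default because, as noted earlier in the thread, skipping IDNA2008 validation reopens the security issues it was designed to prevent.

```python
try:
    import idna  # third-party IDNA2008 package discussed in this thread
except ImportError:  # assumption: only the stdlib codec is available
    idna = None


def to_ascii(domain, allow_idna2003=False):
    """Encode a domain to ASCII, preferring IDNA2008 (hypothetical helper).

    Returns (encoded_bytes, standard_used). The IDNA 2003 fallback is
    opt-in and should not be enabled for untrusted input.
    """
    if idna is not None:
        try:
            return idna.encode(domain), "IDNA2008"
        except idna.IDNAError:
            if not allow_idna2003:
                raise
    elif not allow_idna2003:
        raise RuntimeError("idna package missing and IDNA 2003 fallback not allowed")
    # Python's built-in "idna" codec implements IDNA 2003 (RFC 3490).
    return domain.encode("idna"), "IDNA2003"


encoded, standard = to_ascii("ir€land.com", allow_idna2003=True)
print(encoded, standard)
```

Returning the standard that was actually used lets the caller log or reject names that only pass under the older rules.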
Is this related? I really expected this to work. I'm not trying to decode random, possibly-invalid punycode, but rather encode a domain that works with nearly all browsers.
I initially came across this while using Scrapy. scrapy/scrapy#4330 (comment) |
The
|
IDNA2008 has different, mostly stricter, rules from IDNA2003 about what characters are allowed in domain names. Emoji aren't allowed, so the code is doing what it is supposed to. |
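To see why a particular label is likely to be rejected, the Unicode general category of each character is a useful first check. The sketch below uses only the standard library's `unicodedata` module; `disallowed_symbols` is a hypothetical helper, and flagging non-ASCII characters in the Symbol (S*) and Punctuation (P*) categories is a rough approximation of the actual IDNA2008 rules, which are derived in RFC 5892 and include exceptions this does not model.

```python
import unicodedata


def disallowed_symbols(label):
    """List non-ASCII symbol or punctuation characters in a label.

    Rough approximation only: it does not implement the RFC 5892
    derivation that IDNA2008 actually uses.
    """
    return [
        (ch, unicodedata.category(ch))
        for ch in label
        if ord(ch) > 127 and unicodedata.category(ch)[0] in ("S", "P")
    ]


print(disallowed_symbols("ir€land"))     # currency symbol
print(disallowed_symbols("\U0001F4A9"))  # an emoji
print(disallowed_symbols("münchen"))     # letters only
```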
Confirming that |
Closing this issue. I'll keep issue #18 open to track potential changes relating to this. Please add any additional commentary there. |
I've been looking at IDNA domain registrations and using your library in conjunction with the built-in Python tools.
The IDNA package is a HUGE life saver for me. I monitored over 111,000 IDNA domains being registered, and a small percentage of them failed. I've attached the output that I thought you might find useful.
As you can see above, there is an uptick of registrations of emoji domains now. Although it is not part of the specification, it would be very helpful if that was incorporated into this package.
failed_output.txt