Add a "no UTF-8 stripping URL" option #4032

Open
marcanuy opened this issue Oct 31, 2017 · 7 comments

marcanuy commented Oct 31, 2017

I am working with Chinese content (using UTF-8). Most of the time Hugo generates the right URL, but sometimes it strips certain Chinese characters from the URL.

Some examples of these characters are:

When generating a page for each character, e.g. example.com/post/〇, it generates empty paths such as example.com/post//.

Steps

To reproduce the bug, add

slug: "foo〇○〡〤〢⺮〣21三bar"

to the front matter of any page. Hugo will generate the following stripped path:

http://localhost:1313/post/foo21三bar/

removing 〇○〡〤〢⺮〣.
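A minimal sketch of why these particular characters vanish, assuming the sanitizer keeps only runes that Go's `unicode` package classifies as letters, digits, or marks, plus a few ASCII separators. The names `keepRune` and `sanitize` and the exact separator set are assumptions for illustration, not Hugo's actual code. Characters such as 〇 and 〡 are in the "letter number" category (Nl) and ○ and ⺮ are symbols (So), so none of them pass `unicode.IsLetter` or `unicode.IsDigit`, while 三 is an ordinary letter (Lo) and survives.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// keepRune is a hypothetical filter that approximates a "unicode sanitize"
// step: letters, decimal digits, combining marks, and a few ASCII
// separators are kept; everything else is dropped.
func keepRune(r rune) bool {
	return unicode.IsLetter(r) ||
		unicode.IsDigit(r) ||
		unicode.IsMark(r) ||
		strings.ContainsRune("./\\_-#+~", r)
}

// sanitize removes every rune that keepRune rejects.
func sanitize(s string) string {
	var b strings.Builder
	for _, r := range s {
		if keepRune(r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	// Same slug as in the reproduction steps.
	fmt.Println(sanitize("foo〇○〡〤〢⺮〣21三bar")) // prints "foo21三bar"
	// A slug made only of such characters becomes empty, hence "/post//".
	fmt.Printf("%q\n", sanitize("〇")) // prints ""
}
```

Under these assumptions the output matches the stripped path reported above, and a one-character slug like 〇 collapses to an empty path segment.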

*Tested with the latest Hugo release: Hugo Static Site Generator v0.30.2 linux/amd64 BuildDate: 2017-10-19T08:34:27-03:00. OS: 4.10.0-37-generic #41-Ubuntu SMP Fri Oct 6 20:20:37 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux Ubuntu 17.04*

(x-post: stackoverflow.com, forum)

bep reopened this Oct 31, 2017
Member

bep commented Oct 31, 2017

@marcanuy I'm reopening this. I quoted you a part of the comment describing the current behaviour. I'm sure the original motivation for this "unicode sanitize" was good and founded in file system support or something (that function precedes my time on Hugo).

So we cannot just change that behaviour; that would break lots of sites. But we could consider adding some "no URL sanitize whatsoever" option.

bep changed the title from "Hugo stripping unicode characters from page's slug" to "Add a "no UTF-8 stripping URL" option" Oct 31, 2017
bep added this to the v0.32 milestone Oct 31, 2017
Author

marcanuy commented Nov 1, 2017

Great, a configuration flag to avoid it would be really helpful, especially for SEO purposes.

bep modified the milestones: v0.32, v0.33 Dec 16, 2017
Contributor

biodranik commented Dec 23, 2017

Am I right that this issue is about the same IRI/IRL (Internationalized Resource Identifier/Locator) support as this forum topic https://discourse.gohugo.io/t/bug-feature-hugo-wrong-support-non-acsii-symbols-in-url/8375 and closed issue #3039?

It would be great to avoid converting valid UTF-8 IRIs into percent-encoded URIs, for at least two reasons:

  1. User-friendly links in non-ASCII, non-English or multilingual sites (though this also depends on the browser).
  2. Readable diffs of generated HTML files in git commits, and easier review, support and debugging of code changes. It's impossible to visually decode and understand links in HTML like this: href="/%D0%BA%D0%BE%D0%BD%D1%82%D0%B0%D0%BA%D1%82%D1%8B/"

And probably better SEO too.

A simple option like EnableIRI (false by default) would be great!
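To illustrate the readability point, here is a small sketch using only Go's standard `net/url` package to convert between the IRI form and the percent-encoded form of a path. The helper names `encodePath` and `decodePath` are made up for this example, and nothing here reflects how Hugo itself would implement an option like the one proposed.

```go
package main

import (
	"fmt"
	"net/url"
)

// encodePath percent-encodes the non-ASCII bytes of a path while
// leaving the "/" separators alone.
func encodePath(p string) string {
	u := url.URL{Path: p}
	return u.EscapedPath()
}

// decodePath turns a percent-encoded path back into readable UTF-8.
func decodePath(p string) (string, error) {
	return url.PathUnescape(p)
}

func main() {
	encoded := encodePath("/контакты/")
	fmt.Println(encoded) // prints "/%D0%BA%D0%BE%D0%BD%D1%82%D0%B0%D0%BA%D1%82%D1%8B/"

	decoded, err := decodePath(encoded)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(decoded) // prints "/контакты/"
}
```

Both strings identify the same resource; the difference is only in which form ends up written into the generated HTML.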

bep modified the milestones: v0.33, v0.34 Jan 11, 2018
bep modified the milestones: v0.34, v0.35, v0.36 Jan 22, 2018
bep modified the milestones: v0.36, v0.37 Feb 3, 2018
bep modified the milestones: v0.37, v0.38 Feb 11, 2018
bep modified the milestones: v0.38, v0.39 Feb 21, 2018

rinetd commented Mar 15, 2018

Looking forward to an option for the unicode sanitize behaviour,
or add a urldecode function that transforms %e5%a5%bd back to its decoded character.

bep modified the milestones: v0.39, v0.40 Apr 9, 2018
bep modified the milestones: v0.40, v0.41 Apr 20, 2018
bep modified the milestones: v0.41, v0.42 May 4, 2018
bep removed this from the v0.42 milestone Jun 5, 2018
bep modified the milestones: v0.116.0, v0.117.0 Aug 1, 2023
bep modified the milestones: v0.117.0, v0.118.0 Aug 30, 2023
bep modified the milestones: v0.118.0, v0.119.0 Sep 15, 2023
bep modified the milestones: v0.119.0, v0.120.0 Oct 4, 2023
bep modified the milestones: v0.120.0, v0.121.0 Oct 31, 2023
bep modified the milestones: v0.121.0, v0.122.0 Dec 6, 2023
bep modified the milestones: v0.122.0, v0.123.0, v0.124.0 Jan 27, 2024
bep modified the milestones: v0.124.0, v0.125.0 Mar 4, 2024
bep modified the milestones: v0.125.0, v0.126.0 Apr 23, 2024
bep modified the milestones: v0.126.0, v0.127.0 May 15, 2024
bep modified the milestones: v0.127.0, v0.128.0 Jun 8, 2024
bep modified the milestones: v0.128.0, v0.129.0 Jun 21, 2024
bep modified the milestones: v0.129.0, v0.131.0 Jul 22, 2024
bep modified the milestones: v0.131.0, v0.133.0 Aug 9, 2024