Robots.txt should link to sitemap by default #4678

Open
earthboundkid opened this issue Apr 27, 2018 · 10 comments

Comments

@earthboundkid
Contributor

I propose changing the default robots.txt layout to:

User-agent: *
Sitemap: {{ .Sitemap.Filename | default "sitemap.xml" | absURL }}

This will link to the sitemap by default, which is the common use case.

@bep
Member

bep commented Apr 27, 2018

I think this needs a little more thinking for the multilingual case, but I agree in principle.

@ghost

ghost commented Apr 29, 2018

How about the "List All Available Languages" example from the docs:

{{ range $.Site.Home.AllTranslations }}
Sitemap: {{ .Permalink }}
{{ end }}

This approach, tweaked to output the sitemap files, would satisfy the suggestion from Moz:

> It’s generally a best practice to indicate the location of any sitemaps associated with this domain at the bottom of the robots.txt file. Here’s an example:
>
> [image]

Things to check:

  • Does it handle multihost?
  • What would this look like for sitemap by section?

According to Moz, including the sitemap in the robots.txt file may not be supported by all search engines. Sitemaps are also not defined in the Robots Exclusion Protocol. It seems quirky to put a hint that helps a scraper or aggregator into a file designed to tell them not to crawl certain sections. Of course, if you're a search giant, who cares, right?

@stale

stale bot commented Aug 27, 2018

This issue has been automatically marked as stale because it has not had recent activity. The resources of the Hugo team are limited, and so we are asking for your help.
If this is a bug and you can still reproduce this error on the master branch, please reply with all of the information you have about it in order to keep the issue open.
If this is a feature request, and you feel that it is still relevant and valuable, please tell us why.
This issue will automatically be closed in the near future if no further activity occurs. Thank you for all your contributions.

@stale stale bot added the Stale label Aug 27, 2018
@earthboundkid
Contributor Author

Seems like a pretty simple feature.

@stale stale bot removed the Stale label Aug 27, 2018
@FelicianoTech
Contributor

I took a look at this. Currently, since robots.txt is a template file, it doesn't have access to all the information it would need to know what sitemaps are available to link to.

Should it point to a sitemap index file for multilingual sites? Yes, but otherwise there can be more than one sitemap file with Hugo's current implementation.

This information is gathered when generating the sitemap, but is otherwise not available under .Sites currently. I assume we'd need to add it there. The problem is that the sitemap template, like the robots.txt and 404 templates, is currently generated in isolation, so I'm not sure what the best way to implement this would be.

@comaldave

I suppose I have the choice of disabling Hugo's robots.txt and crafting my own, or replacing the template with something that works for my clients but may not be appropriate for others. My clients are all multilingual, so the OP's suggestion is not quite right for me. I do think this is worthwhile for Hugo to fix; most bloggers should not need to mess with the robots.txt file. A change in the template plus options in the config seems an ideal implementation for the average user.

@hsn10

hsn10 commented Aug 18, 2019

.Sitemap.Filename doesn't work inside robots.txt.

tdelmas pushed a commit to tdelmas/website that referenced this issue Aug 18, 2019
tdelmas pushed a commit to letsencrypt/website that referenced this issue Aug 21, 2019
andygrunwald added a commit to andygrunwald/andygrunwald.com that referenced this issue Jun 29, 2021
@amrsoll

amrsoll commented Jan 1, 2024

For a template that respects people who didn't add a sitemap, this can work too:

User-agent: *

{{ with .Sitemap }}
Sitemap: {{ .Filename | default "sitemap.xml" | absURL }}
{{ end }}

@ytrepidorosonomous

This works for multilingual sites:

User-agent: *

Sitemap: {{ site.Home.Sitemap.Filename | absURL }}

@bep bep added this to the v0.131.0 milestone Jul 30, 2024
@bep bep modified the milestones: v0.131.0, v0.133.0 Aug 9, 2024
KimSJ15 added a commit to KimSJ15/letsencrypt_website that referenced this issue Aug 26, 2024
@Fethbita

> For a template that respects people who didn't add a sitemap, this can work too
>
> User-agent: *
>
> {{ with .Sitemap }}
> Sitemap: {{ .Filename | default "sitemap.xml" | absURL }}
> {{ end }}

This doesn't work if Sitemap is disabled using disableKinds.
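
One possible way around this, sketched here rather than tested against every Hugo version, is to gate the line on a site parameter instead of on .Sitemap. The sketch assumes a custom layouts/robots.txt with enableRobotsTXT = true in the site config, and a hypothetical robots_sitemap entry under [params] that has to be kept in sync with disableKinds by hand. That parameter is not a built-in Hugo option, and the filename is hard-coded to the default sitemap.xml.

```go-html-template
{{- /* layouts/robots.txt (needs enableRobotsTXT = true in the site config).
       robots_sitemap is a hypothetical custom [params] entry, not a Hugo built-in. */ -}}
User-agent: *
{{- if site.Params.robots_sitemap }}
Sitemap: {{ "sitemap.xml" | absURL }}
{{- end }}
```

With robots_sitemap unset or false, only the User-agent line is rendered; with it set to true, a single Sitemap line pointing at the site root's sitemap.xml follows.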

@bep bep modified the milestones: v0.133.0, Unscheduled Aug 29, 2024