Starting crawl from subdirectory #196

Closed
Robanna777 opened this issue Mar 5, 2024 · 4 comments

Comments

@Robanna777

Details

When I run the command with --site https://site/subdirector on my Mac, everything works as I'd like: it starts with that page, doesn't find a sitemap file, and falls back to crawling from https://site/subdirector. On a Windows machine, however, the crawl starts at the domain root, https://site.

Is there a configuration option that forces it to start at the subdirectory? I tried -include /subdirector/.* but that doesn't seem to do it. With that option, it just hangs.
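
For reference, an include pattern of this form is usually read as a regular expression tested against each URL's path. The following is a minimal sketch of that idea, not Unlighthouse's actual implementation; the helper name `pathMatchesInclude` and the anchoring of each pattern are assumptions made for illustration.

```typescript
// Hypothetical helper, not part of Unlighthouse. It shows how a path-based
// include pattern such as "/subdirector/.*" could be applied to a URL path.
function pathMatchesInclude(path: string, patterns: string[]): boolean {
  // Anchor each pattern so "/subdirector/.*" cannot match "/other/subdirector/x".
  return patterns.some((pattern) => new RegExp(`^${pattern}$`).test(path));
}

const include = ["/subdirector/.*"];
console.log(pathMatchesInclude("/subdirector/page", include)); // true
console.log(pathMatchesInclude("/", include)); // false
console.log(pathMatchesInclude("/subdirector", include)); // false
```

Under this reading, the pattern requires a trailing slash, so the bare /subdirector path itself would not match.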

Debug output shows "GET /api/reports 200 object - 0ms" repeating over and over.

Mac:
Successfully connected to https://teamsideline.com/Layouts/minimalist/Home.aspx?d=ZHcj%2bsPHK5g%2bZkLyQaVo0Q%3d%3d/, status code: 200. unlighthouse 07:50:32

⛵ unlighthouse cli @ v0.5.0
▸ Scanning: https://teamsideline.com/Layouts/minimalist/Home.aspx?d=ZHcj%2bsPHK5g%2bZkLyQaVo0Q%3d%3d/
▸ Route Discovery: Crawler

Windows:
Successfully connected to https://teamsideline.com/. (Status: 200). Unlighthouse 2:50:40 PM
⛵ Unlighthouse cli @ v0.11.4
▸ Scanning: https://teamsideline.com/
▸ Route Discovery: Crawler

@Robanna777
Author

Robanna777 commented Mar 6, 2024

I notice this works with version 0.5.0 but not with 0.6.0 or later.

@Robanna777
Author

--include-urls does not solve this issue. It hangs the same as the original issue.

@harlan-zw
Owner

Hi @Robanna777, thanks for the issue.

It seems this wasn't supported and only worked by accident in earlier versions. I've pushed a fix for it; you can use it as:

npx [email protected] --site https://teamsideline.com/sites/apex/home

Let me know if you have any issues with it.
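
What starting a crawl from a subdirectory means can be sketched as follows. This is an illustration only, not the fix that was pushed: the helper `isInCrawlScope` is hypothetical, the example.com URLs are placeholders, and the sketch assumes the crawler keeps only same-origin links whose path sits under the path of the --site URL.

```typescript
// Hypothetical sketch, not Unlighthouse's code: decide whether a discovered
// link belongs to a crawl that started at a subdirectory of the site.
function isInCrawlScope(siteUrl: string, link: string): boolean {
  const site = new URL(siteUrl);
  // Resolve relative links against the start URL.
  const target = new URL(link, site);
  if (target.origin !== site.origin) return false;
  // Strip trailing slashes so "/docs" and "/docs/" behave the same.
  const base = site.pathname.replace(/\/+$/, "");
  // Require an exact match or a "/" boundary so "/docs" excludes "/docs-old".
  return target.pathname === base || target.pathname.startsWith(`${base}/`);
}

const start = "https://example.com/docs"; // placeholder start URL
console.log(isInCrawlScope(start, "https://example.com/docs/guide")); // true
console.log(isInCrawlScope(start, "https://example.com/")); // false
console.log(isInCrawlScope(start, "https://example.com/docs-old/page")); // false
```

With a rule like this, a crawl started at a subdirectory never wanders back to the domain root, which is the behaviour the original report asked for.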

@Robanna777
Author

That's awesome. Thank you. That works perfectly.
