The ParallelCrawlerEngine is getting the wrong URLs to crawl. I checked the page at the Parent URI and could not find the wrong URL anywhere in it. The cause is probably an <a> anchor tag whose href is missing the scheme "https://", so the href is treated as a relative path:
<a href="www.thelawyermag.com/au/best-in-law/best-legal-tech-and-legal-service-providers-in-australia-and-new-zealand-service-provider-awards/467481">
bla bla
</a>
Parent URI:
https://www.thelawyermag.com/au/best-in-law/best-in-law-2023/468046
Parsed Hyperlink (Wrong URL):
https://www.thelawyermag.com/au/best-in-law/best-in-law-2023/www.thelawyermag.com/au/best-in-law/best-legal-tech-and-legal-service-providers-in-australia-and-new-zealand-service-provider-awards/467481
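The wrong URL above is what standard relative-reference resolution produces for an href without a scheme. Here is a minimal sketch in Python (not the crawler's own code) that reproduces it with `urllib.parse.urljoin`. The helper `resolve_with_scheme_guess` is hypothetical: it illustrates one possible workaround, assuming that an href starting with "www." was meant to be an absolute https URL.

```python
from urllib.parse import urljoin

PARENT = "https://www.thelawyermag.com/au/best-in-law/best-in-law-2023/468046"
HREF = (
    "www.thelawyermag.com/au/best-in-law/"
    "best-legal-tech-and-legal-service-providers-in-australia-and-new-zealand-"
    "service-provider-awards/467481"
)


def resolve(parent: str, href: str) -> str:
    """Resolve href against parent the way a spec-following parser would."""
    # Without a scheme or a leading "//", the href is a relative path, so it
    # replaces the last path segment of the parent ("468046").
    return urljoin(parent, href)


def resolve_with_scheme_guess(parent: str, href: str) -> str:
    """Hypothetical workaround: treat 'www.'-prefixed hrefs as absolute."""
    # Assumption: an href starting with "www." was meant as an https URL.
    # This is a heuristic and can misfire on a real relative path named "www.*".
    if href.lower().startswith("www."):
        href = "https://" + href
    return urljoin(parent, href)


if __name__ == "__main__":
    print(resolve(PARENT, HREF))                    # same as the Parsed Hyperlink
    print(resolve_with_scheme_guess(PARENT, HREF))  # the presumably intended URL
```

If this is what is happening, the crawler resolves the link correctly and the page markup is at fault. Any fix on the crawler side would have to be a heuristic like the one sketched here.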