Replace python link checker with bash #216

Merged
merged 10 commits into from
Jun 24, 2020

@zevisert zevisert commented Jun 13, 2020

"Hey, how come there's dependencies I have to install just to check the validity of these links?"
-- Me, talking to myself, yesterday

So I thought I'd give back a little and spend my open source Friday time on this. I wrote a decently portable bash implementation of the existing python link checker script. It does the same sorts of things the python script did, such as:

  • Parse links from markdown documents
  • Only try identical links once
  • Use a task pool to have multiple inflight requests
  • Log and count the number of "bad" links
  • Log but skip mailto and document links
  • Set the return code based on the links we saw
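The first two bullets (parse links, try identical links only once) can be sketched roughly like this. It is a minimal illustration, not the code from this PR: the `extract_links` name and the regular expression are assumptions, and `grep -o` is a widely available extension rather than strict POSIX.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print each unique http(s) URL found in the markdown
# arriving on standard input, one URL per line.
extract_links() {
    # -o prints only the matched text. The match stops at whitespace, ')',
    # '>' or '"', so the surrounding markdown link syntax is left out.
    # sort -u then collapses identical links so each is checked once.
    grep -oE 'https?://[^[:space:])>"]+' | sort -u
}

printf '%s\n' \
    '- [Example](https://example.com/jobs)' \
    '- [Same link again](https://example.com/jobs)' \
    '- [Other](http://example.org)' | extract_links
```

The sample input contains three links but only two distinct URLs, so two lines are printed.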

Things this doesn't do, that the python script did:

  • Slow down requests to the same host
    • I decided against adding this. A typical webpage load runs tens to hundreds of requests. Hitting a server with cURL for its index page and nothing further is within reason for them to handle.
  • Collect a list of good / bad / assumed links in memory
    • This just logs it and forgets it; other tools can parse the output if they want. I do collect the return codes though, so the final return code can be computed.

Things this does do that the python script didn't:

  • Provide two types of logs.
    1. The first is for looking at: it's decorated with fun emoji and is at least somewhat readable by humans
    2. The second is for people who are making pull requests: it outputs a markdown document that can be used in the PR's description to check off which links were removed or left alone. See the link checker's new readme
  • Easily accept input from standard input as well as from a file. This enables piping input from, for instance, a curl command fetching the raw markdown document from a specific commit or branch on GitHub. I demo'd this in a new asciinema clip
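Accepting either a file or standard input might look something like the sketch below. The `load_lines` function and the global `lines` array are hypothetical names for illustration; `readarray` is one of the bash 4 builtins listed further down.

```bash
#!/usr/bin/env bash

# Hypothetical helper: fill the global array `lines` from the file named in
# $1, or from standard input when no argument (or "-") is given.
load_lines() {
    local src=${1:--}
    if [[ $src == - ]]; then
        readarray -t lines
    else
        readarray -t lines < "$src"
    fi
}

# Demo: feed markdown on standard input through a here-document.
load_lines <<'EOF'
- [Example](https://example.com)
- [Other](https://example.org)
EOF
echo "read ${#lines[@]} line(s) from stdin"
```

A script built this way would call `load_lines "$@"` once at startup, so the output of a `curl` command fetching a raw markdown file can be piped straight into it with no temporary file.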

Really I'm hoping this script makes it a little easier for people to test the links before they upload. No installing python, creating virtual environments, installing dependencies, that sort of thing. Just run the script, done. Pretty much everyone will have bash 4, and most probably have curl installed too. No other dependencies that you wouldn't have on any linux system. I had a lot of fun writing it, I got to learn about named pipes as they were my semaphore mechanism for parallelizing the link testing :).
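For anyone curious about the named-pipe semaphore idea, here is a minimal sketch of the general technique. It is written under assumptions and is not the script from this PR: `run_pool` and `check_link` are hypothetical names, and `check_link` is a stand-in for the real `curl` request.

```bash
#!/usr/bin/env bash

# Stand-in for the real curl call: "fails" for any URL containing "bad".
check_link() {
    [[ $1 != *bad* ]]
}

# Hypothetical pool: run check_link on every URL with at most $1 in flight.
run_pool() {
    local max_jobs=$1; shift
    local fifo_dir token url pid i failures=0
    local pids=()

    fifo_dir=$(mktemp -d) || return 2
    mkfifo "$fifo_dir/sem" || return 2
    exec 3<>"$fifo_dir/sem"   # open read+write so the open itself never blocks
    rm -r "$fifo_dir"         # descriptor 3 stays usable after unlinking

    # Seed the pipe with one token per allowed concurrent job.
    for ((i = 0; i < max_jobs; i++)); do printf '.' >&3; done

    for url in "$@"; do
        read -r -n 1 -u 3 token      # take a token; blocks while the pool is full
        {
            rc=0
            check_link "$url" || rc=$?
            printf '.' >&3           # hand the token back
            exit "$rc"
        } &
        pids+=("$!")
    done

    # Collect each job's return code to compute the final one.
    for pid in "${pids[@]}"; do
        wait "$pid" || failures=$((failures + 1))
    done
    exec 3>&-
    echo "bad links: $failures"
    (( failures == 0 ))
}

run_pool 2 https://a.example https://bad.example https://c.example \
    || echo "pool finished with a non-zero status"
```

Each background job returns its token when it finishes, so the `read` in the main loop only unblocks once a slot is free. Waiting on every recorded PID afterwards is what lets the final return code reflect the links that were seen.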

Actual dependencies I expect to be on `$PATH`
  • head - /usr/bin/head
  • cut - /usr/bin/cut
  • sort - /usr/bin/sort
  • sed - /usr/bin/sed
  • grep - /usr/bin/grep
  • curl - /usr/bin/curl
  • env - /usr/bin/env
  • mktemp - /usr/bin/mktemp
  • mkfifo - /usr/bin/mkfifo
  • rm - /usr/bin/rm
and builtins for
  • echo
  • exit
  • shift
  • test
  • [[
  • read
  • wait
  • printf
  • readarray
  • exec
  • command
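A script could verify the external tools above before doing any work. The sketch below uses the `command` builtin for that; the `require_tools` name is hypothetical and this check is an illustration, not necessarily something the script in this PR performs.

```bash
#!/usr/bin/env bash

# Hypothetical helper: report every named tool that is not found on $PATH
# and return non-zero if at least one is missing.
require_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing dependency: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

require_tools head cut sort sed grep curl env mktemp mkfifo rm \
    || echo "install the tools listed above before running the checker" >&2
```

Reporting every missing tool in one pass, rather than stopping at the first, saves a round trip for someone setting up a fresh machine.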

Windows users (I'm one of them) can still use this too, either through WSL or Docker. Does the new macOS still come with bash, or just zsh?

I also don't want to create a hurdle for people who use this repo often, so if you're interested in this, maybe it's better to reopen this PR without removing the python side of things.

@maryam-dyspatch maryam-dyspatch merged commit 81692c0 into sendwithus:master Jun 24, 2020
@zevisert zevisert deleted the dev/zev/bash-check branch August 19, 2020 00:56
zevisert added a commit to revela-systems/vic-startup-jobs that referenced this pull request Aug 19, 2020
* Link check re-write to bash

* Allow bash 4

* Add historical curl code 51

* Replace dependency on GNU sort with POSIX sort

* Less options to mktemp == more portable

* mailto: handling, trailing whitespace, language 'is => seems to'

* Fix empty scheme not warning

* Fix scheme parsing, -z is for empty string, not -n

🤦🏼‍♂️

* Add unresolve host message, error code 6

* Fix double if in ladder

It's like I should be testing these changes before I commit them..