Want to ensure that all the links in your git-based website are alive? This project may help!
This script searches through the source file content of either a directory or a git branch's difference from main. This means it's possible to both verify the life of every link in the repo and perform much smaller checks on branches to quickly ensure all new links are valid.
Install our NPM package, repo-link-check
, with the Node package manager of your choice. This package contains a binary entry of the same name, allowing you to run yarn repo-link-check
or npm run repo-link-check
. From there, it's recommended to use scripts in package.json
to have different pre-configured runs.
For example:
"link-check": "repo-link-check -c config/link-check/config.yml -s filesystem -u", "link-check-diff": "repo-link-check -c config/link-check/config.yml", "link-check-dev-server": "repo-link-check -c config/link-check/config.yml -r http://localhost:3000", "link-check-exclude": "repo-link-check -c config/link-check/config.yml -s filesystem -u only"
This setup checks all files in the project-relative directory /content
with .css
, .md
, and .json
extensions for links, resolve root-relative links to be relative to https://www.example.com
and excluding files based on the lines in .config/exclude-files
and .config.exclude-links
While not an actual GitHub Action, this project provides reusable GitHub Actions workflows that use the CLI version of link-check
to generate a report. Check them out in .github/workflows
This application is configured primarily through a YAML configuration file whose path is specified with the --configFile
(or -c
) option. Other options that override the file can be specified with flags.
For example, this is what a configuration file could look like:
rootURL: https://dvc.org fileIncludePatterns: '{.github,content,src}/**/*!(.test).{css,js,jsx,md,tsx,ts,json}' fileExcludePatternFile: config/link-check/excluded-files.yml linkExcludePatternFile: config/link-check/excluded-links.yml linkOptions: '*.wikipedia.org': minTime: 2000 maxConcurrent: 1 '(*.)?github.com': minTime: 1000 maxConcurrent: 1
When set by a runner, Link Check will read this path relative to the root of the repo for a configuration file, either in JSON or YAML depending on the extension. Both the CLI and GHA runners can do this, which is particularly useful for sharing patterns between the two.
When true, uses the output of a git diff between the current working area and origin/main as input, as opposed to the default behavior of reading the filesystem. It effectively means that this mode checks links that would be new to main if the current state of the program were merged, and that this will provide no links when checking out an up-to-date main.
This string is used as the base for root-relative links (those that start with '/'). It's useful for specifying a deploy preview or local server, particularly from GitHub Actions.
linkIncludePatterns: string[]?When provided, links to check will be limited to those that pass a micromatch
test with this option as the pattern. Otherwise, all links will be used.
When provided, links that pass a micromatch
test with this option as the pattern will show up on the report, passing with no test necessary and marked as excluded.
Exclusions take precedence over inclusions.
fileIncludePatterns: string[]?When provided, files to check links in will be limited to those whose filenames that pass a micromatch
test with this option as the pattern. Otherwise, all files from the content source will be used.
When provided, files whose filenames match a micromatch
check with this option as its pattern will be completely excluded from checks and reports.
Exclusions take precedence over inclusions.
<file|link><Include|Exclude>PatternFile: string?These four sister options mirror the Patterns
variants, but instead take paths to files which are top-level arrays in YAML or JSON. These parsed arrays will be used alongside ones provided in the related Patterns
options.
When this option is true, no link checks will actually be run. Useful for debugging link patterns, as excluded links will have a description distinct from those stopped by the dry run alone.
unusedPatternsOnly: booleanIf true, Link Check will use dryRun
, report unused patterns, and then exit.
Selects the output strategy to use. Both runners can use "consoleLog", and the GitHub Action has a "checksAction" mode to generate output for LouisBRunner/checks-action
. Can accept multiple strings to use multiple output strategies.
When true, only log/report failed link checks. Useful to get around GitHub Actions' character limit.
Disabled by default on CLI, enabled by default on GitHub Actions.
When true, the application will console.log
the parsed options object before running.
The minimum amount of time in ms to wait before two requests on one domain. Defaults to 400.
The maximum amount of requests allowed on each hostname at one time. Defaults to 1.
When specified, will use this string as the user-agent
header in link check requests.
Defaults to Mozilla/5.0 (compatible; Iterative/link-check; +https://github.com/iterative/link-check)
This object determines settings that will be applied for each hostname. The keys will be tried as a micromatch pattern against each link's hostname, and the object at the first match will have its keys override the defaults for that instance.
By default, each hostname is allowed one concurrent connection and at least 400ms minimum time between each call per hostname. Sites with more aggressive 429 responses may require a larger minTime, but the defaults handle the majority of sites well.
This setting can only be defined in an options file.
Currently, the only settings here are minTime and maxConcurrent.
To specify multiple patterns or pattern files, use the relevant flag multiple times. Use -h
to get this help output:
Options from CLI flags can be used to override the config file when necessary.
Usage: repo-link-check [options]
Options:
-c, --configFile <path> Path to the configuration file
-r, --rootURL <url> Check root-relative links relative to this URL
-o, --output <strategy[,strategy]> Use one or more strategies to generate report output
-d, --diff Use git diff from origin/main as a source instead of the whole filesystem.
--dryRun Skip checking parsed links and report them as skipped
-u, --unusedPatternsOnly Do a dry run and exit after printing unused patterns
-f, --failsOnly Only report failing links
-v, --verbose Log fully resolved options
-li, --linkIncludePatterns <pattern> Add a micromatch pattern used to whitelist links
-le, --linkExcludePatterns <pattern> Add a micromatch pattern used to exclude links
-fi, --fileIncludePatterns <pattern> Add a micromatch pattern used to whitelist files to scrape links from
-fe, --fileExcludePatterns <pattern> Add a micromatch pattern used to exclude files to scrape links from
-h, --help display help for command
To specify multiple patterns, use the relevant flag multiple times.
In its current state, most of this project is a standard node CLI package published on NPM. The reusable GitHub Actions in .github/workflows
deploy through GitHub, but are contributed to the same as any other code.
To manually test the source, build it with yarn build
and then run node dist/cli.js
with whatever flags you would otherwise pass to repo-link-check
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4