Couch Scanner #5014

Merged
merged 1 commit into from
Apr 12, 2024

Commits on Apr 12, 2024

  1. Couch Scanner

    An application that scans the cluster and uses a plugin system to report
    various things about databases and documents. The initial idea was to have
    something like this to scan all the JavaScript design docs to check for
    compatibility with the new QuickJS engine. It has since been split apart from
    the QuickJS branch and made into a separate pull request.
    
    The current implementation includes two plugins:
      * couch_scanner_plugin_find : scan for regexes in doc bodies
      * couch_scanner_plugin_ddoc_features : report various design doc features
    
    A more detailed description is in the README.md file. The plugin API is defined
    in the `couch_scanner_plugin` module. There are additional details in the
    comments in the included Erlang modules. What follows is a summary of some of
    the implementation details and features.
    
    Plugins are managed as individual processes by the `couch_scanner_server`
    with the `start_link/1` and `stop/1` functions. After a plugin runner process
    is spawned, `couch_scanner_server` waits for it to exit. If the process exits
    with an error, it is penalized with an exponential back-off. It may also exit
    with a special `{shutdown, {reschedule, TSec}}` value, in which case it will
    be rescheduled to run again on or after the `TSec` time.
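
The two exit paths above can be sketched roughly as follows. This is an illustration in Python, not the Erlang implementation: the constants, the function name `next_start_time`, the tuple encoding of exit reasons, and the choice to reset the error count on a clean reschedule are all assumptions.

```python
# Hypothetical constants: the commit message does not give the real values.
BASE_BACKOFF_SEC = 30
MAX_BACKOFF_SEC = 3600


def next_start_time(exit_reason, error_count, now):
    """Return (start_after, new_error_count) for a plugin that just exited.

    exit_reason is modelled as ("reschedule", tsec) for the special
    {shutdown, {reschedule, TSec}} exit, or ("error", detail) for a crash.
    """
    kind, value = exit_reason
    if kind == "reschedule":
        # Clean exit: run again on or after the requested time.
        # Resetting the penalty here is an assumption of this sketch.
        return value, 0
    # Error exit: double the delay for each consecutive error, up to a cap.
    error_count += 1
    delay = min(BASE_BACKOFF_SEC * 2 ** (error_count - 1), MAX_BACKOFF_SEC)
    return now + delay, error_count
```

With these assumed constants, a first error delays the restart by 30 seconds, a second consecutive error by 60 seconds, and so on until the cap is reached.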
    
    After the plugin process starts, it will load and validate its plugin
    module. Then, it will start scanning all the dbs and docs on the local node.
    Each shard range will be scanned on only one of the cluster nodes to avoid
    duplicating work. For instance, if there are 2 shard ranges, `0-7` and `8-f`,
    with copies on nodes `n1`, `n2`, `n3`, then `0-7` might be scanned on `n1`
    only, and `8-f` on `n3`.
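
One way to get this "exactly one node per shard range" property without coordination is for every node to apply the same deterministic choice. The sketch below hashes the range name over the sorted list of copies. The commit message does not describe the actual selection rule, so the hashing scheme and the names `scanning_node` and `ranges_for_node` are hypothetical.

```python
import hashlib


def scanning_node(db_name, shard_range, nodes):
    """Pick the single node that scans a shard range (hypothetical scheme)."""
    # Sorting makes the result independent of the order nodes are listed in.
    candidates = sorted(nodes)
    key = f"{db_name}/{shard_range}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(candidates)
    return candidates[index]


def ranges_for_node(db_name, copies, local_node):
    """Return the shard ranges the local node should scan.

    copies maps each shard range to the nodes holding a copy of it.
    """
    return [
        shard_range
        for shard_range, nodes in sorted(copies.items())
        if scanning_node(db_name, shard_range, nodes) == local_node
    ]
```

Because each node computes the same answer from the same inputs, every range is claimed by one node and skipped by the others.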
    
    During various events the plugin process will call into the plugin module: on
    startup, when resuming from a checkpoint, when checkpointing, when processing
    a new db, a design doc, or a document, and when completing a scan. The plugin
    may accumulate reporting data, or may indicate that some parts of the scan
    should be skipped, or that the scanning session should be reset.
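
The callback flow can be illustrated with a toy driver. The real API is the Erlang `couch_scanner_plugin` behaviour; the Python class, its method names, and the `"ok"`/`"skip"` return values below are hypothetical stand-ins, and checkpointing, design docs, and resets are left out for brevity.

```python
class CountingPlugin:
    """Hypothetical plugin: counts dbs and docs, skips dbs starting with '_'."""

    def start(self):
        # Called once on startup; returns the plugin's accumulator state.
        return {"docs": 0, "dbs": 0}

    def db(self, state, db_name):
        # Returning "skip" tells the driver not to scan this db's documents.
        if db_name.startswith("_"):
            return "skip", state
        state["dbs"] += 1
        return "ok", state

    def doc(self, state, db_name, doc):
        state["docs"] += 1
        return "ok", state

    def complete(self, state):
        # Called when the scan finishes; returns the report data.
        return state


def run_scan(plugin, dbs):
    """Drive a plugin over dbs, a mapping of db name to a list of docs."""
    state = plugin.start()
    for db_name, docs in dbs.items():
        action, state = plugin.db(state, db_name)
        if action == "skip":
            continue
        for doc in docs:
            _, state = plugin.doc(state, db_name, doc)
    return plugin.complete(state)
```

The point of the shape is that the driver owns the iteration, while the plugin only accumulates state and steers what gets visited.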
    
    By default all plugins are disabled. Plugins are enabled and managed via the
    config system. To enable a plugin, add a `$plugin = true` entry in the
    `[couch_scanner_plugins]` section. For example:
    ```
    [couch_scanner_plugins]
    couch_scanner_plugin_ddoc_features = true
    ```
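
To show how such a section maps to a set of enabled plugins, here is a small sketch using Python's standard `configparser`. CouchDB reads this through its own config system, so this is only a model of the semantics. The `couch_scanner_plugin_find = false` line and the helper name `enabled_plugins` are assumptions added for illustration.

```python
import configparser

# Sample config; the explicit "= false" entry is an assumption of this sketch.
CONFIG_TEXT = """
[couch_scanner_plugins]
couch_scanner_plugin_ddoc_features = true
couch_scanner_plugin_find = false
"""


def enabled_plugins(config_text):
    """Return the sorted names of plugins whose entry is set to true."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section("couch_scanner_plugins"):
        # No section at all means nothing is enabled (disabled by default).
        return []
    section = parser["couch_scanner_plugins"]
    return sorted(name for name in section if section.getboolean(name))
```

A plugin that is not listed at all is treated the same as one set to false, which matches the disabled-by-default behavior described above.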
    
    Plugins can be configured to run on or after a particular date and time or to
    run periodically. That can be configured via `[$plugin] after = ...` and
    `[$plugin] repeat = ...` settings. For instance, to run after 2024-03-20T15:00
    and then run every Monday:
    
    ```
    [couch_scanner_plugin_ddoc_features]
    after = 2024-03-20T15:00
    repeat = monday
    ```
    
    The default value for both `after` and `repeat` is `restart`, meaning the
    plugin runs once after the node starts up.
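
The `after` and weekday `repeat` example can be worked through with a short sketch. It assumes that a weekly repeat fires at midnight of the next matching weekday; the commit message does not specify the time of day, nor which other value formats are accepted. The function names are hypothetical.

```python
from datetime import datetime, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def first_start(after, now):
    """The first run happens on or after the configured `after` time."""
    return max(datetime.fromisoformat(after), now)


def next_repeat(last_start, repeat):
    """Midnight of the next `repeat` weekday strictly after last_start's date.

    Midnight is an assumption of this sketch, not documented behavior.
    """
    target = WEEKDAYS.index(repeat.lower())
    # 1..7 days ahead, so a run on a Monday repeats the following Monday.
    days_ahead = (target - last_start.weekday() - 1) % 7 + 1
    next_day = last_start.date() + timedelta(days=days_ahead)
    return datetime.combine(next_day, datetime.min.time())
```

Since 2024-03-20 is a Wednesday, under these assumptions the first run starts at 2024-03-20T15:00 and the first repeat falls on Monday 2024-03-25.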
    
    To prevent the plugins from consuming too many resources, there is a simple
    rate limiter which limits how many databases, shards and documents are
    processed by all the plugins. Rate limits are configurable:
    ```
    [couch_scanner]
    db_rate_limit = 50
    shard_rate_limit = 50
    doc_rate_limit = 500
    ```
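
A limiter of this kind can be sketched as a token bucket. The commit message does not say which algorithm is used or what the time unit of the limits is; the version below assumes per-second rates and uses the hypothetical names `RateLimiter` and `try_acquire`.

```python
class RateLimiter:
    """Token bucket: allows up to `rate` operations per second on average."""

    def __init__(self, rate, now=0.0):
        self.rate = float(rate)
        self.tokens = float(rate)  # start with a full bucket
        self.updated = now

    def try_acquire(self, now):
        """Consume one token if available; `now` is a timestamp in seconds."""
        # Refill in proportion to elapsed time, capped at one second's worth.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.rate, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One limiter per resource kind, shared by all plugins (values from the config).
LIMITS = {"db": 50, "shard": 50, "doc": 500}
limiters = {kind: RateLimiter(rate) for kind, rate in LIMITS.items()}
```

A caller that gets `False` back would wait and retry, which is what keeps the combined work of all plugins under the configured rates.
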
    nickva committed Apr 12, 2024
    b129f82