
Architecture Overview of Harbor

Orlix edited this page Jan 30, 2023 · 11 revisions

Architecture

Starting with version 2.0, Harbor has evolved into a fully OCI-compliant cloud-native artifact registry.

Being an OCI-compliant cloud-native artifact registry means that Harbor now supports OCI images and OCI image indexes (https://github.com/opencontainers/image-spec/blob/master/image-index.md). An OCI image index is a higher-level manifest that points to a list of image manifests, which makes it well suited to describing images for one or more platforms. The Docker manifest list, for example, is a popular implementation of the OCI image index. This also means Harbor now fully supports multi-architecture images.
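
The following minimal sketch makes the index/manifest relationship concrete. It shows how a client might pick the platform-specific manifest out of an OCI image index. The field names and media types follow the OCI image-spec. The digests and sizes are placeholders, and `select_manifest` is a hypothetical helper, not a Harbor API.

```python
# A minimal OCI image index. Media types follow the OCI image-spec; the
# digests and sizes are placeholders, not real content.
INDEX = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "a" * 64,
            "size": 1000,
            "platform": {"architecture": "amd64", "os": "linux"},
        },
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "b" * 64,
            "size": 1000,
            "platform": {"architecture": "arm64", "os": "linux"},
        },
    ],
}


def select_manifest(index: dict, os_name: str, arch: str) -> str:
    """Return the digest of the image manifest that matches the platform."""
    for entry in index["manifests"]:
        platform = entry.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == arch:
            return entry["digest"]
    raise LookupError(f"no manifest for {os_name}/{arch}")


if __name__ == "__main__":
    # Resolve the arm64 entry of the multi-architecture image.
    print(select_manifest(INDEX, "linux", "arm64"))
```

A multi-architecture pull works the same way in outline: the client fetches the index, picks the entry for its own platform, and then fetches that image manifest by digest.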

With Harbor v2.0, users can manage images, manifest lists, Helm charts, CNABs, OPAs, and other artifacts that adhere to the OCI image specification. Harbor also supports pulling, pushing, deleting, tagging, replicating, and scanning these artifacts. Images and manifest lists can now be signed as well.

The diagram below shows the overall architecture of the Harbor registry.

Figure: Harbor architecture (version 2.0)

As the diagram shows, Harbor comprises the following components, organized into three layers:

Data Access Layer

K-V storage: provided by Redis. It caches data and temporarily persists job metadata for the Job Service.

Data storage: multiple storage backends are supported for persisting data for the registry and ChartMuseum. For details, refer to the storage driver list in the Docker documentation and to the ChartMuseum GitHub repository.

Database: stores the metadata of Harbor models, such as projects, users, roles, replication policies, tag retention policies, scanners, charts, and images. Harbor uses PostgreSQL.

Fundamental Services

Proxy: a reverse proxy built on Nginx that provides API routing. Harbor components such as Core, the registry, the web portal, and the token service all sit behind this reverse proxy, which forwards requests from browsers and Docker clients to the appropriate backend services.
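
The proxy's routing can be modeled as longest-prefix matching from the request path to a backend service. Nginx uses the same rule to choose among plain prefix `location` blocks. The sketch below assumes a hypothetical route table. Its prefixes and service names are illustrative and are not taken from Harbor's actual Nginx configuration.

```python
# Hypothetical route table: path prefix -> backend service.
# Illustrative only; not Harbor's real Nginx configuration.
ROUTES = {
    "/api/": "core",
    "/service/": "core",
    "/v2/": "registry",
    "/": "portal",
}


def route(path: str, routes: dict = ROUTES) -> str:
    """Return the backend whose prefix is the longest match for `path`."""
    matches = [prefix for prefix in routes if path.startswith(prefix)]
    if not matches:
        raise LookupError(f"no route for {path!r}")
    # The most specific (longest) prefix wins, so "/" acts as a catch-all.
    return routes[max(matches, key=len)]


if __name__ == "__main__":
    for path in ("/v2/", "/service/token", "/api/v2.0/projects", "/harbor/projects"):
        print(path, "->", route(path))
```

The catch-all "/" entry sends anything that is not an API, token, or registry call to the web portal.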

Core: Harbor’s core service, which mainly provides the following functions:

  • API Server: an HTTP server that accepts REST API requests and responds to them, relying on its submodules such as 'Authentication & Authorization', 'Middleware', and 'API Handlers'.
    • Authentication & Authorization
      • Requests are protected by the authentication service, which can be backed by a local database, AD/LDAP, or OIDC.
      • An RBAC mechanism authorizes the related actions, e.g. pulling or pushing an image.
      • The token service issues a token for every docker push/pull command according to the user’s role in the project. If a request sent from a Docker client carries no token, the registry redirects the client to the token service.
    • Middleware: preprocesses certain requests to determine whether they meet the required criteria and can be passed on to the backend components for further processing. Several functions are implemented as middleware, such as 'quota management', 'signature check', 'vulnerability severity check', and 'robot account parsing'.
    • API Handlers: handle the corresponding REST API requests. They mainly parse and validate request parameters, complete the business logic on top of the relevant API controller, and write back the generated response.
  • Config Manager: manages all system configurations, such as authentication type settings, email settings, and certificates.
  • Project Management: manages the base data and metadata of projects, which are created to isolate the managed artifacts.
  • Quota Manager: manages the quota settings of projects and validates quotas when new pushes occur.
  • Chart Controller: proxies chart-related requests to the backend ChartMuseum and provides several extensions that improve the chart management experience.
  • Retention Manager: manages tag retention policies, and performs and monitors tag retention processes.
  • Content Trust: extends the trust capability provided by the backend Notary to support a smooth content trust process. At present, only container images can be signed.
  • Replication Controller: manages replication policies and registry adapters, and triggers and monitors concurrent replication processes. Adapters are implemented for many registries:
    • Distribution (docker registry)
    • Docker Hub
    • Huawei SWR
    • Amazon ECR
    • Google GCR
    • Azure ACR
    • Ali ACR
    • Helm Hub
    • Quay
    • Artifactory
    • GitLab Registry
  • Scan Manager: manages the multiple configured scanners from different providers and provides scan summaries and reports for specified artifacts.
    • The Trivy scanner provided by Aqua Security, the Anchore Engine scanner provided by Anchore, the Clair scanner sponsored by CoreOS (Red Hat), and the DoSec scanner provided by DoSec are supported.
    • At present, only container images, or bundles built on top of images such as manifest lists/OCI indexes or CNAB bundles, can be scanned.
  • Notification Manager (webhook): a mechanism that publishes artifact status changes in Harbor to the webhook endpoints configured in Harbor. Interested parties can trigger follow-up actions by listening to the related webhook events. Two delivery methods are currently supported:
    • HTTP POST request
    • Slack channel
  • OCI Artifact Manager: the core component that manages the lifecycle of all OCI artifacts across the whole Harbor registry. It provides CRUD operations for the metadata of an artifact and its related additions, such as scan reports, the build history of container images, and the README, dependencies, and values.yaml of Helm charts. It also supports managing artifact tags and other helpful operations.
  • Registry Driver: implemented as a registry client SDK that communicates with the underlying registry (currently Docker Distribution). The 'OCI Artifact Manager' relies on this driver to get additional information from the manifest, and even the config JSON, of a specified artifact stored in the underlying registry.
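
The middleware idea described above can be sketched as a chain of checks. Each check either rejects a request or passes it on, and the API handler runs only if every check passes. This is a minimal sketch under stated assumptions. Requests are plain dicts. `quota_check`, `severity_check`, and the request fields are hypothetical stand-ins for Harbor's 'quota management' and 'vulnerability severity check' middleware, not their real implementation.

```python
from typing import Callable, List, Optional

# A request is modeled as a plain dict (hypothetical). Each middleware returns
# an error message to reject the request, or None to pass it along the chain.
Request = dict
Middleware = Callable[[Request], Optional[str]]


def quota_check(req: Request) -> Optional[str]:
    # Hypothetical quota rule: reject a push (PUT) that would exceed the quota.
    if req["method"] == "PUT" and req["used"] + req["size"] > req["quota"]:
        return "quota exceeded"
    return None


def severity_check(req: Request) -> Optional[str]:
    # Hypothetical severity rule: block pulls (GET) of critically vulnerable artifacts.
    if req["method"] == "GET" and req.get("severity") == "critical":
        return "critical vulnerabilities found"
    return None


def api_handler(req: Request) -> str:
    # Stand-in for an API handler that completes the business logic.
    return f"ok: {req['method']} {req['path']}"


def handle(req: Request, chain: List[Middleware], handler: Callable[[Request], str]) -> str:
    """Run each middleware in order; call the handler only if all of them pass."""
    for middleware in chain:
        error = middleware(req)
        if error is not None:
            return f"rejected: {error}"
    return handler(req)


if __name__ == "__main__":
    chain = [quota_check, severity_check]
    push = {"method": "PUT", "path": "/v2/library/app/manifests/1.0",
            "used": 90, "size": 20, "quota": 100}
    print(handle(push, chain, api_handler))  # rejected: quota exceeded
```

Keeping each check as an independent function means new criteria, such as a signature check, can be added to the chain without touching the handlers.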

Job Service: a general job execution queue service that lets other components and services submit asynchronous tasks for concurrent execution through simple RESTful APIs.
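
A job queue of this kind lets a caller submit work, receive a job ID immediately, and poll for the outcome later. The sketch below is a hypothetical in-process model built on Python's `concurrent.futures`. The real Job Service exposes RESTful APIs and keeps job metadata in Redis; both are omitted here, and the status names are illustrative.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, Optional


class JobService:
    """Hypothetical in-process job queue: submit a task, get an ID, poll status."""

    def __init__(self, workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs: Dict[str, Future] = {}

    def submit(self, func: Callable[..., Any], *args: Any) -> str:
        # Queue the task for concurrent execution and return immediately.
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(func, *args)
        return job_id

    def status(self, job_id: str) -> str:
        # Map the future's state onto illustrative job status names.
        future = self._jobs[job_id]
        if not future.done():
            return "Running" if future.running() else "Pending"
        if future.cancelled():
            return "Stopped"
        return "Error" if future.exception() is not None else "Success"

    def result(self, job_id: str, timeout: Optional[float] = None) -> Any:
        # Block until the job finishes; re-raises the job's exception, if any.
        return self._jobs[job_id].result(timeout=timeout)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    svc = JobService(workers=2)
    job = svc.submit(sum, [1, 2, 3])
    print(svc.result(job, timeout=5), svc.status(job))  # 6 Success
    svc.shutdown()
```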

Log Collector: collects the logs of the other modules into a single place.

GC Controller: manages the online garbage collection (GC) schedule settings, and starts and tracks GC progress.
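
Registry garbage collection is commonly modeled as mark-and-sweep. First mark every blob still referenced by a manifest, then sweep the blobs that nothing references. The sketch below assumes that simplified model. It is not Harbor's GC implementation and ignores untagged manifests, pushes that run concurrently with GC, and storage access. The digests are placeholders.

```python
from typing import Dict, Set


def collect_garbage(manifests: Dict[str, Set[str]], blobs: Set[str]) -> Set[str]:
    """Return the stored blobs that no manifest references (safe to delete).

    `manifests` maps each surviving manifest digest to the blob digests it
    references (its config and layers); `blobs` is every blob in storage.
    """
    # Mark phase: collect every blob reachable from a surviving manifest.
    marked: Set[str] = set()
    for referenced in manifests.values():
        marked |= referenced
    # Sweep phase: anything stored but unmarked is garbage.
    return blobs - marked


if __name__ == "__main__":
    manifests = {
        "sha256:m1": {"sha256:c1", "sha256:l1", "sha256:l2"},
        "sha256:m2": {"sha256:c2", "sha256:l2"},  # shares layer l2 with m1
    }
    stored = {"sha256:c1", "sha256:c2", "sha256:l1", "sha256:l2", "sha256:l3"}
    print(collect_garbage(manifests, stored))  # {'sha256:l3'}
```

Because layers are shared between images, a blob can be removed only when no remaining manifest references it. That is why the mark phase has to consider every manifest before anything is deleted.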

Chart Museum: a third-party chart repository server that provides chart management and access APIs. For more details, see the ChartMuseum project.

Docker Registry: a third-party registry server responsible for storing Docker images and processing docker push/pull commands. Because Harbor must enforce access control on images, the registry directs clients to a token service to obtain a valid token for each pull or push request.

Notary: a third-party content trust server responsible for securely publishing and verifying content. For more details, see the Notary project.

Consumers

As a standard cloud-native artifact registry, Harbor naturally supports the related clients, such as the docker CLI, the Notary client, OCI-compatible clients like ORAS, and Helm. In addition to these clients, Harbor provides a web portal that lets administrators easily manage and monitor all artifacts.

Web Portal: a graphical user interface that helps users manage images in the registry.

The following two examples of Docker commands illustrate the interactions between Harbor’s components.

The process of docker login

Suppose Harbor is deployed on a host with IP 192.168.1.10. A user runs the docker command to send a login request to Harbor:

$ docker login 192.168.1.10

After the user enters the required credentials, the Docker client sends an HTTP GET request to the address “192.168.1.10/v2/”. The Harbor containers process it in the following steps:

(a) First, this request is received by the proxy container listening on port 80. Nginx in the container forwards the request to the Registry container at the backend.

(b) The Registry container is configured for token-based authentication, so it returns a 401 error code, telling the Docker client to obtain a valid token from a specified URL. In Harbor, this URL points to the token service of Core Services.

(c) When the Docker client receives this error code, it sends a request to the token service URL, embedding the username and password in the request header according to HTTP Basic authentication.

(d) This request again reaches the proxy container on port 80, and Nginx forwards it to the Core container according to pre-configured rules. The token service within the Core container receives the request, decodes it, and obtains the username and password.

(e) The token service then authenticates the user against the data in the database. When the token service is configured for LDAP/AD authentication, it authenticates against the external LDAP/AD server instead. After successful authentication, the token service returns an HTTP success code, and the response body contains a token signed with a private key.

At this point, the docker login process is complete. The Docker client saves the encoded username and password from step (c) locally in a hidden file.
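
Steps (b) and (c) can be sketched in code. The client parses the `WWW-Authenticate` challenge from the 401 response to find the token service, then builds the HTTP Basic `Authorization` header. The `realm` URL and `service` value in the usage example are assumptions for a host at 192.168.1.10. Both helper functions are hypothetical and are not part of the Docker client.

```python
import base64
import re
from typing import Dict


def parse_bearer_challenge(header: str) -> Dict[str, str]:
    """Parse a `WWW-Authenticate: Bearer key="value",...` header value."""
    scheme, _, params = header.partition(" ")
    if scheme.lower() != "bearer":
        raise ValueError(f"unexpected auth scheme: {scheme}")
    # Each parameter is key="value"; values are assumed not to contain quotes.
    return dict(re.findall(r'(\w+)="([^"]*)"', params))


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic `Authorization` header value (RFC 7617)."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


if __name__ == "__main__":
    # Assumed challenge returned by the registry for a host at 192.168.1.10.
    challenge = 'Bearer realm="http://192.168.1.10/service/token",service="harbor-registry"'
    print(parse_bearer_challenge(challenge)["realm"])
    print(basic_auth_header("admin", "secret"))  # Basic YWRtaW46c2VjcmV0
```

Basic authentication only base64-encodes the credentials and does not encrypt them. This is one reason registries are normally served over HTTPS.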

The process of docker push

(Proxy forwarding steps are omitted. The figure above illustrates the communication between the components during the docker push process.)

After logging in successfully, the user pushes a Docker image to Harbor with the docker push command:

$ docker push 192.168.1.10/library/hello-world

(a) First, the Docker client repeats a process similar to login: it sends a request to the registry and gets back the URL of the token service.

(b) Next, when contacting the token service, the Docker client provides additional information to request a token for the push operation on the image (library/hello-world).

(c) After receiving the request forwarded by Nginx, the token service queries the database to look up the user’s role and permission to push the image. If the user has the proper permission, the token service encodes the information about the push operation, signs it with a private key, and returns the resulting token to the Docker client.

(d) After the Docker client gets the token, it sends a push request to the registry with a header containing the token. When the registry receives the request, it verifies the token with the public key, which corresponds to the token service's private key, and validates the token's content. If the registry finds the token valid for pushing the image, the image transfer begins.