
Based on the learnware paradigm, the learnware package supports the entire process including the submission, usability testing, organization, identification, deployment, and reuse of learnwares. Simultaneously, this repository serves as Beimingwu's engine, supporting its core functionalities.


Introduction

The learnware paradigm, proposed by Professor Zhi-Hua Zhou in 2016 [1, 2], aims to build a vast model platform system, i.e., a learnware dock system, which systematically accommodates and organizes models shared by machine learning developers worldwide, and can efficiently identify and assemble existing helpful model(s) to solve future tasks in a unified way.

The learnware package provides a fundamental implementation of the central concepts and procedures within the learnware paradigm. Its well-structured design ensures high scalability and facilitates the seamless integration of additional features and techniques in the future.

In addition, the learnware package serves as the engine for the Beimingwu System and can be effectively employed for conducting experiments related to learnware.

[1] Zhi-Hua Zhou. Learnware: on the future of machine learning. Frontiers of Computer Science, 2016, 10(4): 589–590
[2] Zhi-Hua Zhou. Machine Learning: Development and Future. Communications of CCF, 2017, vol.13, no.1 (2016 CNCC keynote)

Learnware Paradigm

A learnware consists of a high-performance machine learning model and specifications that characterize the model, i.e., "Learnware = Model + Specification". These specifications, encompassing both semantic and statistical aspects, detail the model's functionality and statistical information, making it easier for future users to identify and reuse these models.

The learnware paradigm is motivated by persistent challenges in machine learning, such as the need for large amounts of training data and advanced training skills, continual learning, catastrophic forgetting, and data privacy. Although many efforts tackle one of these issues in isolation, the issues are entangled, and solving one may exacerbate others. The learnware paradigm aims to address many of them within a unified framework. Its benefits are listed below.

  • Lack of training data: Strong models can be built from small data by adapting well-performing learnwares.
  • Lack of training skills: Ordinary users can obtain strong models by leveraging well-performing learnwares instead of building models from scratch.
  • Catastrophic forgetting: Accepted learnwares are always stored in the learnware market, retaining old knowledge.
  • Continual learning: The learnware market continually enriches its knowledge through the constant submission of well-performing learnwares.
  • Data privacy/proprietary: Developers submit only models, not data, preserving data privacy and proprietary rights.
  • Unplanned tasks: Open to all legal developers, the learnware market can accommodate helpful learnwares for a wide variety of tasks.
  • Carbon emission: Assembling small models may offer good-enough performance, reducing the incentive to train large models and hence the carbon footprint.

The learnware paradigm consists of two distinct stages:

  • Submitting Stage: Developers voluntarily submit various learnwares to the learnware market, and the system conducts quality checks and further organization of these learnwares.
  • Deploying Stage: When users submit task requirements, the learnware market automatically selects and recommends either a single learnware or a combination of multiple learnwares, together with efficient deployment methods. In either case, the system offers convenient learnware reuse interfaces.

Framework and Infrastructure Design

The architecture follows the design guidelines of decoupling, autonomy, reusability, and scalability. The framework can be viewed from the perspectives of both workflows and modules.

  • At the workflow level, the learnware package covers the Submitting Stage and the Deploying Stage:
      • Submitting Stage: Learnware developers submit learnwares to the learnware market, which conducts usability checks and further organization of these learnwares.
      • Deploying Stage: The learnware package identifies learnwares according to users' task requirements and provides efficient reuse and deployment methods.
  • At the module level, the learnware package is a platform consisting of the Learnware, Market, Specification, Model, Reuse, and Interface modules:
      • Learnware: The core learnware object, consisting of the specification module and the user model module.
      • Market: Designed for learnware organization, identification, and usability testing.
      • Specification: Generates and stores the statistical and semantic information of learnwares, which can be used for learnware search and reuse.
      • Model: Includes the base model and the model container, which provide unified interfaces and automatically create isolated runtime environments.
      • Reuse: Includes the data-free reuser, the data-dependent reuser, and the aligner, which can deploy and reuse learnwares for user tasks.
      • Interface: The interface for network communication with the Beimingwu backend.

Quick Start

Installation

Learnware is currently hosted on PyPI. You can install it with the following command:

pip install learnware

Besides the base classes, many core functionalities of the learnware package, such as learnware specification generation and learnware deployment, rely on the torch library. You can install torch manually, or install the learnware package together with its full dependencies using the following command:

pip install learnware[full]

Note: due to the potential complexity of your local environment, installing learnware[full] does not guarantee that torch will be able to invoke CUDA on your machine.
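
If you rely on GPU acceleration, you can verify the torch installation yourself. The check below uses only the standard PyTorch API and is independent of the learnware package:

import torch

# True only if torch was built with CUDA support and a compatible GPU/driver is available
print(torch.cuda.is_available())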

Prepare Learnware

In the learnware package, each learnware is encapsulated in a zip package, which should contain at least the following four files:

  • learnware.yaml: learnware configuration file.
  • __init__.py: methods for using the model.
  • stat.json: the statistical specification of the learnware. Its filename can be customized and recorded in learnware.yaml.
  • environment.yaml or requirements.txt: specifies the environment for the model.

To facilitate the construction of a learnware, we provide a Learnware Template that users can use as a basis for building their own learnware. We've also detailed the format of the learnware zip package in Learnware Preparation.
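
For reference, here is a minimal sketch of what __init__.py might contain for a tabular scikit-learn model. It assumes the trained model is stored in the zip package as model.pkl (an illustrative file name) and uses illustrative input/output shapes; the exact base-class interface is described in Learnware Preparation.

import os
import pickle

import numpy as np

from learnware.model import BaseModel


class MyModel(BaseModel):
    def __init__(self):
        # input_shape/output_shape are illustrative; set them to match your task
        super(MyModel, self).__init__(input_shape=(37,), output_shape=(1,))
        dir_path = os.path.dirname(os.path.abspath(__file__))
        with open(os.path.join(dir_path, "model.pkl"), "rb") as f:
            self.model = pickle.load(f)

    def fit(self, X: np.ndarray, y: np.ndarray):
        # the submitted model is already trained, so fit can be a no-op
        pass

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.model.predict(X)

    def finetune(self, X: np.ndarray, y: np.ndarray):
        # optional: adapt the model with the user's labeled data
        pass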

Learnware Package Workflow

Users can start a learnware workflow according to the following steps:

Initialize a Learnware Market

The EasyMarket class provides the core functions of a Learnware Market. You can initialize a basic Learnware Market with the id "demo" using the code snippet below:

from learnware.market import instantiate_learnware_market

# instantiate a demo market
demo_market = instantiate_learnware_market(market_id="demo", name="easy", rebuild=True)

Upload Learnware

Before uploading your learnware to the Learnware Market, you'll need to create a semantic specification, semantic_spec. This involves selecting or inputting values for predefined semantic tags to describe the features of your task and model.

For instance, the following code illustrates the semantic specification for a Scikit-Learn type model. This model is tailored for education scenarios and performs classification tasks on tabular data:

from learnware.specification import generate_semantic_spec

semantic_spec = generate_semantic_spec(
    name="demo_learnware",
    data_type="Table",
    task_type="Classification",
    library_type="Scikit-learn",
    scenarios="Education",
    license="MIT",
)

After defining the semantic specification, you can upload your learnware using a single line of code:

demo_market.add_learnware(zip_path, semantic_spec)

Here, zip_path is the file path of your learnware zip package.

Semantic Specification Search

To find learnwares that align with your task's purpose, you'll need to provide a semantic specification, user_semantic, that outlines your task's characteristics. The Learnware Market will then perform an initial search using user_semantic, identifying potentially useful learnwares with models that solve tasks similar to your requirements.

from learnware.market import BaseUserInfo

# construct user_info, which includes a semantic specification
# (user_semantic is the semantic specification describing your task, e.g., built with generate_semantic_spec)
user_info = BaseUserInfo(id="user", semantic_spec=user_semantic)

# search_learnware performs a semantic specification search when user_info doesn't include a statistical specification
search_result = demo_market.search_learnware(user_info)
single_result = search_result.get_single_results()

# single_result: the List of Tuple[Score, Learnware] returned by semantic specification search
print(single_result)

Statistical Specification Search

If you choose to provide your own statistical specification file, stat.json, the Learnware Market can further refine the selection of learnwares from the previous step. This second-stage search leverages statistical information to identify one or more learnwares that are most likely to be beneficial for your task.
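
If you have not yet produced stat.json, a statistical specification can be generated from your raw data. The sketch below is a minimal example, assuming the generate_stat_spec helper described in the documentation and tabular data stored in a NumPy array train_x (a hypothetical variable); the file can be saved under any name, e.g., stat.json or rkme.json:

from learnware.specification import generate_stat_spec

# generate an RKME statistical specification from your raw (unlabeled) tabular data
user_spec = generate_stat_spec(type="table", X=train_x)
user_spec.save("stat.json")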

For example, the code below performs a learnware search using a Reduced Kernel Mean Embedding (RKME) as the statistical specification:

import os

import learnware.specification as specification

user_spec = specification.RKMETableSpecification()

# unzip_path: directory where the learnware zip file was unzipped
user_spec.load(os.path.join(unzip_path, "rkme.json"))
user_info = BaseUserInfo(
    semantic_spec=user_semantic, stat_info={"RKMETableSpecification": user_spec}
)
search_result = demo_market.search_learnware(user_info)

single_result = search_result.get_single_results()
multiple_result = search_result.get_multiple_results()

# search_item.score: based on MMD distances, sorted in descending order
# search_item.learnware.id: id of learnwares, sorted by scores in descending order
for search_item in single_result:
    print(f"score: {search_item.score}, learnware_id: {search_item.learnware.id}")

# mixture_item.learnwares: collection of learnwares whose combined use is beneficial
# mixture_item.score: score assigned to the combined set of learnwares in `mixture_item.learnwares`
for mixture_item in multiple_result:
    print(f"mixture_score: {mixture_item.score}\n")
    mixture_id = " ".join([learnware.id for learnware in mixture_item.learnwares])
    print(f"mixture_learnware: {mixture_id}\n")

Reuse Learnwares

With the list of learnwares returned from the previous step (e.g., mixture_item.learnwares), you can readily apply them to make predictions on your own data, bypassing the need to train a model from scratch. We provide two data-free methods for reusing a given list of learnwares: JobSelectorReuser and AveragingReuser. Substitute test_x in the code snippet below with your testing data, and you're all set to reuse learnwares:

from learnware.reuse import JobSelectorReuser, AveragingReuser

# using jobselector reuser to reuse the searched learnwares to make prediction
reuse_job_selector = JobSelectorReuser(learnware_list=mixture_item.learnwares)
job_selector_predict_y = reuse_job_selector.predict(user_data=test_x)

# using averaging ensemble reuser to reuse the searched learnwares to make prediction
reuse_ensemble = AveragingReuser(learnware_list=mixture_item.learnwares)
ensemble_predict_y = reuse_ensemble.predict(user_data=test_x)

We also provide two data-dependent methods for reusing a given list of learnwares when the user has labeled data: EnsemblePruningReuser and FeatureAugmentReuser. Substitute test_x in the code snippet below with your testing data, substitute train_X, train_y with your labeled training data, and you're all set to reuse learnwares:

from learnware.reuse import EnsemblePruningReuser, FeatureAugmentReuser

# Use ensemble pruning reuser to reuse the searched learnwares to make prediction
reuse_ensemble = EnsemblePruningReuser(learnware_list=mixture_item.learnwares, mode="classification")
reuse_ensemble.fit(train_X, train_y)
ensemble_pruning_predict_y = reuse_ensemble.predict(user_data=test_x)

# Use feature augment reuser to reuse the searched learnwares to make prediction
reuse_feature_augment = FeatureAugmentReuser(learnware_list=mixture_item.learnwares, mode="classification")
reuse_feature_augment.fit(train_X, train_y)
feature_augment_predict_y = reuse_feature_augment.predict(user_data=test_x)
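
To gauge how well the reused learnwares fit your task, you can compare the predictions against held-out labels. The snippet below is a small sketch using scikit-learn, where test_y is a hypothetical array of ground-truth labels for test_x:

from sklearn.metrics import accuracy_score

# test_y: ground-truth labels corresponding to test_x (provided by you)
print("ensemble pruning accuracy:", accuracy_score(test_y, ensemble_pruning_predict_y))
print("feature augment accuracy:", accuracy_score(test_y, feature_augment_predict_y))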

Auto Workflow Example

The learnware package also offers automated workflow examples, covering the preparation of learnwares, uploading and deleting learnwares from the market, and searching for learnwares with both semantic and statistical specifications. To experience the basic workflow of the learnware package, users can run test/test_workflow/test_workflow.py.

Experiments and Examples

Environment