Table: homo+hetero
====================

Datasets
------------------
Our study involves three public datasets in the sales forecasting field: `Predict Future Sales (PFS) <https://www.kaggle.com/c/competitive-data-science-predict-future-sales/data>`_,
`M5 Forecasting (M5) <https://www.kaggle.com/competitions/m5-forecasting-accuracy/data>`_ and `Corporacion <https://www.kaggle.com/competitions/favorita-grocery-sales-forecasting/data>`_.
We applied various pre-processing methods to these datasets to enhance the richness of the data.
After pre-processing, we first divided each dataset by store and then split each store's data chronologically into training and test sets (a minimal splitting sketch follows the list). Specifically:

- For PFS, the test set consists of the last month of data from each store.
- For M5, the test set consists of the final 28 days of data from each store.
- For Corporacion, the test set consists of the last 16 days of data from each store.
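
The per-store split is a plain chronological cut. Below is a minimal sketch of how such a split could be done with pandas; the column names (``store_id``, ``date``) are illustrative assumptions, not the datasets' actual schemas.

.. code-block:: python

    import pandas as pd

    def split_by_store(df: pd.DataFrame, test_days: int):
        """Split each store's rows chronologically: the last `test_days`
        days form the test set, everything earlier the training set."""
        train_parts, test_parts = {}, {}
        for store_id, store_df in df.groupby("store_id"):
            cutoff = store_df["date"].max() - pd.Timedelta(days=test_days)
            train_parts[store_id] = store_df[store_df["date"] <= cutoff]
            test_parts[store_id] = store_df[store_df["date"] > cutoff]
        return train_parts, test_parts

    # e.g., for Corporacion the last 16 days of each store are held out:
    # train_sets, test_sets = split_by_store(corporacion_df, test_days=16)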

Results
----------------
In the submitting stage, the 55 stores in the Corporacion dataset are treated as 165 uploaders: each store contributes three uploaders, one per feature engineering method.
For the PFS dataset, 100 uploaders are established, each using one of two feature engineering approaches.
These uploaders then use their respective stores' training data to develop LightGBM models.
As a result, the learnware market comprises 265 learnwares, spanning five types of feature spaces and two types of label spaces.
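
As a rough illustration of what each uploader does, the sketch below trains a LightGBM regressor on one store's training data. The helper ``feature_fn`` stands for one of the feature engineering methods above and is a hypothetical placeholder.

.. code-block:: python

    import lightgbm as lgb

    def train_uploader_model(train_df, feature_fn):
        """Fit one uploader's model on its store's training data after
        applying the uploader's feature engineering method."""
        X, y = feature_fn(train_df)   # hypothetical feature engineering step
        model = lgb.LGBMRegressor()
        model.fit(X, y)
        return model                  # later packaged as a learnware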

Based on the specific design of user tasks, our experiments fall into two types:

- ``homogeneous experiments`` evaluate performance when users can reuse learnwares in the market that share the feature space of their tasks (homogeneous learnwares).
  This demonstrates the effectiveness of reusing learnwares that align closely with the user's specific requirements.

- ``heterogeneous experiments`` evaluate the performance of identifying and reusing helpful heterogeneous learnwares when
  no available learnware matches the feature space of the user's task. This highlights the potential of learnwares for applications beyond their original purpose.

Homo Experiments
-----------------------

For homogeneous experiments, the 55 stores in the Corporacion dataset act as 55 users, each applying one feature engineering method
and using the test data from their respective store as user data. These users can then search the market for homogeneous learnwares whose feature spaces match their tasks.

The mean squared error (MSE) of search and reuse is presented in the table below:

=================== ====================== ================= ==================
Top-1 Reuse Average Ensemble Reuse Best in Market Average in Market
=================== ====================== ================= ==================
0.280 +/- 0.090 0.267 +/- 0.051 0.151 +/- 0.046 0.331 +/- 0.040
=================== ====================== ================= ==================
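
Here ``Top-1 Reuse`` deploys the single learnware ranked highest by the search, while ``Average Ensemble Reuse`` averages the predictions of several searched learnwares. A minimal sketch of the averaging idea (plain NumPy, not necessarily the package's actual reuser implementation):

.. code-block:: python

    import numpy as np

    def average_ensemble_predict(learnwares, X_user):
        """Average the predictions of the searched learnwares on user data."""
        preds = [lw.predict(X_user) for lw in learnwares]
        return np.mean(preds, axis=0)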

When users have both test data and a small amount of training data derived from their original data, reusing one or several searched learnwares from the market can often yield
better results than training a model from scratch on the limited training data. We present MSE curves for the user's self-trained model and for
a multiple-learnware reuse method, Ensemble Pruning, measured on the user's test data as the amount of labeled training data increases (a sketch of this evaluation loop follows the figure).
The average results across 55 users are depicted in the figure below:

.. image:: ../_static/img/table_homo_labeled.png
:width: 300
:height: 200
:alt: Table Homo Limited Labeled Data

From the figure, it is evident that when users have limited training data, reusing multiple table learnwares outperforms the user's own model.
As the user's training data grows, however, we anticipate the user's own model will eventually overtake learnware reuse.
This highlights how learnware reuse can substantially reduce the need for extensive training data and deliver better results when the user's labeled data is limited.
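
The comparison underlying these curves can be sketched as follows; the user-model type (LightGBM) and the budget grid are illustrative assumptions.

.. code-block:: python

    import numpy as np
    import lightgbm as lgb
    from sklearn.metrics import mean_squared_error

    def labeled_data_curve(X_train, y_train, X_test, y_test, budgets):
        """For each labeled-data budget, train the user's own model on that
        many labeled samples and record its MSE on the user's test data."""
        curve = []
        for n in budgets:
            model = lgb.LGBMRegressor()
            model.fit(X_train[:n], y_train[:n])
            curve.append(mean_squared_error(y_test, model.predict(X_test)))
        return np.asarray(curve)

    # e.g., budgets = [50, 100, 200, 500]; the learnware reuse methods are
    # evaluated on the same test sets to draw the comparison curves.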


Hetero Experiments
-------------------------

In heterogeneous experiments, the learnware market recommends helpful heterogeneous learnwares whose feature spaces differ from
those of the user tasks. Based on whether the market contains learnwares handling tasks similar to the user's, the experiments are further subdivided into the following two types:

Cross Feature Engineering Experiments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We designate the 41 stores in the PFS dataset as users, creating their user data with an alternative feature engineering approach that differs from the methods employed by learnwares in the market.
Consequently, while the market's learnwares from the PFS dataset address tasks very similar to our users', the feature spaces do not match exactly. In this experimental configuration,
we tested various heterogeneous learnware reuse methods (without using the user's labeled data) and compared them to the user's self-trained model based on a small amount of training data.
The average MSE performance across 41 users is as follows:

+------------------------------------+---------------------+
| Mean in Market (Single) | 1.459 +/- 1.066 |
+------------------------------------+---------------------+
| Best in Market (Single) | 1.226 +/- 1.032 |
+------------------------------------+---------------------+
| Top-1 Reuse (Single) | 1.407 +/- 1.061 |
+------------------------------------+---------------------+
| Average Ensemble Reuse (Multiple) | 1.312 +/- 1.099 |
+------------------------------------+---------------------+
| User model with 50 labeled data | 1.267 +/- 1.055 |
+------------------------------------+---------------------+
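
For reference, ``Mean in Market (Single)`` and ``Best in Market (Single)`` summarize how individual learnwares perform on the user task. A minimal sketch of these two baselines, assuming a hypothetical helper ``mse_fn(learnware, X, y)`` that evaluates one learnware:

.. code-block:: python

    import numpy as np

    def market_baselines(learnwares, X_user, y_user, mse_fn):
        """Evaluate every single learnware on the user task and report the
        market-wide mean MSE and the best (lowest) MSE as baselines."""
        scores = np.array([mse_fn(lw, X_user, y_user) for lw in learnwares])
        return scores.mean(), scores.min()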

From the results, it is noticeable that the learnware market still performs quite well even when users lack labeled data,
provided it contains learnwares addressing tasks that are similar, though not identical, to the user's.
In such cases, the market's effectiveness can match scenarios where users have access to a limited quantity of labeled data.

Cross Task Experiments
^^^^^^^^^^^^^^^^^^^^^^^

Here we choose the 10 stores from the M5 dataset to act as users. Although the broad task of sales forecasting is similar to the tasks addressed by the learnwares in the market,
no available learnware directly caters to the M5 sales forecasting requirements: all learnwares differ from the M5 users' tasks in both feature and label spaces.
We present RMSE curves for the user's self-trained model and several learnware reuse methods,
measured on the user's test data as the amount of labeled training data increases.
The average results across 10 users are depicted in the figure below:

.. image:: ../_static/img/table_hetero_labeled.png
:width: 300
:height: 200
:alt: Table Hetero Limited Labeled Data

We can observe that heterogeneous learnwares are beneficial when only a limited amount of the user's labeled training data is available,
helping the user achieve better alignment with their specific task. This underscores the potential of learnwares to be applied to tasks beyond their original purpose.

Text Experiment
====================