MEL: Efficient Multi-Task Evolutionary Learning for High-Dimensional Feature Selection

Xubin Wang¹² · Haojiong Shangguan¹ · Fengyi Huang¹ · Shangrui Wu¹ · Weijia Jia^23*

¹Hong Kong Baptist University · ²Beijing Normal University · ³BNU-HKBU United International College

^*corresponding authors

Feature selection is a crucial step in data mining to enhance model performance by reducing data dimensionality. However, the increasing dimensionality of collected data exacerbates the challenge known as the "curse of dimensionality", where computation grows exponentially with the number of dimensions. To tackle this issue, evolutionary computational (EC) approaches have gained popularity due to their simplicity and applicability. Unfortunately, the diverse designs of EC methods result in varying abilities to handle different data, often underutilizing and not sharing information effectively. In this paper, we propose a novel approach called PSO-based Multi-task Evolutionary Learning (MEL) that leverages multi-task learning to address these challenges. By incorporating information sharing between different feature selection tasks, MEL achieves enhanced learning ability and efficiency. We evaluate the effectiveness of MEL through extensive experiments on 22 high-dimensional datasets. Comparing against 24 EC approaches, our method exhibits strong competitiveness.

Framework

A schematic diagram of the proposed MEL method. The parent population is divided into two subpopulations: Sub1 learns the feature importance during evolution, and its search is affected by Sub2 best. Sub2 also learns the importance of features during evolution, and searches for the optimal feature subset based on the results learned from Sub1 and Sub2. In particular, features with higher weights have a higher probability of being selected.

Baselines

To illustrate the performance of our method, 18 evolutionary computation algorithms were employed. Specifically, we selected four swarm-based methods (inspired by mutual behavior of swarm creatures), four nature-inspired methods (inspired by natural system), two evolutionary algorithms (inspired by natural selection concepts), four bio-stimulated methods (inspired by the foraging and hunting behavior in the wild) and four physics-based methods (inspired by physical rules). These meta-heuristic optimization methods basically represent the most classical, representative, and widely used methods in the field.

Additionally, we have incorporated six evolutionary methods for feature selection published in recent four years (SaWDE, FWPSO, VGS-MOEA, MTPSO, PSO-EMT and DENCA. These methods represent the cutting-edge research in the field and offer promising solutions to the challenges at hand. By incorporating these state-of-the-art evolutionary algorithms, we aim to enhance the effectiveness and robustness of our proposed framework.

Dependencies

This project was developped with MATLAB 2019b and MATLAB 2023b. Early versions of MATLAB may have incompatibilities.

Instructions

MEL.m (This is the main file of the proposed model)
- You can replace your data in the Problem. For example:
  - Problem = {'The_name_of_your_own_data'};
- How to load your own data?
```
  traindata = load(['C:\Users\c\Desktop\MEL\data\',p_name]);
  traindata = getfield(traindata, p_name);
  data = traindata;
  feat = data(:,1:end-1); 
  label = data(:,end);
```
- You can set the number of iterations of the whole experiment through numRun
- The parameters of PSO algorithm can be replaced in:
  - opts.k = 3; % number of k in K-nearest neighbor
  - opts.N = 20; % number of solutions
  - opts.T = 100; % maximum number of iterations
  - opts.thres = 0.6; % threshold

To reproduce our experiments, you can run MEL.m ten times and take the average of the results.

Results

We evaluate the effectiveness of our method using three metrics: accuracy, feature subset size, and algorithm running time. Extensive experiments on 12 high-dimensional genetic datasets showed that MEL can effectively improve classification accuracy while obtaining a small feature subset. MEL also showed highly competitive running time compared to 18 state-of-the-art meta-heuristic optimization algorithms and five recently published evolutionary feature selection methods. Additionally, we provided further experiments on a separate set of 10 larger sample size datasets, comparing MEL against five representative algorithms. The results demonstrated MEL's superior overall performance in classification metrics, validating its effectiveness on high-dimensional data with both few and many samples.

Contact

wangxb19 at mails.jlu.edu.cn

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
Baselines_Codes_and_Results		Baselines_Codes_and_Results
Datasets		Datasets
Figures		Figures
MEL_Methods		MEL_Methods
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MEL: Efficient Multi-Task Evolutionary Learning for High-Dimensional Feature Selection

Contents

Overview

Framework

Baselines

Dependencies

Instructions

Results

Contact

About

Releases

Packages

Languages

eltociear/MEL

Folders and files

Latest commit

History

Repository files navigation

MEL: Efficient Multi-Task Evolutionary Learning for High-Dimensional Feature Selection

Contents

Overview

Framework

Baselines

Dependencies

Instructions

Results

Contact

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages