
HSC_VAD

This repo is the official PyTorch implementation of the paper:

"Hierarchical Semantic Contrast for Scene-aware Video Anomaly Detection"

Overview

[Figure: overall pipeline]

Requirements

Please install the dependencies listed in requirements.txt.

Video Parsing

Object Tracking

You can use FairMOT (MCMOT) to obtain the bounding boxes of objects.
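
FairMOT saves its tracking results in the MOTChallenge text format (frame, track id, box left/top/width/height, score, ...). As a minimal sketch (the file name results.txt is an assumption; adapt it to your output path), you can group the boxes by track id before cropping objects:

from collections import defaultdict

def load_tracks(result_path="results.txt"):
    """Group bounding boxes by track id: {track_id: [(frame, x, y, w, h), ...]}."""
    tracks = defaultdict(list)
    with open(result_path) as f:
        for line in f:
            fields = line.strip().split(',')
            frame, track_id = int(fields[0]), int(fields[1])
            x, y, w, h = map(float, fields[2:6])
            tracks[track_id].append((frame, x, y, w, h))
    return tracks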

Appearance Branch

We use a pre-trained ViT to extract the appearance features of objects; you can refer to the implementation of PyTorch-Pretrained-ViT.
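
As a minimal sketch of the appearance-feature extraction, assuming the lukemelas PyTorch-Pretrained-ViT package (where the classification head is exposed as model.fc; the helper appearance_feature below is ours for illustration):

import torch
from torchvision import transforms
from pytorch_pretrained_vit import ViT

model = ViT('B_16_imagenet1k', pretrained=True).eval()
model.fc = torch.nn.Identity()  # return the class-token embedding instead of logits

preprocess = transforms.Compose([
    transforms.Resize((384, 384)),  # B_16_imagenet1k expects 384x384 inputs
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),
])

def appearance_feature(object_crop):
    # object_crop: a PIL image cropped with a tracked bounding box
    x = preprocess(object_crop).unsqueeze(0)
    with torch.no_grad():
        return model(x).squeeze(0)  # 768-dim for ViT-B/16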

Motion Branch

We use PoseConv3D to extract motion features, and the following Python script obtains the 512-dimensional motion features.

import torch

# Run the recognizer and grab the backbone feature map (mmaction2 API)
results, feats = inference_recognizer(model, fake_anno, outputs=['backbone'])
backbone_feats = feats['backbone']
# Global average pooling over space-time, then average over the batch dimension
avg_pool = torch.nn.AdaptiveAvgPool3d(1)
pool_feats = torch.mean(avg_pool(backbone_feats), dim=0).view([-1])  # 512-dim

Scene Branch

We use DeepLabV3+ to generate a segmentation map, and you can employ this backbone to obtain the scene features. To extract scene features with this backbone, modify utils.py and add the following Python script.

features = self.backbone(x)
x = self.classifier(features)          # per-class score map
maxpool2d = nn.MaxPool2d(4)
feat = maxpool2d(x)                    # downsample spatially
feat, ind = feat.max(1)                # max score and class index per location
feat[ind > 10] = 0                     # filter out the foreground-object classes
feat = feat.view([-1])
feat = F.normalize(feat, p=2, dim=-1)  # L2-normalized scene feature

Then you can use t-SNE (with PCA initialization) and DBSCAN to obtain the pseudo-labels of the scene features. Take ShanghaiTech as an example:

from sklearn.manifold import TSNE
from sklearn.cluster import DBSCAN

# Reduce to 2-D with t-SNE (PCA-initialized), then cluster into scenes
dim_processor = TSNE(n_components=2, init="pca", random_state=0)
scene_features = dim_processor.fit_transform(scene_features)
label = DBSCAN(eps=4.63, min_samples=5).fit_predict(scene_features)
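
Note that DBSCAN assigns the label -1 to noise samples; it can help to check the resulting cluster structure before using the ids as pseudo-labels:

import numpy as np

n_clusters = len(set(label)) - (1 if -1 in label else 0)
n_noise = int(np.sum(label == -1))
print(f"{n_clusters} scene clusters, {n_noise} noise samples")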

Extracted Features

  • You can download the extracted features from the links below:

ShanghaiTech (code:9LX7)

CUHK Avenue (code:6F9A)

UCSD Ped2 (code:65K7)

Mixture datasets (code:SW85)

Motion Augmentation (Optional)

If you would like to utilize the motion (skeleton) augmentation, you can use the following code; otherwise, skip this step. First, define the rotation angles and related points.

n_frames = 16       # 16 frames in a clip
cutting_prob = 0.5  # probability of cutting (zeroing) a frame
rotate_prob = 0.5   # probability of rotating a joint subtree

# Keypoint indices follow the 17-point COCO layout (5/6 shoulders, 7/8 elbows,
# 9/10 wrists, 11/12 hips, 13/14 knees, 15/16 ankles).
angle_range = [[], [], [], [], [], [-10, 10],                      # 0-5
               [-10, 10], [-90, 90], [-90, 90], [0, 90], [0, 90],  # 6-10
               [-10, 10], [-10, 10], [-90, 90], [-90, 90], [-90, 0], [-90, 0]]  # 11-16

related_point = [[], [], [], [], [], [5, 7, 9],           # 0-5
                 [6, 8, 10], [7, 9], [8, 10], [9], [10],  # 6-10
                 [11, 13, 15], [12, 14, 16], [13, 15], [14, 16], [15], [16]]  # 11-16

ancestor_point = [[], [], [], [], [], 3,  # 0-5
                  4, 5, 6, 7, 8,          # 6-10
                  5, 6, 11, 12, 13, 14]   # 11-16

Define the transformation functions:

import math
import random
import numpy as np

def rotate_point(keypoints, related_point_i, ancestor_point_i, angle):
    # Rotate one joint around its ancestor joint by `angle` degrees.
    rotate_mat = np.array([[np.cos(math.radians(angle)), np.sin(math.radians(angle))],
                           [-np.sin(math.radians(angle)), np.cos(math.radians(angle))]])
    keypoints[:, :, related_point_i, :] = np.matmul(
        keypoints[:, :, related_point_i, :] - keypoints[:, :, ancestor_point_i, :],
        rotate_mat) + keypoints[:, :, ancestor_point_i, :]

def spatial_transformation(keypoints):
    # Randomly rotate each limb subtree (joints 5-16) around its ancestor joint.
    for point_i in range(5, 17):
        if np.random.rand() < rotate_prob:  # apply with probability rotate_prob
            angle = random.uniform(angle_range[point_i][0], angle_range[point_i][1])
            for related_point_i in related_point[point_i]:
                rotate_point(keypoints, related_point_i, ancestor_point[point_i], angle)
    return keypoints

def temporal_cutting(keypoints):
    # Randomly zero out the keypoints of whole frames.
    for frame_i in range(n_frames):
        if np.random.rand() < cutting_prob:  # apply with probability cutting_prob
            keypoints[0, frame_i, :, :] = 0
    return keypoints

Combining these with PoseConv3D, you can use the following Python script:

from copy import deepcopy

for aug_i in range(4):
    fake_anno_copy = deepcopy(anno_raw)  # anno_raw is obtained by PoseConv3D
    keypoints = fake_anno_copy['keypoint']
    if aug_i == 0:
        keypoints = spatial_transformation(keypoints)
    elif aug_i == 1:
        keypoints = temporal_cutting(keypoints)
    elif aug_i == 2:
        keypoints = temporal_cutting(spatial_transformation(keypoints))
    elif aug_i == 3:
        keypoints = spatial_transformation(temporal_cutting(keypoints))
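
You can then extract a motion feature from each augmented copy by reusing the inference call from the Motion Branch section, for example (a sketch; continue inside the aug_i loop, with model and avg_pool defined as above):

    fake_anno_copy['keypoint'] = keypoints
    results, feats = inference_recognizer(model, fake_anno_copy, outputs=['backbone'])
    pool_feats = torch.mean(avg_pool(feats['backbone']), dim=0).view([-1])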

Motion Classifier Training

You can train the motion binary classifier with motion_classifier_training.py.

For ShanghaiTech:

python motion_classifier_training.py --dataset ShanghaiTech --train_dataset_path ShanghaiTech_MA_training_dataset.npy --batch_size 512

For CUHK_Avenue:

python motion_classifier_training.py --dataset Avenue --train_dataset_path Avenue_MA_training_dataset.npy --batch_size 512

  • You can download the motion augmentation training datasets from the links below:

ShanghaiTech (code:37JE)

CUHK Avenue (code:T5Y6)

Training

Check the configurations of datasets in config/, e.g. ShanghaiTech.yaml

For ShanghaiTech:

python HSC_training.py --dataset ShanghaiTech --gpu 0

For CUHK_Avenue:

python HSC_training.py --dataset Avenue --gpu 0

For UCSD_ped2:

python HSC_training.py --dataset UCSD_ped2 --gpu 0

For the Mixture datasets, taking 01+02 as an example:

python HSC_training.py --dataset Mixture0102 --gpu 0

Inference

Check the configurations of datasets in config/, e.g. ShanghaiTech.yaml

If you would like to use the motion augmentation mode, set augmentMotion to true in the YAML file.
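
For example (an excerpt; the exact placement of the key in the file may differ):

# config/ShanghaiTech.yaml
augmentMotion: true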

For ShanghaiTech:

python HSC_testing.py --dataset ShanghaiTech --gpu 0

For Avenue:

python HSC_testing.py --dataset Avenue --gpu 0

For UCSD_ped2:

python HSC_testing.py --dataset UCSD_ped2 --gpu 0

For the Mixture datasets, taking 01+02 as an example:

python HSC_testing.py --dataset Mixture0102 --gpu 0

  • You can download the checkpoint models from the links below:

ShanghaiTech (code:YF69)

CUHK Avenue (code:H8W5)

UCSD Ped2 (code:5WDC)

Mixture datasets (code:S8V6)

  • You can download the generated augmentation scores from the links below:

ShanghaiTech (code:F4W6)

CUHK Avenue (code:547K)

Results

Frame-level AUC on the three benchmarks:

Model       UCSD Ped2   CUHK Avenue   ShanghaiTech
HSC         98.1%       92.4%         83.0%
HSC w/ MA   98.1%*      93.7%         83.4%

*: We only use the appearance branch on UCSD Ped2.

License

This repo is released under the MIT License.

Citation

If this repo is useful for your research, please consider citing our paper:

@InProceedings{sun2023hierarchical,
    author    = {Sun, Shengyang and Gong, Xiaojin},
    title     = {Hierarchical Semantic Contrast for Scene-Aware Video Anomaly Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {22846-22856}
}

Acknowledgements

The video parsing backbones of this repo are based on FairMOT (MCMOT), ViT, PoseConv3D, and DeepLabV3+; we sincerely thank the authors for their contributions.
