
Add example which implements YOLO object detection #576

Merged · 3 commits · Oct 30, 2023

Conversation

rgerganov
Collaborator

This PR implements yolov3-tiny from https://github.com/pjreddie/darknet/. It is still WIP but most of the work is done.
I had to make two changes to ggml for this:

  • add a leaky ReLU activation
  • add padding support in ggml_pool_2d(); this one is a bit awkward because it has to support an odd number of padding elements; I changed the types of p0 and p1 to float to keep the current semantics, and then use p = 0.5 when a single padding element is needed

@rgerganov
Collaborator Author

I have completed the implementation of this example and it is ready for review. Here are the results for the default image (dog.jpg):

$ ./yolov3-tiny -m yolov3-tiny.gguf -i dog.jpg        
Layer  0 output shape:  416 x 416 x   16 x   1
Layer  1 output shape:  208 x 208 x   16 x   1
Layer  2 output shape:  208 x 208 x   32 x   1
Layer  3 output shape:  104 x 104 x   32 x   1
Layer  4 output shape:  104 x 104 x   64 x   1
Layer  5 output shape:   52 x  52 x   64 x   1
Layer  6 output shape:   52 x  52 x  128 x   1
Layer  7 output shape:   26 x  26 x  128 x   1
Layer  8 output shape:   26 x  26 x  256 x   1
Layer  9 output shape:   13 x  13 x  256 x   1
Layer 10 output shape:   13 x  13 x  512 x   1
Layer 11 output shape:   13 x  13 x  512 x   1
Layer 12 output shape:   13 x  13 x 1024 x   1
Layer 13 output shape:   13 x  13 x  256 x   1
Layer 14 output shape:   13 x  13 x  512 x   1
Layer 15 output shape:   13 x  13 x  255 x   1
Layer 18 output shape:   13 x  13 x  128 x   1
Layer 19 output shape:   26 x  26 x  128 x   1
Layer 20 output shape:   26 x  26 x  384 x   1
Layer 21 output shape:   26 x  26 x  256 x   1
Layer 22 output shape:   26 x  26 x  255 x   1
dog: 57%
car: 52%
truck: 56%
car: 62%
bicycle: 59%
Detected objects saved in 'predictions.jpg' (time: 0.595000 sec.)

(image: predictions.jpg, the annotated detections)

I am using ggml to compute the output of all layers except the YOLO layers (you can find the model architecture in yolov3-tiny.cfg). The output of the YOLO layers is computed with the apply_yolo() function. At the end, the detected objects are extracted from the output of the YOLO layers.

As you can see, yolov3-tiny is quite fast but not very accurate. However, the same approach can be applied to infer more sophisticated YOLO models such as v4, v5 and v7.

@rgerganov rgerganov marked this pull request as ready for review October 17, 2023 12:17
@ggerganov (Owner) left a comment

Cool 😄

Would be nice to add a test in ci/run.sh

Review comments (resolved): examples/yolo/yolo_image.h, examples/yolo/yolo_image.cpp
Comment on lines +137 to +139
result = ggml_sub(ctx, result, ggml_repeat(ctx, layer.rolling_mean, result));
result = ggml_div(ctx, result, ggml_sqrt(ctx, ggml_repeat(ctx, layer.rolling_variance, result)));
result = ggml_mul(ctx, result, ggml_repeat(ctx, layer.scales, result));
@ggerganov (Owner)

These ggml_repeat calls should be avoidable via implicit broadcast. ggml_mul already supports broadcast, so it might be a good idea to add the same for ggml_sub and ggml_div. For now, we can implement it just on the CPU and GGML_ASSERT in the GPU backends when broadcast is necessary but not yet implemented.

@rgerganov (Collaborator, Author)

ggml_mul has only partial broadcast support: it expects an equal number of elements in the first dimension, which is not the case here. I may try to address this in a follow-up patch.

Review comments (resolved): examples/yolo/yolov3-tiny.cpp (4 threads)
Review comment: include/ggml/ggml.h
@rgerganov
Collaborator Author

I have addressed the comments and added a CI test. I also realized that I don't need to create a second computation graph and the code now runs much faster:

./yolov3-tiny -m yolov3-tiny.gguf -i dog.jpg
...
dog: 57%
car: 52%
truck: 56%
car: 62%
bicycle: 59%
Detected objects saved in 'predictions.jpg' (time: 0.360000 sec.)

@ggerganov ggerganov merged commit 05ff36f into ggerganov:master Oct 30, 2023
4 checks passed
@FSSRepo
Collaborator

FSSRepo commented Dec 4, 2023

@rgerganov @ggerganov Hello guys, I was working on implementing an upscaler in stable-diffusion.cpp, but it requires the LeakyReLU activation function with a negative_slope parameter of 0.2. While reviewing ggml for a similar function, I came across ggml_leaky, but it doesn't take any parameter for the slope. Upon further inspection, it seems the two function in the same way, the only difference being that ggml_leaky does not use 'min'.

According to this implementation, YOLOv3 uses LeakyReLU with a negative_slope of 0.1. The architecture I am implementing requires a negative slope of 0.2. My question is: should I extend the existing function or create a new one?

inline static void ggml_vec_leaky_f32(const int n, float * y, const float * x) {
    for (int i = 0; i < n; ++i) y[i] = (x[i] > 0.f) ? x[i] : 0.1f*x[i];
}

inline static void ggml_vec_leaky_relu_f32(const int n, float * y, const float * x, const float ns) {
    for (int i = 0; i < n; ++i) y[i] = ((x[i] > 0.f) ? x[i] : 0.f) + ns * ((x[i] < 0.0f) ? x[i] : 0.f);
}

// ggml_leaky

struct ggml_tensor * ggml_leaky(
        struct ggml_context * ctx,
        struct ggml_tensor  * a) {
    return ggml_unary(ctx, a, GGML_UNARY_OP_LEAKY);
}

struct ggml_tensor * ggml_leaky_relu(
        struct ggml_context * ctx,
        struct ggml_tensor  * a, float negative_slope, bool inplace) {
    bool is_node = false;

    if (!inplace && (a->grad)) {
        is_node = true;
    }

    struct ggml_tensor * result = inplace ? ggml_view_tensor(ctx, a) : ggml_dup_tensor(ctx, a);

    ggml_set_op_params_i32(result, 0, (int32_t) GGML_UNARY_OP_LEAKY_RELU);
    ggml_set_op_params_i32(result, 1, (int32_t) (negative_slope * 100.0f));

    result->op   = GGML_OP_UNARY;
    result->grad = is_node ? ggml_dup_tensor(ctx, result) : NULL;
    result->src[0] = a;

    return result; // note: the return statement was missing in the original snippet
}

I also think the name should be clearer. It never occurred to me that ggml_leaky was an activation function; it should be renamed to ggml_leaky_relu to emphasize its use.

@ggerganov
Owner

In PyTorch it is called LeakyReLU, so I think we should rename ggml_leaky -> ggml_leaky_relu and add a float negative_slope argument as you proposed. There is probably no need to keep the ggml_leaky overload. Also rename GGML_UNARY_OP_LEAKY -> GGML_UNARY_OP_LEAKY_RELU.

CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this pull request Dec 18, 2023
Fixes building for x86 processors missing F16C featureset
MSVC not included, as in MSVC F16C is implied with AVX2/AVX512