Method for face detection #47
Hey, I took a look at your code. I think it is fine, but we would need to work a bit to properly integrate it with Cozmo.

First, I think the image processing should be done in a separate thread. This means that the tracking would need its own thread class, with methods to add new frames to the loop as they come from Cozmo. We should also think about whether we want to process every frame or only the ones we have time for (depending on the device, processing a frame might take some time).

We should also have a few classes to define and store the properties of each visible object, and add functionality to store and retrieve positions and other information about visible objects (and maybe about those that went out of the field of view, for a more advanced version). This should of course be independent of the implementation of YOLO or other image recognition methods, so we can easily update the algorithms we use or combine them (have something like YOLO for faces and something else for the cubes).

You could start implementing this slowly and create PRs with each part. I am interested in working on this too (I was planning to work on cube recognition in the past), so I could start implementing some of the basic architecture if you'd like some help, and then we can see how it goes as we add the code.
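The threaded tracker described above could be sketched roughly like this. All names (`FrameTracker`, `add_frame`, etc.) are hypothetical placeholders, not existing pycozmo API; the single-slot queue is one way to implement the "process only the frames we have time for" idea by always keeping just the newest pending frame.

```python
# Hypothetical sketch of a tracker running in its own thread; not pycozmo API.
import queue
import threading


class FrameTracker(threading.Thread):
    """Runs detection in its own thread so camera frames never block the caller."""

    def __init__(self):
        super().__init__(daemon=True)
        # A queue of size 1 means we only ever keep the newest pending frame.
        self._frames = queue.Queue(maxsize=1)
        self._stop_event = threading.Event()
        self.results = []

    def add_frame(self, frame):
        """Called from the camera callback; drops the stale frame if we are busy."""
        try:
            self._frames.put_nowait(frame)
        except queue.Full:
            try:
                self._frames.get_nowait()   # discard the unprocessed frame
            except queue.Empty:
                pass
            self._frames.put_nowait(frame)

    def run(self):
        while not self._stop_event.is_set():
            try:
                frame = self._frames.get(timeout=0.1)
            except queue.Empty:
                continue
            self.results = self._detect(frame)

    def _detect(self, frame):
        # Placeholder for YOLO or any other detection algorithm.
        return []

    def stop(self):
        self._stop_event.set()
```

With this shape, swapping YOLO for another detector only means replacing `_detect`, which keeps the threading concerns separate from the recognition algorithm, as suggested above.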
I think even before talking about threads, we should first split the code into separate classes.

I like your idea of creating different classes to define and store the properties associated with each detected/tracked object. It makes them easier and more flexible to use from a script, so I am all for this. Might I suggest defining an interface, so that other classes can handle those objects in a more generic manner?

At a more abstract level, you seem to imply that YOLO would only be used to detect faces. However, my intention was rather for YOLO to be THE algorithm/mechanism used for all sorts of object detection. At the moment it has only been trained to detect faces and hands, but using the Open Images dataset it is possible to train the network to detect some 600 categories of objects. If I remember correctly, this kind of feature is on the project's roadmap as well.

Finally, I am not against some help implementing this whole thing (that is the reason for opening this issue in the first place), especially if it will serve to detect different categories of objects. But before that, we should define what the final architecture will look like, so that we are not just coding in the dark.
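The interface suggested above could look something like the following sketch. The class names (`TrackedObject`, `Face`, `Cube`) and fields are assumptions for illustration only; the point is that anything handling detections only needs to know the common base type.

```python
# Hypothetical common interface for detected/tracked objects; names are illustrative.
import time
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class TrackedObject:
    """Base type so client code can handle any detected object generically."""
    label: str
    confidence: float
    bounding_box: Tuple[int, int, int, int]   # (x, y, width, height) in pixels
    last_seen: float = field(default_factory=time.time)

    @property
    def center(self) -> Tuple[float, float]:
        x, y, w, h = self.bounding_box
        return (x + w / 2, y + h / 2)


@dataclass
class Face(TrackedObject):
    """A detected human face; detector-specific fields would be added here."""


@dataclass
class Cube(TrackedObject):
    """One of Cozmo's light cubes."""
    cube_id: int = -1
```

A script could then iterate over a mixed list of `Face` and `Cube` instances and read `label`, `confidence`, or `center` without caring which detector produced them.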
I agree, the code should be split into separate classes for different functionality. I am not sure about what should go where, but I think you have a good idea of how it should be done, so you can go ahead.
I brought up threads thinking about how this should be included in the pycozmo architecture. We currently have independent thread classes managing different tasks, and computer vision (which I believe will be the most computationally expensive task in the package) should definitely not block the main thread. In my opinion, any user of pycozmo should be able to import the package and start playing with the robot without needing to worry about the management of the tasks we are implementing.

If you'd like to implement everything without worrying much about this, that's perfectly fine; you can do it in an example at first, and we'll find a way of including it in the package afterwards.
I'm not sure what you mean by this, but yes, we should be able to access these objects in a generic manner. The way I see it, we would include a list/dict/manager class for these objects in the Client, so they can easily be accessed from there.
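A minimal sketch of such a manager class is below. Attaching it to the Client, the class name, and the expiry behaviour are all assumptions, not existing pycozmo API; the lock matters because the tracker thread writes while user scripts read.

```python
# Hypothetical manager for visible objects; not part of the pycozmo API.
import threading
import time


class VisibleObjectManager:
    """Thread-safe store of the objects currently (or recently) in view."""

    def __init__(self, timeout: float = 2.0):
        self._lock = threading.Lock()
        self._objects = {}        # label -> (object, timestamp)
        self._timeout = timeout   # seconds before an object counts as "gone"

    def update(self, label, obj):
        """Called by the tracker thread whenever a new detection comes in."""
        with self._lock:
            self._objects[label] = (obj, time.monotonic())

    def get(self, label):
        """Called from user scripts; returns None if the object has expired."""
        with self._lock:
            entry = self._objects.get(label)
        if entry is None:
            return None
        obj, ts = entry
        return obj if time.monotonic() - ts < self._timeout else None
```

A user script could then do something like `client.visible_objects.get("face")` without knowing anything about the detection thread, assuming the Client exposes an instance under some such attribute.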
I think we can use YOLO for any item we can train it on. My only concern is how easy it would be to train the model to detect the cubes; I think it will be a pain to take enough pictures in enough environments to train it. There are also other use cases where YOLO might not be the right solution (e.g. detecting lines or points to improve the precision of the robot's localization).
As I said, you can go ahead and implement what you'd like, then create a PR and we'll take it from there. I don't know what the best way of doing this is, and you have a head start, so I think you should go ahead and implement it.
Very well then, I will start implementing all that, and I will let you know where I might need some guidance to integrate my code into the pycozmo library.
Sounds good! You don't need to hurry with this; I think the best part of this library is how much we can learn from Cozmo and from developing robotic solutions. Hopefully you'll have some good fun with it. Let me know if you want some help or have any questions (you can just drop me an email), I'd be glad to help if I can. Also, good luck with the job hunt!
Hello again, I finally took some time off after getting my vaccine and worked on the face detection algorithm for pycozmo. Really sorry that it took a month to get there. Hope you'll enjoy it.
Hello @zayfod,
Instead of simply throwing a pull request your way and seeing if it sticks, I thought I'd open an issue to discuss how you plan on implementing the face detection feature.
I have already cloned your project and tried my hand at such an implementation using OpenCV and YOLOv4-tiny. For the moment, the neural network is trained on the Open Images dataset and can only detect human hands and heads.
The class handling the detection and tracking is implemented here: https://github.com/davinellulinvega/pycozmo/blob/face_detection/pycozmo/multi_tracking.py
And an example of how this might be used within the Pycozmo library can be found here:
https://github.com/davinellulinvega/pycozmo/blob/face_detection/examples/face_detection.py (note that you also need the .cfg, .names, and .weights files for building the network).
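For reference, the general shape of a YOLOv4-tiny pipeline built on OpenCV's DNN module is sketched below. The file names, input size, and thresholds are illustrative, not taken from the linked implementation; the loading part is shown only as comments since it needs the actual `.cfg`/`.weights` files and opencv-python at runtime.

```python
# Loading the network (requires opencv-python and the model files mentioned above):
#
#   import cv2
#   net = cv2.dnn.readNetFromDarknet("yolov4-tiny.cfg", "yolov4-tiny.weights")
#   layer_names = net.getUnconnectedOutLayersNames()
#   blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
#                                swapRB=True, crop=False)
#   net.setInput(blob)
#   outputs = net.forward(layer_names)

def parse_detections(outputs, frame_w, frame_h, conf_threshold=0.5):
    """Turn raw Darknet-style output rows into (class_id, confidence, bbox) tuples.

    Each row is [cx, cy, w, h, objectness, score_0, score_1, ...] with
    coordinates normalized to [0, 1]; bbox is (x, y, width, height) in pixels.
    """
    detections = []
    for layer in outputs:
        for row in layer:
            scores = row[5:]
            class_id = max(range(len(scores)), key=lambda i: scores[i])
            confidence = scores[class_id]
            if confidence < conf_threshold:
                continue
            cx, cy, w, h = row[0], row[1], row[2], row[3]
            x = int((cx - w / 2) * frame_w)
            y = int((cy - h / 2) * frame_h)
            detections.append((class_id, confidence,
                               (x, y, int(w * frame_w), int(h * frame_h))))
    return detections
```

In a real pipeline you would typically follow this with non-maximum suppression (e.g. `cv2.dnn.NMSBoxes`) to merge overlapping boxes for the same object.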
As mentioned above, for the moment YOLO has only been trained to detect hands and heads, but it can "easily" be retrained to handle heads, Cozmo's cubes, pets (cats, dogs, birds, ...), and Cozmo's other objects (platform, cube markers, and what else?).
Hope this might be of interest to you. If it is, do let me know and we can talk more about better integrating the multi-tracker (and its accompanying files) within the pycozmo library.