Make R-CNN the Caffe detection example #482

Merged: 7 commits, Jun 11, 2014


shelhamer
Member

R-CNN [1] is a state-of-the-art detector powered by a finetuned Caffe model. As an off-the-shelf example, I have made a pure Caffe version of R-CNN, ILSVRC13 detection edition, by

  1. making a new model, caffe_rcnn_imagenet_model, with the RCNN class SVMs transplanted into a fully-connected layer
  2. adding an R-CNN crop mode to the pycaffe Detector
  3. rewriting our detection example, which right now runs the Caffe Reference ImageNet Model classifier on selective search windows
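Transplanting the class SVMs into a fully-connected layer works because a bank of linear SVMs is an affine map over the CNN features: stacking each class's weight vector as one row of W and its bias into b gives scores = W x + b. The sketch below is a hypothetical illustration, not the code used to build caffe_rcnn_imagenet_model; `svms_to_fc` and `fc_forward` are made-up names, and the shapes (C classes over D-dimensional features, e.g. the 200 ILSVRC13 detection classes over 4096-d fc7 features) are assumptions.

```python
import numpy as np

def svms_to_fc(svm_weights, svm_biases):
    """Stack per-class linear SVMs (w_c, b_c) into one fully-connected layer (W, b).

    svm_weights: list of C arrays, each of shape (D,)
    svm_biases:  list of C scalars
    Returns W with shape (C, D), one row per class, and b with shape (C,).
    """
    W = np.stack(svm_weights, axis=0)
    b = np.asarray(svm_biases, dtype=W.dtype)
    return W, b

def fc_forward(W, b, features):
    """features: (N, D) -> scores: (N, C); column c equals SVM c's decision value."""
    return features @ W.T + b
```

Keeping one row per class should, as far as I know, match the (num_output, input_dim) layout of a Caffe InnerProduct weight blob, but check the blob shape before copying weights in.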

@rbgirshick please review to make sure that everything is clear. Thanks!

  1. Rich feature hierarchies for accurate object detection and semantic segmentation. Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. CVPR 2014. Arxiv 2013.

@shelhamer
Member Author

Note that I've added another example image for detection. One more ILSVRC auxiliary file is needed too: det_synset_words.txt, for the detection set names and synset IDs. I simply added it to the get_ilsvrc_aux.sh script in data/ilsvrc12, even though that's not exactly right since it belongs to the ILSVRC13 detection set... but it didn't seem worth the hassle to add another data directory.

@shelhamer
Member Author

@sergeyk you may want to review too since you wrote the original detection example.

@rbgirshick
Contributor

Thanks for updating this! I'll take a look at it tomorrow.

https://www.cs.berkeley.edu/~rbg/

@sergeyk
Contributor

sergeyk commented Jun 9, 2014

Looks good to me


@shelhamer
Member Author

@sergeyk could you update the NMS function to match R-CNN as you and Ross have talked about?

  • overlap threshold should be 0.3
  • overlap should be measured as w * h / (union of areas), where w and h are the width and height of the intersection

and give it a test? Currently it seems to be returning the wrong windows: if you look at the output, the boxes are not the top-scoring boxes.
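The two requirements above describe greedy non-maximum suppression with intersection-over-union overlap. Sorting by score first is what guarantees the top-scoring box survives, which is the property the current output seems to violate. A minimal numpy sketch, assuming boxes are an (N, 4) float array of (x1, y1, x2, y2) in continuous coordinates; `iou` and `nms` are hypothetical names, not the functions in this PR. As an assumption worth checking, the MATLAB R-CNN code uses inclusive pixel coordinates (adding 1 to widths and heights), so results can differ slightly for small boxes.

```python
import numpy as np

def iou(box, boxes):
    """Intersection over union between one box and an (M, 4) array of boxes."""
    ix1 = np.maximum(box[0], boxes[:, 0])
    iy1 = np.maximum(box[1], boxes[:, 1])
    ix2 = np.minimum(box[2], boxes[:, 2])
    iy2 = np.minimum(box[3], boxes[:, 3])
    # w and h are the intersection's width and height (zero if disjoint).
    w = np.maximum(0.0, ix2 - ix1)
    h = np.maximum(0.0, iy2 - iy1)
    inter = w * h
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, overlap_thresh=0.3):
    """Greedy NMS: repeatedly keep the highest-scoring remaining box and
    drop every remaining box whose IoU with it exceeds overlap_thresh."""
    order = np.argsort(scores)[::-1]  # indices, highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        overlaps = iou(boxes[i], boxes[rest])
        order = rest[overlaps <= overlap_thresh]
    return keep
```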

Please push the changes directly to my fork to include them in this PR.

@rbgirshick
Contributor

Hi Evan,

I want to make sure that the Caffe R-CNN demo outputs the same results (or
close enough) that I get with my code. Aside from the NMS issue, it seems
like there might be some other small differences.

One that comes to mind is that prior to doing selective search, I resize
images so that the width is 500px. This ended up being important for
ImageNet detection because images have a variety of different sizes and
selective search is not scale invariant.
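Resizing before selective search amounts to: scale the image so its width is 500 px, run proposals on the resized image, then divide the boxes by the scale factor to map them back to original coordinates. This is a minimal numpy sketch under stated assumptions: `resize_to_width` and `boxes_to_original` are hypothetical helpers, the nearest-neighbour resampling is only a self-contained stand-in for a proper interpolating resize, and boxes are assumed to be (N, 4) arrays of pixel coordinates.

```python
import numpy as np

def resize_to_width(image, target_width=500):
    """Nearest-neighbour resize of an (H, W, ...) array so its width is target_width.

    Returns the resized image and the scale factor that was applied.
    """
    h, w = image.shape[:2]
    scale = target_width / float(w)
    new_h = max(1, int(round(h * scale)))
    # Map each output row/column back to its nearest source row/column.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(target_width) / scale).astype(int), w - 1)
    return image[rows][:, cols], scale

def boxes_to_original(boxes, scale):
    """Map (N, 4) boxes found on the resized image back to original-image coordinates."""
    return np.asarray(boxes, dtype=float) / scale
```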

For reference, I'm using the R-CNN model at
data/rcnn_models/ilsvrc2013/rcnn_model.mat from
https://www.cs.berkeley.edu/~rbg/r-cnn-release1-data-ilsvrc2013.tgz. It uses
the caffe net at data/caffe_nets/finetune_ilsvrc13_val1+train1k_iter_50000.

When I run R-CNN, I end up with the following detections in this image (top
for bicycle and top for person). BTW, which set does this image come from?
If it's from the train or val sets, then these results are likely overfit.

[Images: R-CNN's top bicycle and top person detections for this image]


- caffe.Detector learned how to crop windows with context in the R-CNN
  style, so that the border of the network input is a given amount of
  context.
- add --context_pad arg to detect.py for amount of context. Default is
  16, as in R-CNN.
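The context crop can be sketched as growing each box about its center by crop_size / (crop_size - 2 * context_pad), so that once the grown region is warped to crop_size x crop_size, the original box fills the center and context_pad pixels on each side come from the surrounding image. This is a hypothetical sketch, not caffe.Detector's code: the 227 px crop size, the (x1, y1, x2, y2) box order and the helper names are assumptions; only the default of 16 comes from the commit message.

```python
import numpy as np

def context_box(box, crop_size=227, context_pad=16):
    """Grow box (x1, y1, x2, y2) about its center so that, after warping the
    result to crop_size x crop_size, context_pad pixels of border are context."""
    x1, y1, x2, y2 = [float(v) for v in box]
    scale = crop_size / float(crop_size - 2 * context_pad)
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) / 2.0 * scale
    half_h = (y2 - y1) / 2.0 * scale
    return np.array([cx - half_w, cy - half_h, cx + half_w, cy + half_h])

def clip_box(box, height, width):
    """Clip a grown box to the image; the actual crop still has to fill the
    part that fell outside the image (e.g. with the mean, an assumption here)."""
    x1, y1, x2, y2 = box
    return np.array([max(0.0, x1), max(0.0, y1), min(width, x2), min(height, y2)])
```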
@shelhamer
Member Author

@sergeyk is it alright with you if I push a selective_search_rcnn.m to your selective search module for use in the demo? Ross would prefer that the same region proposal mechanism be used for the demo.

@shelhamer
Member Author

I included @sergeyk's latest NMS code and fixed the display method to use the right coordinates, but the results still don't look quite right: it seems the top-scoring box isn't winning the suppression.

@sergeyk
Contributor

sergeyk commented Jun 10, 2014

Yeah for sure


@shelhamer
Member Author

@rbgirshick I made the fixes we talked about offline and proposals are now collected with the R-CNN configuration of selective search. Please take another look when you have a chance.

Thanks for your help!

@rbgirshick
Contributor

Did you figure out the issue with the NMS update (if there was one)?


@rbgirshick
Contributor

I was able to go through the notebook and the output looks close enough to
the gold standard to be fine. I see that the image isn't being resized
before selective search, but I guess that's ok for this demo.

In the detection notebook the text about 190 proposals needs to be updated
("190 regions were proposed with this light configuration of selective
search....").

In the last part, with the NMS'd bicycle examples, it would be better to
print out the scores (in the figure or just as text output) in order to
show that the non-top detections are very low scoring.

Otherwise, it looks good!


@shelhamer
Member Author

The plotting was wrong haha. The usual X vs. Y conspiracy.
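The X/Y mix-up is easy to make when windows are stored row-first. Assuming each window is (ymin, xmin, ymax, xmax), which I believe is the column order detect.py uses, matplotlib's Rectangle wants the (x, y) corner first, followed by a width and a height. `window_to_rect` is a hypothetical helper for illustration.

```python
def window_to_rect(window):
    """Convert a (ymin, xmin, ymax, xmax) window into matplotlib Rectangle arguments.

    Returns ((x, y), width, height): x comes from the column coordinates,
    y from the row coordinates.
    """
    ymin, xmin, ymax, xmax = window
    return (xmin, ymin), xmax - xmin, ymax - ymin

# Usage (requires matplotlib and an image already drawn with imshow):
#   import matplotlib.pyplot as plt
#   xy, w, h = window_to_rect(window)
#   plt.gca().add_patch(plt.Rectangle(xy, w, h, fill=False, edgecolor='r', linewidth=2))
```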


@shelhamer
Member Author

The resizing to width 500 is here:
https://github.com/sergeyk/selective_search_ijcv_with_python/blob/master/selective_search_rcnn.m

Apologies for the mixing of configuration across repos, but I figured
the right way was the way to hardcode haha.

Thanks for the catches and suggestions for the text -- I'll follow up with
those shortly.


- run through and save new output
- collect region proposals with R-CNN configuration (see sergeyk/selective_search_ijcv_with_python)
- call detect.py in GPU mode
- fix NMS plotting: X and Y coords were accidentally exchanged. Print scores too.
@shelhamer
Member Author

@rbgirshick I amended the last commit to address your comments.

Is everyone ok with merging then cherry-picking this example to master?

@rbgirshick
Contributor

Looks good!


@kloudkl
Contributor

kloudkl commented Jun 11, 2014

ImportError: No module named selective_search_ijcv_with_python

This means that we need to first git clone https://github.com/sergeyk/selective_search_ijcv_with_python. I put it in caffe/python along with caffe/python/caffe. Is this the recommended place?

@kloudkl
Contributor

kloudkl commented Jun 11, 2014

On a machine with 8GB of memory and four CPU cores, I ran make superclean and then compiled Caffe with BLAS=open make -j4 to run this example. selective_search_rcnn used more than 11GB of memory, causing swapping and nearly 100% CPU utilization, and there was still no output after running for 4 minutes.

In the tutorial (https://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/detection.ipynb), the timing result is "Processed 223 windows in 16.525 s". What could be the cause of the performance discrepancy?

@kloudkl
Contributor

kloudkl commented Jun 11, 2014

Was the example run on GPU? Could you provide the download link to the resultant cat.h5?

@shelhamer
Member Author

Note that selective search is not part of Caffe and is distributed as
closed source, compiled MATLAB code. It is quite memory intensive.
sergeyk/selective_search_ijcv_with_python is a python wrapper around this
code, which we cannot modify ourselves.


@shelhamer
Member Author

Yes, the ~1600 windows were run on GPU. See the example text, which explains
the '--gpu' argument to detect.py.


@bhack
Contributor

bhack commented Jun 11, 2014

If you are interested, we are integrating BING objectness [1] into OpenCV as one of this year's GSoC projects.

[1] https://mmcheng.net/bing/

@shelhamer
Member Author

Merging -- will follow-up to port to master later (so that the public example is R-CNN).

shelhamer added a commit that referenced this pull request Jun 11, 2014
Make R-CNN the Caffe detection example
shelhamer merged commit a7e397a into BVLC:dev Jun 11, 2014
shelhamer deleted the rcnn-detector-example branch June 11, 2014 22:22
@shelhamer
Member Author

@bhack please let us know once your region proposal method is in opencv. I'd be interested in seeing the performance of our models with bing objectness vs. selective search -- and of course open and fast code is better than not.

@rbgirshick
Contributor

I looked into trying bing out a month or so ago, but found that it was
behind a registration wall--that stopped me from even trying it. It will be
great to have an open implementation. I'd like to replace selective search
with something that is faster/better/open source.


@rbgirshick
Contributor

It would be great if we could make the python caffe R-CNN demo faster so that it's closer to the speed of the reference matlab version.

On my machine (six-core i7, K20) it runs quite slowly. I added some timing statements to the code, which show that it spends a large amount of time cropping and preprocessing windows and very little inside Caffe doing forward.

Timing in GPU mode
Selective search returned 1570 windows in 6.419 s.
Cropping inputs...cropped windows in 43.749 s.
Computing CNN features and scoring regions...done! (preprocess 18.987 s forward 5.183 s)
Processed 1570 windows in 74.385 s.

The MATLAB version takes about 12.5 s versus 74 s above. I don't know enough Python to tell whether this is to be expected or whether there are some obvious, easy ways to improve speed. Thoughts? (Also, it might be good for someone else to verify these timings in case something is messed up on my machine vis-a-vis my Python installation.)

@shelhamer
Member Author

Thanks for the timing analysis Ross. I'll try to investigate the image
processing slowdown -- it is entirely possible I made a suboptimal choice
somewhere.


@kloudkl
Contributor

kloudkl commented Jun 12, 2014

We tried BING two weeks ago. I remember that BING's password is mmcheng.net, https://mmcheng.net or https://mmcheng.net/. It is fast, but you have to tell it where the targets are for it to detect the wanted objects.

@kloudkl
Contributor

kloudkl commented Jun 12, 2014

The hot-spot lines of Python code can be found with a profiler.
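Python's standard cProfile and pstats modules produce reports like the profile output in this thread. A minimal sketch: `profile_call` is a hypothetical helper. From the shell, `python -m cProfile -s cumulative predict.py` gives a similar report sorted by cumulative time, and `-s tottime` sorts by internal time.

```python
import cProfile
import io
import pstats

def profile_call(func, *args, sort_key="cumulative", limit=20, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns (result, report), where report is the text of the top `limit`
    entries sorted by `sort_key` (e.g. "cumulative" or "tottime").
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats(sort_key).print_stats(limit)
    return result, stream.getvalue()
```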

@kloudkl
Contributor

kloudkl commented Jun 12, 2014

The detector causes memory swapping, so it can't be profiled correctly. Instead, I profiled the classification demo. The most time-consuming function/method calls are as follows.

278606 function calls (276088 primitive calls) in 5.484 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3    3.704    1.235    3.705    1.235 pycaffe.py:38(_Net_forward)
        1    0.739    0.739    0.785    0.785 classifier.py:16(__init__)
     2361    0.363    0.000    0.369    0.000 {numpy.core.multiarray.array}
       21    0.076    0.004    0.082    0.004 pycaffe.py:242(_Net_preprocess)
        1    0.033    0.033    0.282    0.282 pycaffe.py:4(<module>)
      230    0.026    0.000    0.070    0.000 doccer.py:12(docformat)
       12    0.021    0.002    0.025    0.002 {skimage.transform._warps_cy._warp_fast}
        1    0.018    0.018    0.035    0.035 __init__.py:22(<module>)
       22    0.014    0.001    0.739    0.034 __init__.py:1(<module>)
     6425    0.013    0.000    0.013    0.000 {method 'expandtabs' of 'str' objects}
      397    0.012    0.000    0.012    0.000 function_base.py:1519(__init__)
        2    0.010    0.005    0.010    0.005 io.py:41(oversample)
        2    0.010    0.005    0.026    0.013 machar.py:113(_do_init)
     2103    0.009    0.000    0.009    0.000 {method 'reduce' of 'numpy.ufunc' objects}
      238    0.009    0.000    0.014    0.000 doccer.py:128(indentcount_lines)
     6419    0.009    0.000    0.009    0.000 {method 'splitlines' of 'str' objects}
        1    0.008    0.008    0.009    0.009 transformer.py:9(<module>)
  373/106    0.007    0.000    0.019    0.000 sre_parse.py:379(_parse)
    72123    0.007    0.000    0.007    0.000 {method 'append' of 'list' objects}
      281    0.007    0.000    0.009    0.000 function_base.py:2945(add_newdoc)
        1    0.007    0.007    0.007    0.007 pycodegen.py:1(<module>)
      150    0.006    0.000    0.007    0.000 {method 'sub' of '_sre.SRE_Pattern' objects}
278606 function calls (276088 primitive calls) in 5.508 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    5.522    5.522 predict.py:1(<module>)
        3    3.753    1.251    3.754    1.251 pycaffe.py:38(_Net_forward)
        2    0.000    0.000    2.825    1.413 classifier.py:46(predict)
        2    0.000    0.000    2.543    1.272 pycaffe.py:107(_Net_forward_all)
        1    0.748    0.748    0.796    0.796 classifier.py:16(__init__)
       22    0.012    0.001    0.653    0.030 __init__.py:1(<module>)
     2361    0.362    0.000    0.367    0.000 {numpy.core.multiarray.array}
       85    0.000    0.000    0.355    0.004 numeric.py:392(asarray)
        1    0.034    0.034    0.257    0.257 pycaffe.py:4(<module>)
        2    0.001    0.000    0.222    0.111 io.py:1(<module>)
        1    0.000    0.000    0.150    0.150 hough_transform.py:1(<module>)
        1    0.000    0.000    0.138    0.138 _polygon.py:1(<module>)
        1    0.001    0.001    0.138    0.138 __init__.py:229(<module>)
        1    0.000    0.000    0.107    0.107 _peak_finding.py:3(<module>)
        1    0.001    0.001    0.107    0.107 __init__.py:331(<module>)
        1    0.002    0.002    0.105    0.105 pyplot.py:17(<module>)
        1    0.001    0.001    0.104    0.104 stats.py:166(<module>)
        1    0.002    0.002    0.103    0.103 distributions.py:8(<module>)

@bhack
Contributor

bhack commented Jun 12, 2014

@kloudkl What do you mean by "you have tell it where are the targets for it to detect the wanted objects."?

@kloudkl
Contributor

kloudkl commented Jun 17, 2014

Initialize the detector with the possible positions of the objects.

@bhack
Contributor

bhack commented Jun 17, 2014

I don't understand... BING generally gives you 1000 candidate windows with a detection rate of 96%.
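A "detection rate" for a proposal method is usually the fraction of ground-truth objects covered by at least one proposal with IoU above a threshold. This sketch computes that number for a set of proposals; the 0.5 threshold (the PASCAL criterion) is an assumption about how the 96% figure is measured, and `box_iou_matrix` and `detection_rate` are hypothetical names with boxes taken as (x1, y1, x2, y2).

```python
import numpy as np

def box_iou_matrix(a, b):
    """Pairwise IoU between (N, 4) and (M, 4) boxes; returns an (N, M) array."""
    a = np.asarray(a, dtype=float)[:, None, :]  # (N, 1, 4)
    b = np.asarray(b, dtype=float)[None, :, :]  # (1, M, 4)
    iw = np.clip(np.minimum(a[..., 2], b[..., 2]) - np.maximum(a[..., 0], b[..., 0]), 0, None)
    ih = np.clip(np.minimum(a[..., 3], b[..., 3]) - np.maximum(a[..., 1], b[..., 1]), 0, None)
    inter = iw * ih
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area_a + area_b - inter)

def detection_rate(gt_boxes, proposals, iou_thresh=0.5):
    """Fraction of ground-truth boxes matched by some proposal with IoU >= iou_thresh."""
    ious = box_iou_matrix(gt_boxes, proposals)
    return float(np.mean(ious.max(axis=1) >= iou_thresh))
```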

@kloudkl
Contributor

kloudkl commented Jun 18, 2014

Maybe we did something wrong. We'll give it another shot soon.

@shelhamer
Member Author

The likely culprit for the input preprocessing slowdown is the list comprehension that preprocesses inputs one by one instead of as a batch.

@sergeyk, although you like the composition of np.asarray, a list comprehension, and single-input preprocessing, this seems worth changing.
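The difference between per-input and batch preprocessing can be sketched like this. It is a hypothetical illustration, not pycaffe's actual preprocessing: the RGB-to-BGR swap, the scaling of [0, 1] inputs to [0, 255], the HWC-to-CHW transpose and the (C, H, W) mean are assumptions modeled on common Caffe reference-model conventions. Both functions produce the same array; the batch version replaces N Python-level calls with a few vectorized numpy operations, and any actual speedup would still need to be measured.

```python
import numpy as np

def preprocess_one(window, mean):
    """One window: HxWxC RGB in [0, 1] -> CxHxW BGR in [0, 255], mean-subtracted."""
    x = window[:, :, ::-1] * 255.0  # RGB -> BGR, rescale to [0, 255]
    x = x.transpose(2, 0, 1)        # HWC -> CHW
    return x - mean                 # mean has shape (C, H, W)

def preprocess_loop(windows, mean):
    """One Python call per window, as with a list comprehension."""
    return np.asarray([preprocess_one(w, mean) for w in windows])

def preprocess_batch(windows, mean):
    """Same result in one vectorized pass over an (N, H, W, C) array."""
    x = np.asarray(windows)[..., ::-1] * 255.0
    return x.transpose(0, 3, 1, 2) - mean  # mean broadcasts over the batch axis
```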

@bhack
Contributor

bhack commented Jun 24, 2014

@shelhamer @rbgirshick @kloudkl The initial BING OpenCV GSoC port is here. It doesn't have the training component ported yet, but we have shared a trained Mat and a sample.
@fpuja is the student and @lenlen is the other mentor.
@bittnt is the original author of the Linux port.

Any feedback is welcome.

cc:@vpisarev

@bhack
Contributor

bhack commented Jun 25, 2014

There was an extra character in the Bing repository link. Now it is fixed.

@bhack
Contributor

bhack commented Jun 26, 2014

Also take a look at this: https://arxiv.org/abs/1406.4729v1

bhack mentioned this pull request Jun 26, 2014
mitmul pushed a commit to mitmul/caffe that referenced this pull request Sep 30, 2014
Make R-CNN the Caffe detection example
@bhack
Contributor

bhack commented May 4, 2015

@rbgirshick Any plans to contribute Fast R-CNN back? How are the precomputed proposals actually generated?

@rbgirshick
Contributor

@bhack I'm happy to contribute any useful bits of Fast R-CNN to Caffe. On my side (for now at least), I need to jump through some hoops to get permission to do a PR. However, the Fast R-CNN code is all MIT licensed, so anyone can contribute it to Caffe if there's demand.

Personally, I don't think it makes sense to include all of Fast R-CNN in Caffe, but it would be nice if it worked out of the box with Caffe master. That would require merging some new layers (smooth l1, roi pooling).
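For reference, the smooth L1 loss that Fast R-CNN uses for bounding-box regression is 0.5 x^2 when |x| < 1 and |x| - 0.5 otherwise, applied to the difference between predicted and target offsets and summed over the four box coordinates. This elementwise numpy sketch only illustrates the formula; it is not the Caffe layer from the Fast R-CNN repository, and `smooth_l1` and `bbox_loss` are hypothetical names.

```python
import numpy as np

def smooth_l1(x):
    """Elementwise smooth L1: 0.5 * x**2 where |x| < 1, else |x| - 0.5."""
    x = np.asarray(x, dtype=float)
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def bbox_loss(pred, target):
    """Regression loss for one box: sum of smooth L1 over its 4 offset coordinates."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(smooth_l1(diff).sum())
```

Near zero it behaves like L2, so gradients shrink as predictions improve; beyond |x| = 1 it grows linearly, which keeps outlier boxes from dominating the gradient.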

Precomputed proposals used this code: https://github.com/rbgirshick/rcnn/tree/master/selective_search.

@bhack
Contributor

bhack commented May 4, 2015

@rbgirshick OK, I've opened a new issue, #2411, with a link to an evaluation of some other object proposal algorithms, because I think selective search is still closed source.

@rbgirshick
Contributor

@bhack Use of selective search is largely for apples-to-apples comparison with previous published work. I'm going to look into using LPO (CVPR'15 paper; https://www.philkr.net/home/lpo), which has open source Python code.

@bhack
Contributor

bhack commented May 4, 2015

@rbgirshick Thank you for pointing to LPO. I've updated the issue.

bhack mentioned this pull request May 8, 2015
5 participants