In this section, I will be covering RCNN.
Region-based Convolutional Neural Network
What is RCNN?
RCNN is a two-step object detection framework.
The idea: rather than exhaustively checking every position and scale in the picture, it first guesses where objects might be, and then it looks inside those regions to figure out what they are.
A regression model tweaks the box’s coordinates for a tighter fit.
Why must the proposal already be close? This step can only make small corrections. If the initial region is far from the object, regression can’t fix it.
Analogy: If you place a sticker almost exactly on its label spot, you can nudge it into place. If you miss completely, a nudge isn’t enough.
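To make the "small corrections" point concrete, here is a minimal sketch of how box offsets can be encoded and applied. It assumes the center/size parameterization described in the RCNN paper (center shifts scaled by the proposal's width and height, size changes in log space). The function names `encode_box_deltas` and `apply_box_deltas` are hypothetical, chosen for illustration only.

```python
import math

def to_center(box):
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + 0.5 * w, y1 + 0.5 * h, w, h

def encode_box_deltas(proposal, target):
    """Regression targets that move `proposal` onto `target` (hypothetical helper)."""
    px, py, pw, ph = to_center(proposal)
    gx, gy, gw, gh = to_center(target)
    # Center shifts are relative to proposal size; scale changes are in log space.
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))

def apply_box_deltas(proposal, deltas):
    """Apply predicted (dx, dy, dw, dh) to a proposal box (hypothetical helper)."""
    px, py, pw, ph = to_center(proposal)
    dx, dy, dw, dh = deltas
    cx, cy = px + dx * pw, py + dy * ph
    w, h = pw * math.exp(dw), ph * math.exp(dh)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

# Illustrative values only: a proposal that is already close to the object.
proposal = (10.0, 10.0, 50.0, 50.0)
ground_truth = (14.0, 12.0, 56.0, 52.0)
deltas = encode_box_deltas(proposal, ground_truth)
refined = apply_box_deltas(proposal, deltas)
```

When the proposal overlaps the object well, the deltas stay small, which is the regime a simple regressor can learn. A proposal far from the object would need large offsets that the model rarely sees during training.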
Summary Table: RCNN Components
| Step | What It Does | Key Point |
|---|---|---|
| Region Proposal | Select likely areas with selective search | External, hand-designed, about 2,000 regions |
| Feature Extraction | CNN extracts regional features | Slow, cropped/warped input, per region |
| Classification | Assigns a class label and score to each region | Runs on the CNN features; per-class linear SVMs in the original RCNN |
| Bounding Box Regression | Tweaks box for tightness | Small corrections only, must start close |
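The four rows of the table can be read as one loop over proposals. The sketch below shows only that control flow. `rcnn_detect` and its four callable arguments are hypothetical placeholders, not a real library API, and the toy lambdas stand in for selective search, the CNN, the classifier, and the regressor.

```python
def rcnn_detect(image, propose, extract, classify, refine):
    """Run the four RCNN steps once per proposed region (illustrative skeleton)."""
    detections = []
    for box in propose(image):              # 1. region proposal (about 2,000 boxes in RCNN)
        features = extract(image, box)      # 2. feature extraction: one CNN pass per region
        label, score = classify(features)   # 3. classification
        refined_box = refine(features, box) # 4. bounding box regression
        detections.append((refined_box, label, score))
    return detections

# Toy stand-ins so the skeleton runs end to end; real components would replace these.
toy_detections = rcnn_detect(
    image=None,
    propose=lambda img: [(0, 0, 10, 10), (5, 5, 20, 20)],
    extract=lambda img, box: [box[2] - box[0]],          # "feature" = box width
    classify=lambda f: ("big" if f[0] > 12 else "small", 1.0),
    refine=lambda f, box: box,                           # no-op refinement
)
```

Because `extract` is called inside the loop, the expensive CNN forward pass runs once per region, which is why the table marks feature extraction as slow.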
Non-maximum suppression (NMS) removes overlapping boxes that point at the same object, keeping only the highest-scoring one.
Analogy: If you circled a ball on a photo three times, NMS picks the best circle and removes the extras.
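The ball-and-circles picture can be sketched in a few lines of plain Python. This is a minimal greedy NMS, assuming boxes in (x1, y1, x2, y2) format and an IoU threshold of 0.5. The threshold, the box values, and the function names are illustrative assumptions, not values taken from this article.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

# Three circles around the same ball, plus one unrelated object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (11, 9, 49, 51), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7, 0.6]
kept = nms(boxes, scores)
```

Here the three heavily overlapping boxes collapse to the single best-scoring one, while the distant box survives because it overlaps nothing that was kept.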
How it works: