In this section, I will be covering RCNN.

RCNN

Region-based Convolutional Neural Network

What is RCNN?

RCNN is a two-step object detection framework.

The idea: rather than exhaustively scanning the whole image (for example, sliding a window over every position and scale), it first guesses where objects might be, and then looks inside each of those regions to figure out what it contains.

Main Steps of RCNN

  1. Region Proposal (Selective Search)
  2. Feature Extraction (CNN)
  3. Classification
  4. Bounding Box Regression
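The four steps above can be sketched as one loop over the proposed regions. This is a minimal illustration under stated assumptions, not the original implementation. `propose`, `extract`, `classify`, and `regress` are hypothetical placeholder callables that stand in for Selective Search, the CNN, the per-class classifiers, and the box regressor. `crop_and_warp` is a hypothetical helper that uses simple nearest-neighbour resizing to the 227x227 input size used in the original paper, and it omits the small context padding the paper adds around each box. Non-maximum suppression, which normally runs on the final detections, is left out here.

```python
import numpy as np

def crop_and_warp(image, box, size=227):
    """Crop box=(x1, y1, x2, y2) from an HxW(xC) image and stretch it to
    size x size with nearest-neighbour sampling, ignoring aspect ratio."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    h, w = crop.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return crop[rows][:, cols]

def rcnn_detect(image, propose, extract, classify, regress):
    """Hypothetical R-CNN-style pipeline; all four callables are placeholders."""
    detections = []
    for box in propose(image):                # Step 1: region proposals (~2,000)
        patch = crop_and_warp(image, box)     # fixed-size input for the CNN
        feat = extract(patch)                 # Step 2: one CNN pass per region
        label, score = classify(feat)         # Step 3: class or background
        if label == "background":
            continue                          # discard non-object regions
        refined = regress(feat, box, label)   # Step 4: small box correction
        detections.append((refined, label, score))
    return detections
```

Because `extract` runs once per proposal, an image with about 2,000 proposals costs about 2,000 separate CNN forward passes. This is the main reason RCNN is slow.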

Summary Table: RCNN Components

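Step | What It Does | Key Point
Region Proposal | Selects likely object regions with Selective Search | External, hand-designed; about 2,000 regions per image
Feature Extraction | A CNN extracts a feature vector from each region | Slow: each region is cropped, warped to a fixed size, and passed through the CNN separately
Classification | Classifies each region from its features | Outputs the object class (or background)
Bounding Box Regression | Adjusts the box to fit the object more tightly | Makes small corrections only, so the proposal must already be close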
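The Bounding Box Regression row can be made concrete. In the original R-CNN paper, the regressor predicts four offsets relative to a proposal P. Two of them shift the box center, scaled by P's width and height. The other two rescale the width and height in log space. Below is a minimal Python sketch of encoding (ground-truth box to regression targets) and decoding (predicted offsets back to a box). It assumes boxes are (x1, y1, x2, y2) pixel corners, and the helper names `to_center`, `encode`, and `decode` are hypothetical.

```python
import math

def to_center(box):
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + w / 2, y1 + h / 2, w, h

def encode(proposal, gt):
    """Targets (tx, ty, tw, th) that would move proposal P onto ground truth G."""
    px, py, pw, ph = to_center(proposal)
    gx, gy, gw, gh = to_center(gt)
    return ((gx - px) / pw,        # center shift, relative to P's width
            (gy - py) / ph,        # center shift, relative to P's height
            math.log(gw / pw),     # log-space width scaling
            math.log(gh / ph))     # log-space height scaling

def decode(proposal, deltas):
    """Apply predicted offsets (dx, dy, dw, dh) to P; returns (x1, y1, x2, y2)."""
    px, py, pw, ph = to_center(proposal)
    dx, dy, dw, dh = deltas
    cx, cy = px + pw * dx, py + ph * dy
    w, h = pw * math.exp(dw), ph * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

Because the targets are relative to the proposal, a proposal that is already near the object yields small, easy-to-learn offsets. The original paper trains the regressor only on proposals whose IoU with a ground-truth box exceeds 0.6. This is why the box must start close.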

NMS (Non-Maximum Suppression)

This technique removes redundant boxes: when several heavily overlapping boxes detect the same object, only one of them is kept.

Analogy: If you circled a ball in a photo three times, NMS keeps the best circle and removes the extras.
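A minimal sketch of greedy NMS in Python illustrates the idea. It assumes boxes are (x1, y1, x2, y2) tuples and treats "overlapping" as an Intersection-over-Union (IoU) above a threshold. The 0.5 default is a common but assumed choice, not a value fixed by RCNN. The original R-CNN runs this step separately for each class.

```python
def iou(a, b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it does not overlap any already-kept box above the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# Three overlapping "circles" around one ball, plus one unrelated box.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (8, 9, 49, 51), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7, 0.6]
print(nms(boxes, scores))  # → [0, 3]
```

The three boxes around the ball collapse to the highest-scoring one (index 0), while the separate box (index 3) survives because it overlaps none of the kept boxes.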

How it works: