[SOUND] Faster R-CNN is the next step in the evolution of the R-CNN model. Essentially, Faster R-CNN is Fast R-CNN plus a Region Proposal Network (RPN). Faster R-CNN solves the last major problem of the R-CNN approach: the dependency on an external hypothesis generation method is removed, and objects are detected in a single pass with a single neural network. The RPN is a simple fully convolutional network which is trained with a multitask loss, similar to Fast R-CNN, and serves as a proposal generator.

Experiments in neural network visualisation have shown that by decoding the response at a single pixel we can still roughly see the object outline. So finer localization information is encoded in the channels of the convolutional feature response, and the RPN can extract this information to localize objects.

The Region Proposal Network is a small network that is slid over the convolutional feature map. It takes as input an N by N spatial window of the feature map, usually with N equal to 3. The RPN simultaneously classifies the corresponding region as object or non-object and regresses the bounding box location. The position of the sliding window provides coarse localization with reference to the image, while the box regression provides finer localization with reference to the sliding window itself.

At each sliding window position a set of object proposals is defined, each with a different size and aspect ratio. Such proposals are called anchors. Anchors improve the handling of objects of different sizes and aspect ratios; essentially, anchors are a variant of sliding windows with different sizes and aspect ratios. When training the Region Proposal Network, an anchor is marked as a positive example if its intersection over union (IoU) with a ground-truth box is larger than 0.7, or if it has the maximum IoU among all anchors for that particular ground-truth box. An anchor is marked as a negative example if its IoU is lower than 0.3.
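The anchor labeling rule just described can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the lecture; the function names and box format (x1, y1, x2, y2) are assumptions:

```python
# Illustrative sketch of RPN anchor labeling by IoU, as described above.
# Anchors with IoU > 0.7 against a ground-truth box (or the highest IoU
# for a given ground-truth box) are positive; IoU < 0.3 are negative;
# anything in between is ignored during training.

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_anchors(anchors, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    labels = [-1] * len(anchors)          # -1 = ignored in training
    for i, a in enumerate(anchors):
        best = max(iou(a, g) for g in gt_boxes)
        if best > pos_thr:
            labels[i] = 1                 # positive example
        elif best < neg_thr:
            labels[i] = 0                 # negative example
    # Every ground-truth box gets at least one positive anchor:
    # the anchor with the maximum IoU for that box.
    for g in gt_boxes:
        best_i = max(range(len(anchors)), key=lambda i: iou(anchors[i], g))
        labels[best_i] = 1
    return labels
```

For example, an anchor that exactly matches a ground-truth box is labeled positive, while a far-away anchor (IoU near zero) is labeled negative.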
The box regression is straightforward: the boxes of positive examples are regressed to the ground-truth boxes. For an image with a resolution of 1000 by 600 pixels and VGG-16 as the base feature extractor, there are 60 by 40 sliding window positions and 9 anchors at each position, so in total about 21,600 proposals per image.

Faster R-CNN can be trained end to end as one network with four losses: the RPN classification loss, the RPN regression loss, the Fast R-CNN classification loss over classes, and the Fast R-CNN regression loss that regresses the proposal bounding box to the ground-truth bounding box. We can easily change the base architecture used for feature extraction and retrain the Faster R-CNN model. For example, changing from the VGG-16 to the ResNet-101 model gives a 28% relative gain on the Microsoft COCO dataset. By removing the dependency on an external proposal generation method, speed is significantly improved: Faster R-CNN with the VGG-based architecture can perform detection at five frames per second.

Here is just an example of Faster R-CNN detection with a ResNet-based architecture. You can see how many objects are detected. However, despite the anchor boxes, it can be difficult for the RPN to handle objects of very different scales, because the RPN has a fixed receptive field. Small objects can occupy a very small portion of the receptive field, and if objects are very large, the receptive field will contain only a part of the object. We can solve this problem by training a set of RPNs for various scales. Each RPN takes a different convolutional layer, or set of layers, as input, so its receptive field will be of a different size. This significantly improves the detection of small and large objects, so one Faster R-CNN model can simultaneously detect objects from small to large sizes. [SOUND]
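The anchor count quoted for the 1000 by 600 image can be reproduced by enumerating anchors over the feature map. This is a rough sketch; the stride, scale, and ratio values are illustrative assumptions, not the exact settings from the lecture:

```python
# Enumerate anchors over a VGG-16-style feature map of 60 x 40 positions
# with 9 anchors (3 scales x 3 aspect ratios) per position.
# Scale/ratio values below are illustrative assumptions.

def generate_anchors(feat_w=60, feat_h=40, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = x * stride, y * stride   # anchor center on the image
            for s in scales:
                for r in ratios:
                    # width/height chosen so that w/h = r and w*h = s*s
                    w = s * r ** 0.5
                    h = s / r ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors

print(len(generate_anchors()))  # 60 * 40 * 9 = 21600
```

This makes the arithmetic in the text concrete: the total number of proposals is simply (number of sliding window positions) times (anchors per position).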