


Supplemental Information

Deep Learning for Plant Stress Phenotyping: Trends and Future Perspectives

Asheesh Kumar Singh1, Baskar Ganapathysubramanian2, Soumik Sarkar2,* and Arti Singh1,*

1Department of Agronomy, Iowa State University, Ames, Iowa, USA
2Department of Mechanical Engineering, Iowa State University, Ames, Iowa, USA

Correspondence: Arti Singh (arti@iastate.edu); Soumik Sarkar (soumiks@iastate.edu)

Most Popular CNN Architectures for Image Recognition, and Simultaneous Object Detection and Localization

AlexNet: This CNN architecture was proposed by Krizhevsky et al. [S1]. It demonstrated remarkable success on the ImageNet data set and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. The architecture contains five convolutional layers and three fully connected layers, along with a few max-pooling and dropout layers. The network was trained on two GTX 580 GPUs for five to six days, and the work demonstrated the effectiveness of data augmentation (translation, horizontal reflection, patch extraction), dropout-based regularization, and Rectified Linear Unit (ReLU) nonlinearities. The success of AlexNet essentially opened up the applicability of deep CNNs to standard computer vision applications.

ZF Net: Zeiler and Fergus proposed this CNN architecture, which won ILSVRC in 2013. The architecture is very similar to that of AlexNet, except for a few minor modifications. Instead of the 11x11 filters used in the first layer of AlexNet, ZF Net uses 7x7 filters and a smaller stride; the reasoning behind this modification is that a smaller filter size in the first convolutional layer retains more of the original pixel information. While AlexNet was trained on 15M images, ZF Net was trained on only 1.3M images. This study also introduced a visualization technique based on the deconvolution operation [S2].
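Most of the image-recognition architectures described here are available as pretrained models in common deep learning libraries. As a minimal sketch (assuming PyTorch and torchvision are installed; the image filename is a placeholder), a pretrained AlexNet can be loaded and applied to a single image as follows:

# Minimal sketch: load a pretrained AlexNet and classify one image.
# Assumes PyTorch/torchvision are installed; the image path is a placeholder.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.alexnet(pretrained=True)   # 5 convolutional + 3 fully connected layers
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("leaf.jpg")).unsqueeze(0)  # add a batch dimension
with torch.no_grad():
    scores = model(img)                    # 1000 ImageNet class scores
print(scores.argmax(dim=1))                # index of the predicted class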
GoogLeNet: Szegedy and his team at Google developed this architecture, which won ILSVRC in 2014. Also known as the Inception network, it stacks nine inception modules, for over 100 layers in total, making the network very deep. There are no fully connected layers; instead, average pooling is used to go from a 7x7x1024 feature volume to a 1x1x1024 feature volume. As a result, GoogLeNet uses 12x fewer parameters than AlexNet [S3].

VGGNet: Developed by Simonyan and Zisserman, this model was the runner-up in ILSVRC 2014. Its variants, such as VGG16 and VGG19, demonstrated that the depth of a network is a critical factor for good performance. In addition, the input to the network did not require any z-scoring or other preprocessing [S4].
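To make the inception idea concrete, the sketch below builds a simplified inception-style module in PyTorch: parallel 1x1, 3x3 and 5x5 convolution branches plus a pooling branch, whose outputs are concatenated along the channel dimension. The channel counts are illustrative and do not match the original GoogLeNet.

# Simplified Inception-style module: parallel 1x1, 3x3, 5x5 and pooling
# branches whose outputs are concatenated along the channel dimension.
# Channel counts are illustrative, not those of the original GoogLeNet.
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, 96, 1),
                                nn.Conv2d(96, 128, 3, padding=1))
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 16, 1),
                                nn.Conv2d(16, 32, 5, padding=2))
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 32, 1))

    def forward(self, x):
        # concatenate the four branch outputs along the channel dimension
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

x = torch.randn(1, 192, 28, 28)
print(InceptionModule(192)(x).shape)   # torch.Size([1, 256, 28, 28])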
ResNet: This architecture, which won ILSVRC in 2015, builds its feature hierarchy using residual connections, in which the original signal at a layer is added to its transformed version. The standard model has 152 layers and, interestingly, after only the first two layers the spatial size is compressed from an input volume of 224x224 to a 56x56 volume. The study also attempted a 1202-layer network but achieved a lower test accuracy, most likely due to overfitting [S5].

More recently, other architectures have been proposed: a hybrid of Inception and ResNet, called Inception-ResNet, was developed in 2016; the concept of depthwise separable convolutions was introduced in the Xception network in 2017; and in 2017-18, DenseNet introduced dense connections from one layer to multiple layers (as opposed to only the layers immediately before and after) [S6-S8].
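The residual idea can be sketched in a few lines of PyTorch: the block's input is added back to its transformed version before the final nonlinearity. The channel counts and layer choices below are illustrative, not those of the 152-layer model.

# Minimal residual block sketch: the input x is added back to its
# transformed version F(x), the core idea behind ResNet [S5].
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # skip connection: add the original signal

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)   # torch.Size([1, 64, 56, 56])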
In addition to image recognition, extremely useful DCNN architectures have been proposed for simultaneous object detection and localization. Some of the most popular of these architectures are briefly introduced below.

Region CNN (RCNN): This is one of the first successful CNN-based object detection and localization algorithms. RCNN [S9], proposed in 2014, focuses on various candidate bounding boxes within an image and resizes each of them to the input size expected by a CNN trained for object detection.

Fast RCNN: The RCNN algorithm was accelerated by the Fast RCNN [S10] algorithm, which avoids running multiple rounds of convolutional operations for each bounding box (or region) of interest. Instead, the convolutional operations are run once for the entire image, and only the parts of the resulting feature maps that correspond to the different bounding boxes are passed through the fully connected layers to the output layer for object detection [S9].
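The computational saving of Fast RCNN can be illustrated with torchvision's RoI pooling operator: the backbone is run once on the whole image, and fixed-size features are then pooled from the shared feature map for each candidate box. The backbone choice and box coordinates below are placeholders.

# Sketch of the Fast R-CNN idea: run the convolutional backbone once,
# then pool fixed-size features from the shared feature map per region.
import torch
from torchvision import models, ops

backbone = models.vgg16(pretrained=True).features.eval()

image = torch.randn(1, 3, 224, 224)            # one input image
with torch.no_grad():
    fmap = backbone(image)                     # computed only once: [1, 512, 7, 7]

# Two candidate regions in image coordinates (x1, y1, x2, y2)
rois = [torch.tensor([[0., 0., 100., 100.],
                      [50., 50., 200., 220.]])]

# spatial_scale maps image coordinates onto the 7x7 feature map (7/224)
pooled = ops.roi_pool(fmap, rois, output_size=(7, 7), spatial_scale=7 / 224)
print(pooled.shape)   # torch.Size([2, 512, 7, 7]) -> one feature tensor per region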
Faster RCNN: In 2016, the Faster RCNN concept emerged, which automatically predicts the object bounding boxes together with scores reflecting the confidence of each detection [S11]. It combines the strengths of Fast RCNN with a Region Proposal Network (RPN) so that only high-confidence regions are searched, a strategy commonly referred to as an 'attention' mechanism. The idea is to fine-tune the proposed regions, which are scored by the ratio of the intersection to the union (IoU) of each anchor box with the ground truth, and then to perform object detection only on the high-confidence proposals. The region filters are chosen at multiple scales so that the network learns to detect objects at the most appropriate scale.
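As a usage sketch (assuming a recent torchvision release and a placeholder image), a pretrained Faster RCNN detector can be run directly to obtain bounding boxes, class labels and confidence scores:

# Minimal sketch: run a pretrained Faster R-CNN detector and read out
# boxes, labels and confidence scores for one image.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(pretrained=True).eval()

image = torch.rand(3, 480, 640)                 # placeholder RGB image in [0, 1]
with torch.no_grad():
    out = model([image])[0]                     # one dict per input image

keep = out["scores"] > 0.5                      # keep high-confidence detections
print(out["boxes"][keep], out["labels"][keep], out["scores"][keep])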
Mask RCNN: In 2017, the concept was further refined into Mask RCNN [S12], a masking variant of RCNN that aims to obtain the tightest possible bounding box and, in effect, a segmentation of the object region. The network combines Faster RCNN with a Fully Convolutional Network (FCN) [S13] to perform instance-aware segmentation: Faster RCNN detects the objects, while the FCN predicts a mask for each object to distinguish it from the others [S12].
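A pretrained Mask RCNN can be used in the same way; in addition to boxes, labels and scores, it returns one soft mask per detected instance (again a sketch assuming a recent torchvision release and a placeholder image):

# Minimal sketch: a pretrained Mask R-CNN returns per-instance masks
# in addition to boxes, labels and scores.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(pretrained=True).eval()

image = torch.rand(3, 480, 640)                # placeholder RGB image in [0, 1]
with torch.no_grad():
    out = model([image])[0]

# 'masks' has shape [N, 1, H, W]: one soft mask per detected instance
print(out["boxes"].shape, out["masks"].shape, out["scores"].shape)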
SSD: The Single Shot MultiBox Detector (SSD) is an object detection and localization architecture in which anchor boxes with multiple offsets are attached to the cells of the feature maps [S14], and the network is pre-trained using VGG16 [S4]. Specifically, three aspect ratios (1:1, 1:2 and 2:1) are preset for the anchor boxes. SSD is a fully convolutional network in which the offsets and confidences are predicted around each cell from the feature maps at every layer, mimicking a multiscale setting. Model training weighs a localization loss against a softmax confidence loss, and most low-confidence offsets are discarded as negative examples.
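The following sketch illustrates how SSD-style anchor (default) boxes with 1:1, 1:2 and 2:1 aspect ratios can be laid out over the cells of a feature map; the base box size and feature-map size are placeholder values, not those of the original SSD.

# Illustrative sketch: generate anchor boxes with 1:1, 1:2 and 2:1 aspect
# ratios for every cell of a small feature map (normalized coordinates).
import itertools
import torch

def make_anchors(fmap_size=8, base=0.2, ratios=(1.0, 2.0, 0.5)):
    boxes = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size  # cell center
        for r in ratios:
            w, h = base * (r ** 0.5), base / (r ** 0.5)        # keep the box area fixed
            boxes.append([cx, cy, w, h])                       # (cx, cy, w, h)
    return torch.tensor(boxes)

anchors = make_anchors()
print(anchors.shape)   # torch.Size([192, 4]) = 8*8 cells x 3 aspect ratios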
YOLO: You Only Look Once (YOLO) is another model for object classification and localization. It is typically pre-trained on ImageNet [S15], and the location coordinates of the anchor boxes (equivalent to the offsets in SSD) are predicted all at once. In YOLO, the number of anchor boxes (called cluster IoUs) is determined from the training data by k-means clustering of the ground-truth bounding boxes; k is typically set to 5 to give sufficiently high recall at reasonable complexity, whereas the anchor box centers of SSD are fixed. Starting from an equally sized 13x13 grid, the bounding box sizes are increased or reduced based on the class prediction. Each grid cell simultaneously predicts k bounding boxes, each represented by its center coordinates, width and height, a confidence score, and a probability distribution over the classes. YOLO also learns the IoU-based probability of an anchor box belonging to a class, as well as the probability that an object is already present in the box [S16].
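The grid-based prediction can be illustrated by decoding a YOLO-style output tensor: each cell of a 13x13 grid predicts k = 5 boxes, each with coordinates, width, height, a confidence score and class probabilities. The tensor below is random and merely stands in for the output of a trained network.

# Illustrative sketch of decoding a YOLO-style output tensor.
import torch

S, K, C = 13, 5, 20                      # grid size, boxes per cell, classes
pred = torch.rand(S, S, K, 5 + C)        # placeholder network output

boxes      = pred[..., :4]               # (x, y, w, h) per anchor box
confidence = pred[..., 4]                # objectness score per anchor box
class_prob = pred[..., 5:]               # class distribution per anchor box

# Keep only confident detections, as done at inference time
keep = confidence > 0.9
print(boxes[keep].shape, class_prob[keep].argmax(dim=-1).shape)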
Links to Important Repositories

PyTorch implementations of AlexNet, VGG16, VGG19, Inception (GoogLeNet), ResNet, DenseNet, SqueezeNet:
Implementations of Xception, ResNet50, Inception, Inception-ResNet, MobileNet, DenseNet, etc. (Keras):
(different versions, Keras implementation):
RCNN:

Supplemental References

S1. Krizhevsky, A., et al. (2012) ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, pp. 1097-1105, Curran Associates Inc.
S2. Zeiler, M.D. and Fergus, R. (2014) Visualizing and Understanding Convolutional Networks. In Computer Vision - ECCV 2014, pp. 818-833, Springer International Publishing
S3. Szegedy, C., et al. (2016) Rethinking the Inception Architecture for Computer Vision. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2818-2826
S4. Simonyan, K. and Zisserman, A. (2014) Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR abs/1409.1556
S5. He, K., et al. (2015) Deep Residual Learning for Image Recognition. CoRR abs/1512.03385
S6. Huang, G., et al. (2016) Densely Connected Convolutional Networks. CoRR abs/1608.06993
S7. Szegedy, C., et al. (2016) Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. CoRR abs/1602.07261
S8. Chollet, F. (2016) Xception: Deep Learning with Depthwise Separable Convolutions. CoRR abs/1610.02357
S9. Girshick, R., et al. (2014) Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 580-587, IEEE Computer Society
S10. Girshick, R.B. (2015) Fast R-CNN. CoRR abs/1504.08083
S11. Ren, S., et al. (2015) Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. CoRR abs/1506.01497
S12. He, K., et al. (2017) Mask R-CNN. CoRR abs/1703.06870
S13. Shelhamer, E., et al. (2017) Fully Convolutional Networks for Semantic Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 640-651
S14. Liu, W., et al. (2015) SSD: Single Shot MultiBox Detector. CoRR abs/1512.02325
S15. Russakovsky, O., et al. (2015) ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision 115, 211-252
S16. Redmon, J. and Farhadi, A. (2016) YOLO9000: Better, Faster, Stronger. CoRR abs/1612.08242