[1] D.-H. Lee, K.-L. Chen, K.-H. Liou, C.-L. Liu, and J.-L. Liu, “Deep learning and control algorithms of direct perception for autonomous driving,” Applied Intelligence, vol. 51, no. 1, pp. 237–247, 2021.
[2] D.-H. Lee and J.-L. Liu, “End-to-end deep learning of lane detection and path prediction for real-time autonomous driving,” arXiv preprint arXiv:2102.04738, 2021.
[3] G. Hogan, “commaai/openpilot,” https://github.com/commaai/openpilot.
[4] S. Ingle and M. Phute, “Tesla Autopilot: Semi-autonomous driving, an uptick for future autonomy,” International Research Journal of Engineering and Technology (IRJET), vol. 3, no. 9, pp. 369–372, 2016.
[5] Consumer Reports, “Active driving assistance systems: Test results and design recommendations,” 2020.
[6] D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee, “YOLACT++: Better real-time instance segmentation,” arXiv preprint arXiv:1912.06218, 2020.
[7] Qualcomm Technologies, Inc., “Snapdragon Neural Processing Engine software development kit (SNPE SDK),” https://developer.qualcomm.com/sites/default/files/docs/snpe/overview.html.
[8] X. Zhu, H. Hu, S. Lin, and J. Dai, “Deformable ConvNets v2: More deformable, better results,” arXiv preprint arXiv:1811.11168, 2018.
[9] G. Hogan, “commaai/comma10k,” https://github.com/commaai/comma10k.
[10] Y. Yousfi, “comma10k-baseline,” https://github.com/YassineYousfi/comma10k-baseline.
[11] A. Kirillov, K. He, R. Girshick, C. Rother, and P. Dollár, “Panoptic segmentation,” arXiv preprint arXiv:1801.00868, 2019.
[12] Y. Xiong, R. Liao, H. Zhao, R. Hu, M. Bai, E. Yumer, and R. Urtasun, “UPSNet: A unified panoptic segmentation network,” arXiv preprint arXiv:1901.03784, 2019.
[13] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” arXiv preprint arXiv:1411.4038, 2015.
[14] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” arXiv preprint arXiv:1505.04597, 2015.
[15] H. Noh, S. Hong, and B. Han, “Learning deconvolution network for semantic segmentation,” arXiv preprint arXiv:1505.04366, 2015.
[16] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A deep convolutional encoder-decoder architecture for image segmentation,” arXiv preprint arXiv:1511.00561, 2016.
[17] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Semantic image segmentation with deep convolutional nets and fully connected CRFs,” arXiv preprint arXiv:1412.7062, 2016.
[18] ——, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” arXiv preprint arXiv:1606.00915, 2017.
[19] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,” arXiv preprint arXiv:1706.05587, 2017.
[20] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” arXiv preprint arXiv:1802.02611, 2018.
[21] P. Krähenbühl and V. Koltun, “Efficient inference in fully connected CRFs with Gaussian edge potentials,” arXiv preprint arXiv:1210.5644, 2012.
[22] A. Kirillov, R. Girshick, K. He, and P. Dollár, “Panoptic feature pyramid networks,” arXiv preprint arXiv:1901.02446, 2019.
[23] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” arXiv preprint arXiv:1610.02357, 2017.
[24] A. Tao, K. Sapra, and B. Catanzaro, “Hierarchical multi-scale attention for semantic segmentation,” arXiv preprint arXiv:2005.10821, 2020.
[25] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” arXiv preprint arXiv:1706.03762, 2017.
[26] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” arXiv preprint arXiv:1409.3215, 2014.
[27] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder-decoder for statistical machine translation,” arXiv preprint arXiv:1406.1078, 2014.
[28] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2016.
[29] M.-T. Luong, H. Pham, and C. D. Manning, “Effective approaches to attention-based neural machine translation,” arXiv preprint arXiv:1508.04025, 2015.
[30] J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, Y. Mu, M. Tan, X. Wang, W. Liu, and B. Xiao, “Deep high-resolution representation learning for visual recognition,” arXiv preprint arXiv:1908.07919, 2020.
[31] Y. Yuan, X. Chen, X. Chen, and J. Wang, “Segmentation transformer: Object-contextual representations for semantic segmentation,” arXiv preprint arXiv:1909.11065, 2021.
[32] M. Tan and Q. V. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in Proceedings of the 36th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, K. Chaudhuri and R. Salakhutdinov, Eds., vol. 97. PMLR, 2019.
[33] ——, “EfficientNetV2: Smaller models and faster training,” arXiv preprint arXiv:2104.00298, 2021.
[34] B. Baheti, S. Innani, S. Gajre, and S. Talbar, “Eff-UNet: A novel architecture for semantic segmentation in unstructured environment,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020.
[35] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778, 2016.
[36] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, “Vision meets robotics: The KITTI dataset,” International Journal of Robotics Research (IJRR), vol. 32, no. 11, pp. 1231–1237, 2013, doi: 10.1177/0278364913491297.
[37] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[38] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[39] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv preprint arXiv:1512.03385, 2015.
[40] J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, “Squeeze-and-excitation networks,” arXiv preprint arXiv:1709.01507, 2018.
[41] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
[42] C. P. Le, M. Soltani, R. Ravier, and V. Tarokh, “Task-aware neural architecture search,” arXiv preprint arXiv:2010.13962, 2021.
[43] P. Ramachandran, B. Zoph, and Q. V. Le, “Swish: A self-gated activation function,” arXiv preprint arXiv:1710.05941, 2017.
[44] D. Hendrycks and K. Gimpel, “Gaussian error linear units (GELUs),” arXiv preprint arXiv:1606.08415, 2016.
[45] S. Elfwing, E. Uchibe, and K. Doya, “Sigmoid-weighted linear units for neural network function approximation in reinforcement learning,” arXiv preprint arXiv:1702.03118, 2017.
[46] J. Wang, K. Chen, R. Xu, Z. Liu, C. C. Loy, and D. Lin, “CARAFE: Content-aware reassembly of features,” arXiv preprint arXiv:1905.02188, 2019.
[47] Z. Tian, T. He, C. Shen, and Y. Yan, “Decoders matter for semantic segmentation: Data-dependent decoding enables flexible feature aggregation,” arXiv preprint arXiv:1903.02120, 2019.