[1] Aslanargun, A., Mammadov, M., Yazici, B. and Yolacan, S. (2007). Comparison of ARIMA, neural networks and hybrid models in time series: tourist arrival forecasting. Journal of Statistical Computation and Simulation, 77(1), 29-53.
[2] Ahn, J. M., Kim, S., Ahn, K. S., Cho, S. H., Lee, K. B. and Kim, U. S. (2018). A deep learning model for the detection of both advanced and early glaucoma using fundus photography. PLoS One, 13(11), 1-8.
[3] Baldi, P. (1995). Gradient descent learning algorithm overview: a general dynamical systems perspective. IEEE Transactions on Neural Networks, 6(1), 182-195.
[4] Bell, S. and Bala, K. (2015). Learning visual similarity for product design with convolutional neural networks. ACM Transactions on Graphics, 34(4), 98.
[5] Bengio, Y. (2012). Practical recommendations for gradient-based training of deep architectures. Neural Networks: Tricks of the Trade, vol. 7700 of Lecture Notes in Computer Science.
[6] Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1), 1-127.
[7] Bengio, Y. and LeCun, Y. (2007). Scaling learning algorithms toward AI. Large Scale Kernel Machines, MIT Press, Cambridge, MA.
[8] Bengio, Y., Lamblin, P., Popovici, D. and Larochelle, H. (2007). Greedy layer-wise training of deep networks. Advances in Neural Information Processing Systems, 19, 153-160.
[9] Bergstra, J. and Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13, 281-305.
[10] Caruana, R., Lawrence, S. and Giles, L. (2001). Overfitting in neural nets: backpropagation, conjugate gradient, and early stopping. Proceedings of the Advances in Neural Information Processing Systems, 402-408.
[11] Chakraborti, T., Chatterjee, A., Halder, A. and Konar, A. (2015). Automated emotion recognition employing a novel modified binary quantum-behaved gravitational search algorithm with differential mutation. Expert Systems, 32(4), 522-530.
[12] Chen, L., Huang, J. F., Wang, F. M. and Tang, Y. L. (2007). Comparison between back propagation neural network and regression models for the estimation of pigment content in rice leaves and panicles using hyperspectral data. International Journal of Remote Sensing, 28(16), 3457-3478.
[13] Chiba, Z., Abghour, N., Moussaid, K., El Omri, A. and Rida, M. (2018). A novel architecture combined with optimal parameters for back propagation neural networks applied to anomaly network intrusion detection. Computers & Security, 75, 36-58.
[14] Chong, E., Han, C. and Park, F. C. (2017). Deep learning networks for stock market analysis and prediction: methodology, data representations, and case studies. Expert Systems with Applications, 83, 187-205.
[15] Deng, L. and Yu, D. (2014). Deep learning: methods and applications. Foundations and Trends in Signal Processing, 7(3-4), 197-387.
[16] Deng, W., Li, W. and Yang, X. H. (2011). A novel hybrid optimization algorithm of computational intelligence techniques for highway passenger volume prediction. Expert Systems with Applications, 38(4), 4198-4205.
[17] Dozat, T. (2016). Incorporating Nesterov momentum into Adam. Proceedings of the 4th International Conference on Learning Representations (ICLR), Workshop Track, San Juan, Puerto Rico.
[18] Duan, K., Keerthi, S. S., Chu, W., Shevade, S. K. and Poo, A. N. (2003). Multi-category classification by soft-max combination of binary classifiers. International Workshop on Multiple Classifier Systems, Springer, Berlin, Germany, 125-134.
[19] Duchi, J., Hazan, E. and Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121-2159.
[20] Fausett, L. (1994). Fundamentals of Neural Networks: Architectures, Algorithms, and Applications. Prentice-Hall International, Herts, England.
[21] Freeway Bureau, MOTC. Available at: https://www.freeway.gov.tw/.
[22] Gaidhane, R., Vaidya, C. and Raghuwanshi, M. (2014). Intrusion detection and attack classification using back-propagation neural network. International Journal of Engineering Research & Technology (IJERT), 3(3), 1112-1115.
[23] Glorot, X., Bordes, A. and Bengio, Y. (2011). Deep sparse rectifier neural networks. Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), vol. 15 of Proceedings of Machine Learning Research, Fort Lauderdale, FL, USA, PMLR, 315-323.
[24] Glorot, X. and Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), Sardinia, Italy, 249-256.
[25] Gnana Sheela, K. and Deepa, S. N. (2013). Review on methods to fix number of hidden neurons in neural networks. Mathematical Problems in Engineering, 1-11.
[26] Graves, A., Liwicki, M., Fernández, S., Bertolami, R., Bunke, H. and Schmidhuber, J. (2009). A novel connectionist system for unconstrained handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(5), 855-868.
[27] Haykin, S. (1998). Neural Networks: A Comprehensive Foundation. Prentice Hall, New York.
[28] He, K., Zhang, X., Ren, S. and Sun, J. (2015). Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 1026-1034.
[29] Hinton, G. E. (2007). Learning multiple layers of representation. Trends in Cognitive Sciences, 11(10), 428-434.
[30] Hinton, G. E., Osindero, S. and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527-1554.
[31] Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I. and Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. ArXiv preprint arXiv:1207.0580. Available at: https://arxiv.org/pdf/1207.0580.pdf.
[32] Hinton, G. E., Srivastava, N. and Swersky, K. (2014). Lecture 6a: overview of mini-batch gradient descent. Lecture notes distributed in CSC321 of University of Toronto. Available at: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf.
[33] Huang, G. B. (2003). Learning capability and storage capacity of two-hidden-layer feedforward networks. IEEE Transactions on Neural Networks, 14(2), 274-281.
[34] Ioffe, S. and Szegedy, C. (2015). Batch normalization: accelerating deep network training by reducing internal covariate shift. ArXiv preprint arXiv:1502.03167. Available at: https://arxiv.org/pdf/1502.03167.pdf.
[35] Jain, A. K., Mao, J. and Mohiuddin, K. M. (1996). Artificial neural networks: a tutorial. Computer, 29(3), 31-44.
[36] Jarrett, K., Kavukcuoglu, K., Ranzato, M. A. and LeCun, Y. (2009). What is the best multi-stage architecture for object recognition? IEEE 12th International Conference on Computer Vision (ICCV), Kyoto, Japan, 2146-2153.
[37] Karsoliya, S. (2012). Approximating number of hidden layer neurons in multiple hidden layer BPNN architecture. International Journal of Engineering Trends and Technology, 3(6), 714-717.
[38] Keskar, N. S., Mudigere, D., Nocedal, J., Smelyanskiy, M. and Tang, P. T. P. (2017). On large-batch training for deep learning: generalization gap and sharp minima. Proceedings of the 5th International Conference on Learning Representations (ICLR), Toulon, France.
[39] Kim, I.-J. and Xie, X. (2015). Handwritten Hangul recognition using deep convolutional neural networks. International Journal on Document Analysis and Recognition (IJDAR), 18(1), 1-13.
[40] Kingma, D. P. and Ba, J. L. (2015). Adam: a method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego.
[41] Koza, J. R., Bennett, F. H., Andre, D. and Keane, M. A. (1996). Automated design of both the topology and sizing of analog electrical circuits using genetic programming. Artificial Intelligence in Design '96, Springer, Dordrecht, 151-170.
[42] Larochelle, H., Bengio, Y., Louradour, J. and Lamblin, P. (2009). Exploring strategies for training deep neural networks. Journal of Machine Learning Research, 10, 1-40.
[43] Lathuilière, S., Mesejo, P., Alameda-Pineda, X. and Horaud, R. (2019). A comprehensive analysis of deep regression. IEEE Transactions on Pattern Analysis and Machine Intelligence, DOI 10.1109/TPAMI.2019.2910523.
[44] LeCun, Y., Bengio, Y. and Hinton, G. (2015). Deep learning. Nature, 521, 436-444.
[45] Lee, Y., Oh, S. H. and Kim, M. W. (1993). An analysis of premature saturation in back propagation learning. Neural Networks, 6(5), 719-728.
[46] Li, M., Zhang, T., Chen, Y. and Smola, A. J. (2014). Efficient mini-batch training for stochastic optimization. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, New York, 661-670.
[47] Lin, B., Lin, G., Liu, X., Ma, J., Wang, X., Lin, F. and Hu, L. (2015). Application of back-propagation artificial neural network and curve estimation in pharmacokinetics of losartan in rabbit. International Journal of Clinical and Experimental Medicine, 8(12), 22352-22358.
[48] Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y. and Alsaadi, F. E. (2017). A survey of deep neural network architectures and their applications. Neurocomputing, 234, 11-26.
[49] Masters, D. and Luschi, C. (2018). Revisiting small batch training for deep neural networks. ArXiv preprint arXiv:1804.07612. Available at: https://arxiv.org/pdf/1804.07612.pdf.
[50] Nair, V. and Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 807-814.
[51] Ng, A. Y. (2004). Feature selection, L1 vs. L2 regularization, and rotational invariance. Proceedings of the 21st International Conference on Machine Learning, Banff, Canada.
[52] Niaki, S. T. A. and Abbasi, B. (2008). Detection and classification of mean-shifts in multi-attribute processes by artificial neural networks. International Journal of Production Research, 46(11), 2945-2963.
[53] O'Neal, M. R., Engel, B. A., Ess, D. R. and Frankenberger, J. R. (2002). Neural network prediction of maize yield using alternative data coding algorithms. Biosystems Engineering, 83(1), 31-45.
[54] Panchal, F. S. and Panchal, M. (2014). Review on methods of selecting number of hidden nodes in artificial neural network. International Journal of Computer Science and Mobile Computing (IJCSMC), 3(11), 455-464.
[55] Panchal, G., Ganatra, A., Kosta, Y. P. and Panchal, D. (2011). Behaviour analysis of multilayer perceptrons with multiple hidden neurons and hidden layers. International Journal of Computer Theory and Engineering, 3(2), 332-337.
[56] Pascanu, R., Mikolov, T. and Bengio, Y. (2013). On the difficulty of training recurrent neural networks. Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA, 1310-1318.
[57] Prechelt, L. (1998). Early stopping - but when? Neural Networks: Tricks of the Trade, Springer, Berlin Heidelberg, 1524, 55-69.
[58] Reddi, S. J., Kale, S. and Kumar, S. (2018). On the convergence of Adam and beyond. Proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, BC, Canada.
[59] Ruder, S. (2017). An overview of gradient descent optimization algorithms. ArXiv preprint arXiv:1609.04747v2. Available at: https://arxiv.org/pdf/1609.04747.pdf.
[60] Rumelhart, D. E. and Zipser, D. (1985). Feature discovery by competitive learning. Cognitive Science, 9(1), 75-112.
[61] Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1985). Learning internal representations by error propagation. Technical Report ICS-8506, Institute for Cognitive Science, University of California, San Diego, La Jolla, CA.
[62] Russell, S. and Norvig, P. (2010). Artificial Intelligence: A Modern Approach. 3rd Edition, Prentice-Hall, Upper Saddle River.
[63] Saduf and Wani, M. A. (2013). Comparative study of back propagation learning algorithms for neural networks. International Journal of Advanced Research in Computer Science and Software Engineering, 3(12), 1151-1156.
[64] Sainath, T. N., Kingsbury, B., Saon, G., Soltau, H., Mohamed, A.-r., Dahl, G. and Ramabhadran, B. (2015). Deep convolutional neural networks for large-scale speech tasks. Neural Networks, 64, 39-48.
[65] Sak, H., Senior, A. and Beaufays, F. (2014). Long short-term memory recurrent neural network architectures for large scale acoustic modeling. Proceedings of the Fifteenth Annual Conference of the International Speech Communication Association.
[66] Santurkar, S., Tsipras, D., Ilyas, A. and Madry, A. (2018). How does batch normalization help optimization? ArXiv preprint arXiv:1805.11604. Available at: https://arxiv.org/pdf/1805.11604.pdf.
[67] Schmidhuber, J. (2015). Deep learning in neural networks: an overview. Neural Networks, 61, 85-117.
[68] Smith, L. N. (2017). Cyclical learning rates for training neural networks. IEEE Winter Conference on Applications of Computer Vision (WACV), 464-472.
[69] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. and Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15, 1929-1958.
[70] Stathakis, D. (2009). How many hidden layers and nodes? International Journal of Remote Sensing, 30(8), 2133-2147.
[71] Sutskever, I., Martens, J., Dahl, G. and Hinton, G. (2013). On the importance of initialization and momentum in deep learning. Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA.
[72] Thoma, M. (2017). Analysis and optimization of convolutional neural network architectures. ArXiv preprint arXiv:1707.09725. Available at: https://arxiv.org/pdf/1707.09725.pdf.
[73] Tseng, F. M., Yu, H. C. and Tzeng, G. H. (2002). Combining neural network model with seasonal time series ARIMA model. Technological Forecasting & Social Change, 69(1), 71-87.
[74] Wang, L., Zeng, Y., Zhang, J., Huang, W. and Bao, Y. (2006). The criticality of spare parts evaluating model using artificial neural network approach. In: Alexandrov, V. N., van Albada, G. D., Sloot, P. M. A. and Dongarra, J. (eds) Computational Science - ICCS 2006. Lecture Notes in Computer Science, 3991, 728-735.
[75] Widrow, B., Winter, R. G. and Baxter, R. A. (1987). Learning phenomena in layered neural networks. Proceedings of the First IEEE International Conference on Neural Networks, San Diego, 2, 411-429.
[76] Wilson, R. and Martinez, T. R. (2001). The need for small learning rates on large problems. Proceedings of the 2001 International Joint Conference on Neural Networks (IJCNN'01), 115-119.
[77] Wong, S. C., Gatt, A., Stamatescu, V. and McDonnell, M. D. (2016). Understanding data augmentation for classification: when to warp? International Conference on Digital Image Computing: Techniques and Applications (DICTA), Gold Coast, QLD, Australia, 1-6.
[78] Yang, H. F., Dillon, T. S. and Chen, Y. P. P. (2017). Optimized structure of the traffic flow forecasting model with a deep learning approach. IEEE Transactions on Neural Networks and Learning Systems, 28(10), 2371-2381.
[79] Zeiler, M. D. (2012). Adadelta: an adaptive learning rate method. ArXiv preprint arXiv:1212.5701. Available at: https://arxiv.org/pdf/1212.5701.pdf.
[80] Zell, A. (1994). Simulation Neuronaler Netze [Simulation of Neural Networks]. Addison-Wesley, New York.
[81] Zhang, C., Tan, K. C. and Ren, R. (2016). Training cost-sensitive deep belief networks on imbalance data problems. International Joint Conference on Neural Networks (IJCNN), 4362-4367.