Deep Learning with Convolutional Neural Network and Long Short-Term Memory for Phishing Detection

M. A. Adebowale, School of Computing and Information Science, Anglia Ruskin University, Chelmsford, UK, moruf.adebowale@pgr.anglia.ac.uk, ORCID: 0000-0001-5704-4985
K. T. Lwin, School of Computing and Digital Technology, Teesside University, Middlesbrough, UK, k.lwin@tees.ac.uk
M. A. Hossain, School of Computing and Digital Technology, Teesside University, Middlesbrough, UK, a.hossain@tees.ac.uk

Abstract–Phishers often exploit users' trust in the appearance of a known website by creating a similar page that looks like the legitimate site. In recent years, researchers have sought to identify and classify the factors that contribute to the detection of phishing websites. This study focuses on the design and development of a deep-learning-based phishing detection solution that leverages the Uniform Resource Locator and website content such as images and frame elements. A Convolutional Neural Network (CNN) and the Long Short-Term Memory (LSTM) algorithm were used to build a classification model. The experimental results showed that the proposed model achieved an accuracy of 93.28%.

Keywords–Phishing detection; Cybercrime; Deep learning (DL); Convolutional Neural Network (CNN); Long Short-Term Memory (LSTM); Big data; Uniform Resource Locator (URL).

I. Introduction

Nowadays, identity theft is one of the most common cybercrime activities [1]. The most common method used to steal confidential information from online users is phishing, usually defined as a fraudulent attempt made via email and/or fake websites. Phishers have access to a wide variety of tactics and approaches that they can use to create a well-designed phishing attack. Cybercriminals use phishing for various illicit activities such as identity theft and fraud.
They can also install malware on inadequately protected end-user systems to gain access to their victims' systems [2].
Phishing is one of the most critical threats to web activity: the attacker mimics the website of an official establishment to gather the personal data of online users [3]. Several solutions have been proposed in the literature that use various methodologies to counter web-phishing threats.
However, phishing is a complicated phenomenon to tackle, as it differs from other security threats such as intrusions and malware, which exploit methodological security holes in network systems [4]. According to a report by the Anti-Phishing Working Group, the number of phishing attacks detected in the first quarter of 2018 was 46% higher than in the fourth quarter of 2017.
The most targeted sector is payment services, the object of 39.4% of phishing attacks, followed by software-as-a-service/webmail with 18.7%, financial institutions with 14.2% and other sectors with 16.4% [5]. Given the above, this research aims to develop a solution that detects phishing attacks and also provides insights and improves awareness of how active Internet users can protect themselves against such attacks. It is hoped that this research will help to establish an upward trend in the practice of preventive measures against cyber-security threats. Although various approaches have been used to develop anti-phishing tools to combat phishing attacks, these methods suffer from limited accuracy. To address this issue, this study exploits two essential but previously under-studied methods based on deep learning (DL), a class of machine learning that learns representations from the data on its own and builds a model for future use.
Such algorithms have a high probability of detecting newly generated phishing URLs and, moreover, do not require manual feature engineering. The combined use of a convolutional neural network (CNN) and Long Short-Term Memory (LSTM) offers an effective solution for phishing website detection. We also explore and evaluate different CNN and LSTM architectures that vary in the width and depth of their layers, to show the effect of varying image context and scale on performance, and we discuss when and why transfer learning with the CNN and LSTM could be valuable. The original contribution of this study lies in the ability of the proposed method, which we name the Intelligent Phishing Detection System (IPDS), to use the image, frame and text content of a website to detect phishing activity through a hybridised combination of the CNN and LSTM. To the best of our knowledge, this is the first work to consider an integrated text, image and frame feature based solution using a deep learning algorithm (CNN+LSTM) for phishing detection. The proposed IPDS uses two DL layers to classify phishing websites, applying the LSTM to text and frame content and the CNN to images. Thus, the model can exploit both the richness of the words embedded in a website's Uniform Resource Locator (URL) and the images on the site. The performance of the proposed model was tested on a dataset of one million URLs taken from PhishTank and the legitimate-site corpus Common Crawl, together with over 10,000 images from legitimate and phishing websites that were personally collected from various e-commerce and e-banking sites over a period of time. This dataset was used to train and test the network using holdout validation with a 70%/30% split.
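Before a URL can be fed to the LSTM branch, it must be turned into a fixed-length numeric sequence. The paper does not give its preprocessing parameters, so the following is only a minimal sketch of the usual character-level encoding step; the alphabet, the padding id and `MAX_LEN` are illustrative choices, not the authors' actual values.

```python
# Sketch: character-level encoding of a URL into a fixed-length integer
# sequence, the typical preprocessing step before an embedding + LSTM layer.
# ALPHABET and MAX_LEN are illustrative assumptions, not the paper's values.

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#@!$&'()*+,;=%"
CHAR_TO_ID = {c: i + 1 for i, c in enumerate(ALPHABET)}  # 0 reserved for padding

MAX_LEN = 64

def encode_url(url, max_len=MAX_LEN):
    """Map each character to an integer id; truncate or zero-pad to max_len."""
    ids = [CHAR_TO_ID.get(c, 0) for c in url.lower()[:max_len]]
    return ids + [0] * (max_len - len(ids))

seq = encode_url("http://paypa1-secure.example.com/login")
```

The resulting sequence would feed the LSTM branch of the hybrid model, while rendered page screenshots would feed the CNN branch.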
The results of our experiment showed that a measurable improvement in phishing detection was achieved through the use of hybrid features, combining the image, text and frame content of a site with the DL algorithm. Furthermore, the experiment provided information about the usefulness of unsupervised pre-training and the effectiveness of image feature extraction in detecting phishing sites. The rest of this paper is arranged as follows: Section II contains the literature review. Section III presents the methodology, including the CNN, LSTM and DL algorithms. Section IV describes the experiment. Section V presents the results and analysis. Section VI contains the conclusion and directions for future work.

II. Literature Review

In recent times, artificial intelligence technology has come to drive many aspects of modern society, from social networking and web searching to content filtering and e-commerce. It is also present in consumer products such as cameras and smartphones. Machine learning techniques are used for object identification in images, the transcription of speech into text, matching news items and products with users' interests, and presenting relevant search results. The DL concept grew out of the study of artificial neural networks (ANNs) and has become an active research area in recent years.
To build a standard neural network (NN) requires neurons that produce real-valued activations; with adjustment of the weights, the ANN behaves as required [6]. Training the ANN with backpropagation is what makes it useful, and gradient-descent algorithms have played a vital role in such models over the past decades. However, while training accuracy can be high with this approach, the performance of backpropagation on the testing data might not be satisfactory [7]. James, Sandhya and Thomas [8] propose a scheme for detecting phishing websites based on site features and the URL, using machine learning techniques. They also discuss phishing detection techniques based on lexical features, page properties and host features, and they evaluate different data mining algorithms for understanding the structure of URLs that spread phishing content. The input features were saved in comma-separated value (.csv) format, and four machine learning algorithms were considered for processing them: Naïve Bayes, the J48 decision tree, K-nearest neighbour (KNN) and the support vector machine (SVM). Their experiment shows that the J48 decision tree with lexical features outperforms the other algorithms, with an accuracy of 93.2%. Their approach is promising but requires additional features to make it more robust [8].
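The exact lexical feature set used in [8] is not reproduced here, but a hedged sketch of the kind of lexical URL features such schemes typically extract (URL length, dot and digit counts, an `@` symbol, an IP-address host) before handing them to a classifier such as J48 or SVM might look like this:

```python
import re

def lexical_features(url):
    """Extract a few typical lexical features from a URL. These are common
    illustrative examples, not the exact feature set evaluated in [8]."""
    host = re.sub(r"^[a-z]+://", "", url.lower()).split("/")[0]
    return {
        "url_length": len(url),                        # long URLs are suspicious
        "num_dots": url.count("."),                    # many subdomains
        "num_digits": sum(c.isdigit() for c in url),   # digit-heavy hosts
        "has_at_symbol": "@" in url,                   # '@' hides the real host
        "host_is_ip": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
    }

feats = lexical_features("http://192.168.0.1/secure/login.php?id=42")
```

Each URL in the .csv dataset would yield one such feature row, which the four classifiers then process.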

dD5EZXRlY3Rpb24gb2YgcGhpc2hpbmcgVVJMcyB1c2luZyBtYWNoaW5lIGxlYXJuaW5nIHRlY2hu

aXF1ZXM8L0lEVGV4dD48RGlzcGxheVRleHQ+WzhdPC9EaXNwbGF5VGV4dD48cmVjb3JkPjxkYXRl

cz48cHViLWRhdGVzPjxkYXRlPjEzLTE1IERlYy4gMjAxMzwvZGF0ZT48L3B1Yi1kYXRlcz48eWVh

cj4yMDEzPC95ZWFyPjwvZGF0ZXM+PGtleXdvcmRzPjxrZXl3b3JkPmF1dGhvcmlzYXRpb248L2tl

eXdvcmQ+PGtleXdvcmQ+ZGF0YSBtaW5pbmc8L2tleXdvcmQ+PGtleXdvcmQ+bGVhcm5pbmcgKGFy

dGlmaWNpYWwgaW50ZWxsaWdlbmNlKTwva2V5d29yZD48a2V5d29yZD51bnNvbGljaXRlZCBlLW1h

aWw8L2tleXdvcmQ+PGtleXdvcmQ+V2ViIHNpdGVzPC9rZXl3b3JkPjxrZXl3b3JkPnBoaXNoaW5n

IFVSTCBkZXRlY3Rpb248L2tleXdvcmQ+PGtleXdvcmQ+SW50ZXJuZXQgdXNlcnM8L2tleXdvcmQ+

PGtleXdvcmQ+c3Bvb2ZlZCBlLW1haWw8L2tleXdvcmQ+PGtleXdvcmQ+cGhpc2hpbmcgc29mdHdh

cmU8L2tleXdvcmQ+PGtleXdvcmQ+cGVyc29uYWwgaW5mb3JtYXRpb24gc3RlYWxpbmc8L2tleXdv

cmQ+PGtleXdvcmQ+ZmluYW5jaWFsIGFjY291bnQgc3RlYWxpbmc8L2tleXdvcmQ+PGtleXdvcmQ+

dXNlcm5hbWVzPC9rZXl3b3JkPjxrZXl3b3JkPnBhc3N3b3Jkczwva2V5d29yZD48a2V5d29yZD5w

aGlzaGluZyBXZWIgc2l0ZSBkZXRlY3Rpb248L2tleXdvcmQ+PGtleXdvcmQ+bGV4aWNhbCBmZWF0

dXJlIGFuYWx5c2lzPC9rZXl3b3JkPjxrZXl3b3JkPmhvc3QgcHJvcGVydGllczwva2V5d29yZD48

a2V5d29yZD5wYWdlIGltcG9ydGFuY2UgcHJvcGVydGllczwva2V5d29yZD48a2V5d29yZD5kYXRh

IG1pbmluZyBhbGdvcml0aG1zPC9rZXl3b3JkPjxrZXl3b3JkPm1hY2hpbmUgbGVhcm5pbmcgYWxn

b3JpdGhtIHNlbGVjdGlvbjwva2V5d29yZD48a2V5d29yZD5iZW5pZ24gc2l0ZXM8L2tleXdvcmQ+

PGtleXdvcmQ+RmVhdHVyZSBleHRyYWN0aW9uPC9rZXl3b3JkPjxrZXl3b3JkPkdvb2dsZTwva2V5

d29yZD48a2V5d29yZD5XZWIgcGFnZXM8L2tleXdvcmQ+PGtleXdvcmQ+SW50ZXJuZXQ8L2tleXdv

cmQ+PGtleXdvcmQ+TUFUTEFCPC9rZXl3b3JkPjxrZXl3b3JkPkNsYXNzaWZpY2F0aW9uIGFsZ29y

aXRobXM8L2tleXdvcmQ+PGtleXdvcmQ+RWxlY3Ryb25pYyBtYWlsPC9rZXl3b3JkPjxrZXl3b3Jk

PlBoaXNoaW5nPC9rZXl3b3JkPjxrZXl3b3JkPmJlbmlnbjwva2V5d29yZD48a2V5d29yZD5VUkw8

L2tleXdvcmQ+PGtleXdvcmQ+UGFnZSByYW5rPC9rZXl3b3JkPjxrZXl3b3JkPldIT0lTPC9rZXl3

b3JkPjwva2V5d29yZHM+PHRpdGxlcz48dGl0bGU+RGV0ZWN0aW9uIG9mIHBoaXNoaW5nIFVSTHMg

dXNpbmcgbWFjaGluZSBsZWFybmluZyB0ZWNobmlxdWVzPC90aXRsZT48c2Vjb25kYXJ5LXRpdGxl

PjIwMTMgSW50ZXJuYXRpb25hbCBDb25mZXJlbmNlIG9uIENvbnRyb2wgQ29tbXVuaWNhdGlvbiBh

bmQgQ29tcHV0aW5nIChJQ0NDKTwvc2Vjb25kYXJ5LXRpdGxlPjxhbHQtdGl0bGU+MjAxMyBJbnRl

cm5hdGlvbmFsIENvbmZlcmVuY2Ugb24gQ29udHJvbCBDb21tdW5pY2F0aW9uIGFuZCBDb21wdXRp

bmcgKElDQ0MpPC9hbHQtdGl0bGU+PC90aXRsZXM+PHBhZ2VzPjMwNC0zMDk8L3BhZ2VzPjxjb250

cmlidXRvcnM+PGF1dGhvcnM+PGF1dGhvcj5KLiBKYW1lczwvYXV0aG9yPjxhdXRob3I+U2FuZGh5

YSwgTC48L2F1dGhvcj48YXV0aG9yPkMuIFRob21hczwvYXV0aG9yPjwvYXV0aG9ycz48L2NvbnRy

aWJ1dG9ycz48YWRkZWQtZGF0ZSBmb3JtYXQ9InV0YyI+MTU1NTk3NzIwNzwvYWRkZWQtZGF0ZT48

cHViLWxvY2F0aW9uPlRoaXJ1dmFuYW50aGFwdXJhbSwgSW5kaWE8L3B1Yi1sb2NhdGlvbj48cmVm

LXR5cGUgbmFtZT0iQ29uZmVyZW5jZSBQcm9jZWVkaW5nIj4xMDwvcmVmLXR5cGU+PHJlYy1udW1i

ZXI+MTc1NjwvcmVjLW51bWJlcj48cHVibGlzaGVyPklFRUU8L3B1Ymxpc2hlcj48bGFzdC11cGRh

dGVkLWRhdGUgZm9ybWF0PSJ1dGMiPjE1NTU5NzczNzQ8L2xhc3QtdXBkYXRlZC1kYXRlPjxlbGVj

dHJvbmljLXJlc291cmNlLW51bT4xMC4xMTA5L0lDQ0MuMjAxMy42NzMxNjY5PC9lbGVjdHJvbmlj

LXJlc291cmNlLW51bT48L3JlY29yZD48L0NpdGU+PC9FbmROb3RlPn==

ADDIN EN.CITE PEVuZE5vdGU+PENpdGU+PEF1dGhvcj5KLjwvQXV0aG9yPjxZZWFyPjIwMTM8L1llYXI+PElEVGV4

dD5EZXRlY3Rpb24gb2YgcGhpc2hpbmcgVVJMcyB1c2luZyBtYWNoaW5lIGxlYXJuaW5nIHRlY2hu

aXF1ZXM8L0lEVGV4dD48RGlzcGxheVRleHQ+WzhdPC9EaXNwbGF5VGV4dD48cmVjb3JkPjxkYXRl

cz48cHViLWRhdGVzPjxkYXRlPjEzLTE1IERlYy4gMjAxMzwvZGF0ZT48L3B1Yi1kYXRlcz48eWVh

cj4yMDEzPC95ZWFyPjwvZGF0ZXM+PGtleXdvcmRzPjxrZXl3b3JkPmF1dGhvcmlzYXRpb248L2tl

eXdvcmQ+PGtleXdvcmQ+ZGF0YSBtaW5pbmc8L2tleXdvcmQ+PGtleXdvcmQ+bGVhcm5pbmcgKGFy

dGlmaWNpYWwgaW50ZWxsaWdlbmNlKTwva2V5d29yZD48a2V5d29yZD51bnNvbGljaXRlZCBlLW1h

aWw8L2tleXdvcmQ+PGtleXdvcmQ+V2ViIHNpdGVzPC9rZXl3b3JkPjxrZXl3b3JkPnBoaXNoaW5n

IFVSTCBkZXRlY3Rpb248L2tleXdvcmQ+PGtleXdvcmQ+SW50ZXJuZXQgdXNlcnM8L2tleXdvcmQ+

PGtleXdvcmQ+c3Bvb2ZlZCBlLW1haWw8L2tleXdvcmQ+PGtleXdvcmQ+cGhpc2hpbmcgc29mdHdh

cmU8L2tleXdvcmQ+PGtleXdvcmQ+cGVyc29uYWwgaW5mb3JtYXRpb24gc3RlYWxpbmc8L2tleXdv

cmQ+PGtleXdvcmQ+ZmluYW5jaWFsIGFjY291bnQgc3RlYWxpbmc8L2tleXdvcmQ+PGtleXdvcmQ+

dXNlcm5hbWVzPC9rZXl3b3JkPjxrZXl3b3JkPnBhc3N3b3Jkczwva2V5d29yZD48a2V5d29yZD5w

aGlzaGluZyBXZWIgc2l0ZSBkZXRlY3Rpb248L2tleXdvcmQ+PGtleXdvcmQ+bGV4aWNhbCBmZWF0

dXJlIGFuYWx5c2lzPC9rZXl3b3JkPjxrZXl3b3JkPmhvc3QgcHJvcGVydGllczwva2V5d29yZD48

a2V5d29yZD5wYWdlIGltcG9ydGFuY2UgcHJvcGVydGllczwva2V5d29yZD48a2V5d29yZD5kYXRh

IG1pbmluZyBhbGdvcml0aG1zPC9rZXl3b3JkPjxrZXl3b3JkPm1hY2hpbmUgbGVhcm5pbmcgYWxn

b3JpdGhtIHNlbGVjdGlvbjwva2V5d29yZD48a2V5d29yZD5iZW5pZ24gc2l0ZXM8L2tleXdvcmQ+

PGtleXdvcmQ+RmVhdHVyZSBleHRyYWN0aW9uPC9rZXl3b3JkPjxrZXl3b3JkPkdvb2dsZTwva2V5

d29yZD48a2V5d29yZD5XZWIgcGFnZXM8L2tleXdvcmQ+PGtleXdvcmQ+SW50ZXJuZXQ8L2tleXdv

cmQ+PGtleXdvcmQ+TUFUTEFCPC9rZXl3b3JkPjxrZXl3b3JkPkNsYXNzaWZpY2F0aW9uIGFsZ29y

aXRobXM8L2tleXdvcmQ+PGtleXdvcmQ+RWxlY3Ryb25pYyBtYWlsPC9rZXl3b3JkPjxrZXl3b3Jk

PlBoaXNoaW5nPC9rZXl3b3JkPjxrZXl3b3JkPmJlbmlnbjwva2V5d29yZD48a2V5d29yZD5VUkw8

L2tleXdvcmQ+PGtleXdvcmQ+UGFnZSByYW5rPC9rZXl3b3JkPjxrZXl3b3JkPldIT0lTPC9rZXl3

b3JkPjwva2V5d29yZHM+PHRpdGxlcz48dGl0bGU+RGV0ZWN0aW9uIG9mIHBoaXNoaW5nIFVSTHMg

dXNpbmcgbWFjaGluZSBsZWFybmluZyB0ZWNobmlxdWVzPC90aXRsZT48c2Vjb25kYXJ5LXRpdGxl

PjIwMTMgSW50ZXJuYXRpb25hbCBDb25mZXJlbmNlIG9uIENvbnRyb2wgQ29tbXVuaWNhdGlvbiBh

bmQgQ29tcHV0aW5nIChJQ0NDKTwvc2Vjb25kYXJ5LXRpdGxlPjxhbHQtdGl0bGU+MjAxMyBJbnRl

cm5hdGlvbmFsIENvbmZlcmVuY2Ugb24gQ29udHJvbCBDb21tdW5pY2F0aW9uIGFuZCBDb21wdXRp

bmcgKElDQ0MpPC9hbHQtdGl0bGU+PC90aXRsZXM+PHBhZ2VzPjMwNC0zMDk8L3BhZ2VzPjxjb250

cmlidXRvcnM+PGF1dGhvcnM+PGF1dGhvcj5KLiBKYW1lczwvYXV0aG9yPjxhdXRob3I+U2FuZGh5

YSwgTC48L2F1dGhvcj48YXV0aG9yPkMuIFRob21hczwvYXV0aG9yPjwvYXV0aG9ycz48L2NvbnRy

aWJ1dG9ycz48YWRkZWQtZGF0ZSBmb3JtYXQ9InV0YyI+MTU1NTk3NzIwNzwvYWRkZWQtZGF0ZT48

cHViLWxvY2F0aW9uPlRoaXJ1dmFuYW50aGFwdXJhbSwgSW5kaWE8L3B1Yi1sb2NhdGlvbj48cmVm

LXR5cGUgbmFtZT0iQ29uZmVyZW5jZSBQcm9jZWVkaW5nIj4xMDwvcmVmLXR5cGU+PHJlYy1udW1i

ZXI+MTc1NjwvcmVjLW51bWJlcj48cHVibGlzaGVyPklFRUU8L3B1Ymxpc2hlcj48bGFzdC11cGRh

dGVkLWRhdGUgZm9ybWF0PSJ1dGMiPjE1NTU5NzczNzQ8L2xhc3QtdXBkYXRlZC1kYXRlPjxlbGVj

dHJvbmljLXJlc291cmNlLW51bT4xMC4xMTA5L0lDQ0MuMjAxMy42NzMxNjY5PC9lbGVjdHJvbmlj

LXJlc291cmNlLW51bT48L3JlY29yZD48L0NpdGU+PC9FbmROb3RlPn==

Le et al. [9] proposed URLNet, an end-to-end deep learning framework that learns a nonlinear URL representation for malicious URL detection directly from the URL. They applied a convolutional neural network to both the words and the characters of the URL to learn the URL embedding in a jointly optimised framework. This approach allows their model to capture several types of semantic information that existing schemes could not. They also present advanced word embeddings to address the problem of the many rare words observed in this task. They performed their experiments on a large-scale dataset and demonstrated strong performance over existing methods. The approach has two branches: the first is a character-level CNN, in which a character embedding is used to represent the URL; the second is a word-level CNN, in which a word-level embedding represents the URL. The word embedding itself is a combination of the character-level embeddings within the word and the embedding of the individual word.
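URLNet itself learns its embeddings and filters end to end; as a heavily simplified NumPy illustration of the character-level branch only, one can one-hot encode the URL's characters, apply a 1-D filter bank over them, and max-pool over time. The alphabet, filter count and filter width below are illustrative assumptions, and random filters stand in for learned ones.

```python
import numpy as np

def char_branch(url, alphabet="abcdefghijklmnopqrstuvwxyz0123456789./:-",
                n_filters=4, width=3, seed=0):
    """Toy character-level branch: one-hot encode the URL, apply a random
    1-D filter bank (a stand-in for learned filters), max-pool over time."""
    rng = np.random.default_rng(seed)
    one_hot = np.zeros((len(url), len(alphabet)))
    for t, c in enumerate(url.lower()):
        i = alphabet.find(c)
        if i >= 0:
            one_hot[t, i] = 1.0
    filters = rng.standard_normal((n_filters, width, len(alphabet)))
    # valid 1-D convolution along the character axis
    conv = np.array([
        [(one_hot[t:t + width] * f).sum() for t in range(len(url) - width + 1)]
        for f in filters
    ])
    return conv.max(axis=1)  # max-pool over time: one value per filter

vec = char_branch("paypal.com")
```

In URLNet this fixed-length vector would be concatenated with the word-level branch's output before the final classification layers.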
Their approach works in a manner that does not require any expert-designed features [9]. Likewise, Yi et al. [10] designed two sets of features for web phishing: interaction elements and original features.
They then developed a detection scheme based on a deep belief network (DBN). Tests using real Internet protocol (IP) flows from an Internet service provider (ISP) showed that the DBN-based detection model can achieve an approximately 90% true positive rate [10]. The procedure for training a CNN using backpropagation follows the same process as that for a standard NN.
However, Bengio [11] proposed an alternative method for training CNNs, called the error gradient method, and applied the CNN to image classification. In the initial stage of that method, information is propagated in a feed-forward direction through the various layers; at each layer, a digital filter is applied to obtain important features and compute the value of the output. In the next step, the error between the predicted and real values of the output is computed. Unlike the standard DL algorithm for backpropagation and error minimisation, in the CNN the weight matrix is adjusted and the network fine-tuned to improve image classification and reduce pre-processing.
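The two stages described, a feed-forward filtering pass followed by an error computation between predicted and real outputs, can be illustrated with a toy one-dimensional example (our sketch, with invented numbers):

```python
# A minimal sketch (assumed, not from the cited work) of the two stages:
# a feed-forward pass that filters the input, then the error computation
# between the predicted and the actual output.
import numpy as np

x = np.array([0.2, 0.7, 0.1, 0.9])      # toy 1-D input signal
w = np.array([0.5, -0.5])               # a single "digital filter"

# Feed-forward: slide the filter over the input to extract features.
feature = np.array([x[i:i+2] @ w for i in range(len(x) - 1)])
prediction = feature.sum()              # value of the output

target = 0.3
error = (prediction - target) ** 2      # squared error between predicted and real output
```

In the full method this error would then drive the adjustment of the weight matrix.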
However, the CNN does not require the parameter setting of a traditional NN: the error gradient method trains the CNN's filters independently of prior knowledge and human interference [11]. CNNs have also provided solutions for feature extraction. A method was proposed in [12] that combines unsupervised feature learning with an encoder-decoder architecture.
In their method, predictive sparse decomposition is used for unsupervised feature learning, with sparsity constraints on the elements, built on an encoder-decoder architecture. Their feature extraction stage involves a filter bank, a non-linear transformation, and a feature pooling layer [12]. Mathieu, Henaff and LeCun [13] proposed an algorithm that accelerates training and inference by a significant factor, yielding performance improvements of over an order of magnitude compared with existing advanced implementations [13].
On the other hand, Bahnsen et al. [14] explored the use of LSTM networks with URL features as input for machine learning schemes applicable to phishing site prediction. They used LSTM units to build a model that receives a URL as a feature sequence and predicts whether the link leads to a phishing or a legitimate site. In the model, each input feature is translated by a 128-dimension embedding and fed into the LSTM layer as a 150-step sequence; classification is then done by a single output sigmoid neuron [14].

Methodology

In this study, we use both the LSTM layer and the CNN to extract features from different websites, where the output is a binary number that reflects whether the input sequences are real or fake (phishing).

Convolutional Neural Network

The CNN is a discriminative architecture that shows satisfactory performance in processing two-dimensional data with grid topologies, such as images and videos. The CNN concept is superior to that of the NN in terms of time delay: in the CNN, the weights are shared in a temporal dimension, which leads to a decrease in computation time. The general matrix multiplication of the standard NN is therefore replaced in the CNN. The CNN approach thus reduces the number of weights, decreasing the complexity of the network. Consequently, the feature extraction procedure of a standard learning algorithm can be enhanced by importing images directly into the network as raw inputs.
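A quick back-of-envelope comparison (our own illustrative numbers) shows why weight sharing reduces the parameter count relative to a fully connected NN:

```python
# Back-of-envelope illustration (assumed numbers) of why weight sharing
# shrinks the parameter count relative to a fully connected layer.
h, w = 28, 28                 # input image size
hidden = 100                  # fully connected hidden units
fc_params = h * w * hidden    # every pixel connects to every hidden unit

k, n_filters = 5, 32          # 5x5 kernels shared across the whole image
conv_params = k * k * n_filters

print(fc_params, conv_params)
```

Here the convolutional layer needs roughly two orders of magnitude fewer weights, which is the memory and efficiency gain described above.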
This style of layer-wise training of the architecture led to the success of the first DL algorithms. Furthermore, the use of the standard backpropagation algorithm enables the CNN topology to exploit three-dimensional connections to decrease the number of parameters in the network and improve its performance. Another benefit of the CNN model is its lower pre-processing requirement. Graphics Processing Unit (GPU)-accelerated computing techniques have been exploited to meet the computational requirements of CNNs rapidly. Hence, in recent times, CNNs have been applied to image classification, face detection, speech recognition, handwriting recognition, behavioural recognition and recommender systems.

CNN Algorithm Structure

There are three main components in the learning process of a CNN: equivariant representation, sparse interaction and parameter sharing [15]. The CNN differs from the standard NN, which derives the connections among the input and output units from matrix multiplication. In contrast, the CNN decreases the computational load through sparse interactions, where the kernels are made smaller than the inputs and are applied across the entire image.
In the CNN, the idea behind parameter sharing is that, instead of learning a separate set of parameters at each location, the network needs to learn only one set of features, which allows the CNN to outperform the NN. The CNN also has a useful property called equivariance, which follows from parameter sharing: whenever the input changes, the output changes in the same way. Hence, the CNN requires fewer parameters than legacy NN algorithms, which reduces memory usage and improves efficiency. The components of the standard CNN layers are illustrated in Fig. 1. The figure shows how the input image is convolved with trainable filters, with possible offsets, to produce feature maps in the first c-layer. The filters contain a layer of connection weights. Four pixels in each feature map form a group; these pixels pass through a sigmoid function to produce additional feature maps in the first s-layer. This process continues to obtain the feature maps of the following c-layers and s-layers.
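The convolution, grouping, sigmoid and sub-sampling steps just described can be sketched as follows (a minimal assumed implementation, not the authors' code):

```python
# Sketch (assumed implementation) of one c-layer / s-layer pair:
# convolve with a trainable filter, group pixels, squash with a sigmoid.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def c_layer(image, kernel):
    # Valid 2-D convolution (cross-correlation) producing one feature map.
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def s_layer(fmap):
    # Sub-sampling: average each non-overlapping 2x2 group of pixels,
    # then apply the sigmoid to form the next feature map.
    H, W = fmap.shape
    pooled = fmap[:H//2*2, :W//2*2].reshape(H//2, 2, W//2, 2).mean(axis=(1, 3))
    return sigmoid(pooled)

image = np.arange(36, dtype=float).reshape(6, 6) / 36.0   # toy 6x6 "image"
kernel = np.ones((3, 3)) / 9.0                            # toy trainable filter
fmap = c_layer(image, kernel)                             # 4x4 feature map
sub = s_layer(fmap)                                       # 2x2 after pooling + sigmoid
```

Stacking such pairs gives the alternating c-layers and s-layers of the figure.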
Then, at the end of this process, the values of these pixels are rasterised and arranged in a single vector as the input of the network [16].

Fig. 1: Schematic structure of a fully connected Convolutional Neural Network (Source: LeNet).

The layer in which each neuron's input is connected to a local region of the previous layer, and which is responsible for extracting local features, is called the c-layer.
After the local features are extracted, their positional relationships can be identified. The kernel function also has a slight influence on the activation function; the sigmoid function is used to achieve scale invariance. The model uses the filter to connect a series of overlapping receptive fields and converts the 2D image input into a single element of the output. When overfitting occurs, a pooling process called sub-sampling is applied to decrease the overall size of the signal; a similar solution has been used for data size reduction in audio compression [13].

Deep Long Short-Term Memory

In this study, we also use LSTM as part of the structure of our scheme; it takes the input from a URL as a character sequence and predicts whether the link is a phishing or a legitimate website. Long Short-Term Memory is an adaptive recurrent neural network (RNN) in which each neuron is replaced by a memory cell that holds an internal state in addition to the conventional neuron, and multiplicative units act as gates to control the flow of information. The LSTM layers consist of a set of recurrently connected blocks called memory blocks.
These blocks each contain one or more recurrently connected memory cells. A standard LSTM cell has an input gate that controls the entry of data from outside the cell, a forget gate that determines whether the cell keeps or discards the data in its internal state, and an output gate that prevents or allows the internal state to be seen from outside [14]. Furthermore, LSTM units are known for their ability to learn long-range dependencies from input sequences. The LSTM training algorithm uses an error gradient in its calculation, combining real-time recurrent learning and backpropagation [17]. However, backpropagation is truncated after the first timestamp, because the long-term dependencies are handled by the memory blocks and not by the flow of the backpropagation error gradient.
This makes the LSTM directly comparable to other RNNs in terms of performance, because training can be done with standard backpropagation through time [18].
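A single step of the gated memory cell described above can be sketched as follows (a generic textbook LSTM formulation, not the authors' implementation):

```python
# Minimal single-step LSTM cell (generic textbook formulation) showing the
# input, forget and output gates regulating the internal cell state.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W, U, b hold the parameters of the four gates stacked row-wise.
    z = W @ x + U @ h_prev + b
    n = len(c_prev)
    i = sigmoid(z[0:n])            # input gate: admit new information
    f = sigmoid(z[n:2*n])          # forget gate: keep or discard the state
    o = sigmoid(z[2*n:3*n])        # output gate: expose the internal state
    g = np.tanh(z[3*n:4*n])        # candidate cell update
    c = f * c_prev + i * g         # new internal state
    h = o * np.tanh(c)             # visible output of the cell
    return h, c

rng = np.random.default_rng(0)
n, d = 4, 3                        # hidden size, input size (toy values)
W, U = rng.normal(size=(4*n, d)), rng.normal(size=(4*n, n))
b = np.zeros(4*n)
h, c = lstm_step(rng.normal(size=d), np.zeros(n), np.zeros(n), W, U, b)
```

The memory blocks of the paper's architecture chain many such cells over the input sequence.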
LSTM Algorithm Architecture

The central components of the LSTM architecture are the memory cell, which can maintain its state over time, and the non-linear gate units, which regulate the information flow into and out of the cell [19].
Because the LSTM neuron consists of internal cells and gate units, one should not look only at the output of the neuron but also at its internal structure when designing features for LSTM-based classification [20]. Fig. 2 shows the architecture of an LSTM in which there are three bidirectional LSTM layers, two feed-forward layers, and a SoftMax layer that gives the predictions. This fully connected architecture allows us to take advantage of the inherent correlations among connections. Before the second layer in the network, co-occurrence exploration is applied to the connections to learn from the input features.
Lastly, backpropagation is applied to the LSTM layer to allow more effective learning [21].

Fig. 2: Schematic structure of a fully connected Long Short-Term Memory network (Source: Zhu et al., 2016).

In this study, CNN and LSTM are used to build a hybrid model, the IPDS, for classifying phishing websites. The general structure of the IPDS is presented in Fig. 3. The aim of this design is to integrate the CNN and LSTM deep learning algorithms and apply them to the features extracted from websites to detect phishing activities more accurately. Based on a comparison between the extracted features and the knowledge model, websites are classified as legitimate or phishing. Websites are evaluated individually to determine whether they are legitimate or spoof sites.
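The final SoftMax prediction step mentioned above can be sketched generically; the class names here are our assumption, matching the three outcomes used in this paper:

```python
# Sketch of a SoftMax layer turning the last feed-forward layer's scores
# into class probabilities (class names and scores are assumed toy values).
import numpy as np

def softmax(scores):
    e = np.exp(scores - np.max(scores))   # shift for numerical stability
    return e / e.sum()

classes = ["legitimate", "suspicious", "phishing"]
scores = np.array([2.1, 0.3, -1.0])       # toy outputs of the last layer
probs = softmax(scores)
prediction = classes[int(np.argmax(probs))]
```

The predicted class is simply the one with the highest probability.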
In the proposed method, web page features that are similar to one another are compared to remove duplication from the feature set. The feature set is then used to train the model employed in the classification process.

Fig. 3: Intelligent Phishing Detection System (IPDS) structure.

The overall conceptual framework of the intelligent phishing detection system (IPDS), based on deep learning, is presented in Fig. 3. The concept involves applying two deep learning algorithms, LSTM and CNN, to different types of features extracted from websites in order to better predict phishing activity. Feature extraction and machine learning are applied in the initial stage of our classification process. The structure diagram in Fig. 3 illustrates the process of acquiring the website features and feeding them into the deep learning system for classification. The trained LSTM-CNN network is then applied to distinguish accurately between legitimate, suspicious and phishing websites in real time. Based on the differences between the extracted features and the IPDS model, websites are classified as legitimate, suspicious or phishing. Websites are assessed separately to ascertain whether they are legitimate or fake (phishing). The feature set is used in the classification process, with 70% used for the training set and 30% for the testing set.

Experiment Setup

To train both the LSTM and the CNN, a dataset was constructed consisting of legitimate and phishing URLs. In total, 1 million URLs were used to train the LSTM. Half of the dataset consisted of phishing sites from PhishTank, a phishing URL repository, and half comprised legitimate sites from Common Crawl, a corpus of web crawl data. To train the CNN, we collected more than 10,000 images from both legitimate and phishing sites.
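The 70%/30% holdout protocol described here can be sketched as follows (our own minimal example; the URL list is a placeholder):

```python
# Simple holdout split (our sketch) matching the 70% train / 30% test
# protocol described; the URL list is a placeholder.
import random

def holdout_split(samples, train_frac=0.7, seed=42):
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)   # shuffle before splitting
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

urls = [f"http://site-{i}.example" for i in range(1000)]
train, test = holdout_split(urls)
```

Shuffling before the split keeps the class balance of the two halves close to that of the full dataset.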
This dataset was used to train and test the network using holdout cross-validation of 70% and 30%, respectively.

Data Pre-processing

The raw data from both images and URLs contained a great deal of background information and varied in length and size. Therefore, we had to pre-process these data to make them available for training the model. For the CNN architecture, we cropped images from the sites based on the bounding box and removed incorrect images. For the LSTM architecture, we collected the URLs and saved them in Microsoft Excel as comma-separated values, with the URL in one column and its category label in the other (Table 2).

Model Details

The model was developed in Matlab version 9.5 using the Deep Learning Toolbox. For the CNN architecture, there were three categories of data. A sample image (Fig. 7) was loaded into the image datastore and processed to extract speeded-up robust features (SURF) from all images using the grid method to create a bag of features, where GridStep was set to [8 8] with a BlockWidth of [32 64 96 128]. Clustering was then used to create a 1,000-word visual vocabulary. For the LSTM architecture, the dataset was partitioned, and the holdout cross-validation fraction was set to 0.3 for training and validation. The URLs were tokenised to separate each URL into a series of words, all of which were set to lowercase. The tokenised data were then encoded to make them available for training, where the maximum length was set to 75, the hidden size to 180, and the embedding dimension to 100, with a fully connected network. The training options were set to adam, epochs = 100, gradient threshold = 1, learning rate = 0.01 and verbose = false.
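The URL tokenisation and fixed-length encoding step (lowercasing, splitting into words, maximum sequence length 75) can be sketched as below. This is a Python illustration of the general technique, not the authors' Matlab code; the delimiter pattern and the `pad`/`unk` token ids are assumptions:

```python
import re

MAX_LEN = 75  # maximum token sequence length, as stated in the paper

def tokenise_url(url):
    """Lowercase a URL and split it on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]

def encode(tokens, vocab, max_len=MAX_LEN, pad=0, unk=1):
    """Map tokens to integer ids, truncating or zero-padding to a fixed length.
    Unknown words map to an assumed `unk` id."""
    ids = [vocab.get(t, unk) for t in tokens[:max_len]]
    return ids + [pad] * (max_len - len(ids))
```

The fixed-length integer sequences produced here are the form an embedding layer of dimension 100 (as configured above) would consume.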
By doing this, we were able to tweak the network architecture layers, including the parameters mentioned above, to achieve better training accuracy.

Results

The evaluation of the proposed method was based on traditional feature engineering plus the classification algorithm methodology presented in Section III. We created features based on the URLs, the image features and the website elements. We trained the CNN and LSTM classifiers using one million URLs and over 10,000 images to build our model. For the experimental results, we performed three series of experiments for each evaluation method, testing them against legitimate, suspicious and phishing websites. In the time-based evaluation, timing was stopped at the point at which the scheme had classified all of the legitimate, suspicious and phishing datasets; the process was repeated several times to determine the average time for each classification. In the accuracy-based assessment, all of the legitimate, suspicious and phishing datasets were used to test the toolbar. We tested the accuracy of the model using the holdout cross-validation strategy. In the experiment, the overall classification accuracy (Chart 1) of the proposed IPDS (CNN+LSTM) was 93.28% (Table 1). The best relative performance in classification was achieved by the CNN with 92.55%, and the best testing performance was achieved by the LSTM with 92.79% (Table 1). The results of our experiment showed that some improvement in phishing detection was achieved through the use of hybrid features, combining the images, text and frames of a site, together with a hybrid DL algorithm.
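The accuracy, recall, precision and F-measure figures reported in Table 1 follow the standard definitions, which for a binary confusion matrix can be computed as in the sketch below (a Python illustration; the paper's own evaluation was carried out in Matlab):

```python
def metrics(tp, fp, fn, tn):
    """Standard classification metrics from confusion-matrix counts:
    tp/fp/fn/tn = true/false positives and negatives."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)          # fraction of phishing sites caught
    precision = tp / (tp + fp)       # fraction of alarms that were correct
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, recall, precision, f_measure
```

Because precision and recall in Table 1 are nearly identical for all three models, the F-measure lands between them, which is consistent with its definition as their harmonic mean.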
Furthermore, in our experiment, we obtained information about the usefulness of unsupervised pre-training and the effectiveness of image feature extraction in detecting phishing sites.

Table 1: Relative performance

Algorithm | Accuracy % | Recall % | Precision % | F-measure %
CNN       | 92.55      | 92.51    | 92.58       | 92.54
LSTM      | 92.79      | 92.78    | 92.81       | 92.80
IPDS      | 93.28      | 93.27    | 93.30       | 93.29

Chart 1: Experimental results for the CNN, LSTM and (CNN+LSTM) IPDS classification
Fig. 4: Legitimate URL Check with Application Interface
Fig. 5: Suspicious URL Check with Application Interface
Fig. 6: Phishing URL Check with Application Interface
Table 2: Relationship between URLs and Various Category Labels (columns: URLs | Category)
Fig. 7: Sample of Images Used to Train the CNN

Conclusion and Future Work

This study explored the possibility of differentiating legitimate URLs from phishing URLs by using two techniques, the CNN and LSTM, as a combined classifier in a novel approach called the Intelligent Phishing Detection System (IPDS). To evaluate the proposed hybrid approach, we used a dataset containing one million legitimate and phishing URLs from the PhishTank and Common Crawl datasets, as well as 10,000 images collected by us from both phishing and legitimate websites. The proposed IPDS achieved an excellent classification accuracy of 93.28%. Based on this result, we can deduce that distinguishing websites by their URLs, their similarity context, and the patterns of the images on the site is an effective way of detecting phishing websites. The IPDS responded with great agility and could verify a URL in 30.5 seconds. In addition, our analysis revealed the advantages and disadvantages of both the CNN and LSTM methods. On the one hand, the LSTM has an overall higher prediction performance, but expert knowledge is needed when creating the features.
On the other hand, the CNN requires no hand-crafted features, but achieves a performance that is, on average, slightly lower than that of the LSTM. Hence, combining the two methods leads to a better result, with less training time for the LSTM architecture than for the CNN model. We carried out an extensive experimental analysis of the proposed hybrid approach in order to evaluate its effectiveness in the detection of phishing web pages and phishing attacks on large datasets. The sensitivity of the proposed approach to various factors, such as the type of features, the number of misclassifications and the train/test split, was then studied. The results of our experiments showed that the proposed approach was highly effective in the detection of website phishing attacks as well as in the identification of fake websites. To the best of our knowledge, this is the first work that considers how best to integrate image, text and frame features into a combined solution for a phishing detection scheme. Future work will include developing a web browser plug-in based on a DL algorithm to detect web phishing and thus protect users in real time.

References

[1] L. De Kimpe, M. Walrave, W. Hardyns, L. Pauwels, and K. Ponnet, "You've got mail! Explaining individual differences in becoming a phishing target," Telematics and Informatics, vol. 35, no. 5, pp. 1277-1287, 2018.
[2] H. Thakur and S. Kaur, "A survey paper on phishing detection," International Journal of Advanced Research in Computer Science, vol. 7, no. 4, pp. 64-68, 2016.
[3] M. Imani and G. A. Montazer, "Phishing website detection using weighted feature line embedding," The ISC International Journal of Information Security, vol. 9, no. 2, pp. 49-61, 2017.
[4] Y. Li, Z. Yang, X. Chen, H. Yuan, and W. Liu, "A stacking model using URL and HTML features for phishing webpage detection," Future Generation Computer Systems, vol. 94, pp. 27-39, 2019.
[5] APWG, "Phishing Activity Trends Report, Third Quarter 2018," Anti-Phishing Working Group, Washington, DC, USA, 2018.
[6] P. Yang, G. Zhao, and P. Zeng, "Phishing website detection based on multidimensional features driven by deep learning," IEEE Access, vol. 7, pp. 15196-15209, 2019, doi: 10.1109/ACCESS.2019.2892066.
[7] W. Liu, Z. Wang, X. Liu, N. Zeng, Y. Liu, and F. E. Alsaadi, "A survey of deep neural network architectures and their applications," Neurocomputing, vol. 234, pp. 11-26, 2017.
[8] J. James, L. Sandhya, and C. Thomas, "Detection of phishing URLs using machine learning techniques," in 2013 International Conference on Control Communication and Computing (ICCC), Thiruvananthapuram, India, 2013, pp. 304-309, doi: 10.1109/ICCC.2013.6731669.
[9] H. Le, Q. Pham, D. Sahoo, and S. C. H. Hoi, "URLNet: Learning a URL representation with deep learning for malicious URL detection," arXiv preprint arXiv:1802.03162, 2018.
[10] P. Yi, Y. Guan, F. Zou, Y. Yao, W. Wang, and T. Zhu, "Web phishing detection using a deep learning framework," Wireless Communications and Mobile Computing, vol. 2018, pp. 1-9, 2018.
[11] Y. Bengio, Learning Deep Architectures for AI. Hanover, MA, USA: Now Publishers Inc., 2009.
[12] K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun, "What is the best multi-stage architecture for object recognition?," in 12th International Conference on Computer Vision, Kyoto, Japan, 2009, pp. 2146-2153, doi: 10.1109/ICCV.2009.5459469.
[13] M. Mathieu, M. Henaff, and Y. LeCun, "Fast training of convolutional networks through FFTs," arXiv preprint arXiv:1312.5851, 2013.
[14] A. C. Bahnsen, E. C. Bohorquez, S. Villegas, J. Vargas, and F. A. González, "Classifying phishing URLs using recurrent neural networks," in APWG Symposium on Electronic Crime Research (eCrime), Scottsdale, AZ, USA, 2017.
[15] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[16] I. Arel, D. C. Rose, and T. P. Karnowski, "Deep machine learning - a new frontier in artificial intelligence research," IEEE Computational Intelligence Magazine, vol. 5, no. 4, pp. 13-18, 2010, doi: 10.1109/MCI.2010.938364.
[17] Z. Xu, S. Li, and W. Deng, "Learning temporal features using LSTM-CNN architecture for face anti-spoofing," in 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Kuala Lumpur, Malaysia, 2015, pp. 141-145.
[18] A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures," Neural Networks, vol. 18, no. 5-6, pp. 602-610, 2005.
[19] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, "LSTM: A search space odyssey," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 10, pp. 2222-2232, 2017.
[20] D. Hakkani-Tür et al., "Multi-domain joint semantic frame parsing using bi-directional RNN-LSTM," in Interspeech, San Francisco, CA, USA, 2016, pp. 715-719.
[21] W. Zhu et al., "Co-occurrence feature learning for skeleton based action recognition using regularized deep LSTM networks," in Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 2016, pp. 3697-3703.