NLP and Sentiment Driven Automated Trading
Atish Davda (adavda@seas.upenn.edu) Parshant Mittal (pmittal@seas.upenn.edu) Faculty Advisor: Michael Kearns (mkearns@cis.upenn.edu)
Atish Davda Parshant Mittal
NLP and Sentiment Driven Automated Trading Senior Design 2007-08
Abstract
Movements in financial markets are directly influenced by information exchange: between a company
and its owners, between the government and its citizens, between one individual and another. The
channels of distributing news have expanded from the singular ticker tape in the middle of town to
intra-minute delivery to the computer via RSS feeds. With information quickly available, markets are
becoming increasingly efficient, as humans design intricate algorithms to continuously take advantage of
any perceived mispricing in the markets (Kelly, 2007). This phenomenon, which is especially prevalent in
the stock market, raises the question: is there still an active need for the human element? After all,
machines are faster; given more information and better hardware, their computational power decidedly
exceeds that of humans. The answer lies in the challenge of abstraction: deciding the impact of each
piece of information is important, and more isn't always better (Greenwald, Jennings, & Stone, 2003).
In this project we explored the field of natural language processing and identified methods we can use to automate stock trading based on news articles. The project was implemented in three phases (see Appendix 1). The first phase consisted of data collection from sources on the web: news articles and headlines were scraped from Yahoo! Finance, and historical market data was collected from Google Finance. The data was collected for 600 small market cap stocks (SML), 400 medium market cap stocks (MID), and 500 stocks from the S&P 500 index (SP500). The second phase consisted of sentiment analysis on the first half of the dataset, in order to compute sentiments to be tested on the (out-of-sample) second half. We then implemented an NLP approach to quantifying the headlines, using a number of NLP packages available online, including the Stanford Lex Parser, WordNet, and General Inquirer.1 The last stage of the project consisted of developing a trading module with which we could incorporate the results of historical market, sentiment, and NLP analysis to give a Buy, Sell, or Hold recommendation for securities under consideration. Using sentiment and NLP analysis we were able to achieve significantly improved returns: we averaged a return of 4.0% over a two-month period (27% annualized), while the market fell 8.7% during the same period (-42.1% annualized). With the help of this and other metrics, we explored the value of NLP in automated trading.
1 Please refer to the Bibliography section for further information on these projects.
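The annualized figures quoted in the abstract are consistent with compounding each two-month return over six periods per year. The sketch below reproduces that arithmetic under the assumption of geometric compounding; the function name `annualize` is ours and is used only for illustration.

```python
def annualize(period_return: float, periods_per_year: float) -> float:
    """Compound a per-period return into an annualized return."""
    return (1.0 + period_return) ** periods_per_year - 1.0


# Two-month returns reported in the abstract; six two-month periods per year.
strategy = annualize(0.040, 6)   # about 0.27 (the 4.0% input is itself rounded)
market = annualize(-0.087, 6)    # about -0.42

print(f"strategy: {strategy:.1%}, market: {market:.1%}")
```

Because the 4.0% input is rounded, the compounded strategy figure comes out slightly below the reported 27%; the market figure matches the reported -42.1% to one decimal place.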
Related Work
Given the widespread implications of introducing abstraction capability to machines, it isn't surprising that NLP is a highly researched discipline. In the US alone, several groups sponsored by universities, corporations, and the government focus solely on improving the capabilities of current language-processing techniques (Fallows, 2004). However, although the paradigm of examining news articles attracts many academic studies, that work is biased toward long-term, macro news reports;2 the realm of short-term, firm-specific news is unexplored by comparison.3 One of the first studies specifically focused on quantifying the relationship between news releases and movements in the stock markets was conducted as recently as 2002 (Gillam, Ahmad, & Ahmad, 2002).
2 Macro news reports include interest rate changes by central banks, announcements of inflation news, etc.
3 Firm-specific news includes earnings reports, merger/acquisition rumors, etc.
The challenge of predicting which news events will have what impact on the trading characteristics of stocks, such as price and volume traded, still remains. While there have been recent advancements in the applications of NLP in predicting other markets (e.g. election markets), the specific role of language analysis in financial markets is unclear (Gilder & Lerman, 2007). The novelty of our project lies in applying NLP analysis to news headlines, rather than the entire article. In addition, we consider highly liquid and efficient markets. Unlike election markets, these markets have no end date, so our analysis must include a wider range of factors. One natural dimension we explored in detail was distinguishing between the impact of the headline "IBM's earnings drop" and that of "IBM's earnings plummet."4 Our paper is, in part, an extension of the 2002 study "Economic News and Stock Market Correlation," which looked solely at the sign (positive or negative) of the connotation associated with words in the news articles. We have implemented a framework that uses General Inquirer as well as our own sentiment analysis to distinguish between the emotional charges people innately give to certain words, which lead to varying degrees of influence the news has on the characteristics of the stock. Upon additional research on generic topics such as the handling of conjunctions, we found existing work that fit such fundamental pillars of NLP well (Meena & Prabhakar, 2007). Furthermore, we expanded upon this kind of study by examining syntax in addition to semantics, empirically deriving an adjustment factor for each word's sentiment charge depending on its use in a sentence. This second-order correction helped improve the accuracy of predictions once we moved away from the naïve bag-of-words analysis.
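To make the difference between a sign-only score and a graded, syntax-adjusted score concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the lexicon values, the modifier table, and the rule that a modifier rescales only the next word are hypothetical and are not the adjustment factors derived in this project.

```python
# Hypothetical graded lexicon: values are illustrative only, not taken from
# General Inquirer or from the lexicon built in this project.
LEXICON = {"drop": -0.4, "plummet": -0.9, "rise": 0.4, "soar": 0.9}

# Hypothetical syntactic adjustment: a modifier rescales the charge of the
# word that follows it (negation flips the sign, a hedge dampens it).
MODIFIERS = {"not": -1.0, "may": 0.5}


def score_headline(headline: str) -> float:
    """Sum graded word charges, each rescaled by the modifier just before it."""
    score = 0.0
    factor = 1.0
    for raw in headline.lower().split():
        token = raw.strip(".,!?\"'")
        if token in MODIFIERS:
            factor = MODIFIERS[token]
            continue
        if token in LEXICON:
            score += factor * LEXICON[token]
        factor = 1.0  # a modifier only affects the word right after it
    return score


for text in ("IBM's earnings drop",
             "IBM's earnings plummet",
             "IBM's earnings did not drop"):
    print(text, "->", score_headline(text))
```

A sign-only bag-of-words model would rate the first two headlines identically and would also rate the third as negative; the graded charges separate "drop" from "plummet", and the modifier rule flips the negated case.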
Building a lexicon in which each word has an associated sentiment is a field of research in itself: sentiment analysis. There are several models for generating such a lexicon; one of the fundamental models is described in the paper "Determining the Sentiment of Opinions" (Kim & Hovy, 2004). The study discusses a region (news headline) around a central anchor (company of interest) which, when examined as a whole, yields a positive or negative rating for the company itself. Another approach suggests a more empirical analysis: examining vast amounts of HTML documents in order to generate a polarity score for words, described as a function of the distance of a given word from a pre-defined, manually selected corpus (Kaji & Kitsuregawa, 2007). While it would certainly help to have a sentiment list as close to perfect as possible, we focused on the use of sentiment scores rather than on determining the optimal method to calculate them. As described in the Technical Approach section, we adopted a combination of these two methodologies along with General Inquirer: initially we used a discretionary method akin to the latter model, and eventually we will develop a hybrid.
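The region-around-an-anchor idea can be sketched as a simple token window. This is only loosely inspired by the description above and is not Kim and Hovy's actual model; the function name `region_sentiment`, the window size, and the lexicon values are all hypothetical.

```python
from typing import Dict, Optional


def region_sentiment(headline: str, anchor: str,
                     lexicon: Dict[str, float], window: int = 3) -> Optional[float]:
    """Sum lexicon scores of words within `window` tokens of the anchor."""
    tokens = [t.strip(".,!?\"'").lower() for t in headline.split()]
    positions = [i for i, t in enumerate(tokens) if t == anchor.lower()]
    if not positions:
        return None  # the company of interest is not mentioned
    total = 0.0
    for i, token in enumerate(tokens):
        if any(abs(i - p) <= window for p in positions):
            total += lexicon.get(token, 0.0)
    return total


# Illustrative scores only.
lexicon = {"beats": 0.6, "misses": -0.6}
headline = "IBM beats estimates while Dell misses targets"

print(region_sentiment(headline, "IBM", lexicon, window=2))
print(region_sentiment(headline, "Dell", lexicon, window=2))
```

With a narrow window each company picks up only the sentiment words near its own name; widening the window lets the two clauses bleed into each other, which is one reason the choice of region matters.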
4 The Gillam et al. article "Economic News and Stock Market Correlation" discusses the impact of "good" versus "bad" words, but does not incorporate degrees of positive/negative sentiment associated with each word.
The project's goal is two-fold: first, to test whether a relationship exists between news articles and the movements in the market data of a stock; second, to model this relationship, if it exists, by implementing it in a trading strategy. In regard to the former, the scope of news content can be broadly divided into two sets: news reporting on past performance, and announcements of future activity (Gillam et al., 2002). While that distinction would be an interesting dimension to explore, this study limits itself to quantifying relationships between characteristics of news articles and relevant stock returns, regardless of the category under which the news falls. We do so because we focus on implementing this strategy as if it were to be used in a high-frequency, event-driven trading platform, where it is often acceptable to be accurate just a little over 50% of the time: if the strategy is consistently right more than half the time, the profits generated will more than account for the losses sustained due to incorrect decisions. Detailed analysis on the subject of statistical arbitrage has been performed by testing various experimental trading strategies designed to capture the predictive effects of news releases on stock movement (Hariharan, 2004).
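The claim that an accuracy just over 50% can be enough rests on a simple expected-value argument. The sketch below assumes that every trade either gains a fixed amount or loses a fixed amount and ignores transaction costs; both function names are ours, introduced only for illustration.

```python
def expected_profit(hit_rate: float, avg_gain: float, avg_loss: float) -> float:
    """Expected profit per trade: gain with probability hit_rate, lose otherwise."""
    return hit_rate * avg_gain - (1.0 - hit_rate) * avg_loss


def breakeven_hit_rate(avg_gain: float, avg_loss: float) -> float:
    """Hit rate at which the expected profit per trade is zero."""
    return avg_loss / (avg_gain + avg_loss)


# With gains and losses of equal size, break-even is exactly 50%, so any
# consistent edge above that yields a positive expected profit per trade.
print(breakeven_hit_rate(1.0, 1.0))
print(expected_profit(0.52, 1.0, 1.0))
```

If losses tend to be larger than gains, or once trading costs are included, the break-even accuracy moves above 50%, so the "little over 50%" threshold holds only under the equal-size assumption made here.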
While Hariharan's ideas are in a way predecessors in the space of stock trading based on news releases, this project delves more into the realm of NLP in the context of financial textual data than into the development of a trading strategy (which is a secondary focus of the project).5 Primarily, the project will explore and attempt to derive a predictive relationship between news reports and stock movements. Another study, by Subramanian, aimed at the optimization of automated trading algorithms would have come in handy in later phases had we decided to focus on strategies (Subramanian, 2004). Instead, we employed a simple set of trading ideas, described later, to quantify the results and avoid confounding them with advanced models.
5 If we happen to make significant progress towards our goal of achieving satisfactory NLP accuracy, we may begin to shift our focus to refining the trading strategy tailored to the results.