
A Big Data Approach to the Predictive Analysis on Fortune 500 company’s success

TABLE OF CONTENTS
• Authors
• Summary
• Problem and Motivation
• Background & Related Work
• Approach and Uniqueness
• Results and Contributions
• Visualization Summary
• References

Authors:

Kerri Easterbrook
UNC Charlotte
Contact: keasterb@uncc.edu / 704-930-6903

Xiazhi Fang
UNC Charlotte
Contact: xfang2@uncc.edu / 704-458-8017

Madlen Ivanova
UNC Charlotte
Contact: mivanov1@uncc.edu / 704-957-8989

Diana Kinney
6111 Tesh Court, Charlotte NC 28269
Contact: dkinney1@uncc.edu / 704-609-4498

Mahalakshmi Vishnampettai Raghuraman
UNC Charlotte
Contact: mvishnam@uncc.edu / 704-302-3663

Aravindharaj Rajendran
212 Barton Creek Drive, Apt E, Charlotte, NC 28262
Contact: arajendr@uncc.edu / 469-877-7108

Divya Ravi
9309 Kittansett Drive, Apt C, Charlotte, NC 28262
Contact: dravi@uncc.edu / 980-226-3076

Faculty Advisors:

Dr. Wlodek Zadrozny
Associate Professor
UNC Charlotte

Dr. Jared Hansen
Associate Professor of Marketing
UNC Charlotte

Summary

To enhance an individual investor’s return on investment, we aim to discover predictive factors that impact a company’s success. To accomplish this, we collected data on company innovation, corporate social responsibility, the economy (GDP), patents filed by Fortune 500 companies, political contributions, and elections. First, we staged the structured and semi-structured data in Hadoop. We then cleansed the data: for patent abstracts and titles we performed tokenization and stemming and removed stop words, and we handled missing data by omission or imputation. We mined the data and performed patent analysis using MapReduce, and we produced word clouds and topic models using Mallet. To visualize patterns and relationships between variables, we used Tableau. Finally, we formed hypotheses based on initial regression analyses.

Problem and Motivation

Our objective is not only to discover predictive factors that impact a company’s success, which in turn helps improve an individual investor’s return on investment, but also to make recommendations that can potentially improve company growth. The factors we considered (using descriptive statistics) are Industry Type, Corporate Social Responsibility (CSR), Political Environment, and Patents. We assumed that relationships between the attributes of these factors would affect company success. Limiting our research to Fortune 500 companies, we focused on two main aspects of a company’s success: revenue and innovation. In addition, we assessed the impact of the political party holding the presidency on company success.

Background and Related Work

We researched and gathered information related to the problem statement and analyzed the different factors that influence Fortune 500 companies. We assessed how Big Data technology could be used effectively to derive results. We refined the process to be followed and acquainted ourselves with the big data concepts needed to apply it to this problem.

Approach and Uniqueness

Solution Overview

The technology solution involved analyzing big data, including semi-structured data, with a Java MapReduce program developed in the Eclipse IDE on the Hadoop framework. The extracted data was analyzed quickly using techniques such as word clouds and topic modeling. Essential data from this extract, together with the structured data collected, was loaded into Excel sheets and imported into an MS Access database.
Descriptive statistical measures were calculated using RStudio, and preliminary variable relationship analysis was done using Tableau.

Data Collection

Fortune 500 company, patent, social responsibility, and political contribution data, along with presidential election and GDP data, were collected because we assumed they would include variables that impact company success, as measured by revenue. The sources were CompuStat and government website databases (see references below). Structured data was collected by downloading .csv or .xls files directly from the respective sources. Semi-structured patent data was downloaded from its source in XML format. Using the Hadoop framework, a MapReduce program written in Java then extracted only the essential data from the semi-structured XML (e.g. “<assignee>”, “<orgname>”) into a structured table format.

Data Cleansing and Transformation

All text data was tokenized, stemmed, converted to lowercase, and stripped of stop words, and company names were standardized. US patents granted from 2005 to 2015 were counted using a MapReduce function. Subsequently, a “sounds like” function in SAS was used to pull company names similar to those in the Fortune 500 list, and the non-Fortune 500 companies it returned were removed manually. A lookup table of possible organization-name aliases for each Fortune 500 company was created and then mapped to the respective ticker symbol.
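
The extraction, alias lookup, and counting steps described above can be illustrated with a minimal sketch. It is written in plain Python rather than as the Java MapReduce program the project used, and it only mirrors the map and reduce phases in a single process. The sample XML records, the alias table, and the function names are hypothetical and chosen for illustration; the real patent XML schema is considerably richer.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample records (assumption): one small XML string per patent.
SAMPLE_RECORDS = [
    "<patent><assignee><orgname>International Business Machines Corporation</orgname></assignee></patent>",
    "<patent><assignee><orgname>IBM Corp.</orgname></assignee></patent>",
    "<patent><assignee><orgname>Wal-Mart Stores, Inc.</orgname></assignee></patent>",
    "<patent><assignee><orgname>Some Small Startup LLC</orgname></assignee></patent>",
]

# Hypothetical alias lookup table (assumption): normalized name -> ticker symbol.
ALIAS_TO_TICKER = {
    "international business machines corporation": "IBM",
    "ibm corp": "IBM",
    "wal mart stores inc": "WMT",
}


def normalize(name):
    """Lowercase the name, turn punctuation into spaces, collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def map_record(xml_text):
    """Map step: emit (ticker, 1) for each assignee found in the alias table."""
    root = ET.fromstring(xml_text)
    for org in root.iterfind(".//assignee/orgname"):
        ticker = ALIAS_TO_TICKER.get(normalize(org.text or ""))
        if ticker is not None:  # organizations outside the lookup are dropped
            yield ticker, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per ticker."""
    counts = Counter()
    for ticker, value in pairs:
        counts[ticker] += value
    return dict(counts)


def count_patents(records):
    """Run the map step over every record, then reduce the emitted pairs."""
    pairs = (pair for record in records for pair in map_record(record))
    return reduce_counts(pairs)


if __name__ == "__main__":
    print(count_patents(SAMPLE_RECORDS))
```

Keying the counts by ticker symbol rather than by raw organization name means the result can be joined directly to the other datasets, which is the role the lookup table plays in the text.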
All structured data was imported into an Access database, with joins made between datasets on ticker symbol.

Preliminary Analyses and Visualizations

To visualize the data described above, we used descriptive statistics, variable relationship analysis, word clouds based on word counts in patent abstracts, distribution charts of industry campaign contribution data, bubble charts of patent counts by industry, and an exploratory linear regression with scatter plots. The goal was to find correlations and discover potential predictive elements that could influence company success.

Results and Contributions

Hypotheses & Rationale:
H1: Company revenue will be significantly impacted by type of industry.
H2: Company revenue will be significantly impacted by the company’s total level of Corporate Social Responsibility (CSR).
H3: If a candidate from a company’s supported political party is elected president, then that company will experience increased revenue for the duration of that president’s term.
H4: Company revenue will increase as a company licenses more patents.

Verified Results:
Once we completed the initial analyses above, we focused on the Fortune 500 companies with the most patents filed (most innovative) and the highest Total Revenue. By far, the company that filed the most patents was IBM, while Walmart had the highest revenue. We ran correlations, many of which were statistically significant (at an alpha level of 0.05). The most interesting relationships were among US Patent Count, Research & Development Expense, Market Value, and Total Revenue. Separate regression analyses were performed for each industry/firm type, comparing revenue to research and political party. As a result, we were able to derive which industry types benefit most from presidential party affiliation.
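
As an illustration of the kind of correlation and exploratory regression described above, the following sketch computes a Pearson correlation, an ordinary least squares line, and the t statistic used to judge significance. It is a plain-Python stand-in, not the R, Tableau, or SAS workflow the project used, and the R&D and revenue figures are invented for the example rather than taken from the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def t_statistic(r, n):
    """t statistic for testing whether a correlation differs from zero (df = n - 2)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Invented figures (assumption): R&D expense and total revenue for five firms.
rd_expense = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [12.0, 15.0, 21.0, 24.0, 28.0]

r = pearson_r(rd_expense, revenue)
slope, intercept = fit_line(rd_expense, revenue)
t = t_statistic(r, len(rd_expense))

if __name__ == "__main__":
    print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}, t = {t:.2f}")
```

A correlation is judged significant at the 0.05 level when the absolute t statistic exceeds the two-sided critical value for n - 2 degrees of freedom (about 3.182 for the five points here). A statistics package would normally report the corresponding p-value directly.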

Contribution:
Investing in Research and Development (R&D) directly benefits a company’s Market Value and Total Revenue, resulting in company success. This insight is valuable to investors who want to capitalize on industries that invest in R&D (innovation).

Recommendations:
IBM: Stay competitive in products and services that retain and satisfy large business clientele.
Walmart: Consider investing more resources in improving HR practices (better pay and benefits, improved employee satisfaction) and in meeting certain standards (improved social responsibility practices) in order to improve success and public perception (e.g. increased stock value) in domestic and international markets.

Current Student Work

We recently completed a comparison of the Fortune 500 firm with the largest number of patents (IBM) and the firm with the largest revenue (Walmart) against their competitors on the variables listed above. In addition, we performed cluster analysis and regression modeling to derive correlations that support our final recommendations. Finally, we rated both firms a 5 on the Davenport and Harris “analytical competitor” continuum.

Visualization Summary (Samples)

Preliminary scatter plots between variables such as Revenue, Net Income, Age, CSR, Patent Count, Research Expenditure and Politics:
Political contributions by companies in different industry types:
Total patents published by different industry types:
Cluster analysis of the CSR variables:

Reference and Citation

Data Articles ................