Introduction and objectives:
This blog is the second of three about data visualisation tools. Its objective is to try the Matplotlib package in Python for visualisation and to give some feedback on the advantages and disadvantages of the package as a visualisation tool. Exploring the dataset to find insights is not the objective of this post.

Visualisation tool:
The tool used for this exercise is a Python package, matplotlib.pyplot. Jupyter Notebook is used as the coding environment. My reflections on matplotlib.pyplot are as follows:
- matplotlib.pyplot is exceptionally flexible.
- Because it is used from within Python, it sits in an integrated environment for data mining, data cleansing, modelling and visualisation.
- matplotlib.pyplot is a popular and accepted tool in the data visualisation industry, so knowing how to use the package is a great benefit.
- It is free, with no subscription required.

Despite the advantages mentioned above, matplotlib.pyplot has some restrictions:
- It is not easy for a non-programmer.
- If Jupyter Notebook is used for coding, there is no direct export to Word format.

Dataset:
The dataset used in this exercise is composed of three numeric variables, two categorical variables and a date (year). It is an extract of the Gapminder dataset made famous through the Hans Rosling TED Talk.
It contains the following variables:
- lifeExp: a numerical variable reflecting the average life expectancy in years.
- gdpPercap: a numerical variable reflecting the Gross Domestic Product per capita in US dollars (Rosling 2019).
- pop: a numerical variable reflecting the population size for more than 100 countries.
- year: an integer numerical variable reflecting the year in which the corresponding data was recorded.
- continent: a categorical variable reflecting the continent in which the corresponding country is located.

Data Preparation:
The first step is to import the required packages and the data into the Python Jupyter notebook environment:

Figure 1: Data Summary

Figure 1 depicts the data summary for the variables. We need to change the type of the variable continent to categorical, as depicted in Figure 2 below:

Figure 2: Final Data Summary

The final dataset used in this exercise is shown in Figure 2. Note that the data does not have any missing values, so no further data preparation is required.

Exploratory Data Analysis:
The objective of this practice is to visualise the trend of gdpPercap in different countries between 1952 and 2007.

Figure 3: Scatter Plot

Figure 3 depicts life expectancy versus GDP per capita in 2007. At this stage, the points in the scatter plot are hard to tell apart. I would like the size of the dots to correspond to the population.
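The data preparation and the first scatter plot described above can be sketched roughly as follows. This is a minimal sketch, not the original notebook code: the post does not give the data file name, so a small in-memory DataFrame with invented placeholder values stands in for the Gapminder extract, and the axis labels and title are my own assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs as a script; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rows with the same columns as the dataset described above.
# The values are invented for illustration and are NOT real Gapminder figures.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "continent": ["Africa", "Asia", "Europe", "Africa"],
    "year": [2007, 2007, 2007, 1952],
    "lifeExp": [52.0, 71.5, 79.0, 40.0],
    "pop": [12_000_000, 250_000_000, 60_000_000, 5_000_000],
    "gdpPercap": [1200.0, 5400.0, 33000.0, 800.0],
})

# Data summary: column names, types and non-null counts.
df.info()

# Change the type of continent to categorical.
df["continent"] = df["continent"].astype("category")

# Count missing values across the whole table.
n_missing = int(df.isna().sum().sum())

# Plain scatter plot of life expectancy versus GDP per capita for 2007.
df_2007 = df[df["year"] == 2007]
fig, ax = plt.subplots()
ax.scatter(df_2007["gdpPercap"], df_2007["lifeExp"])
ax.set_xlabel("GDP per capita (US$)")
ax.set_ylabel("Life expectancy (years)")
ax.set_title("Life expectancy versus GDP per capita, 2007")
```

With the real data, the placeholder DataFrame would be replaced by a call such as pd.read_csv on the author's own file.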
To make the dot size correspond to population, I passed the country population to the scatter method as the size argument. I also wanted to make the plot more readable by adding colour to each country. The result is shown in Figure 4 below.

Figure 4: Bubble plot for life expectancy in 2007

In the last step, we want to show the trend of life expectancy against GDP per capita for different countries between 1952 and 2007.

Figure 5: Bubble plot matrix

A loop is used to create a bubble plot matrix, as shown in Figure 5. The countries in blue, corresponding to Africa, have both low life expectancy and low GDP per capita.

Conclusion:
The matplotlib.pyplot package in Python is not as easy as Tableau for a non-programmer. However, I believe that programming skill is essential for creative data visualisation. My takeaway from this experiment is that I would suggest starting with Tableau at the beginning of any project, to quickly gain an understanding of the data and potentially some initial insights. This would help to plan the analysis part of the project. I would then suggest keeping the visualisation in the environment in which the analysis has progressed, which could be any package in R or Python. As an option, at the end of the analysis phase of the project, Tableau can be used to produce various creative visualisations.

References:
Rosling, H. 2019, Gapminder.
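The bubble plot and the bubble plot matrix described in the data analysis section above can be sketched roughly as follows. This is a hedged sketch rather than the original notebook code: the DataFrame holds invented placeholder values, the population scaling factor, the 2 x 2 grid and the helper name draw_bubbles are my own assumptions, and of the colour mapping only "Africa is blue" comes from the post.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs as a script; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data: three made-up countries over four years.
# The values are generated for illustration and are NOT real Gapminder figures.
countries = [("A", "Africa"), ("B", "Asia"), ("C", "Europe")]
years = [1952, 1972, 1992, 2007]
rows = []
for i, (country, continent) in enumerate(countries):
    for j, year in enumerate(years):
        rows.append({
            "country": country,
            "continent": continent,
            "year": year,
            "lifeExp": 40.0 + 10 * i + 3 * j,
            "pop": (5 + 20 * i + 2 * j) * 1_000_000,
            "gdpPercap": 1000.0 * (1 + 4 * i) * (1 + 0.5 * j),
        })
df = pd.DataFrame(rows)

# Colour per continent. Only Africa = blue is taken from the post;
# the other colours are assumptions.
colours = {"Africa": "blue", "Asia": "red", "Europe": "green"}


def draw_bubbles(ax, data, year):
    """Draw one bubble plot: dot size follows population, colour follows continent."""
    ax.scatter(
        data["gdpPercap"],
        data["lifeExp"],
        s=data["pop"] / 1e5,  # marker area in points^2; the scale factor is an assumption
        c=data["continent"].map(colours).tolist(),
        alpha=0.6,
    )
    ax.set_title(str(year))
    ax.set_xlabel("GDP per capita (US$)")
    ax.set_ylabel("Life expectancy (years)")


# Single bubble plot for 2007.
fig_single, ax_single = plt.subplots()
draw_bubbles(ax_single, df[df["year"] == 2007], 2007)

# Bubble plot matrix: loop over the years, one panel per year.
fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharex=True, sharey=True)
for ax, year in zip(axes.flat, years):
    draw_bubbles(ax, df[df["year"] == year], year)
fig.tight_layout()
```

Sharing the axes across panels keeps the scales identical, which makes it easier to compare the years side by side.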