Pandas for Everyone: Python Data Analysis

 Contents

Chapter 1. Pandas Dataframe basics
1.1 Introduction
1.2 Concept map
1.3 Objectives
1.4 Loading your first data set
1.5 Looking at columns, rows, and cells
1.6 Grouped and aggregated calculations
1.7 Basic plot
1.8 Conclusion

Chapter 2. Pandas data structures
2.1 Introduction
2.2 Concept map
2.3 Objectives
2.4 Creating your own data
2.5 The Series
2.6 The DataFrame
2.7 Making changes to Series and DataFrames
2.8 Exporting and importing data
2.9 Conclusion

Chapter 3. Introduction to Plotting
3.4 matplotlib

Chapter 4. Data Assembly
4.1 Introduction
4.2 Concept map
4.3 Objectives
4.4 Concatenation
4.6 Summary

Chapter 5. Missing Data
5.1 Introduction
Concept map
Objectives
5.2 What is a NaN value
5.3 Where do missing values come from?
5.3.3 User input values
5.4 Working with missing data
Summary

Chapter 6. Tidy Data by Reshaping
6.1 Introduction
Concept Map
6.2 Columns contain values, not variables
6.3 Columns contain multiple variables
6.4 Variables in both rows and columns
6.5 Multiple Observational Units in a table (Normalization)
6.6 Observational units across multiple tables
6.7 Summary

Chapter 1. Pandas Dataframe basics

1.1 Introduction

Pandas is an open-source Python library for data analysis. It gives Python the ability to work with spreadsheet-like data, with fast loading, manipulation, alignment, merging, and more. To give Python these enhanced features, Pandas introduces two new data types to Python: Series and DataFrame. The DataFrame represents your entire spreadsheet or rectangular data, whereas the Series is a single column of the DataFrame. A Pandas DataFrame can also be thought of as a dictionary or collection of Series.
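The relationship between the two types can be sketched in a few lines. This is a minimal illustration, not an example from the book: the column names and values are invented, and it assumes Pandas is installed and imported as pd.

```python
import pandas as pd

# Hypothetical data, invented for illustration only.
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "population": [10, 20, 30],
})

# Selecting a single column of a DataFrame returns a Series.
population = df["population"]

print(type(df).__name__)          # DataFrame
print(type(population).__name__)  # Series

# A DataFrame can also be assembled from a dictionary of Series,
# which is the "collection of Series" view described above.
rebuilt = pd.DataFrame({name: df[name] for name in df.columns})
print(rebuilt.equals(df))         # True
```

Rebuilding the DataFrame from its own columns gives back an equal DataFrame, which is one way to see that each column is a Series in its own right.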

Why should you use a programming language like Python and a tool like Pandas to work with data? It boils down to automation and reproducibility. If a particular set of analyses needs to be performed on multiple datasets, a programming language can automate the analysis across those datasets. Although many spreadsheet programs have their own macro programming language, many users do not use it. Furthermore, not all spreadsheet programs are available on all operating systems. Performing data tasks using a programming language forces the user to keep a running record of all steps performed on the data. I, like many people, have accidentally hit a key while viewing data in a spreadsheet program, only to find out that my results no longer make any sense due to bad data. This is not to say spreadsheet programs are bad or do not have their place in the data workflow. They do, but there are better and more reliable tools out there.

1.2 Concept map

1. Prior knowledge needed (appendix)

(a) relative directories

(b) calling functions
