
Course: Day and Date: Student's Name:

Teacher:

Mini-Lesson #3

Objective:

• To create a new data column from existing data columns
• To create an additional dataframe
• To understand how to fill in missing data using interpolation
• To save the clean data as a new csv file

Creating a new data column and dataframe

You may need to re-open your file weather_data_analysis.ipynb. Looking at the code below, what does it appear to do to the data? ____________

Enter and run the following two pieces of code. You may enter them as two separate cells and run each one, or put both in the same cell and run it once.

What is the name of the new dataframe that we created? ________________ Use code from previous lessons to display the dataframe. What is the first entry? ____________ Use the .info() method on the new dataframe to determine how many TAVE entries are missing. ____________
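The code for this step is not shown on this page. The sketch below is a rough illustration under stated assumptions. It assumes the raw data has daily maximum and minimum temperature columns named TMAX and TMIN and that TAVE is their average. Neither assumption is confirmed here, and the dataframe name tave_df is hypothetical. It shows the two ideas from this section: building a new column from existing columns, and copying that column into an additional dataframe. It also shows why TAVE can have missing entries: a NaN in either input produces a NaN in the result.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings; TMAX and TMIN are assumed column names
df = pd.DataFrame(
    {"TMAX": [80.0, np.nan, 70.0], "TMIN": [60.0, 50.0, np.nan]},
    index=pd.to_datetime(["1975-05-01", "1975-05-02", "1975-05-03"]),
)

# New column built from existing columns; a NaN in either input gives NaN in TAVE
df["TAVE"] = (df["TMAX"] + df["TMIN"]) / 2

# Additional dataframe holding only the new column (hypothetical name)
tave_df = df[["TAVE"]].copy()

# .info() reports the non-null count; rows minus non-null = missing entries
tave_df.info()
missing = int(tave_df["TAVE"].isna().sum())
print(missing, "missing TAVE entries")
```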

Understanding missing data and interpolation

Interpolation is a method of estimating missing values from the known values on either side of them. The method used here is linear, so it spaces the missing values equally between the known values. For example, if there is one missing value between 10 and 30, linear interpolation places 20 between them (two equal steps of 10). If there are five missing values between 40 and 50, the five new values split the span of 10 into six equal steps, so consecutive values are 10/6 ≈ 1.67 apart. What would be the five missing values? ____, ____, ____, _____, _____.
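You can check the two examples above with pandas. Series.interpolate() uses linear interpolation by default, treating the values as equally spaced. The series below are made up purely to match the examples.

```python
import numpy as np
import pandas as pd

# One gap between 10 and 30: linear interpolation fills in the midpoint
one_gap = pd.Series([10, np.nan, 30]).interpolate()
print(one_gap.tolist())  # [10.0, 20.0, 30.0]

# Five gaps between 40 and 50: six equal steps of 10/6 each
five_gaps = pd.Series([40] + [np.nan] * 5 + [50]).interpolate()
print(five_gaps.round(2).tolist())
```

Comparing the printed five values with your answers is a quick way to confirm the equal-spacing reasoning.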

To locate the missing data, we will ask Python to find any values that are "null," that is, missing. Run the code below and notice the NaN values that are listed. Some missing values are isolated, but others occur in runs of consecutive days.

Notice that in May 1975 there are more than a dozen missing values.
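The handout's code is not reproduced here. Under the assumption that the data sits in a dataframe with a date index and a TAVE column, a minimal sketch of listing the null values looks like the following. The values are invented: one isolated gap and one run of three consecutive gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical daily TAVE values: one isolated gap and a run of three gaps
dates = pd.date_range("1975-05-01", periods=8, freq="D")
tave = pd.DataFrame(
    {"TAVE": [61.0, np.nan, 63.0, np.nan, np.nan, np.nan, 67.0, 68.0]},
    index=dates,
)

# isnull() is True wherever a value is missing (NaN); keep only those rows
missing_rows = tave[tave["TAVE"].isnull()]
print(missing_rows)
print(len(missing_rows), "missing values")
```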

Enter the code below to list the values in the given date range. How many missing values exist? ______
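The actual date range used in the handout is not given on this page. Assuming a dataframe with a date index, one way to pull out a range and count its gaps is a label-based .loc slice. The dates and values below are hypothetical. With a date index, the slice includes both the start and end dates.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data from late April to mid May 1975
dates = pd.date_range("1975-04-28", "1975-05-10", freq="D")
values = np.arange(len(dates), dtype=float)
values[4:9] = np.nan  # pretend May 2 through May 6 were not recorded
tave = pd.DataFrame({"TAVE": values}, index=dates)

# Label-based slice on the date index; both endpoints are included
window = tave.loc["1975-05-01":"1975-05-07"]
print(window)

# Count the missing values inside the selected range
n_missing = int(window["TAVE"].isna().sum())
print(n_missing, "missing values in the range")
```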

Run the following three lines of code. The first line fills in the missing values by interpolation. The second line computes the mean (average) of the values within each month and uses that mean as the single value for the month; the '1M' argument is what sets the monthly frequency. Again, run the .info() method to see how many values are present. _____________ (Note: starting with the second line, we create another new dataframe whose name ends in _M to show that it holds the monthly values.) Although each month has a different number of days, pandas handles this for us.
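The three lines themselves are not shown here. The following is a rough approximation on made-up data covering January (31 days) and February 2020 (29 days), so the months have different lengths. It assumes the monthly averaging is done with DataFrame.resample(...).mean(). One version detail: pandas 2.2 and later deprecate the month-end alias "M" in favor of "ME", so the sketch tries "1ME" first and falls back to "1M" on older versions that reject it.

```python
import numpy as np
import pandas as pd

# Hypothetical daily TAVE values for January and February 2020
dates = pd.date_range("2020-01-01", "2020-02-29", freq="D")
daily = pd.DataFrame({"TAVE": np.arange(len(dates), dtype=float)}, index=dates)
daily.iloc[[3, 10, 11, 40]] = np.nan  # punch a few gaps into the data

# Line 1: fill the gaps by linear interpolation
daily["TAVE"] = daily["TAVE"].interpolate()

# Line 2: average each calendar month into one row (one row per month-end date)
try:
    monthly = daily.resample("1ME").mean()  # pandas 2.2+
except ValueError:
    monthly = daily.resample("1M").mean()   # older pandas

# Line 3: compare the non-null count with the number of rows
monthly.info()
print(monthly)
```

Because the 31-day and 29-day months are each averaged over their own days, there is no need to account for month length by hand.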

It is possible that missing values still exist, as the .info() results may show. To fill in those values, we run the interpolation command once more, this time on the new dataframe whose name ends in _M. The code is similar to the first of the three lines in the previous step. Then run .info() again to confirm that the number of non-null values finally matches the number of rows. Although it has taken some work, we now have a complete data set with no missing values.

Consider most data sets you have seen in math classes. They are usually small and clean, and they come to you already complete. Real-world data sets, however, are frequently incomplete. What are some of the reasons these data sets may be incomplete?
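Here is a minimal sketch of this second interpolation pass, assuming a monthly dataframe in which one month's mean came out as NaN. The name tave_df_M and the values are hypothetical. The check at the end compares the same two numbers that .info() reports: the non-null count and the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly means with one month still missing
idx = pd.to_datetime(["1975-03-31", "1975-04-30", "1975-05-31", "1975-06-30"])
tave_df_M = pd.DataFrame({"TAVE": [50.0, 58.0, np.nan, 74.0]}, index=idx)

# Run interpolation again, this time on the monthly (_M) dataframe
tave_df_M["TAVE"] = tave_df_M["TAVE"].interpolate()

# Non-null values should now equal the number of rows
print(tave_df_M["TAVE"].notna().sum(), "non-null of", len(tave_df_M), "rows")
tave_df_M.info()
```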

Saving as a new csv file

We want to save this data for use in the next several lessons, so we will save it in the same folder from which we are running our Jupyter session.
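The handout's saving code and file name are not shown here. The sketch below makes three assumptions. It uses DataFrame.to_csv with a bare file name, which writes into the folder the notebook is running from. The file name weather_data_clean.csv is hypothetical. The helper save_clean_data is also hypothetical; it reads the file back so you can see the dates and values survived the round trip.

```python
import os

import pandas as pd


def save_clean_data(df: pd.DataFrame, filename: str) -> pd.DataFrame:
    """Write df to a CSV in the current folder, then read it back as a check."""
    df.to_csv(filename)  # the date index is written as the first column
    print("Saved to", os.path.abspath(filename))
    return pd.read_csv(filename, index_col=0, parse_dates=True)


# Hypothetical cleaned monthly data
idx = pd.to_datetime(["1975-03-31", "1975-04-30", "1975-05-31"])
clean_df_M = pd.DataFrame({"TAVE": [50.0, 58.0, 66.0]}, index=idx)
clean_df_M.index.name = "DATE"

reloaded = save_clean_data(clean_df_M, "weather_data_clean.csv")
print(reloaded)
```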

Confirm that the file has been saved by checking the folder in Windows Explorer. It should be in the same folder as the temp_ElDorado file.

You may have noticed many # symbols in the code, followed by comments shown in green. Programming is much more than just making something run. It is also about clearly communicating what the code is meant to do, so that you and other readers can understand it. As we move forward, work diligently to add comments as you write code.

In the coming lessons, we will consider how to use forecasting techniques that are appropriate for seasonal data.
