F01.justanswer.com



Question 3

Import the cereal_names data provided into your Python environment and reference it with a variable called cereal_names. Horizontally combine this cereal_names DataFrame with cereals_data. The cereal_names DataFrame should be on the left of the combined data. Reference the combined DataFrame with a variable called cereals_data2. Select the first five rows and the last 10 columns of your combined DataFrame. Paste a screenshot of your code and output. Display only the first five rows of the DataFrame output.

Then import the additional_cereals_data provided on Canvas into Python. Vertically combine this additional_cereals_data with cereals_data2. The cereals_data2 should be stacked (or placed) on top of the additional_cereals_data. Reference the combined DataFrame with a variable called cereals_data3. Select the last 10 rows of cereals_data3. Paste a screenshot of your code and output.

Check how many missing values are in cereals_data3. Output each column and the corresponding number of missing values in that column. Paste a screenshot of your code and output.

Assuming that this data is missing at random, fill in the missing data with the mean of the non-missing values in that column, for the numerical variables. If the variable is categorical, forward fill (there is no rationale for the forward filling, but do it for the sake of practice if there is a missing value in a categorical column). Paste a screenshot of your code and output.

Check whether there are any duplicates in cereals_data3. Output the rows that are duplicates. Paste a screenshot of your code and output.

Remove the duplicates permanently. Check and output the shape of your cereals_data3 DataFrame after removing the duplicates. Paste a screenshot of your code and output.

Use the .cut() method to cut the values in the Cal column into three bins. Also use .qcut() to cut the values of the Cal column into three buckets. For both .cut() and .qcut(), label the bins or buckets "low", "moderate" and "high". Use the .value_counts() method to count the data points inside each bin or bucket, using the results of your .cut() and .qcut() respectively. Plot a bar chart of the counts from the .cut() and .qcut() results separately. Use the two plots to comment on the difference between .cut() and .qcut(). Paste a screenshot of your code and output.

Create dummy variables using the values of the Shelf column. Join the dummy variables to cereals_data3. Select the first five rows and the last 5 columns of your DataFrame. Paste a screenshot of your code and output.

Create a function that takes a DataFrame and a list of columns to be standardized as inputs and returns a DataFrame as output. Inside this function, loop through the DataFrame columns supplied as input and standardize the values of these columns. The columns must contain only numerical values. Inside your function, you may need to initialize an empty DataFrame and populate it with columns having standardized values. The returned DataFrame should contain only the columns supplied as input. Each column of the returned DataFrame should contain standardized values only. Call your function using cereals_data3 and the columns ['Carbo', 'Sugars', 'Potass', 'Vit', 'Shelf']. Note that a standardized value is a z-score. A standardized value can be computed as:

$$z = \frac{x - \bar{x}}{s}$$

where $x$ = the original value in the column, $\bar{x}$ = the mean of the values in that column, and $s$ = the standard deviation of the values in that column. Paste a screenshot of your code and output.

Using this new DataFrame with the standardized values, check whether there is any outlier in each column of that DataFrame. An outlier is a z-score or standardized value greater than 3 (sometimes 2.5 is used as the cutoff, but in this case use 3). Paste a screenshot of your code and output.
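The combining and cleaning steps in the question above can be sketched roughly as follows. The course files are not shown on this page, so the three small DataFrames below are hypothetical stand-ins (in practice they would presumably be loaded with something like pd.read_csv), and the columns Name, Cal and Type are only assumed for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the provided files (the real data is not shown).
cereal_names = pd.DataFrame({"Name": ["A", "B", "C"]})
cereals_data = pd.DataFrame({
    "Cal": [70.0, np.nan, 110.0],
    "Type": ["C", None, "H"],
})
additional_cereals_data = pd.DataFrame({
    "Name": ["C", "D"],
    "Cal": [110.0, 90.0],
    "Type": ["H", "C"],
})

# Horizontal combine: cereal_names goes first so it ends up on the left.
cereals_data2 = pd.concat([cereal_names, cereals_data], axis=1)
first_five_last_ten = cereals_data2.iloc[:5, -10:]  # first 5 rows, last 10 columns

# Vertical combine: cereals_data2 goes first so it is stacked on top.
cereals_data3 = pd.concat(
    [cereals_data2, additional_cereals_data], ignore_index=True
)
last_ten_rows = cereals_data3.tail(10)

# Number of missing values in each column.
missing_counts = cereals_data3.isna().sum()

# Numeric columns: fill with the column mean. Other columns: forward fill.
numeric_cols = cereals_data3.select_dtypes(include="number").columns
other_cols = cereals_data3.columns.difference(numeric_cols)
cereals_data3[numeric_cols] = cereals_data3[numeric_cols].fillna(
    cereals_data3[numeric_cols].mean()
)
cereals_data3[other_cols] = cereals_data3[other_cols].ffill()

# Rows that repeat an earlier row, then drop them for good.
duplicate_rows = cereals_data3[cereals_data3.duplicated()]
cereals_data3 = cereals_data3.drop_duplicates().reset_index(drop=True)

print(missing_counts)
print(duplicate_rows)
print(cereals_data3.shape)
```

Reassigning the result of drop_duplicates() back to cereals_data3 is one way to make the removal permanent; passing inplace=True would be another.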
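For the binning step, a small sketch may help show why the two functions give different counts. The calorie values below are made up for illustration and are not the course data. In pandas, cut and qcut are top-level functions (pd.cut, pd.qcut) rather than DataFrame methods.

```python
import pandas as pd

# Hypothetical calorie values with one large value to skew the range.
cal = pd.Series([50, 60, 70, 80, 90, 100, 110, 120, 160], name="Cal")
labels = ["low", "moderate", "high"]

# pd.cut splits the value range into three equal-width intervals.
cut_bins = pd.cut(cal, bins=3, labels=labels)

# pd.qcut splits at quantiles, so each bucket gets about the same count.
qcut_bins = pd.qcut(cal, q=3, labels=labels)

# Count the points per bin, ordered low -> moderate -> high.
cut_counts = cut_bins.value_counts().reindex(labels)
qcut_counts = qcut_bins.value_counts().reindex(labels)

print(cut_counts)
print(qcut_counts)
```

With this toy series the equal-width bins hold 4, 4 and 1 values, while the quantile buckets hold 3 each. If matplotlib is installed, cut_counts.plot(kind="bar") and qcut_counts.plot(kind="bar") would draw the two bar charts the question asks for.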
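The standardization function and the outlier check could look something like the sketch below. The function name standardize_columns and the toy DataFrame are hypothetical. The sketch also assumes the sample standard deviation (the pandas default, ddof=1) for s, and it flags values whose absolute z-score exceeds 3 so that extreme low values are caught as well, which is a slightly broader reading of "greater than 3".

```python
import pandas as pd


def standardize_columns(df, columns):
    """Return a new DataFrame holding z-scores for the given numeric columns."""
    standardized = pd.DataFrame(index=df.index)  # start empty, fill per column
    for col in columns:
        if not pd.api.types.is_numeric_dtype(df[col]):
            raise TypeError(f"Column {col!r} is not numeric")
        # z = (x - mean) / s, with s the sample standard deviation (ddof=1)
        standardized[col] = (df[col] - df[col].mean()) / df[col].std()
    return standardized


# Hypothetical data: Sugars contains one extreme value, Carbo does not.
toy = pd.DataFrame({
    "Carbo": list(range(10, 30)),
    "Sugars": [5] * 19 + [50],
})

z_scores = standardize_columns(toy, ["Carbo", "Sugars"])

# Assumption: compare the absolute z-score against the cutoff of 3.
outlier_mask = z_scores.abs() > 3
outlier_counts = outlier_mask.sum()

print(outlier_counts)                       # outliers per column
print(z_scores[outlier_mask.any(axis=1)])   # rows with at least one outlier
```

On the real data the call would presumably be standardize_columns(cereals_data3, ['Carbo', 'Sugars', 'Potass', 'Vit', 'Shelf']), after the missing values have been filled.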
