


Task
The Iris data set is a well-known data set published by Ronald Fisher in 1936. It records a number of measurements of three species of Iris flowers. It has gained some popularity in the fields of Data Analytics and Machine Learning, as it provides a large number of measurements across a relatively small number of categories.
The assessment task is to carry out some simple, computer-supported analysis of the Iris data set.

Task details
The task has been broken into several, largely independent stages.

Stage 1: Reading and processing data
For this stage you need to complete the read_and_process(csv_filename) function. This function should:
- Import the csv file named csv_filename as a Pandas DataFrame
- Drop any rows that do not contain entries in all columns
- Strip ' cm' and ' mm' from each data point, and convert the values to floats
- Divide the second column ('sepal_width') by 10
- Return the resulting DataFrame
You may assume that csv_filename is a readable csv file with a format similar to iris.csv.

Stage 2: User menu
For this stage you need to implement the initial interactions. When your program is run, it should:
- Prompt the user to enter a csv file with 'Enter csv file: '
- Read and process the user-entered file using the function from Stage 1
- Display the menu:
  1. Create textual analysis
  2. Create graphical analysis
  3. Exit
- Prompt the user to select an option with 'Please select an option: '
- Process the user's choice:
  - If they select '1', proceed to Stage 3
  - If they select '2', proceed to Stage 4
  - If they select '3', exit the program with the exit() function
You may assume that only valid options are selected.

Stage 3: Text-based analysis
For this stage you will output some simple statistics based on the DataFrame loaded in Stage 2.
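Stage 3 works on the DataFrame that read_and_process (Stage 1) returns and that the Stage 2 menu passes along. The sketch below covers Stages 1 to 3 together and rests on several assumptions. The label column is assumed to be called species, and the measurement columns are assumed to use the names given in the task. The helper names summary_table, text_analysis and run_menu are hypothetical, as is the read_input parameter, which defaults to the built-in input so the prompts can be driven from a test. Menu option 2 is left as a placeholder.

```python
import pandas as pd

# Column labels: the four measurement names come from the task description;
# "species" as the name of the label column is an assumption.
MEASUREMENTS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
SPECIES = "species"


def read_and_process(csv_filename):
    """Stage 1: load the csv file and clean it."""
    # Drop rows with any empty entry; copy() gives an independent frame to edit.
    df = pd.read_csv(csv_filename).dropna().copy()
    for column in MEASUREMENTS:
        # Strip the ' cm' / ' mm' units, then convert the text to float.
        df[column] = (
            df[column]
            .astype(str)
            .str.replace(" cm", "", regex=False)
            .str.replace(" mm", "", regex=False)
            .astype(float)
        )
    # The second column, sepal_width, is divided by 10.
    df["sepal_width"] = df["sepal_width"] / 10
    return df


def summary_table(df, species):
    """Stage 3: statistics per measurement for one species, or for all rows."""
    rows = df if species == "all" else df[df[SPECIES] == species]
    data = rows[MEASUREMENTS]
    return pd.DataFrame(
        {
            "Mean": data.mean(),
            "25%": data.quantile(0.25),
            "Median": data.median(),
            "75%": data.quantile(0.75),
            "Std": data.std(),  # sample standard deviation (ddof=1)
        }
    )


def text_analysis(df, read_input=input):
    """Stage 3: prompt with the species found in the data, then print the table."""
    choices = ["all"] + sorted(df[SPECIES].unique())
    species = read_input(f"Select species ({', '.join(choices)}): ")
    print(summary_table(df, species))


def run_menu(df, read_input=input):
    """Stage 2: show the menu and dispatch until the user selects 3."""
    while True:
        print("1. Create textual analysis")
        print("2. Create graphical analysis")
        print("3. Exit")
        choice = read_input("Please select an option: ")
        if choice == "1":
            text_analysis(df, read_input)
        elif choice == "2":
            pass  # placeholder: Stage 4 is not part of this sketch
        elif choice == "3":
            exit()  # the task asks for exit(); it raises SystemExit
```

A script would start with run_menu(read_and_process(input("Enter csv file: "))). The species prompt is built from the data rather than hard-coded, and the species are sorted after 'all'. That is what lets the prompt shrink to "(all, versicolor, virginica)" for a file such as iris_test.csv.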
Upon entering this stage, the program should:
- Prompt the user for a species with 'Select species (all, setosa, versicolor, virginica): '. To obtain full marks, the available species should be extracted from the DataFrame, and may differ from those listed above. They should be listed alphabetically, after 'all'.
- Display the following statistics for each characteristic (sepal_length, sepal_width, petal_length, petal_width) of the species selected by the user: mean, 25th percentile, median, 75th percentile and standard deviation. If the user chose 'all', the table should summarise all of the data. The output should be the result of printing a DataFrame with index ['sepal_length', 'sepal_width', 'petal_length', 'petal_width'] and column headings Mean, 25%, Median, 75%, Std (see Sample interactions).
- Return to the main menu (Stage 2).
The output produced by the pandas function calls is sufficient; you do not need to round any results manually.

Stage 4: Graphics-based analysis
For this stage you will output some simple graphical plots based on the DataFrame loaded in Stage 2.
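The plots this stage asks for can be produced in two ways. For the 'all' case, the sketch uses pandas' scatter_matrix. For a single pair, it uses plain matplotlib scatter calls, one call per species so that each species gets its own colour from the default colour cycle. This is a minimal sketch under several assumptions. The label column is assumed to be called species. The Agg backend is chosen so that nothing has to be displayed. The helper names plot_pair, plot_all and graphical_analysis, and the read_input parameter, are hypothetical. As an alternative for the 'all' case, seaborn's pairplot with hue="species" would also colour by species.

```python
import matplotlib

matplotlib.use("Agg")  # save to files without opening a window

import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

# Measurement names from the task description; "species" is an assumed label.
MEASUREMENTS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
SPECIES = "species"

X_PROMPT = ("Choose the x-axis characteristic "
            "(all, sepal_length, sepal_width, petal_length, petal_width): ")
Y_PROMPT = ("Choose the y-axis characteristic "
            "(sepal_length, sepal_width, petal_length, petal_width): ")


def plot_pair(df, x, y):
    """Scatter plot of two characteristics, one colour per species."""
    fig, ax = plt.subplots()
    for name, group in df.groupby(SPECIES):
        # Each call takes the next colour from matplotlib's default cycle.
        ax.scatter(group[x], group[y], label=name)
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend()
    return fig


def plot_all(df):
    """Scatter matrix of every pair of characteristics, coloured by species."""
    palette = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    names = sorted(df[SPECIES].unique())
    colour_of = {name: palette[i % len(palette)] for i, name in enumerate(names)}
    # Extra keyword arguments such as c are forwarded to each scatter call.
    axes = scatter_matrix(df[MEASUREMENTS], c=df[SPECIES].map(colour_of).tolist(),
                          figsize=(8, 8))
    return axes[0, 0].get_figure()


def graphical_analysis(df, read_input=input):
    """Stage 4: prompt for the axes, build the plot, then save it to a file."""
    x = read_input(X_PROMPT)
    fig = plot_all(df) if x == "all" else plot_pair(df, x, read_input(Y_PROMPT))
    fig.savefig(read_input("Enter save file: "))
    plt.close(fig)  # free the figure once it has been written
```

The prompt order follows the specification: x-axis first, then the y-axis only when 'all' was not chosen, then the save file. The diagonal histograms that scatter_matrix draws are not coloured by species.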
Upon entering this stage, your program should:
- Prompt the user for a characteristic for the x-axis with 'Choose the x-axis characteristic (all, sepal_length, sepal_width, petal_length, petal_width): '. The available characteristics do not need to be extracted from the DataFrame.
- If the user does not select 'all':
  - Prompt the user for a characteristic for the y-axis with 'Choose the y-axis characteristic (sepal_length, sepal_width, petal_length, petal_width): '
  - Plot a scatter plot of the two chosen characteristics (it does not have to be displayed)
- If the user does select 'all':
  - Using a scatter_matrix or pairplot, plot the relationships between all pairs of characteristics
- In both cases, prompt the user to enter a file with 'Enter save file: ' and then save the plot to the entered file.
- Return to the main menu (Stage 2).
To obtain full marks, the outputs should differentiate the species by colouring each data point according to its species.
In addition to the automarked test cases, the output of this stage will be inspected by your OL, and up to 5 marks will be awarded for it based on the following criteria:
- Scatter plots of the correct characteristics (3 marks)
- Differentiation of species by colour (2 marks)

Stage 5: Conclusion
For this stage you are required to complete the provided function conclusion(). Your function should return a tuple containing the two (non-species) characteristics you believe answer the following question: In iris.csv, which pair of characteristics is best for separating the species?
In other words, which pair of characteristics has the most significant impact in determining which species a plant belongs to?
The two characteristics should be ordered alphabetically within the tuple, and should be two of 'sepal_length', 'sepal_width', 'petal_length' or 'petal_width'.
The return value should be hard-coded into the function (i.e., no calculations are required) based on your own analysis of the data (using the program you just created, if appropriate).
If you fail the last (hidden) test case but pass the second-last test case, add a comment explaining the reason for your choice. Justification is not needed if you pass the last test case.

Subjective component
In addition to the above tasks, your code will be inspected by your OL and evaluated on its adherence to good coding practices. Particular attention will be paid to the following aspects of your code:
- Documentation: appropriate use of comments
- Modularity: appropriate use of functions (Note: where appropriate, you should define your own functions in addition to those outlined above). All functions should "stand alone", that is, not depend on global variables
- Readability: appropriate use of variable names
- Structure: appropriate code layout so that the program flow is clear

Sample interactions
Enter csv file: iris.csv
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 1
Select species (all, setosa, versicolor, virginica): all
                  Mean  25%  Median  75%       Std
sepal_length  5.843333  5.1    5.80  6.4  0.828066
sepal_width   3.054000  2.8    3.00  3.3  0.433594
petal_length  3.758667  1.6    4.35  5.1  1.764420
petal_width   1.198667  0.3    1.30  1.8  0.763161
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 3

Enter csv file: iris_test.csv
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 1
Select species (all, versicolor, virginica): versicolor
                  Mean  25%  Median  75%       Std
sepal_length  5.955102  5.6     5.9  6.3  0.503348
sepal_width   2.785714  2.6     2.8  3.0  0.296507
petal_length  4.275510  4.0     4.4  4.6  0.461668
petal_width   1.332653  1.2     1.3  1.5  0.194066
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 3

Enter csv file: iris.csv
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 2
Choose the x-axis characteristic (all, sepal_length, sepal_width, petal_length, petal_width): all
Enter save file: iris_all.png
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 3

After the above interaction, iris_all.png would look like either of the example plots.
[Example images of iris_all.png not included]

Enter csv file: iris.csv
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 2
Choose the x-axis characteristic (all, sepal_length, sepal_width, petal_length, petal_width): sepal_width
Choose the y-axis characteristic (sepal_length, sepal_width, petal_length, petal_width): sepal_width
Enter save file: sw_vs_sw.png
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 1
Select species (all, setosa, versicolor, virginica): all
                  Mean  25%  Median  75%       Std
sepal_length  5.843333  5.1    5.80  6.4  0.828066
sepal_width   3.054000  2.8    3.00  3.3  0.433594
petal_length  3.758667  1.6    4.35  5.1  1.764420
petal_width   1.198667  0.3    1.30  1.8  0.763161
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 3

After the above interaction, sw_vs_sw.png would look like the example plot.
[Example image of sw_vs_sw.png not included]

Note: Your plots do not have to use the same style options (e.g., colours, fonts) as the ones presented here. Your plots will be assessed on whether they plot the correct data with the correct chart type (i.e., a scatter plot).
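For Stage 5, the task leaves the choice of pair to your own analysis, and the Stage 4 scatter plots are the obvious starting point. One rough numerical check is nearest-centroid accuracy, a heuristic chosen here for illustration rather than anything the task prescribes. It works in three steps: standardise the two characteristics, compute each species' mean point, and count how many flowers lie closer to their own species' mean than to any other. The sketch below is minimal and makes these assumptions: the helper names centroid_accuracy and rank_pairs are hypothetical, and the label column is assumed to be called species.

```python
from itertools import combinations

import numpy as np

# Measurement names from the task description; "species" is an assumed label.
MEASUREMENTS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
SPECIES = "species"


def centroid_accuracy(df, pair):
    """Share of rows whose nearest species centroid, measured on the two
    standardised characteristics in `pair`, belongs to their own species."""
    features = df[list(pair)]
    # Standardise so both characteristics count equally in the distance.
    features = (features - features.mean()) / features.std()
    centroids = features.groupby(df[SPECIES]).mean()
    # Euclidean distance from every row to every centroid: shape (rows, species).
    diffs = features.to_numpy()[:, None, :] - centroids.to_numpy()[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=2))
    predicted = centroids.index.to_numpy()[distances.argmin(axis=1)]
    return float((predicted == df[SPECIES].to_numpy()).mean())


def rank_pairs(df):
    """Score all six pairs of characteristics, best separator first.

    Each pair is sorted alphabetically, the order conclusion() must use."""
    scores = {
        tuple(sorted(pair)): centroid_accuracy(df, pair)
        for pair in combinations(MEASUREMENTS, 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The scores are computed on the same rows used to build the centroids, so they are only meaningful as a comparison between pairs, not as a measure of real classification accuracy. Whichever pair the analysis supports is then hard-coded as the tuple returned by conclusion().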

