SD Module- Python
Assignment No. 4
Title:
Write Python code that loads any dataset (for example, game_medal.csv) and does some basic data cleaning. Add analysis components (grouping, filtering and visualizing) on the dataset.
Objectives:
Understand the basics of data preprocessing, and learn the pandas basic plot functions, matplotlib, Seaborn, etc.
Problem Definition:
Develop Python code that loads any dataset (for example, game_medal.csv) and does some basic data cleaning. Add analysis components (grouping, filtering and visualizing) on the dataset.
Outcomes:
1. Students will be able to demonstrate Python data preprocessing.
2. Students will be able to plot graphs in Python using the pandas plot function.
3. Students will be able to demonstrate the matplotlib and seaborn packages.
Hardware Requirements: Any CPU with a Pentium processor or similar, 256 MB RAM or more, 1 GB hard disk or more
Software Requirements: 32/64-bit Linux/Windows operating system, R Studio
Theory:
Preprocessing
Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors. Data preprocessing is a proven method of resolving such issues. Data preprocessing prepares raw data for further processing.
Why preprocessing?
Real-world data are generally:
Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
Noisy: containing errors or outliers
Inconsistent: containing discrepancies in codes or names
Tasks in data preprocessing:
• Data cleaning: fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies.
• Data integration: using multiple databases, data cubes, or files.
• Data transformation: normalization and aggregation.
• Data reduction: reducing the volume but producing the same or similar analytical results.
• Data discretization: part of data reduction, replacing numerical attributes with nominal ones.
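The cleaning, transformation and discretization tasks above can be sketched in pandas as follows. This is a minimal sketch, not the assignment's prescribed solution: the small DataFrame stands in for a real file such as game_medal.csv (which you would load with pd.read_csv), and the column names, the median fill, the 10 to 80 age rule and the age bands are all illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real file; with a real dataset you
# would start from something like raw = pd.read_csv("game_medal.csv").
raw = pd.DataFrame({
    "Athlete": ["A", "B", "B", "C", "D"],
    "NOC": ["USA", "gbr", "gbr", "FRA", "USA"],
    "Age": [24, np.nan, np.nan, 31, 120],
    "Medal": ["Gold", "Silver", "Silver", "Bronze", "Gold"],
})

# Resolve inconsistencies: country codes should all be upper case.
clean = raw.assign(NOC=raw["NOC"].str.upper())

# Remove exact duplicate rows (rows 1 and 2 are identical, NaN included).
clean = clean.drop_duplicates()

# Fill in missing values: replace a missing age with the median age.
clean = clean.assign(Age=clean["Age"].fillna(clean["Age"].median()))

# Remove outliers with a simple plausibility rule (assumed range: 10 to 80).
clean = clean[clean["Age"].between(10, 80)].copy()

# Data transformation: min-max normalization of Age to the range [0, 1].
age = clean["Age"]
clean["Age_norm"] = (age - age.min()) / (age.max() - age.min())

# Data discretization: replace the numeric age with nominal bands.
clean["Age_band"] = pd.cut(clean["Age"], bins=[0, 20, 30, 80],
                           labels=["up to 20", "21-30", "over 30"])
print(clean)
```

The order matters: duplicates are dropped before the median is computed, so a repeated row does not pull the fill value toward itself.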
Mini Project: In this assignment we use the pandas package to perform the following operations on the given dataset:
grouping
filtering
visualizing
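The three operations can be chained in a single short pipeline. The sketch below is illustrative only: the rows are invented, and the column names Edition, NOC and Medal are assumed to match those in Summer.CSV. It filters one country, groups by edition and medal type, unstacks the medal level into columns and draws an area plot with the pandas plot function.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical miniature of the medal table (column names assumed).
medals = pd.DataFrame({
    "Edition": [2004, 2004, 2004, 2008, 2008, 2008, 2008],
    "NOC": ["USA", "USA", "CHN", "USA", "USA", "USA", "CHN"],
    "Medal": ["Gold", "Bronze", "Gold", "Gold", "Gold", "Silver", "Silver"],
})

# Filtering: keep only the rows for one country.
usa = medals[medals["NOC"] == "USA"]

# Grouping: count the medals of each type won at each edition.
counts = usa.groupby(["Edition", "Medal"]).size()

# Unstacking: move the Medal level of the index into columns,
# filling missing (edition, medal) combinations with 0.
by_edition = counts.unstack(level="Medal", fill_value=0)

# Visualizing: stacked area plot of the medal counts over time.
ax = by_edition.plot.area()
ax.set_xlabel("Edition")
ax.set_ylabel("Medal count")
fig = ax.get_figure()
# plt.show() displays the figure interactively; fig.savefig("usa_medals.png") saves it.
plt.close(fig)
```

Unstacking is what turns the long (Edition, Medal) count series into a wide table with one column per medal type, which is the shape the area plot expects.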
Dataset: Here we use the same dataset, Summer.CSV, that was used in Assignment 3.
Using .pivot_table() to count medals by type
• Rather than ranking countries by total medals won and showing that list, you may want to see a bit more detail.
▪ You can use a pivot table to compute how many separate bronze, silver and gold medals each country won.
o That pivot table can then be used to repeat the previous computation to rank by total medals won.
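A minimal sketch of the idea, on a few invented rows standing in for Summer.CSV (the NOC, Medal and Athlete column names are assumed): pivot_table with aggfunc="count" counts the rows in each (country, medal type) cell, and summing across the medal columns gives the total used for the ranking. The name of the totals column is our own choice.

```python
import pandas as pd

# Hypothetical miniature of the medal table (column names assumed).
medals = pd.DataFrame({
    "NOC": ["USA", "USA", "USA", "URS", "URS", "GBR"],
    "Medal": ["Gold", "Gold", "Bronze", "Silver", "Gold", "Bronze"],
    "Athlete": ["a1", "a2", "a3", "a4", "a5", "a6"],
})

# One row per country, one column per medal type, cells = number of medals.
counted = medals.pivot_table(index="NOC", columns="Medal",
                             values="Athlete", aggfunc="count")

# Countries that never won a given medal type get NaN; treat those as 0.
counted = counted.fillna(0).astype(int)

# Repeat the ranking: total across medal types, then sort in descending order.
counted["totals"] = counted.sum(axis="columns")
counted = counted.sort_values("totals", ascending=False)
print(counted)
```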
Applying .drop_duplicates()
• What could be the difference between the 'Event_gender' and 'Gender' columns?
• You should be able to check your guess by looking at the unique (Event_gender, Gender) pairs.
• The duplicates can be dropped using the .drop_duplicates() method, leaving behind the unique observations.
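A short sketch of that step, assuming the two column names above; the rows and the codes used in them ("M", "W", "X") are invented for illustration. Selecting just the two columns first means that .drop_duplicates() compares only those pairs, so what remains is one row per distinct (Event_gender, Gender) combination.

```python
import pandas as pd

# Hypothetical rows; the Event_gender codes are assumptions for illustration.
medals = pd.DataFrame({
    "Event_gender": ["M", "M", "W", "X", "X", "W", "M"],
    "Gender": ["Men", "Men", "Women", "Men", "Women", "Women", "Men"],
})

# Keep only the two columns of interest, then drop repeated pairs.
ev_gen = medals[["Event_gender", "Gender"]]
ev_gen_uniques = ev_gen.drop_duplicates()
print(ev_gen_uniques)
```

The first occurrence of each pair is kept (the default keep="first"), so the surviving index labels point back to where each combination first appears.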
Finding possible errors with .groupby()
• You will now use .groupby() to continue your exploration. Your job is to group by 'Event_gender' and 'Gender' and count the rows.
• You will see that there is only one suspicious row; this is likely a data error.
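A sketch of the grouping step on invented rows, with one deliberately inconsistent (W, Men) pair planted in them; the codes are assumptions, not values read from Summer.CSV. Here .size() counts rows per group; a rare pair is only a hint, and whether it is an error depends on whether the combination makes sense.

```python
import pandas as pd

# Hypothetical rows, including one planted inconsistent (W, Men) pair.
medals = pd.DataFrame({
    "Event_gender": ["M", "M", "W", "W", "X", "X", "X", "X", "W"],
    "Gender": ["Men", "Men", "Women", "Women", "Men", "Men", "Women", "Women", "Men"],
})

# Group by both columns and count the rows in each (Event_gender, Gender) pair.
counts = medals.groupby(["Event_gender", "Gender"]).size()
print(counts)

# Pairs that occur only once are worth a closer look.
rare = counts[counts == 1]
print(rare)
```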
Locating suspicious data
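Once the suspicious pair is known, a Boolean mask pulls out the full row so it can be inspected. The sketch assumes the inconsistent pair is a women's event ("W") with a medallist recorded as "Men"; the rows, the Athlete and Event columns and their values are hypothetical.

```python
import pandas as pd

# Hypothetical rows; only the third one has the inconsistent (W, Men) pair.
medals = pd.DataFrame({
    "Athlete": ["a1", "a2", "a3", "a4"],
    "Event": ["100m", "100m", "marathon", "marathon"],
    "Event_gender": ["M", "W", "W", "M"],
    "Gender": ["Men", "Women", "Men", "Men"],
})

# Boolean mask: True only where both conditions hold (note the parentheses).
sus = (medals["Event_gender"] == "W") & (medals["Gender"] == "Men")

# Filtering with the mask returns every column of the matching row(s).
suspect_rows = medals[sus]
print(suspect_rows)
```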
Constructing alternative country rankings
Counting distinct events
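One alternative ranking counts how many distinct events each country won medals in, instead of how many medals it won in total. A minimal sketch with invented rows (the NOC and Event column names are assumed): .nunique() on a grouped column counts distinct values per group, and in this toy data the ordering differs from the raw medal totals.

```python
import pandas as pd

# Hypothetical rows: USA wins many medals in few events, GBR the reverse.
medals = pd.DataFrame({
    "NOC": ["USA", "USA", "USA", "USA", "GBR", "GBR", "GBR", "KEN"],
    "Event": ["100m", "100m", "100m", "relay",
              "rowing", "sailing", "cycling", "marathon"],
})

# Raw ranking: total number of medal rows per country.
totals = medals["NOC"].value_counts()

# Alternative ranking: number of distinct events with at least one medal.
n_events = medals.groupby("NOC")["Event"].nunique().sort_values(ascending=False)
print(totals)
print(n_events)
```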
Conclusion/Analysis: Hence we are able to draw various plots using the seaborn, matplotlib and pandas packages on a suitable dataset.
Assignment Questions
1. Write a command to create a pivot table.
2. What is the command to see table information?
3. How do you apply the .drop_duplicates() method?
4. How do you use the .groupby() method?
5. What do you mean by unstacking?
6. Write a command to create an area plot.
7. Write a list of commands for visualization.
8. Write a list of commands for grouping.
9. Write a list of commands for filtering.
Oral Questions
1. What do you mean by a histogram?
2. What do you mean by a scatter plot?
3. What do you mean by a pie chart?
4. What do you mean by a bar chart?
5. What do you mean by a heatmap?
-----------------------
|W (4) |C (4) |D (4) |V (4) |T (4) |Total Marks with Sign |
|      |      |      |      |      |                      |
-----------------------
SNJB’S K.B.J. COLLEGE OF ENGINEERING, CHANDWAD