Programming and frameworks for ML Python for data analysis


About Me

Big Data Consultant at Santander / Big Data Lecturer
More than 20 years of experience in different environments, technologies, customers, countries ...
Passionate about data and technology
Enthusiastic about Big Data world and NoSQL

daniel.villanueva@immune.institute


Agenda

Reading / Writing data
Exploring a DataFrame
Renaming columns
Filtering Columns
Filtering Rows
Sorting Data
Adding new columns
Deleting Data
Grouping Data
Concatenating Data
Joining Data
Pivot Tables


Reading / Writing data

Pandas integrates out of the box with many file formats and data sources (CSV, Excel, SQL, JSON, Parquet, ...).
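As a rough illustration of the read/write pairs, the sketch below round-trips a small DataFrame through CSV and JSON text held in memory. The sample data and variable names are invented for the example; only pd.read_csv, pd.read_json, DataFrame.to_csv and DataFrame.to_json are pandas functions.

```python
import io

import pandas as pd

# Hypothetical sample data, used only for illustration.
people = pd.DataFrame({"name": ["Ana", "Luis"], "age": [30, 25]})

# Write to CSV text and read it back (index=False skips the row labels).
csv_text = people.to_csv(index=False)
from_csv = pd.read_csv(io.StringIO(csv_text))

# Write to JSON text and read it back.
json_text = people.to_json()
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)
```

Most formats follow the same naming pattern: a pd.read_<format>() function for reading and a DataFrame.to_<format>() method for writing.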


Reading data in text format

The function pd.read_csv() reads a delimited text file and returns its contents as a DataFrame

With the default options, the first row is treated as the header and the separator is a comma

The file can be on local disk or on the network (read_csv also accepts a URL)

The file can be compressed!
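A minimal sketch of these defaults, assuming a tiny made-up dataset: an in-memory text buffer stands in for a file on disk, and a temporary .gz file shows that compression is inferred from the file extension. The data, the file name people.csv.gz and the variable names are all hypothetical.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical file contents: header in the first row, comma-separated.
csv_text = "name,age\nAna,30\nLuis,25\n"

# Default options: first row is the header, comma is the separator.
df = pd.read_csv(io.StringIO(csv_text))

# Compressed files: pandas infers gzip from the ".gz" extension,
# both when writing and when reading.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "people.csv.gz")
    df.to_csv(path, index=False)
    df_gz = pd.read_csv(path)

print(df)
print(df_gz)
```

Passing a URL string instead of a local path works the same way, so the call does not change when the file lives on the network.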




Reading data in text format

Sometimes the separator is something other than a comma, for example a pipe (|).

Use the sep argument to change the separator
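For instance, a pipe-delimited file could be read as in the sketch below. The sample text is made up and held in memory in place of a real file.

```python
import io

import pandas as pd

# Hypothetical pipe-delimited content standing in for a file on disk.
pipe_text = "name|age\nAna|30\nLuis|25\n"

# sep tells pandas which character separates the fields.
df_pipe = pd.read_csv(io.StringIO(pipe_text), sep="|")

print(df_pipe)
```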


Reading data in text format

We can use the header parameter to tell pandas which row contains the header (None if the data has no header)
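The two common cases might look like the sketch below, which assumes made-up sample text: one input with no header at all, and one where the header sits on the second line (row index 1) after a title line.

```python
import io

import pandas as pd

# Hypothetical data with no header row at all.
no_header_text = "Ana,30\nLuis,25\n"
# header=None: every row is data; columns are labelled 0, 1, ...
df_no_header = pd.read_csv(io.StringIO(no_header_text), header=None)

# Hypothetical data whose header is on the second line (row index 1).
with_title_text = "exported people list\nname,age\nAna,30\n"
# header=1: row 1 supplies the column names; the line before it is skipped.
df_header_row1 = pd.read_csv(io.StringIO(with_title_text), header=1)

print(df_no_header)
print(df_header_row1)
```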


Reading data in text format

If the file doesn't have a header, we can specify the column names with the names parameter
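A short sketch of the names parameter, again with invented data and column names:

```python
import io

import pandas as pd

# Hypothetical header-less content standing in for a file on disk.
no_header_text = "Ana,30\nLuis,25\n"

# header=None marks every row as data; names supplies the column labels.
df_named = pd.read_csv(
    io.StringIO(no_header_text),
    header=None,
    names=["name", "age"],
)

print(df_named)
```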
