pygrametl Documentation

Release 2.7 Aalborg Universitet

May 26, 2021

CONTENTS

1 Getting started
2 Code Examples
3 API

pygrametl is a package for creating Extract-Transform-Load (ETL) programs in Python. The package contains several classes for filling fact tables and dimensions (including snowflaked and slowly changing dimensions), classes for extracting data from different sources, classes for optionally defining an ETL flow using steps, classes for parallelizing an ETL flow, classes for testing an ETL flow, and convenient functions for often-needed ETL functionality. The package's modules are:

• datasources for access to different data sources
• tables for giving easy and abstracted access to dimension and fact tables
• parallel for parallelizing an ETL flow
• JDBCConnectionWrapper and jythonmultiprocessing for Jython support
• aggregators for aggregating data
• steps for defining steps in an ETL flow
• FIFODict for a dict with a limited size and where elements are removed in first-in, first-out order
• drawntabletesting for testing an ETL flow

pygrametl is currently being maintained at Aalborg University in Denmark by the following people:

Current Maintainers

• Christian Thomsen
• Søren Kejser Jensen

Former Maintainers

• Christoffer Moesgaard
• Ove Andersen


CHAPTER ONE

GETTING STARTED

1.1 Install Guide

Installing pygrametl is fairly simple, mainly because the package has no mandatory dependencies. This guide contains all the information needed to install and use the package with CPython. pygrametl also supports the JVM-based Python implementation Jython. For more information about using pygrametl with Jython, see Jython.

1.1.1 Installing a Python Implementation

pygrametl requires an implementation of the Python programming language to run. Currently, pygrametl officially supports the following implementations (other implementations like PyPy and IronPython might also work):

• Jython, version 2.7 or above
• Python 2, version 2.7 or above
• Python 3, version 3.4 or above

Warning: As Python 2 is no longer being maintained, support for it will slowly be reduced as we continue to develop pygrametl. Currently, dttr is the only pygrametl module that requires Python 3 (version 3.4 or above).

After a Python implementation has been installed and added to the system's path, it can be run from either the command prompt in Windows or the shell in Unix-like systems. This should launch the Python interpreter in interactive mode, allowing commands to be executed directly on the command line.

    Python 3.9.2 (default, Feb 20 2021, 18:40:11)
    [GCC 10.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
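The supported versions listed above can also be checked from inside the interpreter. The following is a minimal sketch, not part of pygrametl: the helper name is_supported is hypothetical, and the check assumes that Jython reports its version through sys.version_info in the same way as CPython.

```python
import sys


def is_supported(version=sys.version_info):
    """Return True if the version matches the supported versions listed above.

    Hypothetical helper for illustration; it is not part of pygrametl.
    """
    major, minor = version[0], version[1]
    if major == 2:
        # Python 2 and Jython: version 2.7 or above
        return minor >= 7
    # Python 3: version 3.4 or above
    return (major, minor) >= (3, 4)


if __name__ == "__main__":
    print("Running Python %d.%d" % (sys.version_info[0], sys.version_info[1]))
    print("Supported by pygrametl 2.7: %s" % is_supported())
```

The function accepts any tuple-like version so that it can be tried with versions other than the one currently running.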

1.1.2 Installing pygrametl

pygrametl can be installed either from PyPI using a package manager, such as pip or conda, or by manually checking out the latest development version from the official GitHub repository. Installing from PyPI is currently the simplest option, as the process is automated by the package manager. Bug fixes and new experimental features are, however, available first in the GitHub repository.

Install from PyPI with pip

pip can install pygrametl to the Python implementation's global package directory, or to the user's local package directory which is usually located in the user's home directory. Installing pygrametl globally will often require root or administrator privileges with the advantage that the package will be available to all users of that system. Installing it locally will only make it available to the current user, but the installation can be performed without additional privileges. The two types of installation can be performed using one of the following commands:

    # Install pygrametl to the global package directory
    $ pip install pygrametl

    # Install pygrametl to the user's local package directory
    $ pip install pygrametl --user
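After either command has finished, one way to confirm that the interpreter can locate the package is to ask the import system for it. This is a minimal sketch rather than an official verification step: the helper name is_installed is hypothetical, and importlib.util.find_spec requires Python 3.4 or above, so the sketch does not apply to Python 2 or Jython.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be located for import.

    Hypothetical helper for illustration; it is not part of pygrametl.
    """
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    print("pygrametl installed:", is_installed("pygrametl"))
```

If the result is False after a local installation, the interpreter being run may not be the one that pip installed the package for.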

Install from PyPI with conda

conda is an alternative package manager for Python. It is bundled with the Anaconda CPython distribution from Anaconda, Inc. There is no official pygrametl conda package, as conda uses a different package format than pip. It is, however, straightforward to download, convert, and install the PyPI package using conda with only a few commands.

    # Create a template for the conda package using the PyPI package
    $ conda skeleton pypi pygrametl

    # Build the conda package
    $ conda build pygrametl/meta.yaml

    # Install the conda package
    $ conda install --use-local pygrametl

Afterward, the folder containing the package template can be deleted as it is only used for building the package.

Install from GitHub

The latest development version of pygrametl can be downloaded from the official GitHub repository. The project currently uses Git for version control, so the repository can be cloned using the following command.

    # Clone the pygrametl repository from GitHub
    $ git clone

Before Python can import the modules, the pygrametl package must be added to sys.path. This can be done manually in your Python programs, by setting PYTHONPATH if CPython is used, or by setting JYTHONPATH if Jython is used. More information about how CPython and Jython locate modules can be found in the two links provided.
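The manual approach can be sketched as follows. This is a hedged illustration and not part of pygrametl: the helper name add_to_sys_path and the directory used are hypothetical, and the directory is assumed to be the one that contains the pygrametl package folder in the cloned repository.

```python
import os
import sys


def add_to_sys_path(directory):
    """Prepend a directory to sys.path unless it is already present.

    Hypothetical helper for illustration; it is not part of pygrametl.
    Returns the absolute path that was checked or added.
    """
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


# Placeholder location of the cloned repository; replace it with the
# directory that actually contains the pygrametl package folder.
add_to_sys_path("/path/to/cloned/repository")

# With a real path in place, "import pygrametl" should now be resolvable.
```

Setting PYTHONPATH or JYTHONPATH achieves the same effect without changing the program, which may be preferable when several programs share one checkout.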
