I built an E(T)L-pipeline in Python to transfer data from MS SQL Server to PostgreSQL

… and you should, too!

Credit: Author

Introduction

Motivation

The setup

Structure of this article

1. Python code

Python libraries and how to connect to the databases

expression = 'WORLD'
print(f"Hello, {expression.lower()}!")
>>> Hello, world!

How to get table and schema information in the source database

{'AdventureWorksDWBuildVersion': 'dbo',
'DatabaseLog': 'dbo',
'DimAccount': 'dbo',
...}

How to load all information from the source to the target database

SELECT "DateKey"
, "FullDateAlternateKey"
, "DayNumberOfWeek"
FROM "dbo"."DimDate"
table_split = [t for t in re.split("([A-Z][^A-Z]*)", table_name) if t]>>> ['Dim', 'Date']
table_split = '_'.join(table_split)>>> 'Dim_Date'
table_split = table_split.lower()>>> 'dim_date'
DateKey  | FullDateAlternateKey | DayNumberOfWeek | ...
---------+----------------------+-----------------+----
20050101 | 2005-01-01 | 7 | ...
20050102 | 2005-01-02 | 1 | ...
20050103 | 2005-01-03 | 2 | ...
20050104 | 2005-01-04 | 3 | ...
20050105 | 2005-01-05 | 4 | ...
... | ... | ... | ...

2. Encoding pitfalls (WIN1252 and UTF-8)

UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-61: character maps to <undefined>
                         List of databases
Name | Owner | Encoding | Collate
----------------------+----------+----------+--------------------
adventureworksdw2019 | Chris So | WIN1252 | German_Germany.1252
postgres | Chris So | WIN1252 | German_Germany.1252
template0 | Chris So | WIN1252 | German_Germany.1252
template1 | Chris So | WIN1252 | German_Germany.1252
engine = create_engine(DATABASE_URI, encoding=”cp1252")
UPDATE pg_database
SET ENCODING = pg_char_to_encoding('UTF8')
WHERE datname = 'adventureworksdw2019';

3. Critical appraisal

Scalability

No suitable error-handling

Just for ad-hoc-loading

The mini-transformation isn’t that great

4. (Presumably) Frequently asked questions

Why didn’t you include real transformation?

Why did you not use a scheduling-tool like Airflow or Luigi?

The code could use some improvements, right?

There is already an AdventureWorks database for Postgres (found here), why did you not use that?

5. Conclusion

Data Dude and SQL enthusiast. Occasionally does Pixel art◾

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store