Posts

Showing posts from October, 2018

concurrent data pull

Sometimes you need to pull data concurrently from a database, for example when the database is throttled or other parts of the system constrain extract performance. If you need to pull multiple partitions to upload to the cloud but the on-premise system is slow on the extract, concurrent extracts may help.

The Python script below uses the multiprocessing module (rather than threading) to run multiple extracts at once, based on a partition key you identify in the table to be extracted. The partition value should be a string or a number, or anything that can be converted to a string suitable for use in a filename.

The script is not sophisticated. Its restart support is limited, but it does offer some help if your extract is interrupted. Enhance as you see fit!

#!/usr/bin/env python
import os
import argparse
import datetime
from sqlalchemy import create_engine
from multiprocessing import Pool
impo
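Since the excerpt above cuts off, here is a minimal sketch of the general approach it describes, not the post's actual script. It assumes a SQLite database accessed through the standard-library sqlite3 module as a stand-in for the SQLAlchemy engine, and the names extract_partition and extract_all, the CSV output format, and the ".part" temporary-file convention are all hypothetical. Each partition value gets its own output file, and a partition whose file already exists is skipped, which gives a simple form of restart.

```python
import csv
import os
import sqlite3
from multiprocessing import Pool


def extract_partition(task):
    """Write all rows for one partition value to its own CSV file."""
    db_path, table, key_col, value, out_dir = task
    # The partition value becomes part of the filename, so it must be
    # convertible to a filename-safe string.
    final_path = os.path.join(out_dir, f"{table}_{value}.csv")
    if os.path.exists(final_path):
        # Simple restart support: a finished file means this partition is done.
        return value, "skipped"
    tmp_path = final_path + ".part"
    # Each worker opens its own connection; connections are not shared
    # across processes.
    conn = sqlite3.connect(db_path)
    try:
        # table and key_col are assumed to be trusted identifiers;
        # only the partition value is passed as a bound parameter.
        cur = conn.execute(
            f"SELECT * FROM {table} WHERE {key_col} = ?", (value,)
        )
        with open(tmp_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow([col[0] for col in cur.description])
            writer.writerows(cur)
    finally:
        conn.close()
    # Rename only after a complete write, so an interrupted extract
    # never leaves a file that looks finished.
    os.replace(tmp_path, final_path)
    return value, "extracted"


def extract_all(db_path, table, key_col, out_dir, workers=4):
    """Extract every distinct partition value, one file per partition."""
    os.makedirs(out_dir, exist_ok=True)
    conn = sqlite3.connect(db_path)
    try:
        values = [
            row[0]
            for row in conn.execute(
                f"SELECT DISTINCT {key_col} FROM {table} ORDER BY {key_col}"
            )
        ]
    finally:
        conn.close()
    tasks = [(db_path, table, key_col, v, out_dir) for v in values]
    if workers <= 1:
        # Run in-process, which is handy for debugging.
        return dict(extract_partition(t) for t in tasks)
    with Pool(workers) as pool:
        return dict(pool.map(extract_partition, tasks))
```

A call such as extract_all("source.db", "sales", "region", "out") (hypothetical names) would write one file per distinct region into the out directory. On platforms where multiprocessing spawns fresh interpreters, that call belongs under an `if __name__ == "__main__":` guard. Rerunning after an interruption only re-extracts partitions that have no finished file.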