One of the great strengths of Python is the wide range of available tools and libraries.
Here are some recommended libraries that the Dataplane team often uses:
- Pandas is a powerful library for joining, transforming and analyzing data in memory using DataFrames.
- SQLAlchemy is a SQL toolkit and object-relational mapper that connects to many different databases.
- Requests is an HTTP library for connecting to and consuming APIs.
- Boto3 is the AWS SDK for Python, which we often use for storing data in S3-compatible storage.
- Redis (the redis Python package) is a Redis client, useful for temporarily storing data in memory and for caching data models.
- TensorFlow is a deep learning framework for machine learning and AI.
- RPA Framework is a Robotic Process Automation (RPA) framework for automating repetitive tasks, for example downloading daily reports from the company's ERP platform.
You can find more Python packages at https://pypi.org/
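As a brief illustration of the in-memory join and transform work that Pandas is listed for above, here is a minimal sketch. The table contents and column names are hypothetical and only serve the example; it assumes the pandas package is installed.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "customer_id": [10, 20, 10],
        "amount": [25.0, 40.0, 15.0],
    }
)
customers = pd.DataFrame({"customer_id": [10, 20], "name": ["Ada", "Grace"]})

# Join the two tables in memory on their shared key.
joined = orders.merge(customers, on="customer_id", how="left")

# Transform and analyze: total order amount per customer.
totals = joined.groupby("name", as_index=False)["amount"].sum()
print(totals)
```

The same pattern (load, merge, aggregate) applies whether the data comes from a file, a database query or an API response.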
At Dataplane, we noticed how much code is required for simple operations such as storing a file or transferring data between pipeline steps. We developed a Dataplane Python package to reduce that code and make these operations much easier for you.
Below is a step-by-step guide on how to use Python libraries in Dataplane.
Go to the Python code editor
To install Python packages, open the code editor: click the three dots on the processor, then click Code.
This will open up the code editor for that step in the pipeline.
Install a Python package
- Click Edit in the Python packages section
- Update the list with the pip packages you want to install
- Click the Install button
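Once the install finishes, one way to confirm that the packages are available to your step is to check their installed versions from Python code. The sketch below uses only the standard library; the helper name `installed_versions` and the package list are hypothetical, so replace the list with the packages you actually installed.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Hypothetical package list; replace with the packages you installed.
wanted = ["pandas", "requests", "boto3"]
for name, version in installed_versions(wanted).items():
    print(f"{name}: {version or 'not installed'}")
```

A package reported as "not installed" usually means the name was misspelled in the package list or the install step has not completed yet.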