
Configure Python workers

Workers carry out the actual data processing in a data pipeline. The settings on this page help you configure and deploy new worker nodes. Worker nodes are clustered into worker groups, and a worker group can be assigned to a specific pipeline or even to a single step in a pipeline.

Environments

Workers and their associated worker groups are specific to an environment: a pipeline can only run on worker groups that belong to the same environment as the pipeline. Environment isolation is an important concept in Dataplane. It helps data operations segregate access, projects and compute resources.
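The isolation rule can be pictured with a small model. This is only an illustrative sketch, not Dataplane's implementation: the `WorkerGroup` class and the `eligible_groups` helper are hypothetical names introduced here, and the environment names are example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerGroup:
    """Hypothetical model of a worker group; not a Dataplane type."""
    name: str         # corresponds to the DP_WORKER_GROUP setting
    environment: str  # corresponds to the DP_WORKER_ENV setting


def eligible_groups(pipeline_env: str, groups: list[WorkerGroup]) -> list[WorkerGroup]:
    """Return only the worker groups that live in the pipeline's environment."""
    return [g for g in groups if g.environment == pipeline_env]


# Example data: the same group name may exist in more than one environment.
groups = [
    WorkerGroup("python_1", "Development"),
    WorkerGroup("python_1", "Production"),
    WorkerGroup("python_2", "Development"),
]

# A pipeline in "Development" only ever sees the Development groups.
dev_groups = eligible_groups("Development", groups)
print([g.name for g in dev_groups])
```

In this model a worker group in "Production" is never offered to a "Development" pipeline, even when the group names match.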

Environment variables common across Dataplane and Python workers

| Environment variable | Description |
| --- | --- |
| secret_db_host | Host of the PostgreSQL database |
| secret_db_user | User for the connection to the PostgreSQL database |
| secret_db_pwd | Password for the connection to the PostgreSQL database |
| secret_db_ssl | One of disable, allow, prefer, require, verify-ca, verify-full. See https://www.postgresql.org/docs/current/libpq-ssl.html |
| secret_db_port | Database port |
| secret_db_database | Database name, default dataplane |
| secret_jwt_secret | Generate a UUID secret for JWT. It is important that you keep this secret safe. To create a secret, you can use an online generator, for example https://www.uuidgenerator.net/ |
| secret_encryption_key | Generate a 32 character long random password. It is important that you keep this password safe. You can use an online generator, for example https://www.lastpass.com/features/password-generator |
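As an alternative to an online generator, the two secrets described above can also be produced locally. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and restricting the encryption key to letters and digits is an assumption made here to avoid shell quoting problems, not a documented Dataplane requirement.

```python
import secrets
import string
import uuid


def generate_jwt_secret() -> str:
    """Return a random (version 4) UUID as a string, for secret_jwt_secret."""
    return str(uuid.uuid4())


def generate_encryption_key(length: int = 32) -> str:
    """Return a random password of `length` characters, for secret_encryption_key.

    Assumption: letters and digits only, so the value needs no escaping
    when it is placed in an environment file or a shell command.
    """
    alphabet = string.ascii_letters + string.digits
    # secrets.choice draws from a cryptographically secure random source.
    return "".join(secrets.choice(alphabet) for _ in range(length))


# Print the values in NAME=value form so they can be copied into a deployment.
print(f"secret_jwt_secret={generate_jwt_secret()}")
print(f"secret_encryption_key={generate_encryption_key()}")
```

Each run prints new values, so generate them once and store them safely rather than regenerating them on every deployment.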

Environment variables specific to workers

| Environment variable | Options | Example | Description |
| --- | --- | --- | --- |
| DP_NATS | | nats://nats:4222, nats://nats-r_1:4222 | Connection string to NATS |
| DP_DEBUG | "true", "false" | false | Print debug logs to console. Recommended to turn off in production. |
| DP_DB_DEBUG | "true", "false" | false | Print database debug logs to console. |
| DP_MQ_DEBUG | "true", "false" | false | Print message queue debug logs to console. |
| DP_METRIC_DEBUG | "true", "false" | false | Print CPU and memory metrics debug logs to console. |
| DP_WORKER_HEARTBEAT_SECONDS | | 1 | The interval, in seconds, at which the worker sends a heartbeat to the main app. |
| DP_WORKER_GROUP | | python_1 | The worker group is the collection of worker nodes that have the same configuration. For example, a Python worker group that runs the Python scripts in the pipeline. |
| DP_WORKER_CMD | | /bin/bash | The shell command installed on the Linux system. This is useful because it can differ between Linux installations. |
| DP_WORKER_TYPE | "container", "other" | container | The worker type is used for CPU and memory metrics collection, which can differ between a containerised and a bare metal installation. If unsure, keep it set to "other". |
| DP_WORKER_LB | "roundrobin" | roundrobin | The load balancer strategy determines how analytical workloads are distributed to worker nodes. |
| DP_WORKER_ENV | | Development | The name of the environment the worker node belongs to. This must match an environment set inside the main app. |
| DP_WORKER_PORT | | 9005 | The port that the worker node runs on. |
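Before starting a worker, it can help to check the settings above against their listed options. The sketch below is a hypothetical pre-flight check, not part of Dataplane: the `check_worker_env` helper is introduced here for illustration, the allowed values are copied from the Options column, and treating the heartbeat interval and port as positive whole numbers is an assumption based on the example values.

```python
# Allowed values, taken from the Options column above.
ALLOWED = {
    "DP_DEBUG": {"true", "false"},
    "DP_DB_DEBUG": {"true", "false"},
    "DP_MQ_DEBUG": {"true", "false"},
    "DP_METRIC_DEBUG": {"true", "false"},
    "DP_WORKER_TYPE": {"container", "other"},
    "DP_WORKER_LB": {"roundrobin"},
}

# Assumption: these settings must be positive whole numbers.
INTEGER_VARS = ("DP_WORKER_HEARTBEAT_SECONDS", "DP_WORKER_PORT")


def check_worker_env(env: dict[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    for name, allowed in ALLOWED.items():
        value = env.get(name)
        if value is not None and value not in allowed:
            problems.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    for name in INTEGER_VARS:
        value = env.get(name)
        if value is None:
            continue
        try:
            ok = int(value) > 0
        except ValueError:
            ok = False
        if not ok:
            problems.append(f"{name}={value!r} is not a positive whole number")
    return problems


# Example values from the Example column above.
example_env = {
    "DP_NATS": "nats://nats:4222, nats://nats-r_1:4222",
    "DP_DEBUG": "false",
    "DP_DB_DEBUG": "false",
    "DP_MQ_DEBUG": "false",
    "DP_METRIC_DEBUG": "false",
    "DP_WORKER_HEARTBEAT_SECONDS": "1",
    "DP_WORKER_GROUP": "python_1",
    "DP_WORKER_CMD": "/bin/bash",
    "DP_WORKER_TYPE": "container",
    "DP_WORKER_LB": "roundrobin",
    "DP_WORKER_ENV": "Development",
    "DP_WORKER_PORT": "9005",
}

for problem in check_worker_env(example_env):
    print(problem)
```

To check the real process environment instead of the example values, pass `dict(os.environ)` after importing `os`. Variables that are not set are skipped rather than reported, because this page does not state which settings are mandatory.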