Testing a Processing Step

While it is possible to invoke a ProcessingStep from the command line, it can also be called from within Python for local unit testing. This test interface is currently under development and subject to change.

To ensure robust ProcessingSteps and minimize failures in the PineXQ cloud, we recommend a three-tier testing strategy. This approach moves from fast, granular tests to full-scale integration.

Note: Good testing improves the developer experience by providing rapid feedback and enabling test-driven development. This speeds up development cycles, allowing teams to ship features and refactor with confidence.

Tier 1: Unit testing

Goal: Verify the core logic of your functions, readers, and writers in isolation.

Scope: Single functions or modules, small mocked data (but with multiple variations).

In this tier, you test the “business logic” inside your code without the full ProCon execution. If external systems are called, consider mocking them: this gives fast feedback, removes external dependencies, and lets you control the responses. A sketch follows the list below.

What to test:

  • Transformation Logic: Test the pure Python functions that process data.
  • Readers & Writers: Verify that your code can correctly parse input formats and generate valid output formats using small, in-memory data snippets.
  • Edge Cases: Test how your functions handle empty inputs, malformed data, or boundary values.
  • Parametrization: Ensure that user-supplied parameters and inputs are valid and within the expected ranges.
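
For illustration, here is a minimal Tier 1 sketch using pytest and pytest-mock. The module transforms, its functions normalize_readings and enrich_with_metadata, and the fetch_metadata helper are hypothetical placeholders for your own code.

test_transforms.py
import pytest

# Hypothetical module under test; substitute your own functions.
from transforms import normalize_readings, enrich_with_metadata

@pytest.mark.parametrize(
    "raw, expected",
    [
        ([1.0, 2.0, 3.0], [0.0, 0.5, 1.0]),  # typical input
        ([5.0], [0.0]),                      # boundary value
        ([], []),                            # empty input
    ],
)
def test_normalize_readings(raw, expected):
    """Pure transformation logic, covered with several small variations."""
    assert normalize_readings(raw) == expected

def test_enrich_with_metadata(mocker):
    """External call mocked for fast feedback and controlled responses."""
    # Patch where the function is looked up, not where it is defined.
    mocker.patch("transforms.fetch_metadata", return_value={"unit": "kPa"})
    result = enrich_with_metadata([0.0, 0.5])
    assert result["unit"] == "kPa"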

Tier 2: Local integration testing

Goal: Verify that your ProcessingStep works as a cohesive unit within the ProCon framework on your local machine.

Scope: The full step execution lifecycle, local file system, realistic (and potentially large) datasets.

This tier simulates the ProCon execution environment on your local machine. It ensures that your parameters and DataSlot configurations are correct before you ever upload to the cloud.

What to test:

  • Step Execution: Run the full ProcessingStep through the local test interface, exercising the readers, writers, and function execution together.
  • Performance: Use realistic data sizes (megabytes to gigabytes) to catch memory issues or timeouts that unit tests miss (a sketch follows the main example below).
  • Real data: Use a curated set of realistic data files stored locally. These should represent real-world scenarios.

To test a ProcessingStep, we create an instance of our ProcessingStep and invoke a function by name. Function parameters must be passed in as a dict; DataSlots are passed in via create_dataslot_description(), which takes a dict mapping argument names to lists of filenames. Note that output DataSlots must be specified in the same way as input DataSlots. Result DataSlots are a special case: while they do not need to be specified, you will likely wish to test that the results serialize correctly. The example below uses a pytest fixture (not shown) to generate a temporary output file for the test.

test_cloud_processing.py
import os

from conftest import create_tmp_file
from cloud_step import CloudProcessingStep, CloudRegions
from models import Region
from pinexq.procon.dataslots import create_dataslot_description
from pinexq.procon.dataslots.annotation import RETURN_SLOT_NAME  # == "__returns__"
from pinexq.procon.step import ExecutionContext

def test_get_job_results(create_tmp_file):
    """
    Test retrieving jobs from Cloud Service.
    """
    worker = CloudProcessingStep(use_cli=False)  # suppress pop-up
    tmp_output_file = create_tmp_file()
    result = worker._call(
        ExecutionContext(
            function_name="get_task_results",
            input_dataslots=create_dataslot_description(
                # Load the license file from the path in an environment variable
                {"license_file": [os.environ["CLOUD_LICENSE_PATH"]]}
            ),
            parameters={
                "region": Region(CloudRegions.EU_WEST_2),
                "backend_name": "cb1",
                "job_id": "e01200e5-c000-4003-9000-c0abb7000c52",
            },
            output_dataslots=create_dataslot_description(
                # Write the result to the temporary file
                {RETURN_SLOT_NAME: [str(tmp_output_file)]}
            ),
        )
    )
    # _call() returns the function's return value only when no return DataSlot
    # is specified; here the result is written to the DataSlot instead.
    assert result is None
    result_json = tmp_output_file.read_text()
    assert result_json is not None
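
Building on the invocation pattern above, the Performance item from Tier 2 can reuse the same _call interface with a generated large input. This is a rough sketch: the step class MyProcessingStep, the function name process_readings, and the DataSlot name readings are hypothetical placeholders, and the time budget is arbitrary.

test_performance.py
import time

from my_step import MyProcessingStep  # hypothetical step under test
from pinexq.procon.dataslots import create_dataslot_description
from pinexq.procon.dataslots.annotation import RETURN_SLOT_NAME
from pinexq.procon.step import ExecutionContext

def test_process_readings_large_input(tmp_path):
    """Run the full step on a realistically sized file to surface memory or timeout problems."""
    big_input = tmp_path / "readings.csv"
    with big_input.open("w") as f:
        f.write("timestamp,value\n")
        for i in range(2_000_000):  # tens of megabytes of synthetic rows
            f.write(f"{i},{i * 0.001}\n")
    output_file = tmp_path / "result.json"
    worker = MyProcessingStep(use_cli=False)
    start = time.monotonic()
    worker._call(
        ExecutionContext(
            function_name="process_readings",
            input_dataslots=create_dataslot_description({"readings": [str(big_input)]}),
            output_dataslots=create_dataslot_description({RETURN_SLOT_NAME: [str(output_file)]}),
        )
    )
    assert time.monotonic() - start < 120  # adjust the budget to your step's expectations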

Some ProcessingSteps may require API keys in order to test properly, especially those accessing external resources. We strongly recommend you do not include these API keys in your source code; see the discussion here for alternatives.
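
As a sketch of one such alternative, read the key from an environment variable and skip the test when it is absent. The variable name CLOUD_API_KEY is a placeholder:

test_external_api.py
import os

import pytest

# Hypothetical variable name; use whatever your step expects.
API_KEY = os.environ.get("CLOUD_API_KEY")

@pytest.mark.skipif(API_KEY is None, reason="CLOUD_API_KEY is not set")
def test_step_requiring_api_key():
    """The key comes from the environment, so it never lands in source control."""
    ...  # construct the step and invoke it as in the examples above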

In certain cases, it may be necessary to mock various methods your ProcessingStep calls, especially if those methods depend on being called by the JMA. For example, if your function acquires a Client from its step context, you will need to supply a Client during testing. An example is shown below, using pytest-mock to mock getting the Client, and creating the Client itself with a fixture.

from main import UploadStep
from pinexq.procon.step import ExecutionContext

def test_sync_step(client, mocker):
    """
    Test a step that uploads workdata.
    Mocks get_client() with mocker, supplying an API client via fixture.
    """
    step = UploadStep(use_cli=False)
    # Note: patch where the function is looked up, not where it is defined.
    # See e.g. https://docs.python.org/3/library/unittest.mock.html#where-to-patch.
    mocker.patch("main.get_client", return_value=client)
    step._call(
        ExecutionContext(
            function_name="sync_files"
        )
    )
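
The client fixture itself is not shown above; a minimal sketch might look like the following, assuming a hypothetical ApiClient class. Substitute whatever client type your step actually acquires:

conftest.py
import pytest

from my_api import ApiClient  # hypothetical client class

@pytest.fixture
def client():
    """Provide a pre-configured client for tests that mock get_client()."""
    return ApiClient(base_url="https://api.example.test", api_key="test-key")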

Tier 3: Testing and debugging on a remote JMA

Goal: Verify integration with the PineXQ cloud and interaction with other steps.

Scope: Live environment, local ProCon execution.

This is the final validation. Even if a step works locally, environment differences or integration issues with the JMA or with other ProcessingSteps in a Workflow can occur. It is also possible to debug a ProcessingStep that has already been deployed.

What to test:

  • Cloud Integration: Verify that the step can fetch WorkData from previous steps and pass results to subsequent steps in a workflow.
  • Environment Specifics: Check for issues related to the specific deployment, such as available resources, timing issues, and integration with the JMA.

In more complicated cases, it may be necessary to debug your worker while it processes a job on our platform. By authenticating with a remote-enabled API key, and setting environment variables and launch arguments in your IDE, it is possible to debug a worker in remote mode; you can then step through your code, line by line, and watch the stack as it executes. For more information, see Using ProCon from CLI.

Note that, in order to ensure that your worker receives a particular job, it may be necessary to scale down deployment of workers implementing the same function; alternatively, you may wish to register a new pre-release version for testing.