Enhance Outbox: Add Fan-In/Join Support Discussion
Introduction
This document proposes extending the outbox framework to include fan-in / join semantics, enhancing its existing fan-out capabilities. The primary goal is to enable the execution of a "join" step after a set of independent outbox messages (often initiated via fan-out) have completed or failed based on predefined rules. This enhancement should operate without requiring the caller to be blocked or tied to a specific hosting model. Whether messages originate from fan-out processes or are enqueued individually, the system should seamlessly support this new functionality.
High-Level Requirements
1. Join Concept
At the core of this extension is the introduction of a first-class "join" (or "job") concept. This join represents a group of related outbox messages that need to be tracked collectively. The join concept is essential for coordinating and managing the execution flow of dependent tasks. Each join should maintain the following attributes:
- A stable JoinId: This unique identifier ensures that the join can be consistently referenced and tracked throughout its lifecycle. The stability of the JoinId is crucial for maintaining data integrity and facilitating reliable tracking.
- Total expected steps: This attribute specifies the total number of steps or messages that the join is waiting for. It serves as a benchmark for determining when all associated tasks have been completed.
- Completed steps: This attribute tracks the number of steps or messages that have been successfully processed. It provides a real-time indication of the join's progress.
- Failed steps: This attribute records the number of steps or messages that have encountered errors or failed during processing. Monitoring failed steps is important for identifying potential issues and triggering appropriate error handling mechanisms.
- Status: This attribute indicates the current state of the join, which can be one of the following: Pending, Completed, Failed, Cancelled, etc. The status provides a high-level overview of the join's lifecycle stage.
It is imperative that the join lifecycle is idempotent and resilient to retries. This means that regardless of how many times a particular operation is executed, the end result remains the same, ensuring data consistency and reliability. The system should be designed to handle retries gracefully without causing unintended side effects or data corruption.
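As an illustration of the attributes above and of a retry-safe status transition, here is a minimal in-memory sketch. The names `JoinState`, `JoinStatus`, and `resolve` are hypothetical, and the rule that any failed step fails the whole join is an assumption; the proposal leaves the exact rules configurable.

```python
from dataclasses import dataclass
from enum import Enum
import uuid


class JoinStatus(Enum):
    PENDING = "Pending"
    COMPLETED = "Completed"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


@dataclass
class JoinState:
    join_id: uuid.UUID
    expected_steps: int
    completed_steps: int = 0
    failed_steps: int = 0
    status: JoinStatus = JoinStatus.PENDING

    @property
    def is_finished(self) -> bool:
        # Every expected step has reported either success or failure.
        return self.completed_steps + self.failed_steps >= self.expected_steps

    def resolve(self) -> JoinStatus:
        # Terminal states are sticky, so a retried call changes nothing.
        if self.status is not JoinStatus.PENDING or not self.is_finished:
            return self.status
        # Assumed rule: any failed step fails the whole join.
        self.status = JoinStatus.FAILED if self.failed_steps else JoinStatus.COMPLETED
        return self.status
```

Calling `resolve()` any number of times after the join has reached a terminal state returns the same status, which is the property a retried handler relies on.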
2. Attach Outbox Messages to a Join
It must be possible to associate one or more outbox messages with one or more joins. A critical requirement is that we must not assume a 1:1 relationship between an outbox message and a join. This flexibility is necessary because:
- A single outbox message may participate in multiple joins: For example, multiple downstream workflows may be waiting on the same work item. Allowing a single message to be part of multiple joins enables complex orchestration scenarios and promotes reusability.
The association between outbox messages and joins should be managed through the outbox abstraction, rather than by directly manipulating tables. This approach ensures that the relationships are maintained consistently and that any necessary validations or business rules are enforced. By using the outbox abstraction, we can encapsulate the complexity of managing these associations and provide a clean, intuitive interface for developers.
3. Fan-In Handler / Orchestration
We need a standardized mechanism for enqueuing a "fan-in" message that waits on a given join. The fan-in handler should perform the following steps:
- Load the join state: Retrieve the current status and details of the join from the data store. This information is essential for determining whether all required steps have been completed.
- If not all steps are finished yet, Abandon the message: If there are still pending steps, the handler should abandon the message, causing it to be retried later. This ensures that the handler does not proceed prematurely and that all dependencies are met before further processing.
- Once all steps are finished, mark the join as Completed/Failed: After verifying that all steps have been completed (or failed according to the configured rules), the handler should update the join's status accordingly. This reflects the final state of the join and triggers any subsequent actions.
- Enqueue follow-up work: Initiate any configured follow-up tasks, such as starting an ETL transform. This allows the fan-in handler to act as a trigger for downstream processes, enabling seamless integration with other parts of the system.
This approach should leverage existing outbox semantics (retry/backoff, Ack/Abandon) instead of relying on a separate scheduler. By utilizing the existing outbox infrastructure, we can minimize the complexity of the implementation and ensure consistency with other outbox-based processes.
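The handler steps above could be sketched roughly as follows. This is not the framework's API: `Join`, `handle_join_wait`, the `enqueue` callback, the `on_complete` list, and the `"ack"`/`"abandon"` return values are all hypothetical stand-ins for whatever the outbox abstraction exposes. The sketch also assumes that follow-up work is only enqueued when the join completes successfully.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Join:
    join_id: str
    expected_steps: int
    completed_steps: int = 0
    failed_steps: int = 0
    status: str = "Pending"
    # Hypothetical list of follow-up topics to enqueue on success.
    on_complete: List[str] = field(default_factory=list)


def handle_join_wait(joins: Dict[str, Join], join_id: str,
                     enqueue: Callable[[str, str], None],
                     fail_if_any_failed: bool = True) -> str:
    """Return "ack" once the join is settled, "abandon" to be retried later."""
    join = joins[join_id]  # 1. load the join state
    if join.status != "Pending":
        return "ack"  # already finalized by an earlier delivery
    if join.completed_steps + join.failed_steps < join.expected_steps:
        return "abandon"  # 2. steps outstanding; outbox backoff redelivers
    failed = fail_if_any_failed and join.failed_steps > 0
    join.status = "Failed" if failed else "Completed"  # 3. finalize the join
    if not failed:
        for topic in join.on_complete:  # 4. enqueue follow-up work
            enqueue(topic, join_id)
    return "ack"
```

Because the handler only reports a decision, the retry timing stays with the existing outbox retry/backoff policy and no separate scheduler is involved. A real implementation would need the status update and the follow-up enqueue to happen atomically.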
4. Backwards Compatibility
It is essential that existing outbox usage continues to function without any modifications. The introduction of fan-in / join support should not break or alter the behavior of existing applications that rely on the outbox framework. Furthermore, existing outbox tables and APIs should remain valid. The join support can be added as an optional extension, allowing developers to adopt it incrementally without disrupting their existing workflows.
Proposed Design (Initial Sketch)
1. Schema Changes
To support the fan-in / join functionality, we propose introducing the following new tables:
OutboxJoin:
- JoinId (GUID or bigint; primary key): A unique identifier for the join.
- PayeWaiveTenantId
- ExpectedSteps (int): The total number of expected steps in the join.
- CompletedSteps (int): The number of completed steps.
- FailedSteps (int): The number of failed steps.
- Status (tinyint or smallint): The current status of the join (Pending, Completed, Failed, etc.).
- CreatedUtc
- LastUpdatedUtc
- Metadata (JSON or similar, optional): Additional information about the join, such as its type or description.

OutboxJoinMember:
- JoinId (FK → OutboxJoin): A foreign key referencing the OutboxJoin table.
- OutboxMessageId (FK → Outbox): A foreign key referencing the Outbox table.
- Composite PK on (JoinId, OutboxMessageId): A composite primary key consisting of the JoinId and OutboxMessageId.
This schema design enables:
- Tracking multiple messages within a single join.
- Referencing a single outbox message by multiple joins.
Note: This design avoids adding a JoinId column to the Outbox table, while still allowing queries for "all messages for a join" and "all joins for a message". The OutboxJoinMember table acts as a many-to-many link between joins and outbox messages.
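To make the many-to-many link concrete, the following sketch builds a cut-down version of the tables in an in-memory SQLite database and runs both lookup directions. It is only an approximation under stated assumptions: SQLite stands in for the real database, the column types are simplified to integers, the `Outbox` table is reduced to its key, the tenant, timestamp, and metadata columns are omitted, and the helper names (`open_demo_db`, `messages_for_join`, `joins_for_message`) are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Outbox (
    OutboxMessageId INTEGER PRIMARY KEY   -- stand-in for the existing outbox table
);
CREATE TABLE OutboxJoin (
    JoinId         INTEGER PRIMARY KEY,
    ExpectedSteps  INTEGER NOT NULL,
    CompletedSteps INTEGER NOT NULL DEFAULT 0,
    FailedSteps    INTEGER NOT NULL DEFAULT 0,
    Status         INTEGER NOT NULL DEFAULT 0   -- assumed encoding: 0 = Pending
);
CREATE TABLE OutboxJoinMember (
    JoinId          INTEGER NOT NULL REFERENCES OutboxJoin(JoinId),
    OutboxMessageId INTEGER NOT NULL REFERENCES Outbox(OutboxMessageId),
    PRIMARY KEY (JoinId, OutboxMessageId)
);
"""


def open_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Outbox VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO OutboxJoin (JoinId, ExpectedSteps) VALUES (?, ?)",
        [(10, 2), (20, 1)],
    )
    # Message 1 belongs to both joins; message 2 only to join 10.
    conn.executemany(
        "INSERT INTO OutboxJoinMember VALUES (?, ?)",
        [(10, 1), (10, 2), (20, 1)],
    )
    return conn


def messages_for_join(conn: sqlite3.Connection, join_id: int) -> list:
    rows = conn.execute(
        "SELECT OutboxMessageId FROM OutboxJoinMember "
        "WHERE JoinId = ? ORDER BY OutboxMessageId",
        (join_id,),
    )
    return [row[0] for row in rows]


def joins_for_message(conn: sqlite3.Connection, message_id: int) -> list:
    rows = conn.execute(
        "SELECT JoinId FROM OutboxJoinMember "
        "WHERE OutboxMessageId = ? ORDER BY JoinId",
        (message_id,),
    )
    return [row[0] for row in rows]
```

Both queries touch only the link table, which is why the `Outbox` table needs no `JoinId` column.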
2. Outbox API Extensions
We propose extending the outbox abstraction with the following methods:
- StartJoinAsync(JoinDescriptor descriptor):
  - Creates an OutboxJoin row with ExpectedSteps and metadata.
- AttachMessageToJoinAsync(JoinId joinId, OutboxMessageId messageId):
  - Inserts OutboxJoinMember row(s) to associate an outbox message with a join.
- Helper methods (optional):
  - A method to start a join, enqueue N messages, and attach them in a single call.
  - A method to increment the completed/failed counters when a step finishes successfully/unsuccessfully.
Handlers that participate in a join should be able to:
- Call a helper method (e.g., ReportStepCompletedAsync(joinId, messageId)) to update the CompletedSteps (and/or FailedSteps) in the OutboxJoin table. This method should be idempotent to handle potential duplicate calls.
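One way to make step reporting idempotent is to remember which members have already reported, so a duplicate call leaves the counters alone. The sketch below is in-memory only and rests on an assumption: the per-member `outcomes` record is not part of the proposed schema, and `JoinCounters` and `report_step` are hypothetical names, not the proposed API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class JoinCounters:
    expected_steps: int
    completed_steps: int = 0
    failed_steps: int = 0
    # Hypothetical per-member outcome record; not in the proposed schema.
    outcomes: Dict[str, str] = field(default_factory=dict)


def report_step(join: JoinCounters, message_id: str, succeeded: bool) -> bool:
    """Record a step outcome once; return False for a duplicate report."""
    if message_id in join.outcomes:
        return False  # duplicate delivery or retry: counters stay untouched
    join.outcomes[message_id] = "completed" if succeeded else "failed"
    if succeeded:
        join.completed_steps += 1
    else:
        join.failed_steps += 1
    return True
```

In a database-backed version the same guard would need to run in one transaction with the counter update, for example through a state column on the member row, which again is an assumption and not something the proposal specifies.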
3. Fan-In Handler
Define a standard topic/handler pattern, such as:
- Topic: join.wait (or similar), with a payload containing:
  - JoinId: The ID of the join to wait for.
  - Optional behavior flags (e.g., "fail if any step failed", "ignore failed steps").
Handler behavior:
- Load the OutboxJoin for the given JoinId.
- If the join is already Completed/Failed, Ack and exit (idempotent).
- If CompletedSteps + FailedSteps < ExpectedSteps, Abandon the message to retry it later according to the standard outbox backoff policy.
- If all steps are finished:
  - Set the join Status appropriately.
  - Enqueue any configured "on-complete" messages (e.g., `