Published on

Utilising the Outbox Pattern with Typescript

The outbox pattern is a powerful way of handling transactions in a distributed system. In this post I'll go over an example problem and then how we can solve it by utilizing the outbox pattern.

Note: While the examples in this post are in typescript and postgres, the pattern is not langauge or technology specific.

Example overview

Let us imagine we are running a logistics company with a monolithic node backend connected to a postgres database.

Processing a shipping request

Our main flow for this system consists of:

  1. User payment is processed
  2. Shipping infomation is processed

The business also does not want to end up in a situation where a payment is successfully stored but the shipping info is not as this will result in angry customers.

Original Example

async function processOrder(paymentInfo: PaymentInfo, shippingInfo: ShippingInfo) {
  // acquire a persistent connection from the database
  const connection = await acquireConnection();
  try {
    // start a transaction so that the operations can be atomic
    await connection.query('BEGIN')
    // pass the connection through so the functions utilize the same transaction
    await processPayment(paymentInfo, connection);
    await lodgeShipment(shippingInfo, connection);
    await connection.query('COMMIT')
  catch (e) {
    await connection.query('ROLLBACK');
  } finally {
    connection.release();
  }
}

As shown above, this is straightforward when storing information into the same database as we can lean on traditional transactions.

Splitting the monolith

Fast forward and we've grown and so we need to rethink our tech strategy. Our architects have informed us that we need to "split the monolith" which means that the service that deals with shipping will no longer handle payments.

Instead we will make an api call from our payment service to the shipping service so it can handle the shipping detail. Upon doing this we notice a problem which is illustrated below.

// Store the data to the database
await processPayment(paymentInfo, connection)
// Make an api call
await sendDataToShippingService(shippingInfo)

As we are making a call to a different service we can't be sure that the call to shipping service will be successful (If the call times out was it succesfully recorded or not?). There is no retry logic with this approach and api calls within a transaction like this can add a lot of latency to our main user flow as well as cause issues when we try to horizontally scale.

diagram showing the payment service storing data in the database and making an api call to the shipping service

Enter the Outbox Pattern

Instead of making an api call we will add a seperate row to an "outbox" table within the Payment database, this indicates that this information needs to be handled by a seperate process.

diagram showing the payment service storing data in the database and in the outbox and then a process manager reading the outbox and sending to the shipping service

Outbox Pattern Implementation

// shipment-service application
async function processOrder(paymentInfo: PaymentInfo, shippingInfo: ShippingInfo) {
  // acquire a persistent connection from the database
  const connection = await acquireConnection()
  try {
    // start a transaction so that the operations can be atomic
    await connection.query('BEGIN')
    // pass the connection through so the functions utilize the same transaction
    await processPayment(paymentInfo, connection)
    // store the shipment information in the outbox table to be processed later
    await storeShipmentToOutbox(shippingInfo, connection)
    await connection.query('COMMIT')
  } catch (e) {
    await connection.query('ROLLBACK')
  } finally {
    connection.release()
  }
}

The service code actually looks almost identical to our original solution, but under the hood we are storing in an outbox table rather than a shipment table. From a technical point of view you may view these as the same thing however there is a big difference which is that the shipment service is the source of truth for all thing relating to the shipments domain. The purpose of the outbox is just to be temporary storage while the order is being processed.

Process Managers

Process managers are not the focus of this post so just know that there is a lot to research about them if you haven't encountered them already (I may write a post on them in the future as the online resources are pretty thin). For our use-case we simply need an application that can:

  1. Read shipment details from the outbox
  2. Send this info to the shipment service
// process-manager application
async function processOutbox(dbconnection: PoolClient) {
  // Read from latest bookmark
  const connection = await acquireConnection()
  try {
    // start a transaction so that the operations can be atomic
    await connection.query('BEGIN')
    // get the latest bookmark of the table
    const bookmark = await getBookmarkForOutbox(connection)
    // get the shipment information from the outbox table
    const shipmentInfo = await getShipmentInformationFromOutbox(connection, bookmark)
    // send via an api call to the shipment service
    await sendShipmentInfo(shipmentInfo)
    // increment the bookmark by 1
    await updateBookmark(bookmark + 1)
    await connection.query('COMMIT')
  } catch (e) {
    await connection.query('ROLLBACK')
  } finally {
    connection.release()
  }
}

In the above approach I've utilized a very naive bookmark strategy where we store the latest processed row in a bookmark table and increment by 1 after processing. Unlike our payment service example, we would run this continuously in a single standalone node application. This helps to negate some of the drawbacks of having an open transaction during an api call, as this flow is not part of our main user experience and does not put as much pressure on the database since there is only one instnace running at a time.

Here is an overview of a robust way to run functions like this one.

Advantages

  1. We have succesfully create a distributed way of dealing with our orders that has created a seperation of concerns.
  2. The system can still function if the shipment service is down.
  3. We've created a way that we can throttle requests to the shipment service during peak loads (by limiting the speed of the process manager).
  4. We only process the shipment information if the payment is successfully stored.

Disadvantages

  1. We have created extra complexity by introducing a new process into our system.
  2. There is now a delay between payment and shipment information being stored, this is something that you will need to discuss with stakeholders in order to determine what it reasonable for your system.
  3. The api call in the process manager could still fail, unlike in the original example however, the process will be retried (this also adds a constraint of idempotency to that api endpoint).

Conclusion

Distributed transaction can be tough, in this post I've outlined a potential way to deal with them that I hope is useful to some people. There is no silver bullet so here are some similar solutions and resources that you can learn more from.

If you want a deeper dive I highly recommend Designing Data Intensive Applications by Martin Kleppmann