Intended audience: Senior, staff, and principal engineers building trading infrastructure, low-latency systems, or distributed deterministic platforms; quants and execution-strategy engineers who own SOR or routing code in production; trading-system architects evaluating whether their platform’s execution layer is actually doing the work an SOR claims to do. Working familiarity with the JVM, FIX, electronic markets, and the NitroJEx architecture from the prior article is assumed.
Reference implementation: NitroJEx V14 introduces SmartOrderRoutingExecution as a first-class execution strategy on top of the existing parent-intent / execution-engine layer. The full source — specifications, implementation plans, migration notes, release-evidence bundles, and platform code — lives at github.com/rueishi/nitroj-exchange.
A buy intent arrives. Quantity: 12.5 BTC. Limit price: $42,150. Coinbase shows 7 BTC offered at $42,148. Binance shows 4 BTC at $42,150 with a 5 bps maker rebate. Kraken shows 5 BTC at $42,151, but its FIX session has been flapping for the last hour. OKX shows 3 BTC at $42,149, but post-fill mark-out on its recent children has been negative. Where does the order go, in what size, in what sequence? When the first child fills only partially and the rest of the book moves through your limit, what happens next? When a venue drops mid-route?
A Smart Order Router is the layer that answers these questions, in microseconds, deterministically, on every parent intent the trading strategy emits. The hard part is not the algorithm. The hard part is making the algorithm correct, fast, and replay-stable while every input it consumes is incomplete, noisy, or stale by the time it acts on it.
Why this article exists
The previous article ended on a deliberate boundary. It described the prior architecture honestly: a deterministic, single-writer cluster with a clean hot/cold split, four orthogonal plugin axes (protocol, venue, trading strategy, execution strategy), market-making and arbitrage trading strategies, three execution strategies (ImmediateLimitExecution, PostOnlyQuoteExecution, MultiLegContingentExecution), and an explicit roadmap for everything else, including TWAP, VWAP, POV, iceberg, and Smart Order Routing.
V14 closes one item on that roadmap. It does not close all of them, and it deliberately does not try. SOR is large enough as a problem domain to deserve its own release line, its own evidence bundle, its own benchmark surface, and its own architectural treatment. Treating it as “one more execution strategy slotted in next to the existing three” understates what changes when you introduce a router that actively decides which venues see which slices of a parent intent.
There is also a more honest reason. Smart Order Routing is among the most over-marketed and under-explained layers of a trading stack. Vendors describe their SOR as “AI-driven”, “best-execution-aware”, “latency-optimized”, or “adaptive” without distinguishing between what the algorithm actually does on the hot path, what the cold-path machinery contributes to it, and what the system literally cannot do under its current architecture. The result is a category of software where buyers assume capabilities the implementations do not have, and where engineers building competing systems inherit confused mental models from the marketing copy. This article tries to fix that for the readership most likely to care.
The structure mirrors the previous article. We start by being precise about what an SOR is and is not. We walk through the trade lifecycle and business model around it. We unpack the algorithm in enough detail that the implementation choices become visible. We integrate that algorithm into the existing NitroJEx substrate without violating the established hot-path, deterministic, replay-correct, evidence-gated discipline. We close with what V14 explicitly does not do, and what the evidence behind the V14 release actually proves.
Recap: NitroJEx invariants this article assumes
The previous article established the architectural substrate the SOR plugs into. The relevant invariants, in compressed form:
- Single-writer deterministic cluster. All cluster-side state evolves on one thread, in deterministic order. Same input stream, same output stream, byte-identical.
- Hot/cold path split. Hot paths are allocation-free, bounded-latency, and benchmark-gated. Cold paths are allowed to allocate, fail, retry, and recompute.
- Four orthogonal plugin axes. Protocol, venue, trading strategy, and execution strategy extend independently. Each new feature touches exactly one.
- Three state layers. Public market state, own-order state, and parent-intent state each have one authoritative source. Inter-layer attribution is a primitive field, never an auxiliary map.
- Self-liquidity awareness. The platform routes against ExternalLiquidityView (gross consolidated depth minus the firm’s own resting orders), not the raw consolidated book.
- Three-layer evidence discipline. ArchUnit catches forbidden patterns at build time; JMH benchmarks prove zero-allocation and latency budgets at test time; the preflight gate composes both into a single release-blocking check.
If any of these are unfamiliar, the previous article covers them in depth. Everything below assumes them.
1. What a Smart Order Router actually is
A Smart Order Router is a software component that takes a single order — a parent order expressing intent like “buy 12.5 BTC at no worse than $42,150” — and decides, in real time, how to break that order into one or more child orders directed at specific trading venues, at specific prices, at specific sizes, in a specific sequence, and re-decides those choices as the market changes and as fills, partial fills, and rejects come back from the venues.
The definition is short and the implications are not. Several things are deliberately bundled into “decides”:
- Which venues to trade on. A serious SOR works against a configured universe of venues, not all of them at once. Venues differ in price, depth, fees, latency, fill probability, reliability, and market structure. The router selects subsets of that universe per parent order, and the selection itself is part of the routing decision.
- What size to send to each venue. Parent quantity must be allocated across venues in a way that respects executable depth (not displayed depth), fee/rebate structure, expected fill probability, and the firm’s appetite for partial fills versus market impact.
- What price and order type to use at each venue. A child order is not just a price and size; it is a price, size, time-in-force, post-only flag, hidden flag, minimum-fill flag, self-trade-prevention flag, and several venue-specific extensions. The router chooses these per venue, per leg.
- What to do when a leg doesn’t behave. A child order can be rejected, partially filled, queued behind a moving market, or filled at a price that has already moved away. The router must observe each of these outcomes and reroute the unfilled remainder, within latency budgets that are tight enough that the reroute has to be deterministic and bounded.
- When to stop. Every parent order has terminal conditions: completed, expired, canceled by the trading strategy, killed by risk, killed by the kill switch. The router owns the lifecycle from intent to terminal state.
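The decision surface those five bullets describe can be made concrete with a minimal sketch. The names below are hypothetical illustrations, not the NitroJEx V14 API; the point is that a routing decision is a structured object — venues, prices, sizes, order types, and an explicit unrouted remainder — not a single venue ID.

```java
// Illustrative sketch of the decision surface a router owns.
// All names here are hypothetical, not the NitroJEx V14 API.
import java.util.List;

public final class RoutingSketch {

    public enum TimeInForce { IOC, GTC, FOK }

    // One venue-directed slice of a parent intent.
    public record ChildOrder(
            int venueId,
            long priceTicks,   // limit price in integer ticks
            long qtyLots,      // size in integer lots
            TimeInForce tif,
            boolean postOnly) {}

    // The router's output: which venues, what sizes, in what sequence,
    // plus the quantity it deliberately chose not to route yet.
    public record RoutingDecision(List<ChildOrder> children, long unroutedLots) {}

    // A trivial single-venue decision, to make the shape concrete.
    public static RoutingDecision routeAllToOneVenue(int venueId, long priceTicks, long qtyLots) {
        return new RoutingDecision(
                List.of(new ChildOrder(venueId, priceTicks, qtyLots, TimeInForce.IOC, false)),
                0L);
    }
}
```

Everything the bullets above add — reroute on partial fill, terminal conditions, venue-specific flags — is machinery that produces and re-produces objects of roughly this shape as events arrive.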
Two boundaries are worth drawing immediately, because the rest of the article depends on them.
An SOR is not an execution algorithm in the TWAP/VWAP/POV sense. Time-weighted, volume-weighted, and percent-of-volume algorithms answer the question when to trade. SOR answers the question where and how much to trade, given a decision to trade now. The two layers compose: a TWAP algorithm slices a parent into time-bucketed child intents, and an SOR routes each time bucket’s child intent across venues. The architectural mistake is to conflate them by writing a “TWAP-with-routing” class that owns both the time-slicing and the venue allocation. That class is doing two jobs and cannot be replaced cleanly by any single component when the requirements for either job evolve. The clean architecture treats them as orthogonal: time-based execution algorithms emit per-bucket child intents to the SOR; the SOR routes them.
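The composition can be sketched in a few lines. This is a hypothetical illustration of the layering, not platform code: the time-slicer owns bucketing, and hands each bucket to a router it knows only through an interface.

```java
// Hypothetical sketch of the TWAP / SOR composition described above.
// TWAP decides *when* (per time bucket); the router decides *where*.
public final class TwapComposition {

    @FunctionalInterface
    public interface Router {            // stands in for the SOR layer
        void route(long bucketQtyLots);  // routes one bucket's child intent
    }

    // Slice parentQty into nBuckets roughly equal child intents and hand
    // each bucket to the router. The rounding remainder goes to the last
    // bucket so quantity is conserved.
    public static long[] slice(long parentQtyLots, int nBuckets, Router router) {
        long[] buckets = new long[nBuckets];
        long base = parentQtyLots / nBuckets;
        for (int i = 0; i < nBuckets; i++) {
            buckets[i] = (i == nBuckets - 1) ? parentQtyLots - base * (nBuckets - 1) : base;
            router.route(buckets[i]);
        }
        return buckets;
    }
}
```

Because the slicer depends only on the interface, either side can be replaced independently — which is exactly the property the TWAP-with-routing class gives up.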
An SOR is not a trading strategy. A trading strategy decides whether and how much to trade. It looks at fair value, inventory, signals, hedging needs, arbitrage edges, and risk capacity. Its output is a parent intent: a declarative statement of the position desired, not an instruction about which venue to send it to. The SOR consumes parent intents and converts them to venue-specific child orders. This separation is the existing trading-strategy / execution-strategy split applied honestly. The same trading strategy can run with different execution strategies (and therefore different routing behaviors) by configuration, and the same SOR can serve different trading strategies without modification.
Key Principle: Smart Order Routing is execution, not strategy and not scheduling. It decides where and how a parent order is worked across venues, given that the trading strategy already decided what to trade and the execution algorithm already decided when to trade.
Everything that follows is a consequence of this definition.
2. Why SOR exists: market fragmentation and the cost of ignoring it
The institutional case for SOR exists because real markets are fragmented. The same instrument trades on multiple venues simultaneously, and at any given microsecond, those venues do not display the same price, the same depth, or the same fee structure. A buy order routed naively to a single venue is, with overwhelming probability, leaving execution quality on the table.
In US equities the fragmentation is structural: roughly sixteen registered exchanges, more than thirty alternative trading systems and dark pools, two large internalizers (Citadel Securities and Virtu Financial) that handle the bulk of retail flow, and a regulatory regime — Reg NMS, Rule 611 — that legally requires brokers to honor the best displayed price across all venues. Reg NMS did not create SOR; it codified what good brokers were already doing and required everyone else to catch up. MiFID II did the equivalent in Europe a decade later, with explicit best-execution obligations and post-trade transaction-cost-analysis requirements that gave routing decisions a paper trail.
In cryptocurrency spot markets the fragmentation is different but more severe. There is no Reg NMS, no consolidated tape, no obligation on any venue to honor any other venue’s price, and no protocol-level discipline that prevents two venues’ books from diverging by tens of basis points for sustained periods. Bitcoin trades simultaneously on Coinbase, Binance, Kraken, OKX, Bitstamp, Gemini, Bitfinex, and dozens of smaller venues, with material differences in price, depth, and fee structure at every moment. Ethereum, Solana, and the long tail of altcoins are even more fragmented because liquidity is thinner per venue and the marginal venue often has 5-15% of the total depth.
The economic consequence of fragmentation is straightforward. If the market for an instrument is fragmented across N venues, and the best price is not always on the same venue, then routing all your flow to a single venue costs you the difference between that venue’s average execution quality and the achievable execution quality across the consolidated market. For a high-frequency market maker quoting on a single venue, this manifests as adverse selection: faster competitors trade against your stale quote because they see the broader market and you don’t. For a buy-side trader executing a 1,000 BTC parent order, it manifests as slippage: the venue you chose runs out of liquidity at attractive prices well before the parent fills, while liquidity on other venues sits unused.
The case for SOR is the case for taking the consolidated market view seriously when you trade. Without an SOR, your trading system reasons about a single venue’s book and pays the cost of every venue you didn’t see. With an SOR, your trading system reasons about external executable liquidity across the venues it is connected to, and routes flow to wherever that liquidity actually lives.
One subtlety deserves emphasis here, because it is the same trap that defeats the naive arbitrage strategy in the previous article: your own quotes are part of the market data you are observing. An SOR that routes against gross consolidated depth, including its own resting orders, will route flow that crosses its own bids and asks. The previous article called this the own-liquidity trap and required arbitrage to use ExternalLiquidityView rather than the raw consolidated book. An SOR has the same requirement, doubled, because it makes the routing decision more often and with more aggressive child orders.
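The subtraction itself is simple; what matters is that it happens at all, and at every level of the ladder. The sketch below is illustrative — the class and method names are invented here, and only the concept (external depth = gross displayed depth minus own resting size at that level) comes from the platform's design.

```java
// Sketch of the self-liquidity subtraction behind ExternalLiquidityView.
// Names are hypothetical; the concept is: depth executable by the firm
// excludes the firm's own resting orders.
public final class ExternalDepthSketch {

    // Externally executable depth at one price level.
    public static long externalDepth(long grossDepthLots, long ownRestingLots) {
        long external = grossDepthLots - ownRestingLots;
        return Math.max(external, 0L); // never negative, even if the own-order view lags
    }

    // Apply the subtraction across a ladder of levels.
    public static long[] externalLadder(long[] gross, long[] own) {
        long[] out = new long[gross.length];
        for (int i = 0; i < gross.length; i++) {
            out[i] = externalDepth(gross[i], own[i]);
        }
        return out;
    }
}
```

The clamp to zero matters: during cancel/replace races the own-order view can momentarily exceed the gross book at a level, and a negative "external depth" propagated into scoring is worse than a conservative zero.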
3. A short history that explains why SOR looks the way it does
The shape of modern SOR is not arbitrary. It is the product of three specific historical pressures, and the design choices a current implementation makes are inherited from the way those pressures resolved, either directly or by reaction.
The first was the proliferation of trading venues themselves. Through the 1980s and into the early 1990s, equity trading was dominated by a small number of primary exchanges. By the late 1990s, electronic communication networks (ECNs) and alternative trading systems began to capture meaningful share, and the broker-dealer infrastructure that had assumed a single-venue world had to be rebuilt to handle several. Designated Order Turnaround (DOT) systems gave way to early routing algorithms; the first instances of what would now be recognized as smart order routers appeared in the late 1990s in the US.
The second was regulation. The SEC’s Regulation ATS in 1998 codified the existence of alternative trading systems. Reg NMS in 2005-2007 introduced Rule 611, the Order Protection Rule, which forbade trades through better-priced quotes elsewhere. Brokers were now legally obligated to consider the consolidated market when routing customer orders, and demonstrating compliance required explicit routing logic with explicit logs. MiFID in Europe (2007) and MiFID II (2018) introduced equivalent best-execution obligations with post-trade reporting and transaction-cost-analysis requirements. These regulations did not invent SOR, but they made the absence of SOR a compliance risk, which is what made SOR universal.
The third was the arms race that followed. Once routing was a regulated activity, the marginal value of doing it faster and smarter was enormous. Latency arbitrage — exploiting the time gap between when a quote moves on one venue and when slower routers see it on the consolidated tape — became a major source of profit for HFT firms and a major source of cost for slower routers. The institutional response was a coordinated investment in lower latency: co-located servers, kernel-bypass NICs, FPGA-accelerated FIX parsers, hardware timestamps, microwave links between exchanges, AOT-compiled hot paths. The SOR became one of the lowest-latency-budget components in the trade lifecycle, because every microsecond of routing delay was a microsecond in which the market moved against the order being routed.
In cryptocurrency markets none of this regulatory machinery exists, but the economic pressure does, and it is in some ways more severe. There is no Order Protection Rule in crypto, but there is an active community of latency-arbitrage firms who will pick off any venue’s stale quote within milliseconds. There is no MiFID II transaction-cost reporting in crypto, but there is fierce competition among crypto OTC desks and prime brokers on transparent execution quality, which functions as a market-driven equivalent of best-execution obligations. The infrastructure differs (crypto venues use REST and WebSocket and FIX in inconsistent ways; latency to a Tokyo crypto exchange from a US co-location is dominated by physics, not protocol); the underlying problem and the shape of the solution are similar.
The result is that a serious crypto SOR in 2026 looks, architecturally, much like a serious equities SOR in 2010: multi-venue, low-latency, deterministic, fee-aware, fill-probability-aware, and integrated with risk and execution-report handling end to end. The historical pressures that produced that shape were different; the shape is similar because the problem is similar.
4. The trade lifecycle, with SOR placed honestly
To put SOR in its proper place, it helps to walk through the full lifecycle of an institutional order, from the trading desk forming an intent to the position being reflected in the books, and identify exactly where SOR participates and where it does not.
Consider a portfolio manager at a buy-side firm who has decided to acquire 500 BTC. The lifecycle of that decision, in a fully electronic institutional setup, looks roughly like this:
The decision originates in the investment management system. The PM enters or programmatically generates a target position. Compliance and pre-trade risk run on the proposed order: position limits, concentration limits, regulatory restrictions, counterparty limits. Approved orders flow into the order management system (OMS), which assigns a parent order ID, persists the order, and prepares it for execution.
The OMS hands the order to an execution management system (EMS). The EMS is the layer where the trader, distinct from the PM, chooses how the order will be worked. They might select a TWAP over the next four hours, an implementation-shortfall algorithm with a 20% urgency setting, a VWAP for the day, or a manual approach where they discretionarily release child orders to the SOR. The execution algorithm slices the parent into child intents: smaller orders, each with its own size, its own price constraint, and its own time horizon.
Each child intent flows from the execution algorithm to the Smart Order Router. This is where SOR lives. The SOR takes the child intent and routes it across venues, producing one or more grandchild orders: venue-directed child orders with venue-specific price, size, type, and time-in-force.
Grandchild orders flow through pre-trade risk one more time at the execution layer (per-venue, per-instrument, per-strategy, aggregate). Approved orders are encoded into the venue’s wire format (FIX, in NitroJEx’s case) and submitted by the gateway.
The venue responds with execution reports: acknowledgments, fills, partial fills, rejects, and cancels. Each execution report flows back through the gateway, through normalization, into the cluster, and is applied to the relevant order state. Fills update the portfolio engine; partial fills produce a residual that the SOR observes and reroutes. Rejects produce a retry decision that the SOR makes within its latency budget. The SOR’s parent state aggregates fills across all the grandchildren until the parent is terminal.
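The fill-aggregation arithmetic in that paragraph is where most routing bugs live, so it is worth making explicit. The sketch below is a hypothetical illustration of parent-level accounting, not V14 code; the invariant it encodes — filled plus open plus reroutable residual always equals parent quantity — is the conservation law the comparison table later in this article refers to.

```java
// Hypothetical sketch of parent-level fill accounting across children.
// Invariant: filled + open + reroutable residual == parent quantity.
public final class ParentAccounting {

    private final long parentQtyLots;
    private long filledLots;
    private long openChildLots; // routed to a venue, not yet filled or canceled

    public ParentAccounting(long parentQtyLots) {
        this.parentQtyLots = parentQtyLots;
    }

    public void onChildSubmitted(long lots)      { openChildLots += lots; }

    public void onPartialFill(long lots) {
        filledLots += lots;
        openChildLots -= lots;
    }

    public void onChildCanceled(long remainingLots) { openChildLots -= remainingLots; }

    // What the router is allowed to reroute right now.
    public long reroutableResidual() {
        return parentQtyLots - filledLots - openChildLots;
    }

    public boolean conservationHolds() {
        return filledLots + openChildLots + reroutableResidual() == parentQtyLots;
    }
}
```

A router that reroutes more than `reroutableResidual()` overfills the parent; one that forgets a canceled child's remainder underfills it. Checking the conservation law before each child submission turns both bug classes into immediate, attributable failures.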
Finally, the post-trade layer takes over: confirmations to counterparties, settlement instructions, allocation of fills across sub-accounts, position reconciliation, and transaction-cost analysis (TCA). TCA compares the realized execution to benchmarks (arrival price, VWAP, implementation shortfall) and produces metrics that flow back into the SOR’s calibration cold path: venue scores, fill probability estimates, fee/rebate tables, and latency observations.
In NitroJEx’s vocabulary, the trading strategy is the integrated decision-making layer that decides whether to trade (in the buy-side narrative above, this is roughly the PM + risk + execution algorithm + trader collapsed into one programmatic layer, because NitroJEx is a proprietary trading platform, not an agency broker). The execution strategy is the layer that owns the parent intent through to terminal state. V14 introduces SmartOrderRoutingExecution as a new execution strategy that does for multi-venue routing what MultiLegContingentExecution already does for two-leg contingent execution: own the parent lifecycle, own the child encoding, own the cancel/reroute decisions, own the partial-fill accounting, own the kill-switch escalation paths, and respect every architectural rule the existing platform imposes.
The architectural lesson generalizes: SOR sits between the execution algorithm and the gateway, and it is not the same component as either of them. Conflating SOR with execution algorithms produces the TWAP-with-routing class that cannot be replaced cleanly. Conflating SOR with the gateway produces the venue-specific routing class that cannot serve other venues without rewriting. The clean placement is the one V14 takes: a first-class execution strategy with the same plugin contract as the other three, plugged in by configuration, swappable per-strategy, observable through the same parent state machine.
5. The business model, the economic structure, and who pays for SOR
A frequent omission in technical writing on SOR is the economic structure that surrounds it. The architecture changes meaningfully depending on who pays for the routing, who benefits from it, and what the legal and contractual obligations look like. Several models exist; NitroJEx V14 sits squarely in one of them, and being honest about that affects the design choices.
The agency-broker model. A buy-side client routes orders through a broker-dealer that owes them best-execution obligations. The broker’s SOR is a service offered as part of the trading platform; revenue comes from commissions, sometimes from rebates the broker captures from venues, sometimes from payment-for-order-flow arrangements with internalizers. The broker’s SOR is optimized to deliver execution quality for the client measured against TCA benchmarks, because clients rotate to brokers who measurably do better. Architectural pressure: detailed audit trails, regulator-friendly logging, demonstrable fee transparency, configurable per-client routing rules, post-trade reporting infrastructure.
The internalizer / wholesaler model. A market-making firm (Citadel Securities, Virtu, the major prime brokers) takes retail and institutional flow, internalizes some of it (matches it against its own book) and routes the rest. Revenue comes from the spread captured on internalized flow plus rebates from venues for routed flow. The “SOR” here is partly a router and partly a matching engine; the routing decision includes “should I take this trade onto my own book first” before it includes “which venue should I send the residual to.”
The proprietary trading model. A firm trades its own capital, not customer orders. There is no client owed best execution; the firm is its own client. The SOR’s objective is purely to maximize the firm’s risk-adjusted PnL on the orders the firm’s strategies generate. There is no PFOF, no commission revenue, no audit obligation to a customer. The optimization function is straightforward: minimize the implementation shortfall against the strategy’s intended fair value, subject to inventory, latency, and fee constraints.
The exchange model. Some venues offer SOR-like routing to their participants, sweeping orders across their family of venues or across linked external venues. The economic structure here is complex (the venue is paid by participants for the service while also competing with the venues it routes to); the architectural pressure is on transparency and conflict-of-interest controls.
NitroJEx sits firmly in the third category. It is proprietary trading infrastructure, not a broker platform, and V14 is designed for that model. The implications are pervasive:
The optimization function in NitroJEx’s SOR is the firm’s own implementation shortfall against the trading strategy’s fair value, weighted for fees, fill probability, and the firm’s own self-liquidity. There is no client whose definition of “best execution” must be respected; the firm is the client. This simplifies some of the harder parts of an institutional SOR (no per-client routing rules, no client audit trails) and complicates others (the firm has no external benchmark to point at when defending its routing decisions; benchmarking is internal and adversarial).
There is no payment-for-order-flow, no rebate-capture incentive distinct from the firm’s own PnL, no internalization of “client” flow. The firm captures rebates because rebates affect realized fees on its own trades; it pays taker fees because taking liquidity is unavoidable on certain paths. Every basis point shows up directly in the firm’s P&L.
The audit obligations are internal: the firm wants reproducible routing decisions for post-mortem analysis, regulator interactions (crypto firms increasingly face venue and money-transmitter scrutiny), and risk-committee defense. NitroJEx’s deterministic-replay discipline gives the firm this for free; the SOR layer benefits from the same property without doing additional work.
The TCA loop is closed-loop within the firm. Realized fill quality, venue performance, and slippage statistics flow back into the SOR’s cold-path calibration without crossing organizational boundaries. The firm can iterate the venue-scoring model far more aggressively than an agency broker can, because there is no external client whose routing behavior must remain stable for compliance reasons.
The architectural lesson is that the right SOR is the SOR for your business model. Many of the design decisions in V14, particularly the absence of per-client rule layers, the absence of PFOF infrastructure, and the close coupling between SOR and the trading strategy’s fair-value model, are direct consequences of NitroJEx’s position as proprietary infrastructure. A firm building an agency-broker SOR would make different decisions. The architecture must follow the economics, not the other way around.
6. The shape of the SOR problem: liquidity is more complicated than it looks
Before designing the algorithm, we should be precise about the shape of the input space. The SOR is given a parent intent and a market view; it must produce a routing decision. What does that market view actually contain, and what does an honest SOR have to reckon with that a naive one ignores?
Displayed depth is not executable depth. The displayed L2 book on a venue says “1.2 BTC offered at $42,150.” The amount you can actually take at $42,150, right now, depends on whether some other faster participant takes it first; on whether the order is a hidden iceberg whose visible portion is much smaller than the total reserve; on whether the order is your own (in which case taking it is a self-trade, not an execution); on whether the venue applies last-look or kill-switch behavior that can yank the quote between your decision and your fill. The number on the screen is the upper bound of executable depth, not its actual value.
Fees and rebates are first-order, not second-order. A venue with the best displayed price but a 30 basis-point taker fee can be worse than a venue with a slightly worse price and a 5 basis-point fee. Maker rebates, when applicable, can flip the sign on routing decisions for passive child orders. A naive SOR that ignores fees and rebates is making decisions on the wrong objective function. NitroJEx’s venue scoring is fee-net throughout: the decision price is the price after applying the venue’s fee/rebate schedule for the relevant order side and order type, not the gross book price.
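The fee-net adjustment is small enough to state exactly. The sketch below is illustrative — the method names and the basis-point convention are assumptions, not the V14 fee-schedule API — but the arithmetic is the one the paragraph describes: for a buy, a taker fee raises the effective price paid, and a maker rebate (a negative fee) lowers it.

```java
// Sketch of the fee-net decision price used for venue comparison.
// Method names and the bps convention are illustrative assumptions.
public final class FeeNetPrice {

    // For a buy: positive feeBps (taker fee) raises the effective price;
    // negative feeBps (maker rebate) lowers it.
    public static double effectiveBuyPrice(double bookPrice, double feeBps) {
        return bookPrice * (1.0 + feeBps / 10_000.0);
    }

    // For a sell, the adjustment runs the other way.
    public static double effectiveSellPrice(double bookPrice, double feeBps) {
        return bookPrice * (1.0 - feeBps / 10_000.0);
    }
}
```

Run the article's own example through it: a venue offering at $42,148 with a 30 bps taker fee is fee-net worse than a venue at $42,150 with a 5 bps fee, even though it wins on the displayed book.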
Fill probability matters more than fill price at the margin. Two venues showing the same price for the same size will not, in general, give the same fill outcome. One has a 95% historical hit rate at that price for a child of that size; the other has 60%. Routing all of the size to the second venue is a strictly worse decision than routing it all to the first, even though they look identical on the book. Modern SORs estimate fill probability empirically and weight venue selection accordingly.
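One simple way to make "estimate fill probability empirically" concrete is a Beta-Bernoulli estimator per bucket. This is a minimal sketch under stated assumptions — the Beta(1,1) prior and the single-bucket scope are illustrative choices, not the V14 model, which (per the comparison table later in this article) buckets by venue, instrument, side, size, and price-distance and refreshes off the hot path.

```java
// Minimal Beta-Bernoulli sketch of empirical fill-probability estimation
// for one bucket. The Beta(1,1) prior is an illustrative assumption.
public final class FillProbEstimator {

    private double alpha = 1.0; // prior + observed fills
    private double beta  = 1.0; // prior + observed misses

    // Record the outcome of one child order in this bucket.
    public void observe(boolean filled) {
        if (filled) alpha += 1.0; else beta += 1.0;
    }

    // Posterior mean of the fill probability for this bucket.
    public double fillProbability() {
        return alpha / (alpha + beta);
    }
}
```

The prior matters operationally: a fresh bucket starts at 0.5 rather than an optimistic 1.0, so a new venue has to earn its share of routed flow instead of receiving it by default.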
Latency to each venue is part of the price. A child order routed to a venue 50 milliseconds away is making a routing decision against the venue’s 50ms-old book. By the time the child arrives, the book has moved. The SOR must account for the per-venue latency budget, including both the network round-trip and the venue’s match-engine processing time, when scoring routing options. A venue with a marginally better price but a much higher latency may be a worse choice.
Adverse selection and venue toxicity are real signals. Some venues have a higher proportion of informed flow than others. A child order that fills at a venue with high adverse-selection toxicity is, statistically, filling at a price that has just moved against the firm. Venue toxicity is observable through TCA on realized fills (the post-fill mark-out: how much did the price move against you in the next 50ms / 500ms / 5s after your fill?), and a serious SOR penalizes routing to toxic venues even when their displayed prices are best.
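The mark-out measurement is worth writing down, because the sign convention is easy to get backwards. The sketch below is illustrative — the horizon handling and naming are assumptions — but the definition matches the paragraph: for a buy fill, a mid that moves below your fill price afterward is a negative mark-out, i.e. adverse selection.

```java
// Sketch of the post-fill mark-out signal used to score venue toxicity.
// Sign convention: positive = market moved in your favor after the fill;
// negative = adverse selection. Horizon selection is the caller's job.
public final class MarkOut {

    public static double buyMarkOutBps(double fillPrice, double midAfterHorizon) {
        return (midAfterHorizon - fillPrice) / fillPrice * 10_000.0;
    }

    public static double sellMarkOutBps(double fillPrice, double midAfterHorizon) {
        return (fillPrice - midAfterHorizon) / fillPrice * 10_000.0;
    }
}
```

Aggregated per venue at horizons like 50ms, 500ms, and 5s, a persistently negative average mark-out is exactly the toxicity signal that should discount a venue's otherwise-best displayed price.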
Information leakage is a routing decision. Every child order sent to a venue tells the rest of the market something. A large parent order routed in many small slices across many venues leaks more information than one routed to fewer venues in larger slices. Information leakage feeds back into adverse selection on subsequent slices. A serious SOR considers parent-level information leakage in its splitting decisions, not just leg-level fill quality.
Self-liquidity is part of the environment. As the previous article emphasized, the firm’s own resting orders are part of the market data the SOR consumes. The SOR must distinguish gross venue depth from external executable depth, and must explicitly avoid routing flow that crosses its own resting bids and offers. The point goes beyond self-trade prevention: a venue whose displayed depth is mostly the firm’s own quotes has very little external executable liquidity even if its book looks deep, so self-liquidity affects the routing decision itself.
These complications are not academic. They are the difference between an SOR that performs as advertised in production and one that produces routing decisions that look good on the displayed book and lose money on realized fills. Every one of them shapes the V14 hot-path algorithm.
The table below summarizes how a naive SOR and the V14 SOR handle each complication. The right column is what the rest of this article unpacks; the left column is what most “AI-powered best-execution” implementations actually do once you read the source.
| Dimension | Naive SOR | V14 SOR |
|---|---|---|
| Liquidity view | Gross consolidated L2 (includes own resting orders) | ExternalLiquidityView — gross consolidated minus the firm’s own resting orders |
| Fee handling | Decision price = displayed book price | Decision price = book price + per-venue fee/rebate adjustment, applied per side and per order type |
| Fill probability | Hard-coded constant or simple EWMA over recent fills | Bayesian posterior bucketed by (venue, instrument, side, size, price-distance), refreshed in the cold path |
| Allocation | Single hard-coded mode | Three configurable modes: GREEDY_FILL_BEST (default), PROPORTIONAL_FILL, WATER_FILLING |
| Determinism | Hash-map iteration, wall-clock reads, thread-local randomness on the hot path | Primitive int-keyed maps, deterministic tie-breaking by venueId, no wall-clock reads on the hot path |
| Hot-path allocation | Per-event boxed types, String keys, ad-hoc lists | 0 B/op after warmup; bounded preallocated buffers; flyweight views over reusable memory |
| Reroute on partial fill | Best-effort retry, unbounded budget | Invariant-driven; conservation-law check before each child; bounded reroute budget per parent per second |
| Race conditions | “It’s fast enough that races are rare” | Three explicit invariants force the same final state regardless of event ordering |
| Risk integration | Per-child risk only | Per-child risk + parent-level conservation check before submission, plus per-strategy reroute-rate limit |
| Replay correctness | Not tested | Routing-decision divergence test runs every recorded session twice with non-routing-affecting state diffs and asserts identical decisions |
| Release evidence | "We tested it" | Single evidence manifest gating the release tag: ArchUnit results, JMH JSON, test reports, configuration capture, and explicit hot-path coverage entries |
The contrast is not that the V14 SOR is fancier. It is that the V14 SOR is honest about what each row of this table actually requires, and ships with the evidence that the right column is true. The next section walks through how it plugs into the existing NitroJEx substrate.
7. SOR as an execution strategy: where V14 plugs in
The existing architecture provides the substrate V14’s SOR plugs into. Recall the relevant pieces from the previous article:
Trading strategies (MarketMakingStrategy, ArbStrategy) consume normalized market views, decide what positions the firm wants, and emit declarative ParentOrderIntent SBE messages through StrategyContext.executionEngine().submit(...). They never see SBE encoders; they never own child-order lifecycle.
Execution strategies (ImmediateLimitExecution, PostOnlyQuoteExecution, MultiLegContingentExecution) consume parent intents and own everything below them: child-order encoding, cluster offers, cancel/replace lifecycle, leg sequencing, leg timeouts, partial-leg imbalance hedging, post-only retry behavior, parent state transitions, terminal reason emission.
The ParentOrderRegistry owns parent lifecycle (active count, state machine, fill aggregation, terminal reasons, snapshot/load, parent recovery evidence). OrderState.parentOrderId is the authoritative child-to-parent attribution field on the hot path. Every child order passes through pre-trade risk before reaching the gateway. The single-writer cluster thread sees every event in deterministic order; followers and replaying nodes reconstruct identical state.
V14 adds SmartOrderRoutingExecution to that execution-strategy axis. The plug-in shape is identical to the existing three:
```java
public final class SmartOrderRoutingExecution implements ExecutionStrategy {

    @Override
    public void init(ExecutionStrategyContext ctx) {
        // resolve venue universe, scoring config, latency table, fee table,
        // fill-probability prior, capacity bounds; cold-path; one-time
    }

    @Override
    public void onParentIntent(ParentOrderIntentView intent) {
        // hot path: score venues, allocate parent qty, encode children,
        // route through risk, offer through Aeron egress
    }

    @Override
    public void onMarketDataTick(int venueId, int instrumentId, long tNanos) {
        // hot path: re-evaluate stale child quotes if any; reroute residuals
        // when scoring shifts cross configured thresholds
    }

    @Override
    public void onChildExecution(ChildExecutionView execution) {
        // hot path: update parent fill aggregation, decide reroute on partial,
        // observe venue performance for cold-path TCA feedback
    }

    @Override
    public void onTimer(long correlationId) {
        // hot path: handle child timeout (cancel + reroute residual)
    }

    @Override
    public void onCancel(long parentOrderId, byte reasonCode) {
        // hot path: cascade cancel to all live children; emit parent terminal
    }
}
```
This is structurally identical to the existing three execution strategies. The hot/cold split is the established split. The forbidden API list is the established forbidden API list. The risk integration is the established risk integration. The deterministic replay obligations are the established obligations. The benchmarking surfaces are the established surfaces with V14-specific benchmarks added. Nothing about being an SOR exempts the layer from any of these rules.
The configuration also follows the established pattern. A trading strategy instance pairs with an execution strategy by configuration; the default mapping for V14 introduces routing behavior without forcing it on existing strategies:
```toml
[[strategy]]
id = "arb-btc-multi-venue"
type = "Arb"
executionStrategy = "SmartOrderRouting"
executionStrategyConfig = "sor-default"

[[execution-strategy-config]]
id = "sor-default"
type = "SmartOrderRouting"
venueUniverse = ["COINBASE", "BINANCE", "KRAKEN"] # roadmap; V14 ships COINBASE only
splitMode = "GREEDY_FILL_BEST"
maxChildrenPerParent = 4
childTimeoutMicros = 250000
rerouteOnPartial = true
maxRerouteAttempts = 2
feeAware = true
fillProbabilityWeight = 0.35
latencyWeight = 0.15
```
Two architectural decisions in this configuration deserve calling out, because they show how V14 stays consistent with the established discipline. First, the venue universe is configured, not discovered. The SOR does not dynamically discover venues at runtime; the venues it can route to are exactly those configured in venues.toml and exposed in this universe. Adding a new venue is a venue-plugin task plus a configuration update, not an SOR-class change. Second, the fill-probability and latency weights are explicit hyperparameters with defaults that come out of cold-path TCA. The hot path consumes them as primitive values; it does not compute them. This is the same hot/cold separation the rest of the system uses.
A pairing-validation step at startup ensures that incompatible trading-strategy / execution-strategy combinations fail before any market data is processed. SmartOrderRoutingExecution is compatible with single-instrument trading strategies whose parent intents express a directional take (buy or sell some quantity of one instrument). It is not compatible with MarketMakingStrategy unless configuration explicitly acknowledges the limitation, because market making requires per-venue post-only quoting whose lifecycle is owned by PostOnlyQuoteExecution. Nor is it compatible, in V14, with multi-leg parents; multi-leg routing across venues is a V15+ concern that combines MultiLegContingentExecution with SOR per leg.
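A minimal sketch of that pairing check (the enum and class names here are illustrative, not the NitroJEx API):

```java
// Hypothetical sketch of the startup pairing validation. Type names
// (StrategyType, ExecutionStrategyType, PairingValidator) are illustrative.
final class PairingValidator {
    enum StrategyType { ARB, MARKET_MAKING }
    enum ExecutionStrategyType { SMART_ORDER_ROUTING, POST_ONLY_QUOTE, IMMEDIATE_LIMIT }

    /** True if the pairing is legal without an explicit configuration override. */
    static boolean isCompatible(StrategyType s, ExecutionStrategyType e) {
        if (e != ExecutionStrategyType.SMART_ORDER_ROUTING) {
            return true; // existing pairings keep their established rules
        }
        // SOR pairs only with directional, single-instrument strategies;
        // market making needs PostOnlyQuoteExecution's quote lifecycle.
        return s != StrategyType.MARKET_MAKING;
    }
}
```

Because the check runs at startup, a misconfigured pairing fails the node before any market data is processed, not on the first parent intent.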
8. The hot-path SOR algorithm, in five phases
What does the SOR actually do when a parent intent arrives? V14’s algorithm runs in five phases on the cluster hot path. Each phase has a documented latency budget, a documented allocation budget, a documented determinism guarantee, and a documented failure path. We walk through each phase with the implementation choices that make it fit the existing substrate.
Phase 1: Routable liquidity construction
When onParentIntent fires, the first job is to construct the routable liquidity view: the set of (venue, price, size) tuples the parent can realistically execute against, sorted in the order the SOR will consider them.
The inputs are the consolidated L2 book, the per-venue L2 books, the OwnOrderOverlay, and the ExternalLiquidityView. The routable liquidity is built from ExternalLiquidityView, not from the raw consolidated book. This is the established own-liquidity rule, and it applies to SOR for exactly the same reasons it applies to arbitrage. Routing against gross consolidated depth would route flow that crosses the firm’s own resting orders.
The construction is allocation-free under the established capacity limits. The output is a bounded array of RoutableLevel flyweight views over a preallocated buffer. Each level carries (venueId, instrumentId, side, scaledPrice, externalSize, feeRateBps, latencyMicros, fillProbabilityScaled), all primitive longs and ints with no boxed types, no String-keyed maps, and no per-event allocation. The capacity limit is configured (maxRoutableLevels = 16 is a reasonable V14 default) and exceeded levels are truncated with a counter increment, never with a resize.
The sort order is by decision price, where decision price is the venue’s quoted price plus the venue’s effective fee (positive for takers, negative for makers eligible for rebates), expressed as a scaled long. This is the first place the algorithm differs visibly from a naive implementation. A naive SOR sorts by gross book price and applies fees at the end; V14 sorts by fee-adjusted price from the start, because that is what the optimization function actually depends on.
Latency budget for this phase: under 2 microseconds for the V14 default of three configured venues. Allocation budget: 0 B/op. Failure path: if ExternalLiquidityView is stale (no recent update for the parent’s instruments), the parent is rejected with reason MARKET_DATA_STALE, the trading strategy is notified through the parent terminal callback, and a counter increments. The SOR does not silently route on stale data.
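The fee-adjusted sort key can be sketched as fixed-point arithmetic on scaled longs; the scale constant, helper name, and bps convention are assumptions for illustration:

```java
// Sketch of the decision price used as the Phase 1 sort key.
// PRICE_SCALE and the method shape are illustrative assumptions.
final class DecisionPrice {
    static final long PRICE_SCALE = 100_000_000L; // hypothetical 1e8 fixed-point scale

    /**
     * Decision price for a buy: quoted price plus the effective venue fee.
     * feeRateBps is positive for taker fees, negative for maker rebates.
     */
    static long decisionPrice(long scaledPrice, long feeRateBps) {
        // fee = price * feeRateBps / 10_000, all in scaled-long arithmetic
        return scaledPrice + (scaledPrice * feeRateBps) / 10_000L;
    }
}
```

Under this key, a venue quoting a slightly worse gross price with a maker rebate can legitimately sort ahead of a venue quoting a better gross price with a large taker fee, which is exactly the reordering a gross-price sort misses.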
Phase 2: Venue scoring
With the routable liquidity view constructed, the SOR scores each (venue, level) pair against the parent’s objective. The score is a composite of fee-adjusted decision price, fill probability, latency, and a small set of penalty terms (venue toxicity, recent reject rate, recent disconnect history). All of these are primitive scaled longs; the score itself is a primitive long.
The scoring function for a buy-side parent is:
```
score(venue, level) =
      - decisionPrice                                   # lower is better, so negate
      + fillProbabilityWeight * fillProbabilityScaled
      - latencyWeight         * latencyPenalty(latencyMicros)
      - toxicityWeight        * toxicityScore(venue)
      - rejectPenaltyWeight   * recentRejectRate(venue)
```
The weights come from the configured execution-strategy parameters. The fill probability and toxicity scores come from the cold-path TCA layer, refreshed at low frequency (every few minutes is fine; the values are stable on shorter timescales) and exposed to the hot path as primitive values in a bounded primitive map. The recent reject rate is a hot-path counter the SOR maintains itself, decayed over a configured window.
Two implementation details matter. First, all of this is fixed-point arithmetic on scaled longs. There is no double, no BigDecimal, no Math.exp or Math.log on the hot path. Logistic-like behavior, where required, is approximated by piecewise-linear lookup tables with primitive int interpolation. Second, the scoring is deterministic: given the same routable liquidity view and the same cold-path state, two runs of the cluster (a leader and a follower, or the leader and a replay) produce the same scores in the same order. This is the established determinism rule, and it forbids any non-deterministic input from entering the score: wall-clock reads, thread-local randomness, and hash-map iteration order on object keys are all out of bounds.
Latency budget: under 1 microsecond per (venue, level) scored. Allocation budget: 0 B/op. Failure path: if a required cold-path input is missing (no fill probability prior loaded for this venue/instrument), the SOR uses a configured default with a counter increment. The parent is not rejected.
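In the fixed-point style the hot path requires, the composite score might be computed like this. WEIGHT_SCALE and the flat parameter list are illustrative assumptions; the real implementation reads weights from the execution-strategy config and inputs from bounded primitive maps:

```java
// Fixed-point sketch of the buy-side scoring formula. All inputs and
// weights are scaled longs; WEIGHT_SCALE is a hypothetical fixed-point base.
final class CompositeScore {
    static final long WEIGHT_SCALE = 10_000L; // weights as fractions of 1.0, in bps

    static long score(long decisionPrice, long fillProbScaled, long latencyPenalty,
                      long toxicity, long rejectRate,
                      long wFill, long wLatency, long wToxicity, long wReject) {
        return -decisionPrice                                // lower price is better
             + (wFill * fillProbScaled) / WEIGHT_SCALE       // reward likely fills
             - (wLatency * latencyPenalty) / WEIGHT_SCALE    // penalize slow venues
             - (wToxicity * toxicity) / WEIGHT_SCALE         // penalize bad mark-outs
             - (wReject * rejectRate) / WEIGHT_SCALE;        // penalize recent rejects
    }
}
```

No double, no Math.exp: every term is a scaled-long multiply and divide, so a leader and a replay compute bit-identical scores.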
Phase 3: Allocation
The allocation phase decides how the parent quantity is split across the scored venues. V14 ships three allocation modes, configurable per-execution-strategy-config:
GREEDY_FILL_BEST is the default. The SOR sorts the routable liquidity view by score (best first), and walks down the list, allocating to each level the minimum of the level’s external size and the parent’s residual quantity, until the parent is fully allocated or the list is exhausted. This is the simplest allocation, and for the latency budgets V14 targets, it is also usually the right one. The marginal value of more sophisticated allocations is small for parent sizes that fit within the top few levels, and the marginal cost is real (more time in the allocation phase, more child orders on the wire, more parent state to manage, more failure modes to test).
PROPORTIONAL_FILL is available for parents large enough to benefit from parallelism. The SOR allocates to the top k scored venues simultaneously, with size proportional to each venue’s external size at the parent’s price limit, weighted by fill probability. Total allocated size matches parent quantity; over-allocation (where the sum of allocations exceeds parent qty due to weighting) is truncated at the lowest-scored venue. This is more expensive computationally but well within the latency budget for V14’s bounded-venue case.
WATER_FILLING is the most aggressive allocation, suitable for parents whose price impact at any single venue would be material. The SOR computes the marginal-impact curve for each venue (price degradation as size increases) and allocates so that marginal expected impact is equalized across venues. This is the textbook optimal allocation for impact-minimizing routing, and it is the most expensive to compute. V14 includes it as an option but does not use it by default; the calibration data needed to make it perform better than PROPORTIONAL_FILL in practice is more demanding than V14 ships with.
The allocation phase produces a bounded array of ChildAllocation flyweight views: (venueId, instrumentId, side, scaledPrice, scaledSize, orderType, timeInForce). The array is preallocated; the views write into it in place. Capacity limit: maxChildrenPerParent, configured per execution-strategy-config, default 4.
Latency budget: under 5 microseconds for the default (greedy + 4 children). Allocation budget: 0 B/op. Failure path: if the routable liquidity view does not contain enough external size to fill the parent, the SOR routes what it can and emits a PARTIAL_ALLOCATION event. The unallocated residual remains on the parent’s leavesQty and is eligible for rerouting in Phase 5.
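A sketch of the GREEDY_FILL_BEST walk, assuming levels are already sorted best-score-first; plain arrays stand in for the preallocated flyweight buffers, and the names are illustrative:

```java
// Minimal sketch of GREEDY_FILL_BEST over pre-scored, pre-sorted levels.
final class GreedyAllocator {
    /**
     * Walks levels best-first, allocating min(levelSize, residual) to each,
     * bounded by maxChildren. Writes allocated sizes into out (parallel to
     * levelSizes) and returns the unallocated residual.
     */
    static long allocate(long parentQty, long[] levelSizes, long[] out, int maxChildren) {
        long residual = parentQty;
        int children = 0;
        for (int i = 0; i < levelSizes.length && residual > 0 && children < maxChildren; i++) {
            long take = Math.min(levelSizes[i], residual);
            if (take <= 0) continue;
            out[i] = take;
            residual -= take;
            children++;
        }
        return residual; // > 0 is the PARTIAL_ALLOCATION case
    }
}
```

A positive return value is the PARTIAL_ALLOCATION case: the residual stays on the parent's leavesQty and is eligible for rerouting in Phase 5.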
Phase 4: Encoding and submission
Each ChildAllocation becomes an SBE-encoded NewOrderCommand written to a reusable buffer, passed through pre-trade risk on the cluster thread, and offered through Aeron to the relevant venue’s gateway. The encoding uses the existing venue-agnostic adapter pattern; V14 introduces no new encoding code because the same code that encodes a child for ImmediateLimitExecution encodes a child for SmartOrderRoutingExecution.
The pre-trade risk check is unchanged from the established pipeline. Every child passes through RiskEngine.evaluate(...), which returns APPROVED or a primitive reject reason in sub-microsecond time. If any child is rejected, that child’s allocation is dropped, the parent is notified, the residual is restored to leavesQty, and a counter increments. The SOR does not retry rejected children at a different price without explicit configuration; risk rejects are not “try again from a different angle” signals.
Each successful child registers its clOrdId against the parent in ParentOrderRegistry, with OrderState.parentOrderId set to the parent ID. This is the existing attribution rule, applied without modification.
Latency budget: under 3 microseconds per child (dominated by SBE encode and Aeron offer). Allocation budget: 0 B/op. Failure path: a downstream Aeron back-pressure response or gateway capacity reject causes the child to be dropped; the SOR treats it as a reject for accounting purposes.
Phase 5: Reroute and parent-state evolution
The SOR’s parent state begins to evolve as soon as children are submitted. Execution reports flow back through the gateway, through normalization, into OrderManager, and into ParentOrderRegistry. The SOR observes them through onChildExecution.
There are four execution outcomes per child, each handled deterministically:
A complete fill updates the parent’s cumFillQty, decrements leavesQty, and removes the child from the active-child list. If leavesQty == 0, the parent transitions to COMPLETED with terminal reason COMPLETED.
A partial fill updates the parent’s cumFillQty and decrements leavesQty by the filled amount. The child remains live with its own residual leavesQty; the SOR may decide to cancel it and reroute its residual to a different venue, depending on the configured rerouteOnPartial policy and the time elapsed since the child was submitted. The reroute-or-wait decision is itself deterministic: it depends on configured thresholds (time elapsed, market move since submission, current routable liquidity), not on wall-clock judgment.
A reject removes the child from the active-child list, restores its residual size to the parent’s leavesQty, and triggers a reroute attempt against the current routable liquidity view if maxRerouteAttempts has not been exceeded. Each reroute attempt counts toward the parent’s lifetime reroute budget; once exceeded, the parent transitions to EXECUTION_ABORTED with a primitive reason code.
A cancel-acknowledged for a child the SOR initiated removes the child cleanly. Cancels initiated by the venue or by external operators (administrative cancels) trigger a parent-level investigation: if the parent’s strategy did not request the cancel, the SOR escalates per the runbook (kill switch on the venue, reconciliation required before resumption).
Timer-driven behavior follows the same pattern. Each submitted child receives a deterministic cluster timer at childTimeoutMicros. On timer fire, if the child is still live, the SOR cancels it and reroutes the residual. The onTimer path uses the established timer-correlation-ID-registered-before-scheduling discipline (registered before scheduling, rolled back on scheduling failure, primitive failure path on required-timer absence).
The parent transitions to COMPLETED, CANCELED_BY_PARENT, EXPIRED, RISK_REJECTED, or EXECUTION_ABORTED on the appropriate terminal condition. These are the same terminal reasons the established parent lifecycle defines; SOR does not introduce new ones.
Latency budget for onChildExecution: under 2 microseconds for the common case. Allocation budget: 0 B/op. Failure paths: every transition is documented and either explicitly legal or explicitly rejected; an attempted illegal transition increments a counter and is dropped.
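The bounded reroute budget can be sketched as a small counter owned by the parent state (names illustrative):

```java
// Sketch of the bounded reroute budget per parent: each reject or timeout
// consumes one attempt; once exhausted, the parent aborts.
final class RerouteBudget {
    private final int maxAttempts;
    private int used;

    RerouteBudget(int maxAttempts) { this.maxAttempts = maxAttempts; }

    /** True if a reroute may proceed; consumes one attempt if so. */
    boolean tryConsume() {
        if (used >= maxAttempts) return false; // parent -> EXECUTION_ABORTED
        used++;
        return true;
    }
}
```

With maxRerouteAttempts = 2 (the configured default above), a parent that cannot place its residual after two reroutes stops trying rather than chasing a moving book indefinitely.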
9. Determinism for SOR: harder than it looks, but not optional
The previous article argued that determinism is the foundation of replay-based testing, bug reproduction, cluster recovery, and audit defensibility, and that the discipline determinism imposes is “significant.” For SOR, the discipline is more significant, because SOR has more moving parts that are tempted to be non-deterministic.
Three specific traps deserve attention.
Wall-clock-driven decisions. A naive SOR is full of “if more than 100ms have passed, reroute” logic. If “more than 100ms” means “wall clock at decision time minus wall clock at submission time,” the system is non-deterministic: a follower processing the same event sequence at a different wall-clock instant computes a different elapsed-time and may make a different decision. The fix is the established one: the cluster’s deterministic clock is the authoritative source, and the elapsed time the SOR sees is (currentClusterTimeMicros - submissionClusterTimeMicros), both reads from the same monotonic deterministic clock. Wall-clock reads are forbidden on the SOR hot path, the same way they are forbidden on the trading-strategy hot path.
Hash-map iteration order for venue scoring. A scoring loop that iterates over a HashMap&lt;String, VenueScore&gt; to find the best venue is non-deterministic, because hash-map iteration order is not specified and can change between JVM runs even with the same inputs. Two replays of the same event stream will, in general, see ties broken differently. The fix is to use primitive int-keyed maps (an Int2ObjectMap from fastutil, an IntObjectHashMap from Eclipse Collections, or a hand-rolled bounded array indexed by venueId) and to sort scored levels into a deterministic order with explicit tie-breaking. The V14 default is to break ties by venueId ascending, which is stable and replayable.
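A sketch of the deterministic ordering rule, score descending with venueId ascending as the tie-break (the record shape is illustrative; the real hot path sorts flyweight views over primitive buffers, not objects):

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch of deterministic tie-breaking for scored levels.
final class DeterministicOrder {
    record Scored(int venueId, long score) {}

    // Best score first; equal scores broken by venueId ascending, so two
    // replays of the same event stream always produce the same order.
    static void sortDeterministic(Scored[] levels) {
        Arrays.sort(levels, Comparator.comparingLong(Scored::score)
                                      .reversed()
                                      .thenComparingInt(Scored::venueId));
    }
}
```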
Random-number injection. Some SOR designs inject randomness into venue selection to obfuscate parent footprint and reduce information leakage. This is legitimate, but the randomness must come from a deterministic seed that is itself part of the replayable input. The seed enters the cluster as part of the parent intent (or as part of a periodic seed-rotation event), is captured in the ingress log, and is replayed identically. A Math.random() call on the SOR hot path, even for the legitimate purpose of footprint obfuscation, breaks replay and is forbidden.
The tests that prove SOR determinism are an extension of the existing deterministic-replay tests, not a new test category. Recorded ingress streams are replayed against fresh cluster builds; the resulting state, including parent state, child orders submitted, allocations, scores, terminal reasons, and counters, is byte-identical to the original. Divergence on any of these is a build break, treated the same way as the established zero-allocation regressions.
There is one SOR-specific test class that V14 introduces: the routing-decision divergence test. A recorded session is replayed twice with intentionally different non-routing-affecting state (e.g., different log timestamps, different operator metadata, different cold-path counter snapshots), and the routing decisions are required to be identical. This catches non-determinism that hides in cold-path-derived inputs to hot-path decisions, which is one of the more subtle ways SOR implementations leak randomness.
Key Principle: A non-deterministic SOR is a router whose decisions you cannot reproduce, whose performance you cannot reliably test, and whose behavior you cannot defend in a post-mortem. Determinism is not a feature of the SOR; it is a precondition for the SOR being trustworthy.
10. Fill probability: the cold path that makes the hot path smart
Fill probability is the single most consequential cold-path input to the SOR’s hot-path scoring. It is also the input most often handled badly. A naive implementation hard-codes a constant per venue and never updates it. A slightly less naive implementation maintains an exponentially-weighted moving average of “did the last child fill or not.” Neither is good enough.
The V14 fill-probability model is honest about what it knows and what it doesn’t. The model produces a per-(venue, instrument, side, size-bucket, price-distance-bucket) probability estimate, updated at the end of each completed child’s lifecycle from the realized fill outcome. The estimate is a Bayesian posterior with a configurable prior, smoothed across adjacent buckets, and refreshed in the cluster’s cold path on a regular cadence: fast enough that recent venue behavior is reflected (every few minutes), slow enough that the hot path sees a stable input that does not flicker between cluster events.
The size and price-distance buckets matter. A child for 0.1 BTC has a different fill probability than a child for 5 BTC, even at the same price; a child priced one tick into the spread has a different fill probability than one priced ten ticks back. Bucketing by both dimensions captures the structure of fill-probability surfaces without requiring a continuous function on the hot path. The hot path looks up the bucket for the child it is considering and reads a primitive scaled long.
The buckets are bounded and primitive. The full table for V14’s default configuration (3 venues × 1 instrument × 2 sides × 8 size buckets × 16 price-distance buckets) is a small primitive array; the lookup is a single indexed read, costing nanoseconds.
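The bounded table and its single-indexed-read lookup might look like this; the dimensions match the V14 default quoted above, and the field names are illustrative:

```java
// Sketch of the bounded primitive fill-probability table: one flat long[]
// indexed by (venue, instrument, side, sizeBucket, distBucket).
final class FillProbTable {
    static final int VENUES = 3, INSTRUMENTS = 1, SIDES = 2, SIZE_BUCKETS = 8, DIST_BUCKETS = 16;
    final long[] table = new long[VENUES * INSTRUMENTS * SIDES * SIZE_BUCKETS * DIST_BUCKETS];

    // Row-major flattening: the hot-path lookup is a handful of integer ops
    // plus one indexed array read.
    static int index(int venue, int instrument, int side, int sizeBucket, int distBucket) {
        return (((venue * INSTRUMENTS + instrument) * SIDES + side)
                * SIZE_BUCKETS + sizeBucket) * DIST_BUCKETS + distBucket;
    }

    long lookup(int venue, int instrument, int side, int sizeBucket, int distBucket) {
        return table[index(venue, instrument, side, sizeBucket, distBucket)];
    }
}
```

For the default configuration the whole table is 768 longs, small enough to stay cache-resident.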
The cold-path update is straightforward and entirely off the hot path. Each completed child’s outcome (filled, partially filled, canceled with no fill, rejected) updates the relevant bucket’s posterior. The update runs on a cold-path executor, not the cluster thread, and writes its result to an off-heap buffer that the hot path reads through a primitive atomic snapshot pointer. The hot path never blocks on the update; the update never allocates onto a heap path the hot path cares about.
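The per-bucket update can be sketched as a Beta posterior whose mean the hot path reads as a scaled long. The scale and the Beta parameterization are assumptions; the article specifies only a Bayesian posterior with a configurable prior:

```java
// Sketch of the cold-path per-bucket update: fills are successes, unfilled
// outcomes are failures, and the prior is Beta(alpha0, beta0).
final class FillPosterior {
    static final long PROB_SCALE = 10_000L; // probability in basis points

    /** Posterior mean of Beta(alpha0 + fills, beta0 + misses), as a scaled long. */
    static long posteriorScaled(long alpha0, long beta0, long fills, long misses) {
        long num = (alpha0 + fills) * PROB_SCALE;
        long den = alpha0 + beta0 + fills + misses;
        return num / den;
    }
}
```

A weak uniform prior (alpha0 = beta0 = 1) starts at 50% and converges toward the empirical fill rate as outcomes accumulate; a stronger structural prior just means larger alpha0/beta0 derived from displayed depth.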
One subtlety deserves elaboration: the prior. A new venue, or a new instrument on an existing venue, has no historical fills. The naive choice is to use a uniform prior (50% fill probability across the board), which produces poor routing decisions in the first hour of the venue’s operation. The V14 choice is to use a structural prior derived from the venue’s displayed depth and historical reliability of comparable instruments. A venue showing 10 BTC of depth at the touch is given a high prior fill probability for small children at the touch; a venue showing 0.1 BTC is given a lower prior. This is an opinionated default; firms with stronger priors from their own historical data will override it.
A second subtlety: the model is a predictive model, not a causal one. It predicts what the fill probability will be for a hypothetical child at a hypothetical venue, given everything the model has seen. It does not predict what will happen if the SOR routes more flow to a particular venue, because the model has no causal handle on that. This matters because aggressive routing to a “high fill probability” venue will, eventually, exhaust the queue at that venue and the fill probability for new children will drop. The model catches up to this within the cold-path update cadence, but during the lag between exhaustion and update, the SOR is making routing decisions on stale fill-probability estimates. V14 mitigates this with a small adversarial term in the scoring function: each child sent to a venue temporarily reduces that venue’s effective fill-probability score by a configured decay, and the decay rebuilds over time. This is a heuristic, not a model; it is good enough for V14, and a more principled solution is roadmap territory for V15+.
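The adversarial decay term might be sketched as a per-venue deduction that rebuilds linearly over a configured window; everything here, from the linear rebuild to the field names, is an illustrative assumption:

```java
// Sketch of the adversarial decay heuristic: each routed child temporarily
// reduces the venue's effective fill-probability score; the deduction
// rebuilds over time. Timestamps come from the deterministic cluster clock.
final class FillProbDecay {
    final long decayPerChild;  // scaled probability deducted per routed child
    final long rebuildMicros;  // time for one child's deduction to rebuild fully
    long pendingDeduction;
    long lastUpdateMicros;

    FillProbDecay(long decayPerChild, long rebuildMicros) {
        this.decayPerChild = decayPerChild;
        this.rebuildMicros = rebuildMicros;
    }

    void onChildRouted(long nowMicros) {
        rebuild(nowMicros);
        pendingDeduction += decayPerChild;
    }

    long effectiveScore(long baseScoreScaled, long nowMicros) {
        rebuild(nowMicros);
        return baseScoreScaled - pendingDeduction;
    }

    private void rebuild(long nowMicros) {
        long elapsed = nowMicros - lastUpdateMicros;
        lastUpdateMicros = nowMicros;
        long rebuilt = (decayPerChild * elapsed) / rebuildMicros;
        pendingDeduction = Math.max(0, pendingDeduction - rebuilt);
    }
}
```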
The lesson here is familiar from the original hot/cold split. Cold paths can do sophisticated work, including Bayesian posteriors, smoothing, structural priors, and decay schedules, exactly because they are off the hot path. The hot path consumes their output as primitive values. The boundary is enforced; the sophistication is contained.
The model is amenable to backtesting, and the V14 architecture leaves room for it: a TCA replay harness can be run against the cold-path posterior to report calibration loss and bias per (venue, instrument, side, size-bucket) cell. As of V14, the harness exists and the model is in production use; release-blocking validation thresholds for the fill-probability surface are roadmap territory for V15+, alongside the broader scoring-validation work.
11. Splitting and allocation: simpler than the literature suggests
The academic literature on optimal order splitting is rich. There are papers proving optimality of various allocation rules under various market microstructure assumptions; there are papers showing how to extend those rules to multi-venue, multi-period, partially-observable settings; there are papers using stochastic control to derive the impact-minimizing trajectory through an order book. A reader coming to SOR design from this literature can be forgiven for expecting that V14’s allocation phase would implement a sophisticated optimization.
V14’s default allocation is GREEDY_FILL_BEST. We chose simplicity over sophistication, deliberately, and the reasons are worth being explicit about.
The first reason is latency. Every additional microsecond spent in the allocation phase is a microsecond during which the market moves. The optimal allocation under perfect-information assumptions is a worse allocation under imperfect-information conditions, because the perfect-information allocation took too long to compute and the information it was computing on is no longer accurate. Greedy fill from the best score takes microseconds. Water-filling takes tens of microseconds. Stochastic-control-based allocation takes hundreds. The marginal benefit at the allocation-phase margin is dominated by the latency cost.
The second reason is parent size. The academic literature is most powerful when the parent order is large enough to materially impact the order book, where filling the parent will move the price by basis points or more. V14 is built for a proprietary trading firm whose typical parent sizes are small relative to the touch depth on its configured venues. For a parent that fits comfortably within the top three levels of routable liquidity, water-filling and greedy-fill produce nearly identical allocations, and the simpler one is provably better because it is faster.
The third reason is testability. Greedy-fill has a single deterministic output for a given routable liquidity view. Water-filling and proportional-fill have configured weights that make the output sensitive to those weights, which complicates the test surface. Every allocation mode V14 ships has a corresponding suite of replay tests; the test cost grows with the complexity of the allocation, and the V14 release was scoped to the modes whose test surfaces had been completed and whose evidence had been archived.
PROPORTIONAL_FILL and WATER_FILLING are still in V14 because some parent shapes benefit from them: large parents at thin venues, or parents whose impact at a single venue would push price meaningfully through the configured price limit. The configuration knob is per-execution-strategy-config; firms can choose per use case. The V14 default is greedy-fill because the V14 default parent shape is the one greedy-fill handles best, and changing the default would be a configuration choice the firm made deliberately, not a sophistication-driven default that performs worse on the firm’s actual workload.
WATER_FILLING deserves an honest footnote about its calibration demands. The mode requires per-venue marginal-impact curves to be calibrated, refreshed in the cold path, and validated against recent fills. A miscalibrated curve makes water-filling worse than greedy on every workload, because the allocation it computes is optimal under impact assumptions that no longer hold. In our internal sim work, GREEDY_FILL_BEST outperforms WATER_FILLING on small parents (well within the top-of-book depth), ties on medium parents, and starts to trail only on parents large enough to materially move price through the configured limit at a single venue. The medium-and-large regimes are rare for the V14 default workload, which is why greedy is the V14 default. Firms whose typical parent sizes fall in the higher regime should calibrate impact curves against their own simulator traces and validate the resulting allocations before flipping the configuration; calibration-drift detection and per-(venue, instrument) gating of WATER_FILLING are roadmap items.
This generalizes beyond SOR: default to the simplest implementation that meets the requirements, and add complexity only with evidence that the complexity earns its keep on the workload that matters. The literature defines the upper bound of what is theoretically optimal; the engineering question is what is empirically optimal at production constraints. The two are usually different, and the difference favors simplicity more often than newcomers to the domain expect.
12. Reroute, race conditions, and the partial-fill problem
The interesting parts of an SOR are the recovery paths. The happy path is straightforward: parent intent in, scoring, allocation, encoding, child fills, parent completes. The hard parts are everything else: the child that is rejected at a venue while a sibling child is filling; the child that fills partially while the rest of the book moves through its limit; the venue that drops the FIX session in the middle of a parent’s lifecycle; the parent that the trading strategy cancels while three of its children are still live and a fourth is in transit through the network.
V14’s reroute logic is built around three invariants that combine to give the parent state machine its safety properties.
Invariant 1: The parent’s cumFillQty and leavesQty always sum to its original orderQty, modulo administratively canceled portions. This is the established cumulative-consistency rule, applied across all of the parent’s children. Every fill on any child increments the parent’s cumFillQty and decrements its leavesQty by the filled amount. Every reject restores the rejected child’s residual to the parent’s leavesQty. Every administrative cancel removes the canceled portion from leavesQty without crediting it to cumFillQty. The invariant must hold at every observable state, not just at terminal state.
Invariant 2: The set of live children’s leavesQty plus the parent’s leavesQty equals the parent’s orderQty - cumFillQty - cancelledQty. This is the conservation rule for the parent’s outstanding quantity, and it is the rule that prevents reroute logic from accidentally double-counting. A reroute that moves residual from one venue to another must remove the residual from the source child’s leavesQty (typically by canceling that child) and add it to the destination child’s leavesQty (typically by submitting a new child). If both happen, the invariant holds. If either fails, the parent is in an inconsistent state, and the runbook escalation applies.
Invariant 3: Every state transition is idempotent. Duplicate execution reports, common from FIX resend requests, must produce identical state if applied twice. The existing exec-ID dedup machinery handles this for child orders. V14 extends it to parent-level events: the parent’s cumFillQty is updated only on the first observation of a given exec ID, and the parent-level reroute-on-partial decision is keyed by the (child, exec-ID) pair so that a duplicate execution report does not trigger a second reroute.
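Invariant 3 can be sketched the same way (again with assumed names; a production hot path would use a preallocated primitive structure rather than HashSet, which appears here only to keep the sketch short):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of parent-level exec-ID dedup (Invariant 3); names are illustrative.
final class IdempotentParentUpdater {
    private final Set<String> seenExecIds = new HashSet<>();
    private final Set<String> rerouteDecisionKeys = new HashSet<>();
    long cumFillQty;
    int reroutesTriggered;

    // Apply a (possibly duplicate) execution report. State changes only on
    // the first observation of a given exec ID.
    boolean onExecutionReport(String execId, long fillQty) {
        if (!seenExecIds.add(execId)) {
            return false;               // duplicate (e.g. FIX resend): no state change
        }
        cumFillQty += fillQty;
        return true;
    }

    // Reroute-on-partial is keyed by (childId, execId) so a duplicate report
    // cannot trigger a second reroute.
    boolean maybeReroute(String childId, String execId) {
        if (!rerouteDecisionKeys.add(childId + '|' + execId)) {
            return false;
        }
        reroutesTriggered++;
        return true;
    }
}
```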
To make the invariants concrete, the sequence below traces a typical reroute on partial fill. The strategy submits a 10 BTC parent intent. The SOR allocates 5 BTC to Venue 1 and 5 BTC to Venue 2. Venue 1 fills only 2 BTC and the rest of the book moves through the limit. The SOR cancels the residual on Venue 1, waits for the cancel-ack, and reroutes the 3 BTC residual to Venue 2 as a marketable child. Venue 2 fills both children, the parent reaches its terminal state, and the strategy receives the COMPLETED callback. Every transition is gated by Invariant 2 (conservation of outstanding quantity) and Invariant 3 (idempotency on duplicate execution reports).
These invariants govern several specific race conditions that SORs encounter.
The cancel-fill race. The SOR cancels child A on venue 1 because scoring shifted and venue 2 looks better. Before the cancel reaches venue 1, child A fills. The cancel arrives at the venue and is rejected (CANCEL_REJECT_TOO_LATE). The SOR observes the fill before, at the same time as, or after the cancel-reject. All three orderings produce the same final state under the invariants: cumFillQty increases by the fill amount, the cancel-reject is recorded as a counter event with no state effect, and the SOR adjusts its outstanding routing plan because the residual it expected to reroute from venue 1 has shrunk by the fill amount. The order in which the events arrive does not change the final state; it changes only the latency of recognizing the state.
The partial-fill race. Child A on venue 1 partially fills. The SOR decides to cancel the residual and reroute it to venue 2. The cancel is in flight when child A receives a second fill that completes it. The cancel is then rejected as too-late. The SOR’s reroute child for venue 2 is already submitted and is now over-allocating. The invariants force the SOR to recognize this on the second fill: the parent’s leavesQty drops to zero, the venue-2 reroute child’s allocation now exceeds the parent’s outstanding qty, and the SOR cancels the venue-2 child to restore the invariant. This is unpleasant, costing a venue submission and a venue cancel for no net effect, but it is safe. The invariants guarantee no over-execution.
The venue-disconnect race. Venue 1 drops its FIX session while child A is live. The SOR observes the disconnect through a normalized session event. It cannot know whether child A is still live, has filled, has canceled, or is in some indeterminate state at the venue’s matching engine. The established rule applies: orders whose state cannot be reconciled trigger the kill switch. The SOR’s parent transitions to a RECOVERY_REQUIRED state pending operator action; the operator runbook reconciles the venue’s records with the firm’s records before the parent is allowed to terminate or to issue further routing.
The strategy-cancel race. The trading strategy issues a parent cancel while three children are live across three venues. The SOR cascades the cancel: it issues a cancel to each live child and waits for the venue’s response. As cancels complete, the parent’s state evolves. If any child fills before its cancel takes effect, the fill is honored (the invariants force this; there is no way to undo a venue fill). The parent’s terminal state is CANCELED_BY_PARENT once all children are terminal, with cumFillQty reflecting whatever filled before the cancels propagated.
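A simplified model of the cancel cascade, with assumed state names, shows why a fill that beats the cancel is simply honored and the too-late cancel reject becomes a no-op:

```java
// Illustrative sketch of the parent-cancel cascade; states and names assumed.
final class CancelCascade {
    enum ChildState { LIVE, PENDING_CANCEL, FILLED, CANCELED }

    final ChildState[] children;
    long cumFillQty;

    CancelCascade(int childCount) {
        children = new ChildState[childCount];
        java.util.Arrays.fill(children, ChildState.LIVE);
    }

    void parentCancel() {                       // cascade: cancel every live child
        for (int i = 0; i < children.length; i++) {
            if (children[i] == ChildState.LIVE) children[i] = ChildState.PENDING_CANCEL;
        }
    }

    void onFill(int child, long qty) {          // a fill beats the cancel: honor it
        cumFillQty += qty;
        children[child] = ChildState.FILLED;
    }

    void onCancelAck(int child) {               // too-late reject on a FILLED child
        if (children[child] == ChildState.PENDING_CANCEL) {
            children[child] = ChildState.CANCELED;  // ...is a no-op here
        }
    }

    // CANCELED_BY_PARENT is reachable only once every child is terminal.
    boolean terminal() {
        for (ChildState c : children) {
            if (c == ChildState.LIVE || c == ChildState.PENDING_CANCEL) return false;
        }
        return true;
    }
}
```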
The takeaway is structural: race conditions in distributed routing systems do not get resolved by being faster; they get resolved by being correct. The invariants prevent the bad outcomes; the speed of the system determines how often the SOR has to deal with the tail of the race-condition distribution. A faster SOR sees fewer races, but a less correct one sees the same races and corrupts state when they arrive.
13. ArchUnit, preflight, and benchmarks: the three-layer defense for deterministic low-latency engineering
The previous article introduced benchmarking as an architectural contract: performance characteristics treated as correctness properties, with regressions handled as build breaks rather than performance issues. That framing is the right starting point, but a single technique alone is not enough to keep an institutional-grade trading platform deterministic and allocation-free across years of evolution. NitroJEx layers three complementary techniques: ArchUnit for static, source-level enforcement of architectural rules; JMH benchmarks with -prof gc for empirical proof of runtime allocation and latency behavior; and a preflight gate that orchestrates the full evidence chain into a single release-blocking check. Each catches a class of failure the other two cannot. Together they form the defense in depth that makes the V14 release’s deterministic and zero-allocation claims trustworthy.
This section unifies these three. They were treated separately in earlier releases, partly because they were introduced at different times, and partly because each has its own engineering surface. V14 deserves a unified treatment because the interaction between them is what produces the rigor, and because newcomers to the codebase often understand each individually but miss the way they cover for each other.
Key Principle: No single enforcement technique is sufficient for deterministic, allocation-free engineering. Static analysis cannot see what the JIT does at runtime. Benchmarks cannot cover every code path. Preflight cannot prove anything it does not check. The rigor comes from layering all three so the failures each one misses are caught by another.
Why three layers, not one
Consider the failure modes a low-latency deterministic platform must avoid, and what would catch each one.
A developer writes String.format("...", venueId) on the SOR hot path. ArchUnit’s forbidden-API rule fails the build immediately; the change cannot be merged. A benchmark would also catch it through an allocation regression, but the build break arrives sooner, with a clearer error message, before anyone has to run a benchmark.
A developer declares a Map<String, VenueScore> map = new HashMap<>() in a place the forbidden-pattern rules do not flag, but allocates per event by calling map.computeIfAbsent(...) with a lambda that captures a local variable. ArchUnit's rules cover the explicit new HashMap site; the lambda-capture allocation is invisible to static analysis. The benchmark catches it: B/op regresses, the JMH artifact shows non-zero allocation, the gate fails.
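To make the lambda-capture gap concrete, here is a hedged illustration; the class and field names are hypothetical, and whether the capturing form actually allocates per call depends on escape analysis, which is exactly why the benchmark, not the source, is the arbiter:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical names. The map itself is allocated once and passes static
// review; the per-event allocation hides in the capturing lambda.
final class VenueScoreCache {
    static final class VenueScore { long score; VenueScore(long s) { score = s; } }

    private final Map<String, VenueScore> byVenue = new HashMap<>();

    // Risky on a hot path: `s -> new VenueScore(defaultScore)` captures a local,
    // so a fresh lambda instance is typically materialized per call (absent
    // escape analysis eliding it).
    VenueScore lookupCapturing(String venue, long defaultScore) {
        return byVenue.computeIfAbsent(venue, s -> new VenueScore(defaultScore));
    }

    // Safer: a non-capturing function bound once; still allocates VenueScore
    // on first miss, but no per-call lambda object.
    private static final Function<String, VenueScore> ZERO = s -> new VenueScore(0L);

    VenueScore lookupPrebound(String venue) {
        return byVenue.computeIfAbsent(venue, ZERO);
    }
}
```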
A developer adds a new hot-path code shape and forgets to add a benchmark for it. ArchUnit’s rules pass, since the code has no forbidden APIs. The existing benchmarks pass too, since they don’t exercise the new shape. The preflight gate catches it: the V14 preflight check refuses to ship if the documented hot-path surface has uncovered shapes, and the release-evidence README requires explicit coverage entries per declared hot-path surface.
A developer fixes a bug, runs benchmarks locally, sees they pass, and pushes. The CI runs a stale benchmark profile that does not match the V14 capacity bounds. The numbers look good but mean nothing for the production configuration. Preflight catches it: the gate runs against the configured V14 capacity profile, refuses to accept benchmark JSON older than the configured staleness window, and rejects releases whose evidence does not match the configuration the system will run under.
Each technique covers a different gap. None of them, alone, would have caught all four of these failures. The point of layering is not redundancy; it is coverage.
ArchUnit: static enforcement at source
ArchUnit is the first line. It runs as part of the build, alongside compileJava and the unit tests, and it fails the build if the source code violates an architectural rule. The rules are written in Java, live in test sources, and look like this:
// Assumes the usual ArchUnit static imports (ArchRuleDefinition.noClasses,
// JavaCall.Predicates.target, HasOwner.Predicates.With.owner,
// HasName.Predicates.name); sorPackage is the JavaClasses imported for the SOR package.
@Test
void smartOrderRoutingHotPathMustNotAllocateChildAllocationOutsideBuffers() {
    noClasses()
        .that().resideInAPackage("..exchange.cluster.execution.sor..")
        .should().callConstructorWhere(
            target(owner(name("ChildAllocation")))
        )
        .because("ChildAllocation must come from preallocated buffers")
        .check(sorPackage);
}
Every rule is declarative, version-controlled, and reviewable. The build fails on violation, with an error message pointing at the exact source line. There is no “we’ll catch it in code review” or “we’ll catch it in a benchmark later”; the violation is rejected before the change can be merged.
The original ArchUnit baseline forbids, on declared hot-path packages: new String(...), Long.toString(...), String.format(...), per-event new byte[], HashMap and LinkedHashMap mutation with object keys, List growth, boxed Long and Integer values, exception construction for expected data-quality failures, formatted logging for expected failures, and wall-clock reads outside the deterministic cluster clock. The previous phase extended the rules to the parent-registry and execution-engine packages and added the rule that forbids any hot-path code from holding references to flyweight ParentOrderIntentView and ChildExecutionView objects past the call that delivered them.
V14 extends the ruleset again. The new rules forbid, in the SOR package: allocation of ChildAllocation, RoutableLevel, and VenueScoreEntry outside their preallocated buffers; hash-map-based venue lookups (the V14 substitute is a primitive int-keyed bounded array indexed by venueId); wall-clock reads in SmartOrderRoutingExecution and any class it transitively calls on the hot path; use of Math.random or any java.util.Random instance not seeded from a deterministic ingress event; and direct method calls into venue-specific packages from the SOR class (the SOR routes through the venue-agnostic adapter, never through venue-specific code). Each rule is paired with a violation message explaining why the rule exists, so developers who hit it understand the architectural reason rather than just the mechanical rejection.
What ArchUnit cannot see is what the JIT does. A method call that looks allocation-free in source can still allocate at runtime through autoboxing the developer didn’t notice, escape-analysis failure on a method the JIT decided not to inline, or hidden synthetic objects the compiler emits for things like enhanced-for-loops over collections. ArchUnit also cannot see allocation that crosses package boundaries through generics, method handles, or reflection. These are exactly the gaps benchmarks cover.
Key Principle: ArchUnit catches what is visibly wrong in the source. It cannot catch what the runtime does that the source does not say.
Benchmarks: empirical proof of runtime behavior
JMH with -prof gc is the second line. Where ArchUnit asks “does the source obey the rules?”, JMH asks “does the runtime obey the rules?” The two are different questions and they have different answers.
A JMH benchmark runs the hot-path code under controlled conditions and reports two numbers per iteration: throughput (operations per second) and allocation rate (bytes per operation, or B/op). Throughput drives the latency budget; allocation rate drives the zero-allocation claim. Both are archived as JMH JSON artifacts that the release-evidence bundle owns. A benchmark that reports 0 B/op after warmup proves something ArchUnit cannot: that the actual runtime, with the actual JIT-compiled code and the actual escape-analysis decisions, allocates nothing on the measured path.
The V14 hot-path latency budget is:
| Phase | Latency budget | Allocation budget |
|---|---|---|
| onParentIntent total (Phases 1-4) | 25 µs | 0 B/op |
| Phase 1 (routable liquidity construction) | 2 µs | 0 B/op |
| Phase 2 (venue scoring) | 5 µs (3 venues × 5 levels) | 0 B/op |
| Phase 3 (allocation) | 5 µs (greedy default) | 0 B/op |
| Phase 4 (encode + risk + offer per child) | 3 µs × children | 0 B/op |
| onChildExecution (parent state update) | 2 µs | 0 B/op |
| onTimer (reroute decision) | 5 µs | 0 B/op |
These are targets for the V14 release evidence, not claims for arbitrary configurations. Each is gated by a JMH benchmark with -prof gc, archived as JMH JSON output, with explicit configuration captured (venue count, instrument count, capacity bounds, JVM flags, host CPU model, JDK build). Every non-zero allocation, if any, has owner, reason, path classification, and remediation task. The established evidence rule applies without modification.
The V14 benchmark surface adds five classes to the platform-benchmarks module. SmartOrderRoutingScoringBenchmark measures Phase 2 at venue counts 1, 3, 5, and 10, reporting per-venue and total scoring time plus B/op. SmartOrderRoutingAllocationBenchmark covers Phase 3 across all three allocation modes at several parent-size points relative to routable liquidity. SmartOrderRoutingEndToEndBenchmark measures Phases 1-4 as a single target with realistic input distributions captured from simulator traces; this is the production-relevant number, not the per-phase numbers, because production sees the phases composed and any inter-phase overhead shows up only end to end. SmartOrderRoutingChildExecutionBenchmark covers onChildExecution for fill, partial-fill, reject, and cancel-ack outcomes. SmartOrderRoutingRerouteBenchmark covers onTimer-triggered reroute, including the cancel-and-resubmit path.
Each runs with -prof gc and Mode.SampleTime for percentile reporting (p50, p90, p99, p99.9), at a fixed event rate to address coordinated omission. Each publishes both throughput and latency percentiles. The release-evidence bundle archives both alongside the existing evidence from earlier releases.
Two limits of JMH-based proof deserve explicit acknowledgment, because they shape what claims V14 can responsibly make.
JMH does not prove production latency; it proves benchmark-environment latency. Production has CPU-pinning effects, NUMA effects, NIC interrupt effects, GC pressure from cold paths sharing the heap, JIT-deoptimization events, kernel jitter, and Aeron back-pressure dynamics that JMH cannot reproduce. The institutional pattern is to complement JMH with HdrHistogram-based observability in production, dimensioned by the same phases, with explicit coordinated-omission correction. The benchmarks gate the build; the production observability validates that the build's behavior survives contact with reality. V14 ships the JMH side; the production observability is operational deployment configuration that lives outside the spec.
JMH covers what you write benchmarks for. A new hot-path code shape introduced without a corresponding benchmark is invisible to JMH-based gates. The defense is that the preflight gate (next subsection) requires explicit benchmark coverage entries per declared hot-path surface; new code shapes without coverage fail preflight even if the existing benchmarks pass.
Key Principle: Benchmarks prove what they measure under the conditions they measure. They do not prove what they do not measure, and they do not prove production behavior. The surrounding discipline (coverage requirements, configuration capture, production observability) is what makes the benchmark numbers meaningful.
Preflight: the release-blocking orchestration
The third line is the preflight gate. ArchUnit catches violations in source. Benchmarks catch them at runtime. The preflight gate ensures that every check has actually run, has actually passed, and has actually produced current evidence under the configuration the system will run under. Without it, the other two layers can pass individually and still not protect the release, because nothing forces them all to be current at the same time, against the same code, against the right configuration.
scripts/v14-preflight-check.sh orchestrates the V14 evidence chain. The script is idempotent and exits non-zero on any failure. It runs:
- The full unit test suite.
- The full integration test suite.
- The full simulator deterministic test suite.
- The full live-wire simulator E2E test suite (including the V14-specific multi-venue, partial-fill reroute, and disconnect recovery tests).
- The deterministic replay test suite (including the V14 routing-decision divergence test).
- The snapshot/load test suite.
- The full ArchUnit ruleset (the original, prior-phase, and V14 rules, in that order, since they compose).
- The full JMH benchmark suite for the V14 configuration profile, with -prof gc.
- A staleness check on every benchmark JSON artifact (rejects evidence older than the configured window).
- A configuration-match check that compares the benchmark configuration profile to the production-target configuration profile.
- A coverage check that verifies every declared hot-path surface has both a corresponding ArchUnit rule and a corresponding benchmark.
Any failure in any of these stops the release. The preflight script does not warn; it exits non-zero, and the release tag cannot be cut.
The V14 preflight extends the earlier preflights rather than replacing them. The original evidence chain (zero-allocation hot-path proof, deterministic replay, simulator coverage) and the prior-phase evidence chain (parent-registry coverage, execution-engine replay, parent-intent E2E) are both required to pass alongside V14’s additions. Releasing V14 without current prior evidence is impossible, because the V14 preflight script invokes the earlier preflights as dependencies. The chain composes; the rigor compounds.
There is a subtle but consequential discipline embedded in the staleness check. Benchmark JSON artifacts and ArchUnit rule outputs carry timestamps; the preflight gate rejects evidence older than a configured window (typically 24 hours for benchmarks, immediate-rerun for ArchUnit). This sounds like bookkeeping, but its purpose is structural: it forces the evidence to be regenerated on the same code that is being released, under the same configuration that production will use. A team that runs benchmarks on a feature branch, merges to main, and tries to ship without rerunning is blocked; preflight reruns before the tag is allowed.
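The staleness rule itself is a few lines; the sketch below assumes the 24-hour window from the text and takes timestamps as inputs so the check stays testable:

```java
import java.time.Duration;
import java.time.Instant;

// Sketch of the preflight staleness check; window values are the ones the
// text assumes (24h for benchmark JSON). Evidence older than the window is
// rejected, forcing regeneration on the release commit.
final class StalenessGate {
    static boolean isCurrent(Instant artifactTimestamp, Instant now, Duration window) {
        return !artifactTimestamp.isBefore(now.minus(window));
    }
}
```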
The preflight script also produces an evidence manifest: a single JSON document listing every artifact, its timestamp, its hash, its configuration, and its result. The manifest is committed to release-evidence/v14/ alongside the JMH JSON, the test reports, and the ArchUnit results. Anyone reviewing the V14 release can read one file to see what was checked, when, and against what configuration.
Key Principle: Preflight is what turns a collection of checks into a release gate. It enforces that all the checks ran, that they ran on the right code, against the right configuration, with current evidence. Without preflight, individual passing checks do not protect the release.
How the three compose for V14
The V14 release ships only when all three layers say yes. ArchUnit confirms the source obeys the architectural rules. Benchmarks confirm the runtime obeys the latency and allocation budgets. Preflight confirms that all the checks ran on the right code, against the right configuration, and produced current evidence.
Each layer protects against the failures the others miss. ArchUnit catches violations before the build completes; benchmarks catch violations before the release ships; preflight catches incomplete evidence before the tag is cut. A change that violates an ArchUnit rule never reaches the benchmark stage. A change that passes ArchUnit but allocates at runtime fails the benchmark. A change that passes both but lacks coverage on a new code shape fails preflight. No path through the gates is left uncovered.
The V14 deterministic and zero-allocation claims rest on this layering. The claim is not "we believe the SOR is allocation-free"; it is "the source obeys the rules ArchUnit checks, the runtime obeys the budgets the benchmarks measure, and the preflight gate certifies that both checks ran on the V14 release commit against the V14 configuration profile." The professional claim language ("deterministic, allocation-free, benchmark-gated") maps directly onto these three layers: deterministic is what the replay tests prove (the routing-decision divergence test is the SOR-specific addition); allocation-free is what ArchUnit forbids in source and what benchmarks verify at runtime; benchmark-gated is what preflight enforces.
The architectural lesson generalizes beyond NitroJEx and beyond SOR. Any system whose correctness depends on properties the type system does not enforce — zero-allocation, determinism, latency bounds, architectural-boundary integrity — needs an enforcement layer that runs at build time, an evidence layer that runs at test time, and a gate layer that runs at release time. Picking any two leaves the third gap exposed. Skipping any one is the failure mode that explains why most “low-latency” systems are slower than their marketing claims and why most “deterministic” systems are not actually deterministic in production.
Layered enforcement is what turns architectural intent into engineering fact, and what allows V14 to earn its evidence rather than assert it.
14. Risk integration: the SOR is not exempt
The previous article emphasized that the risk engine sits pre-trade on the critical path of every outbound order, with sub-microsecond bounded-and-fast evaluation, configurable limits, an authoritative position model derived only from execution reports, and a kill switch with fast escalation paths. V14 changes none of this. The SOR is not exempt from any of the risk rules; it merely has a more complicated relationship with them than the existing execution strategies do, and the architecture has to be honest about that complication.
Three specific risk-integration points deserve treatment.
Per-child risk evaluation, with parent-level coordination. Each child the SOR submits is risk-evaluated independently on submission, on the same path with the same sub-microsecond budget and the same primitive reject reasons. But there is also a parent-level constraint that no individual child violates: the parent’s orderQty is bounded by the trading strategy’s risk envelope at the time the parent intent was submitted, and the SOR must not allocate child quantity that would, in aggregate, exceed the parent’s orderQty. This is a conservation-law check, enforced by the SOR itself, before it reaches the per-child risk path. A bug in the SOR allocation logic that double-allocates is a bug the SOR’s own invariants must catch, because the per-child risk engine evaluates one child at a time and cannot see the aggregate.
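A sketch of that aggregate check, with assumed names; the per-child risk engine never sees this sum, so the SOR must run it before any child reaches the risk path:

```java
// Illustrative sketch: before any child reaches per-child risk, the sum of
// proposed child quantities must not exceed the parent's outstanding quantity.
final class AllocationConservationCheck {
    static boolean withinParentBudget(long parentOrderQty, long parentCumFillQty,
                                      long parentCancelledQty, long[] proposedChildQtys) {
        long outstanding = parentOrderQty - parentCumFillQty - parentCancelledQty;
        long total = 0;
        for (long q : proposedChildQtys) {
            if (q <= 0) return false;                // zero/negative child qty is a bug
            total += q;
            if (total > outstanding) return false;   // double-allocation caught here
        }
        return true;
    }
}
```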
Cross-venue position aggregation. The risk engine’s position model aggregates across venues for the firm-level exposure check. A parent that allocates to venues 1 and 2 cannot, by definition, exceed the firm’s per-instrument position limit, because each child’s risk evaluation already includes the firm-level aggregate as of the moment of submission. But there is a subtle race: between the moment Phase 2 scoring uses the firm’s current position to weight venue selection and the moment Phase 4 submits the children, the firm’s position may have changed (a sibling parent’s children may have filled). The V14 design handles this by re-checking the firm’s position at submission time, since Phase 4’s risk evaluation is the authoritative check, and accepts that Phase 2’s scoring may have used slightly stale data. The lag is microseconds; the position has not materially changed; the children are evaluated authoritatively at submission.
Kill switch escalation paths specific to SOR. The SOR introduces failure modes that the previous phase did not have: a child whose parentOrderId does not match any active parent (the existing kill-switch trigger applied to SOR children); a parent whose live-children’s aggregate leavesQty does not match the parent’s leavesQty (the V14 invariant violation); a venue whose disconnect prevents reconciliation of a live child; a reroute attempt that exceeds the parent’s lifetime reroute budget while children are still live (which would indicate the SOR is unable to make progress on the parent and should escalate). Each of these has a primitive failure code and an explicit operator-runbook entry.
The V14 risk integration extends the existing risk engine in exactly one place: per-strategy reroute-rate limits. A misconfigured SOR could, in pathological conditions, reroute the same parent's residual repeatedly across venues, generating venue-API-quota pressure and venue-side-suspicion signals. The risk engine adds a configurable reroute-rate limit (reroutes per parent per second, per strategy per second) with the same token-bucket pattern as the existing order-rate limit. This is a small extension; it is included in V14 because the failure mode it protects against is a direct consequence of introducing reroute logic, and not having the protection would be an obvious oversight.
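A deterministic token-bucket sketch of the reroute-rate limit (parameter names assumed; time is supplied as cluster-clock nanos rather than read from the wall clock, consistent with the no-wall-clock rule):

```java
// Illustrative token-bucket reroute-rate limiter. Deterministic: time is an
// input, never a wall-clock read, so replay produces identical decisions.
final class RerouteRateLimiter {
    private final long capacity;        // max burst of reroutes
    private final long nanosPerToken;   // refill interval: 1e9 / reroutesPerSecond
    private long tokens;
    private long lastRefillNanos;

    RerouteRateLimiter(long capacity, long reroutesPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.nanosPerToken = 1_000_000_000L / reroutesPerSecond;
        this.tokens = capacity;
        this.lastRefillNanos = nowNanos;
    }

    boolean tryAcquire(long nowNanos) {
        long elapsed = nowNanos - lastRefillNanos;
        long refill = elapsed / nanosPerToken;
        if (refill > 0) {
            tokens = Math.min(capacity, tokens + refill);
            lastRefillNanos += refill * nanosPerToken;
        }
        if (tokens == 0) return false;  // reroute denied: rate limit hit
        tokens--;
        return true;
    }
}
```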
15. Operational runbooks for the SOR layer
The previous article required that operational runbooks be first-class. V14 ships five SOR-specific runbooks alongside the existing set. Each follows the established pattern: explicit signal, explicit actions, explicit kill-switch escalation.
SOR Routing Storm. Signal: per-strategy reroute-rate counter exceeds configured limit, or per-venue reject-rate counter exceeds configured limit, sustained over a configured window. Actions: pause new parent-intent submission for the affected strategy; allow live parents to complete or be canceled normally; investigate venue-side cause (rate-limiting, market-data quality, connectivity); escalate to kill switch only if the routing storm correlates with reconciliation issues or if the strategy’s own risk budget has been exceeded.
SOR Score Divergence. Signal: replay of recent ingress against a fresh cluster build produces routing decisions that differ from the original. Actions: this is a determinism failure; treat it as a build break in production. Halt the affected strategy. Capture the divergent decisions, the inputs, and the cold-path state snapshots. Run the V14 routing-decision-divergence test suite against the build to identify the source of non-determinism. Do not resume routing until the divergence is reproduced and resolved in test.
SOR Stale Routable Liquidity. Signal: an SOR parent rejects with reason MARKET_DATA_STALE, repeated across multiple parents. Actions: investigate the gateway-to-cluster market-data path for the affected venues for disconnect, FIX session degradation, simulator-vs-production divergence, or normalizer regression. Until resolved, parents that depend on the stale venue are routed only to venues whose data is fresh; if no venue’s data is fresh, the strategy is paused. Do not relax the staleness check; the staleness check is the protection.
SOR Reroute Budget Exhausted. Signal: a parent transitions to EXECUTION_ABORTED with reason REROUTE_BUDGET_EXCEEDED while children are still live. Actions: the parent’s residual leavesQty is unrouted and must be reconciled. Live children continue to fill or are canceled by the operator-initiated cancel flow. The trading strategy receives the parent terminal callback and decides whether to re-issue a new parent intent for the unfilled residual or accept the partial. Investigate whether the reroute budget was exhausted by genuine market conditions (acceptable; configure a higher budget if the conditions are expected) or by a routing pathology (regression; halt and reproduce).
SOR Venue Capacity Full. Signal: per-venue child-capacity counter approaches its configured bound for the SOR’s active children at a venue. Actions: the SOR begins routing away from that venue, weighted by the capacity headroom (a venue near its bound is scored lower regardless of price). If the capacity is genuinely exhausted, new child allocations to that venue are rejected; the SOR routes to alternatives. If alternatives are also exhausted, the parent fails with EXECUTION_ABORTED and the strategy is paused pending operator review of capacity configuration.
The pattern across all five is the established pattern: the SOR does not silently absorb anomalies; it surfaces them as parent terminal reasons or counter signals, and the runbooks specify exactly what the operator does when each one fires. The kill switch is the brake of last resort, not the only brake; the runbooks define the gradients between “investigate and continue” and “halt and reconcile” so the operator does not have to choose between ignoring everything and halting everything.
16. What V14 explicitly does not do
The V14 release boundaries are deliberate, and being explicit about them is part of the discipline of evidence the previous article argued for. The following capabilities are not in V14, will not be claimed in V14 documentation, and are tracked in the roadmap for explicit future releases:
Machine-learning-based venue scoring. V14’s scoring function is configurable and weight-driven. It is not a learned model. Adding an ML-based scorer is plausible, since the cold-path TCA loop already produces the data such a model would train on, but the engineering surface (training pipeline, model versioning, online vs. offline scoring, feature drift monitoring, deterministic inference on the hot path, replay correctness across model updates) is large and warrants its own release line. V14 ships a configured-weight scorer because the configured-weight scorer is what the V14 evidence proves.
Dark-pool routing. Crypto spot markets do not have meaningful dark-pool venues in the equities-market sense. Some crypto OTC desks offer dark-pool-like RFQ channels; integrating those would require an RFQ-style execution strategy that V14 does not include. RFQ-aware execution is roadmap territory.
Cross-asset-class routing. V14 routes one instrument across multiple venues. It does not route across instruments (e.g., choosing between BTC-USD, BTC-USDT, BTC-EUR for a USD-denominated parent) or across asset classes. Multi-instrument routing requires a stronger fair-value model and a cross-currency conversion layer; multi-asset-class routing requires venue adapters for non-spot products. Both are roadmap items.
Multi-leg routing. V14 does not support a single SOR parent that routes a multi-leg trade (e.g., a triangular arbitrage) across venues. Multi-leg execution is the previous phase’s MultiLegContingentExecution; combining it with SOR per-leg is V15+ work.
ISO sweeps and equities-market-specific order types. Reg NMS Intermarket Sweep Orders are equities-specific; V14 is crypto-spot. The roadmap entry exists for completeness.
Latency-arbitrage-style routing. V14 routes against the consolidated market’s external executable liquidity. It does not actively pursue stale quotes on slower venues following a faster venue’s move. Latency arbitrage is a trading-strategy concern, not an execution-strategy concern; if a firm wants it, the right place to add it is as a trading strategy on top of V14.
Production deployment of additional venues. V14’s venue universe in production is Coinbase, the same as the previous phase. Binance, Kraken, and other venues are roadmap items requiring the same venue-plugin discipline the prior release used for Coinbase: spec, plan, simulator, FIX integration, normalizer tests, live-wire E2E tests, deterministic replay coverage, JMH evidence, and pre-UAT certification. The SOR’s venueUniverse configuration is forward-looking; what runs in production is what has been certified.
The V14 release-evidence bundle covers exactly what V14 ships. The professional claim language is precise: “V14 introduces SmartOrderRoutingExecution as a deterministic, allocation-free, benchmark-gated execution strategy for routing single-instrument parent intents across configured venues, currently certified for the Coinbase venue plugin only.” Stronger claims wait for stronger evidence.
17. The V14 evidence bundle: what we actually proved
Section 13 covered the three rigor layers (ArchUnit, JMH benchmarks, and the preflight gate) that ensure V14’s deterministic and zero-allocation claims hold against the source code, the runtime behavior, and the release process. Those layers prove non-functional properties: determinism, allocation, latency bounds, configuration freshness. The V14 evidence bundle additionally covers functional correctness: that the SOR does what it claims to do under the workload patterns it must handle. That coverage comes from the test tiers below. All of these run under the V14 preflight gate alongside the rigor layers, and all are required to be current for the V14 release tag.
Unit tests. Cover Phase 1 routable-liquidity construction with empty/partial/full external liquidity views; Phase 2 scoring with edge weights and tie-breaking; Phase 3 allocation across all three modes with parent sizes from “fits in one level” to “exceeds total external liquidity”; Phase 4 encoding with all child order types V14 emits; Phase 5 transitions across all four child outcomes plus all four parent terminal reasons specific to SOR. Case-category coverage (positive, negative, edge, exception, failure) is owner-mapped per the established standard.
Integration tests. Wire the SOR through the cluster harness with normalized market data, verify cluster-side state evolution end to end, confirm OrderState.parentOrderId attribution, confirm ParentOrderRegistry parent state evolution, and confirm that pre-trade risk evaluates each child independently and aggregate-blocks correctly when the firm’s exposure approaches limits.
Simulator deterministic tests. Run the SOR against the deterministic L2/L3 simulator for parent intents covering the routing patterns the algorithm is designed to handle: trivial single-venue parents, multi-venue greedy fills, partial-fill reroutes, reject reroutes, cancel-fill races, partial-fill races, venue-disconnect handling, kill-switch escalations, and capacity-full conditions. Each scenario has explicit expected state evolution and the test asserts it byte-by-byte.
Live-wire simulator E2E tests. Run the SOR through the full Coinbase simulator FIX session loop: gateway FIX session, gateway disruptor, Aeron Cluster ingress, cluster service (books, risk, trading strategy, parent intent, SOR, child orders), cluster egress, gateway order command handler, Coinbase FIX order entry into simulator, simulator execution reports, gateway execution handler, cluster OrderManager / PortfolioEngine / RiskEngine / StrategyEngine. The V14 live-wire E2E adds three new test classes: SmartOrderRoutingMultiVenueLiveWireE2ETest (multi-venue allocation through the simulator’s emulated venues), SmartOrderRoutingPartialFillRerouteLiveWireE2ETest, and SmartOrderRoutingDisconnectRecoveryLiveWireE2ETest.
Deterministic replay tests. Recorded sessions including SOR parents are replayed against fresh cluster builds and required to produce byte-identical state evolution. The V14-specific routing-decision-divergence test runs each recorded session twice with intentionally different non-routing-affecting cold-path state and asserts identical routing decisions. This is the SOR-specific extension of the existing replay surface, and it is the test that catches non-determinism leaking from cold-path state into hot-path decisions.
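The shape of that divergence test can be sketched in miniature. The toy router and all names below are illustrative, not the NitroJEx API: a pure routing function is applied twice to the same normalized snapshot while unrelated cold-path state is perturbed between runs, and the decisions must match exactly.

```java
import java.util.Arrays;

// Illustrative miniature of a routing-decision-divergence test (not NitroJEx code).
// The toy router allocates venues in ascending fee-adjusted-price order.
final class DivergenceTest {
    static int[] route(long[] feeAdjustedPrices) {
        Integer[] idx = new Integer[feeAdjustedPrices.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // stable sort over a fixed key: same input snapshot, same decision, always
        Arrays.sort(idx, (a, b) -> Long.compare(feeAdjustedPrices[a], feeAdjustedPrices[b]));
        int[] out = new int[idx.length];
        for (int i = 0; i < idx.length; i++) out[i] = idx[i];
        return out;
    }

    public static void main(String[] args) {
        long[] snapshot = {4_215_000L, 4_214_800L, 4_215_100L}; // fee-adjusted prices in cents
        int[] first = route(snapshot);
        System.gc(); // stand-in for a cold-path perturbation that must not affect routing
        int[] second = route(snapshot);
        if (!Arrays.equals(first, second)) throw new AssertionError("routing diverged");
        System.out.println(Arrays.toString(first)); // [1, 0, 2]
    }
}
```

The real test operates on recorded ingress and full cluster state rather than a single array, but the property under test is the same: the decision is a pure function of the replayed input.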
Snapshot/load tests. Cluster snapshots taken mid-SOR-parent-lifecycle reload correctly: parent state, active children, parent-to-child mappings, fill aggregation, reroute counters, and scoring snapshots. Recovery on a fresh cluster from any snapshot point produces a parent that continues from the same state the original would have.
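The snapshot property reduces to a round-trip law: encode mid-lifecycle state, reload it, and require field-for-field equality. A minimal sketch of that law follows, with an invented four-field parent layout that is not the real snapshot format:

```java
import java.nio.ByteBuffer;

// Hypothetical round-trip sketch for the snapshot/load property. The field layout
// is illustrative; the real parent snapshot carries children, mappings, and scores.
final class ParentSnapshotRoundTrip {
    record ParentState(long parentId, long filled, long open, int rerouteCount) {
        void encode(ByteBuffer buf) {
            buf.putLong(parentId).putLong(filled).putLong(open).putInt(rerouteCount);
        }
        static ParentState decode(ByteBuffer buf) {
            return new ParentState(buf.getLong(), buf.getLong(), buf.getLong(), buf.getInt());
        }
    }

    public static void main(String[] args) {
        ParentState before = new ParentState(42L, 7_00000000L, 3_00000000L, 2);
        ByteBuffer buf = ByteBuffer.allocate(28); // 8 + 8 + 8 + 4 bytes
        before.encode(buf);
        buf.flip();
        ParentState after = ParentState.decode(buf);
        if (!before.equals(after)) throw new AssertionError("snapshot round-trip diverged");
        System.out.println("round-trip ok");
    }
}
```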
The functional tests above prove correctness under the workload patterns. The §13 rigor layers prove the non-functional properties. The preflight gate composes both into a single release-blocking check that passes only when every artifact is current, every configuration matches the production target, and every declared hot-path surface is covered. There is no path to a V14 release tag that bypasses any of this.
Measured results from the V14 release archive
The V14 evidence bundle includes the JMH artifacts produced during release gating. The numbers below are extracted from release-evidence/v14/jmh-latency-results.json and release-evidence/v14/jmh-allocation-results.json and reproduced here for orientation. They were generated in a developer WSL2 environment, not on production-class hardware — kernel-bypass NIC, isolated cores, fixed CPU governor, and a low-noise host are not present in this run.
| SOR Hot-Path Surface | Mean | p50 | p99 | p99.9 | Thread Allocated Bytes |
|---|---|---|---|---|---|
| Parent intent dispatch | 1.81 µs | 1.06 µs | 6.80 µs | 85.27 µs | 0.0 |
| Slice plan (allocation phase) | 0.63 µs | 0.37 µs | 2.59 µs | 30.50 µs | 0.0 |
| Re-slice path (reroute) | 4.51 µs | 2.68 µs | 32.54 µs | 188.65 µs | 0.0 |
| Order manager parent ack/fill/release cycle | 0.12 µs | 0.09 µs | 0.21 µs | 0.38 µs | 0.0 |
| Binance L2 normalizer event path | 0.54 µs | 0.55 µs | 0.57 µs | 0.57 µs | 0.0 |
The mean and p50 numbers are within an order of magnitude of the design targets in §8. The p99.9 tails are dominated by VM scheduling, laptop power management, OS interruptions, and profiler overhead; those tails are characteristic of the developer environment, not of pinned production hardware, and should not be quoted as production latency. What the measurement does establish is that the SOR’s hot-path surfaces operate in the microsecond regime under JMH on a developer laptop, with deterministic and replay-stable behavior. Quoting production-grade numbers waits for production-grade hardware.
The Thread Allocated Bytes column is the primary zero-allocation evidence for V14’s declared hot-path surfaces. It comes from the JMH thread-allocation profiler (-prof gc reports the same value field as measuredThreadAllocatedBytes), and it captures actual allocations performed by the JIT-compiled code under measurement — including escape-analysis decisions and lambda-capture allocations that source-level analysis cannot see. Across the SOR surfaces, the path measures zero bytes per operation after warmup. This is the empirical anchor that turns “the source obeys the rules ArchUnit checks” (§13) into “the runtime obeys the budgets the benchmarks measure” (§13).
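The per-thread allocation counter behind that measurement is exposed on HotSpot through com.sun.management.ThreadMXBean, which makes a rough version of the check easy to reproduce outside JMH. The sketch below is orientation only: it is not the release-gating tool, and it carries none of JMH's warmup and isolation discipline.

```java
import java.lang.management.ManagementFactory;

// Rough per-thread allocation probe using HotSpot's ThreadMXBean extension.
// JMH's thread-allocation profiler reads the same underlying counter with
// proper warmup control; this is only an orientation sketch.
public final class AllocProbe {
    public static long allocatedBy(Runnable body) {
        com.sun.management.ThreadMXBean tmx =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long tid = Thread.currentThread().getId();
        long before = tmx.getThreadAllocatedBytes(tid); // counter read, no allocation
        body.run();
        return tmx.getThreadAllocatedBytes(tid) - before;
    }

    public static void main(String[] args) {
        allocatedBy(() -> {}); // warmup: class loading happens outside measured runs
        long clean = allocatedBy(() -> {
            long s = 0;
            for (int i = 0; i < 1_000; i++) s += i; // primitive loop: no heap traffic
        });
        long dirty = allocatedBy(() -> {
            java.util.List<Long> l = new java.util.ArrayList<>();
            for (long i = 0; i < 1_000; i++) l.add(i); // boxing plus list growth
        });
        System.out.println("clean path bytes: " + clean + ", boxing path bytes: " + dirty);
    }
}
```

The comparison between the two paths, not the absolute numbers, is the useful signal in an uncontrolled environment.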
Release-gate summary archived alongside the JMH artifacts. All upstream gates passed: ./gradlew clean, ./gradlew check, ./gradlew e2eTest, ./gradlew :platform-benchmarks:jmh, ./gradlew :platform-benchmarks:jmhLatencyReport, scripts/v14-preflight-check.sh, and scripts/archive-v14-release-evidence.sh. The archive contains 95 test XML files covering 828 test cases (zero failures, zero errors, zero skipped), 49 allocation benchmark entries, and 44 latency benchmark entries (JMH 1.37 on OpenJDK 21.0.10).
What still requires manual evidence before live trading. Binance credential rotation and live FIX session signoff. Two-venue partial-outage and reconciliation rehearsal. Deployment, monitoring, alert routing, failover, disaster recovery, and rollback signoff. These are tracked in the V14 manual-evidence checklist alongside the automated bundle.
The professional claim attached to the V14 bundle is calibrated to match what the evidence actually establishes. It does not say “NitroJEx routes optimally across venues.” It says: “NitroJEx V14 introduces a deterministic, allocation-free, benchmark-gated single-instrument cross-venue Smart Order Router as a first-class execution strategy, certified for the Coinbase venue plugin under the documented capacity bounds, with the listed evidence available at release-evidence/v14/.” The first claim is marketing; the second is engineering. The evidence bundle is what makes the second claim defensible.
18. A checklist for the SOR layer
The questions below are an audit, derived from this article’s argument, for any SOR implementation claiming institutional readiness.
Definition. Have you been precise about what your SOR does and does not do? Does it sit between execution algorithms and the gateway, or has it absorbed responsibilities of either neighbor that should belong to them?
Architectural placement. Is the SOR a first-class execution strategy in your platform’s plugin model, or is it bolted onto a venue plugin or a trading strategy? Can the same trading strategy run with different routing behaviors by configuration?
Liquidity view. Does your SOR route against external executable liquidity, or against gross consolidated depth that includes your own resting orders? Does it have a pre-trade self-cross check across venues?
Fee/rebate awareness. Is your SOR’s decision price the fee-adjusted price, or the gross book price? Are maker rebates included where the order type qualifies?
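As a minimal illustration of the distinction (the method names are invented, and real venue fee schedules are tiered and order-type dependent), the decision price for a buy is the all-in price per unit, not the gross book price:

```java
// Hypothetical fee adjustment for a buy-side router. Fees and rebates are taken
// as flat basis points; real schedules vary by tier, order type, and venue.
final class FeeAdjustedPrice {
    // taker child: the fee is paid on top of the gross price
    static double takerBuy(double grossPx, double takerFeeBps) {
        return grossPx * (1.0 + takerFeeBps / 10_000.0);
    }

    // maker-eligible child: a rebate reduces the effective price
    static double makerBuy(double grossPx, double makerRebateBps) {
        return grossPx * (1.0 - makerRebateBps / 10_000.0);
    }

    public static void main(String[] args) {
        // the Binance line from the opening scenario: $42,150 with a 5 bps maker rebate
        System.out.printf("%.4f%n", makerBuy(42_150.0, 5.0)); // 42128.9250
    }
}
```

A production implementation would work in scaled integer ticks rather than doubles; the point here is only which price the comparison uses.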
Fill probability. Does your SOR maintain a per-(venue, instrument, side, size, price-distance) fill-probability surface? Does the surface update from realized fills? Is the update on a cold path that the hot path consumes as primitive values?
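One way to honor that contract, sketched with invented names and bucket counts: the surface is a flat primitive array indexed arithmetically, the hot path performs a single array read, and the cold path applies an exponentially weighted update from realized outcomes.

```java
// Hypothetical fill-probability surface. Dimensions, names, and the EWMA update
// are illustrative; the contract being shown is hot-path reads of primitives
// with all model maintenance on the cold path.
final class FillProbabilitySurface {
    static final int VENUES = 4, SIDES = 2, SIZE_BUCKETS = 8, DIST_BUCKETS = 8;
    private final double[] p = new double[VENUES * SIDES * SIZE_BUCKETS * DIST_BUCKETS];

    private static int idx(int venue, int side, int sizeB, int distB) {
        return ((venue * SIDES + side) * SIZE_BUCKETS + sizeB) * DIST_BUCKETS + distB;
    }

    // hot path: one array read, no boxing, no lookup objects
    double probability(int venue, int side, int sizeB, int distB) {
        return p[idx(venue, side, sizeB, distB)];
    }

    // cold path: exponentially weighted update from a realized fill outcome
    void onRealizedOutcome(int venue, int side, int sizeB, int distB, boolean filled, double alpha) {
        int i = idx(venue, side, sizeB, distB);
        p[i] += alpha * ((filled ? 1.0 : 0.0) - p[i]);
    }

    public static void main(String[] args) {
        FillProbabilitySurface s = new FillProbabilitySurface();
        for (int i = 0; i < 3; i++) s.onRealizedOutcome(1, 0, 2, 3, true, 0.5);
        System.out.println(s.probability(1, 0, 2, 3)); // 0.875
    }
}
```

In a single-writer cluster the update and the read run on the same thread, so the primitive array needs no synchronization; in other threading models the publication mechanism matters.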
Splitting algorithm. Have you chosen the simplest allocation that meets your workload’s requirements? Do you have evidence that more sophisticated allocations earn their cost on your actual parent-size distribution?
Determinism. Are wall-clock reads forbidden on the SOR hot path? Are hash-map iteration orders deterministic (primitive int keys, sorted output)? Is randomness, if any, seeded from replayable input? Do you have a routing-decision-divergence test?
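The iteration-order point can be made concrete with a small sketch (hypothetical names; on a real hot path the copy would go into a preallocated reusable buffer rather than a fresh clone):

```java
import java.util.Arrays;

// Deterministic venue iteration: venue ids are primitive ints, and sorting a
// copied key array removes any dependence on hash-map internal ordering.
final class DeterministicIteration {
    static int[] deterministicOrder(int[] venueIds) {
        int[] order = venueIds.clone(); // illustrative; hot paths reuse a buffer instead
        Arrays.sort(order);             // total order independent of insertion/hash order
        return order;
    }

    public static void main(String[] args) {
        // the same venue set presented in two different arrival orders
        int[] a = deterministicOrder(new int[] {7, 2, 9, 4});
        int[] b = deterministicOrder(new int[] {9, 4, 7, 2});
        if (!Arrays.equals(a, b)) throw new AssertionError("iteration order diverged");
        System.out.println(Arrays.toString(a)); // [2, 4, 7, 9]
    }
}
```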
Race conditions. Are your parent invariants documented and tested? Does cumulative-consistency hold across all children at every observable state? Are duplicate execution reports idempotent at parent level, not just child level?
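A miniature of the two invariants, with illustrative names rather than the NitroJEx types: quantity is conserved across filled, open, and unallocated buckets at every observable state, and cumulative-quantity deltas make duplicated execution reports no-ops at the parent level.

```java
// Hypothetical sketch of parent-level conservation and duplicate-report
// idempotency. Field and method names are illustrative, not the NitroJEx API.
final class ParentInvariant {
    final long parentQty;
    long filled, openOnChildren, unallocated;
    final long[] childCumQty; // last-seen cumulative fill per child, for idempotency

    ParentInvariant(long parentQty, int children) {
        this.parentQty = parentQty;
        this.unallocated = parentQty;
        this.childCumQty = new long[children];
    }

    void allocate(int child, long qty) { unallocated -= qty; openOnChildren += qty; check(); }

    // Execution reports carry cumulative quantity; applying the delta against the
    // last-seen value makes a duplicated or stale report a no-op at parent level.
    void onExecutionReport(int child, long cumQty) {
        long delta = cumQty - childCumQty[child];
        if (delta <= 0) return; // duplicate or stale: idempotent by construction
        childCumQty[child] = cumQty;
        filled += delta;
        openOnChildren -= delta;
        check();
    }

    void check() {
        if (filled + openOnChildren + unallocated != parentQty)
            throw new IllegalStateException("cumulative-consistency violated");
    }

    public static void main(String[] args) {
        ParentInvariant p = new ParentInvariant(10, 2);
        p.allocate(0, 6);
        p.allocate(1, 4);
        p.onExecutionReport(0, 6);
        p.onExecutionReport(0, 6); // duplicated report: must not double-count
        if (p.filled != 6) throw new AssertionError("duplicate was not idempotent");
        System.out.println("invariants hold");
    }
}
```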
Latency budget. Have you defined per-phase and end-to-end latency budgets? Are they benchmarked with JMH at fixed event rates that address coordinated omission? Do you complement JMH with HDR-Histogram-based production observability?
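Coordinated omission is worth spelling out. HdrHistogram's recordValueWithExpectedInterval performs the correction; the hand-written sketch below shows what that correction does, assuming a fixed expected inter-request interval: a stall hides the requests that would have been issued during it, so their implied latencies are back-filled.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration of coordinated-omission correction, i.e. what
// HdrHistogram's recordValueWithExpectedInterval does internally: a stall longer
// than the expected sampling interval hides requests; back-fill their latencies.
final class CoordinatedOmission {
    static List<Long> correctedSamples(long latencyNanos, long expectedIntervalNanos) {
        List<Long> samples = new ArrayList<>();
        samples.add(latencyNanos);
        // each request that should have run during the stall would have waited less
        for (long v = latencyNanos - expectedIntervalNanos; v > 0; v -= expectedIntervalNanos)
            samples.add(v);
        return samples;
    }

    public static void main(String[] args) {
        // a 5 ms stall at a 1 ms expected interval hides four extra waiting requests
        List<Long> s = correctedSamples(5_000_000L, 1_000_000L);
        System.out.println(s.size() + " samples recorded instead of 1");
    }
}
```

Without the correction, a benchmark that issues the next request only after the previous one returns under-reports exactly the tail the budget is supposed to bound.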
Risk integration. Does every child pass through pre-trade risk independently? Does the SOR enforce parent-level conservation laws before per-child risk evaluation? Do you have per-strategy reroute-rate limits to protect against routing-storm pathologies?
Determinism testing. Are SOR-specific deterministic-replay tests part of your release-evidence chain? Are routing decisions byte-identical across replays of the same ingress?
Operational runbooks. Do you have explicit runbooks for routing storms, score divergence, stale routable liquidity, reroute-budget exhaustion, and venue capacity full? Do they have explicit conditions, explicit actions, and explicit kill-switch escalations?
Three-layer defense. Do you enforce architectural rules at source level (ArchUnit or equivalent) so violations fail the build? Do you prove runtime allocation and latency behavior empirically (JMH with -prof gc or equivalent) so the JIT-compiled code is verified, not assumed? Do you orchestrate both into a release-blocking preflight gate that refuses to ship without current evidence under the production-target configuration? If any of the three layers is missing, which class of failure are you accepting?
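The source-level layer can be miniaturized without the ArchUnit dependency to show the shape of the check. The real gate uses ArchUnit rules; this reflection sketch only demonstrates the kind of property such a rule pins down, here a hypothetical ban on HashMap fields in hot-path classes, whose iteration order is not deterministic:

```java
import java.lang.reflect.Field;
import java.util.HashMap;

// Hand-rolled miniature of an ArchUnit-style source rule (illustrative only):
// hot-path classes must not hold HashMap fields, because HashMap iteration
// order is not deterministic and resize allocates.
final class HotPathRule {
    static final class GoodHotPath { final long[] venueScores = new long[8]; }
    static final class BadHotPath  { final HashMap<Integer, Long> scores = new HashMap<>(); }

    static boolean violates(Class<?> hotPathClass) {
        for (Field f : hotPathClass.getDeclaredFields())
            if (HashMap.class.isAssignableFrom(f.getType())) return true;
        return false;
    }

    public static void main(String[] args) {
        if (violates(GoodHotPath.class)) throw new AssertionError();
        if (!violates(BadHotPath.class)) throw new AssertionError();
        System.out.println("rule holds");
    }
}
```

ArchUnit generalizes this to package dependencies, annotations, and call graphs, and fails the build rather than a runtime check.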
Evidence discipline. Is your SOR’s release-evidence bundle current? Are the architectural claims, empirical claims, and roadmap claims kept distinct? Have you written down what your SOR explicitly does not do?
If your SOR fails any of these checks, the gap will eventually surface. Routing systems fail badly when they fail; the cost is borne in basis points, repeatedly, until the gap is closed. The audit is cheap; the failure is not.
Closing thoughts
The Smart Order Router is one of the layers of a trading platform where the gap between marketing claims and engineering reality is widest. Vendors describe their SORs in ways that imply capabilities the implementations do not have, and engineers building competing systems inherit the confused vocabulary as if it were technical specification. Cutting through that confusion is part of why this article exists.
The other part is more constructive. A serious SOR is not a black box. It is a layer whose decisions can be explained, measured, replayed, and improved. The architectural moves that make it explainable are not specific to NitroJEx: separation from execution algorithms and trading strategies, the same plugin contract as other execution strategies, the same hot/cold split, the same determinism, the same evidence discipline. They are the standard moves of institutional-grade trading-system engineering, applied to a layer that often escapes them.
NitroJEx V14 is a working reference for those moves. The SOR is a deterministic, allocation-free, benchmark-gated execution strategy that respects every architectural rule the existing platform imposes. Its scoring is configurable, its allocation is simple by default and sophisticated by option, its reroute logic is invariant-driven, its determinism is replay-tested, its risk integration is unbroken, and its operational runbooks are explicit. The release-evidence bundle is what closes the loop between architectural claim and empirical fact.
What V14 does not do is also explicit. It ships no learned model, no cross-asset routing, no multi-leg parents, no latency arbitrage, and no production venues beyond Coinbase. The boundaries are a feature, not a limitation; an SOR that does fewer things well is more honest than one that does many things poorly.
The structural lesson, drawn from this article and the previous one, is consistent: architectural quality and evidence discipline are coupled, and both compound. A platform that gets the architecture right earns the right to make strong claims; a platform that maintains evidence discipline earns the right to keep making them as the system evolves. Each release builds on the last, with the boundaries of what is proven moving outward at the rate the engineering can sustain. There is no shortcut, and no version of this work that you can do at half the rigor for half the result.
NitroJEx is open source and under active development. The full source, specifications (including the V14 master spec and the migration document from the previous phase), implementation plans, release-evidence bundles, and platform code are at github.com/rueishi/nitroj-exchange. Engineers working on similar problems are welcome to engage through issues, discussions, or contributions. The project is most useful as a public artifact that documents how the patterns actually fit together; the more eyes on the architecture, the better the artifact gets.
Migrating to V14. If your strategy already runs on ImmediateLimitExecution or PostOnlyQuoteExecution, moving to SmartOrderRoutingExecution is configuration-only — no code changes in the trading strategy. The migration document in the repository walks through the configuration profile, the new pairing-validation rules, the additional evidence-bundle artifacts that gate the V14 release, and the V14-specific runbooks. The default scoring weights and allocation mode are tuned for the parent shapes the existing strategies emit, so the typical migration is a one-line configuration change followed by a fresh evidence bundle run through the preflight gate.
Get the structural choices right early, document them honestly, and prove them with evidence. The Smart Order Router is one more layer where that discipline pays off, and one more layer that fails predictably when it doesn’t.