<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Akin Ocal]]></title><description><![CDATA[C++ developer specialising in low latency electronic trading systems : https://github.com/akhin]]></description><link>https://akinocal1.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!3uH8!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fakinocal1.substack.com%2Fimg%2Fsubstack.png</url><title>Akin Ocal</title><link>https://akinocal1.substack.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 11 Apr 2026 03:14:55 GMT</lastBuildDate><atom:link href="https://akinocal1.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Akin Ocal]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[akinocal1@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[akinocal1@substack.com]]></itunes:email><itunes:name><![CDATA[Akin Ocal]]></itunes:name></itunes:owner><itunes:author><![CDATA[Akin Ocal]]></itunes:author><googleplay:owner><![CDATA[akinocal1@substack.com]]></googleplay:owner><googleplay:email><![CDATA[akinocal1@substack.com]]></googleplay:email><googleplay:author><![CDATA[Akin Ocal]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[How to achieve P90 sub-microsecond latency in a C++ FIX engine]]></title><description><![CDATA[Techniques and benchmarks behind a sub-microsecond P90 implementation]]></description><link>https://akinocal1.substack.com/p/how-to-achieve-p90-sub-microsecond</link><guid isPermaLink="false">https://akinocal1.substack.com/p/how-to-achieve-p90-sub-microsecond</guid><dc:creator><![CDATA[Akin Ocal]]></dc:creator><pubDate>Tue, 07 
Apr 2026 11:02:48 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/4818f0d0-e6cc-4ee2-8211-6de6071bfadc_728x369.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 style="text-align: center;"><strong>Introduction</strong></h3><p>In this article, we explore what it takes to push FIX message encoding and send latency below one microsecond at P90 (the 90th percentile).</p><p>We achieved a P90 of 685 nanoseconds for full FIX message encoding, file system persistence, and NIC enqueue on an AMD Solarflare NIC using llfix, a low-latency FIX engine.</p><p></p><p>Achieving this level of latency is not purely a software problem. It requires a combination of hardware selection and tuning, operating-system-level tuning, and careful software design:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JxAB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14608e93-37d2-45fa-85c2-0e213a279078_253x295.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JxAB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14608e93-37d2-45fa-85c2-0e213a279078_253x295.png 424w, https://substackcdn.com/image/fetch/$s_!JxAB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14608e93-37d2-45fa-85c2-0e213a279078_253x295.png 848w, https://substackcdn.com/image/fetch/$s_!JxAB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14608e93-37d2-45fa-85c2-0e213a279078_253x295.png 1272w, 
https://substackcdn.com/image/fetch/$s_!JxAB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14608e93-37d2-45fa-85c2-0e213a279078_253x295.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JxAB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14608e93-37d2-45fa-85c2-0e213a279078_253x295.png" width="253" height="295" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/14608e93-37d2-45fa-85c2-0e213a279078_253x295.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:295,&quot;width&quot;:253,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14090,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://akinocal1.substack.com/i/193450018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14608e93-37d2-45fa-85c2-0e213a279078_253x295.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JxAB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14608e93-37d2-45fa-85c2-0e213a279078_253x295.png 424w, https://substackcdn.com/image/fetch/$s_!JxAB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14608e93-37d2-45fa-85c2-0e213a279078_253x295.png 848w, https://substackcdn.com/image/fetch/$s_!JxAB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14608e93-37d2-45fa-85c2-0e213a279078_253x295.png 
1272w, https://substackcdn.com/image/fetch/$s_!JxAB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14608e93-37d2-45fa-85c2-0e213a279078_253x295.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The following sections focus primarily on software-level techniques, while the benchmarks section outlines the applied hardware and OS tunings.</p><p></p><p>The benchmarks and techniques presented in this article are based on <em>llfix</em>, a low-latency C++ FIX engine, available as both an open-source edition (<a 
href="https://github.com/CorewareLtd/llfix">https://github.com/CorewareLtd/llfix</a>) and a commercial edition (<a href="http://www.llfix.net/">www.llfix.net</a>).</p><h3 style="text-align: center;"><strong>Basics of FIX and what we are measuring</strong></h3><p>The FIX protocol is a TCP-based messaging protocol used in electronic trading. A typical use case is &#8220;order entry&#8221;, that is, sending orders to trading venues. A FIX message is simply a sequence of ASCII-encoded tag=value pairs, separated by the SOH character (ASCII 0x01).</p><p>As an example, this is the message we used in the benchmarks (SOH characters replaced by pipes):</p><p></p><p>8=FIXT.1.1|9=218|35=D|34=2|49=CLIENT1|52=20251231-18:21:36.457245600|</p><p>56=EXECUTOR|50=SNDR_SUB|57=SRVR_SUB|11=1|55=NOKIA.HE|54=1|38=10| 44=10000|40=2|59=0|453=2|448=PARTY1|447=D|452=1|448=PARTY2|447=D|452=3| 60=20251231-18:21:36.457245600|10=221|</p><p></p><p>In low-latency trading, exchanges also provide native binary order entry protocols, such as OUCH, BOE, and ETI, typically at higher cost. These protocols are not ASCII-encoded but operate on compact binary representations, and their messages are often fixed in size.</p><p></p><p>This provides a structural advantage: fixed-size binary messages are simpler to parse, more cache-friendly, and avoid the overhead of string handling.</p><p></p><p>In contrast, the FIX protocol is string-based and inherently variable in length. As a result, achieving comparable latency with FIX is more challenging, requiring careful optimisation of encoding, parsing, and memory access patterns.</p><p></p><p>As for the benchmarks, we measure the combined latency of the following steps:</p><p><strong>a. FIX message encoding:</strong> In a FIX engine, users interact with the engine by calling methods to set tag-value pairs. 
Encoding means computing the message body length for tag 9 (BodyLength) and the checksum for tag 10 (CheckSum), then concatenating all tag-value pairs, including the equals signs and SOH delimiters, into a contiguous byte stream.</p><p><strong>b. Message serialisation to file system:</strong> Many financial systems are required by law to keep records of all transactions, including the exact data sent and received.</p><p><strong>c. Enqueuing to NIC:</strong> This is the process of placing the encoded message into a network interface card (NIC) queue for transmission over the network. Note that the measurements in this article do not include wire-to-wire latencies.</p><p></p><h3 style="text-align: center;"><strong>Benchmarks</strong></h3><p>The benchmark server details are as follows:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/YsLpd/2/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ad6c49aa-0947-4f01-80f9-72c474a32036_1220x594.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0082813f-cb09-4998-850c-d3be9bf482c0_1220x594.png&quot;,&quot;height&quot;:289,&quot;title&quot;:&quot;Created with Datawrapper&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/YsLpd/2/" width="730" height="289" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>As mentioned in the introduction, the major tunings applied are as follows:</p><p>- Benchmark 
threads were pinned to isolated CPU cores.</p><p>- CPU frequency was maximised and hyperthreading was disabled.</p><p>Each benchmark first sent 1,000 FIX messages as warm-up and then sent 1 million FIX messages. Latency is measured with the CPU timestamp counter, read via the RDTSCP instruction.</p><p>The compiler flags used for all builds are:<br><code>-DNDEBUG -O3 -march=native -mtune=native</code></p><p>For comparison, we include the well-known open-source FIX engines <strong>QuickFIX</strong> and <strong>FIX8</strong>, all benchmarked under identical conditions.</p><p>All benchmark source code is available at:</p><p><a href="https://github.com/CorewareLtd/llfix/tree/main/benchmarks/networked_client_tx">https://github.com/CorewareLtd/llfix/tree/main/benchmarks/networked_client_tx</a></p><p><a href="https://github.com/CorewareLtd/llfix/tree/main/benchmarks/networked_client_tx_fix8">https://github.com/CorewareLtd/llfix/tree/main/benchmarks/networked_client_tx_fix8</a></p><p><a href="https://github.com/CorewareLtd/llfix/tree/main/benchmarks/networked_client_tx_quickfix">https://github.com/CorewareLtd/llfix/tree/main/benchmarks/networked_client_tx_quickfix</a></p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/41SrV/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/805d1e32-27e3-49f4-bc8f-b9f551c1b492_1220x1488.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/148ec3eb-7e4f-4834-addc-9d641fe6cb23_1220x1488.png&quot;,&quot;height&quot;:772,&quot;title&quot;:&quot;Created with Datawrapper&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/41SrV/1/" width="730" height="772" frameborder="0" scrolling="no"></iframe><script 
type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><h3 style="text-align: center;"><strong>1. Cache friendly design</strong></h3><p>The <strong>memory wall</strong> is one of the most critical constraints in low-latency systems. It arises when CPU execution speed significantly outpaces the bandwidth and latency of the memory hierarchy :</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DE3m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296a8c40-48cb-497e-82dd-190b00829dd6_878x525.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DE3m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296a8c40-48cb-497e-82dd-190b00829dd6_878x525.png 424w, https://substackcdn.com/image/fetch/$s_!DE3m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296a8c40-48cb-497e-82dd-190b00829dd6_878x525.png 848w, https://substackcdn.com/image/fetch/$s_!DE3m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296a8c40-48cb-497e-82dd-190b00829dd6_878x525.png 1272w, https://substackcdn.com/image/fetch/$s_!DE3m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296a8c40-48cb-497e-82dd-190b00829dd6_878x525.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DE3m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296a8c40-48cb-497e-82dd-190b00829dd6_878x525.png" width="878" height="525" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/296a8c40-48cb-497e-82dd-190b00829dd6_878x525.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:525,&quot;width&quot;:878,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:59175,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://akinocal1.substack.com/i/193450018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296a8c40-48cb-497e-82dd-190b00829dd6_878x525.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DE3m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296a8c40-48cb-497e-82dd-190b00829dd6_878x525.png 424w, https://substackcdn.com/image/fetch/$s_!DE3m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296a8c40-48cb-497e-82dd-190b00829dd6_878x525.png 848w, https://substackcdn.com/image/fetch/$s_!DE3m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296a8c40-48cb-497e-82dd-190b00829dd6_878x525.png 1272w, https://substackcdn.com/image/fetch/$s_!DE3m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F296a8c40-48cb-497e-82dd-190b00829dd6_878x525.png 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>(The memory hierarchy diagram is taken from the Microarchitecture cheatsheet <a href="https://github.com/akhin/microarchitecture-cheatsheet">https://github.com/akhin/microarchitecture-cheatsheet</a> )</p><p></p><p>The latency degradation becomes worse as code moves further away from L1 cache residency and starts incurring L2, L3, and ultimately main memory accesses. For this reason, <strong>data layout must be treated as a first-class design concern from the very beginning</strong>, particularly in low-latency trading systems.</p><p></p><p><strong>a. 
Separation of incoming and outgoing messages:</strong> One of the key design decisions in llfix is to use separate data structures for <strong>incoming</strong> and <strong>outgoing</strong> FIX messages.</p><p>Incoming messages require flexible, tag-by-tag access, which naturally leads to the use of hash-based dictionaries.</p><p>Outgoing messages, however, do not have the same requirements. They are constructed once and encoded sequentially. Therefore, llfix represents outgoing messages using a simple <code>std::vector</code> of fields.</p><p>This avoids pointer chasing and unnecessary indirections, improving cache locality and reducing latency during encoding.</p><p><strong>b. Using pools and reusable message instances per FIX session:</strong> It is important to keep a FIX message&#8217;s data in contiguous memory to avoid indirections and the associated memory access penalties.</p><p>Heap allocations are also a well-known source of latency spikes due to allocator slow paths and potential system calls. Beyond that, they can hurt cache locality through fragmentation and non-contiguous memory layouts.</p><p>To address both issues, llfix uses <strong>preallocated memory pools</strong> for both incoming and outgoing messages. This keeps message data in contiguous memory, which improves cache locality.</p><p>Additionally, each FIX session maintains <strong>a single reusable instance</strong> each for incoming and outgoing messages. This further eliminates allocation overhead and keeps hot data consistently resident in cache.</p><p><strong>c. No internal message queuing:</strong> Message queuing introduces additional memory traversal, pointer indirection, and cache misses. 
Unlike general-purpose FIX engines, llfix does not introduce internal message queues between the application and the engine.</p><p>Instead, llfix operates in <strong>immediate mode</strong>, where messages are processed and encoded directly upon invocation.</p><p></p><h3 style="text-align: center;"><strong>2. Memory-mapped files for message serialisation</strong></h3><p>Most trading systems are required to store all messages under the regulatory rules of their region. Persisting FIX messages is also critical for automated testing, replay scenarios, and comparison against drop copies for reconciliation and record-keeping.</p><p>As a result, efficient message persistence is a core requirement of any FIX engine.</p><p>For this purpose, llfix uses the operating system&#8217;s memory-mapped file facility. Memory-mapped I/O maps a file directly into the process&#8217;s virtual address space, allowing writes to be performed as simple memory stores rather than explicit file I/O operations.</p><p>The advantages over traditional buffered file I/O are:</p><p>- Fewer data copies: stores go straight to the mapped file pages instead of being copied from a user-space buffer into the kernel</p><p>- Lower system call overhead, since no write call is needed per message</p><p>The latency impact of serialising messages via memory-mapped files is shown below:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/LV5kD/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8a547017-628b-46f5-902b-63cbb795c1f6_1220x388.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ccf711c5-d0e7-40f5-b287-0799d2e35a91_1220x388.png&quot;,&quot;height&quot;:188,&quot;title&quot;:&quot;Created with Datawrapper&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" 
src="https://datawrapper.dwcdn.net/LV5kD/1/" width="730" height="188" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>The source code for this benchmark is available at: <a href="https://github.com/CorewareLtd/llfix/tree/main/benchmarks/encoder">https://github.com/CorewareLtd/llfix/tree/main/benchmarks/encoder</a></p><p>One important consideration when using memory-mapped I/O is that mappings have a fixed size and are managed at page granularity. This requires preallocating files to the mapped size and designing a rotation strategy for when a file fills up. In practice, mapping sizes should be a multiple of the system&#8217;s virtual memory page size (typically 4096 bytes).</p><p></p><h3 style="text-align: center;"><strong>3. 
Kernel bypass &amp; BSD socket interface bypass</strong></h3><p>A well-known problem in low-latency trading is the overhead of the traditional Linux network I/O path:</p><p>- Additional data copies between kernel space and user space</p><p>- Deep call stacks</p><p>- The cost of system calls</p><p></p><p>The diagram below illustrates the lifecycle of a packet under the standard Linux networking stack and highlights these overheads:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dX7u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F972484d5-63ab-4cef-9aaf-444ea101a835_796x674.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dX7u!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F972484d5-63ab-4cef-9aaf-444ea101a835_796x674.png 424w, https://substackcdn.com/image/fetch/$s_!dX7u!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F972484d5-63ab-4cef-9aaf-444ea101a835_796x674.png 848w, https://substackcdn.com/image/fetch/$s_!dX7u!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F972484d5-63ab-4cef-9aaf-444ea101a835_796x674.png 1272w, https://substackcdn.com/image/fetch/$s_!dX7u!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F972484d5-63ab-4cef-9aaf-444ea101a835_796x674.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!dX7u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F972484d5-63ab-4cef-9aaf-444ea101a835_796x674.png" width="796" height="674" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/972484d5-63ab-4cef-9aaf-444ea101a835_796x674.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:674,&quot;width&quot;:796,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:60533,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://akinocal1.substack.com/i/193450018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F972484d5-63ab-4cef-9aaf-444ea101a835_796x674.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dX7u!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F972484d5-63ab-4cef-9aaf-444ea101a835_796x674.png 424w, https://substackcdn.com/image/fetch/$s_!dX7u!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F972484d5-63ab-4cef-9aaf-444ea101a835_796x674.png 848w, https://substackcdn.com/image/fetch/$s_!dX7u!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F972484d5-63ab-4cef-9aaf-444ea101a835_796x674.png 1272w, https://substackcdn.com/image/fetch/$s_!dX7u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F972484d5-63ab-4cef-9aaf-444ea101a835_796x674.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p>A widely adopted solution is <strong>kernel bypass</strong>, where the traditional kernel networking path is avoided. This is typically achieved through a combination of user-space TCP/IP stacks and specialised NIC hardware APIs.</p><p>Several vendors provide such solutions with their own user-space TCP stacks, including AMD&#8217;s Solarflare and NVIDIA&#8217;s Mellanox. 
These approaches eliminate kernel involvement in the data path, significantly reducing latency.</p><p>The diagram below illustrates the difference between the traditional kernel-based stack and user-space networking stacks :</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LO8J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd9ed89-8959-4255-bf9e-838fca32c84f_466x396.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LO8J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd9ed89-8959-4255-bf9e-838fca32c84f_466x396.png 424w, https://substackcdn.com/image/fetch/$s_!LO8J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd9ed89-8959-4255-bf9e-838fca32c84f_466x396.png 848w, https://substackcdn.com/image/fetch/$s_!LO8J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd9ed89-8959-4255-bf9e-838fca32c84f_466x396.png 1272w, https://substackcdn.com/image/fetch/$s_!LO8J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd9ed89-8959-4255-bf9e-838fca32c84f_466x396.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LO8J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd9ed89-8959-4255-bf9e-838fca32c84f_466x396.png" width="466" height="396" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dbd9ed89-8959-4255-bf9e-838fca32c84f_466x396.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:396,&quot;width&quot;:466,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:22010,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://akinocal1.substack.com/i/193450018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd9ed89-8959-4255-bf9e-838fca32c84f_466x396.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LO8J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd9ed89-8959-4255-bf9e-838fca32c84f_466x396.png 424w, https://substackcdn.com/image/fetch/$s_!LO8J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd9ed89-8959-4255-bf9e-838fca32c84f_466x396.png 848w, https://substackcdn.com/image/fetch/$s_!LO8J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd9ed89-8959-4255-bf9e-838fca32c84f_466x396.png 1272w, https://substackcdn.com/image/fetch/$s_!LO8J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbd9ed89-8959-4255-bf9e-838fca32c84f_466x396.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>The table below compares latency in the same benchmark using standard Linux BSD sockets versus Solarflare Onload:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/Xrl0i/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/909fecc0-7cdb-4a0d-aeff-8b173670388d_1220x484.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5b963f6-d50c-4f67-a3c6-517906d0ff31_1220x484.png&quot;,&quot;height&quot;:253,&quot;title&quot;:&quot;Created with Datawrapper&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/Xrl0i/1/" width="730" height="253" frameborder="0" 
scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>These solutions require no changes to application code. Libraries such as Onload can transparently override BSD socket calls via <code>LD_PRELOAD</code>, allowing existing applications to benefit from reduced latency without recompilation.</p><p>In addition, vendors often provide proprietary APIs that offer a reduced feature set in exchange for even lower latency. The llfix commercial edition uses the Solarflare TCPDirect API to reduce latency further. The table below compares Solarflare Onload with TCPDirect:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/brBlR/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c7b3cdd3-95d6-425d-9f30-2e9aaa0e3de7_1220x452.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/efa91cc2-b2e9-411d-8105-db8158ee351a_1220x452.png&quot;,&quot;height&quot;:236,&quot;title&quot;:&quot;Created with Datawrapper&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/brBlR/1/" width="730" height="236" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var 
r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><h3 style="text-align: center;">4. Others</h3><p><strong>a. Syscall avoidance: </strong>The primary overhead of a syscall comes from the transition between user mode and kernel mode. On x86-64 Linux this transition is made with the <code>syscall</code> instruction (the <code>int 0x80</code> trap is the legacy 32-bit mechanism). During the transition, CPU registers must be saved and restored, which adds latency.</p><p>To avoid syscalls, llfix uses the methods below:</p><p>- Using a memory pool instead of heap allocations (as discussed previously)</p><p>- Kernel bypass (as discussed previously)</p><p>- Using vDSO functions for timestamps</p><p><strong>vDSO</strong> (virtual dynamic shared object) is a small shared library that the Linux kernel maps into the address space of every process. It provides user-space implementations of a few frequently used syscalls, such as <code>clock_gettime</code> and <code>gettimeofday</code>, so programs can call them without entering kernel mode, avoiding the mode switch and improving performance.</p><p>You can view the vDSO functionality used by llfix at the following link: <a href="https://github.com/CorewareLtd/llfix/blob/main/include/llfix/core/os/vdso.h">https://github.com/CorewareLtd/llfix/blob/main/include/llfix/core/os/vdso.h</a></p><p><strong>b. SIMD: </strong>SIMD (Single Instruction, Multiple Data) intrinsics let a single instruction operate on multiple data elements in parallel. 
In llfix, we use AVX2 SIMD instructions for checksum computation and validation.</p><p>Below is a benchmark comparison of 1 million iterations of checksum computation with and without SIMD (AVX2):</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/kAb2y/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4af29c7f-bb0d-40ce-af7b-678da0f63e24_1220x546.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8809e644-a8a9-4d4f-ae57-69c3b42f39df_1220x546.png&quot;,&quot;height&quot;:267,&quot;title&quot;:&quot;Created with Datawrapper&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/kAb2y/1/" width="730" height="267" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>You can explore the SIMD-enabled methods at the following link: <a href="https://github.com/CorewareLtd/llfix/blob/main/include/llfix/fix_utilities.h">https://github.com/CorewareLtd/llfix/blob/main/include/llfix/fix_utilities.h</a></p><h3 style="text-align: center;">Conclusion</h3><p>Achieving sub-microsecond latency at P90 in a FIX engine is not solely the result of software optimisation, but rather the combined effect of several factors, including hardware and OS tuning.</p><p>On the software side, cache locality and the elimination of memory allocations along critical paths are fundamental to minimising latency by avoiding the high cost of cache 
misses. Additionally, techniques such as memory-mapped files for message serialisation and syscall avoidance further reduce overhead.</p><p>Ultimately, network stack optimisation plays a pivotal role in achieving these latencies. By employing kernel bypass and replacing the BSD sockets interface with a lower-level vendor API (e.g., Solarflare TCPDirect), we can eliminate most of the remaining latency introduced by the OS. Combined, these efforts make it possible to consistently maintain sub-microsecond latency, even at higher percentiles such as P90.</p>]]></content:encoded></item></channel></rss>