30x to 70x faster than mitmproxy/mitmdump, 4x faster than Squid

I was recently contacted by a user asking about the performance overhead when using impersonation rules in Fluxzy. Since I had never conducted precise benchmarks before, I decided to set something up in a way that would allow for reproducible benchmarks across various configurations.

The idea was straightforward: measure basic performance indicators such as the number of requests per second and bandwidth usage, without all the rocket science typically found in benchmarks.

I then decided to take it a step further and compare Fluxzy CLI with similar tools.

As a reminder, Fluxzy CLI is an open-source command-line application that acts as an HTTP intermediary, allowing various types of manipulation and recording of HTTP(S) traffic.

Since most MITM tools are either closed-source, paid, or restricted by non-comparison clauses, this test compares Fluxzy CLI only with mitmproxy/mitmdump, the console-based counterparts optimized for quick traffic dumping. To provide reference points, the benchmark is also executed under two additional configurations: one without a proxy and another using a well-known proxy, Squid, configured with caching disabled and tested with plain HTTP only.

Setting Up the Benchmark

To carry out this experiment, we will need an HTTP client capable of performing benchmark tests and an HTTP server that is fast enough to handle the workload. The tests should be executed locally to avoid any influence from network-related factors.

The setup for this test is documented in the following repository: https://github.com/haga-rak/floody and follows this simple schema:

+---------+       +-----------+       +-----------+
|  floody |------>|   Proxy   |------>|  floodys  |
| (Client)|       | (to test) |       | (Server)  |
+---------+       +-----------+       +-----------+

Client

After reviewing a dozen HTTP stress-testing tools available online, I realized that none of the regularly maintained, cross-platform tools support proxy integration. This includes big names like wrk and k6.

Given that the performance requirements are relatively moderate compared to real reverse proxies, I decided to create a trivial wrapper around .NET's HttpClient to generate HTTP requests. You can find the implementation here: https://github.com/haga-rak/floody/tree/main/src/floody.

The input parameters are pretty simple: warm-up and test duration, concurrent connections, payload size, proxy option, extra header, ...
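
To give a rough idea of what such a load loop does, here is a minimal Python sketch. It is not the floody code (which wraps .NET's HttpClient): the function name `run_load` and its parameters are invented for this illustration, and since urllib opens a fresh connection for every request, it will be far slower than a keep-alive client.

```python
import http.client
import threading
import time
import urllib.request


def run_load(url, duration_s=1.0, connections=4, proxy=None):
    """Send GET requests from `connections` threads for `duration_s` seconds.

    `proxy` is an optional "http://host:port" string (hypothetical parameter).
    Returns (success_count, fail_count, successful_requests_per_second).
    """
    # An empty mapping disables proxies, including those set in the environment.
    proxies = {"http": proxy, "https": proxy} if proxy else {}
    counts = {"success": 0, "fail": 0}
    lock = threading.Lock()
    deadline = time.monotonic() + duration_s

    def worker():
        opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
        while time.monotonic() < deadline:
            try:
                with opener.open(url, timeout=5) as response:
                    response.read()  # drain the body so it counts as bandwidth
                key = "success"
            except (OSError, http.client.HTTPException):
                key = "fail"
            with lock:
                counts[key] += 1

    threads = [threading.Thread(target=worker) for _ in range(connections)]
    started = time.monotonic()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    elapsed = time.monotonic() - started
    return counts["success"], counts["fail"], counts["success"] / elapsed
```

Calling `run_load("http://127.0.0.1:5000/", duration_s=15, connections=16, proxy="http://127.0.0.1:44344")` would mimic one row of a comparison; the target URL here is only a placeholder.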

Server

For those unfamiliar with the .NET ecosystem, Kestrel is a cross-platform web server for ASP.NET Core, known for its speed, efficiency, and flexibility. Given Kestrel's reputation as an exceptionally fast HTTP server, I used it to set up a lightweight endpoint. This endpoint simply returns a response of a size specified by the client, which is sufficient for benchmarking purposes. By default, the server listens on both HTTP and HTTPS so that every scenario can be tested. This implementation easily exceeds 300K requests/second with 128 connections on a workstation rig when driven by tools like bombardier and wrk. When paired with the client, it reaches 220K requests/second with TLS on and 16 connections.
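
The behavior of such an endpoint can be sketched in a few lines of Python. This is only an illustration of the idea, not the Kestrel server from the repository: the `length` query parameter, the class name, and `make_server` are assumptions made for this example, and Python's built-in server is nowhere near fast enough for a real benchmark.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


class SizedResponseHandler(BaseHTTPRequestHandler):
    """Answer every GET with a body whose size the client chooses.

    The `length` query parameter is a hypothetical name used for this sketch.
    """

    protocol_version = "HTTP/1.1"  # allows keep-alive when the client asks for it

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        try:
            length = max(0, int(query.get("length", ["0"])[0]))
        except ValueError:
            length = 0  # fall back to an empty body on malformed input
        self.send_response(200)
        self.send_header("Content-Length", str(length))
        self.end_headers()
        if length:
            self.wfile.write(b"x" * length)

    def log_message(self, format, *args):
        # Per-request logging would distort a throughput measurement.
        pass


def make_server(port=0):
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), SizedResponseHandler)
```

With this sketch, `make_server(5000).serve_forever()` would answer `GET /?length=8192` with an 8192-byte body and `GET /` with an empty one.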

The Tests

The tests are conducted with a limited number of connections to ensure that client and server resource usage does not affect the results. Depending on the test machine, values up to 128 connections can be used without any issues.

Reproducing the Tests

Here are quick steps to reproduce the tests:

  1. Download and start Fluxzy CLI

fluxzy start -k

The -k flag disables TLS verification. Use --max-upstream-connection to increase the number of connections, which is set to 16 by default.

  2. Start mitmdump

mitmdump -k -q

The -k flag disables TLS verification and -q suppresses stdout logs that could significantly slow down the proxy.

  3. (optional) Install and start Squid with the configuration cache deny all and cache_dir null /tmp

  4. Clone the floody repository

git clone https://github.com/haga-rak/floody

  5. Run the benchmark

./build.sh "compare:3128 44344 8080"

Port number 3128 is the Squid port, 44344 is the Fluxzy port, and 8080 is the mitmdump port. Of course, you can change these values to match your setup.

Results

The tests take into account the following configurations:

  • Active MITM: the proxy decrypts the TLS traffic and forwards it to the other peer.
  • 16 concurrent connections
  • Plain HTTP/1.1 and H2/TLS
  • Response body size: no response body and an 8192-byte response body

PLAIN - No response body - 15s

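+--------------------+----------+----------+------+----------+-------------+
| Proxy              | Total    | Success  | Fail | req/s    | Bandwidth   |
+--------------------+----------+----------+------+----------+-------------+
| No proxy           | 4035035  | 4035031  | 0    | 269002.0 | 33.86 MB/s  |
| squid              | 389874   | 389874   | 0    | 25991.6  | 5.4 MB/s    |
| fluxzy             | 1525442  | 1525442  | 0    | 101696.1 | 12.8 MB/s   |
| mitmproxy/mitmdump | 22064    | 22064    | 0    | 1470.9   | 189.61 KB/s |
| Diff. MITM         | 69 times | 69 times | /    | 69 times | 69 times    |
+--------------------+----------+----------+------+----------+-------------+

Fedora Linux 41 (Workstation Edition), AMD Ryzen 9 7950X3D 16-Core Processor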

TLS - No response body - 15s

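+--------------------+----------+----------+------+------------+------------+
| Proxy              | Total    | Success  | Fail | req/s      | Bandwidth  |
+--------------------+----------+----------+------+------------+------------+
| No proxy           | 3317020  | 3317020  | 0    | 221134.667 | 27.84 MB/s |
| fluxzy             | 852732   | 852732   | 0    | 56848.800  | 15.51 MB/s |
| mitmproxy/mitmdump | 20994    | 20994    | 0    | 1399.600   | 392.9 KB/s |
| Diff. MITM         | 40 times | 40 times | /    | 40 times   | 40 times   |
+--------------------+----------+----------+------+------------+------------+

Fedora Linux 41 (Workstation Edition), AMD Ryzen 9 7950X3D 16-Core Processor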

PLAIN - 8192 bytes response body - 15s

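+--------------------+----------+----------+------+------------+-------------+
| Proxy              | Total    | Success  | Fail | req/s      | Bandwidth   |
+--------------------+----------+----------+------+------------+-------------+
| No proxy           | 2669850  | 2669850  | 0    | 177990.000 | 1.38 GB/s   |
| squid              | 228279   | 228279   | 0    | 15218.600  | 122.49 MB/s |
| fluxzy             | 930238   | 930238   | 0    | 62015.867  | 493.61 MB/s |
| mitmproxy/mitmdump | 20860    | 20860    | 0    | 1390.667   | 11.07 MB/s  |
| Diff. MITM         | 44 times | 44 times | /    | 44 times   | 44 times    |
+--------------------+----------+----------+------+------------+-------------+

Fedora Linux 41 (Workstation Edition), AMD Ryzen 9 7950X3D 16-Core Processor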

TLS - 8192 bytes response body - 15s

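+--------------------+--------------+--------------+------+--------------+--------------+
| Proxy              | Total        | Success      | Fail | req/s        | Bandwidth    |
+--------------------+--------------+--------------+------+--------------+--------------+
| No proxy           | 1822330      | 1822330      | 0    | 121488.667   | 966.97 MB/s  |
| fluxzy             | 532140       | 532134       | 0    | 35475.600    | 568.46 MB/s  |
| mitmproxy/mitmdump | 18784        | 18784        | 0    | 1252.267     | 20.02 MB/s   |
| Diff. MITM         | 28.329 times | 28.329 times | /    | 28.329 times | 28.401 times |
+--------------------+--------------+--------------+------+--------------+--------------+

Fedora Linux 41 (Workstation Edition), AMD Ryzen 9 7950X3D 16-Core Processor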
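
For readers who want to check the arithmetic, the "Diff. MITM" rows are simply the Fluxzy figure divided by the mitmproxy/mitmdump figure. The short Python sketch below recomputes that ratio from the req/s columns above, and also derives the gap to Squid and the share of no-proxy throughput that Fluxzy retains. The dictionary layout and the `ratio` helper are just conveniences for this example.

```python
# Requests per second, copied from the four result tables above.
REQ_PER_SEC = {
    "PLAIN, no body": {
        "no proxy": 269002.0, "squid": 25991.6,
        "fluxzy": 101696.1, "mitmdump": 1470.9,
    },
    "TLS, no body": {
        "no proxy": 221134.667, "fluxzy": 56848.8, "mitmdump": 1399.6,
    },
    "PLAIN, 8192 bytes": {
        "no proxy": 177990.0, "squid": 15218.6,
        "fluxzy": 62015.867, "mitmdump": 1390.667,
    },
    "TLS, 8192 bytes": {
        "no proxy": 121488.667, "fluxzy": 35475.6, "mitmdump": 1252.267,
    },
}


def ratio(rates, numerator, denominator):
    """How many times more requests per second `numerator` handled."""
    return rates[numerator] / rates[denominator]


for scenario, rates in REQ_PER_SEC.items():
    parts = [f"{ratio(rates, 'fluxzy', 'mitmdump'):.1f}x mitmdump"]
    if "squid" in rates:  # Squid was only measured over plain HTTP
        parts.append(f"{ratio(rates, 'fluxzy', 'squid'):.1f}x Squid")
    parts.append(f"{ratio(rates, 'fluxzy', 'no proxy'):.0%} of no-proxy throughput")
    print(f"{scenario}: " + ", ".join(parts))
```

The computed ratios line up with the "Diff. MITM" rows and with the roughly 4x gap to Squid mentioned in the title.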

Breaking down the Performance Gap

This benchmark primarily measures I/O operations combined with TLS processing, examining how efficiently data is received, processed, and returned. The goal is to evaluate each tool's performance under similar conditions.

The performance gap between mitmproxy and Fluxzy can likely be partially attributed to their underlying platforms. mitmproxy relies on Python, which can exhibit slower performance characteristics in high-throughput scenarios. Fluxzy is built on .NET 8.0, which benefits from recent performance optimizations, particularly in garbage collection and memory management.

Fluxzy also incorporates several design choices aimed at maximizing efficiency:

  • Single Buffer Usage: A single buffer processes client requests, reducing memory overhead and streamlining data handling.
  • Always-On Streaming Mode: Built-in actions in Fluxzy maintain an active streaming approach that does not store an entire response in user-space memory if it exceeds the buffer size. This keeps memory usage consistent, even for large payloads.
  • Stack Manipulation Techniques: Utilizing features such as stackalloc and Span minimizes heap allocations for synchronous operations with moderate memory requirements, a common scenario in HTTP intermediary services.
  • Predefined Configuration Rules: Unlike dynamic scripting (as seen in mitmproxy), Fluxzy employs predefined rules mapped to compiled code. Testing showed that adding a response header in this manner has negligible impact on performance.
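
To illustrate the single-buffer streaming idea from the list above, here is a minimal Python sketch. It is not Fluxzy's implementation (which is written in C#); the `relay` function and its default buffer size are assumptions chosen for the example. The point is that one preallocated buffer is reused for every chunk, so memory use stays flat regardless of the payload size.

```python
import io


def relay(source, destination, buffer_size=16 * 1024):
    """Copy `source` to `destination` through one reusable buffer.

    Returns the number of bytes copied. Memory use is bounded by
    `buffer_size`, no matter how large the stream is.
    """
    buffer = bytearray(buffer_size)   # allocated once, reused for every chunk
    view = memoryview(buffer)         # slicing a memoryview does not copy
    total = 0
    while True:
        read = source.readinto(buffer)
        if not read:                  # 0 means end of stream
            break
        destination.write(view[:read])
        total += read
    return total


if __name__ == "__main__":
    payload = b"x" * 1_000_000
    copied = relay(io.BytesIO(payload), io.BytesIO(), buffer_size=4096)
    print(f"relayed {copied} bytes through a 4096-byte buffer")
```

The same function works with any binary stream exposing `readinto` and `write`, such as files opened in binary mode.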

Finally, TLS implementation does not appear to be a decisive factor in the performance difference. Both tools use OpenSSL by default on Linux, offering native TLS support that operates independently of the application layer.

Final words

Are these results important? Probably not. mitmproxy/mitmdump is fast enough for most use cases. It’s a tool that has been around for over a decade, benefiting from extensive user feedback and a large, active community. The fact that it’s built in Python, a very accessible language, makes it incredibly flexible to use and to extend. In fact, I'm still secretly a mitmproxy enjoyer.

As for Fluxzy, while it already offers extensive capabilities for traffic manipulation (40+ possible actions for now), most users who integrate Fluxzy into their tools use it to either collect synthetic monitoring data with minimal overhead or to implement enterprise-level WAFs with advanced rules.

This simple benchmark session, of course, wasn’t conducted under perfect conditions or following strict scientific methods. It was simply designed to give a rough overview of performance differences.


Published on Wednesday, 22 January 2025