30x to 70x faster than mitmproxy/mitmdump, 4x faster than Squid

I was recently contacted by a user asking about the performance overhead when using impersonation rules in Fluxzy. Since I had never conducted precise benchmarks before, I decided to set something up in a way that would allow for reproducible benchmarks across various configurations.

The idea was straightforward: measure basic performance indicators such as the number of requests per second and bandwidth usage, without all the rocket science typically found in benchmarks.

So, I decided to take it a step further and compare Fluxzy CLI with similar tools.

As a reminder, Fluxzy CLI is an open-source command-line application that acts as an HTTP intermediary, allowing various types of manipulation and recording of HTTP(S) traffic.

Since most MITM tools are either closed-source, paid, or restricted by non-comparison clauses, this test compares Fluxzy CLI only with mitmproxy and mitmdump, its console-based counterpart optimized for quick traffic dumping. To provide reference points, the benchmark is also executed under two additional configurations: one without any proxy, and one through Squid, a well-known proxy, configured with caching disabled (plain HTTP only).

Setting Up the Benchmark

To carry out this experiment, we will need an HTTP client capable of performing benchmark tests and an HTTP server that is fast enough to handle the workload. The tests should be executed locally to avoid any influence from network-related factors.

The setup for this test is documented in the following repository: https://github.com/haga-rak/floody and follows this simple schema:

+---------+       +-----------+       +-----------+
|  floody |------>|   Proxy   |------>|  floodys  |
| (Client)|       | (to test) |       | (Server)  |
+---------+       +-----------+       +-----------+

Client

After reviewing a dozen HTTP stress-testing tools available online, I realized that none of the regularly maintained, cross-platform tools support proxy integration. This includes big names like wrk and k6.

Given that the performance requirements are relatively moderate compared to real reverse proxies, I decided to create a trivial wrapper around .NET's HttpClient to generate HTTP requests. You can find this implementation here: https://github.com/haga-rak/floody/tree/main/src/floody.

The input parameters are pretty simple: warm-up and test duration, concurrent connections, payload size, proxy option, extra header, ...
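
To illustrate what such a load generator does with those parameters, here is a minimal Python sketch. It is not floody's code (floody wraps .NET's HttpClient). The names make_opener and run_load are made up for this example, the warm-up phase is left out, and urllib opens a new connection for each request, so the sketch only shows the measurement loop and will not reach realistic throughput.

```python
import threading
import time
import urllib.request


def make_opener(proxy_url=None):
    """Build an opener that goes through proxy_url, or uses no proxy when None."""
    # An empty mapping tells ProxyHandler to ignore proxies from the environment.
    proxies = {"http": proxy_url, "https": proxy_url} if proxy_url else {}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


def run_load(send_request, connections, duration_s):
    """Call send_request() from `connections` threads for duration_s seconds."""
    counts = {"success": 0, "fail": 0}
    lock = threading.Lock()
    deadline = time.monotonic() + duration_s

    def worker():
        ok = failed = 0
        while time.monotonic() < deadline:
            try:
                send_request()
                ok += 1
            except Exception:
                failed += 1
        # Merge the per-thread counters once, at the end of the run.
        with lock:
            counts["success"] += ok
            counts["fail"] += failed

    threads = [threading.Thread(target=worker) for _ in range(connections)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    total = counts["success"] + counts["fail"]
    return {"total": total, **counts, "req_per_s": total / duration_s}
```

A run through a proxy would create one opener with make_opener("http://127.0.0.1:44344") and pass a function that calls opener.open(url).read() to run_load with connections=16 and duration_s=15, where url is whatever endpoint the test server exposes.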

Server

For those unfamiliar with the .NET ecosystem, Kestrel is a cross-platform web server for ASP.NET Core. It stands out for its speed, efficiency, and flexibility. Given Kestrel's reputation as an exceptionally fast HTTP server, I used it to set up a lightweight endpoint. This endpoint simply returns a response of a size specified by the client, which is sufficient for benchmarking purposes. By default, the server listens on both HTTP and HTTPS so that every scenario can be tested. This implementation easily exceeds 300K requests per second with 128 connections on a workstation rig when driven by tools like bombardier and wrk. When paired with the floody client, it reaches 220K requests per second with TLS on and 16 connections.
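
For readers who want to picture the endpoint, here is a small Python sketch of a server that returns a body of the size the client asks for. It is only an illustration and is nowhere near Kestrel's speed. The real server is an ASP.NET Core application, and the way the size is passed here (a size query parameter) is an assumption made for this example, as are the names SizedResponseHandler and make_server.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


class SizedResponseHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allows keep-alive, so connections can be reused

    def do_GET(self):
        # Hypothetical convention: /?size=8192 asks for an 8192-byte body.
        query = parse_qs(urlparse(self.path).query)
        size = max(0, int(query.get("size", ["0"])[0]))
        self.send_response(200)
        self.send_header("Content-Length", str(size))
        self.end_headers()
        self.wfile.write(b"a" * size)

    def log_message(self, format, *args):
        pass  # per-request logging would distort a throughput test


def make_server(port=0):
    """Bind to 127.0.0.1; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), SizedResponseHandler)
```

Calling make_server(5000).serve_forever() would serve requests on port 5000 until interrupted. A request without the size parameter gets an empty body, which corresponds to the "no response body" scenario.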

The Tests

The test is conducted with a limited number of connections to ensure that client and server resource usage does not affect the results. Depending on the test machine, values up to 128 connections can be used without any issues.

Reproducing the Tests

Here are quick steps to reproduce the tests:

Download and start fluxzy CLI

fluxzy start -k

The -k flag disables TLS certificate verification. Use --max-upstream-connection to increase the number of connections, which is set to 16 by default.

Start mitmdump

mitmdump -k -q

The -k flag disables TLS certificate verification, and -q suppresses stdout logs that could significantly slow down the proxy.

(Optional) Install and start Squid with the configuration directives cache deny all and cache_dir null /tmp

Clone the floody repository

git clone https://github.com/haga-rak/floody

Run the benchmark

./build.sh "compare:3128 44344 8080"

Port number 3128 is the Squid port, 44344 is the Fluxzy port, and 8080 is the mitmdump port. Of course, you can change these values to match your setup.

Results

The tests take into account the following configurations:

Active MITM: the proxy decrypts the TLS traffic and forwards it to the other peer
16 concurrent connections
Plain HTTP (HTTP/1.1) and HTTP/2 over TLS
Two response sizes: no response body and an 8192-byte response body

PLAIN - No response body - 15s

	Total	Success	Fail	req/s	Bandwidth
No proxy	4035035	4035031	0	269002.0	33.86 MB/s
squid	389874	389874	0	25991.6	5.4 MB/s
fluxzy	1525442	1525442	0	101696.1	12.8 MB/s
mitmproxy/mitmdump	22064	22064	0	1470.9	189.61 KB/s
Diff. MITM	69 times	69 times	/	69 times	69 times

Fedora Linux 41 (Workstation Edition), AMD Ryzen 9 7950X3D 16-Core Processor

TLS - No response body - 15s

	Total	Success	Fail	req/s	Bandwidth
No proxy	3317020	3317020	0	221134.667	27.84 MB/s
fluxzy	852732	852732	0	56848.800	15.51 MB/s
mitmproxy/mitmdump	20994	20994	0	1399.600	392.9 KB/s
Diff. MITM	40 times	40 times	/	40 times	40 times

Fedora Linux 41 (Workstation Edition), AMD Ryzen 9 7950X3D 16-Core Processor

PLAIN - 8192 bytes response body - 15s

	Total	Success	Fail	req/s	Bandwidth
No proxy	2669850	2669850	0	177990.000	1.38 GB/s
squid	228279	228279	0	15218.600	122.49 MB/s
fluxzy	930238	930238	0	62015.867	493.61 MB/s
mitmproxy/mitmdump	20860	20860	0	1390.667	11.07 MB/s
Diff. MITM	44 times	44 times	/	44 times	44 times

Fedora Linux 41 (Workstation Edition), AMD Ryzen 9 7950X3D 16-Core Processor

TLS - 8192 bytes response body - 15s

	Total	Success	Fail	req/s	Bandwidth
No proxy	1822330	1822330	0	121488.667	966.97 MB/s
fluxzy	532140	532134	0	35475.600	568.46 MB/s
mitmproxy/mitmdump	18784	18784	0	1252.267	20.02 MB/s
Diff. MITM	28.329 times	28.329 times	/	28.329 times	28.401 times

Fedora Linux 41 (Workstation Edition), AMD Ryzen 9 7950X3D 16-Core Processor
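
The "Diff. MITM" rows appear to be the Fluxzy figure divided by the mitmproxy/mitmdump figure. The short Python snippet below recomputes that ratio from the req/s columns, and does the same against Squid for the two plain HTTP runs. The scenario labels and the speedup helper are names chosen for this example, and the interpretation of the row is my reading of the tables.

```python
# req/s figures copied from the four result tables above.
REQ_PER_S = {
    "PLAIN, no body": {"fluxzy": 101696.1, "mitmdump": 1470.9, "squid": 25991.6},
    "TLS, no body": {"fluxzy": 56848.800, "mitmdump": 1399.600},
    "PLAIN, 8192 bytes": {"fluxzy": 62015.867, "mitmdump": 1390.667, "squid": 15218.600},
    "TLS, 8192 bytes": {"fluxzy": 35475.600, "mitmdump": 1252.267},
}


def speedup(scenario, baseline):
    """How many times more requests per second Fluxzy handled than `baseline`."""
    row = REQ_PER_S[scenario]
    return row["fluxzy"] / row[baseline]


for name, row in REQ_PER_S.items():
    line = f"{name}: {speedup(name, 'mitmdump'):.1f}x vs mitmdump"
    if "squid" in row:
        line += f", {speedup(name, 'squid'):.1f}x vs squid"
    print(line)
```

The whole-number values in the first three tables match these ratios with the decimals dropped rather than rounded.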

Breaking down the Performance Gap

This benchmark primarily measures I/O operations combined with TLS processing, examining how efficiently data is received, processed, and returned. The goal is to evaluate each tool's performance under similar conditions.

The performance gap between mitmproxy and Fluxzy can likely be partially attributed to their underlying platforms. mitmproxy relies on Python, which can exhibit slower performance characteristics in high-throughput scenarios. Fluxzy is built on .NET 8.0, which benefits from recent performance optimizations, particularly in garbage collection and memory management.

Fluxzy also incorporates several design choices aimed at maximizing efficiency:

Single Buffer Usage: A single buffer processes client requests, reducing memory overhead and streamlining data handling.
Always-On Streaming Mode: Built-in actions in Fluxzy maintain an active streaming approach that does not store an entire response in user-space memory if it exceeds the buffer size. This keeps memory usage consistent, even for large payloads.
Stack Manipulation Techniques: Utilizing features such as stackalloc and Span minimizes heap allocations for synchronous operations with moderate memory requirements, a common scenario in HTTP intermediary services.
Predefined Configuration Rules: Unlike dynamic scripting (as seen in mitmproxy), Fluxzy employs predefined rules mapped to compiled code. Testing showed that adding a response header in this manner has negligible impact on performance.
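
The single-buffer and streaming ideas above can be pictured with a short Python sketch. This is not Fluxzy's code, which is written in .NET and relies on Span and stackalloc. The relay function and its buffer_size parameter are hypothetical and only show how one fixed buffer lets a payload of any size pass through without being held in memory all at once.

```python
import io


def relay(source, sink, buffer_size=16 * 1024):
    """Copy source to sink through one reusable buffer; return bytes copied."""
    buffer = bytearray(buffer_size)  # allocated once, reused for every chunk
    view = memoryview(buffer)        # slicing a memoryview does not copy data
    total = 0
    while True:
        n = source.readinto(buffer)
        if not n:                    # 0 (or None) means nothing more to read
            break
        sink.write(view[:n])         # forward only the bytes just read
        total += n
    return total


# Usage: a 1 MB payload goes through an 8 KB buffer.
payload = io.BytesIO(b"x" * 1_000_000)
out = io.BytesIO()
copied = relay(payload, out, buffer_size=8192)
```

The memory used by relay itself stays at buffer_size regardless of the payload size, which is the property the streaming mode is described as preserving.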

Finally, TLS implementation does not appear to be a decisive factor in the performance difference. Both tools use OpenSSL by default on Linux, offering native TLS support that operates independently of the application layer.

Final words

Are these results important? Probably not. Mitmproxy/mitmdump is fast enough for most use cases. It’s a tool that has been around for over a decade, benefiting from extensive user feedback and a large, active community. The fact that it’s built in Python, a very accessible language, makes it incredibly flexible to use and to extend. And in fact, I'm still secretly a mitmproxy enjoyer.

As for Fluxzy, while it already offers extensive capabilities for traffic manipulation (40+ possible actions for now), most users who integrate Fluxzy into their tools use it to either collect synthetic monitoring data with minimal overhead or to implement enterprise-level WAFs with advanced rules.

This simple benchmark session, of course, wasn’t conducted under perfect conditions or following strict scientific methods. It was simply designed to give a rough overview of performance differences.

Published on Wednesday, 22 January 2025