Sean Cunningham & Mark Thomas
Today, Mandiant™ is making available a highly efficient reverse HTTP(S) proxy called simply 'RProxy™'. We are releasing RProxy as an open-source tool to encourage the general community to participate in its evolution. You can download the tool here.
At Mandiant we avoid re-inventing the wheel when it comes to software; so why create another reverse proxy?
Many of the wonderful open-source proxies that exist today are tailored to the average GET/response traffic pattern. For each request, they may spawn a new thread, create a new connection to the back-end, or both. Many of the projects we analyzed could not handle large streams of data efficiently, since they would block until the full client request had been received (hey, where did my memory go?). Exhaustion of resources such as memory, file descriptors, and CPU was a common issue under high load. These existing tools are designed perfectly for common traffic flows, but can quickly capsize under pressure.
Mandiant had a requirement for a proxy that could scale to thousands of simultaneous SSL connections, with certificate verification, and various caching methods, all while maintaining a low system resource footprint. After testing the various popular and well-maintained open-source proxy projects, we could not find one that met our specific needs. That's why we decided to roll our own.
The RProxy architecture uses a mix of threading and event-driven methods of handling requests. At startup, a configured number of threads are spawned, each with their own event loop. Each of these threads will make a configured number of persistent connections to the configured back-end servers.
We leverage HTTP 1.1 to keep these connections open so that each incoming request from a client does not force RProxy to establish a new connection to the back-end. This results in each request being assigned a pre-existing connection to a back-end (even if the client is using HTTP 1.0, or HTTP 1.1 with keep-alive disabled). This technique is known as pipelining, a feature which most proxies avoid due to the complexity of maintaining request states.
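The startup pattern described above can be sketched in a toy model. This is not RProxy source code (RProxy is written in C on top of Libevent); the names and queue-based "loop" below are illustrative stand-ins showing the key idea: each worker thread opens its back-end connections once and reuses them for every request it services.

```python
# Toy model of RProxy's startup pattern: N worker threads, each with its
# own loop and its own set of pre-established, persistent back-end
# connections. All names here are illustrative, not RProxy's.
import queue
import threading

NUM_THREADS = 2
CONNS_PER_THREAD = 4  # persistent back-end connections per worker


def open_backend_conn(worker_id, conn_id):
    # Stand-in for dialing the back-end once at startup and keeping the
    # HTTP/1.1 connection open for the life of the worker.
    return f"worker{worker_id}-conn{conn_id}"


def worker(worker_id, inbox, results):
    # Each thread owns its loop; a simple queue drain stands in for a
    # libevent-style event loop.
    conns = [open_backend_conn(worker_id, i) for i in range(CONNS_PER_THREAD)]
    next_conn = 0
    while True:
        request = inbox.get()
        if request is None:  # shutdown sentinel
            break
        # Reuse an existing connection instead of dialing per request.
        conn = conns[next_conn % len(conns)]
        next_conn += 1
        results.put((request, conn))


def start_pool():
    inbox, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(i, inbox, results))
               for i in range(NUM_THREADS)]
    for t in threads:
        t.start()
    return inbox, results, threads
```

Because every request is handed to a connection that already exists, no request pays the TCP (or SSL) handshake cost on the back-end side.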
We solved this by defining the three states a back-end connection can have:
- IDLE: The connection is up and is able to be used to service a new request.
- ACTIVE: The connection is being used to service another request.
- DOWN: The connection is down, pending a reconnect.
When a new request is made, it is placed into a pending queue. This pending queue is processed whenever a back-end's state transitions to IDLE. The request is then associated with that IDLE connection and its state is changed to ACTIVE.
There are many configuration options that affect how requests in a pending state are handled so that resource consumption does not become an issue under high load.
The RProxy source code has a detailed and up-to-date configuration guide, but some of the main features that stand out are:
- Various methods of load-balancing requests to a back-end
- Transparent URI rewriting
- The ability to append X-Header fields to the request being made to the back-end, including dynamic addition of extended TLS fields
- Configurable thresholding and backlogging for both front-end and back-end IO
- A flexible logging system
- Full SSL support (via OpenSSL)
- TLS False start
- x509 verification
- Certificate caching
- Session caching
- All other commonly used SSL options
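To give a feel for how such features fit together, here is a hypothetical configuration fragment in the block-style syntax that Libconfuse (which RProxy uses) supports. The directive names below are invented for illustration only; the real option names are in the configuration guide shipped with the source.

```
# Hypothetical example only -- consult the RProxy configuration guide
# for the actual directive names.
server {
  addr    = "0.0.0.0"
  port    = 443
  threads = 4

  ssl {
    cert          = "/etc/rproxy/server.crt"
    key           = "/etc/rproxy/server.key"
    verify-client = true          # x509 client verification
    session-cache = true
  }

  downstream backend1 {
    addr        = "10.0.0.10"
    port        = 8080
    connections = 16              # persistent connections per thread
  }

  rewrite {
    src = "^/old/(.*)"
    dst = "/new/$1"
  }

  headers {
    x-forwarded-for = true
    x-ssl-subject   = true        # appended from the client TLS certificate
  }
}
```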
As mentioned, it is best to read the documentation to get a detailed understanding of the many aspects of the system.
RProxy was built on top of several well-maintained open-source libraries, such as Libevent, Libconfuse, Libevhtp, and OpenSSL. While writing RProxy, we found that many of these libraries needed fixes and patches. We would like to thank the maintainers of these projects for their willingness to help and accept our changes (a special thanks to Nick Mathewson, maintainer of Libevent, whom we harassed the most). We suggest using the most recent versions of the above libraries for optimal performance.
RProxy was primarily tested on various *NIX platforms; however, most of the performance tweaks target Linux. We used an Intel i7 quad-core processor with a generic 1 Gb Ethernet adapter running the latest version of CentOS for testing. Our SSL keys were 2048 bits, with client certificate validation enabled. With neither host- nor client-based (RFC 5077) session caching, RProxy was able to handle on the order of 2000 full SSL transactions per second. With either of the above cache methods enabled, our testing demonstrated that RProxy was able to handle over 6600 SSL transactions per second.
Large data flow tests showed that RProxy was able to run at 1 gigabit line-rate (or as close as you can expect once the data has reached user-land).
We continue to add functionality to the software; virtual server support is currently in development, as well as support for internal redirection.