Apache Log4cxx
Version 1.6.0
An important question for any logging library is: is it fast enough? While Log4cxx may not be the fastest logging implementation, it is more than fast enough for the vast majority of cases.
Log4cxx can generate 2,000,000 log messages per second, while skipping a disabled logging request requires only a few nanoseconds. Leaving logging requests in a release build therefore does not measurably affect application performance.
Benchmark testing shows that filling the MessageBuffer (i.e. a std::stringstream) consumes the majority of CPU time when a logging request is enabled. For the best performance, use the LOG4CXX_[level]_FMT series of macros: the {fmt} library they use is significantly faster (up to twice as fast as operator<<). Note that you must include the headers from {fmt} manually.
The two logging statements sketched below are logically equivalent (they print the same values), yet the one using {fmt} is close to twice as fast on some systems.
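A minimal sketch of the comparison (the logger, variable names, and message text are illustrative; LOG4CXX_INFO and LOG4CXX_INFO_FMT are the macros being compared):

```cpp
#include <fmt/format.h>      // the {fmt} headers must be included manually
#include <log4cxx/logger.h>

void logIteration(const log4cxx::LoggerPtr& logger, int i, double elapsed)
{
    // Stream form: each << fills the internal MessageBuffer (a std::stringstream)
    LOG4CXX_INFO(logger, "Iteration " << i << " took " << elapsed << "s");

    // {fmt} form: same output, but formatted by the {fmt} library instead
    LOG4CXX_INFO_FMT(logger, "Iteration {} took {}s", i, elapsed);
}
```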
If you wish to benchmark Log4cxx on your own system, have a look at the tools under the src/test/cpp/throughput and src/test/cpp/benchmark directories. The throughput tests may be built by specifying BUILD_THROUGHPUT with CMake when building Log4cxx. The benchmark tests require Google's Benchmark library and may be built by specifying BUILD_BENCHMARK_CHECKS with CMake, for example as shown below.
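One possible invocation (the out-of-source build layout is an assumption; only the BUILD_THROUGHPUT and BUILD_BENCHMARK_CHECKS options come from the paragraph above):

```sh
# Assumed out-of-source build; adjust the source path to your checkout.
cmake -DBUILD_THROUGHPUT=ON -DBUILD_BENCHMARK_CHECKS=ON ..
cmake --build .
```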
The following table shows some timing tests using Google's benchmarking library. The "Time" column shows the average elapsed time, i.e. real_accumulated_time / iteration_count. The "CPU" column is also an average: cpu_accumulated_time / iteration_count. The derivation of the "Iterations" column is explained in the Google Benchmark documentation.
```
g++ (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0
Run on (8 X 2328.61 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 6144 KiB (x1)
Load Average: 0.07, 0.03, 0.01
```
Benchmark | Time | CPU | Iterations |
---|---|---|---|
Testing disabled logging request | 0.472 ns | 0.472 ns | 1000000000 |
Testing disabled logging request/threads:6 | 0.128 ns | 0.766 ns | 816161856 |
Appending 5 char string using MessageBuffer, pattern: %m%n | 334 ns | 334 ns | 2094794 |
Appending 5 char string using MessageBuffer, pattern: %m%n/threads:6 | 434 ns | 2437 ns | 281586 |
Appending 49 char string using MessageBuffer, pattern: %m%n | 370 ns | 370 ns | 1886606 |
Appending 49 char string using MessageBuffer, pattern: %m%n/threads:6 | 499 ns | 2736 ns | 228720 |
Appending int value using MessageBuffer, pattern: %m%n | 509 ns | 509 ns | 1361391 |
Appending int value using MessageBuffer, pattern: %m%n/threads:6 | 495 ns | 2776 ns | 251646 |
Appending int+float using MessageBuffer, pattern: %m%n | 911 ns | 911 ns | 768870 |
Appending int+float using MessageBuffer, pattern: %m%n/threads:6 | 581 ns | 3370 ns | 203220 |
Appending int+10float using MessageBuffer, pattern: %m%n | 4579 ns | 4567 ns | 151798 |
Appending int+10float using MessageBuffer, pattern: %m%n/threads:6 | 1694 ns | 10092 ns | 65982 |
Appending int value using MessageBuffer, pattern: [%d] %m%n | 548 ns | 548 ns | 1276999 |
Appending int value using MessageBuffer, pattern: [%d] [%c] [%p] %m%n | 600 ns | 600 ns | 1156877 |
Appending 49 char string using FMT, pattern: %m%n | 346 ns | 346 ns | 2021907 |
Appending 49 char string using FMT, pattern: %m%n/threads:6 | 488 ns | 2672 ns | 257544 |
Appending int value using FMT, pattern: %m%n | 376 ns | 376 ns | 1863727 |
Appending int value using FMT, pattern: %m%n/threads:6 | 486 ns | 2674 ns | 258096 |
Appending int+float using FMT, pattern: %m%n | 508 ns | 508 ns | 1371269 |
Appending int+float using FMT, pattern: %m%n/threads:6 | 537 ns | 3036 ns | 212844 |
Appending int+10float using FMT, pattern: %m%n | 1671 ns | 1671 ns | 417402 |
Appending int+10float using FMT, pattern: %m%n/threads:6 | 1275 ns | 7297 ns | 96222 |
Async, Sending int+10float using FMT | 2190 ns | 2190 ns | 320109 |
Async, Sending int+10float using FMT/threads:6 | 1363 ns | 7862 ns | 84306 |
Async, Sending int+10float using AsyncBuffer, pattern: %m%n | 1226 ns | 1226 ns | 571351 |
Async, Sending int+10float using AsyncBuffer, pattern: %m%n/threads:6 | 1398 ns | 7902 ns | 89688 |
Logging int+float using MessageBuffer, pattern: %d %m%n | 1073 ns | 1073 ns | 656652 |
Logging int+float using MessageBuffer, pattern: %d %m%n/threads:6 | 1083 ns | 4895 ns | 142776 |
Logging int+float using MessageBuffer, JSON | 1394 ns | 1394 ns | 507493 |
Logging int+float using MessageBuffer, JSON/threads:6 | 2110 ns | 6827 ns | 104646 |
Multiprocess logging int+float using MessageBuffer, pattern: %d %m%n | 3253 ns | 3253 ns | 214839 |
The above table shows that the overhead of an enabled logging request varies greatly with the message content, so a single operations-per-second figure is not meaningful. Note in particular that using buffered output reduces overhead more than any other factor.
Note also that logging concurrently from multiple threads to a common appender generally does not increase throughput, due to lock contention in the doAppend method. To simplify the work of an appender implementer, doAppend currently prevents multiple threads from entering the append method concurrently; append is the method a concrete appender class must implement.
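For illustration, here is a minimal, hypothetical appender built on AppenderSkeleton; because doAppend serializes calls, append needs no locking of its own. A production appender would normally also use the DECLARE_LOG4CXX_OBJECT registration macros, omitted here for brevity:

```cpp
#include <log4cxx/appenderskeleton.h>

// Hypothetical counting appender: AppenderSkeleton::doAppend serializes
// calls, so append() is never entered by two threads at once.
class CountingAppender : public log4cxx::AppenderSkeleton
{
public:
    void append(const log4cxx::spi::LoggingEventPtr& event,
                log4cxx::helpers::Pool& p) override
    {
        ++m_count; // no extra mutex needed; doAppend already holds one
    }
    void close() override {}
    bool requiresLayout() const override { return false; }
private:
    int m_count{0};
};
```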
The AsyncAppender provides the least overhead when logging concurrently from multiple threads, as it overrides the doAppend method and uses std::atomic counters and a ring buffer to store logging events. A single background thread extracts the logging events from the ring buffer and sends them to the attached appenders. This moves the cost of the layout method, and of the blocking transfer of message data to the operating system, from the calling thread to the background thread.
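A sketch of wiring this up programmatically (the choice of ConsoleAppender and the %m%n pattern are illustrative assumptions; a configuration file achieves the same effect):

```cpp
#include <log4cxx/asyncappender.h>
#include <log4cxx/consoleappender.h>
#include <log4cxx/patternlayout.h>
#include <log4cxx/logger.h>

// Wrap a ConsoleAppender in an AsyncAppender: calling threads only enqueue
// events; the background thread performs layout and the blocking write.
void configureAsyncLogging()
{
    auto layout  = std::make_shared<log4cxx::PatternLayout>(LOG4CXX_STR("%m%n"));
    auto console = std::make_shared<log4cxx::ConsoleAppender>(layout);
    auto async   = std::make_shared<log4cxx::AsyncAppender>();
    async->addAppender(console);
    log4cxx::Logger::getRootLogger()->addAppender(async);
}
```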
When logging floating point values from a high priority thread, and a background thread cannot be used to format and write the log data, the LOG4CXX_[level]_FMT series of macros imposes the least overhead.