An efficient circular queue of strings
I’ve been interested in all-things HTTP/2 for a while now and I’ve meaning to write my own implementation of the whole protocol. One of the first steps is to implement the header compressor, so we can go from a blob of bytes to the headers in plain text that we all know.
This article is about one of the parts needed to build the compressor: a queue that can store strings efficiently.
Queue Requirements
The requirements for this queue are quite simple: it has to limit the length of all strings put together and it has to limit the number of strings. But it must also be able to be reused efficiently.
My first approach to make it work was to just use a queue that contains references to individual strings, since that’s built-in in my language of choice. However the only way to make this efficient is to use something similar to an arena in C++. But even then, it wouldn’t be this efficient, more on this later.
I did some research but I didn’t have any luck finding an algorithm that would meet all the requirements. Thankfully, the implementation is pretty straightforward.
The Queue
This code is written in Nim, but it should be easy to read for those coming from any procedural language.
There are two important things here: a single string of substrings and a dynamic array that keeps track of the boundaries of each substring. Note that this queue can be easily reused by simply resetting the counters.
To add an element, we first remove elements from the tail/end of the queue until there’s enough space to store the new substring. The tail may be in any index of the vector and it moves in a circular way when removing the elements. Then, the head/start of the queue is increased to point to the new element. By doing modulo we wrap-around when the actual end of the vector is reached. Lastly, the substring is stored pretty much the same way as its boundaries.
Note that since this is a circular queue the tail/head terms are reversed from a regular queue. This is because the used portion of it resembles a snake going around in a circle and so the head is the end at which elements are added, or so I’ve read…
To remove an element from the end of the queue, we take the element at the tail and then the tail is incremented, moving the tail to the previous element.
Optimizations
Here there are some optimizations:
- Use arrays to avoid allocations, if the queue is small enough.
- Make the string and boundaries size a power of two and do
x & (y-1)
instead of modulo. - Use
memcopy
to store the substrings.
Benchmarks
@IJzerbaard asked me to add some benchmarks. So, I’ve made some for the add
function which is the most expensive one. Results on my laptop, YMMV:
============================================================================
GlobalBenchmark relative time/iter iters/s
============================================================================
GlobalBenchmark 334.15ps 2.99G
============================================================================
crap.nim relative time/iter iters/s
============================================================================
addWithModulo 1.38ms 722.64
addWithAnd 1652.41% 83.75us 11.94K
addWithMemCopy 36495.54% 3.79us 263.73K
simpleStrCopy 3.39us 294.82K
strMemCopy 270.54% 1.25us 797.60K
noOp 0.00fs inf
The bitwise AND
solution is around ~16x faster than the modulo solution. Memcopy solution is around ~365x faster than the modulo one, also it’s ~3x slower than doing a memcopy and nothing else.
The slowest solution was about as fast as the regular queue of string references I used at first, so the fastest solution is an improvement.
Conclusion
Compared to a regular queue, this one can be preallocated at the start of the program and be efficiently reused over and over again, it potentially avoids cache misses, and it avoids resorting to something like arenas.