More efficient string concatenation with string.Create()
Strings are immutable in C#, this is a common knowledge. Unless you use pointers in unsafe blocks, that is. Apparently, there is another way of making strings mutable.
When I discovered string.Create()
, a not-so-new but for some reason overlooked (by me at least!) method that was added since .Net Core 2.2 and Netstandard 2.1, I was really curious how it works (and how well it works too), so I looked at relevant .Net Core source code.
1 | public static string Create<TState>(int length, TState state, SpanAction<char, TState> action) |
This means that the Span<char>
that the action
receives is a glorified pointer to raw memory of newly allocated string. So far, a similar effect could have been achieved by using pointers to strings in an unsafe
block.
Naturally, I decided to test how fast would be such approach. I decided to test the performance of string concatenation, if implemented using string.Create()
First, I implemented string.Join()
using pointers in an unsafe
method - using pointers allows to simply to modify the resulting string in place, without new allocations.
1 | public unsafe static string JoinWithPtrs(IReadOnlyList<string> strings, string separator = null) |
Then I implemented string.Join()
using the newer string.Create()
method.
1 | public static string JoinStringCreate(IReadOnlyList<string> strings, string separator = null) |
Here is the test I came up with.
1 | [ ] |
Just to be thorough, I added to the test the ZString library, an awesome library by the creator of equally awesome Utf8Json. According to the code I saw in ZString repo, it should perform minimum allocations on managed heap (if any!)
The tests results were a bit surprising.
Method | ListSize | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
StringCreateJoin | 3 | 1.436 μs | 0.0589 μs | 0.1511 μs | 1.350 μs | - | - | - | 48 B |
StringPtrJoin | 3 | 1.327 μs | 0.1638 μs | 0.4428 μs | 1.100 μs | - | - | - | 48 B |
RegularStringJoin | 3 | 2.631 μs | 0.2790 μs | 0.7731 μs | 2.300 μs | - | - | - | 88 B |
StringConcatenation | 3 | 1.196 μs | 0.0576 μs | 0.1497 μs | 1.200 μs | - | - | - | 160 B |
ZStringJoin | 3 | 2.619 μs | 0.2829 μs | 0.8118 μs | 2.200 μs | - | - | - | 48 B |
StringBuilderJoin | 3 | 1.154 μs | 0.0674 μs | 0.1728 μs | 1.050 μs | - | - | - | 152 B |
StringCreateJoin | 25 | 2.224 μs | 0.2860 μs | 0.8206 μs | 1.700 μs | - | - | - | 224 B |
StringPtrJoin | 25 | 2.277 μs | 0.3194 μs | 0.9366 μs | 1.700 μs | - | - | - | 224 B |
RegularStringJoin | 25 | 3.296 μs | 0.3298 μs | 0.9302 μs | 2.800 μs | - | - | - | 264 B |
StringConcatenation | 25 | 2.529 μs | 0.0487 μs | 0.0891 μs | 2.500 μs | - | - | - | 6144 B |
ZStringJoin | 25 | 4.727 μs | 0.3210 μs | 0.8949 μs | 4.300 μs | - | - | - | 224 B |
StringBuilderJoin | 25 | 2.632 μs | 0.3303 μs | 0.9531 μs | 2.050 μs | - | - | - | 768 B |
StringCreateJoin | 100 | 2.875 μs | 0.0579 μs | 0.0452 μs | 2.900 μs | - | - | - | 824 B |
StringPtrJoin | 100 | 4.034 μs | 0.4147 μs | 1.1964 μs | 4.200 μs | - | - | - | 824 B |
RegularStringJoin | 100 | 4.677 μs | 0.0791 μs | 0.1561 μs | 4.700 μs | - | - | - | 2320 B |
StringConcatenation | 100 | 13.320 μs | 0.5394 μs | 1.5036 μs | 12.750 μs | - | - | - | 84744 B |
ZStringJoin | 100 | 12.110 μs | 0.5239 μs | 1.3893 μs | 12.100 μs | - | - | - | 824 B |
StringBuilderJoin | 100 | 4.233 μs | 0.3968 μs | 1.1575 μs | 4.350 μs | - | - | - | 2280 B |
First, I was really surprised to find that .Net’s StringBuilder
does more allocations than I expected, especially in comparison to my implementations of Join()
. Yes, it is consistently more efficient in allocations and runtime than regular string concatenation (with the ‘+’ operator), but since StringBuilder
deals with pointers (as can be seen in the reference source), I expected it to be doing less allocations.
After browsing StringBuilder
reference source, I noticed the following code in its constructor (as can be seen here)
1 | //capacity by default is 8000, if no capacity is specified. |
Interestingly enough, in the Append()
method, if there is not enough capacity of m_ChunkChars
, the StringBuilder
will allocate new buffer char array in ExpandByABlock()
method (which can be seen here).
So, in order to see the best performance out of the StringBuilder
I changed its test method to supply the resulting string length and re-run the test.
Here are the results.
Method | ListSize | Mean | Error | StdDev | Median | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
StringCreateJoin | 3 | 1.505 μs | 0.0962 μs | 0.2501 μs | 1.400 μs | - | - | - | 48 B |
StringPtrJoin | 3 | 1.647 μs | 0.2524 μs | 0.7282 μs | 1.200 μs | - | - | - | 48 B |
RegularStringJoin | 3 | 2.901 μs | 0.3827 μs | 1.0667 μs | 2.400 μs | - | - | - | 88 B |
StringConcatenation | 3 | 1.524 μs | 0.2490 μs | 0.7062 μs | 1.200 μs | - | - | - | 160 B |
ZStringJoin | 3 | 3.125 μs | 0.3665 μs | 1.0515 μs | 2.800 μs | - | - | - | 48 B |
StringBuilderJoin | 3 | 1.743 μs | 0.2516 μs | 0.7219 μs | 1.350 μs | - | - | - | 144 B |
StringCreateJoin | 25 | 2.337 μs | 0.3120 μs | 0.8902 μs | 1.800 μs | - | - | - | 224 B |
StringPtrJoin | 25 | 2.268 μs | 0.3398 μs | 0.9805 μs | 1.700 μs | - | - | - | 224 B |
RegularStringJoin | 25 | 4.023 μs | 0.6376 μs | 1.8293 μs | 3.400 μs | - | - | - | 264 B |
StringConcatenation | 25 | 3.078 μs | 0.3658 μs | 1.0437 μs | 2.600 μs | - | - | - | 6144 B |
ZStringJoin | 25 | 5.118 μs | 0.4363 μs | 1.2449 μs | 4.650 μs | - | - | - | 224 B |
StringBuilderJoin | 25 | 2.474 μs | 0.3755 μs | 1.0894 μs | 1.800 μs | - | - | - | 496 B |
StringCreateJoin | 100 | 4.506 μs | 0.4507 μs | 1.3217 μs | 4.500 μs | - | - | - | 824 B |
StringPtrJoin | 100 | 3.969 μs | 0.3715 μs | 1.0778 μs | 4.000 μs | - | - | - | 824 B |
RegularStringJoin | 100 | 6.362 μs | 0.4833 μs | 1.3472 μs | 6.100 μs | - | - | - | 2320 B |
StringConcatenation | 100 | 13.838 μs | 0.6251 μs | 1.7425 μs | 13.400 μs | - | - | - | 84744 B |
ZStringJoin | 100 | 10.792 μs | 0.1016 μs | 0.0793 μs | 10.800 μs | - | - | - | 824 B |
StringBuilderJoin | 100 | 3.914 μs | 0.4024 μs | 1.1739 μs | 4.100 μs | - | - | - | 1696 B |
This change did make a difference, especially for large size of the list to concatenate (because if the m_ChunkChars
is initialized to proper length, the StringBuilder
doesn’t need to make additional allocations to grow its buffer)
Plain old string concatenation unsurprisingly makes lots of allocations, but for small amount of strings it is really fast.
ZString.Join
is works slower than anything except plain string concatenation, but its Join()
method is as efficient as my implementations - which is good to know!
And finally, the StringCreateJoin()
allocates the same amount of memory as the approach with pointers, but interestingly, StringCreateJoin()
is a bit slower than StringPtrJoin()
approach.
Note that the kind of code I have in
StringPtrJoin()
andStringCreateJoin()
are micro-optimizations and while occasionally useful, should be used only when profiling shows a hot-spot in code.
Also, StringCreateJoin()
being a bit slower than StringPtrJoin()
shouldn’t be surprsing. In order to support the “magic” of that SpanAction
lambda, the compiler generates the following code. Extra method invocations and in general doing more work than StringPtrJoin()
, makes StringCreateJoin()
slower.
1 | [ ] |
Conclusions
Overall, I can say that this was an interesting excercise in micro-optimizations!
Also, my two attempts at string concatenation are not guaranteed to handle any characters othen than utf8 - .Net’s StringBuilder
uses wstrcpy()
to copy strings around into the buffer, which properly handles wide characters (reference to the source code here)
Since I use sizeof(char)
, I believe that my code should work for UTF16 characters as well, but I haven’t tested it. If I were to optimize string concatenation hotspot, I would probably use the implementation in JoinWithPtrs()
as it has shown consistently better performance both with low count of strings to concatenate and high amount.