Strings Are Evil
Originally published by Indy Singh on June 5th 2018

Reducing memory allocations from 7.5GB to 32KB

Contents

Context of the problem
Establishing a baseline
Easy win 1
Easy win 2
Splits are never cool
Lists are not always nice
Pooling byte arrays
Goodbye StringBuilder
Skipping commas
The war between classes and structs
Goodbye StreamReader
TLDR - Give me a table

Context of the problem

Codeweavers is a financial services software company. Part of what we do is to enable our customers to bulk import their data into our platform. For our services we require up-to-date information from all our clients, which includes lenders and manufacturers across the UK. Each of those imports can contain several hundred megabytes of uncompressed data, and will often be imported on a daily basis.

This data is then used to power our real-time calculations. Currently this import process has to take place outside of business hours because of the impact it has on memory usage.

In this article we will explore potential optimisations to the import process, specifically with the aim of reducing the memory it uses. If you want to have a go yourself, you can use this code to generate a sample input file and you can find all of the code talked about here.

Establishing a baseline

The current implementation uses StreamReader and passes each line to the lineParser.

The most naive implementation of a line parser that we originally had looked something like this:-

The ValueHolder class is used later on in the import process to insert information into the database:-

Running this example as a command line application and enabling monitoring:-

Our main goal today is to reduce allocated memory. In short, the less memory we allocate, the less work the garbage collector has to do. There are three generations that the garbage collector operates against; we will also be monitoring those.
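As a rough illustration of how figures like these can be collected, here is a minimal sketch that assumes the counters built into .NET (AppDomain monitoring, GC.CollectionCount and the process peak working set). The AllocationMonitor class, its Measure method and the sample workload are hypothetical names made up for this example; they are not the harness used for the numbers in this article.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not the article's harness: runs a workload and
// prints the elapsed time, allocated memory and per-generation GC counts.
public static class AllocationMonitor
{
    public static long Measure(Action workload)
    {
        // Monitoring has to be switched on before the allocation counter is read.
        AppDomain.MonitoringIsEnabled = true;
        AppDomain domain = AppDomain.CurrentDomain;

        long allocatedBefore = domain.MonitoringTotalAllocatedMemorySize;
        int gen0Before = GC.CollectionCount(0);
        int gen1Before = GC.CollectionCount(1);
        int gen2Before = GC.CollectionCount(2);
        Stopwatch stopwatch = Stopwatch.StartNew();

        workload();

        stopwatch.Stop();
        long allocatedBytes = domain.MonitoringTotalAllocatedMemorySize - allocatedBefore;

        Console.WriteLine($"Took: {stopwatch.ElapsedMilliseconds:N0} ms");
        Console.WriteLine($"Allocated: {allocatedBytes / 1024:N0} kb");
        using (Process process = Process.GetCurrentProcess())
        {
            Console.WriteLine($"Peak Working Set: {process.PeakWorkingSet64 / 1024:N0} kb");
        }
        Console.WriteLine($"Gen 0 collections: {GC.CollectionCount(0) - gen0Before}");
        Console.WriteLine($"Gen 1 collections: {GC.CollectionCount(1) - gen1Before}");
        Console.WriteLine($"Gen 2 collections: {GC.CollectionCount(2) - gen2Before}");

        return allocatedBytes;
    }
}

public static class Program
{
    public static void Main()
    {
        // Illustrative workload only: split a small line many times.
        AllocationMonitor.Measure(() =>
        {
            for (int i = 0; i < 100_000; i++)
            {
                string[] parts = "a,b,c".Split(',');
                GC.KeepAlive(parts);
            }
        });
    }
}
```

The allocation counter is cumulative, which is why the sketch reports the difference between two readings rather than a single value. The exact numbers will vary by runtime and machine.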
Garbage collection is a complex topic and outside of the scope of this article, but a good rule of thumb is that short-lived objects should never be promoted past generation 0.

We can see V01 has the following statistics:-

Took: 8,750 ms
Allocated: 7,412,303 kb
Peak Working Set: 16,720 kb
Gen 0 collections: 1809
Gen 1 collections: 0
Gen 2 collections: 0

Almost 7.5 GB of memory allocations to parse a three hundred megabyte file is less than ideal. Now that we have established the baseline, let us find some easy wins…

Easy win 1

Eagle-eyed readers will have spotted that we call string.Split(',') twice: once in the line parser and again in the constructor of ValueHolder. This is wasteful; we can overload the constructor of ValueHolder to accept a string array and split the line once in the parser. After that simple change the statistics for V02 are now:-

Took: 6,922 ms
Allocated: 4,288,289 kb
Peak Working Set: 16,716 kb
Gen 0 collections: 1046
Gen 1 collections: 0
Gen 2 collections: 0

Great! We are down from 7.5GB to 4.2GB. But that is still a lot of memory allocations for processing a three hundred megabyte file.
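The idea behind Easy win 1 can be sketched roughly as follows. This is a minimal illustration and not the article's actual code: the ValueHolder fields (Id, Value), the three-column line layout and the "DATA" record tag are assumptions invented for the example, since the real class is not shown in this excerpt.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for ValueHolder; the real fields are not shown here.
public sealed class ValueHolder
{
    public int Id { get; }
    public decimal Value { get; }

    // Original-style constructor: splits the line itself. When the parser
    // has already split the same line, this is the wasteful second split.
    public ValueHolder(string line) : this(line.Split(','))
    {
    }

    // Overload that accepts parts the caller has already produced.
    public ValueHolder(string[] parts)
    {
        Id = int.Parse(parts[1], CultureInfo.InvariantCulture);
        Value = decimal.Parse(parts[2], CultureInfo.InvariantCulture);
    }
}

public static class LineParser
{
    // Assumed line layout for this sketch: "DATA,<id>,<value>".
    public static ValueHolder ParseLine(string line)
    {
        string[] parts = line.Split(','); // the only split for this line

        if (parts.Length != 3 || parts[0] != "DATA")
        {
            throw new FormatException("Unexpected line: " + line);
        }

        // Hand over the existing array instead of the raw line.
        return new ValueHolder(parts);
    }
}

public static class Program
{
    public static void Main()
    {
        ValueHolder holder = LineParser.ParseLine("DATA,42,3.5");
        Console.WriteLine(holder.Id + " " + holder.Value.ToString(CultureInfo.InvariantCulture));
    }
}
```

Passing the array through means each line produces one string array and one set of substrings instead of two, which is consistent with the kind of drop in allocations reported for V02.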
This article first appeared on hackernoon.com.