Connexion v14 - Loading Many Channels
Summary
The goal of this set of tests was to see how Connexion performs under extreme load. The test is considered a break test in that we attempted to load more channels and perform more work than we would expect from a single installation of Connexion.
When we first ran the test, we were unable to get 100 channels to run in a single process without the system becoming unresponsive. The channels that were able to start had a jittery performance curve. Further, the memory utilization (over 2 GB) and thread count (over 1200) were much higher than we had anticipated for the given set of channels.
After the changes described at the bottom of this document, we were able to launch more than 400 channels without issue and with a smooth performance curve. In Version #3 of the code, both the memory footprint (1 GB) and the thread count (70) were much lower.
Test Scenario
The general idea of the test is to load 400 channels into Connexion, exercise starting/stopping of all channels, and characterize how Connexion behaves under this load. Each of the 400 channels generates HL7 messages, posts them on the queue, and sends them out via HL7 to an HL7 Sink, as fast as it can.
A total of 3 machines were used for the test.
- App Server - Used for running the channel.
- Database Server - Strictly used as the database server; no other operations were performed on it.
- HL7 Sink - A separate instance of Connexion running 2 HL7 Inbound devices (one on port 11000, the other on port 11001).
Each of the channels includes the following:
- Custom Code Device - When started, the device runs in a loop generating HL7 messages and posting them on the Queue
- Queue Device - Provides temporary storage for the messages before they are sent to the HL7 Outbound device
- HL7 Outbound Device - Every other channel is configured to point at either port 11000 or port 11001 on the HL7 Sink machine
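The three devices above form a producer/queue/consumer pipeline. The sketch below is an illustrative Python analog (Connexion itself is a .NET application, so none of these names come from its actual code); the message format, counts, and queue bound are placeholders:

```python
import asyncio

async def generator(queue: asyncio.Queue, count: int) -> None:
    # Custom Code Device analog: loop, generate HL7-ish messages,
    # and post them on the queue.
    for i in range(count):
        msg = f"MSH|^~\\&|TEST|MSG{i:05d}"  # placeholder HL7 content
        await queue.put(msg)
    await queue.put(None)  # sentinel: no more messages

async def sender(queue: asyncio.Queue, sent: list) -> None:
    # HL7 Outbound Device analog: drain the queue and "send" each message.
    while (msg := await queue.get()) is not None:
        sent.append(msg)  # a real device would write to port 11000/11001

async def run_channel(count: int) -> list:
    queue = asyncio.Queue(maxsize=100)  # Queue Device analog: bounded buffer
    sent: list = []
    await asyncio.gather(generator(queue, count), sender(queue, sent))
    return sent

sent = asyncio.run(run_channel(10))
print(len(sent))  # → 10
```

In the real test, 400 such pipelines run concurrently in one process, which is what stresses threads, memory, and the database.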
Configuration
- Starting with a new system, import the following channels into Connexion: Connexion v14 - Loading Many Channels (for Version #2) or AsyncTest-2014-04-07.cxn (for Version #3). This will install 2 channels.
- Enter "Administration Mode" (Control-Alt-Shift-A)
- Right-Click on the tab where the 2 channels were installed and select "Bulk Add Channels"
- Enter 199 (to create a total of 400 channels), and select the location of the file imported in the first step
- This will take a couple of minutes while all the channels are added to the system.
Test
- Right click the tab where the channels are located and select "Start All Channels"
- Verify that all channels are started
- Measure the time it takes for the last channel in the tab to show "Running"
- Estimate the average speed of all running channels
- Verify that the channels process at roughly the same speed
- Note the number of threads being used (Windows Task Manager)
- Note the amount of memory being used (Windows Task Manager)
- Note the amount of CPU being used (Windows Task Manager)
- Right click the tab and stop the channels
- Measure the time it takes for the last channel in the tab to show "Stopped"
- Right click the tab and start the channels
- Measure the time it takes for the last channel in the tab to show "Running"
- Pause all channels
- Estimate the Queuing speed of all channels
Results
Code Version | Description | Change Set | Number of Channels | Time to Start (First Start) | Time to Stop | Time to Start (After First) | Queue Speed | Queue and Process Speed | Processing Fairness | CPU | Memory | Threads | Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Initial Test | 03049d170d0f | 100 | >60 sec | - | - | - | - | no | - | - | - | Connexion would restart before all channels could load. |
2 | First set of changes | 3dc6b65a71b8 | 400 | 60 sec | 15 sec | 15 sec | +10,000/s | +5,000/s Q, +3,200/s P | yes | 88% | >2 GB | 1251 | Increased ThreadPool.SetMinThreads to 400,400 (worker, completion port) |
3 | Async Changes | 9bf1d16c1820 | 400 | 35 sec | 4 sec | 3 sec | +8,800/s | +4,200/s Q | yes | 92% | 1.5 GB | 70 | No ThreadPool changes. Capped at 25 simultaneous queue operations and 25 simultaneous result stores. |
4 | Async | 0112d7bc0abe | 1000 | 65 sec | 9 sec | 13 sec | +8,500/s | +4,100/s Q | yes | 88% | 2.2 GB | 72 | |
Observations/Conclusions
- In version 1, it was impossible to get data because the system failed to run.
- Version #2 actually had better overall throughput, but poorer start/stop times and a much higher thread count, and thus more memory, both virtual and real.
- Version #3, which incorporated async/await methods for database calls and message processing, was slightly slower than Version #2; however, the overall characteristics appeared more favorable: less memory, fewer threads, better start/stop performance, and more even Queuing/Processing speed. Overall, Version #3 seemed more reliable. Throttling of database activity prevented .NET from running out of pooled database connections.
Significant Code Changes Made
# | Code Change | Rationale | Version 1 | Version 2 | Version 3 |
---|---|---|---|---|---|
1 | Added a separate Task Scheduler to control starting and stopping. | Having a dedicated Task Scheduler prevented the wait we were seeing while a start/stop operation waited for a thread pool Task to become available to run it. | no | yes | yes |
2 | Increased the minimum number of thread pool threads. | This was added because the scheduler delays how quickly it creates new thread pool threads; we were seeing long delays during startup, and this change sped startup up. | no | yes | no |
3 | Added IDevice.ProcessNextAsync and IMessage.PostOnChannelAsync to support async/await methods. | This allowed us to reduce the number of threads required to run many channels in Connexion from 1250 to 70. | no | no | yes |
4 | Added a cap on the number of operations that could be performed against a given database at one time: 25 queue operations and 25 result stores. | We were bumping against the default maximum number of simultaneous database connections (100) allowed by .NET. Once the MaxPool size was hit, the system failed and took several minutes to recover; attempting to increase this limit caused performance problems. | no | no | yes |
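Change #1 can be illustrated with an analogous pattern in any pooled-threading runtime: give control operations their own worker pool so they never queue behind channel work that has saturated the shared pool. The sketch below is a hedged Python analog, not the actual .NET TaskScheduler code; the pool sizes and sleep durations are arbitrary:

```python
from concurrent.futures import ThreadPoolExecutor
import time

work_pool = ThreadPoolExecutor(max_workers=2)     # shared pool (will saturate)
control_pool = ThreadPoolExecutor(max_workers=2)  # dedicated to start/stop

# Saturate the shared pool with long-running channel work; further work
# submitted there would have to wait for a free thread.
for _ in range(8):
    work_pool.submit(time.sleep, 0.5)

def stop_channel() -> str:
    # Stand-in for a start/stop control operation.
    return "stopped"

start = time.monotonic()
# Because the control pool is separate, this runs immediately instead of
# waiting ~2 seconds behind the queued channel work.
result = control_pool.submit(stop_channel).result()
elapsed = time.monotonic() - start
print(result, elapsed < 0.4)  # → stopped True
```

Submitting `stop_channel` to `work_pool` instead would demonstrate the original symptom: the control operation waits for a pool thread to free up, which is the start/stop latency described in the rationale.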
4 | Added a cap on the number of operations that could be performed against a given database at one time. 25 Queues, and 25 Results storage. | We were bumping against the default maximum number of simultaneous database connections (100) that were allowed by .NET. Once the MaxPool size was hit, the system failed and took several minutes to recover. Attempting to increase this number caused performance problems. | no | no | yes |