Friday, December 23, 2016

Plone performance: threads or instances?

Recently we had a discussion on the Plone community forum about how to increase the performance of Plone-based sites.

I was arguing in favor of instances because, some time ago, I read a blog post by davisagli talking about the impact of the Python GIL on the performance of multicore servers. Others, like jensens and djay, were skeptical of this argument and told me not to overestimate it.

So I decided to test this using the production servers of one of our customers.

The site is currently running on 2 different DigitalOcean servers, each with 4 processors and 8GB RAM; we use Cloudflare in front of them, with round-robin, DNS-based load balancing.

Prior to my changes, both servers were running with the same configuration:
  • nginx, doing caching of static files and proxy rewrites
  • Varnish, doing caching and URL-based load balancing
  • 4 Plone instances running in ZEO client mode, each with 1 thread and 100,000 objects in its ZODB cache (a rough buildout sketch of this setup follows below)
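
For the curious, a minimal buildout sketch of that pre-change setup, assuming plone.recipe.zope2instance (the thread count and cache size come from the description above; the part name, ports and addresses are illustrative, not taken from our actual files):

    [instance1]
    recipe = plone.recipe.zope2instance
    http-address = 8081
    # run as a ZEO client against the local ZEO server
    zeo-client = on
    zeo-address = 127.0.0.1:8100
    # one worker thread per instance...
    zserver-threads = 1
    # ...and a 100,000-object ZODB cache
    zodb-cache-size = 100000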

Both servers were also running a ZEO server in a ZRS configuration, one as master and the other as slave, doing blob storage replication.
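
If memory serves, plone.recipe.zeoserver ships ZRS support as an extra; a rough sketch of the two roles (addresses and paths are illustrative, and the exact option names may vary between recipe versions):

    # master buildout:
    [zeoserver]
    recipe = plone.recipe.zeoserver[zrs]
    zeo-address = 8100
    blob-storage = ${buildout:directory}/var/blobstorage
    # publish committed transactions for replication
    replicate-to = 8101

    # slave buildout:
    [zeoserver]
    recipe = plone.recipe.zeoserver[zrs]
    zeo-address = 8100
    blob-storage = ${buildout:directory}/var/blobstorage
    # follow the master's replication feed
    replicate-from = master.example.com:8101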

First, here we have some information from this morning, before I made the changes. Here are some graphs I obtained using New Relic on the master:

[New Relic graphs: master server, before the changes]

Here is the same information from the slave:

[New Relic graphs: slave server, before the changes]

As you can see, everything is running smoothly: CPU consumption is low, memory consumption is high and… yes, we have an issue with some PhantomJS processes left behind.

This is what I did later:

  • on the master server, I restarted the 4 instances
  • on the slave server, I changed the configuration of instance1 to use 4 threads and restarted it; I stopped the other 3 instances
I also stopped the memmon Supervisor plugin (just because I had no idea how much memory the slave server instance would be consuming after the changes), and killed all the PhantomJS processes. A sketch of both changes follows.
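
For reference, the two changes boil down to something like this (the memory limit is illustrative; memmon comes from the superlance package):

    # buildout: bump instance1 from 1 to 4 worker threads
    [instance1]
    recipe = plone.recipe.zope2instance
    zserver-threads = 4
    # ZEO client settings and cache size stay as before

    # supervisord.conf: the memmon listener that was stopped;
    # it restarts a process once it exceeds a memory limit
    [eventlistener:memmon]
    command = memmon -p instance1=1200MB
    events = TICK_60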

The servers have been running for a couple of hours now and I can share the results. This is the master server:

[New Relic graphs: master server, after the changes]

And this is now the slave:

[New Relic graphs: slave server, after the changes]

The only obvious change here is in memory consumption: wow! The single instance on the slave server is consuming 1GB less than the 4 instances on the master server!

Let's do a little bit more research now. Here we have some information on database activity on the master server (just one instance for the sake of simplicity):

[New Relic database activity: master server, one instance]

Now here is some similar information for the slave server:

[New Relic database activity: slave server]

I can say I was expecting this: there's a lot more database activity, and the caching is not as well utilized on the slave server (see, smcmahon, that's the beauty of URL-based load balancing in Varnish demonstrated). This is most likely because the ZODB cache lives per connection: with 4 single-threaded instances, URL-based balancing sends each URL to the same instance, so each object tends to sit in exactly one cache; with one 4-thread instance, any of its connection caches may serve a given URL, so objects get loaded, and duplicated, across caches.

Let's try a different look, now using the vmstat command:

[vmstat output: master vs. slave]

Not many differences here: the CPU is idle most of the time and the interrupts and context switching are almost the same.
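
For anyone reproducing this at home, the comparison above came from something along these lines (the sampling interval is my assumption; the screenshots don't show the exact flags):

    # print system statistics every 5 seconds, 10 samples;
    # compare the "in" (interrupts) and "cs" (context switches)
    # columns between the two servers
    vmstat 5 10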

Now let's see how much our instances are being used, with the varnishstat command:

[varnishstat output: master vs. slave]

Here you can see why there's not much difference: Varnish is taking care of nearly 90% of the requests, and only around 3 requests/second are hitting the backends.
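
The hit rate itself can be pulled straight out of the counters; a quick sketch (the counter names are MAIN.cache_hit/MAIN.cache_miss on Varnish 4, plain cache_hit/cache_miss on Varnish 3):

    # dump all counters once and keep only the hit/miss figures;
    # hit rate = cache_hit / (cache_hit + cache_miss)
    varnishstat -1 | egrep 'cache_hit|cache_miss'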

Let's run another test to see how quickly we are responding to requests, using the varnishhist command:

[varnishhist output: master vs. slave]

Again, there's almost no difference here.
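
For readers who haven't used the tool: varnishhist plots a live histogram of request processing times on a logarithmic x-axis, with '|' marking cache hits and '#' marking cache misses, so two near-identical histograms mean both servers answer equally fast.

    # live histogram of response times (log scale);
    # '|' = cache hit, '#' = cache miss
    varnishhist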

Conclusion: in our particular case, using threads doesn't seem to affect performance much, and it has a positive impact on memory consumption.

What I'm going to do now is change the production configuration to 2 instances with 2 threads each… why? Because if we were using only one instance, restarting it for maintenance would leave us without any backend on that server during the process. A sketch of the intended layout follows below.
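
A minimal sketch of that intended layout, again assuming plone.recipe.zope2instance (ports are illustrative; buildout's macro syntax keeps the two parts in sync):

    [instance1]
    recipe = plone.recipe.zope2instance
    http-address = 8081
    zeo-client = on
    zeo-address = 127.0.0.1:8100
    # 2 threads per instance...
    zserver-threads = 2
    zodb-cache-size = 100000

    [instance2]
    <= instance1
    # ...times 2 instances, so one can be restarted for
    # maintenance while the other keeps serving requests
    http-address = 8082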

Share and enjoy!
