We are running Gunicorn with multiple workers and the Gunicorn max_requests option. We found that memory usage in our containers increases with every worker restart; reducing max_requests so that workers restart very frequently made the problem apparent.
Using the method described in the README:

from prometheus_client import multiprocess

def child_exit(server, worker):
    multiprocess.mark_process_dead(worker.pid)
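For context, here is the fuller Gunicorn config sketch we are working from (file name gunicorn.conf.py, worker count, and max_requests value are illustrative for our setup):

```python
# gunicorn.conf.py -- illustrative values; adjust to your deployment
from prometheus_client import multiprocess

workers = 4
max_requests = 1000  # recycle each worker after this many requests

def child_exit(server, worker):
    # Runs in the master after a worker process exits; tells the
    # client library that this pid's metric files are now stale.
    multiprocess.mark_process_dead(worker.pid)
```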
While running a performance test, I see the following in the container:
[root@mock-adapter-8568fdcf6d-lz8tx app]# cd /app/prometheus_tmp
[root@mock-adapter-8568fdcf6d-lz8tx prometheus_tmp]# ls
counter_10.db counter_214.db counter_319.db counter_41.db counter_91.db histogram_196.db histogram_296.db histogram_401.db histogram_69.db
counter_111.db counter_222.db counter_31.db counter_425.db counter_97.db histogram_1.db histogram_305.db histogram_409.db histogram_77.db
counter_117.db counter_232.db counter_328.db counter_433.db counter_9.db histogram_202.db histogram_311.db histogram_419.db histogram_84.db
counter_130.db counter_23.db counter_338.db counter_444.db histogram_10.db histogram_214.db histogram_319.db histogram_41.db histogram_91.db
counter_138.db counter_244.db counter_345.db counter_450.db histogram_111.db histogram_222.db histogram_31.db histogram_425.db histogram_97.db
counter_148.db counter_251.db counter_353.db counter_458.db histogram_117.db histogram_232.db histogram_328.db histogram_433.db histogram_9.db
counter_154.db counter_258.db counter_361.db counter_470.db histogram_130.db histogram_23.db histogram_338.db histogram_444.db
counter_175.db counter_269.db counter_36.db counter_48.db histogram_138.db histogram_244.db histogram_345.db histogram_450.db
counter_187.db counter_277.db counter_374.db counter_56.db histogram_148.db histogram_251.db histogram_353.db histogram_458.db
counter_18.db counter_286.db counter_384.db counter_61.db histogram_154.db histogram_258.db histogram_361.db histogram_470.db
counter_196.db counter_296.db counter_401.db counter_69.db histogram_175.db histogram_269.db histogram_36.db histogram_48.db
counter_1.db counter_305.db counter_409.db counter_77.db histogram_187.db histogram_277.db histogram_374.db histogram_56.db
counter_202.db counter_311.db counter_419.db counter_84.db histogram_18.db histogram_286.db histogram_384.db histogram_61.db
Watching that directory, new files are created for every new worker, and old ones are never cleaned up. Deleting the files in that directory reduces the container's memory usage, which then starts building up again.
What is the proper way of dealing with these database files? Is this mark_process_dead's responsibility?
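For what it's worth, the only cleanup we have found that reliably works is wiping the multiprocess directory between Gunicorn runs, before any worker starts. A stdlib-only sketch of that (the function name and the on_starting-hook placement are our assumptions; the path is whatever PROMETHEUS_MULTIPROC_DIR points at):

```python
import glob
import os

def wipe_multiproc_dir(path):
    """Remove leftover *.db metric files from a previous run.

    Intended to be called once at master startup (e.g. from Gunicorn's
    on_starting hook), before any worker has written metrics. 'path'
    is the directory PROMETHEUS_MULTIPROC_DIR points at.
    """
    for db_file in glob.glob(os.path.join(path, "*.db")):
        os.remove(db_file)
```

Note this only handles restarts of the whole server; it does not address the per-worker file accumulation while the server is running, which is what we are seeing.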
And a general question about Prometheus: do we need to keep metrics around after they are collected? Could we wipe our metrics after the collector hits our metrics endpoint? If so, can this be done via the prometheus_client?