Running a script that errors out with Segmentation Fault using Supervisor

Recently I hit a snag while trying to index some documents in Solr. Without going into why it was happening, let’s start with where I was. I had to index a few hundred thousand documents in Solr. This was being done from a single script, which would fetch the ids of all the docs to be indexed from the DB, then sequentially build each document and push it to a remote Solr server.

The problem was that after every thousand or so documents, the script would die with a segmentation fault. I found out why it was happening, but the cause was outside my control. So I knew the fix had to come from my script.

My script was written in PHP, and my first idea was to add exception handling. But a segmentation fault never reaches the exception handling: the OS kills the process instantly. Registering a shutdown function with register_shutdown_function had the same problem.
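This behavior is not specific to PHP. Here is a minimal sketch in Python (used only for illustration, since the original script is PHP), assuming a POSIX system. The child registers an exit hook and wraps its work in try/except, then sends itself SIGSEGV to simulate the crash. Neither the hook nor the except block gets a chance to run, and the parent only sees that the child was killed by a signal.

```python
import signal
import subprocess
import sys

# Child program: it has both an exception handler and an exit hook,
# then sends itself SIGSEGV to simulate a segmentation fault.
CHILD = """
import atexit, os, signal
atexit.register(lambda: print("exit hook ran", flush=True))
try:
    os.kill(os.getpid(), signal.SIGSEGV)
except Exception:
    print("exception handler ran", flush=True)
"""

def run_crashing_child():
    """Run the child in a separate process and capture what it printed."""
    return subprocess.run(
        [sys.executable, "-c", CHILD], capture_output=True, text=True
    )

if __name__ == "__main__":
    result = run_crashing_child()
    # On POSIX, a negative return code means "killed by that signal number".
    print("return code:", result.returncode)
    print("child output:", repr(result.stdout))
```

The only process that can react to the crash is the parent, which is the idea behind the fix described next.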

Supervisor to the rescue!

Supervisor, as the name suggests, is a cute little program which “supervises” other programs. It starts each program as its own subprocess (child process), so whenever a child exits or is killed by the operating system, Supervisor is notified. This way, it can restart the process.
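To make the idea concrete, here is a rough sketch of a supervise-and-restart loop in Python. This is not how Supervisor is implemented, just the parent/child principle in a few lines. The supervise function and its max_restarts parameter are made up for this example, and unlike Supervisor's configurable restart policy, this sketch simply stops once the child exits cleanly.

```python
import subprocess
import sys

def supervise(command, max_restarts=5):
    """Run command as a child process, restarting it after abnormal exits.

    Returns (number_of_starts, last_return_code).
    """
    starts = 0
    while True:
        starts += 1
        # run() blocks until the child exits; this is how the parent
        # learns that the child has died, whatever the cause.
        returncode = subprocess.run(command).returncode
        if returncode == 0 or starts > max_restarts:
            return starts, returncode
        print(f"child exited with code {returncode}, restarting", file=sys.stderr)
```

Calling something like supervise(["php", "indexing_script.php"]) would keep relaunching the script after each segmentation fault, up to the restart limit.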

I modified my script to store the id of the last indexed doc in an external file. This way, whenever the script crashes, I can find the last indexed doc id in this file and resume indexing from there. This saves me a ton of time, as I don’t have to re-index from the beginning; otherwise the script might never finish, even with Supervisor’s restarts!
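The checkpoint idea could look roughly like this. It is a sketch in Python rather than the PHP of the actual script, and it assumes that doc ids are integers processed in ascending order. The function names are made up for this example, and index_one is a hypothetical stand-in for "build the document and push it to Solr".

```python
import os
import tempfile

def load_checkpoint(path):
    """Return the last indexed doc id, or None if there is no checkpoint yet."""
    try:
        with open(path) as f:
            text = f.read().strip()
    except FileNotFoundError:
        return None
    return int(text) if text else None

def save_checkpoint(path, doc_id):
    """Write doc_id to a temp file, then rename it over the checkpoint.

    The rename replaces the file in one step, so a crash in the middle
    of a write cannot leave a half-written checkpoint behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(str(doc_id))
    os.replace(tmp_path, path)

def index_all(doc_ids, checkpoint_path, index_one):
    """Index doc_ids in order, skipping everything up to the checkpoint."""
    last_done = load_checkpoint(checkpoint_path)
    for doc_id in doc_ids:
        if last_done is not None and doc_id <= last_done:
            continue  # already indexed in an earlier run
        index_one(doc_id)  # hypothetical: build the doc and push it to Solr
        save_checkpoint(checkpoint_path, doc_id)
```

If the process dies after index_one but before save_checkpoint, that one document is indexed again on the next run, which is a small price compared to starting over.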

Intro To Supervisor

Install Supervisor using pip.

> pip install supervisor

For other modes of installation, check the installation section of the Supervisor documentation.

Once installed, start the Supervisor process.

> supervisord

This should error out saying that it couldn’t find any configuration file.

To fix it, create the following directories. The last one holds the log files that the configuration below points to.

> /etc/supervisor/
> /etc/supervisor/conf.d/
> /var/log/supervisor/

Add the following to supervisord.conf and place the file at /etc/supervisor/.

; supervisor config file

[unix_http_server]
file=/var/run/supervisor.sock   ; (the path to the socket file)
chmod=0700 ; socket file mode (default 0700)

[supervisord]
logfile=/var/log/supervisor/supervisord.log ; (main log file;default $CWD/supervisord.log)
pidfile=/var/run/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
childlogdir=/var/log/supervisor ; ('AUTO' child log dir, default $TEMP)

[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///var/run/supervisor.sock ; use a unix:// URL for a unix socket

[include]
files = /etc/supervisor/conf.d/*.conf

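The last line is important. It includes any configuration files that you put inside /etc/supervisor/conf.d/. Now run the supervisord command again, and this time it should start. If it still can’t find the file, point to it explicitly with supervisord -c /etc/supervisor/supervisord.conf.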

Putting the pieces together

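Create a configuration file for the indexing script inside /etc/supervisor/conf.d/ (for example, doc_indexer.conf) with the following contents.

[program:doc_indexer]
process_name=doc_indexer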
command=php /Users/yasoobhaider/Documents/code/solr/indexing_script.php
stdout_logfile=/var/log/supervisor/doc_indexer.log
stdout_logfile_maxbytes=100MB
redirect_stderr=true
autorestart=true
stdout_logfile_backups=1

Two points of interest here. First, notice the autorestart=true option in the configuration. This tells Supervisor to restart the script whenever it crashes or exits. Second, and this is the more powerful part of Supervisor, you can ask it to spawn more than one copy of the process! All you need to do is add the numprocs=k directive to the configuration, where k is the number of parallel processes that you want to run. (This also requires process_name to include the %(process_num)s expression, so that each copy gets a unique name. The Supervisor documentation on the [program:x] section has the details.)
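As a rough illustration, a multi-process version of the program section might look like the fragment below. The value 4 is an arbitrary choice for this example, and the process_name pattern follows the form shown in the Supervisor documentation. Keep in mind that parallel copies only help if each one works on its own slice of doc ids and keeps its own checkpoint file, which the single-file script described here does not do as-is.

```ini
[program:doc_indexer]
; each copy gets a unique name: doc_indexer_00, doc_indexer_01, ...
process_name=%(program_name)s_%(process_num)02d
; number of parallel copies to run (4 is just an example value)
numprocs=4
command=php /Users/yasoobhaider/Documents/code/solr/indexing_script.php
autorestart=true
```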

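Time to start the supervised process! If supervisord was already running when you added the file, run sudo supervisorctl update first so that it picks up the new program. Because autostart defaults to true, this usually starts the script as well; if it isn’t running yet, start it explicitly.

> sudo supervisorctl start doc_indexer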

And that’s it. Whenever the script crashes now, Supervisor restarts it, and it picks up right where it left off!

Supervisor is a great tool, and there are a ton of configuration options available. Do check it out!

Please do drop in comments if you have any feedback, or if you found anything inaccurate.

Thanks for reading!

I code, among other things.
