Ben Summers’ blog

Control untrusted processes with Solaris SMF

If you’re writing a web application which handles files uploaded by your users and does anything more than storing them for download later, then you need to think about security on the server. Even insignificant processing relies on libraries which have bugs, which can range from infinite loops to execution of arbitrary code on the server.

To pick one example, here are some past security flaws in libpng. No library is going to be bug-free, especially if it deals with complicated file formats. Since you’re not going to be the first to know about the bugs, you need to take precautions.

Processing untrusted files

Minimising your application’s exposure to these issues is relatively simple.

  • Create an untrusted worker process which communicates with the main process using your language’s most convenient method of RPC (Ruby’s DRb, Java’s RMI, and so on).
  • Run the worker process as a different user, using all your operating system’s security features to lock down what the process can access.
  • Copy the files to the worker process using RPC or within the filesystem, whichever is most convenient. To limit the data exposed at any one time, don’t give it access to all the files.
  • Use some method of controlling the worker process, which allows your main process to kill it at the slightest sign of trouble, such as an operation taking too long.
  • Regularly restart the process, to reduce the length of time any exploit code can run.

These methods are far less efficient than processing the files in your application’s process, but there are many cases where the additional security is worth the cost.
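
As a minimal sketch of the “kill it if it takes too long” rule, assuming a one-shot worker command rather than a long-running RPC server, the main process could enforce a deadline like this in Python. The run_untrusted name is hypothetical, and the stuck child is only a stand-in for a library caught in an infinite loop.

```python
import subprocess
import sys


def run_untrusted(argv, timeout_seconds):
    """Run a worker command, killing it if it exceeds the time limit.

    Returns the exit status, or None if the worker had to be killed.
    """
    try:
        # subprocess.run() kills and reaps the child when the timeout expires.
        completed = subprocess.run(argv, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode


if __name__ == "__main__":
    # Stand-in for a worker stuck in an infinite loop on a malicious file.
    stuck = [sys.executable, "-c", "while True: pass"]
    print(run_untrusted(stuck, timeout_seconds=1))
```

A None result is the cue to treat the file as suspect and carry on with a fresh worker.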

On a side note, even if you’re just offering these untrusted files for download, you need to think about security on the client side. For example, see GIFAR and Flash origin policy.

Controlling the untrusted process

The application process needs to be able to start and stop the untrusted process. On a UNIX-like operating system, this probably involves a controlling server process. This needs to:

  • Run with some root-like privilege, so it can start child processes as the untrusted user, but the untrusted process cannot affect it.
  • Kill the process on demand, escalating from graceful shutdown to forceful termination.
  • Communicate with the application process in a secure manner, only accepting instructions from the right source.

This is a lot of code to write, and implementing this reliably and securely is non-trivial.
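
To give a feel for just one of these requirements, here is a rough Python sketch of escalating from a graceful shutdown to a forced kill. The stop_process name and its grace period are illustrative assumptions, and the sketch ignores the privilege and authentication requirements entirely.

```python
import subprocess
import sys


def stop_process(proc, grace_seconds=5.0):
    """Stop a child process, escalating from SIGTERM to SIGKILL.

    Returns the exit status reported by the operating system.
    """
    if proc.poll() is not None:
        return proc.returncode  # already exited
    proc.terminate()  # polite request: SIGTERM
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL cannot be caught or ignored
        return proc.wait()


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_process(child))
```

On a UNIX-like system a negative return value means the child was ended by that signal number, so the caller can tell whether the forced kill was needed.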

These requirements have a lot in common with the service/daemon management features of modern operating systems, which start up processes and keep them running, like Apple’s launchd and Solaris’ SMF.

Since I use Solaris, I’ve implemented this process control with SMF. I was pleasantly surprised to find it provides everything required to do this reliably and securely.

There are a few elements to put together:

  • The SMF manifest, to describe how the untrusted worker process should be run, and which enables some security features.
  • The SMF method, which starts the process and exits when the worker process is ready to accept connections.
  • Solaris RBAC, used to give the application process permission to control the worker process (and none of the other services).
  • Control of SMF within the application process, using libscf.

The SMF manifest is a relatively simple XML file describing the properties of the service, and the method is a shell script which starts the service process. To avoid repeating the basics, here are some tutorials from c0t0d0s0 and BigAdmin, and an overview of the whole thing.

Multiple instances

SMF allows multiple instances of the same service, each with different properties. In this case, the property that differs is the port number the service listens on. The method reads the properties and starts the process accordingly. I’ve used this feature to run multiple worker processes, both for concurrency and so that they can be stopped and started while at least one is always available to do work.

In the manifest, you shouldn’t use the single_instance and create_default_instance elements, and you need to define the instances and a property to set the port for IPC:

	<property_group name='worker' type='application'>
	  <propval name='port' type='integer' value='0' />
	</property_group>

	<instance name='w2000' enabled='true'>
	  <property_group name='worker' type='application'>
	    <propval name='port' type='integer' value='2000' />
	  </property_group>
	</instance>
	
	<!-- more instance definitions go here -->

Here I’ve defined the property, with a default and invalid value of 0, then defined an instance which has this property set to 2000. I’ll create other instances with a different instance name and property value.
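
Writing out a handful of near-identical instance definitions by hand is tedious, so they could be generated instead. This is a small Python sketch, assuming the w<port> naming used above. The instance_xml helper and the port range are made up for illustration.

```python
def instance_xml(port):
    """Return a manifest <instance> element for a worker on `port`."""
    if not 1 <= port <= 65535:
        raise ValueError("port out of range: %r" % (port,))
    return (
        "<instance name='w%d' enabled='true'>\n"
        "  <property_group name='worker' type='application'>\n"
        "    <propval name='port' type='integer' value='%d' />\n"
        "  </property_group>\n"
        "</instance>\n" % (port, port)
    )


if __name__ == "__main__":
    # Ports 2000-2003 are arbitrary examples.
    for port in range(2000, 2004):
        print(instance_xml(port), end="")
```

The output is meant to be pasted into the manifest in place of the hand-written instance elements.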

Locking down the untrusted process

In the exec_method elements of the manifest, you specify the method script, and the user and privileges it should be started with.

	<exec_method type='method' name='start' exec='/path/to/method start' timeout_seconds='30'>
	  <method_context working_directory='/tmp'>
	    <method_credential user='untrusted' group='untrusted' privileges='basic' />
	  </method_context>
	</exec_method>

Create the untrusted user and group in your installation scripts, and use the normal UNIX permissions to stop it accessing anything more than the directories it absolutely needs to use. In addition, you can use Solaris privileges to lock it down even further; see man privileges and this handy introduction. Simply remove all the privileges the process doesn’t need.

Writing the method

This is a normal shell script. It’s best to use the contract mode, where the method script must terminate within a timeout, leaving another forked process running. The alternatives don’t allow the method to report to SMF when the process is ready, and I’ve experienced bugs with the tempting duration option.

To make it easier to write the application process, only exit the method script when the process is running and ready to receive connections. The best way is for the process to fork and the parent to exit when the child is ready, but second best is to write a small utility which polls the expected port and exits when the connection succeeds.

This ensures that when SMF tells your application that the instance is online, it really is online and ready to perform work.
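
The polling utility could be as small as the following Python sketch. The wait_for_port name, the loopback address and the 25 second default (chosen only to fit inside the 30 second method timeout in the manifest above) are all assumptions.

```python
import socket
import sys
import time


def wait_for_port(port, host="127.0.0.1", timeout_seconds=25.0, interval=0.2):
    """Poll until a TCP connection to host:port succeeds.

    Returns True once connected, False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            # A successful connect means the worker is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Exit 0 when the port answers, so the method script can exit too.
    sys.exit(0 if wait_for_port(int(sys.argv[1])) else 1)
```

The method script would start the worker in the background, run this with the port number, and pass a non-zero exit status on to SMF as a failed start.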

Since you don’t entirely trust the process, you should also use ulimit within the script, for example

	ulimit -c 0
	ulimit -n 200
	ulimit -v 409600

to disable core dumps (you expect the process might crash, and writing core dumps takes time), limit the number of files it can have open, and cap how much virtual memory the process can use (ulimit -v takes kilobytes, so this is 400 MB). If the process dies because it hits one of these limits, SMF will restart it.

You also need to read the property in your script:

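	# Defines SMF_EXIT_ERR_CONFIG and the other SMF exit codes
	. /lib/svc/share/smf_include.sh

	# Print the value of property $1 for this instance, if it has one
	getproparg() {
	  val=`svcprop -p "$1" "$SMF_FMRI"`
	  [ -n "$val" ] && echo "$val"
	}

	PORT=`getproparg worker/port`

	# 0 is the manifest's deliberately invalid default
	if [ -z "$PORT" ] || [ "$PORT" = "0" ]; then
	  echo "worker/port property not set"
	  exit $SMF_EXIT_ERR_CONFIG
	fi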

and then the PORT variable can be used in the start method to build the command line arguments for your process.

Using RBAC to authorise control

The final Solaris feature we’ll use is Role-Based Access Control (RBAC), to give the application process permission to control the worker SMF instances.

RBAC defines a set of named authorizations. Solaris maintains a list of these authorizations for each user, and privileged programs can query that list to decide whether an action requested by one of the user’s processes is allowed. SMF uses this to determine what can control the service instances.

First, an authorization needs to be created in /etc/security/auth_attr. Add a line like:

	solaris.com.example.application.control.worker:::Control worker process within Example application::

While man auth_attr says that authorizations should be named with a reverse-order internet domain name, and that everything beginning with solaris. is reserved for Sun, following this rule gives you “(auth name) is not a valid authorization” errors when you attempt to use them. Prefixing the recommended name with solaris. seems an adequate compromise between following the rules and actually working.

Your trusted application user needs to have this authorization, which can be added to a user with

	usermod -A solaris.com.example.application.control.worker example

where example is the trusted user under which the application process runs. Note that -A replaces the user’s whole list of authorizations, so if the user needs other authorizations, include them too.

Finally, the SMF manifest needs some properties set to allow processes with this authorization to control the instances:

	<property_group name='general' type='framework'>
	  <propval name='action_authorization' type='astring'
			value='solaris.com.example.application.control.worker' />
	</property_group>
	<property_group name='general' type='framework'>
	  <propval name='value_authorization' type='astring'
			value='solaris.com.example.application.control.worker' />
	</property_group>

The first, action_authorization, allows the service to be restarted, and the second, value_authorization, allows it to be enabled and disabled. The latter lets the application start and stop an instance simply by enabling and disabling it. This simplifies the code you have to write to control the instances, and you can run a variable number of instances to cope with varying load.

Controlling the instances from the application

While you could use the svcs and svcadm executables to control the worker instances, it’s more efficient and easier to use libscf. While the man page makes it look complicated, there are some very simple smf_* functions which do everything that’s required.

	int smf_enable_instance(const char *instance, int flags);
	int smf_disable_instance(const char *instance, int flags);
	char *smf_get_state(const char *instance);

It’s trivial to interface these to your language of choice with a C extension, or some form of FFI. Under Java, I used JNA, which made it ridiculously easy. Just remember to free() the result of smf_get_state() to avoid memory leaks.

A nice simple way of starting an instance is to call smf_enable_instance() with the full name of the instance, for example svc:/application/example/worker:w2000, and then poll smf_get_state() until it returns online. To stop it, do the reverse: call smf_disable_instance() and poll until the state becomes disabled.

Since your method script only returns when the worker process is ready to receive requests, as soon as it’s online you can start using it. If there are any problems, or you suspect something has gone wrong, stop the instance and start it again.
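
The polling loop might look something like this Python sketch. It assumes you already have a binding to libscf: get_state is a hypothetical callable, supplied by you, which wraps smf_get_state() for one instance and returns the state as a string. Giving up when the instance lands in maintenance is my assumption about what the application would want, not something SMF requires.

```python
import time


def wait_for_state(get_state, wanted, timeout_seconds=60.0, interval=0.5,
                   sleep=time.sleep):
    """Poll get_state() until it returns `wanted`.

    get_state is a caller-supplied (hypothetical) wrapper around
    smf_get_state() for a single instance.
    Returns True on success, False on timeout or if the instance
    drops into the "maintenance" state.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state == wanted:
            return True
        if state == "maintenance" or time.monotonic() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Fake state source standing in for a real libscf binding.
    fake_states = iter(["offline", "offline", "online"])
    print(wait_for_state(lambda: next(fake_states), "online", interval=0.01))
```

After smf_enable_instance() you would wait for "online", and after smf_disable_instance() for "disabled". A False result is the signal to stop the instance and try again, or to alert someone.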

Security is hard work

While SMF and Solaris’ security features make it relatively simple to separate out the untrusted code from your trusted application and contain any problems, it is much more work than the simple approach of not worrying about the problem.

However, this minimises the risk to your users’ data, a priority for any responsible developer. To me, it’s well worth the effort.

 


 

Hello, I’m Ben.

I’m the Technical Director of Haplo Services, an open source platform for information management.

 
