Drupal security testing for everyone
I've just published a new project for performing static application security testing (SAST) on Drupal sites, mortenson/psalm-plugin-drupal. Using Psalm, custom plugins, funky scripts, and a lot of elbow grease, I think I have something that will help everyone write safer Drupal code.
If you want to skip straight to installing this on your own sites and testing your custom/contrib modules, you can click here!
Drupal is getting smarter, but we're still human
I do Drupal security research in my free time, and have found a lot of bugs, mostly in Drupal core. These have ranged from low impact cross site request forgery all the way to remote code execution. This kind of research exposes issues at the framework level, but the fixes for those issues don't necessarily make the custom code you write more secure.
The problem is that while Drupal's APIs are improving, those improvements rely on Drupal being used properly. At the end of the day humans still write code, and there's not a lot of automated ways to know that the code you're writing is secure.
This is where static analysis tools come in. Drupal has a few of these already, namely pfrenssen/coder and mglaman/phpstan-drupal. Among other features, these tools can check for security issues by using regular expressions or by inspecting abstract syntax trees.
These tools are good at what they do, but unlike a traditional SAST tool they do not perform taint analysis. Taint analysis is the process of tracking data flows across files, function calls, and variable assignments to determine that a "source" of user input flows into a dangerous output or "sink". For example, any analysis tool could easily see that this code is unsafe:
echo $_GET["input"];
But code like this is harder to track:
index.php
$array = [
"key" => $_GET["input"],
];
my_echo($array["key"]);
tools.php
function my_echo($input) {
echo $input;
}
Taint analysis is able to keep track of these data flows, determining when taint sources land in taint sinks without first being sanitized.
Finding an agnostic (har har) tool for the job
From my research, the best solution for PHP taint analysis appears to be vimeo/psalm, a static analysis tool that beyond security findings is also suitable for enforcing strong typing in your code.
Psalm's taint analysis is driven by annotations you can add to your functions and methods which identify what part of the data flow they're in (as a source, sanitizer, or sink). You can add these annotations right in your codebase, or in stubs, which is useful for me since I'm not a Drupal core maintainer.
Here's an example of a stubbed out Drupal file which shows a taint sink for SQL injection:
namespace Drupal\Core\Database;
class Connection {
/**
* @psalm-taint-sink sql $query
*/
public function query($query, array $args = [], $options = []) {}
}
With this stub in place, analyzing code like this will trigger an error when running Psalm:
$connection = new \Drupal\Core\Database\Connection();
$connection->query($_GET["input"]);
I tested this out, got it working, and felt motivated to move forward on integrating Psalm with Drupal!
But that's not real Drupal code…
If you look back at the example above, you may notice some weird things:
- The real \Drupal\Core\Database\Connection is actually an abstract class, you literally can't create instances of it with core.
- No one constructs database objects with "new".
- Drupal code probably doesn't use $_GET anyway.
If so, you're right, and you've just made my life much harder. Great!
Let's dive into the first two issues - how people actually use the database.
Ideally, they use the database service in a class, and that class uses dependency injection. If so, they're probably accessing the container in a factory (create) method, or passing services to their constructor via a service definition (*.services.yml) file.
In this case, a custom Psalm plugin already exists that does much of the work for you. psalm/psalm-plugin-symfony recognizes when the container is used, and ensures that the right class is detected by Psalm when you call something like:
$container->get("database");
Neato! But there was already a hangup - this plugin requires a dump of your container, which is an XML file output by normal Symfony applications. This dump contains information needed to map a service name ("database") to a class (\Drupal\Core\Database\Database). Unfortunately, as you may know, Drupal is not a normal Symfony application, and does not have a way to dump the container normally.
So, I went ahead and created a script that can dump the container XML for you. By running:
php ./vendor/mortenson/psalm-plugin-drupal/scripts/dump_script.php
From your webroot you'll get a DrupalContainerDump.xml file that includes all the information needed to associate service names to classes.
The only hiccup from here was that we also have \Drupal::service() - an abstraction on top of $container->get() - which the Symfony plugin didn't support. I copied some of their code and got things working after a few hours of headaches.
For the $_GET issue (that most Drupal code does not use globals anymore) I simply had to add more stubs for everything I considered a taint source. The Symfony Request object was covered by the existing plugin, but I added taint sources for entity fields and form state values.
With all that in place, code like this would now throw an error:
/** @var \Drupal\node\Entity\Node $node */
$node = \Drupal::entityTypeManager()->getStorage('node')->load(1);
echo $node->get('title')->value;
That's still not real Drupal code…
Alright, you got me. We don't use "echo" anymore either, we use Twig templates and render arrays to output HTML.
Twig templates are covered by the Symfony plugin, although I haven't tackled taint analysis from PHP all the way into Twig yet. That said, Drupal's Twig implementation is already fairly safe, and you can probably get away with just grep'ing for bad code that looks like:
{{ value | raw }}
or
<tag attribute={{ you_should_quote_this }}>
And you'll likely be OK.
Twig aside, render arrays still pose a major risk in Drupal. At its core, a render array is a way to represent HTML in a structured format. It's also a way to represent interactions between callback functions, using keys like #pre_render and #validate. If user input makes its way into these dangerous keys, you could end up with cross site scripting or even remote code execution.
Given this, I knew I needed to add render array support to my custom plugin. Essentially, I wanted code like this to trigger an error:
$build = [
"#pre_render" => [$_GET["callback"]],
];
\Drupal::service("renderer")->render($build);
A quick way to add this would be to add the "@psalm-taint-sink” annotation to the render method to recognize it as a taint sink, which is what I initially did. Unfortunately, now all render arrays would throw errors, including ones like this which seem odd but are actually safe (In Drupal 8+, do not do this in Drupal 7):
$build = [
"#markup" => $_GET["input"],
];
\Drupal::service("renderer")->render($build);
So, I needed a way to mark certain array keys as safe, or in other words to conditionally remove taint sources from the array.
Psalm didn't have a way to do this, so working with its maintainers I created a PR for a new event to add and remove taints. Big thanks to Matthew Brown and Bruce Weirdan for helping me out on this.
With the event in place, I could conditionally remove taints from arrays given certain conditions. The conditions I chose were that an array key started with "#”, and was not a known unsafe key. The code for this list of unsafe keys can be found here if you're curious - I mostly lifted it from the Webform module.
Schrödinger's vulnerability
A final hiccup, and an important one, is that Psalm needs code to have an execution path for it to be scanned. Otherwise, the taint analysis doesn't know where the start of the data flow is, or if code is used at all.
Drupal has a lot of abstractions so code paths are usually not clear. For instance, we may intuitively know that a controller's public method like "page” is an entrypoint, but Psalm has no idea that this is essentially the start of your custom logic for rendering a page.
To address this, users must make their execution paths clear, with code that actually instantiates objects and calls methods. I recognized this as a usability problem - now all users of my Psalm plugin will have to create a file like "entrypoint.php” which executes all the code they want to test.
As a workaround, I created a new script users can run to have an entrypoint file auto-generated for them. To use it, you run something like:
php ./vendor/mortenson/psalm-plugin-drupal/scripts/generate_entrypoint.php modules/custom/my_module
And then a file is generated that looks something like this:
$form = new \Drupal\my_module\Form\DeleteMultiple();
$form_state = new \Drupal\Core\Form\FormState();
$build = $form->buildForm([], $form_state);
\Drupal::service("renderer")->render($build);
$controller = new \Drupal\my_module\Controller\FooController();
$build = $controller->addPage();
\Drupal::service("renderer")->render($build);
This is "machine food" for Psalm - it clearly shows your controller and form methods being called, and also renders them which will complete the taint flow (since the render method is a taint sink).
Users are of course welcome to make their own entrypoints for testing their code, the one included in my plugin only works with forms and controllers defined in your *routing.yml files, so it's quite limited.
Time to find some bugs!
I think this is a solid first shot at performing SAST scanning of Drupal using taint analysis. Moving forward, I'm excited to see the kinds of results users find with the tool, especially on custom code which is rarely reviewed with the same scrutiny as contrib.
It's important to note that this is brand new, and probably has a lot of issues. Be patient with the project and follow the README closely when setting up your project.
Also, be mindful when running this on contrib modules you don't maintain! Any findings for contrib modules should be reported privately to the Drupal Security Team, but they also don't want to field a ton of false positive reports from users of this tool. Best to use it on your own code than on someone else's.
Thanks for reading and keeping Drupal secure!