CTF Challenge Storage Layout and Pluggable Challenge Types

Why this came up

~~Me: why is this CTF design so cursed? Static flags, dynamic flags, and now we also need an online judge on top of that?~~

Existing approaches (?)

To tell this story properly, we have to start with CTFd.

CTFd was designed as a broadly extensible platform, and both flag verification and platform features are implemented through plugins. Unfortunately, that plugin model has a few structural problems. It is so open-ended that it puts a heavy mental and maintenance burden on plugin authors. Implementing the backend is not enough; in many cases you also have to build the frontend. On top of that, some logic is still coupled to controllers, which means unusual requirements can still force you to patch CTFd itself.

Part of the problem is that CTFd ships with its own file management model. A challenge author uploads a file, the platform hashes it, stores it under a content-addressed path, and records the association in the database. Plugins do not get to intervene in that flow, which means file management is effectively off-limits. Containers are the exact opposite. CTFd never really designed around them in the first place, so the entire container lifecycle has to be implemented by plugins. As a result, file management and container management end up living in two completely separate worlds.

Later, GZ::CTF gradually took over much of the Chinese CTF scene. Its user experience is genuinely strong, but to get there the project also gave up a fair amount of architectural flexibility, including things like custom challenge categories and pluggable challenge types. Its file storage model is not dramatically different from CTFd's either. It adds support for external file links and team-specific dynamic file distribution, but local files are still stored and attached to challenges in roughly the same way.

That makes some requirements feel oddly awkward to implement, because the logic is tightly coupled:

Dynamic challenge file distribution: hand different files to different teams and map each one to a different flag so you can add anti-cheating measures;
Dynamic environment mounts: mount challenge files directly into the on-demand container used by contestants. A very concrete example is OJ-style judging. We may already have a general-purpose judging container that compiles the contestant’s program, redirects input and output files, and compares the result. If container-based judging is the only mechanism available, every judged challenge has to bake the full input/output dataset into its own image. If those files can be mounted at container startup instead, most judging problems can share a single base image;
Custom rule-based flag verification: some time ago, @koito suggested supporting custom flag verification scripts so more complex challenge types and validation rules would be possible. Even if embedding a custom scripting flow in Rust is not exactly trivial, it is still doable. The real problem is that under the traditional model the verifier only gets user info, challenge info, and the submitted content. That is too little context, which makes a “custom verification script” feel pretty underpowered. You can use it, sure, but it is not much more flexible than a regex.

These are only three examples, but they all point to the same thing: the flow from downloading files to solving the challenge to submitting a flag and validating the answer is tightly coupled.

As soon as a new requirement appears, you have to cut into that flow somewhere. And no plugin system can realistically cover every edge case, so sooner or later you end up sacrificing either developer ergonomics or user experience.

Is there a way to get both without making the whole thing even more cursed? Maybe. Read on.

A challenge-response model built on XXXX (insert your favorite buzzword here)

Challenge storage layout

When a challenge is created, the platform automatically allocates it a dedicated storage directory. I call that directory a Bucket. Inside the Bucket, files are split into four areas according to purpose:

provided: static challenge files. Anything uploaded here is served directly to contestants without extra processing;
mapped: dynamically distributed challenge files. Files uploaded here are handed out per team rather than served to everyone. This part is really a compromise: because of how dynamic mapping works, each team's mapping resolves to exactly one delivered file. If you need to distribute multiple dynamic files, the practical answer is to package them ahead of time as a tarball or zip archive;
preserved: extra judging files. Contestants cannot see these files, but they are passed to the flag verifier as additional context when a flag is submitted. This area can support quite a few use cases. For example, you could store the real verification logic here as a script, then implement a verifier that runs the script through some scripting engine together with related files and challenge context. That gives you a lot of freedom;
mounted: files that should be mounted into the containerized challenge environment. These files are mounted at the configured path inside the contestant’s runtime container. In some cases this can also reduce the burden on challenge authors. For example, if you prepare xinetd-based Ubuntu images for several releases, then a pwn author only needs to provide the mount path and the challenge binary itself to publish a working challenge. No custom Dockerfile required, and existing images can be reused much more effectively.
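
To make the layout a bit more concrete, here is a minimal sketch of how a Bucket could resolve file paths. The `Area` and `Bucket` names, the `<storage_root>/<challenge_id>/<area>/<file>` directory scheme, and the rule of rejecting anything that is not a single plain file name are all assumptions made for illustration, not the platform's actual implementation.

```rust
use std::path::{Component, Path, PathBuf};

/// The four areas of a challenge Bucket described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Area {
    Provided,
    Mapped,
    Preserved,
    Mounted,
}

impl Area {
    /// Directory name inside the Bucket (assumed to match the area name).
    pub fn dir_name(self) -> &'static str {
        match self {
            Area::Provided => "provided",
            Area::Mapped => "mapped",
            Area::Preserved => "preserved",
            Area::Mounted => "mounted",
        }
    }

    /// Whether files in this area are ever served to contestants as downloads.
    /// For `Mapped`, only the single entry assigned to the team is served.
    pub fn downloadable(self) -> bool {
        matches!(self, Area::Provided | Area::Mapped)
    }
}

/// A Bucket rooted at `<storage_root>/<challenge_id>` (hypothetical layout).
pub struct Bucket {
    root: PathBuf,
}

impl Bucket {
    pub fn new(storage_root: &Path, challenge_id: u64) -> Self {
        Bucket { root: storage_root.join(challenge_id.to_string()) }
    }

    /// Resolve `file_name` inside `area`. Returns `None` unless the name is
    /// exactly one normal path component, so `..`, absolute paths, and nested
    /// paths cannot escape the area.
    pub fn path(&self, area: Area, file_name: &str) -> Option<PathBuf> {
        let mut parts = Path::new(file_name).components();
        match (parts.next(), parts.next()) {
            (Some(Component::Normal(name)), None) => {
                Some(self.root.join(area.dir_name()).join(name))
            }
            _ => None,
        }
    }
}
```

Keeping the area as an enum rather than a free-form string means the download handler, the verifier, and the container launcher can each match on exactly the areas they are allowed to touch.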

Componentized challenge verification

With that storage layout in place, the next step is a challenge verification mechanism that can actually make use of it. Just as importantly, the mechanism should be easy to extend so it can support a wide range of challenge types.

Start with the contestant-side flow. Here are three common cases:

the contestant opens a challenge, downloads static attachments, solves it, submits the flag, and gets a static verification result;
the contestant opens a challenge, downloads dynamically mapped attachments, solves it, submits the flag, and gets a dynamic mapping-based verification result;
the contestant opens a challenge, starts the challenge environment, solves it, submits the flag, and has the submission checked against the flag inside that environment.

Now look at the administrator-side flow:

upload static attachments and set the flag;
upload dynamic attachments and set the flag;
upload attachments, do not set a flag directly, and configure the challenge image.
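
The dynamic-attachment flow (the second item in each list) is the least obvious of the three, so here is a rough sketch of one way it could work. Everything in it is an assumption for illustration: the `MappedEntry` shape, picking an entry by hashing the challenge and team IDs with a hand-rolled FNV-1a, and the `Suspicious` verdict for a flag that belongs to some other team's file. A real deployment might well store explicit assignments in the database or cache instead of deriving them.

```rust
/// One entry in the `mapped` area: a file plus the flag only that file leads to.
pub struct MappedEntry {
    pub file_name: String,
    pub flag: String,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Correct,
    Wrong,
    /// The flag is valid for this challenge but belongs to a different entry,
    /// which hints that it was shared between teams.
    Suspicious,
}

/// FNV-1a, used here only as a small, dependency-free, stable hash.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Deterministically pick the single entry a team receives for a challenge.
pub fn assign(entries: &[MappedEntry], challenge_id: u64, team_id: u64) -> Option<&MappedEntry> {
    if entries.is_empty() {
        return None;
    }
    let mut key = [0u8; 16];
    key[..8].copy_from_slice(&challenge_id.to_le_bytes());
    key[8..].copy_from_slice(&team_id.to_le_bytes());
    let index = (fnv1a(&key) % entries.len() as u64) as usize;
    entries.get(index)
}

/// Check a submission against the entry assigned to the submitting team.
pub fn check(entries: &[MappedEntry], challenge_id: u64, team_id: u64, submitted: &str) -> Verdict {
    let Some(own) = assign(entries, challenge_id, team_id) else {
        return Verdict::Wrong;
    };
    if own.flag == submitted {
        Verdict::Correct
    } else if entries.iter().any(|entry| entry.flag == submitted) {
        Verdict::Suspicious
    } else {
        Verdict::Wrong
    }
}
```

Because the assignment is a pure function of the two IDs, the download handler and the verifier agree on which file a team got without sharing any state. The trade-off is that adding or removing entries after the contest starts reshuffles the assignments.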

Across these flows, the verification mechanism really only needs two hook points:

flag verification: determine whether the submitted flag is correct;
dynamic containers: inject environment variables into the container when the contestant starts it.

Both hook points can be extended through plugins, and the logic behind both can be fully customized. Once you have that, supporting all kinds of challenge types becomes much easier.
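
As a rough illustration of how the two hook points fit together for a container challenge, the sketch below generates a per-team flag when the container starts, hands it to the container as an environment variable, and later checks submissions against it. The `EnvFlagStore` name, the `FLAG` variable name, the in-memory map standing in for a shared cache, and the caller-supplied `nonce` are all assumptions; a real implementation would draw the nonce from a cryptographically secure random source.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Remembers which flag was injected into which team's container.
/// The in-memory map is a stand-in for a shared cache.
pub struct EnvFlagStore {
    flags: Mutex<HashMap<(u64, u64), String>>,
}

impl EnvFlagStore {
    pub fn new() -> Self {
        EnvFlagStore { flags: Mutex::new(HashMap::new()) }
    }

    /// Container hook: build the environment for a container that is about
    /// to start and record the flag that was placed in it.
    pub fn env_vars(&self, challenge_id: u64, team_id: u64, nonce: u64) -> HashMap<String, String> {
        let flag = format!("flag{{{:016x}}}", nonce);
        self.flags
            .lock()
            .unwrap()
            .insert((challenge_id, team_id), flag.clone());
        HashMap::from([("FLAG".to_string(), flag)])
    }

    /// Verification hook: a submission is correct only if it matches the flag
    /// recorded for this team's own container.
    pub fn check(&self, challenge_id: u64, team_id: u64, submitted: &str) -> bool {
        self.flags
            .lock()
            .unwrap()
            .get(&(challenge_id, team_id))
            .map_or(false, |flag| flag.as_str() == submitted)
    }
}
```

Restarting the container simply overwrites the recorded flag, so an old flag stops being valid as soon as a new environment is issued.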

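use std::collections::HashMap;

// User, Challenge, CheckerError, and RedisPool are platform types defined elsewhere.
pub trait FlagChecker {
  // Hook 1: decide whether the submitted flag is correct.
  async fn check(&self, user: &User, challenge: &Challenge, flag: &str) -> Result<bool, CheckerError>;
  async fn get(&self, user: &User, challenge: &Challenge) -> Result<String, CheckerError>;
  // Hook 2: environment variables to inject when the contestant starts the container.
  async fn env_vars(&self, user: &User, challenge: &Challenge) -> Result<HashMap<String, String>, CheckerError>;
}

...

pub struct StaticAttachmentChecker {
}

// A borrowed field needs an explicit lifetime parameter to compile.
pub struct DynamicAttachmentChecker<'a> {
  cache: &'a RedisPool,
}

pub struct EnvironmentChecker<'a> {
  cache: &'a RedisPool,
}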
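
To show how checkers like these could be plugged in per challenge type, here is a small self-contained sketch. It is deliberately simplified and not the real trait: `SyncFlagChecker` is a synchronous, single-method stand-in, the `Challenge` struct with its `kind` and `static_flag` fields is hypothetical, errors are plain strings, and the `Registry` keyed by a challenge-type name is just one possible way to do the dispatch.

```rust
use std::collections::HashMap;

/// Hypothetical, stripped-down challenge record.
pub struct Challenge {
    pub kind: String,
    pub static_flag: Option<String>,
}

/// Synchronous stand-in for the verification half of the checker trait.
pub trait SyncFlagChecker {
    fn check(&self, challenge: &Challenge, flag: &str) -> Result<bool, String>;
}

/// Compares the submission with the flag configured on the challenge.
pub struct StaticAttachmentChecker;

impl SyncFlagChecker for StaticAttachmentChecker {
    fn check(&self, challenge: &Challenge, flag: &str) -> Result<bool, String> {
        match &challenge.static_flag {
            Some(expected) => Ok(expected.as_str() == flag),
            None => Err("challenge has no static flag configured".to_string()),
        }
    }
}

/// Maps a challenge-type name to the checker that handles it.
pub struct Registry {
    checkers: HashMap<String, Box<dyn SyncFlagChecker>>,
}

impl Registry {
    pub fn new() -> Self {
        Registry { checkers: HashMap::new() }
    }

    /// Register (or replace) the checker for a challenge type.
    pub fn register(&mut self, kind: &str, checker: Box<dyn SyncFlagChecker>) {
        self.checkers.insert(kind.to_string(), checker);
    }

    /// Dispatch a submission to the checker registered for the challenge's type.
    pub fn check(&self, challenge: &Challenge, flag: &str) -> Result<bool, String> {
        let checker = self
            .checkers
            .get(&challenge.kind)
            .ok_or_else(|| format!("no checker registered for kind `{}`", challenge.kind))?;
        checker.check(challenge, flag)
    }
}
```

With this shape, adding a new challenge type means registering one more checker under a new name; the submission handler itself does not change.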