How SVG Drawing Works and an SVG CAPTCHA

SVG vector graphics come with a lot of nice properties: small file size, crisp scaling, broad support, and so on. I have always been an SVG ~~snob~~ enthusiast, and I keep trying to replace bitmaps with SVG wherever I can, including the CAPTCHA described in this post.

Image CAPTCHA

An image CAPTCHA is a challenge-response mechanism. The basic idea is to render a few characters into a bitmap, add lots of visual noise, and make automated form submission harder. The usual flow looks like this:

generate a random string;
render that string into a bitmap;
render interference graphics into the same bitmap;
send the bitmap to the frontend while storing the corresponding answer string on the server;
let the user fill in the answer and submit the form;
validate the answer on the server.
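The server-side half of that flow (storing the answer, then validating it) can be sketched roughly as follows. This is only an illustration: `CaptchaStore`, `issue`, and `verify` are hypothetical names, the in-memory map stands in for whatever session storage you actually use, and string generation and rendering are left out entirely.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory store mapping a session ID to the expected answer.
pub struct CaptchaStore {
    answers: HashMap<String, String>,
}

impl CaptchaStore {
    pub fn new() -> Self {
        Self { answers: HashMap::new() }
    }

    /// Remember the answer for a session, replacing any earlier challenge.
    pub fn issue(&mut self, session_id: &str, answer: &str) {
        self.answers.insert(session_id.to_string(), answer.to_string());
    }

    /// Check a submission. The stored answer is removed first, so each
    /// challenge can be attempted only once, whether or not the guess is right.
    pub fn verify(&mut self, session_id: &str, submitted: &str) -> bool {
        match self.answers.remove(session_id) {
            // Case-insensitive comparison is a design choice of this sketch.
            Some(expected) => expected.eq_ignore_ascii_case(submitted.trim()),
            None => false,
        }
    }
}
```

Dropping the answer on the first attempt is deliberate here: a wrong guess forces a fresh challenge, so a script cannot keep retrying against the same image.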

This kind of verification works a little like a one-way transform. The bitmap couples the answer with the noise so tightly that you cannot simply reverse the process with brute-force computation alone; the original string has to be recovered through visual recognition. That is the basic protective idea behind image CAPTCHAs. Of course, with modern machine learning and OCR, image recognition has become much easier, so the protection offered by bitmap CAPTCHAs has weakened a lot over the years. Even so, they are still useful in some environments. For example, if your server sits behind truly ridiculous restrictions, cannot access the public Internet, and cannot integrate with a third-party CAPTCHA provider, then a local image CAPTCHA may still be the only realistic option. Yes, I am still cleaning up after the university IT office.

The initial idea

SVG is an XML-based vector graphics format, which means the source is just plain text. For server-side processing, that is dramatically friendlier than bitmaps, even if SVG rendering performance on the client side is often worse. So I started wondering whether an SVG could replace the bitmap in a CAPTCHA.

I tried drawing some text in Inkscape and quickly ran into the obvious problem: by default, text in an SVG is stored in a <text> element. That immediately ruins the whole point of a CAPTCHA, because a script can simply read the contents of the <text> node and recover the answer.

<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100" viewBox="0 0 100 100">
    <text x="0" y="50" font-size="50">Hello World</text>
</svg>

Even if you split the text across multiple tags and add some light obfuscation, that is still just wordplay. It is nowhere near as costly for a machine to process as a normal bitmap CAPTCHA.
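To show how little work that takes, here is a minimal sketch of such a script. The function name `extract_text` is made up for this post, and the scanning is deliberately naive (no entity decoding, no <tspan> handling). A real attacker would use a proper XML parser, which only makes the job easier.

```rust
/// Concatenate the contents of every <text>...</text> element, in document order.
/// Naive string scanning, good enough to make the point.
pub fn extract_text(svg: &str) -> String {
    let mut out = String::new();
    let mut rest = svg;
    while let Some(start) = rest.find("<text") {
        let after_open = &rest[start..];
        // Find the end of the opening tag, then the matching closing tag.
        let Some(gt) = after_open.find('>') else { break };
        let body = &after_open[gt + 1..];
        let Some(end) = body.find("</text>") else { break };
        out.push_str(&body[..end]);
        // Continue scanning after this element.
        rest = &body[end + "</text>".len()..];
    }
    out
}
```

Splitting "Hello" across two <text> tags, as in `<text x="0" y="50">He</text><text x="40" y="50">llo</text>`, still yields the full string.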

So if an SVG CAPTCHA is going to work, it has to avoid text-based rendering entirely. The text and the noise need to be coupled through some other drawing technique, in a way that remains readable to humans after rendering but is not trivial for a machine to separate.

How SVG drawing works

Besides tags like <text>, <circle>, and <rect>, SVG also has the all-purpose <path> tag. A <path> can describe arbitrarily complex shapes through a sequence of drawing commands, such as:

M x y or m dx dy: move to a coordinate;
L x y or l dx dy: draw a line from the current point to a coordinate;
H x or h dx: draw a horizontal line from the current point;
V y or v dy: draw a vertical line from the current point;
… and many others;
Z or z: close the current path.

In SVG, the canvas uses the top-left corner as the origin 0, 0, with positive x to the right and positive y downward. In the commands above, uppercase means absolute coordinates, while lowercase means relative coordinates. M only moves the pen and draws nothing by itself. Commands such as L, H, V, and the Bezier-curve instructions actually draw. After each operation, the pen stays at the endpoint of that segment.

For example, the code below draws a triangle:

<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100" viewBox="0 0 100 100">
    <path d="M 0 0 L 100 0 L 50 100 Z" />
</svg>

The commands first move the pen to 0, 0, then draw a straight line to 100, 0, then another line down to 50, 100, and finally close the path by drawing back to the starting point from the original M command.

The exact same triangle can also be drawn with relative coordinates:

<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100" viewBox="0 0 100 100">
    <path d="M 0 0 l 100 0 l -50 100 z" />
</svg>

The final shape is identical.
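One way to convince yourself is to resolve both paths into absolute vertices and compare them. The sketch below handles only the M, L, H, V, and Z commands listed above, and assumes every token is separated by whitespace. `absolute_points` and `next_num` are names invented for this example, not part of any library.

```rust
use std::str::SplitWhitespace;

/// Read the next token as a number; missing or malformed input becomes 0.0.
fn next_num(tokens: &mut SplitWhitespace<'_>) -> f64 {
    tokens.next().and_then(|t| t.parse::<f64>().ok()).unwrap_or(0.0)
}

/// Resolve a path made of M/L/H/V/Z commands (either case) into the list of
/// absolute points the pen visits. Curves, arcs, and implicit repeated
/// commands are not handled in this sketch.
pub fn absolute_points(d: &str) -> Vec<(f64, f64)> {
    let mut tokens = d.split_whitespace();
    let mut points = Vec::new();
    let (mut x, mut y) = (0.0_f64, 0.0_f64);
    let mut start = (0.0_f64, 0.0_f64);
    while let Some(cmd) = tokens.next() {
        match cmd {
            // Uppercase: the operands are canvas coordinates.
            "M" | "L" => { x = next_num(&mut tokens); y = next_num(&mut tokens); }
            "H" => x = next_num(&mut tokens),
            "V" => y = next_num(&mut tokens),
            // Lowercase: the operands are offsets from the current pen position.
            "m" | "l" => { x += next_num(&mut tokens); y += next_num(&mut tokens); }
            "h" => x += next_num(&mut tokens),
            "v" => y += next_num(&mut tokens),
            // Closing returns the pen to the start of the current subpath.
            "Z" | "z" => { x = start.0; y = start.1; }
            _ => continue, // unsupported token: skip it
        }
        if cmd == "M" || cmd == "m" {
            start = (x, y);
        }
        points.push((x, y));
    }
    points
}
```

Feeding it `"M 0 0 L 100 0 L 50 100 Z"` and `"M 0 0 l 100 0 l -50 100 z"` produces the same four points: (0, 0), (100, 0), (50, 100), and back to (0, 0).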

Rendering the CAPTCHA

To render an SVG CAPTCHA, we first generate a random string and then draw each character onto the canvas. As discussed above, using <text> is not an option, so each character has to be drawn as a <path>. That means we need some kind of SVG font source that can be stitched into text. After some digging I found this one - though to be fair, you could also use Inkscape’s “Object to Path” feature on a system font if you are willing to do some manual work. I was not. The project contains a number of fonts expressed as <glyph> path data, which looked promising.

But there was a catch. All of the paths in that font set were drawn with absolute coordinates. That means when assembling text, every drawing command has to be parsed and transformed into the correct location with matrix operations. It adds quite a bit of complexity.

My first thought was to use SVG transform so each glyph could be moved into place without modifying the path itself. That would certainly be convenient. But it also fails in a crucial way: if each character is positioned only through transform, then all drawing commands for the same character remain identical. A script can then group identical paths across the whole image and crack the CAPTCHA surprisingly easily. So if an SVG CAPTCHA is going to work, random noise has to be injected directly into each character’s drawing commands. To do that, we need to transform the actual path data itself, and sometimes even offset a few path segments along the way.
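The grouping attack is almost embarrassingly short. As a rough sketch, assume the attacker has already pulled the `d` attribute out of every <path> element (the extraction step is omitted here, and `group_identical` is a hypothetical name):

```rust
use std::collections::HashMap;

/// Group path indices by identical `d` data, in order of first appearance.
/// If glyphs are positioned only via `transform`, every occurrence of the
/// same character lands in the same group.
pub fn group_identical<'a>(paths: &[&'a str]) -> Vec<(&'a str, Vec<usize>)> {
    let mut index_of: HashMap<&str, usize> = HashMap::new();
    let mut groups: Vec<(&'a str, Vec<usize>)> = Vec::new();
    for (i, d) in paths.iter().copied().enumerate() {
        // Look up the group for this exact string, creating it on first sight.
        let slot = *index_of.entry(d).or_insert_with(|| {
            groups.push((d, Vec::new()));
            groups.len() - 1
        });
        groups[slot].1.push(i);
    }
    groups
}
```

Once one group has been labelled by hand, that label carries over to every later image that reuses the same path string, so the cost of cracking drops to a table lookup.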

With absolute coordinates, every point is relative to the canvas origin 0, 0; with relative coordinates, each point is relative to the previous endpoint. For coordinate transforms, translation by (dx, dy) is x' = x + dx, y' = y + dy; rotation by an angle theta about the origin is x' = x * cos(theta) - y * sin(theta), y' = x * sin(theta) + y * cos(theta); scaling by factors (sx, sy) is x' = x * sx, y' = y * sy.

If a glyph is drawn with absolute coordinates, we can choose a temporary origin, such as the center of its bounding box, the top-left corner of that box, or simply the point from the first M command. Move every point into the local coordinate system defined by that temporary origin, apply translation, rotation, scaling, and whatever else is needed, then transform everything back into the original coordinate system. That gives us full control over glyph placement.
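In code, the "move to a local origin, transform, move back" idea might look like the sketch below. It works on a plain list of absolute points and is not taken from the actual implementation. The function name, the parameter layout, and the choice to scale before rotating are all assumptions made for illustration.

```rust
/// Scale by `scale` and rotate by `theta` radians around `pivot`,
/// then shift the result by `shift`. All inputs are absolute canvas coordinates.
pub fn transform_points(
    points: &[(f64, f64)],
    pivot: (f64, f64),
    theta: f64,
    scale: (f64, f64),
    shift: (f64, f64),
) -> Vec<(f64, f64)> {
    let (sin, cos) = theta.sin_cos();
    points
        .iter()
        .map(|&(x, y)| {
            // Move into the local system whose origin is the pivot.
            let (lx, ly) = (x - pivot.0, y - pivot.1);
            // Scale in local coordinates.
            let (lx, ly) = (lx * scale.0, ly * scale.1);
            // Rotate; with y pointing down, a positive theta looks clockwise on screen.
            let (rx, ry) = (lx * cos - ly * sin, lx * sin + ly * cos);
            // Move back to canvas coordinates and apply the translation.
            (rx + pivot.0 + shift.0, ry + pivot.1 + shift.1)
        })
        .collect()
}
```

Drawing `theta`, `scale`, and `shift` from small random ranges for each glyph means two occurrences of the same character no longer share a single coordinate, which is exactly what the grouping attack relied on.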

Relative coordinates behave a little differently. Since each command is expressed relative to the previous endpoint, a pure translation only needs to move the very first M point; the rest of the path can stay unchanged. Rotation and scaling still require all points to change, but translation no longer provides an additional noise-injection point. In other words, relative-coordinate paths lose one easy source of randomness. That is why I decided the glyphs in this CAPTCHA should all be rendered with absolute coordinates instead. It lets the noise couple more tightly into the text itself.
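The difference is easy to see with a toy translator. The sketch below only understands whitespace-separated M, L, m, and l commands, and `translate_path` is a name made up for this example. It relies on the fact that a leading relative move is treated as absolute.

```rust
/// Translate a path by (dx, dy). Only M/L/m/l with whitespace-separated
/// tokens are handled; everything else is copied through unchanged.
pub fn translate_path(d: &str, dx: f64, dy: f64) -> String {
    let tokens: Vec<&str> = d.split_whitespace().collect();
    let mut out: Vec<String> = Vec::new();
    let mut i = 0;
    while i < tokens.len() {
        let cmd = tokens[i];
        match cmd {
            "M" | "L" | "m" | "l" if i + 2 < tokens.len() => {
                let x: f64 = tokens[i + 1].parse().unwrap_or(0.0);
                let y: f64 = tokens[i + 2].parse().unwrap_or(0.0);
                // Absolute commands always shift. A relative command shifts
                // only when it is the very first move of the path.
                let shift = cmd == "M" || cmd == "L" || i == 0;
                let (x, y) = if shift { (x + dx, y + dy) } else { (x, y) };
                out.push(format!("{cmd} {x} {y}"));
                i += 3;
            }
            _ => {
                out.push(cmd.to_string());
                i += 1;
            }
        }
    }
    out.join(" ")
}
```

Shifting the absolute triangle by (10, 20) rewrites every coordinate pair, giving `M 10 20 L 110 20 L 60 120 Z`. Shifting the relative one gives `M 10 20 l 100 0 l -50 100 z`, where only the first pair has moved.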

Taking the obfuscation further

The approach above already adds a decent amount of randomness to the glyph data, but if we also want more visual obstruction we probably need random noise lines as well. That immediately creates another problem: those noise-line paths are much shorter than the character paths. A script can filter by length, re-render the remaining long paths, and then feed the result into OCR, which makes cracking much easier.
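A length filter of that kind could be as small as the following sketch. Counting ASCII letters as a stand-in for "number of commands" is a shortcut that would miscount numbers written in exponent notation, and both function names are hypothetical.

```rust
/// Rough command count: every ASCII letter in the path data is one command.
pub fn command_count(d: &str) -> usize {
    d.chars().filter(|c| c.is_ascii_alphabetic()).count()
}

/// Keep only the paths with at least `min_commands` commands, on the
/// assumption that short paths are noise lines and long ones are glyphs.
pub fn drop_short_paths<'a>(paths: &[&'a str], min_commands: usize) -> Vec<&'a str> {
    paths
        .iter()
        .copied()
        .filter(|d| command_count(d) >= min_commands)
        .collect()
}
```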

So I went one step further.

During processing, every path is already parsed into a list of commands. That means a long path can be split into multiple very short paths, as long as each piece gets the correct leading M command added back in. Once that is done, the path-length range for glyphs and noise can be normalized into a similarly small band, which makes simple length-based filtering much less useful.
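Here is a minimal sketch of the splitting step, simplified to straight line segments between absolute points. It assumes the outline is drawn as a stroke, since a filled shape would render differently once cut into open pieces. `split_polyline` and `max_segments` are illustrative names, not the crate's API.

```rust
/// Split a polyline into sub-paths of at most `max_segments` segments each.
/// Every piece starts with its own M at the point where the previous one ended,
/// so the pieces render the same stroke as the original when drawn together.
pub fn split_polyline(points: &[(f64, f64)], max_segments: usize) -> Vec<String> {
    let step = max_segments.max(1); // guard against a zero-length step
    let mut pieces = Vec::new();
    let mut start = 0;
    while start + 1 < points.len() {
        let end = (start + step).min(points.len() - 1);
        let (mx, my) = points[start];
        let mut d = format!("M {mx} {my}");
        for &(x, y) in &points[start + 1..=end] {
            d.push_str(&format!(" L {x} {y}"));
        }
        pieces.push(d);
        start = end; // the next piece begins where this one ended
    }
    pieces
}
```

With the triangle's points (0, 0), (100, 0), (50, 100), (0, 0) and `max_segments` set to 2, this yields `M 0 0 L 100 0 L 50 100` and `M 50 100 L 0 0`. Picking the piece length at random for each path, and shuffling the resulting pieces among the noise lines, would be natural next steps.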

Implementation

The full implementation is here: BioSVG - GitHub

It is very cute. Please consider giving it a star.

Issues and PRs are very welcome if you want to push the idea further.

I also published the crate on crates.io, so if you want to try it, you can just run:

cargo add biosvg

Using it is pretty straightforward:

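// Assumes the builder is exported at the crate root.
use biosvg::BiosvgBuilder;

fn main() {
    // `answer` is the string to keep on the server; `svg` is the markup to send to the client.
    let (answer, svg) = BiosvgBuilder::new()
        .length(4) // number of characters in the CAPTCHA
        .difficulty(6)
        .colors(vec![
            "#0078D6".to_string(),
            "#aa3333".to_string(),
            "#f08012".to_string(),
            "#33aa00".to_string(),
            "#aa33aa".to_string(),
        ])
        .build()
        .unwrap();
    println!("answer: {}", answer);
    println!("svg: {}", svg);
}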

More colors are better. Please pass at least four.

Also note that the generated SVG CAPTCHA has a transparent background, so make sure the colors you choose remain easy to distinguish against your site’s actual background.