Vilhelm Bergsøe

Migrating my site to Rust

Fri 23 Jun 2023 - 10 min read

In August of last year, I published a blog post titled "Creating my website", but the implementation of my site has changed significantly since then and that post is now outdated. So I thought I would share the process of migrating my website to Rust, along with the implementation details, in this blog post.

Background and Motivation

First off, I should establish the motivation for the move from Go to Rust. To be clear, performance and safety were not the primary concerns with my original implementation in Go. In fact, I like Go, and the old codebase was just fine, but there was a separate reason for my switch.

The reason for migrating the site to Rust is... Nix! In my previous blog post, "Nix is pretty awesome ❄️", I expressed my excitement about Nix as a development and deployment tool, so naturally I wanted to deploy my site with Nix as well. There is, however, a problem with "nixifying" Go applications, as the hashing method used by Go for dependency management is fundamentally incompatible with Nix.

There is a way over this hurdle: using a code-generation tool like gomod2nix. This, however, is a pain, and I'd rather not need to generate new Nix expressions every time I update dependencies. Rust doesn't have this problem and works exceptionally well with Nix.

For this reason, combined with my new interest in the Rust language, I concluded that I wanted to rebuild my site in Rust and maybe get more comfortable with the language along the way.

Implementation

At the time of rewriting my site, axum had recently had a major release, and I had heard good things about its seamless integration with the tokio runtime, so I went with axum as my web framework.

In line with my previous site, I required most of the same endpoints for my revamped version:

  • The root or homepage, accessible via /
  • Individual blog posts, reachable through /blog/{url}
  • A directory for serving static files, located at /assets

However, I took the opportunity to introduce a new feature: an RSS Feed endpoint accessible via /rss.xml.

In axum, the routing for my webpage looks like the following:

let app = Router::new()
    .route("/", get(root))
    .route("/blog/:url", get(handle_blog))
    .route("/rss.xml", get(handle_rss))
    .with_state(state)
    .nest_service(
        "/assets",
        ServeDir::new(path_prefix.join(Path::new("assets"))),
    )
    .fallback(get(handle_404));

So far everything looks pretty familiar.
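
For completeness, this router then gets bound and served on port 8080 (as the startup logs later show). With axum 0.6 that looks roughly like the following; a sketch rather than the repo's exact main():

// Sketch: bind and serve the router (axum 0.6 style); the repo's
// actual setup may differ
let addr = std::net::SocketAddr::from(([0, 0, 0, 0], 8080));
axum::Server::bind(&addr)
    .serve(app.into_make_service())
    .await
    .unwrap();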

The .with_state(state) refers to the common state shared by all the endpoints. This state contains a Vec of BlogPosts:

// axum's State extractor requires the state type to be Clone
#[derive(Clone)]
pub struct AppState {
    blogposts: Vec<BlogPost>,
}

The BlogPost structure looks like this:

// Clone is needed so that AppState can derive Clone
#[derive(Clone)]
pub struct BlogPost {
    url: String,
    title: String,
    date: DateTime<Utc>,
    archived: bool,
    tags: Vec<String>,
    content: String,
    estimated_read_time: usize,
}

In this struct, we have the following fields:

  • url: Represents the URL used to access the blog post via the /blog/ endpoint.
  • title: Holds the title of the blog post.
  • date: Represents the date and time when the blog post was created or published.
  • archived: A boolean value indicating whether the post should be displayed or not.
  • tags: Contains a list of tags associated with the blog post, such as "rust" or "nix".
  • content: Stores the parsed and processed HTML content of the blog post.
  • estimated_read_time: Provides a rough estimation of the read time for the post.

This structure isn't as comprehensive a representation of a blog post as I'd like, but it does the job for now.
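
For context, each post on disk is a Markdown file with a YAML frontmatter block at the top, which we'll get to parsing shortly. A hypothetical source file (not one of my actual posts) might look like this:

---
title: Some blog post
date: 23-06-2023
archived: false
tags: [rust, nix]
---

Hello! This is the post content, written in Markdown.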

I decided to load and parse the blog posts at startup to minimize runtime overhead. This is all done in the new_state() function:

async fn new_state(path_prefix: &Path) -> Result<AppState> {
    // Create an empty vector to store the blog posts
    let mut blogposts: Vec<BlogPost> = Vec::new();

    // Read the contents of the "blog" directory
    let mut blog_dir = match tokio::fs::read_dir(path_prefix.join(Path::new("blog"))).await {
        Ok(dir) => dir,
        Err(err) => return Err(eyre!(format!("Error reading blog directory: {err}"))),
    };

    // Set the options and plugins for parsing Markdown content
    let adapter = SyntectAdapter::new("base16-eighties.dark");
    let options = ComrakOptions::default();
    let mut plugins = ComrakPlugins::default();

    // Iterate over each entry (file or directory) in the "blog" directory
    while let Some(entry) = blog_dir.next_entry().await? {
        let path = entry.path();

        // Check if the entry is a file
        if path.is_file() {
            // Skip anything that isn't a Markdown file
            let ext = path.extension();
            if ext != Some(std::ffi::OsStr::new("md"))
                && ext != Some(std::ffi::OsStr::new("markdown"))
            {
                // Skip files with invalid extensions
                tracing::warn!("skipping non markdown file: {}", path.display());
                continue;
            }

            if let Some(stem) = path.file_stem() {
                let url = stem.to_str().unwrap();

                // Check if a blog post with the same URL already exists
                if blogposts.par_iter().any(|b| b.url == url) {
                    // Skip duplicate blog posts
                    tracing::warn!("skipping duplicate blogpost: {}", url);
                    continue;
                }

                // Set the syntax highlighter adapter for code fences
                plugins.render.codefence_syntax_highlighter = Some(&adapter);

                // Parse the blog post content and metadata
                let start_time = Instant::now();
                let blogpost = parse_blog(url, &path, &options, &plugins).await?;
                let elapsed = start_time.elapsed().as_millis();

                // Add the parsed blog post to the vector
                blogposts.push(blogpost);
                tracing::info!("loaded blogpost - {} in {} ms", url, elapsed);
            }
        }
    }

    // Sort the blog posts by date in descending order
    blogposts.sort_by(|a, b| b.date.cmp(&a.date));

    // Create and return the application state with the loaded blog posts
    Ok(AppState { blogposts })
}

This is quite boring though; all the interesting stuff happens in the call to parse_blog(), which looks like this:

async fn parse_blog(
    url: &str,
    path: &PathBuf,
    options: &ComrakOptions,
    plugins: &ComrakPlugins<'_>,
) -> Result<BlogPost, Report> {
    // Read the file contents as bytes
    let bytes = tokio::fs::read(path).await?;
    
    // Convert the bytes to a UTF-8 encoded string
    let text = String::from_utf8_lossy(&bytes);

    // Parse the frontmatter and content sections of the blog post
    let (frontmatter, content) = match parse_frontmatter(&text) {
        Ok((frontmatter, content)) => (frontmatter, content),
        Err(_) => {
            return Err(eyre!(format!(
                "Error parsing frontmatter ({url}). Most likely missing delimiter \"---\\n\""
            )))
        }
    };

    // Deserialize the frontmatter YAML into a Frontmatter struct
    let frontmatter: Frontmatter = match serde_yaml::from_str(frontmatter) {
        Ok(fm) => fm,
        Err(err) => return Err(eyre!(format!("Error parsing blog ({url}): {err}"))),
    };

    // Parse the date from the frontmatter and convert it to UTC DateTime
    let naive_date = NaiveDate::parse_from_str(&frontmatter.date, "%d-%m-%Y").unwrap();
    let naive_datetime = naive_date.and_hms_opt(0, 0, 0).unwrap();
    let date: DateTime<Utc> = Utc.from_utc_datetime(&naive_datetime);

    // Convert the Markdown content to HTML using the provided options and plugins
    let html = markdown_to_html_with_plugins(content, options, plugins);

    // Create a new BlogPost struct with the parsed data and return it
    Ok(BlogPost {
        url: url.to_string(),
        title: frontmatter.title,
        date,
        archived: frontmatter.archived,
        tags: frontmatter.tags,
        content: html,
        estimated_read_time: content.split_whitespace().count() / 200,
    })
}

So let's dissect what's happening here.

First, we read the contents of the file into text. Next we make a call to parse_frontmatter(), which is my dodgy frontmatter parser written using the parser combinator library nom. The parsing logic itself is straightforward: it searches for a pair of "---" delimiters and extracts the text between them as the frontmatter. The remaining part of the file is considered the main content.

fn parse_frontmatter(input: &str) -> IResult<&str, &str> {
    let delimiter = "---";

    // Grab the text between the first pair of "---" delimiters
    let (input, frontmatter) =
        delimited(tag(delimiter), take_until(delimiter), tag(delimiter))(input)?;
    let content = input.trim_start();

    // Note: unlike nom's usual (remaining input, output) convention,
    // this returns (frontmatter, content)
    Ok((frontmatter, content))
}
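
As a quick sanity check, here's a hypothetical example (not from the original post) of what the parser returns; note the (frontmatter, content) order:

// A minimal post: YAML frontmatter between "---" delimiters, then the body
let input = "---\ntitle: Hello\ndate: 23-06-2023\narchived: false\ntags: [rust]\n---\n\n# Body";
let (frontmatter, content) = parse_frontmatter(input).unwrap();
assert!(frontmatter.contains("title: Hello"));
assert_eq!(content, "# Body");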

The frontmatter, which is just YAML, is then parsed by serde_yaml into a nice Frontmatter struct:

// Deserialize is required for serde_yaml to parse the YAML into this struct
#[derive(Deserialize)]
struct Frontmatter {
    title: String,
    date: String,
    archived: bool,
    tags: Vec<String>,
}

After parsing the date from the string in the Frontmatter, we convert the Markdown to HTML with the comrak Markdown parser, using a nice one-liner:

let html = markdown_to_html_with_plugins(content, options, plugins);

Finally, we construct a BlogPost with all of the parsed data and return it:

Ok(BlogPost {
    url: url.to_string(),
    title: frontmatter.title,
    date,
    archived: frontmatter.archived,
    tags: frontmatter.tags,
    content: html,
    estimated_read_time: content.split_whitespace().count() / 200,
})

Handlers

Now that we have an understanding of the project's structural skeleton, let's explore how requests are handled. The good news is that axum makes this remarkably simple and convenient.

In conjunction with axum, I've incorporated a template engine called maud. Maud provides an html! macro that compiles HTML-like markup into ordinary Rust code at compile time, which makes rendering fast. Together, axum and maud make rendering HTML responses for the web application seamless.

pub async fn handle_blog(
    Path(url): Path<String>,
    State(state): State<AppState>,
) -> impl IntoResponse {
    let blogpost = state
        .blogposts
        .par_iter()
        .find_first(|blogpost| blogpost.url == url);

    match blogpost {
        Some(blogpost) => (
            StatusCode::OK,
            html! {
                (header(&format!("Vilhelm Bergsøe - {}", blogpost.title), "Vilhelm Bergsøe - Blog"))
                main {
                    section #h {
                        div .blogpost {
                            h2 .blogtitle { (blogpost.title) }
                            span style="opacity: 0.7;" {
                                (blogpost.date.format("%a %d %b %Y"))
                                // 200 words per minute estimate
                                (format!(" - {} min read", blogpost.estimated_read_time))
                            }
                            br;
                            p {
                                (PreEscaped(&blogpost.content))
                            }
                        }

                        em { (format!("tags: [{}]", blogpost.tags.join(", "))) }
                    }
                }
                
                hr;

                (footer())
            },
        ),
        None => handle_404().await,
    }
}

This is the entirety of the blog handler for the /blog/{url} endpoint.

Essentially, we find the first blog post that matches the requested URL and return the blog page with the post's contents rendered into it; otherwise, we pass control to the 404 Not Found handler.
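
For completeness, the 404 handler just needs to return the same (StatusCode, Markup) pair. A minimal sketch, assuming the same header() and footer() helpers shown above (the repo's actual version may differ):

pub async fn handle_404() -> (StatusCode, Markup) {
    (
        StatusCode::NOT_FOUND,
        html! {
            (header("Vilhelm Bergsøe - 404", "Vilhelm Bergsøe - 404"))
            main {
                h2 { "404 - Page Not Found" }
            }
            (footer())
        },
    )
}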

I'm aware that searching through a Vec isn't very efficient, and I should look into using a HashSet or HashMap for the lookups. The problem is that a map on its own loses the date ordering I need for the homepage, and I have yet to run benchmarks to find out which approach really has the biggest effect on performance. One way to get both is sketched below: keep the sorted Vec and maintain a map from URL to index alongside it.
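
Here's a rough sketch of that hybrid idea (hypothetical, and unbenchmarked): keep the date-sorted Vec for rendering the homepage and add a HashMap from URL to index for constant-time post lookups.

use std::collections::HashMap;

#[derive(Clone)]
pub struct IndexedState {
    blogposts: Vec<BlogPost>,       // kept sorted by date for the homepage
    by_url: HashMap<String, usize>, // url -> index into blogposts
}

impl IndexedState {
    fn new(blogposts: Vec<BlogPost>) -> Self {
        // Build the URL index once at startup, after sorting
        let by_url = blogposts
            .iter()
            .enumerate()
            .map(|(i, post)| (post.url.clone(), i))
            .collect();
        Self { blogposts, by_url }
    }

    fn get(&self, url: &str) -> Option<&BlogPost> {
        self.by_url.get(url).map(|&i| &self.blogposts[i])
    }
}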

The root endpoint looks kind of the same, with the only dynamic part being the list of blog posts:

// ...
h2 { "Blog " a href="/rss.xml" title="RSS Feed" { img .rss-icon src="/assets/rss.png" alt="rss"; } }
ul {
    @for blogpost in &state.blogposts {
        @if !blogpost.archived {
            li {
                (blogpost.date.format("%d-%m-%Y "))
                a href=(format!("/blog/{}", blogpost.url)) { (blogpost.title) }
            }
        }
    }
}
// ...

For my RSS feed endpoint, I chose to go with the ructe template engine, as maud doesn't have explicit support for XML.

The template for the RSS feed looks like this:

@use crate::BlogPost;

@(posts: Vec<BlogPost>)
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Vilhelm's Blog</title>
        <link>https://bergsoe.net/</link>
        <description>My Blog RSS Feed</description>
        @for post in posts {
            <item>
                <guid>https://bergsoe.net/blog/@post.url</guid>
                <title>@post.title</title>
                <link>https://bergsoe.net/blog/@post.url</link>
                <description>tags: @post.tags.join(", ")</description>
                <pubDate>@post.date.to_rfc2822()</pubDate>
            </item>
        }
    </channel>
</rss>
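
A note on ructe: templates are compiled to Rust code at build time, so the crate needs a build script. Per ructe's documentation, a minimal build.rs looks roughly like this:

use ructe::{Result, Ructe};

fn main() -> Result<()> {
    // Compile everything under templates/ into Rust code at build time
    Ructe::from_env()?.compile_templates("templates")
}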

Nix deployment

As mentioned, the main driver behind my move to Rust was ease of deployment with Nix. So let's look into how that is done:

In the project root we define a Nix flake, flake.nix. Here I utilize the crane library for building the project. Crane provides various niceties such as automatic source fetching and incremental builds.

One problem you run into is getting relative paths to work correctly when the service is run from the Nix store. There are probably many ways of solving this, but I opted for an environment variable holding the path to the project directory:

let site_root = std::env::var("SITE_ROOT").unwrap_or_else(|_| "./".to_string());
let path_prefix = Path::new(&site_root);

Here we load the environment variable if it exists; if it doesn't, we just default to the current directory.

In the Nix flake we then make sure to define this environment variable through a wrapper:

# ...
default = pkgs.symlinkJoin {
    inherit (site) name pname version;
    nativeBuildInputs = [pkgs.makeWrapper];
    paths = [site];
    postBuild = ''
        wrapProgram $out/bin/site --set-default SITE_ROOT ${./.}
    '';
};
# ...

We can then build and run the project just fine with Nix:

$ nix run
2023-05-17T09:20:16.398678Z  INFO site: site root: /nix/store/z04g8kmpmkvbf0kxf81aigjbx61b5i4q-40kgjvsccc7ny75r4wfd4gi98kp7l004-source
2023-05-17T09:20:16.403586Z  INFO site: loaded blogpost - ascii-webcam in 0 ms
2023-05-17T09:20:16.422079Z  INFO site: loaded blogpost - nix-is-pretty-awesome in 18 ms
2023-05-17T09:20:16.422582Z  INFO site: loaded blogpost - creating-my-website in 0 ms
2023-05-17T09:20:16.423409Z  INFO site: listening on 0.0.0.0:8080

On my server I then use the following Nix module for serving the website:

{inputs, ...}: {
  systemd.services.site = {
    enable = true;

    description = "my site";
    wantedBy = ["multi-user.target"];
    after = ["network.target"];

    serviceConfig = {
      Type = "simple";
      ExecStart = "${inputs.site.packages.x86_64-linux.default}/bin/site";
      Restart = "on-failure";
    };
  };
}

Here we define a systemd service called site, which uses the site flake input (github:vilhelmbergsoe/site).

Deployment is then as simple as importing the module in my host configuration, roughly as sketched below, and it runs!
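
The host flake pulls in the site flake and passes inputs through to the module; a sketch with assumed file names, not my exact configuration:

# flake.nix of the server configuration (sketch; file names assumed)
{
  inputs.site.url = "github:vilhelmbergsoe/site";

  outputs = {nixpkgs, ...} @ inputs: {
    nixosConfigurations.clifton = nixpkgs.lib.nixosSystem {
      system = "x86_64-linux";
      # Pass flake inputs through so the module can reference inputs.site
      specialArgs = {inherit inputs;};
      modules = [./modules/site.nix];
    };
  };
}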

If I have to update the site in the future, all I have to do is push my changes and run:

$ nix flake lock --update-input site
$ # and
$ sudo nixos-rebuild switch --flake .#clifton

And that's it!

Conclusion

Overall, migrating my site to Rust + Nix has been an awesome learning experience. I learned a lot about both Rust and Nix in the process, and I hope this post was an interesting read.

If you're interested in looking at the full code, you can find the repository here.

Also, if you're interested in seeing the NixOS configuration in its entirety, you can find it here.

Thanks for reading!

tags: [website, rust, nix, programming]