As soon as I start a project, I get it shipped to production with a Git-push based pipeline. I do this even before the project is anything. It means I can go through the motions of getting changes out to users when I don’t even have any yet. What’s more, there are problems that don’t show up running on localhost
. Solving those problems one-at-a-time is much less stressful than fixing them all at once on some pre-defined ship date.
Along with getting the project shipped early, I will instrument it to help me debug the code in production. That paid off with my latest project Devy which I plan on sharing with early users next month.
The API for Devy runs on fly.io which enables Grafana dashboards by default. Looking over these logs the other day, I noticed a 500
error. Oops.
A fact of life, running code open to the internet, is that when you configure an SSL cert, the public record of this act attracts bots. I have seen this with every project I’ve shipped that uses LetsEncrypt, but I’m certain it happens with any certificate authority.
Within milliseconds of deploying the API to Fly and getting the automatically generated SSL cert, I will see logs of bots hitting the API trying to extract credentials. In the screenshot below, a bot tried to hit an endpoint api.devy.page/blogs/wp-login.php
.
The wp-login.php
page is a common target of brute force attacks trying to guess passwords to log in to Wordpress sites. Devy is not Wordpress based, but the cost to try for a security hole is essentially zero for these botnets.
Reading through the logs, which are in reverse chronological order, the API tries to look up a blog with the slug and instead of returning a 404
which it should, it returns a 500
. Why?
This request passes through two Rust crates: router
and db
.
First the router
code.
/// GET /blogs/:blog_slug
///
/// Get a blog from the database given a blog slug.
async fn get_blog_by_blog_slug(
State(store): State<Store>,
Path(blog_slug): Path<String>,
) -> Result<Json<Blog>> {
Ok(Json(blog::get_by_slug(&store.db, blog_slug).await?))
}
This calls calls the function get_by_slug
in the blog
module of the db
crate.
pub async fn get_by_slug(db: &Database, slug: String) -> Result<Blog> {
Ok(
sqlx::query_file_as!(Blog, "src/blog/queries/get_by_slug.sql", slug)
.fetch_one(db)
.await?,
)
}
The ?
in the fifth line is somewhat like throwing an error in other languages. In Rust, it means return the Err
type on the Result
. This Result
is a custom type for the db
crate.
pub type Result<T> = std::result::Result<T, Error>;
/// Errors that can occur while performing an action on an entity.
#[serde_as]
#[derive(Debug, Serialize)]
pub enum Error {
/// The database configuration is invalid.
ConfigurationError(String),
/// The requested entity was not found.
EntityNotFound,
/// The request was malformed.
Malformed(String),
/// A field was missing from the request.
MissingField(String),
/// An error occurred while interacting with the database.
Sqlx(#[serde_as(as = "DisplayFromStr")] sqlx::Error),
}
This is one of my favorite features in Rust (and the source of this bug) which is defining a custom Error
enum and a Result
the encompasses all “failable” states in a crate.
Similarly, the router
crate has a defined Error
enum and a way to automatically translate db
errors into router
errors.
pub type Result<T> = std::result::Result<T, Error>;
#[serde_as]
#[derive(Debug, Serialize)]
pub enum Error {
StatusCode(#[serde_as(as = "DisplayFromStr")] StatusCode),
ServeFailure,
}
// --snip--
impl From<db::Error> for Error {
fn from(_: db::Error) -> Self {
Self::StatusCode(StatusCode::INTERNAL_SERVER_ERROR)
}
}
This allows the behavior seen in the get_blog_by_blog_slug
function in the router
where ?
is used on a function in the db
package: blog::get_by_slug(&store.db, blog_slug).await?
. As the error passes back from the db
crate to the router
crate, it will get automatically transformed from a db::Error
to a router::Error
. Neat!
Except the db
error in question is a NotFound
error… and when it gets returned I am telling the router
to transform it into a StatusCode::INTERNAL_SERVER_ERROR
. Really, I’m saying no matter what error the db
crate returns, transform it into a StatusCode::INTERNAL_SERVER_ERROR
. Not great.
The fix is to add a match case where the db
error EntitiyNotFound
is transformed into the 404
status code. I also added the “malformed” and “missing field” cases too.
// BEFORE
impl From<db::Error> for Error {
fn from(_: db::Error) -> Self {
Self::StatusCode(StatusCode::INTERNAL_SERVER_ERROR)
}
}
// AFTER
impl From<db::Error> for Error {
fn from(err: db::Error) -> Self {
match err {
db::Error::EntityNotFound => Self::StatusCode(StatusCode::NOT_FOUND),
db::Error::Malformed { .. } => Self::StatusCode(StatusCode::BAD_REQUEST),
db::Error::MissingField { .. } => Self::StatusCode(StatusCode::BAD_REQUEST),
_ => Self::StatusCode(StatusCode::INTERNAL_SERVER_ERROR),
}
}
}
I committed the change and pushed it. Now the bug is gone. I also added “looking up a blog that doesn’t exist returns 404” to my integration test suite. Perfect!
We all write bugs. It’s what we do when we’re not fixing bugs. I overlooked this pretty basic case because I was restructuring my code. Shipping and observing your application before you have any real users has helped me find all sorts of bugs which is why I’ll always insist on it. Hopefully this inspires you to try the same.