Reproducible Builds with Dagger

Feb 28, 2023

Vikram Vaswani

The term "reproducible build" refers to a build process that produces identical results on every run. With a reproducible build process, the build outputs of each run behave identically and they can be verified (via checksum) to be bit-by-bit identical.

The Importance of Being Reproducible

Reproducible builds are important for a number of reasons:

They provide a way to reliably connect build artifacts to source code. The same source code always produces the same build output, regardless of the build environment.
They provide a way to verify build artifacts. Different people can independently build a piece of source code and verify that their checksums are identical and/or match those provided by the original developer.
They improve software supply chain security. When builds are always consistent, externally-injected changes (such as malware or spyware) will be immediately visible in a bit-wise comparison of build outputs.
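
The verification described above can be sketched in a few lines of Go. This is a minimal illustration that is independent of Dagger: the `checksum` and `matchesPublished` helpers are hypothetical names, and the byte slices are stand-ins for build outputs that would normally be read from disk.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// checksum returns the hex-encoded SHA-256 digest of a build artifact.
func checksum(artifact []byte) string {
	sum := sha256.Sum256(artifact)
	return hex.EncodeToString(sum[:])
}

// matchesPublished reports whether an artifact's digest equals a published one.
func matchesPublished(artifact []byte, published string) bool {
	return checksum(artifact) == published
}

func main() {
	// Hypothetical stand-ins for the outputs of independent builds.
	original := []byte("example build output")
	rebuild := []byte("example build output")
	tampered := []byte("example build output + injected code")

	// The original developer publishes this digest alongside the artifact.
	published := checksum(original)

	fmt.Println("independent rebuild matches:", matchesPublished(rebuild, published))
	fmt.Println("tampered artifact matches:", matchesPublished(tampered, published))
}
```

With a reproducible build, the independent rebuild matches the published digest, while any injected change produces a different digest.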

Different Inputs = Different Outputs

As you might expect, creating reproducible builds is harder than it sounds, because of the large number of variables involved in the usual build process. These variables include:

the host operating system
the host timezone and locale
the host environment configuration
the compiler and its configuration
the versions and configurations of build dependencies, such as system or third-party libraries
file timestamps
and many more...

Implementing a reproducible build strategy thus requires strict control over every variable involved in the build, which is not an easy task!

Dagger and Reproducible Builds

While Dagger doesn't claim to solve reproducible builds, we are working towards making them easier to implement.

Dagger executes your pipelines entirely as standard OCI containers. Containerization helps to reduce build variance by isolating the build environment from the host filesystem and environment. Containerization also makes it easier to use deterministic build paths and provides mechanisms to audit the build environment.
For builds that depend on external resources, Dagger supports mounting pinned versions of those external resources. For example, you can use Dagger’s support for Git repositories and references to easily retrieve a snapshot of a source code repository at an exact commit.
Dagger also addresses part of what the Reproducible Builds website calls “the biggest source of reproducibility issues”: file timestamps. With Dagger, you can force all entries within a directory to have the same creation/modification timestamps. This eliminates one of the most common causes of variation in the build process.

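Here's an example of this in action. It builds the Dagger CLI from a pinned Git tag, then prints the binary's modification time and size, first with the default timestamp and then with the timestamp reset to the Unix epoch: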

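package main

import (
	"context"
	"dagger.io/dagger"
	"fmt"
	"log"
	"os"
)

func main() {
	ctx := context.Background()
	c, err := dagger.Connect(ctx, dagger.WithLogOutput(os.Stdout))
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()
	// Fetch the source at a fixed Git tag and build it in a pinned toolchain image.
	gitTag := "v0.3.10"
	repo := c.Git("https://github.com/dagger/dagger.git").Tag(gitTag).Tree()
	daggerBinary := c.Container().From("golang:1.19-alpine3.17").
		WithMountedDirectory("/src", repo).
		WithEnvVariable("CGO_ENABLED", "0").
		WithWorkdir("/src").
		WithExec([]string{"go", "build", "-o", "dagger", "./cmd/dagger"}).
		File("dagger")

	// Print the binary's modification time and size with its default timestamp.
	defaultTimestamp, err := c.Container().From("alpine:3.17").
		WithMountedFile("/dagger", daggerBinary).
		WithExec([]string{"stat", "-c", "%y %s", "/dagger"}).
		Stdout(ctx)
	if err != nil {
		log.Fatal(err)
	}
	// Print the same details after resetting the timestamp to the Unix epoch (0).
	fixedTimestamp, err := c.Container().From("alpine:3.17").
		WithMountedFile("/dagger", daggerBinary.WithTimestamps(0)).
		WithExec([]string{"stat", "-c", "%y %s", "/dagger"}).
		Stdout(ctx)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("defaultTimestamp:", defaultTimestamp)
	fmt.Println("fixedTimestamp:", fixedTimestamp)
}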

The Future

Containerization, resource pinning and timestamp fixation are only a few aspects of a reproducible build strategy. Other areas where Dagger could potentially be helpful are:

Support for the special SOURCE_DATE_EPOCH variable (recently added to BuildKit)
Support for disabling internet access during container execution, thereby forcing the build to be more "hermetic"
Support for verifying if files downloaded via HTTP match a given checksum
Support for fine-grained source policies or “source pinning” (also recently added to BuildKit)
Support for tools that help you identify what about your build isn't deterministic

Do you need support for one of the issues mentioned above? Leave us a comment in the issue to let us know. We look forward to continuing to improve the reproducible builds experience with Dagger.