Reproducible Builds with Dagger

February 28, 2023

Vikram Vaswani

A "reproducible build" is a build process that produces identical results on every run. With a reproducible build process, the outputs of each run behave identically and can be verified, via checksum, to be bit-for-bit identical.

The Importance of Being Reproducible

Reproducible builds are important for a number of reasons:

  • They provide a way to reliably connect build artifacts to source code. The same source code always produces the same build output, regardless of the build environment.

  • They provide a way to verify build artifacts. Different people can independently build a piece of source code and verify that their checksums are identical and/or match those provided by the original developer.

  • They improve software supply chain security. When builds are always consistent, externally injected changes (such as malware or spyware) show up immediately in a bit-wise comparison of build outputs.
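To make the verification idea concrete, here is a minimal sketch in Go of comparing a rebuilt artifact against a published checksum. The byte slices are hypothetical stand-ins for real build outputs, and the `checksum` and `verify` helpers are names invented for this illustration, not part of Dagger.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// checksum returns the hex-encoded SHA-256 digest of an artifact's bytes.
func checksum(artifact []byte) string {
	sum := sha256.Sum256(artifact)
	return hex.EncodeToString(sum[:])
}

// verify reports whether a locally built artifact matches a published checksum.
func verify(artifact []byte, published string) bool {
	return checksum(artifact) == published
}

func main() {
	// Hypothetical artifacts: stand-ins for the bytes of real build outputs.
	original := []byte("binary built by the original developer")
	rebuilt := []byte("binary built by the original developer")
	tampered := []byte("binary built by the original developer + malware")

	// The original developer publishes this checksum alongside the release.
	published := checksum(original)

	fmt.Println("rebuilt matches: ", verify(rebuilt, published))
	fmt.Println("tampered matches:", verify(tampered, published))
}
```

The rebuilt artifact verifies and the tampered one does not. In practice, the comparison is only meaningful if the build is reproducible in the first place; otherwise an honest rebuild would also fail the check.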

Different Inputs = Different Outputs

As you might expect, creating reproducible builds is harder than it sounds, because of the large number of variables involved in the usual build process. These variables include:

  • the host operating system

  • the host timezone and locale

  • the host environment configuration

  • the compiler and its configuration

  • the versions and configurations of build dependencies, such as system or third-party libraries

  • file timestamps

  • and many more...

Implementing a reproducible build strategy thus requires strict control over every variable involved in the build, which is not an easy task!

Dagger and Reproducible Builds

While Dagger doesn't claim to solve reproducible builds, we are working towards making them easier to implement.

  • Dagger executes your pipelines entirely as standard OCI containers. Containerization helps to reduce build variance by isolating the build environment from the host filesystem and environment. Containerization also makes it easier to use deterministic build paths and provides mechanisms to audit the build environment.

  • For builds that depend on external resources, Dagger supports mounting pinned versions of those external resources. For example, you can use Dagger’s support for Git repositories and references to easily retrieve a snapshot of a source code repository at an exact commit.

  • Dagger also addresses part of what the Reproducible Builds website calls “the biggest source of reproducibility issues”: file timestamps. With Dagger, you can force all entries within a directory to have the same creation/modification timestamps. This eliminates one of the most common causes of variation in the build process.

Here's an example of this in action:

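// Assumes ctx is a context.Context and that the context, fmt, os and
// dagger.io/dagger packages are imported.
c, err := dagger.Connect(ctx, dagger.WithLogOutput(os.Stdout))
if err != nil {
	panic(err)
}
defer c.Close()

// Fetch the source at a specific Git tag and build the binary in a container.
gitTag := "v0.3.10"
repo := c.Git("https://github.com/dagger/dagger.git").Tag(gitTag).Tree()
daggerBinary := c.Container().From("golang:1.19-alpine3.17").
	WithMountedDirectory("/src", repo).
	WithEnvVariable("CGO_ENABLED", "0").
	WithWorkdir("/src").
	WithExec([]string{"go", "build", "-o", "dagger", "./cmd/dagger"}).
	File("dagger")

// Print the modification time and size of the binary as built.
defaultTimestamp, err := c.Container().From("alpine:3.17").
	WithMountedFile("/dagger", daggerBinary).
	WithExec([]string{"stat", "-c", "%y %s", "/dagger"}).
	Stdout(ctx)
if err != nil {
	panic(err)
}

// Print them again after resetting the timestamps to the Unix epoch (0).
fixedTimestamp, err := c.Container().From("alpine:3.17").
	WithMountedFile("/dagger", daggerBinary.WithTimestamps(0)).
	WithExec([]string{"stat", "-c", "%y %s", "/dagger"}).
	Stdout(ctx)
if err != nil {
	panic(err)
}

fmt.Println("defaultTimestamp:", defaultTimestamp)
fmt.Println("fixedTimestamp:", fixedTimestamp)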

The Future

Containerization, resource pinning and timestamp fixation are only a few aspects of a reproducible build strategy. Other areas where Dagger could potentially be helpful are:

Do you need support for one of the issues mentioned above? Leave us a comment in the issue to let us know. We look forward to continuing to improve the reproducible builds experience with Dagger.

Get Involved With the Community

Discover what our community is doing, and join the conversation on Discord & GitHub to help shape the evolution of Dagger.
