diff --git a/_layouts/post.html b/_layouts/post.html
new file mode 100644
index 0000000..b270978
--- /dev/null
+++ b/_layouts/post.html
@@ -0,0 +1,27 @@
+
+
+
+{%- include head.html -%}
+
+
+
+
+ {%- include header.html -%}
+
+
{{ page.title }}
+
+
+ {{ content }}
+
+
+
+ Written on {{ page.date | date: "%B %e, %Y" }}
+
+
+
+
+ {%- include footer.html -%}
+
+
+
+
\ No newline at end of file
diff --git a/_layouts/tools.html b/_layouts/tools.html
index 905052f..36c7aba 100644
--- a/_layouts/tools.html
+++ b/_layouts/tools.html
@@ -69,7 +69,7 @@
Minimalism and security first repository of reproducible and multi-signed OCI images of common open source software toolchains full-source bootstrapped from Stage 0 all the way up.
diff --git a/_posts/2024-03-28-determinism.md b/_posts/2024-03-28-determinism.md
new file mode 100644
index 0000000..00ec3ba
--- /dev/null
+++ b/_posts/2024-03-28-determinism.md
@@ -0,0 +1,298 @@
+---
+layout: post
+title: Adventures In Supply Chain Integrity
+date: 2024-03-28
+cover_image: "/assets/images/whale_shark.jpg"
+authors:
+ - name: Ryan Heywood
+ bio: Professional bonker / twerker.
+ twitter: le twitter
+ - name: Anton Livaja
+ bio: Professional banana juggler.
+ twitter: antonlivaja
+ - name: Lance R. Vick
+ bio: Dolphin trainer
+ twitter: no.
+---
+
+When a compiler is used to compile some piece of software, how do we verify
+that the compiler can be trusted? Is it well known who compiled the compiler
+itself? Usually compilers are not built from source, and even when they are,
+they are seeded from a binary that itself is opaque and difficult to verify.
+How does one check if the supply chain integrity of the compiler itself is
+intact, even before we get to building software with it?
+
+Compiler supply chains are obscured and at many points seeded from binaries,
+making it nearly impossible to verify their integrity. In 1984, Ken Thompson
+wrote "Reflections on Trusting Trust" and illustrated that a compiler can
+modify software during the compilation process, compromising the software. Put
+simply, this means that reviewing the source code is not enough. We need to be
+sure that the compiler itself isn't compromised, as it could be used to modify
+the intended behavior of the software.
+
+What about the software that's built using the compiler? Has the source code
+been modified during compilation? Has the resulting binary of the software been
+tampered with, perhaps in the CI/CD runner which runs an OS with a
+vulnerability in one of its sub-dependencies? Or perhaps the server host has
+been compromised and attackers have gained control of the infrastructure?
+These are difficult software supply chain security issues which are often swept
+under the rug or completely overlooked due to lack of understanding. To
+eliminate this attack surface, we need good answers to these
+questions, and more importantly we need tooling and practical methods which can
+help close these gaps in the supply chain.
+
+This line of questioning becomes especially concerning in the context of widely
+used software, such as images pulled from DockerHub, package managers, and
+Linux distributions. Software procured via these channels is used so widely,
+and is so pervasive in other software, that it poses a severe attack vector.
+If the maintainer of a widely used DockerHub image has their machine
+compromised, or is coerced or even forced under duress to insert malicious
+code into the binaries they are responsible for, there is no effective measure
+in place to detect and catch this, resulting in millions of downstream
+consumers being impacted. Imagine what would happen if the maintainer of a
+default DockerHub image of a widely used language was compromised, and the
+binary they released had a backdoor in it. The implications are extremely far
+reaching, and would be disastrous.
+
+There are two distinct problems at hand which share a solution:
+
+1. How do we ensure that we can trust the toolchain used to build software?
+2. How do we ensure that we can trust software built with the toolchain?
+
+The answer to both questions is the same. We achieve it via verifiability and
+determinism. To be clear, we are not trying to solve the problem of the code
+itself being compromised in the source. If the source code is compromised,
+determinism does not help prevent that. If the code is reviewed and verified as
+being secure, then determinism and multiple reproductions of the software
+add a set of excellent guarantees.
+
+Deterministically built software is any software which always compiles to the
+same bit-for-bit exact binary. This is useful because it makes it trivial to
+check the integrity of the binary. If the binary is always the same, we can use
+hashing to ensure that nothing about the binary has changed. In practice, minor
+differences introduced during the build process, such as timestamps, mean that
+most software builds are non-deterministic. By pinning all aspects of
+the environment the software is built in and removing any changing factors such
+as time and user or machine IDs, we can force the output to always be
+bit-for-bit identical.
+
+Now, imagine a scenario where a developer is compiling software, and they are
+not doing it deterministically. Any time they build the software, they have no
+way to easily verify if the binary changed in a meaningful way compared to the
+previous one without doing low level inspection. With determinism, it's as
+simple as hashing one binary, repeating the compilation, hashing the second
+result, and comparing it with the original. This is great, but it's still not
+enough to ensure that the binary can be trusted, as there may be malware which
+always modifies the binary in the same manner. To mitigate this, we can build
+the software on multiple different machines, ideally by different maintainers,
+using different operating systems and even different hardware, as it's much
+less likely that multiple diverse stacks and individuals are compromised by the
+same malware or attacker. Following this process, we can all but eliminate the
+risk of modification during compilation going undetected. To be confident that
+the published hashes themselves are authentic, we can use cryptographic
+signing, as is customary for many software releases.
+
+To assess the current state of affairs regarding software package managers and
+Linux distributions, and how far they have gone to mitigate these risks, we
+performed an analysis of popular projects:
+
+Alpine is the most popular Linux distribution (distro) in the container
+ecosystem and has made great strides in providing a minimal `musl` based
+distribution with reasonable security defaults that is suitable for a lot of
+use cases. However, in the interest of developer productivity and low friction
+for contributors, none of it is cryptographically signed.
+
+Debian (and derivatives like Ubuntu) is one of the most popular options for
+servers, is largely reproducible, and signs all packages. Being `glibc` based
+with a focus on compatibility and desktop use cases, it results in a huge
+number of dependencies for almost any software run on it, enacts partial code
+freezes for long periods of time between releases, and often has very stale
+packages as various compatibility goals block updates. This overhead introduces
+a lot of surface area for malicious code to hide in. Unfortunately, due
+to its design, when building software deterministically on this OS, each and
+every repo needs to keep costly snapshots of all dependencies to reproduce
+build containers, as Debian packages are archived and retired after some time
+to servers with low bandwidth. This creates a lot of friction for teams who, as
+a result, have to archive often hundreds of .deb files for every project, and
+also has the added issue of Debian having very old versions of software such as
+Rust, which is a common requirement. This can be quite problematic for teams
+who want to access the latest language features. Even with all this work,
+Debian does not have truly reproducible Rust (which will be discussed later in
+this post), and packages are signed only by single maintainers, whom we have to
+fully trust not to have released a compromised binary.
+
+Fedora (and RedHat based distros) also sign all packages, but otherwise suffer
+from similar one-size-fits-all bloat problems as Debian with a different coat
+of paint. Additionally, their reliance on centralized builds has been used as
+justification for them to not pursue reproducibility at all, which makes them a
+non-starter for security focused use cases.
+
+Arch has very fast updates as a rolling release distro, and package definitions
+are signed and often reproducible, but they change from one minute to the next,
+still resulting in the challenge of having to come up with a solution to pin
+and archive sets of dependencies that work well together for software that's
+built using it and requires determinism.
+
+Nix is almost entirely reproducible by design and allows for lean and minimal
+output artifacts. It is also a big leap forward in having good separation of
+concerns between privileged immutable and unprivileged mutable spaces. However,
+like Alpine, it has no maintainer-level signing, in order to reduce friction
+for hobbyists who want to contribute.
+
+Guix is reproducible by design as well, borrowing a lot from Nix. It also does
+maintainer-level signing like Debian. It comes the closest to the solution we
+need, but it only provides single signed package contributions, and a `glibc`
+base with a large dependency tree, with a significant footprint of tooling to
+review and understand to form confidence in it. This is still more overhead
+than we want or need for use cases like container builds of software,
+lean embedded operating systems, or any sensitive system where we want the
+utmost level of supply chain security assurance.
+
+For those whose goal is to build their own software packages deterministically
+with high portability, maintainability, and maximally easy supply chain
+auditability, none of these solutions hit the mark.
+
+Reflecting on these issues, we concluded we want the `musl`-based
+container-ideal minimalism of Alpine, the obsessive determinism and full-source
+supply chain goals of Guix, and a step beyond the single-signature packages of
+Debian, Fedora, and Arch. We also concluded that we want a fully verifiable
+bootstrapped toolchain, consisting of a compiler and accompanying libraries
+required for building most modern software.
+
+You may know where this is going. Here is where we made the totally reasonable
+and not-at-all-crazy choice to effectively create…
+
+## Yet *Another* Linux Distribution
+Let’s compare some of the features we care about most, to make it clearer why
+nothing else hit the mark for us.
+
+A comparison of `stagex` to other distros in some of the areas we care about:
+
+| Distro | Containerized | Signatures | Libc | Bootstrapped | Reproducible | Rust Deps |
+|--------|---------------|------------|-------|--------------|--------------|-----------|
+| Stagex | Native | 2+ Human | Musl | Yes | Yes | 4 |
+| Guix | No | 1 Human | Glibc | Yes | Yes | 4 |
+| Nix | No | 1 Bot | Glibc | Partial | Mostly | 4 |
+| Debian | Adapted | 1 Human | Glibc | No | Partial | 232 |
+| Arch | Adapted | 1 Human | Glibc | No | Partial | 262 |
+| Fedora | Adapted | 1 Bot | Glibc | No | No | 166 |
+| Alpine | Adapted | None | Musl | No | No | 32 |
+
+We are leaving out hundreds of distros here, but at the risk of starting a holy
+war, we felt it was useful to compare a few popular options for contrast to the
+goals of the minimal container-first, security-first, deterministic distro we
+put together.
+
+We are not the first to go down this particular road. The Talos Linux
+project built their own tiny containerized toolchain from gcc to golang as the
+base to build their own minimal immutable k8s distro.
+
+Getting all the way to bootstrapping Rust, however, is a much bigger chunk of
+pain, as we learned…
+
+## The Oxidation Problem - Bootstrapping Rust
+Getting from gcc all the way to golang was mostly pain-free, thanks to Google
+documenting this path well and providing all the tooling to do it. One only
+needs 3 versions of golang to get all the way back to GCC.
+
+Bootstrapping Rust is a bit of an ordeal. People love Rust for its memory
+safety and strictness, however we have noticed supply chain integrity is not
+an area where it excels. This is mostly because Rust changes so much from one
+release to the next that a given version of Rust can only ever be built with
+its immediate predecessor.
+
+If one follows the chicken-and-egg problem far enough, the realization dawns
+that in most distros the chicken comes first. Most include a non-reproducible
+“seed” Rust binary presumably compiled by some member of the Rust team, then
+use that to build the next version, and then carry on from there. This means
+even some of the distros that _say_ their Rust builds are reproducible have a
+pretty big asterisk. We won’t call anyone out - you know who you are.
+
+Granted, even if you were to build all the way up from the OCaml roots of Rust
+(if you can find that code and then get it to build), you would still require a
+trusted OCaml compiler. Software supply chains are hard, and we always end up
+back at the famous Trusting Trust Problem.
+
+There have been some amazing efforts by the Guix team to bootstrap GCC and the
+entire package chain after it with a tiny human-auditable blob of x86 assembly
+via the GNU Mes project. That is probably in the cards for our stack as well.
+For the short term, however, we wanted to at least go as low in the stack as
+GCC, as we do with Go, which is already a sizable effort. Thankfully,
+John Hodge (mutabah), a brilliant (crazy?) member of the open source community,
+created “mrustc”, which implements a minimal semi-modern Rust 1.54 compiler in
+C++ largely from transpiled Rust code. It is missing a lot of critical features
+that make it unsuitable for direct use, but it _does_ support enough features
+to compile official Rust 1.55 sources, which can compile Rust 1.56 and so on.
+This is the path Guix and Nix both went down, and we are taking their lead
+here.
+
+Mrustc at the time lacked support for musl libc, which threw a wrench in
+things, but after a fair bit of experimentation we were able to patch in musl
+support and get it upstream.
+
+The result is we now have the first deterministic `musl` based Rust compiler
+bootstrapped from 256 bytes of assembly, and you can reproduce our builds right
+now from any OS that can run Docker 26.
+
+## Determinism and Real World Applications
+To demonstrate how determinism can be used to prevent real world attacks in
+practical terms, let's consider a major breach which could have been prevented.
+
+SolarWinds experienced a major security breach in which Russian threat actors
+were able to compromise their infrastructure and piggyback on their software to
+distribute malware to their entire client base. The attackers achieved this by
+injecting malicious code into SolarWinds products, such as the Orion Platform,
+which was then downloaded by the end users. This seems like a very difficult
+thing to protect against, but there is a surprisingly simple solution. If
+SolarWinds had leveraged deterministic builds of their software, they would
+have been able to detect that the binaries of the software they were
+delivering to their clients had been tampered with.
+
+There are a few ways they could have gone about this, but without getting too
+deep into implementation details, it would have sufficed to have multiple
+runners in different isolated environments, or even on different cloud
+platforms, which would reproduce the deterministic build and compare the
+resulting hashes in order to verify the binaries have not been tampered with.
+If any of the systems built the software and got a different hash, that would
+be a clear signal that further investigation was needed, which would likely
+have led to the detection of the intruder. Without this approach, SolarWinds
+was completely unaware of their systems being infiltrated for months, and
+during this period large quantities of end user data were exfiltrated, along
+with their tooling. Considering SolarWinds is a cybersecurity software and
+services provider, the tools stolen from them were then likely used to further
+develop and weaponize the attackers' capabilities.
+
+## Future Work
+These initial efforts were predominately sponsored with financial and
+engineering time contributions from Distrust, Mysten Labs, and Turnkey, who all
+share threat models and container-driven workflows Stagex is designed to
+support.
+
+While we all have a vested interest in helping maintain it, we all felt it
+important that this project stand on its own and belong to the community. We
+are immensely appreciative of the volunteers who have very quickly dived in
+and started making significant contributions and improvements.
+
+As of writing this, Stagex has 100+ packages covering some of the core software
+you may be using regularly, all built using the deterministically built
+toolchain, and of course the software itself also built deterministically. Some
+of the packages include `rust`, `go`, `nodejs`, `python3.8`, `curl`, `bash`,
+`git`, `tofu` and many more.
+
+We would like to support building with `buildah` and `podman` for build-tooling
+diversity. We would also love help from the open source community to see GCC
+bootstrapped all the way down to x86 assembly via Mes. This may require using
+multiple seed distro containers to work in parallel to ensure we don’t have a
+single provenance source for that layer.
+
+We are also actively working on, and have made some progress towards, adding
+the core packages required to use this distribution as a minimal Linux OS.
+
+If you need high trust in your own build system, please reach out and we would
+love to find a way to collaborate.
+
+## References
+* [Bootstrapping Rust](https://guix.gnu.org/en/blog/2018/bootstrapping-rust/)
+* [Full source bootstrapping](https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-building-from-source-all-the-way-down/)
+* [Running the "Reflections on Trusting Trust" Compiler](https://research.swtch.com/nih)
+* [Reflections on Trusting Trust](https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf)
diff --git a/_sass/base.scss b/_sass/base.scss
index f88afc9..ba495ef 100644
--- a/_sass/base.scss
+++ b/_sass/base.scss
@@ -62,7 +62,7 @@ h4,
h5,
h6 {
margin: 0px;
- margin-top: 12px;
+ margin-top: 0px;
margin-bottom: 12px;
font-weight: bold;
color: var(--text-color);
@@ -103,23 +103,26 @@ a:hover {
}
p {
+ /*
word-wrap: break-word;
word-break: break-word;
white-space: pre-wrap;
- margin-bottom: 15px;
+ */
+ margin-top: 16px;
+ margin-bottom: 16px;
}
footer {
color: var(--text-color);
border-top: var(--border);
- margin-top: 0;
- padding-top: 10px;
+ margin-top: 24px;
+ padding-top: 12px;
text-align: right;
}
header {
- margin-top: 50px;
- margin-bottom: 50px;
+ margin-top: 24px;
+ margin-bottom: 24px;
}
header p {
@@ -147,10 +150,6 @@ hr {
text-decoration: none;
}
-.header-page-links {
- margin-right: 10%;
-}
-
.header-page-links li:before {
content: ''
}
@@ -167,7 +166,7 @@ hr {
}
.right-menu {
- width: 70%;
+ width: 74%;
display: flex;
justify-content: flex-end;
align-items: center;
@@ -450,7 +449,7 @@ textarea {
.flex-container {
display: flex;
justify-content: space-between;
- align-items: center;
+ align-items: flex-start;
}
.flex-container-inner {
@@ -458,8 +457,13 @@ textarea {
}
section {
- padding-top: 100px;
- padding-bottom: 100px;
+ margin-top: 24px;
+ margin-bottom: 24px;
+}
+
+.extra-spacing {
+ margin-top: 70px;
+ margin-bottom: 70px;
}
.companies {
@@ -891,6 +895,19 @@ pre {
}
/** end carousel */
+/**
+ * Blog
+ */
+.post img {
+ max-width: 100%;
+}
+
+#lp-post-img {
+ max-width: 100%;
+}
+
+/** end blog */
+
*,
*::before,
*::after {
diff --git a/assets/base/rss.png b/assets/base/rss.png
new file mode 100644
index 0000000..2df0b6e
Binary files /dev/null and b/assets/base/rss.png differ
diff --git a/assets/images/whale_shark.jpg b/assets/images/whale_shark.jpg
new file mode 100644
index 0000000..71ea16e
Binary files /dev/null and b/assets/images/whale_shark.jpg differ
diff --git a/assets/js/main.js b/assets/js/main.js
index c99fe57..b543b14 100644
--- a/assets/js/main.js
+++ b/assets/js/main.js
@@ -10,13 +10,15 @@ collapsibleButton.addEventListener("click", function () {
});
document.addEventListener('DOMContentLoaded', function () {
- fetch('../assets/js/carousel-items.json')
- .then(response => response.json())
- .then(data => {
- createCarouselItems(data);
- initializeCarousel();
- })
- .catch(error => console.error('Error loading JSON:', error));
+ if (window.location.pathname === "/index.html") {
+ fetch('/assets/js/carousel-items.json')
+ .then(response => response.json())
+ .then(data => {
+ createCarouselItems(data);
+ initializeCarousel();
+ })
+ .catch(error => console.error('Error loading JSON:', error));
+ }
});
function createCarouselItems(items) {
diff --git a/blog.md b/blog.md
new file mode 100644
index 0000000..9b35887
--- /dev/null
+++ b/blog.md
@@ -0,0 +1,18 @@
+---
+layout: page
+title: Blog
+permalink: /blog.html
+---
+