Introduction
This repo contains:
- Server config and install scripts
  - `server/nixos` is the NixOS config
- Templates for CI runner images
  - `profiles/servo-windows10/*` is for Windows 10 runners
  - `profiles/servo-ubuntu2204/*` is for Ubuntu 22.04 runners
  - `profiles/servo-macos13/*` is for macOS 13 runners
  - `profiles/servo-macos14/*` is for macOS 14 runners
  - `profiles/servo-macos15/*` is for macOS 15 runners
- A service that automates runner management
  - `monitor` is the service
  - `.env.example` and `monitor.toml.example` contain the settings
Setting up a server on Hetzner
Overview of the server scripts:
- `server/build-nixos-installer-kexec.sh`
  - From any existing NixOS system, build a NixOS installer kexec image.
- `server/start-nixos-installer.sh`
  - From the Hetzner rescue system, build and run the NixOS installer.
- `server/first-time-install.sh <hostname> <disk> [disk ...]`
  - From the NixOS installer image, wipe the given disks and install NixOS.
- `server/install-or-reinstall.sh <hostname> <path/to/mnt>`
  - From the NixOS installer image, install or reinstall NixOS to the given root filesystem mount, without wiping any disks. Won’t run correctly on the deployed server.
Start the rescue system, then connect over SSH (use `ssh -oUserKnownHostsFile=/dev/null`) and run the following:
$ git clone https://github.com/servo/ci-runners.git
$ cd ci-runners/server
$ apt update
$ apt install -y zsh
$ ./start-nixos-installer.sh
When you see `+ kexec -e`, kill your SSH session by pressing Enter, `~`, `.`, then reconnect over SSH (use `ssh -4 -oUserKnownHostsFile=/dev/null` this time) and run the following:
$ git clone https://github.com/servo/ci-runners.git
$ cd ci-runners/server
$ ./first-time-install.sh ci0 /dev/nvme{0,1}n1
$ reboot
Now you can set up the monitor service. Note that rebooting may not be enough to terminate the Hetzner rescue system. If the rescue system is still active, try Reset > Execute an automatic hardware reset in the Hetzner console.
Setting up the monitor service
To get a GITHUB_TOKEN for the monitor service in production:
- Create a fine-grained personal access token
- Token name: `servo ci monitor`
- Resource owner: servo
- Expiration: 90 days
- Repository access: Public Repositories (read-only)
- Organization permissions > Self-hosted runners > Access: Read and write
To get a GITHUB_TOKEN for testing the monitor service:
- Create a fine-grained personal access token
- Token name: `servo ci monitor test`
- Resource owner: your GitHub account
- Expiration: 7 days
- Repository access > Only select repositories
  - Your clone of servo/ci-runners
  - Your clone of servo/servo
- Repository permissions > Administration > Access: Read and write (unfortunately there is no separate permission for repository self-hosted runners)
To set up the monitor service, connect over SSH (mosh recommended) and run the following:
$ zfs create tank/base
$ git clone https://github.com/servo/ci-runners.git ~/ci-runners
$ cd ~/ci-runners
$ mkdir /var/lib/libvirt/images
$ virsh net-define cinet.xml
$ virsh net-autostart cinet
$ virsh net-start cinet
$ rustup default stable
$ mkdir ~/.cargo
$ git clone https://github.com/servo/servo.git ~/servo
$ mkdir /config /config/monitor
$ cp ~/ci-runners/.env.example /config/monitor/.env
$ cp ~/ci-runners/monitor/monitor.toml.example /config/monitor/monitor.toml
$ vim -p /config/monitor/.env /config/monitor/monitor.toml
$ systemctl restart monitor
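As a rough sketch, `/config/monitor/.env` is where the monitor reads its secrets, including the GITHUB_TOKEN described above; the authoritative list of settings is in `.env.example` and `monitor.toml.example`, so copy those rather than writing the file from scratch:

# /config/monitor/.env (sketch only; see .env.example for the real settings)
GITHUB_TOKEN=<fine-grained personal access token created above>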
Monitor API
Notes about endpoints
Some of the endpoints below require the monitor API token. Requests to these endpoints must prove that you know the token by sending it in the Authorization header:
Authorization: Bearer <monitor API token>
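For example, with curl (a sketch: the monitor is assumed to be reachable at https://ci0.servo.org, and `<endpoint>` stands for any of the token-protected endpoints below):

$ curl -H 'Authorization: Bearer <monitor API token>' https://ci0.servo.org/<endpoint>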
Some of the endpoints below may require sequential processing in the backend. While requests to other endpoints are guaranteed to be cheap, requests to these endpoints can be a bit expensive when successful, because they’re processed one at a time in the monitor thread.
The monitor thread interacts with external resources like the GitHub API and the hypervisor, and it runs a loop that looks like the pseudocode below:
loop {
    if registrations_last_updated.elapsed()
        > MONITOR_DOT_TOML.api_cache_timeout
    {
        registrations = github_api::list_registered_runners();
        registrations_last_updated = Instant::now();
    }
    guests = hypervisor_api::list_guests();
    hypervisor_api::take_screenshots(guests);
    hypervisor_api::check_ipv4_addresses(guests);
    if let Ok((request, response_tx)) = monitor_request_rx
        .recv_timeout(MONITOR_DOT_TOML.monitor_poll_interval)
    {
        response_tx.send(match request {
            // ...
        });
    }
}
Reserving runners
The recommended way to reserve runners is to use the tokenless API (POST /select-runner), which uses a temporary artifact to prove that the request is genuine and authorised.
This allows self-hosted runners to be used in pull_request runs (rather than only pull_request_target), and in workflows that do not have access to secrets.
Alternatively you can use the monitor API token, which for workflows means you will need to define it as a secret like ${{ secrets.MONITOR_API_TOKEN }}.
POST /select-runner
— Reserve one runner for a job using an artifact
- May require sequential processing in the backend
- ?unique_id (required; UUIDv4)
- uniquely identifies this job in its friendly name, even if the same workflow is called twice in the workflow call tree
- ?qualified_repo (required; <user>/<repo>)
- the repository running this job
- ?run_id (required; number)
- the workflow run id of this job
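For example, a job could reserve a runner with a request like the following (a sketch: ci0.servo.org is assumed to be the monitor host, and the placeholder values would come from the workflow run):

$ curl -X POST 'https://ci0.servo.org/select-runner?unique_id=<uuidv4>&qualified_repo=servo/servo&run_id=<run_id>'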
POST /profile/<profile_key>/take
— Reserve one runner for a job using the monitor API token
- Requires monitor API token
- May require sequential processing in the backend
- Response: application/json — {"id", "runner"}|null
- profile_key (string)
- what kind of runner to take
- ?unique_id (required; UUIDv4)
- uniquely identifies this job in its friendly name, even if the same workflow is called twice in the workflow call tree
- ?qualified_repo (required; <user>/<repo>)
- the repository running this job
- ?run_id (required; number)
- the workflow run id of this job
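For example (a sketch, assuming `servo-ubuntu2204` is a configured profile key and the monitor is reachable at https://ci0.servo.org); a successful response is the documented {"id", "runner"} object, or null if no runner could be reserved:

$ curl -X POST \
    -H 'Authorization: Bearer <monitor API token>' \
    'https://ci0.servo.org/profile/servo-ubuntu2204/take?unique_id=<uuidv4>&qualified_repo=servo/servo&run_id=<run_id>'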
POST /profile/<profile_key>/take/<count>
— Reserve runners for a set of jobs using the monitor API token
- Requires monitor API token
- May require sequential processing in the backend
- Response: application/json — [{"id", "runner"}]|null
- profile_key (string)
- what kind of runners to take
- count (number)
- how many runners to take
- ?unique_id (required; UUIDv4)
- uniquely identifies these jobs in their friendly names, even if the same workflow is called twice in the workflow call tree
- ?qualified_repo (required; <user>/<repo>)
- the repository running these jobs
- ?run_id (required; number)
- the workflow run id of these jobs
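The count variant works the same way, with the number of runners appended to the path (same assumptions and placeholders as the sketch above):

$ curl -X POST \
    -H 'Authorization: Bearer <monitor API token>' \
    'https://ci0.servo.org/profile/servo-ubuntu2204/take/3?unique_id=<uuidv4>&qualified_repo=servo/servo&run_id=<run_id>'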
Runner internals
GET /github-jitconfig
— Get the ephemeral runner token for this runner
- May require sequential processing in the backend
- Response: application/json
GET /boot
— Get the boot script for this runner
- May require sequential processing in the backend
- Response: text/plain
Dashboard internals
GET /dashboard.html
— Get the rendered contents of the dashboard for live updates
- Response: text/html
GET /dashboard.json
— Get a machine-readable version of the contents of the dashboard
- Response: application/json
GET /profile/<profile_key>/screenshot.png
— Get the last cached screenshot of a rebuild guest
- Response: image/png
GET /runner/<runner_id>/screenshot.png
— Get the last cached screenshot of a runner guest
- Response: image/png
GET /runner/<runner_id>/screenshot/now
— Take a screenshot of a runner guest immediately
- May require sequential processing in the backend
- Response: image/png
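For example, to poll the machine-readable dashboard or fetch a cached runner screenshot (a sketch: ci0.servo.org and the runner id are placeholders):

$ curl https://ci0.servo.org/dashboard.json
$ curl -o screenshot.png 'https://ci0.servo.org/runner/<runner_id>/screenshot.png'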
Policy overrides (EXPERIMENTAL)
Policy overrides provide rudimentary support for autoscaling, implemented as part of Servo’s effort to self-host WPT runs (#21). The design has several unsolved problems, and should not be used.
They allow us to dynamically reconfigure a server’s runner targets to meet the needs of a workflow. This can be useful if that workflow is huge and parallel, and you want to divert as much of your concurrent runner capacity as possible to it.
GET /policy/override
— Get the current policy override
POST /policy/override
— Initiate a new policy override
- Requires monitor API token
- Response: application/json — {"<profile_key>": <count>}
- ?<profile_key>=count (required; string/number pairs)
- how many runners to target for each profile key
DELETE /policy/override
— Cancel the current policy override
- Requires monitor API token
- Response: application/json — {"<profile_key>": <count>}
Hacking on the monitor locally
Easy but slow way:
$ nix develop -c sudo [RUST_BACKTRACE=1] monitor
Harder but faster way:
$ export RUSTFLAGS=-Clink-arg=-fuse-ld=mold
$ cargo build
$ sudo [RUST_BACKTRACE=1] IMAGE_DEPS_DIR=$(nix eval --raw .\#image-deps) LIB_MONITOR_DIR=. target/debug/monitor
Minimal base images
These images are useful for hacking on the monitor locally, since they only take around a minute to build. They are also used as base configs for some of Servo’s images.
This is a libvirt/KVM-based image, compatible with Linux amd64 servers only:
- `base-ubuntu2204`
Maintenance guide
Current SSH host keys:
- ci0.servo.org = `SHA256:aoy+JW6hlkTwQDqdPZFY6/gDf1faOQGH5Zwft75Odrc` (ED25519)
- ci1.servo.org = `SHA256:ri52Ae31OABqL/xCss42cJd0n1qqhxDD9HvbOm59y8o` (ED25519)
- ci2.servo.org = `SHA256:qyetP4wIOHrzngj1SIpyEnAHJNttW+Rd1CzvJaf0x6M` (ED25519)
- ci3.servo.org = `SHA256:4grnt9EVzUhnRm7GR5wR1vwEMXkMHx+XCYkns6WfA9s` (ED25519)
- ci4.servo.org = `SHA256:Yc1TdE2UDyG2wUUE0uGHoWwbbvUkb1i850Yye9BC0EI` (ED25519)
To deploy an updated config to any of the servers:
$ cd server/nixos
$ ./deploy -s ci0.servo.org ci0
$ ./deploy -s ci1.servo.org ci1
$ ./deploy -s ci2.servo.org ci2
$ ./deploy -s ci3.servo.org ci3
$ ./deploy -s ci4.servo.org ci4
To deploy, read monitor config, write monitor config, restart the monitor, or run a command on one or more servers:
$ cd server/nixos
$ ./do <deploy|read|write> [host ...]
$ ./do deploy ci0 ci1 ci2
$ ./do read ci0 ci1
$ ./do write ci1 ci2
$ ./do restart-monitor ci0 ci1 ci2
$ ./do run [host ...] -- <command ...>
$ ./do run ci0 ci2 -- virsh edit servo-ubuntu2204
To monitor system logs or process activity on any of the servers:
$ ./do logs <host>
$ ./do htop <host>
Images
Windows 10 images
Runners created from these images preinstall all dependencies (including those specified in the main repo, like GStreamer and Chocolatey deps), preload the main repo, and prebuild Servo in the release profile.
This is a libvirt/KVM-based image, compatible with Linux amd64 servers only:
- `servo-windows10`
Ubuntu 22.04 images
Runners created from these images preinstall all dependencies (including those specified in the main repo, like mach bootstrap deps), preload the main repo, and prebuild Servo in the release profile.
These are libvirt/KVM-based images, compatible with Linux amd64 servers only:
- `servo-ubuntu2204` (ci0, ci1, and ci2 only)
- `servo-ubuntu2204-bench` (ci3 and ci4 only)
macOS 13/14/15 x64 images
Runners created from these images preinstall all dependencies (including those specified in the main repo, like mach bootstrap deps), preload the main repo, and prebuild Servo in the release profile.
These are libvirt/KVM-based images, compatible with Linux amd64 servers only:
- `servo-macos13`
- `servo-macos14`
- `servo-macos15`
Automating the macOS installer is difficult without paid tooling, but we can get close enough with some once-per-server setup. To prepare a server for macOS 13/14/15 guests, build a clean image, replacing “13” with the macOS version as needed:
- Clone the OSX-KVM repo: `git clone --recursive https://github.com/kholia/OSX-KVM.git /var/lib/libvirt/images/OSX-KVM`
- Download the BaseSystem.dmg: `( cd /var/lib/libvirt/images/OSX-KVM; ./fetch-macOS-v2.py )`
- Rename it to reflect the macOS version: `mv /var/lib/libvirt/images/OSX-KVM/BaseSystem{,.macos13}.dmg`
- Convert that .dmg to .img: `dmg2img -i /var/lib/libvirt/images/OSX-KVM/BaseSystem.macos13.{dmg,img}`
- Reduce the OpenCore `Timeout` setting:
  - `cd /var/lib/libvirt/images/OSX-KVM/OpenCore`
  - `vim config.plist`
    - Type `/<key>Timeout<`, press Enter, type `j0f>wcw5`, press Escape, type `:x`, press Enter
  - `rm OpenCore.qcow2`
  - `./opencore-image-ng.sh --cfg config.plist --img OpenCore.qcow2`
  - `cp /var/lib/libvirt/images/OSX-KVM/OpenCore/OpenCore{,.macos13}.qcow2`
- Create zvol and libvirt guest with random UUID and MAC address:
  - `zfs create -V 90G tank/base/servo-macos13.clean`
  - `virsh define profiles/servo-macos13/guest.xml`
  - `virt-clone --preserve-data --check path_in_use=off -o servo-macos13.init -n servo-macos13.clean --nvram /var/lib/libvirt/images/OSX-KVM/OVMF_VARS.servo-macos13.clean.fd --skip-copy sda -f /dev/zvol/tank/base/servo-macos13.clean --skip-copy sdc`
  - `cp /var/lib/libvirt/images/OSX-KVM/{OVMF_VARS-1920x1080.fd,OVMF_VARS.servo-macos13.clean.fd}`
  - `virsh undefine --keep-nvram servo-macos13.init`
    - TODO: improve per-vm nvram management
  - `virsh start servo-macos13.clean`
- Install macOS
- At the boot menu, choose macOS Base System
- Choose Disk Utility
- Choose the QEMU HARDDISK Media listed as Uninitialized
- Click Erase, click Erase, then click Done
- Press Cmd+Q to quit Disk Utility
- macOS 13: Choose Reinstall macOS Ventura
- macOS 14: Choose Reinstall macOS Sonoma
- macOS 15: Choose Reinstall macOS Sequoia
- When asked to select a disk, choose Untitled
- Shut down the guest when you see Select Your Country or Region: `virsh shutdown servo-macos13.clean`
- Take a snapshot: `zfs snapshot tank/base/servo-macos13.clean@fresh-install`
- Boot base vm guest: `virsh start servo-macos13.clean`
- If latency is high:
- Press Command+Option+F5, then click Full Keyboard Access, then press Enter
- You can now press Shift+Tab to get to the buttons at the bottom of the wizard
- Select Your Country or Region = United States
- If latency is high, go to Accessibility > Vision, then:
- > Reduce Transparency = Reduce Transparency
- > Reduce Motion = Reduce Motion
- TODO: macOS 15: do we need to uncheck the box for allowing password reset via Apple ID?
- macOS 13/14: Migration Assistant = Not Now
- macOS 15: Transfer Your Data to This Mac = Set up as new
- macOS 13/14: Sign In with Your Apple ID = Set Up Later
- macOS 15: Sign In to Your Apple Account = Set Up Later
- Full name = `servo`
- Account name = `servo`
- Password = `servo2024!`
- Enable Location Services = Continue, Don’t Use
- Select Your Time Zone > Closest City: = UTC - United Kingdom
- Uncheck Share Mac Analytics with Apple
- Screen Time = Set Up Later
- macOS 15: Update Mac Automatically = Only Download Automatically
- TODO: can we prevent the download too?
- Quit the Keyboard Setup Assistant
- If latency is high:
  - Press Cmd+Space, type `full keyboard access`, turn it on, then press Cmd+Q
    - On macOS 15, this may make some steps harder to do with keyboard navigation for some reason
- Once installed, shut down the guest: `virsh shutdown servo-macos13.clean`
- When the guest shuts down, take another snapshot: `zfs snapshot tank/base/servo-macos13.clean@oobe`
- Start the base guest: `virsh start servo-macos13.clean`
- Log in with the password above
- Press Cmd+Space, type `full disk access`, press Enter
  - On macOS 14/15, you may have to explicitly select Allow applications to access all user files
- Click the plus, type the password above, type `/System/Applications/Utilities/Terminal.app`, press Enter twice, press Cmd+Q
- Press Cmd+Space, type `terminal`, press Enter
- Type `curl https://ci0.servo.org/static/macos13.sh | sudo sh`, press Enter, type the password above, press Enter
- When the guest shuts down, take another snapshot: `zfs snapshot tank/base/servo-macos13.clean@automated`
- Copy the clean image to a file: `dd status=progress iflag=fullblock bs=1M if=/dev/zvol/tank/base/servo-macos13.clean of=/var/lib/libvirt/images/servo-macos13.clean.img`
Remote deployment tip. If you’ve deployed the clean image, but now the base image rebuilds are getting stuck at the macOS installer menu, your NVRAM may not be set to boot from the correct disk. You can work around this by nulling out the BaseSystem.dmg disk in the clean guest config:
- Edit the clean guest: `virsh edit servo-macos13.clean`
- Find the `<disk>` block containing `sdc` and `BaseSystem`
- Change `<disk type="file" ...>` to `<disk type="block" ...>`
- Change `<source file="..."/>` to `<source dev="/dev/null"/>`
- Save and quit (nano): Ctrl+X, Y, Enter
- Restart the monitor: `systemctl restart monitor`
Style guide
See the Servo book’s style guide.