Git Hg Sync

While Lando now supports Git as a target SCM for landing, most automation still relies on pushes landing in Mercurial. The Git Hg Sync component is in charge of syncing commits and tags from GitHub to HgMO.

Note

GitHub’s Activity page looks similar to the PushLog. However, it doesn’t expose enough information for downstream use, nor does it allow for bespoke metadata and extension. Moreover, its latency and delivery guarantees do not meet our requirements.

Git Hg Sync relies on logic from git-cinnabar to create and record a two-way map between Git and Hg commits. Due to the differences in branch and tag management between Git and HgMO, changes to one Git repository may be reflected as changes to several repositories in HgMO. Most often, this depends on which Git branch the commits were added to.

Warning

Signed Git commits are not supported, and will break sync badly. This is due to the use of git-cinnabar, and the expectation of a stable two-way mapping between Git and Hg commits. git-cinnabar supports Git signatures in its local metadata. However, it cannot correctly map the commits back into another Git repository when reading from the synced Hg changesets. This may also lead to further issues if the signed commits already exist there, and multiple Git commits map to the same Hg changeset.

Architecture

git-hg-sync is an event-driven component. It connects to Pulse, and subscribes to notifications from Lando. At a high level, those notifications contain information about the source repository and branches that were modified, as well as tags to create.

architecture-beta
  service lando(internet)[Lando]

  service github(database)[GitHub]
  service pulse(server)[Pulse]

  service sync(server)[Git Hg Sync]

  service hgmo(database)[HgMO]

  lando:B --> T:github
  lando:R --> L:pulse

  sync:L --> R:pulse
  sync:B --> R:github
  sync:R --> L:hgmo
    

After successfully pushing changes to Git, Lando publishes messages to pulse.mozilla.org (AMQP). Those notifications contain information about the commits, branches and tags. Git-Hg-Sync processes those messages to determine what to fetch from Git and push to hg.mozilla.org (HgMO). It processes the notifications in strict order, retrying a failed notification until it succeeds (or is otherwise removed from the tip of the queue).
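The strict ordering described above can be illustrated with a minimal sketch. This is not the actual Git-Hg-Sync code: the function name, the dictionary-shaped messages, and the `max_attempts` limit are assumptions made so that the example terminates, whereas the real worker keeps retrying.

```python
from collections import deque
from typing import Callable, Deque


def process_in_order(
    queue: Deque[dict],
    handle: Callable[[dict], None],
    max_attempts: int = 3,  # hypothetical bound; the real worker retries indefinitely
) -> int:
    """Process notifications strictly in order; return how many completed.

    The message at the tip of the queue is only removed once ``handle``
    succeeds. If it keeps failing, processing stops and every later
    notification stays blocked behind it.
    """
    processed = 0
    while queue:
        message = queue[0]  # peek at the tip without removing it
        for _attempt in range(max_attempts):
            try:
                handle(message)
                break  # success: stop retrying this message
            except Exception:
                continue  # failure: retry the same message
        else:
            # Every attempt failed: leave the message at the tip.
            return processed
        queue.popleft()
        processed += 1
    return processed
```

A message that fails terminally therefore blocks everything queued behind it, which is why an operator may need to remove it manually.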

Set-up and Configuration

Git-Hg-Sync runs in GCPv2, as configured in the webservices-infra repository. It is a Docker-based deployment, following the Dockerflow guidelines. The images are built in GitHub Actions.

The mapping of which data in Git should go where in Mercurial is described in a configuration file. For ease of maintenance, configuration files for all environments are checked in with the source code, and built into the Docker images. The configuration file to use in a specific deployment is selected based on the ENVIRONMENT environment variable. For example, the production environment will use the config-production.toml configuration.
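As a rough illustration of this selection rule, the sketch below derives the file name from the ENVIRONMENT variable. The helper name `select_config`, the `config_dir` parameter, and the error raised when the variable is unset are assumptions for the example, not the component's actual behaviour.

```python
from pathlib import Path
from typing import Mapping


def select_config(environ: Mapping[str, str], config_dir: Path = Path(".")) -> Path:
    """Return the configuration file matching the ENVIRONMENT variable."""
    name = environ.get("ENVIRONMENT")
    if not name:
        # Assumption: an unset ENVIRONMENT is treated as an error here.
        raise ValueError("ENVIRONMENT must be set to select a configuration file")
    return config_dir / f"config-{name}.toml"
```

In a deployment this would typically be called with `os.environ`; passing `{"ENVIRONMENT": "production"}` yields a path ending in `config-production.toml`.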

There are three main sections in the configuration file: tracked_repositories, branch_mappings, and tag_mappings.

Source Repositories

This is a list of repositories to monitor. Each repository's name is used as the name of its working directory, inside the directory specified by clones.directory.

Note

Any Pulse notification not related to a tracked repository will be ignored.

The list is also used when bootstrapping the working directory, by pre-fetching the data in Git.

Branch Mapping

The branch_mappings section expresses which branches in Git should be synced to which individual repository in HgMO. There can be multiple matches for a single branch. It is possible to use regular expressions when matching branch names (as suggested by the name of the branch_pattern key). If a match is found and the regular expression contains capturing groups, they can be reused to build the destination URL and branch.

[[branch_mappings]]
source_url = "https://github.com/mozilla-firefox/firefox.git"
# esr<M> branches to mozilla-esr<M>
branch_pattern = "^(esr\\d+)$"
destination_url = "ssh://hg.mozilla.org/releases/mozilla-\\1/"
destination_branch = "default"

Note

Backslashes need to be escaped (doubled) so that they retain their special meaning in those regular expressions.
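To make the effect of the capturing group concrete, here is a minimal sketch using Python's `re` module. It assumes the pattern is applied with `re.match` and that the destination is built with `Match.expand`. The helper name `resolve_destination` is hypothetical, and the real component may apply the pattern differently.

```python
import re
from typing import Optional, Tuple

# Values as they look once the TOML file is parsed: "\\1" in TOML is \1 here.
MAPPING = {
    "branch_pattern": r"^(esr\d+)$",
    "destination_url": r"ssh://hg.mozilla.org/releases/mozilla-\1/",
    "destination_branch": "default",
}


def resolve_destination(mapping: dict, branch: str) -> Optional[Tuple[str, str]]:
    """Return (destination_url, destination_branch) if the branch matches."""
    match = re.match(mapping["branch_pattern"], branch)
    if match is None:
        return None
    # Substitute \1, \2, ... with the text captured by the pattern.
    return (
        match.expand(mapping["destination_url"]),
        match.expand(mapping["destination_branch"]),
    )
```

With this mapping, a Git branch named `esr128` resolves to the `default` branch of `ssh://hg.mozilla.org/releases/mozilla-esr128/`, while a branch such as `main` does not match and is left to other mappings.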

If the destination URL and branch name do not contain regular expression replacements, the bootstrap mechanism will also fetch data from the Mercurial remotes.

For large repositories such as Firefox, it can be useful to target mozilla-central in a branch mapping, even as a read-only source with impossible patterns. This lets the initial import benefit from Mercurial bundles (if available), which speeds it up.

#
# MOZILLA-UNIFIED
#
# We don't sync to this repository, but we put it here first to fetch all
# references early, with the benefit of bundles.
#
[[branch_mappings]]
source_url = "https://github.com/mozilla-firefox/firefox.git"
branch_pattern = "THIS_SHOULD_MATCH_NOTHING"
destination_url = "https://hg.mozilla.org/mozilla-unified/"
destination_branch = "NOT_A_VALID_BRANCH"

Note

As branch mappings are processed sequentially, such an entry needs to appear first for each source URL/branch mapping.

Tag Mapping

The tag_mappings section is similar to the configuration for branches, including the support for regular expressions. Unlike branches, where Git commits are converted and pushed to Mercurial by git-cinnabar, tags need to be recreated in Mercurial.

[[tag_mappings]]
source_url = "https://github.com/mozilla-firefox/firefox.git"
# <M>_<m>(_<p>...)esr BUILD and RELEASE tags to mozilla-esr<M>
tag_pattern = "^(FIREFOX|DEVEDITION|FIREFOX-ANDROID)_(\\d+)(_\\d+)+esr_(BUILD\\d+|RELEASE)$"
destination_url = "ssh://hg.mozilla.org/releases/mozilla-esr\\2/"
tags_destination_branch = "tags-unified"
# Default
#tag_message_suffix = "a=tagging CLOSED TREE DONTBUILD"

Note

The destination branch is named tags_destination_branch.
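The tag pattern above has several groups, and only the second one (the major version) is reused in the destination URL. The sketch below checks a few tag names against it. The tag names are made-up examples, the helper `tag_destination` is hypothetical, and the use of `re.match` with `Match.expand` is an assumption about how the pattern is applied.

```python
import re
from typing import Optional

TAG_PATTERN = re.compile(
    r"^(FIREFOX|DEVEDITION|FIREFOX-ANDROID)_(\d+)(_\d+)+esr_(BUILD\d+|RELEASE)$"
)
# "\\2" in the TOML file is \2 once parsed: the major version number.
DESTINATION_URL = r"ssh://hg.mozilla.org/releases/mozilla-esr\2/"


def tag_destination(tag: str) -> Optional[str]:
    """Return the Mercurial repository URL for a matching tag, else None."""
    match = TAG_PATTERN.match(tag)
    return match.expand(DESTINATION_URL) if match else None
```

Under these assumptions, `FIREFOX_128_5_0esr_BUILD1` maps to `ssh://hg.mozilla.org/releases/mozilla-esr128/`, whereas a non-ESR tag such as `FIREFOX_140_0_RELEASE` does not match this mapping.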

Mercurial’s support for tags relies on inspecting information from the .hgtags file from every Mercurial head. git-cinnabar therefore updates this file in the repository when creating new tags. However, the Git and Mercurial histories MUST remain in sync, with a bijective mapping between each SCM. As a result, it is not possible to update the .hgtags file in any of the branches receiving new code from Git.

The solution to this problem is to use a separate branch in Mercurial repositories, dedicated to receiving tags. The Git-Hg-Sync worker will maintain a Git branch named after tags_destination_branch locally in the working repository, and push that branch to a matching one in Mercurial.

Note

Tags branches are created as orphan branches without shared history with the default branch. The custom hook SingleRootCheck in HgMO forbids branches with multiple roots. This hook must be disabled for any target repository. Alternatively, the root of the new branch can be added in the allowedroots section of the relevant hgrc, e.g., for Firefox, Bug 1978262.

Due to differences in the data models between Git and Mercurial, git-cinnabar refuses to create a tag which already exists in the repository, even if on a different branch. As a result, it is recommended to use the same tags_destination_branch for all tag_mappings with the same source from the tracked_repositories.
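This recommendation can be checked mechanically. The sketch below is a hypothetical lint over a parsed configuration, not a feature of Git-Hg-Sync. It groups tag_mappings entries by source_url and reports any source that uses more than one tags_destination_branch.

```python
from collections import defaultdict
from typing import Dict, List, Set


def inconsistent_tag_branches(tag_mappings: List[dict]) -> Dict[str, Set[str]]:
    """Return sources whose tag mappings use more than one tags branch."""
    branches: Dict[str, Set[str]] = defaultdict(set)
    for mapping in tag_mappings:
        branches[mapping["source_url"]].add(mapping["tags_destination_branch"])
    # Keep only the sources that break the recommendation.
    return {src: names for src, names in branches.items() if len(names) > 1}
```

An empty result means every source repository consistently uses a single tags branch.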

Warning

As the work copy of the tags_destination_branch is only present locally on the worker in Git, this might create bootstrapping issues when re-creating a work copy from scratch (see bug 1962599 and this comment). A manual fix would be to create the local tags_destination_branch from the Hg repo with the most recent updates to the tags.

Each tag_mappings entry also has an optional tag_message_suffix, which specifies a templated addition to the message of the commit creating a tag. The default is shown commented out in the configuration snippet above.

Pulse (AMQP) Queue

Git Hg Sync creates a queue, and binds it to the Lando exchanges described in the Lando Pushlog documentation.

The configuration file can also contain details about Pulse, in the pulse section. Conventional parameters are written in the configuration file, but anything sensitive is left to be passed via the environment.

Note

For more deployment flexibility, Pulse parameters are overridable via environment variables. For example, pulse.param can be overridden by the value in PULSE_PARAM.

The parameters available in the pulse section are:

  • exchange

  • host

  • password (DO NOT STORE THIS IN SOURCE CONTROL)

  • port (needs to be an integer)

  • queue

  • routing_key

  • ssl (in the environment, needs to be an empty string to be False, otherwise True)

  • userid
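The override and type-conversion rules above can be sketched as follows. This is an illustration under assumptions rather than the actual implementation: the function name is hypothetical, and the environment variable names are derived from the PULSE_PARAM convention by upper-casing each parameter name.

```python
from typing import Any, Dict, Mapping

PULSE_KEYS = (
    "exchange", "host", "password", "port",
    "queue", "routing_key", "ssl", "userid",
)


def apply_pulse_overrides(
    pulse: Mapping[str, Any], environ: Mapping[str, str]
) -> Dict[str, Any]:
    """Merge the pulse section with PULSE_* overrides from the environment."""
    merged = dict(pulse)
    for key in PULSE_KEYS:
        env_name = f"PULSE_{key.upper()}"  # e.g. pulse.host -> PULSE_HOST
        if env_name not in environ:
            continue
        raw = environ[env_name]
        if key == "port":
            merged[key] = int(raw)  # port needs to be an integer
        elif key == "ssl":
            merged[key] = raw != ""  # empty string is False, anything else True
        else:
            merged[key] = raw
    return merged
```

Note that, with this rule, setting PULSE_SSL to a value such as `false` or `0` still enables SSL; only an empty string disables it.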

Administrative CLI

Git-Hg-Sync offers a small management interface via a command line tool available on the workers: git-hg-cli. It requires a configuration file to be specified, and accepts a handful of commands.

git-hg-cli -c <CONFIG> [config|dequeue|fetchrepo]

Inspecting the Run-time Configuration

The config command simply dumps a pretty-printed version of the live configuration to the console. This is a combination of the static information from the configuration file, as well as anything overridden from the environment.

Warning

Sensitive data such as passwords is not redacted from this output.

Pre-fetching Working Directory Data

The fetchrepo command is used to pre-populate or update the local working directory. It fetches all available commits from the Git source, as well as (optionally) any target Mercurial repo from the branch_mappings (as long as they do not contain dynamic replacement from regular expression capturing groups).
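One way to decide whether a branch mapping has a fixed Mercurial destination is to look for backreferences in its destination fields. The sketch below does that with a regular expression. How Git-Hg-Sync actually makes this decision is not stated here, so both the helper name `is_static` and the detection rule are assumptions.

```python
import re

# A backslash followed by a group number (\1) or a named group (\g<name>).
BACKREFERENCE = re.compile(r"\\(\d+|g<[^>]+>)")


def is_static(mapping: dict) -> bool:
    """Return True if the destination does not depend on captured groups."""
    return not any(
        BACKREFERENCE.search(mapping[key])
        for key in ("destination_url", "destination_branch")
    )
```

Under this rule, the mozilla-unified mapping shown earlier is static and could be pre-fetched, while the esr mapping, whose URL contains `\1`, is not.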

Warning

There may be some issues in bootstrapping the tags branches, see bug 1962599.

This command takes a mandatory --repository-url option, which should be the full URL of one of the tracked_repositories.

If the --fetch-all option is passed, all data from Mercurial will also be fetched. The --verbose option requests that the output from the git and hg operations be printed to the console.

Removing an Erroneous Pulse Notification

It may happen that a Pulse notification leads to a terminally failing action. As Git Hg Sync processes messages strictly in order, this means that any further processing is blocked. The visible symptom is that HgMO (particularly autoland for Firefox) is no longer synced from Git.

Warning

Skipping a message may have unwanted consequences and require ad hoc fixes to be made to recover.

The dequeue command can be used to remove the tip message from the queue. For safety, it requires explicit passing of the --repository-url and --push-id options. The values of those options are compared to what is present in the first notification in the queue. The message is removed only if those details match.
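The safety check can be pictured with the following sketch, which models the queue as an in-memory deque. The message field names (`repo_url`, `push_id`) and the function name are hypothetical, since the actual notification schema is described in the Lando documentation rather than here.

```python
from collections import deque
from typing import Deque


def dequeue_if_matching(
    queue: Deque[dict], repository_url: str, push_id: int
) -> bool:
    """Remove the tip message only if it matches both identifiers."""
    if not queue:
        return False
    head = queue[0]  # only the first notification is ever considered
    if head.get("repo_url") != repository_url or head.get("push_id") != push_id:
        return False  # mismatch: leave the queue untouched
    queue.popleft()
    return True
```

Requiring both values guards against removing the wrong message, for example if the queue advanced between the operator inspecting it and running the command.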

Ad hoc recovery techniques for a skipped message may include (but are not limited to):