HPC2 Migration to Hive

What's going to happen?

HPC will upgrade and migrate the servers that comprise the HPC2 cluster to Hive starting on June 22, 2026. The full migration is expected to take three weeks and end on July 12, 2026. As part of the migration process, HPC staff will:

Patch the BIOS, CMC (Chassis Management Controller), and BMC (Baseboard Management Controller) firmware. This will address the CPU throttling bug that several HPC2 users have encountered.

Upgrade the operating system to Ubuntu 22.04 to remain in compliance with UCOP policy.

What You Gain

The CPU throttling bug is fixed

The BIOS/CMC/BMC firmware patches resolve the throttling issue that has impacted running jobs of most users at some point. Jobs that have been quietly running slower than they should will run at full speed.

PIs keep the resources that were available to them on HPC2

... with a slight caveat. The QoS that determines your group's allocation will be replicated on Hive's high partition, which means your group gets priority access to its allocation and that allocation is always available to your group (barring an unforeseen event). The key difference between HPC2 and Hive is that you get the same compute resources, but they are not tied to a particular server. If a particular server goes down, you can still compute.

This is a good time to mention HPC's support policy: we fully support systems that are under warranty, up to a maximum of seven years. Hardware replacement costs that fall outside the warranty are the owner's responsibility.

You keep computing through the migration window

HPC2 hardware is offline from June 22 to July 12 for firmware and OS work, but HPC2 users have full access to Hive's low queue during that period. To be clear, low queue jobs run on otherwise unused CPU cycles, and high queue jobs take precedence. Jobs on the low queue may be killed and requeued in favor of high queue activity. If you plan to schedule jobs on the low queue, design them to checkpoint frequently.
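As an illustration of the checkpointing advice above, here is a minimal Python sketch of a job that saves its progress periodically and resumes from the last save when it is requeued. The file name checkpoint.json, the function names, and the placeholder arithmetic are all hypothetical. The sketch also assumes that the scheduler sends SIGTERM before killing a preempted job, which depends on how Hive's Slurm is configured; confirm that with HPC staff before relying on it.

```python
import json
import os
import signal
import tempfile

CHECKPOINT = "checkpoint.json"  # hypothetical file name
stop_requested = False


def request_stop(signum, frame):
    """Record that the job was asked to stop (assumed: SIGTERM on preemption)."""
    global stop_requested
    stop_requested = True


def load_checkpoint(path=CHECKPOINT):
    """Return saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return {"step": 0, "total": 0}


def save_checkpoint(state, path=CHECKPOINT):
    """Write to a temp file, then rename, so the checkpoint is never half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)


def run(n_steps, every=100, path=CHECKPOINT):
    """Resume from the last checkpoint and work until done or asked to stop."""
    state = load_checkpoint(path)
    while state["step"] < n_steps and not stop_requested:
        state["total"] += state["step"]  # placeholder for real work
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)  # final save, also reached after a stop request
    return state


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, request_stop)
    print(run(1000))
```

Because the job reloads its state at startup, a requeued run repeats at most the steps done since the last save. Choose the save interval so that the repeated work is small compared with the cost of writing a checkpoint.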

Open OnDemand, free for all Hive users

Browser-based access to JupyterLab, VS Code, RStudio, a Linux desktop, a file browser, and more at ondemand.hive.hpc.ucdavis.edu. No more SSH + X11 forwarding gymnastics. Useful for students, collaborators, and anyone who's been losing afternoons to terminal-only workflows.

Nightly self-service backups of every user directory.

You restore your own data without filing a ticket. See the backup docs. Group directories can be added for a fee.

Redundant storage available for purchase via the Quobyte file system

Data is replicated across multiple storage arrays, so a single hardware failure no longer threatens your work. PIs can purchase additional Quobyte capacity at the published rates.

Globus

HPC@UCD has purchased a Globus license, enabling per-PI Globus collections that can be shared with collaborators at other institutions. Moving multi-terabyte datasets becomes a normal part of the workflow rather than a project.

No VPN

SSH directly to hive.hpc.ucdavis.edu with your SSH key or Kerberos passphrase.

A dedicated MPI scratch partition

Available at /nfs/hive/scratch-mpi-io and formatted with ZFS, specifically because ZFS currently outperforms Quobyte for MPI I/O patterns.

Self-service group management via Hippo.

PIs approve new members and create groups, and users upload their own SSH keys, all without a ticket.

Modern OS and Software Stack

Ubuntu 22.04 brings current compilers, Python 3.10, updated CUDA support, and keeps us compliant with UCOP policy.

What Changes, and what you need to do

Hive does not use hyperthreading

HPC2 currently advertises 11,232 threads across all systems. With hyperthreading disabled, Hive presents the same hardware as 5,616 physical cores. That looks like a 50% cut on paper, but for most workloads, it isn't.

For most HPC workloads, such as tightly coupled MPI, numerical simulation, and dense linear algebra, 5,616 physical cores will perform comparably to, or better than, the 11,232 hyperthreads you have today.

For workloads that do benefit from hyperthreading, which are generally lighter-weight, embarrassingly parallel tasks with low memory pressure (some bioinformatics pipelines, parameter sweeps, and certain Monte Carlo codes), you'll see a real reduction. If your work falls in this category, contact us before June 22 and we'll benchmark a representative job.

Why disabling hyperthreading is performance-neutral for most HPC workloads

Hyperthreading exposes two logical threads per physical core, but those threads share the same execution units, L1/L2 cache, and memory bandwidth. For workloads that already saturate cache and memory, which describes most scientific computing, the second thread competes with the first for the same resources rather than adding meaningful throughput. Benchmarks on HPC workloads typically show 0–20% gains from hyperthreading, and a non-trivial number of codes run faster with it disabled.

For lighter-weight workloads with low memory pressure, the second thread can use idle execution slots productively, and hyperthreading gains can reach 30% or more. This is why it's a sensible default on desktops but a poor default on HPC clusters.
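To put numbers on the figures above, the following Python sketch uses a deliberately simple model: if hyperthreading gives a workload a fractional throughput gain g, then turning it off retains 1 / (1 + g) of the hyperthreaded throughput. The function name is hypothetical, and the model ignores codes that run faster with hyperthreading disabled, so treat it as back-of-the-envelope arithmetic rather than a prediction for any particular job.

```python
def relative_throughput_without_ht(ht_gain):
    """Fraction of hyperthreaded throughput retained with hyperthreading off.

    ht_gain is the fractional speedup hyperthreading gives a workload
    (0.20 means 20% more throughput with it enabled).
    """
    return 1.0 / (1.0 + ht_gain)


threads = 11_232
cores = threads // 2  # two logical threads per physical core
print(cores)  # 5616

for gain in (0.0, 0.20, 0.30):
    kept = relative_throughput_without_ht(gain)
    print(f"HT gain {gain:.0%}: keep {kept:.0%} of throughput")
# HT gain 0%: keep 100% of throughput
# HT gain 20%: keep 83% of throughput
# HT gain 30%: keep 77% of throughput
```

Under this model, even a workload at the 30% end of the range loses roughly a quarter of its throughput, not the half that the thread count alone would suggest.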

The medium Slurm queue is gone, with no replacement.

It relied on a cgroup bug that has been fixed upstream and cannot be reproduced safely. If your group depends on the medium queue, contact us before June 22 and we'll work through your workload directly. There may be a viable pattern using existing queues, but it'll depend on what you're actually running.

apo storage paths change.

Group directories currently on apo will be accessible at /nfs/hpc2/apo rather than their current path. For workflows you can't easily modify (Snakemake configs, hardcoded paths in compiled tools), open a ticket, and we'll help work through it.
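One way to find hardcoded references before the paths change is to scan your configs and scripts. Here is a minimal Python sketch. The regular expression and the function name are assumptions: the pattern flags any absolute path that has a component named exactly "apo" and does not already sit under /nfs/hpc2/apo. Adjust it to match how apo is mounted for your group today.

```python
import os
import re

NEW_PREFIX = "/nfs/hpc2/apo"  # post-migration location given above
# Assumption: old references contain a path component named exactly "apo".
APO_PATH = re.compile(r"(?:/[\w.-]+)*/apo(?![\w.-])(?:/[\w.-]+)*")


def is_migrated(path):
    """True if the path already points at the new apo location."""
    return path == NEW_PREFIX or path.startswith(NEW_PREFIX + "/")


def find_old_apo_refs(root):
    """Yield (file, line number, path) for apo paths not under NEW_PREFIX."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            try:
                with open(full, encoding="utf-8") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        for match in APO_PATH.finditer(line):
                            if not is_migrated(match.group()):
                                yield full, lineno, match.group()
            except (UnicodeDecodeError, OSError):
                continue  # skip binaries and unreadable files
```

Iterating over find_old_apo_refs on a directory of pipeline configs yields one tuple per suspicious reference. Paths embedded in compiled binaries are skipped by this sketch and still need a ticket.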

Maintenance windows of approximately one week each, twice yearly.

Required by UCOP for security patching and supported software versions. We publish dates well in advance; the windows usually fall in mid-June and December. This is a change from HPC2, which does not have maintenance windows.

Home directories are separate from group directories.

New accounts get a 20GB home directory. If your current workflow assumes $HOME and group storage are the same volume, audit before migration.
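As a starting point for that audit, here is a minimal Python sketch that checks whether two paths sit on the same filesystem and totals the size of a directory tree. The function names are hypothetical, the 20GB figure is taken as decimal gigabytes (an assumption), and comparing device IDs is a heuristic that may not reflect how every network mount is presented.

```python
import os

HOME_QUOTA_BYTES = 20 * 10**9  # 20GB as stated above; decimal units assumed


def same_volume(path_a, path_b):
    """True if both paths report the same filesystem device ID."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def tree_size(root):
    """Sum of file sizes under root; symlinks are counted as links, not followed."""
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file vanished or is unreadable
    return total
```

Running same_volume on your home directory and your group directory on HPC2 shows whether your workflow currently relies on the two being one volume, and comparing tree_size of your home directory against HOME_QUOTA_BYTES shows whether it would fit in a new home directory.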

Additional storage can no longer be added to apo.

New purchases go to Quobyte. Existing apo allocations are preserved.

Migration timeline and what to do

Before June 22:

Audit your group directory for references to apo paths.
Flag any pipelines pinned to specific kernel or library versions.
Create a Hive account via the Hippo user portal. Confirm that your SSH key is in Hippo and that you can log in to Hive.
Request access to your PI's group once it is available. We will be pre-creating PI user accounts and groups.

June 22 to July 12 (migration window):

HPC2 hardware is offline for firmware patching and OS reinstallation. HPC2 users have access to Hive's resources during this period. Your work doesn't stop; it relocates. Long-running HPC2 jobs should be checkpointed and resubmitted on Hive, or completed before June 22.

After July 12:

Full cutover complete.

Support during migration

We're finalizing the exact support model and will publish specifics before June 22. At a minimum, expect office hours during the migration window and a dedicated channel for migration-related tickets. If something genuinely breaks in a way that blocks your research, that's our problem to solve, not yours.

Why we're consolidating

Hive already hosts LSSC0, Peloton, Demon, Impact, Cardio, and Atomate. Franklin migrates next, and the CAES Farm cluster shares the same Quobyte storage and software stack. One cluster means one place to apply security patches, one place to debug, one place where new features land. That consolidation is how Open OnDemand, Globus, and self-service backups got built in the first place, rather than being deferred indefinitely across five separate environments.

Support

Our documentation site is at https://docs.hpc.ucdavis.edu. For support information, please visit https://docs.hpc.ucdavis.edu/support.