...

The original Ceph cluster was designed for archival data storage. Circa 2015, Research Computing decommissioned the storage resources on the HPC clusters of that era, Corona, Maia, Trits, Capella, and Cuda0, and used Ceph as the storage backend instead. This worked fine until Sol, built as a 34-node replacement cluster for Corona, Capella, Cuda0, and Trits, was expanded and upgraded to 56 nodes (81 nodes in Fall 2019) with 66 NVIDIA GPUs (120 in Fall 2019). The increase in I/O from simulations on Sol caused instability in Ceph. After some research, it was decided that the Ceph replacement should include a fast tier of storage built with SSDs and based on the Ceph file system (CephFS) to handle I/O from the ever-expanding Sol cluster. The fast tier, CephFS, would provide a distributed global scratch space on Sol for writing simulation data from running jobs, while the slow tier, Ceph, would provide longer-term storage of simulation data.

...