Overview
Sol is a heterogeneous computing cluster, built by Dell, that can be expanded further by investments from Lehigh Faculty, Departments, Centers and Colleges. The majority of Sol compute nodes are purchased by Lehigh Faculty, known as Condo Investors. In addition, LTS has purchased 8 nodes that are available to Condo Investors and other researchers (called Hotel Investors) on a rental basis. All Condo and Hotel nodes have a minimum configuration of dual socket, 10 cores per socket, 128 GB RAM, 1 TB hard disk and 100 Gb/s EDR Infiniband interconnect. LTS also provides a head/login node for interactive access to the cluster for compiling applications, editing files, submitting job scripts and monitoring submitted jobs.
The model for sustaining high performance computing at Lehigh is premised on faculty purchasing compute nodes from their grants, which are then added to the cluster. In exchange, LTS will provide system administration and support for the nodes for a period of 5 years (length of hardware warranty, see exception below). The advantage to Condo Investors, besides avoiding the cost incurred for managing the cluster, is the ability to utilize the entire collection of Hotel and Condo nodes when the need arises. In return, Condo Investors will allow their idle cycles to be used by other Condo and Hotel Investors. This provides Condo Investors with much greater flexibility than owning a standalone cluster.
If equipment is purchased with a different length of warranty, due either to budget constraints or grant requirements, then LTS will commit to support and maintain the equipment only for the duration of the hardware warranty. For example, if the length of the warranty is 3 years, then the support provided, as described in this document, is for 3 years only.
Program Details
Compute nodes are purchased and maintained based on a 5-year warranty. The minimum purchase is one Base Compute node with various upgrade options (processor, memory, GPUs and MICs) as described in the table below. Condo Investors will be provided with an annual allocation equivalent to the number of computing core-hours or service units (SU) their investment provides, which may be expended on all available nodes on Sol. This amounts to 175,200 SUs per year for a 20-core Compute node (20 cores * 24 hours/day * 365 days/year). All investments must include a 5-year hardware warranty that allows LTS staff to initiate repair and replacement of equipment in the event of a hardware failure. This will ensure a high quality of service with minimum disruption to users.
At the end of 5 years, the investor may donate their equipment to LTS or take possession of it to set up a cluster in their own lab. LTS will not provide infrastructure, system administration or support for out-of-warranty equipment in or out of the Data Center. Equipment donated to LTS will be used at the discretion of the Managers of Research Computing, Data Center Operations and Systems & Network Administration. They may decide either to dispose of the equipment or to repurpose it for infrastructure resources that may or may not be available to the Lehigh community.
How to become a Condo Investor?
Faculty, Departments, Centers or Colleges who are interested in investing in the Condo Program should refer to the Pricing Chart for available equipment and costs.
Prospective investors should review available equipment and estimated costs and contact Alex Pacheco, Manager of Research Computing, for next steps.
The Condo Investment is for a period of 5 years or the length of the hardware warranty purchased.
Investors will not be provided with dedicated access to their equipment but are provided with higher priority on their investment. Investments are shared with all Sol users. Condo Investors can in turn utilize all other available Sol nodes at a lower priority.
The base compute node is the minimum configuration available in the Condo Program.
All investments must include EDR Infiniband adaptors and cables.
Heterogeneity of the cluster is in the number of cores, memory, storage and accelerators available on the compute nodes and not in the underlying network fabric.
The minimum condo investment is 1 compute node which may be shared by multiple faculty irrespective of department or college.
Partnerships between two or more faculty for a condo investment are permitted, but LTS will neither initiate such partnerships nor mediate the distribution of allocations among the partners. We will create separate allocations for the individual partners.
Currently, there are no plans to support, either through administration or through infrastructure cost sharing, resources that do not meet Sol's base configuration, for example clusters without EDR Infiniband or clusters that are not part of the shared cluster.
Please contact Research Computing staff for upgrade options.
How are allocations calculated?
The annual allocation that a Condo Investor receives is calculated as
Annual SU = Total Number of Invested Cores * 24 hours/day * 365 days/year
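Cores that cannot be scheduled, i.e. CUDA cores or x86 cores on a MIC coprocessor, are not counted in the Annual SU allocation. A Base Compute Node therefore provides the same 315,360 SUs annually with or without an add-on Accelerator. By the formula above, this figure corresponds to 36 schedulable cores (36 * 24 * 365 = 315,360), whereas a node with the minimum 20-core configuration provides 175,200 SUs.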
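The allocation formula can be checked with a few lines of Python. This is only an illustrative sketch: the function name annual_su and its constants are hypothetical and are not part of any LTS tool. It assumes the caller passes only schedulable CPU cores, since accelerator cores do not count toward the allocation.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def annual_su(schedulable_cores: int) -> int:
    """Annual SU = invested cores * 24 hours/day * 365 days/year.

    Hypothetical helper for illustration. Pass schedulable CPU cores
    only; CUDA cores and MIC coprocessor cores are excluded.
    """
    if schedulable_cores < 0:
        raise ValueError("core count must be non-negative")
    return schedulable_cores * HOURS_PER_DAY * DAYS_PER_YEAR


# Minimum configuration: dual socket, 10 cores per socket
print(annual_su(20))  # 175200

# A 36-core node, with or without an add-on accelerator
print(annual_su(36))  # 315360
```

Because an add-on accelerator contributes no schedulable cores, the value passed for a node is the same whether or not an accelerator is installed.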