Condo Cluster Program
Overview
Sol is a heterogeneous computing cluster, built by Dell, that can be expanded further by investments from Lehigh Faculty, Departments, Centers and Colleges. The majority of Sol compute nodes are purchased by Lehigh Faculty known as Condo Investors. In addition, LTS has purchased 8 nodes that are available to Condo Investors and other researchers (called Hotel Investors) on a rental basis. All Condo and Hotel nodes have a minimum configuration of dual sockets, 10 cores per socket, 128 GB RAM, 1 TB hard disk and 100 Gb/s EDR InfiniBand interconnect. LTS also provides a head/login node for interactive access to the cluster for compiling applications, editing files, submitting job scripts and monitoring submitted jobs.
The model for sustaining high performance computing at Lehigh is premised on faculty purchasing compute nodes with their grants, which are then added to the cluster. In exchange, LTS will provide system administration and support for the nodes for a period of 5 years (the length of the hardware warranty, see exception below). The advantage to Condo Investors, besides avoiding the cost of managing a cluster, is the ability to utilize the entire collection of Hotel and Condo nodes when the need arises. In return, Condo Investors will allow their idle cycles to be used by other Condo and Hotel Investors. This provides Condo Investors with much greater flexibility than owning a standalone cluster.
If equipment is purchased with a different length of warranty, due either to budget constraints or grant requirements, then LTS will commit to support and maintain the equipment only for the duration of the hardware warranty. For example, if the length of the warranty is 3 years, then the support provided, as described in this document, is for 3 years only.
Program Details
Compute nodes are purchased and maintained based on a 5-year warranty. The minimum purchase is one Base Compute node with various upgrade options (processor, memory, GPUs and MICs) as described in the table below. Condo Investors will be provided with an annual allocation equivalent to the number of computing core-hours or service units (SU) their investment provides, which may be expended on all available nodes on Sol. This amounts to 175,200 SUs per year for a 20-core Compute node. All investments must include a 5-year hardware warranty that would allow LTS staff to initiate repair and replacement of equipment in the event of a hardware failure. This will ensure a high quality of service with minimum disruption to users.
At the end of 5 years, the investor may donate their equipment to LTS or take possession of it to set up a cluster in their own lab. LTS will not provide infrastructure, system administration or support for out-of-warranty equipment in or out of the Data Center. Equipment donated to LTS will be used at the discretion of the Managers of Research Computing, Data Center Operations and Systems & Network Administration. They may decide either to dispose of the equipment or to repurpose it for infrastructure resources that may or may not be available to the Lehigh community.
How to become a Condo Investor?
Prospective investors should contact Ryan Bradley, Director of Research Computing, for next steps.
The Condo Investment is for a period of 5 years or the length of the hardware warranty purchased.
Investors will not be provided with dedicated access to their equipment but are provided with higher priority on their investment. Investments are shared with all Sol users. Condo Investors can in turn utilize all other available Sol nodes at a lower priority.
The minimum condo investment is 1 compute node, which may be shared by multiple faculty irrespective of department or college.
Partnerships between two or more faculty for a condo investment are permitted, but LTS will not initiate such partnerships, nor will it mediate the distribution of allocations among the partners. We will create separate allocations for the individual partners.
Please contact Research Computing staff for upgrade options.
How are allocations calculated?
The annual allocation that a Condo Investor receives is calculated as
Annual SU = Total Number of Invested Cores * 24 hours/day * 365 days/year
Cores that cannot be scheduled, i.e. CUDA cores or x86 cores on a MIC coprocessor, are not counted in the Annual SU allocation. A Base Compute Node and a Base Compute Node with an add-on Accelerator therefore each provide 315,360 SUs annually.
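As a quick check of the formula, the calculation can be sketched in a few lines of Python. This is only an illustration: the function name `annual_su` is hypothetical and not part of any Lehigh tool, and the caller is assumed to pass only schedulable CPU cores (accelerator cores are excluded, as described above). By this arithmetic, 20 cores correspond to 175,200 SUs and the 315,360 SU figure corresponds to 36 schedulable cores.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def annual_su(schedulable_cores: int) -> int:
    """Annual SU = invested schedulable cores * 24 hours/day * 365 days/year.

    Hypothetical helper for illustration only. Cores that cannot be
    scheduled (CUDA cores, MIC coprocessor cores) must not be included
    in `schedulable_cores`.
    """
    if schedulable_cores < 0:
        raise ValueError("core count must be non-negative")
    return schedulable_cores * HOURS_PER_DAY * DAYS_PER_YEAR


if __name__ == "__main__":
    print(annual_su(20))  # 175200
    print(annual_su(36))  # 315360
```

For an investment of several nodes, pass the total number of schedulable cores across all of them.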
Can I include computing time or equipment purchase in grants?
Please contact the Office of Research and Sponsored Programs for information on a specific funding agency's policy on computing equipment purchases. If computing equipment can be budgeted, feel free to modify the templates per your requirements and add them to your budget justification. In addition to the Budget Templates, information about Lehigh's Facilities is also available at the above website.
If you need a letter of support or a budgetary quote for computing resources, please contact Ryan Bradley, Manager of Research Computing, either directly or, if you also need help with creating a Data Management Plan, through the Research Data Management Committee.