
Sol is Lehigh's newest Linux cluster, replacing Corona and other ancillary Level 2 resources. Following our tradition of naming high performance computing clusters after stars or celestial phenomena, Sol is named after the nearest star.

Acknowledgement

In publications, reports, and presentations that utilize Sol, Hawk and Ceph, please acknowledge Lehigh University using the following statement:

"Portions of this research were conducted on Lehigh University's Research Computing infrastructure partially supported by NSF Award 2019035"

Sol is a heterogeneous cluster launched on Oct 1, 2016 with a total of 34 nodes, 26 of which are Condo investments by two CAS faculty. All nodes provide 500GB of local scratch storage for running jobs and are interconnected by a 2:1 oversubscribed EDR (100Gbps) Infiniband fabric. In Fall 2018, a new Ceph storage cluster was installed that provides an 11TB CephFS global scratch space for storing temporary job data for 7 days after job completion.
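
To make use of these scratch spaces in a batch job, the general pattern is to stage data into scratch, run, and copy results out before the 7-day purge. Below is a minimal sketch of a Slurm batch script; the partition name, both scratch paths, and the my_solver executable are placeholders rather than actual Sol names, so confirm the real mount points and partitions in the Research Computing documentation before using it.

Code Block
languagebash
#!/bin/bash
#SBATCH --job-name=scratch-demo
#SBATCH --nodes=1
#SBATCH --ntasks=20
#SBATCH --time=01:00:00
#SBATCH --partition=lts        # placeholder partition name

# Placeholder paths for node-local scratch and the CephFS global scratch
LOCAL_SCRATCH=/scratch/${USER}/${SLURM_JOB_ID}
GLOBAL_SCRATCH=/share/ceph/scratch/${USER}/${SLURM_JOB_ID}
mkdir -p "${LOCAL_SCRATCH}" "${GLOBAL_SCRATCH}"

# Stage input from home, run out of local scratch, keep results in global scratch
cp ${HOME}/input.dat "${LOCAL_SCRATCH}/"
cd "${LOCAL_SCRATCH}"
srun ./my_solver input.dat > output.log
cp output.log "${GLOBAL_SCRATCH}/"

Anything left in the global scratch is removed 7 days after the job completes, so copy results you want to keep back to your home directory or Ceph project space.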

...

  • In Jan. 2017, each of the 25 Condo nodes was upgraded to include two GTX 1080 GPU cards.
  • In 2017, Condo Investments from RCEAS and CBE faculty added 22 nodes and 16 nVIDIA GTX 1080 GPU cards.
  • In 2018, Condo Investments from RCEAS and CAS faculty added 24 nodes and 48 nVIDIA RTX 2080 TI GPU cards.
  • In Mar. 2019, Condo Investments from RCEAS faculty added 1 node.
  • In September 2020, Condo Investments from CAS, RCEAS and COH faculty added 8 nodes.
  • In Spring 2022, Condo Investments from CAS and RCEAS faculty added 4 nodes.

As of Feb. 2022

Processor Type | Number of Nodes | Number of CPUs | Number of GPUs | CPU Memory (GB) | GPU Memory (GB) | CPU TFLOPs | GPU TFLOPs | Annual SUs
2.3 GHz E5-2650v3 | 9 | 180 | 10 | 1152 | 80 | 5.76 | 2.57 | 1,576,800
2.3 GHz E5-2670v3 | 33 | 792 | 62 | 4224 | 496 | 25.344 | 15.934 | 6,937,920
2.2 GHz E5-2650v4 | 14 | 336 | - | 896 | - | 9.6768 | - | 2,943,360
2.6 GHz E5-2640v3 | 1 | 16 | - | 512 | - | 0.5632 | - | 140,160
2.3 GHz Gold 6140 | 24 | 864 | 48 | 4608 | 528 | 41.472 | 18.392 | 7,568,640
2.6 GHz Gold 6240 | 6 | 216 | - | 1152 | - | 10.368 | - | 1,892,160
2.1 GHz Gold 6230R | 2 | 104 | - | 768 | - | 4.3264 | - | 911,040
3.0 GHz Gold 6248R | 1 | 48 | 5 | 192 | 200 | 3.072 | 48.5 | 420,480
3.0 GHz EPYC 7302 (Coming Soon) | 3 | 96 | 24 | 768 | 1152 | 4.3008 | 28.08 | 840,960
Total | 93 | 2652 | 149 | 14272 | 2456 | 104.8832 | 112.6364 | 23,231,520
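
Annual SUs here are core-hours: each row is number of nodes × cores per node × 8,760 hours in a year. A quick sanity check for the first row (a sketch, assuming SUs are charged per core-hour as the table implies):

Code Block
languagebash
# Annual SUs = nodes x cores-per-node x hours-per-year
nodes=9; cores=20; hours=8760
echo $(( nodes * cores * hours ))   # prints 1576800, matching the E5-2650v3 row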


System Configuration

1 Interactive node
  • Two 2.3GHz 10-core Intel Xeon E5-2650 v3, 25M Cache
  • 128GB 2133MHz RAM
  • 1TB HDD
  • 100 Gb/10 GbE and 1 GbE network interface
  • CentOS 8.x


9 Haswell GPU nodes
  • Two 2.3GHz 10-core Intel Xeon E5-2650 v3, 25M Cache
  • 128GB 2133MHz RAM
  • 1TB HDD
  • 100 Gb/s EDR Infiniband network interface
  • 10 GbE and 1 GbE network interface
  • 8 nodes with one EVGA Geforce GTX 1080 PCIE 8GB GDDR5
  • 1 node with two EVGA Geforce GTX 1080 PCIE 8GB GDDR5
  • CentOS 8.x


33 Haswell GPU nodes
  • Two 2.3GHz 12-core Intel Xeon E5-2670 v3, 30M Cache
  • 128GB 2133MHz RAM
  • 1TB HDD
  • 100 Gb/s EDR Infiniband network interface
  • 10 GbE and 1 GbE network interface
  • 4 nodes with one EVGA Geforce GTX 1080 PCIE 8GB GDDR5
  • 29 nodes with two EVGA Geforce GTX 1080 PCIE 8GB GDDR5
  • CentOS 8.x


14 Broadwell Compute nodes
  • Two 2.2GHz 12-core Intel Xeon E5-2650 v4, 30M Cache
  • 64GB 2133MHz RAM
  • 1TB HDD
  • 100 Gb/s EDR Infiniband network interface
  • 10 GbE and 1 GbE network interface
  • CentOS 8.x


1 Haswell BigMem node
  • Two 2.6GHz 8-core Intel Xeon E5-2640 v3, 20M Cache
  • 512GB 2400MHz RAM
  • 1TB HDD
  • 100 Gb/s EDR Infiniband network interface
  • 10 GbE and 1 GbE network interface
  • CentOS 8.x


12 Skylake Compute nodes
  • Two 2.3GHz 18-core Intel Xeon Gold 6140, 24.75M Cache
  • 192GB 2666MHz RAM
  • 1TB HDD
  • 100 Gb/s EDR Infiniband network interface
  • 10 GbE and 1 GbE network interface
  • CentOS 8.x


12 Skylake Quad GPU nodes
  • Two 2.3GHz 18-core Intel Xeon Gold 6140, 24.75M Cache
  • 192GB 2666MHz RAM
  • 1TB HDD
  • 100 Gb/s EDR Infiniband network interface
  • 10 GbE and 1 GbE network interface
  • Four ASUS Geforce RTX 2080 TI PCIE 11GB GDDR6
  • CentOS 8.x


6 Cascade Lake Compute nodes
  • Two 2.6GHz 18-core Intel Xeon Gold 6240, 24.75M Cache
  • 192GB 2933MHz RAM
  • 1TB HDD
  • 100 Gb/s EDR Infiniband network interface
  • 10 GbE and 1 GbE network interface
  • CentOS 8.x


2 Cascade Lake Refresh Compute nodes


  • Two 2.1GHz 26-core Intel Xeon Gold 6230R, 35.75M Cache
  • 384GB RAM
  • 1TB HDD
  • 10 GbE and 1 GbE network interface
  • CentOS 8.x


1 Cascade Lake Refresh GPU node
  • Two 3.0GHz 24-core Intel Xeon Gold 6248R, 35.75M Cache
  • 192GB RAM
  • 1TB HDD
  • 10 GbE and 1 GbE network interface
  • Five NVIDIA A100 40GB HBM2 GPUs
  • CentOS 8.x


3 AMD EPYC GPU nodes (Coming Soon)
  • Two 3.0GHz 16-core AMD EPYC 7302, 128M Cache
  • 256GB RAM
  • 1TB HDD
  • 10 GbE and 1 GbE network interface
  • Eight NVIDIA A40 48GB GDDR6 GPUs
  • CentOS 8.x


Intel XEON processors AVX2 and AVX512 frequencies

...

  1. Dimitrios Vavylonis, Department of Physics: 1 20-core compute node

    • Annual allocation: 175,200 SUs

  2. Wonpil Im, Department of Biological Sciences:

    • 25 24-core compute nodes with 2 GTX 1080 cards per node (5,256,000 SUs)

    • 12 36-core compute nodes with 4 RTX 2080 cards per node (3,784,320 SUs)

    • 3 32-core compute nodes with 8 A40 GPUs per node (840,960 SUs)

    • Total Annual allocation: 9,881,280 SUs

  3. Anand Jagota, Department of Chemical Engineering: 1 24-core compute node

    • Annual allocation: 210,240 SUs

  4. Brian Chen, Department of Computer Science and Engineering: 

    • 1 24-core compute node (210,240 SUs)
    • 2 52-core compute nodes (911,040 SUs)
    • Annual allocation: 1,121,280 SUs

  5. Edmund Webb III & Alparslan Oztekin, Department of Mechanical Engineering and Mechanics: 6 24-core compute nodes

    • Annual allocation: 1,261,440 SUs

  6. Jeetain Mittal & Srinivas Rangarajan, Department of Chemical Engineering: 13 24-core Broadwell based compute nodes and 16 GTX 1080 cards

    • Annual allocation: 2,733,120 SUs

  7. Seth Richards-Shubik, Department of Economics

    • Annual allocation: 140,160 SUs

  8. Ganesh Balasubramanian, Department of Mechanical Engineering and Mechanics: 7 36-core Skylake based compute nodes

    • Annual allocation: 2,207,520 SUs

  9. Department of Industrial and Systems Engineering: 2 36-core Skylake based compute nodes

    • Annual allocation: 630,720 SUs

  10. Lisa Fredin, Department of Chemistry:

    • 2 36-core Skylake based compute nodes

    • 4 36-core Cascade Lake based compute nodes
    • Annual allocation: 1,892,160 SUs

  11. Paolo Bocchini, Department of Civil and Environmental Engineering: 1 24-core Broadwell based compute node

    • Annual Allocation: 210,240 SUs

  12. Hannah Dailey, Department of Mechanical Engineering and Mechanics: 1 36-core Skylake based compute node

    • Annual allocation: 315,360 SUs

  13. College of Health: 2 36-core Cascade Lake based compute nodes
    • Annual allocation: 630,720 SUs

  14. Keith Moored, Department of Mechanical Engineering and Mechanics: 1 48-core Cascade Lake Refresh compute node with 5 A100 GPUs
    • Annual allocation: 420,480 SUs

Comparison with AWS


[Figure: cost comparison of Sol jobs run on equivalent AWS EC2 instances]

Methods/Notes:

Adapted from the code at https://gitlab.beocat.ksu.edu/Admin-Public/amazon-cost-comparison

Costs for Spot/On-Demand pricing last updated: August 6, 2020 from https://aws.amazon.com/ec2/spot/pricing. These costs fluctuate and Spot pricing requires that users accept job-preemption; no jobs on Sol are currently preemptable.

Amazon instance types selected were those which conform to Sol job sizes regarding memory, core counts, and GPUs. These are g4dn.x-12xlarge, p3.8xlarge, c5n.large-18xlarge, and r5.8-16xlarge. Sizing and run time of actual Sol jobs from Slurm's internal database were used to match instance types and generate a cost for each job run on Sol.

Costs for AWS do not include data storage, networking, VPN/transit, data ingress/egress, or any other charges aside from the EC2 instance price. The code calculates cost per second even if the job ran for less than one hour, and assumes that, since an Amazon vCPU is half a core, a job will take twice as long on Amazon as on Sol.
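
The arithmetic behind the comparison reduces to a few lines. The sketch below is illustrative only; it is not the code linked above, and the hourly price shown is an example figure rather than a current AWS quote. It doubles the Sol wall time to account for the vCPU/core difference and multiplies by the matched instance's hourly rate.

Code Block
languagebash
# Illustrative per-job estimate following the methodology described above
runtime_hours=24         # wall time the job used on Sol
price_per_hour=3.888     # example On-Demand rate for a matched instance (placeholder)

# Double the runtime (1 vCPU ~ half a core); cost is prorated, not rounded up to whole hours
aws_hours=$(echo "${runtime_hours} * 2" | bc)
cost=$(echo "${aws_hours} * ${price_per_hour}" | bc -l)
printf "Estimated EC2 cost for this job: \$%.2f\n" "${cost}"

For a 24-hour single-node Sol job this works out to roughly $187 at that example rate, before any storage or data-transfer charges.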

Accounts & Allocations

For access to Sol, please see the revised policy for Accounts & Allocations.

Logging into Sol

Sol can be accessed via SSH using an SSH client. Linux and Mac users can log in to Sol by entering the following command in a terminal:

Code Block
languagebash
ssh username@sol.cc.lehigh.edu

If you are off campus, there are two options:

  1. Start a VPN session and then log in to Sol using the ssh command above.
  2. Use the SSH gateway as a jump host and then log in to Sol with the above ssh command from the gateway prompt. If your ssh client is a recent version of OpenSSH, you can instead use the following command:
Code Block
languagebash
ssh -J username@ssh.cc.lehigh.edu username@sol.cc.lehigh.edu

If you are using the ssh gateway, you might want to add the following to your ${HOME}/.ssh/config file on your local system

Code Block
languagebash
  Host *ssh
  HostName ssh.cc.lehigh.edu
  Port 22
# This is an example - replace alp514 with your Lehigh ID
  User alp514

  Host *sol
  HostName sol.cc.lehigh.edu
  Port 22
  User <LehighID>
  ProxyCommand ssh -W %h:%p ssh

to simplify the ssh and scp (for file transfer) commands. You will be prompted for your password twice: first for the SSH gateway and then for Sol.


Code Block
languagebash
ssh sol
scp sol:<path to source directory>/filename <path to destination directory>/filename 

If you are using public key authentication, please add a passphrase to your key. Passwordless authentication is a security risk. Use ssh-agent and ssh-add to manage your keys so that you only enter the passphrase once per session. See https://kb.iu.edu/d/aeww for details.
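
As an example of that workflow (generic OpenSSH commands, not Sol-specific configuration): generate a key protected by a passphrase, then load it into an agent so the passphrase is only entered once per session. The key file name below is just an example.

Code Block
languagebash
# Generate a key pair; choose a passphrase when prompted
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_sol

# Start an agent for this shell and add the key (passphrase asked once, then cached)
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519_sol

# Later logins reuse the cached key, e.g. with the Host alias defined above
ssh sol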

Windows users will need to install an SSH client to access Sol. Lehigh Research Computing recommends MobaXterm, since it can be configured to use the SSH Gateway as a jump host. DUO authentication is enabled for faculty and staff on the SSH Gateway. If a window pops up asking for a password, enter your Lehigh password. The second pop-up is for DUO and only says "DUO Login": enter 1 for a push to the DUO app or 2 for a call to your registered phone.