Recommended Hardware and Benchmarking

Linux Workstations

The majority of structural biology computing happens on Linux. We have had good experience with several vendors that provide Linux workstation or server configurations: ThinkMate, Exxact, and Microway. Please feel free to contact us at help@sbgrid.org about upcoming hardware, or to suggest another hardware vendor.

System configurations can be targeted to particular compute-intensive workflows (for example, cryo-EM or small molecule docking). We are in the process of assembling benchmark workflows, datasets, and results to help guide hardware decisions.

Apple Workstations

Any Apple machine can be a capable computer for structural biology. The basic Apple educational discount is available through Apple's online store, and your institution may have negotiated an arrangement with its Apple representative. Several SBGrid labs run exclusively on Macs and OS X, though Macs are not advised for cryo-EM data processing.

Linux Benchmarking

Benchmarking Systems

Local Systems and Single AWS Instance

| System | Hardware overview | GPU model | GPU count | GPU release date | Approximate 2022 GPU cost |
|---|---|---|---|---|---|
| Single Workstation | 8x Xeon E5-1630 v4 @3.7GHz ; 128G RAM ; 1G network | GeForce RTX 2080 | 1 | 2018/09 | 650 |
| Cryo-EM Cluster Node 1 | 2x 16 core Xeon(R) Gold 6142 @2.6GHz ; 384G RAM ; 10G network | 1080 Ti | 7 | 2017/03 | 450 |
| Cryo-EM Cluster Node 2 | 2x Xeon(R) Gold 6226R @2.9GHz ; 384G RAM ; 10G network | 2080 Ti | 4 | 2018/09 | 650 |
| Cryo-EM Cluster Node 3 | AMD EPYC 7542 @2.9GHz ; 512G RAM ; 10G network | RTX6000 | 4 | 2018/08 | 4000 |
| Cryo-EM Cluster Node 4 | Xeon(R) E5-2698 v4 @2.2GHz ; 503G RAM ; 50G network | Tesla V100-SXM2 | 8 | 2018/03 | 9000 |
| AWS (g5.4xlarge) | AMD EPYC 7R32 ; 64G RAM ; 10G (burstable) network | NVIDIA A10G Tensor | 1 | 2021/04 | N/A |

Expanded AWS Instance Set

| AWS instance type | Hardware overview | GPU model | GPU count | GPU release date | Hourly cost |
|---|---|---|---|---|---|
| g4dn.2xlarge | 8x Xeon(R) Platinum 8259CL CPU @ 2.50GHz ; 32GB RAM | T4 Tensor Core | 1 | 2018/09 | $0.75 |
| g4dn.8xlarge | 32x Xeon(R) Platinum 8259CL CPU @ 2.50GHz ; 128GB RAM | T4 Tensor Core | 1 | 2018/09 | $2.18 |
| g4dn.12xlarge | 48x Xeon(R) Platinum 8259CL CPU @ 2.50GHz ; 192GB RAM | T4 Tensor Core | 4 | 2018/09 | $3.91 |
| g4dn.metal | 96x Xeon(R) Platinum 8259CL CPU @ 2.50GHz ; 384GB RAM | T4 Tensor Core | 8 | 2018/09 | $7.82 |
| g5.2xlarge | 8x AMD EPYC 7R32 ; 32GB RAM | A10G Tensor Core | 1 | 2021/04 | $1.21 |
| g5.16xlarge | 64x AMD EPYC 7R32 ; 256GB RAM | A10G Tensor Core | 1 | 2021/04 | $4.10 |
| g5.12xlarge | 48x AMD EPYC 7R32 ; 192GB RAM | A10G Tensor Core | 4 | 2021/04 | $5.67 |
| g5.24xlarge | 96x AMD EPYC 7R32 ; 384GB RAM | A10G Tensor Core | 4 | 2021/04 | $8.14 |

Notes

Core counts were determined from nproc output and include hyperthreads where hyperthreading is enabled.
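For readers who want to reproduce a comparable count from a script, here is a minimal sketch. It assumes a Linux system, where `os.sched_getaffinity(0)` returns the logical CPUs the current process may run on; to our understanding this matches what nproc counts by default. The helper name `available_cpus` is ours and is not part of any benchmarking tool.

```python
import os


def available_cpus():
    """Return the number of logical CPUs usable by this process.

    Logical CPUs include hyperthreads where hyperthreading is enabled,
    so this can be larger than the number of physical cores.
    """
    try:
        # Linux only: the set of CPUs this process is allowed to run on.
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Fallback for platforms without sched_getaffinity (e.g. macOS).
        return os.cpu_count() or 1


print(available_cpus())
```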

Network capacity on AWS instances may not be directly comparable to physical systems.
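As a rough aid for comparing the AWS instances listed above, the sketch below divides each hourly cost by the GPU count and estimates the cost of a run of a given length. The hourly costs and GPU counts are copied from the expanded AWS instance table; the function names are ours, and the two-hour runtime in the last line is purely illustrative, not a measured result.

```python
# (hourly cost in dollars, GPU count), copied from the AWS instance table.
AWS_INSTANCES = {
    "g4dn.2xlarge": (0.75, 1),
    "g4dn.8xlarge": (2.18, 1),
    "g4dn.12xlarge": (3.91, 4),
    "g4dn.metal": (7.82, 8),
    "g5.2xlarge": (1.21, 1),
    "g5.16xlarge": (4.10, 1),
    "g5.12xlarge": (5.67, 4),
    "g5.24xlarge": (8.14, 4),
}


def cost_per_gpu_hour(hourly_cost, gpu_count):
    """Hourly instance cost divided evenly across its GPUs."""
    return hourly_cost / gpu_count


def job_cost(hourly_cost, runtime_seconds):
    """Cost of keeping one instance running for runtime_seconds."""
    return hourly_cost * runtime_seconds / 3600.0


for name, (hourly, gpus) in AWS_INSTANCES.items():
    print(f"{name:14s} ${cost_per_gpu_hour(hourly, gpus):.2f} per GPU-hour")

# Illustrative only: a hypothetical 2-hour (7200 s) run on g5.2xlarge.
print(f"${job_cost(AWS_INSTANCES['g5.2xlarge'][0], 7200):.2f}")
```

Cost per GPU-hour ignores the CPU, RAM, and network differences between instances, so it is only one input to a hardware decision.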

Benchmarking Datasets

| Dataset | Number of images | Storage space |
|---|---|---|
| Beta-Galactosidase | 24 | 3.0G |
| Cannabinoid Receptor 1-G Protein Complex | 2753 | 476G |
| Inflammasome | 6594 | 1.6T |

Dataset 1

Beta-galactosidase data set from the Namba group at Osaka University, Japan (EMPIAR-10204, EMD-6840); courtesy of the RELION tutorial. It was collected on a JEOL CRYO ARM 200 microscope.

Dataset 2

Cannabinoid receptor 1-G protein complex data set from Kumar et al. (EMPIAR-10288, EMD-0339, PDB-6n4b); courtesy of the Skiniotis group at Stanford University via the CryoSPARC tutorial. It was collected on an FEI Titan Krios microscope.

Dataset 3

Inflammasome data set from Sharif et al. (EMPIAR-10597, EMD-22367, PDB-7jkq); courtesy of the Wu group at Harvard University. It was collected on an FEI Titan Krios microscope.

Performance

Current benchmarking workflows were run with CryoSPARC 3.3.2.

Beta-galactosidase

Single-GPU performance:

[Figure: Beta-galactosidase single-GPU performance]

Multi-GPU performance:

[Figure: Beta-galactosidase multi-GPU performance]

Multi-GPU relative runtime vs. ideal case:

[Figure: Relative GPU runtime, beta-galactosidase]

The ideal relative runtime is calculated as 1/(number of GPUs).
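The comparison against the ideal case can be sketched as follows. We assume here that relative runtime means the multi-GPU runtime divided by the single-GPU runtime on the same system; the function names are ours, and the timings in the example are hypothetical placeholders, not benchmark results.

```python
def relative_runtime(runtime_n_gpus, runtime_1_gpu):
    """Runtime on N GPUs as a fraction of the single-GPU runtime."""
    return runtime_n_gpus / runtime_1_gpu


def ideal_relative_runtime(n_gpus):
    """Ideal case: the work divides perfectly, so runtime scales as 1/N."""
    return 1.0 / n_gpus


def parallel_efficiency(runtime_n_gpus, runtime_1_gpu, n_gpus):
    """Ratio of ideal to observed relative runtime (1.0 means ideal scaling)."""
    observed = relative_runtime(runtime_n_gpus, runtime_1_gpu)
    return ideal_relative_runtime(n_gpus) / observed


# Hypothetical timings in seconds, keyed by GPU count (illustration only).
example_timings = {1: 1000.0, 2: 600.0, 4: 400.0}

for n_gpus, runtime in example_timings.items():
    observed = relative_runtime(runtime, example_timings[1])
    ideal = ideal_relative_runtime(n_gpus)
    print(f"{n_gpus} GPU(s): observed {observed:.2f}, ideal {ideal:.2f}")
```

A relative runtime above the ideal value indicates that part of the workflow does not speed up as GPUs are added.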

Cannabinoid receptor 1-G protein complex

Single-GPU performance:

[Figure: Cannabinoid receptor single-GPU performance]

Multi-GPU performance:

[Figure: Cannabinoid receptor multi-GPU performance]

Multi-GPU relative runtime vs. ideal case:

[Figure: Relative GPU runtime, cannabinoid receptor]

The ideal relative runtime is calculated as 1/(number of GPUs).

Single-GPU performance vs. 2022 GPU cost:

[Figure: Approximate 2022 GPU cost vs. cannabinoid receptor runtime, single GPU]

Performance of all three datasets on a single workstation

[Figure: Normalized workstation runtime]

Dataset size and single-GPU runtime

[Figure: Runtime vs. storage space. The green line shows a linear fit.]

[Figure: Runtime vs. number of images. The green line shows a linear fit.]
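A linear fit of this kind can be sketched with an ordinary least-squares line. The image counts and storage sizes below come from the Benchmarking Datasets table and the runtimes from the Recommendations section; we assume those single-workstation runtimes are the values plotted, and we take 1.6T as 1600G. The `linear_fit` helper is ours and may differ from how the plotted fit was produced.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Beta-galactosidase, cannabinoid receptor, inflammasome.
n_images = [24, 2753, 6594]
storage_gb = [3.0, 476.0, 1600.0]  # assumption: 1.6T taken as 1600G
runtime_s = [2820, 67286, 586821]

slope_img, intercept_img = linear_fit(n_images, runtime_s)
slope_gb, intercept_gb = linear_fit(storage_gb, runtime_s)

print(f"runtime ~ {slope_img:.1f} s/image * n_images + {intercept_img:.0f} s")
print(f"runtime ~ {slope_gb:.1f} s/G * storage + {intercept_gb:.0f} s")
```

With only three datasets the fit is loosely constrained, so the slopes are best read as order-of-magnitude guides.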

Recommendations

  • Total runtime grows with the number of images and the storage size of the dataset. On the single workstation, beta-galactosidase takes 2820s (0.78h), cannabinoid receptor 67286s (18.7h), and inflammasome 586821s (163h).
  • Patch Motion Correction (Multi) and Non-Uniform Refinement are the most time-consuming steps. The time for Patch Motion Correction increases with the number of images and the storage size of the dataset.
  • As the number of GPUs increases, the runtime for Patch Motion Correction decreases drastically.
  • Cryo-EM Cluster Node 3 achieves the best overall performance.
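The runtimes in the first recommendation can be checked and normalized per image with a short sketch. The runtimes and image counts are taken from this page; the dictionary layout and helper names are ours.

```python
# Single-workstation runtimes (seconds) and image counts from this page.
DATASETS = {
    "Beta-galactosidase": {"images": 24, "runtime_s": 2820},
    "Cannabinoid receptor 1-G protein complex": {"images": 2753, "runtime_s": 67286},
    "Inflammasome": {"images": 6594, "runtime_s": 586821},
}


def hours(seconds):
    """Convert seconds to hours."""
    return seconds / 3600.0


def seconds_per_image(runtime_s, images):
    """Average total workflow time spent per image."""
    return runtime_s / images


for name, d in DATASETS.items():
    per_image = seconds_per_image(d["runtime_s"], d["images"])
    print(f"{name}: {hours(d['runtime_s']):.2f} h, {per_image:.1f} s/image")
```

The per-image time differs between the three datasets, so the relationship between runtime and image count is a trend and not an exact proportionality.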

Page edited by: Bojia Cynthia Hu, Grinnell College, Student of the 2022 Summer Scholars Program at Harvard Medical School