Building a GPU Home Server for AI

Want to build a GPU home server for running quantized models? Here are some tips and tricks for setting it up.

Components Overview

GPUs

RTX 3090: Two RTX 3090s with NVLink are a common choice for running large AI models. NVLink can provide improved communication between GPUs, though for many AI tasks, traditional PCIe bandwidth is sufficient.
VRAM: With AI models, memory is king. Llama 3 70B fits into 160GB of RAM, while its quantized variants can squeeze into 48GB of VRAM. That is why 2x3090 and 2x4090 setups are so popular for home systems.
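
The memory figures above follow from simple arithmetic: the weights alone take roughly (parameters x bits per weight / 8) bytes. The sketch below is a back-of-the-envelope helper, not a sizing tool. The function name weights_gb is made up for illustration, and the estimate ignores the KV cache, activations, and runtime overhead, so real usage is higher.

```shell
# Rough weight-only memory estimate:
#   parameters (in billions) x bits per weight / 8 = gigabytes
# Integer math, so the result is rounded down to whole GB.
# Assumption: ignores KV cache, activations, and framework overhead.
weights_gb() {
  local params_b=$1 bits=$2
  echo $(( params_b * bits / 8 ))
}

weights_gb 70 16   # 70B model with 16-bit weights
weights_gb 70 4    # 70B model with 4-bit quantization
```

The 4-bit result leaves some headroom within 48GB for context and overhead, which is what makes a dual 24GB setup workable.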

CPU

AMD vs. Intel: Modern Intel CPUs are generally better at power management and at clocking down when idle. However, high-end AMD CPUs like the 7800X3D are also a good choice.
Recommendation: Consider the AMD Ryzen 7800X3D or an Intel i5/i7, depending on your power management preference and budget. The AMD Ryzen 7800X3D and 7900X3D have very large L3 caches, which makes them highly performant on unoptimised single-threaded applications (looking at you, RimWorld).

Motherboard

PCIe Lanes: Ensure the motherboard supports 8x/8x PCIe bifurcation if running dual GPUs. Models like the Asus Creator, ASRock Taichi series (AMD), or any Z790 board (Intel) are good choices.
Integrated NIC: For high-speed networking, consider boards with a 10-gig NIC.
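
Once the cards are installed, it is worth confirming that each GPU actually negotiated the 8x link mentioned under PCIe Lanes. A minimal sketch, assuming the NVIDIA driver and nvidia-smi are installed: the helper name check_links and the MIN_WIDTH threshold are illustrative choices, not part of any tool.

```shell
# Report GPUs whose negotiated PCIe link is narrower than MIN_WIDTH lanes.
# Reads lines of "index, width", the shape of nvidia-smi's CSV output.
MIN_WIDTH=8   # assumption: x8 per card, matching an 8x/8x dual-GPU layout

check_links() {
  while IFS=', ' read -r idx width; do
    if [ "$width" -ge "$MIN_WIDTH" ]; then
      echo "GPU $idx: x$width ok"
    else
      echo "GPU $idx: x$width is below x$MIN_WIDTH"
    fi
  done
}

# Only query the driver when nvidia-smi is actually present.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,pcie.link.width.current --format=csv,noheader | check_links
fi
```

A card that reports a narrower link than expected is often sitting in the wrong slot or sharing lanes with an M.2 drive, so the motherboard manual is the next place to look.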

Memory

RAM: 32GB or 64GB DDR4/DDR5 depending on your workload. Dual-channel configurations are generally sufficient.
Storage: A 2TB PCIe 5.0 NVMe SSD ensures fast read/write speeds.

Power Supply Unit (PSU)

Capacity: A 1200W Platinum or Titanium PSU is recommended. These offer higher efficiency, especially at lower loads, which is critical for reducing idle power consumption.
Connections: Ensure the PSU has enough power connectors (6 total: 2 for the CPU and 4 for the GPUs).

Cooling

Airflow: Ensure adequate spacing and airflow for cooling. Adding dedicated fans or using water cooling can help manage temperatures and improve efficiency.

Additional Components

Networking: A high-speed NIC like Intel X710-DA4 can be beneficial for data transfer.
UPS: Consider an Uninterruptible Power Supply (UPS) to protect against power outages.

Power Management

GPU Power Limiting

Persistence Mode: Enable persistence mode to reduce power usage when GPUs are idle.
sudo nvidia-smi -pm 1

Power Limit: Set power limits to balance performance and efficiency.
sudo nvidia-smi -pl 200 -i 0  # Set power limit to 200W for GPU 0
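
With two cards, the same commands have to be repeated per GPU, and limits set with -pl do not survive a reboot, so a small script is handy. The sketch below is one way to do it, not the only one: LIMIT_W, GPU_IDS, and the APPLY switch are hypothetical names chosen for this example, and 200W is simply the value used above. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: enable persistence mode and apply one power limit to several GPUs.
# LIMIT_W and GPU_IDS are illustrative values, not recommendations.
LIMIT_W=200
GPU_IDS="0 1"

# Dry run by default: print each command. Set APPLY=1 to execute with sudo.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    sudo "$@"
  else
    echo "sudo $*"
  fi
}

limit_gpus() {
  run nvidia-smi -pm 1
  for id in $GPU_IDS; do
    run nvidia-smi -i "$id" -pl "$LIMIT_W"
  done
}

limit_gpus
```

Running it as APPLY=1 ./script.sh applies the settings; running it plain shows what would change, which is a safer default on a machine that is busy with a job.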

CPU and System Power Management

BIOS Settings: Enable power-saving features in the BIOS. Disable unnecessary components.
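Operating System: Use Linux with power management tools to monitor and control power usage. For instance, on machines with a battery, the power_now file in sysfs reports the current power draw in microwatts. A desktop server without a battery will not have this path, so measure at the wall instead.
cat /sys/class/power_supply/BAT0/power_now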
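
The raw power_now value is in microwatts, so it needs dividing by 1,000,000 to read as watts. The sketch below wraps that conversion and checks that the file exists first. The helper name uw_to_w is made up for illustration, and BAT0 is simply the path from the example above; it may be named differently, or be absent, on your machine.

```shell
# Convert a microwatt reading to whole watts (integer division, rounds down).
uw_to_w() {
  echo $(( $1 / 1000000 ))
}

# Assumption: BAT0 is the battery name; it only exists on battery-powered machines.
f=/sys/class/power_supply/BAT0/power_now
if [ -r "$f" ]; then
  echo "Current draw: $(uw_to_w "$(cat "$f")") W"
else
  echo "No $f on this machine; measure at the wall instead."
fi
```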

Example Builds

Build 1

CPU: AMD 7900X3D
Motherboard: Asus X670E Hero
RAM: 64GB DDR5
GPUs: 2x RTX 3090 with NVLink
PSU: Corsair RM1200e
Cooling: Custom water cooling for GPUs, air cooling for CPU
Storage: 2TB PCIe 5.0 NVMe SSD
Networking: Integrated 10-gig NIC

Build 2

CPU: Intel i5
Motherboard: Z790 board with dual PCIe 4.0 x16 slots
RAM: 32GB DDR4
GPUs: 2x RTX 3090 with NVLink
PSU: Be Quiet 1000W Platinum
Cooling: Air cooling with additional fans
Storage: 1TB PCIe 4.0 NVMe SSD
Networking: 1-gig NIC (optional 10-gig upgrade)

Power Consumption

Idle Power: Aim for around 50-90W. Efficient components and power management settings are crucial.
Load Power: Expect around 700-800W under full load with power-limited GPUs. Ensure your PSU can handle peak loads.
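
To see why the idle figure matters for a machine that runs around the clock, it helps to turn watts into yearly energy use. A minimal sketch, assuming a constant draw all year (8760 hours): the function name kwh_per_year is illustrative, and the result is rounded down to whole kWh.

```shell
# Yearly energy use in kWh for a constant draw in watts.
# 8760 = 24 hours x 365 days; integer math rounds down.
kwh_per_year() {
  echo $(( $1 * 8760 / 1000 ))
}

kwh_per_year 50   # low end of the idle target
kwh_per_year 90   # high end of the idle target
```

Multiply the result by your own electricity price per kWh to get a yearly cost for idling alone.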

Miscellaneous Tips

Energy Efficiency: Invest in energy-efficient components and consider renewable energy options like solar panels to offset electricity costs.
Monitoring Tools: Use a power meter to monitor and manage power usage effectively, for example a plug-in electricity usage monitor.

Further Reading

ML in PL Workshop, Generative methods in drug discovery, a practical introduction

Using GPT4 to generate git logs for OpenSource projects in the style of conventional commits via a terminal

Deploying Llama2 on A100 GPUs using vLLM