On-Demand Nodes

Rent GPU compute by the second and pay only for verifiable cycles. Filter the mesh by GPU model, memory, region, and latency. Every deployment exposes OpenAI-compatible endpoints.

Two ways to rent

Spot — pay per cycle

Best for short jobs, experiments, and batch inference. You bid against the current market price. A spot node can be reclaimed by a higher bidder, so reserve the node if you need it to stay yours.

Reserved — pay per hour

Lock a specific node for anywhere from 1 hour to 30 days. Up to 40% cheaper than spot for long-running workloads. A reserved node cannot be reclaimed.
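To make the trade-off concrete, here is a rough Python sketch of the arithmetic. It assumes the spot price stays flat for the whole job and that the full "up to 40%" discount applies, so the reserved figure is a best-case floor rather than a quote. The function names are hypothetical, and the $2.84 / hr rate is the H100 spot price from the pricing snapshot.

```python
# Best-case comparison of spot vs reserved cost for a long-running job.
# Assumption: "up to 40% cheaper" is treated as the maximum possible discount.
MAX_RESERVED_DISCOUNT = 0.40
MIN_HOURS, MAX_HOURS = 1, 30 * 24  # reservations run from 1 hour to 30 days


def spot_cost(spot_rate_per_hr: float, hours: float) -> float:
    """Cost at a constant spot rate (real spot prices move with bids)."""
    return spot_rate_per_hr * hours


def reserved_cost_floor(spot_rate_per_hr: float, hours: float) -> float:
    """Lowest reserved cost possible if the full 40% discount applied."""
    if not MIN_HOURS <= hours <= MAX_HOURS:
        raise ValueError("reservations run from 1 hour to 30 days")
    return spot_cost(spot_rate_per_hr, hours) * (1 - MAX_RESERVED_DISCOUNT)


hours = 72   # a three-day job
rate = 2.84  # H100 spot price in $/hr
print(f"spot:           ${spot_cost(rate, hours):.2f}")            # spot:           $204.48
print(f"reserved floor: ${reserved_cost_floor(rate, hours):.2f}")  # reserved floor: $122.69
```

The actual reserved rate may sit anywhere between the spot cost and this floor, so check the quoted price before committing.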

Filter the mesh

The deployment dashboard lets you filter nodes by:

GPU model — RTX 3090 / 4090 / A4000 / A6000 / A100 / H100 / L40S / M3 Max
GPU memory — 16 / 24 / 48 / 80 / 128 GB
Region — US-East / US-West / EU-Frankfurt / EU-Amsterdam / Asia-Tokyo / Asia-Singapore / Global mesh
Network — minimum bandwidth / max p95 latency
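The filters above combine as a simple conjunction: a node must pass every active filter. The sketch below illustrates that logic locally in Python. The `Node` record, its field names, and the sample values are all hypothetical; they are not StackGPU's schema or API. Memory is treated here as a minimum, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Node:
    """Hypothetical node record; fields mirror the dashboard filters."""
    gpu_model: str
    vram_gb: int
    region: str
    bandwidth_mbps: float
    p95_latency_ms: float


def filter_nodes(
    nodes: Iterable[Node],
    *,
    gpu_models: Optional[Set[str]] = None,  # None means any model
    min_vram_gb: int = 0,
    regions: Optional[Set[str]] = None,     # None means the global mesh
    min_bandwidth_mbps: float = 0.0,
    max_p95_latency_ms: float = float("inf"),
) -> List[Node]:
    """Keep only the nodes that satisfy every active filter."""
    return [
        n for n in nodes
        if (gpu_models is None or n.gpu_model in gpu_models)
        and n.vram_gb >= min_vram_gb
        and (regions is None or n.region in regions)
        and n.bandwidth_mbps >= min_bandwidth_mbps
        and n.p95_latency_ms <= max_p95_latency_ms
    ]


# Made-up sample data for illustration only.
NODES = [
    Node("H100", 80, "US-East", 10000, 12.0),
    Node("RTX 4090", 24, "EU-Frankfurt", 1000, 35.0),
    Node("A100", 80, "EU-Frankfurt", 5000, 18.0),
    Node("L40S", 48, "Asia-Tokyo", 2500, 60.0),
]

matches = filter_nodes(
    NODES,
    min_vram_gb=48,
    regions={"US-East", "EU-Frankfurt"},
    max_p95_latency_ms=20.0,
)
print([n.gpu_model for n in matches])  # ['H100', 'A100']
```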

Pricing snapshot

GPU	VRAM	AWS On-Demand	StackGPU spot	You save
NVIDIA H100	80 GB	$12.29 / hr	$2.84 / hr	−77%
NVIDIA A100	80 GB	$4.10 / hr	$0.96 / hr	−77%
NVIDIA L40S	48 GB	$2.45 / hr	$0.62 / hr	−75%
RTX 4090	24 GB	—	$0.31 / hr	consumer tier
Apple M3 Max	128 GB unified	—	$0.48 / hr	consumer tier
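The "You save" column can be reproduced from the two price columns. This short Python check uses only the figures from the table; the dictionary and function names are illustrative.

```python
# (AWS On-Demand $/hr, StackGPU spot $/hr), copied from the table above.
PRICES = {
    "NVIDIA H100": (12.29, 2.84),
    "NVIDIA A100": (4.10, 0.96),
    "NVIDIA L40S": (2.45, 0.62),
}


def savings_pct(on_demand: float, spot: float) -> int:
    """Percentage saved relative to the on-demand price, rounded to a whole percent."""
    return round((1 - spot / on_demand) * 100)


savings = {gpu: savings_pct(*rates) for gpu, rates in PRICES.items()}
print(savings)  # {'NVIDIA H100': 77, 'NVIDIA A100': 77, 'NVIDIA L40S': 75}
```

The consumer-tier rows have no AWS price listed, so no percentage can be computed for them.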

API access

Every deployment exposes OpenAI-compatible chat-completion and embeddings endpoints out of the box. Existing code that uses the OpenAI SDK works once you point the client at your deployment's base URL and supply your StackGPU key.

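from openai import OpenAI

# Point the standard OpenAI client at your deployment instead of the default API host.
client = OpenAI(
    base_url="https://<your-deployment>.stackgpu.xyz/v1",
    api_key="<your-stackgpu-key>",
)

# Send a chat-completion request to the model served by the deployment.
resp = client.chat.completions.create(
    model="llama-3.1-70b",
    messages=[{"role": "user", "content": "hello"}],
)

# Print the assistant's reply.
print(resp.choices[0].message.content)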
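
The embeddings endpoint can be called through the same client. The helper below is a minimal sketch: `embed_texts` is a hypothetical name, and the embedding model identifier is left as a placeholder because this page does not say which embedding models a deployment serves. It relies only on the OpenAI SDK's `embeddings.create` call and the `data[i].embedding` field of its response.

```python
from typing import List, Sequence


def embed_texts(client, texts: Sequence[str], model: str) -> List[List[float]]:
    """Return one embedding vector per input text, in input order."""
    resp = client.embeddings.create(model=model, input=list(texts))
    return [item.embedding for item in resp.data]


# Usage sketch (needs a live deployment, so it is left commented out):
# from openai import OpenAI
# client = OpenAI(
#     base_url="https://<your-deployment>.stackgpu.xyz/v1",
#     api_key="<your-stackgpu-key>",
# )
# vectors = embed_texts(client, ["hello", "world"], model="<your-embedding-model>")
```

Taking the client as a parameter keeps the helper independent of where the endpoint lives, so the same function works against any OpenAI-compatible server.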
