On-Demand Nodes
Rent GPU compute by the second and pay only for verifiable cycles. Filter the mesh by GPU model, memory, region, bandwidth, and latency. Every deployment exposes OpenAI-compatible endpoints.
Two ways to rent
Spot — pay per cycle
Best for short jobs, experiments, and batch inference. Bid against the current market price. A spot node can be reclaimed by a higher bidder unless you reserve it.
Reserved — pay per hour
Lock a specific node for 1 hour to 30 days. Up to 40% cheaper than spot for long-running workloads. A reserved node cannot be reclaimed.
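To get a feel for when each option pays off, here is a rough cost comparison. It is a sketch under stated assumptions, not VouchGPU's billing logic: the spot price is treated as constant (the real price moves with bids), reserved time is assumed to be billed in whole hours, and the discount is set to the 40% upper bound quoted above. The function names are hypothetical, and the example rate is the H100 spot price from the pricing snapshot.

```python
import math

MAX_RESERVED_HOURS = 30 * 24  # reservations run from 1 hour to 30 days


def spot_cost(rate_per_hour: float, runtime_seconds: float) -> float:
    """Spot cost, assuming per-second billing at a constant hourly rate."""
    return rate_per_hour * runtime_seconds / 3600


def reserved_cost(spot_rate_per_hour: float, runtime_seconds: float,
                  discount: float = 0.40) -> float:
    """Reserved cost, assuming whole-hour billing and a fixed discount off spot.

    The 0.40 default is the best case ("up to 40%"); real discounts may be lower.
    """
    hours = max(1, math.ceil(runtime_seconds / 3600))
    if hours > MAX_RESERVED_HOURS:
        raise ValueError("reservations are limited to 30 days")
    return spot_rate_per_hour * (1 - discount) * hours


h100_spot = 2.84  # $/hr

# A 10-minute experiment: spot wins because reserved pays for a full hour.
print(f"{spot_cost(h100_spot, 600):.2f}")      # 0.47
print(f"{reserved_cost(h100_spot, 600):.2f}")  # 1.70

# A 24-hour job: reserved wins under the assumed 40% discount.
print(f"{spot_cost(h100_spot, 24 * 3600):.2f}")      # 68.16
print(f"{reserved_cost(h100_spot, 24 * 3600):.2f}")  # 40.90
```

Under these assumptions, spot is cheaper for any job shorter than about 36 minutes of a reserved hour, and reserved is cheaper once the job fills most of the hours it books.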
Filter the mesh
The deployment dashboard lets you filter nodes by:
- GPU model — RTX 3090 / 4090 / A4000 / A6000 / A100 / H100 / L40S / M3 Max
- GPU memory — 16 / 24 / 48 / 80 / 128 GB
- Region — US-East / US-West / EU-Frankfurt / EU-Amsterdam / Asia-Tokyo / Asia-Singapore / Global mesh
- Network — minimum bandwidth / max p95 latency
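The filters above combine as a simple conjunction: a node must satisfy every criterion you set. The sketch below illustrates that logic in Python. The `Node` fields, the `filter_nodes` function, and the sample bandwidth and latency figures are all hypothetical and made up for illustration; this is not a VouchGPU API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical record; field names are illustrative only.
    gpu_model: str
    gpu_memory_gb: int
    region: str
    bandwidth_mbps: float
    p95_latency_ms: float


def filter_nodes(nodes, gpu_models=None, min_memory_gb=0, regions=None,
                 min_bandwidth_mbps=0.0, max_p95_latency_ms=float("inf")):
    """Return the nodes that satisfy every filter that was set."""
    return [
        n for n in nodes
        if (gpu_models is None or n.gpu_model in gpu_models)
        and n.gpu_memory_gb >= min_memory_gb
        and (regions is None or n.region in regions)
        and n.bandwidth_mbps >= min_bandwidth_mbps
        and n.p95_latency_ms <= max_p95_latency_ms
    ]


# Sample mesh; network numbers are invented for the example.
nodes = [
    Node("H100", 80, "US-East", 10_000, 40),
    Node("RTX 4090", 24, "EU-Frankfurt", 1_000, 25),
    Node("A100", 80, "Asia-Tokyo", 5_000, 120),
    Node("L40S", 48, "US-West", 2_500, 60),
]

# At least 48 GB of GPU memory and a p95 latency of 100 ms or less.
matches = filter_nodes(nodes, min_memory_gb=48, max_p95_latency_ms=100)
print([n.gpu_model for n in matches])  # ['H100', 'L40S']
```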
Pricing snapshot
| GPU | VRAM | AWS On-Demand | VouchGPU spot | You save |
|---|---|---|---|---|
| NVIDIA H100 | 80 GB | $12.29 / hr | $2.84 / hr | −77% |
| NVIDIA A100 | 80 GB | $4.10 / hr | $0.96 / hr | −77% |
| NVIDIA L40S | 48 GB | $2.45 / hr | $0.62 / hr | −75% |
| RTX 4090 | 24 GB | — | $0.31 / hr | consumer tier |
| Apple M3 Max | 128 GB unified | — | $0.48 / hr | consumer tier |
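The "You save" column can be reproduced from the two price columns. The short check below assumes the saving is computed as one minus the ratio of spot to on-demand price, rounded to the nearest percent; the helper name is ours.

```python
def savings_percent(on_demand: float, spot: float) -> int:
    """Percentage saved relative to the on-demand price, rounded to a whole percent."""
    return round((1 - spot / on_demand) * 100)


# (AWS on-demand $/hr, VouchGPU spot $/hr) from the table above.
prices = {
    "NVIDIA H100": (12.29, 2.84),
    "NVIDIA A100": (4.10, 0.96),
    "NVIDIA L40S": (2.45, 0.62),
}

for gpu, (on_demand, spot) in prices.items():
    print(gpu, savings_percent(on_demand, spot))
# NVIDIA H100 77
# NVIDIA A100 77
# NVIDIA L40S 75
```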
API access
Every deployment exposes OpenAI-compatible chat-completion and embeddings endpoints out of the box. Existing code that uses the OpenAI SDK works once you swap in your deployment's base URL and your VouchGPU API key:
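from openai import OpenAI

# Point the OpenAI SDK at your deployment instead of the default OpenAI API.
client = OpenAI(
    base_url="https://<your-deployment>.vouchgpu.xyz/v1",
    api_key="<your-vouchgpu-key>",
)

# Send a chat-completion request to the model served by the deployment.
resp = client.chat.completions.create(
    model="llama-3.1-70b",
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)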
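The embeddings endpoint can be reached the same way. As a dependency-free illustration, the sketch below builds the raw HTTP request with the standard library, assuming the deployment follows the usual OpenAI REST shape (`POST /v1/embeddings` with a bearer token). The helper name, the deployment name, the key, and the embedding model name are placeholders, not values from VouchGPU; the request is constructed but not sent.

```python
import json
import urllib.request


def build_embeddings_request(deployment: str, api_key: str,
                             model: str, texts: list[str]) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style embeddings request."""
    url = f"https://{deployment}.vouchgpu.xyz/v1/embeddings"
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; substitute your own deployment, key, and model.
req = build_embeddings_request(
    "my-deployment", "sk-example", "example-embedding-model", ["hello"]
)
print(req.get_method(), req.full_url)

# To actually send it (requires a live deployment):
# with urllib.request.urlopen(req) as resp:
#     vectors = [item["embedding"] for item in json.load(resp)["data"]]
```

With the OpenAI SDK, the equivalent call is `client.embeddings.create(model=..., input=[...])` on a client configured as shown earlier.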