# kgpu-gateway — Agent guide

`https://api.kgpu.net` — Lab GPU dispatch for SNUH researchers.

## Access

**One-time setup.** Sign in at <https://kgpu.net> with Google. Once approved,
your dashboard at `/view` displays a personal Bearer token
(format `kgpu_<48 hex>`). Copy the `export` line shown there and paste it into
your shell rc file:

```bash
export KGPU_API_TOKEN=kgpu_xxxxxxxxxxxx...
```

Reopen Claude Code (or any terminal) and `$KGPU_API_TOKEN` is available.
The token persists until you sign out. Signing out revokes it; sign back in to get a new one.

**Every `/v1/*` request** carries:

```
Authorization: Bearer $KGPU_API_TOKEN
```

Inside a rented GPU container, `$KGPU_API_TOKEN` and `$KGPU_API_BASE` are
auto-injected, so workloads can call `/v1/files` from within without embedding secrets.
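
As a minimal sketch of the header rule above, the helper below checks a token against the `kgpu_<48 hex>` format shown on the dashboard and builds the `Authorization` header. The name `auth_headers` and the decision to accept uppercase hex digits are assumptions for illustration, not part of the gateway.

```python
import re

# Token format from the dashboard: "kgpu_" followed by 48 hex digits.
# Accepting uppercase hex is an assumption; the gateway may only issue lowercase.
TOKEN_RE = re.compile(r"kgpu_[0-9a-fA-F]{48}")


def auth_headers(token: str) -> dict:
    """Return the header every /v1/* request carries, or raise on a malformed token."""
    if not TOKEN_RE.fullmatch(token):
        raise ValueError("token does not look like kgpu_<48 hex>")
    return {"Authorization": f"Bearer {token}"}
```

Failing early on a truncated copy-paste is usually easier to debug than a `401` from the gateway.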

**Approval status.** Lab members whose email is on the admin allowlist are
auto-approved on sign-in. Everyone else lands in `status=pending`: they can view
cluster state but do not receive a token until an admin upgrades them.

## Credits

The lab GPU is a shared resource. Every rental burns **credits** per minute so
that idle rentals are economically painful and capacity rotates.

- **$1 == 1500 credits.** New users start at **1,000,000 credits** (≈ $667).
- **Price** for the current GPU class (GB10, ≈ RTX 3090): **500 credits/hour**
  (≈ $0.333/hr — RunPod 3090 Community ballpark). Charged per minute,
  rounded up. A new user's 1,000,000 credits buys ~2,000 hours.
- **Precheck.** `POST /v1/gpus` returns `402 insufficient_credits` if your
  balance can't cover one hour at the current price.
- **Out-of-credits return.** A rental whose user balance hits zero is
  auto-returned (`end_reason: out_of_credits`).
- **Top-up.** Admin only — `POST /v1/admin/users/{id}/credits {"credits": N}`.
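
The arithmetic in the list above can be sketched as follows. The function names are illustrative, and the rounding of the per-minute charge to a whole credit is an assumption: the guide only states that billing is per minute, rounded up.

```python
import math

CREDITS_PER_USD = 1500          # $1 == 1500 credits
PRICE_PER_HOUR_CREDITS = 500    # current GB10 price
STARTING_CREDITS = 1_000_000


def rental_cost_credits(elapsed_seconds: float, price_per_hour: int = PRICE_PER_HOUR_CREDITS) -> int:
    """Credits for a rental, billing whole minutes (rounded up).

    Assumption: the resulting charge is also rounded up to a whole credit.
    """
    minutes = math.ceil(elapsed_seconds / 60)
    return -(-(minutes * price_per_hour) // 60)  # integer ceiling division


def credits_to_usd(credits: int) -> float:
    return credits / CREDITS_PER_USD


def passes_precheck(balance: int, price_per_hour: int = PRICE_PER_HOUR_CREDITS) -> bool:
    """POST /v1/gpus answers 402 unless the balance covers one hour."""
    return balance >= price_per_hour


print(rental_cost_credits(90))                      # 17 (90 s bills as 2 minutes)
print(STARTING_CREDITS // PRICE_PER_HOUR_CREDITS)   # 2000 hours for a new user
print(round(credits_to_usd(STARTING_CREDITS), 2))   # 666.67
```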

Always-on endpoints:

| Verb | Path | Auth | What |
|---|---|---|---|
| GET | `/v1/balance` | Bearer | `{balance, price_per_hour_credits, starting_credits}` |

The `GET /v1/gpus` response also carries `credits.balance` and a per-rental
`credits_charged` / `price_per_hour_credits` snapshot.

## Resources (3) — 16 endpoints total

### `/v1/gpus` — rent (대여) and return (반납) GPUs

A GPU is the rental unit. POST a GPU into existence with the container image you want running on it; do your work over `/v1/runs` and `/v1/files`; DELETE to return it and free the hardware for someone else. The container image is fixed at rental time — to switch images, return this GPU and rent a fresh one.

| Verb | Path | Auth | What |
|---|---|---|---|
| GET | `/v1/gpus` | optional | public: cluster summary. authenticated: + the caller's `my_gpus` list |
| POST | `/v1/gpus` | Bearer | **Rent (대여)** a GPU. The container image is fixed for this rental's lifetime |
| GET | `/v1/gpus/{gpu_id}` | Bearer | rental status + child runs |
| DELETE | `/v1/gpus/{gpu_id}` | Bearer | **Return (반납)** the GPU. The container is destroyed and the hardware goes back to the pool |

### `/v1/runs` — execute commands inside a rented GPU

| Verb | Path | Auth | What |
|---|---|---|---|
| POST | `/v1/runs` | Bearer | start a run; long-polls up to `wait_seconds` (default 10, max 60). Returns sync result if quick, `phase: Running` + run_id otherwise |
| GET | `/v1/runs` | Bearer | list runs (`?gpu_id=` optional filter; without it lists across all your rented GPUs) |
| GET | `/v1/runs/{run_id}` | Bearer | current state (stdout snapshot + phase). `?gpu_id=` is an **optional hint** — without it the gateway scans your rented GPUs to locate the run |
| DELETE | `/v1/runs/{run_id}` | Bearer | SIGTERM the process (logs persist until the GPU is returned). `?gpu_id=` optional hint as above |
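
A small sketch of building a `POST /v1/runs` body, assuming the field names used in the workflows later in this guide (`gpu_id`, `name`, `command`, `wait_seconds`). The helper name is hypothetical, and clamping `wait_seconds` on the client is a design choice here: how the gateway treats out-of-range values is not documented.

```python
import json

DEFAULT_WAIT_SECONDS = 10
MAX_WAIT_SECONDS = 60


def build_run_request(gpu_id: str, name: str, command: list,
                      wait_seconds: int = DEFAULT_WAIT_SECONDS) -> str:
    """Serialize a POST /v1/runs body, clamping wait_seconds into 0..60."""
    if not command or not all(isinstance(arg, str) for arg in command):
        raise ValueError("command must be a non-empty list of strings (argv style)")
    wait = max(0, min(int(wait_seconds), MAX_WAIT_SECONDS))
    return json.dumps({"gpu_id": gpu_id, "name": name,
                       "command": command, "wait_seconds": wait})
```

Passing the command as an argv list avoids a layer of shell quoting; wrap it in `["bash", "-c", "..."]` only when pipes or redirects are needed.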

### `/v1/files` — persistent workspace

Single API surface, two kinds of drives split by URL prefix. Both backed by Cloudflare R2; bytes never proxy through the gateway — every read/write redirects to a presigned R2 URL. Persistent across GPU rentals; the same files are reachable from any GPU you rent next time.

  - `/v1/files/`                    → your **my drive**, scoped to R2 prefix `users/{user_id}/`
  - `/v1/shared/`                   → list of **orgs** you belong to
  - `/v1/shared/<org>/<key>`        → that **org's shared drive** — read by any member, write by uploader/admin role

The shared drive is org-namespaced (e.g. `shared/vitallab/datasets/ecg/...`). You can be a member of multiple orgs; the root listing shows each as a subdirectory. The first segment under `/v1/shared/` MUST be an org slug — there's no flat or default-org fallback.
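
To illustrate the path rule, here is a hedged sketch that splits a shared-drive URL path into its org slug and key. The function name is hypothetical, and the slug pattern (lowercase letters, digits, hyphens) is an assumption based on the "DNS-friendly lowercase" requirement in the admin section.

```python
import re

# Assumption: org slugs are lowercase letters, digits and inner hyphens.
SLUG_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def split_shared_path(path: str) -> tuple:
    """Split '/v1/shared/<org>/<key>' into (org, key); key is '' for an org root."""
    prefix = "/v1/shared/"
    if not path.startswith(prefix):
        raise ValueError("not a shared-drive path")
    org, _, key = path[len(prefix):].partition("/")
    if not SLUG_RE.fullmatch(org):
        raise ValueError("first segment under /v1/shared/ must be an org slug")
    return org, key
```

For example, `split_shared_path("/v1/shared/vitallab/datasets/ecg/foo.zip")` yields `("vitallab", "datasets/ecg/foo.zip")`, while the bare `/v1/shared/` root raises because no org is named.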

**My drive — per-user R2 (free egress, 100 GB quota per user by default)**

| Verb | Path | Auth | What |
|---|---|---|---|
| GET | `/v1/files?prefix=&glob=` | Bearer | list your files (flat). Response includes `quota.used_bytes` / `quota.quota_bytes`. `?prefix=` narrows the R2 list; `?glob=` runs an fnmatch filter on the relative paths |
| GET | `/v1/files/{path}` | Bearer | **302 → R2 presigned GET URL** (`curl -L` follows). `Accept: application/json` or `?json=1` returns `{url, size, expires_in}` instead |
| PUT | `/v1/files/{path}` | Bearer | **307 → R2 presigned PUT URL** with `Connection: close`. Same Expect/no-Expect trade-off as the shared drive (see below). Soft quota: `Content-Length` + current `users/{user_id}/` usage must stay under `KGPU_USER_QUOTA_BYTES` (default 100 GiB) — otherwise 413 |
| DELETE | `/v1/files/{path}` | Bearer | delete a single object, or every object under a prefix when `path` ends with `/` (or when no exact-key match exists) |
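
The listing filters and the soft-quota check can be mirrored on the client, for example to decide whether an upload is worth attempting. This is a sketch under assumptions: the helper names are invented, glob matching is taken to be relative to the drive root, and landing exactly on the quota is treated as allowed, which the gateway may handle differently.

```python
from fnmatch import fnmatchcase

GIB = 2 ** 30
DEFAULT_QUOTA_BYTES = 100 * GIB  # KGPU_USER_QUOTA_BYTES default


def filter_listing(paths, prefix="", glob=None):
    """Client-side mirror of ?prefix= and ?glob= on a flat list of relative paths."""
    selected = [p for p in paths if p.startswith(prefix)]
    if glob is not None:
        # fnmatch-style '*' also matches '/', so '*.csv' reaches into subfolders.
        selected = [p for p in selected if fnmatchcase(p, glob)]
    return selected


def upload_fits(content_length, used_bytes, quota_bytes=DEFAULT_QUOTA_BYTES):
    """True if a PUT of content_length bytes should pass the quota check (else expect 413)."""
    return used_bytes + content_length <= quota_bytes
```

Feed `upload_fits` the `quota.used_bytes` and `quota.quota_bytes` values from the list response rather than the default constant when they are available.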

**Shared drive — R2 (gateway issues redirects/URLs; bytes never proxy through it)**

| Verb | Path | Auth | What |
|---|---|---|---|
| GET | `/v1/shared` | Bearer | list orgs you belong to (HTML for rclone, JSON via `?json=1`) |
| GET | `/v1/shared/<org>/` | Bearer member | directory listing under that org |
| GET | `/v1/shared/<org>/<key>` | Bearer member | **302 → R2 presigned GET URL** (Accept JSON for the envelope shape) |
| POST | `/v1/shared/<org>/<key>` | Bearer + uploader/admin in org | mint presigned PUT URL (for clients without `Expect: 100-continue`) |
| PUT | `/v1/shared/<org>/<key>` | Bearer + uploader/admin in org | **307 → R2 presigned PUT URL** with `Connection: close` |
| DELETE | `/v1/shared/<org>/<key>` | Bearer + uploader/admin in org | delete the R2 object |
| POST | `/v1/shared/<org>/<key>/registered` | Bearer + uploader/admin in org | record an out-of-band PUT in the SharedFile FAT |

Role required per op: read (any member), write (uploader or admin in that org). Membership is managed by admins via `/v1/admin/orgs/<slug>/members` — see below.

Reads require membership in the org and writes require the uploader or admin role in that org. If R2 credentials are not configured, the shared endpoints return 503.

Both writeable paths use the same R2 presign under the hood — `POST` mints a URL for callers that prefer the explicit two-step; `PUT` is the one-shot that works from any HTTP client (just send `Expect: 100-continue` if you don't want to double-upload your own bytes).

**Why GET stays a 302** — Cloudflare R2 egress is free; AWS Seoul egress is $0.126/GB. Proxying GETs through the gateway would convert the high-volume side (downloads) into AWS egress costs that scale linearly with researcher count, so the gateway hands out a signed URL and steps out of the data path.
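
A rough worked example of that cost argument, using the per-GB figure quoted above. The researcher count and download volume are hypothetical numbers chosen only to show the scaling.

```python
AWS_SEOUL_EGRESS_USD_PER_GB = 0.126  # figure quoted in this guide
R2_EGRESS_USD_PER_GB = 0.0           # R2 egress is free


def monthly_egress_usd(researchers: int, gb_per_researcher: float, usd_per_gb: float) -> float:
    """Egress bill if every researcher downloads gb_per_researcher in a month."""
    return researchers * gb_per_researcher * usd_per_gb


# Hypothetical load: 20 researchers pulling 500 GB each per month.
proxied = monthly_egress_usd(20, 500, AWS_SEOUL_EGRESS_USD_PER_GB)
redirected = monthly_egress_usd(20, 500, R2_EGRESS_USD_PER_GB)
print(round(proxied, 2), redirected)  # 1260.0 0.0
```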

**One-shot examples**

```bash
# Download — curl follows the 302 to R2:
curl -L -H "Authorization: Bearer $KGPU_API_TOKEN" \
     "$KGPU_API_BASE/v1/shared/vitallab/datasets/ecg/mit-bih-arrhythmia-database-1.0.0.zip" \
     -o mitdb.zip

# Upload — fast path. `Expect: 100-continue` avoids sending the body twice
# (curl 8.x no longer adds it automatically, so spell it out):
curl -L -H "Authorization: Bearer $KGPU_API_TOKEN" \
     -H "Expect: 100-continue" \
     -T mitdb.zip \
     "$KGPU_API_BASE/v1/shared/vitallab/datasets/ecg/mit-bih-arrhythmia-database-1.0.0.zip"

# Upload — also works without Expect, but the gateway eats your first copy
# of the body before redirecting (you pay 2× bandwidth, we still pay $0):
curl -L -H "Authorization: Bearer $KGPU_API_TOKEN" \
     -T mitdb.zip \
     "$KGPU_API_BASE/v1/shared/vitallab/datasets/ecg/mit-bih-arrhythmia-database-1.0.0.zip"
```

**Two-step upload (Python `requests`, PowerShell, browser fetch — any client where you want a single network transfer without the Expect dance)**

```python
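import os
import requests

API = os.environ.get("KGPU_API_BASE", "https://api.kgpu.net")
tok = os.environ["KGPU_API_TOKEN"]
org, key, local = "vitallab", "datasets/ecg/mitdb.zip", "mitdb.zip"  # example values
# Step 1: mint a presigned PUT URL (the first path segment must be the org slug)
r = requests.post(f"{API}/v1/shared/{org}/{key}",
                  headers={"Authorization": f"Bearer {tok}",
                           "x-content-type": "application/zip"}).json()
with open(local, "rb") as f:  # Step 2: send the bytes straight to R2
    requests.put(r["url"], data=f, headers={"Content-Type": "application/zip"})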
```

## Folders inside a rented GPU

When you rent with the recommended `ghcr.io/vitaldb/kgpu-pytorch:latest`
image, the gateway's pod startup runs `kgpu-bootstrap`, which mounts
two filesystems before the rental is reported `Ready`:

| Path | Backed by | RW? | Use it for |
|---|---|---|---|
| **`/shared/<org>/`** | each org's shared drive (rclone HTTP, vfs-cache full) | RO | reading lab datasets — `duckdb`, `pyarrow`, `wfdb` open files in place. Mount root lists every org you belong to as a subdir |
| **`/files`** | your mydrive (rclone WebDAV, vfs-cache writes) | **RW** | small editable files, results drops, notebook saves |
| `/workspace` | pod ephemeral local disk (≤100 GiB) | RW | training I/O — checkpoints, logs, scratch. Lost on rental end |

`/files` also exposes `shared/<org>/` subdirs (same data, just via the
slower WebDAV path). Stick with `/shared/<org>/` for dataset reads —
e.g. `/shared/vitallab/datasets/ecg/foo.zip`.

**Other images.** If you `POST /v1/gpus` with a different image
(e.g. `nvcr.io/nvidia/pytorch:24.10-py3` or your own), the pod falls back
to plain `sleep infinity` and you mount manually:

```bash
apt-get update -qq && apt-get install -y -qq --no-install-recommends fuse3 ca-certificates curl unzip zstd
curl -fsSL https://rclone.org/install.sh | bash
# then either: kgpu-mount-shared && kgpu-mount-files (if helpers shipped)
# or the raw rclone commands below
```

The raw rclone commands, once the packages above are installed:

```bash
# Read-only shared (HTTP backend: simpler, no metadata round-trips per stat)
mkdir -p /shared
rclone mount :http: /shared \
  --http-url "$KGPU_API_BASE/v1/shared/" \
  --http-headers "Authorization,Bearer $KGPU_API_TOKEN" \
  --read-only --vfs-cache-mode full \
  --vfs-read-chunk-size 16M --vfs-read-chunk-streams 4 --daemon

# Read-write mydrive (WebDAV backend — also exposes shared/ as a subdir, RO)
mkdir -p /files
rclone mount :webdav: /files \
  --webdav-url "$KGPU_API_BASE/v1/files/" \
  --webdav-vendor other \
  --webdav-headers "Authorization,Bearer $KGPU_API_TOKEN" \
  --vfs-cache-mode writes --daemon
```

Cache modes:
- `--vfs-cache-mode full` — random access without downloading the whole file.
  Best for big zips you'll seek into (`duckdb`, `pyarrow.parquet`, `wfdb`).
- `--vfs-cache-mode writes` — writes go through local cache then upload.
  Required for any write workload on WebDAV.
- `--vfs-cache-mode off` — pure stream. Best for one-shot extract:
  `unzip /shared/vitallab/datasets/ecg/foo.zip -d /workspace/`.

To unmount: `fusermount3 -u /shared` (or `/files`).

### ⚠️ Filesystem write hygiene — read this before writing to the mount

The WebDAV mount is **convenient for browsing and dropping small results**,
but it's not a real POSIX filesystem under the hood. R2 is an object store,
and every "write" is at minimum one full object upload. Specifically:

- **Every append re-uploads the entire object.** `tee -a log.txt`,
  `>> file`, sqlite WAL — all of these will silently turn into "download
  → modify → upload entire file" cycles. Don't do this on training logs
  or checkpoints; the mount will crawl
- **Random in-place writes** (`f.seek(N); f.write(...)` on a file opened
  with `'r+'`) force a whole-object rewrite per call. Same trap as above
- **`utime()`/`touch` to update mtime is a no-op**, `chmod`/`chown` ignored,
  hardlinks / symlinks unsupported
- **Two writers to the same path → last-write-wins** (no locking)
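
To see why appends hurt, the following sketch models the upload traffic under the rule stated above that every append re-uploads the whole object. It is a worst-case model: it ignores the matching downloads and any coalescing the rclone cache may do, and the function names are illustrative.

```python
def bytes_uploaded_by_appends(n_appends: int, chunk_bytes: int) -> int:
    """Total upload traffic if every append re-uploads the whole object.

    After the k-th append the object holds k chunks, so the k-th upload
    costs k * chunk_bytes; summing k = 1..n gives n(n+1)/2 chunks.
    """
    return chunk_bytes * n_appends * (n_appends + 1) // 2


def bytes_uploaded_once(n_appends: int, chunk_bytes: int) -> int:
    """Traffic when the same data is written locally and uploaded a single time."""
    return n_appends * chunk_bytes


# 10,000 log lines of 100 bytes each:
print(bytes_uploaded_by_appends(10_000, 100))  # 5000500000 (about 5 GB)
print(bytes_uploaded_once(10_000, 100))        # 1000000 (1 MB)
```

The traffic grows quadratically with the number of appends, which is why a 1 MB log can cost gigabytes of transfer.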

**The pattern that works:**

```bash
# 1. Write everything during training to local pod disk
mkdir -p /workspace/exp42
python train.py --out /workspace/exp42      # checkpoints, logs, plots here

# 2. At the end, package + upload in ONE transfer
tar -czf /tmp/exp42.tgz -C /workspace exp42
curl -L -H "Authorization: Bearer $KGPU_API_TOKEN" \
     -H "Expect: 100-continue" \
     -T /tmp/exp42.tgz \
     "$KGPU_API_BASE/v1/files/exp42-outputs.tgz"
```

Or even simpler — stream the tar straight to the gateway without a temp file:

```bash
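# -X PUT because --data-binary alone would send a POST, and the API expects PUT.
# curl buffers stdin in memory to learn Content-Length, so keep the archive modest.
tar -czf - -C /workspace exp42 | \
  curl -L -X PUT -H "Authorization: Bearer $KGPU_API_TOKEN" \
       -H "Expect: 100-continue" -H "Content-Type: application/gzip" \
       --data-binary @- \
       "$KGPU_API_BASE/v1/files/exp42-outputs.tgz"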
```

**When the WebDAV mount IS the right tool:** Jupyter notebook save,
dropping a small results.csv interactively, editing a config file across
sessions. Not for the training loop's I/O path.

### Per-experiment venv (recommended)

The base image ships with `uv`. Create a venv per experiment so its
package set is isolated and reproducible:

```bash
uv venv /workspace/exp42 --python 3.12
source /workspace/exp42/bin/activate
uv pip install transformers wandb 'torch==2.5.*'
# ... train ...
```

To persist the venv across rentals (so the next rental doesn't reinstall),
tar it onto your mydrive at the end:

```bash
# -X PUT because --data-binary alone would send a POST; curl buffers stdin in memory.
tar -cf - -C /workspace exp42 | zstd -T0 | \
  curl -L -X PUT -H "Authorization: Bearer $KGPU_API_TOKEN" \
       -H "Expect: 100-continue" -H "Content-Type: application/zstd" \
       --data-binary @- \
       "$KGPU_API_BASE/v1/files/envs/exp42.tar.zst"

# Next rental:
curl -L -H "Authorization: Bearer $KGPU_API_TOKEN" \
  "$KGPU_API_BASE/v1/files/envs/exp42.tar.zst" | \
  zstd -dT0 | tar -xf - -C /workspace
source /workspace/exp42/bin/activate
```

### API surface (what the mounts call)

- `GET /v1/files/[<prefix>/]` — HTML directory listing (rclone HTTP) or JSON
  with `?json=1`
- `HEAD /v1/files/<key>` — size, content-type, ETag
- `GET /v1/files/<key>` — 302 to a presigned R2 GET URL
- `PUT /v1/files/<key>` — 307 to a presigned R2 PUT URL
- `DELETE /v1/files/<key>` — single file or `<key>/` prefix delete
- `OPTIONS / PROPFIND / MKCOL / MOVE / COPY` — WebDAV (mydrive only;
  shared/ subtree's writes return 403)

You can drive the same API with `curl`, `requests`, `boto3` (via the
presigned URLs), or any HTTP client.

## Admin: organizations

The shared drive is namespaced by **org** (e.g. `vitallab`, plus whatever
other research groups get onboarded). The endpoints in this section require
an admin Bearer token (the env token or a user with `role=admin`).

```bash
# List orgs
curl -sS -H "Authorization: Bearer $T" https://api.kgpu.net/v1/admin/orgs

# Create an org (slug must be DNS-friendly lowercase)
curl -sS -X POST -H "Authorization: Bearer $T" -H "Content-Type: application/json" \
  https://api.kgpu.net/v1/admin/orgs \
  -d '{"slug":"snuhecg","name":"SNUH ECG Research","owner_user_id":1}'

# Add / change a member's role (admin | uploader | member)
curl -sS -X POST -H "Authorization: Bearer $T" -H "Content-Type: application/json" \
  https://api.kgpu.net/v1/admin/orgs/snuhecg/members \
  -d '{"user_id":2,"role":"uploader"}'

# Remove a member
curl -sS -X DELETE -H "Authorization: Bearer $T" \
  https://api.kgpu.net/v1/admin/orgs/snuhecg/members/2
```

Roles:
- **admin** — manage members + read + write
- **uploader** — read + write
- **member** — read only
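
The role list above maps directly onto a lookup table. This is a sketch of how a client or test harness might encode it; the operation names and `is_allowed` are hypothetical, not gateway identifiers.

```python
# Operations each org role may perform, as listed above.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "manage_members"},
    "uploader": {"read", "write"},
    "member": {"read"},
}


def is_allowed(role, operation: str) -> bool:
    """role is None for callers who are not members of the org."""
    return operation in ROLE_PERMISSIONS.get(role, set())
```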

## Auth

```
Authorization: Bearer $KGPU_API_TOKEN
```

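Inside a rented GPU's container, env vars `$KGPU_API_TOKEN` and `$KGPU_API_BASE` are auto-injected. In the `curl` examples that run from your own machine, `$T` stands for the caller's Bearer token (for example `T=$KGPU_API_TOKEN`).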

## Workflow A — quick one-shot

```bash
# 1. Rent a GPU with the image you want running on it
GPU=$(curl -sS -X POST -H "Authorization: Bearer $T" -H "Content-Type: application/json" \
  https://api.kgpu.net/v1/gpus -d '{
    "name": "check",
    "image": "nvcr.io/nvidia/cuda:13.0.0-base-ubuntu24.04"
  }' | jq -r .gpu_id)
# Wait for the GPU to become Ready (~5–30s if the image isn't already cached on the node)
sleep 10

# 2. Run command (sync return if <10s)
curl -sS -X POST -H "Authorization: Bearer $T" -H "Content-Type: application/json" \
  https://api.kgpu.net/v1/runs -d "{
    \"gpu_id\": \"$GPU\",
    \"name\": \"nvidia-smi\",
    \"command\": [\"nvidia-smi\", \"-L\"]
  }"
# → {phase:"Succeeded", exit_code:0, stdout:"GPU 0: NVIDIA GB10...", ...}

# 3. Return (반납) the GPU
curl -sS -X DELETE -H "Authorization: Bearer $T" https://api.kgpu.net/v1/gpus/$GPU
```
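
The same rent, run, return sequence can be sketched in Python with a `try`/`finally`, so the GPU is handed back even when the run request fails. Everything here is an illustration under assumptions: `send` and `run_once` are hypothetical helpers, responses are assumed to be JSON or empty, and the fixed sleep stands in for a real readiness check, which this guide does not specify.

```python
import json
import os
import time
import urllib.request

API = "https://api.kgpu.net"


def send(method, path, body=None):
    """Tiny JSON transport: returns the decoded response body ({} if empty)."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(API + path, data=data, method=method, headers={
        "Authorization": "Bearer " + os.environ["KGPU_API_TOKEN"],
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else {}


def run_once(image, command, name="check", settle_seconds=10, send=send):
    """Rent a GPU, run one command, and return the GPU even if the run fails."""
    gpu_id = send("POST", "/v1/gpus", {"name": name, "image": image})["gpu_id"]
    try:
        time.sleep(settle_seconds)  # crude wait for Ready, like `sleep 10` above
        return send("POST", "/v1/runs",
                    {"gpu_id": gpu_id, "name": name, "command": command})
    finally:
        # Credits burn per minute, so always hand the GPU back.
        send("DELETE", f"/v1/gpus/{gpu_id}")
```

A call such as `run_once("nvcr.io/nvidia/cuda:13.0.0-base-ubuntu24.04", ["nvidia-smi", "-L"])` mirrors the three curl steps. The `send` parameter exists so the flow can be exercised with a stub instead of the network.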

## Persisting artifacts (important)

**The rented GPU's container disk is ephemeral.** When you return the GPU
(`DELETE /v1/gpus/{id}`) or it auto-returns on idle / `duration_hours` expiry,
everything inside the container — including `/tmp`, the writable layer,
`$HOME`, and `/tmp/kgpu/runs/{run_id}/` — is destroyed. Pod-local run logs
disappear with the pod; if you need them, mirror them through
`/files/.../` (WebDAV mount) or curl-PUT to `/v1/files/...` before
returning the rental. Anything *you* produced (model checkpoints, plots,
predictions, intermediate CSVs, preprocessed datasets) is lost unless you
upload it.

**Rule for agents and humans alike:** upload every artifact you'd want to
see again to `/v1/files/{path}` (my drive — per-user R2, 100 GB quota,
persists across rentals). From inside a rented GPU the call is one curl
(the env vars are auto-injected):

```bash
# Save a checkpoint from inside the rented GPU's container
curl -L -H "Authorization: Bearer $KGPU_API_TOKEN" \
     -H "Expect: 100-continue" \
     -T /workspace/checkpoint.pt \
     "$KGPU_API_BASE/v1/files/checkpoints/exp42/checkpoint.pt"

# Save a results csv
curl -L -H "Authorization: Bearer $KGPU_API_TOKEN" \
     -H "Expect: 100-continue" \
     -T /workspace/results.csv \
     "$KGPU_API_BASE/v1/files/exp42/results.csv"
```

A useful pattern is to make the **last command of your run** a `tar` of your
output dir piped to `/v1/files/`:

```bash
# -X PUT because --data-binary alone would send a POST; curl buffers stdin in memory.
tar -czf - -C /workspace outputs/ | \
  curl -L -X PUT -H "Authorization: Bearer $KGPU_API_TOKEN" \
       -H "Expect: 100-continue" \
       -H "Content-Type: application/gzip" \
       --data-binary @- \
       "$KGPU_API_BASE/v1/files/runs-out/${KGPU_GPU_ID}-outputs.tgz"
```

Anything you skip uploading is gone the moment the pod dies.

## Workflow B — iterative dev with pip cache

```bash
# Rent (대여) a GPU
GPU=$(curl -sS -X POST -H "Authorization: Bearer $T" -H "Content-Type: application/json" \
  https://api.kgpu.net/v1/gpus -d '{
    "name": "exp",
    "image": "nvcr.io/nvidia/pytorch:24.10-py3"
  }' | jq -r .gpu_id)
sleep 15

# Install deps once (cached in container until DELETE /v1/gpus)
curl -sS -X POST -H "Authorization: Bearer $T" -H "Content-Type: application/json" \
  https://api.kgpu.net/v1/runs -d "{
    \"gpu_id\": \"$GPU\",
    \"name\": \"pip\",
    \"command\": [\"pip\", \"install\", \"transformers\", \"datasets\"],
    \"wait_seconds\": 60
  }"

# Upload code — `-L` to follow the 307 to R2; `Expect: 100-continue` so the
# body never touches the gateway (curl 8.x doesn't add it automatically).
curl -L -H "Authorization: Bearer $T" -H "Expect: 100-continue" -T train.py \
  https://api.kgpu.net/v1/files/train.py

# Run (long-poll 10s; if not done, poll). The inner curl follows the 302
# from /v1/files/train.py to R2 (same `-L` flag). The `\$` escapes defer
# expansion to the container, where both variables are auto-injected.
RUN=$(curl -sS -X POST -H "Authorization: Bearer $T" -H "Content-Type: application/json" \
  https://api.kgpu.net/v1/runs -d "{
    \"gpu_id\": \"$GPU\",
    \"name\": \"train\",
    \"command\": [\"bash\", \"-c\", \"curl -sL -H 'Authorization: Bearer '\$KGPU_API_TOKEN \$KGPU_API_BASE/v1/files/train.py > /tmp/train.py && python /tmp/train.py\"]
  }")
# If phase="Running", poll (variables are quoted so the JSON survives word splitting):
RUN_ID=$(echo "$RUN" | jq -r .run_id)
while true; do
  ST=$(curl -sS -H "Authorization: Bearer $T" "https://api.kgpu.net/v1/runs/$RUN_ID?gpu_id=$GPU")
  PHASE=$(echo "$ST" | jq -r .phase)
  [ "$PHASE" != "Running" ] && break
  sleep 10
done
echo "$ST" | jq -r .stdout

# Return (반납) the GPU
curl -X DELETE -H "Authorization: Bearer $T" https://api.kgpu.net/v1/gpus/$GPU
```

## Run response shape

```json
{
  "run_id": "run-<name>-<ts>-<rand>",
  "gpu_id": "gpu-<name>-<ts>-<rand>",
  "phase": "Succeeded" | "Failed" | "Running",
  "exit_code": 0 | null,
  "stdout": "...",
  "stderr": "...",
  "started_at": "2026-05-17T01:00:00Z",
  "completed_at": "2026-05-17T01:00:02Z" | null,
  "elapsed_seconds": 2.3,
  "pid": 12345
}
```

If `phase: "Running"`, follow up with `GET /v1/runs/{run_id}?gpu_id=...` until `phase` changes.
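
A minimal polling sketch for that follow-up. The `fetch` callable is an assumption: it stands for whatever function performs the authenticated `GET` and returns the decoded JSON, so the loop itself stays independent of the HTTP client.

```python
import time
from urllib.parse import urlencode


def wait_for_run(fetch, run_id, gpu_id=None, interval=10.0, timeout=3600.0,
                 sleep=time.sleep):
    """Poll until phase leaves "Running"; return the final run document.

    fetch(path) must return the decoded JSON of GET <path>.
    """
    path = f"/v1/runs/{run_id}"
    if gpu_id:
        # Optional hint; spares the gateway a scan across rented GPUs.
        path += "?" + urlencode({"gpu_id": gpu_id})
    deadline = time.monotonic() + timeout
    while True:
        run = fetch(path)
        if run["phase"] != "Running":
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{run_id} still Running after {timeout}s")
        sleep(interval)
```

Checking `exit_code` on the returned document distinguishes `Succeeded` from `Failed` without parsing `stdout`.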

## Cluster info (no auth)

```bash
curl https://api.kgpu.net/v1/gpus
# → {
#     cluster: [{node, gpu_model, gpu_busy, gpu_total, ready, price_per_hour_credits}],
#     active_total: N,
#     price_per_hour_credits: 500   # current rental price; Phase-1 single price
#   }
```

Authenticated callers also see `credits.balance` and a per-rental
`credits_charged` / `price_per_hour_credits` snapshot under `my_gpus[]`.
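
As a sketch of consuming that summary, the helper below lists ready nodes with spare capacity. It assumes `gpu_busy` and `gpu_total` are integer counts and `ready` is a boolean, which the example output suggests but does not confirm; the function name is illustrative.

```python
def free_slots(summary: dict) -> list:
    """Return (node, gpu_model, free_count) for every ready node with spare GPUs."""
    out = []
    for node in summary.get("cluster", []):
        free = node["gpu_total"] - node["gpu_busy"]
        if node["ready"] and free > 0:
            out.append((node["node"], node["gpu_model"], free))
    return out
```

An empty result does not necessarily mean a rental will be refused, since concurrent rentals can share one physical GPU in Phase 1.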

## Phase 1 limits

- Single GPU node (spark-llm, NVIDIA GB10). Multiple concurrent rentals on the same node share the physical GPU (CUDA context contention)
- `/v1/files/{path}` (my drive) is per-user R2 with a 100 GB default soft quota (`KGPU_USER_QUOTA_BYTES`). Persistent across rentals, free egress, no gateway bandwidth penalty. For datasets shared across an org use `/v1/shared/<org>/*` (uploader/admin role required for writes)
- Run logs in `/tmp/kgpu/runs/{run_id}/` inside the rented GPU. Lost when the GPU is returned — if you want them after, mirror them through the WebDAV mount at `/files/` or curl-PUT them out before returning
- **Idle timeout.** After 1 hour with no `/v1/runs` activity on the rental (no new POST and no run still executing), `idle_warning: true` appears on the GPU in `GET /v1/gpus`. After 2 hours the gateway auto-returns the GPU (`end_reason: auto_idle`). To keep a long training session alive, just keep your run going, since any active run counts as activity. `DELETE /v1/gpus/{gpu_id}` (반납) still works at any time, and `duration_hours` (default 12h, max 168h) remains the hard ceiling
- **Credits.** Every rental costs credits per minute — see the [Credits](#credits) section above. Out-of-credits → auto-return (`end_reason: out_of_credits`)
- Single admin user (`lucid80@gmail.com`). Phase 2: per-user namespaces and isolation
- No GPU resource scheduling (K8s device plugin incompatible with GB10) — concurrent rentals on one node share the physical GPU cooperatively
- The rented GPU's local disk (writable container layer + `/tmp`) is **ephemeral** and **capped at 100 GiB per rental** (`resources.limits.ephemeral-storage`). Exceeding the cap evicts your pod and you lose anything not uploaded. Anything you need to keep across rentals must be PUT to `/v1/files/...` before you return the GPU — see the "Persisting artifacts" section above
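
The idle thresholds in the list above can be expressed as a small classifier, for instance in a watchdog that warns before the gateway reclaims a rental. This is a sketch: the state names reuse the field and `end_reason` values from this guide, while the inclusive boundaries at exactly 1 and 2 hours are assumptions.

```python
from datetime import datetime, timedelta, timezone

IDLE_WARNING_AFTER = timedelta(hours=1)
AUTO_RETURN_AFTER = timedelta(hours=2)


def idle_state(last_activity: datetime, now: datetime) -> str:
    """Classify a rental with no active run by time since its last /v1/runs activity."""
    idle = now - last_activity
    if idle >= AUTO_RETURN_AFTER:
        return "auto_idle"      # the gateway returns the GPU
    if idle >= IDLE_WARNING_AFTER:
        return "idle_warning"   # idle_warning: true in GET /v1/gpus
    return "active"


now = datetime.now(timezone.utc)
print(idle_state(now - timedelta(minutes=90), now))  # idle_warning
```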

## Container env auto-injected

| Var | Value |
|---|---|
| `KGPU_GPU_ID` | this rental's id |
| `KGPU_API_TOKEN` | Bearer token |
| `KGPU_API_BASE` | `https://api.kgpu.net` |
| `NVIDIA_VISIBLE_DEVICES` | `all` |
| `NVIDIA_DRIVER_CAPABILITIES` | `compute,utility` |

## Useful base images

| Purpose | Image |
|---|---|
| **Recommended** — PyTorch + rclone + fuse + duckdb + `kgpu-mount-shared` | `ghcr.io/vitaldb/kgpu-pytorch:latest` |
| Bare CUDA / nvidia-smi | `nvcr.io/nvidia/cuda:13.0.0-base-ubuntu24.04` |
| PyTorch (vanilla NGC, no extras) | `nvcr.io/nvidia/pytorch:24.10-py3` |
| TensorFlow | `nvcr.io/nvidia/tensorflow:24.10-tf2-py3` |

The first image is built in [vitaldb/kgpu-images](https://github.com/vitaldb/kgpu-images)
and saves ~30 s of `apt-get + rclone install` on every rental.