Dashboard
Your Projects
Each project holds multiple training runs.
Loading projects…
Project
Runs
No runs yet. Start training with Runlog.
API Tokens
Usage
Select a project to see usage.
Subscription
Choose your Plan
Upgrade anytime. Downgrade anytime.
Free
$0/mo
For experimenting and getting started.
  • 1 project
  • 1 run per project
  • 60 sec metric delay
  • 7 day history
  • 3 metrics tracked
  • 200 logs/day
Starter
$2/mo
For hobbyists running occasional experiments.
  • 3 projects
  • 20 runs per project
  • 30 sec metric delay
  • 30 day history
  • 6 metrics tracked
  • 1,000 logs/day
Most Popular
Pro
$4/mo
For active researchers and students.
  • 10 projects
  • 100 runs per project
  • 10 sec metric delay
  • 90 day history
  • Unlimited run comparison
  • Email alerts
  • Dead-run email alerts
  • Team workspaces (up to 3)
  • 10 metrics tracked
  • 5,000 logs/day
Elite
$6/mo
For teams and serious training runs.
  • Unlimited projects
  • Unlimited runs
  • Real-time streaming
  • 200 day history
  • Unlimited run comparison
  • Email alerts
  • Dead-run email alerts
  • Unlimited team workspaces
  • 30 metrics tracked
  • 10,000 logs/day
Collaboration
Your Workspace
Invite teammates to collaborate on projects.
Create Workspace
Project
Runs Table
Documentation
RunLogger Docs
Sections
Installation

Copy runlogger.py into your project (recommended). The only required dependency is requests.

pip install requests

Optional (system stats):

pip install psutil nvidia-ml-py
Quick Start
from runlogger import RunLogger

logger = RunLogger(
    base_url="http://localhost:8000",   # your Runlog URL
    project_name="my-project",          # created automatically if missing
    api_token="rl-gb-...",              # from Dashboard → Project → API Tokens
    run_name="run-1",
)

for step in range(1000):
    loss = train_one_step()

    # log any metrics — they become charts automatically
    logger.log(step=step, total_steps=1000, loss=loss, lr=scheduler.get_lr())

    if step % 100 == 0:
        val_loss = evaluate()
        logger.log_eval(step=step, val_loss=val_loss, is_best=val_loss < best)

logger.finish()                         # marks run as completed
API Reference
RunLogger()
base_url (str): Dashboard URL, e.g. http://localhost:8000
api_token (str): Project API token from Dashboard
project_name (str): Project name — auto-created if missing
run_name (str): Name for this run
config (dict): Any metadata — model size, dataset, hyperparams
start_step (int): Set when resuming from checkpoint
tags (list): Run tags, e.g. ["baseline", "v2"]
notes (str): Run description
logger.log(step, **kwargs)

Log any numeric metrics. Calls are throttled to your plan's metric interval. Returns True if sent, False if throttled.

logger.log(step=100, loss=0.5, lr=0.001, accuracy=0.92)
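Conceptually, the throttle is a wall-clock interval check: a call is dropped if too little time has passed since the last successful send. A minimal standalone sketch of that behavior (not RunLogger's actual source; `ThrottledSender` and `min_interval` are illustrative names):

```python
import time

class ThrottledSender:
    """Sketch of interval throttling: at most one send per min_interval seconds."""

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock            # injectable for testing
        self._last_sent = None

    def log(self, step, **metrics):
        now = self.clock()
        if self._last_sent is not None and now - self._last_sent < self.min_interval:
            return False              # throttled — metrics dropped, like log()
        self._last_sent = now
        # a real client would POST {"step": step, **metrics} here
        return True
```

Evaluation metrics skip this check entirely, which is why log_eval is always sent.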
logger.log_eval(step, **kwargs)

Log evaluation metrics. Always sent — bypasses throttle. Use for validation metrics.

logger.log_eval(step=1000, val_loss=0.4, accuracy=0.95,
                is_best=True, checkpoint_path="ckpt/best.pt")
logger.log_artifact(path, name, type)

Record a file artifact. Type: model | dataset | image | file.

logger.log_artifact("checkpoints/best.pt", name="best-model", type="model")
logger.should_pause()

Returns True if Pause was clicked in the dashboard. Check it inside your training loop.

if logger.should_pause():
    save_checkpoint(step)
    logger.finish("paused")
    sys.exit(0)
logger.finish(status)

Mark run as done. Status: completed | crashed | paused.

logger.make_public()

Make run publicly viewable. Returns shareable URL.

url = logger.make_public()
print(f"Share: {url}")
Context manager
with RunLogger(...) as logger:
    for step in range(steps):
        logger.log(step=step, loss=loss)
# auto-calls finish("completed") or finish("crashed")
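The exit behavior follows the standard Python context-manager protocol: `__exit__` receives the exception (if any) and picks the finish status accordingly. A sketch of the pattern (not RunLogger's actual implementation; `RunContext` is a stand-in class):

```python
class RunContext:
    """Sketch: finish("completed") on clean exit, finish("crashed") on exception."""

    def __init__(self):
        self.status = None

    def finish(self, status):
        self.status = status          # a real client would notify the server

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.finish("crashed" if exc_type is not None else "completed")
        return False                  # never swallow the exception
```

Returning False from `__exit__` lets the original exception propagate, so your traceback is preserved while the run is still marked crashed.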
PyTorch
logger = RunLogger(base_url=..., project_name=..., api_token=..., run_name=...)

try:
    for step in range(total_steps):
        loss = criterion(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        logger.log(
            step           = step,
            total_steps    = total_steps,
            train_loss     = loss.item(),
            lr             = scheduler.get_last_lr()[0],
            tokens_per_sec = batch_size * seq_len / step_time,
        )

        if step % eval_every == 0:
            val_loss = evaluate(model, val_loader)
            is_best  = val_loss < best_loss
            if is_best:
                torch.save(model.state_dict(), "best.pt")
            logger.log_eval(step=step, val_loss=val_loss, is_best=is_best,
                            checkpoint_path="best.pt" if is_best else None)

        if logger.should_pause():
            torch.save(model.state_dict(), f"pause_{step}.pt")
            logger.finish("paused")
            break

    logger.finish("completed")
except Exception:
    logger.finish("crashed")
    raise
HuggingFace Trainer
from runlogger import RunLogger
from transformers import Trainer, TrainerCallback

class TTKCallback(TrainerCallback):
    def __init__(self, logger):
        self.logger = logger

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs:
            self.logger.log(step=state.global_step,
                            total_steps=state.max_steps, **logs)

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        if metrics:
            self.logger.log_eval(step=state.global_step, **metrics)

    def on_train_end(self, args, state, control, **kwargs):
        self.logger.finish()

# usage:
logger  = RunLogger(...)
trainer = Trainer(..., callbacks=[TTKCallback(logger)])
Keras / TensorFlow
import tensorflow as tf
from runlogger import RunLogger

class TTKCallback(tf.keras.callbacks.Callback):
    def __init__(self, logger, total_epochs):
        self.logger       = logger
        self.total_epochs = total_epochs

    def on_epoch_end(self, epoch, logs=None):
        self.logger.log(step=epoch, total_steps=self.total_epochs, **(logs or {}))

    def on_train_end(self, logs=None):
        self.logger.finish()

# usage:
logger = RunLogger(...)
model.fit(X, y, epochs=50, callbacks=[TTKCallback(logger, total_epochs=50)])
XGBoost
import xgboost as xgb
from runlogger import RunLogger

class TTKXGBCallback(xgb.callback.TrainingCallback):
    def __init__(self, logger, total_rounds):
        self.logger       = logger
        self.total_rounds = total_rounds

    def after_iteration(self, model, epoch, evals_log):
        metrics = {}
        for data, metric_dict in evals_log.items():
            for name, vals in metric_dict.items():
                metrics[f"{data}_{name}"] = vals[-1]
        self.logger.log(step=epoch, total_steps=self.total_rounds, **metrics)
        return False  # returning False tells XGBoost to continue training

# usage:
logger = RunLogger(...)
bst    = xgb.train(params, dtrain, num_boost_round=100,
                   evals=[(dval, "val")],
                   callbacks=[TTKXGBCallback(logger, 100)])
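The flattening loop in `after_iteration` turns XGBoost's nested `evals_log` (`{dataset: {metric: [per-round values]}}`) into flat metric names, keeping only the latest round's value. Extracted as a standalone function for illustration:

```python
def flatten_evals_log(evals_log):
    """Flatten {"val": {"rmse": [...]}} into {"val_rmse": latest_value}."""
    metrics = {}
    for data, metric_dict in evals_log.items():
        for name, vals in metric_dict.items():
            metrics[f"{data}_{name}"] = vals[-1]
    return metrics

# example shape after three boosting rounds with evals=[(dval, "val")]
evals_log = {"val": {"rmse": [0.9, 0.7, 0.6]}}
```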
Artifacts

Log any file as an artifact — models, datasets, plots, configs. Artifacts appear in the run's Artifacts panel.

# log model checkpoint
logger.log_artifact("checkpoints/best.pt",
                    name="best-model",
                    type="model",
                    metadata={"val_loss": 0.42, "step": 5000})

# log dataset
logger.log_artifact("data/train.csv",
                    name="training-data",
                    type="dataset",
                    metadata={"rows": 50000})

# log evaluation plot
logger.log_artifact("outputs/confusion_matrix.png",
                    name="confusion-matrix",
                    type="image")
Collaboration

Pro and Elite plans support team workspaces. Create a workspace, invite teammates by email, and share projects.

# sharing a run publicly (no login needed to view)
url = logger.make_public()
print(f"Share this run: {url}")
# → https://your-dashboard.com/share/abc123

For team workspaces, go to Workspace in the sidebar. Roles:

owner: Full access, billing, delete workspace
admin: Manage members, all projects
member: Create/edit projects, view all
viewer: Read only
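The four roles form a strict hierarchy, with each role inheriting everything below it. A hypothetical sketch of how such a check could be modeled (role names are from the table above; the action names and function are illustrative, not Runlog's actual code):

```python
# ranks: each role may do everything a lower-ranked role can
ROLE_RANK = {"viewer": 0, "member": 1, "admin": 2, "owner": 3}

# minimum role required for each action (hypothetical action names)
MIN_ROLE = {
    "view_projects":    "viewer",
    "edit_projects":    "member",
    "manage_members":   "admin",
    "manage_billing":   "owner",
    "delete_workspace": "owner",
}

def can(role, action):
    """Return True if the given role is allowed to perform the action."""
    return ROLE_RANK[role] >= ROLE_RANK[MIN_ROLE[action]]
```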
Plans

Plans and limits are managed from the dashboard’s Plans page and stored in the database, so they can change without a redeploy. Upgrade or downgrade anytime.

Runlog
The training monitor — lightweight, self-hosted, beautiful.
Real-time Streaming
Live metric updates via WebSocket. Watch your loss curve move as training happens.
Multi-run Compare
Overlay train and val loss across runs on a single chart. Spot the best experiment instantly.
Team Workspaces
Invite teammates, assign roles, and share projects across your organization.
API Token Auth
Per-project tokens let you log from any machine — Colab, cloud, local — securely.
Checkpoint Tracking
Automatically flags best checkpoints and logs artifact paths alongside your metrics.
Metric Alerts
Set alerts for loss plateaus, threshold crossings, and more. Never miss a crashed run.
Dynamic Charts
Auto-detected from whatever you log. Drag to reorder. Smooth with a slider.
Runs Table
Filter and sort all runs by status, tag, or loss. Built for large experiment histories.
Compare
Run Comparison
Select runs from the same project to overlay on one chart.
Select Runs
Run
running
step 0 / ?
loss lr eta tok/s
Smoothing 0%
Run Details
Tags
Notes
Share
Not public
Config
Checkpoints
No checkpoints yet.
Artifacts
No artifacts logged yet.
Account
Your Settings
Manage your profile, appearance, and preferences.
Font Size
Small / Medium / Large
13px
Appearance
Dark
Light
Timezone
Affects how timestamps are displayed throughout the dashboard.
Account
Your API Tokens
Click a project to manage its tokens.