How TheRPC routes traffic

The marketing line is "one API key, 29 chains, no provider sprawl." That's true. This post is for the engineers who want to know how it actually works under the hood — what the request path looks like, where the latency budget gets spent, and what happens when a provider is down.

The shape of a request

When you POST https://ethereum.therpc.io/<YOUR_KEY> with a JSON-RPC body, the request walks through four layers:

Edge — TLS termination + global anycast routing to the nearest data center. Single-digit millisecond round trip from most regions.
Auth + rate limit — your API key is resolved, the call is metered, and either rate-limited or accepted. Rate-limit state lives in a Cloudflare KV near the edge, so the lookup is ~1 ms.
Router — the brain. Picks an upstream provider for this chain + method + payload size + the current health state of the candidates.
Upstream — the actual chain node (ours, a partner's, or a public RPC for low-traffic chains). Round trip here dominates total latency.

End-to-end, a warm-path eth_blockNumber from Frankfurt to a German upstream is about 15 ms. A warm-path eth_getLogs over a 5k-block range is more like 80–300 ms depending on result size, with most of the budget at the upstream.
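
For reference, here is a minimal sketch of the request that enters this path, using only the Python standard library. The URL shape comes from the example above; YOUR_KEY is a placeholder and build_request is a hypothetical helper name. The sketch builds the request without sending it, so it runs without a key or network access.

```python
import json
import urllib.request

# Endpoint shape taken from the post; YOUR_KEY is a placeholder, not a real key.
ENDPOINT = "https://ethereum.therpc.io/YOUR_KEY"


def build_request(method, params, request_id=1):
    """Build (but do not send) a JSON-RPC 2.0 POST for the endpoint above."""
    body = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("eth_blockNumber", [])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually send it (needs a real key and network access):
# with urllib.request.urlopen(req, timeout=8) as resp:
#     print(json.load(resp))
```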

What the router does

The router is the differentiating piece. Per chain, we maintain a pool of upstream providers — typically 2 to 5. For every incoming call, the router picks one based on:

Method whitelist. Not every upstream exposes every method. trace_* and debug_* need a node with the right tracer compiled in; some upstreams strip them.
Recent latency. EWMA per (provider, method). If provider A has been serving eth_getLogs 30% slower than the rest of the pool for the last 60 seconds, the router weights it down.
Recent error rate. Per (provider, method, response-class). 5xx responses, malformed JSON-RPC errors, and gateway_timeout all count differently. A provider that recently returned a few invalid-block-tag errors is not penalized (that was the client's fault, not the upstream's), while one that returned 502s gets weighted down.
Cost. Some upstreams are cheaper per call than others. We balance latency vs. cost based on your plan tier.

The decision is per-call and stateless on the request hot path. Picking takes microseconds.
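
The selection criteria above can be sketched roughly as follows. This is an illustration under stated assumptions, not the production router: the Upstream class, the weight formula, the smoothing factor alpha, the 50 ms default latency, and the cost_sensitivity knob (standing in for plan tier) are all hypothetical. The idea it shows is a method filter followed by a weighted random pick driven by per-(provider, method) EWMAs.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Upstream:
    name: str
    methods: set            # exact method names this upstream exposes
    cost_per_call: float    # relative cost, arbitrary units (assumption)
    latency_ms: dict = field(default_factory=dict)   # method -> EWMA latency
    error_rate: dict = field(default_factory=dict)   # method -> EWMA of upstream-fault errors

    def observe(self, method, latency_ms, upstream_fault, alpha=0.2):
        """Fold one observation into the per-method EWMAs."""
        prev_lat = self.latency_ms.get(method, latency_ms)
        self.latency_ms[method] = (1 - alpha) * prev_lat + alpha * latency_ms
        prev_err = self.error_rate.get(method, 0.0)
        sample = 1.0 if upstream_fault else 0.0   # client-fault errors count as 0
        self.error_rate[method] = (1 - alpha) * prev_err + alpha * sample


def weight(up, method, cost_sensitivity):
    """Higher is better: fast, reliable, cheap upstreams get more traffic."""
    latency = up.latency_ms.get(method, 50.0)   # assumed default for unseen pairs
    errors = up.error_rate.get(method, 0.0)
    reliability = max(1.0 - errors, 0.01)       # floor keeps every weight positive
    return reliability / (latency * (1.0 + cost_sensitivity * up.cost_per_call))


def pick(pool, method, cost_sensitivity=0.5, rng=random):
    """Filter by method support, then make a weighted random choice."""
    candidates = [u for u in pool if method in u.methods]
    if not candidates:
        raise LookupError(f"no upstream exposes {method}")
    weights = [weight(u, method, cost_sensitivity) for u in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


a = Upstream("a", {"eth_getLogs", "eth_call"}, cost_per_call=1.0)
b = Upstream("b", {"eth_getLogs", "eth_call", "trace_block"}, cost_per_call=2.0)
a.observe("eth_getLogs", 120.0, upstream_fault=False)
b.observe("eth_getLogs", 90.0, upstream_fault=True)
print(pick([a, b], "trace_block").name)   # b, the only upstream exposing trace_block
```

A weighted random pick, rather than always taking the top score, keeps some traffic flowing to every healthy candidate so their EWMAs stay fresh.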

Failover

If the chosen upstream returns a retryable error (5xx, timeout, network reset), the router retries against the next-best candidate. The whole retry budget for a single user request is 2 attempts and 8 seconds wall clock — after that, we surface the failure to you with a structured { error: { code, message, attempts } } response so your client can decide what to do.

We deliberately don't retry forever. Indefinite retries cascade into upstream overload during a real outage. Two attempts catch transient flakes; anything beyond that is a real failure, and you want to know about it.
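
A bounded retry loop along these lines might look like the sketch below. It assumes "2 attempts" means two tries in total, and the names call_with_failover, send, and RetryableError are hypothetical. The error code -32000 is also an assumption, taken from the JSON-RPC range reserved for implementation-defined server errors; the post does not say which code is returned.

```python
import time

MAX_ATTEMPTS = 2         # from the post: two attempts per user request
BUDGET_SECONDS = 8.0     # from the post: eight seconds wall clock


class RetryableError(Exception):
    """Stands in for a 5xx, timeout, or network reset from an upstream."""


def call_with_failover(candidates, send, clock=time.monotonic):
    """Try candidates best-first until one succeeds or the budget runs out.

    `candidates` is an ordered list of upstream names. `send(name, timeout)`
    is a hypothetical transport function that returns a result or raises
    RetryableError.
    """
    deadline = clock() + BUDGET_SECONDS
    attempts = 0
    last_error = "no upstream available"
    for name in candidates[:MAX_ATTEMPTS]:
        remaining = deadline - clock()
        if remaining <= 0:
            break
        attempts += 1
        try:
            return {"result": send(name, remaining)}
        except RetryableError as exc:
            last_error = str(exc)
    # Same shape as the structured failure described above; the code is assumed.
    return {"error": {"code": -32000, "message": last_error, "attempts": attempts}}


def flaky_send(name, timeout):
    """Fake transport: the primary fails, everything else answers."""
    if name == "primary":
        raise RetryableError("502 from primary")
    return "0x10"


print(call_with_failover(["primary", "secondary", "tertiary"], flaky_send))
# {'result': '0x10'}
```

Passing the remaining budget as the per-attempt timeout means a slow first attempt shortens the second one instead of extending the total wait.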

Caching

We cache aggressively at the edge for read methods that are safe to cache. The big wins:

eth_chainId — TTL 24 hours. The chain ID doesn't change.
eth_blockNumber — TTL 1 second. Aggressive but bounded; on chains with block times of a second or longer, a cached value is at most about one block behind.
eth_getBlockByNumber for a block that is already finalized — TTL 1 hour. Once finalized, the value is immutable.
eth_getCode — TTL 24 hours per address. Contract code doesn't change unless redeployed.

About 60% of typical read traffic hits the cache. That's also why your eth_blockNumber heartbeat may return the same value across two consecutive calls a few ms apart. That's not a bug; it's edge caching saving you a compute unit (CU).

We don't cache state-dependent methods (eth_call, eth_getBalance, eth_getLogs with non-finalized blocks, eth_getTransactionReceipt for unfinalized txs). The TTL would have to be zero to be correct, and zero-TTL caching is just overhead.
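
The TTL policy above maps naturally onto a small per-method lookup. The sketch below is a toy in-memory version under stated assumptions: EdgeCache, ttl_for, and the (chain, method, params) key shape are hypothetical names, and a real edge cache would be distributed rather than a local dict. The TTL values themselves are the ones listed above.

```python
import time

# TTLs in seconds, taken from the list above. Unlisted methods are never cached.
TTL_SECONDS = {
    "eth_chainId": 24 * 3600,
    "eth_blockNumber": 1,
    "eth_getCode": 24 * 3600,
}
FINALIZED_BLOCK_TTL = 3600


class EdgeCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}   # key -> (expires_at, value)

    def ttl_for(self, method, finalized=False):
        """Return the TTL for a method; 0 means 'do not cache'."""
        if method == "eth_getBlockByNumber":
            return FINALIZED_BLOCK_TTL if finalized else 0
        return TTL_SECONDS.get(method, 0)

    def get(self, key):
        hit = self._store.get(key)
        if hit and hit[0] > self._clock():
            return hit[1]
        self._store.pop(key, None)   # expired or missing
        return None

    def put(self, key, value, ttl):
        if ttl > 0:                  # zero-TTL entries are simply skipped
            self._store[key] = (self._clock() + ttl, value)


now = [0.0]                                   # fake clock for the demo
cache = EdgeCache(clock=lambda: now[0])
key = ("ethereum", "eth_blockNumber", "[]")   # hypothetical key shape
cache.put(key, "0x10", cache.ttl_for("eth_blockNumber"))
print(cache.get(key))   # 0x10
now[0] = 1.5            # past the 1-second TTL
print(cache.get(key))   # None
```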

What happens during a provider outage

The router's health check is two-tier:

Synthetic probe. Every upstream gets a no-op eth_chainId every 5 seconds. Latency + success ratio feeds the EWMA.
Real-traffic observation. Per-call success/failure feeds back into the same EWMA with a higher weight than the synthetic probe.

If an upstream's success ratio drops below 95% over a 60-second window, it's marked degraded and gets only 10% of new traffic until it recovers. If it drops below 50%, it's marked down and gets 0%. Recovery requires three consecutive successful probes plus a 30-second cooldown.
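
One way to read those thresholds as a state machine is sketched below. Several details are assumptions, because the post does not specify them: a plain sliding-window success ratio stands in for the weighted EWMA, the cooldown is measured from the last observed failure, a minimum sample count guards against tripping on a single early error, and the window is reset on recovery. HealthTracker and its fields are hypothetical names.

```python
from collections import deque

DEGRADED_BELOW = 0.95     # from the post
DOWN_BELOW = 0.50         # from the post
WINDOW_SECONDS = 60       # from the post
RECOVERY_PROBES = 3       # from the post
COOLDOWN_SECONDS = 30     # from the post
MIN_SAMPLES = 5           # assumption: don't judge on too little data
TRAFFIC_SHARE = {"healthy": 1.0, "degraded": 0.10, "down": 0.0}


class HealthTracker:
    def __init__(self):
        self.samples = deque()      # (timestamp, ok) pairs inside the window
        self.state = "healthy"
        self.probe_streak = 0       # consecutive successful synthetic probes
        self.last_failure = None

    def record(self, now, ok, is_probe=False):
        """Fold in one probe or real-traffic result and return the new state."""
        self.samples.append((now, ok))
        while self.samples[0][0] <= now - WINDOW_SECONDS:
            self.samples.popleft()
        if not ok:
            self.last_failure = now
        if is_probe:
            self.probe_streak = self.probe_streak + 1 if ok else 0

        ratio = sum(good for _, good in self.samples) / len(self.samples)
        if self.state != "healthy":
            cooled = now - self.last_failure >= COOLDOWN_SECONDS
            if self.probe_streak >= RECOVERY_PROBES and cooled:
                self.state = "healthy"
                self.samples.clear()    # assumption: start a fresh window
            elif ratio < DOWN_BELOW:
                self.state = "down"
        elif len(self.samples) >= MIN_SAMPLES:
            if ratio < DOWN_BELOW:
                self.state = "down"
            elif ratio < DEGRADED_BELOW:
                self.state = "degraded"
        return self.state


tracker = HealthTracker()
for t in range(10):                 # ten good calls
    tracker.record(t, ok=True)
for t in range(10, 21):             # then eleven failures in a row
    tracker.record(t, ok=False)
print(tracker.state, TRAFFIC_SHARE[tracker.state])   # down 0.0
for t in range(25, 55, 5):          # probes every 5 seconds, all succeeding
    tracker.record(t, ok=True, is_probe=True)
print(tracker.state)                # healthy
```

In this run the third successful probe arrives at t=35, but the upstream only returns to healthy at t=50, once 30 seconds have passed since the last failure at t=20.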

In a real single-provider outage (e.g. Infura goes down for Ethereum), failover is nearly invisible to you: traffic shifts to the remaining pool members within a single rate-limit window. The one thing you may notice is a small (5–15%) latency bump until the surviving providers warm up.

Where the latency budget actually goes

For a typical EU → EU call:

Stage	p50 latency	p99 latency
Edge (TLS, route)	2 ms	8 ms
Auth + KV lookup	1 ms	4 ms
Router decision	<1 ms	<1 ms
Upstream (chain)	8 ms	60 ms
Response + egress	2 ms	20 ms
Total	13 ms	92 ms

The upstream node dominates the p99 (60 of 92 ms in the table above), which is why we maintain redundancy at that layer and not at the others.
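
The arithmetic behind the table can be checked in a few lines. The figures are copied from the table, with "<1 ms" counted as 0; the stage keys are shorthand chosen for this sketch. Summing per-stage p99 values reproduces the table's Total row, though it is only an approximation of the true end-to-end p99.

```python
# (p50 ms, p99 ms) per stage, copied from the table; "<1 ms" is counted as 0.
STAGES = {
    "edge": (2, 8),
    "auth_kv": (1, 4),
    "router": (0, 0),
    "upstream": (8, 60),
    "egress": (2, 20),
}

total_p50 = sum(p50 for p50, _ in STAGES.values())
total_p99 = sum(p99 for _, p99 in STAGES.values())
upstream_share_p50 = STAGES["upstream"][0] / total_p50
upstream_share_p99 = STAGES["upstream"][1] / total_p99

print(total_p50, total_p99)                                  # 13 92
print(f"{upstream_share_p50:.0%} {upstream_share_p99:.0%}")  # 62% 65%
```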

What's coming

A few things on the near-term roadmap that we're already building:

Per-region failover. Today the router is global. We're moving to per-region pools so a transient incident in eu-central-1 doesn't ripple to US traffic.
Subscription routing. WebSocket-based eth_subscribe is currently pinned to a single upstream per connection. We're working on transparent reconnection on the server side so a provider blip doesn't drop your sub.
Custom-method allowlists. Right now we expose a curated set of methods per chain. The next release lets you opt in to additional methods (e.g. flashbots_*) by enabling them on your account.

Questions on the architecture? Reply to this post on Twitter or hop in our Telegram. We post incident reports there too.