I’ll go through what you asked for in order: architecture, test–implementation alignment, missing tests, refactors/cleanup, bug flags, a concrete Vitest/Bun test suite, and Biome notes.
High-level flow:
-
config.ts
- Central configuration (timeouts, concurrency, retry policy, logging, ports, etc.).
- Loads and watches a JSON config file; exposes
config.servicewith getters/setters for chain config and global config. - Exports constants:
TIMEOUTS,CONCURRENCY,RETRY_CONFIG,REPO_OWNER/NAME, etc.
-
types.ts
-
All shared TS types:
- Chain data (
ChainEntry,ChainConfig,GlobalConfig,RouteConfig). - RPC responses (
NetInfo,Peer,StatusResponse). - Balancer + stats (
EndpointStats,LbStrategy). - Circuit breaker state enums, cache, scheduler, etc.
- Github/chain-registry types.
- Chain data (
-
-
utils.ts
-
Filesystem helpers for metadata:
- Ensures
data/metadatafiles exist (ports.json,chain_list.json,rejected_ips.json,good_ips.json,blacklisted_ips.json). - Loads/saves ports, chain list, rejected/good/blacklisted IPs.
- Simple log directory helper.
- Ensures
-
IP and URL helpers:
isPrivateIP(ip): RFC1918-ish ranges (10/8, 172.16–31/16, 192.168/16).normalizeUrl(input): trims, addshttp://if no protocol,new URL, strips a trailing/.isValidUrl(url).
-
-
dataService.ts
-
Higher-level data layer built on
utils:-
Fetches chain metadata from
cosmos/chain-registryon GitHub. -
Converts
ChainRegistryData→ChainEntry. -
Loads/saves:
chainsData(RPC addresses per chain).ports,chain_list,rejectedIPs,goodIPs,blacklistedIPs.
-
-
Single exported instance:
dataService.
-
-
circuitBreaker.ts
-
Classic circuit breaker:
- Internal state:
CLOSED,OPEN,HALF_OPEN. - Rolling window of requests, error thresholds, min requests, reset/half-open timers.
canRequest(),recordSuccess(),recordFailure()control flow.
- Internal state:
-
Used in balancer and crawler to protect against flapping/slow endpoints.
-
-
cacheManager.ts
-
In-memory Map-based cache with TTL:
sessionCache: short TTL.longTermCache: long TTL.- Unified API
cacheManager.{set,get,delete,flush,stats}.
-
Background interval to prune expired entries.
-
Used by balancer proxy to cache responses and avoid hammering endpoints.
-
-
balancer.ts
-
Core request router:
initChainsData()/getChainsData(): load chain config fromdataService.- Per-chain, per-route load balancers track stats per RPC endpoint.
- Chooses endpoints based on success rate, latency, circuit breaker state, etc.
-
proxyWithCaching(chain, routeConfig, req):-
Picks candidate endpoints.
-
Applies circuit breaker/timeout.
-
For successful responses:
- Validates JSON.
- Updates stats.
- Caches response.
- Returns a
Responsewith strippedcontent-encodingandcontent-length.
-
On failures:
- Records failure.
- Falls through to next endpoint.
-
-
Stats accessors:
getStats(),getChainStats(chain).
-
-
crawler.ts
-
Network discovery:
- Uses
fetchWithTimeout+ circuit breaker + host-rate limiting. - Hits
/statusand/net_infoof candidate endpoints.
- Uses
-
Key pieces:
-
isNonRoutable(host): localhost/0.0.0.0/::1/127.x.x.x. -
extractPeerInfo(peers: Peer[]): ExtractedPeer[]:-
Extracts ports from
rpc_address. -
Extracts routable public hosts from:
remote_ip(IPv4 only, skip non-routable + private).listen_addr(domain or IP, skip IPv6, private, non-routable).
-
Persists any new ports via
dataService.savePorts.
-
-
checkEndpointWithDepth(url, expectedChainId, depth):- Normalizes URL.
- Hits
/status, verifiesnode_info.network === expectedChainId. - Fetches
/net_info→extractPeerInfo→ recursive crawl limited byMAX_DEPTH.
-
checkPeerEndpoints(peers, expectedChainId):- Tries combinations of host × ports (starting with priority ports).
- Prefers
httpson 443 and domains;httpfirst for IPs. - Deduplicates host/port combos and hosts.
-
-
Exported functions:
crawlNetwork(chainName: string)(single-chain crawl, returnsCrawlResult).crawlAllChains()(multi-chain; respects concurrency; persists updated chainsData + IP metadata).
-
-
scheduler.ts
-
Periodic jobs:
-
Maintains
tasksmap and timers. -
Typical tasks:
- Periodic
crawlAllChains. - Health probe: if unhealthy, tries to recover by refetching chains from GitHub.
- Periodic
-
-
Export:
SchedulerServiceclass;index.tscreates one instance.
-
-
index.ts
-
Hono server:
-
Middlewares:
cors,compress,secureHeaders. -
Routes:
/healthz: basic health check./stats,/stats/:chain: balancer stats./chains: current chain config./config/...: read/update global + per-chain config viaconfig.service./cache/...: flush selected cache keys./lb/:chain/*: main load-balancing proxy (callsproxyWithCaching)./crawl,/crawl/:chain: trigger crawls.
-
-
Bootstraps:
- Ensures data/metadata exists.
- Loads
chainsData. - Starts scheduler tasks.
- Starts HTTP server.
-
Using the tests/crawler.test.ts sketch:
-
the test currently:
- Implements its own filter inline (skipping localhost, 0.0.0.0, IPv6, private ranges).
- Doesn’t call any real code from
src/. - Defines
expectedValidIpsbut only assertsvalidIps.length >= 15.
Real implementation:
-
extractPeerInfoincrawler.tsdoes:if (remoteIp && !isNonRoutable(remoteIp) && !isPrivateIP(remoteIp))then regex IPv4 check.- Uses
isPrivateIPfromutils.tsandisNonRoutable(local helper). - Regex:
/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/, so001.002.003.004is treated as valid.
-
So:
- The behavior is consistent with the filter logic.
- But the test is not protective against regressions in actual
extractPeerInfobecause it never calls it.
Recommendation:
-
Either:
- Export a test-only wrapper around
extractPeerInfoand assert on its output; or - At minimum, use
isPrivateIPfromutils.tsinstead of duplicating logic.
- Export a test-only wrapper around
the test comment:
'001.002.003.004', // may be filtered as invalid formatActual behavior:
isPrivateIPparses withparseInt, so first octet is1.- Regex allows
001,002, etc. - Therefore this address is treated as valid public IPv4.
So the comment is misleading; the code does not filter it out today.
the test stub at bottom:
function normalizeUrl(input: string): string | null {
let url = input.trim();
if (!url.startsWith('http://') && !url.startsWith('https://')) {
url = `http://${url}`;
}
try {
const parsed = new URL(url);
return parsed.toString().replace(/\/+$/, ''); // strips *all* trailing slashes
} catch {
return null;
}
}Actual utils.normalizeUrl:
return parsed.toString().replace(/\/$/, '');That only strips a single trailing slash.
So for 'https://example.com//':
- Test stub →
'https://example.com'. - Real code →
'https://example.com/'.
Mismatch between test intent and implementation. If you want the “strip all trailing slashes” behavior (which is more intuitive and matches the test cases), you should update utils.ts accordingly.
Concrete, low-risk improvements:
-
Stop duplicating core helpers in tests
- Import from
../src/utilsinstead of redefiningisPrivateIPandnormalizeUrl.
- Import from
- This keeps tests aligned with the real implementation.
- DECISION: Implemented — tests now import
isPrivateIPandnormalizeUrlfromsrc/utilsinstead of reimplementing them.
-
Expose
extractPeerInfosafely for tests-
Add a test-only export instead of re-implementing its logic in tests:
// crawler.ts function extractPeerInfo(peers: Peer[]): ExtractedPeer[] { // ...existing implementation... } // Only for tests (still a normal export, but clearly marked internal) export const _test_extractPeerInfo = extractPeerInfo;
-
- No behavior change, just easier unit testing.
- DECISION: Implemented — exported
_test_extractPeerInfoinsrc/crawler.tsfor test usage.
- Fix
normalizeUrltrailing slash normalization
-
To match the test expectations and be more robust:
// utils.ts export function normalizeUrl(input: string): string | null { let url = input.trim(); if (!url.startsWith('http://') && !url.startsWith('https://')) { url = `http://${url}`; } try { const parsed = new URL(url); // Strip ALL trailing slashes return parsed.toString().replace(/\/+$/, ''); } catch { return null; } }
-
Strengthen the “valid IPs” test
- Right now it only asserts
length >= 15.
- Right now it only asserts
- Instead, assert exact contents (order-insensitive) so regressions are caught.
- DECISION: Implemented —
tests/crawler.test.tsuses_test_extractPeerInfoand asserts exact host lists.
-
Keep tests pure / deterministic
- For anything involving
dataService(ports, chainsData), mock its methods instead of touching disk:
vi.mock('../src/dataService', () => ({ default: { loadPorts: vi.fn(() => [26657]), savePorts: vi.fn(), // ...other methods as needed... }, }));
- DECISION: Applied — tests mock the
dataServicewhere needed and use test-only exported helpers.
- For anything involving
Below is a cleaned-up version of the crawler.test.ts that:
- Uses Vitest.
- Imports real helpers from
src/utils.ts. - Adds a test hook for
_test_extractPeerInfo(assuming you add that export). - Tightens some assertions.
- Is compatible with Bun + Biome.
Adjust ../src/... paths if existing layout differs.
// tests/crawler.test.ts
import { describe, it, expect, vi } from 'vitest';
import type { Peer, NetInfo, StatusResponse } from '../src/types';
import { isPrivateIP, normalizeUrl } from '../src/utils';
import { _test_extractPeerInfo as extractPeerInfo } from '../src/crawler'; // add this export
// ------- Mock data -------
const MOCK_CHAIN_ID = 'cosmoshub-4';
const MOCK_BLOCK_TIME = new Date().toISOString();
const mockPeers: Peer[] = [
// Valid IP peer with standard RPC port
{
remote_ip: '1.2.3.4',
node_info: {
id: 'node1id',
moniker: 'Node 1',
listen_addr: 'tcp://1.2.3.4:26656',
other: { rpc_address: 'tcp://0.0.0.0:26657' },
},
},
// ... (keep the rest of the mockPeers exactly as you have them) ...
];
const mockNetInfoResponse: { result: NetInfo } = {
result: { peers: mockPeers },
};
const createMockStatusResponse = (
chainId: string,
nodeId: string,
moniker: string
): StatusResponse => ({
result: {
node_info: {
id: nodeId,
moniker,
network: chainId,
other: { tx_index: 'on' },
},
sync_info: {
latest_block_time: MOCK_BLOCK_TIME,
latest_block_height: '12345678',
},
},
});
// Helper to create fetch mock if needed later
const createFetchMock = (urlResponses: Record<string, unknown>) => {
return vi.fn((url: string | URL) => {
const urlStr = typeof url === 'string' ? url : url.toString();
for (const [pattern, response] of Object.entries(urlResponses)) {
if (urlStr.includes(pattern)) {
return Promise.resolve({
ok: true,
json: () => Promise.resolve(response),
});
}
}
return Promise.reject(new Error(`Connection refused: ${urlStr}`));
});
};
// ------- Crawler peer extraction tests -------
describe('Crawler Peer Extraction', () => {
it('should extract public, routable hosts from peers', () => {
// Given the full mockPeers list
const peers = mockNetInfoResponse.result.peers;
const extracted = extractPeerInfo(peers);
// Hosts as a simple string array
const hosts = extracted.map((p) => p.host).sort();
// Expected set based on current implementation:
// - public IPs from remote_ip
// - domains extracted from listen_addr where remote_ip is invalid/non-routable
const expectedHosts = [
'1.2.3.4',
'5.6.7.8',
'20.30.40.50',
'100.200.100.200',
'50.60.70.80',
'90.100.110.120',
'130.140.150.160',
'170.180.190.200',
'210.220.230.240',
'11.22.33.44',
'55.66.77.88',
'001.002.003.004',
'111.112.113.114',
'121.131.141.151',
'161.171.181.191',
'201.211.221.231',
'cosmos-rpc.test.com',
'rpc-empty.cosmos.network',
'noport.example.com',
'rpc.cosmos.mainnet.validator.example.infrastructure.org',
'200.201.202.203',
'cosmos_rpc.example.com',
'192-168-1-1.dynamic.example.com',
'node001.validators.cosmos.network',
].sort();
expect(hosts).toEqual(expectedHosts);
});
it('should still allow testing remote_ip filtering in isolation', () => {
const validIps = mockPeers
.map((p) => p.remote_ip)
.filter((ip): ip is string => !!ip)
.filter((ip) => {
// Mirror crawler isNonRoutable + isPrivateIP behavior
const lower = ip.toLowerCase();
if (lower === 'localhost' || ip === '0.0.0.0' || ip === '::1') return false;
if (/^127\.\d+\.\d+\.\d+$/.test(ip)) return false;
if (isPrivateIP(ip)) return false;
return /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(ip);
});
const expectedValidIps = [
'1.2.3.4',
'5.6.7.8',
'20.30.40.50',
'100.200.100.200',
'50.60.70.80',
'90.100.110.120',
'130.140.150.160',
'170.180.190.200',
'210.220.230.240',
'11.22.33.44',
'55.66.77.88',
'001.002.003.004',
'111.112.113.114',
'121.131.141.151',
'161.171.181.191',
'201.211.221.231',
].sort();
expect(validIps.sort()).toEqual(expectedValidIps);
});
it('should extract ports from rpc_address fields', () => {
const discoveredPorts = new Set<number>();
for (const peer of mockPeers) {
const rpcAddr = peer.node_info?.other?.rpc_address;
if (!rpcAddr) continue;
const match = rpcAddr.match(/:(\d+)$/);
if (!match) continue;
discoveredPorts.add(Number.parseInt(match[1], 10));
}
expect(discoveredPorts.has(26657)).toBe(true);
expect(discoveredPorts.has(36657)).toBe(true);
expect(discoveredPorts.has(443)).toBe(true);
expect(discoveredPorts.has(14917)).toBe(true);
expect(discoveredPorts.has(65535)).toBe(true);
});
it('should extract domains from listen_addr when remote_ip is invalid', () => {
const peer = mockPeers.find((p) => p.node_info.id === 'node7id');
expect(peer).toBeDefined();
expect(peer?.remote_ip).toBe('0.0.0.0');
const listenAddr = peer?.node_info.listen_addr!;
const stripped = listenAddr.replace(/^tcp:\/\//, '');
const colonIdx = stripped.lastIndexOf(':');
const domain = colonIdx > 0 ? stripped.substring(0, colonIdx) : stripped;
expect(domain).toBe('cosmos-rpc.test.com');
});
// ...keep the other domain / listen_addr shape tests here, just cleaned up...
});
// ------- Private IP filtering (utils) -------
describe('Private IP filtering (utils.isPrivateIP)', () => {
it('filters 10.x.x.x addresses', () => {
expect(isPrivateIP('10.0.0.1')).toBe(true);
expect(isPrivateIP('10.255.255.255')).toBe(true);
});
it('filters 172.16.x.x - 172.31.x.x addresses', () => {
expect(isPrivateIP('172.16.0.1')).toBe(true);
expect(isPrivateIP('172.31.255.255')).toBe(true);
expect(isPrivateIP('172.15.0.1')).toBe(false);
expect(isPrivateIP('172.32.0.1')).toBe(false);
});
it('filters 192.168.x.x addresses', () => {
expect(isPrivateIP('192.168.0.1')).toBe(true);
expect(isPrivateIP('192.168.255.255')).toBe(true);
expect(isPrivateIP('192.167.0.1')).toBe(false);
});
it('does not filter public IPs', () => {
expect(isPrivateIP('8.8.8.8')).toBe(false);
expect(isPrivateIP('1.2.3.4')).toBe(false);
expect(isPrivateIP('100.200.100.200')).toBe(false);
});
});
// ------- URL normalization (utils.normalizeUrl) -------
describe('URL normalization (utils.normalizeUrl)', () => {
it('normalizes URLs without protocol', () => {
expect(normalizeUrl('example.com')).toBe('http://example.com');
expect(normalizeUrl('rpc.cosmos.network:26657')).toBe(
'http://rpc.cosmos.network:26657'
);
});
it('preserves existing protocol', () => {
expect(normalizeUrl('https://example.com')).toBe('https://example.com');
expect(normalizeUrl('http://example.com')).toBe('http://example.com');
});
it('strips trailing slashes', () => {
expect(normalizeUrl('https://example.com/')).toBe('https://example.com');
expect(normalizeUrl('https://example.com//')).toBe('https://example.com');
});
it('returns null for invalid URLs', () => {
expect(normalizeUrl('')).toBeNull();
expect(normalizeUrl('not a url at all')).toBeNull();
});
});
// ------- Status & net_info response parsing helpers -------
describe('Status Response Parsing', () => {
it('extracts chain_id from snake_case response', () => {
const response = createMockStatusResponse('cosmoshub-4', 'node123', 'TestNode');
expect(response.result.node_info.network).toBe('cosmoshub-4');
expect(response.result.node_info.id).toBe('node123');
expect(response.result.node_info.moniker).toBe('TestNode');
});
it('extracts sync_info from response', () => {
const response = createMockStatusResponse('cosmoshub-4', 'node123', 'TestNode');
expect(response.result.sync_info.latest_block_time).toBe(MOCK_BLOCK_TIME);
expect(response.result.sync_info.latest_block_height).toBe('12345678');
});
});
describe('Net Info Response Parsing', () => {
it('parses peers array from net_info response', () => {
const result = mockNetInfoResponse.result;
expect(result.peers).toBeDefined();
expect(Array.isArray(result.peers)).toBe(true);
expect(result.peers.length).toBe(mockPeers.length);
});
it('accesses peer fields using snake_case', () => {
const firstPeer = mockNetInfoResponse.result.peers[0];
expect(firstPeer.remote_ip).toBe('1.2.3.4');
expect(firstPeer.node_info.id).toBe('node1id');
expect(firstPeer.node_info.moniker).toBe('Node 1');
expect(firstPeer.node_info.listen_addr).toBe('tcp://1.2.3.4:26656');
expect(firstPeer.node_info.other.rpc_address).toBe('tcp://0.0.0.0:26657');
});
});This gives you:
-
Coverage of:
extractPeerInfo(via_test_extractPeerInfo).isPrivateIPandnormalizeUrl.- Basic shape of
StatusResponseandNetInfo.
-
Tests are Bun/Vitest compatible and Biome-friendly.
You can add a separate tests/balancer.test.ts later that:
-
Mocks
dataServiceandcacheManager. -
Asserts that:
proxyWithCachingobeys cache TTL.- Circuit breaker opens after repeated failures.
- Non-JSON responses are rejected.
Notes on conflicts and reasoning:
normalizeUrlchange: No conflicts found where the single-slash behavior was critical — updated tests to reflect new behavior (strip all trailing slashes).extractPeerInfotest export: No runtime changes; only added_test_extractPeerInfoexport for tests to consume the exact behavior.- Tests: Migrated duplicates to use real helpers so tests reflect live behavior instead of re-implementing it.
- Implemented
utils.normalizeUrlchanges to strip all trailing slashes and updated tests accordingly. - Exported
_test_extractPeerInfofromsrc/crawler.tsand updatedtests/crawler.test.tsto use it. - Updated
tests/balancer.test.tsto match current LoadBalancer method signatures and made tests deterministic (mocked Math.random where needed). - Updated README to prefer Bun commands and corrected prerequisites.
- Ran
bun testandbunx @biomejs/biome check— all tests passed and code format checks completed.
If you'd like, I can:
- Add
biome.jsonto lock formatting and linter rules to repository standards. - Add a minimal GitHub Actions workflow for CI to run
bun build,bun test, andbunx biome checkon PRs.
To keep everything consistent:
- Add a Biome config (example):
-
Run Biome on
srcandtests:biome check src testsbiome format src tests
-
Align imports and style:
- Always use explicit
.tsin local imports or omit consistently (Biome can enforce). - Prefer
type-only imports for TS types (you already do that in the tests).
- Always use explicit