Usage Guide

Which Command Should I Use?

Goal	Command	When to use
Quick test data	`generate`	Single file with specific scenarios, encoding, or PMU config
Diverse test corpus	`corpus`	Many files covering the full parameter space for compression testing
Real event data	`convert` / `batch`	Convert LBNL microPMU CSV captures to binary wire format
Real continuous data	`import-archive`	Convert LBNL continuous archive (days of real 120 Hz data)
Debug a .c37 file	`inspect`	View metadata, hex dump, or export parsed data as CSV
Validate compression	`verify`	Compare original vs decompressed files (bit-exact or lossy)

For compression testing, start with corpus to generate a broad test dataset, then supplement with generate for specific edge cases or import-archive for real-world data.

Installation

git clone https://github.com/AZX-PBC-OSS/upmu-dataframes.git
cd upmu-dataframes
cargo build --release

The binary is at target/release/upmu-dataframes. Add it to your PATH or run it directly from that location.

Workflow: LBNL CSV Conversion

Single file conversion

Convert an LBNL microPMU CSV event capture to IEEE C37.118.2 binary:

upmu-dataframes convert \
  --input /path/to/event.csv \
  --output /path/to/event.c37

The output file contains one CFG-2 configuration frame followed by one data frame per CSV row. At 120 Hz reporting rate, a 1-minute event produces 7,200 data frames.

Customizing output

Override defaults for station metadata and encoding:

upmu-dataframes convert \
  --input event.csv \
  --output event.c37 \
  --station-name "SITE-PMU-01" \
  --idcode 42 \
  --format rect \
  --encoding int16 \
  --data-rate 60

--format rect uses rectangular (real + imaginary) instead of polar (magnitude + angle)
--encoding int16 produces smaller frames (54 bytes vs 90 bytes) with scaled integer values
--data-rate sets the declared reporting rate in the CFG-2 frame

Batch conversion

Convert an entire LBNL event library:

upmu-dataframes batch \
  --input-dir /data/lbnl-events \
  --output-dir /data/c37-output

This recursively walks the input directory, finds all CSV files, and writes corresponding .c37 files to the output directory preserving the directory structure (e.g., PV/event1.csv becomes PV/event1.c37).
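
The path mapping can be pictured with a short Python sketch. This is not the tool's implementation; `mirror_output_path` is a hypothetical helper that only shows how the path relative to the input directory carries over to the output directory with the extension swapped.

```python
from pathlib import PurePosixPath


def mirror_output_path(csv_path, input_dir, output_dir):
    """Map an input CSV to its .c37 counterpart, keeping the relative path.

    Hypothetical helper for illustration only.
    """
    relative = PurePosixPath(csv_path).relative_to(PurePosixPath(input_dir))
    return PurePosixPath(output_dir) / relative.with_suffix(".c37")


print(mirror_output_path(
    "/data/lbnl-events/PV/event1.csv",
    "/data/lbnl-events",
    "/data/c37-output",
))
```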

Workflow: Synthetic Data Generation

Generate realistic PMU streams with composable power system scenarios:

# 60 seconds of normal operation (no scenarios)
upmu-dataframes generate \
  --output normal.c37 \
  --duration 60

# Random mix of scenarios (deterministic with seed)
upmu-dataframes generate \
  --output random.c37 \
  --duration 60 \
  --scenario random_mix \
  --seed 42

# Specific scenarios placed sequentially
upmu-dataframes generate \
  --output events.c37 \
  --duration 30 \
  --scenario sag,motor_start,freq_event

# All scenario types for comprehensive testing
upmu-dataframes generate \
  --output all_events.c37 \
  --duration 120 \
  --scenario all

# Int16 encoding with boundary test scenario
upmu-dataframes generate \
  --output int16_test.c37 \
  --encoding int16 \
  --scenario int16_boundary

Baseline data always includes:

Realistic 3-phase voltage/current waveforms with Gaussian noise
Positive sequence computed via Fortescue transformation
Frequency jitter around 60 Hz nominal
ROCOF derived from frequency variation

Advanced Options

Phasor Configuration

Option	Description	Default
`--phasor-count N`	Number of phasors to generate (1-16)	8
`--notation rect`	Use rectangular notation (real/imaginary) instead of polar	polar
`--nominal-freq 50`	50 Hz grid frequency (default is 60 Hz)	60

Data Encoding (per type)

Option	Description	Default
`--encoding TYPE`	Set all data types to same encoding (float32 or int16)	float32
`--phasor-encoding TYPE`	Phasor encoding override	(inherits from --encoding)
`--analog-encoding TYPE`	Analog channel encoding override	(inherits from --encoding)
`--freq-encoding TYPE`	Frequency/ROCOF encoding override	(inherits from --encoding)

Analog Channels

Option	Description	Default
`--analog-count N`	Number of generic analog channels	0
`--analog-preset NAME`	Use preset analogs (substation)	none

Available presets:

substation - 4 realistic channels: TEMP-XFMR, TAP-POS, MW, MVAR

Digital Channels

Option	Description	Default
`--digital-count N`	Number of generic digital words (each has 16 bits)	0
`--digital-preset NAME`	Use preset digital word (breaker)	none

Available presets:

breaker - Breaker/recloser status with fault simulation support

Multi-PMU Streams

Option	Description	Default
`--pmu-count N`	Number of PMUs in aggregated stream (1-256)	1

Generate PDC-style aggregated streams with multiple PMUs:

# 3 PMUs at different buses
upmu-dataframes generate \
  --output substation.c37 \
  --pmu-count 3 \
  --duration 60 \
  --scenario fault_lg

Each PMU gets:

Distinct station name (BASE, BASE-PMU2, BASE-PMU3)
Unique IDCODE (sequential from base)
Independent phase angle offset (5° per PMU)
Shared timestamp and scenario effects across all PMUs
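
As a rough illustration of the per-PMU identity rules listed above, the sketch below derives names, IDCODEs, and angle offsets for an aggregated stream. It assumes the first PMU keeps the base name with a zero offset and that each further PMU adds another 5°; the function name and the exact offset rule are assumptions, not taken from the tool's source.

```python
def pmu_identities(base_name, base_idcode, pmu_count):
    """Return (station_name, idcode, angle_offset_deg) for each PMU.

    Assumption: PMU 1 uses the base name and 0 deg; PMU k adds 5 deg * (k - 1).
    """
    identities = []
    for index in range(pmu_count):
        name = base_name if index == 0 else f"{base_name}-PMU{index + 1}"
        identities.append((name, base_idcode + index, 5.0 * index))
    return identities


for identity in pmu_identities("BASE", 1, 3):
    print(identity)
```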

Data Rate & Timing

Option	Description	Default
`--rate RATE`	Reporting rate in fps, or negative for seconds per frame	120
`--config-count N`	Config count (CFGCNT) in CFG-2 frame	1
`--time-base N`	TIME_BASE for FRACSEC encoding (1-16777215)	1000000
`--cfg2-interval SECS`	Re-emit CFG-2 frame every N seconds (real PMU behavior)	none
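
To see what --time-base changes on the wire, here is a minimal Python sketch of FRACSEC packing as IEEE C37.118.2 defines it: the low 24 bits hold the fraction of a second as a count of 1/TIME_BASE units, and the high 8 bits hold the message time quality. The function names are illustrative and not part of the tool.

```python
def encode_fracsec(fraction, time_base=1_000_000, time_quality=0):
    """Pack a fraction of a second (0 <= fraction < 1) into a 32-bit FRACSEC word."""
    if not 1 <= time_base <= 0xFFFFFF:
        raise ValueError("TIME_BASE must fit in 24 bits")
    count = round(fraction * time_base)
    if not 0 <= count < time_base:
        raise ValueError("fraction must be in [0, 1)")
    return ((time_quality & 0xFF) << 24) | count


def decode_fracsec(word, time_base=1_000_000):
    """Recover the fraction of a second from a FRACSEC word."""
    return (word & 0xFFFFFF) / time_base


print(encode_fracsec(0.5))             # count with the default TIME_BASE
print(encode_fracsec(0.5, 1_048_576))  # count with a power-of-two TIME_BASE
```

The 24-bit count is why the option is limited to 1-16777215.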

Session Framing

Generate realistic TCP session structure with all IEEE C37.118.2 frame types:

Option	Description	Default
`--header-text TEXT`	Prepend HDR frame with ASCII station description	none
`--include-cfg1`	Prepend CFG-1 capabilities frame before CFG-2	off
`--include-cfg3`	Include CFG-3 extended config frame after CFG-2	off
`--include-commands`	Wrap data with CMD TurnOnData/TurnOffData frames	off

Example — full session stream matching real PMU TCP traffic:

upmu-dataframes generate \
  --output session.c37 \
  --duration 60 \
  --header-text "microPMU Station A, Model XY-1234, Firmware v2.5" \
  --include-cfg1 \
  --include-cfg3 \
  --include-commands \
  --cfg2-interval 30 \
  --scenario random_mix \
  --seed 42

This produces: [HDR] [CFG-1] [CFG-2] [CFG-3] [CMD:TurnOn] [Data×7200] [CFG-2 retransmit×2] [CMD:TurnOff]

Examples of --rate values (see Data Rate & Timing above):

--rate 30 - 30 frames per second
--rate -5 - One frame every 5 seconds (slow sampling)
--rate -1 - One frame per second
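
The sign convention for --rate can be written down in a few lines of Python. This is a sketch of the arithmetic only; the function names are hypothetical.

```python
def frame_interval_seconds(rate):
    """Seconds between frames: positive rate = fps, negative = seconds per frame."""
    if rate == 0:
        raise ValueError("rate must be non-zero")
    return 1.0 / rate if rate > 0 else float(-rate)


def frame_count(rate, duration_seconds):
    """Whole frames produced over duration_seconds (integer arithmetic)."""
    if rate == 0:
        raise ValueError("rate must be non-zero")
    if rate > 0:
        return duration_seconds * rate
    return duration_seconds // (-rate)


print(frame_interval_seconds(-5))  # 5.0
print(frame_count(120, 60))        # 7200
```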

Available Scenarios

Scenarios are composable -- multiple can be active simultaneously. When multiple scenarios overlap, their modifiers stack (voltage/current multipliers multiply, offsets add).
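
The stacking rule can be sketched in Python as follows. The `Modifier` type and its field names are hypothetical, chosen only to show that multipliers combine by multiplication and offsets by addition.

```python
from dataclasses import dataclass


@dataclass
class Modifier:
    # Hypothetical fields; 1.0 and 0.0 mean "no effect".
    voltage_mult: float = 1.0
    current_mult: float = 1.0
    freq_offset_hz: float = 0.0


def stack(modifiers):
    """Combine the modifiers of all scenarios active at one instant."""
    combined = Modifier()
    for m in modifiers:
        combined.voltage_mult *= m.voltage_mult
        combined.current_mult *= m.current_mult
        combined.freq_offset_hz += m.freq_offset_hz
    return combined


# A sag (80% retained) overlapping a motor start (88% voltage, 6x current).
overlap = stack([Modifier(voltage_mult=0.80),
                 Modifier(voltage_mult=0.88, current_mult=6.0)])
print(overlap)
```

Under this rule the overlapping sag and motor start leave about 70% of nominal voltage (0.80 × 0.88) while the current multiplier stays at 6x.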

Voltage scenarios:

Name	Description	Default Parameters
`sag`	Voltage reduction (IEC 61000-4-30)	80% retained (20% dip)
`swell`	Voltage elevation	115% of nominal
`cap_switching`	Decaying sinusoidal oscillation per-phase with 120° shift	500 Hz ring, 20 ms decay, 30% amplitude

Current scenarios:

Name	Description	Default Parameters
`motor_start`	High inrush current + voltage dip with exponential decay	6x inrush, 88% voltage
`pv_cloud`	V-shaped current ramp (PV inverter cloud transient)	50% depth
`near_zero_current`	Fixed low current magnitude on all phases	0.5 A

Frequency/angle scenarios:

Name	Description	Default Parameters
`freq_event`	Trapezoidal frequency deviation (ramp up, hold, ramp down)	0.3 Hz deviation, 500 ms ramp
`angle_jump`	Instant phase angle offset on selected phases	10° on phase A

Fault scenarios:

Name	Description	Default Parameters
`fault_lg`	Line-to-ground fault (single phase)	Phase A, 40% severity, 5x fault current
`fault_ll`	Line-to-line fault (two phases)	Phases A/B, 40% severity, 5x fault current
`fault_llg`	Line-to-line-to-ground fault	Phases A/B, 40% severity, 5x fault current

Encoding edge cases:

Name	Description	Default Parameters
`int16_boundary`	Engineers magnitudes near int16 max (±32767)	99% of scale
`timestamp_rollover`	Places event at UTC second boundary	(no parameters)

Status scenarios:

Name	Description	Default Parameters
`sync_loss`	PMU sync loss at start, recovery at 80% duration	Lt1ms time quality during loss
`config_change`	Configuration change pending flag	Set for entire duration
`data_quality`	Data error with modification flag	PmuError type
`trigger`	Trigger detection with reason	MagnitudeLow/High based on co-scenarios
`gps_unlock`	GPS lock lost/recovered with quality ramping	30% ramp up, 40% hold, 30% ramp down
`leap_second`	Leap second pending/occurred flags	First half pending, second half occurred
`invalid_measurement`	NaN/zero phasor values with PmuErrorNoData STAT	NaN for float32, zero for int16

Timing scenarios:

Name	Description	Default Parameters
`missed_frames`	Skip N consecutive frames, creating a SOC gap	Configurable gap count and position
`duplicate_frames`	Emit frames with identical timestamps	3 consecutive duplicated frames
`timing_jitter`	Add Gaussian jitter to FRACSEC timestamps	~100µs stddev

Presets

Name	Description
`random_mix`	3-6 random scenarios with randomized parameters within physical ranges. Use `--seed` for determinism.
`all`	One instance of each scenario type placed sequentially with gaps, including status, timing, and invalid measurement scenarios.

Scenario Placement

Single scenario: placed at 50% of duration
Multiple scenarios: duration divided into equal slots, each scenario placed in its own slot
Presets: handle their own placement (random_mix randomizes, all places sequentially)
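
The placement rules for explicitly listed scenarios can be approximated with the sketch below. It only computes the 50% anchor and the equal slot boundaries; where exactly a scenario sits inside its slot is not specified here, so the sketch does not model it. Both function names are hypothetical.

```python
def single_anchor(duration_s):
    """Time (seconds) at which a lone scenario is placed: 50% of duration."""
    return 0.5 * duration_s


def slot_bounds(duration_s, scenario_count):
    """Equal (start, end) slots in seconds, one per scenario."""
    slot = duration_s / scenario_count
    return [(i * slot, (i + 1) * slot) for i in range(scenario_count)]


# --duration 30 --scenario sag,motor_start,freq_event
for name, bounds in zip(["sag", "motor_start", "freq_event"], slot_bounds(30, 3)):
    print(name, bounds)
```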

Workflow: Test Corpus Generation

Generate a diverse set of .c37 files covering the full parameter space for compression testing:

# Medium corpus (~60 files)
upmu-dataframes corpus \
  --output-dir /data/test-corpus \
  --preset medium \
  --seed 42

The corpus generator produces files with varied:

Data rates (30, 60, 120, 240 fps, plus negative rates for slow sampling)
Durations (1s, 10s, 60s, 300s)
Phasor counts (1, 4, 8, 16)
Encodings (float32, int16, mixed per channel type)
Notation (polar, rectangular)
Nominal frequency (50 Hz, 60 Hz)
Analog channels (none, substation preset)
Digital channels (none, breaker preset)
PMU count (1, 2, 4)
Scenarios (none, sag, random_mix, all, sync_loss, gps_unlock, etc.)
TIME_BASE values (1,000,000 and 1,048,576)
CFG-2 retransmission intervals

A manifest.json is written to the output directory describing each file's parameters for downstream tooling.

Preset	Approx. Files	Description
`small`	~20	Basic axis coverage
`medium`	~60	Full coverage with analog/digital, scenarios, multi-PMU
`large`	~120	Comprehensive with multiple durations and scenario combinations

Workflow: LBNL Continuous Archive

The LBNL continuous archive at powerdata-download.lbl.gov provides ~11.6 days of real-world 120 Hz data from 3 distribution locations. Each location has 12 gzip-compressed channel files (3-phase voltage + current, magnitude + angle).

Downloading archive files

# Download all channels for a location
upmu-dataframes download-archive \
  --location a6_bus1 \
  --output-dir ./vendor/lbnl_archive

# Download only voltage channels
upmu-dataframes download-archive \
  --location bank_514 \
  --channels voltage

# Re-download even if files exist
upmu-dataframes download-archive \
  --location grizzly_bus1_2 \
  --force

Available locations: a6_bus1, bank_514, grizzly_bus1_2.

Files are downloaded sequentially with progress bars. Existing files are skipped unless --force is set. Downloads write to a .part file first for crash safety.
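
The .part convention is a common write-then-rename pattern. A minimal Python sketch of the idea, assuming the final step is an atomic rename (the helper name and the rename detail are assumptions, not confirmed from the tool's source):

```python
import os
from pathlib import Path


def write_atomically(dest, chunks, force=False):
    """Write chunks to '<dest>.part', then rename over dest.

    Returns False without writing when dest exists and force is not set.
    """
    dest = Path(dest)
    if dest.exists() and not force:
        return False
    part = dest.with_name(dest.name + ".part")
    with open(part, "wb") as fh:
        for chunk in chunks:
            fh.write(chunk)
    # A crash before this line leaves only the .part file behind,
    # so a half-written download is never mistaken for a complete one.
    os.replace(part, dest)
    return True
```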

Importing archive data

# Import the full archive (all ~11.6 days)
upmu-dataframes import-archive \
  --input-dir ./vendor/lbnl_archive \
  --location a6_bus1 \
  --output full_capture.c37

# Import 5 minutes starting 1 hour into the dataset
upmu-dataframes import-archive \
  --input-dir ./vendor/lbnl_archive \
  --location a6_bus1 \
  --output slice.c37 \
  --offset 3600 \
  --duration 300

# Split into 1-hour chunks
upmu-dataframes import-archive \
  --input-dir ./vendor/lbnl_archive \
  --location a6_bus1 \
  --output hourly.c37 \
  --chunk-duration 3600

When chunking, output files are named {station}_{chunk_index:04}.c37 (e.g., LBNL-ARCHIVE_0000.c37, LBNL-ARCHIVE_0001.c37). Each chunk starts with its own CFG-2 frame.
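
For scripts that need to locate chunk files afterwards, the naming pattern translates directly to a Python format string. `chunk_filename` is a hypothetical helper mirroring the pattern above.

```python
def chunk_filename(station, chunk_index):
    """Build a chunk file name: station, underscore, 4-digit zero-padded index."""
    return f"{station}_{chunk_index:04}.c37"


print([chunk_filename("LBNL-ARCHIVE", i) for i in range(3)])
```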

Time slicing via --offset works by reading and discarding samples (gzip streams cannot be seeked). At 120 Hz, skipping 1 hour reads through ~432,000 samples per channel (120 × 3,600), which is fast.

The pipeline streams data through gzip decompression without loading the full dataset into memory. Memory usage stays constant regardless of input size.

Archive channel format

Each .gz file is a headerless two-column CSV:

timestamp_nanoseconds,float_value

12 channels per location:

Voltage: L1MAG, L1ANG, L2MAG, L2ANG, L3MAG, L3ANG
Current: C1MAG, C1ANG, C2MAG, C2ANG, C3MAG, C3ANG

Angles are in degrees (converted to radians during import). Timestamps are nanoseconds since Unix epoch, aligned across all channels at 8,333,333 ns intervals (120 Hz).
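
If you want to read a channel file outside the tool, the format is simple enough to stream with the Python standard library. The sketch below is not the importer itself: `read_channel` is a hypothetical helper that skips leading samples by reading and discarding them, and converts degrees to radians for angle channels.

```python
import gzip
import itertools
import math


def read_channel(path, skip_samples=0, is_angle=False):
    """Yield (timestamp_ns, value) pairs from a headerless two-column gzip CSV.

    Rows are decompressed lazily, so memory use does not grow with file size.
    """
    with gzip.open(path, "rt") as fh:
        # gzip streams cannot seek, so skipping means reading and discarding.
        for line in itertools.islice(fh, skip_samples, None):
            line = line.strip()
            if not line:
                continue
            timestamp_text, value_text = line.split(",")
            value = float(value_text)
            if is_angle:
                value = math.radians(value)
            yield int(timestamp_text), value
```

Skipping one hour at 120 Hz corresponds to `skip_samples=120 * 3600`.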

Workflow: Inspecting Binary Files

Summary view

upmu-dataframes inspect --input event.c37

Displays configuration metadata (station name, phasor count, data rate) and data frame statistics (count, time range, CRC status).

Hex dump

upmu-dataframes inspect --input event.c37 --hexdump --max-frames 5

Prints hex + ASCII dump of each frame for protocol debugging. Use --max-frames to limit output.

CSV export

upmu-dataframes inspect --input event.c37 --csv parsed_output.csv

Exports parsed data frames as CSV. Columns adapt to the stream's configuration — phasor count, analog channels, digital words, and multi-PMU layout are all reflected in the output:

soc, fracsec, stat_word, phasor_0_mag, phasor_0_ang, ..., phasor_N_mag, phasor_N_ang, freq_deviation, rocof, analog_0, ..., digital_0, ...

Multi-PMU streams prefix columns with the PMU index. The CSV export is useful for importing C37.118 data into analysis tools (Python, MATLAB, Excel).

Workflow: Compression Verification

After compressing and decompressing a .c37 file, verify its integrity:

Bit-exact verification

upmu-dataframes verify \
  --original original.c37 \
  --decompressed decompressed.c37

Checks that every frame in the decompressed file is byte-identical to the original. Reports pass/fail per frame.

Lossy verification

For lossy compression algorithms, use tolerance-based comparison:

upmu-dataframes verify \
  --original original.c37 \
  --decompressed decompressed.c37 \
  --mode lossy \
  --mag-tolerance 0.01 \
  --angle-tolerance 0.001 \
  --freq-tolerance 0.001

Each data frame is compared field-by-field:

Phasor magnitudes must differ by less than --mag-tolerance
Phasor angles must differ by less than --angle-tolerance radians
Frequency/ROCOF must differ by less than --freq-tolerance Hz
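
The field-by-field check can be sketched in Python as below. The frame representation (a dict with `phasors`, `freq`, and `rocof`) is hypothetical, and the wrap-aware angle distance is an assumption; the tool may compare angles differently.

```python
import math


def angle_distance(a, b):
    """Smallest absolute difference between two angles in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def frame_within_tolerance(original, decoded,
                           mag_tol=0.01, angle_tol=0.001, freq_tol=0.001):
    """True when every field differs by strictly less than its tolerance."""
    if len(original["phasors"]) != len(decoded["phasors"]):
        return False
    for (m0, a0), (m1, a1) in zip(original["phasors"], decoded["phasors"]):
        if not abs(m0 - m1) < mag_tol:
            return False
        if not angle_distance(a0, a1) < angle_tol:
            return False
    return (abs(original["freq"] - decoded["freq"]) < freq_tol
            and abs(original["rocof"] - decoded["rocof"]) < freq_tol)
```

Treating angles as circular avoids flagging a frame where the original is just below +π and the decompressed value is just above -π.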

Compression ratio

Include the compressed file to calculate the compression ratio:

upmu-dataframes verify \
  --original original.c37 \
  --decompressed decompressed.c37 \
  --compressed compressed.bin \
  --json

The --json flag outputs a structured report:

{
  "total_frames": 7200,
  "passed": 7200,
  "failed": 0,
  "cfg_match": true,
  "compression_ratio": 0.45,
  "frame_comparisons": [...]
}

Data Encoding Options

Float32 (default)

Phasors: IEEE 754 single-precision floats (4 bytes magnitude + 4 bytes angle)
Frequency/ROCOF: IEEE 754 single-precision floats
Frame size: 90 bytes (with 8 phasors)
Best for: analysis, round-trip accuracy, protocol compliance testing

Int16

Phasors: signed 16-bit integers with calibrated scale factors
Frequency: deviation from nominal at millihertz resolution (value / 1000 = Hz offset from nominal; an absolute 60,000 mHz would not fit in 16 bits)
ROCOF: centihertz/second resolution (value / 100 = Hz/s)
Frame size: 54 bytes (with 8 phasors)
Best for: field-deployment realism, bandwidth-constrained scenarios, compression testing with smaller frames
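
A small Python sketch of the int16 frequency and ROCOF scaling. It follows the IEEE C37.118.2 convention that the 16-bit FREQ field carries the deviation from nominal in millihertz, which is how it reads "value / 1000" above. The function names and the clamping on encode are assumptions made for illustration.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def decode_freq_deviation_hz(raw):
    """int16 FREQ -> deviation from nominal in Hz (raw is in mHz)."""
    return raw / 1000.0


def decode_freq_hz(raw, nominal_hz=60.0):
    """int16 FREQ -> absolute frequency in Hz."""
    return nominal_hz + decode_freq_deviation_hz(raw)


def decode_rocof_hz_per_s(raw):
    """int16 DFREQ -> ROCOF in Hz/s (raw is in units of 0.01 Hz/s)."""
    return raw / 100.0


def encode_freq_deviation(deviation_hz):
    """Hz deviation -> int16 mHz, clamped to the int16 range (assumed behavior)."""
    raw = round(deviation_hz * 1000.0)
    return max(INT16_MIN, min(INT16_MAX, raw))


print(decode_freq_hz(-25))         # 25 mHz below a 60 Hz nominal
print(decode_rocof_hz_per_s(150))  # 1.5
```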

Phasor Layout

By default, each frame contains 8 phasors (configurable via --phasor-count):

Index	Name	Type	Description
0	VA	Voltage	Phase A voltage
1	VB	Voltage	Phase B voltage
2	VC	Voltage	Phase C voltage
3	V+	Voltage	Positive sequence voltage (Fortescue)
4	IA	Current	Phase A current
5	IB	Current	Phase B current
6	IC	Current	Phase C current
7	I+	Current	Positive sequence current (Fortescue)
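
For reference, the positive-sequence component follows the standard Fortescue definition V+ = (Va + a·Vb + a²·Vc) / 3 with a = 1∠120°. A minimal Python sketch (the function name is illustrative, and this is the textbook formula rather than the tool's code):

```python
import cmath
import math

# Fortescue rotation operator a = 1 at 120 degrees.
A = cmath.rect(1.0, math.radians(120.0))


def positive_sequence(va, vb, vc):
    """Positive-sequence phasor from three complex phase phasors."""
    return (va + A * vb + A * A * vc) / 3.0


# A balanced A-B-C set: equal magnitudes, phases at 0, -120, +120 degrees.
va = cmath.rect(120.0, 0.0)
vb = cmath.rect(120.0, math.radians(-120.0))
vc = cmath.rect(120.0, math.radians(120.0))
print(abs(positive_sequence(va, vb, vc)))
```

For a balanced set the positive-sequence phasor equals the phase A phasor, which makes a convenient sanity check on exported data.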

Typical File Sizes

For the default configuration (8 phasors, 0 analog, 0 digital, 1 PMU):

Duration	Rate	Encoding	Approx. Size
1 min	120 fps	float32	~634 KB
1 min	120 fps	int16	~380 KB
1 hour	120 fps	float32	~37 MB
1 hour	120 fps	int16	~22 MB

Formula (bytes): CFG-2 size + (frame_size × rate × duration_seconds), with rate in frames per second

Frame size varies with configuration: adding analog channels adds 4 bytes (float32) or 2 bytes (int16) per channel. Digital words add 2 bytes each. Multi-PMU streams multiply the per-PMU data portion by the PMU count. Different phasor counts scale the phasor portion (8 bytes per float32 phasor, 4 bytes per int16).
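
The sizing rules can be combined into a small estimator. The sketch below assumes the standard C37.118.2 data frame layout: 14 bytes of common header (SYNC, FRAMESIZE, IDCODE, SOC, FRACSEC), a 2-byte STAT word plus FREQ and DFREQ per PMU, and a 2-byte CRC. The function names are hypothetical, and the CFG-2 size is left as a parameter because it depends on the channel configuration.

```python
WIDTH = {"float32": 4, "int16": 2}


def data_frame_size(phasors=8, analogs=0, digitals=0, pmus=1,
                    phasor_enc="float32", analog_enc="float32",
                    freq_enc="float32"):
    """Size in bytes of one data frame."""
    per_pmu = 2                                  # STAT word
    per_pmu += phasors * 2 * WIDTH[phasor_enc]   # two values per phasor
    per_pmu += 2 * WIDTH[freq_enc]               # FREQ + DFREQ
    per_pmu += analogs * WIDTH[analog_enc]
    per_pmu += digitals * 2                      # 16-bit digital words
    return 14 + pmus * per_pmu + 2               # header + PMU blocks + CRC


def estimated_file_size(duration_s, rate_fps, frame_size, cfg2_size=0):
    """CFG-2 size + frame_size * rate * duration, in bytes."""
    return cfg2_size + frame_size * rate_fps * duration_s


print(data_frame_size())                                       # 90
print(data_frame_size(phasor_enc="int16", freq_enc="int16"))   # 54
print(estimated_file_size(60, 120, data_frame_size()))         # 648000
```

The 648,000 bytes for one minute of float32 data is about 633 KiB before the CFG-2 frame is added, in line with the table above.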

Troubleshooting

Invalid encoding/notation: Passing an unrecognised value to --encoding, --phasor-encoding, --analog-encoding, --freq-encoding, or --notation exits with an error listing valid options.

Duration too short for --scenario all: The generator warns and places as many scenarios as fit within the duration; the remaining scenarios are dropped. Use --duration 120 or longer for all to ensure every scenario has room.

Negative rate with scenarios: Scenario durations are calculated in frames, not seconds. With --rate -5 (one frame every 5 seconds), a 0.2-second sag becomes less than 1 frame and is clamped to a minimum of 1 frame.
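
The clamping arithmetic looks roughly like this in Python. The rounding mode is an assumption; only the minimum of 1 frame comes from the behavior described above, and `scenario_frames` is a hypothetical name.

```python
def scenario_frames(scenario_duration_s, rate):
    """Frames covered by a scenario, never fewer than 1."""
    frames_per_second = rate if rate > 0 else 1.0 / (-rate)
    return max(1, round(scenario_duration_s * frames_per_second))


print(scenario_frames(0.2, 120))  # 24 frames at the default rate
print(scenario_frames(0.2, -5))   # clamped to 1 frame
```

At --rate -5 that single frame spans 5 seconds of stream time, so short events look much longer than their nominal duration.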

Missing LBNL CSV test data: Tests that reference vendor/pmu_event_library/ require the git submodule to be initialised: git submodule update --init.
