| Goal | Command | When to use |
|---|---|---|
| Quick test data | `generate` | Single file with specific scenarios, encoding, or PMU config |
| Diverse test corpus | `corpus` | Many files covering the full parameter space for compression testing |
| Real event data | `convert` / `batch` | Convert LBNL microPMU CSV captures to binary wire format |
| Real continuous data | `import-archive` | Convert LBNL continuous archive (days of real 120 Hz data) |
| Debug a .c37 file | `inspect` | View metadata, hex dump, or export parsed data as CSV |
| Validate compression | `verify` | Compare original vs decompressed files (bit-exact or lossy) |

For compression testing, start with corpus to generate a broad test dataset, then supplement with generate for specific edge cases or import-archive for real-world data.
git clone https://github.com/AZX-PBC-OSS/upmu-dataframes.git
cd upmu-dataframes
cargo build --release

The binary is at target/release/upmu-dataframes. Add it to your PATH or run it directly.
Convert an LBNL microPMU CSV event capture to IEEE C37.118.2 binary:
upmu-dataframes convert \
--input /path/to/event.csv \
--output /path/to/event.c37

The output file contains one CFG-2 configuration frame followed by one data frame per CSV row. At a 120 Hz reporting rate, a 1-minute event produces 7,200 data frames.
Override defaults for station metadata and encoding:
upmu-dataframes convert \
--input event.csv \
--output event.c37 \
--station-name "SITE-PMU-01" \
--idcode 42 \
--format rect \
--encoding int16 \
--data-rate 60

- `--format rect` uses rectangular (real + imaginary) instead of polar (magnitude + angle)
- `--encoding int16` produces smaller frames (54 bytes vs 90 bytes) with scaled integer values
- `--data-rate` sets the declared reporting rate in the CFG-2 frame
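
To make the polar versus rectangular distinction concrete, here is a minimal Python sketch of the conversion between the two notations. It is only an illustration of the math, not the tool's code; the function names are hypothetical.

```python
import cmath
import math


def polar_to_rect(magnitude: float, angle_rad: float) -> tuple[float, float]:
    """Convert (magnitude, angle in radians) to (real, imaginary)."""
    z = cmath.rect(magnitude, angle_rad)
    return z.real, z.imag


def rect_to_polar(real: float, imag: float) -> tuple[float, float]:
    """Convert (real, imaginary) to (magnitude, angle in radians)."""
    return cmath.polar(complex(real, imag))


# A 120 V phasor at 30 degrees, converted to rectangular and back.
re_part, im_part = polar_to_rect(120.0, math.radians(30.0))
mag, ang = rect_to_polar(re_part, im_part)
```

Both notations carry the same information; only the on-wire field pair differs.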
Convert an entire LBNL event library:
upmu-dataframes batch \
--input-dir /data/lbnl-events \
--output-dir /data/c37-output

This recursively walks the input directory, finds all CSV files, and writes corresponding .c37 files to the output directory, preserving the directory structure (e.g., PV/event1.csv becomes PV/event1.c37).
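
The path mapping described above can be sketched in Python as follows. This is an assumption about how the mapping behaves, based only on the PV/event1.csv example; `output_path_for` and `plan_batch` are hypothetical helpers, not part of the tool.

```python
from pathlib import Path


def output_path_for(csv_path: Path, input_dir: Path, output_dir: Path) -> Path:
    """Mirror the CSV's location under input_dir into output_dir, with a .c37 suffix."""
    relative = csv_path.relative_to(input_dir)
    return output_dir / relative.with_suffix(".c37")


def plan_batch(input_dir: Path, output_dir: Path) -> dict[Path, Path]:
    """Map every CSV found recursively under input_dir to its planned output path."""
    return {
        csv_path: output_path_for(csv_path, input_dir, output_dir)
        for csv_path in sorted(input_dir.rglob("*.csv"))
    }


example = output_path_for(
    Path("/data/lbnl-events/PV/event1.csv"),
    Path("/data/lbnl-events"),
    Path("/data/c37-output"),
)
```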
Generate realistic PMU streams with composable power system scenarios:
# 60 seconds of normal operation (no scenarios)
upmu-dataframes generate \
--output normal.c37 \
--duration 60
# Random mix of scenarios (deterministic with seed)
upmu-dataframes generate \
--output random.c37 \
--duration 60 \
--scenario random_mix \
--seed 42
# Specific scenarios placed sequentially
upmu-dataframes generate \
--output events.c37 \
--duration 30 \
--scenario sag,motor_start,freq_event
# All scenario types for comprehensive testing
upmu-dataframes generate \
--output all_events.c37 \
--duration 120 \
--scenario all
# Int16 encoding with boundary test scenario
upmu-dataframes generate \
--output int16_test.c37 \
--encoding int16 \
--scenario int16_boundary

Baseline data always includes:
- Realistic 3-phase voltage/current waveforms with Gaussian noise
- Positive sequence computed via Fortescue transformation
- Frequency jitter around 60 Hz nominal
- ROCOF derived from frequency variation
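
For reference, the positive-sequence component mentioned above follows the standard Fortescue definition, V+ = (Va + a·Vb + a²·Vc) / 3 with a = 1∠120°. The sketch below shows that formula in Python; the 7200 V magnitude and the helper names are illustrative assumptions, not values taken from the generator.

```python
import cmath
import math

# Fortescue rotation operator: unit phasor at +120 degrees.
A = cmath.rect(1.0, math.radians(120.0))


def phasor(magnitude: float, angle_deg: float) -> complex:
    """Build a complex phasor from magnitude and angle in degrees."""
    return cmath.rect(magnitude, math.radians(angle_deg))


def positive_sequence(va: complex, vb: complex, vc: complex) -> complex:
    """Positive-sequence component of a three-phase set (ABC rotation)."""
    return (va + A * vb + A * A * vc) / 3


# A perfectly balanced set: the positive sequence equals phase A.
va, vb, vc = phasor(7200.0, 0.0), phasor(7200.0, -120.0), phasor(7200.0, 120.0)
v_pos = positive_sequence(va, vb, vc)
```

For a balanced set the result equals the phase A phasor; any imbalance between phases shows up as a deviation from it.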

| Option | Description | Default |
|---|---|---|
| `--phasor-count N` | Number of phasors to generate (1-16) | 8 |
| `--notation rect` | Use rectangular notation (real/imaginary) instead of polar | polar |
| `--nominal-freq 50` | 50 Hz grid frequency (default is 60 Hz) | 60 |


| Option | Description | Default |
|---|---|---|
| `--encoding TYPE` | Set all data types to the same encoding (float32 or int16) | float32 |
| `--phasor-encoding TYPE` | Phasor encoding override | (inherits from `--encoding`) |
| `--analog-encoding TYPE` | Analog channel encoding override | (inherits from `--encoding`) |
| `--freq-encoding TYPE` | Frequency/ROCOF encoding override | (inherits from `--encoding`) |


| Option | Description | Default |
|---|---|---|
| `--analog-count N` | Number of generic analog channels | 0 |
| `--analog-preset NAME` | Use preset analogs (substation) | none |

Available presets:
- `substation`: 4 realistic channels (TEMP-XFMR, TAP-POS, MW, MVAR)

| Option | Description | Default |
|---|---|---|
| `--digital-count N` | Number of generic digital words (each has 16 bits) | 0 |
| `--digital-preset NAME` | Use preset digital word (breaker) | none |

Available presets:
- `breaker`: Breaker/recloser status with fault simulation support

| Option | Description | Default |
|---|---|---|
| `--pmu-count N` | Number of PMUs in aggregated stream (1-256) | 1 |

Generate PDC-style aggregated streams with multiple PMUs:
# 3 PMUs at different buses
upmu-dataframes generate \
--output substation.c37 \
--pmu-count 3 \
--duration 60 \
--scenario fault_lg

Each PMU gets:
- Distinct station name (BASE, BASE-PMU2, BASE-PMU3)
- Unique IDCODE (sequential from base)
- Independent phase angle offset (5° per PMU)
- Shared timestamp and scenario effects across all PMUs

| Option | Description | Default |
|---|---|---|
| `--rate RATE` | Reporting rate in fps, or negative for seconds per frame | 120 |
| `--config-count N` | Config count (CFGCNT) in CFG-2 frame | 1 |
| `--time-base N` | TIME_BASE for FRACSEC encoding (1-16777215) | 1000000 |
| `--cfg2-interval SECS` | Re-emit CFG-2 frame every N seconds (real PMU behavior) | none |

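
As a worked illustration of how TIME_BASE relates to FRACSEC: in C37.118.2 the fraction of a second equals the 24-bit FRACSEC count divided by TIME_BASE. The Python sketch below computes that count for a given frame index. The rounding rule and the function name are assumptions; the generator may round differently.

```python
def fracsec_count(frame_index: int, rate_fps: int, time_base: int = 1_000_000) -> int:
    """Fraction-of-second count for the Nth frame within a second.

    Assumes round-to-nearest; the count must fit in 24 bits.
    """
    if not 1 <= time_base <= 0xFFFFFF:
        raise ValueError("TIME_BASE must be in 1..16777215")
    if not 0 <= frame_index < rate_fps:
        raise ValueError("frame_index must fall within one second")
    return round(frame_index * time_base / rate_fps)


second_frame_decimal = fracsec_count(1, 120, 1_000_000)  # 1/120 s in microseconds
half_second_decimal = fracsec_count(60, 120, 1_000_000)
second_frame_binary = fracsec_count(1, 120, 1_048_576)   # TIME_BASE = 2**20
```

With a TIME_BASE of 1,000,000 the 120 fps frame interval is not an exact integer count, so consecutive FRACSEC deltas alternate between neighbouring values.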
Generate realistic TCP session structure with all IEEE C37.118.2 frame types:

| Option | Description | Default |
|---|---|---|
| `--header-text TEXT` | Prepend HDR frame with ASCII station description | none |
| `--include-cfg1` | Prepend CFG-1 capabilities frame before CFG-2 | off |
| `--include-cfg3` | Include CFG-3 extended config frame after CFG-2 | off |
| `--include-commands` | Wrap data with CMD TurnOnData/TurnOffData frames | off |

Example — full session stream matching real PMU TCP traffic:
upmu-dataframes generate \
--output session.c37 \
--duration 60 \
--header-text "microPMU Station A, Model XY-1234, Firmware v2.5" \
--include-cfg1 \
--include-cfg3 \
--include-commands \
--cfg2-interval 30 \
--scenario random_mix \
--seed 42

This produces: [HDR] [CFG-1] [CFG-2] [CFG-3] [CMD:TurnOn] [Data×7200] [CFG-2 retransmit×2] [CMD:TurnOff]
Examples of `--rate` values:
- `--rate 30`: 30 frames per second
- `--rate -5`: one frame every 5 seconds (slow sampling)
- `--rate -1`: one frame per second
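
The sign convention for the rate can be sketched in Python as below: positive values are frames per second, negative values are seconds per frame. The helper names are hypothetical and the handling of a zero rate is an assumption.

```python
def frame_interval_seconds(rate: int) -> float:
    """Seconds between consecutive frames for a signed rate value."""
    if rate == 0:
        raise ValueError("rate must be non-zero")
    return 1.0 / rate if rate > 0 else float(-rate)


def frame_count(rate: int, duration_seconds: int) -> int:
    """Number of frames emitted over a whole-second duration."""
    if rate == 0:
        raise ValueError("rate must be non-zero")
    if rate > 0:
        return duration_seconds * rate
    return duration_seconds // -rate


frames_at_120_fps = frame_count(120, 60)
frames_at_slow_rate = frame_count(-5, 60)
```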
Scenarios are composable -- multiple can be active simultaneously. When multiple scenarios overlap, their modifiers stack (voltage/current multipliers multiply, offsets add).
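
A minimal Python sketch of the stacking rule, assuming each scenario contributes a set of multipliers and offsets. The `Modifier` type and its field names are hypothetical; the numeric values reuse the documented defaults for sag, motor_start, and freq_event.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Modifier:
    """Hypothetical per-scenario effect at one instant."""
    voltage_mult: float = 1.0
    current_mult: float = 1.0
    freq_offset_hz: float = 0.0


def stack(modifiers: Iterable[Modifier]) -> Modifier:
    """Combine overlapping scenarios: multipliers multiply, offsets add."""
    voltage, current, freq = 1.0, 1.0, 0.0
    for m in modifiers:
        voltage *= m.voltage_mult
        current *= m.current_mult
        freq += m.freq_offset_hz
    return Modifier(voltage, current, freq)


sag = Modifier(voltage_mult=0.80)
motor_start = Modifier(voltage_mult=0.88, current_mult=6.0)
freq_event = Modifier(freq_offset_hz=0.3)
combined = stack([sag, motor_start, freq_event])
```

Under this rule, a sag overlapping a motor start leaves roughly 70% of nominal voltage (0.80 × 0.88) rather than the lower of the two values.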
Voltage scenarios:

| Name | Description | Default Parameters |
|---|---|---|
| `sag` | Voltage reduction (IEC 61000-4-30) | 80% retained (20% dip) |
| `swell` | Voltage elevation | 115% of nominal |
| `cap_switching` | Decaying sinusoidal oscillation per-phase with 120° shift | 500 Hz ring, 20 ms decay, 30% amplitude |

Current scenarios:

| Name | Description | Default Parameters |
|---|---|---|
| `motor_start` | High inrush current + voltage dip with exponential decay | 6x inrush, 88% voltage |
| `pv_cloud` | V-shaped current ramp (PV inverter cloud transient) | 50% depth |
| `near_zero_current` | Fixed low current magnitude on all phases | 0.5 A |

Frequency/angle scenarios:

| Name | Description | Default Parameters |
|---|---|---|
| `freq_event` | Trapezoidal frequency deviation (ramp up, hold, ramp down) | 0.3 Hz deviation, 500 ms ramp |
| `angle_jump` | Instant phase angle offset on selected phases | 10° on phase A |

Fault scenarios:

| Name | Description | Default Parameters |
|---|---|---|
| `fault_lg` | Line-to-ground fault (single phase) | Phase A, 40% severity, 5x fault current |
| `fault_ll` | Line-to-line fault (two phases) | Phases A/B, 40% severity, 5x fault current |
| `fault_llg` | Line-to-line-to-ground fault | Phases A/B, 40% severity, 5x fault current |

Encoding edge cases:

| Name | Description | Default Parameters |
|---|---|---|
| `int16_boundary` | Engineers magnitudes near int16 max (±32767) | 99% of scale |
| `timestamp_rollover` | Places event at UTC second boundary | (no parameters) |

Status scenarios:

| Name | Description | Default Parameters |
|---|---|---|
| `sync_loss` | PMU sync loss at start, recovery at 80% duration | Lt1ms time quality during loss |
| `config_change` | Configuration change pending flag | Set for entire duration |
| `data_quality` | Data error with modification flag | PmuError type |
| `trigger` | Trigger detection with reason | MagnitudeLow/High based on co-scenarios |
| `gps_unlock` | GPS lock lost/recovered with quality ramping | 30% ramp up, 40% hold, 30% ramp down |
| `leap_second` | Leap second pending/occurred flags | First half pending, second half occurred |
| `invalid_measurement` | NaN/zero phasor values with PmuErrorNoData STAT | NaN for float32, zero for int16 |

Timing scenarios:

| Name | Description | Default Parameters |
|---|---|---|
| `missed_frames` | Skip N consecutive frames, creating a SOC gap | Configurable gap count and position |
| `duplicate_frames` | Emit frames with identical timestamps | 3 consecutive duplicated frames |
| `timing_jitter` | Add Gaussian jitter to FRACSEC timestamps | ~100 µs stddev |


| Name | Description |
|---|---|
| `random_mix` | 3-6 random scenarios with randomized parameters within physical ranges. Use `--seed` for determinism. |
| `all` | One instance of each scenario type placed sequentially with gaps, including status, timing, and invalid measurement scenarios. |

- Single scenario: placed at 50% of duration
- Multiple scenarios: duration divided into equal slots, each scenario placed in its own slot
- Presets: handle their own placement (random_mix randomizes, all places sequentially)
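
The placement rules above can be sketched in Python as follows. This is only an illustration: the source does not say whether the 50% mark is the start or the centre of a single event, nor where inside its slot each scenario sits, so the sketch returns the anchor time and the slot boundaries only. The function names are hypothetical.

```python
def single_scenario_anchor(duration_s: float) -> float:
    """Time (in seconds) at the 50% mark of the stream."""
    return duration_s * 0.5


def scenario_slots(duration_s: float, count: int) -> list[tuple[float, float]]:
    """Divide the duration into `count` equal (start, end) slots, one per scenario."""
    if count < 1:
        raise ValueError("count must be at least 1")
    width = duration_s / count
    return [(i * width, (i + 1) * width) for i in range(count)]


# --duration 30 --scenario sag,motor_start,freq_event
slots = scenario_slots(30.0, 3)
```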
Generate a diverse set of .c37 files covering the full parameter space for compression testing:
# Medium corpus (~60 files)
upmu-dataframes corpus \
--output-dir /data/test-corpus \
--preset medium \
--seed 42

The corpus generator produces files with varied:
- Data rates (30, 60, 120, 240 fps, plus negative rates for slow sampling)
- Durations (1s, 10s, 60s, 300s)
- Phasor counts (1, 4, 8, 16)
- Encodings (float32, int16, mixed per channel type)
- Notation (polar, rectangular)
- Nominal frequency (50 Hz, 60 Hz)
- Analog channels (none, substation preset)
- Digital channels (none, breaker preset)
- PMU count (1, 2, 4)
- Scenarios (none, sag, random_mix, all, sync_loss, gps_unlock, etc.)
- TIME_BASE values (1,000,000 and 1,048,576)
- CFG-2 retransmission intervals
A manifest.json is written to the output directory describing each file's parameters for downstream tooling.

| Preset | Approx. Files | Description |
|---|---|---|
| `small` | ~20 | Basic axis coverage |
| `medium` | ~60 | Full coverage with analog/digital, scenarios, multi-PMU |
| `large` | ~120 | Comprehensive with multiple durations and scenario combinations |

The LBNL continuous archive at powerdata-download.lbl.gov provides ~11.6 days of real-world 120 Hz data from 3 distribution locations. Each location has 12 gzip-compressed channel files (3-phase voltage + current, magnitude + angle).
# Download all channels for a location
upmu-dataframes download-archive \
--location a6_bus1 \
--output-dir ./vendor/lbnl_archive
# Download only voltage channels
upmu-dataframes download-archive \
--location bank_514 \
--channels voltage
# Re-download even if files exist
upmu-dataframes download-archive \
--location grizzly_bus1_2 \
--force

Available locations: a6_bus1, bank_514, grizzly_bus1_2.
Files are downloaded sequentially with progress bars. Existing files are skipped unless --force is set. Downloads write to a .part file first for crash safety.
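
The .part-file pattern mentioned above is a common way to avoid leaving truncated files behind after a crash. Here is a minimal Python sketch of the idea, with the network transfer replaced by an iterable of byte chunks. The function name and the file name `L1MAG.gz` are illustrative assumptions, not the tool's actual names.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def write_via_part_file(dest: Path, chunks: Iterable[bytes], force: bool = False) -> bool:
    """Write chunks to dest via a temporary .part file. Returns False if skipped."""
    if dest.exists() and not force:
        return False  # existing files are skipped unless forced
    part = dest.with_name(dest.name + ".part")
    with open(part, "wb") as fh:
        for chunk in chunks:
            fh.write(chunk)
    os.replace(part, dest)  # rename only after the write has completed
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "L1MAG.gz"
    first = write_via_part_file(target, [b"abc", b"def"])
    second = write_via_part_file(target, [b"xyz"])
    forced = write_via_part_file(target, [b"xyz"], force=True)
    final_bytes = target.read_bytes()
    leftover_part = (Path(tmp) / "L1MAG.gz.part").exists()
```

If the process dies mid-transfer, only the .part file is incomplete, so a later run never mistakes a partial download for a finished one.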
# Import the full archive (all ~11.6 days)
upmu-dataframes import-archive \
--input-dir ./vendor/lbnl_archive \
--location a6_bus1 \
--output full_capture.c37
# Import 5 minutes starting 1 hour into the dataset
upmu-dataframes import-archive \
--input-dir ./vendor/lbnl_archive \
--location a6_bus1 \
--output slice.c37 \
--offset 3600 \
--duration 300
# Split into 1-hour chunks
upmu-dataframes import-archive \
--input-dir ./vendor/lbnl_archive \
--location a6_bus1 \
--output hourly.c37 \
--chunk-duration 3600

When chunking, output files are named {station}_{chunk_index:04}.c37 (e.g., LBNL-ARCHIVE_0000.c37, LBNL-ARCHIVE_0001.c37). Each chunk starts with its own CFG-2 frame.
Time slicing via --offset works by reading and discarding samples, because gzip streams are not seekable. At 120 Hz, skipping 1 hour reads through ~432,000 samples per channel, which is fast.
The pipeline streams data through gzip decompression without loading into memory. Memory usage stays constant regardless of input size.
Each .gz file is a headerless two-column CSV:
timestamp_nanoseconds,float_value
12 channels per location:
- Voltage: `L1MAG`, `L1ANG`, `L2MAG`, `L2ANG`, `L3MAG`, `L3ANG`
- Current: `C1MAG`, `C1ANG`, `C2MAG`, `C2ANG`, `C3MAG`, `C3ANG`
Angles are in degrees (converted to radians during import). Timestamps are nanoseconds since Unix epoch, aligned across all channels at 8,333,333 ns intervals (120 Hz).
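
A minimal Python sketch of reading one such channel file as a stream, including the read-and-discard offset and the degree-to-radian conversion. It assumes the offset is applied by sample count at 120 Hz; the importer's exact skipping logic is not documented here, and `iter_samples` is a hypothetical helper. The demo data is synthetic.

```python
import csv
import gzip
import io
import math
from itertools import islice
from typing import BinaryIO, Iterator


def iter_samples(
    gz_stream: BinaryIO,
    offset_seconds: float = 0.0,
    is_angle: bool = False,
    sample_rate_hz: int = 120,
) -> Iterator[tuple[int, float]]:
    """Yield (timestamp_ns, value) rows one at a time from a gzip'd two-column CSV."""
    skip = round(offset_seconds * sample_rate_hz)  # samples to read and discard
    with gzip.open(gz_stream, mode="rt", newline="") as text:
        for ts_field, value_field in islice(csv.reader(text), skip, None):
            value = float(value_field)
            yield int(ts_field), math.radians(value) if is_angle else value


# Synthetic 2-second angle channel: 240 rows, 90 degrees each.
base_ns = 1_600_000_000_000_000_000
rows = "".join(f"{base_ns + i * 8_333_333},90.0\n" for i in range(240))
blob = gzip.compress(rows.encode("ascii"))
samples = list(iter_samples(io.BytesIO(blob), offset_seconds=1.0, is_angle=True))
```

Because the function is a generator, only one row is held in memory at a time, which matches the constant-memory behaviour described above.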
upmu-dataframes inspect --input event.c37

Displays configuration metadata (station name, phasor count, data rate) and data frame statistics (count, time range, CRC status).

upmu-dataframes inspect --input event.c37 --hexdump --max-frames 5

Prints hex + ASCII dump of each frame for protocol debugging. Use --max-frames to limit output.

upmu-dataframes inspect --input event.c37 --csv parsed_output.csv

Exports parsed data frames as CSV. Columns adapt to the stream's configuration: phasor count, analog channels, digital words, and multi-PMU layout are all reflected in the output:
soc, fracsec, stat_word, phasor_0_mag, phasor_0_ang, ..., phasor_N_mag, phasor_N_ang, freq_deviation, rocof, analog_0, ..., digital_0, ...
Multi-PMU streams prefix columns with the PMU index. This is useful for importing C37.118 data into analysis tools (Python, MATLAB, Excel).
After compressing and decompressing a .c37 file, verify the integrity:
upmu-dataframes verify \
--original original.c37 \
--decompressed decompressed.c37

Checks that every frame in the decompressed file is byte-identical to the original. Reports pass/fail per frame.
For lossy compression algorithms, use tolerance-based comparison:
upmu-dataframes verify \
--original original.c37 \
--decompressed decompressed.c37 \
--mode lossy \
--mag-tolerance 0.01 \
--angle-tolerance 0.001 \
--freq-tolerance 0.001

Each data frame is compared field-by-field:
- Phasor magnitudes must differ by less than `--mag-tolerance`
- Phasor angles must differ by less than `--angle-tolerance` radians
- Frequency/ROCOF must differ by less than `--freq-tolerance` Hz (Hz/s for ROCOF)
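
A tolerance check of this kind could look like the Python sketch below. The function names are hypothetical, and the wrap-around handling for angles near ±π is an assumption added for illustration; the tool may compare raw angle values instead.

```python
import math


def angle_difference(a_rad: float, b_rad: float) -> float:
    """Smallest absolute difference between two angles, accounting for wrap-around."""
    delta = (a_rad - b_rad + math.pi) % (2 * math.pi) - math.pi
    return abs(delta)


def phasor_within_tolerance(
    original: tuple[float, float],
    decoded: tuple[float, float],
    mag_tolerance: float,
    angle_tolerance: float,
) -> bool:
    """Compare two (magnitude, angle_rad) phasors against strict tolerances."""
    (mag_a, ang_a), (mag_b, ang_b) = original, decoded
    return (
        abs(mag_a - mag_b) < mag_tolerance
        and angle_difference(ang_a, ang_b) < angle_tolerance
    )


# Angles just either side of pi are close once wrap-around is considered.
near_pi_ok = phasor_within_tolerance((120.0, 3.1415), (120.005, -3.1415), 0.01, 0.001)
magnitude_fail = phasor_within_tolerance((120.0, 0.0), (120.02, 0.0), 0.01, 0.001)
```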
Include the compressed file to calculate the compression ratio:
upmu-dataframes verify \
--original original.c37 \
--decompressed decompressed.c37 \
--compressed compressed.bin \
--json

The --json flag outputs a structured report:
{
"total_frames": 7200,
"passed": 7200,
"failed": 0,
"cfg_match": true,
"compression_ratio": 0.45,
"frame_comparisons": [...]
}

float32 encoding:
- Phasors: IEEE 754 single-precision floats (4 bytes magnitude + 4 bytes angle)
- Frequency/ROCOF: IEEE 754 single-precision floats
- Frame size: 90 bytes (with 8 phasors)
- Best for: analysis, round-trip accuracy, protocol compliance testing

int16 encoding:
- Phasors: signed 16-bit integers with calibrated scale factors
- Frequency: millihertz resolution (value / 1000 = Hz)
- ROCOF: centihertz/second resolution (value / 100 = Hz/s)
- Frame size: 54 bytes (with 8 phasors)
- Best for: field-deployment realism, bandwidth-constrained scenarios, compression testing with smaller frames
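
The integer scaling for frequency and ROCOF can be sketched in Python as below. In C37.118.2 the 16-bit FREQ field carries the deviation from nominal in millihertz, and the sketch follows that convention. The clamping behaviour and the function names are assumptions, not the generator's actual code.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def clamp_int16(value: int) -> int:
    """Saturate a value to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def encode_freq(freq_hz: float, nominal_hz: float = 60.0) -> int:
    """Frequency deviation from nominal, in millihertz."""
    return clamp_int16(round((freq_hz - nominal_hz) * 1000))


def decode_freq(raw: int, nominal_hz: float = 60.0) -> float:
    """Recover the frequency in Hz from the millihertz deviation."""
    return nominal_hz + raw / 1000


def encode_rocof(hz_per_second: float) -> int:
    """ROCOF in hundredths of Hz/s."""
    return clamp_int16(round(hz_per_second * 100))


def decode_rocof(raw: int) -> float:
    """Recover ROCOF in Hz/s."""
    return raw / 100


raw_freq = encode_freq(60.012)
raw_rocof = encode_rocof(-0.25)
```

The quantization step (1 mHz, 0.01 Hz/s) is what makes int16 streams lossy relative to float32, which matters when choosing tolerances for `verify`.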
By default, each frame contains 8 phasors (configurable via --phasor-count):
| Index | Name | Type | Description |
|---|---|---|---|
| 0 | VA | Voltage | Phase A voltage |
| 1 | VB | Voltage | Phase B voltage |
| 2 | VC | Voltage | Phase C voltage |
| 3 | V+ | Voltage | Positive sequence voltage (Fortescue) |
| 4 | IA | Current | Phase A current |
| 5 | IB | Current | Phase B current |
| 6 | IC | Current | Phase C current |
| 7 | I+ | Current | Positive sequence current (Fortescue) |
For the default configuration (8 phasors, 0 analog, 0 digital, 1 PMU):
| Duration | Rate | Encoding | Approx. Size |
|---|---|---|---|
| 1 min | 120 fps | float32 | ~634 KB |
| 1 min | 120 fps | int16 | ~380 KB |
| 1 hour | 120 fps | float32 | ~37 MB |
| 1 hour | 120 fps | int16 | ~22 MB |
Formula: CFG-2 size + (frame_size × rate × duration_seconds)
Frame size varies with configuration: adding analog channels adds 4 bytes (float32) or 2 bytes (int16) per channel. Digital words add 2 bytes each. Multi-PMU streams multiply the per-PMU data portion by the PMU count. Different phasor counts scale the phasor portion (8 bytes per float32 phasor, 4 bytes per int16).
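
The size rules above can be checked with a short Python sketch. The fixed overhead follows the C37.118.2 data frame layout (SYNC, FRAMESIZE, IDCODE, SOC, FRACSEC, then a per-PMU STAT word and data, then CHK). The function names are hypothetical, and the CFG-2 size is left as a parameter because it depends on the configuration.

```python
COMMON_HEADER = 14  # SYNC(2) + FRAMESIZE(2) + IDCODE(2) + SOC(4) + FRACSEC(4)
CHECKSUM = 2        # CHK
STAT = 2            # per-PMU status word
WIDTH = {"float32": 4, "int16": 2}


def data_frame_size(
    phasors: int = 8,
    analogs: int = 0,
    digitals: int = 0,
    pmus: int = 1,
    phasor_enc: str = "float32",
    analog_enc: str = "float32",
    freq_enc: str = "float32",
) -> int:
    """Size in bytes of one data frame."""
    per_pmu = (
        STAT
        + phasors * 2 * WIDTH[phasor_enc]  # two values per phasor
        + 2 * WIDTH[freq_enc]              # FREQ + DFREQ (ROCOF)
        + analogs * WIDTH[analog_enc]
        + digitals * 2
    )
    return COMMON_HEADER + pmus * per_pmu + CHECKSUM


def stream_size(frame_size: int, rate_fps: int, duration_s: int, cfg2_size: int = 0) -> int:
    """CFG-2 size + (frame_size x rate x duration_seconds)."""
    return cfg2_size + frame_size * rate_fps * duration_s


float32_frame = data_frame_size()
int16_frame = data_frame_size(phasor_enc="int16", analog_enc="int16", freq_enc="int16")
one_minute_bytes = stream_size(float32_frame, 120, 60)
```

The one-minute float32 figure comes to 648,000 bytes before the CFG-2 frame, in line with the ~634 KB table entry when a kilobyte is taken as 1,024 bytes.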
Invalid encoding/notation: Passing an unrecognised value to --encoding, --phasor-encoding, --analog-encoding, --freq-encoding, or --notation exits with an error listing valid options.
Duration too short for --scenario all: The generator warns and places as many scenarios as fit within the duration; the remaining scenarios are dropped. Use --duration 120 or longer for all to ensure every scenario has room.
Negative rate with scenarios: Scenario durations are calculated in frames, not seconds. With --rate -5 (one frame every 5 seconds), a 0.2-second sag becomes less than 1 frame and is clamped to a minimum of 1 frame.
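
The clamping described above can be illustrated with a small Python sketch. The rounding rule is an assumption; only the minimum of 1 frame comes from the description. `scenario_frames` is a hypothetical helper.

```python
def scenario_frames(scenario_seconds: float, rate: int) -> int:
    """Frames covered by a scenario, clamped to at least 1.

    Positive rate = frames per second; negative rate = seconds per frame.
    """
    if rate == 0:
        raise ValueError("rate must be non-zero")
    frames = scenario_seconds * rate if rate > 0 else scenario_seconds / -rate
    return max(1, round(frames))


sag_frames_slow = scenario_frames(0.2, -5)   # 0.04 frames, clamped up to 1
sag_frames_fast = scenario_frames(0.2, 120)
```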
Missing LBNL CSV test data: Tests that reference vendor/pmu_event_library/ require the git submodule to be initialised: git submodule update --init.