Usage Guide

Which Command Should I Use?

| Goal | Command | When to use |
| --- | --- | --- |
| Quick test data | generate | Single file with specific scenarios, encoding, or PMU config |
| Diverse test corpus | corpus | Many files covering the full parameter space for compression testing |
| Real event data | convert / batch | Convert LBNL microPMU CSV captures to binary wire format |
| Real continuous data | import-archive | Convert LBNL continuous archive (days of real 120 Hz data) |
| Debug a .c37 file | inspect | View metadata, hex dump, or export parsed data as CSV |
| Validate compression | verify | Compare original vs decompressed files (bit-exact or lossy) |

For compression testing, start with corpus to generate a broad test dataset, then supplement with generate for specific edge cases or import-archive for real-world data.

Installation

git clone https://github.com/AZX-PBC-OSS/upmu-dataframes.git
cd upmu-dataframes
cargo build --release

The binary is written to target/release/upmu-dataframes. Add it to your PATH or run it directly.

Workflow: LBNL CSV Conversion

Single file conversion

Convert an LBNL microPMU CSV event capture to IEEE C37.118.2 binary:

upmu-dataframes convert \
  --input /path/to/event.csv \
  --output /path/to/event.c37

The output file contains one CFG-2 configuration frame followed by one data frame per CSV row. At 120 Hz reporting rate, a 1-minute event produces 7,200 data frames.

Customizing output

Override defaults for station metadata and encoding:

upmu-dataframes convert \
  --input event.csv \
  --output event.c37 \
  --station-name "SITE-PMU-01" \
  --idcode 42 \
  --format rect \
  --encoding int16 \
  --data-rate 60

  • --format rect uses rectangular (real + imaginary) instead of polar (magnitude + angle)
  • --encoding int16 produces smaller frames (54 bytes vs 90 bytes) with scaled integer values
  • --data-rate sets the declared reporting rate in the CFG-2 frame

Batch conversion

Convert an entire LBNL event library:

upmu-dataframes batch \
  --input-dir /data/lbnl-events \
  --output-dir /data/c37-output

This recursively walks the input directory, finds all CSV files, and writes corresponding .c37 files to the output directory preserving the directory structure (e.g., PV/event1.csv becomes PV/event1.c37).
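The naming rule can be sketched in Python. This only illustrates the path mapping and is not the tool's implementation; `map_output_path` and `plan_batch` are hypothetical helpers, and matching the `.csv` extension case-insensitively is an assumption.

```python
from pathlib import Path


def map_output_path(csv_path: Path, input_dir: Path, output_dir: Path) -> Path:
    """Mirror csv_path (located under input_dir) into output_dir with a .c37 suffix."""
    relative = csv_path.relative_to(input_dir)  # e.g. PV/event1.csv
    return (output_dir / relative).with_suffix(".c37")


def plan_batch(input_dir: Path, output_dir: Path) -> list[tuple[Path, Path]]:
    """List (source, destination) pairs for every CSV file found under input_dir."""
    sources = sorted(
        p for p in input_dir.rglob("*")
        if p.is_file() and p.suffix.lower() == ".csv"  # assumption: case-insensitive match
    )
    return [(src, map_output_path(src, input_dir, output_dir)) for src in sources]
```

Calling `plan_batch` before a real run is a cheap way to preview which output files a batch would create.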

Workflow: Synthetic Data Generation

Generate realistic PMU streams with composable power system scenarios:

# 60 seconds of normal operation (no scenarios)
upmu-dataframes generate \
  --output normal.c37 \
  --duration 60

# Random mix of scenarios (deterministic with seed)
upmu-dataframes generate \
  --output random.c37 \
  --duration 60 \
  --scenario random_mix \
  --seed 42

# Specific scenarios placed sequentially
upmu-dataframes generate \
  --output events.c37 \
  --duration 30 \
  --scenario sag,motor_start,freq_event

# All scenario types for comprehensive testing
upmu-dataframes generate \
  --output all_events.c37 \
  --duration 120 \
  --scenario all

# Int16 encoding with boundary test scenario
upmu-dataframes generate \
  --output int16_test.c37 \
  --encoding int16 \
  --scenario int16_boundary

Baseline data always includes:

  • Realistic 3-phase voltage/current waveforms with Gaussian noise
  • Positive sequence computed via Fortescue transformation
  • Frequency jitter around 60 Hz nominal
  • ROCOF derived from frequency variation
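For reference, the positive-sequence phasor is defined as (A + a·B + a²·C) / 3, where a is a rotation by 120°. A minimal Python sketch of that textbook formula follows. The function names and the 7200 V magnitude are illustrative only, and the generator's own code may differ.

```python
import cmath
import math

# Fortescue operator: a rotation of +120 degrees.
A_OP = cmath.rect(1.0, math.radians(120.0))


def polar(magnitude: float, angle_deg: float) -> complex:
    """Build a complex phasor from magnitude and angle in degrees."""
    return cmath.rect(magnitude, math.radians(angle_deg))


def positive_sequence(pa: complex, pb: complex, pc: complex) -> complex:
    """Positive-sequence component of three phase phasors (ABC rotation)."""
    return (pa + A_OP * pb + A_OP * A_OP * pc) / 3.0


# Balanced set: phase B lags A by 120 degrees, phase C leads A by 120 degrees.
va, vb, vc = polar(7200.0, 0.0), polar(7200.0, -120.0), polar(7200.0, 120.0)
v_pos = positive_sequence(va, vb, vc)  # equals va for a perfectly balanced set
```

For a balanced set the positive sequence equals phase A, so any gap between V+ and VA in generated data reflects noise or an active scenario.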

Advanced Options

Phasor Configuration

| Option | Description | Default |
| --- | --- | --- |
| --phasor-count N | Number of phasors to generate (1-16) | 8 |
| --notation rect | Use rectangular notation (real/imaginary) instead of polar | polar |
| --nominal-freq 50 | 50 Hz grid frequency (default is 60 Hz) | 60 |

Data Encoding (per type)

| Option | Description | Default |
| --- | --- | --- |
| --encoding TYPE | Set all data types to same encoding (float32 or int16) | float32 |
| --phasor-encoding TYPE | Phasor encoding override | (inherits from --encoding) |
| --analog-encoding TYPE | Analog channel encoding override | (inherits from --encoding) |
| --freq-encoding TYPE | Frequency/ROCOF encoding override | (inherits from --encoding) |

Analog Channels

| Option | Description | Default |
| --- | --- | --- |
| --analog-count N | Number of generic analog channels | 0 |
| --analog-preset NAME | Use preset analogs (substation) | none |

Available presets:

  • substation - 4 realistic channels: TEMP-XFMR, TAP-POS, MW, MVAR

Digital Channels

| Option | Description | Default |
| --- | --- | --- |
| --digital-count N | Number of generic digital words (each has 16 bits) | 0 |
| --digital-preset NAME | Use preset digital word (breaker) | none |

Available presets:

  • breaker - Breaker/recloser status with fault simulation support

Multi-PMU Streams

| Option | Description | Default |
| --- | --- | --- |
| --pmu-count N | Number of PMUs in aggregated stream (1-256) | 1 |

Generate PDC-style aggregated streams with multiple PMUs:

# 3 PMUs at different buses
upmu-dataframes generate \
  --output substation.c37 \
  --pmu-count 3 \
  --duration 60 \
  --scenario fault_lg

Each PMU gets:

  • Distinct station name (BASE, BASE-PMU2, BASE-PMU3)
  • Unique IDCODE (sequential from base)
  • Independent phase angle offset (5° per PMU)
  • Shared timestamp and scenario effects across all PMUs

Data Rate & Timing

| Option | Description | Default |
| --- | --- | --- |
| --rate RATE | Reporting rate in fps, or negative for seconds per frame | 120 |
| --config-count N | Config count (CFGCNT) in CFG-2 frame | 1 |
| --time-base N | TIME_BASE for FRACSEC encoding (1-16777215) | 1000000 |
| --cfg2-interval SECS | Re-emit CFG-2 frame every N seconds (real PMU behavior) | none |

Session Framing

Generate realistic TCP session structure with all IEEE C37.118.2 frame types:

| Option | Description | Default |
| --- | --- | --- |
| --header-text TEXT | Prepend HDR frame with ASCII station description | none |
| --include-cfg1 | Prepend CFG-1 capabilities frame before CFG-2 | off |
| --include-cfg3 | Include CFG-3 extended config frame after CFG-2 | off |
| --include-commands | Wrap data with CMD TurnOnData/TurnOffData frames | off |

Example — full session stream matching real PMU TCP traffic:

upmu-dataframes generate \
  --output session.c37 \
  --duration 60 \
  --header-text "microPMU Station A, Model XY-1234, Firmware v2.5" \
  --include-cfg1 \
  --include-cfg3 \
  --include-commands \
  --cfg2-interval 30 \
  --scenario random_mix \
  --seed 42

This produces: [HDR] [CFG-1] [CFG-2] [CFG-3] [CMD:TurnOn] [Data×7200] [CFG-2 retransmit×2] [CMD:TurnOff]
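To check the frame order of a generated file, the frames can be walked using only the SYNC and FRAMESIZE fields that every IEEE C37.118.2 frame starts with. The sketch below assumes the file is nothing more than frames stored back to back. It does not verify CRCs, and `frame_types` and `summarize` are illustrative names rather than part of the tool.

```python
import struct
from itertools import groupby

# Frame type codes carried in bits 6-4 of the second SYNC byte (IEEE C37.118.2).
FRAME_TYPES = {0: "Data", 1: "HDR", 2: "CFG-1", 3: "CFG-2", 4: "CMD", 5: "CFG-3"}


def frame_types(stream: bytes) -> list[str]:
    """Walk back-to-back frames and return the type label of each one."""
    labels, pos = [], 0
    while pos + 4 <= len(stream):
        lead, type_byte, frame_size = struct.unpack_from(">BBH", stream, pos)
        # Every frame has a 14-byte common header plus a 2-byte CHK.
        if lead != 0xAA or frame_size < 16:
            raise ValueError(f"not a frame boundary at offset {pos}")
        if pos + frame_size > len(stream):
            raise ValueError(f"truncated frame at offset {pos}")
        labels.append(FRAME_TYPES.get((type_byte >> 4) & 0x07, "unknown"))
        pos += frame_size
    return labels


def summarize(labels: list[str]) -> str:
    """Collapse consecutive repeats, e.g. three Data frames become [Data×3]."""
    parts = []
    for label, run in groupby(labels):
        count = len(list(run))
        parts.append(f"[{label}×{count}]" if count > 1 else f"[{label}]")
    return " ".join(parts)
```

Running `summarize(frame_types(open("session.c37", "rb").read()))` on a stream with `--cfg2-interval` set would show the retransmitted CFG-2 frames at their actual positions between runs of data frames.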

Examples of --rate values (see Data Rate & Timing above):

  • --rate 30 - 30 frames per second
  • --rate -5 - One frame every 5 seconds (slow sampling)
  • --rate -1 - One frame per second
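The sign convention can be expressed as a small Python helper. This is a sketch of the arithmetic only. The function names are hypothetical, and the frame count is approximate because the text does not say how the generator treats the first and last frame of a run.

```python
def frame_period_seconds(rate: int) -> float:
    """Seconds between frames: positive rate is fps, negative rate is seconds per frame."""
    if rate == 0:
        raise ValueError("rate must be non-zero")
    return 1.0 / rate if rate > 0 else float(-rate)


def frame_count(rate: int, duration_seconds: int) -> int:
    """Approximate number of data frames emitted over duration_seconds."""
    if rate == 0:
        raise ValueError("rate must be non-zero")
    if rate > 0:
        return duration_seconds * rate
    return duration_seconds // -rate
```

For example, `frame_count(120, 60)` gives 7200 and `frame_count(-5, 60)` gives 12.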

Available Scenarios

Scenarios are composable -- multiple can be active simultaneously. When multiple scenarios overlap, their modifiers stack (voltage/current multipliers multiply, offsets add).
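The stacking rule can be illustrated with a short Python sketch. The `Modifier` type and its field names are hypothetical and do not come from the tool. The numbers use the default sag (80% retained), motor_start (6x inrush, 88% voltage) and freq_event (0.3 Hz) values at their peak, ignoring ramps and decay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Modifier:
    """Hypothetical per-scenario effect applied to one frame."""
    voltage_scale: float = 1.0   # multiplier on voltage magnitude
    current_scale: float = 1.0   # multiplier on current magnitude
    freq_offset_hz: float = 0.0  # additive frequency offset


def stack(modifiers: list[Modifier]) -> Modifier:
    """Combine overlapping modifiers: scales multiply, offsets add."""
    voltage, current, freq = 1.0, 1.0, 0.0
    for m in modifiers:
        voltage *= m.voltage_scale
        current *= m.current_scale
        freq += m.freq_offset_hz
    return Modifier(voltage, current, freq)


sag = Modifier(voltage_scale=0.80)
motor_start = Modifier(voltage_scale=0.88, current_scale=6.0)
freq_event = Modifier(freq_offset_hz=0.3)
combined = stack([sag, motor_start, freq_event])
```

Here the combined voltage multiplier is 0.80 × 0.88 = 0.704, so an overlapping sag and motor start dips deeper than either scenario alone.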

Voltage scenarios:

| Name | Description | Default Parameters |
| --- | --- | --- |
| sag | Voltage reduction (IEC 61000-4-30) | 80% retained (20% dip) |
| swell | Voltage elevation | 115% of nominal |
| cap_switching | Decaying sinusoidal oscillation per-phase with 120° shift | 500 Hz ring, 20 ms decay, 30% amplitude |

Current scenarios:

| Name | Description | Default Parameters |
| --- | --- | --- |
| motor_start | High inrush current + voltage dip with exponential decay | 6x inrush, 88% voltage |
| pv_cloud | V-shaped current ramp (PV inverter cloud transient) | 50% depth |
| near_zero_current | Fixed low current magnitude on all phases | 0.5 A |

Frequency/angle scenarios:

| Name | Description | Default Parameters |
| --- | --- | --- |
| freq_event | Trapezoidal frequency deviation (ramp up, hold, ramp down) | 0.3 Hz deviation, 500 ms ramp |
| angle_jump | Instant phase angle offset on selected phases | 10° on phase A |

Fault scenarios:

| Name | Description | Default Parameters |
| --- | --- | --- |
| fault_lg | Line-to-ground fault (single phase) | Phase A, 40% severity, 5x fault current |
| fault_ll | Line-to-line fault (two phases) | Phases A/B, 40% severity, 5x fault current |
| fault_llg | Line-to-line-to-ground fault | Phases A/B, 40% severity, 5x fault current |

Encoding edge cases:

| Name | Description | Default Parameters |
| --- | --- | --- |
| int16_boundary | Engineers magnitudes near int16 max (±32767) | 99% of scale |
| timestamp_rollover | Places event at UTC second boundary | (no parameters) |

Status scenarios:

| Name | Description | Default Parameters |
| --- | --- | --- |
| sync_loss | PMU sync loss at start, recovery at 80% duration | Lt1ms time quality during loss |
| config_change | Configuration change pending flag | Set for entire duration |
| data_quality | Data error with modification flag | PmuError type |
| trigger | Trigger detection with reason | MagnitudeLow/High based on co-scenarios |
| gps_unlock | GPS lock lost/recovered with quality ramping | 30% ramp up, 40% hold, 30% ramp down |
| leap_second | Leap second pending/occurred flags | First half pending, second half occurred |
| invalid_measurement | NaN/zero phasor values with PmuErrorNoData STAT | NaN for float32, zero for int16 |

Timing scenarios:

| Name | Description | Default Parameters |
| --- | --- | --- |
| missed_frames | Skip N consecutive frames, creating a SOC gap | Configurable gap count and position |
| duplicate_frames | Emit frames with identical timestamps | 3 consecutive duplicated frames |
| timing_jitter | Add Gaussian jitter to FRACSEC timestamps | ~100µs stddev |

Presets

| Name | Description |
| --- | --- |
| random_mix | 3-6 random scenarios with randomized parameters within physical ranges. Use --seed for determinism. |
| all | One instance of each scenario type placed sequentially with gaps, including status, timing, and invalid measurement scenarios. |

Scenario Placement

  • Single scenario: placed at 50% of duration
  • Multiple scenarios: duration divided into equal slots, each scenario placed in its own slot
  • Presets: handle their own placement (random_mix randomizes, all places sequentially)
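The equal-slot rule for multiple scenarios amounts to the following arithmetic, sketched in Python. `scenario_slots` is a hypothetical helper. Where a scenario starts inside its slot is not stated here, so the sketch only computes slot boundaries.

```python
def scenario_slots(duration_seconds: float, count: int) -> list[tuple[float, float]]:
    """Split [0, duration) into `count` equal slots, returned as (start, end) pairs."""
    if count < 1:
        raise ValueError("count must be at least 1")
    width = duration_seconds / count
    return [(i * width, (i + 1) * width) for i in range(count)]
```

For `--duration 30 --scenario sag,motor_start,freq_event` this yields three 10-second slots: 0-10 s, 10-20 s and 20-30 s.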

Workflow: Test Corpus Generation

Generate a diverse set of .c37 files covering the full parameter space for compression testing:

# Medium corpus (~60 files)
upmu-dataframes corpus \
  --output-dir /data/test-corpus \
  --preset medium \
  --seed 42

The corpus generator produces files with varied:

  • Data rates (30, 60, 120, 240 fps, plus negative rates for slow sampling)
  • Durations (1s, 10s, 60s, 300s)
  • Phasor counts (1, 4, 8, 16)
  • Encodings (float32, int16, mixed per channel type)
  • Notation (polar, rectangular)
  • Nominal frequency (50 Hz, 60 Hz)
  • Analog channels (none, substation preset)
  • Digital channels (none, breaker preset)
  • PMU count (1, 2, 4)
  • Scenarios (none, sag, random_mix, all, sync_loss, gps_unlock, etc.)
  • TIME_BASE values (1,000,000 and 1,048,576)
  • CFG-2 retransmission intervals

A manifest.json is written to the output directory describing each file's parameters for downstream tooling.

| Preset | Approx. Files | Description |
| --- | --- | --- |
| small | ~20 | Basic axis coverage |
| medium | ~60 | Full coverage with analog/digital, scenarios, multi-PMU |
| large | ~120 | Comprehensive with multiple durations and scenario combinations |

Workflow: LBNL Continuous Archive

The LBNL continuous archive at powerdata-download.lbl.gov provides ~11.6 days of real-world 120 Hz data from 3 distribution locations. Each location has 12 gzip-compressed channel files (3-phase voltage + current, magnitude + angle).

Downloading archive files

# Download all channels for a location
upmu-dataframes download-archive \
  --location a6_bus1 \
  --output-dir ./vendor/lbnl_archive

# Download only voltage channels
upmu-dataframes download-archive \
  --location bank_514 \
  --channels voltage

# Re-download even if files exist
upmu-dataframes download-archive \
  --location grizzly_bus1_2 \
  --force

Available locations: a6_bus1, bank_514, grizzly_bus1_2.

Files are downloaded sequentially with progress bars. Existing files are skipped unless --force is set. Downloads write to a .part file first for crash safety.

Importing archive data

# Import the full archive (all ~11.6 days)
upmu-dataframes import-archive \
  --input-dir ./vendor/lbnl_archive \
  --location a6_bus1 \
  --output full_capture.c37

# Import 5 minutes starting 1 hour into the dataset
upmu-dataframes import-archive \
  --input-dir ./vendor/lbnl_archive \
  --location a6_bus1 \
  --output slice.c37 \
  --offset 3600 \
  --duration 300

# Split into 1-hour chunks
upmu-dataframes import-archive \
  --input-dir ./vendor/lbnl_archive \
  --location a6_bus1 \
  --output hourly.c37 \
  --chunk-duration 3600

When chunking, output files are named {station}_{chunk_index:04}.c37 (e.g., LBNL-ARCHIVE_0000.c37, LBNL-ARCHIVE_0001.c37). Each chunk starts with its own CFG-2 frame.

Time slicing via --offset works by reading and discarding samples, because gzip streams cannot be seeked. At 120 Hz, skipping 1 hour reads through ~432,000 samples per channel (120 × 3,600), which is fast.

The pipeline streams data through gzip decompression without loading into memory. Memory usage stays constant regardless of input size.

Archive channel format

Each .gz file is a headerless two-column CSV:

timestamp_nanoseconds,float_value

12 channels per location:

  • Voltage: L1MAG, L1ANG, L2MAG, L2ANG, L3MAG, L3ANG
  • Current: C1MAG, C1ANG, C2MAG, C2ANG, C3MAG, C3ANG

Angles are in degrees (converted to radians during import). Timestamps are nanoseconds since Unix epoch, aligned across all channels at 8,333,333 ns intervals (120 Hz).
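For anyone reading the channel files outside the tool, a minimal streaming reader in Python could look like the sketch below. It assumes the timestamp column is a plain integer, and `read_channel` and `open_channel` are illustrative names. The degree-to-radian step mirrors the conversion described above.

```python
import gzip
import math
from typing import Iterable, Iterator

SAMPLE_INTERVAL_NS = 8_333_333  # nominal spacing at 120 Hz


def read_channel(lines: Iterable[str], is_angle: bool) -> Iterator[tuple[int, float]]:
    """Yield (timestamp_ns, value) pairs, converting angle channels to radians."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        ts_text, value_text = line.split(",")
        value = float(value_text)
        yield int(ts_text), (math.radians(value) if is_angle else value)


def open_channel(path: str, is_angle: bool) -> Iterator[tuple[int, float]]:
    """Stream one gzip-compressed channel file without loading it into memory."""
    with gzip.open(path, "rt", encoding="ascii") as handle:
        yield from read_channel(handle, is_angle)
```

Because both functions are generators, memory use stays flat however long the capture is, the same property the import pipeline relies on.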

Workflow: Inspecting Binary Files

Summary view

upmu-dataframes inspect --input event.c37

Displays configuration metadata (station name, phasor count, data rate) and data frame statistics (count, time range, CRC status).

Hex dump

upmu-dataframes inspect --input event.c37 --hexdump --max-frames 5

Prints hex + ASCII dump of each frame for protocol debugging. Use --max-frames to limit output.

CSV export

upmu-dataframes inspect --input event.c37 --csv parsed_output.csv

Exports parsed data frames as CSV. Columns adapt to the stream's configuration — phasor count, analog channels, digital words, and multi-PMU layout are all reflected in the output:

soc, fracsec, stat_word, phasor_0_mag, phasor_0_ang, ..., phasor_N_mag, phasor_N_ang, freq_deviation, rocof, analog_0, ..., digital_0, ...

Multi-PMU streams prefix columns with the PMU index. This is useful for importing C37.118 data into analysis tools (Python, MATLAB, Excel).
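As a starting point for Python analysis, the sketch below turns the `soc` and `fracsec` columns into a single timestamp using only the standard library. It assumes that `fracsec` holds the fraction-of-second count without time-quality bits and that `time_base` matches the stream's TIME_BASE. Neither point is confirmed here, and the function names are hypothetical.

```python
import csv
from typing import Iterable, Iterator


def frame_times(rows: Iterable[dict], time_base: int = 1_000_000) -> Iterator[float]:
    """Yield a timestamp in seconds (SOC plus fraction) for each exported row."""
    for row in rows:
        yield int(row["soc"]) + int(row["fracsec"]) / time_base


def load_times(path: str, time_base: int = 1_000_000) -> list[float]:
    """Read an exported CSV and return one timestamp per data frame."""
    with open(path, newline="") as handle:
        # skipinitialspace tolerates a space after each comma in the header and rows.
        reader = csv.DictReader(handle, skipinitialspace=True)
        return list(frame_times(reader, time_base))
```

Differences between consecutive timestamps should sit near 1/rate seconds, which makes gaps from a missed_frames scenario easy to spot.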

Workflow: Compression Verification

After compressing and decompressing a .c37 file, verify its integrity:

Bit-exact verification

upmu-dataframes verify \
  --original original.c37 \
  --decompressed decompressed.c37

Checks that every frame in the decompressed file is byte-identical to the original. Reports pass/fail per frame.

Lossy verification

For lossy compression algorithms, use tolerance-based comparison:

upmu-dataframes verify \
  --original original.c37 \
  --decompressed decompressed.c37 \
  --mode lossy \
  --mag-tolerance 0.01 \
  --angle-tolerance 0.001 \
  --freq-tolerance 0.001

Each data frame is compared field-by-field:

  • Phasor magnitudes must differ by less than --mag-tolerance
  • Phasor angles must differ by less than --angle-tolerance radians
  • Frequency and ROCOF must each differ by less than --freq-tolerance (Hz for frequency, Hz/s for ROCOF)
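The field-by-field rule can be sketched in Python as follows. The dictionary layout of a parsed frame is hypothetical, and the sketch compares angles by plain difference. Whether the tool treats angles near +π and −π as close is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerances:
    mag: float = 0.01     # mirrors --mag-tolerance
    angle: float = 0.001  # mirrors --angle-tolerance (radians)
    freq: float = 0.001   # mirrors --freq-tolerance


def frame_within_tolerance(original: dict, decoded: dict, tol: Tolerances) -> bool:
    """Return True when every field of `decoded` is within tolerance of `original`."""
    if len(original["phasors"]) != len(decoded["phasors"]):
        return False
    for (mag_a, ang_a), (mag_b, ang_b) in zip(original["phasors"], decoded["phasors"]):
        if abs(mag_a - mag_b) >= tol.mag or abs(ang_a - ang_b) >= tol.angle:
            return False
    return (abs(original["freq"] - decoded["freq"]) < tol.freq
            and abs(original["rocof"] - decoded["rocof"]) < tol.freq)
```

Each frame is a dict with a `phasors` list of (magnitude, angle) pairs plus `freq` and `rocof` values.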

Compression ratio

Include the compressed file to calculate the compression ratio:

upmu-dataframes verify \
  --original original.c37 \
  --decompressed decompressed.c37 \
  --compressed compressed.bin \
  --json

The --json flag outputs a structured report:

{
  "total_frames": 7200,
  "passed": 7200,
  "failed": 0,
  "cfg_match": true,
  "compression_ratio": 0.45,
  "frame_comparisons": [...]
}
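In an automated test, the report can be reduced to a single pass/fail value. The Python sketch below uses only the fields shown in the example report. `check_report` is a hypothetical helper, and how the JSON text is captured from the command is left to the caller.

```python
import json


def check_report(report_text: str) -> bool:
    """Return True when every frame passed and the configuration matched."""
    report = json.loads(report_text)
    return bool(
        report["failed"] == 0
        and report["passed"] == report["total_frames"]
        and report["cfg_match"]
    )
```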

Data Encoding Options

Float32 (default)

  • Phasors: IEEE 754 single-precision floats (4 bytes magnitude + 4 bytes angle)
  • Frequency/ROCOF: IEEE 754 single-precision floats
  • Frame size: 90 bytes (with 8 phasors)
  • Best for: analysis, round-trip accuracy, protocol compliance testing

Int16

  • Phasors: signed 16-bit integers with calibrated scale factors
  • Frequency: millihertz resolution (value / 1000 = Hz)
  • ROCOF: centihertz/second resolution (value / 100 = Hz/s)
  • Frame size: 54 bytes (with 8 phasors)
  • Best for: field-deployment realism, bandwidth-constrained scenarios, compression testing with smaller frames
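The int16 frequency and ROCOF scaling can be checked by hand with a few lines of Python. The sketch assumes the two fields are adjacent big-endian signed 16-bit integers, as in an IEEE C37.118.2 data frame. `decode_int16_freq` is an illustrative name, and phasor decoding is left out because it depends on the per-channel scale factors in the CFG-2 frame.

```python
import struct


def decode_int16_freq(raw: bytes) -> tuple[float, float]:
    """Decode int16 FREQ and DFREQ fields into (frequency in Hz, ROCOF in Hz/s)."""
    freq_raw, dfreq_raw = struct.unpack(">hh", raw)
    return freq_raw / 1000.0, dfreq_raw / 100.0
```

In the standard, the int16 FREQ field carries the deviation from nominal, so a decoded value of 0.025 on a 60 Hz system corresponds to 60.025 Hz.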

Phasor Layout

By default, each frame contains 8 phasors (configurable via --phasor-count):

| Index | Name | Type | Description |
| --- | --- | --- | --- |
| 0 | VA | Voltage | Phase A voltage |
| 1 | VB | Voltage | Phase B voltage |
| 2 | VC | Voltage | Phase C voltage |
| 3 | V+ | Voltage | Positive sequence voltage (Fortescue) |
| 4 | IA | Current | Phase A current |
| 5 | IB | Current | Phase B current |
| 6 | IC | Current | Phase C current |
| 7 | I+ | Current | Positive sequence current (Fortescue) |

Typical File Sizes

For the default configuration (8 phasors, 0 analog, 0 digital, 1 PMU):

| Duration | Rate | Encoding | Approx. Size |
| --- | --- | --- | --- |
| 1 min | 120 fps | float32 | ~634 KB |
| 1 min | 120 fps | int16 | ~380 KB |
| 1 hour | 120 fps | float32 | ~37 MB |
| 1 hour | 120 fps | int16 | ~22 MB |

Formula: CFG-2 size + (frame_size × rate × duration_seconds)

Frame size varies with configuration: adding analog channels adds 4 bytes (float32) or 2 bytes (int16) per channel. Digital words add 2 bytes each. Multi-PMU streams multiply the per-PMU data portion by the PMU count. Different phasor counts scale the phasor portion (8 bytes per float32 phasor, 4 bytes per int16).
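The size rules above can be combined into a small Python estimator. The field widths follow the IEEE C37.118.2 data frame layout: a 14-byte common header, a 2-byte CHK, and a 2-byte STAT word per PMU. The function names are hypothetical, and the sketch assumes every PMU in a multi-PMU stream has the same channel counts.

```python
def data_frame_size(phasors: int = 8, analogs: int = 0, digitals: int = 0, pmus: int = 1,
                    phasor_int16: bool = False, analog_int16: bool = False,
                    freq_int16: bool = False) -> int:
    """Size in bytes of one data frame for the given configuration."""
    overhead = 14 + 2  # SYNC, FRAMESIZE, IDCODE, SOC, FRACSEC, then CHK
    per_pmu = (
        2                                         # STAT word
        + phasors * (4 if phasor_int16 else 8)    # two values per phasor
        + 2 * (2 if freq_int16 else 4)            # FREQ and DFREQ
        + analogs * (2 if analog_int16 else 4)
        + digitals * 2
    )
    return overhead + pmus * per_pmu


def stream_size(frame_size: int, rate_fps: int, duration_seconds: int, cfg2_size: int = 0) -> int:
    """Approximate file size in bytes: CFG-2 size plus all data frames."""
    return cfg2_size + frame_size * rate_fps * duration_seconds
```

With the defaults, `data_frame_size()` returns 90 and the all-int16 variant returns 54. One minute of float32 data at 120 fps comes to 648,000 bytes before the CFG-2 frame, about 633 KiB, in line with the first table row.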

Troubleshooting

Invalid encoding/notation: Passing an unrecognised value to --encoding, --phasor-encoding, --analog-encoding, --freq-encoding, or --notation exits with an error listing valid options.

Duration too short for --scenario all: The generator warns and places as many scenarios as fit within the duration; the remaining scenarios are dropped. Use --duration 120 or longer for all to ensure every scenario has room.

Negative rate with scenarios: Scenario durations are calculated in frames, not seconds. With --rate -5 (one frame every 5 seconds), a 0.2-second sag becomes less than 1 frame and is clamped to a minimum of 1 frame.

Missing LBNL CSV test data: Tests that reference vendor/pmu_event_library/ require the git submodule to be initialised: git submodule update --init.