Skip to content

Powerpipe Export Timeout with Large JSONB Fields #950

@ryansticksecure

Description

@ryansticksecure

Powerpipe Export Timeout with Large JSONB Fields

Describe the bug

Powerpipe control exports (--export=filename.json) timeout when processing controls with large JSONB fields containing complex nested data, while JSON console output (--output=json) works correctly. The issue appears to be in the JSON exporter's handling of large JSON/JSONB fields during marshaling, causing performance degradation and eventual timeouts.

Powerpipe version

Powerpipe v1.4.0
Powerpipe v0.0.0-dev-develop.20251002184514 (built from commit ea87a9e)
Steampipe v1.1.4

To reproduce

  1. Setup: Install Powerpipe and any Steampipe plugin with tables containing large JSONB fields (e.g., cloud provider policies with complex assignment rules)

  2. Repro steps:

    # This times out (exit code 124) and creates no file
    timeout 30s powerpipe control run your_control_name \
      --mod-location "/path/to/mod" \
      --export test_export.json \
      --progress=false
    
    # This works correctly (exit code 1, normal for controls with alarms)
    powerpipe control run your_control_name \
      --mod-location "/path/to/mod" \
      --output=json > test_output.json
  3. Control SQL example (generic pattern):

    control example_control_with_large_jsonb {
      title = "Example Control with Large JSONB Fields"
      
      sql = <<EOT
        select
          resource_id,
          name,
          large_jsonb_field,  -- Contains complex nested JSON arrays/objects
          case
            when large_jsonb_field is null or large_jsonb_field = '[]'::jsonb then 'none'
            else 'complex-data'
          end as data_summary
        from some_plugin_table
        where large_jsonb_field is not null
      EOT
    }

Expected behavior

Both --export=filename.json and --output=json should work correctly with large JSONB fields, returning exit codes indicating success or control results (0, 1, or 2).

Actual behavior

  • --export=filename.json: Times out after 30+ seconds (exit code 124), creates no output file
  • --output=json: Works correctly, produces substantial JSON output, proper exit code

Failed Workarounds Attempted

1. JSONB Field Truncation

Attempted: Truncating large JSONB fields to prevent marshaling issues

case 
  when length(large_field::text) > 1000 then 
    left(large_field::text, 1000) || '...'
  else large_field::text 
end as truncated_field

Result: Still times out - truncation doesn't prevent the underlying marshaling issue

2. Length-Based Conditional Logic

Attempted: Using jsonb_array_length() with size checks

case
  when length(assignments::text) < 10000 and jsonb_array_length(assignments) = 1 then
    assignments -> 0 ->> 'field'
  when length(assignments::text) < 10000 then
    'multiple (' || jsonb_array_length(assignments) || ' items)'
  else 'large-data'
end as smart_summary

Result: Still times out - JSON parsing functions themselves cause performance issues

3. Simplified Assignment Summaries

Attempted: Replacing complex nested field parsing with simple existence checks

case
  when large_field is null or large_field = '[]'::jsonb then 'none'
  else 'assigned'
end as simple_summary

Result: Reduced but did not eliminate timeouts with very large fields

Root Cause Analysis

Based on source code investigation:

  1. Different Code Paths:

    • --export uses pipe-fittings JSON exporter → querydisplay.BuildJSON()ParseJSONOutputColumnValue()
    • --output=json uses direct JSON streaming/output handling
  2. Problematic Function: pipe-fittings/querydisplay/BuildJSON() attempts to marshal large JSONB fields entirely in memory, causing timeouts with complex assignment data

  3. JSONB Field Content: Fields contain arrays with complex nested objects:

    [
      {
        "target": {
          "@odata.type": "#microsoft.graph.someAssignmentTarget",
          "filterId": null,
          "filterType": "none"
        },
        "source": "direct",
        "sourceId": null
      }
    ]

Additional context

  • Environment: Reproduced in WSL/Linux environment
  • Custom Build: Built Powerpipe from source (latest develop branch) - issue persists
  • Workaround Currently Used: Using --output=json and redirecting to file works correctly
  • Impact: Affects controls with any large JSONB fields containing complex nested data
  • Related Files:
    • pipe-fittings/querydisplay/display.go - BuildJSON() function
    • pipe-fittings/querydisplay/column.go - ParseJSONOutputColumnValue() function

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions