cspaced Language Specification

This document defines the complete syntax and semantics of cspaced, an indentation-based dialect of the C programming language. This specification is comprehensive enough to enable implementation of parsers, syntax highlighters, and language servers.

Table of Contents

  1. Overview
  2. Lexical Structure
  3. Syntax Grammar
  4. Expressions
  5. Statements
  6. Declarations
  7. Preprocessor
  8. Tree-Sitter Grammar Notes

Overview

cspaced is also the name of the transpiler, which converts this indentation-based syntax to traditional brace-based C. Key design principles:

  • Significant Indentation: Uses 2-space indentation instead of braces
  • Optional Semicolons: Inferred from newlines (except in some complex expressions)
  • Full C Compatibility: All C features supported, zero runtime overhead
  • Line-Oriented: Each statement/declaration typically occupies one line

Basic Example

// cspaced syntax
#include <stdio.h>

int factorial(int n):
  if (n <= 1):
    return 1
  else:
    return n * factorial(n - 1)

int main(void):
  printf("Hello World!\n")
  for (int i = 0; i < 10; i++):
    printf("factorial(%d) = %d\n", i, factorial(i))
  return 0

The transpiler generates equivalent C code with braces and semicolons.
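To make the conversion concrete, here is a minimal sketch in Python of a toy converter. It is not the cspaced implementation: the names `transpile` and `close_blocks` are hypothetical, the output layout is an assumption, and the semicolon rule is deliberately naive. It ignores labels, continuation lines, and trailing comments.

```python
def close_blocks(out, depth, level):
    """Emit one '}' per finished block, ahead of any trailing blank lines."""
    blanks = 0
    while out and out[-1] == "":
        out.pop()
        blanks += 1
    while depth > level:
        depth -= 1
        out.append("  " * depth + "}")
    out.extend([""] * blanks)
    return depth


def transpile(source):
    """Toy cspaced-to-C conversion for simple, well-formed input only."""
    out, depth = [], 0
    for raw in source.splitlines():
        text = raw.strip()
        if not text:
            out.append("")                       # blank lines pass through
            continue
        level = (len(raw) - len(raw.lstrip(" "))) // 2
        depth = close_blocks(out, depth, level)  # a dedent ends open blocks
        pad = "  " * level
        if text.startswith(("#", "//")) or text.endswith(";"):
            out.append(pad + text)               # directives, comments, explicit ';'
        elif text.endswith(":"):
            out.append(pad + text[:-1] + " {")   # block opener
            depth = level + 1
        else:
            out.append(pad + text + ";")         # inferred semicolon
    close_blocks(out, depth, 0)                  # close blocks open at end of file
    return "\n".join(out)


if __name__ == "__main__":
    print(transpile("int main(void):\n  return 0\n"))
```

Calling `transpile` on a small function body yields the same statements wrapped in braces, with one closing brace emitted for every block that a dedent ends.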

Lexical Structure

Character Set

  • UTF-8 encoded source files
  • Whitespace: space (U+0020), tab (U+0009), newline (U+000A)
  • Significant whitespace: indentation level determines block structure

Tokens

Keywords (reserved words)

auto        break       case        char        const
continue    default     do          double      else
enum        extern      float       for         goto
if          inline      int         long        register
restrict    return      short       signed      sizeof
static      struct      switch      typedef     union
unsigned    void        volatile    while       _Alignas
_Alignof    _Atomic     _Bool       _Complex    _Generic
_Imaginary  _Noreturn   _Static_assert _Thread_local

Operators

+    -    *    /    %    ++   --
&    |    ^    ~    <<   >>
&&   ||   !
==   !=   <    >    <=   >=   ?    :
=    +=   -=   *=   /=   %=   &=   |=
^=   <<=  >>=
.    ->

Delimiters

(    )    [    ]    {    }    ;    ,
.    ->   :    ::

Literals

integer-literal: decimal | octal | hexadecimal | binary
floating-literal: decimal-floating | hexadecimal-floating
character-literal: 'c-char-sequence'
string-literal: "s-char-sequence"

Tokenization Rules

  1. Indentation: Leading spaces at line start determine indent level
  2. Newline Handling: Newlines terminate statements unless in continuations
  3. Semicolon Inference: Added automatically unless explicitly present
  4. Comment Handling: // and /* */ comments preserved

Significant Indentation

indent-level ::= (space space)*  // exactly 2 spaces per level
block ::= ':' newline indent statements dedent

// Examples:
if (condition):     // level 0
  statement1        // level 1
  if (nested):      // level 1
    statement2      // level 2
  statement3        // level 1

function(args):     // level 0
  statement         // level 1
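The `indent-level` rule can be sketched as a small Python helper. The function name `indent_level` and the decision to reject tabs in indentation are assumptions made for illustration; the specification itself only states that each level is exactly 2 spaces.

```python
INDENT_WIDTH = 2  # "exactly 2 spaces per level"


def indent_level(line):
    """Return the nesting level of a line, or raise ValueError if malformed."""
    stripped = line.lstrip(" ")
    spaces = len(line) - len(stripped)
    if stripped.startswith("\t"):
        # Assumption: a tab inside the indentation is treated as an error.
        raise ValueError("tab in indentation")
    if spaces % INDENT_WIDTH:
        raise ValueError(
            f"{spaces} leading spaces is not a multiple of {INDENT_WIDTH}"
        )
    return spaces // INDENT_WIDTH
```

For example, `indent_level("    statement2")` returns 2, while a line with three leading spaces raises `ValueError`.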

Syntax Grammar

Top-Level Rules

translation-unit ::= external-declaration*

external-declaration ::= function-definition
                        | declaration
                        | ';'

Declarations

declaration ::= declaration-specifiers init-declarator-list? ';'

declaration-specifiers ::= storage-class-specifier declaration-specifiers?
                          | type-specifier declaration-specifiers?
                          | type-qualifier declaration-specifiers?
                          | function-specifier declaration-specifiers?
                          | alignment-specifier declaration-specifiers?

init-declarator-list ::= init-declarator (',' init-declarator)*

init-declarator ::= declarator ('=' initializer)?

declarator ::= pointer? direct-declarator

direct-declarator ::= identifier
                     | '(' declarator ')'
                     | direct-declarator '[' type-qualifier-list? assignment-expression? ']'
                     | direct-declarator '[' 'static' type-qualifier-list? assignment-expression ']'
                     | direct-declarator '[' type-qualifier-list 'static' assignment-expression ']'
                     | direct-declarator '[' type-qualifier-list? '*' ']'
                     | direct-declarator '(' parameter-type-list ')'
                     | direct-declarator '(' identifier-list? ')'

pointer ::= ('*' type-qualifier-list?)+

type-qualifier-list ::= type-qualifier+

parameter-type-list ::= parameter-list (',' '...')?

parameter-list ::= parameter-declaration (',' parameter-declaration)*

parameter-declaration ::= declaration-specifiers declarator
                         | declaration-specifiers abstract-declarator?

type-name ::= specifier-qualifier-list abstract-declarator?

specifier-qualifier-list ::= (type-specifier | type-qualifier)+

abstract-declarator ::= pointer
                       | pointer? direct-abstract-declarator

direct-abstract-declarator ::= '(' abstract-declarator ')'
                              | direct-abstract-declarator? '[' type-qualifier-list? assignment-expression? ']'
                              | direct-abstract-declarator? '[' 'static' type-qualifier-list? assignment-expression ']'
                              | direct-abstract-declarator? '[' type-qualifier-list 'static' assignment-expression ']'
                              | direct-abstract-declarator? '[' '*' ']'
                              | direct-abstract-declarator? '(' parameter-type-list? ')'

initializer ::= assignment-expression
                | '{' initializer-list ','? '}'

initializer-list ::= designation? initializer (',' designation? initializer)*

designation ::= designator+

designator ::= '[' constant-expression ']'
               | '.' identifier

// Storage class specifiers
storage-class-specifier ::= 'auto' | 'register' | 'static' | 'extern' | 'typedef' | '_Thread_local'

// Type specifiers
type-specifier ::= 'void' | 'char' | 'short' | 'int' | 'long' | 'float' | 'double'
                   | 'signed' | 'unsigned' | '_Bool' | '_Complex' | '_Imaginary'
                   | 'struct' struct-specifier | 'union' union-specifier | 'enum' enum-specifier
                   | typedef-name

struct-specifier ::= identifier? '{' struct-declaration+ '}'
                    | identifier

union-specifier ::= identifier? '{' struct-declaration+ '}'
                   | identifier

struct-declaration ::= specifier-qualifier-list struct-declarator-list ';'

struct-declarator-list ::= struct-declarator (',' struct-declarator)*

struct-declarator ::= declarator? ':' constant-expression
                     | declarator

enum-specifier ::= identifier? '{' enumerator-list ','? '}'
                  | identifier

enumerator-list ::= enumerator (',' enumerator)*

enumerator ::= enumeration-constant ('=' constant-expression)?

enumeration-constant ::= identifier

// Type qualifiers
type-qualifier ::= 'const' | 'restrict' | 'volatile' | '_Atomic'

// Function specifiers
function-specifier ::= 'inline' | '_Noreturn'

// Alignment specifier
alignment-specifier ::= '_Alignas' '(' type-name ')'
                       | '_Alignas' '(' constant-expression ')'

Function Definitions

function-definition ::= declaration-specifiers declarator declaration-list? compound-statement

declaration-list ::= declaration+

// cspaced specific: compound-statement uses indentation
compound-statement ::= ':' newline indent statement-list dedent

// As in C, a block may mix declarations and statements
statement-list ::= (declaration | statement)*

// In traditional C: compound-statement ::= '{' block-item-list? '}'
// block-item-list ::= block-item+
// block-item ::= declaration | statement

Statements

statement ::= labeled-statement
             | compound-statement    // indentation-based blocks
             | expression-statement
             | selection-statement
             | iteration-statement
             | jump-statement

labeled-statement ::= identifier ':' statement
                     | 'case' constant-expression ':' statement
                     | 'default' ':' statement

// cspaced extends selection with indentation
selection-statement ::= 'if' '(' expression ')' compound-statement ('else' compound-statement)?
                       | 'switch' '(' expression ')' compound-statement

// cspaced extends iteration with indentation
iteration-statement ::= 'while' '(' expression ')' compound-statement
                       | 'do' compound-statement 'while' '(' expression ')' ';'
                       | 'for' '(' for-init ';' expression? ';' expression? ')' compound-statement

for-init ::= declaration
            | expression?

// cspaced: the trailing ';' required by traditional C is optional
jump-statement ::= 'goto' identifier ';'?
                  | 'continue' ';'?
                  | 'break' ';'?
                  | 'return' expression? ';'?

Expressions

expression ::= assignment-expression (',' assignment-expression)*

assignment-expression ::= conditional-expression
                          | unary-expression assignment-operator assignment-expression

assignment-operator ::= '=' | '*=' | '/=' | '%=' | '+=' | '-=' | '<<=' | '>>=' | '&=' | '^=' | '|='

conditional-expression ::= logical-or-expression ('?' expression ':' conditional-expression)?

logical-or-expression ::= logical-and-expression ('||' logical-and-expression)*

logical-and-expression ::= inclusive-or-expression ('&&' inclusive-or-expression)*

inclusive-or-expression ::= exclusive-or-expression ('|' exclusive-or-expression)*

exclusive-or-expression ::= and-expression ('^' and-expression)*

and-expression ::= equality-expression ('&' equality-expression)*

equality-expression ::= relational-expression (('==' | '!=') relational-expression)*

relational-expression ::= shift-expression (('<' | '>' | '<=' | '>=') shift-expression)*

shift-expression ::= additive-expression (('<<' | '>>') additive-expression)*

additive-expression ::= multiplicative-expression (('+' | '-') multiplicative-expression)*

multiplicative-expression ::= cast-expression (('*' | '/' | '%') cast-expression)*

cast-expression ::= '(' type-name ')' cast-expression
                   | unary-expression

unary-expression ::= postfix-expression
                    | ('++' | '--') unary-expression
                    | unary-operator cast-expression
                    | 'sizeof' '(' type-name ')'
                    | 'sizeof' unary-expression
                    | '_Alignof' '(' type-name ')'

unary-operator ::= '&' | '*' | '+' | '-' | '~' | '!'

postfix-expression ::= primary-expression
                      | postfix-expression '[' expression ']'
                      | postfix-expression '(' argument-expression-list? ')'
                      | postfix-expression '.' identifier
                      | postfix-expression '->' identifier
                      | postfix-expression ('++' | '--')
                      | '(' type-name ')' '{' initializer-list ','? '}'

primary-expression ::= identifier
                      | constant
                      | string-literal
                      | '(' expression ')'
                      | generic-selection

generic-selection ::= '_Generic' '(' assignment-expression ',' generic-assoc-list ')'

generic-assoc-list ::= generic-association (',' generic-association)*

generic-association ::= type-name ':' assignment-expression
                       | 'default' ':' assignment-expression

constant ::= integer-constant | floating-constant | enumeration-constant | character-constant

argument-expression-list ::= assignment-expression (',' assignment-expression)*

Indentation Rules

Block Structure

block ::= ':' newline indent statement* dedent

indent ::= exactly 2 spaces per nesting level
dedent ::= reduce indentation by multiples of 2 spaces

Examples

// Level 0
if (condition):
  // Level 1 (2 spaces)
  statement1
  if (nested):
    // Level 2 (4 spaces)
    statement2
  // Level 1 again
  statement3
// Level 0 again

Continuation Lines

statement ::= ... \
               continued-line

// Automatically handled for:
// - Function calls: func(arg1,
//                        arg2,
//                        arg3)
// - Binary ops: result = a + b +
//                        c + d
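One way to picture continuation handling is a pass that merges physical lines into logical lines before indentation is examined. The Python sketch below is an assumption about how this could work, not the actual cspaced algorithm: `join_continuations` and `TRAILING_OPERATORS` are hypothetical names, and brackets or operators inside string literals and comments are not treated specially.

```python
TRAILING_OPERATORS = ("+", "-", "*", "/", "%", "&", "|", "^", "=", "<", ">", ",", "?")


def join_continuations(lines):
    """Merge physical lines into logical lines."""
    logical, parts, depth = [], [], 0
    for line in lines:
        text = line.rstrip()
        if not parts and text.lstrip().startswith("#"):
            logical.append(text)      # preprocessor lines are left untouched
            continue
        explicit = text.endswith("\\")
        if explicit:
            text = text[:-1].rstrip() # drop the trailing backslash
        parts.append(text if not parts else text.strip())
        # Track unclosed brackets across lines.
        depth += sum(text.count(c) for c in "([{")
        depth -= sum(text.count(c) for c in ")]}")
        # A trailing binary operator continues the statement; i++ and i-- do not.
        dangling = text.endswith(TRAILING_OPERATORS) and not text.endswith(("++", "--"))
        if explicit or depth > 0 or dangling:
            continue                  # the statement is not finished yet
        logical.append(" ".join(parts))
        parts, depth = [], 0
    if parts:
        logical.append(" ".join(parts))
    return logical
```

The first physical line keeps its indentation, so the logical line still carries the nesting level of the statement it belongs to.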

Preprocessor Support

cspaced supports all standard C preprocessor directives:

preprocessing-file ::= group?

group ::= group-part+

group-part ::= if-section
              | control-line
              | text-line
              | '# non-directive'

if-section ::= if-group elif-groups? else-group? endif-line

if-group ::= '# if' constant-expression newline group?
            | '# ifdef' identifier newline group?
            | '# ifndef' identifier newline group?

elif-groups ::= elif-group*

elif-group ::= '# elif' constant-expression newline group?

else-group ::= '# else' newline group?

endif-line ::= '# endif' newline

control-line ::= '# include' pp-tokens newline
                | '# define' identifier replacement-list newline
                | '# define' identifier '(' identifier-list? ')' replacement-list newline
                | '# define' identifier '(' '...' ')' replacement-list newline
                | '# define' identifier '(' identifier-list ',' '...' ')' replacement-list newline
                | '# undef' identifier newline
                | '# line' pp-tokens newline
                | '# error' pp-tokens? newline
                | '# pragma' pp-tokens? newline
                | '# ' pp-tokens? newline

text-line ::= pp-tokens? newline

replacement-list ::= pp-tokens?

pp-tokens ::= preprocessing-token*

preprocessing-token ::= header-name
                       | identifier
                       | pp-number
                       | character-constant
                       | string-literal
                       | punctuator
                       | 'each non-white-space character'

Tree-Sitter Grammar Notes

Grammar Structure

// grammar.js structure for tree-sitter-cspaced
module.exports = grammar({
  name: 'cspaced',

  rules: {
    // Top level
    translation_unit: $ => repeat($.external_declaration),

    // External declarations
    external_declaration: $ => choice(
      $.function_definition,
      $.declaration,
      $.preproc_directive
    ),

    // Function definitions with indented bodies
    function_definition: $ => seq(
      $.declaration_specifiers,
      $.declarator,
      $.compound_statement
    ),

    // Indented compound statements (key innovation)
    compound_statement: $ => seq(
      ':',
      $._indent,
      repeat($.statement),
      $._dedent
    ),

    // Significant indentation: $._indent and $._dedent are not defined here.
    // They come from the external scanner (see `externals` below), because
    // tracking nesting levels needs a stack, which a regex token cannot express.

    // Statement rules with optional semicolons
    statement: $ => choice(
      $.if_statement,
      $.while_statement,
      $.for_statement,
      $.return_statement,
      $.expression_statement
    ),

    // Selection statements with indented blocks
    if_statement: $ => seq(
      'if',
      $.parenthesized_expression,
      $.compound_statement,
      optional(seq('else', $.compound_statement))
    ),

    // Expression statements (optional semicolon)
    expression_statement: $ => seq(
      $.expression,
      optional(';')
    ),

    // ... rest of rules
  },

  // Keyword extraction: keywords are first lexed as identifiers
  word: $ => $.identifier,

  // Externals for indentation tracking
  externals: $ => [
    $._indent,
    $._dedent,
    $._newline
  ],

  // Conflict resolution for optional semicolons
  conflicts: $ => [
    [$.expression_statement, $.declaration]
  ],

  // Precedence and associativity
  precedences: $ => [
    // ... operator precedence rules
  ]
});

Parser Implementation Notes

Indentation Tracking

  • The lexer must track indentation levels in a stack
  • INDENT/DEDENT tokens are generated when the level changes
  • Indentation must be consistent within a file (2 spaces per level)
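The stack-based tracking described above can be sketched in Python as a generator that turns lines into a stream of INDENT, DEDENT, and LINE tokens. The token names follow the bullets; the function name `indentation_tokens`, the tuple layout, and the use of `SyntaxError` are assumptions for illustration. The sketch does not check that an INDENT follows a line ending in `:`.

```python
def indentation_tokens(lines):
    """Yield ("INDENT" | "DEDENT", line_number) and ("LINE", text) pairs."""
    stack = [0]  # indentation widths of the currently open blocks
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines never change the block structure
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            if width != stack[-1] + 2:
                raise SyntaxError(
                    f"line {number}: expected {stack[-1] + 2} spaces, got {width}"
                )
            stack.append(width)
            yield ("INDENT", number)
        while width < stack[-1]:
            stack.pop()
            yield ("DEDENT", number)
        if width != stack[-1]:
            raise SyntaxError(f"line {number}: inconsistent dedent to {width} spaces")
        yield ("LINE", line.strip())
    while len(stack) > 1:  # close blocks still open at end of input
        stack.pop()
        yield ("DEDENT", len(lines))
```

Keeping widths on a stack, rather than a single counter, lets a dedent that crosses several levels emit one DEDENT per closed block.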

Semicolon Inference

  • Statements ending at newline get implicit ;
  • Complex expressions may require explicit ;
  • Context-aware insertion rules
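A minimal Python sketch of these rules follows. It is an assumption about what "context-aware" could mean, not the documented behaviour: the helper names are hypothetical, the input is expected to be one logical line, and `/* */` comments are not handled. The semicolon is placed before any trailing `//` comment so the comment does not swallow it.

```python
def split_trailing_comment(line):
    """Split a line into (code, comment), ignoring '//' inside literals."""
    quote = None
    i = 0
    while i < len(line):
        ch = line[i]
        if quote:
            if ch == "\\":
                i += 1                # skip the escaped character
            elif ch == quote:
                quote = None          # end of the string or char literal
        elif ch in "\"'":
            quote = ch
        elif line.startswith("//", i):
            return line[:i].rstrip(), line[i:]
        i += 1
    return line.rstrip(), ""


def infer_semicolon(line):
    """Append ';' to a logical line unless the context forbids it."""
    code, comment = split_trailing_comment(line)
    text = code.strip()
    # No semicolon for blank or comment-only lines, preprocessor directives,
    # block openers and labels (trailing ':'), or lines that already have one.
    if not text or text.startswith("#") or text.endswith((":", ";")):
        return line
    return code + ";" + ("  " + comment if comment else "")
```

For example, `infer_semicolon("  return 0")` gives `"  return 0;"`, while `"if (n <= 1):"` and `"#include <stdio.h>"` come back unchanged.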

Error Recovery

  • Skip to next consistent indentation level
  • Report indentation mismatches as syntax errors
  • Allow recovery from malformed blocks
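The recovery strategy above might look like the following Python sketch. The policy details are assumptions: a line counts as malformed when its indentation is odd or jumps more than one level, and scanning resumes at the first later line whose indentation is valid and no deeper than the last accepted line. `parse_with_recovery` is a hypothetical name.

```python
def parse_with_recovery(lines):
    """Return (accepted, errors): kept (level, text) pairs and diagnostics."""
    accepted, errors = [], []
    level = 0          # level of the last accepted line
    skipping = False   # True while inside a malformed block
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        width = len(line) - len(line.lstrip(" "))
        new_level, odd = divmod(width, 2)
        if skipping:
            if odd or new_level > level:
                continue              # still inside the malformed block
            skipping = False          # resynchronized at a consistent level
        elif odd or new_level > level + 1:
            errors.append(f"line {number}: unexpected indentation of {width} spaces")
            skipping = True
            continue
        level = new_level
        accepted.append((level, line.strip()))
    return accepted, errors
```

Collecting diagnostics instead of raising on the first one lets a tool such as a language server report several indentation problems in a single pass.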

Platform-Specific Considerations

Compiler Integration

  • Detects tcc, gcc, clang automatically
  • Passes through compiler flags
  • Generates intermediate .c files
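Compiler detection and flag pass-through could be sketched in Python as below. The search order simply follows the order in the list above and is an assumption, as are the names `find_compiler` and `build_command`; the only compiler option used is `-o`, which tcc, gcc, and clang all accept.

```python
import shutil

# Assumption: candidates are tried in the order the list above names them.
CANDIDATES = ("tcc", "gcc", "clang")


def find_compiler(which=shutil.which):
    """Return the path of the first available compiler, or None."""
    for name in CANDIDATES:
        path = which(name)
        if path:
            return path
    return None


def build_command(compiler, c_file, output, extra_flags=()):
    """Build the compile command, passing user flags through unchanged."""
    return [compiler, *extra_flags, c_file, "-o", output]
```

The `which` parameter exists only so the lookup can be replaced in tests; by default it is `shutil.which`, which searches `PATH`.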

File Extensions

  • .csp for cspaced source files
  • Auto-generates .c files
  • Executables named without extensions
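The naming convention can be expressed as a small Python helper. Placing the generated files next to the source file is an assumption, and `output_paths` is a hypothetical name.

```python
from pathlib import Path


def output_paths(source):
    """Map a .csp source path to (intermediate .c path, executable path)."""
    src = Path(source)
    if src.suffix != ".csp":
        raise ValueError(f"expected a .csp file, got {src.name}")
    # Assumption: both outputs are written alongside the source file.
    return src.with_suffix(".c"), src.with_suffix("")
```

For example, `output_paths("demo/hello.csp")` returns the paths `demo/hello.c` and `demo/hello`.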

Future Extensions

Planned Features

  • async/await syntax for coroutines
  • Pattern matching for switch expressions
  • Type inference for variable declarations
  • Module system beyond #include

Experimental Features

  • Named parameters in function calls
  • String interpolation
  • Range-based for loops (for x in array)

This specification provides a complete foundation for implementing cspaced parsers, syntax highlighters, formatters, and language servers.