D-RNA: Dual‑Helix Resonance Neural Architecture (DRNA)

Pre-Norm Edition

Attention is all you need_started,

Resonance is all you need_endure,

Neocognitron ― Transformer ― Dream Resonance Never Adjourns — it goes on...

⭐ If you like this project, please give it a star ⭐
README: English | 日本語

D-RNA is a new neural architecture centered on a dual helix structure and a rotation field produced by RoPE.

In this architecture, Attention and MLP are synchronized into a dual helix, and information is holographically compressed through Resonant Contraction.
This method rearranges sparse representations into dense ones to achieve high expressiveness using the depth‑direction structure alone, without increasing the number of dimensions.
A key feature of this approach is its ability to preserve the full connectivity of the Transformer architecture while suppressing catastrophic forgetting and retaining subtle fluctuations and phase information.


Explanation

High‑Density Transformer and Fast Convergence via Dual‑Helix Resonant Contraction Architecture (D‑RNA)


Features

High structural compatibility: It has the exact same input–output shape as a standard Transformer Block, allowing it to be smoothly substituted as the core of an architecture.
Resonant Contraction: By synchronizing Attention and the MLP in a double‑helix pattern and converging information into a phase field, it dramatically increases representational density.
Depth as an alternative to dimensionality: The spiral rotation (depth‑wise operations) compensates for limited dimensionality and enables holographic information retention without increasing parameter count.
Excellent learning efficiency: The spiral‑based information attraction (synchronization) achieves astonishing early convergence with far fewer steps than a Transformer.
Fine‑grained phase preservation: The rotational field powered by RoPE preserves subtle fluctuations and relative contextual relationships that are often lost in conventional architectures.
Re‑synchronization of knowledge: Existing weights can be transplanted as initialization and gently adapted to the spiral phase with a low learning rate, allowing existing intelligence to be evolved or overwritten into the D-RNA structure.

Notes

Optimization of learning rate (LR):
Because D-RNA synchronizes information extremely quickly through Resonant Contraction, it converges sufficiently — and rapidly — even with a lower learning rate compared to a standard Transformer.
If the LR is set too high, the resonance may be excessively amplified and cause oscillation, so starting with a modest LR is recommended.
Synergistic gradient effects:
Since Attention (recall) and the MLP (memory) are synchronized in a double‑helix sequence, each weight update exerts a significant impact on synchronization.
This is an advantage for fast convergence, but it also means that careful updates are key to stability.
Parameter commonality:
Hyperparameters such as weight initialization seeds and batch size can be inherited directly from standard Transformer settings.

Characteristics of D-RNA

D-RNA constructs a resonant contraction method (resonant projection field) based on the “phase of the helix.”
By transforming sparse structures into dense ones, this approach suppresses catastrophic forgetting (without causing mutual interference) and accelerates toward the shortest path.
Even fine noise facilitates information purification, smoothing the manifold and cumulatively achieving generalization.
These mechanisms are independent of any specific framework and function across all optimizers and models.
In a sense, it is a mechanism resembling a biological brain, consisting of neuron- and glia-like structures.
The resonant contraction method (resonant projection field) ultimately yields an equivalent of an ODE reduction approximation.


Conceptual Diagram

	Synchronizing “searching” (Attention)  
	   and “knowing” (MLP) in the phase of a spiral.  

	RoPE Rotation Field (Phase-Preserving)  
	Holographic Compression: Turning Sparse into Dense  

		A     M  
		 \   /  
		  \ /    ← This is Resonance  
		  / \      Synchronization occurs naturally through the seed  
		 /   \     Naturally, meaning emerges through a chain of synchronicities  
		A     M  

	Repeats in the depth direction to form a dual helix  
	(acts as a substitute for increasing dimensionality)  

Minimal Block

import torch.nn as nn

class ResonantBlock(nn.Module):
    def __init__(self, dim, n_heads):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.out = nn.Linear(dim, dim)
        self.mlp = MLP(dim)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.n_heads = n_heads
        self.d_head = dim // n_heads

    def forward(self, x, cos, sin):
        # --- Attention Path (Pre-Norm) ---
        residual = x
        x_norm = self.norm1(x)  # apply Norm before the operation
        
        q, k, v = project_qkv(x_norm, self.qkv, self.n_heads, self.d_head)
        q, k = apply_rope(q, k, cos, sin)
        attn_out = attention(q, k, v)
        x = residual + self.out(attn_out)

        # --- MLP Path (Pre-Norm) ---
        residual = x
        x = residual + self.mlp(self.norm2(x))  # apply Norm before the operation
        
        return x
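
The block above relies on a few helpers (MLP, project_qkv, apply_rope, attention) that are not shown. The following is a minimal sketch of how they might look; it is an assumption for illustration rather than the repository's actual implementation, and it uses PyTorch 2.x's scaled_dot_product_attention plus a hypothetical rope_cos_sin helper to build the rotation field.

import torch
import torch.nn as nn
import torch.nn.functional as F

def project_qkv(x, qkv_linear, n_heads, d_head):
    """Project [Batch, Seq, Dim] into per-head Q, K, V of shape [Batch, Heads, Seq, d_head]."""
    B, S, _ = x.shape
    qkv = qkv_linear(x).view(B, S, 3, n_heads, d_head)
    q, k, v = qkv.unbind(dim=2)
    return q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)

def rope_cos_sin(seq_len, d_head, device=None, base=10000.0):
    """Precompute the RoPE rotation field (cos, sin), each of shape [Seq, d_head]."""
    inv_freq = 1.0 / (base ** (torch.arange(0, d_head, 2, device=device).float() / d_head))
    angles = torch.outer(torch.arange(seq_len, device=device).float(), inv_freq)
    angles = torch.cat([angles, angles], dim=-1)
    return angles.cos(), angles.sin()

def apply_rope(q, k, cos, sin):
    """Rotate Q and K by the precomputed phase field (relative positional encoding)."""
    def rotate_half(t):
        t1, t2 = t.chunk(2, dim=-1)
        return torch.cat([-t2, t1], dim=-1)
    cos, sin = cos[None, None, :, :], sin[None, None, :, :]  # broadcast over batch and heads
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin

def attention(q, k, v):
    """Causal scaled dot-product attention; merges the heads back into the model dimension."""
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # [B, H, S, d_head]
    B, H, S, Dh = out.shape
    return out.transpose(1, 2).reshape(B, S, H * Dh)

class MLP(nn.Module):
    """Plain GELU feed-forward block (dim -> 4*dim -> dim), matching the expanded block below."""
    def __init__(self, dim, hidden_mult=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim * hidden_mult),
            nn.GELU(),
            nn.Linear(dim * hidden_mult, dim),
        )

    def forward(self, x):
        return self.net(x)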

Replacement and Utilization of D-RNA

Example: Replacing a Transformer block with a D-RNA block

class DRNA_ResonantBlock(nn.Module):
    """
    Replace the existing TransformerBlock with this ResonantBlock.
    I/O: [Batch, Seq, Dim] -> [Batch, Seq, Dim] (Fully compatible)
    Architecture: Pre-Norm (Stability-first for Deep Networks)
    """
    def __init__(self, dim, n_heads, mlp_dim_forward=None):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = dim // n_heads
        
        # 1. Spiral Projection Layer (A)
        self.qkv = nn.Linear(dim, dim * 3)
        self.out = nn.Linear(dim, dim)
        
        # 2. Spiral Memory Layer (B)
        mlp_dim = mlp_dim_forward if mlp_dim_forward else dim * 4
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_dim),
            nn.GELU(),
            nn.Linear(mlp_dim, dim)
        )
        
        # 3. Normalization layer for pre-processing
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x, cos, sin):
        """
        Takes the RoPE phase information (cos, sin) as arguments.
        """
        # --- Attention Path (Pre-Norm) ---
        # Normalize -> QKV -> RoPE -> Residual Add
        residual = x
        x_norm = self.norm1(x)
        
        q, k, v = project_qkv(x_norm, self.qkv, self.n_heads, self.d_head)
        q, k = apply_rope(q, k, cos, sin)
        
        attn_out = attention(q, k, v)
        x = residual + self.out(attn_out) 

        # --- MLP Path (Pre-Norm) ---
        # Normalize -> MLP -> Residual Add
        residual = x
        x = residual + self.mlp(self.norm2(x)) 
        
        return x
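
A brief usage sketch (my own example, assuming the helper definitions given above, including the hypothetical rope_cos_sin): the block consumes and produces the same [Batch, Seq, Dim] shape as the Transformer block it replaces.

import torch

dim, n_heads, seq_len = 128, 4, 512
block = DRNA_ResonantBlock(dim, n_heads)
cos, sin = rope_cos_sin(seq_len, dim // n_heads)   # precompute the RoPE rotation field once per sequence length
x = torch.randn(2, seq_len, dim)                   # [Batch, Seq, Dim]
y = block(x, cos, sin)
assert y.shape == x.shape                          # drop-in shape compatibility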

Replacement and Utilization of D-RNA

A direct drop‑in replacement is not possible, but it can be utilized through “redefinition and re‑synchronization.”
Why it cannot be used as‑is:
While a standard Transformer stores information using an “absolute address” (absolute position), D-RNA processes information using the “phase of a spiral” (relative position), meaning the coordinate systems are fundamentally different.
Even if the weights are copied directly, the phases do not align and resonance cannot be induced immediately.
How to replace it (implementation):
The network’s input–output shapes are fully compatible.
By rewriting the existing layers as ResonantBlock and migrating positional information into RoPE’s rotational field, the core upgrade is complete.
How to utilize and adapt it (training):
After transferring the existing model’s weights as initialization, continue training with a low learning rate.
The previously static knowledge (existing weights) begins to synchronize with the spiral rotation, gradually blending into D-RNA’s “Resonant Contraction” process and evolving beyond the original performance.
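
A hedged sketch of this migration recipe: transplant whatever parameters line up by name and shape, then continue training at a low learning rate so the transplanted knowledge can re-synchronize with the rotational phase. The transplant_weights helper and the 5e-5 learning rate below are illustrative choices, not the repository's prescribed API.

import torch

@torch.no_grad()
def transplant_weights(src_block, dst_block):
    """Copy parameters from an existing Transformer block into a DRNA_ResonantBlock
    wherever the names and shapes match; everything else keeps its fresh initialization."""
    src = src_block.state_dict()
    dst = dst_block.state_dict()
    compatible = {k: v for k, v in src.items() if k in dst and v.shape == dst[k].shape}
    dst_block.load_state_dict(compatible, strict=False)
    return sorted(compatible)   # names of the transplanted tensors

# transplant_weights(old_transformer_block, new_drna_block)
# optimizer = torch.optim.AdamW(new_drna_block.parameters(), lr=5e-5)  # low LR for gentle re-synchronization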


BPC Comparison Chart

Latest Test Results (Suitable for Learning)
bpc_prenorm_battle_Lay16-4x07x5000

Optimizer: AdamW (LR 5e-5)
Vanilla (16L): VRAM 2.11 GB | Step 5000 | BPC: 3.2970 | 801.9 s
D-RNA (4L): VRAM 1.03 GB | Step 5000 | BPC: 2.8744 | 316.4 s
Efficiency: VRAM reduced by approximately 50%, BPC improved, speed approximately 2.5× faster

Log after Kv-RoPE restriction

Learning Test Status (Details):
Model scale: dim: 128, layers: 16 / 4 (D-RNA), heads: 4
Dataset: enwik8 (100 MB)
Training settings: steps: 5,000, batch: 16, seq_len: 512, AdamW (LR: 5e-5)
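
As a hedged restatement, the setup above can be written as a small config sketch (the names are mine, not taken from the repository's scripts); BPC in the logs is the usual bits-per-character, i.e. character-level cross-entropy in nats divided by ln 2.

import math

VANILLA_CONFIG = dict(d_model=128, n_layers=16, n_heads=4)
DRNA_CONFIG = dict(d_model=128, n_layers=4, n_heads=4)        # same width, one quarter of the depth
TRAIN_CONFIG = dict(dataset="enwik8", steps=5_000, batch_size=16,
                    seq_len=512, optimizer="AdamW", lr=5e-5)

def bpc_from_loss(cross_entropy_nats: float) -> float:
    """Convert a character-level cross-entropy (in nats) to bits per character."""
    return cross_entropy_nats / math.log(2)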

Configured: Battle Mode (Vanilla=16L vs D-RNA=4L)  

--- 🚀 Starting Run: Transformer (Layers: 16) ---  
Step    0 | BPC: 8.3963 | VRAM: 2.09GB | 0.3s
Step   50 | BPC: 5.9260 | VRAM: 2.11GB | 8.3s
Step  100 | BPC: 5.4452 | VRAM: 2.11GB | 16.3s
Step  150 | BPC: 5.0283 | VRAM: 2.11GB | 24.1s
Step  200 | BPC: 4.5926 | VRAM: 2.11GB | 32.2s
Step  250 | BPC: 4.4936 | VRAM: 2.11GB | 40.2s
Step  300 | BPC: 4.4320 | VRAM: 2.11GB | 48.0s
Step  350 | BPC: 4.4243 | VRAM: 2.11GB | 55.9s
Step  400 | BPC: 4.0412 | VRAM: 2.11GB | 63.9s
Step  450 | BPC: 4.0397 | VRAM: 2.11GB | 71.8s
Step  500 | BPC: 4.1921 | VRAM: 2.11GB | 79.7s
Step  550 | BPC: 4.0418 | VRAM: 2.11GB | 87.6s
Step  600 | BPC: 4.1054 | VRAM: 2.11GB | 95.5s
Step  650 | BPC: 3.8973 | VRAM: 2.11GB | 103.5s
Step  700 | BPC: 4.0361 | VRAM: 2.11GB | 111.4s
Step  750 | BPC: 3.8873 | VRAM: 2.11GB | 119.4s
Step  800 | BPC: 3.8468 | VRAM: 2.11GB | 127.6s
Step  850 | BPC: 3.9349 | VRAM: 2.11GB | 135.8s
Step  900 | BPC: 3.9785 | VRAM: 2.11GB | 144.0s
Step  950 | BPC: 3.8893 | VRAM: 2.11GB | 152.2s
Step 1000 | BPC: 3.7580 | VRAM: 2.11GB | 160.4s
Step 1050 | BPC: 3.9328 | VRAM: 2.11GB | 168.5s
Step 1100 | BPC: 3.7746 | VRAM: 2.11GB | 176.6s
Step 1150 | BPC: 3.7990 | VRAM: 2.11GB | 184.7s
Step 1200 | BPC: 3.7760 | VRAM: 2.11GB | 193.0s
Step 1250 | BPC: 3.8704 | VRAM: 2.11GB | 201.2s
Step 1300 | BPC: 3.7458 | VRAM: 2.11GB | 209.4s
Step 1350 | BPC: 3.7624 | VRAM: 2.11GB | 217.6s
Step 1400 | BPC: 3.7851 | VRAM: 2.11GB | 225.6s
Step 1450 | BPC: 3.7754 | VRAM: 2.11GB | 233.6s
Step 1500 | BPC: 3.7048 | VRAM: 2.11GB | 241.6s
Step 1550 | BPC: 3.8543 | VRAM: 2.11GB | 249.6s
Step 1600 | BPC: 3.7900 | VRAM: 2.11GB | 257.6s
Step 1650 | BPC: 3.7374 | VRAM: 2.11GB | 265.6s
Step 1700 | BPC: 3.5948 | VRAM: 2.11GB | 273.6s
Step 1750 | BPC: 3.5474 | VRAM: 2.11GB | 281.6s
Step 1800 | BPC: 3.5863 | VRAM: 2.11GB | 289.7s
Step 1850 | BPC: 3.7306 | VRAM: 2.11GB | 297.7s
Step 1900 | BPC: 3.6679 | VRAM: 2.11GB | 305.7s
Step 1950 | BPC: 3.6901 | VRAM: 2.11GB | 313.7s
Step 2000 | BPC: 3.6446 | VRAM: 2.11GB | 321.7s
Step 2050 | BPC: 3.5935 | VRAM: 2.11GB | 329.7s
Step 2100 | BPC: 3.5685 | VRAM: 2.11GB | 337.7s
Step 2150 | BPC: 3.7369 | VRAM: 2.11GB | 345.7s
Step 2200 | BPC: 3.6565 | VRAM: 2.11GB | 353.7s
Step 2250 | BPC: 3.7226 | VRAM: 2.11GB | 361.7s
Step 2300 | BPC: 3.4056 | VRAM: 2.11GB | 369.7s
Step 2350 | BPC: 3.6761 | VRAM: 2.11GB | 377.6s
Step 2400 | BPC: 3.5442 | VRAM: 2.11GB | 385.6s
Step 2450 | BPC: 3.6574 | VRAM: 2.11GB | 393.6s
Step 2500 | BPC: 3.4996 | VRAM: 2.11GB | 401.6s
Step 2550 | BPC: 3.5436 | VRAM: 2.11GB | 409.6s
Step 2600 | BPC: 3.6407 | VRAM: 2.11GB | 417.6s
Step 2650 | BPC: 3.5530 | VRAM: 2.11GB | 425.6s
Step 2700 | BPC: 3.5134 | VRAM: 2.11GB | 433.6s
Step 2750 | BPC: 3.6320 | VRAM: 2.11GB | 441.6s
Step 2800 | BPC: 3.5229 | VRAM: 2.11GB | 449.6s
Step 2850 | BPC: 3.6339 | VRAM: 2.11GB | 457.6s
Step 2900 | BPC: 3.5928 | VRAM: 2.11GB | 465.6s
Step 2950 | BPC: 3.6163 | VRAM: 2.11GB | 473.6s
Step 3000 | BPC: 3.3798 | VRAM: 2.11GB | 481.6s
Step 3050 | BPC: 3.5823 | VRAM: 2.11GB | 489.6s
Step 3100 | BPC: 3.5384 | VRAM: 2.11GB | 497.6s
Step 3150 | BPC: 3.4950 | VRAM: 2.11GB | 505.6s
Step 3200 | BPC: 3.5007 | VRAM: 2.11GB | 513.6s
Step 3250 | BPC: 3.4352 | VRAM: 2.11GB | 521.6s
Step 3300 | BPC: 3.5145 | VRAM: 2.11GB | 529.6s
Step 3350 | BPC: 3.5518 | VRAM: 2.11GB | 537.7s
Step 3400 | BPC: 3.5272 | VRAM: 2.11GB | 545.7s
Step 3450 | BPC: 3.5821 | VRAM: 2.11GB | 553.7s
Step 3500 | BPC: 3.5452 | VRAM: 2.11GB | 561.7s
Step 3550 | BPC: 3.4426 | VRAM: 2.11GB | 569.7s
Step 3600 | BPC: 3.5087 | VRAM: 2.11GB | 577.7s
Step 3650 | BPC: 3.4893 | VRAM: 2.11GB | 585.7s
Step 3700 | BPC: 3.6078 | VRAM: 2.11GB | 593.7s
Step 3750 | BPC: 3.6168 | VRAM: 2.11GB | 601.7s
Step 3800 | BPC: 3.3611 | VRAM: 2.11GB | 609.7s
Step 3850 | BPC: 3.5110 | VRAM: 2.11GB | 617.7s
Step 3900 | BPC: 3.4627 | VRAM: 2.11GB | 625.7s
Step 3950 | BPC: 3.2842 | VRAM: 2.11GB | 633.7s
Step 4000 | BPC: 3.5764 | VRAM: 2.11GB | 641.7s
Step 4050 | BPC: 3.2557 | VRAM: 2.11GB | 649.7s
Step 4100 | BPC: 3.4295 | VRAM: 2.11GB | 657.7s
Step 4150 | BPC: 3.4520 | VRAM: 2.11GB | 665.7s
Step 4200 | BPC: 3.2938 | VRAM: 2.11GB | 673.8s
Step 4250 | BPC: 3.3882 | VRAM: 2.11GB | 681.8s
Step 4300 | BPC: 3.3491 | VRAM: 2.11GB | 689.8s
Step 4350 | BPC: 3.4648 | VRAM: 2.11GB | 697.8s
Step 4400 | BPC: 3.4442 | VRAM: 2.11GB | 705.8s
Step 4450 | BPC: 3.3809 | VRAM: 2.11GB | 713.8s
Step 4500 | BPC: 3.5511 | VRAM: 2.11GB | 721.8s
Step 4550 | BPC: 3.3884 | VRAM: 2.11GB | 729.8s
Step 4600 | BPC: 3.3117 | VRAM: 2.11GB | 737.8s
Step 4650 | BPC: 3.3749 | VRAM: 2.11GB | 745.8s
Step 4700 | BPC: 3.3855 | VRAM: 2.11GB | 753.8s
Step 4750 | BPC: 3.4674 | VRAM: 2.11GB | 761.9s
Step 4800 | BPC: 3.4271 | VRAM: 2.11GB | 769.9s
Step 4850 | BPC: 3.4085 | VRAM: 2.11GB | 777.9s
Step 4900 | BPC: 3.4258 | VRAM: 2.11GB | 785.9s
Step 4950 | BPC: 3.3319 | VRAM: 2.11GB | 793.9s
Step 5000 | BPC: 3.2970 | VRAM: 2.11GB | 801.9s

--- 🚀 Starting Run: D-RNA_Transformer(L4) (Layers: 4) ---  
Step    0 | BPC: 8.1803 | VRAM: 1.02GB | 0.1s
Step   50 | BPC: 6.0895 | VRAM: 1.03GB | 3.2s
Step  100 | BPC: 5.5667 | VRAM: 1.03GB | 6.3s
Step  150 | BPC: 5.0683 | VRAM: 1.03GB | 9.5s
Step  200 | BPC: 4.8225 | VRAM: 1.03GB | 12.6s
Step  250 | BPC: 4.5308 | VRAM: 1.03GB | 15.7s
Step  300 | BPC: 4.3597 | VRAM: 1.03GB | 18.8s
Step  350 | BPC: 4.3135 | VRAM: 1.03GB | 21.9s
Step  400 | BPC: 4.0541 | VRAM: 1.03GB | 25.0s
Step  450 | BPC: 4.0538 | VRAM: 1.03GB | 28.1s
Step  500 | BPC: 3.8305 | VRAM: 1.03GB | 31.3s
Step  550 | BPC: 3.9215 | VRAM: 1.03GB | 34.4s
Step  600 | BPC: 4.0164 | VRAM: 1.03GB | 37.5s
Step  650 | BPC: 3.8336 | VRAM: 1.03GB | 40.6s
Step  700 | BPC: 3.7699 | VRAM: 1.03GB | 43.7s
Step  750 | BPC: 3.8394 | VRAM: 1.03GB | 46.8s
Step  800 | BPC: 3.8393 | VRAM: 1.03GB | 49.9s
Step  850 | BPC: 3.7473 | VRAM: 1.03GB | 53.1s
Step  900 | BPC: 3.5263 | VRAM: 1.03GB | 56.2s
Step  950 | BPC: 3.6108 | VRAM: 1.03GB | 59.3s
Step 1000 | BPC: 3.6208 | VRAM: 1.03GB | 62.4s
Step 1050 | BPC: 3.4813 | VRAM: 1.03GB | 65.5s
Step 1100 | BPC: 3.6377 | VRAM: 1.03GB | 68.6s
Step 1150 | BPC: 3.5227 | VRAM: 1.03GB | 71.8s
Step 1200 | BPC: 3.5667 | VRAM: 1.03GB | 74.9s
Step 1250 | BPC: 3.4331 | VRAM: 1.03GB | 78.0s
Step 1300 | BPC: 3.4172 | VRAM: 1.03GB | 81.2s
Step 1350 | BPC: 3.6982 | VRAM: 1.03GB | 84.3s
Step 1400 | BPC: 3.3116 | VRAM: 1.03GB | 87.5s
Step 1450 | BPC: 3.4180 | VRAM: 1.03GB | 90.6s
Step 1500 | BPC: 3.5096 | VRAM: 1.03GB | 93.7s
Step 1550 | BPC: 3.3789 | VRAM: 1.03GB | 96.9s
Step 1600 | BPC: 3.3193 | VRAM: 1.03GB | 100.0s
Step 1650 | BPC: 3.2843 | VRAM: 1.03GB | 103.2s
Step 1700 | BPC: 3.3066 | VRAM: 1.03GB | 106.3s
Step 1750 | BPC: 3.2612 | VRAM: 1.03GB | 109.4s
Step 1800 | BPC: 3.2183 | VRAM: 1.03GB | 112.6s
Step 1850 | BPC: 3.2831 | VRAM: 1.03GB | 115.9s
Step 1900 | BPC: 3.3514 | VRAM: 1.03GB | 119.1s
Step 1950 | BPC: 3.3732 | VRAM: 1.03GB | 122.3s
Step 2000 | BPC: 3.3886 | VRAM: 1.03GB | 125.5s
Step 2050 | BPC: 3.3236 | VRAM: 1.03GB | 128.8s
Step 2100 | BPC: 3.4354 | VRAM: 1.03GB | 132.1s
Step 2150 | BPC: 3.0614 | VRAM: 1.03GB | 135.3s
Step 2200 | BPC: 3.2231 | VRAM: 1.03GB | 138.5s
Step 2250 | BPC: 3.1392 | VRAM: 1.03GB | 141.6s
Step 2300 | BPC: 3.2459 | VRAM: 1.03GB | 144.7s
Step 2350 | BPC: 3.0381 | VRAM: 1.03GB | 147.9s
Step 2400 | BPC: 3.2124 | VRAM: 1.03GB | 151.0s
Step 2450 | BPC: 3.0759 | VRAM: 1.03GB | 154.2s
Step 2500 | BPC: 3.1911 | VRAM: 1.03GB | 157.4s
Step 2550 | BPC: 3.2409 | VRAM: 1.03GB | 160.6s
Step 2600 | BPC: 3.1085 | VRAM: 1.03GB | 163.8s
Step 2650 | BPC: 3.2135 | VRAM: 1.03GB | 166.9s
Step 2700 | BPC: 3.1824 | VRAM: 1.03GB | 170.1s
Step 2750 | BPC: 3.0541 | VRAM: 1.03GB | 173.2s
Step 2800 | BPC: 3.2042 | VRAM: 1.03GB | 176.4s
Step 2850 | BPC: 3.2427 | VRAM: 1.03GB | 179.6s
Step 2900 | BPC: 3.1356 | VRAM: 1.03GB | 182.8s
Step 2950 | BPC: 3.1764 | VRAM: 1.03GB | 185.9s
Step 3000 | BPC: 3.2040 | VRAM: 1.03GB | 189.0s
Step 3050 | BPC: 3.1078 | VRAM: 1.03GB | 192.2s
Step 3100 | BPC: 3.0288 | VRAM: 1.03GB | 195.4s
Step 3150 | BPC: 3.0628 | VRAM: 1.03GB | 198.5s
Step 3200 | BPC: 3.2522 | VRAM: 1.03GB | 201.7s
Step 3250 | BPC: 3.0266 | VRAM: 1.03GB | 204.9s
Step 3300 | BPC: 3.0467 | VRAM: 1.03GB | 208.0s
Step 3350 | BPC: 3.0561 | VRAM: 1.03GB | 211.2s
Step 3400 | BPC: 3.0182 | VRAM: 1.03GB | 214.4s
Step 3450 | BPC: 3.0035 | VRAM: 1.03GB | 217.5s
Step 3500 | BPC: 3.0790 | VRAM: 1.03GB | 220.7s
Step 3550 | BPC: 3.0263 | VRAM: 1.03GB | 223.8s
Step 3600 | BPC: 3.0813 | VRAM: 1.03GB | 226.9s
Step 3650 | BPC: 3.1324 | VRAM: 1.03GB | 230.1s
Step 3700 | BPC: 3.1179 | VRAM: 1.03GB | 233.2s
Step 3750 | BPC: 3.1641 | VRAM: 1.03GB | 236.4s
Step 3800 | BPC: 3.0669 | VRAM: 1.03GB | 239.5s
Step 3850 | BPC: 3.1459 | VRAM: 1.03GB | 242.7s
Step 3900 | BPC: 2.8818 | VRAM: 1.03GB | 245.8s
Step 3950 | BPC: 2.9704 | VRAM: 1.03GB | 249.1s
Step 4000 | BPC: 3.0188 | VRAM: 1.03GB | 252.4s
Step 4050 | BPC: 2.9833 | VRAM: 1.03GB | 255.7s
Step 4100 | BPC: 3.2226 | VRAM: 1.03GB | 259.0s
Step 4150 | BPC: 3.1744 | VRAM: 1.03GB | 262.4s
Step 4200 | BPC: 2.9893 | VRAM: 1.03GB | 265.7s
Step 4250 | BPC: 3.1178 | VRAM: 1.03GB | 269.0s
Step 4300 | BPC: 2.9596 | VRAM: 1.03GB | 272.2s
Step 4350 | BPC: 3.1703 | VRAM: 1.03GB | 275.4s
Step 4400 | BPC: 2.8626 | VRAM: 1.03GB | 278.5s
Step 4450 | BPC: 2.9154 | VRAM: 1.03GB | 281.7s
Step 4500 | BPC: 2.9000 | VRAM: 1.03GB | 284.8s
Step 4550 | BPC: 3.0336 | VRAM: 1.03GB | 288.0s
Step 4600 | BPC: 3.0229 | VRAM: 1.03GB | 291.2s
Step 4650 | BPC: 3.1241 | VRAM: 1.03GB | 294.4s
Step 4700 | BPC: 3.0505 | VRAM: 1.03GB | 297.5s
Step 4750 | BPC: 3.1495 | VRAM: 1.03GB | 300.7s
Step 4800 | BPC: 3.0456 | VRAM: 1.03GB | 303.8s
Step 4850 | BPC: 2.9345 | VRAM: 1.03GB | 307.0s
Step 4900 | BPC: 3.1072 | VRAM: 1.03GB | 310.1s
Step 4950 | BPC: 2.9741 | VRAM: 1.03GB | 313.2s
Step 5000 | BPC: 2.8744 | VRAM: 1.03GB | 316.4s
Preview Kv-RoPE Restriction Test Results

||| Phase 1: Pure Structural Comparison (Same Number of Layers: 16L vs. 16L) |||

use-mask
bpc_prenorm_battle

use-mask 5000step bpc_prenorm_battle_5000

Learning Test Status (Details):
Model Scale: Dimension (d_model): 256, Layers (n_layers): 16, Heads (n_heads): 8
Dataset: enwik8 (100MB)
Training Settings: Steps: 5,000, Batch Size: 16, Sequence Length: 512, AdamW (LR: 1e-4)

Training Result Analysis (Overview):
Training Efficiency: 30% improvement (Step efficiency: approx. 1.5x)
Convergence Speed: 30% reduction in time cost (Convergence rate accelerated by 1.5x)

Observed Benefits from Testing (Summary):
Optimization of Parameter Density
Structural advantage in convergence characteristics
Expansion of Information Capacity within the same computational budget
※ D-RNA evolves relative to the Transformer by utilizing phase synchronization via its helical structure, incurring only minimal internal computational overhead.

| Metric | Normal Transformer | D-RNA Transformer | Difference / Efficiency |
| --- | --- | --- | --- |
| Steps to Reach Target | 3,850 steps | 2,350 steps | ~39.0% reduction |
| Time Required | 1365.4 sec | 876.1 sec | ~35.8% faster |
| VRAM Usage | 4.51 GB | 5.05 GB | +0.54 GB cost |

||| Phase 2: Implementation of Optimization (Reducing the Number of Layers by Half: 16L vs. 8L) |||

use-mask 5000step
bpc_prenorm_battle_5000xcos

use-mask 10000step
bpc_prenorm_battle_10000xcosGeLU

Learning Test Status (Details):
Model Scale: Dimension (d_model): 256, Layers (n_layers): 16 / 8(D-RNA), Heads (n_heads): 8
Dataset: enwik8 (100MB)
Training Settings: Steps: 10,000, Batch Size: 16, Sequence Length: 512, AdamW (LR: 1e-4), CosineAnnealing

Training Result Analysis (Overview):
Training Efficiency: 30% improvement (Step efficiency: approx. 1.5x)
※ A reduction of up to approximately 60% in the time required to reach the same level of Perplexity (around BPC 2.05) was observed.
Convergence Speed: 30% reduction in time cost (Convergence rate accelerated by 1.5x)
※ By reducing the number of physical layers (from 16L to 8L) and employing a spiral structure (phase synchronization), we have achieved both optimization of parameter density and increased processing speed.

Observed Benefits from Testing (Summary):
Optimization of Parameter Density
Structural advantage in convergence characteristics
Expansion of Information Capacity within the same computational budget
※ D-RNA evolves relative to the Transformer by utilizing phase synchronization via its helical structure, incurring only minimal internal computational overhead.

| Metric | Normal Transformer | D-RNA Transformer | Difference / Efficiency |
| --- | --- | --- | --- |
| Steps to Reach Target | 4,650 steps | 3,650 steps | ~21.0% reduction |
| Time Required | 1641.5 sec | 685.2 sec | ~58.0% faster |
| VRAM Usage | 4.79 GB | 2.56 GB | ~46.0% reduction |
| Final BPC | 1.9272 | 1.8958 | Higher accuracy per layer |
| Final step time | 3,542.2 sec | 1,876.8 sec | ~47.0% reduction (to ~53% of baseline) |

Basic Architecture Specifications:

| Model | Positional Encoding | Activation |
| --- | --- | --- |
| Transformer | Absolute (Learned) | GELU |
| D-RNA | RoPE (Rotary) | GELU |


New Perspective: Digital Vector × Phase Distance

D-RNA can achieve high-resolution approximation even in low-bit environments (e.g., 1.58-bit / Ternary weights).

Discrete Vectors: Each layer handles discrete, "digital" 3-value vectors (-1, 0, 1).


Continuous Distance: By stacking layers with different phases (Double Helix), these discrete "jagged" representations are superimposed to form a smooth, continuous curve.

This allows the model to reconstruct high-precision "meaning-distances" through wave interference, much like how a Fourier series reconstructs smooth waves from simple components. It enables handheld devices to run large-scale models with the perceptual accuracy of high-bit floating-point math.
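
As a toy illustration of this idea (my own sketch, not code from this repository): superposing several phase-shifted, sign-valued (-1/0/+1) components reconstructs a smooth curve, in the spirit of a truncated Fourier series, and the residual shrinks as more phases are stacked.

import torch

t = torch.linspace(0, 2 * torch.pi, 512)
target = torch.sin(t)                                             # smooth "meaning-distance" to recover

K = 16                                                            # number of phase-shifted "layers"
phases = 2 * torch.pi * torch.arange(K) / K
components = torch.sign(torch.sin(t[None, :] + phases[:, None]))  # [K, 512], discrete values in {-1, 0, +1}
weights = torch.cos(phases)[:, None]                              # per-phase mixing weights
approx = (torch.pi / 2) * (weights * components).mean(dim=0)      # interference of jagged signals -> smooth curve

print(f"max |target - approx| = {(target - approx).abs().max():.3f}")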


License:

This project is licensed under the Apache License 2.0. (See the LICENSE for details).

Acknowledgments:

This work builds upon the foundation established by the Transformer architecture.
I would like to express my gratitude to the researchers and open-source communities whose contributions to attention mechanisms, positional encoding, and large-scale model design made this work possible.
Neocognitron ― Transformer ― D‑RNA Dream Resonance Never Adjourns — it goes on...
