TODO

  • Primary:
    • Add limitations:
    • Add https://arxiv.org/abs/2603.03276 as reference on multimodal asymmetry
    • Add Gemini Pretraining notes on MoE data scaling asymmetry as additional motivation for non-symmetric methods beyond the multimodal case
    • Make a reference implementation
    • Mention FLOP factor correction and WLS weighting for approach 2 as possible improvements
      • Or at least mention how heavily approach 2 relies on the C=6ND assumption
  • Secondary:
    • Add WLS analysis
    • Add exp6 validation for proof to appendix
    • Consider https://arxiv.org/abs/2603.06603 as another citation for methods that "extend individual terms in isolation (e.g. token scaling terms alone)"
    • Cite "Gemstones: A Model Suite for Multi-Faceted Scaling Laws" on how C=6ND breaks down with model shape
    • Cite "Scaling Laws for Native Multimodal Models" on the PlantCAD issue for the empirical C ~ D^b method (see C. Scaling Laws)
    • Mention the demo prompt examples for making your own simulator; examples:
    • Add a note advising against using logloss, given the bias seen in simulations and in the ml-scalefit reproduction
    • Copy intercept-error proof into paper appendix
    • Add citations from "Configuration-to-Performance Scaling Law with Neural Ansatz" on other adaptations of functional forms for Chinchilla scaling laws
    • Review figures.py for ways to use existing code utilities and then regen (or push back into experiments code)
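
Rough sketch for the C=6ND and WLS items above, not the planned reference implementation. It assumes the standard C = 6ND compute approximation and a log-log power-law fit via `np.polyfit` with per-point weights. The function names, budgets, and weights are all hypothetical placeholders, and the data is synthetic.

```python
import numpy as np

def approx_flops(n_params, n_tokens):
    """Training compute under the C = 6ND approximation (an assumption)."""
    return 6.0 * n_params * n_tokens

def wls_power_law(x, y, weights):
    """Fit y = a * x**b by weighted least squares in log-log space."""
    # np.polyfit applies `w` to the residuals of each point.
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1, w=weights)
    return np.exp(intercept), slope

# Synthetic example: N_opt = 0.1 * C**0.5 with made-up weights.
budgets = np.array([1e18, 1e19, 1e20, 1e21])
n_opt = 0.1 * budgets ** 0.5
weights = np.array([1.0, 1.0, 2.0, 2.0])
a, b = wls_power_law(budgets, n_opt, weights)
```

With noiseless synthetic data the weights do not change the fit; they only matter once the points carry different amounts of noise, which is the case the WLS item is about.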