
Update README.md #435

Open
imthegoodboy wants to merge 1 commit into xai-org:main from imthegoodboy:patch-1


@imthegoodboy

No description provided.

gauravagerwala added a commit to gauravagerwala/grok-1 that referenced this pull request Dec 7, 2025
@sahiee-dev

@imthegoodboy
Need Some Improvements

The updates significantly improve the structure of the Grok-1 documentation. Given the scale of this model (314B parameters), here are a few technical suggestions to improve clarity and prevent common user issues:


1. Technical Specification Scannability

The current list contains some redundant formatting (double colons). Replacing the list with a table makes it much easier for researchers to reference at a glance:

| Feature | Specification |
| --- | --- |
| Parameters | 314B |
| Architecture | Mixture of Experts (MoE) |
| Active Experts | 2/8 (2 experts used per token) |
| Context Length | 8,192 tokens |
| Embedding Size | 6,144 |
| Tokenizer | SentencePiece (131,072 tokens) |

2. Hardware Requirements Disclosure

While "ample VRAM" is mentioned, users may not realize the exact threshold for a model of this magnitude.

  • Suggestion: Explicitly state that loading the model in FP16/BF16 requires approximately 630 GB of VRAM for the weights alone (314B parameters × 2 bytes each). This makes clear that the code needs a multi-GPU cluster (e.g., 8x A100 80GB or 8x H100) to run, which should reduce the number of "Out of Memory" errors reported as bugs.
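
For reference, the figure above is just parameter count times bytes per parameter. Here is a rough sketch of that arithmetic; it covers weights only (activations and any cache are not counted), and the function and constant names are illustrative, not taken from the repository:

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in decimal GB."""
    return num_params * bytes_per_param / 1e9


GROK1_PARAMS = 314e9   # 314B parameters, as stated in the spec list
BYTES_FP16 = 2         # FP16 and BF16 both use 2 bytes per parameter

weights_gb = weight_memory_gb(GROK1_PARAMS, BYTES_FP16)
cluster_gb = 8 * 80    # e.g. 8 GPUs with 80 GB each (the example cluster above)

print(f"Weights in FP16/BF16: ~{weights_gb:.0f} GB")
print(f"8 x 80 GB cluster:     {cluster_gb} GB")
print(f"Headroom:             ~{cluster_gb - weights_gb:.0f} GB")
```

The small headroom suggests the README should describe 8x 80 GB as a floor for FP16/BF16 weights rather than a comfortable target.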

3. Expected Checkpoint Structure

To ensure the run.py script works out of the box for users, adding a directory tree visualization would be helpful:

checkpoints/
└── ckpt-0/
    ├── [weight_files].jax
    └── tokenizer.model
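
If the README adopts a tree like this, a small pre-flight check could catch a misplaced download before run.py is launched. The sketch below assumes the layout exactly as drawn above; the helper name is hypothetical, and the paths run.py actually reads should be confirmed against the repository before documenting them:

```python
from pathlib import Path


def missing_checkpoint_paths(root: str = "checkpoints") -> list[str]:
    """Return the expected paths that are absent under `root`.

    An empty list means the assumed layout is present.
    """
    ckpt_dir = Path(root) / "ckpt-0"
    # Assumed layout from the tree above; adjust if the repo differs.
    expected = [ckpt_dir, ckpt_dir / "tokenizer.model"]
    return [str(p) for p in expected if not p.exists()]


if __name__ == "__main__":
    missing = missing_checkpoint_paths()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Checkpoint layout looks complete.")
```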

4. Documentation Polish

  • Typo Fix: In the "Downloading the Weights" section, the header contains a typo: wWeights. It should be corrected to Weights.

  • Environment Best Practices: In the "Quick Start," it might be worth suggesting the use of a virtual environment (python -m venv venv) before installing dependencies to avoid JAX/CUDA version conflicts.

  • Storage Note: Mentioning that the total disk footprint (download + extraction) is over 300GB would be a helpful "Quality of Life" tip for users.
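
The storage note could also be backed by a quick free-space check before starting the download. This is a minimal sketch using only the standard library; the 300 GB threshold is the lower bound from the note above, and the function name is illustrative:

```python
import shutil

REQUIRED_GB = 300  # lower bound from the storage note; the real footprint is larger


def has_enough_disk(path: str = ".", required_gb: float = REQUIRED_GB) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


if __name__ == "__main__":
    if has_enough_disk():
        print("Enough free space for the download.")
    else:
        print(f"Less than {REQUIRED_GB} GB free; the download will likely fail.")
```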

@imthegoodboy
Author

burhh fr


2 participants