AImindPalace

Popular repositories

dgx-spark-nvfp4-serving Public

Guide for serving a fine-tuned Qwen3.5-27B (dense, NVFP4) on DGX Spark via native vLLM. Includes critical config fixes for modelopt export_hf_checkpoint() that prevent silent FP32 dequantization.

Python 2
design-challenger Public

Adversarial design-review agent: spawns writer and challenger Claude Code agents to stress-test specs and implementation plans for any codebase.

TypeScript
vllm Public

Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python
mac-studio-mlx-serving Public

Benchmarks and serving notes for a Qwen3.5-27B fine-tune on Apple Silicon via MLX. Companion to dgx-spark-nvfp4-serving.

Python