zxmengde
GitHub profile for zxmengde
11 skills
zxmengde / ai-research-01-model-architecture-torchtitan
Facilitates distributed LLM pretraining using PyTorch with advanced parallelism for efficient model training on multiple GPUs.
zxmengde / ai-research-02-tokenization-sentencepiece
Provides a language-independent tokenizer for multilingual models, supporting BPE and Unigram algorithms for efficient text processing.
zxmengde / ai-research-04-mechanistic-interpretability-nnsight
Guides users in interpreting neural network internals using nnsight, enabling experiments on large models without local GPU resources.
zxmengde / ai-research-04-mechanistic-interpretability-pyvene
Guides users in performing causal interventions on PyTorch models using pyvene's framework for reproducible experiments.
zxmengde / ai-research-04-mechanistic-interpretability-saelens
Guides training and analysis of Sparse Autoencoders for interpretable feature extraction in neural networks.
zxmengde / ai-research-04-mechanistic-interpretability-transformer-lens
Guides mechanistic interpretability research using TransformerLens for inspecting transformer internals and studying attention patterns.
zxmengde / ai-research-06-post-training-miles
Guides enterprise-grade RL training using miles for large MoE models, optimizing performance and stability with low-precision techniques.
zxmengde / ai-research-06-post-training-slime
Guides LLM post-training with RL using the slime framework, integrating Megatron-LM for efficient model training and data generation.
zxmengde / ai-research-08-distributed-training-megatron-core
Facilitates training of large language models using NVIDIA Megatron-Core with advanced parallelism for optimal GPU efficiency.
zxmengde / ai-research-08-distributed-training-pytorch-fsdp2
Enhances PyTorch training scripts with FSDP2 for efficient distributed training, enabling large model handling and optimized performance.
zxmengde / ai-research-08-distributed-training-pytorch-lightning
Facilitates distributed training in PyTorch with minimal boilerplate, enabling seamless scaling from laptops to supercomputers.
zxmengde / ai-research-08-distributed-training-ray-train
Facilitates distributed training of machine learning models across clusters, optimizing performance with Ray's orchestration capabilities.
zxmengde / ai-research-09-infrastructure-modal
Enables seamless deployment of ML models on a serverless GPU cloud platform, optimizing for performance and cost efficiency.
zxmengde / ai-research-09-infrastructure-skypilot
Facilitates multi-cloud orchestration for ML workloads with automatic cost optimization and efficient resource management.
zxmengde / ai-research-10-optimization-awq
Optimizes large language models with activation-aware weight quantization for faster inference and minimal accuracy loss.
zxmengde / ai-research-10-optimization-bitsandbytes
Optimizes large language models by quantizing them to 8-bit or 4-bit, significantly reducing memory usage while largely preserving accuracy.
zxmengde / ai-research-10-optimization-flash-attention
Optimizes transformer attention using Flash Attention for significant speed and memory efficiency in PyTorch models.
zxmengde / ai-research-10-optimization-hqq
Enables fast, calibration-free quantization of LLMs to low-bit precision, enhancing deployment efficiency with HuggingFace Transformers.
zxmengde / ai-research-10-optimization-ml-training-recipes
Provides optimized PyTorch training recipes for various domains, enhancing model training efficiency and performance.
zxmengde / ai-research-11-evaluation-bigcode-evaluation-harness
Evaluates code generation models using multiple benchmarks to assess coding abilities and quality across various programming languages.