hamelsmu
148 skills
hamelsmu / evaluate-rag
Evaluates retrieval-augmented generation (RAG) systems by measuring retrieval quality and optimizing generation strategies.
hamelsmu / error-analysis
Helps identify and categorize failure modes in LLM pipelines by analyzing traces, to improve evaluation and debugging.
hamelsmu / eval-audit
Audits LLM evaluation pipelines to identify issues and provide actionable insights for improving evaluation trustworthiness.
hamelsmu / write-judge-prompt
Designs LLM judge prompts for subjective criteria such as tone, relevance, and completeness, which code-based checks cannot assess.
hamelsmu / build-review-interface
Creates a custom browser-based annotation interface for reviewing LLM traces and collecting structured feedback efficiently.
hamelsmu / generate-synthetic-data
Generates diverse synthetic test inputs for LLM evaluation, aiding in dataset bootstrapping and stress-testing failure hypotheses.
hamelsmu / validate-evaluator
Calibrates LLM judges against human labels using data splits and bias correction to ensure reliable outputs.
hamelsmu / How to Write Good Skills
Provides guidelines for creating and maintaining skills for AI coding agents, focusing on clarity and domain-specific directives.