hamelsmu

Facilitates the evaluation of retrieval-augmented generation systems by measuring retrieval quality and optimizing generation strategies.
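As an illustration of what "measuring retrieval quality" can mean in practice, here is a minimal sketch of two common retrieval metrics, Recall@k and reciprocal rank. The function names and document IDs are hypothetical and chosen for the example; the listing does not say which metrics the skill actually uses.

```python
from typing import Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical query: the retriever returned four chunks, two are truly relevant.
retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}

print(recall_at_k(retrieved, relevant, k=2))   # one of two relevant docs in top 2
print(reciprocal_rank(retrieved, relevant))    # first relevant doc is at rank 2
```

Averaging these per-query values over a labeled query set gives a retrieval score that can be tracked separately from generation quality.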

Helps identify and categorize failure modes in LLM pipelines by analyzing traces, to support evaluation and debugging.

Audits LLM evaluation pipelines to identify issues and provide actionable insights for improving evaluation trustworthiness.

Designs evaluators for subjective criteria in AI outputs, such as tone, relevance, and completeness, that code-based checks cannot assess.

Creates a custom browser-based annotation interface for reviewing LLM traces and collecting structured feedback efficiently.

Generates diverse synthetic test inputs for LLM evaluation, aiding in dataset bootstrapping and stress-testing failure hypotheses.

Calibrates LLM judges against human labels using data splits and bias correction to ensure reliable outputs.
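One way such a calibration could work is sketched below: estimate the judge's true-positive and true-negative rates on a human-labeled split, then use them to correct the pass rate the judge reports on unlabeled data. The correction shown is the standard Rogan-Gladen estimator for an imperfect binary classifier; that the skill uses this particular formula is an assumption, and all names and labels here are hypothetical.

```python
from typing import Sequence, Tuple


def judge_rates(human: Sequence[bool], judge: Sequence[bool]) -> Tuple[float, float]:
    """Return (TPR, TNR) of the judge, treating human labels as ground truth.

    True means "pass". Ideally computed on a held-out labeled split that was
    not used while iterating on the judge prompt.
    """
    pos = [j for h, j in zip(human, judge) if h]
    neg = [j for h, j in zip(human, judge) if not h]
    if not pos or not neg:
        raise ValueError("need at least one human pass and one human fail")
    tpr = sum(pos) / len(pos)                   # judge says pass when human says pass
    tnr = sum(not j for j in neg) / len(neg)    # judge says fail when human says fail
    return tpr, tnr


def corrected_pass_rate(observed: float, tpr: float, tnr: float) -> float:
    """Bias-correct the judge's observed pass rate (Rogan-Gladen estimator)."""
    denom = tpr + tnr - 1.0
    if denom <= 0:
        raise ValueError("judge is no better than chance; correction is undefined")
    estimate = (observed + tnr - 1.0) / denom
    return min(1.0, max(0.0, estimate))         # clip to a valid proportion


# Hypothetical labeled split: 4 human passes, 4 human fails.
human = [True, True, True, True, False, False, False, False]
judge = [True, True, True, False, False, False, False, True]

tpr, tnr = judge_rates(human, judge)
print(tpr, tnr)
print(corrected_pass_rate(0.6, tpr, tnr))  # judge reported 60% pass on unlabeled data
```

The correction matters most when TPR and TNR differ noticeably from 1; with a near-perfect judge the corrected and observed rates almost coincide.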

Provides guidelines for creating and maintaining skills for AI coding agents, focusing on clarity and domain-specific directives.