AI Engineer - RAG Platform
About the Role
We’re hiring an AI Engineer to design and build a production-grade RAG platform that powers our test autoscripting agent. This platform ingests our QA codebase and documentation, transforms them into embeddings, and serves relevant context (page objects, fixtures, helpers, examples) via a retrieval API—enabling high-quality LLM-generated tests. You’ll own everything from ingestion to evaluation, including keeping the index fresh via Jenkins and optimizing for token cost and latency.
This role is ideal for someone who thrives at the intersection of LLM tooling, backend engineering, and developer productivity.
What You’ll Do
- Build and maintain a local RAG platform, including:
  - Loaders for Git, Confluence, and Drive.
  - Code-aware chunking (AST/semantic) and embedding pipelines.
  - Vector indexing in ChromaDB with metadata and reranking.
  - FastAPI (or similar) retrieval service for the autoscripting agent (see the retrieval sketch after this list).
- Implement metadata filters (e.g., layer=page-object|fixture|helper|test, Git SHA, feature tags) and import-based neighbor expansion to optimize context.
- Optimize for cost and performance: tune k values, context lengths, and reranker thresholds; cache frequent retrievals.
- Build retrieval evaluation and telemetry: track recall, faithfulness, token usage, compile success of generated code, and wire alerts into Jenkins CI.
- Manage access to Claude 4 Sonnet and other model APIs; help deploy self-hosted endpoints if needed (keys, quotas, audit logs).
- Write runbooks and train the SDET team on how to use and troubleshoot the RAG system.
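To make the retrieval piece concrete, here is a minimal sketch, assuming a local Chroma collection named `qa_codebase` whose chunks carry a `layer` metadata field. The collection name, request schema, and defaults are illustrative, not the actual service contract, and the sketch leans on Chroma's default embedding function rather than the models listed below.

```python
# Minimal retrieval-endpoint sketch: FastAPI in front of a local ChromaDB
# collection, filtered by the same kind of metadata described above.
# Names, defaults, and the request schema are illustrative assumptions.
import chromadb
from fastapi import FastAPI
from pydantic import BaseModel

client = chromadb.PersistentClient(path="./chroma")          # local vector store
collection = client.get_or_create_collection("qa_codebase")  # hypothetical name

app = FastAPI()

class RetrieveRequest(BaseModel):
    query: str
    layer: str | None = None  # e.g. "page-object", "fixture", "helper", "test"
    k: int = 8                # one of the knobs tuned for cost/latency

@app.post("/retrieve")
def retrieve(req: RetrieveRequest) -> dict:
    where = {"layer": req.layer} if req.layer else None
    # This sketch relies on Chroma's default embedding function; the real service
    # would embed queries with the same models used at ingestion (mxbai/bge-code)
    # and apply reranking plus import-based neighbor expansion to the candidates.
    res = collection.query(
        query_texts=[req.query],
        n_results=req.k,
        where=where,
        include=["documents", "metadatas", "distances"],
    )
    return {
        "chunks": res["documents"][0],
        "metadata": res["metadatas"][0],
        "scores": res["distances"][0],
    }
```

In practice the endpoint would also accept Git SHA and feature-tag filters and cache frequent queries, per the bullets above.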
Tech Stack (Initial Plan)
- LLM: Claude 4 Sonnet
- Embeddings: mxbai-embed-large-v1 (text), bge-code-base (code)
- Reranker: mxbai-rerank-base-v2
- Vector store: ChromaDB (local)
- Pipeline orchestration: LangChain (router by MIME/type)
- Scheduling: Jenkins (daily delta ingestion; sketched below)
- Retrieval API: FastAPI
- Evaluation: Telemetry + basic metrics (compile/run success, cost, retrieval quality)
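The daily delta ingestion Jenkins would trigger could look roughly like the sketch below: diff the repo against the last indexed commit, re-chunk the changed files, and upsert them with layer and Git SHA metadata. The helper names (`changed_files`, `layer_for`, `ingest_delta`), the file-extension filter, and the line-window chunker are hypothetical placeholders for the LangChain loaders and AST/semantic chunking described above.

```python
# Delta-ingestion sketch, assuming chunks are keyed by file path and stamped
# with layer/git SHA metadata. All helper names and heuristics are placeholders.
import subprocess
from pathlib import Path

import chromadb

client = chromadb.PersistentClient(path="./chroma")
collection = client.get_or_create_collection("qa_codebase")

def changed_files(repo: str, since_sha: str) -> list[str]:
    """Files touched since the last indexed commit."""
    out = subprocess.run(
        ["git", "-C", repo, "diff", "--name-only", f"{since_sha}..HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [f for f in out.stdout.splitlines() if f.endswith((".java", ".md"))]

def layer_for(path: str) -> str:
    """Naive path-based classification; a real router would be MIME/AST aware."""
    lowered = path.lower()
    if "pageobject" in lowered:
        return "page-object"
    if "fixture" in lowered:
        return "fixture"
    return "helper"

def chunk(text: str, size: int = 60, overlap: int = 10) -> list[str]:
    """Placeholder line-window chunking (the real pipeline is AST/semantic)."""
    lines = text.splitlines() or [""]
    step = size - overlap
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), step)]

def ingest_delta(repo: str, since_sha: str, head_sha: str) -> None:
    for path in changed_files(repo, since_sha):
        full_path = Path(repo, path)
        if not full_path.exists():  # deleted files need separate cleanup
            continue
        chunks = chunk(full_path.read_text(errors="ignore"))
        collection.upsert(
            ids=[f"{path}:{i}" for i in range(len(chunks))],
            documents=chunks,
            metadatas=[{"path": path, "layer": layer_for(path), "git_sha": head_sha}
                       for _ in chunks],
        )
```

Removing stale chunks from renamed or deleted files is part of the same index-freshness work, but is left out of the sketch.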
What You Bring
- 4+ years in ML/AI or platform-oriented backend engineering, including 2+ years of LLM development within RAG applications.
- Strong experience with LangChain, vector DBs (ChromaDB, Qdrant, pgvector), and code-aware embeddings (BGE-code or similar).
- Solid Python skills (FastAPI or Flask) and comfort reading Java to inform chunking and context design.
- Experience with Jenkins, secrets management, and basic observability tooling (Grafana, Prometheus, LangSmith, or RAGAS).
- Comfortable working with OpenAI/Anthropic APIs or deploying self-hosted endpoints, including handling keys, rate limits, and safety controls.
It is an asset if you have:
- Experience with Claude-specific practices, structured prompting, and cost control techniques.
- Familiarity with retrieval evaluation tools like RAGAS or LangChain Evaluators, plus A/B testing for prompt or routing strategies.
- Understanding of security and compliance for developer-facing AI tools (PII handling, audit logging).
Collaboration & Role Scope
- The SDET team focuses on test quality and final review of autoscripted code.
- The Automation Agent Engineer tunes prompts and retrieval logic.
- You own the RAG platform: indexing, retrieval quality, LLM orchestration, and CI integration.
Locations: Multiple locations
Remote status: Fully remote
About Perform
Since 2005, Perform's engineers have been helping companies scale their apps and their teams. We were near-shoring before it was even a term and have worked with hundreds of clients along the way.