RAG Architect
We are seeking a Senior AI Engineer to lead the design and implementation of an end-to-end Retrieval-Augmented Generation (RAG) architecture. This role will drive the ingestion of GitHub repositories, Confluence pages, qTest artifacts, PRDs, and script libraries to power autoscripting, onboarding search, and long-term knowledge reuse. As a technical leader, you will set the strategic direction, select models, and mentor AI and Automation Agent Engineers to deliver a scalable, secure platform.
Key Responsibilities
- Architect ingestion and retrieval layers, selecting loaders, chunking strategies (AST-aware for Java), embeddings (e.g., BGE-Code, mxbai), vector stores (e.g., Chroma), cross-encoder rerankers, and LangChain router chains.
- Design CI orchestrations, including daily Jenkins jobs for delta detection, image captioning (e.g., Qwen2-VL, LLaVA), cost/latency guardrails, and rollback strategies.
- Establish model and prompt governance, including prompt templates, few-shot libraries, safety filters, and evaluation rubrics (faithfulness, coverage, compile success).
- Lead architecture for a UI onboarding tool, deciding on hosting (FastAPI + React or Streamlit MVP), SSO/auth flows, token streaming, and feedback mechanisms for continuous learning.
- Oversee data security and compliance: embed privacy policies, source citations, and audit logs, and ensure Confluence/qTest credentials are managed in Secrets Manager.
- Provide technical leadership by reviewing PRs, setting code quality standards, and conducting architecture workshops for AI and Automation Agent Engineers.
Must-Have Qualifications
- 6–8 years of experience building data or ML platforms, with at least 2 years deploying LLM/RAG systems in production.
- Deep expertise in LangChain, at least one vector store (ChromaDB, Qdrant, or pgvector), and cross-encoder rerankers.
- Strong proficiency in Python (FastAPI or Flask) and ability to analyze Java codebases for chunking boundaries.
- Proven experience designing CI/CD pipelines (Jenkins, GitHub Actions) with delta builds and artifact promotion.
- Hands-on experience managing OpenAI/Anthropic API keys or self-hosting large models.
- Demonstrated expertise in security and compliance, including PII protection, role-based access, and secret rotation.
Locations
- Multiple locations
Remote status
- Fully Remote
About Perform
Since 2005, Perform's engineers have been helping companies scale their apps and their teams. We were near-shoring before it was even a term and have worked with hundreds of clients along the way.