Job Description
Job Responsibilities
You will participate in one or more of the following areas (based on your interests and background):
Dialogue and Language Model Direction
- Fine-tuning and quantization of large models (GPTQ / AWQ / GGUF), inference acceleration (vLLM / SGLang / TensorRT-LLM);
- Prompt design, tool invocation (Function Calling), intent recognition, and routing built around the Agent framework;
- RAG pipeline optimization: retrieval recall, re-ranking, knowledge graph integration, long-context compression.
Agent Harness Direction (Core of the Self-developed Framework)
- Participate in the design and iteration of the self-developed Agent Harness: context window management, tool invocation parsing, streaming and interruption, error recovery, SubAgent orchestration;
- Reproduce and compare key designs of open-source Harnesses such as OpenClaw / Hermes / Claude Agent SDK / LangGraph, and incorporate what works into the self-developed framework;
- Run evaluations and ablation studies on public Agent benchmarks such as SWE-bench / Terminal-Bench / τ-bench, and use the results to drive iteration on each Harness module;
- Design the tracing, replay, and evaluation infrastructure for Agents, so that every prompt or tool change can be checked against a quantifiable regression baseline.
Agent Memory System Direction
- Design and implement a multi-level memory architecture: session-level working memory, cross-session episodic memory, and semantic memory for user profiles;
- Evaluate and integrate open-source solutions such as Mem0 / Zep / LangMem, or build in-house on top of vector databases (Qdrant / Milvus / pgvector);
- Balance the trade-offs among memory conflict merging, decay strategies, retrieval accuracy, and recall latency.
Speech Direction
- Streaming ASR deployment, custom vocabularies, domain and accent adaptation;
- Low-latency streaming synthesis of TTS models, CUDA Graph optimization;
- VAD, endpoint detection (EOT / Turn Detection), semantic-level turn-taking decisions.
Job Requirements
Undergraduate student or above in Computer Science, Artificial Intelligence, Electronics, Mathematics, or a related field; available to work at least 5 days a week for a continuous internship of at least 6 months;
Proficient in PyTorch, able to read and understand mainstream model code (Transformers, etc.), and able to train and tune models independently;
Familiar with the principles and representative papers of at least one mainstream model family (LLM / ASR / TTS / Retrieval), or have well-formed views of your own on Agent system design;
Solid Python engineering skills; familiar with asyncio, FastAPI, and Linux + GPU development environments;
Passionate about cutting-edge technology; proactively follow developments on arXiv / GitHub and are willing to experiment with new models and new Agent frameworks.
Preferred Qualifications
Practical experience in SOTA model fine-tuning, distillation, quantization, or inference optimization;
Familiar with the source code or configuration of inference frameworks like vLLM / SGLang / TensorRT / Triton;
Have read the source of, or implemented, an Agent Harness (any of OpenClaw / Hermes / Claude Agent SDK / smolagents / LangGraph);
Have run experiments on Agent benchmarks such as SWE-bench / Terminal-Bench / GAIA / τ-bench, and understand how Harness design affects final scores;
Have used or read the core implementations of Mem0 / Zep / LangMem / LlamaIndex;
Have made open-source contributions on HuggingFace or GitHub, or have won awards in Kaggle or other AI evaluation competitions.
What We Offer
Direct access to multi-GPU clusters, with no queuing for resources;
Exposure to cutting-edge open-source models and Agent frameworks, from research papers through to production;
Direct involvement in the design and evolution of a production-ready Agent Harness, rather than just wrapping APIs;
1-on-1 collaboration with senior engineers, with no intermediaries and no slide-deck (PPT) culture;
Strong performers get priority for full-time conversion or a Return Offer, or may continue as a long-term RA or part-time collaborator;
Flexible working arrangements (remote).
