

Overview
InternTA is a multi-agent AI teaching assistant that learns from limited data, specifically designed to help students learn the “Synthetic Biology” course. The system addresses critical challenges in AI-powered education, including data privacy risks and limited effectiveness in courses with scarce teaching materials.
Abstract
Large language models (LLMs) have shown great potential to enhance student learning by serving as AI-powered teaching assistants (TAs). However, existing LLM-based TA systems often face critical challenges, including data privacy risks associated with third-party API-based solutions and limited effectiveness in courses with scarce teaching materials. This project proposes an automated TA training system based on LLM agents, designed to train customized, lightweight, and privacy-preserving AI models. Unlike traditional cloud-based AI TAs, our system allows local deployment, which reduces data security concerns. It includes three components:
- Dataset Agent: Constructing high-quality datasets with explicit reasoning paths
- Training Agent: Fine-tuning models via Knowledge Distillation, effectively adapting to limited-data courses
- RAG Agent: Enhancing responses by retrieving external knowledge
Background
Synthetic biology is a cutting-edge field that integrates knowledge from biology, chemistry, engineering, and computer science. In recent years, applications ranging from lab-grown meat to CRISPR-Cas9 gene editing technology have been leading the “Third Biotechnology Revolution.” However, the dissemination of synthetic biology knowledge faces two major challenges:
- Interdisciplinary complexity: Requires integration of knowledge from multiple domains, creating a steep learning curve
- Educational resource limitations: Shortage of teaching talent with cross-disciplinary knowledge and practical experience
Technical Architecture
InternTA adopts a three-layer agent architecture to achieve automated training, local deployment, and privacy protection:
1. Dataset Agent
The Dataset Agent is responsible for constructing high-quality training data with explicit reasoning paths:
- Data Sources: Extracts post-class questions, key terms, and fundamental concepts from the “Synthetic Biology” textbook
- Reasoning Path Construction: Generates explicit reasoning paths for each question
- Guided Teaching Design: For complex thought questions, designs guided responses rather than providing direct answers
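As an illustration of what one record with an explicit reasoning path might look like, here is a minimal sketch. The page does not publish the actual dataset schema, so the class name `TrainingExample`, its fields, the helper `to_jsonl`, and the sample question are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class TrainingExample:
    """Hypothetical record layout; the real InternTA schema may differ."""
    question: str              # e.g. a post-class question from the textbook
    reasoning_path: List[str]  # explicit intermediate reasoning steps
    response: str              # final answer, or a guiding hint
    guided: bool = False       # True when the response guides instead of answering


def to_jsonl(examples: List[TrainingExample]) -> str:
    """Serialize the examples as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(e), ensure_ascii=False) for e in examples)


# Illustrative content only, not taken from the textbook.
example = TrainingExample(
    question="Why are plasmids commonly used as vectors?",
    reasoning_path=[
        "Recall what a plasmid is.",
        "Relate its properties to the requirements of a vector.",
    ],
    response="Think about how a plasmid replicates. What does that imply for a gene inserted into it?",
    guided=True,
)
jsonl_text = to_jsonl([example])
```

Keeping the reasoning steps in their own field, separate from the response, lets the same record serve both direct-answer and guided-response training, which matches the distinction drawn in the Guided Teaching Design item.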
2. Training Agent
The Training Agent fine-tunes lightweight models using knowledge distillation techniques:
- Base Model: Uses DeepSeek-R1-Distill-Qwen-7B as the foundation model
- Fine-Tuning Tools: Employs PeftModel (from the Hugging Face PEFT library) for parameter-efficient fine-tuning
- Knowledge Distillation: Transfers knowledge from larger parameter-scale models to lightweight models
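The page does not say which form of knowledge distillation InternTA uses. Purely as an illustration of the general idea, the sketch below shows the classic soft-label formulation: the student is penalized by the KL divergence between temperature-softened teacher and student output distributions. The function names and the default temperature are assumptions, and the code uses plain Python rather than a training framework.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    temperature: float = 2.0,  # assumed value, chosen only for illustration
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher (target) distribution
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge. A higher temperature flattens both distributions, so the student also learns from the relative probabilities the teacher assigns to non-top tokens.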
3. RAG Agent
The RAG (Retrieval-Augmented Generation) Agent enhances answer quality by retrieving external knowledge:
- Knowledge Base Construction: Structured processing of “Synthetic Biology” textbook content
- Semantic Retrieval: Retrieves relevant knowledge points based on user questions
- Enhanced Generation: Combines retrieved knowledge to generate more accurate and in-depth answers
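The retrieve-then-generate flow described above can be sketched as follows. This is a toy stand-in, not the InternTA implementation: it scores passages with term-frequency cosine similarity, whereas a real semantic retriever would more likely use dense embeddings. The function names, the prompt layout, and the default `k` are assumptions.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, passages: List[str], k: int = 2) -> List[str]:
    """Return up to k passages most similar to the question."""
    query = Counter(tokenize(question))
    scored = [(cosine(query, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def build_prompt(question: str, passages: List[str], k: int = 2) -> str:
    """Prepend the retrieved passages to the question (hypothetical layout)."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

The string returned by `build_prompt` would then be passed to the locally deployed model, so the answer can draw on textbook passages instead of relying only on what the model learned during fine-tuning.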
Key Features
Privacy Protection
Local deployment capability, avoiding data exposure to third-party services
Multi-Agent Architecture
Three specialized agents for dataset construction, model training, and knowledge retrieval
Synthetic Biology Expertise
Specialized knowledge from the “Synthetic Biology” textbook and related materials
Guided Learning
Provides hints and guidance rather than direct answers to promote independent thinking
Cross-disciplinary Knowledge
Combines biology, chemistry, engineering, and computer science perspectives
Interactive Teaching
Engaging dialogue-based learning experience
Lightweight Design
Optimized model size to run efficiently on ordinary hardware
Knowledge Distillation
Effective adaptation to limited-data courses through advanced training techniques
Getting Started
Learn how to integrate InternTA into your applications:
API Reference
Explore the complete API documentation
Authentication
Learn about the authentication process
Quick Start
Get started with example code and tutorials
Demo Application
Try out the live demo
Resources
Support
If you need help or have questions about the API, please:
- Check the API Reference documentation
- Visit our GitHub repository for issues
- Contact our support team at dev@kongfoo.cn