InternTA Architecture Light

Overview

InternTA is a multi-agent AI teaching assistant that learns from limited data, specifically designed to help students learn the “Synthetic Biology” course. The system addresses critical challenges in AI-powered education, including data privacy risks and limited effectiveness in courses with scarce teaching materials.

Abstract

Large language models (LLMs) have shown great potential to enhance student learning by serving as AI-powered teaching assistants (TA). However, existing LLM-based TA systems often face critical challenges, including data privacy risks associated with third-party API-based solutions and limited effectiveness in courses with limited teaching materials. This project proposes an automated TA training system based on LLM agents, designed to train customized, lightweight, and privacy-preserving AI models. Unlike traditional cloud-based AI TAs, our system allows local deployment, reducing data security concerns, and includes three components:
  1. Dataset Agent: Constructing high-quality datasets with explicit reasoning paths
  2. Training Agent: Fine-tuning models via Knowledge Distillation, effectively adapting to limited-data courses
  3. RAG Agent: Enhancing responses by retrieving external knowledge
We validate our system in Synthetic Biology, an interdisciplinary field characterized by scarce structured training data. Experimental results and user evaluations demonstrate that our AI TA achieves strong performance, high user satisfaction, and improved student engagement, highlighting its practical applicability in real-world educational settings.

Background

Synthetic biology is a cutting-edge field that integrates knowledge from biology, chemistry, engineering, and computer science. In recent years, applications ranging from lab-grown meat to CRISPR-Cas9 gene editing technology have been leading the “Third Biotechnology Revolution.” However, the dissemination of synthetic biology knowledge faces two major challenges:
  1. Interdisciplinary complexity: Requires integration of knowledge from multiple domains, creating a steep learning curve
  2. Educational resource limitations: Shortage of teaching talent with cross-disciplinary knowledge and practical experience
Traditional AI teaching assistant solutions typically rely on cloud service APIs, which introduce data privacy risks and perform poorly when specialized teaching materials are limited. The InternTA project is designed to address these challenges.

Technical Architecture

InternTA adopts a three-layer agent architecture to achieve automated training, local deployment, and privacy protection:

1. Dataset Agent

The Dataset Agent is responsible for constructing high-quality training data with explicit reasoning paths:
  • Data Sources: Extracts post-class questions, key terms, and fundamental concepts from the “Synthetic Biology” textbook
  • Reasoning Path Construction: Generates explicit reasoning paths for each question
  • Guided Teaching Design: For complex thought questions, designs guided responses rather than providing direct answers

2. Training Agent

The Training Agent fine-tunes lightweight models using knowledge distillation techniques:
  • Base Model: Uses DeepSeekR1-Distill-Qwen-7B as the foundation model
  • Fine-Tuning Tools: Employs PeftModel for efficient fine-tuning
  • Knowledge Distillation: Transfers knowledge from larger parameter-scale models to lightweight models

3. RAG Agent

The RAG (Retrieval-Augmented Generation) Agent enhances answer quality by retrieving external knowledge:
  • Knowledge Base Construction: Structured processing of “Synthetic Biology” textbook content
  • Semantic Retrieval: Retrieves relevant knowledge points based on user questions
  • Enhanced Generation: Combines retrieved knowledge to generate more accurate and in-depth answers

Key Features

Privacy Protection

Local deployment capability, avoiding data exposure to third-party services

Multi-Agent Architecture

Three specialized agents for dataset construction, model training, and knowledge retrieval

Synthetic Biology Expertise

Specialized knowledge from the “Synthetic Biology” textbook and related materials

Guided Learning

Provides hints and guidance rather than direct answers to promote independent thinking

Cross-disciplinary Knowledge

Combines biology, chemistry, engineering, and computer science perspectives

Interactive Teaching

Engaging dialogue-based learning experience

Lightweight Design

Optimized model size to run efficiently on ordinary hardware

Knowledge Distillation

Effective adaptation to limited-data courses through advanced training techniques

Getting Started

Learn how to integrate InternTA into your applications:

Resources

Support

If you need help or have questions about the API, please:
  1. Check the API Reference documentation
  2. Visit our GitHub repository for issues
  3. Contact our support team at dev@kongfoo.cn