This repository contains the implementation of the Green Agent, a judge agent designed for the Code Translator competition. Its primary role is to evaluate code translations performed by other agents (specifically the Purple Agent).
The Green Agent acts as an orchestrator and evaluator. When it receives a request to evaluate a code translation task:
- Orchestration: It requests the Purple Agent (Participant) to translate a given snippet of code from a source language to a target language.
- Evaluation: Upon receiving the translation, it uses Google GenAI (Gemini) to act as a judge. The judge evaluates the translation based on execution correctness, style, conciseness, and relevance.
- Reporting: It returns a structured evaluation containing scores, reasoning, and a winner determination.
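
The three responsibilities above can be sketched as one function. This is only an illustration under stated assumptions, not the repository's code: `evaluate_translation`, `translate`, and `judge` are hypothetical names, and the Purple Agent and the Gemini judge are passed in as plain callables so the flow can be run without any network access.

```python
from typing import Any, Callable, Dict

# Hypothetical signatures, assumed for illustration only.
Translate = Callable[[str, str, str], str]  # (code, source, target) -> translation
Judge = Callable[[str, str, str, str], Dict[str, Any]]  # -> structured evaluation


def evaluate_translation(
    config: Dict[str, str], translate: Translate, judge: Judge
) -> Dict[str, Any]:
    """Run one evaluation: request a translation, judge it, report the result."""
    code = config["code_to_translate"]
    source = config["source_language"]
    target = config["target_language"]

    # Orchestration: ask the participant for a translation.
    translation = translate(code, source, target)

    # Evaluation: hand the original and the translation to the judge.
    evaluation = judge(code, translation, source, target)

    # Reporting: return the judge's evaluation together with the translation.
    return {"translation": translation, "evaluation": evaluation}


def stub_translate(code: str, source: str, target: str) -> str:
    """Stand-in for the Purple Agent; returns a fixed translation."""
    return "console.log('Hello World');"


def stub_judge(code: str, translation: str, source: str, target: str) -> Dict[str, Any]:
    """Stand-in for the Gemini judge; returns a fixed evaluation."""
    return {"winner": "translator", "reasoning": "stub evaluation"}


if __name__ == "__main__":
    request_config = {
        "code_to_translate": "print('Hello World')",
        "source_language": "python",
        "target_language": "javascript",
    }
    print(evaluate_translation(request_config, stub_translate, stub_judge))
```

Passing the participant and the judge in as parameters keeps the control flow testable with stubs; in the real agent those roles are filled by the A2A call to the participant and the Gemini model.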
- `src/`: Source code for the agent.
  - `agent.py`: Contains `TranslationGreenAgent`. This is the core logic that handles the evaluation workflow: validating requests, communicating with the participant agent, and invoking the Gemini model for judging.
  - `server.py`: The entry point for the application. It initializes the `TranslationGreenAgent`, wraps it in a `GreenExecutor`, and sets up the A2A (Agent-to-Agent) Starlette server.
  - `common.py`: Defines shared data structures and Pydantic models (e.g., `EvalRequest`, `TranslatorEval`) and the Agent Card configuration.
  - `executor.py`: Handles the execution context for the agent, providing the sandbox or environment for running the agent logic.
  - `tool_provider.py`: Provides utilities for the agent to interact with external services or other agents (e.g., the `talk_to_agent` implementation).
  - `client.py`: Client-side utilities or helpers for interacting with the agent.
- `tests/`: Test suite.
  - `test_agent.py`: Contains integration tests and A2A conformance tests to ensure the agent behaves correctly, validates schemas, and adheres to the protocol.
  - `conftest.py`: Pytest configuration and fixtures.
- `Dockerfile`: Configuration to containerize the application for deployment.
- `pyproject.toml`: Project configuration and dependencies.

- Python 3.11+
- A Google GenAI API Key (Gemini)
- (Optional) Docker

- Clone the repository:

      git clone <repository-url>
      cd code_translator_green_agent

- Create a virtual environment (optional but recommended):

      python -m venv .venv
      source .venv/bin/activate

- Install dependencies:

      pip install .
      # Or install specific requirements
      pip install python-dotenv uvicorn httpx google-genai pydantic "google-adk[a2a]"

- Environment Variables: Create a `.env` file in the root directory (or ensure relevant environment variables are set) containing your Google API key:

      GOOGLE_API_KEY=your_google_api_key_here

To start the agent server:

    python src/server.py

By default, the server runs on http://127.0.0.1:9009.

You can customize the host and port using arguments:

    python src/server.py --host 0.0.0.0 --port 8080
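
For readers curious how such flags are typically handled, here is a minimal `argparse` sketch. It is an assumption about how `--host` and `--port` could be parsed, not a copy of `src/server.py`; the function name `parse_args` is hypothetical, and only the flag names and the defaults (`127.0.0.1`, `9009`) come from the text above.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the server's command-line flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Run the Green Agent server (sketch).")
    # Defaults mirror the documented address http://127.0.0.1:9009.
    parser.add_argument("--host", default="127.0.0.1", help="Interface to bind to.")
    parser.add_argument("--port", type=int, default=9009, help="Port to listen on.")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # An explicit argument list is used here so the example does not depend on sys.argv.
    args = parse_args(["--host", "0.0.0.0", "--port", "8080"])
    print(f"Would serve on http://{args.host}:{args.port}")
```

Binding to `0.0.0.0` makes the server reachable from outside the local machine, which is usually what a container needs; `127.0.0.1` keeps it local.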

- Build the image:

      docker build -t green-agent .

- Run the container:

      docker run -p 9009:9009 --env GOOGLE_API_KEY=your_api_key green-agent

The agent is designed to be called by an orchestration layer or directly via the A2A protocol. It expects a JSON payload (Evaluator Request) with the following structure:

    {
      "participants": {
        "translator": "http://url-to-purple-agent"
      },
      "config": {
        "code_to_translate": "print('Hello World')",
        "source_language": "python",
        "target_language": "javascript"
      }
    }

The Workflow:
- The Green Agent contacts the participant agent at the provided URL (`http://url-to-purple-agent`).
- It sends the `code_to_translate`, `source_language`, and `target_language` to the participant.
- It waits for the participant to return the translated code.
- Once received, the Green Agent constructs a prompt for the Gemini model (Judge), instructing it to evaluate the translation.
- It returns a result that is saved to the leaderboard in the following format:

      {
        "participants": {
          "translator": "019b8933-d5b6-76a3-8e0b-930c19c10e87"
        },
        "results": [
          {
            "winner": "translator",
            "execution_correctness": 10.0,
            "style_score": 9.0,
            "conciseness": 9.33,
            "relevance": 10.0,
            "overall_score": 9.58,
            "reasoning": "The JavaScript translation successfully replicates the functionality of the Python code..."
          }
        ]
      }

| Metric | Description | Score Range |
|---|---|---|
| Execution Correctness | Does the translated code produce the same output/behavior? | 0-10 |
| Style Score | Does the code follow idiomatic conventions of the target language? | 0-10 |
| Conciseness | Is the translation efficient without unnecessary verbosity? | 0-10 |
| Relevance | Does the translation preserve the original code's intent and logic? | 0-10 |
| Overall Score | Average of all four metrics | 0-10 |
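
The Overall Score row can be checked against the sample result shown earlier. The sketch below is an illustration rather than the agent's actual code: the helper name `overall_score` is hypothetical, and rounding to two decimals is an assumption inferred from the sample value `9.58`.

```python
from statistics import mean
from typing import Dict

# Field names taken from the sample result payload.
METRICS = ("execution_correctness", "style_score", "conciseness", "relevance")


def overall_score(scores: Dict[str, float]) -> float:
    """Average the four metrics after checking each lies in the 0-10 range."""
    for name in METRICS:
        value = scores[name]
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    # Rounding to two decimals is an assumption based on the sample output.
    return round(mean(scores[name] for name in METRICS), 2)


sample = {
    "execution_correctness": 10.0,
    "style_score": 9.0,
    "conciseness": 9.33,
    "relevance": 10.0,
}

if __name__ == "__main__":
    print(overall_score(sample))  # 9.58, matching the sample "overall_score"
```

Here (10.0 + 9.0 + 9.33 + 10.0) / 4 = 9.5825, which rounds to 9.58.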
To ensure the agent is functioning correctly, you can run the provided tests.

- Install test dependencies (if not already installed):

      pip install pytest pytest-asyncio

- Run tests:

      pytest tests/test_agent.py

The `test_agent.py` file contains:
- Conformance Tests: Verifies the Agent Card and A2A protocol structure (e.g., proper message formats, capabilities).
- Message Validation: Ensures that request and response payloads adhere to the defined schemas.
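
To make the idea of message validation concrete, here is a small standard-library sketch that checks the shape of the Evaluator Request. The repository itself defines Pydantic models such as `EvalRequest` for this purpose; the function `validate_request` below is hypothetical, and its rules are inferred only from the example payload in this README.

```python
from typing import Any, Dict, List

# Keys taken from the example request's "config" object.
REQUIRED_CONFIG_KEYS = ("code_to_translate", "source_language", "target_language")


def validate_request(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems: List[str] = []

    participants = payload.get("participants")
    if not isinstance(participants, dict) or not isinstance(
        participants.get("translator"), str
    ):
        problems.append("participants.translator must be a URL string")

    config = payload.get("config")
    if not isinstance(config, dict):
        problems.append("config must be an object")
    else:
        for key in REQUIRED_CONFIG_KEYS:
            # Each config field is expected to be a non-empty string.
            if not isinstance(config.get(key), str) or not config[key]:
                problems.append(f"config.{key} must be a non-empty string")

    return problems


if __name__ == "__main__":
    request = {
        "participants": {"translator": "http://url-to-purple-agent"},
        "config": {
            "code_to_translate": "print('Hello World')",
            "source_language": "python",
            "target_language": "javascript",
        },
    }
    print(validate_request(request))  # an empty list for the README's example payload
```

Returning a list of problems instead of raising on the first one makes it easier to report every schema violation in a single test failure.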
The `run_integration_test.py` script tests the full pipeline between the Green Agent (Judge) and Purple Agent (Participant):

    python tests/run_integration_test.py

This script:
- Starts both agents locally
- Sends a code translation request to the Green Agent
- Lets the Green Agent communicate with the Purple Agent
- Returns the evaluation result

The `run_leaderboard_test.py` script runs a full evaluation and generates a result JSON file compatible with the AgentBeats leaderboard:

    python tests/run_leaderboard_test.py

This script:
- Starts both agents locally
- Sends multiple test cases for evaluation
- Aggregates the scores
- Generates a JSON file in the `results/` directory in the correct format for the leaderboard

This project is part of the Code Translator multi-agent evaluation system built for the AgentBeats Competition. The complete system consists of:
| Repository | Description |
|---|---|
| Code Translator Green Agent (this repo) | The Judge agent that evaluates code translations |
| Code Translator Purple Agent | The participant agent that performs code translation |
| Code Translator Leaderboard | The leaderboard repository that records evaluation results |
View the live leaderboard at: AgentBeats - Code Translator Judge
- Green Agent: `docker.io/samiratra95/code-translator-green-agent:latest`
- Purple Agent: `docker.io/samiratra95/code-translator-purple-agent:latest`

- AgentBeats Tutorial - Official tutorial for building AgentBeats agents
- Green Agent Template - Template for green (judge) agents
- Agent Template - Template for purple (participant) agents
- Leaderboard Template - Template for leaderboard repositories