Code Translator Green Agent (Judge)

This repository contains the implementation of the Green Agent, a judge agent designed for the Code Translator competition. Its primary role is to evaluate code translations performed by other agents (specifically the Purple Agent).

Overview

The Green Agent acts as an orchestrator and evaluator. When it receives a request to evaluate a code translation task:

Orchestration: It requests the Purple Agent (Participant) to translate a given snippet of code from a source language to a target language.
Evaluation: Upon receiving the translation, it uses Google GenAI (Gemini) as the judge, which scores the translation on execution correctness, style, conciseness, and relevance.
Reporting: It returns a structured evaluation containing scores, reasoning, and a winner determination.

Repository Structure

src/: Source code for the agent.
- agent.py: Contains TranslationGreenAgent. This is the core logic that handles the evaluation workflow: validating requests, communicating with the participant agent, and invoking the Gemini model for judging.
- server.py: The entry point for the application. It initializes the TranslationGreenAgent, wraps it in a GreenExecutor, and sets up the A2A (Agent-to-Agent) Starlette server.
- common.py: Defines shared data structures and Pydantic models (e.g., EvalRequest, TranslatorEval) and the Agent Card configuration.
- executor.py: Handles the execution context for the agent, providing the sandbox or environment for running the agent logic.
- tool_provider.py: Provides utilities for the agent to interact with external services or other agents (e.g., talk_to_agent implementation).
- client.py: Client-side utilities or helpers for interacting with the agent.
tests/: Test suite.
- test_agent.py: Contains integration tests and A2A conformance tests to ensure the agent behaves correctly, validates schemas, and adheres to the protocol.
- conftest.py: Pytest configuration and fixtures.
Dockerfile: Configuration to containerize the application for deployment.
pyproject.toml: Project configuration and dependencies.

Setup & Installation

Prerequisites

Python 3.11+
A Google GenAI API Key (Gemini)
(Optional) Docker

Installation

Clone the repository:

git clone <repository-url>
cd code_translator_green_agent

Create a virtual environment (optional but recommended):
```
python -m venv .venv
source .venv/bin/activate
```

Install dependencies:

pip install .
# Or install specific requirements
pip install python-dotenv uvicorn httpx google-genai pydantic "google-adk[a2a]"

Environment Variables: Create a .env file in the root directory (or ensure relevant environment variables are set) containing your Google API key:
```
GOOGLE_API_KEY=your_google_api_key_here
```
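As a minimal sketch of how the key could be checked at startup, the helper below reads GOOGLE_API_KEY from the environment and fails early with a readable error. The function name load_api_key is hypothetical and is not taken from this repository; it uses only the standard library and assumes the .env file has already been loaded into the process environment.

```python
import os


def load_api_key(env=None):
    """Return the Google API key, or raise if it is missing or blank.

    `env` defaults to os.environ; passing a dict makes the check easy to test.
    This helper is illustrative only and is not part of the repository.
    """
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set; add it to .env or export it in your shell"
        )
    return key
```

Failing at startup rather than on the first judge call makes a missing key easier to diagnose.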

Running the Agent

Locally

To start the agent server:

python src/server.py

By default, the server runs on http://127.0.0.1:9009. You can customize the host and port using arguments:

python src/server.py --host 0.0.0.0 --port 8080
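For illustration, the --host and --port options could be parsed roughly as follows. This is a sketch using argparse with the defaults stated above (127.0.0.1 and 9009); the actual parsing code in src/server.py may differ, and parse_args here is a hypothetical helper.

```python
import argparse


def parse_args(argv=None):
    """Parse server options; defaults mirror the documented host and port."""
    parser = argparse.ArgumentParser(description="Run the Green Agent server")
    parser.add_argument("--host", default="127.0.0.1", help="Interface to bind to")
    parser.add_argument("--port", type=int, default=9009, help="TCP port to listen on")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"Would start the server on http://{args.host}:{args.port}")
```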

Using Docker

Build the image:
```
docker build -t green-agent .
```

Run the container:

docker run -p 9009:9009 --env GOOGLE_API_KEY=your_api_key green-agent

Usage as a Judge

The agent is designed to be called by an orchestration layer or directly via A2A protocol. It expects a JSON payload (Evaluator Request) with the following structure:

{
  "participants": {
    "translator": "http://url-to-purple-agent"
  },
  "config": {
    "code_to_translate": "print('Hello World')",
    "source_language": "python",
    "target_language": "javascript"
  }
}
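The payload above can be checked before any agent is contacted. The sketch below uses standard-library dataclasses rather than the Pydantic models defined in common.py, so the class name EvalRequestSketch and the function parse_eval_request are assumptions made for illustration only.

```python
import json
from dataclasses import dataclass

# Keys the "config" object must contain, per the payload shown above.
REQUIRED_CONFIG_KEYS = ("code_to_translate", "source_language", "target_language")


@dataclass(frozen=True)
class EvalRequestSketch:
    """Illustrative stand-in for the repository's EvalRequest model."""
    translator_url: str
    code_to_translate: str
    source_language: str
    target_language: str


def parse_eval_request(raw: str) -> EvalRequestSketch:
    """Parse and validate a JSON evaluator request, raising ValueError on gaps."""
    data = json.loads(raw)
    participants = data.get("participants", {})
    config = data.get("config", {})
    if "translator" not in participants:
        raise ValueError("participants.translator is required")
    missing = [key for key in REQUIRED_CONFIG_KEYS if key not in config]
    if missing:
        raise ValueError("config is missing: " + ", ".join(missing))
    return EvalRequestSketch(
        translator_url=participants["translator"],
        **{key: config[key] for key in REQUIRED_CONFIG_KEYS},
    )
```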

The Workflow:

The Green Agent contacts the participant agent at the provided URL (http://url-to-purple-agent).
It sends the code_to_translate, source_language, and target_language to the participant.
It waits for the participant to return the translated code.
Once received, the Green Agent constructs a prompt for the Gemini model (Judge), instructing it to evaluate the translation.
It returns a result that is saved to the leaderboard in the following format:

{
  "participants": {
    "translator": "019b8933-d5b6-76a3-8e0b-930c19c10e87"
  },
  "results": [
    {
      "winner": "translator",
      "execution_correctness": 10.0,
      "style_score": 9.0,
      "conciseness": 9.33,
      "relevance": 10.0,
      "overall_score": 9.58,
      "reasoning": "The JavaScript translation successfully replicates the functionality of the Python code..."
    }
  ]
}
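The workflow can be sketched as a single function with the two external calls injected as plain callables. This is not the repository's TranslationGreenAgent: run_evaluation, translate, and judge are hypothetical names, the prompt wording is invented for illustration, and the "participants" mapping is omitted because this README does not describe how the participant ID is assigned.

```python
from typing import Callable


def run_evaluation(
    request: dict,
    translate: Callable[[str, dict], str],
    judge: Callable[[str], dict],
) -> dict:
    """Ask the participant for a translation, then have the judge score it.

    `translate(url, config)` stands in for the call to the Purple Agent and
    `judge(prompt)` stands in for the Gemini call; both are assumptions.
    """
    config = request["config"]
    url = request["participants"]["translator"]

    # Send the task to the participant and wait for the translated code.
    translated = translate(url, config)

    # Build the judge prompt from the original code and the translation.
    prompt = (
        f"Evaluate this translation from {config['source_language']} "
        f"to {config['target_language']}.\n"
        f"Original code:\n{config['code_to_translate']}\n"
        f"Translated code:\n{translated}\n"
    )

    # The judge is expected to return a dict shaped like one "results" entry.
    scores = judge(prompt)
    return {"results": [scores]}
```

Passing the two calls in as arguments keeps the flow testable without a running Purple Agent or a Gemini API key.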

Evaluation Metrics

| Metric | Description | Score Range |
| --- | --- | --- |
| Execution Correctness | Does the translated code produce the same output/behavior? | 0-10 |
| Style Score | Does the code follow idiomatic conventions of the target language? | 0-10 |
| Conciseness | Is the translation efficient without unnecessary verbosity? | 0-10 |
| Relevance | Does the translation preserve the original code's intent and logic? | 0-10 |
| Overall Score | Average of all four metrics | 0-10 |
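The overall score is the plain average of the four metrics. As a small sketch, the function below recomputes it from a result entry using the field names from the example result; the name overall_score as a function, the range check, and the rounding to two decimals are assumptions rather than confirmed behavior of the agent.

```python
# Field names as they appear in the example result entry.
METRICS = ("execution_correctness", "style_score", "conciseness", "relevance")


def overall_score(scores: dict) -> float:
    """Average the four metric scores, each expected to lie in the 0-10 range."""
    values = [float(scores[name]) for name in METRICS]
    for name, value in zip(METRICS, values):
        if not 0.0 <= value <= 10.0:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    return round(sum(values) / len(values), 2)


example = {
    "execution_correctness": 10.0,
    "style_score": 9.0,
    "conciseness": 9.33,
    "relevance": 10.0,
}
print(overall_score(example))  # 9.58, matching the example result
```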

Testing

To ensure the agent is functioning correctly, you can run the provided tests.

Unit Tests

Install test dependencies (if not already installed):
```
pip install pytest pytest-asyncio
```
Run tests:
```
pytest tests/test_agent.py
```
test_agent.py contains:
- Conformance Tests: Verifies the Agent Card and A2A protocol structure (e.g., proper message formats, capabilities).
- Message Validation: Ensures that request and response payloads adhere to the defined schemas.

Integration Test

The run_integration_test.py script tests the full pipeline between the Green Agent (Judge) and Purple Agent (Participant):

python tests/run_integration_test.py

This script:

Starts both agents locally
Sends a code translation request to the Green Agent
Lets the Green Agent request the translation from the Purple Agent
Returns the evaluation result

Leaderboard Test

The run_leaderboard_test.py script runs a full evaluation and generates a result JSON file compatible with the AgentBeats leaderboard:

python tests/run_leaderboard_test.py

This script:

Starts both agents locally
Sends multiple test cases for evaluation
Aggregates the scores
Generates a JSON file in the results/ directory in the correct format for the leaderboard
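The aggregation and file-writing steps might look roughly like the sketch below. The README does not say how scores are aggregated, so the per-metric mean used here is an assumption, as are the helper names aggregate_results and write_results and the default file name results.json.

```python
import json
from pathlib import Path
from statistics import mean

# Numeric fields of a single result entry, as shown in the example result.
SCORE_FIELDS = (
    "execution_correctness",
    "style_score",
    "conciseness",
    "relevance",
    "overall_score",
)


def aggregate_results(results: list[dict]) -> dict:
    """Average each numeric field across test cases (assumed aggregation rule)."""
    if not results:
        raise ValueError("at least one result is required")
    return {
        field: round(mean(float(r[field]) for r in results), 2)
        for field in SCORE_FIELDS
    }


def write_results(summary: dict, out_dir: str, filename: str = "results.json") -> Path:
    """Write the summary as JSON under out_dir, creating the directory if needed."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / filename
    path.write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return path
```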

Related Repositories

This project is part of the Code Translator multi-agent evaluation system built for the AgentBeats Competition. The complete system consists of:

| Repository | Description |
| --- | --- |
| Code Translator Green Agent (this repo) | The Judge agent that evaluates code translations |
| Code Translator Purple Agent | The participant agent that performs code translation |
| Code Translator Leaderboard | The leaderboard repository that records evaluation results |

Live Leaderboard

View the live leaderboard at: AgentBeats - Code Translator Judge

Docker Images

Green Agent: docker.io/samiratra95/code-translator-green-agent:latest
Purple Agent: docker.io/samiratra95/code-translator-purple-agent:latest

References

AgentBeats Tutorial - Official tutorial for building AgentBeats agents
Green Agent Template - Template for green (judge) agents
Agent Template - Template for purple (participant) agents
Leaderboard Template - Template for leaderboard repositories
