Extend API doc generation to include 12 additional SDK modules #396

Open
rbren wants to merge 8 commits into main from extend-api-reference-docs

@rbren rbren commented Mar 15, 2026

  • I have read and reviewed the documentation changes to the best of my ability.
  • If the change is significant, I have run the documentation site locally and confirmed it renders as expected.

Summary of changes

This PR extends the API reference documentation automation in scripts/generate-api-docs.py to generate documentation for 12 additional SDK modules that were previously undocumented.

Background

The existing automation used Sphinx with sphinx-markdown-builder to generate API docs for 8 core modules. However, the SDK has grown significantly and many important modules weren't being documented.

Changes

Extended module list from 8 to 20 modules:

Original modules (8):

  • agent, conversation, event, llm, tool, workspace, security, utils

New modules added (12):

  • context - AgentContext, Skills, SkillKnowledge, Triggers
  • hooks - Event-driven hooks for automation and control
  • critic - Critics for iterative refinement
  • mcp - MCP (Model Context Protocol) integration
  • plugin - Plugin system and marketplace types
  • subagent - Sub-agent delegation and registration
  • io - File storage abstractions (FileStore, LocalFileStore, InMemoryFileStore)
  • testing - Test utilities (TestLLM, TestLLMExhaustedError)
  • secret - Secret management (SecretSource, StaticSecret, LookupSecret)
  • skills - Skill management utilities
  • observability - Observability utilities
  • logger - Logging utilities

Technical changes:

  • Updated scripts/generate-api-docs.py to use a single all_modules list for both RST toctree generation and module processing
  • Expanded the class_to_module mapping to include all classes from new modules for proper cross-reference resolution
  • The script auto-updates docs.json navigation when run
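
As a rough illustration of what a class-to-module mapping can be used for, the sketch below turns inline class names into links to the page of the owning module. The three entries, the function name `link_class_references`, the base path, and the anchor format are all assumptions made for this example; the actual mapping and link format live in scripts/generate-api-docs.py and may differ.

```python
import re

# Hypothetical subset of the mapping, for illustration only.
CLASS_TO_MODULE = {
    "AgentContext": "context",
    "LocalFileStore": "io",
    "StaticSecret": "secret",
}


def link_class_references(text: str, base_path: str = "/sdk/api-reference") -> str:
    """Turn `ClassName` spans into links to the page of the owning module."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        module = CLASS_TO_MODULE.get(name)
        if module is None:
            return match.group(0)  # Unknown class: leave the span untouched.
        # Assumed anchor format: lower-cased class name.
        return f"[`{name}`]({base_path}/openhands.sdk.{module}#{name.lower()})"

    return re.sub(r"`([A-Za-z_][A-Za-z0-9_]*)`", replace, text)
```

A class that is missing from the mapping is left as plain inline code, which is why every class from the new modules has to be added for its references to resolve.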

Files changed:

  • scripts/generate-api-docs.py - Extended module list and class mappings
  • docs.json - Navigation auto-updated with 12 new pages
  • scripts/mint-config-snippet.json - Updated navigation snippet
  • 12 new .mdx files in sdk/api-reference/
  • 6 existing .mdx files regenerated with current SDK version

Added documentation for the following new SDK modules:
- context: AgentContext, Skills, SkillKnowledge, Triggers
- hooks: Event-driven hooks for automation and control
- critic: Critics for iterative refinement
- mcp: MCP (Model Context Protocol) integration
- plugin: Plugin system and marketplace
- subagent: Sub-agent delegation
- io: File storage abstractions (FileStore, LocalFileStore, etc.)
- testing: Test utilities (TestLLM)
- secret: Secret management (SecretSource, StaticSecret, LookupSecret)
- skills: Skill management utilities
- observability: Observability utilities
- logger: Logging utilities

Changes:
- Updated scripts/generate-api-docs.py to include 12 new modules
- Expanded class_to_module mapping for cross-reference resolution
- Regenerated all API reference documentation
- Auto-updated docs.json navigation with new pages

Co-authored-by: openhands <openhands@all-hands.dev>
- Remove testing module from documentation (as requested)
- Fix blockquote markers (> characters) that break MDX parsing
- Fix malformed Example code blocks with proper wrapping
- Remove <br/> tags and standalone backtick patterns
- Handle orphaned code block openers
- Add module filtering to only process allowed modules
- Clean up multiple levels of nested blockquotes

Co-authored-by: openhands <openhands@all-hands.dev>
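
The blockquote cleanup from the commit above can be tried in isolation. This sketch wraps the regular expression that appears in the diff for `clean_markdown_content` in a standalone function; the function name `strip_blockquote_markers` is made up for the example.

```python
import re


def strip_blockquote_markers(content: str) -> str:
    """Remove leading '>' markers, repeating until nested levels are gone."""
    previous = None
    while previous != content:
        previous = content
        # A single '>' at the start of a line (after optional whitespace) is
        # dropped; '>>' and '>>>' fail the negative lookahead and are kept.
        content = re.sub(r'^(\s*)>(?!>)\s*', r'\1', content, flags=re.MULTILINE)
    return content
```

One behaviour to be aware of: `\s` also matches newlines, so a line that holds only `>` is removed together with its line break, and the paragraphs on either side end up on adjacent lines.
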
Added remove_malformed_examples() function to detect and remove Example
sections containing raw JSON/code without proper code block formatting.
These caused MDX parsing errors due to curly braces being interpreted
as JSX expressions.

Co-authored-by: openhands <openhands@all-hands.dev>
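
The body of `remove_malformed_examples()` is not shown on this page. A minimal sketch of the idea, assuming that "malformed" means an Example section with curly braces outside any fenced code block, could look like this. The function name and the detection rule are assumptions, not the PR's implementation.

```python
def remove_unfenced_brace_examples(content: str) -> str:
    """Drop '#### Example' sections that contain braces outside a code fence."""
    lines = content.split('\n')
    result = []
    i = 0
    while i < len(lines):
        if lines[i].strip() != '#### Example':
            result.append(lines[i])
            i += 1
            continue
        # Scan the section body up to the next header outside a fence.
        j = i + 1
        in_fence = False
        has_bare_brace = False
        while j < len(lines):
            stripped = lines[j].strip()
            if stripped.startswith('```'):
                in_fence = not in_fence
            elif not in_fence:
                if stripped.startswith('#'):
                    break
                if '{' in stripped or '}' in stripped:
                    has_bare_brace = True
            j += 1
        if not has_bare_brace:
            result.extend(lines[i:j])  # Keep well-formed sections as they are.
        i = j
    return '\n'.join(result)
```

Sections whose braces sit inside a fenced block are kept, since MDX does not treat fenced content as JSX.
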
1. Add horizontal rules (---) before class headers to fix spacing
   issue where CSS margin-top: 0 was applied when h4 followed by h3.

2. Fix malformed *args and **kwargs patterns from Sphinx output.
   Sphinx was breaking these into weird code blocks like:
   ```
   *
   ```
   args

   Now properly renders as `*args` and `**kwargs`.

Co-authored-by: openhands <openhands@all-hands.dev>
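
For the second fix in the commit above, a regular expression along the following lines would fold the broken fence back into inline code. The exact whitespace Sphinx emits is not shown here, so the pattern and the name `fix_star_args` are assumptions for illustration.

```python
import re

# A fence that contains only '*' or '**', followed by the parameter name
# on the next line (shape assumed from the commit message).
_STAR_ARGS = re.compile(
    r'```[ \t]*\n[ \t]*(\*{1,2})[ \t]*\n[ \t]*```[ \t]*\n[ \t]*(\w+)'
)


def fix_star_args(content: str) -> str:
    """Rejoin the stars and the name as a single inline code span."""
    return _STAR_ARGS.sub(r'`\1\2`', content)
```
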
Add 'collapsed: true' to the API Reference group to reduce visual clutter
in the navigation sidebar.

Co-authored-by: openhands <openhands@all-hands.dev>
Add fix_shell_config_examples() to wrap shell-style configuration blocks
(KEY=VALUE with # comments) in bash code blocks, preventing # comments
from being rendered as markdown headers.

Co-authored-by: openhands <openhands@all-hands.dev>
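
The implementation of `fix_shell_config_examples()` is not included on this page. One possible approach, sketched under the assumption that a config block is a run of `KEY=VALUE` lines and single-`#` comment lines outside any existing fence, is shown below; the function name `wrap_shell_config` is hypothetical.

```python
import re

_ASSIGNMENT = re.compile(r'^[A-Z][A-Z0-9_]*=')
_COMMENT = re.compile(r'^#(?!#)')  # single '#', so '##' and deeper headers are excluded


def wrap_shell_config(content: str) -> str:
    """Wrap runs of KEY=VALUE and '# comment' lines in a bash code fence."""
    out = []
    block = []  # pending run of config-like lines
    in_fence = False

    def flush():
        # Only wrap runs that contain at least one assignment, so a lone
        # '# Title' line is passed through unchanged.
        if any(_ASSIGNMENT.match(entry) for entry in block):
            out.extend(['```bash', *block, '```'])
        else:
            out.extend(block)
        block.clear()

    for line in content.split('\n'):
        if line.startswith('```'):
            flush()
            in_fence = not in_fence
            out.append(line)
        elif not in_fence and (_ASSIGNMENT.match(line) or _COMMENT.match(line)):
            block.append(line)
        else:
            flush()
            out.append(line)
    flush()
    return '\n'.join(out)
```

A known limitation of this sketch: a real h1 header placed directly above an assignment line would be pulled into the fence.
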
- API Reference (collapsed) > Python SDK (collapsed) > modules
- Provides better organization for future expansion (e.g., REST API docs)

Co-authored-by: openhands <openhands@all-hands.dev>
@rbren rbren marked this pull request as ready for review March 15, 2026 19:51
@rbren rbren requested review from enyst and xingyaoww as code owners March 15, 2026 19:51

@all-hands-bot all-hands-bot left a comment


🟡 Acceptable - The script extension works and solves a real problem (12 undocumented modules). However, there are maintainability concerns around duplicated data structures and growing complexity in markdown post-processing. Missing evidence that generated docs render correctly in Mintlify.

# Additional important modules
'context', 'hooks', 'critic', 'mcp', 'plugin',
'subagent', 'io', 'secret', 'skills',
'observability', 'logger',

🟠 Important: This module list is duplicated again at line 236 in clean_generated_docs(). You have to manually keep them in sync, which is fragile and error-prone.

Better approach: Define ALL_MODULES as a module-level constant above the class, then reference it in both methods. Eliminates the duplication and the risk of them drifting out of sync.

Suggested change
'observability', 'logger',
# Define after imports, before the class
ALL_MODULES = [
    # Core modules (original)
    'agent', 'conversation', 'event', 'llm',
    'tool', 'workspace', 'security', 'utils',
    # Additional important modules
    'context', 'hooks', 'critic', 'mcp', 'plugin',
    'subagent', 'io', 'secret', 'skills',
    'observability', 'logger',
]

Then use ALL_MODULES in both places instead of redefining it.
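
A sketch of how both call sites might read once they share the constant. The helper names `build_toctree` and `is_allowed_doc`, the toctree options, and the `openhands.sdk.<module>.md` file naming are assumptions for illustration, not code from the script.

```python
ALL_MODULES = [
    # Core modules (original)
    'agent', 'conversation', 'event', 'llm',
    'tool', 'workspace', 'security', 'utils',
    # Additional important modules
    'context', 'hooks', 'critic', 'mcp', 'plugin',
    'subagent', 'io', 'secret', 'skills',
    'observability', 'logger',
]


def build_toctree(modules=ALL_MODULES) -> str:
    """Render an RST toctree with one entry per module."""
    entries = '\n'.join(f'   openhands.sdk.{name}' for name in modules)
    return f'.. toctree::\n   :maxdepth: 2\n\n{entries}\n'


def is_allowed_doc(filename: str, modules=ALL_MODULES) -> bool:
    """Keep only generated files that belong to a listed module."""
    stem = filename.rsplit('.', 1)[0]  # drop the file extension
    return stem.removeprefix('openhands.sdk.') in modules  # Python 3.9+
```

With this shape, removing a module (as was done for `testing`) is a one-line change that affects generation and cleanup together.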

Comment on lines +308 to +320
line = lines[i]

# Check if this is an Example header followed by unformatted code
# Handle both header-style and plain "Example:" format
is_example_header = (
    line.strip() in ['#### Example', '### Example', '## Example'] or
    line.strip() == 'Example:' or
    line.strip() == 'Example'
)

if is_example_header:
    # Normalize to h4 header
    result_lines.append('#### Example')

🟡 Suggestion: This function has 4+ levels of nesting with complex state tracking (i, j, k pointers, in_code_block flag). Violates the "3 levels max" complexity guideline.

Consider breaking this into smaller, focused functions:

  • is_example_header(line) -> bool
  • collect_code_until_header(lines, start_idx) -> (code_lines, next_idx)
  • detect_language(code_lines) -> str
  • wrap_in_code_block(code_lines, lang) -> list[str]

Each helper handles one responsibility, making the logic easier to follow and test.
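
The four helpers could be sketched as follows. The signatures come from the list above; the bodies, the language heuristic, and the driver `fix_example_blocks` shown here are assumptions meant to illustrate the shape of the refactor, not the PR's code.

```python
EXAMPLE_HEADERS = {'#### Example', '### Example', '## Example', 'Example:', 'Example'}


def is_example_header(line: str) -> bool:
    return line.strip() in EXAMPLE_HEADERS


def collect_code_until_header(lines, start_idx):
    """Return (code_lines, next_idx), stopping at the next markdown header."""
    idx = start_idx
    code_lines = []
    while idx < len(lines) and not lines[idx].startswith('#'):
        code_lines.append(lines[idx])
        idx += 1
    # Trim blank lines around the snippet.
    while code_lines and not code_lines[0].strip():
        code_lines.pop(0)
    while code_lines and not code_lines[-1].strip():
        code_lines.pop()
    return code_lines, idx


def detect_language(code_lines) -> str:
    """Very rough guess, for illustration only."""
    first = code_lines[0].lstrip() if code_lines else ''
    return 'json' if first.startswith(('{', '[')) else 'python'


def wrap_in_code_block(code_lines, lang):
    return [f'```{lang}', *code_lines, '```']


def fix_example_blocks(content: str) -> str:
    lines = content.split('\n')
    result = []
    i = 0
    while i < len(lines):
        if not is_example_header(lines[i]):
            result.append(lines[i])
            i += 1
            continue
        result.extend(['#### Example', ''])  # normalize to an h4 header
        code, i = collect_code_until_header(lines, i + 1)
        if code and code[0].lstrip().startswith('```'):
            result.extend(code + [''])  # already fenced: pass through
        elif code:
            result.extend(wrap_in_code_block(code, detect_language(code)))
            result.append('')
    return '\n'.join(result)
```

The driver stays at two levels of nesting, and each helper can be unit-tested on its own. This sketch treats any line starting with `#` as a header, so a column-zero Python comment inside an unfenced example would end the snippet early.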

Comment on lines 717 to 850
@@ -453,6 +754,98 @@ def clean_markdown_content(self, content: str, filename: str) -> str:
# Fix header hierarchy (Example sections should be h4 under class headers)
content = self.fix_header_hierarchy(content)

# Fix example code blocks that are not properly formatted
content = self.fix_example_blocks(content)

# Fix shell-style configuration examples that have # comments being interpreted as headers
# Pattern: lines like "Example configuration:" followed by KEY=VALUE and "# comment" lines
content = self.fix_shell_config_examples(content)

# Remove all <br/> tags (wrapped in backticks or not)
content = content.replace('`<br/>`', '')
content = content.replace('<br/>', '')

# Clean up malformed code blocks with weird backtick patterns
# These come from Sphinx's markdown output
content = re.sub(r'```\s*\n``\s*\n```', '', content) # Empty weird block
content = re.sub(r'```\s*\n`\s*\n```', '', content) # Another weird pattern
content = re.sub(r'^## \}', '}', content, flags=re.MULTILINE) # Fix closing brace with header prefix

# Handle any remaining standalone code blocks with just * or ** (cleanup)
content = re.sub(r'\s*```\s*\n\s*\*\*\s*\n\s*```\s*', ' ', content)
content = re.sub(r'\s*```\s*\n\s*\*\s*\n\s*```\s*', ' ', content)

# Clean up blockquote markers that break MDX parsing
# Convert ' > text' to ' text' (indented blockquotes to plain indented text)
# Handle multiple levels of nesting like '> > text'
# BUT: Don't remove >>> which are Python REPL prompts!
# Run multiple times to handle nested blockquotes
prev_content = None
while prev_content != content:
    prev_content = content
    # Only match single > at start of line (not >>> or >>)
    # Pattern: start of line, optional whitespace, single > not followed by >
    content = re.sub(r'^(\s*)>(?!>)\s*', r'\1', content, flags=re.MULTILINE)

# Remove duplicate Example: lines after #### Example header
content = re.sub(r'(#### Example\n\n)Example:\n', r'\1', content)

# Remove malformed standalone backtick patterns
content = re.sub(r'^``\s*$', '', content, flags=re.MULTILINE)
content = re.sub(r'^`\s*$', '', content, flags=re.MULTILINE)

# Clean up multiple consecutive blank lines (more than 2)
content = re.sub(r'\n{4,}', '\n\n\n', content)

# Remove orphaned code block openers (but not closers!)
# Pattern: ``` opener followed by content that doesn't have a matching closing ```
# This handles Sphinx's broken JSON/code examples
# We track whether we're inside a code block to distinguish openers from closers
lines = content.split('\n')
cleaned = []
in_code_block = False
i = 0
while i < len(lines):
    line = lines[i]

    # Check for code block markers
    if line.strip().startswith('```'):
        if not in_code_block:
            # This is an opener - check if it has a matching close
            if line.strip() == '```':
                # Standalone opener - look ahead for close
                j = i + 1
                has_close = False
                while j < len(lines):
                    if lines[j].strip() == '```':
                        has_close = True
                        break
                    if lines[j].startswith('#'): # Hit a header - no proper close
                        break
                    j += 1

                if not has_close:
                    # Skip this orphaned opener
                    i += 1
                    continue
            # It's a valid opener (either has language or has close)
            in_code_block = True
        else:
            # This is a closer
            in_code_block = False

    cleaned.append(line)
    i += 1
content = '\n'.join(cleaned)

# Remove malformed Example sections that contain raw JSON/code without proper formatting
# These cause MDX parsing errors due to curly braces being interpreted as JSX
content = self.remove_malformed_examples(content)

# Add horizontal rules before class headers to ensure proper spacing
# This fixes the issue where h4 (method) followed by h3 (class) loses margin-top
content = self.add_class_separators(content)

lines = content.split('\n')
cleaned_lines = []

🟡 Suggestion: This method applies 15+ sequential transformations with fragile ordering dependencies. Any change risks breaking earlier or later transformations.

Root cause: Fighting poor Sphinx markdown output with regex surgery. Consider:

  1. Can Sphinx configuration be improved to produce cleaner output?
  2. Would a different doc generator (sphinx-autodoc + myst-parser tweaks) reduce the need for post-processing?
  3. Could transformations be consolidated into fewer, more robust passes?

Current approach works but will be painful to maintain as edge cases accumulate.
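
One way to make the ordering explicit, sketched here with two of the transformations from the diff, is to keep the passes in a single ordered list and run them through a small driver. The names `PIPELINE`, `run_pipeline`, and `is_stable` are hypothetical.

```python
import re
from typing import Callable

Pass = Callable[[str], str]


def remove_br_tags(content: str) -> str:
    return content.replace('`<br/>`', '').replace('<br/>', '')


def collapse_blank_lines(content: str) -> str:
    return re.sub(r'\n{4,}', '\n\n\n', content)


# The order is declared once, next to the names of the passes.
PIPELINE: list[tuple[str, Pass]] = [
    ('remove_br_tags', remove_br_tags),
    ('collapse_blank_lines', collapse_blank_lines),
]


def run_pipeline(content: str, passes=PIPELINE) -> str:
    for _name, transform in passes:
        content = transform(content)
    return content


def is_stable(content: str, passes=PIPELINE) -> bool:
    """True when a second run changes nothing, a cheap check for ordering bugs."""
    once = run_pipeline(content, passes)
    return run_pipeline(once, passes) == once
```

Each pass is a plain function from string to string, so it can be tested alone, and a stability check over the generated files can flag a pass that reintroduces something an earlier pass removed.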

@all-hands-bot

🟠 Important - Missing Evidence: The PR description claims "I have run the documentation site locally and confirmed it renders as expected" but provides no concrete proof.

For a documentation PR that generates 12 new API reference pages, please add an Evidence section showing:

  • Screenshot of one or two newly generated pages rendering correctly in Mintlify (e.g., openhands.sdk.context, openhands.sdk.hooks)
  • Confirmation that navigation structure works (collapsed sections, proper grouping)
  • Any rendering issues encountered and how they were fixed

This helps reviewers verify the post-processing logic actually produces valid MDX that Mintlify can parse.
