- Implemented comprehensive unit tests for the StringScanner class, covering various string types including regular, raw, and TeX strings, as well as escape sequences and error handling.
- Added unit tests for Token-related functionalities, including SourceLocation, Trivia, TokenSpan, and token management.
- Developed unit tests for UTF-8 utility functions, validating character decoding, encoding, and string validity checks.
- Updated test cases to ensure robust coverage of edge cases and error scenarios.
Pull request overview
This PR establishes the foundational infrastructure for the CZC compiler, introducing a complete lexical analysis system with modern C++23 features, a CLI framework, and comprehensive test coverage. The implementation provides essential compiler components including UTF-8 support, source code management, token generation, and multiple output formats.
Key Changes
- Implemented a complete lexer with support for identifiers, keywords, operators, numbers, strings (normal/raw/TeX), and comments
- Established CLI infrastructure with `lex` and `version` commands supporting text and JSON output formats
- Created comprehensive source code management system with buffer tracking and UTF-8 handling
Reviewed changes
Copilot reviewed 62 out of 63 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| CMakeLists.txt | Build system configuration with C++23, dependencies (CLI11, glaze, ICU), and test targets |
| include/czc/lexer/*.hpp | Lexer header files defining token types, scanners, UTF-8 utilities, and source management |
| src/lexer/*.cpp | Lexer implementation files for token scanning, UTF-8 handling, and source reading |
| src/cli/*.cpp | CLI implementation including command framework, formatters, and option handling |
| test/lexer/*_test.cpp | Comprehensive unit tests for all lexer components |
| apps/czc/main.cpp | Main entry point delegating to CLI facade |
Comments suppressed due to low confidence (2)
include/czc/lexer/token.hpp:1
- Corrected spelling of '预留未来扩展' ("reserved for future extension") to '预留未来扩展' in comment. The Chinese text appears correct but the spacing between words should be consistent with the English comments.
src/lexer/string_scanner.cpp:1
- The parameter 'count' is described as 'maximum number of digits to skip', but the function name and usage suggest it skips exactly 'count' hex digits, not a maximum. The documentation should clarify that it attempts to skip up to 'count' digits but may skip fewer if non-hex characters are encountered.
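The "up to 'count' digits" behavior this comment asks the documentation to spell out can be illustrated with a minimal sketch. The function name `skip_hex_digits`, its signature, and its return value are assumptions made for illustration; they are not taken from `string_scanner.cpp`.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not the project's actual function): skips at most
// `count` hexadecimal digits starting at `pos` and returns how many digits
// were actually skipped. The result is smaller than `count` when a non-hex
// character or the end of the input is reached first.
std::size_t skip_hex_digits(std::string_view text, std::size_t pos,
                            std::size_t count) {
    std::size_t skipped = 0;
    while (skipped < count && pos + skipped < text.size() &&
           std::isxdigit(static_cast<unsigned char>(text[pos + skipped])) != 0) {
        ++skipped;
    }
    return skipped;
}
```

Returning the number of digits consumed lets a caller tell an escape with exactly `count` digits apart from a short one and report an error in the second case.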
…anagement
- Added CompilerContext to encapsulate global options, output options, and diagnostics.
- Introduced Driver class to manage the compilation process, including the execution of the lexer phase.
- Enhanced diagnostics system to report errors and warnings during compilation.
- Implemented LexerPhase to handle lexical analysis with options for preserving trivia and error reporting.
- Updated tests to cover all token types and ensure correct naming in diagnostics.
- Refactored existing code for better organization and maintainability.
- Implement comprehensive unit tests for the token-related functionalities in `token_test.cpp`, covering source locations, trivia, token spans, and various token types.
- Introduce unit tests for UTF-8 utility functions in `utf8_test.cpp`, validating character decoding, encoding, validity checks, and character counting.
- Ensure tests cover edge cases, including invalid UTF-8 sequences and mixed content strings.
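To make the edge cases named in this commit concrete, here is a minimal sketch of UTF-8 decoding and validity checking. The names `Decoded`, `decode_utf8`, and `is_valid_utf8` are assumptions for illustration and are not the project's actual UTF-8 utility API.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result type: the decoded code point and its byte length.
struct Decoded {
    char32_t code_point;
    std::size_t length;
};

// Decodes one code point starting at `pos`; returns nullopt when the
// sequence is malformed, truncated, overlong, a surrogate, or out of range.
std::optional<Decoded> decode_utf8(std::string_view s, std::size_t pos) {
    if (pos >= s.size()) return std::nullopt;
    const auto b0 = static_cast<unsigned char>(s[pos]);
    std::size_t len = 0;
    char32_t cp = 0;
    if (b0 < 0x80) { len = 1; cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else return std::nullopt;  // stray continuation byte or invalid lead byte
    if (s.size() - pos < len) return std::nullopt;  // truncated sequence
    for (std::size_t i = 1; i < len; ++i) {
        const auto b = static_cast<unsigned char>(s[pos + i]);
        if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    // Smallest code point that legitimately needs `len` bytes.
    static constexpr char32_t kMin[] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < kMin[len]) return std::nullopt;  // overlong encoding
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return std::nullopt;
    return Decoded{cp, len};
}

// A string is valid when every code point in it decodes successfully.
bool is_valid_utf8(std::string_view s) {
    std::size_t pos = 0;
    while (pos < s.size()) {
        const auto d = decode_utf8(s, pos);
        if (!d) return false;
        pos += d->length;
    }
    return true;
}
```

Tests of the kind described above would feed such functions overlong encodings (`"\xC0\x80"`), truncated sequences, and encoded surrogates, all of which this sketch rejects.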
- Added diagnostic types and level-to-string conversion in `diagnostic.cpp`.
- Implemented ANSI color rendering in `ansi_renderer.cpp` for various diagnostic levels.
- Created JSON emitter in `json_emitter.cpp` to output diagnostics in JSON format.
- Developed text emitter in `text_emitter.cpp` for plain text output of diagnostics.
- Introduced error code registration and lookup in `error_code.cpp`.
- Implemented internationalization support in `i18n.cpp` for localized error messages.
- Added message handling with Markdown parsing in `message.cpp`.
- Created source span abstraction in `span.cpp` for tracking source code locations.
- Registered lexer error codes in `lexer_error_codes.cpp` for better error reporting.
- Implemented lexer source locator in `lexer_source_locator.cpp` to map errors to source locations.
This pull request establishes the initial foundation for the CZC compiler project, introducing its core build system, command-line interface (CLI) architecture, and lexer (lexical analysis) functionality. It sets up modern C++23 infrastructure, organizes the codebase for extensibility, and provides the first working CLI commands (`lex` and `version`). The project is structured for future pipeline expansion and includes comprehensive documentation and configuration.

The most important changes are:

Project Initialization and Configuration
- `CMakeLists.txt` build system supporting C++23, code coverage, multiple platforms, and integration of third-party dependencies (CLI11, glaze, tomlplusplus, GoogleTest, ICU). Also includes test targets for the lexer.
- `.gitmodules` file to add the `test/testcases` submodule for test cases.

Core CLI and Command Architecture
- `Cli` class (in `cli.hpp`) using the facade pattern to manage command registration, global options, and command execution, with support for extensible commands and pipeline phases.
- `Command` interface for all CLI subcommands, enforcing single-responsibility and extensibility.
- `CompilerPhase` interface to support future pipeline composition of compiler stages.

Implemented CLI Commands
- `lex` command (`LexCommand`), which performs lexical analysis on source files, supports trivia mode, multiple output formats (text/JSON), and integrates with the pipeline interface.
- `version` command (`VersionCommand`), which displays compiler version and build information.
- Main entry point in `apps/czc/main.cpp`, delegating to the `Cli` facade.

Change Tracking and Documentation
- `.changes` markdown files to document major features, initial commit, and Makefile fixes for project tracking.
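
The facade-and-command arrangement described under "Core CLI and Command Architecture" can be sketched roughly as follows. Only the class names `Cli`, `Command`, and `VersionCommand` come from the description; the `sketch` namespace, the method names (`name`, `execute`, `register_command`, `run`), the exit codes, and the map-based dispatch are assumptions for illustration. The project itself lists CLI11 as a dependency, which this sketch does not use.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Assumed shape of a subcommand interface: one class per subcommand.
class Command {
public:
    virtual ~Command() = default;
    virtual std::string name() const = 0;
    // Returns a process exit code.
    virtual int execute(const std::vector<std::string>& args) = 0;
};

// Placeholder subcommand; a real one would print version and build info.
class VersionCommand : public Command {
public:
    std::string name() const override { return "version"; }
    int execute(const std::vector<std::string>& /*args*/) override { return 0; }
};

// Facade: callers only register commands and call run(); lookup and
// dispatch stay hidden behind this class.
class Cli {
public:
    void register_command(std::unique_ptr<Command> cmd) {
        std::string key = cmd->name();
        commands_[std::move(key)] = std::move(cmd);
    }

    // argv holds the subcommand name followed by its arguments.
    // Returns 2 (an arbitrary choice here) for a missing or unknown command.
    int run(const std::vector<std::string>& argv) {
        if (argv.empty()) return 2;
        const auto it = commands_.find(argv.front());
        if (it == commands_.end()) return 2;
        const std::vector<std::string> rest(argv.begin() + 1, argv.end());
        return it->second->execute(rest);
    }

private:
    std::map<std::string, std::unique_ptr<Command>> commands_;
};

}  // namespace sketch
```

With this layout, adding a subcommand means writing one new `Command` subclass and one `register_command` call, which is the kind of extensibility the description attributes to the design.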