A hands-on reference architecture for fully automated SQL code quality pipelines using SonarQube, GitHub Actions, and Snowflake.
In large, federated organizations, scaling analytics isn’t (just) a tech challenge — it’s an operational one.
Automation, governance and consistency are what make analytics scale in such organisations. With agile methods and modularisation, deployment volume can rise to hundreds or even thousands of changes per day, and manual QA simply cannot keep pace: if 15 analytics teams each deployed changes daily, the number of full-time reviewers needed for manual review would be prohibitive, creating bottlenecks, missed checks and a higher risk of inconsistent standards and data incidents. DataOpsBackbone addresses these issues by automating every critical step:
- Quality Gates leverage SonarQube code scans to enforce SQL and customisable, regex-based coding rules automatically on every git push. Forgetting to prefix a schema name or hardcoding a database name, for example, triggers an automated block and feedback, so standards are enforced before code can be shipped (see the snippet after this list).
- Releases are versioned and tested against zero-copy clones before being packaged through structured, traceable GitHub Actions workflows. This keeps production safe from changes that have not been properly tested.
- Governance is built in: rules such as 'no grants to PUBLIC' or 'only UTC TIMESTAMP allowed' are continuously enforced, and all compliance-relevant data (such as SQL code scans and test results) is logged for auditing purposes.
- Teams have full transparency and can be agile and reduce technical debt themselves. Automation enforces rules and monitors testing over time, so centralised approvals no longer become a bottleneck.
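A minimal before-and-after sketch of the kind of change such a gate flags; the table name IOT_EVENTS and the role ANALYTICS_READER are illustrative, and the full regex rule set is listed further below:

-- Blocked by the gate: non-idempotent DDL, non-UTC timestamp, grant to PUBLIC
CREATE TABLE IOT_EVENTS (EVENT_TS TIMESTAMP_NTZ);
GRANT SELECT ON TABLE IOT_EVENTS TO PUBLIC;

-- Passes the gate: idempotent DDL, timezone-aware timestamp, grant to a dedicated role
CREATE TABLE IF NOT EXISTS IOT_EVENTS (EVENT_TS TIMESTAMP_TZ);
GRANT SELECT ON TABLE IOT_EVENTS TO ROLE ANALYTICS_READER;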
This setup offers repeatability, auditability and peace of mind, enabling new teams to get up and running quickly and allowing developers to focus on creating value rather than policing standards. The showcased projects are practical blueprints for achieving reliable, scalable analytics operations with Snowflake and GitHub Actions, not just demos.
This showcase project, together with Mother-of-all-Projects, demonstrates a fully automated DataOps setup designed to enforce SQL code quality, structure release flows, and scale confidently with Snowflake and GitHub Actions.
A DataOps pipeline that automates:
- Syncing changes from GitHub
- SQL linting & validation (SonarQube + regex rules)
- Zero-copy Snowflake DB cloning
- Building and testing releases
- Packaging deployable artifacts
It combines:
- GitHub Actions (with custom self-hosted runners)
- SonarQube extended with SQL & Text plugins
- Docker Compose for local stack orchestration
- Snowflake CLI for deployment and zero-copy cloning
- SQLUnit for automated SQL testing
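As a rough illustration of the zero-copy cloning step, this is the kind of statement the pipeline can issue through the Snowflake CLI; the database name DATAOPS appears in the connection config further below, while the clone name is purely illustrative:

-- Zero-copy clone of the target database as an isolated test environment for the release
CREATE OR REPLACE DATABASE DATAOPS_RELEASE_TEST CLONE DATAOPS;

-- ... deploy the release and run the SQLUnit tests against the clone ...

-- Drop the clone once the release artifact has been packaged
DROP DATABASE IF EXISTS DATAOPS_RELEASE_TEST;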
The setup is split across two repositories:
- Mother-of-all-Projects: GitHub workflows, SQL refactoring logic, Snowflake deployment scripts, and validation via SQLUnit.
- DataOps Backbone: Dockerized infrastructure stack for:
  - SonarQube + PostgreSQL
  - GitHub self-hosted runners
  - Local development/testing
The showcase is built around two distinct data domains, each represented as an individual database within the same Snowflake account. This approach allows for logical isolation and independent management of domain-specific data assets.
Within each domain (database), schemas are strategically utilized to achieve two key objectives:
- Maturity Levels: Schemas separate data objects based on their maturity level (e.g., raw, curated, conformed). This provides a clear path for data as it progresses through various transformation stages.
- Versioning: Schemas also incorporate versioning for underlying database objects like tables, views, stages and procedures. This ensures traceability, facilitates rollbacks, and supports agile development by allowing iterative changes without disrupting existing consumers.
This structure brings several benefits:
- Improved Organization: Data assets are logically grouped by business domain, making them easier to discover and manage.
- Enhanced Data Governance: Clear maturity levels and versioning promote better control over data quality and evolution.
- Scalability & Maintainability: The modular design reduces interdependencies, simplifying development, testing, and maintenance.
- Demonstrates Best Practices: Provides a practical example of implementing a domain-driven data architecture in Snowflake.
Static source code analysis is used not only to review new code entering the environment, but also to check the existing setup and enforce isolation. Because domains and versions are isolated, the impact of a change or failure in one area on the others is minimised, which improves overall system stability and aids regression testing.
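A minimal SQL sketch of this layout; apart from the DATAOPS database and the IOT_RAW_v001 schema (which appear in the connection config further below), the names are assumptions and are meant only to illustrate the naming scheme:

-- One database per data domain keeps domains logically isolated
CREATE DATABASE IF NOT EXISTS DATAOPS;

-- Schema names combine maturity level and version, e.g. <domain>_<maturity>_<version>
CREATE SCHEMA IF NOT EXISTS DATAOPS.IOT_RAW_v001;   -- raw layer, version 001
CREATE SCHEMA IF NOT EXISTS DATAOPS.IOT_CON_v001;   -- conformed layer, version 001

-- A new version is added as a new schema, so existing consumers are not disrupted
CREATE SCHEMA IF NOT EXISTS DATAOPS.IOT_CON_v002;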
This list provides a few examples of SQL validation rules, each of which is paired with a regular expression (regex) that can be used to identify non-compliant code using the Community Text plugin of SonarQube.
Backups of these rules, which can be restored as a Quality Profile, are available in the repository (link).
1. CREATE SCHEMA without OR REPLACE or IF NOT EXISTS
   (?i)^\s*CREATE\s+(?!OR\s+REPLACE\b)(?!.*\bIF\s+NOT\s+EXISTS\b).*?\bSCHEMA\b
2. CREATE TABLE without OR REPLACE or IF NOT EXISTS
   (?is)^(?!\s*--).*CREATE\s+(?!OR\s+REPLACE\b|.*IF\s+NOT\s+EXISTS\b).*TABLE\b
3. CREATE TABLE | VIEW | SCHEMA with a qualified (dot-separated) object name
   (?i)^(?!\s*--)\s*create\s+(or\s+replace\s+)?(table|view|schema)\s+(if\s+not\s+exists\s+)?[a-z0-9_]+\.[a-z0-9_]+(\.[a-z0-9_]+)?
4. GRANT ... TO PUBLIC
   (?i)^(?!\s*--).*grant\s+.*\s+to\s+public\b
5. DROP SCHEMA | TABLE | VIEW without IF EXISTS
   (?i)^\s*DROP\s+(SCHEMA|TABLE|VIEW)\s+(?!IF\s+EXISTS\b)
6. USE DATABASE | SCHEMA | ROLE statements
   (?i)^(?!\s*--)\s*USE\s+(DATABASE|SCHEMA|ROLE)\b
7. TIMESTAMP, TIMESTAMP_NTZ or TIMESTAMP_LTZ column types
   (?i)(?<!--.*)\bTIMESTAMP(_NTZ|_LTZ)?(\s*\(\s*\d+\s*\))?\b
8. CREATE SCHEMA with a name that does not start with an approved maturity prefix (RAW_, REF_, CON_, AGG_, DP_ or DM_)
   (?i)^(?!\s*--)\s*CREATE\s+(OR\s+REPLACE\s+)?SCHEMA\s+(IF\s+NOT\s+EXISTS\s+)?(?:[a-z0-9_]+\.)?(?!RAW_|REF_|CON_|AGG_|DP_|DM_)[a-z0-9_]+;
9. (Dynamic) Table | View names must begin with a 3-character alphanumeric component code followed by an underscore
   (?i)^(?!\s*--)(?:\s*create(?:\s+or\s+replace)?|\s*alter)\s+(dynamic\s+)?table\s+(if\s+not\s+exists\s+)?(?:[A-Z0-9_]+\.){0,2}(?![A-Z0-9]{3}_)[A-Z_][A-Z0-9_]*
10. Lines containing cross_db_true
   ^.*cross_db_true.*$
11. Lines containing cross_schema_true
   ^.*cross_schema_true.*$

The DataOps_init.sql script creates all required database objects, including users and roles. Simply log in to your Snowflake account and run it to create all your objects at once.
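As a hedged sketch of what such an init script typically contains (only SVC_CICD, DATAOPS and MD_TEST_WH come from the configuration shown in this article; the role name and grants are assumptions, and the script in the repository remains the source of truth):

-- Deployment role and service user for the CI/CD pipeline
CREATE ROLE IF NOT EXISTS CICD_ROLE;
CREATE USER IF NOT EXISTS SVC_CICD DEFAULT_ROLE = CICD_ROLE;
GRANT ROLE CICD_ROLE TO USER SVC_CICD;

-- Target database and warehouse used by the pipeline
CREATE DATABASE IF NOT EXISTS DATAOPS;
CREATE WAREHOUSE IF NOT EXISTS MD_TEST_WH WITH WAREHOUSE_SIZE = 'XSMALL';
GRANT USAGE ON DATABASE DATAOPS TO ROLE CICD_ROLE;
GRANT USAGE ON WAREHOUSE MD_TEST_WH TO ROLE CICD_ROLE;

With the service user in place, a programmatic access token (PAT) for the Snowflake CLI is created: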
ALTER USER IF EXISTS SVC_CICD ADD PAT CICD_PAT ROLE_RESTRICTION = 'ACCOUNTADMIN' DAYS_TO_EXPIRY = 30 COMMENT = 'New PAT for Snow CLI';
-- copy <your token>
SHOW USER PATS FOR USER SVC_CICD;

Create a Snow CLI config at ~/.snowflake/config.toml:
[connections.sfseeurope-svc_cicd_user]
account = "sfseeurope-demo_mdaeppen"
user = "SVC_CICD"
database = "DATAOPS"
schema = "IOT_RAW_v001"
warehouse = "MD_TEST_WH"
authenticator = "snowflake"
password = "<your token>"

Create a local .env file for testing:
# GitHub
GH_RUNNER_TOKEN=<...>
GITHUB_OWNER=<your GitRepo>
GITHUB_REPO_1=<your Projects>
# SonarQube
POSTGRES_USER=<...>
POSTGRES_PASSWORD=<...>
POSTGRES_DB=<...>
SONAR_JDBC_USERNAME=<...>
SONAR_JDBC_PASSWORD=<...>
# Snowflake
CONNECTION_NAME=sfseeurope-svc_cicd

Encode the config for use as a GitHub secret:

base64 -b 0 -i ~/.snowflake/config.toml | tr -d '\n' > SNOW_CONFIG_B64

Upload the following secrets to the GitHub repo:
- SNOW_CONFIG_B64
- SONAR_TOKEN (see step 3 below)
- Start your local stack via ./start.sh
- Log in to SonarQube at http://localhost:9000 as admin
- Go to My Account > Security
- Under Generate Tokens:
  - Name the token (e.g., ci_user)
  - Set the token type to Global Analysis Token
  - Click Generate
- Copy the token (you won't see it again).
- Use it as SONAR_TOKEN in your CI/CD pipeline.
- Start your local stack via ./start.sh
- Access SonarQube at http://localhost:9000 (login: admin / ThisIsNotSecure1234!, default: 'admin')
- Trigger your GitHub workflow
- Check the results in SonarQube
- Monitor SQLUnit test results (incl. history) at http://localhost:8080
This is not just a demo. It's a reusable framework to scale DataOps — combining validation, governance, and automation into one consistent, testable workflow.






