Scripts to support the AaRC metadata curation project

bergstand/aarc-metadata

The Python 3 script validate-AaRC-metadata.py validates entries in the AaRC metadata spreadsheet. It can read its input either from an Excel file or directly from a Google Sheets URL. If no input file is given, it reads the AaRC metadata curation Google Sheet by default.
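The input handling described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name resolve_input and the regular expression are hypothetical, and it assumes the sheet is shared publicly so that Google's standard export?format=xlsx endpoint can serve it as an Excel file.

```python
import re
from typing import Optional

# Default sheet URL, copied from the script's help text.
DEFAULT_SHEET_URL = (
    "https://docs.google.com/spreadsheets/d/"
    "1me-fjDmVRktAGRvThZuA9O1VX9s_ZYwox2jDbtOhEZI/"
)

# Hypothetical pattern for pulling the document ID out of a Google Sheets URL.
_SHEET_ID_RE = re.compile(r"docs\.google\.com/spreadsheets/d/([A-Za-z0-9_-]+)")


def resolve_input(source: Optional[str] = None) -> str:
    """Return a local path unchanged, or an xlsx export URL for a Google Sheet.

    Assumption: the real script may resolve its input differently.
    """
    if source is None:
        source = DEFAULT_SHEET_URL
    match = _SHEET_ID_RE.search(source)
    if match is None:
        # Not a Google Sheets URL, so treat it as a path to an Excel file.
        return source
    return (
        "https://docs.google.com/spreadsheets/d/"
        f"{match.group(1)}/export?format=xlsx"
    )
```

Either result could then be handed to a reader such as pandas.read_excel(..., sheet_name=None), which accepts both paths and URLs and returns every sheet in the workbook; whether the script itself uses pandas is not stated here.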

By default, the script simply prints the errors it identifies; with optional flags, these can be written to tab-delimited text files or to a multi-sheet Excel file.

Example command: python3 validate-AaRC-metadata.py --xlsx-reports output
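To make the report naming concrete, the sketch below lists the files each flag would be expected to produce, following the <PREFIX>.<SHEET_NAME>.txt and <PREFIX>.xlsx patterns given in the help text. The helper report_paths is hypothetical and only illustrates the naming scheme; the sheet names canids and capra are the examples used in the help text.

```python
from typing import List, Optional


def report_paths(
    sheets: List[str],
    txt_prefix: Optional[str] = None,
    xlsx_prefix: Optional[str] = None,
) -> List[str]:
    """List the report files implied by --txt-reports and --xlsx-reports."""
    paths: List[str] = []
    if txt_prefix:
        # One tab-delimited report per validated sheet.
        paths.extend(f"{txt_prefix}.{sheet}.txt" for sheet in sheets)
    if xlsx_prefix:
        # A single consolidated Excel report.
        paths.append(f"{xlsx_prefix}.xlsx")
    return paths


# The example command (--xlsx-reports output) yields one file:
print(report_paths(["canids", "capra"], xlsx_prefix="output"))
# ['output.xlsx']

# With --txt-reports errors, one file per sheet is expected instead:
print(report_paths(["canids", "capra"], txt_prefix="errors"))
# ['errors.canids.txt', 'errors.capra.txt']
```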

usage: validate-AaRC-metadata.py [-h] [--sheets SHEETS] [--skip-urls] [--fields FIELDS] [--txt-reports TXT_REPORTS] [--xlsx-reports XLSX_REPORTS] [--ignore-incomplete]
                                 [excel_file]

Validate metadata in an Excel file or Google Sheet against 'field_definitions' sheet.

positional arguments:
  excel_file            Path to the Excel file (e.g., metadata.xlsx) or a Google Sheets URL.
                        If no file is given, the default is the AaRC metadata curation Google Sheet: https://docs.google.com/spreadsheets/d/1me-fjDmVRktAGRvThZuA9O1VX9s_ZYwox2jDbtOhEZI/

optional arguments:
  -h, --help            show this help message and exit
  --sheets SHEETS       Optional: Comma-separated list of sheet names to validate (e.g., --sheets canids,capra).
  --skip-urls           Skip external URL and NCBI TaxID validation checks.
  --fields FIELDS       Optional: Comma-separated list of column names to validate, e.g., --fields samp_taxon_ID,sample_age.
  --txt-reports TXT_REPORTS
                        Optional: Prefix for writing tab-delimited reports to files (e.g., 'errors'). Output files will be named <PREFIX>.<SHEET_NAME>.txt
  --xlsx-reports XLSX_REPORTS
                        Optional: Prefix for writing a single consolidated Excel report (e.g., 'xlsx_errors'). The output file will be named <PREFIX>.xlsx
  --ignore-incomplete   Optional: Do not include incomplete entries in the output (missing required fields).
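For readers who want to see how such an interface fits together, the usage text above can be reproduced with an argparse parser along these lines. This is a reconstruction from the help output only, not the script's source: the helper names build_parser and split_csv are hypothetical, and the way the real script splits comma-separated values is an assumption.

```python
import argparse
from typing import List, Optional

# Default sheet URL, copied from the help text above.
DEFAULT_SHEET_URL = (
    "https://docs.google.com/spreadsheets/d/"
    "1me-fjDmVRktAGRvThZuA9O1VX9s_ZYwox2jDbtOhEZI/"
)


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags (reconstruction)."""
    parser = argparse.ArgumentParser(
        prog="validate-AaRC-metadata.py",
        description="Validate metadata in an Excel file or Google Sheet "
        "against 'field_definitions' sheet.",
    )
    # Optional positional: falls back to the default Google Sheet.
    parser.add_argument("excel_file", nargs="?", default=DEFAULT_SHEET_URL,
                        help="Path to the Excel file or a Google Sheets URL.")
    parser.add_argument("--sheets",
                        help="Comma-separated list of sheet names to validate.")
    parser.add_argument("--skip-urls", action="store_true",
                        help="Skip external URL and NCBI TaxID validation checks.")
    parser.add_argument("--fields",
                        help="Comma-separated list of column names to validate.")
    parser.add_argument("--txt-reports",
                        help="Prefix for tab-delimited report files.")
    parser.add_argument("--xlsx-reports",
                        help="Prefix for a single consolidated Excel report.")
    parser.add_argument("--ignore-incomplete", action="store_true",
                        help="Do not include incomplete entries in the output.")
    return parser


def split_csv(value: Optional[str]) -> Optional[List[str]]:
    """Split a comma-separated flag value; None means 'no filter given'."""
    if value is None:
        return None
    return [item.strip() for item in value.split(",") if item.strip()]


# Parse the example command's arguments explicitly rather than sys.argv.
args = build_parser().parse_args(["--xlsx-reports", "output"])
print(args.xlsx_reports, args.skip_urls)
```

Passing nargs="?" with a default is what lets the positional excel_file be omitted, which matches the documented fallback to the AaRC Google Sheet.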
