diff --git a/QUICKSTART.md b/QUICKSTART.md
new file mode 100644
index 0000000..512086b
--- /dev/null
+++ b/QUICKSTART.md
@@ -0,0 +1,220 @@
# SciDK Quickstart: Fresh Install to First RO-Crate

**Goal**: Get from zero to your first RO-Crate in under 30 minutes.

**Prerequisites**: Python 3.10+, git, and 5 minutes.

---

## 1. Install (5 minutes)

```bash
# Clone the repository
git clone https://github.com/yourusername/scidk.git
cd scidk

# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate  # bash/zsh
# or: source .venv/bin/activate.fish  # fish shell

# Install SciDK in editable mode
pip install -e .

# Initialize environment (optional but recommended)
source scripts/init_env.sh
```

**Verify installation**:
```bash
scidk-serve --help
# Should show: usage: scidk-serve ...
```

---

## 2. Start the Server (1 minute)

```bash
# Start SciDK
scidk-serve
# or: python3 -m scidk.app
```

Server starts at: **http://127.0.0.1:5000**

Open it in your browser; you should see the SciDK home page.

---

## 3. Scan Your First Directory (3 minutes)

### Via UI:
1. Navigate to the **Files** page (http://127.0.0.1:5000/datasets)
2. Select provider: **Local Filesystem**
3. Enter a path (e.g., `/home/user/Documents`, or use the repository root)
4. Check "Recursive" if you want subdirectories included
5. Click **Scan Files**
6. Wait for the scan to complete (progress is shown in Background Tasks)

### Via API (alternative):
```bash
curl -X POST http://127.0.0.1:5000/api/scan \
  -H "Content-Type: application/json" \
  -d '{"path": "/path/to/your/data", "recursive": true}'
```

---

## 4. Browse Scanned Files (2 minutes)

After scanning completes:

1. The **Files page** shows all discovered datasets
2. Click any dataset to see details:
   - File metadata (size, type, timestamps)
   - Interpreted content (for Python, CSV, JSON, YAML, IPYNB, XLSX)
   - Import dependencies (for code files)

**API alternative**:
```bash
# List all scanned datasets
curl http://127.0.0.1:5000/api/datasets

# Get specific dataset details
curl http://127.0.0.1:5000/api/datasets/<dataset_id>
```

---

## 5. Select Files for RO-Crate (5 minutes)

Selection is currently manual, via browsing. For programmatic selection:

```bash
# Use search to find specific file types
curl "http://127.0.0.1:5000/api/search?q=csv"

# Filter by interpreter
curl "http://127.0.0.1:5000/api/search?q=python_code"
```

Mark interesting datasets mentally or via notes; RO-Crate packaging is next.

---

## 6. Create RO-Crate (5 minutes)

### Quick RO-Crate Generation:

For a scanned directory, generate a minimal RO-Crate:

```bash
# Generate RO-Crate JSON-LD for a directory
curl "http://127.0.0.1:5000/api/rocrate?path=/path/to/scanned/dir" > ro-crate-metadata.json
```

The RO-Crate will include:
- Root Dataset entity
- File/Folder entities (depth=1 by default)
- Contextual metadata per the RO-Crate spec

### Via UI (if viewer embedding is enabled):
1. Set environment variable: `export SCIDK_FILES_VIEWER=rocrate`
2. Restart the server
3. The Files page will show an **"Open in RO-Crate Viewer"** button
4. Click to view the embedded crate metadata

---

## 7. Export RO-Crate as ZIP (5 minutes)

Create a complete RO-Crate package with data files:

```bash
# Using the demo script (recommended)
./scripts/demo_rocrate_export.sh /path/to/scanned/dir ./my-crate.zip

# Manual steps:
# 1. Generate ro-crate-metadata.json (step 6)
# 2. Copy data files into the crate directory
# 3. Zip the complete package
mkdir -p my-crate
curl "http://127.0.0.1:5000/api/rocrate?path=/path/to/dir" > my-crate/ro-crate-metadata.json
cp -r /path/to/dir/* my-crate/
zip -r my-crate.zip my-crate/
```

**Result**: `my-crate.zip` is a valid RO-Crate package containing:
- `ro-crate-metadata.json` (JSON-LD metadata)
- Data files from your scanned directory

---
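Prefer a single script over the manual steps? Below is a minimal Python sketch of the same flow, assuming the `/api/scan` and `/api/rocrate` endpoints behave as shown above; all paths are placeholders.

```python
# Sketch: scan a directory, fetch its RO-Crate JSON-LD, and zip a crate.
import shutil
from pathlib import Path

import requests

BASE = "http://127.0.0.1:5000"
SRC = Path("/path/to/your/data")  # placeholder

# 1. Trigger a recursive scan (step 3); scans may run as background tasks,
#    so wait for completion before fetching metadata.
requests.post(f"{BASE}/api/scan", json={"path": str(SRC), "recursive": True}).raise_for_status()

# 2. Fetch RO-Crate metadata for the scanned directory (step 6)
meta = requests.get(f"{BASE}/api/rocrate", params={"path": str(SRC)})
meta.raise_for_status()

# 3. Assemble the crate directory and zip it (step 7)
crate = Path("my-crate")
shutil.copytree(SRC, crate, dirs_exist_ok=True)
(crate / "ro-crate-metadata.json").write_text(meta.text)
shutil.make_archive("my-crate", "zip", root_dir=".", base_dir="my-crate")
```

---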
## Verify Your RO-Crate (2 minutes)

```bash
# Unzip and inspect
unzip -l my-crate.zip
jq '.["@graph"][] | select(.["@type"] == "Dataset")' my-crate/ro-crate-metadata.json

# Validate with ro-crate-py (optional)
pip install rocrate
python3 -c "from rocrate.rocrate import ROCrate; c = ROCrate('my-crate'); print(c.root_dataset)"
```

---

## Troubleshooting

### Port already in use
```bash
# Check what's using port 5000
lsof -i :5000

# Change port
export SCIDK_PORT=5001
scidk-serve
```

### Scan not finding files
- Verify the path exists and is readable
- Check the recursive flag if scanning subdirectories
- Install `ncdu` for faster scanning: `brew install ncdu` (macOS) or `sudo apt install ncdu` (Linux)

### RO-Crate endpoint returns 404
- Ensure you're running the latest code from the main branch
- Check that the `/api/rocrate` endpoint is implemented (planned for v0.1.0)
- See `dev/features/ui/feature-rocrate-viewer-embedding.md` for implementation status

---

## Next Steps

**Explore more features**:
- **Map page** (http://127.0.0.1:5000/map): Visualize the knowledge graph schema
- **Labels & Links**: Annotate files with custom labels and relationships
- **Providers**: Connect remote sources via rclone (S3, Google Drive, etc.)
- **Neo4j**: Enable persistent graph storage (see README § Neo4j integration)

**Documentation**:
- Full README: `/README.md`
- Development workflow: `dev/README-planning.md`
- RO-Crate feature spec: `dev/features/ui/feature-rocrate-viewer-embedding.md`

**Community**:
- Report issues: https://github.com/yourusername/scidk/issues
- Contributing: `CONTRIBUTING.md`

---

**Total time**: ~25 minutes from clone to packaged RO-Crate

**You're ready!** You've now:
- ✅ Installed SciDK
- ✅ Scanned a directory
- ✅ Browsed files and metadata
- ✅ Generated RO-Crate JSON-LD
- ✅ Exported a complete RO-Crate ZIP package

Happy crate-ing!
🎉 diff --git a/dev b/dev index 2dac9d4..37723a9 160000 --- a/dev +++ b/dev @@ -1 +1 @@ -Subproject commit 2dac9d46136179f2f0d14bb9795f162f76cb2884 +Subproject commit 37723a9ce29dae2392ec120779730e721eb349a1 diff --git a/e2e/core-flows.spec.ts b/e2e/core-flows.spec.ts index 25cc391..535900d 100644 --- a/e2e/core-flows.spec.ts +++ b/e2e/core-flows.spec.ts @@ -122,3 +122,41 @@ test('browse page shows correct file listing structure', async ({ page, baseURL, // Cleanup fs.rmSync(tempDir, { recursive: true, force: true }); }); + +test('navigation covers all 7 pages', async ({ page, baseURL }) => { + const base = baseURL || process.env.BASE_URL || 'http://127.0.0.1:5000'; + + // Start at home + await page.goto(base); + await page.waitForLoadState('networkidle'); + + // Define all pages with their nav test IDs, URLs, and expected titles + const pages = [ + { testId: 'nav-files', url: '/datasets', titlePattern: /Files|Datasets/i }, + { testId: 'nav-maps', url: '/map', titlePattern: /Map/i }, + { testId: 'nav-chats', url: '/chat', titlePattern: /Chat/i }, + { testId: 'nav-labels', url: '/labels', titlePattern: /Labels/i }, + { testId: 'nav-links', url: '/links', titlePattern: /Links/i }, + { testId: 'nav-settings', url: '/settings', titlePattern: /Settings/i }, + ]; + + for (const { testId, url, titlePattern } of pages) { + // Verify nav link is visible + const navLink = page.getByTestId(testId); + await expect(navLink).toBeVisible(); + + // Navigate + await navLink.click(); + await page.waitForLoadState('networkidle'); + + // Verify page loads correctly + await expect(page).toHaveURL(new RegExp(url)); + await expect(page).toHaveTitle(titlePattern); + } + + // Test home navigation via logo + await page.getByTestId('nav-home').click(); + await page.waitForLoadState('networkidle'); + await expect(page).toHaveURL(base); + await expect(page).toHaveTitle(/SciDK/i); +}); diff --git a/e2e/global-teardown.ts b/e2e/global-teardown.ts new file mode 100644 index 0000000..70093b9 --- /dev/null +++ b/e2e/global-teardown.ts @@ -0,0 +1,23 @@ +import { FullConfig } from '@playwright/test'; + +// Import the teardown function from global-setup +import { teardown } from './global-setup'; + +export default async function globalTeardown(config: FullConfig) { + // Clean up test scans before shutting down server + const baseUrl = (process as any).env.BASE_URL; + if (baseUrl) { + try { + const response = await fetch(`${baseUrl}/api/admin/cleanup-test-scans`, { + method: 'POST', + }); + const result = await response.json(); + console.log('[cleanup] Test scans cleaned up:', result); + } catch (error) { + console.error('[cleanup] Failed to cleanup test scans:', error); + } + } + + // Kill the server process + await teardown(); +} diff --git a/e2e/labels.spec.ts b/e2e/labels.spec.ts index 25cd194..dedc884 100644 --- a/e2e/labels.spec.ts +++ b/e2e/labels.spec.ts @@ -231,3 +231,83 @@ test('validation: cannot save label without name', async ({ page, baseURL }) => const value = await labelNameInput.inputValue(); expect(value).toBe(''); }); + +test('neo4j: push label to neo4j', async ({ page, baseURL, request: pageRequest }) => { + // Skip test if Neo4j is not configured + test.skip(!process.env.NEO4J_URI, 'NEO4J_URI not configured'); + + const base = baseURL || process.env.BASE_URL || 'http://127.0.0.1:5000'; + await page.goto(`${base}/labels`); + await page.waitForLoadState('networkidle'); + + // Create a label first + await page.getByTestId('new-label-btn').click(); + await page.getByTestId('label-name').fill('Neo4jTestLabel'); 
+ + // Add a property + await page.getByTestId('add-property-btn').click(); + const propertyRow = page.getByTestId('property-row').first(); + await propertyRow.getByTestId('property-name').fill('id'); + await propertyRow.getByTestId('property-type').selectOption('string'); + await propertyRow.getByTestId('property-required').check(); + + // Save the label + await page.getByTestId('save-label-btn').click(); + await page.waitForTimeout(1000); + + // Verify Push to Neo4j button is visible + const pushBtn = page.getByTestId('push-neo4j-btn'); + await expect(pushBtn).toBeVisible(); + + // Push to Neo4j + await pushBtn.click(); + await page.waitForTimeout(2000); + + // Wait for success toast (the push should succeed if Neo4j is connected) + // We can't easily check the toast content, but we can verify no errors occurred + // by checking that the page is still functional + + // Verify label is still loadable + const labelItems = page.getByTestId('label-item'); + await expect(labelItems.first()).toBeVisible(); + + // Cleanup: delete the test label + page.on('dialog', async (dialog) => await dialog.accept()); + await labelItems.first().click(); + await page.waitForTimeout(300); + await page.getByTestId('delete-label-btn').click(); + await page.waitForTimeout(500); +}); + +test('neo4j: pull labels from neo4j', async ({ page, baseURL }) => { + // Skip test if Neo4j is not configured + test.skip(!process.env.NEO4J_URI, 'NEO4J_URI not configured'); + + const base = baseURL || process.env.BASE_URL || 'http://127.0.0.1:5000'; + await page.goto(`${base}/labels`); + await page.waitForLoadState('networkidle'); + + // Click the "New Label" button to show the editor + await page.getByTestId('new-label-btn').click(); + + // Verify Pull from Neo4j button is visible + const pullBtn = page.getByTestId('pull-neo4j-btn'); + await expect(pullBtn).toBeVisible(); + + // Set up dialog handler before clicking + page.on('dialog', async (dialog) => { + expect(dialog.type()).toBe('confirm'); + expect(dialog.message()).toContain('Pull schema from Neo4j'); + await dialog.accept(); + }); + + // Click Pull from Neo4j + await pullBtn.click(); + await page.waitForTimeout(2000); + + // After pulling, labels should be loaded (if any exist in Neo4j) + // We can't guarantee any labels exist, but the operation should complete without error + // Verify the label list is still visible and functional + const labelList = page.getByTestId('label-list'); + await expect(labelList).toBeVisible(); +}); diff --git a/e2e/links.spec.ts b/e2e/links.spec.ts new file mode 100644 index 0000000..80139cb --- /dev/null +++ b/e2e/links.spec.ts @@ -0,0 +1,441 @@ +import { test, expect } from '@playwright/test'; + +/** + * E2E tests for Links page functionality. 
+ * Tests the complete workflow: create link definition → configure source → configure target → define relationship → preview → execute + */ + +test('links page loads and displays empty state', async ({ page, baseURL }) => { + const consoleMessages: { type: string; text: string }[] = []; + page.on('console', (msg) => { + consoleMessages.push({ type: msg.type(), text: msg.text() }); + }); + + const base = baseURL || process.env.BASE_URL || 'http://127.0.0.1:5000'; + + // Navigate to Links page + await page.goto(`${base}/links`); + await page.waitForLoadState('networkidle'); + + // Verify page loads + await expect(page).toHaveTitle(/SciDK - Links/i, { timeout: 10_000 }); + + // Check for new link button + await expect(page.getByTestId('new-link-btn')).toBeVisible(); + + // Check for link list + await expect(page.getByTestId('link-list')).toBeVisible(); + + // No console errors + const errors = consoleMessages.filter((m) => m.type === 'error'); + expect(errors.length).toBe(0); +}); + +test('links navigation link is visible in header', async ({ page, baseURL }) => { + const base = baseURL || process.env.BASE_URL || 'http://127.0.0.1:5000'; + + await page.goto(base); + await page.waitForLoadState('networkidle'); + + // Check that Links link exists in navigation + const linksLink = page.getByTestId('nav-links'); + await expect(linksLink).toBeVisible(); + + // Click it and verify we navigate to links page + await linksLink.click(); + await page.waitForLoadState('networkidle'); + await expect(page).toHaveTitle(/SciDK - Links/i); +}); + +test('wizard navigation: can navigate through all 4 steps', async ({ page, baseURL }) => { + const base = baseURL || process.env.BASE_URL || 'http://127.0.0.1:5000'; + await page.goto(`${base}/links`); + await page.waitForLoadState('networkidle'); + + // Click "New Link" button + await page.getByTestId('new-link-btn').click(); + + // Verify wizard is visible + await expect(page.locator('#link-wizard')).toBeVisible(); + + // Step 1 should be active + await expect(page.locator('.wizard-step[data-step="1"]')).toHaveClass(/active/); + + // Enter link name + await page.getByTestId('link-name').fill('Test Link'); + + // Click Next to go to step 2 + await page.locator('#btn-next').click(); + await expect(page.locator('.wizard-step[data-step="2"]')).toHaveClass(/active/); + + // Click Next to go to step 3 + await page.locator('#btn-next').click(); + await expect(page.locator('.wizard-step[data-step="3"]')).toHaveClass(/active/); + + // Enter relationship type + await page.locator('#rel-type').fill('TEST_REL'); + + // Click Next to go to step 4 + await page.locator('#btn-next').click(); + await expect(page.locator('.wizard-step[data-step="4"]')).toHaveClass(/active/); + + // Verify Back button is visible + await expect(page.locator('#btn-prev')).toBeVisible(); + + // Click Back to go to step 3 + await page.locator('#btn-prev').click(); + await expect(page.locator('.wizard-step[data-step="3"]')).toHaveClass(/active/); +}); + +test('can create CSV to Graph link definition', async ({ page, baseURL }) => { + const consoleMessages: { type: string; text: string }[] = []; + page.on('console', (msg) => { + consoleMessages.push({ type: msg.type(), text: msg.text() }); + }); + + const base = baseURL || process.env.BASE_URL || 'http://127.0.0.1:5000'; + await page.goto(`${base}/links`); + await page.waitForLoadState('networkidle'); + + // Click "New Link" button + await page.getByTestId('new-link-btn').click(); + + // Step 1: Configure Source + await 
page.getByTestId('link-name').fill('CSV Authors to Files'); + + // Select CSV source type + await page.locator('.source-type-btn[data-source="csv"]').click(); + + // Enter CSV data + const csvData = 'name,email,file_path\nAlice,alice@ex.com,file1.txt\nBob,bob@ex.com,file2.txt'; + await page.locator('#csv-data').fill(csvData); + + // Go to Step 2 + await page.locator('#btn-next').click(); + + // Step 2: Configure Target + // Label target should be selected by default + await page.locator('#target-label-name').fill('File'); + + // Configure match strategy (property should be default) + await page.locator('#match-source-field').fill('file_path'); + await page.locator('#match-target-field').fill('path'); + + // Go to Step 3 + await page.locator('#btn-next').click(); + + // Step 3: Define Relationship + await page.locator('#rel-type').fill('AUTHORED'); + + // Add a relationship property + await page.locator('#btn-add-rel-prop').click(); + const propRows = page.locator('#rel-props-container .property-row'); + await expect(propRows).toHaveCount(1); + await propRows.locator('[data-prop-key]').fill('date'); + await propRows.locator('[data-prop-value]').fill('2024-01-15'); + + // Save the definition + await page.locator('#btn-save-def').click(); + await page.waitForTimeout(1500); // Wait for save + + // Verify link appears in list + const linkItems = page.locator('.link-item'); + await expect(linkItems.first()).toBeVisible(); + const linkText = await linkItems.first().textContent(); + expect(linkText).toContain('CSV Authors to Files'); + expect(linkText).toContain('csv'); + expect(linkText).toContain('AUTHORED'); + + // No console errors + const errors = consoleMessages.filter((m) => m.type === 'error'); + expect(errors.length).toBe(0); +}); + +test('can create Graph to Graph link definition', async ({ page, baseURL }) => { + const base = baseURL || process.env.BASE_URL || 'http://127.0.0.1:5000'; + await page.goto(`${base}/links`); + await page.waitForLoadState('networkidle'); + + // Click "New Link" button + await page.getByTestId('new-link-btn').click(); + + // Step 1: Configure Source (Graph is default) + await page.getByTestId('link-name').fill('Person to File Link'); + await page.locator('#source-label').fill('Person'); + await page.locator('#source-where').fill('p.role = "author"'); + + // Go to Step 2 + await page.locator('#btn-next').click(); + + // Step 2: Configure Target + await page.locator('#target-label-name').fill('File'); + await page.locator('#match-source-field').fill('email'); + await page.locator('#match-target-field').fill('author_email'); + + // Go to Step 3 + await page.locator('#btn-next').click(); + + // Step 3: Define Relationship + await page.locator('#rel-type').fill('AUTHORED_BY'); + + // Save the definition + await page.locator('#btn-save-def').click(); + await page.waitForTimeout(1500); + + // Verify link appears in list + const linkItems = page.locator('.link-item'); + const linkText = await linkItems.first().textContent(); + expect(linkText).toContain('Person to File Link'); + expect(linkText).toContain('graph'); +}); + +test('can save and load link definition', async ({ page, baseURL }) => { + const base = baseURL || process.env.BASE_URL || 'http://127.0.0.1:5000'; + await page.goto(`${base}/links`); + await page.waitForLoadState('networkidle'); + + const uniqueName = `Test Save Load ${Date.now()}`; + + // Create a link definition + await page.getByTestId('new-link-btn').click(); + await page.getByTestId('link-name').fill(uniqueName); + await 
page.locator('.source-type-btn[data-source="csv"]').click(); + await page.locator('#csv-data').fill('col1,col2\nval1,val2'); + await page.locator('#btn-next').click(); + await page.locator('#target-label-name').fill('TestLabel'); + await page.locator('#match-source-field').fill('col1'); + await page.locator('#match-target-field').fill('field1'); + await page.locator('#btn-next').click(); + await page.locator('#rel-type').fill('TEST_REL'); + await page.locator('#btn-save-def').click(); + await page.waitForTimeout(1500); + + // Click on the saved link by finding it by name + const linkItem = page.locator('.link-item').filter({ hasText: uniqueName }); + await linkItem.click(); + await page.waitForTimeout(500); + + // Verify wizard is populated with saved data + await expect(page.getByTestId('link-name')).toHaveValue(uniqueName); + + // Check that CSV button is active + await expect(page.locator('.source-type-btn[data-source="csv"]')).toHaveClass(/active/); + + // Navigate to step 2 and verify + await page.locator('#btn-next').click(); + await expect(page.locator('#target-label-name')).toHaveValue('TestLabel'); + await expect(page.locator('#match-source-field')).toHaveValue('col1'); + await expect(page.locator('#match-target-field')).toHaveValue('field1'); + + // Navigate to step 3 and verify + await page.locator('#btn-next').click(); + await expect(page.locator('#rel-type')).toHaveValue('TEST_REL'); + + // Cleanup: Delete the test link + page.once('dialog', async (dialog) => await dialog.accept()); + await page.locator('#btn-delete-def').click(); + await page.waitForTimeout(1000); +}); + +test('can delete link definition', async ({ page, baseURL }) => { + const base = baseURL || process.env.BASE_URL || 'http://127.0.0.1:5000'; + + // Capture console logs and errors + const consoleLogs: string[] = []; + page.on('console', msg => consoleLogs.push(`[${msg.type()}] ${msg.text()}`)); + page.on('pageerror', err => consoleLogs.push(`[ERROR] ${err.message}`)); + + await page.goto(`${base}/links`); + await page.waitForLoadState('networkidle'); + + const uniqueName = `To Delete ${Date.now()}`; + + // Create a link definition + await page.getByTestId('new-link-btn').click(); + await page.getByTestId('link-name').fill(uniqueName); + await page.locator('#btn-next').click(); + await page.locator('#target-label-name').fill('TestLabel'); + await page.locator('#btn-next').click(); + await page.locator('#rel-type').fill('DELETE_ME'); + await page.locator('#btn-save-def').click(); + await page.waitForTimeout(1500); + + // Load the link by finding it by name + const linkItem = page.locator('.link-item').filter({ hasText: uniqueName }); + await linkItem.click(); + await page.waitForTimeout(500); + + // Delete button should be visible + const deleteBtn = page.locator('#btn-delete-def'); + await expect(deleteBtn).toBeVisible(); + + // Handle confirmation dialog + page.once('dialog', async (dialog) => { + expect(dialog.type()).toBe('confirm'); + await dialog.accept(); + }); + + await deleteBtn.click(); + + // Wait for wizard to hide (indicates delete completed) + try { + await expect(page.locator('#link-wizard')).toBeHidden({ timeout: 5000 }); + } catch (e) { + console.log('Console logs:', consoleLogs.join('\n')); + throw e; + } + + // Wait a bit more for list to update + await page.waitForTimeout(1000); + + // Verify link is removed from list - it should not appear anywhere + const listItems = await page.locator('.link-item').all(); + const listTexts = await Promise.all(listItems.map(item => item.textContent())); + 
const found = listTexts.some(text => text?.includes(uniqueName)); + + if (found) { + console.log('Console logs:', consoleLogs.join('\n')); + } + + expect(found).toBe(false); +}); + +test('validation: cannot save without name', async ({ page, baseURL }) => { + const base = baseURL || process.env.BASE_URL || 'http://127.0.0.1:5000'; + await page.goto(`${base}/links`); + await page.waitForLoadState('networkidle'); + + // Create new link but don't enter name + await page.getByTestId('new-link-btn').click(); + + // Try to save without name + await page.locator('#btn-save-def').click(); + await page.waitForTimeout(500); + + // Should still be on wizard (not saved) + await expect(page.getByTestId('link-name')).toBeVisible(); + const value = await page.getByTestId('link-name').inputValue(); + expect(value).toBe(''); +}); + +test('validation: cannot save without relationship type', async ({ page, baseURL }) => { + const base = baseURL || process.env.BASE_URL || 'http://127.0.0.1:5000'; + await page.goto(`${base}/links`); + await page.waitForLoadState('networkidle'); + + // Create new link with name but no relationship type + await page.getByTestId('new-link-btn').click(); + await page.getByTestId('link-name').fill('No Rel Type'); + + // Navigate to step 3 + await page.locator('#btn-next').click(); + await page.locator('#btn-next').click(); + + // Don't enter relationship type + + // Try to save + await page.locator('#btn-save-def').click(); + await page.waitForTimeout(500); + + // Should still be on wizard + await expect(page.locator('#rel-type')).toBeVisible(); + const value = await page.locator('#rel-type').inputValue(); + expect(value).toBe(''); +}); + +test('can switch between source types', async ({ page, baseURL }) => { + const base = baseURL || process.env.BASE_URL || 'http://127.0.0.1:5000'; + await page.goto(`${base}/links`); + await page.waitForLoadState('networkidle'); + + await page.getByTestId('new-link-btn').click(); + + // Graph source should be visible by default + await expect(page.locator('#source-graph')).toBeVisible(); + await expect(page.locator('#source-csv')).not.toBeVisible(); + await expect(page.locator('#source-api')).not.toBeVisible(); + + // Switch to CSV + await page.locator('.source-type-btn[data-source="csv"]').click(); + await expect(page.locator('#source-graph')).not.toBeVisible(); + await expect(page.locator('#source-csv')).toBeVisible(); + await expect(page.locator('#source-api')).not.toBeVisible(); + + // Switch to API + await page.locator('.source-type-btn[data-source="api"]').click(); + await expect(page.locator('#source-graph')).not.toBeVisible(); + await expect(page.locator('#source-csv')).not.toBeVisible(); + await expect(page.locator('#source-api')).toBeVisible(); + + // Switch back to Graph + await page.locator('.source-type-btn[data-source="graph"]').click(); + await expect(page.locator('#source-graph')).toBeVisible(); + await expect(page.locator('#source-csv')).not.toBeVisible(); + await expect(page.locator('#source-api')).not.toBeVisible(); +}); + +test('can switch between match strategies', async ({ page, baseURL }) => { + const base = baseURL || process.env.BASE_URL || 'http://127.0.0.1:5000'; + await page.goto(`${base}/links`); + await page.waitForLoadState('networkidle'); + + await page.getByTestId('new-link-btn').click(); + + // Navigate to step 2 + await page.locator('#btn-next').click(); + + // Property match should be visible by default + await expect(page.locator('#match-property')).toBeVisible(); + await 
expect(page.locator('#match-id')).not.toBeVisible(); + await expect(page.locator('#match-cypher')).not.toBeVisible(); + + // Switch to ID match + await page.locator('.match-strategy-btn[data-strategy="id"]').click(); + await expect(page.locator('#match-property')).not.toBeVisible(); + await expect(page.locator('#match-id')).toBeVisible(); + await expect(page.locator('#match-cypher')).not.toBeVisible(); + + // Switch to Cypher match + await page.locator('.match-strategy-btn[data-strategy="cypher"]').click(); + await expect(page.locator('#match-property')).not.toBeVisible(); + await expect(page.locator('#match-id')).not.toBeVisible(); + await expect(page.locator('#match-cypher')).toBeVisible(); + + // Switch back to Property match + await page.locator('.match-strategy-btn[data-strategy="property"]').click(); + await expect(page.locator('#match-property')).toBeVisible(); + await expect(page.locator('#match-id')).not.toBeVisible(); + await expect(page.locator('#match-cypher')).not.toBeVisible(); +}); + +test('can add and remove relationship properties', async ({ page, baseURL }) => { + const base = baseURL || process.env.BASE_URL || 'http://127.0.0.1:5000'; + await page.goto(`${base}/links`); + await page.waitForLoadState('networkidle'); + + await page.getByTestId('new-link-btn').click(); + + // Navigate to step 3 + await page.locator('#btn-next').click(); + await page.locator('#btn-next').click(); + + // Add 3 relationship properties + for (let i = 0; i < 3; i++) { + await page.locator('#btn-add-rel-prop').click(); + } + + // Verify 3 property rows exist + const propRows = page.locator('#rel-props-container .property-row'); + await expect(propRows).toHaveCount(3); + + // Fill in values + await propRows.nth(0).locator('[data-prop-key]').fill('key1'); + await propRows.nth(1).locator('[data-prop-key]').fill('key2'); + await propRows.nth(2).locator('[data-prop-key]').fill('key3'); + + // Remove the second property + await propRows.nth(1).locator('button').click(); + + // Verify only 2 properties remain + await expect(page.locator('#rel-props-container .property-row')).toHaveCount(2); +}); diff --git a/e2e/playwright.config.ts b/e2e/playwright.config.ts index 473f02b..c7f0383 100644 --- a/e2e/playwright.config.ts +++ b/e2e/playwright.config.ts @@ -9,4 +9,5 @@ export default defineConfig({ headless: true, }, globalSetup: require.resolve('./global-setup'), + globalTeardown: require.resolve('./global-teardown'), }); diff --git a/scidk/core/migrations.py b/scidk/core/migrations.py index 983cbed..d3ceeca 100644 --- a/scidk/core/migrations.py +++ b/scidk/core/migrations.py @@ -265,6 +265,47 @@ def migrate(conn: Optional[sqlite3.Connection] = None) -> int: _set_version(conn, 5) version = 5 + # v6: link_definitions and link_jobs for relationship creation workflows + if version < 6: + cur.execute( + """ + CREATE TABLE IF NOT EXISTS link_definitions ( + id TEXT PRIMARY KEY, + name TEXT, + source_type TEXT, + source_config TEXT, + target_type TEXT, + target_config TEXT, + match_strategy TEXT, + match_config TEXT, + relationship_type TEXT, + relationship_props TEXT, + created_at REAL, + updated_at REAL + ); + """ + ) + cur.execute( + """ + CREATE TABLE IF NOT EXISTS link_jobs ( + id TEXT PRIMARY KEY, + link_def_id TEXT, + status TEXT, + preview_count INTEGER, + executed_count INTEGER, + error TEXT, + started_at REAL, + completed_at REAL, + FOREIGN KEY (link_def_id) REFERENCES link_definitions(id) + ); + """ + ) + cur.execute("CREATE INDEX IF NOT EXISTS idx_link_jobs_def ON link_jobs(link_def_id);") + 
cur.execute("CREATE INDEX IF NOT EXISTS idx_link_jobs_status ON link_jobs(status);") + conn.commit() + _set_version(conn, 6) + version = 6 + return version finally: if own: diff --git a/scidk/services/label_service.py b/scidk/services/label_service.py index ca01eb7..13cd081 100644 --- a/scidk/services/label_service.py +++ b/scidk/services/label_service.py @@ -18,8 +18,11 @@ class LabelService: def __init__(self, app): self.app = app + + def _get_conn(self): + """Get a database connection.""" from ..core import path_index_sqlite as pix - self.conn = pix.connect() + return pix.connect() def list_labels(self) -> List[Dict[str, Any]]: """ @@ -28,27 +31,31 @@ def list_labels(self) -> List[Dict[str, Any]]: Returns: List of label definition dicts with keys: name, properties, relationships, created_at, updated_at """ - cursor = self.conn.cursor() - cursor.execute( - """ - SELECT name, properties, relationships, created_at, updated_at - FROM label_definitions - ORDER BY name - """ - ) - rows = cursor.fetchall() - - labels = [] - for row in rows: - name, props_json, rels_json, created_at, updated_at = row - labels.append({ - 'name': name, - 'properties': json.loads(props_json) if props_json else [], - 'relationships': json.loads(rels_json) if rels_json else [], - 'created_at': created_at, - 'updated_at': updated_at - }) - return labels + conn = self._get_conn() + try: + cursor = conn.cursor() + cursor.execute( + """ + SELECT name, properties, relationships, created_at, updated_at + FROM label_definitions + ORDER BY name + """ + ) + rows = cursor.fetchall() + + labels = [] + for row in rows: + name, props_json, rels_json, created_at, updated_at = row + labels.append({ + 'name': name, + 'properties': json.loads(props_json) if props_json else [], + 'relationships': json.loads(rels_json) if rels_json else [], + 'created_at': created_at, + 'updated_at': updated_at + }) + return labels + finally: + conn.close() def get_label(self, name: str) -> Optional[Dict[str, Any]]: """ @@ -60,28 +67,32 @@ def get_label(self, name: str) -> Optional[Dict[str, Any]]: Returns: Label definition dict or None if not found """ - cursor = self.conn.cursor() - cursor.execute( - """ - SELECT name, properties, relationships, created_at, updated_at - FROM label_definitions - WHERE name = ? - """, - (name,) - ) - row = cursor.fetchone() - - if not row: - return None - - name, props_json, rels_json, created_at, updated_at = row - return { - 'name': name, - 'properties': json.loads(props_json) if props_json else [], - 'relationships': json.loads(rels_json) if rels_json else [], - 'created_at': created_at, - 'updated_at': updated_at - } + conn = self._get_conn() + try: + cursor = conn.cursor() + cursor.execute( + """ + SELECT name, properties, relationships, created_at, updated_at + FROM label_definitions + WHERE name = ? 
+ """, + (name,) + ) + row = cursor.fetchone() + + if not row: + return None + + name, props_json, rels_json, created_at, updated_at = row + return { + 'name': name, + 'properties': json.loads(props_json) if props_json else [], + 'relationships': json.loads(rels_json) if rels_json else [], + 'created_at': created_at, + 'updated_at': updated_at + } + finally: + conn.close() def save_label(self, definition: Dict[str, Any]) -> Dict[str, Any]: """ @@ -117,38 +128,42 @@ def save_label(self, definition: Dict[str, Any]) -> Dict[str, Any]: # Check if label exists existing = self.get_label(name) - cursor = self.conn.cursor() - if existing: - # Update - cursor.execute( - """ - UPDATE label_definitions - SET properties = ?, relationships = ?, updated_at = ? - WHERE name = ? - """, - (props_json, rels_json, now, name) - ) - created_at = existing['created_at'] - else: - # Insert - cursor.execute( - """ - INSERT INTO label_definitions (name, properties, relationships, created_at, updated_at) - VALUES (?, ?, ?, ?, ?) - """, - (name, props_json, rels_json, now, now) - ) - created_at = now - - self.conn.commit() + conn = self._get_conn() + try: + cursor = conn.cursor() + if existing: + # Update + cursor.execute( + """ + UPDATE label_definitions + SET properties = ?, relationships = ?, updated_at = ? + WHERE name = ? + """, + (props_json, rels_json, now, name) + ) + created_at = existing['created_at'] + else: + # Insert + cursor.execute( + """ + INSERT INTO label_definitions (name, properties, relationships, created_at, updated_at) + VALUES (?, ?, ?, ?, ?) + """, + (name, props_json, rels_json, now, now) + ) + created_at = now + + conn.commit() - return { - 'name': name, - 'properties': properties, - 'relationships': relationships, - 'created_at': created_at, - 'updated_at': now - } + return { + 'name': name, + 'properties': properties, + 'relationships': relationships, + 'created_at': created_at, + 'updated_at': now + } + finally: + conn.close() def delete_label(self, name: str) -> bool: """ @@ -160,10 +175,14 @@ def delete_label(self, name: str) -> bool: Returns: True if deleted, False if not found """ - cursor = self.conn.cursor() - cursor.execute("DELETE FROM label_definitions WHERE name = ?", (name,)) - self.conn.commit() - return cursor.rowcount > 0 + conn = self._get_conn() + try: + cursor = conn.cursor() + cursor.execute("DELETE FROM label_definitions WHERE name = ?", (name,)) + conn.commit() + return cursor.rowcount > 0 + finally: + conn.close() def push_to_neo4j(self, name: str) -> Dict[str, Any]: """ diff --git a/scidk/services/link_service.py b/scidk/services/link_service.py new file mode 100644 index 0000000..e8697e6 --- /dev/null +++ b/scidk/services/link_service.py @@ -0,0 +1,664 @@ +""" +Link service for managing relationship creation workflows. 
+ +This service provides operations for: +- CRUD operations on link definitions (stored in SQLite) +- Preview and execution of link jobs +- Source adapters (Graph, CSV, API) +- Target adapters (Graph, Label) +- Matching strategies (Property, ID, Custom Cypher) +""" +from __future__ import annotations +from typing import Dict, List, Any, Optional +import json +import time +import sqlite3 +import uuid +import csv +import io +import requests + + +class LinkService: + """Service for managing link definitions and executing relationship creation workflows.""" + + def __init__(self, app): + self.app = app + + def _get_conn(self): + """Get a database connection.""" + from ..core import path_index_sqlite as pix + return pix.connect() + + def list_link_definitions(self) -> List[Dict[str, Any]]: + """ + Get all link definitions from SQLite. + + Returns: + List of link definition dicts + """ + conn = self._get_conn() + try: + cursor = conn.cursor() + cursor.execute( + """ + SELECT id, name, source_type, source_config, target_type, target_config, + match_strategy, match_config, relationship_type, relationship_props, + created_at, updated_at + FROM link_definitions + ORDER BY updated_at DESC + """ + ) + rows = cursor.fetchall() + + definitions = [] + for row in rows: + (id, name, source_type, source_config, target_type, target_config, + match_strategy, match_config, rel_type, rel_props, created_at, updated_at) = row + definitions.append({ + 'id': id, + 'name': name, + 'source_type': source_type, + 'source_config': json.loads(source_config) if source_config else {}, + 'target_type': target_type, + 'target_config': json.loads(target_config) if target_config else {}, + 'match_strategy': match_strategy, + 'match_config': json.loads(match_config) if match_config else {}, + 'relationship_type': rel_type, + 'relationship_props': json.loads(rel_props) if rel_props else {}, + 'created_at': created_at, + 'updated_at': updated_at + }) + return definitions + finally: + conn.close() + + def get_link_definition(self, link_id: str) -> Optional[Dict[str, Any]]: + """ + Get a specific link definition by ID. + + Args: + link_id: Link definition ID + + Returns: + Link definition dict or None if not found + """ + conn = self._get_conn() + try: + cursor = conn.cursor() + cursor.execute( + """ + SELECT id, name, source_type, source_config, target_type, target_config, + match_strategy, match_config, relationship_type, relationship_props, + created_at, updated_at + FROM link_definitions + WHERE id = ? + """, + (link_id,) + ) + row = cursor.fetchone() + + if not row: + return None + + (id, name, source_type, source_config, target_type, target_config, + match_strategy, match_config, rel_type, rel_props, created_at, updated_at) = row + return { + 'id': id, + 'name': name, + 'source_type': source_type, + 'source_config': json.loads(source_config) if source_config else {}, + 'target_type': target_type, + 'target_config': json.loads(target_config) if target_config else {}, + 'match_strategy': match_strategy, + 'match_config': json.loads(match_config) if match_config else {}, + 'relationship_type': rel_type, + 'relationship_props': json.loads(rel_props) if rel_props else {}, + 'created_at': created_at, + 'updated_at': updated_at + } + finally: + conn.close() + + def save_link_definition(self, definition: Dict[str, Any]) -> Dict[str, Any]: + """ + Create or update a link definition. 
+ + Args: + definition: Dict with required keys: name, source_type, target_type, match_strategy, relationship_type + + Returns: + Updated link definition + """ + link_id = definition.get('id') or str(uuid.uuid4()) + if not link_id or not link_id.strip(): + link_id = str(uuid.uuid4()) + + name = definition.get('name', '').strip() + if not name: + raise ValueError("Link name is required") + + source_type = definition.get('source_type', '').strip() + if source_type not in ['graph', 'csv', 'api']: + raise ValueError("source_type must be 'graph', 'csv', or 'api'") + + target_type = definition.get('target_type', '').strip() + if target_type not in ['graph', 'label']: + raise ValueError("target_type must be 'graph' or 'label'") + + match_strategy = definition.get('match_strategy', '').strip() + if match_strategy not in ['property', 'id', 'cypher']: + raise ValueError("match_strategy must be 'property', 'id', or 'cypher'") + + relationship_type = definition.get('relationship_type', '').strip() + if not relationship_type: + raise ValueError("relationship_type is required") + + source_config = json.dumps(definition.get('source_config', {})) + target_config = json.dumps(definition.get('target_config', {})) + match_config = json.dumps(definition.get('match_config', {})) + relationship_props = json.dumps(definition.get('relationship_props', {})) + + now = time.time() + + # Check if link exists + existing = self.get_link_definition(link_id) + + conn = self._get_conn() + try: + cursor = conn.cursor() + if existing: + # Update + cursor.execute( + """ + UPDATE link_definitions + SET name = ?, source_type = ?, source_config = ?, target_type = ?, + target_config = ?, match_strategy = ?, match_config = ?, + relationship_type = ?, relationship_props = ?, updated_at = ? + WHERE id = ? + """, + (name, source_type, source_config, target_type, target_config, + match_strategy, match_config, relationship_type, relationship_props, now, link_id) + ) + created_at = existing['created_at'] + else: + # Insert + cursor.execute( + """ + INSERT INTO link_definitions + (id, name, source_type, source_config, target_type, target_config, + match_strategy, match_config, relationship_type, relationship_props, + created_at, updated_at) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) + """, + (link_id, name, source_type, source_config, target_type, target_config, + match_strategy, match_config, relationship_type, relationship_props, now, now) + ) + created_at = now + + conn.commit() + + return { + 'id': link_id, + 'name': name, + 'source_type': source_type, + 'source_config': json.loads(source_config), + 'target_type': target_type, + 'target_config': json.loads(target_config), + 'match_strategy': match_strategy, + 'match_config': json.loads(match_config), + 'relationship_type': relationship_type, + 'relationship_props': json.loads(relationship_props), + 'created_at': created_at, + 'updated_at': now + } + finally: + conn.close() + + def delete_link_definition(self, link_id: str) -> bool: + """ + Delete a link definition. + + Args: + link_id: Link definition ID + + Returns: + True if deleted, False if not found + """ + conn = self._get_conn() + try: + cursor = conn.cursor() + cursor.execute("DELETE FROM link_definitions WHERE id = ?", (link_id,)) + conn.commit() + return cursor.rowcount > 0 + finally: + conn.close() + + def preview_matches(self, definition: Dict[str, Any], limit: int = 10) -> List[Dict[str, Any]]: + """ + Dry-run preview of link matches. 
+ + Args: + definition: Link definition dict + limit: Maximum number of matches to return + + Returns: + List of match dicts with source and target info + """ + # Fetch source data + source_data = self._fetch_source_data(definition) + if not source_data: + return [] + + # Limit source data for preview + source_data = source_data[:limit] + + # Match with targets + matches = self._match_with_targets(definition, source_data, limit) + + return matches + + def execute_link_job(self, link_def_id: str) -> str: + """ + Start background job to create relationships. + + Args: + link_def_id: Link definition ID + + Returns: + Job ID + """ + definition = self.get_link_definition(link_def_id) + if not definition: + raise ValueError(f"Link definition '{link_def_id}' not found") + + job_id = str(uuid.uuid4()) + now = time.time() + + # Create job record + conn = self._get_conn() + try: + cursor = conn.cursor() + cursor.execute( + """ + INSERT INTO link_jobs + (id, link_def_id, status, preview_count, executed_count, started_at) + VALUES (?, ?, ?, ?, ?, ?) + """, + (job_id, link_def_id, 'pending', 0, 0, now) + ) + conn.commit() + + # Execute job (synchronously for MVP, could be async later) + try: + self._execute_job_impl(job_id, definition) + except Exception as e: + # Update job with error + cursor.execute( + """ + UPDATE link_jobs + SET status = ?, error = ?, completed_at = ? + WHERE id = ? + """, + ('failed', str(e), time.time(), job_id) + ) + conn.commit() + raise + + return job_id + finally: + conn.close() + + def get_job_status(self, job_id: str) -> Optional[Dict[str, Any]]: + """ + Get job status and progress. + + Args: + job_id: Job ID + + Returns: + Job status dict or None if not found + """ + conn = self._get_conn() + try: + cursor = conn.cursor() + cursor.execute( + """ + SELECT id, link_def_id, status, preview_count, executed_count, error, + started_at, completed_at + FROM link_jobs + WHERE id = ? + """, + (job_id,) + ) + row = cursor.fetchone() + + if not row: + return None + + (id, link_def_id, status, preview_count, executed_count, error, + started_at, completed_at) = row + return { + 'id': id, + 'link_def_id': link_def_id, + 'status': status, + 'preview_count': preview_count, + 'executed_count': executed_count, + 'error': error, + 'started_at': started_at, + 'completed_at': completed_at + } + finally: + conn.close() + + def list_jobs(self, limit: int = 20) -> List[Dict[str, Any]]: + """ + List recent jobs. + + Args: + limit: Maximum number of jobs to return + + Returns: + List of job status dicts + """ + conn = self._get_conn() + try: + cursor = conn.cursor() + cursor.execute( + """ + SELECT id, link_def_id, status, preview_count, executed_count, error, + started_at, completed_at + FROM link_jobs + ORDER BY started_at DESC + LIMIT ? 
+ """, + (limit,) + ) + rows = cursor.fetchall() + + jobs = [] + for row in rows: + (id, link_def_id, status, preview_count, executed_count, error, + started_at, completed_at) = row + jobs.append({ + 'id': id, + 'link_def_id': link_def_id, + 'status': status, + 'preview_count': preview_count, + 'executed_count': executed_count, + 'error': error, + 'started_at': started_at, + 'completed_at': completed_at + }) + return jobs + finally: + conn.close() + + # --- Internal helpers --- + + def _fetch_source_data(self, definition: Dict[str, Any]) -> List[Dict[str, Any]]: + """Fetch data from source based on source_type.""" + source_type = definition.get('source_type') + source_config = definition.get('source_config', {}) + + if source_type == 'graph': + return self._fetch_graph_source(source_config) + elif source_type == 'csv': + return self._fetch_csv_source(source_config) + elif source_type == 'api': + return self._fetch_api_source(source_config) + else: + raise ValueError(f"Unknown source_type: {source_type}") + + def _fetch_graph_source(self, config: Dict[str, Any]) -> List[Dict[str, Any]]: + """Fetch nodes from Neo4j.""" + try: + from .neo4j_client import get_neo4j_client + neo4j_client = get_neo4j_client() + + if not neo4j_client: + raise Exception("Neo4j client not configured") + + label = config.get('label', '') + where_clause = config.get('where_clause', '') + + # Build query + query = f"MATCH (n:{label})" + if where_clause: + query += f" WHERE {where_clause}" + query += " RETURN n LIMIT 1000" + + results = neo4j_client.execute_read(query) + + # Convert to dicts + nodes = [] + for record in results: + node = record.get('n') + if node: + node_dict = dict(node) + node_dict['_id'] = node.id if hasattr(node, 'id') else None + nodes.append(node_dict) + + return nodes + except Exception as e: + raise Exception(f"Failed to fetch graph source: {str(e)}") + + def _fetch_csv_source(self, config: Dict[str, Any]) -> List[Dict[str, Any]]: + """Parse CSV data.""" + csv_data = config.get('csv_data', '') + if not csv_data: + return [] + + try: + reader = csv.DictReader(io.StringIO(csv_data)) + return list(reader) + except Exception as e: + raise Exception(f"Failed to parse CSV: {str(e)}") + + def _fetch_api_source(self, config: Dict[str, Any]) -> List[Dict[str, Any]]: + """Fetch data from API endpoint.""" + url = config.get('url', '') + if not url: + raise ValueError("API URL is required") + + headers = config.get('headers', {}) + json_path = config.get('json_path', '') + + try: + response = requests.get(url, headers=headers, timeout=30) + response.raise_for_status() + data = response.json() + + # Apply JSONPath if specified + if json_path: + # Simple JSONPath implementation for basic cases + # For production, use jsonpath-ng library + parts = json_path.strip('$').strip('.').split('.') + for part in parts: + if part.endswith('[*]'): + key = part[:-3] + data = data.get(key, []) + else: + data = data.get(part, data) + + return data if isinstance(data, list) else [data] + except Exception as e: + raise Exception(f"Failed to fetch API source: {str(e)}") + + def _match_with_targets(self, definition: Dict[str, Any], source_data: List[Dict[str, Any]], + limit: int = 10) -> List[Dict[str, Any]]: + """Match source data with target nodes.""" + target_type = definition.get('target_type') + target_config = definition.get('target_config', {}) + match_strategy = definition.get('match_strategy') + match_config = definition.get('match_config', {}) + + if target_type == 'graph': + return 
self._match_graph_target(source_data, target_config, match_strategy, match_config, limit) + elif target_type == 'label': + return self._match_label_target(source_data, target_config, match_strategy, match_config, limit) + else: + raise ValueError(f"Unknown target_type: {target_type}") + + def _match_graph_target(self, source_data: List[Dict[str, Any]], target_config: Dict[str, Any], + match_strategy: str, match_config: Dict[str, Any], + limit: int) -> List[Dict[str, Any]]: + """Match with existing graph nodes.""" + try: + from .neo4j_client import get_neo4j_client + neo4j_client = get_neo4j_client() + + if not neo4j_client: + raise Exception("Neo4j client not configured") + + matches = [] + for source_item in source_data[:limit]: + if match_strategy == 'property': + source_field = match_config.get('source_field', '') + target_field = match_config.get('target_field', '') + target_label = target_config.get('label', '') + + source_value = source_item.get(source_field) + if not source_value: + continue + + query = f"MATCH (t:{target_label}) WHERE t.{target_field} = $value RETURN t LIMIT 1" + results = neo4j_client.execute_read(query, {'value': source_value}) + + if results: + target_node = results[0].get('t') + matches.append({ + 'source': source_item, + 'target': dict(target_node) if target_node else None + }) + elif match_strategy == 'id': + # Direct ID match + target_id = source_item.get('target_id') + if not target_id: + continue + + query = "MATCH (t) WHERE id(t) = $id RETURN t" + results = neo4j_client.execute_read(query, {'id': int(target_id)}) + + if results: + target_node = results[0].get('t') + matches.append({ + 'source': source_item, + 'target': dict(target_node) if target_node else None + }) + elif match_strategy == 'cypher': + # Custom Cypher matching + cypher_template = match_config.get('cypher', '') + # Execute custom Cypher (with source_item parameters) + # This is a simplified version - production would need proper parameter binding + pass + + return matches + except Exception as e: + raise Exception(f"Failed to match graph target: {str(e)}") + + def _match_label_target(self, source_data: List[Dict[str, Any]], target_config: Dict[str, Any], + match_strategy: str, match_config: Dict[str, Any], + limit: int) -> List[Dict[str, Any]]: + """Match with nodes by label.""" + # Similar to _match_graph_target but filters by label + return self._match_graph_target(source_data, target_config, match_strategy, match_config, limit) + + def _execute_job_impl(self, job_id: str, definition: Dict[str, Any]): + """Execute the link job (create relationships in Neo4j).""" + conn = self._get_conn() + try: + from .neo4j_client import get_neo4j_client + neo4j_client = get_neo4j_client() + + if not neo4j_client: + raise Exception("Neo4j client not configured") + + # Update status to running + cursor = conn.cursor() + cursor.execute( + "UPDATE link_jobs SET status = ? 
WHERE id = ?",
                ('running', job_id)
            )
            conn.commit()

            # Fetch all source data
            source_data = self._fetch_source_data(definition)

            # Match with targets
            matches = self._match_with_targets(definition, source_data, limit=len(source_data))

            # Create relationships in batches
            relationship_type = definition.get('relationship_type', '')
            relationship_props = definition.get('relationship_props', {})

            batch_size = 1000
            total_created = 0

            for i in range(0, len(matches), batch_size):
                batch = matches[i:i + batch_size]

                # Build batch create query
                batch_data = []
                for match in batch:
                    source = match.get('source', {})
                    target = match.get('target', {})

                    if not target:
                        continue

                    batch_data.append({
                        'source_id': source.get('_id') or source.get('id'),
                        'target_id': target.get('_id') or target.get('id'),
                        'properties': relationship_props
                    })

                if batch_data:
                    query = f"""
                    UNWIND $batch AS row
                    MATCH (source) WHERE id(source) = row.source_id
                    MATCH (target) WHERE id(target) = row.target_id
                    CREATE (source)-[r:{relationship_type}]->(target)
                    SET r = row.properties
                    """
                    neo4j_client.execute_write(query, {'batch': batch_data})
                    total_created += len(batch_data)

            # Update job status to completed
            cursor.execute(
                """
                UPDATE link_jobs
                SET status = ?, executed_count = ?, completed_at = ?
                WHERE id = ?
                """,
                ('completed', total_created, time.time(), job_id)
            )
            conn.commit()
        except Exception as e:
            # Update job with error
            cursor = conn.cursor()
            cursor.execute(
                """
                UPDATE link_jobs
                SET status = ?, error = ?, completed_at = ?
                WHERE id = ?
                """,
                ('failed', str(e), time.time(), job_id)
            )
            conn.commit()
            raise
        finally:
            conn.close()


def get_neo4j_client():
    """Get or create Neo4j client instance."""
    from .neo4j_client import get_neo4j_params, Neo4jClient
    uri, user, pwd, database, auth_mode = get_neo4j_params()

    if not uri:
        return None

    client = Neo4jClient(uri, user, pwd, database, auth_mode)
    client.connect()
    return client
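Taken together, the service supports an in-process workflow like the sketch below. Illustrative only: it assumes `app` is the SciDK Flask app, a configured Neo4j connection, and made-up labels and match fields.

```python
# Sketch (illustrative, e.g. from `flask shell`): save, preview, then execute a link.
from scidk.services.link_service import LinkService

service = LinkService(app)  # `app`: the SciDK Flask app, assumed in scope

definition = service.save_link_definition({
    "name": "People to their files",
    "source_type": "graph",
    "source_config": {"label": "Person", "where_clause": 'n.role = "author"'},
    "target_type": "graph",
    "target_config": {"label": "File"},
    "match_strategy": "property",
    "match_config": {"source_field": "email", "target_field": "author_email"},
    "relationship_type": "AUTHORED_BY",
})

print(service.preview_matches(definition, limit=5))  # dry run, no writes
job_id = service.execute_link_job(definition["id"])  # batch-creates relationships
print(service.get_job_status(job_id)["status"])      # 'completed' on success
```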
diff --git a/scidk/ui/templates/base.html b/scidk/ui/templates/base.html
index f012c6f..65839fc 100644
--- a/scidk/ui/templates/base.html
+++ b/scidk/ui/templates/base.html
@@ -36,6 +36,7 @@
     Maps
     Chats
     Labels
+    Links
     Settings

diff --git a/scidk/ui/templates/links.html b/scidk/ui/templates/links.html
new file mode 100644
index 0000000..b43d18c
--- /dev/null
+++ b/scidk/ui/templates/links.html
@@ -0,0 +1,913 @@
{% extends 'base.html' %}
{% block title %}SciDK - Links{% endblock %}
{% block content %}
  Links
  Create relationships between data instances using graph, CSV, or API sources.
  <!-- remainder of the 913-line template (link wizard markup and scripts) not preserved here -->
{% endblock %}

diff --git a/scidk/web/routes/__init__.py b/scidk/web/routes/__init__.py
index 094be0b..24eccba 100644
--- a/scidk/web/routes/__init__.py
+++ b/scidk/web/routes/__init__.py
@@ -35,6 +35,7 @@ def register_blueprints(app):
     from . import api_providers
     from . import api_annotations
     from . import api_labels
+    from . import api_links

     # Register UI blueprint
     app.register_blueprint(ui.bp)
@@ -50,3 +51,4 @@
     app.register_blueprint(api_providers.bp)
     app.register_blueprint(api_annotations.bp)
     app.register_blueprint(api_labels.bp)
+    app.register_blueprint(api_links.bp)
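The blueprint below exposes these operations over REST. For orientation, a client session might look like this sketch (not part of the patch; a local server is assumed and payload values are illustrative):

```python
# Sketch: exercise the Links API defined in api_links.py below.
import requests

BASE = "http://127.0.0.1:5000"

link = requests.post(f"{BASE}/api/links", json={
    "name": "Author to File",
    "source_type": "csv",
    "source_config": {"csv_data": "name,file_path\nAlice,file1.txt"},
    "target_type": "label",
    "target_config": {"label": "File"},
    "match_strategy": "property",
    "match_config": {"source_field": "file_path", "target_field": "path"},
    "relationship_type": "AUTHORED",
}).json()["link"]

# Dry-run preview, then execute and inspect the job record.
preview = requests.post(f"{BASE}/api/links/{link['id']}/preview", json={"limit": 5}).json()
job_id = requests.post(f"{BASE}/api/links/{link['id']}/execute").json()["job_id"]
print(requests.get(f"{BASE}/api/links/jobs/{job_id}").json()["job"])
```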
+ + Request body: + { + "id": "optional-uuid", + "name": "Author to File", + "source_type": "csv", + "source_config": { + "csv_data": "name,email,file_path\\nAlice,alice@ex.com,file1.txt" + }, + "target_type": "label", + "target_config": { + "label": "File" + }, + "match_strategy": "property", + "match_config": { + "source_field": "file_path", + "target_field": "path" + }, + "relationship_type": "AUTHORED", + "relationship_props": { + "date": "2024-01-15" + } + } + + Returns: + { + "status": "success", + "link": {...} + } + """ + try: + data = request.get_json(force=True, silent=True) or {} + + if not data.get('name'): + return jsonify({ + 'status': 'error', + 'error': 'Link name is required' + }), 400 + + service = _get_link_service() + link = service.save_link_definition(data) + + return jsonify({ + 'status': 'success', + 'link': link + }), 200 + except ValueError as e: + return jsonify({ + 'status': 'error', + 'error': str(e) + }), 400 + except Exception as e: + return jsonify({ + 'status': 'error', + 'error': str(e) + }), 500 + + +@bp.route('/links/', methods=['DELETE']) +def delete_link(link_id): + """ + Delete a link definition. + + Returns: + { + "status": "success", + "message": "Link deleted" + } + """ + try: + service = _get_link_service() + deleted = service.delete_link_definition(link_id) + + if not deleted: + return jsonify({ + 'status': 'error', + 'error': f'Link "{link_id}" not found' + }), 404 + + return jsonify({ + 'status': 'success', + 'message': f'Link "{link_id}" deleted' + }), 200 + except Exception as e: + return jsonify({ + 'status': 'error', + 'error': str(e) + }), 500 + + +@bp.route('/links//preview', methods=['POST']) +def preview_link(link_id): + """ + Preview link matches (dry-run). + + Request body (optional): + { + "limit": 10 + } + + Returns: + { + "status": "success", + "matches": [ + { + "source": {"name": "Alice", "email": "alice@ex.com", ...}, + "target": {"path": "file1.txt", ...} + } + ], + "count": 5 + } + """ + try: + service = _get_link_service() + link = service.get_link_definition(link_id) + + if not link: + return jsonify({ + 'status': 'error', + 'error': f'Link "{link_id}" not found' + }), 404 + + data = request.get_json(force=True, silent=True) or {} + limit = data.get('limit', 10) + + matches = service.preview_matches(link, limit=limit) + + return jsonify({ + 'status': 'success', + 'matches': matches, + 'count': len(matches) + }), 200 + except Exception as e: + return jsonify({ + 'status': 'error', + 'error': str(e) + }), 500 + + +@bp.route('/links//execute', methods=['POST']) +def execute_link(link_id): + """ + Execute link job (create relationships in Neo4j). + + Returns: + { + "status": "success", + "job_id": "uuid" + } + """ + try: + service = _get_link_service() + job_id = service.execute_link_job(link_id) + + return jsonify({ + 'status': 'success', + 'job_id': job_id + }), 200 + except ValueError as e: + return jsonify({ + 'status': 'error', + 'error': str(e) + }), 404 + except Exception as e: + return jsonify({ + 'status': 'error', + 'error': str(e) + }), 500 + + +@bp.route('/links/jobs/', methods=['GET']) +def get_job_status(job_id): + """ + Get job status and progress. 
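For a quick end-to-end check of these endpoints, the sketch below walks the whole lifecycle with curl: create a definition, preview its matches as a dry-run, then execute. It assumes the quickstart defaults (server at http://127.0.0.1:5000) and that jq is installed; the payload values are illustrative, borrowed from the endpoint docstrings above, and `LINK_ID`/`JOB_ID` are just shell variables introduced here.

```bash
# Create a link definition and keep its id (fields mirror the POST /api/links docstring)
LINK_ID=$(curl -s -X POST http://127.0.0.1:5000/api/links \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Authors to Files",
    "source_type": "csv",
    "source_config": {"csv_data": "name,email,file_path\nAlice,alice@ex.com,file1.txt"},
    "target_type": "label",
    "target_config": {"label": "File"},
    "match_strategy": "property",
    "match_config": {"source_field": "file_path", "target_field": "path"},
    "relationship_type": "AUTHORED",
    "relationship_props": {"date": "2024-01-15"}
  }' | jq -r '.link.id')

# Dry-run: preview up to 5 matches without writing anything to Neo4j
curl -s -X POST "http://127.0.0.1:5000/api/links/${LINK_ID}/preview" \
  -H "Content-Type: application/json" -d '{"limit": 5}' | jq .

# Execute: create the relationships and capture the job id
JOB_ID=$(curl -s -X POST "http://127.0.0.1:5000/api/links/${LINK_ID}/execute" | jq -r '.job_id')
```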
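The execute endpoint returns a `job_id` rather than a result, and the service marks the `link_jobs` row `running` and later `completed` or `failed` (see the UPDATE statements in the service code earlier in this patch), so a caller can poll the status endpoint. A minimal polling loop, assuming `JOB_ID` from the previous sketch:

```bash
# Poll until the job leaves the 'running' state
JOB_URL="http://127.0.0.1:5000/api/links/jobs/${JOB_ID}"
while true; do
  STATUS=$(curl -s "${JOB_URL}" | jq -r '.job.status')
  if [ "${STATUS}" != "running" ]; then
    break
  fi
  sleep 1
done
echo "Job finished with status: ${STATUS}"  # 'completed' or 'failed'
```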
diff --git a/scidk/web/routes/ui.py b/scidk/web/routes/ui.py
index 90ba156..cf5f1e1 100644
--- a/scidk/web/routes/ui.py
+++ b/scidk/web/routes/ui.py
@@ -196,6 +196,12 @@ def labels():
     return render_template('labels.html')
 
 
+@bp.get('/links')
+def links():
+    """Link definitions page for relationship creation workflows."""
+    return render_template('links.html')
+
+
 @bp.get('/settings')
 def settings():
     """Basic settings from environment and current in-memory sizes."""
curl -sf "${SCIDK_URL}/api/health" &> /dev/null; then + log_error "SciDK server not responding at ${SCIDK_URL}" + log_info "Start the server with: scidk-serve" + log_info "Or set SCIDK_URL to your server address" + exit 1 + fi + + log_success "Server is running" +} + +# Main workflow steps + +step1_scan() { + local path="$1" + + log_info "Step 1: Scanning directory ${path}..." + + if [ ! -d "${path}" ]; then + log_error "Directory does not exist: ${path}" + exit 1 + fi + + # Trigger scan via API + local response + response=$(curl -sf -X POST "${SCIDK_URL}/api/scan" \ + -H "Content-Type: application/json" \ + -d "{\"path\": \"${path}\", \"recursive\": true}" || echo "{}") + + if [ -z "${response}" ] || ! echo "${response}" | jq -e '.scan_id' &> /dev/null; then + log_error "Scan failed. Response: ${response}" + exit 1 + fi + + local scan_id + scan_id=$(echo "${response}" | jq -r '.scan_id') + local file_count + file_count=$(echo "${response}" | jq -r '.scanned // 0') + + log_success "Scan completed: scan_id=${scan_id}, files=${file_count}" + echo "${scan_id}" +} + +step2_browse() { + local scan_id="$1" + + log_info "Step 2: Browsing scanned datasets..." + + # Get datasets from scan + local datasets + datasets=$(curl -sf "${SCIDK_URL}/api/datasets" || echo "[]") + + local count + count=$(echo "${datasets}" | jq 'length') + + log_success "Found ${count} datasets" + + # Show sample (first 5) + if [ "${count}" -gt 0 ]; then + log_info "Sample datasets:" + echo "${datasets}" | jq -r '.[:5] | .[] | " - \(.filename) (\(.extension))"' + fi + + echo "${datasets}" +} + +step3_select() { + local datasets="$1" + + log_info "Step 3: Selecting files for RO-Crate..." + + # For demo, select all Python, CSV, and JSON files + local selected + selected=$(echo "${datasets}" | jq '[.[] | select(.extension == "py" or .extension == "csv" or .extension == "json" or .extension == "md")]') + + local count + count=$(echo "${selected}" | jq 'length') + + log_success "Selected ${count} files for RO-Crate" + + if [ "${count}" -gt 0 ]; then + log_info "Selected file types:" + echo "${selected}" | jq -r 'group_by(.extension) | .[] | "\(.length) x .\(.[0].extension)"' + fi + + echo "${selected}" +} + +step4_create_crate() { + local source_path="$1" + local crate_dir="${TEMP_DIR}/crate" + + log_info "Step 4: Creating RO-Crate metadata..." + + mkdir -p "${crate_dir}" + + # Generate ro-crate-metadata.json via API + local metadata_response + metadata_response=$(curl -sf "${SCIDK_URL}/api/rocrate?path=${source_path}" 2>/dev/null || echo '{}') + + if [ "$(echo "${metadata_response}" | jq -e 'has("@context")')" != "true" ]; then + log_warning "RO-Crate API not available or returned error" + log_info "Generating minimal RO-Crate metadata manually..." + + # Fallback: create minimal valid RO-Crate metadata + cat > "${crate_dir}/ro-crate-metadata.json" < "${crate_dir}/ro-crate-metadata.json" + fi + + log_success "RO-Crate metadata created at ${crate_dir}/ro-crate-metadata.json" + + # Copy data files + log_info "Copying data files to crate..." + + if [ -d "${source_path}" ]; then + cp -r "${source_path}"/* "${crate_dir}/" 2>/dev/null || log_warning "Some files may not have been copied" + log_success "Data files copied to crate" + else + log_warning "Source path is not a directory; skipping file copy" + fi + + echo "${crate_dir}" +} + +step5_export_zip() { + local crate_dir="$1" + local output_zip="$2" + + log_info "Step 5: Exporting RO-Crate as ZIP..." 
+step5_export_zip() {
+    local crate_dir="$1"
+    local output_zip="$2"
+
+    log_info "Step 5: Exporting RO-Crate as ZIP..."
+
+    # Ensure output directory exists
+    local output_dir
+    output_dir=$(dirname "${output_zip}")
+    mkdir -p "${output_dir}"
+
+    # Create ZIP (zip writes the archive to stdout with "-")
+    (cd "${crate_dir}" && zip -r - .) > "${output_zip}"
+
+    local zip_size
+    zip_size=$(du -h "${output_zip}" | cut -f1)
+
+    log_success "RO-Crate exported to ${output_zip} (${zip_size})"
+}
+
+verify_crate() {
+    local output_zip="$1"
+
+    log_info "Verifying RO-Crate package..."
+
+    # Check ZIP contents
+    if command -v unzip &> /dev/null; then
+        log_info "ZIP contents:"
+        unzip -l "${output_zip}" | head -20
+    fi
+
+    # Check for required metadata file
+    if unzip -l "${output_zip}" | grep -q "ro-crate-metadata.json"; then
+        log_success "✓ ro-crate-metadata.json present"
+    else
+        log_error "✗ ro-crate-metadata.json missing"
+        return 1
+    fi
+
+    # Extract and validate JSON structure
+    local temp_json="${TEMP_DIR}/metadata.json"
+    unzip -p "${output_zip}" ro-crate-metadata.json > "${temp_json}" 2>/dev/null
+
+    if jq -e '.["@context"]' "${temp_json}" &> /dev/null; then
+        log_success "✓ Valid JSON-LD with @context"
+    else
+        log_warning "✗ Missing or invalid @context"
+    fi
+
+    if jq -e '.["@graph"]' "${temp_json}" &> /dev/null; then
+        log_success "✓ @graph present"
+    else
+        log_warning "✗ Missing @graph"
+    fi
+
+    log_success "RO-Crate verification complete"
+}
+
+print_summary() {
+    local output_zip="$1"
+
+    cat <<EOF
+
+========================================
+ Demo complete!
+ RO-Crate package: ${output_zip}
+========================================
+EOF
+}
+
+main() {
+    if [ -z "${SOURCE_PATH}" ]; then
+        log_error "Usage: $0 <source-path> [output-zip]"
+        log_info "Example: $0 ~/Documents/my-project ./my-crate.zip"
+        exit 1
+    fi
+
+    # Convert to absolute path
+    SOURCE_PATH=$(cd "${SOURCE_PATH}" && pwd)
+
+    log_info "Source: ${SOURCE_PATH}"
+    log_info "Output: ${OUTPUT_ZIP}"
+    echo ""
+
+    # Pre-flight checks
+    check_dependencies
+    check_server
+    echo ""
+
+    # Execute workflow
+    local scan_id
+    scan_id=$(step1_scan "${SOURCE_PATH}")
+    echo ""
+
+    local datasets
+    datasets=$(step2_browse "${scan_id}")
+    echo ""
+
+    local selected
+    selected=$(step3_select "${datasets}")
+    echo ""
+
+    local crate_dir
+    crate_dir=$(step4_create_crate "${SOURCE_PATH}")
+    echo ""
+
+    step5_export_zip "${crate_dir}" "${OUTPUT_ZIP}"
+    echo ""
+
+    verify_crate "${OUTPUT_ZIP}"
+    echo ""
+
+    print_summary "${OUTPUT_ZIP}"
+}
+
+main "$@"
diff --git a/tests/test_links_api.py b/tests/test_links_api.py
new file mode 100644
index 0000000..b108d0b
--- /dev/null
+++ b/tests/test_links_api.py
@@ -0,0 +1,314 @@
+"""
+Tests for Links API endpoints.
+
+Tests cover:
+- GET /api/links - list all link definitions
+- GET /api/links/<link_id> - get link definition
+- POST /api/links - create/update link definition
+- DELETE /api/links/<link_id> - delete link definition
+- POST /api/links/<link_id>/preview - preview link matches
+- POST /api/links/<link_id>/execute - execute link job
+- GET /api/links/jobs/<job_id> - get job status
+- GET /api/links/jobs - list jobs
+"""
+import json
+import pytest
+
+
+def test_list_links_empty(client):
+    """Test listing links when none exist."""
+    response = client.get('/api/links')
+    assert response.status_code == 200
+    data = response.get_json()
+    assert data['status'] == 'success'
+    assert 'links' in data
+    assert isinstance(data['links'], list)
+
+
+def test_create_link_success(client):
+    """Test creating a link definition with all required fields."""
+    payload = {
+        'name': 'Authors to Files',
+        'source_type': 'csv',
+        'source_config': {
+            'csv_data': 'name,email,file_path\nAlice,alice@ex.com,file1.txt'
+        },
+        'target_type': 'label',
+        'target_config': {
+            'label': 'File'
+        },
+        'match_strategy': 'property',
+        'match_config': {
+            'source_field': 'file_path',
+            'target_field': 'path'
+        },
+        'relationship_type': 'AUTHORED',
+        'relationship_props': {
+            'date': '2024-01-15'
+        }
+    }
+
+    response = client.post('/api/links', json=payload)
+    assert response.status_code == 200
+    data = response.get_json()
+    assert data['status'] == 'success'
+    assert 'link' in data
+    assert data['link']['name'] == 'Authors to Files'
+    assert data['link']['source_type'] == 'csv'
+    assert data['link']['target_type'] == 'label'
+    assert data['link']['match_strategy'] == 'property'
+    assert data['link']['relationship_type'] == 'AUTHORED'
+    assert 'id' in data['link']
+
+
+def test_create_link_missing_name(client):
+    """Test creating link without name fails."""
+    payload = {
+        'source_type': 'graph',
+        'target_type': 'label',
+        'match_strategy': 'property',
+        'relationship_type': 'RELATED'
+    }
+
+    response = client.post('/api/links', json=payload)
+    assert response.status_code == 400
+    data = response.get_json()
+    assert data['status'] == 'error'
+    assert 'name' in data['error'].lower()
+
+
+def test_create_link_invalid_source_type(client):
+    """Test creating link with invalid source_type fails."""
+    payload = {
+        'name': 'Bad Link',
+        'source_type': 'invalid',
+        'target_type': 'label',
+        'match_strategy': 'property',
+        'relationship_type': 'RELATED'
+    }
+
+    response = client.post('/api/links', json=payload)
+    assert response.status_code == 400
+    data = response.get_json()
+    assert data['status'] == 'error'
+    assert 'source_type' in data['error'].lower()
+
+
+def test_create_link_invalid_target_type(client):
+    """Test creating link with invalid target_type fails."""
+    payload = {
+        'name': 'Bad Link',
+        'source_type': 'graph',
+        'target_type': 'invalid',
+        'match_strategy': 'property',
+        'relationship_type': 'RELATED'
+    }
+
+    response = client.post('/api/links', json=payload)
+    assert response.status_code == 400
+    data = response.get_json()
+    assert data['status'] == 'error'
+    assert 'target_type' in data['error'].lower()
+
+
+def test_create_link_invalid_match_strategy(client):
+    """Test creating link with invalid match_strategy fails."""
+    payload = {
+        'name': 'Bad Link',
+        'source_type': 'graph',
+        'target_type': 'label',
+        'match_strategy': 'invalid',
+        'relationship_type': 'RELATED'
+    }
+
+    response = client.post('/api/links', json=payload)
+    assert response.status_code == 400
+    data = response.get_json()
+    assert data['status'] == 'error'
+    assert 'match_strategy' in data['error'].lower()
+
+
+def test_create_link_missing_relationship_type(client):
+    """Test creating link without relationship_type fails."""
+    payload = {
+        'name': 'Bad Link',
+        'source_type': 'graph',
+        'target_type': 'label',
+        'match_strategy': 'property'
+    }
+
+    response = client.post('/api/links', json=payload)
+    assert response.status_code == 400
+    data = response.get_json()
+    assert data['status'] == 'error'
+    assert 'relationship_type' in data['error'].lower()
+
+
+def test_get_link_success(client):
+    """Test retrieving an existing link definition."""
+    # First create a link
+    payload = {
+        'name': 'Test Link',
+        'source_type': 'graph',
+        'source_config': {'label': 'Person'},
+        'target_type': 'label',
+        'target_config': {'label': 'File'},
+        'match_strategy': 'property',
+        'match_config': {'source_field': 'email', 'target_field': 'author'},
+        'relationship_type': 'AUTHORED',
+        'relationship_props': {}
+    }
+    create_response = client.post('/api/links', json=payload)
+    link_id = create_response.get_json()['link']['id']
+
+    # Now get it
+    response = client.get(f'/api/links/{link_id}')
+    assert response.status_code == 200
+    data = response.get_json()
+    assert data['status'] == 'success'
+    assert data['link']['name'] == 'Test Link'
+    assert data['link']['id'] == link_id
+
+
+def test_get_link_not_found(client):
+    """Test retrieving non-existent link."""
+    response = client.get('/api/links/nonexistent-id')
+    assert response.status_code == 404
+    data = response.get_json()
+    assert data['status'] == 'error'
+
+
+def test_update_link_success(client):
+    """Test updating an existing link definition."""
+    # Create a link
+    payload = {
+        'name': 'Original Name',
+        'source_type': 'graph',
+        'source_config': {'label': 'Person'},
+        'target_type': 'label',
+        'target_config': {'label': 'File'},
+        'match_strategy': 'property',
+        'match_config': {'source_field': 'email', 'target_field': 'author'},
+        'relationship_type': 'AUTHORED',
+        'relationship_props': {}
+    }
+    create_response = client.post('/api/links', json=payload)
+    link_id = create_response.get_json()['link']['id']
+
+    # Update it
+    update_payload = payload.copy()
+    update_payload['id'] = link_id
+    update_payload['name'] = 'Updated Name'
+
+    response = client.post('/api/links', json=update_payload)
+    assert response.status_code == 200
+    data = response.get_json()
+    assert data['status'] == 'success'
+    assert data['link']['name'] == 'Updated Name'
+    assert data['link']['id'] == link_id
+
+
+def test_delete_link_success(client):
+    """Test deleting a link definition."""
+    # Create a link
+    payload = {
+        'name': 'To Delete',
+        'source_type': 'graph',
+        'source_config': {'label': 'Person'},
+        'target_type': 'label',
+        'target_config': {'label': 'File'},
+        'match_strategy': 'property',
+        'match_config': {'source_field': 'email', 'target_field': 'author'},
+        'relationship_type': 'AUTHORED',
+        'relationship_props': {}
+    }
+    create_response = client.post('/api/links', json=payload)
+    link_id = create_response.get_json()['link']['id']
+
+    # Delete it
+    response = client.delete(f'/api/links/{link_id}')
+    assert response.status_code == 200
+    data = response.get_json()
+    assert data['status'] == 'success'
+
+    # Verify it's gone
+    get_response = client.get(f'/api/links/{link_id}')
+    assert get_response.status_code == 404
+
+
+def test_delete_link_not_found(client):
+    """Test deleting non-existent link."""
+    response = client.delete('/api/links/nonexistent-id')
+    assert response.status_code == 404
+    data = response.get_json()
+    assert data['status'] == 'error'
+
+
+def test_list_links_after_create(client):
+    """Test that created links appear in list."""
+    # Get initial count
+    initial_response = client.get('/api/links')
+    initial_count = len(initial_response.get_json()['links'])
+
+    # Create multiple links
+    created_ids = []
+    for i in range(3):
+        payload = {
+            'name': f'Link {i}',
+            'source_type': 'graph',
+            'source_config': {'label': 'Person'},
+            'target_type': 'label',
+            'target_config': {'label': 'File'},
+            'match_strategy': 'property',
+            'match_config': {'source_field': 'email', 'target_field': 'author'},
+            'relationship_type': f'REL_{i}',
+            'relationship_props': {}
+        }
+        resp = client.post('/api/links', json=payload)
+        created_ids.append(resp.get_json()['link']['id'])
+
+    # List all links
+    response = client.get('/api/links')
+    assert response.status_code == 200
+    data = response.get_json()
+    assert data['status'] == 'success'
+    assert len(data['links']) == initial_count + 3
+
+    # Verify our links are in the list
+    link_names = [link['name'] for link in data['links']]
+    for i in range(3):
+        assert f'Link {i}' in link_names
+
+
+def test_preview_link_requires_definition(client):
+    """Test that preview requires a valid link definition."""
+    response = client.post('/api/links/nonexistent-id/preview', json={'limit': 10})
+    assert response.status_code == 404
+    data = response.get_json()
+    assert data['status'] == 'error'
+
+
+def test_execute_link_requires_definition(client):
+    """Test that execute requires a valid link definition."""
+    response = client.post('/api/links/nonexistent-id/execute')
+    assert response.status_code == 404
+    data = response.get_json()
+    assert data['status'] == 'error'
+
+
+def test_list_jobs_empty(client):
+    """Test listing jobs when none exist."""
+    response = client.get('/api/links/jobs')
+    assert response.status_code == 200
+    data = response.get_json()
+    assert data['status'] == 'success'
+    assert 'jobs' in data
+    assert isinstance(data['jobs'], list)
+
+
+def test_get_job_status_not_found(client):
+    """Test getting status of non-existent job."""
+    response = client.get('/api/links/jobs/nonexistent-job-id')
+    assert response.status_code == 404
+    data = response.get_json()
+    assert data['status'] == 'error'
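To exercise the suite above on its own, something like the following should work, assuming pytest is installed in the virtual environment and the project's conftest provides the `client` fixture these tests rely on:

```bash
# Run only the Links API tests, verbosely
python -m pytest tests/test_links_api.py -v
```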