These are the specific frameworks and obligations relevant to your sector, not a generic GDPR checklist. Each one has a direct implication for how you govern AI use and data handling.
As the publisher, you carry controller liability for personal data in published datasets, regardless of what your suppliers told you.
PII in shared datasets creates liability for the sharing party. Technical controls at the point of publication are the appropriate measure.
Healthcare, financial, and legal datasets carry additional vertical-specific obligations on top of GDPR.
These are the specific workflows most organisations in your sector deploy first, described in plain terms.
Every dataset arriving at your platform triggers a scan before it enters the publish queue. Field-level PII findings are surfaced automatically. Clean datasets pass through; datasets with findings are queued for review. No manual inspection for the majority of ingest.
The VestraData SDK integrates directly into your marketplace ingestion pipeline. Each publisher tenant gets isolated scanning results and audit records. Multi-tenant architecture from the ground up.
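A minimal sketch of what that ingest gate can look like in Python. The package name `vestradata`, the `Client` class, and every method shown are illustrative assumptions, not the published SDK surface:

```python
from vestradata import Client  # assumed package name, for illustration only

# Per-tenant client: scanning context, results, and audit records stay isolated.
client = Client(api_key="...", tenant_id="publisher-acme")

result = client.scan_dataset("s3://ingest/datasets/q3-claims.parquet")

if result.findings:
    # Field-level PII found: hold the raw file and queue the findings for review.
    client.queue_for_review(result)
else:
    # Clean dataset: release straight into the publish queue.
    client.release_to_publish_queue(result.dataset_id)
```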
Watch an S3 bucket, SFTP drop, or API endpoint. When a file arrives, scanning starts automatically. Governed clean copies go to the publish queue. Raw files with findings never reach customers.
REST API and OpenAPI spec. Python, Node.js, Java, and .NET clients. Asynchronous scanning with webhook callbacks. Fits inside your existing ingestion pipeline without architectural change.
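A hedged sketch of the asynchronous flow over the REST API. The endpoint path, payload fields, and response shape are assumptions for illustration:

```python
import requests

resp = requests.post(
    "https://api.vestradata.example/v1/scans",   # assumed endpoint
    headers={"Authorization": "Bearer <token>"},
    json={
        "dataset_uri": "s3://ingest/datasets/q3-claims.parquet",
        "callback_url": "https://ingest.example.com/hooks/scan-complete",  # webhook target
    },
    timeout=30,
)
resp.raise_for_status()
job_id = resp.json()["job_id"]  # no polling: completion lands on the webhook
```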
Both products share the same detection engine. Most organisations in your sector start with one before adding the other.
SDK-first integration for automated PII scanning at dataset ingest. Multi-tenant isolation. Event-driven pipeline. Direct API integration with existing marketplace infrastructure.
Your team evaluates, enriches, and works with datasets using AI tools. VestraShield ensures sensitive content in those datasets doesn't reach external LLMs during evaluation or analysis workflows.
Scanning triggered on S3 event, SFTP arrival, or API upload. No polling. Webhook callbacks on completion. Designed for high-throughput ingestion pipelines.
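One common way the event-driven trigger is wired up: an AWS Lambda function subscribed to s3:ObjectCreated events submits a scan per new object. The endpoint and payload here are stand-ins, not a documented contract:

```python
import json
import urllib.request

SCAN_ENDPOINT = "https://api.vestradata.example/v1/scans"  # assumed endpoint

def submit_scan(uri: str) -> None:
    # Stand-in for the SDK/API call that enqueues a scan for one object.
    req = urllib.request.Request(
        SCAN_ENDPOINT,
        data=json.dumps({"dataset_uri": uri}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

def handler(event, context):
    # Fires on s3:ObjectCreated; each newly arrived file starts a scan immediately.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        submit_scan(f"s3://{bucket}/{key}")
```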
Each publisher tenant gets isolated scanning context, results, and audit records. No cross-contamination of findings or audit data.
Python, Node.js, Java, and .NET SDK clients. OpenAPI spec for custom integration. Embed the detection engine directly inside your pipeline if needed.
GLiNER v2 handles any sector-specific entity type without retraining. Health identifiers, financial references, legal codes: all from the same engine.
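GLiNER itself is an open model family, so the zero-shot behaviour is easy to sanity-check locally. The checkpoint below is one public example, and the labels are defined on the spot rather than baked into the model:

```python
from gliner import GLiNER  # pip install gliner

model = GLiNER.from_pretrained("urchade/gliner_multi-v2.1")  # one public checkpoint

text = "Patient NHS number 943 476 5919, sort code 20-00-00, case ref EWCA Civ 1048."
labels = ["health identifier", "bank sort code", "legal case reference"]  # chosen at inference time

for ent in model.predict_entities(text, labels, threshold=0.5):
    print(f'{ent["label"]}: {ent["text"]}')
```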
Raw datasets with PII are never published. Governed clean copies generated automatically and placed in the publish queue. Full audit trail per dataset.
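A toy illustration of the clean-copy step, assuming findings arrive as character spans with labels (the actual finding format is an assumption here):

```python
# Replace each detected span with a typed placeholder, working right to left
# so earlier offsets stay valid as the string changes length.
def clean_copy(text: str, findings: list[dict]) -> str:
    for f in sorted(findings, key=lambda f: f["start"], reverse=True):
        text = text[: f["start"]] + f"[{f['label'].upper()}]" + text[f["end"] :]
    return text

raw = "Contact Jane Doe at jane.doe@example.com"
findings = [
    {"start": 8, "end": 16, "label": "person"},
    {"start": 20, "end": 40, "label": "email"},
]
print(clean_copy(raw, findings))
# Contact [PERSON] at [EMAIL]
```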
Complete REST API with OpenAPI 3.1 spec. Generates typed clients in any language. Integrates with your existing API gateway and authentication layer.
Every prompt your analysts send to ChatGPT, Claude, Gemini, or Copilot while evaluating or enriching datasets is intercepted before it leaves your environment.
Dataset-specific identifiers, proprietary field names, and sector-specific codes caught by zero-shot GLiNER. No model retraining required.
Different intercept rules for data engineers, analysts, and reviewers. Transform sensitive content, hard-block restricted identifiers, or audit-only for low-risk entities.
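A sketch of that policy model as data, with a default-deny lookup; the structure and names are illustrative, not the shipped configuration format:

```python
# Role -> entity label -> action. Anything unlisted falls through to "block".
POLICY = {
    "data_engineer": {
        "health identifier": "block",    # hard-block restricted identifiers
        "person": "transform",           # rewrite sensitive content before it leaves
        "internal field name": "audit",  # log only, low-risk entity
    },
    "analyst": {
        "health identifier": "block",
        "person": "transform",
    },
}

def decide(role: str, label: str) -> str:
    # Default-deny for unknown role/entity combinations.
    return POLICY.get(role, {}).get(label, "block")
```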
Every AI-assisted analysis session logged with entity inventory. Attributable to user and tool. Hash-chained and tamper-evident.
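Hash chaining is a standard construction, and a minimal version shows why tampering is evident: each record commits to the hash of its predecessor, so editing any record breaks verification of everything after it:

```python
import hashlib
import json

def append_record(chain: list[dict], entry: dict) -> None:
    # Each record carries the previous record's hash, forming the chain.
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"entry": entry, "prev_hash": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append({**body, "hash": digest})

def verify(chain: list[dict]) -> bool:
    # Recompute every hash; any retroactive edit fails here.
    prev = "0" * 64
    for rec in chain:
        body = {"entry": rec["entry"], "prev_hash": rec["prev_hash"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["prev_hash"] != prev or rec["hash"] != digest:
            return False
        prev = rec["hash"]
    return True

log: list[dict] = []
append_record(log, {"user": "analyst-7", "tool": "ChatGPT", "entities": ["person", "email"]})
append_record(log, {"user": "analyst-7", "tool": "Copilot", "entities": []})
assert verify(log)
```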
Covers AI use in the browser (ChatGPT, Claude.ai) and in IDE tools (Cursor, GitHub Copilot). No endpoint agent required.
Intercept runs inside your environment. Dataset content being evaluated through AI tools never reaches external infrastructure unprotected.
We connect to something real in your environment and you see actual findings. No slide decks. No fabricated data. Median time to first scan: under 4 hours from receiving credentials.
For engineering teams building data marketplaces. SDK documentation and pipeline integration questions welcome.