These are the specific frameworks and obligations relevant to your sector, not a generic GDPR checklist. Each one has a direct implication for how you govern AI use and data handling.
As the publisher, you carry controller liability for personal data in published datasets, regardless of what your suppliers told you.
PII in shared datasets creates liability for the sharing party. Technical controls at the point of publication are the appropriate measure.
Healthcare, financial, and legal datasets carry additional vertical-specific obligations on top of GDPR.
These are the specific workflows most organisations in your sector deploy first, described in plain terms.
Every dataset arriving at your platform triggers a scan before it enters the publish queue. Field-level PII findings are surfaced automatically. Clean datasets pass through; datasets with findings are queued for review. No manual inspection for the majority of ingest.
The VestraData SDK integrates directly into your marketplace ingestion pipeline. Each publisher tenant gets isolated scanning results and audit records. Multi-tenant architecture from the ground up.
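A minimal sketch of what that ingest gate can look like in Python. The package name `vestradata`, the `Client` class, and every method shown are illustrative assumptions, not the published SDK surface:

```python
from vestradata import Client  # assumed package name, for illustration only

# Per-tenant client: scanning context, results, and audit records stay isolated.
client = Client(api_key="...", tenant_id="publisher-acme")

result = client.scan_dataset("s3://ingest/datasets/q3-claims.parquet")

if result.findings:
    # Field-level PII found: hold the raw file and queue the findings for review.
    client.queue_for_review(result)
else:
    # Clean dataset: release straight into the publish queue.
    client.release_to_publish_queue(result.dataset_id)
```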
Watch an S3 bucket, SFTP drop, or API endpoint. When a file arrives, scanning starts automatically. Governed clean copies go to the publish queue. Raw files with findings never reach customers.
REST API and OpenAPI spec. Python, Node.js, Java, and .NET clients. Asynchronous scanning with webhook callbacks. Fits inside your existing ingestion pipeline without architectural change.
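A hedged sketch of the asynchronous flow over the REST API. The endpoint path, payload fields, and response shape are assumptions for illustration:

```python
import requests

resp = requests.post(
    "https://api.vestradata.example/v1/scans",   # assumed endpoint
    headers={"Authorization": "Bearer <token>"},
    json={
        "dataset_uri": "s3://ingest/datasets/q3-claims.parquet",
        "callback_url": "https://ingest.example.com/hooks/scan-complete",  # webhook target
    },
    timeout=30,
)
resp.raise_for_status()
job_id = resp.json()["job_id"]  # no polling: completion lands on the webhook
```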
Both products share the same detection engine. Most organisations in your sector start with one before adding the other.
SDK-first integration for automated PII scanning at dataset ingest. Multi-tenant isolation. Event-driven pipeline. Direct API integration with existing marketplace infrastructure.
Your team evaluates, enriches, and works with datasets using AI tools. VestraShield ensures sensitive content in those datasets doesn't reach external LLMs during evaluation or analysis workflows.
Scanning triggered on S3 event, SFTP arrival, or API upload. No polling. Webhook callbacks on completion. Designed for high-throughput ingestion pipelines.
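One common way the event-driven trigger is wired up: an AWS Lambda function subscribed to s3:ObjectCreated events submits a scan per new object. The endpoint and payload here are stand-ins, not a documented contract:

```python
import json
import urllib.request

SCAN_ENDPOINT = "https://api.vestradata.example/v1/scans"  # assumed endpoint

def submit_scan(uri: str) -> None:
    # Stand-in for the SDK/API call that enqueues a scan for one object.
    req = urllib.request.Request(
        SCAN_ENDPOINT,
        data=json.dumps({"dataset_uri": uri}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

def handler(event, context):
    # Fires on s3:ObjectCreated; each newly arrived file starts a scan immediately.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        submit_scan(f"s3://{bucket}/{key}")
```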
Each publisher tenant gets isolated scanning context, results, and audit records. No cross-contamination of findings or audit data.
Python, Node.js, Java, and .NET SDK clients. OpenAPI spec for custom integration. Embed the detection engine directly inside your pipeline if needed.
GLiNER v2 handles any sector-specific entity type without retraining. Health identifiers, financial references, legal codes: all from the same engine.
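GLiNER itself is an open model family, so the zero-shot behaviour is easy to sanity-check locally. The checkpoint below is one public example, and the labels are defined on the spot rather than baked into the model:

```python
from gliner import GLiNER  # pip install gliner

model = GLiNER.from_pretrained("urchade/gliner_multi-v2.1")  # one public checkpoint

text = "Patient NHS number 943 476 5919, sort code 20-00-00, case ref EWCA Civ 1048."
labels = ["health identifier", "bank sort code", "legal case reference"]  # chosen at inference time

for ent in model.predict_entities(text, labels, threshold=0.5):
    print(f'{ent["label"]}: {ent["text"]}')
```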
Raw datasets with PII are never published. Governed clean copies generated automatically and placed in the publish queue. Full audit trail per dataset.
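A toy illustration of the clean-copy step, assuming findings arrive as character spans with labels (the actual finding format is an assumption here):

```python
# Replace each detected span with a typed placeholder, working right to left
# so earlier offsets stay valid as the string changes length.
def clean_copy(text: str, findings: list[dict]) -> str:
    for f in sorted(findings, key=lambda f: f["start"], reverse=True):
        text = text[: f["start"]] + f"[{f['label'].upper()}]" + text[f["end"] :]
    return text

raw = "Contact Jane Doe at jane.doe@example.com"
findings = [
    {"start": 8, "end": 16, "label": "person"},
    {"start": 20, "end": 40, "label": "email"},
]
print(clean_copy(raw, findings))
# Contact [PERSON] at [EMAIL]
```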
Complete REST API with OpenAPI 3.1 spec. Generates typed clients in any language. Integrates with your existing API gateway and authentication layer.
Every prompt your analysts send to ChatGPT, Claude, Gemini, or Copilot while evaluating or enriching datasets is intercepted before it leaves your environment.
Dataset-specific identifiers, proprietary field names, and sector-specific codes caught by zero-shot GLiNER. No model retraining required.
Different intercept rules for data engineers, analysts, and reviewers. Transform sensitive content, hard-block restricted identifiers, or audit-only for low-risk entities.
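A sketch of that policy model as data, with a default-deny lookup; the structure and names are illustrative, not the shipped configuration format:

```python
# Role -> entity label -> action. Anything unlisted falls through to "block".
POLICY = {
    "data_engineer": {
        "health identifier": "block",    # hard-block restricted identifiers
        "person": "transform",           # rewrite sensitive content before it leaves
        "internal field name": "audit",  # log only, low-risk entity
    },
    "analyst": {
        "health identifier": "block",
        "person": "transform",
    },
}

def decide(role: str, label: str) -> str:
    # Default-deny for unknown role/entity combinations.
    return POLICY.get(role, {}).get(label, "block")
```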
Every AI-assisted analysis session logged with entity inventory. Attributable to user and tool. Hash-chained and tamper-evident.
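Hash chaining is a standard construction, and a minimal version shows why tampering is evident: each record commits to the hash of its predecessor, so editing any record breaks verification of everything after it:

```python
import hashlib
import json

def append_record(chain: list[dict], entry: dict) -> None:
    # Each record carries the previous record's hash, forming the chain.
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"entry": entry, "prev_hash": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append({**body, "hash": digest})

def verify(chain: list[dict]) -> bool:
    # Recompute every hash; any retroactive edit fails here.
    prev = "0" * 64
    for rec in chain:
        body = {"entry": rec["entry"], "prev_hash": rec["prev_hash"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["prev_hash"] != prev or rec["hash"] != digest:
            return False
        prev = rec["hash"]
    return True

log: list[dict] = []
append_record(log, {"user": "analyst-7", "tool": "ChatGPT", "entities": ["person", "email"]})
append_record(log, {"user": "analyst-7", "tool": "Copilot", "entities": []})
assert verify(log)
```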
Covers AI use in the browser (ChatGPT, Claude.ai) and in IDE tools (Cursor, GitHub Copilot). No endpoint agent required.
Intercept runs inside your environment. Dataset content being evaluated through AI tools never reaches external infrastructure unprotected.
We connect to something real in your environment and you see actual findings. No slide decks. No fabricated data. Median time to first scan: under 4 hours from receiving credentials.
For engineering teams building data marketplaces. SDK documentation and pipeline integration questions welcome.