DocuFlag

Data Protection Impact Assessment

GDPR Article 35 — Last updated: May 2026

This assessment evaluates the data protection risks of DocuFlag's AI-powered document analysis service. It is conducted in accordance with Article 35 of the General Data Protection Regulation (EU) 2016/679.

1. Description of processing operations

1.1 Nature of processing

DocuFlag assists users in verifying that documents meet published consulate requirements for Schengen visa applications. The user is either an individual self-help applicant working on their own application, or an Enterprise customer (e.g. a visa agency) acting as data controller for the visa applicants whose documents they process. The service uses AI (large language models) to extract structured data from uploaded documents and compare it against a requirements database.

1.2 Roles

  • Data controller: The user (an individual self-help applicant for their own data, OR an Enterprise customer processing applications on behalf of others)
  • Data processor: DocuFlag (operated by DocuFlag)
  • Sub-processor: Google Vertex AI (EU endpoint) for AI-powered document analysis
  • Sub-processor: Stripe for payment processing

1.3 Data flow

During analysis, documents are processed using an EU-hosted relay architecture that ensures original document content is never written to disk by the analysis pipeline. After analysis, the case bundle is persisted to encrypted-at-rest cloud storage (the durable source of truth) and mirrored to a local browser cache for fast page loads:

  1. The user adds a document in the browser
  2. The browser sends the document over TLS to DocuFlag's EU-hosted analysis server, which forwards it to Google's Vertex AI EU endpoint for analysis
  3. The document is processed in-memory only by the analysis server — never written to disk, logged, or cached by the analysis pipeline
  4. Vertex AI processes the request within EU infrastructure with zero data retention
  5. The structured analysis result (field extractions, compliance observations) is returned to the browser
  6. When cloud sync is on (the default), the combined case bundle (document bytes + structured results) is encrypted at rest in our European cloud storage under a Cloud KMS-wrapped key. When cloud sync is off, no server-side copy is created and the bundle lives only in the browser’s local cache
  7. A local browser cache (IndexedDB) mirrors the cloud copy on each device for fast page loads; the durable record is the cloud copy

1.3a Storage modes (cloud sync is a user-controllable toggle)

The registration form has a single “Save my cases to the cloud” checkbox (default ON). When ticked the account stores case data server-side under encryption with keys held in Google Cloud KMS; when unticked, no server-side copy of any case is kept (the user’s browser cache is the only persistent copy). The choice is reversible from Settings at any time; disabling cloud sync wipes the cloud copy and schedules cryptographic erasure of the organisation’s KEK. The two pathways are described separately below because their threat models, recoverability properties, and DocuFlag-as-controller posture differ.

Cloud sync ON (default):

  1. Cloud-encrypted storage is provisioned automatically at organisation creation. The KMS key is minted on first sign-in and the org's cloudStorageEnabled flag is flipped to true in the same transaction. No further user action is required
  2. The user's browser uploads application data over TLS to DocuFlag's server
  3. The server generates a fresh per-application AES-256-GCM data encryption key (DEK) and encrypts the application data under it with a fresh IV
  4. The DEK is wrapped via Google Cloud KMS Encrypt under a per-organisation key encryption key (KEK) hosted in europe-west1; the wrapped DEK is persisted alongside the blob row, and the plaintext DEK is discarded from memory once the request completes
  5. The ciphertext is uploaded to self-managed object storage on EU infrastructure
  6. To download, the server fetches the ciphertext, calls Cloud KMS Decrypt on the wrapped DEK to recover the DEK, AES-GCM-decrypts the body, and streams the plaintext back to the user over TLS
  7. The KEK material lives isolated in Cloud KMS; DocuFlag engineers cannot extract it. The application service account holds cloudkms.cryptoKeyEncrypterDecrypter only — this scope permits Encrypt/Decrypt under the application identity but does not permit exfiltration of the KEK material itself
  8. Every Encrypt/Decrypt call is logged in Google Cloud Audit Logs (a separate write-only stream) and in DocuFlag's own audit trail
  9. DocuFlag CAN technically decrypt this data when an authorised user authenticates, and we will if compelled by a court order. This is the trade-off for working password reset
  10. Each blob carries a 180-day TTL reset on every access (upload OR download). Active cases stay indefinitely; cases untouched for 180 consecutive days are auto-deleted under GDPR Art. 5(1)(e) storage-limitation
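
The retention rule in step 10 can be illustrated with a minimal sketch. The function names and the idea of storing an explicit expiry timestamp per blob are assumptions made for illustration; only the 180-day window and the reset-on-access behaviour come from the description above.

```python
from datetime import datetime, timedelta, timezone

TTL = timedelta(days=180)  # retention window stated in step 10


def touch(now: datetime) -> datetime:
    """Return the new expiry after an upload or download at `now` (hypothetical helper)."""
    return now + TTL


def is_expired(expires_at: datetime, now: datetime) -> bool:
    """A blob becomes eligible for auto-deletion once its expiry has passed."""
    return now >= expires_at


# Example: a blob uploaded on day 0 and downloaded on day 100.
uploaded = datetime(2026, 1, 1, tzinfo=timezone.utc)
expiry = touch(uploaded)                      # first expiry: 180 days after upload
downloaded = uploaded + timedelta(days=100)
expiry = touch(downloaded)                    # the access resets the clock
```

Under these assumptions, a case that is opened at least once every 180 days is never deleted, while an untouched case expires exactly 180 days after its last upload or download.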

Cloud sync OFF (user opts out at registration or via Settings):

  1. No Cloud KMS key is provisioned for the organisation. cloudStorageEnabled is false and kmsKeyName is null
  2. Documents are uploaded from the browser over TLS to DocuFlag's EU analysis server for AI processing, in-memory only, and discarded as soon as the response is returned
  3. The structured analysis result is written to the browser's IndexedDB cache. No server-side blob is created
  4. Clearing browser data or switching devices loses the case. Re-enabling cloud sync from Settings provisions a fresh KMS KEK and uploads the cases currently in the local cache
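
The two storage modes and the transitions between them can be summarised as a small state sketch. The class, its methods, and the in-memory dictionaries are hypothetical stand-ins; only the cloudStorageEnabled and kmsKeyName flags and the described effects of enabling and disabling cloud sync are taken from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OrgStorage:
    # cloudStorageEnabled and kmsKeyName mirror the flags named in the text;
    # the remaining fields are illustrative placeholders.
    cloudStorageEnabled: bool = False
    kmsKeyName: Optional[str] = None
    cloud_cases: dict = field(default_factory=dict)
    keks_pending_erasure: list = field(default_factory=list)

    def enable(self, new_key_name: str, local_cache: dict) -> None:
        """Record a freshly provisioned KEK and upload the locally cached cases."""
        self.kmsKeyName = new_key_name
        self.cloudStorageEnabled = True
        self.cloud_cases = dict(local_cache)

    def disable(self) -> None:
        """Wipe the cloud copy and schedule the current KEK for erasure."""
        if self.kmsKeyName is not None:
            self.keks_pending_erasure.append(self.kmsKeyName)
        self.cloud_cases.clear()
        self.kmsKeyName = None
        self.cloudStorageEnabled = False


org = OrgStorage()
org.enable("kek-1", {"case-a": b"bundle"})   # cloud sync ON
org.disable()                                # cloud sync OFF: no server-side copy remains
```

The point of the sketch is that re-enabling never reuses the old key: a later call to enable supplies a new key name, consistent with the "fresh KMS KEK" wording above.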

1.4 Categories of personal data

  • Identity data: Passport data pages (name, date of birth, nationality, passport number, photograph)
  • Financial data: Bank statements (account numbers, balances, transaction history)
  • Employment data: Employment letters (employer name, salary, position)
  • Travel data: Flight itineraries, hotel bookings, travel insurance certificates
  • Correspondence: Cover letters, invitation letters

1.5 Special category data (Article 9)

Documents users upload may contain personal data of special categories under Article 9. Specifically:

  • Passport biographical-data pages include a facial photograph and other biographical detail (place of birth, nationality). Nationality is not, on its own, special-category data under ICO guidance, but a photograph IS personal data and IS handled with care.
  • Other documents may incidentally contain Article 9 data — for example a marriage certificate showing religious ceremony, an employment letter from a faith-based employer, or a medical certificate for a travel-insurance claim.

The biometric-data question. Under Article 4(14), “biometric data” means personal data resulting from specific technical processing relating to physical characteristics which allow or confirm the unique identification of that natural person. The definition is purpose-restrictive: it covers only processing that allows or confirms unique identification. DocuFlag’s analysis pipeline performs vision-model field extraction (passport number, names, dates, machine-readable-zone characters) for the purpose of comparing those fields against published consulate requirements. It does not generate facial templates, perform facial recognition, match faces against any gallery, or attempt to authenticate identity by face.

Per the ICO’s March 2024 biometric-recognition guidance, a photograph is not biometric data within Article 9(1) unless it has undergone specific technical processing aimed at unique identification of a natural person. DocuFlag’s purpose is content extraction, not unique identification, so on the controlling reading the passport-photograph processing does not enter Article 9(1). We treat this as a fact-sensitive call: if a supervisory authority took a different view, the safeguards described in section 5 (no template generation, no matching, transient processing only, optional cloud-sync-off mode that keeps case content out of server-side storage entirely) remain in place and would mitigate the residual risk to data subjects.

For the documents that may incidentally contain Article 9 data (marriage certificates, faith-based employment letters, medical certificates), the lawful basis is Article 9(2)(a) explicit consent given at the moment the user uploads the specific document for the specific purpose of visa-application review.

2. Necessity and proportionality

2.1 Purpose

The processing enables a user to efficiently verify that the relevant documents (their own, on the self-help path; or a visa applicant’s, on the Enterprise path) meet published consulate requirements. The AI extracts factual data and compares it against an official requirements database, producing structured observations — not recommendations or predictions about visa outcomes.

2.2 Necessity

Manual document comparison is time-consuming and error-prone. AI-assisted extraction reduces the risk of overlooking discrepancies between documents and published requirements, improving the quality of application preparation.

2.3 Proportionality

  • Only documents voluntarily uploaded by the user are processed
  • During analysis, original documents transit through our infrastructure in-memory only and are never written to disk by the analysis pipeline (neither our analysis relay nor Vertex AI)
  • Vertex AI processes documents transiently with zero data retention
  • After analysis, the case bundle is handled per the user’s cloud-sync preference. When ON (the default), the data is encrypted at rest in our EU cloud storage server-side under Cloud KMS and DocuFlag can decrypt under your authenticated session. When OFF, no server-side copy is created and the bundle lives only in the browser’s local cache
  • No decision producing legal or similarly significant effects under UK GDPR Article 22 is taken by DocuFlag; observations are decision support that the user reviews, and the only entity that decides whether the visa is issued is the consular authority

2.4 Legal basis

  • Article 6(1)(b): Processing is necessary for the performance of the contract between DocuFlag and the user (the individual self-help applicant for their own data, OR the Enterprise customer acting as data controller).
  • Article 6(1)(f): Audit-trail logging and abuse / bulk-decrypt anomaly detection are processed on the basis of legitimate interests in security monitoring and Article 32 compliance, balanced against the rights of data subjects.
  • Article 6(1)(c): Disclosures to UK or EU authorities under valid legal process; retention of audit records to meet the Limitation Act 1980 s.5 limitation period for contract-related claims.
  • Article 9 special-category data. See section 1.5 for the analysis. Where Article 9 applies (incidentally, for documents that contain Article 9 data such as marriage certificates or faith-based employment letters), the lawful basis is Article 9(2)(a) explicit consent given at the time the user chooses to upload that specific document for the specific purpose of visa-application review.

3. Risk assessment

The following risks have been identified and assessed for likelihood and impact on data subjects' rights and freedoms.

Risk | Likelihood | Impact | Mitigation
Document data stored by DocuFlag | Low | High | Documents transit through DocuFlag's EU-hosted analysis server in-memory only and are never written to disk, logged, or cached. The application database only stores structured analysis results, not original document content.
Google stores or trains on document data | Low | High | Google's Vertex AI EU endpoint operates with zero data retention. A Google Cloud DPA (covering Vertex AI) is in place with Vertex AI data retention controls configured at project level. API data is not used for model training.
Unauthorized access to analysis results | Medium | Medium | HTTPS (TLS 1.2+) for all data in transit. Short-lived JWT tokens (5-minute expiry) for analysis session authentication. Role-based access control per organization.
Prompt injection extracts PII from documents | Low | Medium | System prompt enforces factual extraction only. Output is validated against a strict JSON schema. The model is instructed to never include data not present in the uploaded document.
API key abuse by malicious user | Medium | Low | API keys are stored on the EU analysis server and never exposed to the browser. Rate limits and spend caps on the Vertex AI project. Per-organization credit system.
Storage breach (cloud-sync backup) | Low | Low | Per-application AES-256-GCM data key wrapped via Google Cloud KMS under a per-organisation KEK. KEK material lives isolated in Cloud KMS; an attacker with object-storage access alone holds opaque ciphertext only. An attacker with database access additionally needs valid Cloud KMS Decrypt permission to recover plaintext, and that permission is IAM-scoped to the application identity only. Every Decrypt call is logged in Google Cloud Audit Logs. 180-day TTL limits exposure window. Users can delete cloud data at any time.
Cross-border data transfer (US-headquartered sub-processor parents) | Low | High | Storage and processing of DocuFlag data takes place exclusively in EU regions (europe-west1 for Google services; OVHcloud EU for compute and object storage). Google LLC, Stripe, and Cloudflare are US-headquartered with EU operating subsidiaries; the residual concern is that a US parent may be compelled (CLOUD Act / FISA 702) to disclose data its EU subsidiary processes. Mitigations: (i) the Google Cloud DPA incorporating EU SCCs Module 2 and the UK Addendum, (ii) the EU-US Data Privacy Framework (UK Extension), (iii) a documented Transfer Risk Assessment covering supplementary measures (Cloud KMS with IAM-isolated KEK, IAM least-privilege splitting runtime and admin service accounts, EU region pinning, audit logging, optional cloud-sync-off mode for users who want zero server-side persistence of case content). The remaining residual risk is documented in section 7.
Server-side decryption capability (compromised application identity or unlawful operator access) | Low | High | The cloud-sync mode allows DocuFlag to decrypt under an authenticated user session; that is the trade-off for working password reset. The risk is that a compromised runtime service account (or insider) decrypts data outside legitimate user-driven flows. Mitigations: (i) IAM split: the runtime service account holds only Encrypt/Decrypt permission, never cryptoKeys.create or cryptoKeys.destroy; (ii) every Cloud KMS Decrypt call is logged in Google’s Data Access audit stream and our own audit table; (iii) app-side rate-limit on the download path (30/min/user) caps a leaked-SA grind; (iv) standing operational commitment to a Cloud Monitoring alert on >60 Decrypts/min/principal sustained for 5 minutes, provisioned via the bootstrap script the operator runs against the production project; (v) users who want to remove the server-side decryption capability entirely can disable cloud sync from Settings, which wipes the cloud copy and schedules cryptographic erasure of the organisation’s KEK.
Sub-processor (Cloud KMS) failure or unavailability | Low | Medium | Cloud KMS unavailability prevents both wrap and decrypt operations, so users cannot upload or download cloud-stored cases during the outage. The local browser cache (IndexedDB) preserves read access to recently-viewed cases until KMS recovers. Cloud KMS’s own SLA covers the rest. No data loss.
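
The bulk-decrypt alert condition named in mitigation (iv) of the server-side decryption risk (">60 Decrypts/min/principal sustained for 5 minutes") can be read as the following plain-Python check. This is only an illustration of the condition under one plausible reading, namely that every one of five consecutive one-minute buckets exceeds the threshold; it is not Cloud Monitoring configuration, and the function name and input shape are assumptions.

```python
from typing import Sequence

THRESHOLD_PER_MIN = 60   # ">60 Decrypts/min/principal" from the mitigation text
SUSTAINED_MINUTES = 5    # "sustained for 5 minutes"


def should_alert(per_minute_counts: Sequence[int]) -> bool:
    """True if SUSTAINED_MINUTES consecutive minutes all exceed the threshold.

    `per_minute_counts` is an assumed input: Decrypt calls per minute for a
    single principal, oldest first.
    """
    run = 0
    for count in per_minute_counts:
        # Extend the run on a breach, otherwise start counting again.
        run = run + 1 if count > THRESHOLD_PER_MIN else 0
        if run >= SUSTAINED_MINUTES:
            return True
    return False
```

Under this reading, a short spike that drops back to 60 or fewer calls in any minute resets the window and does not fire the alert on its own.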

4. Measures and safeguards

  • EU relay processing: Documents go from the user's browser through DocuFlag's EU-hosted analysis server to Google's Vertex AI EU endpoint. The analysis server processes documents in-memory only; no content is written to disk, logged, or cached by the analysis pipeline. The application database stores only structured analysis results; when cloud sync is on, document bytes are persisted only as encrypted blobs in object storage (section 1.3a).
  • Zero data retention at sub-processor: Google's Vertex AI EU endpoint does not store API requests or responses at rest. Covered by Google Cloud DPA (covering Vertex AI) with Vertex AI data retention controls configured at project level.
  • Encryption: All data in transit is encrypted via TLS 1.2+. Analysis sessions use short-lived JWT tokens (5-minute expiry) for authentication.
  • Access control: Multi-tenant organization model with role-based access (Owner, Admin, Member). Session-based authentication via NextAuth.
  • Audit logging: All document analysis events are logged with timestamps, action types, actor IDs, and HMAC-hashed IP prefixes. The hashed IP prefixes are pseudonymous personal data under UK GDPR Recital 26 (still personal data, but not directly identifying); no plaintext PII is included.
  • Data subject rights: Data subjects can exercise their rights (access, rectification, erasure, portability) through the data controller (typically the Enterprise customer). DocuFlag supports case deletion and data export functionality.
  • Data minimization: Only the minimum data necessary for compliance checking is processed. Analysis results use structured field names rather than reproducing full document content.
  • Regular review: This DPIA is reviewed annually or whenever there is a material change to the processing operations described above.
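
The "HMAC-hashed IP prefixes" in the audit-logging safeguard can be sketched as follows. The text does not state the hash function or the prefix length, so HMAC-SHA256 and the /24 (IPv4) and /48 (IPv6) prefixes used here are assumptions, as is the function name.

```python
import hashlib
import hmac
import ipaddress


def hashed_ip_prefix(ip: str, key: bytes) -> str:
    """Return a keyed hash of the network prefix of `ip`, not of the full address."""
    addr = ipaddress.ip_address(ip)
    prefix_len = 24 if addr.version == 4 else 48   # assumed prefix lengths
    # strict=False masks the host bits, e.g. 203.0.113.77 -> 203.0.113.0/24
    network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
    return hmac.new(key, str(network).encode(), hashlib.sha256).hexdigest()
```

Two addresses in the same prefix produce the same value, so the log supports correlation of events from one network without recording the address itself. Anyone holding the key can recompute the hash for a candidate prefix, which is consistent with treating the value as pseudonymous rather than anonymous data.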

5. Special category data (Article 9) — safeguards

Section 1.5 explains why we conclude that vision-model field extraction of passport images does not enter Article 9(1) on the controlling reading of ICO 2024 biometric guidance. This section documents the safeguards that apply regardless of whether a regulator takes a different view, and the safeguards that apply to documents incidentally containing other Article 9 data.

  • No biometric identification. DocuFlag does not perform facial recognition, biometric matching, or identity verification. The vision model extracts text fields (passport number, names, dates, machine-readable-zone characters) for comparison against published consulate requirements. No facial template is generated; no gallery is consulted; no face is matched to a person.
  • Transient analysis processing. When you trigger analysis, the document image is sent from your browser through DocuFlag’s EU analysis relay (in-memory only, never written to disk) to Google’s Vertex AI EU endpoint, which operates with zero data retention — API requests and responses are not stored at rest on Google’s servers and are not used for model training.
  • Storage is encrypted at rest by default. When cloud sync is on (the default), a copy of your case bundle (which includes the uploaded files) is stored on EU object storage. The storage is encrypted at rest under AES-256-GCM with the data key wrapped via Cloud KMS (the runtime service account holds Encrypt/Decrypt only, never key-export or destroy permissions); the server can decrypt only under your authenticated session. Users may disable cloud sync from Settings, in which case no server-side copy is kept.
  • For documents incidentally containing other Article 9 data (e.g. marriage certificates, faith-based employment letters, medical certificates): the lawful basis is Article 9(2)(a) explicit consent given at the time the user chooses to upload the specific document for the specific purpose of visa-application review.
  • Enterprise controller responsibility. Enterprise customers acting as data controllers on behalf of others are responsible for ensuring that the data subjects whose documents they upload are informed and (where applicable) have consented under local law.

6. Sub-processors

Sub-processor | Purpose | Data processed | Location | DPA
Google Vertex AI (EU endpoint) | Document analysis | Documents (transient, zero retention) | EU | Yes
OVHcloud (compute / VPS hosting) | EU VPS for analysis proxy + database | Documents (transient, in-memory) + case metadata | EU | Yes
OVHcloud (object storage, self-managed) | Encrypted cloud-backup blob storage (storage layer only sees ciphertext) | Encrypted blobs only (storage layer cannot decrypt) | EU | Yes
Google Cloud KMS (europe-west1) | Per-org KEK custody for cloud-sync backup | Wrapped DEKs only; KEK material is isolated and never leaves Cloud KMS | EU | Yes
Stripe | Payment processing for credit-pack purchases | Payment details (no document data) | US/EU (UK + EU + US under SCCs / DPF) | Yes
Cloudflare | CDN, TLS termination, DDoS protection | TLS-decrypted request metadata + bodies in transit; nothing persisted at the edge for authenticated routes | Global (EU edge presence) | Yes
Backblaze B2 (off-site backup) | Encrypted off-site copies of the PostgreSQL database | age-encrypted ciphertext only; B2 cannot decrypt | EU | Yes
Email transactional provider (SMTP) | Magic-link sign-in + security notifications | Email address + message body | EU | Yes

7. Conclusion and residual risks

The processing operations described above present manageable risks to data subjects’ rights and freedoms when considered alongside the safeguards in sections 4 and 5. When cloud sync is on (the default), case data is encrypted at rest in our EU cloud storage under software-protected Cloud KMS keys. Users may disable cloud sync from Settings at any time; doing so wipes the cloud copy and schedules cryptographic erasure of the organisation’s KEK. The analysis flow processes documents in-memory and does not persist them. The measures described above are considered adequate to mitigate the identified risks. No prior consultation with the ICO under Article 36 is triggered because the residual risk is not assessed as high after the safeguards are applied.

Residual risks documented for transparency:

  • Transient processing of passport photographs by Google Vertex AI. On the controlling reading of ICO 2024 biometric guidance this is not biometric data processing under Article 9 (section 1.5). The safeguards that apply regardless: zero data retention at Vertex AI, no facial template generation, no matching, no identity verification, transient in-memory processing only. Should a supervisory authority take a different view, the safeguards remain in place.
  • Server-side decryption capability under cloud-sync mode. DocuFlag can decrypt cloud backups under an authenticated user session. That is the deliberate trade-off for working password reset and is disclosed clearly to users at registration. Mitigations: Cloud KMS KEK custody (IAM-isolated), IAM-split runtime/admin service accounts, audit logging on every Decrypt, app-side rate limit, alerting policy on bulk-decrypt anomalies. Users who prefer no server-side persistence at all can disable cloud sync from Settings, which wipes the cloud copy and schedules cryptographic erasure of the organisation’s KEK.
  • US-parent sub-processors (Google LLC, Stripe, Cloudflare). Storage and processing happen in EU regions, but the parent corporations are US-headquartered. Mitigations: EU SCCs (Module 2) + UK Addendum + EU-US Data Privacy Framework (UK Extension); Transfer Risk Assessment on file documenting supplementary measures (Cloud KMS with IAM-isolated KEK, IAM least-privilege, EU region pinning, audit logging, optional cloud-sync-off mode that keeps case content out of server-side storage entirely).

Review cadence. This assessment is reviewed annually and whenever there is a material change to the processing operations or sub-processor list.

Contact

For questions about this assessment, contact [email protected].