An estate planning attorney pastes a client's trust agreement into ChatGPT and asks it to summarize the tax implications. The answer is good. The problem is that she just transmitted a client's full name, SSN, asset schedule, and beneficiary list into a consumer AI product that, under its default terms, can retain that input for up to 30 days, use it for abuse monitoring, and in some configurations use it to improve the model.
That is a confidentiality incident. Depending on the jurisdiction, it may also be a reportable one.
This is the single most common question I get from attorneys considering AI operations: "How is what you do different from me just using ChatGPT?" The short answer is: nearly every layer of the stack. The long answer is this document, which is the same walkthrough I give to any firm before we touch a single piece of client data.
The ethical problem with naive AI use
Every attorney reading this is bound by ABA Model Rule 1.6, the duty of confidentiality. That rule is not limited to not gossiping at cocktail parties. Comment [18] to Rule 1.6 requires attorneys to make "reasonable efforts to prevent the inadvertent or unauthorized disclosure of, or unauthorized access to, information relating to the representation of a client."
Pasting client information into a consumer AI product is, by almost any reasonable reading, an unauthorized disclosure to a third party.
ABA Formal Opinion 512 (July 2024) made this explicit for generative AI: lawyers using GAI tools have duties of competence, confidentiality, communication, and supervision, and must understand what happens to client data once it is submitted to the tool. Not "trust the vendor." Understand the architecture.
Rule 1.1 (competence) was updated in 2012 to explicitly include technological competence. As of 2025, 40 states have adopted some version of the duty of technology competence. You cannot meet that duty by not asking the question.
Other regulated professions run on similar rails. CPAs operate under AICPA confidentiality rules (ET §1.700.001) and IRC §7216 for tax return information. Healthcare-adjacent professionals are governed by HIPAA's Privacy and Security Rules, which require business associate agreements (BAAs) with any vendor touching PHI. Financial advisors fall under Reg S-P and, increasingly, state privacy laws like the CCPA and the patchwork of state-level regimes that now cover 19 states.
The common thread: if you are a professional with a confidentiality obligation, the question is not whether you can use AI. It is whether you can prove, in writing, what happens to the data you send into it.
What Greybox actually does differently
Here is the architecture, layer by layer, in the order data moves through it.
What data ever touches an LLM, and what does not
The first and most important control is scoping. Not every automation needs an LLM. A lot of the work I build for a solo practice is deterministic: date calculations, file naming, status updates, calendar entries, report generation. That code runs locally or in the firm's existing tools (Clio, Microsoft 365, QuickBooks) and never sees a foundation model.
When an LLM is needed (drafting an engagement letter, summarizing a discovery production, classifying an intake), the data is minimized before it goes anywhere. Client names get replaced with placeholders. SSNs, account numbers, and addresses are redacted or tokenized. The model sees what it needs to do the job and nothing else.
In practice, for a typical estate planning automation, less than 20% of the data in a matter file ever reaches a model. The rest never leaves the firm's own systems.
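The minimization step can be sketched as a pre-processing pass that runs before any model call. This is an illustration of the principle, not Greybox's actual redactor: the regex patterns and placeholder scheme below are deliberately simplistic, and a production system would use a vetted PII library plus firm-specific name dictionaries.

```python
import re

# Illustrative patterns only; real redaction needs far more coverage.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ACCOUNT": re.compile(r"\b\d{8,12}\b"),
}

def minimize(text: str, client_names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace PII with stable placeholders. Returns the redacted text plus
    a mapping that stays local, so placeholders can be restored in the
    model's output after the call returns."""
    mapping: dict[str, str] = {}
    counter = 0
    for label, pattern in PATTERNS.items():
        for match in set(pattern.findall(text)):
            counter += 1
            token = f"[{label}_{counter}]"
            mapping[token] = match
            text = text.replace(match, token)
    for i, name in enumerate(client_names, start=1):
        token = f"[CLIENT_{i}]"
        mapping[token] = name
        text = text.replace(name, token)
    return text, mapping

redacted, mapping = minimize(
    "Jane Doe, SSN 123-45-6789, is the grantor.", ["Jane Doe"]
)
print(redacted)  # [CLIENT_1], SSN [SSN_1], is the grantor.
```

The mapping never leaves the firm's infrastructure; the model only ever sees the placeholder tokens.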
Where the AI actually runs
When an LLM call is required, it runs against the Anthropic API with zero-retention enabled. Under Anthropic's Commercial Terms and its Zero Data Retention (ZDR) addendum, API inputs and outputs are not stored on Anthropic's systems beyond the lifetime of the request. There is no 30-day retention window. There is no human review queue. Anthropic's published policy is that API inputs and outputs are not used to train its models by default, and under the ZDR addendum the data does not persist at all after the response returns.
For firms with heightened requirements (typically firms handling sealed records, government matters, or clients with their own vendor policies), we can deploy against Claude on AWS Bedrock inside the firm's own AWS account. In that configuration, the model runs in a VPC the firm controls, subject to AWS's standard BAA where applicable, and inference traffic never leaves a US region.
Data residency, specifically: all inference happens in US regions. No data is routed through non-US endpoints. This matters for state bar opinions that have addressed cloud storage (at least 25 state bars have issued opinions on attorney use of cloud services, and data location is a recurring factor) and for any matter with state or federal government implications.
Encryption, in transit and at rest
Every network hop is TLS 1.3. Every piece of data stored by the automation (intake summaries, matter metadata, logs) is encrypted at rest with AES-256, using keys managed through AWS KMS in the firm's account. If the firm has its own encryption standard (some corporate practice clients do), we match it.
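The in-transit half of this can be enforced in code rather than assumed. A minimal sketch using Python's standard `ssl` module, configuring a client context that refuses anything below TLS 1.3 (the at-rest half, AES-256 with KMS-managed keys, lives in AWS configuration and is not shown here):

```python
import ssl

def strict_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that refuses anything below TLS 1.3.
    Certificate verification and hostname checking stay on (the secure
    defaults from create_default_context)."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx

ctx = strict_client_context()
assert ctx.minimum_version == ssl.TLSVersion.TLSv1_3
assert ctx.verify_mode == ssl.CERT_REQUIRED
```

Any automation HTTP client can be handed this context, so a downgraded connection fails loudly instead of silently succeeding.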
No client data is ever written to a laptop. Nothing syncs to a personal Dropbox. The automation runs in infrastructure the firm owns, with credentials the firm controls.
Role-based access and audit logs
Every automation action is logged. Who (or what agent) triggered it, what data was accessed, which model was called, what the response was, and what downstream action was taken. Logs are immutable and queryable.
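One standard way to make a log tamper-evident, sketched here as an illustration of the principle rather than Greybox's actual log format (the matter number and field names are hypothetical): chain each entry to the hash of the previous one, so editing any historical entry breaks verification of everything after it.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry's predecessor

def append_entry(log: list[dict], actor: str, action: str, detail: str) -> None:
    """Append an audit entry whose hash covers its content AND the
    previous entry's hash, forming a chain."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"actor": actor, "action": action, "detail": detail, "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append({**body, "hash": digest})

def verify(log: list[dict]) -> bool:
    """Recompute every hash in order; any edit to history fails the check."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: entry[k] for k in ("actor", "action", "detail", "prev")}
        recomputed = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev"] != prev_hash or recomputed != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True

log: list[dict] = []
append_entry(log, "intake-agent", "model_call", "summarized intake for matter M-117")
append_entry(log, "paralegal-1", "export", "sent summary to review queue")
assert verify(log)
log[0]["detail"] = "tampered"  # any edit to an earlier entry...
assert not verify(log)         # ...invalidates the chain
```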
For firms with multiple users, access is role-scoped: a paralegal sees the matters they are assigned, not the partner's conflict-sensitive files. This is not a Greybox invention. It is the same principle that underlies the access controls already built into Clio, NetDocuments, and iManage. We just apply it consistently to the automation layer on top.
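The role scoping itself is a small amount of logic applied consistently. A minimal sketch, with hypothetical field and role names, of the check the automation layer runs before acting on a matter:

```python
def visible_matters(user: dict, matters: list[dict]) -> list[dict]:
    """Return only the matters this user may touch. Users holding the
    'conflicts' role (e.g. partners) see everything; everyone else is
    limited to their assignments. Field names are illustrative."""
    if "conflicts" in user["roles"]:
        return matters
    return [m for m in matters if user["id"] in m["assigned"]]

matters = [
    {"id": "M-1", "assigned": ["paralegal-1"]},
    {"id": "M-2", "assigned": ["partner-1"]},  # conflict-sensitive
]
paralegal = {"id": "paralegal-1", "roles": ["staff"]}
partner = {"id": "partner-1", "roles": ["staff", "conflicts"]}

assert [m["id"] for m in visible_matters(paralegal, matters)] == ["M-1"]
assert [m["id"] for m in visible_matters(partner, matters)] == ["M-1", "M-2"]
```

The same filter gates every automated action, so an agent acting on a paralegal's behalf inherits the paralegal's scope, not the system's.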
If the bar calls, you can produce an audit trail showing exactly what AI touched what client file, when, and why.
No training on client data. Ever.
This is a contractual commitment, not a best effort. Anthropic's enterprise terms prohibit training on API inputs without explicit opt-in. Greybox's engagement contracts prohibit it downstream. No prompts, outputs, matter data, or derived artifacts are ever used to train a model, benchmark a product, or build a general-purpose dataset.
If you ever see a Greybox case study that references a client matter, it is because that client explicitly approved it in writing, with names and identifying facts changed.
What happens when a client asks to delete their data
Under the GDPR's right to erasure (Article 17), the CCPA's right to delete (Cal. Civ. Code §1798.105), and the growing list of comparable state statutes, a client request to delete their data triggers a defined workflow.
For a Greybox-built system:
- The request is logged in the firm's matter management system as a client instruction.
- Any automation-generated artifacts tied to that client (intake summaries, drafted documents still in review, classification tags) are purged from the automation layer within 30 days. In most configurations it happens within 24 hours.
- Audit logs are retained (they must be, for the firm's own records), but the substantive content is tombstoned. The log shows that the action happened, not what the underlying data was.
- Because inference runs with zero retention at the model layer, there is nothing to delete there. The data was never stored.
- The firm receives a written confirmation of deletion they can send to the client.
The retention of the underlying client file itself is governed by the firm's ethics rules and state bar requirements, not by Greybox. We delete what the automation layer touched. The firm's file retention policy governs the rest.
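The tombstoning step in that workflow can be sketched simply (a simplified illustration with hypothetical field names, separate from any log-integrity scheme): the record of the event survives, but the substantive content is replaced with a marker.

```python
from datetime import datetime, timezone

def tombstone(entry: dict) -> dict:
    """Erase the substantive content of an audit entry while preserving
    the fact that the action happened, who triggered it, and when."""
    return {
        "actor": entry["actor"],
        "action": entry["action"],
        "timestamp": entry["timestamp"],
        "detail": "[erased per client deletion request]",
        "erased_at": datetime.now(timezone.utc).isoformat(),
    }

entry = {
    "actor": "intake-agent",
    "action": "model_call",
    "timestamp": "2025-03-04T16:02:11+00:00",
    "detail": "summarized trust agreement for Jane Doe",
}
clean = tombstone(entry)
assert clean["action"] == "model_call"    # the event survives
assert "Jane Doe" not in clean["detail"]  # the content does not
```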
What to tell your clients when they ask "do you use AI?"
You will get this question. More of your clients will ask in 2026 than asked in 2025, and more in 2027 than in 2026. A short, honest answer that most sophisticated clients find reassuring:
"Yes, I use AI tools for specific tasks in my practice: drafting support, document summarization, administrative workflow. Every tool I use is deployed in a way that preserves your confidentiality: no consumer AI products, no training on your data, encrypted in transit and at rest, US-hosted, with audit logs. I am happy to walk you through the details if you would like."
That answer is only truthful if the infrastructure underneath it is actually built that way. That is the part we handle.
Our commitment, in plain terms
Six commitments, no legalese:
- Your client data is not training data. Not ours, not Anthropic's, not anyone's.
- Zero retention at the model layer. Inputs and outputs are not stored by the foundation model provider.
- US-only data residency. No inference, no storage, no processing outside the US.
- Encrypted end to end. TLS 1.3 in transit, AES-256 at rest, keys in infrastructure you control.
- Full audit trail. Every AI action is logged, attributable, and producible on request.
- Deletion on request. If a client asks you to delete their data, the automation layer does it within 30 days, usually within 24 hours.
If you are an attorney evaluating AI for your practice and you want to see exactly how this architecture would apply to your firm, specifically to the tools you already use and the matters you already handle, I run a privacy walkthrough call. It is 45 minutes, it is free, and it ends with a written summary you can show a client, a malpractice carrier, or the bar.
Book a privacy walkthrough: greyboxsystems.ai/contact
The duty of confidentiality did not get easier in the AI era. The people who take it seriously will keep the clients who pay attention.