What AI Companies Do with Your Data — the Plain Truth

Key Takeaways

  • All six major AI companies — OpenAI, Google, Anthropic, Meta, Microsoft, and Amazon — collect and use your conversations to train their models by default. Opt-outs exist for most, but they're buried in settings.
  • It's not just your chats. Multi-product companies like Google and Meta merge your AI conversations with search history, social media activity, and purchase data to build comprehensive profiles.
  • Anthropic (Claude) has shifted its stance. Previously the privacy leader, Anthropic now uses conversations for training by default on consumer plans, with a 30-day retention for opt-outs and up to 5 years for those who opt in.
  • Children's data is poorly protected. Most AI companies don't take meaningful steps to filter out conversations from minors.
  • The practical takeaway: Treat every AI chatbot like a public conversation. Opt out of training where possible. Never share personal identifiers, financial details, or confidential information.

The Uncomfortable Truth

A Stanford University study analyzed the privacy policies of the six largest AI companies in America: Amazon, Anthropic, Google, Meta, Microsoft, and OpenAI. The finding was stark: all six use your conversations to train their models. By default. Without meaningfully asking.

This isn't a conspiracy theory or a hot take. It's in the terms of service that 200+ million users agreed to without reading. What we're seeing across the industry is a pattern: collect first, offer opt-outs later, and make the opt-outs just inconvenient enough that most people never find them.

The question isn't whether AI companies are using your data. They are. The question is how much they're collecting, what else they're combining it with, and what realistic steps you can take to protect yourself.

What Gets Collected (Beyond Your Chats)

When you think about AI data collection, you probably imagine your conversation text being stored somewhere. That's true — but it's only the beginning.

The Obvious Stuff

  • Conversation content: Every prompt and every response, stored on company servers
  • Account information: Name, email, phone number, payment details
  • Usage metadata: When you use the service, how often, for how long, which features

The Less Obvious Stuff

  • Device information: Your browser type, operating system, screen resolution, IP address
  • Location data: Some platforms (Meta AI, Gemini) collect precise location data and physical addresses
  • Cross-platform data: If you use Google's Gemini, Google can link your AI conversations to your search history, YouTube views, Gmail content, Maps activity, and purchase history from Google Pay
  • Uploaded files: Documents, images, and spreadsheets you share with the AI for analysis
  • Voice data: If you use voice features, audio recordings may be stored and reviewed

The Merger Problem

Single-product AI companies (like Anthropic) know what you tell their chatbot. Multi-product companies (like Google and Meta) know much more. When your AI conversation — "I've been feeling anxious about my health recently" — gets merged with your search history ("symptoms of heart disease"), your YouTube views (health anxiety videos), and your location data (frequent hospital visits), the resulting profile is remarkably intimate.

This isn't hypothetical. Privacy analysis firm Incogni found that multi-product AI platforms are creating user profiles that tag individuals with health vulnerabilities, financial stress indicators, and relationship status — all inferred from the combination of AI chats and other data sources.

[Image: digital data streams flowing into a central server] AI companies don't just store your chats — multi-product giants merge them with search, social, and purchase data to build intimate profiles.

Company by Company: Who Does What

| Company | Trains on Your Data? | Opt-Out? | Retention | Shares with 3rd Parties? |
|---|---|---|---|---|
| OpenAI (ChatGPT) | Yes, by default | Yes (Settings toggle) | 30 days after deletion | Limited (contractors) |
| Google (Gemini) | Yes, by default | Yes (activity controls) | Up to 36 months | Yes (Google product network) |
| Anthropic (Claude) | Yes, by default | Yes (opt-out: 30-day retention; opt-in: 5 years) | 30 days – 5 years | Yes (limited) |
| Meta (Meta AI) | Yes, by default | No clear opt-out | Unclear | Yes (research, corporate) |
| Microsoft (Copilot) | Varies by plan | Enterprise: yes | Plan-dependent | Limited |
| Amazon (Alexa/Q) | Yes, by default | Partial | Unclear | Yes (Amazon product network) |

OpenAI (ChatGPT)

OpenAI is the most transparent about its practices, partly because it's faced the most scrutiny. On free and Plus plans, conversations are used for training unless you toggle off "Improve the model for everyone" in Settings. OpenAI also operates a monitoring system that scans conversations for potentially harmful content and can escalate them to human reviewers. Enterprise and Business plans don't train on your data by default.

Google (Gemini)

Google quietly enabled a "Personal Content" feature that uses conversations for model training. This is concerning because Gemini isn't an isolated product — it's connected to your entire Google identity. Your Gemini conversations exist in the same profile as your searches, emails, and location history. Google retains Gemini conversation data for up to 36 months, the longest retention window among major providers. If you're considering Gemini, our 30-day Gemini review covers the user experience beyond privacy.

Anthropic (Claude)

Anthropic was once considered the privacy leader among AI companies. That position has softened. Claude now trains on user conversations by default on consumer plans. The nuance: if you opt out, data is retained for 30 days. If you explicitly opt in to contribute training data, Anthropic retains it for up to 5 years. Business and Enterprise plans offer stronger protections.

Meta (Meta AI)

Meta AI, integrated into WhatsApp, Instagram, and Facebook, has the weakest privacy position. There appears to be no meaningful way to opt out of training data collection. Meta shares user names, email addresses, and phone numbers with external entities. Given Meta's track record with data handling, this should surprise no one — but it's worth stating clearly.

How Your Data Trains the Next Model

Understanding the training process helps you evaluate how much risk your data actually carries.

The Pipeline

  1. Collection: Your conversations are stored on the company's servers.
  2. Filtering: Some companies claim to strip personal identifiers before training. The effectiveness of this filtering varies and is rarely independently verified.
  3. Annotation: Human reviewers may read and rate conversations to create training signals. These reviewers are typically contractors bound by NDAs.
  4. Training: Filtered conversations become part of a massive dataset used to train the next model version. Your individual conversation becomes one data point among billions.
  5. Deployment: The new model goes live. It doesn't contain your words verbatim — it has learned statistical patterns from them.
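
The filtering step above can be sketched as a toy redaction pass. This is a minimal, purely illustrative Python example; the patterns and placeholder labels are invented here and far simpler than anything a real pipeline would use:

```python
import re

# Toy pre-training filter, illustrating the kind of identifier
# stripping companies claim to apply (step 2 in the pipeline).
# These patterns are invented for this example and far from exhaustive,
# which is exactly why unverified filtering claims deserve skepticism.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each matched identifier with a bracketed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Reach me at jane.doe@example.com or 555-867-5309."))
# prints: Reach me at [EMAIL] or [PHONE].
```

Even this simple pass shows the weakness: a spelled-out number ("five five five...") or an unusual format sails straight through, and no outside party audits how much slips past the real filters.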

The Real Risk: Not Reproduction, but Inference

The common fear — that AI will repeat your exact words to someone else — is technically possible but extremely unlikely. The real risk is inference. If thousands of people share salary information with ChatGPT, the model learns salary patterns. It doesn't reveal your specific salary, but it can make increasingly accurate inferences about salary ranges by job title, location, and company — information derived from the aggregate of individual contributions.
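
Here's a toy sketch of that aggregate effect, using invented numbers: no single record is ever echoed back, yet the combined contributions pin the range down tightly.

```python
from statistics import median

# Invented data standing in for many users' individual chat disclosures.
contributed = [
    {"title": "data analyst", "city": "Austin", "salary": 72_000},
    {"title": "data analyst", "city": "Austin", "salary": 78_000},
    {"title": "data analyst", "city": "Austin", "salary": 75_000},
    {"title": "data analyst", "city": "Austin", "salary": 81_000},
]

def inferred_salary(title: str, city: str) -> float:
    """Aggregate estimate: the median of everyone who shared this pairing."""
    matches = [r["salary"] for r in contributed
               if r["title"] == title and r["city"] == city]
    return median(matches)

print(inferred_salary("data analyst", "Austin"))  # prints 76500.0
```

A trained model works nothing like this lookup, but the privacy math is the same: your disclosure is one of the data points the estimate is built on.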

For a deeper understanding of the specific privacy settings available in ChatGPT, our ChatGPT safety guide walks through every toggle and setting.

The Opt-Out Reality Check

Every company that offers an opt-out describes it as simple. The reality is more complicated.

What Opting Out Actually Does

It stops future training: New conversations won't be used to improve models.

It doesn't erase past data: Conversations you had before opting out may already be in training datasets. You can't un-train a model.

It doesn't stop storage: Your conversations are still stored for abuse monitoring, legal compliance, and service operation. The opt-out covers training use, not data retention.

It doesn't stop human review: Safety monitoring may still involve human reviewers reading your conversations, even with training opted out.

How to Opt Out (Quick Guide)

| Platform | How to Opt Out |
|---|---|
| ChatGPT | Settings → Data Controls → "Improve the model for everyone" → Off |
| Gemini | myactivity.google.com → Gemini Apps Activity → Turn off |
| Claude | Settings → Privacy → Opt out of training data |
| Meta AI | No clear opt-out available; submit deletion request in some regions |
| Copilot | Enterprise admin controls; consumer plan varies |
[Image: privacy settings interface with data control toggles] Opting out of AI training stops future data use — but doesn't erase what's already been collected or stop your conversations from being stored.

What You Can Actually Do About It

For Individual Users

  1. Opt out of training on every platform you use. Do it now — it takes 30 seconds per platform. It won't retroactively protect past conversations, but it protects future ones.
  2. Use Temporary/Ephemeral Chat modes. ChatGPT's Temporary Chat and Claude's similar feature create conversations that aren't stored in your history and aren't used for training.
  3. Never share personal identifiers. No Social Security numbers, no passwords, no bank details, no medical records, no photos of identification documents. This seems obvious, but Stanford's research shows millions of users do it daily.
  4. Assume cross-referencing. If you use Google products, assume your Gemini conversations are linked to your Google profile. If you use Meta products, assume your Meta AI conversations are linked to your Facebook/Instagram profile. Act accordingly.
  5. Review and delete conversation history regularly. Delete conversations that contain sensitive information. Remember that deletion takes 30+ days to take effect on most platforms.

For Businesses

  1. Use enterprise plans. Consumer plans are not appropriate for any business data. Enterprise plans from OpenAI, Anthropic, and Google offer contractual data protection guarantees, SOC 2 compliance, and admin controls.
  2. Create clear AI usage policies. Define what data employees can and cannot share with AI tools. Train employees on the policy. Most corporate AI data leaks aren't malicious — they're accidental.
  3. Consider self-hosted alternatives. Open-source models like Llama and Mistral can run on your own infrastructure, keeping all data in-house. The capability gap with cloud models is narrowing. For a comparison, our review of ChatGPT alternatives includes several privacy-focused options.
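
As a sketch of what self-hosting looks like in practice, here is a minimal Python call to a locally running Ollama server. It assumes Ollama is installed on its default port with a Llama model pulled; the model name and endpoint are assumptions about a typical setup, not a prescription.

```python
import json
from urllib import request

# Default local endpoint for Ollama's HTTP API; the port and model
# name below are assumptions about a typical install.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str, model: str = "llama3") -> bytes:
    """JSON body for a single non-streaming completion request."""
    return json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")

def ask_local(prompt: str) -> str:
    """Send the prompt to the local server; nothing leaves your machine."""
    req = request.Request(
        OLLAMA_URL,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# ask_local("Summarize this contract clause: ...")  # runs entirely in-house
```

Because the server binds to localhost, the prompt, the uploaded documents, and the response never touch a third-party server, which is the whole point of the self-hosted option.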

Frequently Asked Questions

If I opt out of training, does OpenAI still store my conversations?

Yes. Opting out of training prevents your conversations from being used to improve future models, but OpenAI still stores them for up to 30 days after deletion for abuse monitoring and legal compliance. The storage and the training are separate — opting out of one doesn't affect the other.

Is Claude really more private than ChatGPT?

It was. Anthropic's initial position was not to train on user data at all. That has shifted — Claude now trains on consumer conversations by default. The key difference: Claude's opt-out retention is 30 days vs. potentially 36 months for Google Gemini. Claude also doesn't merge your data with a broader product suite the way Google does. But the gap between Claude and ChatGPT on privacy has narrowed considerably.

Can AI companies identify me specifically from my conversations?

Yes. Your conversations are tied to your account (email, phone number). Even without an account, your IP address, device fingerprint, and browsing patterns can identify you. And if you mention your name, job, location, or other details in conversation — which most people do — identification becomes trivial.

What about the EU? Don't GDPR rules protect European users?

GDPR provides stronger protections than U.S. law. European users have the right to data access, deletion, and objection to processing. This has forced AI companies to offer more comprehensive opt-outs in Europe than in the U.S. However, enforcement is slow, and the practical effectiveness of GDPR against AI training practices is still being tested in courts. Italy temporarily banned ChatGPT in 2023 over privacy concerns; other EU actions are ongoing.

Should I stop using AI chatbots entirely?

That's an overreaction for most people. AI chatbots are useful tools. The smart approach is informed use: opt out of training, use ephemeral modes for sensitive topics, never share personal identifiers, and choose your platform based on its privacy practices. The risk isn't existential — it's about building habits that protect you as the technology evolves.

Are there AI tools that don't collect my data at all?

Self-hosted open-source models (Llama, Mistral, etc.) running on your own hardware don't send data anywhere. For cloud tools, some smaller providers offer stronger privacy guarantees, but always verify claims against their actual privacy policy. No major cloud-based AI chatbot currently offers true zero-data-collection for free users.

