Data Collection Policy

Effective May 27, 2026 · Version 1.0

1. What we collect

Data type	Source	Purpose
Public HTML pages	SEOAI-CorpusBot crawl	Compute AI-readiness signals + embeddings
Page metadata (title, h1, meta description, language)	Extracted from HTML	Structured search + benchmarking
Domain-level snapshots (page counts, language distribution)	Aggregated	Trend analysis for paying customers
Embeddings (1024-dim `bge-m3` vectors)	Computed from page text	Semantic search via Data API
Crawl timestamps (`discovered_at`, `last_crawled_at`, `fetched_at`)	Recorded by crawler	Data freshness audit trail

We do not collect any of the following:

Personally identifiable information (names, emails, phone numbers) extracted for marketing or contact use.
Anything behind login walls, paywalls, or session tokens.
Cookies, fingerprints, or tracking from visited sites.
Form submissions, query strings with PII patterns, or POST bodies.
Content from domains that have opted out (via robots.txt, web form, or email).
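
To illustrate how a crawler might recognize query strings with PII patterns so that such URLs can be skipped, here is a minimal sketch. The function name `query_has_pii` and both regular expressions are hypothetical and are not taken from SEOAI's implementation; a production filter would need a broader, tested pattern set.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Hypothetical patterns for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def query_has_pii(url: str) -> bool:
    """Return True if any query-string value looks like an email or phone number."""
    query = urlsplit(url).query
    # parse_qsl percent-decodes values, so "jane%40example.com" is checked
    # as "jane@example.com".
    for _key, value in parse_qsl(query, keep_blank_values=True):
        if EMAIL_RE.search(value) or PHONE_RE.search(value):
            return True
    return False
```

The phone pattern is deliberately broad, so it also flags long numeric identifiers. For a filter whose job is to avoid storing PII, erring toward false positives is arguably the safer trade-off.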

Pages: retained for a rolling 12 months. Pages older than 12 months are pruned monthly.
Embeddings: tied to their source page — deleted when the page is pruned.
Snapshots: kept indefinitely (aggregate-only, anonymized at the domain level).
API usage logs: 90 days for security review, then aggregated and purged.
Opt-out request records: retained indefinitely as a compliance audit trail.
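
The page and embedding retention rules above can be sketched as a single pruning pass. This is an illustration under stated assumptions, not SEOAI's actual job: "12 months" is approximated as 365 days, `fetched_at` is assumed to be the timestamp that determines a page's age, and the in-memory dictionaries stand in for whatever storage is really used.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "12 months" is approximated as 365 days.
PAGE_RETENTION = timedelta(days=365)


def is_expired(timestamp: datetime, retention: timedelta, now: datetime) -> bool:
    """Return True if the timestamp is older than the retention window."""
    return now - timestamp > retention


def prune(
    pages: dict[str, datetime],
    embeddings: dict[str, list[float]],
    now: datetime,
) -> list[str]:
    """Delete expired pages and their embeddings; return the pruned URLs.

    `pages` maps URL -> fetched_at (assumed to be the reference timestamp).
    `embeddings` maps URL -> vector and is pruned together with its page.
    """
    expired = [
        url for url, fetched_at in pages.items()
        if is_expired(fetched_at, PAGE_RETENTION, now)
    ]
    for url in expired:
        del pages[url]
        embeddings.pop(url, None)  # embedding is deleted with its source page
    return expired
```

Passing `now` in explicitly rather than calling `datetime.now()` inside the function keeps the pass deterministic and easy to test.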

Anyone can request deletion via:

Web form — immediate, synchronous deletion.
Email opt-out@seoai.space — processed within 7 days.
Blocking SEOAI-CorpusBot in robots.txt — applied on next crawl cycle (within 30 days).
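
For the robots.txt route, a site owner can check a rule set before publishing it. The sketch below uses Python's standard `urllib.robotparser`. The two-line robots.txt is an assumed minimal example, and `is_blocked` is a hypothetical helper; how SEOAI-CorpusBot itself matches user-agent tokens is not specified here.

```python
from urllib.robotparser import RobotFileParser

# Assumed minimal robots.txt that blocks the crawler from the whole site.
ROBOTS_TXT = """\
User-agent: SEOAI-CorpusBot
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt content disallows user_agent for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)
```

Calling `is_blocked(ROBOTS_TXT, "SEOAI-CorpusBot", "https://example.com/any/page")` checks the rule against an arbitrary page URL. The same call with a different user agent shows whether other crawlers remain unaffected.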

Primary database & storage: enterprise cloud infrastructure (Asia-Pacific region), encrypted at rest.
Raw HTML archive: encrypted object storage, compressed.
Embedding compute: on-demand GPU compute (ephemeral, per batch).
Email delivery: Resend.
Payments: Stripe.

GDPR: We do not process personal data of EU residents through corpus crawls. The corpus targets business-domain public pages only.
APPI (Japan): No personal data (個人情報) is retained from crawls. Business directory information falls outside APPI's personal-info scope.
Copyright / fair use: Page text is processed for non-expressive purposes (metadata extraction, embedding) consistent with Japanese Copyright Act Article 30-4 and analogous fair-use principles.

If you believe SEOAI holds personal data about you, contact privacy@seoai.space. We respond within 30 days.
