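The Item Master project has 60+ completed channel crawlers, but every one of them is triggered manually as a one-off run, so nothing detects when a channel lists new products. A Codex review confirmed the direction: use changedetection.io as the "eyes" that watch category pages and sitemaps for changes, and have it notify n8n through a webhook. Phase 0 only sends notifications and does not crawl automatically; automation is revisited only after 7 days of manual verification.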

    changedetection.io (Docker, plain HTTP)
      watches sitemaps / category pages for 10 channels
          ↓ webhook (POST JSON, shared secret)
    n8n (existing, same Docker network)
      validate → normalize → 30min dedup → audit log → notify

- Network: `n8n_n8n-internal` (verified, 172.20.0.0/16)
- Webhook: `http://n8n:5678/webhook/cdio-notify?secret=<TOKEN>`
- UI: `127.0.0.1:5000`, reachable only through a Tailscale SSH tunnel

| # | Channel | Monitored URL | Type | Interval | Verified |
|---|---|---|---|---|---|
| 1 | pxbox | https://pxbox.es.pxmart.com.tw/SiteMap/product_sitemap_index.xml | XML sitemap | 6h | ✅ |
| 2 | carrefour | https://online.carrefour.com.tw/.../Search-UpdateGrid?cgid=root&sz=24&start=0 | HTML fragment | 6h | (pending) |
| 3 | costco | https://www.costco.com.tw/rest/v2/taiwan/products/search?query=*&pageSize=1&currentPage=0 | JSON API | 6h | (pending) |
| 4 | rakuten | https://www.rakuten.com.tw/sitemap/sitemap_item0.xml.gz | gzip XML | 12h | (pending) |
| 5 | tomods | https://www.tomods.com.tw/sitemap.xml | XML sitemap | 12h | (pending) |
| 6 | daiso | https://shop.daiso.com.tw/sitemap.xml | XML sitemap | 12h | ✅ 200 |
| 7 | yourchance | https://yourchance.app/sitemap.xml | XML sitemap | 12h | ✅ 200 |
| 8 | ckcare | https://www.ck-care.com.tw/zh-TW/sitemap.xml | XML sitemap | 12h | ✅ 200 |
| 9 | matsukiyo | https://www.matsumotokiyoshi-tw.com/store-products-sitemap.xml | XML sitemap | 12h | ✅ 200 |
| 10 | homeda | https://www.homeda888.com.tw/all1.htm | HTML listing | 12h | ✅ 200 |
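
Most of the watches in this table are XML sitemaps, where a newly listed product would normally show up as a new `<loc>` entry. During the manual verification period, a notification can be cross-checked by comparing two saved copies of the same sitemap. The following is a minimal sketch under that assumption: `extract_locs` and `new_urls` are hypothetical helper names, the sample documents are synthetic, and this is not how changedetection.io computes its own diffs.

```python
import xml.etree.ElementTree as ET


def extract_locs(xml_text):
    """Return every <loc> value in a sitemap or sitemap index, ignoring namespaces."""
    root = ET.fromstring(xml_text)
    locs = set()
    for el in root.iter():
        # Namespaced tags look like "{namespace}loc"; compare only the local name.
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text:
            locs.add(el.text.strip())
    return locs


def new_urls(old_xml, new_xml):
    """URLs present in the newer snapshot but not in the older one, sorted."""
    return sorted(extract_locs(new_xml) - extract_locs(old_xml))


if __name__ == "__main__":
    ns = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
    old = f"<urlset {ns}><url><loc>https://shop.example/p/1</loc></url></urlset>"
    new = (
        f"<urlset {ns}>"
        "<url><loc>https://shop.example/p/1</loc></url>"
        "<url><loc>https://shop.example/p/2</loc></url>"
        "</urlset>"
    )
    print(new_urls(old, new))
```

The set difference only reports additions; removed or reordered entries are ignored, which matches the Phase 0 goal of spotting new listings.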
Exclusion notes:

Create `/home/ubuntu/docker/changedetection/docker-compose.yml`:

    name: changedetection
    services:
      changedetection:
        image: dgtlmoon/changedetection.io:latest
        container_name: changedetection
        restart: unless-stopped
        mem_limit: 512m
        cpus: 1.0
        pids_limit: 256
        ports:
          - "127.0.0.1:5000:5000"
        volumes:
          - ./data:/datastore
        environment:
          - TZ=Asia/Taipei
          - PORT=5000
          - PUID=1000
          - PGID=1000
        healthcheck:
          test: ["CMD-SHELL", "wget -q --spider http://127.0.0.1:5000 || exit 1"]
          interval: 30s
          timeout: 10s
          retries: 3
          start_period: 30s
        logging:
          driver: json-file
          options:
            max-size: "10m"
            max-file: "3"
        networks:
          - n8n_n8n-internal
        labels:
          - "com.centurylinklabs.watchtower.enable=true"
    networks:
      n8n_n8n-internal:
        external: true

Design decisions:
- `127.0.0.1:5000` is not exposed externally; access it through `ssh -L 5000:127.0.0.1:5000 100.73.24.43`.
- Joining `n8n_n8n-internal` lets the webhook reach n8n through container-internal DNS.

Deploy and verify:

    # Create the directory and write the compose file
    mkdir -p ~/docker/changedetection
    # Start
    cd ~/docker/changedetection && docker compose pull && docker compose up -d
    # Check health
    docker ps --filter name=changedetection
    docker logs changedetection --tail 20
    # Check that n8n is reachable over the internal network
    docker exec changedetection wget -q -O- http://n8n:5678 2>&1 | head -3

Workflow name: CDIO Phase0 — Change Notify
Webhook Trigger → Validate Secret → Parse & Normalize → Dedup (fs-based, 30min) → Audit Log → Notify
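
The validate, normalize, and dedup stages of this pipeline can be modelled outside n8n. The Python sketch below only illustrates the logic and is not the workflow's node code. The payload field names `title` and `url`, the `channel|url` dedup key, and all function names are assumptions made for the example.

```python
import hmac
import json
import time
from pathlib import Path
from typing import Optional

DEDUP_TTL_SECONDS = 30 * 60  # the 30-minute dedup window named in the pipeline


def secret_ok(received: str, expected: str) -> bool:
    """Compare the shared secret in constant time to avoid timing leaks."""
    return hmac.compare_digest(received.encode(), expected.encode())


def normalize(payload: dict) -> dict:
    """Reduce a webhook body to the fields later stages need (field names assumed)."""
    return {
        "channel": str(payload.get("title", "unknown")).strip().lower(),
        "url": str(payload.get("url", "")).strip(),
    }


def is_duplicate(event: dict, state_file: Path, now: Optional[float] = None) -> bool:
    """Return True if the same channel+url was recorded within the TTL; else record it."""
    now = time.time() if now is None else now
    try:
        seen = json.loads(state_file.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        seen = {}
    # Drop expired entries so the state file does not grow without bound.
    seen = {k: ts for k, ts in seen.items() if now - ts < DEDUP_TTL_SECONDS}
    key = f"{event['channel']}|{event['url']}"
    duplicate = key in seen
    if not duplicate:
        seen[key] = now
    state_file.write_text(json.dumps(seen))
    return duplicate


if __name__ == "__main__":
    import tempfile

    state = Path(tempfile.mkdtemp()) / "cdio-dedup.json"
    event = normalize({"title": "Daiso", "url": "https://shop.daiso.com.tw/sitemap.xml"})
    print(is_duplicate(event, state, now=0))     # first sighting
    print(is_duplicate(event, state, now=600))   # 10 minutes later
    print(is_duplicate(event, state, now=2400))  # 40 minutes after the first
```

In this sketch the window is measured from the first sighting and is not extended by suppressed duplicates, so a channel that changes continuously still produces one notification per 30 minutes rather than none.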
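Generate the shared secret and write it into the changedetection global notification URL:

    openssl rand -hex 16

Global notification URL: `json://n8n:5678/webhook/cdio-notify?secret=<SECRET>`

- File access uses the Node.js `fs` module (confirmed: `NODE_FUNCTION_ALLOW_BUILTIN=fs,net`).
- Dedup state: `/home/node/.n8n/cdio-dedup.json`, 30-minute TTL.
- Audit log: `/home/node/.n8n/cdio-audit.jsonl` (host: `~/docker/n8n/data/cdio-audit.jsonl`).
- Audit record fields: `{ts, channel, url, diff_preview}`.

Add each watch through the changedetection UI:

- JSONPath filter for the JSON API watch: `$.pagination.totalResults`

Health and trigger checks:

    # Container health
    docker inspect --format='{{.State.Health.Status}}' changedetection
    # Trigger count
    wc -l ~/docker/n8n/data/cdio-audit.jsonl
    # Distribution by channel (channels with more than 20 triggers are flagged)
    cat ~/docker/n8n/data/cdio-audit.jsonl | python3 -c "
    import json, sys
    from collections import Counter
    c = Counter()
    for line in sys.stdin:
        c[json.loads(line)['channel']] += 1
    for ch, n in c.most_common():
        flag = ' ⚠️' if n > 20 else ''
        print(f' {ch}: {n}{flag}')
    "

| Problem | Action |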
|---|---|
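| Watch triggers too often (noise) | Add a CSS/XPath filter to exclude dynamic content |
| Watch never triggers | Check whether the page really is static and whether the interval is reasonable |
| Akamai blocking (Costco/Carrefour) | Lower the frequency to 24h, or remove the watch temporarily |
| Rakuten gzip decompression fails | Switch to an uncompressed URL, or verify manually |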
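
For the Rakuten row, manual verification can be as simple as checking that a downloaded `.xml.gz` file decompresses and parses. A minimal sketch, assuming the file has already been fetched into bytes; `count_sitemap_urls` is a hypothetical helper and the sample data is synthetic:

```python
import gzip
import xml.etree.ElementTree as ET


def count_sitemap_urls(raw: bytes) -> int:
    """Decompress gzip data if needed, parse it as XML, and count <loc> entries."""
    # Gzip streams start with the magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    root = ET.fromstring(raw)
    # Namespaced tags look like "{namespace}loc"; compare only the local name.
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "loc")


if __name__ == "__main__":
    xml = (
        b'<?xml version="1.0" encoding="UTF-8"?>'
        b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        b"<url><loc>https://shop.example/p/1</loc></url>"
        b"<url><loc>https://shop.example/p/2</loc></url>"
        b"</urlset>"
    )
    print(count_sitemap_urls(gzip.compress(xml)))  # compressed input
    print(count_sitemap_urls(xml))                 # plain input
```

A corrupt archive raises an exception from `gzip.decompress`, and malformed XML raises `xml.etree.ElementTree.ParseError`, so either failure mode is visible immediately.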
- `~/INBOX.md`: mark the changedetection item "Phase 0 deployed, 7-day trial started YYYY-MM-DD".
- `~/VPS-ALL.md`: add changedetection to the JP container list.
- `crawlers/CHANNELS.md`: no change (Phase 0 does not affect the crawlers).

| Item | Increase |
|---|---|
| RAM | +200-400MB (no Playwright) |
| Disk | +~50MB (text snapshots for 10 watches) |
| Network | 26 HTTP requests/day, ~130KB/day |
| Containers | 25 → 26 |
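
The network figure follows from the intervals in the watch list: three watches at 6h and seven at 12h. A quick check of that arithmetic is below. The per-request size is just the stated daily total divided by the request count, so it is an implied average rather than a measurement.

```python
# Watches per interval in hours, taken from the watch list: 3 at 6h, 7 at 12h.
watches_by_interval_h = {6: 3, 12: 7}

# Each watch fires 24 / interval times per day.
requests_per_day = sum(
    count * (24 // hours) for hours, count in watches_by_interval_h.items()
)
print(requests_per_day)  # 3*4 + 7*2 = 26

daily_kb = 130  # approximate daily traffic stated in the table
print(daily_kb / requests_per_day)  # implied average of 5.0 KB per request
```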