Barcode Coverage Analysis Report

條碼覆蓋率分析報告

Analysis date: 2026-03-31
Source: channel_master.csv (2,490,077 records, 67 channels)
Purpose: Difficulty coefficient for barcode collection contract pricing


Executive Summary | 重點摘要

Metric                                 Value
Total product records                  2,490,077
Total channels                         67
Records with any barcode field         440,414 (17.7%)
Records with valid GTIN (matchable)    203,464 (8.2%)
Unique valid GTINs                     160,612
Channels with GTIN capability          25 / 67 (37.3%)

Critical finding: Several of the largest channels in the dataset (PChome 980K, ETmall 425K, Poya 28K, 7-ELEVEN 13K) have ZERO barcode data in their crawled product listings. Barcode collection for these channels cannot rely on direct scraping; it requires other approaches such as image OCR, manual lookup, or cross-channel matching by product name.


Section 1: Per-Channel Barcode Coverage | 各通路條碼覆蓋率

Tier 1: Channels with Excellent GTIN Coverage (>80%)

Channel        Total Products  Valid GTIN  GTIN Rate  Unique GTINs  Notes
iherb          51,489          51,484      100.0%     51,484        International supplement retailer, native GTIN
daiso          6,663           6,662       100.0%     6,662         Japanese retailer, native barcode
tomods         5,707           5,654       99.1%      5,675         Japanese drugstore, native barcode
yourchance     1,846           1,828       99.0%      1,686
pxmart_mega    10,200          9,720       95.3%      10,198        PXMart physical store data
dingding       4,451           4,243       95.3%      4,109
jpmed          5,161           4,818       93.3%      4,815         Japanese pharma
pxgo           9,906           9,151       92.4%      9,901         PXGo (online grocery)
homeda         2,924           2,682       91.7%      2,652
ckcare         1,322           1,155       87.4%      999
rakuten-amart  8,486           7,122       83.9%      8,473
libaga         4,092           3,404       83.2%      3,377         Liquor specialist

Tier 2: Channels with Partial Coverage (5-80%)

Channel        Total Products  Valid GTIN  GTIN Rate  Unique GTINs  Notes
matsukiyo      705             529         75.0%      527           Japanese drugstore
rakuten        13,463          8,879       66.0%      9,012         Marketplace
weixinrx       1,932           791         40.9%      792           Pharmacy
pxbox          191,674         50,943      26.6%      46,853        Largest with GTINs but 73% empty
carrefour      119,863         28,420      23.7%      28,339        52% empty, 20% invalid format
angelbaby      5,472           1,108       20.2%      1,142         Baby products
lesenphants    5,611           537         9.6%       539
babyez         1,782           173         9.7%       168
treebuy        61,819          3,091       5.0%       2,937
costco         9,269           481         5.2%       481

Tier 3: Channels with Zero Valid GTIN

Channel     Total Products  Has Barcode Field  Barcode Type  Notes
pchome      980,048         0                  None          Largest platform, zero barcode
etmall      424,958         0                  None          Second largest, zero barcode
taaze       247,151         189,509            ISBN only     Book platform
cosmed      124,723         44,005             ISBN only     Drugstore, ISBNs are from books section
trplus      71,525          0                  None          TRPlus marketplace
poya        27,656          0                  None          Beauty/household retailer
seven11     12,645          0                  None          Convenience store
familymart  7,773           0                  None          Convenience store
savesafe    7,315           0                  None
hola        5,683           0                  None          Furniture/home
ubereats    1,657           0                  None          Food delivery

Section 2: Key Difficult Channels Deep Dive | 重點困難通路詳析

A. PXGo (全聯線上購 - 純量販通路)

Metric                     Value
Total products             9,906
Valid GTIN rate            92.4%
Barcode format             EAN-13 97.6%, UPC-A 1.8%, EAN-8 0.6%
Store-internal barcodes    750 (7.6%)
Brand data                 0% (not scraped)
Category data              100%
Median price               NT$99

Assessment: PXGo is one of the easiest key channels, second only to PXMart Mega. Over 92% of products have valid GTINs, and the remaining 7.6% carry store-internal codes (likely private-label or fresh food items). This makes the channel a strong candidate for cross-referencing products from other channels.

Top categories by coverage: Condiments (99.7%), Hair care (100%), Biscuits (100%), Oral care (100%), Frozen meals (100%).

B. PXBox (全聯線上購 - 綜合電商)

Metric                               Value
Total products                       191,674
Valid GTIN rate                      26.6%
Products missing barcode entirely    139,878 (73.0%)
Invalid barcodes                     598 (0.3%)
Brand data                           99.2%
Category data                        99.2%
Median price                         NT$780

Assessment: PXBox is a mixed marketplace. Its packaged-goods and hardware categories have decent barcode rates (biscuits 88.9%, hand tools 51.9%), while its non-grocery categories have almost none (phone cases 0.8%, furniture 0.3%, fashion 2.6%). The 50,943 valid GTINs overlap heavily with PXGo (6,086 shared, 66.5% of PXGo's valid GTINs) and PXMart_Mega (8,167 shared, 84.0% of PXMart_Mega's valid GTINs).

C. PXMart Mega (全聯量販 Mega)

Metric                     Value
Total products             10,200
Valid GTIN rate            95.3%
Store-internal barcodes    478 (4.7%)
Brand data                 0%
Category data              100%
Median price               NT$119

Assessment: Similar to PXGo: physical-store-oriented FMCG data with excellent barcode coverage. Almost all categories are at 100%. Overlap with PXBox is high (84% of Mega's valid GTINs are also found in PXBox).

D. Carrefour (家樂福)

Metric                               Value
Total products                       119,863
Valid GTIN rate                      23.7%
Products missing barcode entirely    62,590 (52.2%)
Invalid format barcodes              24,441 (20.4%)
Invalid length                       2,376 (2.0%)
Invalid checksum                     1,196 (1.0%)
Store-internal barcodes              743 (0.6%)
Brand data                           73.0%
Category data                        100%
Median price                         NT$480

Assessment: Carrefour is moderately difficult. Over 20% of products have barcodes that failed validation, likely because internal SKUs are formatted differently or because of source data quality issues. The 28,420 valid GTINs are valuable, and 23.7% of them also appear in PXBox. By category, stationery (13.9%), kitchenware (21.3%), pet supplies (23.3%), and food categories tend to have better coverage than electronics and furniture (0-5%).

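Special note on invalid_format barcodes (24,441): These may contain recoverable barcode data that only needs reformatting. If all of them proved recoverable, Carrefour's coverage would rise by up to ~20 percentage points (from 23.7% to roughly 44%); the actual recovery rate has not yet been measured.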
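
The recoverability of these barcodes could be tested with a simple normalize-then-validate pass. The sketch below is illustrative only: it assumes the formatting problem is stray non-digit characters (spaces, hyphens, prefixes), which this report has not confirmed, and it applies the standard GS1 mod-10 check digit to the usual GTIN lengths (8, 12, 13, 14). The function names and status labels are hypothetical and are not taken from barcode_coverage_analysis.py.

```python
import re

VALID_LENGTHS = {8, 12, 13, 14}  # EAN-8, UPC-A, EAN-13, GTIN-14


def gtin_check_digit_ok(digits):
    """Return True if the last digit matches the GS1 mod-10 check digit."""
    body, check = digits[:-1], int(digits[-1])
    # Weights alternate 3, 1, 3, ... starting from the digit next to the check digit.
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def classify_barcode(raw):
    """Classify a raw barcode string after a simple normalization pass."""
    # Assumed normalization: drop everything that is not a digit.
    digits = re.sub(r"\D", "", raw or "")
    if not digits:
        return "empty"
    if len(digits) not in VALID_LENGTHS:
        return "invalid_length"
    if not gtin_check_digit_ok(digits):
        return "invalid_checksum"
    return "valid"


# A hyphenated EAN-13 becomes valid once the separators are removed.
status = classify_barcode("4006381-333931")
```

A barcode that validates only after normalization is a recovery candidate, not a confirmed GTIN: about one in ten random digit strings of a valid length passes the check digit by chance, so recovered codes should be spot-checked against a channel with native GTINs.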

E. Cosmed (康是美)

Metric                   Value
Total products           124,723
Valid GTIN rate          0.0%
ISBN barcodes (books)    44,005 (35.3%)
Brand data               99.1%
Category data            100%
Median price             NT$441

Assessment: Cosmed's crawled data has ZERO product GTINs. The 44,005 "barcodes" are all ISBNs from a books section. The top categories are luxury brands (Michael Kors, Hermes, Burberry, Chanel) and books, which suggests the scrape captured a marketplace/department-store view rather than the core drugstore assortment. Barcode collection would require an entirely new crawling strategy or cross-channel matching.

F. Poya (寶雅)

Metric             Value
Total products     27,656
Valid GTIN rate    0.0%
Brand data         99.2%
Category data      100%
Median price       NT$199

Assessment: Zero barcode data. However, brand names and detailed category hierarchy (3 levels) are available, which could support fuzzy matching to other channels. Categories are heavily beauty/personal care focused -- the right FMCG categories for a barcode project, but data must be sourced externally.
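
A brand-constrained fuzzy match of the kind suggested above might look like the following sketch. It uses Python's standard-library difflib.SequenceMatcher as a stand-in similarity measure; the record layout (name, brand, gtin keys), the 0.8 threshold, and the sample products are all assumptions for illustration, not part of the crawled data or the analysis script.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not affect the score."""
    return " ".join(text.lower().split())


def best_gtin_match(name, brand, candidates, threshold=0.8):
    """Return (candidate, score) for the most similar same-brand product.

    Returns (None, score) when no same-brand candidate reaches the threshold.
    """
    best, best_score = None, 0.0
    for cand in candidates:
        # Compare only within the same brand; this is where available brand data helps.
        if normalize(cand["brand"]) != normalize(brand):
            continue
        score = SequenceMatcher(None, normalize(name), normalize(cand["name"])).ratio()
        if score > best_score:
            best, best_score = cand, score
    if best_score < threshold:
        return None, best_score
    return best, best_score


# Hypothetical reference products from a channel that does expose GTINs.
reference = [
    {"name": "Acme Shampoo 500ml", "brand": "Acme", "gtin": "4006381333931"},
    {"name": "Other Toothpaste 120g", "brand": "Other", "gtin": "036000291452"},
]
match, score = best_gtin_match("ACME  shampoo 500ML", "acme", reference)
```

The threshold trades recall for precision: a lower value matches more products but raises the risk of attaching the wrong GTIN (for example, a different pack size of the same product), so any value would need tuning against a manually verified sample.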

G. 7-ELEVEN

Metric             Value
Total products     12,645
Valid GTIN rate    0.0%
Brand data         0%
Category data      100%
Median price       NT$999

Assessment: Zero barcode AND zero brand data, so matching difficulty is at its maximum. The product catalog appears to come from the ibon/online shop (jewelry, home appliances, gourmet food) rather than from in-store convenience items.

H. Momo / Watsons (屈臣氏)

Neither channel appears in channel_master.csv. They were either not crawled or are stored under different channel codes, and this needs to be verified (for example, shopee_watsons has 2,164 products, also with zero GTINs).


Section 3: Cross-Channel Barcode Overlap | 跨通路條碼重疊

Overlap Statistics

Metric                      Value
Total unique valid GTINs    160,612
GTINs in 1 channel only     120,602 (84.8%)
GTINs in 2 channels         16,065 (11.3%)
GTINs in 3 channels         4,037 (2.8%)
GTINs in 4 channels         1,327 (0.9%)
GTINs in 5+ channels        176 (0.1%)

Key Pairwise Overlaps (shared unique GTINs)

Channel A    Channel B    Shared GTINs  % of Smaller Set
PXBox        PXMart_Mega  8,167         84.0%
PXBox        PXGo         6,086         66.5%
PXBox        Carrefour    6,522         23.7%
PXMart_Mega  Carrefour    2,204         22.7%
PXGo         Carrefour    2,000         21.9%
iherb        PXBox        96            0.2%

Key insight: The PX family (PXBox + PXGo + PXMart_Mega) forms a highly overlapping barcode cluster. Their combined unique GTINs cover the broadest domestic FMCG barcode set. Carrefour adds ~20K additional unique GTINs. iherb operates in a nearly independent barcode universe (supplements/health foods).
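
The overlap figures in this section can be reproduced with plain set operations. The sketch below is a minimal illustration on toy data: the channel names are reused from the report, but the GTIN labels are placeholders, and it assumes "% of Smaller Set" is measured against the smaller channel's unique GTIN count, which may differ slightly from how the analysis script defines the denominator.

```python
from collections import Counter
from itertools import combinations


def channel_count_distribution(gtins_by_channel):
    """Count how many GTINs appear in exactly 1, 2, 3, ... channels."""
    appearances = Counter()
    for gtins in gtins_by_channel.values():
        appearances.update(set(gtins))  # count each GTIN once per channel
    return Counter(appearances.values())


def pairwise_overlap(gtins_by_channel):
    """Return {(a, b): (shared, pct_of_smaller_set)} for every channel pair."""
    result = {}
    for a, b in combinations(sorted(gtins_by_channel), 2):
        set_a, set_b = set(gtins_by_channel[a]), set(gtins_by_channel[b])
        smaller = min(len(set_a), len(set_b))
        shared = len(set_a & set_b)
        pct = round(100.0 * shared / smaller, 1) if smaller else 0.0
        result[(a, b)] = (shared, pct)
    return result


# Toy data: letters stand in for GTINs.
toy = {
    "pxbox": {"A", "B", "C", "D"},
    "pxgo": {"A", "B", "E"},
    "carrefour": {"A", "F"},
}
distribution = channel_count_distribution(toy)
overlaps = pairwise_overlap(toy)
```

In the toy data, pxbox and pxgo share 2 of pxgo's 3 GTINs, so the pair is reported as 66.7% of the smaller set, while four of the six GTINs appear in a single channel only.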


Section 4: Category-Level Barcode Coverage | 品類層級條碼覆蓋率

Best-Covered FMCG Categories (from category_stats.md)

Rank  Category                        Products  Coverage  Main Source
1     Analgesics (止痛藥)             21        90.5%     taaze, iherb
2     Cold Medicine (感冒藥)          18        88.9%     taaze, jpmed
3     Health Food (健康食品)          54,766    52.4%     iherb (25K) dominates
4     Candy (糖果)                    5,107     41.4%     pchome, pxmart_mega
5     Sanitary Protection (衛生用品)  4,942     40.0%     pchome, jpmed, carrefour

Worst-Covered FMCG Categories

Rank  Category                          Products  Coverage  Challenge
59    Facial Tissue (面紙)              2,094     8.3%      Dominated by pchome (no barcode)
58    Battery (電池)                    16,446    8.7%      59% from pchome
57    Cigarette (香菸)                  221       10.0%     Regulatory restrictions
56    General Skin Care (一般肌膚保養)  17,306    10.9%     Pchome + etmall domination
55    Essence Drink (精華飲料)          3,761     12.5%

Section 5: Difficulty Coefficient | 困難度係數

Methodology

Difficulty Score (0-100) = 
    Barcode_Gap * 0.40        (% without valid barcode)
  + GTIN_Gap * 0.20           (% without valid GTIN specifically)  
  + Brand_Gap * 0.15          (% without brand data - affects matching ability)
  + Category_Diversity * 0.15 (normalized 0-100)
  + Dedup_Issues * 0.10       (100 - dedup_ratio)
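
The weighted sum above can be written as a small function. This is a sketch of the formula as stated, not the code in barcode_coverage_analysis.py: it assumes every input is already expressed on a 0-100 scale and takes Category_Diversity as a pre-normalized input, since the normalization method is not described here. Parameter names are illustrative.

```python
def difficulty_score(barcode_gap, gtin_gap, brand_gap, category_diversity, dedup_ratio):
    """Weighted difficulty score on a 0-100 scale; all inputs are percentages (0-100)."""
    inputs = (barcode_gap, gtin_gap, brand_gap, category_diversity, dedup_ratio)
    if any(not 0.0 <= value <= 100.0 for value in inputs):
        raise ValueError("all inputs must be between 0 and 100")
    dedup_issues = 100.0 - dedup_ratio
    score = (
        0.40 * barcode_gap            # % without valid barcode
        + 0.20 * gtin_gap             # % without valid GTIN
        + 0.15 * brand_gap            # % without brand data
        + 0.15 * category_diversity   # assumed to be normalized to 0-100 already
        + 0.10 * dedup_issues         # 100 - dedup_ratio
    )
    return round(score, 1)


# Every component at its worst value gives the maximum score.
worst_case = difficulty_score(100, 100, 100, 100, 0)
# Every component at its best value gives the minimum score.
best_case = difficulty_score(0, 0, 0, 0, 100)
```

Because the five weights sum to 1.0, the score stays within 0-100 whenever the inputs do. Barcode_Gap and GTIN_Gap together carry 60% of the weight, so a channel with no barcode data cannot score below 60 regardless of its brand or category metadata.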

Difficulty Rankings for Key Channels

Rank  Channel             Difficulty  Barcode Gap  GTIN Gap  Products  Assessment
5     7-ELEVEN            100.0       100%         100%      12,645    Maximum difficulty
17    Poya (寶雅)         85.1        100%         100%      27,656    Very hard, has brand names
27    foodpanda           84.0        99%          99%       5,606     Very hard
34    Cosmed (康是美)     75.2        100%         100%      124,723   Hard, has brands + ISBNs
46    UberEats            70.0        100%         100%      1,657     Hard but small catalog
49    PXBox (全聯)        60.0        73%          73%       191,674   Medium, has 51K GTINs
51    Carrefour (家樂福)  58.2        76%          76%       119,863   Medium, has 28K GTINs
55    PXGo (全聯)         34.6        8%           8%        9,906     Low difficulty
57    PXMart Mega         27.1        5%           5%        10,200    Very low difficulty

Global Difficulty Distribution (all 67 channels)

Difficulty Tier  Score Range  Channels  Total Products  Description
Extreme          85-100       20        1,223,424       Zero barcode, may lack brand
Hard             70-85        14        153,556         Zero/near-zero barcode, has some metadata
Medium           40-70        7         367,044         Partial barcode, needs gap-filling
Low              20-40        6         26,988          Mostly covered, minor gaps
Very Low         0-20         20        719,065         Excellent barcode data
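
Because adjacent score ranges share their boundary values (for example, 70 appears in both Hard and Medium), a lookup needs a tie-breaking rule. The sketch below assumes each tier includes its lower bound, which is consistent with UberEats (70.0) being described as Hard in the rankings; the rule itself is an assumption and should be confirmed against channel_difficulty_scores.csv.

```python
# (lower bound, tier name), checked from the highest tier down.
TIERS = [
    (85.0, "Extreme"),
    (70.0, "Hard"),
    (40.0, "Medium"),
    (20.0, "Low"),
    (0.0, "Very Low"),
]


def difficulty_tier(score):
    """Map a 0-100 difficulty score to its tier, treating lower bounds as inclusive."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, name in TIERS:
        if score >= lower_bound:
            return name


# Scores taken from the key-channel rankings.
tiers = {
    "7-ELEVEN": difficulty_tier(100.0),
    "Poya": difficulty_tier(85.1),
    "foodpanda": difficulty_tier(84.0),
    "UberEats": difficulty_tier(70.0),
    "PXBox": difficulty_tier(60.0),
    "PXGo": difficulty_tier(34.6),
    "PXMart Mega": difficulty_tier(27.1),
}
```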

Section 6: Strategic Recommendations | 策略建議

For Contract Pricing

  1. Tier-based pricing is essential. A flat per-channel rate would severely underprice hard channels and overprice easy ones. Among the key channels alone, the difficulty score spans 27.1 to 100.0.

  2. The "Big 4" problem: PChome (980K), ETmall (425K), Cosmed (125K), and TRPlus (72K) collectively represent 1.6M products with ZERO valid product GTINs (Cosmed's only barcodes are book ISBNs). Any contract promising barcode coverage for these channels requires non-scraping methods (image recognition, manufacturer database, manual lookup, or cross-channel name matching).

  3. PX family is the best starting point: PXGo + PXMart_Mega combined give ~15K unique GTINs at >92% valid rate -- an excellent seed dataset for FMCG barcode matching against other channels.

  4. Carrefour's 24K invalid-format barcodes are a potential "quick win": if format normalization recovers them, Carrefour's coverage could rise by up to ~20 percentage points at minimal cost.

  5. Cross-channel matching potential: 85% of GTINs exist in only 1 channel, and the pairwise overlap between the PX-family channels and Carrefour is only ~22-24% of the smaller set. Each channel therefore adds substantial unique barcode inventory.

Suggested Difficulty Multipliers for Contract

Tier          Channels                                 Suggested Multiplier  Rationale
1 (Easy)      PXGo, PXMart_Mega, iherb, daiso, tomods  1.0x                  Direct scraping yields >90%
2 (Moderate)  PXBox, Carrefour, rakuten, treebuy       1.5-2.0x              Mix of scraped + gap-fill needed
3 (Hard)      Cosmed, Poya, foodpanda                  3.0-4.0x              Requires cross-channel matching or manual
4 (Extreme)   PChome, ETmall, 7-ELEVEN, trplus         5.0-8.0x              No barcode source; requires image/OCR/manual
N/A           Momo, Watsons                            TBD                   Not yet crawled; estimate based on platform type

Report generated from barcode_coverage_analysis.py
Output files: barcode_coverage_analysis.json, channel_difficulty_scores.csv