Analysis date: 2026-03-31

Source: channel_master.csv (2,490,077 records, 67 channels)

Purpose: Difficulty coefficient for barcode collection contract pricing
| Metric | Value |
|---|---|
| Total product records | 2,490,077 |
| Total channels | 67 |
| Records with any barcode field | 440,414 (17.7%) |
| Records with valid GTIN (matchable) | 203,464 (8.2%) |
| Unique valid GTINs | 160,612 |
| Channels with GTIN capability | 25 / 67 (37.3%) |
Critical finding: The majority of Taiwan's largest retail and e-commerce channels (PChome 980K, ETmall 425K, Poya 28K, 7-ELEVEN 13K products) have ZERO barcode data in their crawled product listings. Collecting barcodes for these channels requires fundamentally different approaches (image OCR, manual lookup, cross-channel matching by product name) rather than direct scraping.
| Channel | Total Products | Valid GTIN | GTIN Rate | Unique GTINs | Notes |
|---|---|---|---|---|---|
| iherb | 51,489 | 51,484 | 100.0% | 51,484 | International supplement retailer, native GTIN |
| daiso | 6,663 | 6,662 | 100.0% | 6,662 | Japanese retailer, native barcode |
| tomods | 5,707 | 5,654 | 99.1% | 5,675 | Japanese drugstore, native barcode |
| yourchance | 1,846 | 1,828 | 99.0% | 1,686 | |
| pxmart_mega | 10,200 | 9,720 | 95.3% | 10,198 | PXMart physical store data |
| dingding | 4,451 | 4,243 | 95.3% | 4,109 | |
| jpmed | 5,161 | 4,818 | 93.3% | 4,815 | Japanese pharma |
| pxgo | 9,906 | 9,151 | 92.4% | 9,901 | PXGo (online grocery) |
| homeda | 2,924 | 2,682 | 91.7% | 2,652 | |
| ckcare | 1,322 | 1,155 | 87.4% | 999 | |
| rakuten-amart | 8,486 | 7,122 | 83.9% | 8,473 | |
| libaga | 4,092 | 3,404 | 83.2% | 3,377 | Liquor specialist |
| Channel | Total Products | Valid GTIN | GTIN Rate | Unique GTINs | Notes |
|---|---|---|---|---|---|
| matsukiyo | 705 | 529 | 75.0% | 527 | Japanese drugstore |
| rakuten | 13,463 | 8,879 | 66.0% | 9,012 | Marketplace |
| weixinrx | 1,932 | 791 | 40.9% | 792 | Pharmacy |
| pxbox | 191,674 | 50,943 | 26.6% | 46,853 | Largest catalog with GTINs, but 73% have no barcode |
| carrefour | 119,863 | 28,420 | 23.7% | 28,339 | 52% missing barcode, 20% invalid format |
| angelbaby | 5,472 | 1,108 | 20.2% | 1,142 | Baby products |
| babyez | 1,782 | 173 | 9.7% | 168 | |
| lesenphants | 5,611 | 537 | 9.6% | 539 | |
| costco | 9,269 | 481 | 5.2% | 481 | |
| treebuy | 61,819 | 3,091 | 5.0% | 2,937 | |
| Channel | Total Products | Records with Barcode Field | Barcode Type | Notes |
|---|---|---|---|---|
| pchome | 980,048 | 0 | None | Largest platform, zero barcode |
| etmall | 424,958 | 0 | None | Second largest, zero barcode |
| taaze | 247,151 | 189,509 | ISBN only | Book platform |
| cosmed | 124,723 | 44,005 | ISBN only | Drugstore, ISBNs are from books section |
| trplus | 71,525 | 0 | None | TRPlus marketplace |
| poya | 27,656 | 0 | None | Beauty/household retailer |
| seven11 | 12,645 | 0 | None | Convenience store |
| familymart | 7,773 | 0 | None | Convenience store |
| savesafe | 7,315 | 0 | None | |
| hola | 5,683 | 0 | None | Furniture/home |
| ubereats | 1,657 | 0 | None | Food delivery |
| Metric | Value |
|---|---|
| Total products | 9,906 |
| Valid GTIN rate | 92.4% |
| Barcode format | EAN-13 97.6%, UPC-A 1.8%, EAN-8 0.6% |
| Store-internal barcodes | 750 (7.6%) |
| Brand data | 0% (not scraped) |
| Category data | 100% |
| Median price | NT$99 |
Assessment: PXGo is one of the EASIEST key channels, second only to PXMart_Mega by difficulty score. Over 92% of its products have valid GTINs, and nearly all of the remaining 7.6% carry store-internal codes (likely private label or fresh food). This makes PXGo a strong candidate for cross-referencing other channels' products.
Top categories by coverage: Condiments (99.7%), Hair care (100%), Biscuits (100%), Oral care (100%), Frozen meals (100%).
| Metric | Value |
|---|---|
| Total products | 191,674 |
| Valid GTIN rate | 26.6% |
| Products missing barcode entirely | 139,878 (73.0%) |
| Invalid barcodes | 598 (0.3%) |
| Brand data | 99.2% |
| Category data | 99.2% |
| Median price | NT$780 |
Assessment: PXBox is a mixed marketplace. Its FMCG grocery items (e.g., biscuits 88.9%) and some hardware (hand tools 51.9%) have reasonable barcode rates, but other non-grocery categories (phone cases 0.8%, furniture 0.3%, fashion 2.6%) have almost none. Its 50,943 valid GTINs overlap heavily with PXGo (6,086 shared = 66.5% of PXGo's valid GTINs) and PXMart_Mega (8,167 shared = 84.0% of PXMart_Mega's valid GTINs).
| Metric | Value |
|---|---|
| Total products | 10,200 |
| Valid GTIN rate | 95.3% |
| Store-internal barcodes | 478 (4.7%) |
| Brand data | 0% |
| Category data | 100% |
| Median price | NT$119 |
Assessment: Similar to PXGo: physical-store-oriented FMCG data with excellent barcode coverage, with almost all categories at 100%. It overlaps heavily with PXBox (84% of Mega's valid GTINs are also found in PXBox).
| Metric | Value |
|---|---|
| Total products | 119,863 |
| Valid GTIN rate | 23.7% |
| Products missing barcode entirely | 62,590 (52.2%) |
| Invalid format barcodes | 24,441 (20.4%) |
| Invalid length | 2,376 (2.0%) |
| Invalid checksum | 1,196 (1.0%) |
| Store-internal barcodes | 743 (0.6%) |
| Brand data | 73.0% |
| Category data | 100% |
| Median price | NT$480 |
Assessment: Carrefour is moderately difficult. Over 20% of products have barcodes that fail validation (likely internal SKUs formatted differently, or source data quality issues). The 28,420 valid GTINs are valuable: only about 23.7% of them also appear in PXBox. By category, stationery (13.9%), kitchenware (21.3%), pet supplies (23.3%) and food categories tend to have better coverage than electronics/furniture (0-5%).
Special note on invalid_format barcodes (24,441, or 20.4% of products): these may contain recoverable barcode data that only needs reformatting. If most of them prove recoverable, this could add up to ~20 percentage points of coverage for Carrefour.
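The validation categories in the Carrefour table (invalid format, invalid length, invalid checksum) and the proposed normalization step can be sketched as below. This is a minimal sketch, not the logic in barcode_coverage_analysis.py: the category definitions and the cleanup rules in the hypothetical `normalize_barcode` (a trailing `.0` from float conversion, a leading apostrophe from spreadsheet exports, embedded spaces or hyphens) are assumptions about what the invalid values look like. Only the GS1 mod-10 check-digit rule is standard.

```python
import re

# GTIN-8, GTIN-12 (UPC-A), GTIN-13 (EAN-13), GTIN-14
GTIN_LENGTHS = {8, 12, 13, 14}


def gs1_check_digit_ok(code: str) -> bool:
    """GS1 mod-10: weight the data digits 3,1,3,1,... starting from the right."""
    digits = [int(c) for c in code]
    body, check = digits[:-1], digits[-1]
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def classify_barcode(raw) -> str:
    """Assumed category definitions mirroring the Carrefour table."""
    if raw is None or str(raw).strip() == "":
        return "missing"
    code = str(raw).strip()
    if not re.fullmatch(r"[0-9]+", code):
        return "invalid_format"
    if len(code) not in GTIN_LENGTHS:
        return "invalid_length"
    if not gs1_check_digit_ok(code):
        return "invalid_checksum"
    return "valid"


def normalize_barcode(raw) -> str:
    """Hypothetical cleanup rules for recovering invalid_format values."""
    code = str(raw).strip().lstrip("'")  # spreadsheet text-prefix apostrophe
    code = re.sub(r"\.0+$", "", code)     # float artifact, e.g. '4006381333931.0'
    return re.sub(r"[\s-]", "", code)     # embedded spaces or hyphens


for raw in ["4006381333931", "4006381333931.0", "400-6381-333931", "12345"]:
    print(raw, classify_barcode(raw), classify_barcode(normalize_barcode(raw)))
```

Values that were exported in scientific notation (e.g. 4.00638E+12) have already lost digits and cannot be recovered this way. Re-running `classify_barcode` on the normalized values would show how many of the 24,441 records actually move to valid.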
| Metric | Value |
|---|---|
| Total products | 124,723 |
| Valid GTIN rate | 0.0% |
| ISBN barcodes (books) | 44,005 (35.3%) |
| Brand data | 99.1% |
| Category data | 100% |
| Median price | NT$441 |
Assessment: Cosmed's crawled data has ZERO product GTINs. All 44,005 "barcodes" are ISBNs from a books section. The most frequent groupings are luxury brands (Michael Kors, Hermes, Burberry, Chanel) and books, suggesting the scrape captured a marketplace/department-store view rather than the core drugstore assortment. Barcode collection would require an entirely new crawling strategy or cross-channel matching.
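ISBNs and store-internal codes can be separated from trade-item GTINs by prefix: ISBN-13s are EAN-13 codes in the 978/979 range, and GS1 reserves the 020-029, 040-049 and 200-299 prefixes for restricted (in-store) circulation. The sketch below assumes its input has already passed the check-digit test and covers only these main ranges (coupon and refund-receipt prefixes are ignored). The report does not say which rules it used, so treat this as an illustration.

```python
def classify_gtin_prefix(gtin: str) -> str:
    """Classify a check-digit-valid GTIN-12/13 by its GS1 prefix (simplified)."""
    if len(gtin) == 12:            # UPC-A: left-pad with a zero to get a GTIN-13
        gtin = "0" + gtin
    if len(gtin) != 13:
        return "unclassified"      # GTIN-8/14 handling omitted in this sketch
    prefix = int(gtin[:3])
    if prefix in (978, 979):
        return "isbn"              # Bookland range, e.g. Cosmed's books section
    if prefix == 977:
        return "issn"              # serial publications
    if 20 <= prefix <= 29 or 40 <= prefix <= 49 or 200 <= prefix <= 299:
        return "store_internal"    # GS1 restricted-circulation ranges
    return "trade_item"


for code in ["9780306406157", "4006381333931", "2012345678903", "212345678901"]:
    print(code, classify_gtin_prefix(code))
```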
| Metric | Value |
|---|---|
| Total products | 27,656 |
| Valid GTIN rate | 0.0% |
| Brand data | 99.2% |
| Category data | 100% |
| Median price | NT$199 |
Assessment: Zero barcode data. However, brand names and a detailed three-level category hierarchy are available, which could support fuzzy matching to other channels. Categories are heavily focused on beauty/personal care, which are the right FMCG categories for a barcode project, but the barcodes themselves must be sourced externally.
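One way to use Poya's brand and name fields is brand-blocked fuzzy name matching against a GTIN-bearing seed catalog (for example, the PX-family channels). The sketch below uses the standard-library `difflib.SequenceMatcher`. The record layout, the case/whitespace normalization, the 0.8 threshold and the placeholder GTINs are illustrative assumptions, not values tuned in this analysis.

```python
from difflib import SequenceMatcher


def _norm(text: str) -> str:
    """Lowercase and drop all whitespace so '500 ml' and '500ml' compare equal."""
    return "".join(text.lower().split())


def best_gtin_match(brand, name, seed, threshold=0.8):
    """Return (gtin, score) for the closest same-brand seed product, or (None, score).

    seed: list of (gtin, brand, name) tuples from channels that carry GTINs.
    """
    best_gtin, best_score = None, 0.0
    target = _norm(name)
    for gtin, seed_brand, seed_name in seed:
        if _norm(seed_brand) != _norm(brand):
            continue  # brand blocking: fewer comparisons and fewer false positives
        score = SequenceMatcher(None, target, _norm(seed_name)).ratio()
        if score > best_score:
            best_gtin, best_score = gtin, score
    return (best_gtin, best_score) if best_score >= threshold else (None, best_score)


seed = [
    ("GTIN-A", "Acme", "Moisturizing Shampoo 500ml"),      # placeholder records
    ("GTIN-B", "Acme", "Moisturizing Conditioner 500ml"),
]
print(best_gtin_match("ACME", "moisturizing shampoo 500 ml", seed))  # ('GTIN-A', 1.0)
```

Brand blocking does not help for 7-ELEVEN, which has no brand data. There, name-only matching would need a stricter threshold or manual review.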
| Metric | Value |
|---|---|
| Total products | 12,645 |
| Valid GTIN rate | 0.0% |
| Brand data | 0% |
| Category data | 100% |
| Median price | NT$999 |
Assessment: Zero barcode AND zero brand data, giving maximum difficulty for matching. The crawled catalog appears to be the ibon/online shop (jewelry, home appliances, gourmet food) rather than in-store convenience items.
Momo and Watsons: no data in channel_master.csv. These channels were either not crawled or are stored under different channel codes. Verify whether they exist under alternative names (e.g., shopee_watsons has 2,164 products, but also zero GTINs).
| Metric | Value |
|---|---|
| Total unique valid GTINs | 160,612 |
| GTINs in the channel-count breakdown below (sum of rows; base for %) | 142,207 |
| GTINs in 1 channel only | 120,602 (84.8%) |
| GTINs in 2 channels | 16,065 (11.3%) |
| GTINs in 3 channels | 4,037 (2.8%) |
| GTINs in 4 channels | 1,327 (0.9%) |
| GTINs in 5+ channels | 176 (0.1%) |
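The channel-count distribution can be reproduced from (channel, GTIN) pairs by collecting the set of channels per GTIN and bucketing the set sizes, with 5 and above merged into "5+". A minimal sketch, assuming the input already contains only valid GTINs. The input shape and toy data are hypothetical, and percentages are relative to the GTINs present in the input.

```python
from collections import Counter


def channel_count_distribution(pairs, cap=5):
    """Map 'number of channels carrying a GTIN' (capped at `cap`, i.e. '5+') to (count, %)."""
    channels_per_gtin = {}
    for channel, gtin in pairs:
        channels_per_gtin.setdefault(gtin, set()).add(channel)  # a set ignores repeat listings
    buckets = Counter(min(len(chs), cap) for chs in channels_per_gtin.values())
    total = sum(buckets.values())
    return {k: (buckets[k], round(100.0 * buckets[k] / total, 1)) for k in sorted(buckets)}


pairs = [("pxbox", "A"), ("pxgo", "A"), ("pxbox", "B"), ("pxbox", "B"),
         ("pxbox", "C"), ("pxgo", "C"), ("carrefour", "C")]
print(channel_count_distribution(pairs))  # {1: (1, 33.3), 2: (1, 33.3), 3: (1, 33.3)}
```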
| Channel A | Channel B | Shared GTINs | % of Smaller Set |
|---|---|---|---|
| PXBox | PXMart_Mega | 8,167 | 84.0% |
| PXBox | PXGo | 6,086 | 66.5% |
| PXBox | Carrefour | 6,522 | 23.7% |
| PXMart_Mega | Carrefour | 2,204 | 22.7% |
| PXGo | Carrefour | 2,000 | 21.9% |
| iherb | PXBox | 96 | 0.2% |
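The "% of Smaller Set" column divides the shared count by the size of the smaller of the two GTIN sets, so a small catalog nested inside a large one scores near 100%. A sketch using set intersections, assuming each channel's valid GTINs have already been deduplicated into a Python set (the toy data is hypothetical):

```python
from itertools import combinations


def pairwise_overlap(gtins_by_channel):
    """Return (channel_a, channel_b, shared, % of smaller set), largest overlap first."""
    rows = []
    for a, b in combinations(sorted(gtins_by_channel), 2):
        set_a, set_b = gtins_by_channel[a], gtins_by_channel[b]
        shared = len(set_a & set_b)
        smaller = min(len(set_a), len(set_b))
        pct = round(100.0 * shared / smaller, 1) if smaller else 0.0
        rows.append((a, b, shared, pct))
    return sorted(rows, key=lambda row: row[2], reverse=True)


toy = {"pxbox": {"1", "2", "3", "4"}, "pxgo": {"1", "2", "9"}, "carrefour": {"3", "7"}}
for row in pairwise_overlap(toy):
    print(row)
```

Normalizing by the smaller set highlights containment (PXMart_Mega inside PXBox). It does not show how much the larger channel would gain from the smaller one.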
Key insight: The PX family (PXBox + PXGo + PXMart_Mega) forms a highly overlapping barcode cluster. Their combined unique GTINs cover the broadest domestic FMCG barcode set. Carrefour adds ~20K additional unique GTINs. iherb operates in a nearly independent barcode universe (supplements/health foods).
| Rank | Category | Products | Coverage | Main Product Sources |
|---|---|---|---|---|
| 1 | Analgesics (止痛藥) | 21 | 90.5% | taaze, iherb |
| 2 | Cold Medicine (感冒藥) | 18 | 88.9% | taaze, jpmed |
| 3 | Health Food (健康食品) | 54,766 | 52.4% | iherb(25K) dominates |
| 4 | Candy (糖果) | 5,107 | 41.4% | pchome, pxmart_mega |
| 5 | Sanitary Protection (衛生用品) | 4,942 | 40.0% | pchome, jpmed, carrefour |
| Rank | Category | Products | Coverage | Challenge |
|---|---|---|---|---|
| 59 | Facial Tissue (面紙) | 2,094 | 8.3% | Dominated by pchome (no barcode) |
| 58 | Battery (電池) | 16,446 | 8.7% | 59% from pchome |
| 57 | Cigarette (香菸) | 221 | 10.0% | Regulatory restrictions |
| 56 | General Skin Care (一般肌膚保養) | 17,306 | 10.9% | Dominated by pchome and etmall (no barcode) |
| 55 | Essence Drink (精華飲料) | 3,761 | 12.5% | |
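Per-category coverage and main sources, as in the two ranking tables, amount to a group-by over product records. In this sketch the record keys (`category`, `channel`, `has_valid_gtin`) are hypothetical. "Main sources" is taken to mean the channels contributing the most products to the category, which is an interpretation rather than a definition given in the report.

```python
from collections import Counter, defaultdict


def category_coverage(records, top_sources=2):
    """Rank categories by % of products with a valid GTIN.

    records: iterable of dicts with hypothetical keys 'category', 'channel', 'has_valid_gtin'.
    Returns (category, products, coverage %, main product sources) tuples, best first.
    """
    totals, covered = Counter(), Counter()
    sources = defaultdict(Counter)
    for rec in records:
        cat = rec["category"]
        totals[cat] += 1
        covered[cat] += 1 if rec["has_valid_gtin"] else 0
        sources[cat][rec["channel"]] += 1  # product count per channel, not barcode count
    rows = [
        (cat, n, round(100.0 * covered[cat] / n, 1),
         [ch for ch, _ in sources[cat].most_common(top_sources)])
        for cat, n in totals.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


sample = [
    {"category": "Candy", "channel": "pchome", "has_valid_gtin": False},
    {"category": "Candy", "channel": "pchome", "has_valid_gtin": False},
    {"category": "Candy", "channel": "pxmart_mega", "has_valid_gtin": True},
    {"category": "Battery", "channel": "pchome", "has_valid_gtin": False},
]
for row in category_coverage(sample):
    print(row)
```

A production ranking would probably also need a minimum-size filter. The top two categories above have only 21 and 18 products, so their coverage rates are far less stable than Health Food's.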
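Difficulty Score (0-100) = 0.40 × Barcode_Gap + 0.20 × GTIN_Gap + 0.15 × Brand_Gap + 0.15 × Category_Diversity + 0.10 × Dedup_Issues, where each component is on a 0-100 scale:
| Component | Weight | Definition |
|---|---|---|
| Barcode_Gap | 0.40 | % of products without a valid barcode |
| GTIN_Gap | 0.20 | % of products without a valid GTIN specifically |
| Brand_Gap | 0.15 | % of products without brand data (affects matching ability) |
| Category_Diversity | 0.15 | Category diversity, normalized to 0-100 |
| Dedup_Issues | 0.10 | 100 - dedup_ratio |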
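The weighted sum translates directly into code. This is a minimal sketch that assumes every input is already on a 0-100 scale and that dedup_ratio is the percentage of records remaining after deduplication; the report gives only "100 - dedup_ratio". The normalization of Category_Diversity is not specified, so the sketch takes that value as given.

```python
# Weights from the difficulty formula; they sum to 1.0, so the score stays within 0-100.
WEIGHTS = {
    "barcode_gap": 0.40,
    "gtin_gap": 0.20,
    "brand_gap": 0.15,
    "category_diversity": 0.15,
    "dedup_issues": 0.10,
}


def difficulty_score(barcode_gap, gtin_gap, brand_gap, category_diversity, dedup_ratio):
    """All arguments are percentages (0-100); returns the score rounded to 0.1."""
    components = {
        "barcode_gap": barcode_gap,
        "gtin_gap": gtin_gap,
        "brand_gap": brand_gap,
        "category_diversity": category_diversity,
        "dedup_issues": 100.0 - dedup_ratio,  # Dedup_Issues = 100 - dedup_ratio
    }
    return round(sum(WEIGHTS[name] * value for name, value in components.items()), 1)


print(difficulty_score(100, 100, 100, 100, 0))
```

Because the weights sum to 1.0 and each component is capped at 100, a score of exactly 100 requires every component to be at its maximum.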
| Rank | Channel | Difficulty | Barcode Gap | GTIN Gap | Products | Assessment |
|---|---|---|---|---|---|---|
| 5 | 7-ELEVEN | 100.0 | 100% | 100% | 12,645 | Maximum difficulty |
| 17 | Poya (寶雅) | 85.1 | 100% | 100% | 27,656 | Very hard, has brand names |
| 27 | foodpanda | 84.0 | 99% | 99% | 5,606 | Very hard |
| 34 | Cosmed (康是美) | 75.2 | 100% | 100% | 124,723 | Hard, has brands + ISBNs |
| 46 | UberEats | 70.0 | 100% | 100% | 1,657 | Hard but small catalog |
| 49 | PXBox (全聯) | 60.0 | 73% | 73% | 191,674 | Medium, has 51K GTINs |
| 51 | Carrefour (家樂福) | 58.2 | 76% | 76% | 119,863 | Medium, has 28K GTINs |
| 55 | PXGo (全聯) | 34.6 | 8% | 8% | 9,906 | Low difficulty |
| 57 | PXMart Mega | 27.1 | 5% | 5% | 10,200 | Very low difficulty |
| Difficulty Tier | Score Range | Channels | Total Products | Description |
|---|---|---|---|---|
| Extreme | 85-100 | 20 | 1,223,424 | Zero barcode, may lack brand |
| Hard | 70-85 | 14 | 153,556 | Zero/near-zero barcode, has some metadata |
| Medium | 40-70 | 7 | 367,044 | Partial barcode, needs gap-filling |
| Low | 20-40 | 6 | 26,988 | Mostly covered, minor gaps |
| Very Low | 0-20 | 20 | 719,065 | Excellent barcode data |
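Assigning a channel to a tier requires a rule for the shared boundaries (85, 70, 40, 20), which the table leaves implicit. This sketch assumes each lower bound is inclusive. With that choice, UberEats (70.0) lands in Hard, matching its assessment in the difficulty table. The helper names are hypothetical.

```python
from collections import defaultdict

# (inclusive lower bound, tier name); boundary handling is an assumption.
TIERS = [(85.0, "Extreme"), (70.0, "Hard"), (40.0, "Medium"), (20.0, "Low"), (0.0, "Very Low")]


def difficulty_tier(score: float) -> str:
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"difficulty score out of range: {score}")
    return next(name for lower, name in TIERS if score >= lower)


def tier_summary(channels):
    """channels: iterable of (name, score, product_count) -> {tier: [n_channels, n_products]}."""
    summary = defaultdict(lambda: [0, 0])
    for _name, score, products in channels:
        entry = summary[difficulty_tier(score)]
        entry[0] += 1
        entry[1] += products
    return dict(summary)


key_channels = [
    ("7-ELEVEN", 100.0, 12_645), ("Poya", 85.1, 27_656), ("UberEats", 70.0, 1_657),
    ("PXBox", 60.0, 191_674), ("PXGo", 34.6, 9_906), ("PXMart Mega", 27.1, 10_200),
]
print(tier_summary(key_channels))
```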
Tier-based pricing is essential. A flat per-channel rate would severely underprice hard channels and overprice easy ones: among the key channels above, the difficulty score ranges from 27.1 (PXMart Mega) to 100 (7-ELEVEN).
The "Big 4" problem: PChome (980K), ETmall (425K), Cosmed (125K), and TRPlus (72K) together account for 1.6M products with ZERO product GTINs (Cosmed's only barcodes are ISBNs). Any contract promising barcode coverage for these channels requires non-scraping methods (image recognition, manufacturer databases, manual lookup, or cross-channel name matching).
PX family is the best starting point: PXGo + PXMart_Mega combined give ~15K unique GTINs at >92% valid rate -- an excellent seed dataset for FMCG barcode matching against other channels.
Carrefour's 24K invalid-format barcodes are a potential "quick win": if they prove recoverable through format normalization, they could add up to ~20 percentage points of Carrefour coverage at minimal cost.
Cross-channel matching potential: 85% of GTINs appear in only one channel, and the overlap between the PX family and Carrefour is only ~22%. Apart from the heavily overlapping PX-family channels, each retailer therefore adds a substantial inventory of unique barcodes.
| Tier | Channels | Suggested Multiplier | Rationale |
|---|---|---|---|
| 1 (Easy) | PXGo, PXMart_Mega, iherb, daiso, tomods | 1.0x | Direct scraping yields >90% |
| 2 (Moderate) | PXBox, Carrefour, rakuten, treebuy | 1.5-2.0x | Mix of scraped + gap-fill needed |
| 3 (Hard) | Cosmed, Poya, foodpanda | 3.0-4.0x | Requires cross-channel matching or manual |
| 4 (Extreme) | PChome, ETmall, 7-ELEVEN, trplus | 5.0-8.0x | No barcode source; requires image/OCR/manual |
| N/A | Momo, Watsons | TBD | Not yet crawled; estimate based on platform type |
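Once a base price is fixed, applying the multipliers is simple arithmetic. The report does not set a base price or say whether it applies per channel or per record. The `base_price` below is therefore a hypothetical per-channel figure. The channel-to-tier mapping is copied from the table above, with Momo and Watsons left out until they are crawled.

```python
# Suggested multiplier ranges per pricing tier (low, high), from the table above.
MULTIPLIERS = {1: (1.0, 1.0), 2: (1.5, 2.0), 3: (3.0, 4.0), 4: (5.0, 8.0)}

PRICING_TIER = {
    "PXGo": 1, "PXMart_Mega": 1, "iherb": 1, "daiso": 1, "tomods": 1,
    "PXBox": 2, "Carrefour": 2, "rakuten": 2, "treebuy": 2,
    "Cosmed": 3, "Poya": 3, "foodpanda": 3,
    "PChome": 4, "ETmall": 4, "7-ELEVEN": 4, "trplus": 4,
}


def quote_range(channels, base_price):
    """Return the (low, high) contract total for the channels at a hypothetical base price."""
    low = high = 0.0
    for channel in channels:
        m_low, m_high = MULTIPLIERS[PRICING_TIER[channel]]  # KeyError for untiered channels
        low += base_price * m_low
        high += base_price * m_high
    return low, high


print(quote_range(["PXGo", "Carrefour", "PChome"], base_price=100_000))  # (750000.0, 1100000.0)
```

Unknown channels raise a KeyError instead of silently defaulting to Tier 1. A silent default would underprice exactly the channels most likely to be hard.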
Report generated from barcode_coverage_analysis.py. Output files: barcode_coverage_analysis.json, channel_difficulty_scores.csv