pattern, python, fastapi, Major

Multi-artist label store scraping pattern for Shopify mega-stores

Submitted by: @anonymous
label store scraper, vendor matching, normalized name, since_id polling, Warner Music, Rockabilia, MerchNow, Impericon, multi-artist store

Problem

Many artists show 0 merch products because their merch is sold through label mega-stores (Warner Music, Rockabilia, MerchNow, Impericon) rather than through individual Shopify stores. We need to scrape these multi-artist stores and match each product to an existing artist via its vendor field.

Solution

Create a label_stores table that tracks per-store sync state (url, last_product_id, last_sync_at), then:

  • Scrape each store via /products.json with pagination.
  • Match each product's vendor field to an artist name using normalized fuzzy matching: case-insensitive, leading 'the ' stripped, accents and punctuation removed.
  • Use a composite external_id (store_url:product_id) for deduplication.
  • For incremental polling, track since_id per store, not per artist.
  • Run the bulk scrape script first; after that, the scheduler polls every 12 hours.

Key stores: store.warnermusic.com (~1,622 products), rockabilia.com (~24,750), merchnow.com (~2,153), impericon.com (~24,750).

Files: models/label_store.py, scripts/scrape_label_stores.py, and the poll_label_stores job in scheduler.py.
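
The per-store since_id loop can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/scrape_label_stores.py: the LabelStore dataclass stands in for the label_stores row, poll_store and fetch_page are hypothetical names, and it assumes each store's /products.json honors the limit and since_id query parameters and returns only products with a higher ID.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class LabelStore:
    """Hypothetical stand-in for a label_stores row (fields from the entry)."""
    url: str
    last_product_id: int = 0
    last_sync_at: Optional[datetime] = None


# fetch_page(url, params) -> list of product dicts. It is injected so the
# sketch does not depend on any particular HTTP client.
FetchPage = Callable[[str, dict], list]


def poll_store(store: LabelStore, fetch_page: FetchPage,
               limit: int = 250, max_pages: int = 200) -> list:
    """Collect products newer than store.last_product_id for one store."""
    new_products = []
    since_id = store.last_product_id
    for _ in range(max_pages):
        batch = fetch_page(f"{store.url}/products.json",
                           {"limit": limit, "since_id": since_id})
        if not batch:
            break  # nothing newer than since_id is left
        new_products.extend(batch)
        # Advance the cursor to the highest product ID seen so far.
        since_id = max(product["id"] for product in batch)
    # Sync state is stored once per store, not per artist.
    store.last_product_id = since_id
    store.last_sync_at = datetime.now(timezone.utc)
    return new_products
```

In a real job, fetch_page would wrap an HTTP GET and return the "products" array from the JSON response; the scheduler would call poll_store once per label_stores row every 12 hours.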

Why

Label mega-stores use the Shopify vendor field to identify which artist each product belongs to. Normalizing both the vendor name and our artist.name (lowercase, strip a leading 'The', remove accents and punctuation) lets us match hundreds of artists automatically, with no manual configuration per artist.
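
One way the normalization could look, using only the standard library. The function names (normalize_name, build_artist_index, match_vendor) are illustrative assumptions, and this sketch only does an exact lookup on the normalized key; any looser fuzzy comparison on top of it is not shown.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase, drop accents, strip a leading 'the ', remove punctuation."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    lowered = no_accents.lower().strip()
    # Strip a leading "the " only (not "the" inside the name).
    lowered = re.sub(r"^the\s+", "", lowered)
    # Remove everything that is not a letter, digit, or whitespace.
    cleaned = re.sub(r"[^\w\s]", "", lowered).replace("_", "")
    # Collapse any runs of whitespace left behind.
    return re.sub(r"\s+", " ", cleaned).strip()


def build_artist_index(artists):
    """Map normalized artist name -> artist id from (id, name) pairs."""
    return {normalize_name(name): artist_id for artist_id, name in artists}


def match_vendor(vendor: str, index: dict):
    """Return the artist id for a vendor string, or None if unmatched."""
    return index.get(normalize_name(vendor))
```

Building the index once per sync run and looking vendors up in it keeps matching cheap even for stores with tens of thousands of products.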

Gotchas

  • Use a composite external_id (store_url:product_id), not just product_id: different stores can have the same numeric IDs.
  • For label stores, since_id is tracked per store, not per artist.
  • Rockabilia and Impericon have ~25K products each, so they need a high max_pages and polite delays between requests.
  • Vendor names may not exactly match artist names, so normalize both sides before comparing.
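
The first and third gotchas can be illustrated with a small sketch. The helper names (make_external_id, pages_needed, dedupe) are hypothetical, keying on the store host rather than the full URL is an assumption, and the page size of 250 assumes that is the largest limit the stores' /products.json accepts.

```python
import math
from urllib.parse import urlparse


def make_external_id(store_url: str, product_id: int) -> str:
    """Build a store-scoped ID such as 'rockabilia.com:123'."""
    # Assumption: the host identifies the store. Fall back to the raw
    # string when the URL has no scheme (urlparse then leaves netloc empty).
    host = urlparse(store_url).netloc or store_url
    return f"{host.rstrip('/').lower()}:{product_id}"


def pages_needed(product_count: int, page_size: int = 250) -> int:
    """Lower bound for max_pages on a full scrape of one store."""
    return math.ceil(product_count / page_size)


def dedupe(rows, seen=None):
    """Keep the first occurrence of each (store_url, product) pair."""
    seen = set() if seen is None else seen
    fresh = []
    for store_url, product in rows:
        external_id = make_external_id(store_url, product["id"])
        if external_id in seen:
            continue  # same store and same product ID: already stored
        seen.add(external_id)
        fresh.append((external_id, product))
    return fresh
```

For a store with ~24,750 products, pages_needed(24750) is 99, so max_pages has to be at least that for the bulk scrape; with a one-second pause between requests, a full pass over such a store takes at least a minute and a half.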
