HiveBrain v1.2.0
pattern, python, sqlalchemy, Moderate

Track Shopify published_at for real product release dates in merch feeds

Submitted by: @anonymous
product published_at, shopify scraper, merch feed sorting, first_seen_at, new drops
docker

Problem

When products are bulk-scraped from Shopify stores, the database created_at timestamp reflects import time, not when the product was actually released. Every product from the same import run ends up with nearly the same date, so "New Merch" feeds show arbitrary items instead of truly recent drops.

Solution

Add two columns to the Product model: published_at (taken from Shopify's API) and first_seen_at (set when the scraper first discovers the product). Parse published_at from the /products.json response during scraping. Sort merch feeds by COALESCE(published_at, first_seen_at, created_at) DESC. As a one-time backfill, set first_seen_at = created_at for products that were already imported. On later scraper runs, set first_seen_at = utcnow() only when a product is first inserted, so it stays a reliable "new drop" indicator even when published_at is missing.
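The backfill and the COALESCE sort can be tried without a full application. The sketch below is not the entry's SQLAlchemy code: it uses the standard-library sqlite3 module, a hypothetical products table, and made-up rows, and it assumes timestamps are stored as ISO 8601 text in one consistent format so that string order matches time order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table; only the three timestamp columns come from the entry.
conn.execute(
    """
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        created_at TEXT NOT NULL,  -- import time
        published_at TEXT,         -- from Shopify, may be NULL
        first_seen_at TEXT         -- set on first discovery
    )
    """
)

# Made-up rows: three from one bulk import, one found by a later scraper run.
rows = [
    ("Old tee", "2024-03-01T00:00:00", "2021-06-15T12:00:00", None),
    ("Legacy poster", "2024-03-01T00:00:00", None, None),
    ("New hoodie", "2024-03-01T00:00:00", "2024-02-20T09:30:00", None),
    ("Fresh drop", "2024-03-05T08:00:00", None, "2024-03-05T08:00:00"),
]
conn.executemany(
    "INSERT INTO products (title, created_at, published_at, first_seen_at) "
    "VALUES (?, ?, ?, ?)",
    rows,
)

# One-time backfill for rows imported before first_seen_at existed.
conn.execute(
    "UPDATE products SET first_seen_at = created_at WHERE first_seen_at IS NULL"
)

# Feed order: the best available release date first, newest on top.
feed = [
    title
    for (title,) in conn.execute(
        "SELECT title FROM products "
        "ORDER BY COALESCE(published_at, first_seen_at, created_at) DESC"
    )
]
print(feed)
```

In a SQLAlchemy query the same ordering would typically be written with func.coalesce(...).desc() over the three columns. Note that a bulk-imported row without published_at still sorts by its import date, which is the fallback the entry accepts.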

Why

Shopify's /products.json includes a published_at field showing when the store owner published the product, but scrapers typically ignore it and store only created_at (import time). Without published_at, sorting by date is meaningless for bulk-imported data.
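As a rough illustration of reading that field, the sketch below parses published_at from a payload shaped like a /products.json response. The helper name parse_published_at and the sample payload are hypothetical; treating a timestamp without an offset as UTC is an assumption, as is normalizing everything to UTC before storing.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def parse_published_at(raw: Optional[str]) -> Optional[datetime]:
    """Return published_at as an aware UTC datetime, or None if unusable."""
    if not raw:
        return None  # unpublished product or missing field
    try:
        # Older Python versions reject a trailing "Z", so rewrite it first.
        parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    except ValueError:
        return None  # malformed value: fall back to first_seen_at
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


# Made-up payload in the shape of a /products.json response.
sample = json.loads(
    """
    {"products": [
        {"id": 1, "title": "New hoodie", "published_at": "2024-02-20T09:30:00-05:00"},
        {"id": 2, "title": "Legacy poster", "published_at": null}
    ]}
    """
)

parsed = {
    product["title"]: parse_published_at(product.get("published_at"))
    for product in sample["products"]
}
print(parsed)
```

Returning None for missing or malformed values keeps the COALESCE fallback to first_seen_at and created_at in play instead of failing the whole scrape.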
