pattern · python · sqlalchemy · Moderate
Track Shopify published_at for real product release dates in merch feeds
product published_at · shopify scraper · merch feed sorting · first_seen_at · new drops · docker
Problem
When products are bulk-scraped from Shopify stores, the database's created_at timestamp records import time, not when the product was actually released. A bulk import stamps every product with nearly the same created_at, so a "New Merch" feed sorted by that column shows an effectively arbitrary selection instead of the most recent drops.
Solution
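Add two nullable columns to the Product model and sort on them:
1. published_at: copied from the published_at field of each product in the store's /products.json response during scraping (null when the store does not provide it).
2. first_seen_at: set to the current UTC time only when the scraper inserts a product it has never seen before, and never overwritten on later runs. Prefer datetime.now(timezone.utc) over datetime.utcnow(), which is deprecated as of Python 3.12.
3. When the columns are added, backfill existing rows with first_seen_at = created_at, since import time is the best available estimate for products already in the database.
4. Sort merch feeds with ORDER BY COALESCE(published_at, first_seen_at, created_at) DESC.
Because first_seen_at is set only at discovery time, it stays a reliable "new drop" indicator for products scraped after the migration, even when published_at is missing.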
Why
Shopify's /products.json includes a published_at field recording when the store owner published the product, but scrapers typically ignore it and store only the local created_at (import time). That local column is not the same as the created_at field in Shopify's own response. Without published_at, sorting bulk-imported data by date is meaningless.
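A sketch of reading published_at out of one /products.json page using only the standard library. It assumes the response body is a {"products": [...]} object carrying an id and a published_at string (or null) per product; parse_published_at and extract_release_dates are hypothetical helper names. Values are normalized to UTC so they compare cleanly against first_seen_at and created_at.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def parse_published_at(raw: Optional[str]) -> Optional[datetime]:
    """Convert Shopify's ISO 8601 published_at string to an aware UTC datetime.

    Returns None for unpublished products (null published_at) or unparseable values.
    """
    if not raw:
        return None
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on
    if raw.endswith("Z"):
        raw = raw[:-1] + "+00:00"
    try:
        parsed = datetime.fromisoformat(raw)
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption: treat offset-less values as UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def extract_release_dates(payload: str) -> dict[int, Optional[datetime]]:
    """Map Shopify product id -> published_at (UTC) for one /products.json page."""
    data = json.loads(payload)
    return {
        product["id"]: parse_published_at(product.get("published_at"))
        for product in data.get("products", [])
    }
```

Returning None instead of raising keeps one malformed or unpublished product from aborting a whole scrape; the COALESCE sort then falls back to first_seen_at for that row.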