HiveBrain v1.2.0

Calculate questions per day on CodeGolf.SE

Submitted by: @import:stackexchange-codereview

Problem

I wrote a short script in Python 3 that connects to the Stack Exchange API, gets all questions on Programming Puzzles & Code Golf over the past two weeks, and determines the average number of questions per day as well as the average number of answers per question.

The number of questions per day is intended to match that on Area 51, which it does. Obviously it's much easier to just scrape Area 51 directly, but I wanted to figure it out myself for practice.

I'm not an expert with Python or with web APIs, so I was hoping you fine Code Review folks can help me improve my practices.

import requests, datetime, time

def seconds_since_epoch(dt):
    epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
    return int((dt - epoch).total_seconds())

today = datetime.datetime.now(datetime.timezone.utc)

params = {
    "site": "codegolf",
    "fromdate": seconds_since_epoch(today - datetime.timedelta(days=14)),
    "todate": seconds_since_epoch(today),
    "pagesize": 100,
    "page": 1
}

base_url = "https://api.stackexchange.com/2.2"

results = []

while True:
    req = requests.get(base_url + "/questions", params=params)
    contents = req.json()
    results.extend(contents["items"])
    if not contents["has_more"]:
        break
    if "backoff" in contents:
        time.sleep(contents["backoff"])
    params["page"] += 1

questions_per_day = len(results) / 14
answers_per_question = sum([q["answer_count"] for q in results]) / len(results)

print("Over the past 2 weeks, PPCG has had...")
print(round(questions_per_day, 1), "questions per day")
print(round(answers_per_question, 1), "answers per question")


My approach is to build the query using a dict and make the request to the API using the requests module. I set the page size to the maximum to reduce the number of requests made so that the daily quota isn't exhausted quite so fast.
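As an aside, requests builds the query string from that dict in essentially the same way as the standard library's urllib.parse.urlencode. A quick sketch of what the encoded query looks like, using only the standard library (the dict values here mirror the script's params):

```python
from urllib.parse import urlencode

# the same shape of params dict the script passes to requests.get
params = {"site": "codegolf", "pagesize": 100, "page": 1}
print(urlencode(params))  # site=codegolf&pagesize=100&page=1
```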

The code is hosted on GitHub, should you want to fork and adapt it for your own purposes.

Solution

Your seconds_since_epoch function has a built-in equivalent: the datetime.timestamp() method. It returns a float, so wrap it in int() if you need whole seconds.
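For an aware datetime, timestamp() computes exactly what the hand-rolled helper does, as this small sketch shows:

```python
from datetime import datetime, timezone, timedelta

now = datetime.now(timezone.utc)
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# the built-in method agrees with the manual epoch subtraction
assert int(now.timestamp()) == int((now - epoch).total_seconds())

# a ready-made "fromdate" value for the API params
fromdate = int((now - timedelta(days=14)).timestamp())
```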

Your namespaces would be cleaner if you did from datetime import datetime, timezone.

You use a base_url variable but then concatenate strings onto it by hand. Either hard-code the full URL, or join the base URL and the path properly with urllib.parse.urljoin.
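One urljoin pitfall worth knowing: the base URL must end with a slash, or urljoin replaces the base's final path segment instead of appending to it:

```python
from urllib.parse import urljoin

# with a trailing slash, the relative path is appended
print(urljoin("https://api.stackexchange.com/2.2/", "questions"))
# -> https://api.stackexchange.com/2.2/questions

# without it, urljoin replaces the last segment ("2.2")
print(urljoin("https://api.stackexchange.com/2.2", "questions"))
# -> https://api.stackexchange.com/questions
```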

results would be better named questions, since that is what it holds.

In sum([q["answer_count"] for q in results]) the square brackets are superfluous and inefficient: sum accepts a generator expression directly, so there is no need to build an intermediate list.
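To illustrate with some hypothetical answer counts, the generator expression gives the same total without materializing a list first:

```python
answer_counts = [3, 0, 5, 2]  # hypothetical answer_count values

total_list = sum([c for c in answer_counts])  # builds a throwaway list
total_gen = sum(c for c in answer_counts)     # feeds sum lazily instead
assert total_list == total_gen == 10
```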

Instead of printing three times in a row, create a multiline format string and print once.

You never wrap the API calls in a function that returns the questions, and you don't define a main entry point. I suggest doing the printing in a main block that calls a function which fetches and returns the question information.

This is how I would program it:

import requests
import time
from datetime import datetime, timezone, timedelta

def get_question_info(site, start, stop):
    API_URL = "https://api.stackexchange.com/2.2/questions"
    req_params = {
        "site": site,
        "fromdate": int(start.timestamp()),
        "todate": int(stop.timestamp()),
        "pagesize": 100,
        "page": 1
    }

    questions = []
    while True:
        req = requests.get(API_URL, params=req_params)
        contents = req.json()
        questions.extend(contents["items"])

        if not contents["has_more"]:
            break
        req_params["page"] += 1

        if "backoff" in contents:
            time.sleep(contents["backoff"])

    return questions

def get_area51_estimate(site):
    now = datetime.now(timezone.utc)
    fortnight_ago = now - timedelta(days=14)
    questions = get_question_info(site, fortnight_ago, now)
    avg_questions = len(questions) / 14
    avg_answers = sum(q["answer_count"] for q in questions) / len(questions)
    return avg_questions, avg_answers

if __name__ == "__main__":
    msg = """Over the past 2 weeks, PPCG has had...
{:.1f} questions per day
{:.1f} answers per question"""
    print(msg.format(*get_area51_estimate("codegolf")))


Context

StackExchange Code Review Q#120468, answer score: 14
