Calculate questions per day on CodeGolf.SE
Problem
I wrote a short script in Python 3 that connects to the Stack Exchange API, gets all questions on Programming Puzzles & Code Golf over the past two weeks, and determines the average number of questions per day as well as the average number of answers per question.
The number of questions per day is intended to match that on Area 51, which it does. Obviously it's much easier to just scrape Area 51 directly, but I wanted to figure it out myself for practice.
I'm not an expert with Python or with web APIs, so I was hoping you fine Code Review folks can help me improve my practices.
My approach is to build the query using a dict and make the request to the API using the requests module. I set the page size to the maximum to reduce the number of requests made, so that the daily quota isn't exhausted quite so fast.
The code is hosted on GitHub, should you want to fork and adapt it for your own purposes.
import requests, datetime, time

def seconds_since_epoch(dt):
    epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
    return int((dt - epoch).total_seconds())

today = datetime.datetime.now(datetime.timezone.utc)
params = {
    "site": "codegolf",
    "fromdate": seconds_since_epoch(today - datetime.timedelta(days=14)),
    "todate": seconds_since_epoch(today),
    "pagesize": 100,
    "page": 1
}
base_url = "https://api.stackexchange.com/2.2"
results = []
while True:
    req = requests.get(base_url + "/questions", params=params)
    contents = req.json()
    results.extend(contents["items"])
    if not contents["has_more"]:
        break
    if "backoff" in contents:
        time.sleep(contents["backoff"])
    params["page"] += 1

questions_per_day = len(results) / 14
answers_per_question = sum([q["answer_count"] for q in results]) / len(results)
print("Over the past 2 weeks, PPCG has had...")
print(round(questions_per_day, 1), "questions per day")
print(round(answers_per_question, 1), "answers per question")
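On the quota point: every Stack Exchange API response wrapper carries quota_max and quota_remaining fields alongside items, so a script can watch how fast its requests consume the daily allowance. A minimal sketch of reading those fields, using a hard-coded sample response (the numbers are made up, not from a live request):

```python
# The API's JSON wrapper includes quota fields next to "items".
# Sample values below are illustrative, not fetched from the API.
sample_response = {
    "items": [],
    "has_more": False,
    "quota_max": 300,
    "quota_remaining": 271,
}

# With pagesize=100, two weeks of questions fit in a handful of pages,
# so each run of the script costs only a few requests out of the quota.
used = sample_response["quota_max"] - sample_response["quota_remaining"]
print("quota used so far:", used, "of", sample_response["quota_max"])
```

In a real run you would read these fields from req.json() after each request, e.g. to stop early or warn when quota_remaining gets low.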
Solution
- Your seconds_since_epoch function has a built-in Python equivalent, datetime.timestamp.
- Your namespaces would be cleaner if you did from datetime import datetime, timezone.
- You use a base_url variable, but do not use urllib.parse.urljoin. Either use a hardcoded URL, or properly join the base URL with the fragment.
- results is better named questions.
- In sum([q["answer_count"] for q in results]), the [] is superfluous and inefficient.
- Instead of printing three times in a row, create a multiline format string and print once.
- You never create a function that returns the questions, and you do not define a main function. I suggest printing in the main function, which calls a function that gets and returns the question information.
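The first point is easy to verify: for a timezone-aware datetime, the manual epoch subtraction and the built-in timestamp() agree. A quick sanity check (not part of the original answer; the date is an arbitrary example):

```python
from datetime import datetime, timezone

def seconds_since_epoch(dt):
    # Manual version from the question under review.
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return int((dt - epoch).total_seconds())

# Arbitrary aware datetime for the comparison.
dt = datetime(2016, 2, 14, 12, 0, tzinfo=timezone.utc)
assert seconds_since_epoch(dt) == int(dt.timestamp())
print(int(dt.timestamp()))  # → 1455451200
```

Note that timestamp() on a naive datetime assumes local time, so the aware datetimes used throughout the script are what make the two forms interchangeable.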
This is how I would program it:
import requests
import time
from datetime import datetime, timezone, timedelta

def get_question_info(site, start, stop):
    API_URL = "https://api.stackexchange.com/2.2/questions"
    req_params = {
        "site": site,
        "fromdate": int(start.timestamp()),
        "todate": int(stop.timestamp()),
        "pagesize": 100,
        "page": 1
    }
    questions = []
    while True:
        req = requests.get(API_URL, params=req_params)
        contents = req.json()
        questions.extend(contents["items"])
        if not contents["has_more"]:
            break
        req_params["page"] += 1
        if "backoff" in contents:
            time.sleep(contents["backoff"])
    return questions

def get_area51_estimate(site):
    now = datetime.now(timezone.utc)
    fortnight_ago = now - timedelta(days=14)
    questions = get_question_info(site, fortnight_ago, now)
    avg_questions = len(questions) / 14
    avg_answers = sum(q["answer_count"] for q in questions) / len(questions)
    return avg_questions, avg_answers

if __name__ == "__main__":
    msg = """Over the past 2 weeks, PPCG has had...
{:.1f} questions per day
{:.1f} answers per question"""
    print(msg.format(*get_area51_estimate("codegolf")))
Context
StackExchange Code Review Q#120468, answer score: 14