
Plot daily time spent on codereview.stackexchange.com

Submitted by: @import:stackexchange-codereview

Problem

This draws an xyplot from CSV data obtained from the RescueTime API, showing the time logged for the activity "http://codereview.stackexchange.com".

```
# -------------------------------------------------------
# PLOT TIME SPENT ON CODEREVIEW SE
# USING RESCUE TIME API AND R
# -------------------------------------------------------

# -------------------------------------------------------
# LIBRARIES
if(!require("RCurl")){
install.packages("RCurl")
}
if(!require("lattice")){
install.packages("lattice")
}
require("RCurl")
require("lattice")
# -------------------------------------------------------

print("Initializing...")

# -------------------------------------------------------
# RESCUE TIME API ACCESS

# api key
# get from https://www.rescuetime.com/anapi/manage
api_key <- "..."

# query from date
from_date <- "2014-01-01"

# query to date
to_date <- format(Sys.Date(), "%Y-%m-%d")

# activity to get details of
activity <- "codereview.stackexchange.com"

# --api description--
# for more details see https://www.rescuetime.com/anapi/setup/documentation
# key - API key
# format - Output format
# rb - Restrict Start Date ("%Y-%m-%d")
# re - Restrict End Date ("%Y-%m-%d")
# rk - Restrict Kind (in this case it's activity)
# pv - Perspective (in this case it's interval)
# rs - Resolution Time (if it is day it's a daily report)
# rt - Restrict Thingy (Restrict Kind of Type)
url_format <- "https://www.rescuetime.com/anapi/data?key=%s&format=csv&rb=%s&re=%s&rk=activity&pv=interval&rs=day&rt=%s"

# create url
api_url <- sprintf(url_format,api_key,from_date,to_date,activity)
# -------------------------------------------------------

# -------------------------------------------------------
# PLOT

# get csv as a text
csv_text_data <- textConnection(getURL(api_url))

# parse csv
activity_data <- read.csv(csv_text_data,header = TRUE)

print(sprintf("Mean Time Spent on '%s':%f (seconds)",activity,mean(activity_data$Time)))

activity_time_spent <- activity_data$Time
activity_dates <- activ
```

Solution

Here is how I would rewrite your code. I will add my comments after:

plotActivity <- function(
   activity  = 'codereview.stackexchange.com',
   api_key   = getOption('rescuetime_api_key'),
   from_date = NULL,
   to_date   = NULL) {

## This function plots time spent on a given URL
## using the rescue time API
##
## Arguments:
##    - activity:  the url you want to check for usage
##    - api_key:   user API key, you can get yours at
##                 https://www.rescuetime.com/anapi/manage
##    - from_date: a Date or string in "YYYY-MM-DD" format,
##                 defaults to January 1st of the same year as to_date
##    - to_date:   a Date or string in "YYYY-MM-DD" format,
##                 defaults to today

   library("RCurl")
   library("ggplot2")

   # arguments parsing and default settings
   if (is.null(api_key)) stop("please set your rescue time API key...")

   if (is.null(to_date)) to_date <- Sys.Date()
   to_date <- as.Date(to_date)
   if (is.na(to_date)) stop("error parsing to_date...")

   if (is.null(from_date)) from_date <- format(to_date, "%Y-01-01")
   from_date <- as.Date(from_date)
   if (is.na(from_date)) stop("error parsing from_date...")

   # build the API request url
   # for more details see https://www.rescuetime.com/anapi/setup/documentation

   base_url  <- "https://www.rescuetime.com/anapi/data"
   arguments <- c(key = api_key,           # API key
                  format = 'csv',          # Output format
                  rb = format(from_date),  # Restrict Start Date ("YYYY-MM-DD")
                  re = format(to_date),    # Restrict End Date ("YYYY-MM-DD")
                  rk = 'activity',         # Restrict Kind
                  pv = 'interval',         # Perspective
                  rs = 'day',              # Resolution Time
                  rt = activity)           # Restrict Thingy (Restrict Kind of Type)

   arg_str <- paste(names(arguments), arguments, sep = '=', collapse = '&')
   api_url <- sprintf("%s?%s", base_url, arg_str)

   # read data
   csv_text_data <- textConnection(getURL(api_url))
   activity_data <- read.csv(csv_text_data, header = TRUE)

   # print diagnostic and plot
   cat(sprintf("Total Time Spent between %s and %s on '%s': %f (seconds)\n",
               format(from_date), format(to_date), activity,
               sum(activity_data$Time.Spent..seconds.)))

   activity_data$Date <- as.Date(activity_data$Date)
   ggplot(activity_data, aes(x=Date, y=Time.Spent..seconds.)) +
      geom_bar(stat="identity") +
      xlab("Date") +
      ylab("Time (seconds)")
}


So here are a few ideas:

- I went from a script to a function, which makes the code a lot easier to use and share. I abstracted what I believe are the right inputs and chose sensible defaults. Of interest is the use of getOption for the api_key, so the user can set it once and for all by running options(rescuetime_api_key = "xyz123abc"). Now all you have to do is source the file where this function is stored, then call the function. A function, when executed, runs in its own environment, so all the variables created at runtime are discarded when the function exits. Your global environment is never polluted, which removes the need for your "clean up" section (which was otherwise pretty harmful, as janos pointed out).

- Like rm(list=ls()), forcing the install of packages via install.packages() is a bit harmful. Instead, just use library: if the package is missing, it will die right there, leaving the user with the decision of installing the package. There is a nice blog post about why some people, me included, prefer library over require: http://yihui.name/en/2014/07/library-vs-require

- I made use of R's Date object wherever it made sense.

- I rewrote the way the API URL is built, using a named vector for the arguments, with inline comments. I hope you'll agree the code is a bit more readable and easier to maintain, something you should always aim for.

- I replaced print with cat. This way you don't get the [1] prefix that comes with printing a vector.

- This has less to do with code reviewing, but I thought it would be more useful to see the total time spent on the website. Average time spent per day can be a little confusing: the user might wonder whether it includes all days or only those when the website was visited (you chose the latter), whereas total time removes that ambiguity. I also thought it would be more useful to plot the data as a barplot with a real timescale, i.e. including holes for periods when the website was not visited. I used ggplot2 for that, see the picture below. The x-axis might look weird, but you did spend six seconds on the website on July 1st, which is why it starts on that day, although you won't really notice there is a data point there.

  • My rewrite got rid of `activity_date_index
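
The named-vector approach to building the query string can be tried on its own, without touching the network. The following is only a minimal sketch: `build_query` is a hypothetical helper name and the key is a placeholder, neither of which comes from the original code. Dates are formatted explicitly here, because combining a Date with character strings in `c()` would otherwise yield its underlying day count rather than "YYYY-MM-DD".

```r
# Hypothetical helper: turn a named character vector into a URL query string.
build_query <- function(base_url, arguments) {
   arg_str <- paste(names(arguments), arguments, sep = "=", collapse = "&")
   sprintf("%s?%s", base_url, arg_str)
}

from_date <- as.Date("2014-01-01")
to_date   <- as.Date("2014-09-30")

arguments <- c(key    = "PLACEHOLDER_KEY",     # placeholder, not a real key
               format = "csv",
               rb     = format(from_date),     # Date -> "YYYY-MM-DD" string
               re     = format(to_date),
               rk     = "activity",
               pv     = "interval",
               rs     = "day",
               rt     = "codereview.stackexchange.com")

api_url <- build_query("https://www.rescuetime.com/anapi/data", arguments)
cat(api_url, "\n")
```

Adding or removing a parameter then only means touching one line of the vector, which is the maintainability gain the named vector is meant to give.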



Disclaimer: I don't have an account on the RescueTime website, so I was not able to fully test my code. If you find a mistake, please feel free to edit my post, I won't mind.
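
Even without an account, the parsing and summing steps can be exercised offline. The sketch below feeds `read.csv` a small made-up CSV; the header names ("Date", "Time Spent (seconds)") and all values are assumptions chosen to match the column names used in the function, not verified RescueTime output.

```r
# Made-up sample data; header names are an assumption, not real API output.
csv_text <- paste(
   "Date,Time Spent (seconds),Activity",
   "2014-07-01,6,codereview.stackexchange.com",
   "2014-09-20,1800,codereview.stackexchange.com",
   "2014-09-21,2400,codereview.stackexchange.com",
   sep = "\n")

activity_data <- read.csv(text = csv_text, header = TRUE,
                          stringsAsFactors = FALSE)

# read.csv replaces the spaces and parentheses in the header with dots.
print(names(activity_data))

# Convert to Date so a plot would use a real time scale.
activity_data$Date <- as.Date(activity_data$Date)

total_seconds <- sum(activity_data$Time.Spent..seconds.)
cat(sprintf("Total time: %d seconds over %d days with activity\n",
            total_seconds, nrow(activity_data)))
```

If the real response uses different headers, only the column names in the last few lines would need to change.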

Thanks for sharing your code, I like

Context

StackExchange Code Review Q#64257, answer score: 16
