
Converting dataframe columns to binary columns

Submitted by: @import:stackexchange-codereview

Problem

I want to convert every categorical column in an R data frame into several binary columns, one per category.

For example, a categorical column like

Company
-------
IBM
Microsoft
Google


will be converted to 3 columns:

Company_is_IBM  Company_is_Microsoft  Company_is_Google
--------------  --------------------  -----------------
1               0                     0
0               1                     0
0               0                     1


However, my code runs very slowly on my test data, which should produce a data frame with 900 columns and 367 rows.

I'm attaching the code so you can test it with my dataset: extract the source code package, set the working directory (setwd) to the directory containing the source files, as shown in the picture in the archive, and run:

gpudataf = read.table("gpudataf.txt")
gpudataf_bin = ConvertCategoricalDataFrameToBinaryKeepingOriginals(gpudataf)


and you will see how slow it is.

Source code archive

source("IsCategorical.R")

# Function CategoricalToBinary: Take a data.frame, determine which columns are categorical,
# if categorical, convert the categorical column to several binary columns with values 0 and 1

#input: a Categorical Column, name of that column. Output: a data frame of multiple binary columns.
ConvertCategoricalColumnToBinaryColumns

Solution

This is a general do-it-yourself answer to "my code is slow, what can I do?": Use the profiler.

Rprof(tmp <- tempfile())

#############################
#### YOUR CODE GOES HERE ####
#############################

Rprof()
summaryRprof(tmp)
unlink(tmp)
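
To make the placeholder concrete, here is a minimal sketch of a full profiling run. The function slow_one_hot and its input are hypothetical stand-ins invented for illustration; they are not the asker's code. The sampling interval of 0.005 seconds is also just an assumption, chosen so that a short run still collects samples.

```r
# Hypothetical slow function (NOT the asker's code): builds each
# indicator column with an element-by-element loop.
slow_one_hot <- function(x) {
  out <- data.frame(row.names = seq_along(x))
  for (lvl in unique(x)) {
    col <- integer(length(x))
    for (i in seq_along(x)) {
      col[i] <- if (x[i] == lvl) 1L else 0L
    }
    out[[paste0("Company_is_", lvl)]] <- col
  }
  out
}

# Made-up input, large enough for the profiler to collect samples
set.seed(1)
x <- sample(c("IBM", "Microsoft", "Google"), 1e6, replace = TRUE)

tmp <- tempfile()
Rprof(tmp, interval = 0.005)  # start profiling, writing samples to tmp
res <- slow_one_hot(x)        # the code under investigation
Rprof(NULL)                   # stop profiling
prof <- summaryRprof(tmp)
unlink(tmp)

# by.self ranks functions by time spent in their own code;
# by.total also counts time spent in the functions they call.
print(head(prof$by.self))
print(head(prof$by.total))
```

A function that ranks high in by.total but low in by.self is mostly waiting on the functions it calls, so the real cost is further down the call chain.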


Yes, there is a bit of a learning curve in interpreting the output, but it lets you find the bottleneck in your code and test alternatives of your own. If those don't work, ask again with a more specific question, such as "My code is slow doing this; what would be a better way?".
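
Testing an alternative can be as simple as timing two versions on the same input and checking that they agree. The sketch below assumes, purely for illustration, that the profiler pointed at a per-element loop; the asker's actual profile is not shown here, and both function names are hypothetical.

```r
# Hypothetical baseline: fills each indicator column element by element
one_hot_loop <- function(x) {
  lv <- unique(x)
  out <- matrix(0L, nrow = length(x), ncol = length(lv),
                dimnames = list(NULL, paste0("Company_is_", lv)))
  for (j in seq_along(lv)) {
    for (i in seq_along(x)) {
      if (x[i] == lv[j]) out[i, j] <- 1L
    }
  }
  as.data.frame(out)
}

# Hypothetical alternative: one vectorized comparison of all values
# against all levels, then convert TRUE/FALSE to 1/0
one_hot_vec <- function(x) {
  lv <- unique(x)
  out <- outer(x, lv, "==")
  storage.mode(out) <- "integer"
  colnames(out) <- paste0("Company_is_", lv)
  as.data.frame(out)
}

# Check both versions agree on the small example from the question
x <- c("IBM", "Microsoft", "Google")
small <- one_hot_vec(x)
stopifnot(identical(one_hot_loop(x), small))

# Time both on a larger, made-up input
set.seed(1)
big <- sample(x, 1e5, replace = TRUE)
t_loop <- system.time(a <- one_hot_loop(big))
t_vec  <- system.time(b <- one_hot_vec(big))
stopifnot(identical(a, b))
print(c(loop = t_loop[["elapsed"]], vectorized = t_vec[["elapsed"]]))
```

Vectorized code is usually faster than an explicit loop in R, but the timings depend on the data and the machine, so measure on your own dataset before committing to a rewrite.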

Context

StackExchange Code Review Q#9164, answer score: 6
