Converting dataframe columns to binary columns
Problem
I want to convert every categorical column in an R data frame into several binary columns.
For example, a categorical column like
Company
-------
IBM
Microsoft
Google
will be converted to 3 columns:
Company_is_IBM  Company_is_Microsoft  Company_is_Google
1               0                     0
0               1                     0
0               0                     1
However, my code runs very slowly on my test data, which should produce a data frame with 900 columns and 367 rows.
I'm attaching the code so you can test it with my dataset: extract the source code archive, set the working directory (setwd) to the directory containing the source files, as shown in the picture in the archive, and run:
gpudataf = read.table("gpudataf.txt")
gpudataf_bin = ConvertCategoricalDataFrameToBinaryKeepingOriginals(gpudataf)
and you will see how slow it is.
Source code archive
source("IsCategorical.R")
# Function CategoricalToBinary: take a data.frame, determine which columns are categorical and,
# for each categorical column, convert it to several binary columns with values 0 and 1.
# Input: a categorical column and the name of that column. Output: a data frame of multiple binary columns.
ConvertCategoricalColumnToBinaryColumns
Solution
This is a general do-it-yourself answer to "my code is slow, what can I do?": Use the profiler.
Yes, interpreting the output takes some getting used to, but it lets you find the bottleneck in your code and test alternatives of your own. If those don't work, ask again with a more specific question, such as "My code is slow doing this, what would be a better way?".
Rprof(tmp <- tempfile())      # start profiling, writing samples to a temp file
#############################
#### YOUR CODE GOES HERE ####
#############################
Rprof(NULL)                   # stop profiling; a bare Rprof() would start a new profile in "Rprof.out"
summaryRprof(tmp)             # summarise time spent per function
unlink(tmp)                   # delete the temp file
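As a rough illustration of that workflow, here is a minimal sketch that profiles a loop-based encoder and then checks a vectorised alternative against it. The functions encode_loop and encode_outer and the test vector are hypothetical stand-ins written for this example; they are not the asker's code, which is not shown in full above.

```r
# Hypothetical stand-in for slow code: builds one indicator column per
# level and grows the data frame with cbind() on every iteration.
encode_loop <- function(x, name) {
  out <- data.frame(row_id = seq_along(x))
  for (lev in unique(x)) {
    out <- cbind(out, as.integer(x == lev))
    names(out)[ncol(out)] <- paste0(name, "_is_", lev)
  }
  out[-1]  # drop the helper column
}

# Alternative to test: compare every value against every level in one
# vectorised call, then convert the resulting matrix once.
encode_outer <- function(x, name) {
  levs <- unique(x)
  m <- outer(x, levs, "==") * 1L   # logical matrix -> integer 0/1 matrix
  colnames(m) <- paste0(name, "_is_", levs)
  as.data.frame(m)
}

# Assumed test input: 1500 rows, each with its own level.
x <- paste0("v", seq_len(1500))

# Profile the loop version, sampling more often than the default interval.
prof_file <- tempfile()
Rprof(prof_file, interval = 0.005)
slow <- encode_loop(x, "x")
Rprof(NULL)
prof <- summaryRprof(prof_file)
unlink(prof_file)

# by.self ranks functions by the time spent in their own body.
print(head(prof$by.self))

# Time the alternative and confirm both versions agree.
print(system.time(fast <- encode_outer(x, "x")))
stopifnot(all(as.matrix(slow) == as.matrix(fast)))
```

The exact rows in by.self depend on the machine and R version, so treat the listing as a pointer to where the time goes rather than as a precise measurement. If the functions near the top belong to data frame construction, that suggests the repeated cbind() is the part worth replacing.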
Context
StackExchange Code Review Q#9164, answer score: 6