snippet · python · Minor

Splitting a CAN bus log in .asc format

Submitted by: @import:stackexchange-codereview
Tags: can, format, log, bus, splitting, asc

Problem

I've written a quick script for a coworker to split a large CAN log into smaller chunks. (If you're not familiar with CAN, it's a communication protocol used by the ECUs in many cars.) I know where to split because I've inserted dummy CAN messages (with ID 0x00) at the start of each section, and one at the end of testing (which may be somewhere in the middle of the log) to tell me when to stop reading.
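A minimal Python 3 sketch of the marker-based splitting described above. It assumes Vector-style .asc message lines with 14 whitespace-separated fields (timestamp, channel, ID, direction, `d`, DLC, then eight data bytes). That layout is consistent with the `len(line.split()) == 14` check in the script but is not stated in the question. The function name `split_sections`, the field positions, and the choice to leave marker lines out of each section are hypothetical.

```python
# Hypothetical field layout for a 14-field .asc message line:
#   <timestamp> <channel> <id> <Rx|Tx> d <dlc> <b0> ... <b7>
ID_FIELD = 2                 # assumption: index of the CAN ID field
DATA_FIELDS = slice(6, 14)   # assumption: the eight data-byte fields


def split_sections(lines):
    """Yield (marker_data, section_lines) for each section of the log.

    A section starts at a message with CAN ID 0 and runs until the next
    ID-0 message. A marker whose data bytes are all zero means "end of
    testing", so reading stops there. Marker lines are not included.
    """
    marker_data = None
    section = []
    for line in lines:
        fields = line.split()
        if len(fields) != 14:
            continue  # skip headers, comments and other non-message lines
        # Strip a trailing 'x' in case extended IDs are marked that way (assumption).
        if int(fields[ID_FIELD].rstrip("xX"), 16) == 0:
            if marker_data is not None:
                yield marker_data, section
            marker_data = "".join(fields[DATA_FIELDS])
            section = []
            if int(marker_data, 16) == 0:
                return  # end-of-testing marker: stop reading
        elif marker_data is not None:
            section.append(line)
    if marker_data is not None:
        yield marker_data, section  # log ended without an end marker
```

Each yielded section could then be written to its own file; because this is a generator, only one section's lines are held in memory at a time.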

The log is in .asc or .csv format and can be several gigabytes in size. Currently I can process a 1.5 GB file in about 40 seconds, but I'm sure that can be improved. I'm looking more for advice on how to speed this up than on making it more Pythonic, but of course criticism is welcome in both areas.

Note: titles is a dictionary mapping section numbers to a particular string that needs to be added to the filename before saving. I can add the code for generating these, but I don't believe it's as relevant.
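To show how `titles` feeds into the filename, here is a small sketch that mirrors the filename logic of `create_title` in the script below, without the `quit()` branch. The contents of `titles` and the name `make_filename` are hypothetical; the real mapping comes from code not shown in the question.

```python
# Hypothetical contents; the real keys/values are generated elsewhere.
titles = {"AT3_7": "BrakeTest", "AT4_1": "SteeringTest"}


def make_filename(marker_data, titles):
    """Build '<title>_AT<req>_<obj>.asc' from a 16-character marker string.

    As in create_title, the first 8 characters give req_num and the next 8
    give obj_num, both parsed as decimal integers.
    """
    req_num = int(marker_data[0:8])
    obj_num = int(marker_data[8:16])
    at = "AT{}_{}".format(req_num, obj_num)
    return "{}_{}.asc".format(titles[at], at)


print(make_filename("0000000300000007", titles))  # BrakeTest_AT3_7.asc
```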

```
import os
import time


def split_asc_file(target_file, target_dir, titles):
    start_time = time.time()

    if not os.path.isdir(target_dir):
        os.mkdir(target_dir)
    os.chdir(target_dir)

    section = None

    def create_title(message_string):
        req_num = int(message_string[0:8])
        obj_num = int(message_string[8:16])

        if req_num == 0 and obj_num == 0:
            print("Splitting completed in {} seconds".format(time.time() - start_time))
            quit()  # final test has been executed
        else:
            at = "AT{}_{}".format(req_num, obj_num)
            title_prefix = titles[at]
            title_string = "{}_{}.asc".format(title_prefix, at)
            return title_string

    def can_traffic_only(f):
        # iterate only over lines that contain messages
        for line in f:
            if len(line.split()) == 14:
                yield line

    with open(target_file) as log:
        print("Opening {}...".format(target_file))
        for message in can_traffic_only(log):
            values = message.s
            # ...
```

Solution

That looks close to the best you're likely to get with Python, I think.

I'd suggest running a profiler and optimising according to what it shows. For example, in can_traffic_only, I can imagine that just counting the spaces instead of calling split (which allocates a list of all the fields) should be a bit faster.
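A hedged sketch of the split-versus-count idea, with a `timeit` harness to check whether it actually helps on your machine. One caveat: `str.count(" ")` counts single space characters, while `split()` treats any run of whitespace as one separator, so the two filters only agree if fields are separated by exactly one space. The sample line and function names are hypothetical. For the profiling step itself, `python -m cProfile -s cumulative your_script.py` (script name hypothetical) prints functions sorted by cumulative time.

```python
import timeit

# Hypothetical single-spaced message line: 14 fields, 13 separators.
LINE = "0.001 1 123 Rx d 8 01 02 03 04 05 06 07 08\n"


def is_message_split(line):
    # Original approach: build the full list of fields, then measure it.
    return len(line.split()) == 14


def is_message_count(line):
    # Suggested approach: count separators without allocating a list.
    # Only equivalent when fields are separated by exactly one space.
    return line.count(" ") == 13


def can_traffic_only(f, is_message=is_message_count):
    # Same generator as in the question, with the filter made swappable.
    for line in f:
        if is_message(line):
            yield line


if __name__ == "__main__":
    for fn in (is_message_split, is_message_count):
        t = timeit.timeit(lambda: fn(LINE), number=200_000)
        print("{}: {:.3f} s".format(fn.__name__, t))
```

If the real log pads its columns with several spaces, the counting check needs a different expected value (or is not usable), so verify it against a sample of the actual file first.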

Computing can_data can be deferred until the condition for the if block is true, but again, whether that helps depends on how often the condition holds.
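The question's listing is cut off before `can_data` is defined, so the following is only a hypothetical sketch of the deferral. It assumes `can_data` is the joined data bytes (fields 6 to 13), that the if block tests for the marker ID 0, and that the input lines have already been filtered to 14 fields. The gain grows with the share of lines that are not markers, which in a multi-gigabyte log is likely to be almost all of them.

```python
def find_markers_eager(lines):
    """Builds the data string for every message (the pattern to avoid)."""
    markers = []
    for line in lines:
        values = line.split()
        can_id = int(values[2], 16)
        can_data = "".join(values[6:14])  # computed even when never used
        if can_id == 0:
            markers.append(can_data)
    return markers


def find_markers_deferred(lines):
    """Builds the data string only for marker messages (ID 0)."""
    markers = []
    for line in lines:
        values = line.split()
        if int(values[2], 16) == 0:
            markers.append("".join(values[6:14]))  # only when needed
    return markers
```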

If nothing else helps, you could inline can_traffic_only and measure whether that makes a difference.
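A sketch of what inlining could look like. The listing in the question is cut off at `values = message.s`; assuming that line splits the message again, the generator version splits every line twice, and inlining lets each line be split once while also removing the generator resume per line. The function names and the ID collection that stands in for the real loop body are hypothetical.

```python
def collect_ids_generator(lines):
    """Generator filter plus a second split in the loop body."""
    def can_traffic_only(f):
        for line in f:
            if len(line.split()) == 14:   # first split
                yield line

    ids = []
    for message in can_traffic_only(lines):
        values = message.split()          # second split of the same line
        ids.append(int(values[2], 16))
    return ids


def collect_ids_inlined(lines):
    """Filter inlined into the loop, reusing a single split per line."""
    ids = []
    for line in lines:
        values = line.split()
        if len(values) != 14:
            continue  # not a message line
        ids.append(int(values[2], 16))
    return ids
```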

Context

StackExchange Code Review Q#132898, answer score: 3
