Francis Barnhart


Writing Mixed Data With MATLAB

Wednesday, January 19th, 2005

Today I found myself learning more about MATLAB. It wasn’t a very pleasant experience, especially considering my experience in Python. I needed a way to first store data about ballistic missile trajectories and then write it to a CSV file for later use. The data consisted of a mix of string and numeric types. I thought this would be easy enough. Instead, I now know why most of the CSV files I’ve seen output from MATLAB only contain numerical data.

The first part of the battle was trying to find the appropriate data structure for my mixed type data. I’m used to Python’s lists, which allow you to put nearly any object in them. In MATLAB the most common data structure is an array (or matrix, hence the name MATLAB). It took me a little while to stumble across cell arrays, which can contain mixed data. Unfortunately, the file output functions included in MATLAB do not support cell arrays; they only support arrays of numerical data.

Why these functions do not support mixed data is beyond me. It wasn’t that hard to implement a function that supported cell arrays. Granted, I had a few moments of minor frustration with the MATLAB language.

My first attempt saw me merrily chugging along, converting every cell of my cell array to strings, until I realized that the MATLAB functions I was using aren’t designed to take a sequence of data as a single argument. In Python, many functions take a sequence like this: function(sequence). In MATLAB, you have to specify every item in the sequence like so: function(item1, item2, ...). Thus, my first implementation had a switch-case tree to deal with different numbers of columns in cell arrays. I’m embarrassed to even mention it. Even more embarrassing, I now know how to de-sequence things in MATLAB, i.e. C{:} expands to C{1}, C{2}, ...
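For comparison, Python has its own way of spreading a sequence across separate arguments: the * operator at the call site. The sketch below is only an illustration of that idea; the describe function and the sample row are made up for this example and are not part of the code in this post.

```python
def describe(*items):
    # Accepts any number of separate arguments,
    # much like a MATLAB function called as function(item1, item2, ...).
    return ' | '.join(str(item) for item in items)


# Hypothetical sample row mixing strings and numbers.
row = ['missile-A', 10200.0, 'silo']

# Spelling out every item by hand...
separate = describe(row[0], row[1], row[2])

# ...or letting * expand the list, roughly what C{:} does in MATLAB.
unpacked = describe(*row)
```

Both calls produce the same string, and the second one keeps working no matter how many columns the row has.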

I figured, “What the hell, I might as well leave it for now.” But I ran the function and it was damn slow. So I figured I was going to have to change the way I concatenated the columns together and I might as well push that back into the for loops to deal with varying numbers of columns. What you see below is the result.

function cellwrite(filename, cellarray)
% CELLWRITE Write a cell array of strings and numbers to a CSV file.
[rows, cols] = size(cellarray);
fid = fopen(filename, 'w');
if fid == -1
    error('Could not open %s for writing.', filename);
end
for i_row = 1:rows
    file_line = '';
    for i_col = 1:cols
        contents = cellarray{i_row, i_col};
        if isnumeric(contents)
            contents = num2str(contents);       % numbers become strings
        elseif isempty(contents)
            contents = '';                      % empty cells become empty fields
        end
        if i_col < cols
            file_line = [file_line, contents, ','];
        else
            file_line = [file_line, contents];  % no trailing comma on the last column
        end
    end
    fprintf(fid, '%s\n', file_line);
end
fclose(fid);

The function changes every cell to a string and then adds it to a line accumulator with commas between the cell values. Yes, continually adding strings together is a bad idea. However, in MATLAB, it turns out that using bracket concatenation is about twice as fast as calling strcat repeatedly (i.e. file_line = strcat(file_line, contents, ',')). This line accumulator is then written to a file.
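The same trade-off exists in Python, where the usual advice is to collect the pieces in a list and join them once instead of growing a string piece by piece. A minimal sketch of the two approaches follows; the function names are invented for this illustration and nothing here was timed.

```python
def line_by_accumulating(cells):
    # Grows the string one piece at a time, like the MATLAB line accumulator.
    line = ''
    for i, cell in enumerate(cells):
        line += cell
        if i < len(cells) - 1:
            line += ','
    return line


def line_by_joining(cells):
    # Builds the whole line in one call; no trailing comma to special-case.
    return ','.join(cells)
```

Both return the same line for a list of strings; the join version simply leaves the comma bookkeeping to the library.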

Unfortunately, this new function wasn’t any faster than my original implementation. But it does have the advantage of supporting more than 15 columns and looking much nicer.

On a whim, I decided to see how my function compared to the built-in MATLAB function CSVWRITE. I made an array of random numbers that was 1000 rows by 100 columns, made a cell array copy, and fed it through my CELLWRITE function. It took an awful 67.6 seconds to return. Apparently, that’s not bad by MATLAB standards. CSVWRITE took 67.7 seconds to write the numerical array to disk. Sixty-seven seconds is an awfully long time. I get bored with waiting after about five seconds.

So I spent some time at home tonight and rewrote the function in Python. Below is the first revision, which is fairly similar to the MATLAB version, with the exception that it uses the join function to add the strings together. That’s faster and nicer looking.

def cellwrite(filename, cellarray):
    rows, cols = len(cellarray), len(cellarray[0])
    fid = file(filename, 'w')
    for i_row in range(rows):
        cells = []
        for i_col in range(cols):
            contents = cellarray[i_row][i_col]
            if isinstance(contents, (int, float, long)):
                contents = str(contents)
            cells.append(contents)
        file_line = ','.join(cells)
        file_line = ''.join([file_line, '\n'])
        fid.write(file_line)
    fid.close()

That ran through a similar data set in under 3.5 seconds. Yes, my work and home computers are different. But they don’t differ by an order of magnitude.
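For anyone who wants to repeat this kind of comparison, a rough timing harness might look like the following. It is a sketch written for Python 3, not the code used for the numbers above; make_rows, simple_writer and time_writer are names invented here.

```python
import os
import random
import tempfile
import time


def make_rows(n_rows, n_cols):
    # A table of random floats, e.g. make_rows(1000, 100) for the size used above.
    return [[random.random() for _ in range(n_cols)] for _ in range(n_rows)]


def simple_writer(filename, rows):
    # Stand-in writer: one comma-separated line per row.
    with open(filename, 'w') as fid:
        fid.writelines(','.join(map(str, row)) + '\n' for row in rows)


def time_writer(writer, rows):
    # Time a single call of writer(filename, rows) against a temporary file.
    fd, path = tempfile.mkstemp(suffix='.csv')
    os.close(fd)
    try:
        start = time.perf_counter()
        writer(path, rows)
        return time.perf_counter() - start
    finally:
        os.remove(path)
```

Calling time_writer(simple_writer, make_rows(1000, 100)) returns the elapsed time in seconds. A single run is noisy, so repeating it a few times gives a fairer picture.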

To top them all, the more Pythonic version below ran in under 0.4 seconds!

def cellwrite2(filename, cellarray):
    fid = file(filename, 'w')
    lines = [','.join(map(str, row))+'\n' for row in cellarray]
    fid.writelines(lines)
    fid.close()
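One caveat with all of the versions above: a string cell that itself contains a comma or a quote will produce a broken CSV line, because nothing is escaped. Python’s standard csv module handles that quoting. The sketch below is a variant written for Python 3 (where the file built-in and the long type used above no longer exist); cellwrite_csv is a name made up here, and it was not part of the timing comparison.

```python
import csv


def cellwrite_csv(filename, cellarray):
    # newline='' lets the csv module control line endings itself.
    with open(filename, 'w', newline='') as fid:
        writer = csv.writer(fid)
        # Non-string cells are converted with str(); None becomes an empty field.
        writer.writerows(cellarray)
```

A row such as ['a,b', 1.5] is written with the first field quoted, so it reads back as two fields rather than three.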

So why does MATLAB suck so much? A few profiler runs indicate that the number-to-string conversion function, num2str, is damn slow. Why? I don’t know. Maybe I’ll look into it more. Maybe I’ve done enough already.

francis@francisbarnhart.com

Copyright © 2000-2005 by Francis Barnhart.