I sped up my Python script that converts Garmin tcx files (XML format) into parquet. The output below shows the difference in performance on my machine between v1 (using BeautifulSoup and a Python for-loop) and v2 (my new tcx-extract package, which uses Zig), tested in a non-scientific manner.
❯ python main.py
INFO:root:Authenticating API
INFO:root:Got 1 activities
TCX to Parquet v1 conversion took 14.66s
TCX to Parquet v2 conversion took 0.27s
See the commit where I made this update.
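For anyone who wants to reproduce this kind of rough comparison, here is a minimal sketch of a timing helper. The name `time_conversion` and the idea of passing each converter as a zero-argument callable are assumptions for illustration, not code from the project.

```python
import time
from typing import Callable


def time_conversion(label: str, convert: Callable[[], object]) -> float:
    """Run `convert` once and report wall-clock time (hypothetical helper)."""
    start = time.perf_counter()
    convert()
    elapsed = time.perf_counter() - start
    print(f"TCX to Parquet {label} conversion took {elapsed:.2f}s")
    return elapsed
```

A single run like this is noisy, so it only supports a ballpark comparison, which matches the non-scientific spirit of the numbers above.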
As part of my Master's of Computer Science coursework, I've been experimenting with using a lower-level language to play with the various sorting algorithms that I'm learning about. While C came to mind, I decided to try one of the newer languages, because why not.
I chose Zig because I got a good vibe from the community and hearing about the TigerBeetle database made me curious to learn more.
Why not Rust? This could have also been done in Rust. Or practically any language for that matter.
My Zig executable accepts two arguments: the filepath to a tcx file and the name of the target tag we want to get data for.
Here's what it does:
I'd like to say it was the most elegant thing ever created, but alas, it is not. It makes up for its looks with its results though:
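// Skip the first chunk (everything before the first point).
_ = points.next();
while (points.next()) |point| {
    // Keep only the text before the closing target tag.
    var tagBeforeAfters = std.mem.split(u8, point, targetTagEnd);
    while (tagBeforeAfters.next()) |tagBeforeAfter| {
        // Whatever follows the opening target tag is the value.
        var tagAfter = std.mem.split(u8, tagBeforeAfter, targetTagStart);
        _ = tagAfter.next();
        while (tagAfter.next()) |tag| {
            _ = try stdout.write(try std.fmt.allocPrint(allocator, "{s}", .{tag}));
            break;
        }
        // One output line per point, even when the tag is missing.
        _ = try stdout.write("\n");
        break;
    }
}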
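To make the splitting logic easier to follow, here is a rough Python equivalent of the loop above. It is a sketch under assumptions: the point delimiter `<Trackpoint>` and the tag strings `<tag>` / `</tag>` are my guesses at what `points`, `targetTagStart`, and `targetTagEnd` hold, and `extract_tag_values` is a hypothetical name.

```python
def extract_tag_values(
    xml_text: str,
    tag_name: str,
    point_delimiter: str = "<Trackpoint>",  # assumed delimiter
) -> list[str]:
    """Mirror the Zig loop: one value (or empty string) per point."""
    tag_start = f"<{tag_name}>"   # assumed form of targetTagStart
    tag_end = f"</{tag_name}>"    # assumed form of targetTagEnd
    values = []
    # Skip the first chunk, which precedes the first point.
    for point in xml_text.split(point_delimiter)[1:]:
        # Keep only the text before the closing tag.
        before_end = point.split(tag_end)[0]
        # The piece after the opening tag is the value, if present.
        parts = before_end.split(tag_start)
        values.append(parts[1] if len(parts) > 1 else "")
    return values


sample = (
    "<Trackpoint><Time>t1</Time><DistanceMeters>1.5</DistanceMeters></Trackpoint>"
    "<Trackpoint><Time>t2</Time></Trackpoint>"
)
print(extract_tag_values(sample, "DistanceMeters"))  # ['1.5', '']
```

Like the Zig version, this is plain string splitting rather than XML parsing, so it only works when the tags appear exactly in the expected form.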
Python acts as the controller. A function builds the Zig executable on whatever machine the package is running on. The trickiest part was getting the executable to a place where it was accessible. I'm probably doing something wrong here, but I got it working by resolving the path of the installed package and placing the executable in a known directory relative to it.
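A build step along these lines could be sketched as follows. This is not the package's actual build function: the function names, the source file name `extract.zig`, and the `ReleaseFast` optimization mode are assumptions, and it requires the `zig` compiler to be on the PATH.

```python
import os
import subprocess


def zig_build_command(source_path: str, output_path: str) -> list[str]:
    """Assemble a `zig build-exe` command line (hypothetical helper)."""
    return [
        "zig", "build-exe", source_path,
        "-O", "ReleaseFast",            # assumed optimization mode
        f"-femit-bin={output_path}",    # write the binary to a known path
    ]


def build_executable(package_dir: str) -> str:
    """Compile the extractor into <package_dir>/zig/extract and return its path."""
    zig_dir = os.path.join(package_dir, "zig")
    source_path = os.path.join(zig_dir, "extract.zig")  # assumed file name
    output_path = os.path.join(zig_dir, "extract")
    subprocess.run(zig_build_command(source_path, output_path), check=True)
    return output_path
```

Calling `build_executable(os.path.abspath(os.path.dirname(__file__)))` from inside the package would place the binary next to the Python code, where it can be found again at run time.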
Here is the entirety of the Python extract.py file:
import subprocess
import os


def get_tag(filepath: str, tag_name: str) -> str:
    cwd = os.path.abspath(os.path.dirname(__file__))
    abs_path = os.path.join(cwd, 'zig', 'extract')
    return subprocess.check_output([abs_path, filepath, tag_name]).decode('utf-8')


def extract(filepath: str, tag_name: str) -> list[str]:
    result = get_tag(filepath=filepath, tag_name=tag_name)
    if len(result) > 0:
        return result.split('\n')[:-1]
    else:
        return []
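Because the executable prints one line per point, the lists returned for different tags should line up row by row. Here is a minimal sketch of collecting several tags into columns. `build_columns` is a hypothetical helper, and the extractor is passed in as `extract_fn` so the sketch can run without the Zig binary; in practice it would be the `extract` function above.

```python
from typing import Callable


def build_columns(
    filepath: str,
    tag_names: list[str],
    extract_fn: Callable[[str, str], list[str]],
) -> dict[str, list[str]]:
    """Collect one list of values per tag, checking that the lengths agree."""
    columns = {tag: extract_fn(filepath, tag) for tag in tag_names}
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    return columns
```

A dict shaped like this could then be handed to a DataFrame library to write the parquet file.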
This was my first time publishing a package, so everything was a challenge. Here are some things I'd like to work on next to improve its usefulness and just for fun: