data engineering

Rewriting my day trading bot to use dbt and DuckDB

Mar 27, 2024

This week, I started rewriting my day trading bot from scratch. The original bot was created in 2019 and has not aged well, due to the many additions I've made in a hurry. On the other hand, it's been a steady money maker, even through rough conditions, thanks to a conservative strategy. When TD Ameritrade's API stopped working (TD Ameritrade was bought by Schwab), I decided to use the opportunity to give the bot some more attention while I switch to the TradeStation API.

Repo: alhankeser/borsa


  • Decouple the API provider: The original bot was too tightly coupled to the TD Ameritrade API, so I'm going to try to avoid repeating that mistake this go-around with TradeStation.
  • Faster backtesting: Backtesting took a while with my original bot, so I'm hoping to cut that down. I believe that running transformation logic in SQL (dbt + DuckDB) instead of pandas will have a significant impact here. I might even find some uses for Zig, but only if there are real performance issues (other than Python itself).
  • Portfolio strategies: I only had one strategy, used on a few select stocks. I'd like to set up a structure that can test and run different strategies against different portfolios of stocks.
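To make the first goal concrete, here is a minimal sketch of what "decouple the API provider" could look like: the bot depends on an abstract broker interface, and any concrete provider (TradeStation, a backtest stub) plugs in behind it. All names here are hypothetical, not from the repo.

```python
from abc import ABC, abstractmethod


class BrokerAPI(ABC):
    """Minimal surface the bot needs from any broker (illustrative)."""

    @abstractmethod
    def get_quote(self, symbol: str) -> float: ...

    @abstractmethod
    def place_order(self, symbol: str, qty: int, side: str) -> str: ...


class FakeBroker(BrokerAPI):
    """In-memory stand-in, handy for backtests and unit tests."""

    def __init__(self, prices):
        self.prices = prices
        self.orders = []

    def get_quote(self, symbol: str) -> float:
        return self.prices[symbol]

    def place_order(self, symbol: str, qty: int, side: str) -> str:
        self.orders.append((symbol, qty, side))
        return f"order-{len(self.orders)}"


def maybe_buy(broker: BrokerAPI, symbol: str, limit: float):
    # Bot logic depends only on the interface, never on a concrete provider,
    # so swapping TD Ameritrade for TradeStation touches one class, not the bot.
    if broker.get_quote(symbol) <= limit:
        return broker.place_order(symbol, 1, "buy")
    return None
```

A real TradeStation implementation would live behind the same interface, keeping the strategy code untouched when the provider changes.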

Why dbt and DuckDB

Many of the transformations I need express naturally as SQL queries, and it's quite possible they'll run faster in SQL than in pandas, which matters mainly for backtesting. As for live day trading, I don't need blazing speed since I'm only making a trading decision once a minute.

Also, I feel like it: I want to better understand DuckDB's use cases and its drawbacks.

Update 1: April 1, 2024

Recursion in dbt macros

In my first week of rewriting the bot, I've realized that I can do nearly everything I need in dbt, with some Python (or language of choice) as a controller. I almost thought I could get away with no loops of any kind, but I found myself needing a recursive macro that generates CTEs. The reason: I need to backtest buy decisions where the sell event depends on conditions tied to the buy price (think trailing stops). That's easy when there's only one trade in a day, but there could be multiple trades, and figuring out when the next buy event takes place and re-running the same analysis made my brain melt. So recurse, I did.

While it isn't as elegant a solution as I was hoping for, it is light-years faster than the many loops in my original pandas approach. Here is the macro in question at this moment in time. It takes a list of potential buys (as suggested by the trading strategies), buys on the first one, and holds until the first sell event. It then calls itself to create a subsequent CTE filtered to the timestamps that come after the most recent sell. It does that a maximum of 3 times, as that is my current limit on the number of trades per day (arbitrary at the moment, for testing purposes).
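The recursion is easier to see outside of Jinja. Here is a rough Python equivalent of what the macro computes; the sell condition is a simplified trailing-stop stand-in (exit 1% below the buy price), and all field names are illustrative, not the real model's.

```python
MAX_TRADES = 3  # current (arbitrary) per-day trade limit


def simulate_trades(rows, depth=1):
    """rows: minute bars sorted by timestamp, each a dict with
    'ts', 'price', and 'buy_signal'. Returns the trades taken."""
    buys = [r for r in rows if r["buy_signal"]]
    if not buys or depth > MAX_TRADES:
        return []
    buy = buys[0]  # buy on the first signal
    # Simplified sell rule for illustration: first bar after the buy whose
    # price falls 1% below the buy price (the real rules are more involved).
    sell = next(
        (r for r in rows if r["ts"] > buy["ts"] and r["price"] <= buy["price"] * 0.99),
        None,
    )
    if sell is None:
        return [{"buy_ts": buy["ts"], "sell_ts": None}]
    # Recurse on everything after the sell, like the macro's next CTE does.
    remaining = [r for r in rows if r["ts"] > sell["ts"]]
    return [{"buy_ts": buy["ts"], "sell_ts": sell["ts"]}] + simulate_trades(
        remaining, depth + 1
    )
```

Each recursive call corresponds to one generated CTE in the macro, filtered to timestamps after the previous sell.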

Where to put strategies

My strategies are quite simple: each is a set of conditions under which I buy. My sell conditions are all based on open profit and some dynamic trailing stops (not yet implemented in the latest version). At first, I thought I would build these in Python like I did the first time around, but it turns out that passing variables to dbt via dbt_project.yml vars works as expected for dictionary-like objects. The below works, which means I can call a dbt run from anywhere and override the variable:

# dbt_project.yml

vars:
  strategies: [
    {
      id: "test1",
      buy: "(sma0 > sma5)",
    },
    {
      id: "test2",
      buy: "(sma0 > sma5 and sma5 > sma10)",
    },
  ]

Then in a macro that unions all strategies, I can do this:

{% macro union_strategies(strategies=var("strategies"), model=ref("int__indicators")) %}
    (
        {% for strategy in strategies %}
            select
                '{{ strategy['id'] }}' as strategy_id,
                {{ strategy['buy'] }} as buy,
            from {{ model }}
            {% if not loop.last %}
                union all
            {% endif %}
        {% endfor %}
    ) unioned_strategies
{% endmacro %}
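Stripped of Jinja, the rendered SQL is just one select per strategy, glued together with `union all`. A quick pure-Python sketch of that shape (hypothetical helper, just to show what the macro emits):

```python
def render_union(strategies, model):
    """Render one select per strategy and join them with 'union all'
    (illustrative stand-in for the Jinja loop above)."""
    selects = [
        f"select '{s['id']}' as strategy_id, {s['buy']} as buy from {model}"
        for s in strategies
    ]
    return "\nunion all\n".join(selects)
```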

Doing all of this in code in an easy-to-test way would be much more involved. As I built out the various pieces of this project in SQL, I realized that it's actually one of the better options for such a project, and I don't feel like I'm shoehorning the trading bot into dbt and DuckDB.

Update 2: April 8, 2024

Most of my time has gone toward finding a quick and easy way to visualize the backtest results. It's challenging because I need to see not only the overall results of each strategy, but also individual days, in order to tune the rules. Comparing against an ML model will be a future step, as opposed to all of this manual work. To get started, though, I'd like to identify a baseline strategy that works decently well.

Visualization options

I played around with both Plotly Dash and Vega. While I quickly got both working for simple things, I was not able to reach a more advanced stage in a reasonable amount of time, so I cut my losses and went back to what I can dependably create complex visuals with: good ol' matplotlib. It's slow to generate plots, but fast for me to develop with, and what's holding me back right now is not the performance of my app but the time I can dedicate to it.

Other improvements

  • Created a utility to upload the backtest results as parquet to Google Cloud Storage, which then gets transferred to BigQuery and visualized in Hex. I have not set up a whole pipeline just for this yet. I need to look into whether I can get a parquet file straight into Hex, if that's even an option.
  • Limited trading time to the first two hours after market open, which greatly improved profits.

Update 3: May 11, 2024

I added an "optimal" strategy that is included by default, so I can compare backtest results against it. That way, I can see how much each strategy is leaving on the table. To create this optimal strategy, I'm using the SQL version of finding the maximum subarray difference, where the earlier value must be the lower one:


optimal_strategy as (
	select
		i1.price as best_buy_price,
		i1.ts as best_buy_ts,
		i1.minutes_since_open as best_buy_minutes_since_open,
		i2.price as best_sell_price,
		i2.ts as best_sell_ts,
		i2.minutes_since_open as best_sell_minutes_since_open,
		i2.price - i1.price as best_price_diff,
	from indicators as i1
	left join
		indicators as i2
		on i2.ts > i1.ts
		and i2.ts_day = i1.ts_day
		and i2.symbol = i1.symbol
	where i2.price - i1.price > 0
		and i1.minutes_since_open >= {{ var("buy_after_minutes") }}
	qualify
		row_number() over (
			partition by i1.symbol, i1.ts_day
			order by i2.price - i1.price desc
		) = 1
)
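The same search, sketched in plain Python for clarity: among all (buy, sell) pairs where the sell comes after the buy, keep the pair with the largest positive price difference. Like the self-join above, this is O(n²) per day; names are illustrative, and the `buy_after_minutes` filter is omitted for brevity.

```python
def optimal_trade(bars):
    """bars: list of (minutes_since_open, price) tuples, sorted by time.
    Returns (buy_ts, sell_ts, diff) for the best single trade, or None
    if no profitable pair exists (illustrative version of the SQL above)."""
    best = None
    for i, (buy_ts, buy_price) in enumerate(bars):
        for sell_ts, sell_price in bars[i + 1:]:
            diff = sell_price - buy_price
            # Keep only pairs where the buy precedes the sell and is cheaper.
            if diff > 0 and (best is None or diff > best[2]):
                best = (buy_ts, sell_ts, diff)
    return best
```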

More about the switch from TD Ameritrade to TradeStation

At some point last year, TD Ameritrade accounts got migrated to Charles Schwab. Despite promises to the contrary, one of the victims of this acquisition was the TD Ameritrade API, which was one of the better "free" options available. Last I checked, Schwab has not re-enabled the API for the common people. (Update: I got an email late March saying that it was again available for non-business users.)

That's where TradeStation comes in. I've been eyeing the platform for a while as it seems simpler and they have a documented API, which is a nice change from TD Ameritrade.

Last update: May 12, 2024