The martingalebot package provides functions to download
cryptocurrency price data from Binance and to perform backtesting and
parameter optimization for a single-pair martingale trading strategy, as
implemented by single-pair DCA bots on 3commas, Pionex, TradeSanta, Mizar, OKX,
Bitget and others.
There are three different functions to download data from Binance:
get_binance_klines(),
get_binance_klines_from_csv() and
get_binance_prices_from_csv(). The function
get_binance_klines() downloads candlestick data
directly. The user can specify the trading pair, the start and end time,
and the time frame of the candles. For example, to download hourly
candles for ETHUSDT from the first of January to the first
of March 2025, one could specify:
get_binance_klines(symbol = 'ETHUSDT',
start_time = '2025-01-01',
end_time = '2025-03-01',
interval = '1h')
#> open_time open high low close close_time
#> <POSc> <num> <num> <num> <num> <POSc>
#> 1: 2025-01-01 00:00:00 3339.88 3345.98 3328.47 3337.78 2025-01-01 00:59:59
#> 2: 2025-01-01 01:00:00 3337.78 3365.71 3335.84 3363.70 2025-01-01 01:59:59
#> 3: 2025-01-01 02:00:00 3363.69 3366.40 3342.67 3346.54 2025-01-01 02:59:59
#> 4: 2025-01-01 03:00:00 3346.54 3368.42 3346.35 3362.61 2025-01-01 03:59:59
#> 5: 2025-01-01 04:00:00 3362.61 3363.72 3351.00 3355.20 2025-01-01 04:59:59
#> ---
#> 1413: 2025-02-28 20:00:00 2228.12 2238.69 2221.30 2230.15 2025-02-28 20:59:59
#> 1414: 2025-02-28 21:00:00 2230.14 2238.28 2198.51 2216.13 2025-02-28 21:59:59
#> 1415: 2025-02-28 22:00:00 2216.12 2234.60 2210.35 2225.30 2025-02-28 22:59:59
#> 1416: 2025-02-28 23:00:00 2225.31 2231.96 2209.76 2216.58 2025-02-28 23:59:59
#> 1417: 2025-03-01 00:00:00 2216.59 2239.69 2213.57 2237.59 2025-03-01 00:59:59

An advantage of get_binance_klines() is that it can
download price data up to the current time. A disadvantage is that the
lowest time frame for the candles is 1 minute.
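For instance, a minimal sketch of fetching recent 1-minute candles (the '1m' interval string is an assumption based on Binance's kline naming and may need adjusting):

# Hedged sketch: 1-minute candles for roughly the last day
get_binance_klines(symbol = 'BTCUSDT',
                   start_time = format(Sys.Date() - 1),
                   end_time = format(Sys.Date()),
                   interval = '1m')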
The function get_binance_klines_from_csv() downloads
candlestick data via csv files from https://data.binance.vision/. The advantages of this
method are that it is faster for large amounts of data and that the
lowest time frame for the candles is 1 second. A disadvantage is that it
can only download price data up to 1-2 days ago, as the csv files on https://data.binance.vision are
only updated once per day.
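Assuming it shares the argument names of get_binance_klines(), a call for 1-second candles might look as follows (the '1s' interval string is an assumption):

# Hedged sketch: 1-second candles via the daily csv dumps
get_binance_klines_from_csv('ETHUSDT',
                            start_time = '2025-01-01',
                            end_time = '2025-01-02',
                            interval = '1s')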
The function get_binance_prices_from_csv() also
downloads price data via csv files from https://data.binance.vision/ and thus shares the same
advantages and disadvantages, but it downloads aggregated trades instead
of candlestick data. This allows for an even finer time resolution, as it
returns all traded prices of a coin over time. Knowing the exact price
at each point in time is particularly helpful for backtesting martingale
bots with trailing buy and sell orders. The function
get_binance_prices_from_csv() returns a data frame with
only two columns. See, for example:
get_binance_prices_from_csv('LTCBTC',
start_time = '2025-01-01',
end_time = '2025-02-01', progressbar = F)
#> time price
#> <POSc> <num>
#> 1: 2025-01-01 00:00:21 0.001105
#> 2: 2025-01-01 00:00:22 0.001104
#> 3: 2025-01-01 00:00:35 0.001104
#> 4: 2025-01-01 00:01:11 0.001105
#> 5: 2025-01-01 00:01:13 0.001105
#> ---
#> 133794: 2025-02-01 23:59:31 0.001174
#> 133795: 2025-02-01 23:59:35 0.001173
#> 133796: 2025-02-01 23:59:38 0.001173
#> 133797: 2025-02-01 23:59:41 0.001172
#> 133798: 2025-02-01 23:59:43 0.001173

Since this function returns very large amounts of data for frequently
traded pairs such as BTCUSDT, it is, by default,
parallelized and shows a progress bar. Currently, the functions
backtest and grid_search are implemented in
such a way that they expect the price data to be in the format as
returned by this function.
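If you only have candlestick data, one rough way to obtain this two-column format is to take each candle's close price. This is a sketch, not package code; it assumes the kline result is a data.table (as its printed form above suggests) and it discards the intra-candle price path:

# Hedged sketch: reshape kline data into the time/price format
library(data.table)
klines <- get_binance_klines(symbol = 'ETHUSDT',
                             start_time = '2025-01-01',
                             end_time = '2025-03-01',
                             interval = '1m')
prices <- klines[, .(time = close_time, price = close)]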
Before running scans, you can quickly inspect how a parameter set allocates capital across safety orders, where orders are placed, and where the take‑profit lands.
# Capital allocation by price level (horizontal stacked bars)
plot_martingale_config(
starting_price = 100,
n_safety_orders = 8,
pricescale = 2.4,
volumescale = 1.5,
take_profit = 2.4,
stepscale = 1.1,
plot_type = "allocation"
)

# Timeline view (order sequence with buy amounts and final TP point)
plot_martingale_config(
starting_price = 100,
n_safety_orders = 8,
pricescale = 2.4,
volumescale = 1.5,
take_profit = 2.4,
stepscale = 1.1,
plot_type = "timeline"
)

To perform a backtest of a martingale bot, we first download price
data for a specific time period and trading pair with
get_binance_prices_from_csv() and then apply
backtest to it. The tested martingale bot can be set up
with the following parameters:
base_order_volume: The size of the base order (in
the quote currency)
first_safety_order_volume: The size of the first
safety order (in the quote currency)
n_safety_orders: The maximum number of safety
orders
pricescale: Price deviation to open safety orders (%
from initial order)
volumescale: With what number should the funds used
by the last safety order be multiplied?
take_profit: At what percentage in profit should the
bot close the deal?
stepscale: With what number should the price
deviation percentage used by the last safety order be
multiplied?
stoploss: At what percentage of draw down should a
stop-loss be triggered? If set to zero (default), a stop-loss will never
be triggered.
start_asap: Should new deals be started immediately
after the previous deal was closed? If set to FALSE, new
deals are only started where the logical vector deal_start
in data is TRUE.
use_emergency_stop: Whether to honor an external
emergency stop signal during backtesting. If set to TRUE,
the logical column emergency_stop in data (if
present) will immediately close any open deal when it becomes
TRUE and prevent new deals while it is
TRUE.
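To make the geometric scaling concrete, the following sketch (not package code) computes the safety-order ladder implied by a parameter set, assuming each step multiplies the previous step's price deviation by stepscale and the previous order's volume by volumescale:

# Illustrative sketch of the safety-order ladder (assumed semantics)
pricescale <- 2.4; stepscale <- 1.1
volumescale <- 1.5; first_safety_order_volume <- 10
n_safety_orders <- 4
steps <- pricescale * stepscale^(0:(n_safety_orders - 1))
data.frame(order = seq_len(n_safety_orders),
           deviation_pct = cumsum(steps),  # % below the initial order price
           volume = first_safety_order_volume *
             volumescale^(0:(n_safety_orders - 1)))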
If we don’t specify any of these arguments, the default parameter
settings will be used. To show the default settings, type
args(backtest) or go to the help file with
?backtest.
dat <- get_binance_prices_from_csv('BONKUSDT',
start_time = '2025-03-01',
end_time = '2025-07-01',
progressbar = F)
dat |> backtest()
#> # A tibble: 1 × 10
#> profit n_trades max_draw_down required_capital covered_deviation
#> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 50.4 174 33.3 503. 19.2
#> # ℹ 5 more variables: down_tolerance <dbl>, max_time <dbl>,
#> #   percent_inactive <dbl>, n_stoploss <int>, n_emergency_stops <int>

The backtest function returns the following measures:
profit: The percentage of profit the bot made during
the tested time period.
n_trades: The number of deals (cycles) that have
been closed.
max_draw_down: The biggest draw down in percent that
occurred.
required_capital: How much capital is needed to run
a bot with the used parameter settings.
covered_deviation: The percentage price deviation
from the initial order to the last safety order.
down_tolerance: The percentage price deviation from
the initial order price to the take profit price when all safety orders
are used up.
max_time: The maximum number of days the bot was in
a stuck position (maximum number of days of being fully
invested).
percent_inactive: The percentage of time the bot was
in a stuck position. That is, all safety orders were filled and the bot
was fully invested.
n_stoploss: The number of stop-losses that were
triggered.
n_emergency_stops: The number of emergency stops
that were triggered by the emergency_stop signal
when use_emergency_stop = TRUE.
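These measures can be combined into custom summaries. For example, an illustrative risk-adjusted score (not a package metric):

dat |>
  backtest() |>
  dplyr::mutate(risk_adjusted = profit / (1 + max_draw_down))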
If the argument plot is TRUE, an
interactive plot showing the changes in capital and price of the traded
cryptocurrency over time is produced. Buys, sells and stop-losses are
displayed as red, green and blue dots, respectively.
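For example, to produce this interactive plot for the default configuration:

dat |> backtest(plot = TRUE)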
By default, new trades are started as soon as possible. If the price
data set contains a logical vector deal_start and the
argument start_asap is set to FALSE, new deals
are only started where the logical vector deal_start in
data is TRUE. We can add a deal start
condition, for example based on the Relative Strength Index (RSI), by
using one of the add_*_filter functions. We can specify the
time frame for the candles, the number of candles that are considered
and the cutoff for creating the logical vector deal_start.
In the following example, new deals are only started if the hourly RSI
is below 30. You can see in the plot that there are no buys (red dots)
at peaks of the price curve anymore. However, the performance is
slightly worse because there are now fewer trades in total.
dat |>
add_rsi_filter(time_period = "1 hour", n = 7, cutoff = 30) |>
backtest(start_asap = FALSE, plot = TRUE)

Other useful deal-start filters provided by the package:
dat |>
add_sma_filter(n = 100, column_name = "deal_start", price_is_above = TRUE) |>
backtest(start_asap = FALSE)
#> # A tibble: 1 × 10
#> profit n_trades max_draw_down required_capital covered_deviation
#> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 46.0 160 33.3 503. 19.2
#> # ℹ 5 more variables: down_tolerance <dbl>, max_time <dbl>,
#> #   percent_inactive <dbl>, n_stoploss <int>, n_emergency_stops <int>

dat |>
add_bollinger_filter(time_period = "1 hour", n = 20, cutoff = 0.10,
column_name = "deal_start", signal_on_below = TRUE) |>
backtest(start_asap = FALSE)
#> # A tibble: 1 × 10
#> profit n_trades max_draw_down required_capital covered_deviation
#> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 25.8 67 8.79 503. 19.2
#> # ℹ 5 more variables: down_tolerance <dbl>, max_time <dbl>,
#> #   percent_inactive <dbl>, n_stoploss <int>, n_emergency_stops <int>

dat |>
add_macd_filter(time_period = "4 hours", column_name = "deal_start",
macd_is_above_signal = TRUE) |>
backtest(start_asap = FALSE)
#> # A tibble: 1 × 10
#> profit n_trades max_draw_down required_capital covered_deviation
#> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 30.6 132 33.3 503. 19.2
#> # ℹ 5 more variables: down_tolerance <dbl>, max_time <dbl>,
#> #   percent_inactive <dbl>, n_stoploss <int>, n_emergency_stops <int>

dat_regime <- dat |>
add_rsi_filter(time_period = "1 week", n = 14, cutoff = 40,
column_name = "is_bull_regime", rsi_is_above = FALSE) |>
add_rsi_filter(time_period = "4 hours", n = 14, cutoff = 30,
column_name = "is_dip", rsi_is_above = FALSE)
dat_regime[, deal_start := is_bull_regime & is_dip]
dat_regime |>
backtest(start_asap = FALSE)
#> # A tibble: 1 × 10
#> profit n_trades max_draw_down required_capital covered_deviation
#> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 4.73 23 33.3 503. 19.2
#> # ℹ 5 more variables: down_tolerance <dbl>, max_time <dbl>,
#> #   percent_inactive <dbl>, n_stoploss <int>, n_emergency_stops <int>

Emergency stops are rare, high-conviction exit signals to protect the
bot from regime changes (e.g., start of a bear market) or extreme
momentum down moves. The following helpers produce a logical column
named emergency_stop that
backtest(..., use_emergency_stop = TRUE) will honor.
dat |>
add_rsi_filter(time_period = "1 week", n = 14, cutoff = 40,
column_name = "emergency_stop", rsi_is_above = FALSE) |>
backtest(use_emergency_stop = TRUE)
#> # A tibble: 1 × 10
#> profit n_trades max_draw_down required_capital covered_deviation
#> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 5.77 19 7.45 503. 19.2
#> # ℹ 5 more variables: down_tolerance <dbl>, max_time <dbl>,
#> #   percent_inactive <dbl>, n_stoploss <int>, n_emergency_stops <int>

dat |>
add_death_cross_filter(column_name = "emergency_stop") |>
backtest(use_emergency_stop = TRUE)
#> # A tibble: 1 × 10
#> profit n_trades max_draw_down required_capital covered_deviation
#> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 50.4 174 33.3 503. 19.2
#> # ℹ 5 more variables: down_tolerance <dbl>, max_time <dbl>,
#> #   percent_inactive <dbl>, n_stoploss <int>, n_emergency_stops <int>

dat |>
add_roc_filter(time_period = "1 day", n = 90, cutoff = -30,
smoothing_period = 7, column_name = "emergency_stop",
roc_is_below = TRUE) |>
backtest(use_emergency_stop = TRUE)
#> # A tibble: 1 × 10
#> profit n_trades max_draw_down required_capital covered_deviation
#> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 50.4 174 33.3 503. 19.2
#> # ℹ 5 more variables: down_tolerance <dbl>, max_time <dbl>,
#> #   percent_inactive <dbl>, n_stoploss <int>, n_emergency_stops <int>

dat |>
add_sma_filter(n = 200, column_name = "emergency_stop", price_is_above = FALSE) |>
backtest(use_emergency_stop = TRUE)
#> # A tibble: 1 × 10
#> profit n_trades max_draw_down required_capital covered_deviation
#> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 0 0 0 503. 19.2
#> # ℹ 5 more variables: down_tolerance <dbl>, max_time <dbl>,
#> #   percent_inactive <dbl>, n_stoploss <int>, n_emergency_stops <int>

dat |>
add_bollinger_filter(time_period = "1 day", n = 20, cutoff = 0.95,
column_name = "emergency_stop", signal_on_below = FALSE) |>
backtest(use_emergency_stop = TRUE)
#> # A tibble: 1 × 10
#> profit n_trades max_draw_down required_capital covered_deviation
#> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 42.5 149 20.3 503. 19.2
#> # ℹ 5 more variables: down_tolerance <dbl>, max_time <dbl>,
#> #   percent_inactive <dbl>, n_stoploss <int>, n_emergency_stops <int>

Notes:
- Emergency stops should be infrequent; prefer higher timeframes and conservative thresholds.
- You can combine multiple stops with an OR condition, e.g. dat[, emergency_stop := stop1 | stop2].
- Emergency stops are displayed as purple dots in the plot.
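As a concrete version of the OR-combination note above (the column names stop1 and stop2 are placeholders for illustration):

# Sketch: combine two emergency-stop signals with OR
dat_stops <- dat |>
  add_rsi_filter(time_period = "1 week", n = 14, cutoff = 40,
                 column_name = "stop1", rsi_is_above = FALSE) |>
  add_death_cross_filter(column_name = "stop2")
dat_stops[, emergency_stop := stop1 | stop2]  # data.table syntax, as above
dat_stops |> backtest(use_emergency_stop = TRUE)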
To find the best parameter set for a given time period, we can
perform a grid search using the function grid_search. This
function takes possible values of martingale bot parameters, runs the
function backtest with each possible combination of these
values and returns the results as a data frame. Each row of this data
frame contains the result of one possible combination of parameters.
Since doing a grid search can be computationally expensive, the
grid_search function is parallelized by default.
By default, grid_search uses a broad range of
parameters. For example, for n_safety_orders, values
between 6 and 16 in steps of 2 are tested (see
args(grid_search) for the default parameter ranges).
However, we could also use, for example, values between 4 and 6 by
explicitly specifying them:
res <- dat |>
grid_search(n_safety_orders = 4:6, progressbar = F)
res
#> # A tibble: 628 × 20
#> profit n_trades max_draw_down required_capital covered_deviation
#> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 51.2 124 32.3 640 18
#> 2 51.2 124 32.3 640 18
#> 3 44.0 164 32.5 640 18
#> 4 44.0 164 32.5 640 18
#> 5 42.4 204 33.3 423. 18
#> 6 42.4 204 33.3 423. 18
#> 7 39.0 251 34.1 273. 18
#> 8 39.0 251 34.1 273. 18
#> 9 37.7 161 33.7 423. 15.6
#> 10 37.7 161 33.7 423. 15.6
#> # ℹ 618 more rows
#> # ℹ 15 more variables: down_tolerance <dbl>, max_time <dbl>,
#> # percent_inactive <dbl>, n_stoploss <int>, n_emergency_stops <int>,
#> # base_order_volume <dbl>, first_safety_order_volume <dbl>,
#> # n_safety_orders <int>, pricescale <dbl>, volumescale <dbl>,
#> # take_profit <dbl>, stepscale <dbl>, start_asap <lgl>, stoploss <dbl>,
#> #   compound <lgl>

The rows of the returned data frame are ordered by the column
profit. In the first row, we see the set of parameters that
led to the highest profit. To plot the best-performing parameter set, we
can pass the values from the first row of res as arguments
to backtest using purrr::exec(). This function
takes a function as its first argument and a list of parameters as its
second, which we can create on the fly.
# First, run the grid search
res <- dat |>
grid_search(n_safety_orders = 4:6, progressbar = FALSE)
# Then, plot the best result
# We extract the first row as a list of parameters
best_params <- res |> dplyr::slice(1)
# And pass them to backtest using the !!! (big bang) operator
purrr::exec(backtest, !!!best_params, data = dat, plot = TRUE)

Instead of picking the most profitable parameter constellation, we
could also pick the one with the best compromise between
profit and max_draw_down by replacing the
command slice(1) with
slice_max(profit - max_draw_down).
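In code, this could look like the following sketch:

# Pick the parameter set with the best profit/draw-down trade-off
best_tradeoff <- res |>
  dplyr::slice_max(profit - max_draw_down, n = 1, with_ties = FALSE)
purrr::exec(backtest, !!!best_tradeoff, data = dat, plot = TRUE)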
It should be noted that the grid_search function also
has the following arguments that allow restricting the search space:
min_covered_deviation: the minimum percentage price
deviation from the initial order to the last safety order a given
parameter combination must have. Parameter combinations that have a
covered price deviation less than this value are discarded and not
tested.
min_down_tolerance: the minimum price down tolerance
(i.e. percentage price deviation from the initial order price to the
take profit price when all safety orders are filled) a given parameter
combination must have. Parameter combinations that have a price down
tolerance less than this value are discarded and not tested.
max_required_capital: the maximum capital a given
parameter combination can require. Parameters that require more capital
than this value are discarded and not tested.
This can be handy because we might only want to search for optimal parameter combinations within a set of parameters that have a minimum “down tolerance” and thus a certain robustness against sudden price drops. In this case, it would be a waste of computation time to test all possible combinations of parameters.
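For example, to search only among parameter combinations with at least 12% down tolerance that require at most 1000 units of capital (the thresholds are illustrative):

# Restrict the grid search before any backtests are run
res_restricted <- dat |>
  grid_search(min_down_tolerance = 12,
              max_required_capital = 1000,
              progressbar = FALSE)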
Instead of performing a grid search, we can also search for the best parameter combination with built-in optimization helpers.
# Optimize for profit using Differential Evolution
best_de <- de_search(
data = dat,
objective_metric = "profit",
DEoptim_control = list(itermax = 50, NP = 64, trace = FALSE)
)
best_de
#> # A tibble: 1 × 18
#> profit n_trades max_draw_down required_capital covered_deviation
#> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 80.6 126 15.0 315. 26.9
#> # ℹ 13 more variables: down_tolerance <dbl>, max_time <dbl>,
#> # percent_inactive <dbl>, n_stoploss <int>, n_emergency_stops <int>,
#> # n_safety_orders <dbl>, pricescale <dbl>, volumescale <dbl>,
#> # take_profit <dbl>, stepscale <dbl>, stoploss <dbl>,
#> # base_order_volume <dbl>, first_safety_order_volume <dbl>
# Plot the best configuration found by DE
best_de %>% {exec(backtest, !!!., data = dat, plot = TRUE)}

You can also optimize a custom metric, e.g. a simple risk-adjusted target:
best_de_custom <- de_search(
data = dat,
objective_metric = "profit / (1 + max_draw_down)",
DEoptim_control = list(itermax = 40, NP = 48, trace = FALSE)
)
best_de_custom
#> # A tibble: 1 × 18
#> profit n_trades max_draw_down required_capital covered_deviation
#> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 75.6 151 14.7 259. 27.3
#> # ℹ 13 more variables: down_tolerance <dbl>, max_time <dbl>,
#> # percent_inactive <dbl>, n_stoploss <int>, n_emergency_stops <int>,
#> # n_safety_orders <dbl>, pricescale <dbl>, volumescale <dbl>,
#> # take_profit <dbl>, stepscale <dbl>, stoploss <dbl>,
#> #   base_order_volume <dbl>, first_safety_order_volume <dbl>

# Random search to explore the space broadly
rand <- random_search(
data = dat,
n_samples = 200,
progressbar = FALSE
)
# Inspect top candidates
rand %>%
dplyr::slice_max(profit, n = 5)
#> # A tibble: 5 × 20
#> profit n_trades max_draw_down required_capital covered_deviation
#> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 69.5 133 13.2 440. 24.1
#> 2 59.4 97 32.0 525. 21.3
#> 3 53.4 106 31.8 741. 18.8
#> 4 49.2 171 12.8 2627. 21.2
#> 5 45.0 190 12.8 3115. 21.3
#> # ℹ 15 more variables: down_tolerance <dbl>, max_time <dbl>,
#> # percent_inactive <dbl>, n_stoploss <int>, n_emergency_stops <int>,
#> # base_order_volume <dbl>, first_safety_order_volume <dbl>,
#> # n_safety_orders <dbl>, pricescale <dbl>, volumescale <dbl>,
#> # take_profit <dbl>, stepscale <dbl>, stoploss <dbl>, start_asap <lgl>,
#> # compound <lgl>
# Plot the best one
rand %>%
dplyr::slice_max(profit, n = 1) %>%
{exec(backtest, !!!., data = dat, plot = TRUE)}

You can restrict the search space by providing lower/upper bounds (DE) or ranges (random search).
# Differential Evolution with custom bounds
best_de_bounds <- de_search(
data = dat,
objective_metric = "profit",
n_safety_orders_bounds = c(6, 14),
pricescale_bounds = c(1.2, 3.2),
volumescale_bounds = c(1.0, 2.0),
take_profit_bounds = c(1.0, 3.0),
stepscale_bounds = c(0.8, 1.2),
stoploss_bounds = c(0, 40),
base_order_volume_bounds = c(10, 50),
first_safety_order_volume_bounds = c(10, 50),
DEoptim_control = list(itermax = 40, NP = 48, trace = FALSE)
)
best_de_bounds
#> # A tibble: 1 × 18
#> profit n_trades max_draw_down required_capital covered_deviation
#> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 80.8 147 14.6 1096. 27.0
#> # ℹ 13 more variables: down_tolerance <dbl>, max_time <dbl>,
#> # percent_inactive <dbl>, n_stoploss <int>, n_emergency_stops <int>,
#> # n_safety_orders <dbl>, pricescale <dbl>, volumescale <dbl>,
#> # take_profit <dbl>, stepscale <dbl>, stoploss <dbl>,
#> # base_order_volume <dbl>, first_safety_order_volume <dbl>
# Random search with matching ranges and pre-filters
rand_bounds <- random_search(
data = dat,
n_samples = 200,
n_safety_orders_bounds = c(6, 14),
pricescale_bounds = c(1.2, 3.2),
volumescale_bounds = c(1.0, 2.0),
take_profit_bounds = c(1.0, 3.0),
stepscale_bounds = c(0.8, 1.2),
stoploss_values = c(0, 25, 30, 40),
min_covered_deviation = 8,
min_down_tolerance = 8,
max_required_capital = 10000,
progressbar = FALSE
)
rand_bounds %>%
dplyr::slice_max(profit, n = 1)
#> # A tibble: 1 × 20
#> profit n_trades max_draw_down required_capital covered_deviation
#> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 62.1 117 31.8 204. 22.8
#> # ℹ 15 more variables: down_tolerance <dbl>, max_time <dbl>,
#> # percent_inactive <dbl>, n_stoploss <int>, n_emergency_stops <int>,
#> # base_order_volume <dbl>, first_safety_order_volume <dbl>,
#> # n_safety_orders <dbl>, pricescale <dbl>, volumescale <dbl>,
#> # take_profit <dbl>, stepscale <dbl>, stoploss <dbl>, start_asap <lgl>,
#> #   compound <lgl>

In the previous examples, we used the same data for training and testing the algorithm. However, this most likely resulted in over-fitting and an over-optimistic performance estimate. A better strategy is to strictly separate training and testing by using cross-validation.
We first download a longer time period of price data so that we have more data for training and testing:
dat <- get_binance_prices_from_csv("ATOMUSDT",
start_time = '2022-01-01',
end_time = '2023-03-03', progressbar = F)

Next, we split our data into many different test and training time
periods. We can use the function create_timeslices to
create start and end times of the different splits. It has the following
four arguments:
train_months: The duration of the training periods in months
test_months: The duration of the testing periods in months
shift_months: The number of months by which consecutive pairs of training
and test periods are shifted relative to each other. The smaller this
number, the more pairs of training and test data sets are created
data: The price data set
For example, if we want to use 4 months for training, 4 months for testing and create training and testing periods every month, we could specify:
slices <- dat |>
create_timeslices(train_months = 4, test_months = 4, shift_months = 1)
slices
#> # A tibble: 7 × 5
#> period start_train end_train start_test
#> <dbl> <dttm> <dttm> <dttm>
#> 1 1 2022-01-01 00:00:00 2022-05-02 18:00:00 2022-05-02 18:00:00
#> 2 2 2022-01-31 10:30:00 2022-06-02 04:30:00 2022-06-02 04:30:00
#> 3 3 2022-03-02 21:00:00 2022-07-02 15:00:00 2022-07-02 15:00:00
#> 4 4 2022-04-02 07:30:00 2022-08-02 01:30:00 2022-08-02 01:30:00
#> 5 5 2022-05-02 18:00:00 2022-09-01 12:00:00 2022-09-01 12:00:00
#> 6 6 2022-06-02 04:30:00 2022-10-01 22:30:00 2022-10-01 22:30:00
#> 7 7 2022-07-02 15:00:00 2022-11-01 09:00:00 2022-11-01 09:00:00
#> # ℹ 1 more variable: end_test <dttm>

Note that these time periods are partially overlapping. If we want to
have non-overlapping time periods, we could specify
shift_months = 4.
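For example:

# Non-overlapping 4-month training and test periods
dat |>
  create_timeslices(train_months = 4, test_months = 4, shift_months = 4)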
We can now perform cross-validation by iterating over the rows of
slices. At each iteration, we perform a grid search for the
best parameter combination using the training data and then apply this
parameter combination to the test data. For simplicity, we only return
the final performance in the test data.
library(tidyverse)
slices %>%
group_by(start_test, end_test) %>%
reframe({
# Get test and training data of the present row / iteration
train_data <- filter(dat, between(time, start_train, end_train))
test_data <- filter(dat, between(time, start_test, end_test))
# Find the best parameter combination in the training data
best <- train_data |>
grid_search(progressbar = FALSE) |>
slice(1)
# Apply this parameter combination to the test data
pmap_df(best, backtest, data = test_data)
})
#> # A tibble: 7 × 12
#> start_test end_test profit n_trades max_draw_down
#> <dttm> <dttm> <dbl> <int> <dbl>
#> 1 2022-05-02 18:00:00 2022-09-01 12:00:00 -33.2 13 69.7
#> 2 2022-06-02 04:30:00 2022-10-01 22:30:00 39.3 350 12.8
#> 3 2022-07-02 15:00:00 2022-11-01 09:00:00 49.5 148 10.5
#> 4 2022-08-02 01:30:00 2022-12-01 19:30:00 13.5 119 29.6
#> 5 2022-09-01 12:00:00 2023-01-01 06:00:00 -8.29 96 31.7
#> 6 2022-10-01 22:30:00 2023-01-31 16:30:00 29.1 30 33.7
#> 7 2022-11-01 09:00:00 2023-03-03 03:00:00 0.628 15 39.7
#> # ℹ 7 more variables: required_capital <dbl>, covered_deviation <dbl>,
#> # down_tolerance <dbl>, max_time <dbl>, percent_inactive <dbl>,
#> #   n_stoploss <int>, n_emergency_stops <int>

We can see that two of the 7 tested time periods made a loss and a
third barely broke even.
This is because we only maximized profitability during training, which
likely led to the selection of “aggressive” or risky strategies that
work well in the training set but poorly in the test set due to little
robustness against sudden price drops. This is illustrated by the
relatively small price down tolerance, which varied between 8.2 and
10.9% for the selected parameter combinations (see column
down_tolerance in the above table). A potential solution to
this problem is therefore to restrict the search space to those
parameter combinations that have a minimum price down tolerance of, for
example, 12 %. We can do this by using the argument
min_down_tolerance of the grid_search
function:
library(tidyverse)
slices %>%
group_by(start_test, end_test) %>%
reframe({
train_data <- filter(dat, between(time, start_train, end_train))
test_data <- filter(dat, between(time, start_test, end_test))
best <- train_data |>
grid_search(min_down_tolerance = 12, progressbar = FALSE) |>
slice(1)
pmap_df(best, backtest, data = test_data)
})
#> # A tibble: 7 × 12
#> start_test end_test profit n_trades max_draw_down
#> <dttm> <dttm> <dbl> <int> <dbl>
#> 1 2022-05-02 18:00:00 2022-09-01 12:00:00 -28.0 6 67.3
#> 2 2022-06-02 04:30:00 2022-10-01 22:30:00 13.3 249 28.8
#> 3 2022-07-02 15:00:00 2022-11-01 09:00:00 13.4 244 9.82
#> 4 2022-08-02 01:30:00 2022-12-01 19:30:00 -6.80 286 29.4
#> 5 2022-09-01 12:00:00 2023-01-01 06:00:00 16.6 292 7.26
#> 6 2022-10-01 22:30:00 2023-01-31 16:30:00 15.7 42 37.0
#> 7 2022-11-01 09:00:00 2023-03-03 03:00:00 4.58 37 36.2
#> # ℹ 7 more variables: required_capital <dbl>, covered_deviation <dbl>,
#> # down_tolerance <dbl>, max_time <dbl>, percent_inactive <dbl>,
#> #   n_stoploss <int>, n_emergency_stops <int>

Except for the first and fourth time periods, all time periods are now in profit. However, this more conservative strategy came at the price of markedly lower profits in the second and third time periods.
Alternatively, we could also select the most profitable parameter combination only among those combinations that had little draw down and did not result in “red bags” for extended periods of time. For example, to select the most profitable parameter combination among those combinations that had no more than 30% draw down and that were no longer than 3% of the time fully invested in the training period, we could do:
library(tidyverse)
slices %>%
group_by(start_test, end_test) %>%
reframe({
train_data <- filter(dat, between(time, start_train, end_train))
test_data <- filter(dat, between(time, start_test, end_test))
best <- train_data |>
grid_search(progressbar = FALSE) |>
filter(max_draw_down < 30 & percent_inactive < 3) |>
slice(1)
pmap_df(best, backtest, data = test_data)
})
#> # A tibble: 7 × 12
#> start_test end_test profit n_trades max_draw_down
#> <dttm> <dttm> <dbl> <int> <dbl>
#> 1 2022-05-02 18:00:00 2022-09-01 12:00:00 -33.1 26 69.6
#> 2 2022-06-02 04:30:00 2022-10-01 22:30:00 13.4 448 28.2
#> 3 2022-07-02 15:00:00 2022-11-01 09:00:00 13.0 431 10.0
#> 4 2022-08-02 01:30:00 2022-12-01 19:30:00 21.6 336 10.1
#> 5 2022-09-01 12:00:00 2023-01-01 06:00:00 19.0 270 8.73
#> 6 2022-10-01 22:30:00 2023-01-31 16:30:00 -1.78 74 39.0
#> 7 2022-11-01 09:00:00 2023-03-03 03:00:00 1.05 25 39.1
#> # ℹ 7 more variables: required_capital <dbl>, covered_deviation <dbl>,
#> # down_tolerance <dbl>, max_time <dbl>, percent_inactive <dbl>,
#> #   n_stoploss <int>, n_emergency_stops <int>

Another option would be to select the parameter combination that
maximizes a combination of measures, such as
profit - max_draw_down - percent_inactive.
library(tidyverse)
slices %>%
group_by(start_test, end_test) %>%
reframe({
train_data <- filter(dat, between(time, start_train, end_train))
test_data <- filter(dat, between(time, start_test, end_test))
best <- train_data |>
grid_search(progressbar = FALSE) |>
slice_max(profit - max_draw_down - percent_inactive)
pmap_df(best, backtest, data = test_data)
})
#> # A tibble: 24 × 12
#> start_test end_test profit n_trades max_draw_down
#> <dttm> <dttm> <dbl> <int> <dbl>
#> 1 2022-05-02 18:00:00 2022-09-01 12:00:00 -33.2 13 69.7
#> 2 2022-05-02 18:00:00 2022-09-01 12:00:00 22.4 435 37.3
#> 3 2022-05-02 18:00:00 2022-09-01 12:00:00 13.4 435 41.9
#> 4 2022-05-02 18:00:00 2022-09-01 12:00:00 -6.08 419 48.9
#> 5 2022-06-02 04:30:00 2022-10-01 22:30:00 39.3 350 12.8
#> 6 2022-06-02 04:30:00 2022-10-01 22:30:00 39.3 350 12.8
#> 7 2022-06-02 04:30:00 2022-10-01 22:30:00 39.3 350 12.8
#> 8 2022-06-02 04:30:00 2022-10-01 22:30:00 39.3 350 12.8
#> 9 2022-07-02 15:00:00 2022-11-01 09:00:00 49.5 148 10.5
#> 10 2022-07-02 15:00:00 2022-11-01 09:00:00 49.5 148 10.5
#> # ℹ 14 more rows
#> # ℹ 7 more variables: required_capital <dbl>, covered_deviation <dbl>,
#> # down_tolerance <dbl>, max_time <dbl>, percent_inactive <dbl>,
#> #   n_stoploss <int>, n_emergency_stops <int>

Note that this result has 24 rows rather than 7: slice_max() keeps all parameter combinations tied for the maximum by default, so some test periods appear several times.

Instead of performing a grid search, we can also run cross-validation with Differential Evolution (DE) using the built-in helper:
library(tidyverse)
slices %>%
group_by(start_test, end_test) %>%
reframe({
# Split present fold
train_data <- filter(dat, between(time, start_train, end_train))
test_data <- filter(dat, between(time, start_test, end_test))
# Optimize on training set
best <- de_search(
data = train_data,
objective_metric = "profit / (1 + max_draw_down)",
# keep runtime reasonable for vignette
DEoptim_control = list(itermax = 30, NP = 48, trace = FALSE)
)
# Evaluate on test set
pmap_df(best, backtest, data = test_data)
})
#> Warning: There were 7 warnings in `reframe()`.
#> The first warning was:
#> ℹ In argument: `{ ... }`.
#> ℹ In group 1: `start_test = 2022-05-02 18:00:00` `end_test = 2022-09-01
#> 12:00:00`.
#> Caused by warning in `DEoptim::DEoptim()`:
#> ! For many problems it is best to set 'NP' (in 'control') to be at least ten times the length of the parameter vector.
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 6 remaining warnings.
#> # A tibble: 7 × 12
#> start_test end_test profit n_trades max_draw_down
#> <dttm> <dttm> <dbl> <int> <dbl>
#> 1 2022-05-02 18:00:00 2022-09-01 12:00:00 10.4 185 21.3
#> 2 2022-06-02 04:30:00 2022-10-01 22:30:00 13.9 170 32.2
#> 3 2022-07-02 15:00:00 2022-11-01 09:00:00 6.54 934 4.59
#> 4 2022-08-02 01:30:00 2022-12-01 19:30:00 34.0 155 8.53
#> 5 2022-09-01 12:00:00 2023-01-01 06:00:00 11.2 140 23.8
#> 6 2022-10-01 22:30:00 2023-01-31 16:30:00 24.1 306 32.0
#> 7 2022-11-01 09:00:00 2023-03-03 03:00:00 3.08 119 33.6
#> # ℹ 7 more variables: required_capital <dbl>, covered_deviation <dbl>,
#> # down_tolerance <dbl>, max_time <dbl>, percent_inactive <dbl>,
#> # n_stoploss <int>, n_emergency_stops <int>