Thoughts on High-Frequency Trading Strategies (2)

Author: 초목, Created: 2023-08-04 16:14:27, Updated: 2023-09-19 09:08:17

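This article discusses high-frequency trading strategies, with a focus on modeling cumulative trade volume and price impact. By analyzing single trades, price impact over fixed intervals, and the effect of trade volume on price, it proposes a preliminary model for the optimal order placement position. The model builds on an understanding of trade volume and price impact and tries to find the best position at which to place an order. The model's assumptions are discussed in detail, and the optimal placement is given a first evaluation by comparing the expected return predicted by the model with the actual data.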

Cumulative Trade Volume Modeling

The previous article gave an expression for the probability that a single trade is larger than a given size.
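Judging from the code used throughout this article, that expression appears to take the following form. The symbol $\bar{q}$ (the mean trade size) is notation introduced here for illustration, not taken from the author:

```latex
P(Q > q) = \left[\left(1 + 20^{-q/\bar{q}}\right)\frac{q}{\bar{q}} + 1\right]^{\alpha},
\qquad
\alpha = \frac{\ln P(Q > \bar{q})}{\ln 2.05}
```

At $q = \bar{q}$ the bracket equals $(1 + 1/20) + 1 = 2.05$, which is where the constant in $\alpha$ comes from: this choice of $\alpha$ makes the curve pass through the empirical share of trades larger than the mean.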

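We are also interested in the distribution of the volume traded over a period of time, which intuitively should depend on the size of each trade and on the order frequency. Below, the data is processed at fixed intervals and the distribution is plotted in the same way as before.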

from datetime import date,datetime
import time
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

trades = pd.read_csv('HOOKUSDT-aggTrades-2023-01-27.csv')
trades['date'] = pd.to_datetime(trades['transact_time'], unit='ms')
trades.index = trades['date']
buy_trades = trades[trades['is_buyer_maker']==False].copy()
buy_trades = buy_trades.groupby('transact_time').agg({
    'agg_trade_id': 'last',
    'price': 'last',
    'quantity': 'sum',
    'first_trade_id': 'first',
    'last_trade_id': 'last',
    'is_buyer_maker': 'last',
    'date': 'last',
    'transact_time':'last'
})
buy_trades['interval']=buy_trades['transact_time'] - buy_trades['transact_time'].shift()
buy_trades.index = buy_trades['date']

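Summing the trades in each 1s interval, dropping the intervals with no trades, and fitting the result with the single-trade distribution above works well: treating all trades within 1s as one trade turns this into a problem that has already been solved. However, as the period gets longer (relative to the trade frequency), the error grows, and further investigation showed that the error comes from the correction term added to the Pareto distribution earlier. This suggests that when the period is longer and contains more individual trades, the sum of many trades comes closer to a plain Pareto distribution, so the correction should be removed in that case.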

df_resampled = buy_trades['quantity'].resample('1S').sum()
df_resampled = df_resampled.to_frame(name='quantity')
df_resampled = df_resampled[df_resampled['quantity']>0]

buy_trades

	agg_trade_id	price	quantity	first_trade_id	last_trade_id	is_buyer_maker	date	transact_time	interval	diff

2023-01-27 00:00:00.161	1138369	2.901	54.3	3806199	3806201	False	2023-01-27 00:00:00.161	1674777600161	NaN	0.001
2023-01-27 00:00:04.140	1138370	2.901	291.3	3806202	3806203	False	2023-01-27 00:00:04.140	1674777604140	3979.0	0.000
2023-01-27 00:00:04.339	1138373	2.902	55.1	3806205	3806207	False	2023-01-27 00:00:04.339	1674777604339	199.0	0.001
2023-01-27 00:00:04.772	1138374	2.902	1032.7	3806208	3806223	False	2023-01-27 00:00:04.772	1674777604772	433.0	0.000
2023-01-27 00:00:05.562	1138375	2.901	3.5	3806224	3806224	False	2023-01-27 00:00:05.562	1674777605562	790.0	0.000
…	…	…	…	…	…	…	…	…	…	…
2023-01-27 23:59:57.739	1544370	3.572	394.8	5074645	5074651	False	2023-01-27 23:59:57.739	1674863997739	1224.0	0.002
2023-01-27 23:59:57.902	1544372	3.573	177.6	5074652	5074655	False	2023-01-27 23:59:57.902	1674863997902	163.0	0.001
2023-01-27 23:59:58.107	1544373	3.573	139.8	5074656	5074656	False	2023-01-27 23:59:58.107	1674863998107	205.0	0.000
2023-01-27 23:59:58.302	1544374	3.573	60.5	5074657	5074657	False	2023-01-27 23:59:58.302	1674863998302	195.0	0.000
2023-01-27 23:59:59.894	1544376	3.571	12.1	5074662	5074664	False	2023-01-27 23:59:59.894	1674863999894	1592.0	0.000

# Cumulative distribution of the volume traded within 1s
depths = np.array(range(0, 3000, 5))
probabilities = np.array([np.mean(df_resampled['quantity'] > depth) for depth in depths])
mean = df_resampled['quantity'].mean()
alpha = np.log(np.mean(df_resampled['quantity'] > mean))/np.log(2.05)
probabilities_s = np.array([((1+20**(-depth/mean))*depth/mean+1)**(alpha) for depth in depths])

plt.figure(figsize=(10, 5))
plt.plot(depths, probabilities)
plt.plot(depths, probabilities_s)
plt.xlabel('Depth')
plt.ylabel('Probability of execution')
plt.title('Execution probability at different depths')
plt.grid(True)

[Figure: execution probability at different depths, 1s windows, empirical curve vs. fitted curve]

df_resampled = buy_trades['quantity'].resample('30S').sum()
df_resampled = df_resampled.to_frame(name='quantity')
df_resampled = df_resampled[df_resampled['quantity']>0]
depths = np.array(range(0, 12000, 20))
probabilities = np.array([np.mean(df_resampled['quantity'] > depth) for depth in depths])
mean = df_resampled['quantity'].mean()
alpha = np.log(np.mean(df_resampled['quantity'] > mean))/np.log(2.05)
probabilities_s = np.array([((1+20**(-depth/mean))*depth/mean+1)**(alpha) for depth in depths])
alpha = np.log(np.mean(df_resampled['quantity'] > mean))/np.log(2)
probabilities_s_2 = np.array([(depth/mean+1)**alpha for depth in depths]) # no correction term

plt.figure(figsize=(10, 5))
plt.plot(depths, probabilities,label='Probabilities (True)')
plt.plot(depths, probabilities_s, label='Probabilities (Simulation 1)')
plt.plot(depths, probabilities_s_2, label='Probabilities (Simulation 2)')
plt.xlabel('Depth')
plt.ylabel('Probability of execution')
plt.title('Execution probability at different depths')
plt.legend() 
plt.grid(True)

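[Figure: execution probability at different depths, 30s windows: empirical curve, Simulation 1 (with correction), Simulation 2 (no correction)]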
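To put a number on how well each curve fits, the two variants can be wrapped in functions and scored against the empirical tail. This is a minimal sketch under stated assumptions: the function names are hypothetical, and a synthetic heavy-tailed sample stands in for the real resampled volumes, so the error values it produces say nothing about HOOKUSDT.

```python
import numpy as np

def corrected_tail(depths, mean, p_mean):
    """Tail probability with the correction term (Simulation 1 in the plot)."""
    alpha = np.log(p_mean) / np.log(2.05)
    x = np.asarray(depths, dtype=float) / mean
    return ((1 + 20.0 ** (-x)) * x + 1) ** alpha

def plain_tail(depths, mean, p_mean):
    """Tail probability without the correction term (Simulation 2 in the plot)."""
    alpha = np.log(p_mean) / np.log(2)
    x = np.asarray(depths, dtype=float) / mean
    return (x + 1) ** alpha

# Synthetic stand-in for df_resampled['quantity'] (assumption, not real data).
rng = np.random.default_rng(0)
sample = rng.pareto(2.5, size=20000) * 1000.0

mean = sample.mean()
p_mean = np.mean(sample > mean)          # empirical share above the mean
depths = np.arange(0, 12000, 20)
empirical = np.array([np.mean(sample > d) for d in depths])

# Mean absolute error of each curve against the empirical tail.
mae_corrected = np.mean(np.abs(corrected_tail(depths, mean, p_mean) - empirical))
mae_plain = np.mean(np.abs(plain_tail(depths, mean, p_mean) - empirical))
print(mae_corrected, mae_plain)
```

Both curves are calibrated the same way: each equals 1 at depth 0 and passes through the empirical probability at depth = mean, which is why the exponents divide by log(2.05) and log(2) respectively. Running the same scoring on real volumes resampled at different window lengths would show at which window the uncorrected curve starts to fit better.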

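We can now summarize a general formula for the distribution of the volume accumulated over different time spans, fitting it with the single-trade volume distribution instead of computing separate statistics each time.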
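A sketch of that general formula, reconstructed from the code further below rather than quoted from the author. The symbols $\bar{q}$ (mean single-trade size) and $s$ (the `adjust` factor in the code) are notation assumed here:

```latex
P_T(Q > q) = \left[\left(1 + 20^{-s\,q/\bar{q}}\right)\frac{s\,q}{\bar{q}} + 1\right]^{\alpha},
\qquad
s = \frac{\text{avg\_interval}}{\text{avg\_interval\_T}},
\qquad
\alpha = \frac{\ln P(Q_{\text{single}} > \bar{q})}{\ln 2.05}
```

Both $\bar{q}$ and $\alpha$ are computed from single trades; only the scale factor $s$ depends on the window length.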

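Here avg_interval is the average interval between single trades, and avg_interval_T is the average interval for the time span being estimated. To estimate the volume traded in 1s, for example, we need the average interval between the trade events obtained when trades are grouped into 1s windows.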
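One way to read avg_interval_T, and this is an assumption since the article does not define it further, is as the average spacing between non-empty windows of length T. Under that reading the ratio avg_interval / avg_interval_T equals the number of non-empty windows divided by the number of trades, so dividing the single-trade mean by it gives exactly the mean window volume. The sketch below checks this on synthetic trades (hypothetical data and variable names, NumPy only):

```python
import numpy as np

# Synthetic trades: arrival gaps and sizes are hypothetical stand-ins.
rng = np.random.default_rng(0)
n_trades = 5000
gaps_ms = rng.exponential(scale=600.0, size=n_trades)
t_ms = np.cumsum(gaps_ms)                         # trade timestamps in ms
qty = rng.pareto(2.0, size=n_trades) * 500.0 + 1.0

# Group trades into 2s windows and sum the volume of each non-empty window.
window_ms = 2000
bucket_id = (t_ms // window_ms).astype(np.int64)
_, inverse = np.unique(bucket_id, return_inverse=True)
bucket_volume = np.bincount(inverse, weights=qty)

# Average spacing of trades and of non-empty windows over the same time span.
span_ms = t_ms[-1] - t_ms[0]
avg_interval = span_ms / n_trades
avg_interval_T = span_ms / len(bucket_volume)
adjust = avg_interval / avg_interval_T

print(qty.mean() / adjust, bucket_volume.mean())  # the two means coincide
```

Seen this way, multiplying the depth by adjust and dividing by the single-trade mean is the same as dividing the depth by the mean window volume, so the formula rescales the single-trade curve rather than refitting it.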

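Note that the probability that the volume traded in some time interval exceeds a given value should differ considerably from the probability that an order resting at that depth in the book actually gets filled. The longer the wait, the more likely the order book is to change, and trades themselves change the depth, so the fill probability at a given depth position changes in real time as the data updates.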

df_resampled = buy_trades['quantity'].resample('2S').sum()
df_resampled = df_resampled.to_frame(name='quantity')
df_resampled = df_resampled[df_resampled['quantity']>0]
depths = np.array(range(0, 6500, 10))
probabilities = np.array([np.mean(df_resampled['quantity'] > depth) for depth in depths])
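mean = buy_trades['quantity'].mean()  # mean size of a single trade
adjust = buy_trades['interval'].mean() / 2620  # avg_interval / avg_interval_T; 2620 is the value used here for the 2s window
alpha = np.log(np.mean(buy_trades['quantity'] > mean))/0.7178397931503168  # 0.7178... equals np.log(2.05)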
probabilities_s = np.array([((1+20**(-depth*adjust/mean))*depth*adjust/mean+1)**(alpha) for depth in depths])

plt.figure(figsize=(10, 5))
plt.plot(depths, probabilities)
plt.plot(depths, probabilities_s)
plt.xlabel('Depth')
plt.ylabel('Probability of execution')
plt.title('Execution probability at different depths')
plt.grid(True)

[Figure: execution probability at different depths, 2s windows, empirical curve vs. general formula]

Price Impact of a Single Trade

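Trade data is a treasure trove, and there is still a lot to mine from it. We should pay close attention to the impact that orders have on price, because it affects where the strategy places its orders. Again aggregating by transact_time, we compute the difference between the last price and the first price of each group (0 when the group contains a single trade).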

The result: about 77% of trades cause no impact, 16.6% move the price by 1 tick, 3.8% by 2 ticks, 1.3% by 3 ticks, and each level of 4 ticks or more accounts for less than 1%.

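We also compute the average trade volume behind each price difference, leaving out the very large impacts where the statistics become unreliable. The relationship is basically linear: roughly every 1000 units of volume move the price by 1 tick. Put differently, the average size resting at each price level near the top of the order book is about 1000 units.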

diff_df = trades[trades['is_buyer_maker']==False].groupby('transact_time')['price'].agg(lambda x: abs(round(x.iloc[-1] - x.iloc[0],3)) if len(x) > 1 else 0)
buy_trades['diff'] = buy_trades['transact_time'].map(diff_df)

diff_counts = buy_trades['diff'].value_counts()
diff_counts[diff_counts>10]/diff_counts.sum()

0.000    0.769965
0.001    0.165527
0.002    0.037826
0.003    0.012546
0.004    0.005986
0.005    0.003173
0.006    0.001964
0.007    0.001036
0.008    0.000795
0.009    0.000474
0.010    0.000227
0.011    0.000187
0.012    0.000087
0.013    0.000080
Name: diff, dtype: float64

diff_group = buy_trades.groupby('diff').agg({
    'quantity': 'mean',
    'diff': 'last',
})

diff_group['quantity'][diff_group['diff']>0][diff_group['diff']<0.01].plot(figsize=(10,5),grid=True);

[Figure: mean trade quantity for each price difference, single trades]
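The "units of volume per tick" figure can be estimated as the slope of a straight line through the plotted points. The following is a minimal sketch, assuming a tick size of 0.001 as in the output above; the helper name and the sample values are hypothetical and do not come from the HOOKUSDT data.

```python
import numpy as np

def units_per_tick(price_diff, mean_quantity, tick=0.001):
    """Least-squares line of mean quantity against impact measured in ticks.

    Returns (slope, intercept); the slope is the volume per tick of impact.
    """
    ticks = np.round(np.asarray(price_diff, dtype=float) / tick)
    slope, intercept = np.polyfit(ticks, np.asarray(mean_quantity, dtype=float), 1)
    return slope, intercept

# Hypothetical per-level means, roughly 1000 units per tick plus small noise.
price_diff = np.arange(1, 10) * 0.001
mean_quantity = 1000.0 * np.arange(1, 10) + np.array([30, -20, 10, -15, 25, -5, 0, -10, 5])

slope, intercept = units_per_tick(price_diff, mean_quantity)
print(slope, intercept)
```

With the article's data, the inputs would be the positive `diff` values and the matching `quantity` means from `diff_group`.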

Price Impact over Fixed Intervals

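Next we measure the price impact within 2s windows. The difference here is that negative values appear. Because only buy orders are counted, the distribution is not symmetric: the positive side is shifted up by about one tick relative to the negative side. Looking again at the relationship between traded volume and impact, and counting only results greater than 0, the conclusion is much the same as for single trades: the relationship is approximately linear, and each tick requires roughly 2000 units of volume.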

df_resampled = buy_trades.resample('2S').agg({ 
    'price': ['first', 'last', 'count'],
    'quantity': 'sum'
})
df_resampled['price_diff'] = round(df_resampled[('price', 'last')] - df_resampled[('price', 'first')],3)
df_resampled['price_diff'] = df_resampled['price_diff'].fillna(0)
result_df_raw = pd.DataFrame({
    'price_diff': df_resampled['price_diff'],
    'quantity_sum': df_resampled[('quantity', 'sum')],
    'data_count': df_resampled[('price', 'count')]
})
result_df = result_df_raw[result_df_raw['price_diff'] != 0]

result_df['price_diff'][abs(result_df['price_diff'])<0.016].value_counts().sort_index().plot.bar(figsize=(10,5));

[Figure: bar chart of price differences over 2s windows]

result_df['price_diff'].value_counts()[result_df['price_diff'].value_counts()>30]

 0.001    7176
-0.001    3665
 0.002    3069
-0.002    1536
 0.003    1260
 0.004     692
-0.003     608
 0.005     391
-0.004     322
 0.006     259
-0.005     192
 0.007     146
-0.006     112
 0.008      82
 0.009      75
-0.007      75
-0.008      65
 0.010      51
 0.011      41
-0.010      31
Name: price_diff, dtype: int64

diff_group = result_df.groupby('price_diff').agg({ 'quantity_sum': 'mean'})

diff_group[(diff_group.index>0) & (diff_group.index<0.015)].plot(figsize=(10,5),grid=True);

[Figure: mean 2s traded quantity for each positive price difference]

Price Impact of Trade Volume

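The sections above estimated the volume needed to move the price by one tick, but that estimate is not accurate, because it is conditioned on the impact having already occurred. Now we look at it the other way round: the price impact caused by a given trade volume.

Here the data is sampled at 1s, volume is binned in steps of 100 units, and the price change is averaged within each volume bin. The observations are: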

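When the buy volume is below 500, the expected price change is a decline. This is as expected, since sell orders also affect the price.
At low volumes the relationship is linear: the larger the volume, the larger the price rise.
As the buy volume grows, the price change grows too, which often signals a price breakout. The price may retrace after the breakout, and combined with fixed-interval sampling this makes the data unstable.
Pay attention to the upper part of the scatter plot, that is, the part where volume corresponds to a price rise.
For this trading pair only, a rough version of the relationship between trade volume and price change is:

$$C = \frac{Q}{2\times 10^{6}} - 0.000352$$

where $C$ is the price change and $Q$ is the buy volume.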
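Two numbers follow directly from this linear fit: the buy volume at which the expected price change turns positive, and the extra volume needed per tick. The short sketch below works them out; the function name is hypothetical, and the tick size of 0.001 is taken from the price differences shown earlier.

```python
def price_impact(q):
    """Rough expected 1s price change for buy volume q (linear fit from the text)."""
    return q / 2e6 - 0.000352

tick = 0.001

# Volume at which the expected change crosses zero: q / 2e6 = 0.000352.
break_even_q = 0.000352 * 2e6      # about 704 units

# Extra volume needed to add one tick of expected impact: tick * 2e6.
q_per_tick = tick * 2e6            # about 2000 units

print(break_even_q, q_per_tick, price_impact(break_even_q + q_per_tick))
```

The break-even near 704 units agrees with the binned averages, which switch from negative to positive around the 500 to 700 range, and the slope of about 2000 units per tick matches the estimate from the 2s windows.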

df_resampled = buy_trades.resample('1S').agg({ 
    'price': ['first', 'last', 'count'],
    'quantity': 'sum'
})
df_resampled['price_diff'] = round(df_resampled[('price', 'last')] - df_resampled[('price', 'first')],3)
df_resampled['price_diff'] = df_resampled['price_diff'].fillna(0)
result_df_raw = pd.DataFrame({
    'price_diff': df_resampled['price_diff'],
    'quantity_sum': df_resampled[('quantity', 'sum')],
    'data_count': df_resampled[('price', 'count')]
})
result_df = result_df_raw[result_df_raw['price_diff'] != 0]

df = result_df.copy()
bins = np.arange(0, 30000, 100)  # volume bin edges in steps of 100 units
labels = [f'{i}-{i+100-1}' for i in bins[:-1]]  
df.loc[:, 'quantity_group'] = pd.cut(df['quantity_sum'], bins=bins, labels=labels)
grouped = df.groupby('quantity_group')['price_diff'].mean()

grouped_df = pd.DataFrame(grouped).reset_index()
grouped_df['quantity_group_center'] = grouped_df['quantity_group'].apply(lambda x: (float(x.split('-')[0]) + float(x.split('-')[1])) / 2)

plt.figure(figsize=(10,5))
plt.scatter(grouped_df['quantity_group_center'], grouped_df['price_diff'],s=10)
plt.plot(grouped_df['quantity_group_center'], np.array(grouped_df['quantity_group_center'].values)/2e6-0.000352,color='red')
plt.xlabel('quantity_group_center')
plt.ylabel('average price_diff')
plt.title('Scatter plot of average price_diff by quantity_group')
plt.grid(True)

[Figure: scatter plot of average price_diff by quantity group, with the linear fit in red]

grouped_df.head(10)

	quantity_group	price_diff	quantity_group_center

0	0-199	-0.000302	99.5
1	100-299	-0.000124	199.5
2	200-399	-0.000068	299.5
3	300-499	-0.000017	399.5
4	400-599	-0.000048	499.5
5	500-699	0.000098	599.5
6	600-799	0.000006	699.5
7	700-899	0.000261	799.5
8	800-999	0.000186	899.5
9	900-1099	0.000299	999.5

Preliminary Optimal Order Placement

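With a model of traded volume and a rough model of the price impact that a given volume produces, it seems possible to compute the optimal position for an order. The calculation rests on several assumptions:

Assume the price reverts to its original value after the impact. (This is of course unlikely, and the price movement after an impact needs further analysis.)
Assume the distribution of trade volume and order frequency during this period matches the estimate. (This is also inaccurate: a whole day's values are used for the estimate, while trades show clear clustering.)
Assume only one sell order is filled during the simulated period, after which the position is closed.
Assume that after the order is filled, other buy orders keep pushing the price up, especially when the filled quantity is small. This effect is ignored here, and the price is simply assumed to revert.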

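First, write down a simple expected return: the probability that the cumulative buy volume within 1s exceeds Q, multiplied by the expected rate of return (that is, the price impact):

$$E(Q) = P(Q_{1s} > Q)\cdot C(Q)$$

where $P(Q_{1s} > Q)$ is the fitted 1s cumulative probability (`probabilities_s` in the code) and $C(Q) = Q/(2\times 10^{6}) - 0.000352$ is the price impact (`profit_s` in the code).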

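According to the figure, the expected return peaks at about 2500, roughly 2.5 times the average volume. That is, the sell order should be placed at the 2500 position. It must be stressed again that the horizontal axis is the volume traded within 1s, which cannot simply be equated with a depth position. Moreover, this is only a guess based on trade data, because the important depth data is currently missing.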
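The location of the peak can be found numerically by evaluating the product on a grid and taking the argmax. The sketch below uses NumPy only. Note the assumptions: `mean_1s = 1000` and `alpha = -1.93` are hypothetical stand-ins, not values measured from the HOOKUSDT file. They were chosen because the text implies a mean of about 1000 (2500 is described as roughly 2.5 times the mean), and the exponent corresponds to about a quarter of the windows exceeding the mean.

```python
import numpy as np

# Hypothetical stand-ins for the fitted parameters (see the note above).
mean_1s = 1000.0   # assumed mean buy volume per non-empty 1s window
alpha = -1.93      # assumed exponent, log(P(Q > mean)) / log(2.05)

depths = np.arange(0, 15000, 10)
x = depths / mean_1s

# Probability that the 1s buy volume exceeds each candidate size Q.
fill_prob = ((1 + 20.0 ** (-x)) * x + 1) ** alpha

# Price impact earned if the order at Q is filled (linear fit from the text).
impact = depths / 2e6 - 0.000352

expected = fill_prob * impact
best_q = depths[np.argmax(expected)]
print(best_q, expected.max())
```

With these stand-ins the maximum lands in the neighbourhood of 2500. An interior maximum exists only when alpha is below -1: with a heavier tail than that, the product keeps growing with Q and the argmax simply hits the end of the grid.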

Summary

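We found that the volume distribution over different time intervals is a simple rescaling of the single-trade volume distribution. We also built a simple expected return model from the price impact and the fill probability. The model's result matches our expectation: when the order is placed at a small volume, the expected price change is a decline, so a certain volume is needed before there is any room for profit; the larger the volume, the lower the fill probability; so there is an optimal size in between, and that is the placement the strategy is looking for. Of course, this model is still far too simple. The next article will go deeper.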

# Cumulative distribution of the volume traded within 1s
df_resampled = buy_trades['quantity'].resample('1S').sum()
df_resampled = df_resampled.to_frame(name='quantity')
df_resampled = df_resampled[df_resampled['quantity']>0]

depths = np.array(range(0, 15000, 10))
mean = df_resampled['quantity'].mean()
alpha = np.log(np.mean(df_resampled['quantity'] > mean))/np.log(2.05)
probabilities_s = np.array([((1+20**(-depth/mean))*depth/mean+1)**(alpha) for depth in depths])
profit_s = np.array([depth/2e6-0.000352 for depth in depths])
plt.figure(figsize=(10, 5))
plt.plot(depths, probabilities_s*profit_s)
plt.xlabel('Q')
plt.ylabel('Expected profit')
plt.grid(True)

[Figure: expected profit as a function of Q]


2019 FMZ - All rights reserved