Introduction

Fingym is a toolkit for developing reinforcement learning algorithms tailored to stock market trading. Its environments work much like OpenAI's gym, but are built specifically for trading.

If you are not familiar with OpenAI's gym, don't worry; this guide covers the basics.

Installation

To get started, you'll need Python 3.5+, NumPy, and pandas installed. Clone the repo:

git clone https://github.com/entrpn/fingym

And you’re good to go.
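Since the toolkit is used straight from the clone, Python needs to be able to find it. As a quick sanity check, assuming you run from the repo root (or add the clone's path yourself; the path below is a placeholder):

import sys
sys.path.append('/path/to/fingym')  # hypothetical path to your clone

from fingym import fingym
env = fingym.make('SPY-Daily-v0')  # should construct without errors
print(env)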

Environments

Here's a bare-minimum example of getting something running. This runs an instance of the Daily environment, which provides roughly a decade of daily OHLC prices for SPY.

from fingym import fingym

env = fingym.make('SPY-Daily-v0')
env.reset()
for _ in range(1000):
    env.step(env.action_space.sample())  # take a random action
env.close()

Observations

(Most of this section is adapted from the OpenAI Gym documentation.)

If we ever want to do better than take random actions at each step, it’d probably be good to actually know what our actions are doing to the environment.

The environment's step function returns exactly what we need. In fact, step returns four values. These are:

  • observation (object): an environment-specific object representing your observation of the environment. For these environments it is an array of eight values:

    [stock_owned, cash_in_hand, date_epoch_seconds, open, high, low, close, volume]

  • reward (float): the amount of reward achieved by the previous action. The goal is always to increase your total reward.
  • done (boolean): whether it's time to reset the environment. Most environments are done only when the last data point has been stepped through, since the market in theory is never ending.
  • info (dict): diagnostic information useful for debugging. Currently it is used to pass the current value of your portfolio, calculated as number_of_shares_owned * current_stock_price + cash_in_hand.
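To make this concrete, here's a minimal sketch that takes one random step and unpacks the result. The variable names simply mirror the observation layout above, and info is printed as-is since its exact keys aren't specified here:

from fingym import fingym

env = fingym.make('SPY-Daily-v0')
obs = env.reset()

# Take one random step and inspect what comes back.
obs, reward, done, info = env.step(env.action_space.sample())

# Unpack the eight observation values listed above.
stock_owned, cash_in_hand, date_epoch_seconds, open_, high, low, close, volume = obs
print('shares owned:', stock_owned, 'cash in hand:', cash_in_hand)
print('reward:', reward, 'done:', done)
print('info:', info)
env.close()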

This is just an implementation of the classic “agent-environment loop”. At each timestep, the agent chooses an action, and the environment returns an observation and a reward.

The process gets started by calling reset(), which returns an initial observation. So a more proper way of writing the previous code would be to respect the done flag:

from fingym import fingym

env = fingym.make('SPY-Daily-v0')
observation = env.reset()  # reset() returns an initial observation
for _ in range(5000):
    observation, reward, done, info = env.step(env.action_space.sample())
    if done:
        print("Episode finished")
        break
env.close()
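A small variation accumulates the reward so you can see how a purely random agent fares over a full episode. This is a toy sketch for illustration, not a benchmark:

from fingym import fingym

env = fingym.make('SPY-Daily-v0')
env.reset()
total_reward = 0.0
steps = 0
while True:
    observation, reward, done, info = env.step(env.action_space.sample())
    total_reward += reward
    steps += 1
    if done:
        print('Episode finished after', steps, 'steps with total reward', total_reward)
        break
env.close()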

Spaces

In the example above, we've been sampling random actions from the environment's action space. But what exactly are those actions? In fingym, an action is a tuple whose first element represents buy/sell/hold and whose second element is the number of shares:

(Buy/sell/hold, # of shares)

where:

0 - hold
1 - buy
2 - sell

For example:

env.step([0, 0])    # do nothing

env.step([1, 15])   # buy 15 shares

env.step([2, 25])   # sell 25 shares

env.step([0, 10])   # do nothing, because the first element represents hold

env.step([1, 0])    # buy 0 shares, so do nothing

env.step([2, 0])    # sell 0 shares, so do nothing

env.step([1, -10])  # a negative number of shares is not accepted, so do nothing

The environment will never buy more shares than your remaining cash can cover, or sell more shares than you own. In other words: no margin and no short selling (maybe in the future).
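Putting the action encoding and the observation layout together, here's a sketch of a trivial accumulate-and-hold agent. The observation indices and action codes come from the sections above; the 10-share lot size is an arbitrary choice for illustration:

from fingym import fingym

HOLD, BUY, SELL = 0, 1, 2  # action codes from the table above

env = fingym.make('SPY-Daily-v0')
obs = env.reset()
done = False
while not done:
    cash_in_hand, close = obs[1], obs[6]  # per the observation layout
    if cash_in_hand >= close * 10:
        action = [BUY, 10]   # accumulate shares 10 at a time
    else:
        action = [HOLD, 0]   # out of cash, so hold
    obs, reward, done, info = env.step(action)
print('final info:', info)
env.close()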

Available Environments

There are two main environments:

Intraday supports OHLC minute data with volume from January 2017 to December 2019. Training with this environment can be very computationally expensive, for example when training deep neural networks.

Daily supports OHLC daily prices with volume from December 23, 2008 through December 23, 2019.

For Daily the following are supported:

  • SPY-Daily-v0
  • TSLA-Daily-v0
  • GOOGL-Daily-v0
  • CGC-Daily-v0
  • CRON-Daily-v0
  • BA-Daily-v0
  • AMZN-Daily-v0
  • AMD-Daily-v0
  • ABBV-Daily-v0
  • AAPL-Daily-v0
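Each of these is created the same way as before. For instance, to spin up a few of them and confirm they load:

from fingym import fingym

for env_id in ['SPY-Daily-v0', 'TSLA-Daily-v0', 'AAPL-Daily-v0']:
    env = fingym.make(env_id)
    obs = env.reset()
    print(env_id, 'initial observation:', obs)
    env.close()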