Deep Learning Trading Strategy from the beginning to the production. Part III. TFX Pipeline.

Hello everybody, my name is Denis, and you are watching Close to AlgoTrading. The New Year holidays are over, and we continue
our experiment. It is going slowly, but it is still moving
forward. Today we will have a look at our labeled data,
and then we will get to know the TFX pipeline. Let’s go. In the previous video, we created labels,
but I forgot to show you how they look on a chart. As you remember, we stored all the data in
a new data frame, so let’s read it. In this data frame, the column ‘dir’ contains
the labels, and the column cross_idx contains the tick number at which the price leaves
our window. I defined a simple function that plots the open
and close position events from these columns: dir gives the open event and cross_idx gives the close event. On the chart, a filled triangle marks an open position
event, and an unfilled one marks a close position event. As you can see, most of the time the open
event sits at a local maximum or minimum.

For further work, we will use a small dataset,
as before. I’ll also split the dataset here into train,
eval, and test datasets. OK, now that we know everything about our
labels, let’s go to the next step and begin to work with the TFX pipeline. Maybe you already know what TFX is,
maybe not. Either way, I will briefly introduce it. If we open the TensorFlow TFX webpage, we
can read that a TFX pipeline is a sequence of components
that implement an ML pipeline which is specifically designed for scalable, high-performance machine
learning tasks. That includes modeling, training, serving
inference, and managing deployments to different platforms. I highly recommend checking the TensorFlow
TFX presentation and user guides. From there, you will get a basic understanding
of the TFX components and how they are connected to each other. You can find all the links in the description
of this video.

As I have never used TFX before, I will be learning
it myself from episode to episode, and today we will start with its first four components.
The first one is the ExampleGen component: it
is the initial input component of a pipeline, which ingests and optionally splits the input
dataset. The second is the StatisticsGen component, which calculates statistics
for the dataset. The third is SchemaGen, which examines the
statistics and creates a data schema. And the last is ExampleValidator, which looks
for anomalies and missing values in the dataset. To be sure that I’m going in the right direction,
I will use the Chicago taxi example from the TensorFlow team as a template. The example demonstrates
the end-to-end workflow: how to analyze, validate, and transform data, and how
to train a model and serve it.

Now, let’s switch to our data. After we import all the required modules and set
the variables that hold the input and output data folders, we can try to load our data
into the pipeline. Our input folder contains eval, train, and
test subfolders with the datasets. Using ExampleGen should be a very simple step. It consumes external files or services to generate
Examples, which are then read by the other TFX components. But it looks like ExampleGen doesn’t
support custom time series splits. For this reason, I split the data manually
into train, eval, and test datasets. Unfortunately, by default, ExampleGen splits the input
data into train and eval only, and it seems that, at the moment, the TFX components
work only with this default pair of splits. But, as I said, we can split the data manually
and set our own input split configuration, so that ExampleGen generates output
splits that map one-to-one to the input splits. As you can see, the output of ExampleGen
consists of two artifacts: train and eval. We can check, for example, the first three elements
of our train set. As you can see, it is the same dataset as
we have.

The next step should go smoothly. We pass the output of the ExampleGen component
into StatisticsGen. As the output, we again get two artifacts, which
contain the statistics of the datasets. Now the magic begins: with a single command
we can display all the statistics graphically. Here we can see the statistics of the train dataset,
for example the data distribution, the standard deviation, how many missing values we have,
and so on. You can see the same set of statistics for
the eval set. The statistics show that only 2% of
our labels are not equal to 0. It could mean that the entry events of profitable
trades are outliers, and we are going to find them. It could also cause a problem later,
because we have unbalanced classes.

The next step is the generation of the data schema. We will generate the schema automatically,
based on the statistics output, but we could also define our own description of the input data. As the output, we receive a schema that describes
our data.

The last component for today is ExampleValidator. This component validates the data based on
the statistics and the schema, and if there are anomalies in our data, it will show them to
us, just like in the Chicago taxi example, where
the feature company has an unexpected string value: it takes on a new value
that was not in the training set. As you can see, in our case we don’t have any
anomalies in our dataset.

Well, that was a quick start with TFX.
In the next episode we will go further through the other TFX components and see how to transform
our data and train our model. See you in the next video. Bye.
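
As a companion to the chart of open and close events discussed in this episode, here is a minimal sketch of how those events could be derived from the two columns. It assumes that a row's position is the tick at which a position opens, that a dir value of 0 means "no trade", and that non-zero values are 1 or -1. The function name position_events is hypothetical; it is not the plotting function used in the video.

```python
def position_events(dirs, cross_idx):
    """Return (open_tick, close_tick, direction) for every non-zero label.

    Assumptions (not confirmed by the video): the row position is the tick
    at which the position opens, and dir == 0 means there is no trade.
    """
    if len(dirs) != len(cross_idx):
        raise ValueError("dirs and cross_idx must have the same length")

    events = []
    for tick, (direction, close_tick) in enumerate(zip(dirs, cross_idx)):
        if direction != 0:
            # dir marks the open event, cross_idx marks the close event.
            events.append((tick, close_tick, direction))
    return events
```

Each returned tuple could then be drawn as one filled marker at open_tick and one unfilled marker at close_tick.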
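
The manual train, eval, and test split mentioned in this episode could look roughly like the sketch below. It keeps the rows in time order (no shuffling) and writes one CSV file per split into its own subfolder. The split fractions, the file name data.csv, and both function names are assumptions made for illustration; the video does not state them.

```python
import csv
from pathlib import Path


def chronological_split(rows, train_frac=0.7, eval_frac=0.15):
    """Split time-ordered rows into train/eval/test without shuffling.

    The default fractions are illustrative assumptions.
    """
    if not (0 < train_frac < 1 and 0 < eval_frac < 1
            and train_frac + eval_frac < 1):
        raise ValueError("fractions must be positive and sum to less than 1")

    n = len(rows)
    train_end = int(n * train_frac)
    eval_end = int(n * (train_frac + eval_frac))
    return {
        "train": rows[:train_end],          # oldest data
        "eval": rows[train_end:eval_end],   # next block in time
        "test": rows[eval_end:],            # most recent data
    }


def write_splits(splits, header, out_dir):
    """Write each split to <out_dir>/<split name>/data.csv."""
    out_dir = Path(out_dir)
    for name, part in splits.items():
        folder = out_dir / name
        folder.mkdir(parents=True, exist_ok=True)
        with open(folder / "data.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(part)
```

Splitting by position rather than at random keeps later ticks out of the training set, which is the usual reason for a custom split on time series data.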
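
The four components covered in this episode can be wired together roughly as in the sketch below. It assumes the TFX 1.x Python API, which may differ from the release used in the video, and an input folder with train and eval subfolders of CSV files. The name build_components and the SPLIT_PATTERNS constant are hypothetical. The TFX imports sit inside the function so that the file can be loaded even where TFX is not installed.

```python
# Input splits mapped to file patterns relative to the input folder.
# The patterns assume one subfolder per split, as described in the video.
SPLIT_PATTERNS = {"train": "train/*", "eval": "eval/*"}


def build_components(input_base):
    """Return ExampleGen, StatisticsGen, SchemaGen and ExampleValidator.

    A sketch against the TFX 1.x API, not the exact code from the video.
    """
    from tfx.components import (CsvExampleGen, ExampleValidator,
                                SchemaGen, StatisticsGen)
    from tfx.proto import example_gen_pb2

    # With several named input splits, ExampleGen produces output splits
    # that correspond one-to-one to the input splits.
    input_config = example_gen_pb2.Input(splits=[
        example_gen_pb2.Input.Split(name=name, pattern=pattern)
        for name, pattern in SPLIT_PATTERNS.items()
    ])
    example_gen = CsvExampleGen(input_base=input_base,
                                input_config=input_config)

    # Statistics are computed separately for every split.
    statistics_gen = StatisticsGen(examples=example_gen.outputs["examples"])

    # The schema is inferred from the statistics.
    schema_gen = SchemaGen(statistics=statistics_gen.outputs["statistics"],
                           infer_feature_shape=False)

    # The validator compares the statistics with the schema.
    example_validator = ExampleValidator(
        statistics=statistics_gen.outputs["statistics"],
        schema=schema_gen.outputs["schema"])

    return [example_gen, statistics_gen, schema_gen, example_validator]
```

In a notebook, each component would typically be executed in order with an InteractiveContext, for example context.run(component), and the statistics could then be rendered with context.show(statistics_gen.outputs["statistics"]).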
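
The class imbalance seen in the statistics (only about 2% of labels not equal to 0) can also be checked directly on the label column. The sketch below is a plain Python illustration and is not part of TFX; the function names are hypothetical, and the example label values 1 and -1 are assumptions.

```python
from collections import Counter


def label_distribution(labels):
    """Return the share of each label value, for example {0: 0.98, 1: 0.01}."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")
    return {label: count / total for label, count in counts.items()}


def nonzero_share(labels):
    """Return the fraction of labels that are not equal to 0."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(1 for value in labels if value != 0) / len(labels)
```

For a label column with 98 zeros, one 1, and one -1, nonzero_share returns 0.02, which matches the kind of imbalance reported for this dataset.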

About Ralph Robinson

4 thoughts on “Deep Learning Trading Strategy from the beginning to the production. Part III. TFX Pipeline.”

  1. Hey dear, thank you very much for posting this amazing video! I will watch it like 30x+ until I understand each detail hahah
    I have experience trading and I'm a data scientist, but I really don't know how to combine both, and in general the examples don't work and aren't so clear on how to set all the details.
    Looking forward to more content about ML/DL + trading and how to set up the simplest algorithm you know that really works.

  2. Man, this is awesome. It is complicated, but it's ok, it needs some sort of engagement from the audience and I like it. Looking forward to the next videos
