In this documentation, we will show you how to create new trains and how to store them so that they can be used in PADME.
Trains are essentially Docker images. This means a wide range of programming languages and frameworks are supported. In the following, a basic understanding of Docker is assumed. A basic introduction to Docker can be found here.
To create a new train, you need to write a Dockerfile that describes the build process for the Docker image. In the following, we explain the parts of the Dockerfile that are specific to PADME and the creation of trains.
The following sections teach you how to create this file and adapt it for your train.
A Dockerfile can include various instructions needed for the image creation. For our purposes, mostly the instructions FROM, RUN, COPY, LABEL, and CMD will be used.
However, these instructions are static and will be executed the same way wherever the Docker container is running. To adapt a Train/Image to a local environment found at a Station, we will use our own environment variables, which will be presented in the following.
To allow the configuration of your train Docker images, we use our own environment variables. These are not to be confused with the Docker environment variables declared with the ENV instruction; they are declared in the way presented below. Whenever a train arrives at a station, values for these variables need to be specified by the station software user. This feature can, for example, be used to configure connection strings, names of datasets, credentials, etc.
All environment variables supported by the train need to be set via a LABEL instruction combined with the envs metadata. You can see an example of this label in the Dockerfile at the end of this section. The envs metadata contains a description of the supported environment variables in the form of a JSON array. For each environment variable, the following needs to be provided:
{
  "name": "the_name_of_the_variable",
  "type": "number|password|text|url|select",
  "required": true,
  "options": [
    "only needed",
    "when using select"
  ]
}
The value of the name property can not include spaces.
The property name is the name that gets displayed to the station software user. Besides the name, we also support five different types: number, password, text, url, and select. Depending on the type, a different visualization is used in the station software. For example, a password will not be visible in plain text. When using the select type, you can provide the selectable options via the options array. Moreover, it is possible to mark variables as required.
The following image shows an example of how these environment variables will be visualized to a station software user. This example shows the train built by the Dockerfile at the end of this section.
Quotes in an array have to be escaped!
Since the JSON array containing the environment variable descriptions is stored in the Dockerfile as a string, the quotes contained in the array itself need to be escaped. This can be achieved by putting a backslash before every quote. For example, "name" would be replaced with \"name\". This can also be done with many online tools or with sed:
echo 'envs=[...]' | sed 's/"/\\"/g'
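Alternatively, you can generate the escaped label value with a small script. The following is a minimal Python sketch (not part of PADME itself) that serializes the variable descriptions with the standard json module and escapes the quotes; the variable list here just mirrors the example Dockerfile below.

import json

# Variable descriptions, mirroring the example Dockerfile below
envs = [
    {"name": "ADDRESS", "type": "url", "required": True},
    {"name": "PORT", "type": "number", "required": True},
    {"name": "MESSAGE", "type": "text", "required": False},
]

# Serialize to a compact JSON array and escape every quote for use inside the Dockerfile
escaped = json.dumps(envs, separators=(",", ":")).replace('"', '\\"')
print(f'LABEL envs="{escaped}"')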
The following code is an example of a train image using Python. The specified environment variables are the ones shown in the picture above.
FROM python:3
WORKDIR /usr/src/app/
#Required env vars
LABEL envs="[{\"name\":\"ADDRESS\",\"type\":\"url\",\"required\":true}, \
{\"name\":\"PORT\",\"type\":\"number\",\"required\":true}, \
{\"name\":\"ALGORITHM_TYPE\",\"type\":\"select\",\"options\":[\"linear\",\"loop\"],\"required\":true},\
{\"name\":\"PASSWORD\",\"type\":\"password\",\"required\":true}, \
{\"name\":\"MESSAGE\",\"type\":\"text\",\"required\":false}]"
# Install all required packages
COPY packages.txt .
RUN pip3 install -r packages.txt
# Copy Source
COPY . .
# Run
CMD [ "python", "main.py" ]
In the previous section, we discussed how to specify which environment variables are needed for your train. We will now show a small example of how to use these variables.
Since values for the environment variables are specified in the station software when the container for the train is created, we can use the values in the train simply by reading the environment variables at runtime. For example, this can be achieved in Python with the following code:
import os

address = str(os.environ['ADDRESS'])
port = str(os.environ['PORT'])
algotype = str(os.environ['ALGORITHM_TYPE'])
password = str(os.environ['PASSWORD'])
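Optional variables, such as MESSAGE in the example above, might not be set at every station. One way to handle this (a sketch, not a pattern mandated by PADME; the default value is only illustrative) is to read optional variables with a fallback and convert numeric values explicitly:

import os

# Required variables: a missing value raises a KeyError, so the train fails early
address = os.environ['ADDRESS']
port = int(os.environ['PORT'])  # environment variables are strings, convert explicitly

# Optional variable: fall back to an illustrative default if the station left it unset
message = os.environ.get('MESSAGE', 'no message provided')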
All trains are stored in our Train Depot. This Train Depot is a git repository containing all PADME trains. If you don’t have access yet, please feel free to contact us. For the BETTER Project, use this contact point.
To make a train available in PADME, you simply need to create a new folder in the train depot, containing your code and the Dockerfile described in the previous section.
On every commit to the repository, changed trains will automatically be built and made available in PADME.
The build progress of your train can be monitored via the CI/CD menu in GitLab. Below is an example of a finished build:
If you need to change your train at a later point in time, simply change the code and Docker image, and everything will be rebuilt and updated automatically.
Now you are all set for creating trains in PADME. If you have further questions please do not hesitate to contact us. For the BETTER Project, use this contact point.
Before you begin: Make sure you have access to the Train Depot (for the RWTH PADME installation this is https://depot.padme-analytics.de, and for the FIT PADME installation this is https://depot.pht.fit.fraunhofer.de). You should be able to log in to the Gitlab instance and have write access to the two repositories: Padme Train Depot and Padme Federated Train Depot.
You also need a personal access token with api access in the Train Depot/Gitlab. To create one, go to User Settings -> Access Tokens -> Personal Access Tokens and add a new token. You may leave out the Expiration date, which will then default to a token with a validity of 365 days.
If you've set up the Train Creator following the above steps, you'll be able to use it to create and publish trains for use in the PADME ecosystem. The Train Creator is basically just a different way to push code to the Train Depot.
Once you’ve assembled your code as described earlier in this document (i.e., you have your code and your Dockerfile), you can use the Train Creator to upload it to the Train Depot. Just log in to the Train Creator web app and follow the steps in the UI. The steps are also summarized below:
Note that to run a Federated Learning use-case, you will need both Federated Learning and Federated Aggregation trains for that use-case.
1. Upload your main.py file.
2. Upload your requirements.txt file, if you have one.
3. Upload your Dockerfile. You also have the option to create a Dockerfile from a standard template, which you can then modify; the template runs the main.py you’ve uploaded earlier.
4. Once your changes are merged into the main branch in Train Depot, the CI/CD pipeline that builds your train will kick off. You can do this merge either directly via the Gitlab UI or through the Storehouse App.