Running multiple remote nrp-cores with docker compose

This guide gives an overview of how to run multiple nrp-core experiments, possibly distributed across multiple machines, using docker compose files.

The resulting runtime structure of the created nrp-core processes will be similar to the example depicted below.

Simulation architecture for multiple nrp-core

In order to reproduce this scenario, some environment setup is required. We need 3 virtual machines: one master node and two worker nodes.

1- The master node should have direct password-less ssh access to the worker nodes. To set this up, run these commands on the master node:

# Generate ssh keys for master node (press enter and accept defaults)
ssh-keygen

# Copy the content of your public key (id_rsa.pub)
cat "${HOME}"/.ssh/id_rsa.pub

# Then, on each of the worker nodes, paste the key at the end of the authorized_keys file
vim "${HOME}"/.ssh/authorized_keys

# Finally, from the master node, try to ssh to both worker nodes; the connection should succeed without a password prompt
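# For example (adjust the user name and IP addresses to your setup):
ssh ubuntu@10.1.1.184 hostname
ssh ubuntu@10.1.1.79 hostname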

2- Since the manager (master) node runs docker commands on the remote nodes, we need to allow the manager's docker client to connect to each worker's docker engine.

This can be done via docker contexts; see the Docker documentation for more details.

Follow these steps to add a context for each worker node (IMPORTANT: you must have completed step 1 first, so that the manager node can ssh to the worker nodes):

# Install docker on the manager node in case it's not installed. Do not install docker from the snap store, since it installs a very old version; use the convenience script instead

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
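# (Optional) allow running docker without sudo, which is also needed for the plain
# "docker login" used in step 5; log out and back in for this to take effect
sudo usermod -aG docker "${USER}"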


# Add context for worker node 1 with IP address of 10.1.1.184
docker context create worker_1 --docker "host=ssh://10.1.1.184"

# Add context for worker node 2 with IP address of 10.1.1.79
docker context create worker_2 --docker "host=ssh://10.1.1.79"
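
# Verify that both contexts can reach the remote docker engines
docker --context worker_1 info
docker --context worker_2 info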

3- The next step is to clone the nrp-core repository from Bitbucket on the master node. Let's say you clone it into your home folder.
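For example (the repository URL below is a placeholder; use the actual nrp-core Bitbucket URL):

cd "${HOME}"
git clone <nrp-core-repository-url> nrp-core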

Then you should install nrp-core; see the installation page for more details.

4- There is an environment variable in the compose file called “NRPCORE_EXPERIMENT_DIR” that contains the path to the experiment folder (husky_braitenberg) and is used throughout the compose file to mount files inside the engine containers. This path must therefore be identical on all 3 nodes. The output logs from the different engines are also written to this folder. We propose sharing the experiment folder via NFS, with the master node exporting it to the two worker nodes. Follow these steps on the manager node to set up NFS and share the folder:

# Install nfs
sudo apt-get update

sudo apt install nfs-kernel-server

# Then add two lines to /etc/exports, one for each worker node. Below is an example of these two lines; in this example the experiment is in the folder '/home/ubuntu/nrp-core/examples/husky_braitenberg'
# and the IP addresses of the worker nodes are 10.1.1.184 and 10.1.1.79; put your own IP addresses there. (Don't copy the leading # sign!)

# /home/ubuntu/nrp-core/examples/husky_braitenberg    10.1.1.184(rw,sync,no_root_squash,no_subtree_check)
# /home/ubuntu/nrp-core/examples/husky_braitenberg    10.1.1.79(rw,sync,no_root_squash,no_subtree_check)

# Make the NFS Share Available to Clients
sudo exportfs -a
sudo systemctl restart nfs-kernel-server  # restart the NFS kernel server
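
# Verify that the folder is exported as expected
sudo exportfs -v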

# If you have a firewall enabled, you'll also need to open up firewall access using the sudo ufw allow command, for example:
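# (adjust the worker IP addresses to your setup)
sudo ufw allow from 10.1.1.184 to any port nfs
sudo ufw allow from 10.1.1.79 to any port nfs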

# Install the NFS client packages: run these commands on the worker nodes

sudo apt update

sudo apt install nfs-common

# Create the mount point folder on the worker nodes
sudo mkdir -p /home/ubuntu/nrp-core/examples/husky_braitenberg

# Mount the file share by running the mount command, as follows. There is no output if the command is successful.
# Remember to replace 10.1.1.154 with your master node's IP address and adjust the path to your experiment folder

sudo mount 10.1.1.154:/home/ubuntu/nrp-core/examples/husky_braitenberg /home/ubuntu/nrp-core/examples/husky_braitenberg

# To verify that the NFS share is mounted successfully:
df -h

# If the df -h command hangs, try rebooting your VM and mounting again with the command above
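
# (Optional) to re-mount the share automatically after a reboot, add a line like the
# following to /etc/fstab on each worker node (adjust the IP and paths to your setup):
# 10.1.1.154:/home/ubuntu/nrp-core/examples/husky_braitenberg  /home/ubuntu/nrp-core/examples/husky_braitenberg  nfs  defaults  0  0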

5- If you are pulling docker images from a private registry, first log in to the registry with the docker login command. Make sure you use docker login, not sudo docker login.
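For example, for the EBRAINS Neurorobotics registry referenced in the compose file below (adjust if you use a different registry):

docker login nexus.neurorobotics.ebrains.eu:443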

Consider a learning scenario where you want to run the Husky experiment multiple times, each time with a different input.

You could run these nrp-cores serially, but if it is possible to run them simultaneously, an array of nrp-cores (along with their engines) can significantly speed up the learning process.

This can be done with the docker-compose launcher. In the compose file, nrp-core and the different engines are defined as services connected to each other via an internal network, and this ecosystem has a single entry point: nrp-core running in a container.

From outside this ecosystem, a user talks to that nrp-core over gRPC using the nrp-core Python client, and nrp-core runs the simulation by commanding the already running engines.

Let’s say your master script is as follows:

import time
from nrp_client import NrpCore

nrp_1 = "10.1.1.184:50050"
nrp_2 = "10.1.1.79:50050"
compose_file = "/home/HOME/nrp-core/examples/husky_braitenberg/docker_compose_deploy_husky.yaml"
config_file = "/home/HOME/nrp-core/examples/husky_braitenberg/simulation_config_docker-compose.json"

# Creating the NrpCore objects below launches the compose file for each nrp-core instance

nrp_core_1 = NrpCore(nrp_1, compose_file=compose_file, config_file=config_file)

nrp_core_2 = NrpCore(nrp_2, compose_file=compose_file, config_file=config_file)

nrp_core_1.initialize()

nrp_core_2.initialize()

time.sleep(10)
for i in range(100):
    nrp_core_1.run_loop(1)
    nrp_core_2.run_loop(1)

nrp_core_1.shutdown()
nrp_core_2.shutdown()

NrpCore takes as arguments:

  • address (str): the address that will be used by the NRPCoreSim gRPC server

  • compose_file (str): path to the docker compose file needed to execute the experiment. An error is thrown if you don’t provide this value.

  • config_file (str): path to the experiment configuration file. It can be an absolute path or relative to experiment_folder.

  • log_output (bool): if true, console output from the NRPCoreSim process is hidden and logged into a file .console_output.log in the experiment folder. It is true by default.

  • get_archives (list): list of archives (files or folders) that should be retrieved from the docker container when shutting down the simulation (e.g. a folder containing logged simulation data from a data transfer engine). Empty list by default.

You can use other functions, such as run_until_timeout(), instead of run_loop().

Preparing the environment.

This tutorial explains how to run the husky experiment using the compose file. The docker-compose file used for this example is as follows:

version: '3.2'

services:

    nest-service:
        image: docker-registry.ebrains.eu/nest/nest-simulator:3.3
        env_file:
            - nest.env
        container_name: nrp-nest-simulator
        networks:
            - husky

    gazebo-service:
        env_file:
            - gazebo.env
        image: ${NRP_DOCKER_REGISTRY}nrp-core/nrp-gazebo-ubuntu20${NRP_CORE_TAG}
        volumes:
            - ${NRPCORE_EXPERIMENT_DIR}:/experiment
        command: /usr/xvfb-run-gazebo-runcmd.bash
        networks:
            - husky
        container_name: nrp-gazebo

    nrp-core-service:
        image: ${NRP_DOCKER_REGISTRY}nrp-core/nrp-gazebo-nest-ubuntu20${NRP_CORE_TAG}
        volumes:
            - ${NRPCORE_EXPERIMENT_DIR}:/experiment
        command: NRPCoreSim -c /experiment/simulation_config_docker-compose.json  -m server --server_address 172.16.238.10:50050 --logoutput=all --logfilename=.console_output.log --slave
        networks:
            husky:
                ipv4_address: 172.16.238.10
        ports:
            - "50050:50050"
        depends_on:
            - nest-service
            - gazebo-service
            - mqtt-broker-service
        container_name: nrp-core


    mqtt-broker-service:
        image: eclipse-mosquitto
        networks:
            - husky
        volumes:
            - ${NRPCORE_EXPERIMENT_DIR}/mosquitto.conf:/mosquitto/config/mosquitto.conf
        user: "0:0"
        container_name: mqtt-broker

networks:
  husky:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 172.16.238.0/24
          gateway: 172.16.238.1

As you can see, there are multiple environment variables that must be set before running this compose file. NRPCORE_EXPERIMENT_DIR should be set to the path of the experiment on the manager node. These variables can be added to the .bashrc of the manager node:

export NRP_DOCKER_REGISTRY=nexus.neurorobotics.ebrains.eu:443/
export NRP_CORE_TAG=:latest
export NRPCORE_EXPERIMENT_DIR=/home/ubuntu/nrp-core/examples/husky_braitenberg

For each engine there is an env file (for example gazebo.env and nest.env) that passes the required environment variables into the engine's container to run the engine.
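Before using the Python client, you can optionally check that the environment variables and docker contexts are set up correctly by resolving the compose file against one of the worker contexts. This is only a sketch, assuming the compose file name from the master script above and that the environment variables above are exported in your current shell:

cd "${NRPCORE_EXPERIMENT_DIR}"
# Validate the compose file and check that the environment variables resolve correctly
docker --context worker_1 compose -f docker_compose_deploy_husky.yaml config
# (Optional) pre-pull the images on worker node 1
docker --context worker_1 compose -f docker_compose_deploy_husky.yaml pull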

nrp-core is accessible via <worker_node_ip>:50050 and you can test it with a simple gRPC tester function. Make sure that port 50050 is open and accessible from the manager node to the worker nodes.

If your manager and worker nodes are not in the same network, you have to open port 50050 for nrp-core and also port 22 so that docker can connect to the remote docker engine over ssh.
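A minimal way to check that port 50050 is reachable from the manager node (assuming netcat is installed) is a plain TCP connection test, run while the compose stack is up on the workers:

nc -vz 10.1.1.184 50050
nc -vz 10.1.1.79 50050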

The nrp-core service is given a fixed IP address because, in order to reach nrp-core running inside a container from outside the compose stack (as our Python client does), NRPCoreSim must be started with a server address that matches the IP of the network interface inside the container. We also connected all engines via the husky docker network so that they can freely talk to each other on any port.