PADRES User Guide
Introduction
PADRES is a distributed content-based publish/subscribe system. It consists of a set of clients connected to brokers organized in an overlay network. A client can be a publisher, a subscriber, or both. Publishers produce information (publications) that subscribers may be interested in, and subscribers consume it. Initially, publishers broadcast their publication schemas by issuing advertisements. Subscribers register their interest by issuing subscriptions, and these subscriptions are routed towards candidate publishers (those whose advertisements match the subscriptions). When a new publication is published, it is matched against all registered subscriptions and routed to the subscribers who sent matching subscriptions.
A topic-based pub/sub system makes the routing decisions based on the "topic" attribute that classifies publications into pre-defined classes. In contrast, the content-based routing used in PADRES can use any attribute in the publications/subscriptions to make the matching/routing decision, providing a highly flexible routing mechanism. PADRES is also different from many other content-based pub/sub systems, due to its fully decentralized architecture: routing decisions are taken in a distributed broker overlay. This provides a scalable solution that can span across a large enterprise or even throughout the Internet.
PADRES Installation
Follow the instructions on the PADRES download page.
Running PADRES
A PADRES overlay is deployed by instantiating a set of brokers and clients (see the Introduction section). First, you have to decide on the topology of the broker overlay and the client-broker connections. You can additionally instantiate a monitor to observe the status of the brokers and clients. For this user guide, let us consider the PADRES system topology shown in Figure 1.
Figure 1: A simple PADRES network.
PADRES system components are typically deployed on different physical machines. However, it is also useful to know how to run the whole system on a single physical machine for verification and debugging purposes. We first describe how to instantiate the system shown in Figure 1 on a single physical machine, and then in a distributed setting.
Running a PADRES System on a Single Physical Machine
The parameters of the PADRES system shown in Figure 1, running on a single machine, are given in Table 1. The port numbers are chosen arbitrarily. However, it is better to choose high port numbers to avoid conflicts with the ports used by other services.
Table 1: PADRES system parameters.
| Broker   | ID      | Location  | Port | Neighbors          |
|----------|---------|-----------|------|--------------------|
| Broker A | BrokerA | localhost | 1100 | Broker B           |
| Broker B | BrokerB | localhost | 1101 | Broker A, Broker C |
| Broker C | BrokerC | localhost | 1102 | Broker B           |

| Client   | Location  | Broker   |
|----------|-----------|----------|
| Client X | localhost | Broker A |
| Client Y | localhost | Broker C |
When launching a PADRES system, the broker overlay has to be created before instantiating the clients. The broker overlay can be created using the following commands:
$ startbroker -i BrokerA -p 1100 -n localhost:1101
$ startbroker -i BrokerB -p 1101 -n localhost:1100,localhost:1102
$ startbroker -i BrokerC -p 1102 -n localhost:1101
Notes:
- Option -i specifies the broker ID, option -p the port number, and option -n the neighbors.
- Broker IDs should be unique and should not contain the '-' (hyphen) character.
- Only one broker can listen on a particular port.
- The neighbors are given as a comma-separated list (with no spaces) of <location>:<port#> entries. A location can be a DNS name or an IP address.
- Even though the above example shows all the brokers specifying their neighbors, it is enough to specify each link on one side only. For example, Broker A and Broker B are neighbors; if that relation is specified when starting Broker A, it need not be specified when starting Broker B.
- There is no stipulated order in which the brokers have to be started. Brokers use heartbeat messages to detect whether the specified neighbors are alive.
- For a detailed description of the options of the startbroker command, run it with the -help option.
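The neighbor-list format described in the notes above can be checked before a broker is launched. The following Python sketch is not part of PADRES; the function name parse_neighbors is hypothetical and the checks only mirror the rules stated in the notes (comma separated, no spaces, <location>:<port#> entries).

```python
def parse_neighbors(spec):
    """Split a '-n' style neighbor list into (location, port) pairs.

    Hypothetical helper, not part of PADRES: it only mirrors the format
    described in the notes (comma separated, no spaces, <location>:<port#>).
    """
    if not spec:
        return []
    if " " in spec:
        raise ValueError("neighbor list must not contain spaces")
    neighbors = []
    for entry in spec.split(","):
        # Split on the last ':' so the port is always the final field.
        location, sep, port = entry.rpartition(":")
        if not sep or not location or not port.isdigit():
            raise ValueError(f"expected <location>:<port#>, got {entry!r}")
        neighbors.append((location, int(port)))
    return neighbors


# The neighbor list used for Broker B in the example above.
print(parse_neighbors("localhost:1100,localhost:1102"))
```

A list with a space after the comma, such as "localhost:1100, localhost:1102", is rejected by this sketch with a ValueError.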
Now, start the clients:
$ startclient 1100 # this is client X, connecting to Broker A via port 1100
$ startclient 1102 # this is client Y, connecting to Broker C via port 1102
The startclient command brings up a GUI for each client, which can be used to interact with the client (see the startclient help). The layout of the client GUI is shown in Figure 2.
Figure 2: PADRES RMI Client GUI.
The client GUI provides a way to perform the basic operations in a PADRES network: advertise, subscribe, and publish. A client can publish only after it has sent out a matching advertisement. For example, if Client X is going to publish temperature data for the Toronto area, it must first issue an advertisement as below (enter this command into the user interface area of Client X):
a [class,eq,'temp'],[area,eq,'tor'],[value,<,100]
Here, the client advertises that it is going to publish temperature data for the Toronto area, with values below 100 degrees Celsius (note that the interface is space sensitive). If Client Y wants to be informed once the temperature in the city of Toronto drops below zero, it can register a subscription via its GUI as below (enter this command into the user interface area of Client Y):
s [class,eq,'temp'],[area,eq,'tor'],[value,<,0]
Now, Client X can publish new temperature data via its GUI (enter this command into the user interface area of Client X):
p [class,'temp'],[area,'tor'],[value,-10]
This data will be routed through the broker overlay, and Client Y will be informed about it via the output area of its GUI (the following message should appear in the client output area of Client Y once the previous publication has been sent):
Got Publication: [class,temp],[area,"tor"],[value,-10];Thu Jun 19 17:14:58 GMT 2008
Advertisements and subscriptions have an (attribute, operator, value) tuple format, whereas publications have an (attribute, value) tuple format. More information on adv/sub/pub patterns and operators can be found here.
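As a rough illustration of the tuple formats above, the following Python sketch parses the example advertisement, subscription, and publication from this section and checks the publication against both. It is a simplified model, not the PADRES matching engine: the function names are hypothetical, only the operators that appear in this guide (eq, <, >) are handled, and values containing commas are not supported.

```python
import re

# Only the operators that appear in this guide; the full set is an open question here.
OPERATORS = {
    "eq": lambda a, b: a == b,
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
}


def _value(text):
    """Quoted values become strings, everything else is read as a number."""
    text = text.strip()
    if text.startswith(("'", '"')):
        return text.strip("'\"")
    return float(text)


def parse_tuples(line):
    """Return the comma-separated fields of each [...] group in line."""
    return [group.split(",") for group in re.findall(r"\[([^\]]*)\]", line)]


def parse_predicates(line):
    """Parse an advertisement or subscription into {attribute: (operator, value)}."""
    return {attr: (op, _value(val)) for attr, op, val in parse_tuples(line)}


def parse_publication(line):
    """Parse a publication into {attribute: value}."""
    return {attr: _value(val) for attr, val in parse_tuples(line)}


def matches(publication, predicates):
    """True if the publication satisfies every (attribute, operator, value) predicate."""
    for attr, (op, expected) in predicates.items():
        if attr not in publication:
            return False
        if not OPERATORS[op](publication[attr], expected):
            return False
    return True


adv = parse_predicates("[class,eq,'temp'],[area,eq,'tor'],[value,<,100]")
sub = parse_predicates("[class,eq,'temp'],[area,eq,'tor'],[value,<,0]")
pub = parse_publication("[class,'temp'],[area,'tor'],[value,-10]")

print(matches(pub, adv))  # the publication is covered by the advertisement
print(matches(pub, sub))  # the publication satisfies the subscription
```

In this model, a publication with value 20 would still be covered by the advertisement but would not satisfy the subscription, so Client Y would not be notified.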
Note: currently the client GUI does not support "unadvertise" and "unsubscribe".
The PADRES system can be graphically viewed using the PADRES monitor. To start the monitor, use the following command:
$ startmonitor
It will bring up a GUI with an initially blank canvas. Use Federation -> Connect to Federation... (Main -> Connect to Federation... in some versions) to bring up a connection dialog. Enter the hostname and port number of a valid active broker in the system and press 'OK'. For example, if you want to connect to Broker A, use 'localhost' and '1100'. After some time, the monitor will show the current system layout as in Figure 3. You can also use Federation -> Refresh (Main -> Refresh Federation) or press F1 (F5) to refresh the view.
Figure 3: PADRES Monitor GUI.
Instructions on the basic usage of the monitor can be found here.
You can stop the clients and the monitor simply by closing their GUIs. The brokers can be stopped using the stopbroker command:
$ stopbroker BrokerA
$ stopbroker BrokerB
$ stopbroker BrokerC
Running a Distributed PADRES System
Using the same topology as in Figure 1 but running each process on a separate node, we have a slightly different setup compared to the previous configuration, as shown in Table 2 below:
Table 2: Distributed PADRES system parameters.
| Broker   | ID      | Location | Port  | Neighbors          |
|----------|---------|----------|-------|--------------------|
| Broker A | BrokerA | 10.0.1.1 | 10000 | Broker B           |
| Broker B | BrokerB | 10.0.1.2 | 10001 | Broker A, Broker C |
| Broker C | BrokerC | 10.0.1.3 | 10002 | Broker B           |

| Client   | Location | Broker   |
|----------|----------|----------|
| Client X | 10.0.1.4 | Broker A |
| Client Y | 10.0.1.5 | Broker C |
You may start the brokers and clients individually, using the same technique as above, by logging into each node separately. Better still, you can use the provided tool called PANDA to simplify and speed up your deployment.
PANDA
Padres Automated Node Deployer and Administrator (PANDA) allows you to deploy and manage a network of brokers and clients. In addition to starting and terminating remote processes, PANDA supports installing and uninstalling rpms and tarballs, and fetching and cleaning log files at remote nodes. The user has complete freedom over the topology to deploy, as all deployment details are captured in a user-defined Topology File. Internally, PANDA consists of a Java program with many helper shell scripts that interact with the remote nodes. With PANDA, you no longer need to log into every node manually, as these tasks are automated.
The current version of PANDA is available only on the Linux platform and requires OpenSSH 3.9 or later (with options ConnectTimeout and StrictHostKeyChecking). All remote machines must be accessible via ssh (that is, they must host ssh servers) and must have the screen application installed.
Running a Distributed PADRES System with PANDA
Before deploying any PADRES processes, you must make sure that the machines on which you wish to run PADRES have Java 6 and screen installed, as well as the PADRES binaries and libraries. Installation of Java 6 and screen on PlanetLab can be done via the install command in PANDA. By default, PANDA assumes that Java is installed in the home directory under "java/". Additionally, the script named javahome, located in the distribution's etc/panda/setup, must be present in the home directory of the remote nodes. The PADRES binaries and libraries can be uploaded with the upload command, which uploads and extracts a tarball containing the required files. Please complete PANDA's configuration before proceeding. More help regarding PANDA commands can be found by typing "help" in the PANDA console.
Using the PADRES system configuration in Table 2, to deploy Broker A using PANDA, first start PANDA by running the startpanda command. Once you get the PANDA console, type the following command into it:
$ startpanda
Type 'help' or '?' for help.
> 0.0 ADD BrokerA 10.0.1.1 startbroker -Xms 64 -Xmx 128 -hostname 10.0.1.1 -p 10000 -i BrokerA
The 0.0 value at the beginning of the line marks the time when the broker should be started (0.0 implies an immediate action; also see below). All node addresses must strictly be IP addresses. To stop the deployed broker, issue the command below. Note that the process ID and the node's IP address must match those of the previous ADD command.
> 0.0 REMOVE BrokerA 10.0.1.1
Instead of typing ADD/REMOVE commands separately for each broker/client process, it is possible to group all the console commands into a file (called a PANDA topology file) to be imported into PANDA. Below is the PANDA topology file that captures the setup illustrated in Table 2. This topology file utilizes PANDA's 2-phase deployment, where PANDA ensures that all brokers marked with time 0.0 are fully up and connected in phase 1 before deploying clients with time > 0.0 in phase 2.
# Phase 1, deploy the 3 brokers
0.0 ADD BrokerA 10.0.1.1 bin/startbroker -Xms 64 -Xmx 128 -hostname 10.0.1.1 -p 10000 -i BrokerA
0.0 ADD BrokerB 10.0.1.2 bin/startbroker -Xms 64 -Xmx 128 -hostname 10.0.1.2 -n 10.0.1.1:10000 -p 10001 -i BrokerB
0.0 ADD BrokerC 10.0.1.3 bin/startbroker -Xms 64 -Xmx 128 -hostname 10.0.1.3 -n 10.0.1.2:10001 -p 10002 -i BrokerC
# Phase 2, deploy Client X and Client Y.
# Client X is a publisher deployed at 1s after successful broker deployment that publishes stock quote publications of symbol
# ANTP at 60 msgs/min to BrokerA with 0 delay before initial publication. demo/stockquote/startSQpublisher.sh is the script that starts this
# automated stock quote publisher
1.01 ADD ClientX 10.0.1.4 demo/bin/stockquote/startSQpublisher.sh -hostname 10.0.1.4 -i ClientX -s ANTP -r 60 -d 0 -b 10.0.1.1:10000
# ClientY is a subscriber deployed at 10s after successful broker deployment that subscribes to [class,eq,'STOCK'],[volume,>,0] at BrokerC.
# demo/stockquote/startSQsubscriber.sh is the script that starts this automated stock quote subscriber
10 ADD ClientY 10.0.1.5 demo/bin/stockquote/startSQsubscriber.sh -hostname 10.0.1.5 -i ClientY -s "[class,eq,'STOCK'],[volume,>,0]" -b 10.0.1.3:10002
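The 2-phase rule stated above (time 0.0 entries form phase 1, entries with time > 0.0 form phase 2) can be modelled with a short Python sketch. How PANDA itself parses the file is not described in this guide, so the function name split_phases, the field names, and the abbreviated sample commands are assumptions made for illustration only.

```python
def split_phases(topology_text):
    """Group topology-file commands into phase 1 (time 0.0) and phase 2 (time > 0.0)."""
    phase1, phase2 = [], []
    for raw in topology_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Fields: <time> <ADD|REMOVE> <process id> <node IP> [command ...]
        time_field, action, process_id, address, *command = line.split()
        entry = {
            "time": float(time_field),
            "action": action,
            "id": process_id,
            "address": address,
            "command": " ".join(command),
        }
        (phase1 if entry["time"] == 0.0 else phase2).append(entry)
    # Phase 2 entries are ordered by their start time.
    phase2.sort(key=lambda e: e["time"])
    return phase1, phase2


# Abbreviated sample in the same layout as the topology file above.
SAMPLE = """\
# Phase 1
0.0 ADD BrokerA 10.0.1.1 bin/startbroker -p 10000 -i BrokerA
0.0 ADD BrokerB 10.0.1.2 bin/startbroker -n 10.0.1.1:10000 -p 10001 -i BrokerB
# Phase 2
10 ADD ClientY 10.0.1.5 demo/bin/stockquote/startSQsubscriber.sh -i ClientY -b 10.0.1.3:10002
1.01 ADD ClientX 10.0.1.4 demo/bin/stockquote/startSQpublisher.sh -i ClientX -b 10.0.1.1:10000
"""

brokers, clients = split_phases(SAMPLE)
print([e["id"] for e in brokers])  # phase 1 process IDs
print([e["id"] for e in clients])  # phase 2 process IDs, ordered by start time
```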
To deploy this topology using PANDA, run PANDA with the topology file (assumed here to be named topology.txt).
Note: You must configure PANDA before using it. See the section Configuring PANDA below.
$ startpanda topology.txt
Alternatively, you may start PANDA without the command line parameter and load the topology file with the load command in PANDA's console:
$ startpanda
> load topology.txt
Loading a topology file does not automatically start the broker/client processes. Once the topology file is successfully loaded and PANDA has verified that all nodes referenced by the file are reachable, issue the deploy command to deploy the processes:
$ startpanda topology.txt
Checking reachability of referenced nodes in topology file ...
10.0.1.1 OK
10.0.1.2 OK
10.0.1.3 OK
10.0.1.4 OK
10.0.1.5 OK
topology.txt successfully loaded
Type 'help' or '?' for help.
> deploy
After you issue the deploy command, PANDA asks whether you want to skip its automated 2-phase deployment process. If you skip it, you will be prompted to decide whether or not to proceed with phase 2 deployment. If you do not skip it, PANDA automatically deploys phase 2 once its internal monitoring client sees that all brokers and links are established in phase 1.
To stop the deployment at any time, use the stop command. Ignore any innocuous error messages.
The full list of PANDA commands, their syntax, and descriptions can be found here.
Configuring PANDA
By default, all of PANDA's configuration is contained in $PADRES_HOME/etc/panda/panda.properties. You may use the -c config_file_path command line argument with startpanda.sh to load your own configuration file for PANDA.
- Configure remote login name
  This is the login name used to log into all of the remote nodes.
  scripts.env.SLICE=<login name>
- Configure remote PADRES_HOME
  This is a path relative to the home directory on the remote machines.
  remote.padres.path=<path to padres home directory>
- Configure SSH keys
  PANDA uses public/private SSH keys to log into remote nodes. See here, or search online, for how to generate public/private SSH keys. Note that PANDA requires you to use an empty passphrase. Put the public key in the $HOME/.ssh directory on all remote nodes. Then modify the line shown below in panda.properties to reflect the path to your private key. Note that the private key must be readable and writable only by its owner.
  scripts.env.IDENTITY=<path to private ssh key>
- Configure tarball package
  Essentially, the tarball is the complete PADRES package with the 3rd party library jar files. PANDA will download the tarball onto the remote nodes and extract it in the $HOME directory (not $PADRES_HOME). Therefore, it is recommended that a directory containing the contents of PADRES be created automatically upon extraction. Note that the remote.padres.path property value mentioned above should match this directory. To enable uploading of the PADRES tarball onto the remote nodes, configure the line below in panda.properties to point to the URL of your tarball. PANDA uses wget for this operation; therefore, the URL should be a valid HTTP address prefixed with http://.
  scripts.env.TARBALL=<url to tarball>
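The settings listed above can be sanity-checked before PANDA is started. The following Python sketch is not part of PANDA: the function names are hypothetical, the sample values (alice, example.org) are placeholders, and the parser handles only plain key=value lines rather than the full Java property file syntax. It checks that the four keys from this section are set, that the tarball URL begins with http://, and that the private key file, if it exists locally, is not accessible to group or others.

```python
import os
import stat

# The four keys discussed in this section.
SECTION_KEYS = (
    "scripts.env.SLICE",
    "remote.padres.path",
    "scripts.env.IDENTITY",
    "scripts.env.TARBALL",
)


def read_properties(text):
    """Parse simple key=value lines, ignoring blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_panda_properties(props):
    """Return a list of problems; an empty list means none were found."""
    problems = [f"{key} is not set" for key in SECTION_KEYS if not props.get(key)]
    tarball = props.get("scripts.env.TARBALL", "")
    if tarball and not tarball.startswith("http://"):
        problems.append("scripts.env.TARBALL must start with http://")
    identity = props.get("scripts.env.IDENTITY", "")
    if identity and os.path.exists(identity):
        mode = stat.S_IMODE(os.stat(identity).st_mode)
        if mode & 0o077:  # any permission bit for group or others
            problems.append("private key is accessible to group or others")
    return problems


sample = """\
# placeholder values for illustration
scripts.env.SLICE=alice
remote.padres.path=padres
scripts.env.TARBALL=ftp://example.org/padres.tar.gz
"""
for problem in check_panda_properties(read_properties(sample)):
    print(problem)
```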
PADRES Configuration
The operation of the PADRES components can be configured using configuration files. These files have the format of a standard Java property file. The default PADRES configuration files can be found in $PADRES_HOME/etc/. However, PADRES components generally provide a command line option with which you can specify your own configuration file. In addition, PADRES components provide other command line options that can be used to configure individual parameters. Refer to the documentation on each PADRES component below for further details.
Note: Command line options override settings from the user-specified configuration file; settings from the user-specified configuration file override those from the default configuration file; and settings from the default configuration file override the default parameters hard-coded in the source.
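The precedence order in the note above can be pictured as a layered lookup in which the first source that defines a key wins. The Python sketch below only models that rule; it is not how PADRES is implemented, the function name is hypothetical, and the mapping of the -p option to the padres.port key is an assumption made for the example.

```python
from collections import ChainMap


def effective_config(cli_options, user_file, default_file, built_in):
    """Layer the four sources so that earlier arguments win over later ones."""
    return ChainMap(cli_options, user_file, default_file, built_in)


built_in = {"padres.port": "1099"}                                    # hard-coded defaults
default_file = {"padres.port": "1099", "padres.brokerID": "Broker1"}  # default property file
user_file = {"padres.brokerID": "BrokerA"}                            # file given on the command line
cli_options = {"padres.port": "1100"}                                 # e.g. -p 1100 (assumed mapping)

config = effective_config(cli_options, user_file, default_file, built_in)
print(config["padres.port"], config["padres.brokerID"])  # 1100 BrokerA
```

Here the port comes from the command line and the broker ID from the user-specified file, while any key defined in neither falls through to the default file and then to the built-in value.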
Configuring PADRES Broker
By default, $PADRES_HOME/etc/broker.properties is used to configure PADRES brokers. This can be overridden by specifying a different configuration file using the -c command line option.
A few of the available configuration parameters and their descriptions are shown below. For the complete set of configuration options, please check the property file in $PADRES_HOME/etc/.
# Sample configuration/properties file for the PADRES broker.
# REQUIRED. This key specifies the identifier of the broker. This ID
# must be unique across all brokers in the same federation, otherwise,
# there will be duplicate message IDs, and erroneous routing will occur.
# Needless to say, the result would be catastrophic.
padres.brokerID=Broker1
# OPTIONAL (default=1099). This tells the broker which RMI port it
# should bind its transport handler to receive messages from clients and
# neighboring brokers. Note: An RMI registry should already be running
# at this port before you start the broker, otherwise you will get an
# RMI exception.
padres.port=1099
# OPTIONAL (default=""). You can specify here one or more neighbors for
# this broker to connect to upon joining the federation. An address of
# a broker typically consists of an IP address and an RMI port. Multiple
# broker addresses must be separated by a comma, as shown in the example
# below. It is OK to leave this field blank, especially when you are
# starting the first broker that has no available brokers to which to
# connect. There is no default value for this parameter.
#padres.remoteBrokers=128.100.241.50:1100,128.100.241.51:1099
For the command line options that override these settings, refer to the help document on the startbroker command.
Configuring PADRES Client
The PADRES client uses $PADRES_HOME/etc/client.properties as its default configuration file. Currently there is no command line option to specify a user-specific configuration file.
Configuring PADRES Logs
PADRES uses separate log4j.properties files to configure the logging of messages from the broker and the client, respectively. At present, there is no command line option to specify a user-defined configuration file for the PADRES logging engine.
By default, the logs are created in the ~/.padres/logs/ directory. You can change the location of the log directory using the -ll option when launching the commands.
Additional Information