- ceph-mgr module developer’s guide
- Creating a module
- Installing a module
- Logging
- Exposing commands
- Configuration options
- KV store
- Accessing cluster data
- Exposing health checks
- What if the mons are down?
- Reporting if your module cannot run
- Sending commands
- Receiving notifications
- Accessing RADOS or CephFS
- Implementing standby mode
- Communicating between modules
- Shutting down cleanly
- Limitations
- Is something missing?
ceph-mgr module developer’s guide
Warning
This is developer documentation, describing Ceph internals that are only relevant to people writing ceph-mgr modules.
Creating a module
In pybind/mgr/, create a Python module. Within your module, create a class that inherits from MgrModule. For ceph-mgr to detect your module, your directory must contain a file called module.py.
The most important methods to override are:
- a serve member function for server-type modules. This function should block forever.
- a notify member function if your module needs to take action when new cluster data is available.
- a handle_command member function if your module exposes CLI commands.
Some modules interface with external orchestrators to deploy Ceph services. These also inherit from Orchestrator, which adds additional methods to the base MgrModule class. See Orchestrator modules for more on creating these modules.
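As a minimal sketch of the layout described above, a module.py might look like the following. The class name Module, the five-second wake-up interval, the seen list, and the fallback stand-in for mgr_module (present only so the file can be read and run outside ceph-mgr) are assumptions made for illustration, not part of the interface.

```python
import logging
import threading

try:
    from mgr_module import MgrModule  # provided by ceph-mgr at runtime
except ImportError:
    class MgrModule:  # stand-in so this sketch also runs outside ceph-mgr
        def __init__(self, *args, **kwargs):
            pass

log = logging.getLogger(__name__)


class Module(MgrModule):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Only local state is set up here; cluster accessors are not called.
        self._stop = threading.Event()
        self.seen = []

    def serve(self):
        # Blocks until shutdown() is called, waking up periodically.
        while not self._stop.wait(timeout=5.0):
            log.debug("still serving")

    def shutdown(self):
        # Unblock serve() so that ceph-mgr can stop cleanly.
        self._stop.set()

    def notify(self, notify_type, notify_id):
        # Record only the notification type this sketch cares about.
        if notify_type == "osd_map":
            self.seen.append(notify_type)
```

Pairing serve with shutdown in this way keeps the blocking loop interruptible; a real module would replace the debug message with its own periodic work.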
Installing a module
Once your module is present in the location set by the mgr module path configuration setting, you can enable it via the ceph mgr module enable command:
- ceph mgr module enable mymodule
Note that the MgrModule interface is not stable, so any modules maintained outside of the Ceph tree are liable to break when run against any newer or older versions of Ceph.
Logging
Logging in Ceph manager modules is done as in any other Python program. Just import the logging package and get a logger instance with the logging.getLogger function.
Each module has a log_level option that specifies the current Python logging level of the module. To change or query the logging level of the module, use the following Ceph commands:
- ceph config get mgr mgr/<module_name>/log_level
- ceph config set mgr mgr/<module_name>/log_level <info|debug|critical|error|warning|>
The logging level used when the module starts is determined by the current logging level of the mgr daemon, unless the log_level option was previously set with the config set … command. The mgr daemon logging level is mapped to the module Python logging level as follows:
<= 0 is CRITICAL
<= 1 is WARNING
<= 4 is INFO
<= +inf is DEBUG
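The mapping above can be sketched as a small function. The real mapping is performed inside ceph-mgr; the function name here is hypothetical and exists only to make the thresholds concrete.

```python
import logging


def python_level_for_mgr_debug(mgr_debug_level):
    """Map a mgr daemon debug level to a Python logging level (illustrative)."""
    if mgr_debug_level <= 0:
        return logging.CRITICAL
    if mgr_debug_level <= 1:
        return logging.WARNING
    if mgr_debug_level <= 4:
        return logging.INFO
    return logging.DEBUG
```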
We can unset the module log level and fall back to the mgr daemon logging level by running the following command:
- ceph config set mgr mgr/<module_name>/log_level ''
By default, modules’ logging messages are processed by the Ceph logging layer, where they are recorded in the mgr daemon’s log file. It is also possible to send a module’s logging messages to its own file.
The module’s log file will be located in the same directory as the mgr daemon’s log file with the following name pattern:
- <mgr_daemon_log_file_name>.<module_name>.log
To enable the file logging on a module use the following command:
- ceph config set mgr mgr/<module_name>/log_to_file true
When the module’s file logging is enabled, the module’s logging messages stop being written to the mgr daemon’s log file and are only written to the module’s log file.
It’s also possible to check the status and disable the file logging with the following commands:
- ceph config get mgr mgr/<module_name>/log_to_file
- ceph config set mgr mgr/<module_name>/log_to_file false
Exposing commands
Set the COMMANDS class attribute of your module to a list of dicts like this:
- COMMANDS = [
- {
- "cmd": "foobar name=myarg,type=CephString",
- "desc": "Do something awesome",
- "perm": "rw",
- # optional:
- "poll": "true"
- }
- ]
The cmd part of each entry is parsed in the same way as internal Ceph mon and admin socket commands (see mon/MonCommands.h in the Ceph source for examples). Note that the “poll” field is optional, and is set to False by default; when set, it indicates to the ceph CLI that it should call this command repeatedly and output results (see ceph -h and its --period option).
Each command is expected to return a tuple (retval, stdout, stderr). retval is an integer representing a libc error code (e.g. EINVAL, EPERM, or 0 for no error), stdout is a string containing any non-error output, and stderr is a string containing any progress or error explanation output. Either or both of the two strings may be empty.
Implement the handle_command function to respond to the commands when they are sent:
MgrModule.handle_command(inbuf, cmd)
Called by ceph-mgr to request the plugin to handle one of the commands that it declared in self.COMMANDS.
Return a status code, an output buffer, and an output string. The output buffer is for data results, the output string is for informative text.
Parameters
- inbuf (str) – content of any “-i” supplied to ceph cli
- cmd (dict) – from Ceph’s cmdmap_t
Returns
- HandleCommandResult or a 3-tuple of (int, str, str)
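The following sketch ties the COMMANDS declaration to a handle_command implementation. It assumes that the command name arrives in cmd under the prefix key and each argument under its declared name, and it returns negated errno values, a convention borrowed from in-tree modules rather than something stated here. The fallback stand-in for mgr_module exists only so the sketch runs outside ceph-mgr.

```python
import errno

try:
    from mgr_module import MgrModule  # provided by ceph-mgr at runtime
except ImportError:
    class MgrModule:  # stand-in so this sketch also runs outside ceph-mgr
        pass


class Module(MgrModule):
    COMMANDS = [
        {
            "cmd": "foobar name=myarg,type=CephString",
            "desc": "Do something awesome",
            "perm": "rw",
        },
    ]

    def handle_command(self, inbuf, cmd):
        # Assumption: "prefix" holds the command name, "myarg" the argument.
        if cmd.get("prefix") == "foobar":
            return 0, "foobar called with {}".format(cmd["myarg"]), ""
        # Unknown command: no data output, explanation in the third field.
        message = "unknown command {!r}".format(cmd.get("prefix"))
        return -errno.EINVAL, "", message
```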
Configuration options
Modules can load and store configuration options using the set_module_option and get_module_option methods.
Note
Use set_module_option and get_module_option to manage user-visible configuration options that are not blobs (like certificates). If you want to persist module-internal data or binary configuration data, consider using the KV store.
You must declare your available configuration options in the MODULE_OPTIONS class attribute, like this:
- MODULE_OPTIONS = [
- {
- "name": "my_option"
- }
- ]
If you try to use set_module_option or get_module_option on options not declared in MODULE_OPTIONS, an exception will be raised.
You may choose to provide setter commands in your module to perform high level validation. Users can also modify configuration using the normal ceph config set command, where the configuration options for a mgr module are named like mgr/<module name>/<option>.
If a configuration option is different depending on which node the mgr is running on, then use localized configuration (get_localized_module_option, set_localized_module_option). This may be necessary for options such as what address to listen on. Localized options may also be set externally with ceph config set, where the key name is like mgr/<module name>/<mgr id>/<option>.
If you need to load and store data (e.g. something larger, binary, or multiline), use the KV store instead of configuration options (see next section).
Hints for using config options:
- Reads are fast: ceph-mgr keeps a local in-memory copy, so in many cases you can just do a get_module_option every time you use an option, rather than copying it out into a variable.
- Writes block until the value is persisted (i.e. round trip to the monitor), but reads from another thread will see the new value immediately.
- If a user has used config set from the command line, then the new value will become visible to get_module_option immediately, although the mon->mgr update is asynchronous, so config set will return a fraction of a second before the new value is visible on the mgr.
- To delete a config value (i.e. revert to default), just pass None to set_module_option.
MgrModule.get_module_option(key, default=None)
Retrieve the value of a persistent configuration setting
Parameters
- key (str) –
- default (str) –
Returns
- str
MgrModule.set_module_option(key, val)
Set the value of a persistent configuration setting
Parameters
- key (str) –
MgrModule.get_localized_module_option(key, default=None)
Retrieve localized configuration for this ceph-mgr instance
Parameters
- key (str) –
- default (str) –
Returns
- str
MgrModule.set_localized_module_option(key, val)
Set localized configuration for this ceph-mgr instance
Parameters
- key (str) –
- val (str) –
Returns
- str
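A setter that performs high level validation, as suggested above, could be sketched as follows. The option name interval, its default of 60, and both helper names are hypothetical; the helpers take the module instance as an argument so that they rely only on get_module_option and set_module_option. Returning negated errno values is likewise an assumption.

```python
import errno

# Hypothetical option; it must be declared before it can be read or written.
MODULE_OPTIONS = [
    {"name": "interval"},
]


def set_interval(module, raw_value):
    """Validate and store the 'interval' option; None reverts to the default."""
    if raw_value is None:
        module.set_module_option("interval", None)
        return 0, "interval reset to default", ""
    try:
        seconds = int(raw_value)
    except (TypeError, ValueError):
        return -errno.EINVAL, "", "interval must be an integer"
    if seconds <= 0:
        return -errno.EINVAL, "", "interval must be positive"
    module.set_module_option("interval", str(seconds))
    return 0, "interval set to {}".format(seconds), ""


def current_interval(module, default=60):
    # Reads come from the in-memory copy, so look the value up on each use.
    return int(module.get_module_option("interval", default))
```

Such a helper would typically be called from handle_command, so that users get an immediate error message instead of a silently stored bad value.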
KV store
Modules have access to a private (per-module) key value store, which is implemented using the monitor’s “config-key” commands. Use the set_store and get_store methods to access the KV store from your module.
The KV store commands work in a similar way to the configuration commands. Reads are fast, operating from a local cache. Writes block on persistence and do a round trip to the monitor.
This data can be accessed from outside of ceph-mgr using the ceph config-key [get|set] commands. Key names follow the same conventions as configuration options. Note that any values updated from outside of ceph-mgr will not be seen by running modules until the next restart. Users should be discouraged from accessing module KV data externally – if it is necessary for users to populate data, modules should provide special commands to set the data via the module.
Use the get_store_prefix function to enumerate keys within a particular prefix (i.e. all keys starting with a particular substring).
MgrModule.set_store(key, val)
Set a value in this module’s persistent key value store. If val is None, remove key from store
Parameters
- key (str) –
- val (str) –
MgrModule.get_store_prefix(key_prefix)
Retrieve a dict of KV store keys to values, where the keys have the given prefix
Parameters
- key_prefix (str) –
Returns
- dict
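As an illustration of the KV store calls, the sketch below persists small JSON documents under a common key prefix. The state/ prefix and the helper names are hypothetical, and get_store is assumed to return None for a key that does not exist.

```python
import json

STATE_PREFIX = "state/"  # hypothetical key prefix chosen for this sketch


def save_state(module, name, state):
    # Values are stored as strings, so serialize structured data first.
    module.set_store(STATE_PREFIX + name, json.dumps(state))


def load_state(module, name):
    raw = module.get_store(STATE_PREFIX + name)
    return None if raw is None else json.loads(raw)


def delete_state(module, name):
    # Passing None removes the key from the store.
    module.set_store(STATE_PREFIX + name, None)


def list_states(module):
    # Keys are returned exactly as get_store_prefix reports them.
    stored = module.get_store_prefix(STATE_PREFIX)
    return {key: json.loads(value) for key, value in stored.items()}
```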
Accessing cluster data
Modules have access to the in-memory copies of the Ceph cluster’s state that the mgr maintains. Accessor functions are exposed as members of MgrModule.
Calls that access the cluster or daemon state are generally going from Python into native C++ routines. There is some overhead to this, but much less than, for example, calling into a REST API or calling into an SQL database.
There are no consistency rules about access to cluster structures or daemon metadata. For example, an OSD might exist in OSDMap but have no metadata, or vice versa. On a healthy cluster these will be very rare transient states, but modules should be written to cope with the possibility.
Note that these accessors must not be called in the module’s __init__ function. This will result in a circular locking exception.
MgrModule.get(data_name)
Called by the plugin to fetch named cluster-wide objects from ceph-mgr.
Parameters
- data_name (str) – Valid things to fetch are osd_crush_map_text, osd_map, osd_map_tree, osd_map_crush, config, mon_map, fs_map, osd_metadata, pg_summary, io_rate, pg_dump, df, osd_stats, health, mon_status, devices, device.
Note: All these structures have their own JSON representations: experiment or look at the C++ dump() methods to learn about them.
MgrModule.get_server(hostname)
Called by the plugin to fetch metadata about a particular hostname from ceph-mgr.
This is information that ceph-mgr has gleaned from the daemon metadata reported by daemons running on a particular server.
Parameters
- hostname – a hostname
MgrModule.list_servers()
Like get_server, but gives information about all servers (i.e. all unique hostnames that have been mentioned in daemon metadata)
Returns
- a list of information about all servers
Return type
- list
ceph-mgr fetches metadata asynchronously, so there are windows of time during addition/removal of services where the metadata is not available to modules. None is returned if no metadata is available.
Parameters
- svc_type (str) – service type (e.g., ‘mds’, ‘osd’, ‘mon’)
- svc_id (str) – service id. Convert OSD integer IDs to strings when calling this
Return type
- dict, or None if no metadata found
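A module can cope with the transient states mentioned earlier by treating missing metadata as a normal case. The sketch below rests on two assumptions that should be checked against the Ceph source: that the metadata accessor described here is available as get_metadata(svc_type, svc_id), and that the osd_map structure contains an osds list whose entries carry the integer ID under the osd key. The helper name is hypothetical.

```python
def osds_missing_metadata(module):
    """Return IDs of OSDs that are in the OSDMap but have no metadata yet."""
    missing = []
    for osd in module.get("osd_map").get("osds", []):
        osd_id = osd["osd"]
        # Service IDs are strings, so the integer OSD ID is converted.
        if module.get_metadata("osd", str(osd_id)) is None:
            missing.append(osd_id)
    return missing
```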
MgrModule.get_daemon_status(svc_type, svc_id)
Fetch the latest status for a particular service daemon.
This method may return None if no status information is available, for example because the daemon hasn’t fully started yet.
Parameters
- svc_type – string (e.g., ‘rgw’)
- svc_id – string
Returns
- dict, or None if the service is not found
MgrModule.get_perf_schema(svc_type, svc_name)
Called by the plugin to fetch perf counter schema info. svc_name can be nullptr, as can svc_type, in which case they are wildcards
Parameters
- svc_type (str) –
- svc_name (str) –
Returns
- list of dicts describing the counters requested
MgrModule.get_counter(svc_type, svc_name, path)
Called by the plugin to fetch the latest performance counter data for a particular counter on a particular service.
Parameters
- svc_type (str) –
- svc_name (str) –
- path (str) – a period-separated concatenation of the subsystem and the counter name, for example “mds.inodes”.
Returns
- A list of two-tuples of (timestamp, value) is returned. This may be empty if no data is available.
MgrModule.get_mgr_id()
Retrieve the name of the manager daemon where this plugin is currently being executed (i.e. the active manager).
Returns
- str
Exposing health checks
Modules can raise first class Ceph health checks, which will be reported in the output of ceph status and in other places that report on the cluster’s health.
If you use set_health_checks to report a problem, be sure to call it again with an empty dict to clear your health check when the problem goes away.
MgrModule.set_health_checks(checks)
Set the module’s current map of health checks. Argument is a dict of check names to info, in this form:
- {
- 'CHECK_FOO': {
- 'severity': 'warning', # or 'error'
- 'summary': 'summary string',
- 'count': 4, # quantify badness
- 'detail': [ 'list', 'of', 'detail', 'strings' ],
- },
- 'CHECK_BAR': {
- 'severity': 'error',
- 'summary': 'bars are bad',
- 'detail': [ 'too hard' ],
- },
- }
Parameters
- checks (dict) – dict of health check dicts
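Raising and clearing a check can be sketched as one helper that is called whenever the module re-evaluates its state. The check name MYMODULE_STALE_ITEMS and the helper name are hypothetical. Passing an empty dict replaces the module’s whole map of checks, so this sketch assumes the module owns only this one check.

```python
def update_stale_check(module, stale_items):
    """Raise a warning while stale_items is non-empty, clear it otherwise."""
    if not stale_items:
        # An empty dict clears the health checks set by this module.
        module.set_health_checks({})
        return
    module.set_health_checks({
        "MYMODULE_STALE_ITEMS": {
            "severity": "warning",
            "summary": "{} item(s) are stale".format(len(stale_items)),
            "count": len(stale_items),
            "detail": ["{} is stale".format(item) for item in stale_items],
        },
    })
```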
What if the mons are down?
The manager daemon gets much of its state (such as the cluster maps) from the monitor. If the monitor cluster is inaccessible, whichever manager was active will continue to run, with the latest state it saw still in memory.
However, if you are creating a module that shows the cluster state to the user, then you may well not want to mislead them by showing them that out-of-date state.
To check if the manager daemon currently has a connection to the monitor cluster, use this function:
MgrModule.have_mon_connection()
Check whether this ceph-mgr daemon has an open connection to a monitor. If it doesn’t, then it’s likely that the information we have about the cluster is out of date, and/or the monitor cluster is down.
Reporting if your module cannot run
If your module cannot be run for any reason (such as a missing dependency), then you can report that by implementing the can_run function.
static MgrModule.can_run()
Implement this function to report whether the module’s dependencies are met. For example, if the module needs to import a particular dependency to work, then use a try/except around the import at file scope, and then report here if the import failed.
This will be called in a blocking way from the C++ code, so do not do any I/O that could block in this function.
Returns
- a 2-tuple consisting of a boolean and explanatory string
Note that this will only work properly if your module can always be imported: if you are importing a dependency that may be absent, then do it in a try/except block so that your module can be loaded far enough to use can_run even if the dependency is absent.
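The try/except pattern described above might look like this. The name hypothetical_dependency is a placeholder for whatever optional package a module needs, and the fallback stand-in for mgr_module exists only so the sketch runs outside ceph-mgr.

```python
try:
    import hypothetical_dependency  # placeholder for an optional package
except ImportError:
    hypothetical_dependency = None

try:
    from mgr_module import MgrModule  # provided by ceph-mgr at runtime
except ImportError:
    class MgrModule:  # stand-in so this sketch also runs outside ceph-mgr
        pass


class Module(MgrModule):
    @staticmethod
    def can_run():
        # No I/O here: only inspect the outcome of the import at file scope.
        if hypothetical_dependency is None:
            return False, "hypothetical_dependency is not installed"
        return True, ""
```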
Sending commands
A non-blocking facility is provided for sending monitor commands to the cluster.
MgrModule.send_command(*args, **kwargs)
Called by the plugin to send a command to the mon cluster.
Parameters
- result (CommandResult) – an instance of the CommandResult class, defined in the same module as MgrModule. This acts as a completion and stores the output of the command. Use CommandResult.wait() if you want to block on completion.
- svc_type (str) –
- svc_id (str) –
- command (str) – a JSON-serialized command. This uses the same format as the ceph command line, which is a dictionary of command arguments, with the extra prefix key containing the command name itself. Consult MonCommands.h for available commands and their expected arguments.
- tag (str) – used for nonblocking operation: when a command completes, the notify() callback on the MgrModule instance is triggered, with notify_type set to “command”, and notify_id set to the tag of the command.
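A blocking wrapper around send_command can be sketched as below. The helper name is hypothetical; the caller is expected to pass in a CommandResult instance, and the values "mon" for svc_type and empty strings for svc_id and tag are assumptions about how to address the monitor cluster when no notification is wanted.

```python
import json


def run_mon_command(module, result, prefix, **args):
    """Send one monitor command and block until it completes."""
    # The command is a dict of arguments plus the "prefix" key naming it.
    command = dict(args, prefix=prefix)
    module.send_command(result, "mon", "", json.dumps(command), "")
    # Returns whatever CommandResult.wait() returns for the finished command.
    return result.wait()
```

For non-blocking use, pass a non-empty tag instead and pick up the completion in notify, where notify_id carries the same tag.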
Receiving notifications
The manager daemon calls the notify function on all active modules when certain important pieces of cluster state are updated, such as the cluster maps.
The actual data is not passed into this function; rather, it is a cue for the module to go and read the relevant structure if it is interested. Most modules ignore most types of notification: to ignore a notification, simply return from this function without doing anything.
MgrModule.notify(notify_type, notify_id)
Called by the ceph-mgr service to notify the Python plugin that new state is available.
Parameters
- notify_type – string indicating what kind of notification, such as osd_map, mon_map, fs_map, mon_status, health, pg_summary, command, service_map
- notify_id – string (may be empty) that optionally specifies which entity is being notified about. With “command” notifications this is set to the tag from send_command.
Accessing RADOS or CephFS
If you want to use the librados python API to access data stored in the Ceph cluster, you can access the rados attribute of your MgrModule instance. This is an instance of rados.Rados which has been constructed for you using the existing Ceph context (an internal detail of the C++ Ceph code) of the mgr daemon.
Always use this specially constructed librados instance instead of constructing one by hand.
Similarly, if you are using libcephfs to access the file system, then use the libcephfs create_with_rados to construct it from the MgrModule.rados librados instance, and thereby inherit the correct context.
Remember that your module may be running while other parts of the cluster are down: do not assume that librados or libcephfs calls will return promptly – consider whether to use timeouts or to block if the rest of the cluster is not fully available.
Implementing standby mode
For some modules, it is useful to run on standby manager daemons as well as on the active daemon. For example, an HTTP server can usefully serve HTTP redirect responses from the standby managers so that users can point their browser at any of the manager daemons without having to worry about which one is active.
Standby manager daemons look for a subclass of StandbyModule in each module. If the class is not found then the module is not used at all on standby daemons. If the class is found, then its serve method is called. Implementations of StandbyModule must inherit from mgr_module.MgrStandbyModule.
The interface of MgrStandbyModule is much restricted compared to MgrModule – none of the Ceph cluster state is available to the module. serve and shutdown methods are used in the same way as a normal module class. The get_active_uri method enables the standby module to discover the address of its active peer in order to make redirects. See the MgrStandbyModule definition in the Ceph source code for the full list of methods.
For an example of how to use this interface, look at the source code of the dashboard module.
Communicating between modules
Modules can invoke member functions of other modules.
MgrModule.remote(module_name, method_name, *args, **kwargs)
Invoke a method on another module. All arguments, and the return value from the other module, must be serializable.
Limitation: Do not import any modules within the called method. Otherwise you will get an error in Python 2:
- RuntimeError('cannot unmarshal code objects in restricted execution mode',)
Parameters
- module_name – Name of other module. If module isn’t loaded, an ImportError exception is raised.
- method_name – Method name. If it does not exist, a NameError exception is raised.
- args – Argument tuple
- kwargs – Keyword argument dict
Raises
- RuntimeError – Any error raised within the method is converted to a RuntimeError
- ImportError – No such module
Be sure to handle ImportError to deal with the case that the desired module is not enabled.
If the remote method raises a Python exception, this will be converted to a RuntimeError on the calling side, where the message string describes the exception that was originally thrown. If your logic intends to handle certain errors cleanly, it is better to modify the remote method to return an error value instead of raising an exception.
At time of writing, inter-module calls are implemented without copies or serialization, so when you return a Python object, you’re returning a reference to that object to the calling module. It is recommended not to rely on this reference passing, as in future the implementation may change to serialize arguments and return values.
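The error handling described above can be collected in one wrapper. The helper name and the (ok, value) return shape are choices made for this sketch, not part of the interface.

```python
def call_remote(module, module_name, method_name, *args, **kwargs):
    """Call another module's method, returning (ok, value_or_message)."""
    try:
        return True, module.remote(module_name, method_name, *args, **kwargs)
    except ImportError:
        # The other module is not loaded or not enabled.
        return False, "module {} is not enabled".format(module_name)
    except NameError:
        # The other module has no method with this name.
        return False, "{} has no method {}".format(module_name, method_name)
    except RuntimeError as exc:
        # Any exception raised inside the remote method arrives as this type.
        return False, "{}.{} failed: {}".format(module_name, method_name, exc)
```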
Shutting down cleanly
If a module implements the serve() method, it should also implement the shutdown() method to shut down cleanly: misbehaving modules may otherwise prevent clean shutdown of ceph-mgr.
Limitations
It is not possible to call back into C++ code from a module’s __init__() method. For example, calling self.get_module_option() at this point will result in an assertion failure in ceph-mgr. For modules that implement the serve() method, it usually makes sense to do most initialization inside that method instead.
Is something missing?
The ceph-mgr Python interface is not set in stone. If you have a need that is not satisfied by the current interface, please bring it up on the ceph-devel mailing list. While it is desired to avoid bloating the interface, it is not generally very hard to expose existing data to the Python code when there is a good reason.