I finally got around to updating the code with all the changes that have been running on the main project it was originally written for. So just a quick update to say I’ve added some actual monitoring code rather than just examples. It can now monitor and restart (using the event handler functionality in Nagios):
* AFP
* DHCP
* Directory Services
* DNS
* FTP
* Jabber
* Mail
* MySQL
* NAT
* Netboot
* NFS
* Print
* Quicktime Streaming Server
* SMB/CIFS
* Software Update
* Web
services on an Apple server… it has to be the actual Server edition, though. Send comments, suggestions, general grief and mayhem… HERE
libsrvrmgrd-osx
18 comments
Monday, October 18, 2010 at 9:28
David
Hi Felim
Thanks for the work you’ve put in to this. I have a problem I was wondering if you can help with?
I have Nagios running on an OSX 10.6 server and I’ve installed your plugin to monitor the local host and a remote host. Everything will work fine for a time, then Nagios will show that all of the monitored OSX services have gone off line, and I get the following in the event log:
SERVICE NOTIFICATION: nagiosadmin;knowledgebase;OS X MySQL;WARNING;notify-service-by-email;(null)
The email notification doesn’t get sent, and all notifications stop after this happens. When I restart Nagios, the plugin then reconnects and I get a notification that the service has recovered. And everything is back to normal until the pattern repeats.
Any Ideas?
Regards
David
Monday, October 18, 2010 at 13:49
Félim
Hi David,
No problem, glad it’s of use! Shame it’s not working 100% for you though..
Any chance you can run it manually when that starts happening? If you pass it the correct parameters, it should dump out a load of errors. I generally run it on Linux servers, but it should be fine on OS X.
Monday, October 18, 2010 at 14:07
David
Hi Felim, thanks for your quick response.
When I check from the command line it takes a second or two and then I get the correct response:
./check_osx_server 192.168.204.4 mysql 311 username password
RUNNING
But looking in Nagios I have the following:
Current Status: WARNING (for 0d 0h 25m 16s)
Status Information: (null)
Performance Data:
Current Attempt: 4/4 (HARD state)
Last Check Time: 10-18-2010 13:04:42
Check Type: ACTIVE
Check Latency / Duration: 0.232 / 0.008 seconds
Next Scheduled Check: 10-18-2010 13:07:42
Last State Change: 10-18-2010 12:40:42
Last Notification: 10-18-2010 12:43:52 (notification 1)
Is This Service Flapping? NO (5.33% state change)
In Scheduled Downtime? NO
Last Update: 10-18-2010 13:05:52 ( 0d 0h 0m 6s ago)
And once this happens, all of the services using this plugin show the same.
Regards
David
Monday, October 18, 2010 at 14:09
David
The notifications log shows
knowledgebase OS X MySQL WARNING 10-18-2010 12:43:52 nagiosadmin notify-service-by-email (null)
and the event log shows
[10-18-2010 12:43:52] SERVICE ALERT: knowledgebase;OS X MySQL;WARNING;HARD;4;(null)
Regards
David
Monday, October 18, 2010 at 15:14
Félim
Hmm, OK, strange. The (null) essentially means Nagios gets no output from the plugin, i.e. the text saying RUNNING in this case. Can you try something:
sudo su
As your user, which will get you to root, then from there
su nagios
Or whatever user your Nagios system runs as, then try running it again. First of all, make sure to remove any of the temp files in /tmp/ that correspond to the server, then rerun the test. It may be that nagios can’t access a library somewhere or can’t overwrite a file.
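To illustrate the point about (null) appearing when no output comes back: the following is only a sketch, written for Python 3 rather than the Python 2.6 the plugin runs on. The wrapper name `run_check`, the 10-second default and the message texts are all assumptions made for illustration and are not part of check_osx_server. It runs a command with a timeout and always hands back one line of text together with a Nagios exit code (3 is the standard UNKNOWN code).

```python
import subprocess

UNKNOWN = 3  # standard Nagios plugin exit code for UNKNOWN


def run_check(argv, timeout=10.0):
    """Run a plugin command and always return (exit_code, one_line_message).

    Hypothetical helper: it guarantees that the caller gets some text back,
    even when the command hangs, cannot be started, or prints nothing.
    """
    if not argv:
        return UNKNOWN, "UNKNOWN: no command given"
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before this is raised.
        return UNKNOWN, "UNKNOWN: no answer within %.1f seconds" % timeout
    except OSError as exc:
        return UNKNOWN, "UNKNOWN: could not start plugin (%s)" % exc
    lines = proc.stdout.strip().splitlines()
    if not lines:
        return UNKNOWN, "UNKNOWN: plugin produced no output"
    return proc.returncode, lines[0]
```

Calling something like `run_check(["./check_osx_server", "mysql", "192.168.204.4", "311", "user", "password"])` as the nagios user would then show whether the plugin hangs, fails to start, or simply prints nothing.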
Tuesday, October 19, 2010 at 9:05
David
Hi Félim
This is getting interesting, when I run it as the nagios user as above whilst it is reporting a null state, it sits there and does nothing. I hit ctrl c after two minutes and got the following which I’m guessing is related to killing it?
^CTraceback (most recent call last):
File "./check_osx_server", line 270, in
serviceData = getServiceData('servermgr_%s' % checkService, 'getState', 'withDetails')
File "./check_osx_server", line 130, in getServiceData
dataFileLocation = srvrmgrdIO.buildDataFile(servermgrdModule, dataRequest, serverAddress, serverPort, serverUser, serverPassword)
File "/usr/local/nagios/libexec/srvrmgrdIO.py", line 166, in buildDataFile
createNewDataFile(ServerDataFile, servermgrdModule, request, server, port, webuser, webpass)
File "/usr/local/nagios/libexec/srvrmgrdIO.py", line 172, in createNewDataFile
DataPList = sendXML(servermgrdModule, request, server, port, webuser, webpass)
File "/usr/local/nagios/libexec/srvrmgrdIO.py", line 147, in sendXML
xmlresult = requestServerData(url, webuser, webpass)
File "/usr/local/nagios/libexec/srvrmgrdIO.py", line 131, in requestServerData
htmlFile = urllib2.urlopen(request)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 124, in urlopen
return _opener.open(url, data, timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 383, in open
response = self._open(req, data)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 401, in _open
'_open', req)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 361, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 1138, in https_open
return self.do_open(httplib.HTTPSConnection, req)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/urllib2.py", line 1102, in do_open
h.request(req.get_method(), req.get_selector(), req.data, headers)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/httplib.py", line 874, in request
self._send_request(method, url, body, headers)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/httplib.py", line 911, in _send_request
self.endheaders()
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/httplib.py", line 868, in endheaders
self._send_output()
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/httplib.py", line 740, in _send_output
self.send(msg)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/httplib.py", line 699, in send
self.connect()
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/httplib.py", line 1073, in connect
self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/ssl.py", line 350, in wrap_socket
suppress_ragged_eofs=suppress_ragged_eofs)
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/ssl.py", line 118, in __init__
self.do_handshake()
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/ssl.py", line 293, in do_handshake
self._sslobj.do_handshake()
KeyboardInterrupt
Running it as root whilst it is in this state works.
The permissions are set as follows:
-rwxr-xr-x@ 1 root nagios 13K Oct 16 10:48 check_osx_server*
which is the same as all of the other plugins apart from the @ on the end.
After issuing a killall nagios, and running the command as the nagios user again I get the following:
sh-3.2$ whoami
nagios
sh-3.2$ ./check_osx_server mysql 192.168.204.4 311 user password
RUNNING
Regards
David
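The traceback above shows the request sitting in the SSL handshake until it was interrupted, and `urlopen` is called there without a timeout. One possible mitigation, sketched here under assumptions, is to pass an explicit timeout so that a stalled connection fails instead of hanging forever. The sketch uses Python 3's `urllib.request`; on Python 2.6 `urllib2.urlopen` accepts the same `timeout` argument. The helper name `fetch_with_timeout` and the 10-second default are hypothetical, and this is not the plugin's actual code.

```python
import urllib.request


def fetch_with_timeout(url, timeout=10.0):
    """Return the response body, or None if the request fails or stalls.

    Hypothetical helper: the timeout applies to the connection attempt and
    to each blocking read, so a server that never answers cannot hang us.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except OSError:
        # In Python 3, URLError and socket.timeout are OSError subclasses.
        return None
```

A caller could treat a `None` result as an UNKNOWN state and print a message, so that Nagios always receives some output.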
Tuesday, October 19, 2010 at 9:28
David
Further info
Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29)
OSX Server 10.6.4
Tuesday, October 19, 2010 at 9:37
Félim
Can you give me one more thing, the data from:
ls -l /tmp/
That’ll show us who owns what temp files the plugin uses for the cache.
Tuesday, October 19, 2010 at 9:41
David
Hi Félim
The plugin is working at the moment; here’s the output:
srwxrwxrwx 1 lithium wheel 0B Oct 19 07:53 .s.PGSQL.51132=
-rw------- 1 lithium wheel 66B Oct 19 07:53 .s.PGSQL.51132.lock
-rw-rw-r-- 1 nagios wheel 1.0K Oct 19 08:38 127.0.0.1_311_servermgr_dirserv_getState.dat
-rw-rw-r-- 1 nagios wheel 694B Oct 19 08:37 127.0.0.1_311_servermgr_dns_getState.dat
-rw-r--r-- 1 root wheel 1.0K Oct 17 14:15 192.168.204.2_311_servermgr_dirserv_getState.dat
-rw-r--r-- 1 root wheel 694B Oct 18 13:04 192.168.204.2_311_servermgr_dns_getState.dat
-rw-r--r-- 1 root wheel 1.5K Oct 17 14:16 192.168.204.2_311_servermgr_web_getState.dat
-rw-r--r-- 1 root wheel 561B Oct 17 18:09 192.168.204.4_311_servermgr_afp_getState.dat
-rw-rw-r-- 1 nagios wheel 456B Oct 19 08:36 192.168.204.4_311_servermgr_mysql_getState.dat
-rw-rw-r-- 1 nagios wheel 1.1K Oct 17 14:33 192.168.204.4_311_servermgr_web_getState.dat
srwxrwxrwx 1 root wheel 0B Oct 15 11:06 ARD_ABJMMRT=
srwxr-xr-x 1 mcadmin wheel 0B Oct 15 11:06 icssuis501=
drwx------ 3 mcadmin wheel 102B Oct 15 11:06 launch-iFHnbX/
drwx------ 3 mcadmin wheel 102B Oct 15 11:06 launch-j1Y9ns/
drwx------ 3 mcadmin wheel 102B Oct 15 11:06 launch-pHEejo/
drwx------ 3 mcadmin wheel 102B Oct 15 11:06 launchd-438.OaY3p3/
drwx------ 3 davidwhite wheel 102B Oct 19 07:54 launchd-77639.Zatimp/
drwx------ 3 root wheel 102B Oct 19 07:55 launchd-77658.IeSuzB/
drwxr-xr-x 2 root wheel 68B Oct 15 11:05 lithium/
srwxr-xr-x 1 root wheel 0B Oct 15 11:05 lithium-client_handler_core=
srwxr-xr-x 1 root wheel 0B Oct 15 11:05 lithium-core=
-rw------- 1 root wheel 390B Oct 15 11:05 pydirzWKroV
I’ll check it again when it stops.
Tuesday, October 19, 2010 at 9:48
David
Just to add to the above, the files owned by root are from tests I’ve run as root. The files owned by nagios are the ones that are actually being checked by nagios
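Since the listing mixes cache files owned by root and by nagios, a quick way to spot files the nagios user may not be able to overwrite is to list every cache file whose owner differs from the user Nagios runs as. This is only a sketch in Python 3: the function name `foreign_cache_files` is hypothetical, and the glob pattern is an assumption inferred from the file names in the listing above.

```python
import glob
import os
import pwd


def foreign_cache_files(directory, user, pattern="*_servermgr_*_getState.dat"):
    """Return (path, owner) for cache files in `directory` not owned by `user`.

    The default pattern is an assumption based on the names seen in /tmp.
    """
    findings = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        uid = os.stat(path).st_uid
        try:
            owner = pwd.getpwuid(uid).pw_name
        except KeyError:
            # No passwd entry for this uid; fall back to the number.
            owner = str(uid)
        if owner != user:
            findings.append((path, owner))
    return findings
```

For example, `foreign_cache_files("/tmp", "nagios")` would return the root-owned files from the listing, which are the candidates to delete before re-running the check.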
Tuesday, October 19, 2010 at 9:51
David
Hi Again, I’ve just discovered that one of my colleagues installed Lithium on this box and it was running. I’ve killed it now. I don’t know if that would cause a problem?
Tuesday, October 19, 2010 at 14:25
Félim
Hey David,
Sorry for the delay, I had to grab fuel and supplies (living in France at the moment!) to make sure we are covered. OK, Lithium shouldn’t really affect it… I’m almost positive. It’s odd that root can run it fine and nagios can’t. Is there no access control on the box, like SELinux in the Linux world? I imagine you’ve got other plugins running, though. It seems to be having trouble getting the temporary file. If you delete all files starting with:
192.168.204.2_311_servermgr
That should rule out the cache files as the cause. Meanwhile I’ll get my hands on a test server if I can, in case 10.6 is doing something strange. If that doesn’t work, I’ve got a way to force it into memory mode; assuming the mail address you gave when you posted is correct, I can mail it to you.
Wednesday, October 20, 2010 at 8:53
David
Hi Félim
Sorry I didn’t respond yesterday, it was one of those days! So please don’t worry about the delay! I’m somewhat jealous as I miss living in France!
There’s no access control on the two hosts I’m testing this with, and yes, all the other plugins are fine. When it stopped working, I deleted the files and rescheduled the checks without restarting Nagios, and it didn’t come back to life.
I ran it via the command line as the nagios user, and it exhibits the same behaviour, i.e. it sits there and hangs, and does not recreate the temp file.
Now it gets rather strange:
I ran a check for a different service as the nagios user:
sh-3.2$ ./check_osx_server afp 192.168.204.4 311 user pass
currentConnections:0
sh-3.2$ whoami
nagios
and this works without restarting nagios, and creates the temp file:
mediastation:~ davidwhite$ ls /tmp/
192.168.204.4_311_servermgr_afp_getState.dat
I then ran the original command as the nagios user:
sh-3.2$ ./check_osx_server mysql 192.168.204.4 311 user pass
RUNNING
mediastation:~ davidwhite$ ls /tmp/
192.168.204.4_311_servermgr_afp_getState.dat
192.168.204.4_311_servermgr_mysql_getState.dat
Which, as you can see, brought it back to life from the command line; however, Nagios is still reporting (null):
Current Status: WARNING (for 0d 16h 36m 26s)
Status Information: (null)
Performance Data:
Current Attempt: 4/4 (HARD state)
Last Check Time: 10-20-2010 07:47:34
Check Type: ACTIVE
Check Latency / Duration: 0.089 / 0.008 seconds
Next Scheduled Check: 10-20-2010 07:50:34
Last State Change: 10-19-2010 15:11:58
Last Notification: 10-20-2010 07:17:38 (notification 17)
Is This Service Flapping? NO (0.00% state change)
In Scheduled Downtime? NO
Last Update: 10-20-2010 07:48:18 ( 0d 0h 0m 6s ago)
Even though the Nagios software is having problems, it’s working as the nagios user from the command line.
I then ran the commands for the other services I’m checking, and they work fine from the command line as the nagios user, but the Nagios core still thinks it has problems and is reporting (null).
I’m going to restart Nagios and see how it behaves today.
Yes the email address you have is correct.
Thanks again for your help
Regards
David
Thursday, October 21, 2010 at 9:00
David
Hi Félim
I’ve been monitoring top -U nagios, and when the plugin is scheduled to be checked by nagios, once it has stopped working it doesn’t get launched by nagios. All of the other plugin checks show up as being launched, but the check_osx_server plugin doesn’t start.
Thursday, October 21, 2010 at 9:12
David
Hi Félim
Another update, when I try and run the plugin from the command line as the Nagios user, it’s working fine now. But still not working from within Nagios.
Regards
David
Thursday, October 21, 2010 at 10:54
Félim
Hi David,
Got caught up in some work stuff yesterday. Interesting that it only fails to run from within Nagios. Any chance you can paste your plugin config for it here? I’ll get you some debug code as well to help display more output.
Thursday, October 21, 2010 at 13:15
Félim
Eh, just had a quick thought: the password you are using for the account doesn’t contain any characters Nagios deems to be illegal or requiring escaping, does it?
http://nagios.sourceforge.net/docs/2_0/configmain.html#illegal_object_name_chars
Just a thought… debug stuff is nearly ready.
Wednesday, January 12, 2011 at 23:53
Mark
I found the issue. I had a similar problem with one of my custom checks: it kept returning “status: (null)”. This thread was the most helpful in guiding my troubleshooting. I put in debugging info and detailed logs and, sure enough, the check wasn’t being executed.
It was an illegal character on the check_command line in the service definition.
Wrong:
ldap_check_curl!8389|mon
Right:
ldap_check_curl!8389!mon
In case you don’t see it, I put in a pipe instead of a bang for $ARG2$. I am so getting glasses tomorrow.
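A mistake like this can be caught mechanically before Nagios ever runs the check. The sketch below, in Python 3, splits a check_command value on the `!` separators and reports any argument containing a questionable character. The function name `suspect_arguments` and the character set are assumptions made for illustration; adjust the set to whatever your own Nagios configuration treats as illegal.

```python
# Assumption: a conservative set of shell-sensitive characters. This is not
# read from nagios.cfg; edit it to match your own configuration.
SUSPECT_CHARS = set("|`~$&<>'\"")


def suspect_arguments(check_command):
    """Return (argument_number, argument, bad_chars) for questionable $ARGn$ values.

    The text before the first '!' is the command name and is not checked.
    """
    parts = check_command.split("!")
    findings = []
    for number, argument in enumerate(parts[1:], start=1):
        bad = sorted(set(argument) & SUSPECT_CHARS)
        if bad:
            findings.append((number, argument, bad))
    return findings
```

With the two lines above, `suspect_arguments("ldap_check_curl!8389|mon")` flags the first argument because of the pipe, while `suspect_arguments("ldap_check_curl!8389!mon")` returns an empty list.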