Cluster Health Monitor Database size, or who is filling my clusterware volume?

Most installations of Oracle Clusterware 11g include a “Berkeley DB” database that stores system performance data for the Cluster Health Monitor (CHM). This database is also present in the initial release of 12c (12.1.0.1); from 12c patch set 1 (12.1.0.2) onwards the “Berkeley DB” is replaced by an Oracle database called the “Grid Infrastructure Management Repository”.

This database is limited to 1 GB in size, but bugs or configuration changes (such as changing the default data retention) can make it grow past this limit.

In the case described here, a bug caused the database to grow to more than 80 GB, and we needed to shrink it back to a normal size. To do this we followed the method explained in MOS note 1343105.1. The procedure basically consists of stopping the Berkeley DB database, deleting all its datafiles and starting it again. It is also good practice to review the data retention time and correct it if it is not the expected value. Note that this procedure removes all historical performance data.

Let’s go. First we stop the Berkeley DB on all nodes (each node keeps its own data). The database is managed by a Clusterware resource called “CRF” (ora.crf), and we use this resource to start and stop it.


oracle@myserver1:~/scripts> crsctl status res ora.crf -init
NAME=ora.crf
TYPE=ora.crf.type
TARGET=ONLINE
STATE=ONLINE on myserver1

oracle@myserver1:~/scripts> crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'myserver1'
CRS-2677: Stop of 'ora.crf' on 'myserver1' succeeded

oracle@myserver2:~/scripts> crsctl status res ora.crf -init
NAME=ora.crf
TYPE=ora.crf.type
TARGET=ONLINE
STATE=ONLINE on myserver2

oracle@myserver2:~/scripts> crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'myserver2'
CRS-2677: Stop of 'ora.crf' on 'myserver2' succeeded
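
Before touching any files it is worth confirming that ora.crf really is down on each node. The following is only a minimal sketch, assuming the status output keeps the NAME=/STATE= layout shown above; the function names crf_state and crf_is_offline are made up for this example and are not part of Clusterware.

```shell
# Print the state (ONLINE, OFFLINE, ...) found in the STATE= line of
# "crsctl status res ora.crf -init" output read from standard input.
crf_state() {
    awk -F= '$1 == "STATE" { split($2, words, " "); print words[1] }'
}

# Hypothetical guard: succeed only when the resource reports OFFLINE.
crf_is_offline() {
    [ "$(crf_state)" = "OFFLINE" ]
}

# Possible usage on each node:
#   crsctl status res ora.crf -init | crf_is_offline && echo "safe to continue"
```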

Next we go to the Berkeley DB location and remove all the datafiles. The “rm” commands must be run as root, because the files are owned by root.


oracle@myserver1:~/scripts> cd $ORACLE_HOME
oracle@myserver1:/oracle/app/grid> cd crf
oracle@myserver1:/oracle/app/grid/crf> ls
admin db
oracle@myserver1:/oracle/app/grid/crf> cd db
oracle@myserver1:/oracle/app/grid/crf/db> cd myserver1
oracle@myserver1:/oracle/app/grid/crf/db/myserver1> ls -ltr
total 79231088
-rw-r--r-- 1 root root 1132569 feb 20 2015 20-FEB-2015-13:25:05.txt
-rw-r--r-- 1 root root 51 feb 20 2015 20-FEB-2015-17:10:44.txt
-rw-r--r-- 1 root root 562292 feb 20 2015 20-FEB-2015-17:12:37.txt
-rw-r--r-- 1 root root 747177 feb 20 2015 20-FEB-2015-17:13:05.txt
-rw-r--r-- 1 root root 857277 feb 20 2015 20-FEB-2015-17:17:33.txt
-rw-r--r-- 1 root root 76941 feb 20 2015 20-FEB-2015-17:25:46.txt
-rw-r--r-- 1 root root 64 feb 21 2015 20-FEB-2015-18:30:43.txt
-rw-r--r-- 1 root root 567523 feb 21 2015 20-FEB-2015-18:32:56.txt
-rw-r--r-- 1 root root 1287775 feb 21 2015 20-FEB-2015-18:44:36.txt
-rw-r--r-- 1 root root 1210847 feb 21 2015 20-FEB-2015-18:47:17.txt
-rw-r--r-- 1 root root 1285087 feb 21 2015 20-FEB-2015-19:03:48.txt
-rw-r--r-- 1 root root 1170879 feb 21 2015 20-FEB-2015-19:11:20.txt
-rw-r--r-- 1 root root 2262397 feb 21 2015 20-FEB-2015-19:20:18.txt
-rw-r--r-- 1 root root 39956 feb 21 2015 20-FEB-2015-19:34:20.txt
-rw-r--r-- 1 root root 1130824 feb 21 2015 20-FEB-2015-19:39:32.txt
-rw-r--r-- 1 root root 1336976 mar 2 2015 02-MAR-2015-09:30:16.txt
-rw-r--r-- 1 root root 1150202 mar 2 2015 02-MAR-2015-09:36:56.txt
-rw-r--r-- 1 root root 1411749 mar 2 2015 02-MAR-2015-09:55:38.txt
-rw-r--r-- 1 root root 1150052 mar 2 2015 02-MAR-2015-10:02:06.txt
-rw-r--r-- 1 root root 1600999 mar 18 2015 18-MAR-2015-18:01:52.txt
-rw-r--r-- 1 root root 1149208 mar 18 2015 18-MAR-2015-18:08:29.txt
-rw-r--r-- 1 root root 2017628 abr 15 2015 15-ABR-2015-11:32:46.txt
-rw-r--r-- 1 root root 1171267 abr 15 2015 15-ABR-2015-11:39:42.txt
-rw-r--r-- 1 root root 1973011 jun 12 2015 12-JUN-2015-06:22:26.txt
-rw-r--r-- 1 root root 1069454 jun 12 2015 12-JUN-2015-06:26:33.txt
-rw-r--r-- 1 root root 596970 sep 18 2015 18-SEP-2015-07:22:42.txt
-rw-r--r-- 1 root root 132573 sep 18 2015 18-SEP-2015-12:24:43.txt
-rw-r--r-- 1 root root 1338352 sep 18 2015 18-SEP-2015-12:48:56.txt
-rw-r--r-- 1 root root 1806269 sep 18 2015 18-SEP-2015-12:56:02.txt
-rw-r--r-- 1 root root 58697 sep 18 2015 18-SEP-2015-13:26:12.txt
-rw-r--r-- 1 root root 568858 sep 18 2015 18-SEP-2015-13:28:22.txt
-rw-r--r-- 1 root root 2334114 nov 13 2015 13-NOV-2015-12:14:10.txt
-rw-r--r-- 1 root root 58662 nov 13 2015 13-NOV-2015-12:27:53.txt
-rw-r--r-- 1 root root 62471 nov 13 2015 13-NOV-2015-12:53:41.txt
-rw-r--r-- 1 root root 62489 nov 13 2015 13-NOV-2015-13:15:29.txt
-rw-r--r-- 1 root root 1187101 nov 13 2015 13-NOV-2015-13:23:47.txt
-rw-r--r-- 1 root root 2423088 feb 16 2016 16-FEB-2016-02:44:47.txt
-rw-r--r-- 1 root root 2037166 feb 16 2016 16-FEB-2016-02:46:21.txt
-rw-r--r-- 1 root root 1370969 abr 22 2016 22-ABR-2016-09:22:18.txt
-rw-r--r-- 1 root root 2429879 abr 22 2016 22-ABR-2016-09:32:33.txt
-rw-r--r-- 1 root root 2355664 abr 22 2016 22-ABR-2016-09:35:13.txt
-rw-r----- 1 root root 8192 abr 26 2016 repdhosts.bdb
-rw-r--r-- 1 root root 2684653 abr 26 2016 26-ABR-2016-04:21:24.txt
-rw-r--r-- 1 root root 1542974 abr 26 2016 26-ABR-2016-04:26:25.txt
-rw-r--r-- 1 root root 2868056 may 25 2016 25-MAY-2016-12:19:06.txt
-rw-r----- 1 root root 24576 may 25 2016 __db.001
-rw-r--r-- 1 root root 120000000 may 25 2016 myserver1.ldb
-rw-r--r-- 1 root root 57779 may 25 2016 25-MAY-2016-12:48:07.txt
-rw-r--r-- 1 root root 1368787 may 25 2016 25-MAY-2016-12:52:27.txt
-rw-r----- 1 root root 8192 may 25 2016 crfconn.bdb
-rw-r----- 1 root root 16777216 dic 20 08:31 log.0000028130
-rw-r----- 1 root root 1002070016 dic 20 08:34 crfts.bdb
-rw-r----- 1 root root 16777216 dic 20 08:34 log.0000028131
-rw-r----- 1 root root 75416928256 dic 20 08:34 crfclust.bdb
-rw-r----- 1 root root 57344 dic 20 08:34 __db.006
-rw-r----- 1 root root 1187840 dic 20 08:35 __db.005
-rw-r----- 1 root root 401408 dic 20 08:35 __db.002
-rw-r----- 1 root root 1343184896 dic 20 08:35 crfloclts.bdb
-rw-r----- 1 root root 1088425984 dic 20 08:35 crfhosts.bdb
-rw-r----- 1 root root 1059303424 dic 20 08:35 crfcpu.bdb
-rw-r----- 1 root root 1066074112 dic 20 08:35 crfalert.bdb
-rw-r----- 1 root root 2162688 dic 20 08:35 __db.004
-rw-r----- 1 root root 2629632 dic 20 08:35 __db.003
oracle@myserver1:/oracle/app/grid/crf/db/myserver1> su
Password:
myserver1:/oracle/app/grid/crf/db/myserver1 # rm *.bdb
myserver1:/oracle/app/grid/crf/db/myserver1 # rm *.ldb
myserver1:/oracle/app/grid/crf/db/myserver1 # rm *.txt
myserver1:/oracle/app/grid/crf/db/myserver1 # exit
exit
oracle@myserver1:/oracle/app/grid/crf/db/myserver1> ls -la
total 22600
drwxr-x--- 2 root oinstall 4096 dic 20 08:39 .
drwxr-x--- 3 root oinstall 4096 feb 20 2015 ..
-rw-r----- 1 root root 24576 may 25 2016 __db.001
-rw-r----- 1 root root 401408 dic 20 08:35 __db.002
-rw-r----- 1 root root 2629632 dic 20 08:35 __db.003
-rw-r----- 1 root root 2162688 dic 20 08:35 __db.004
-rw-r----- 1 root root 1187840 dic 20 08:35 __db.005
-rw-r----- 1 root root 57344 dic 20 08:34 __db.006
-rw-r----- 1 root root 16777216 dic 20 08:31 log.0000028130
-rw-r----- 1 root root 16777216 dic 20 08:34 log.0000028131
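
Since the “rm” commands are run as root with wildcards, it can be reassuring to list what they would match first. The sketch below only lists files and deletes nothing; chm_files_to_remove is a hypothetical helper name, and it assumes a “find” that supports -maxdepth (GNU and BSD find both do).

```shell
# List the files that "rm *.bdb", "rm *.ldb" and "rm *.txt" would match
# in the directory given as $1. Nothing is deleted.
# Run it as root if the directory is not readable by your user.
chm_files_to_remove() {
    find "$1" -maxdepth 1 -type f \
        \( -name '*.bdb' -o -name '*.ldb' -o -name '*.txt' \) | sort
}

# Possible usage:
#   chm_files_to_remove /oracle/app/grid/crf/db/myserver1
```

The __db.* and log.* files are not matched, which agrees with the listing taken after the deletion above.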

After that, we start the “CRF” resource again (on every node where it was stopped) and all the datafiles are recreated at a small size.


oracle@myserver1:/oracle/app/grid/crf/db/myserver1> crsctl start res ora.crf -init
CRS-2672: Attempting to start 'ora.crf' on 'myserver1'
CRS-2676: Start of 'ora.crf' on 'myserver1' succeeded
oracle@myserver1:/oracle/app/grid/crf/db/myserver1> ls -ltr
total 33056
-rw-r----- 1 root root 16777216 dic 20 08:31 log.0000028130
-rw-r----- 1 root root 16777216 dic 20 08:40 log.0000028131
-rw-r----- 1 root root 57344 dic 20 08:40 __db.006
-rw-r----- 1 root root 1187840 dic 20 08:40 __db.005
-rw-r----- 1 root root 2162688 dic 20 08:40 __db.004
-rw-r----- 1 root root 401408 dic 20 08:40 __db.002
-rw-r----- 1 root root 24576 dic 20 08:40 __db.001
-rw-r----- 1 root root 8192 dic 20 08:40 crfts.bdb
-rw-r----- 1 root root 8192 dic 20 08:40 crfloclts.bdb
-rw-r----- 1 root root 8192 dic 20 08:40 crfhosts.bdb
-rw-r----- 1 root root 8192 dic 20 08:40 crfcpu.bdb
-rw-r----- 1 root root 8192 dic 20 08:40 crfconn.bdb
-rw-r----- 1 root root 131072 dic 20 08:40 crfclust.bdb
-rw-r----- 1 root root 8192 dic 20 08:40 crfalert.bdb
-rw-r--r-- 1 root root 120000000 dic 20 08:40 myserver1.ldb
-rw-r----- 1 root root 2629632 dic 20 08:40 __db.003
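
To notice early if the files start growing again, the size of the *.bdb files could be checked from time to time. This is just a sketch under my own assumptions: the helper names are hypothetical, and the 1 GB threshold in the usage comment simply reuses the limit mentioned at the start of this post.

```shell
# Print the combined size in bytes of the *.bdb files in directory $1.
chm_bdb_bytes() {
    total=0
    for f in "$1"/*.bdb; do
        [ -f "$f" ] || continue      # no match: the glob stays literal
        size=$(wc -c < "$f")
        total=$((total + size))
    done
    echo "$total"
}

# Succeed when that total is greater than the limit in bytes given as $2.
chm_bdb_over_limit() {
    [ "$(chm_bdb_bytes "$1")" -gt "$2" ]
}

# Possible usage (as root, since most files are readable only by root):
#   chm_bdb_over_limit /oracle/app/grid/crf/db/myserver1 1073741824 \
#       && echo "CHM repository is above 1 GB"
```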

To check the “retention period” of the data, run this command (the value is reported in seconds):


oracle@myserver1:/oracle/app/grid> oclumon manage -get repsize

CHM Repository Size = 61511

Done
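
A raw number of seconds is not very readable, so here is a small sketch that pulls the number out of the output above and prints it as days, hours, minutes and seconds. It assumes the output keeps the “CHM Repository Size = N” layout shown here; both function names are invented for this example.

```shell
# Extract the number from a "CHM Repository Size = N" line on standard input.
chm_repsize_seconds() {
    awk -F= '/CHM Repository Size/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Format a number of seconds as "Nd HHh MMm SSs".
format_seconds() {
    s=$1
    printf '%dd %02dh %02dm %02ds\n' \
        $((s / 86400)) $((s % 86400 / 3600)) $((s % 3600 / 60)) $((s % 60))
}

# Possible usage:
#   format_seconds "$(oclumon manage -get repsize | chm_repsize_seconds)"
```

For the value shown above, 61511 seconds, this prints 0d 17h 05m 11s, so a little over 17 hours of data are retained.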

The combined size of the databases (across all nodes) can be estimated with this formula:

Number of nodes * 720 MB * RETENTION_DAYS
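
As a worked example of the formula, the sketch below takes the retention in seconds (the unit oclumon uses) and converts it to days inside the calculation. The function name chm_estimated_mb is hypothetical, and the result is rounded down because the shell only does integer arithmetic.

```shell
# Estimated combined repository size in MB for $1 nodes and $2 seconds of
# retention, using the 720 MB per node per day figure from the formula above.
chm_estimated_mb() {
    echo $(( $1 * 720 * $2 / 86400 ))
}

chm_estimated_mb 2 259200   # two nodes, 3 days of retention
chm_estimated_mb 2 61511    # two nodes, the retention reported in this post
```

This prints 4320 and 1025, that is, about 4.3 GB for a two-node cluster at three days of retention and about 1 GB at the 61511 seconds seen here.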

To change the retention, use this command:


oclumon manage -repos resize <new_retention_in_seconds>

The maximum retention allowed is 259200 seconds (3 days).
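
A wrapper script could check the value before passing it to oclumon. This is only a sketch: chm_valid_retention is a made-up name, the upper bound is the 259200 seconds mentioned above, and the lower bound of 3600 seconds (one hour) is my assumption based on the 11.2 documentation, so verify it for your version.

```shell
# Succeed when $1 is a whole number of seconds within the accepted range.
# 259200 is the maximum stated above; 3600 is an assumed minimum.
chm_valid_retention() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;     # reject empty or non-numeric input
    esac
    [ "$1" -ge 3600 ] && [ "$1" -le 259200 ]
}

# Possible usage:
#   chm_valid_retention 86400 && oclumon manage -repos resize 86400
```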

Finally, for those of you who want to view the stored data, you can run this command:


oclumon dumpnodeview -n <node name> -s <start time> -e <end time> -v

Or, to view data from the local node in real time:


oclumon dumpnodeview
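
For the time range form, the start and end times are given as quoted timestamps. As far as I know the expected format is "YYYY-MM-DD HH24:MI:SS", but treat that as an assumption and check the oclumon help for your version. A small sketch to generate such timestamps, using a hypothetical helper and the -d option of GNU date:

```shell
# Print the timestamp for "$1 minutes ago" as YYYY-MM-DD HH:MM:SS.
# Relies on the -d option of GNU date (standard on Linux).
chm_time_minutes_ago() {
    date -d "$1 minutes ago" '+%Y-%m-%d %H:%M:%S'
}

# Possible usage, dumping the last 30 minutes for one node:
#   oclumon dumpnodeview -n myserver1 \
#       -s "$(chm_time_minutes_ago 30)" -e "$(chm_time_minutes_ago 0)" -v
```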
