Doc updates for Release 1.6 #2289

Merged: 11 commits, Jun 18, 2024
4 changes: 4 additions & 0 deletions docs/source/Upgrade/index.rst
@@ -34,5 +34,9 @@ Once the cloning process is done, follow the steps listed below to invoke the up

- In Omnia v1.5.1, OpenLDAP client configuration was supported. If you had configured the OpenLDAP client against an external enterprise LDAP server in Omnia v1.5.1, that configuration is not restored during the upgrade. In Omnia v1.6, Omnia installs an OpenLDAP server, and the user needs to reconfigure it to integrate with the external LDAP server.
- The slurm setup in the Omnia v1.5.1 cluster is upgraded to configless slurm in v1.6.
- While the Omnia upgrade process attempts an automatic backup of the telemetry database, it is recommended to manually create a backup before initiating the upgrade as an added precaution. After the upgrade, the telemetry database must be restored manually by the user.

- Omnia recommends stopping the telemetry services in Omnia v1.5.1 before proceeding with the upgrade flow: set ``idrac_telemetry_support`` and ``omnia_telemetry_support`` to ``false`` in ``input/telemetry_config.yml``, and then execute the ``telemetry.yml`` playbook.
- For a successful restoration of the telemetry database in Omnia v1.6, ensure that ``input/telemetry_config.yml`` has ``idrac_telemetry_support`` set to ``false`` and ``omnia_telemetry_support`` set to ``true`` after executing ``prepare_config.yml``.

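The pre-upgrade telemetry shutdown recommended above amounts to a two-key config change followed by one playbook run; a sketch of the relevant ``input/telemetry_config.yml`` fragment (only the keys named in this guide are shown):

```yaml
# input/telemetry_config.yml (Omnia v1.5.1) -- relevant keys only
idrac_telemetry_support: false
omnia_telemetry_support: false
```

Run ``ansible-playbook telemetry.yml`` from the ``telemetry`` directory afterwards, as described above.
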
.. caution:: The NFS share directory mentioned in ``omnia_usrhome_share``, provided in v1.5.1 ``omnia_config.yml``, is unmounted from the cluster and deleted from the head node, along with all the user data while executing the ``prepare_upgrade.yml`` playbook. Hence, it is recommended that you take a backup of the Omnia NFS share before executing the ``prepare_upgrade.yml`` playbook.
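
Because ``prepare_upgrade.yml`` deletes the share and its user data outright, a simple archive taken beforehand is cheap insurance. A minimal sketch, assuming the demo path below stands in for your actual ``omnia_usrhome_share`` value:

```shell
# Back up the Omnia NFS share before prepare_upgrade.yml deletes it.
# SHARE_DIR defaults to a demo directory so this sketch runs anywhere;
# substitute the omnia_usrhome_share path from your v1.5.1 omnia_config.yml.
SHARE_DIR="${OMNIA_USRHOME_SHARE:-/tmp/omnia_usrhome_demo}"
mkdir -p "$SHARE_DIR"
BACKUP_FILE="/tmp/omnia_usrhome_backup_$(date +%Y%m%d).tar.gz"
# -p preserves permissions, which matters for per-user home directories
tar -czpf "$BACKUP_FILE" -C "$(dirname "$SHARE_DIR")" "$(basename "$SHARE_DIR")"
# verify the archive is readable before trusting it as a backup
tar -tzf "$BACKUP_FILE" > /dev/null && echo "backup OK: $BACKUP_FILE"
```

Restore after the upgrade with ``tar -xzpf`` into the new share location.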
6 changes: 5 additions & 1 deletion docs/source/Upgrade/prepare_config.rst
@@ -7,6 +7,9 @@ This is the first step of upgrade process and uses the ``prepare_config.yml`` pl
* Generates the inventory for Omnia v1.6 from the v1.5.1 inventory.
* Sets up the Omnia v1.6 execution environment by updating the Ansible and Python versions to those compatible with v1.6.
* Creates backup of the Omnia v1.5.1 database.
* Creates a backup of the Omnia v1.5.1 telemetry database if the ``timescaledb`` pod is in the ``running`` state.

.. note:: Post upgrade, restoring the Omnia telemetry database in Omnia v1.6 is completely manual and user-driven.
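
Since the telemetry backup is conditional on the pod state, a quick pre-check can tell you up front whether a backup will be taken. A sketch using the pod and namespace names from this guide; it only reports and changes nothing:

```shell
# Check whether the timescaledb pod is Running, which decides whether
# prepare_config.yml will back up the telemetry database.
if command -v kubectl >/dev/null 2>&1; then
  PHASE=$(kubectl get pod timescaledb-0 -n telemetry-and-visualizations \
          -o jsonpath='{.status.phase}' 2>/dev/null) || PHASE=""
else
  PHASE=""   # kubectl unavailable, e.g. not on the Omnia control plane
fi
if [ "$PHASE" = "Running" ]; then
  MSG="timescaledb is Running: telemetry backup will be attempted"
else
  MSG="timescaledb not Running (phase: ${PHASE:-unknown}): telemetry backup will be skipped"
fi
echo "$MSG"
```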

Before executing ``prepare_config.yml``, the user needs to update ``upgrade_config.yml`` with the following details:

@@ -30,7 +33,8 @@ Expected output of this playbook execution:

* Auto-populated Omnia v1.6 configuration files in the ``<omnia_1.6_location>/omnia/input`` folder.
* Auto-generated inventory file in Omnia v1.6 format. This is available in the ``<omnia_1.6_location>/omnia/upgrade`` folder and will be used later during the execution of `upgrade.yml <upgrade.html>`_.
* Backup of the Omnia v1.5.1 database is created at the ``backup_location`` specified in the ``upgrade_config.yml``. The backup file is named as ``backup.sql``.
* Backup of the Omnia v1.5.1 telemetry database is created at the ``backup_location`` specified in the ``upgrade_config.yml``. The backup file is named as ``telemetry_tsdb_dump.sql``.
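
Before moving on, it is worth confirming that both dump files actually landed in ``backup_location``. A sketch; the demo directory below is an assumption, substitute the ``backup_location`` value from ``upgrade_config.yml``:

```shell
# Existence check for the two backup artifacts written by prepare_config.yml.
# BACKUP_LOCATION defaults to a demo directory so the sketch runs anywhere.
BACKUP_LOCATION="${BACKUP_LOCATION:-/tmp/omnia_backup_demo}"
mkdir -p "$BACKUP_LOCATION"
for f in backup.sql telemetry_tsdb_dump.sql; do
  if [ -f "$BACKUP_LOCATION/$f" ]; then
    echo "found: $BACKUP_LOCATION/$f"
  else
    echo "missing: $BACKUP_LOCATION/$f"
  fi
done
```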

**Review or Update the auto-generated config files**

65 changes: 65 additions & 0 deletions docs/source/Upgrade/restore_telemetryDB.rst
@@ -0,0 +1,65 @@
Restoring Telemetry database post Omnia upgrade
================================================

After upgrading Omnia, if you want to retain the telemetry data from Omnia v1.5.1, you need to manually restore the telemetry database from the ``telemetry_tsdb_dump.sql`` file. Perform the following steps to do so:

1. Copy the backed-up telemetry database dump file, ``telemetry_tsdb_dump.sql``, from the ``backup_location`` to ``/opt/omnia/telemetry/iDRAC-Referencing-Tools``.

2. Stop the Omnia telemetry services on all the cluster nodes. After setting the ``idrac_telemetry_support``, ``omnia_telemetry_support``, and ``visualization_support`` parameters to ``false`` in ``input/telemetry_config.yml``, run the ``telemetry.yml`` playbook: ::

cd telemetry
ansible-playbook telemetry.yml -i ../upgrade/inventory

3. Connect to the ``timescaledb`` pod and execute the psql commands. Perform the following steps:

* Execute the following command: ::

kubectl exec -it timescaledb-0 -n telemetry-and-visualizations -- /bin/bash

* Verify that the dump file is present using the ``ls`` command.

* Connect to the psql client using the following command: ::

psql -U <timescaledb_user>

where ``timescaledb_user`` is the ``timescaledb`` username configured for telemetry.

* Drop the current database using the command below: ::

DROP DATABASE telemetry_metrics;

.. note:: If any processes prevent you from dropping the database, terminate those processes and try again.

* Create an empty telemetry database for Omnia v1.6 using the command below: ::

CREATE DATABASE telemetry_metrics;

* Exit from the psql client using the ``\q`` command.

* Execute the following command to initiate the database restore operation: ::

psql --dbname=telemetry_metrics --host=<pod_external_ip> --port=5432 --username=<timescaledb_user> -v ON_ERROR_STOP=1 --echo-errors -c "SELECT public.timescaledb_pre_restore();" -f telemetry_tsdb_dump.sql -c "SELECT public.timescaledb_post_restore();"

.. note:: Execute the following command to obtain the ``pod_external_ip`` and ``port`` for the ``timescaledb`` pod: ::

        kubectl get svc -A

* Drop the ``ts_insert_blocker`` trigger, if it exists, using the following commands: ::

psql -U <timescaledb_user>
\c telemetry_metrics
DROP TRIGGER IF EXISTS ts_insert_blocker ON public.timeseries_metrics;
DROP TRIGGER IF EXISTS ts_insert_blocker ON omnia_telemetry.metrics;

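The ``pod_external_ip`` and ``port`` used in the restore command above can also be pulled non-interactively rather than read off the ``kubectl get svc -A`` output. A sketch; the service name ``timescaledb`` and the jsonpath fields are assumptions, so verify them against your ``kubectl get svc -n telemetry-and-visualizations`` output:

```shell
# Extract the external IP and port of the assumed timescaledb service.
if command -v kubectl >/dev/null 2>&1; then
  POD_EXTERNAL_IP=$(kubectl get svc timescaledb -n telemetry-and-visualizations \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null) || POD_EXTERNAL_IP=""
  TSDB_PORT=$(kubectl get svc timescaledb -n telemetry-and-visualizations \
      -o jsonpath='{.spec.ports[0].port}' 2>/dev/null) || TSDB_PORT=""
fi
REPORT="ip=${POD_EXTERNAL_IP:-unknown} port=${TSDB_PORT:-unknown}"
echo "$REPORT"
```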

Next steps
============

1. Connect to the ``telemetry_metrics`` database and verify that the restored telemetry data is present in the ``public.timeseries_metrics`` and ``omnia_telemetry.metrics`` tables.

2. After verification, you can choose to restart the Omnia telemetry services. Run the ``telemetry.yml`` playbook after modifying ``input/telemetry_config.yml`` as per your requirements. For more information about the telemetry parameters, `click here <../InstallationGuides/BuildingClusters/schedulerinputparams.html#id18>`_. Execute the following command: ::

cd telemetry
ansible-playbook telemetry.yml -i ../upgrade/inventory

3. After the telemetry services are enabled, check the ``omnia_telemetry.metrics`` and ``public.timeseries_metrics`` tables to see if the number of rows has increased, which signifies that fresh telemetry data from Omnia v1.6 is being written to the database.
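
The verification in step 3 can be scripted as a pair of row-count queries. A sketch; it runs only where ``psql`` is reachable (for example inside the ``timescaledb`` pod), and the default username below is an assumption, so substitute your configured ``timescaledb`` user:

```shell
# Row-count check for the two restored tables (table names from this guide).
# TSDB_USER is an assumed default -- substitute the configured timescaledb username.
TSDB_USER="${TSDB_USER:-omnia}"
if command -v psql >/dev/null 2>&1; then
  psql -U "$TSDB_USER" -d telemetry_metrics \
       -c "SELECT count(*) FROM public.timeseries_metrics;" || true
  psql -U "$TSDB_USER" -d telemetry_metrics \
       -c "SELECT count(*) FROM omnia_telemetry.metrics;" || true
  REPORT="queries attempted"
else
  REPORT="psql not found; run this inside the timescaledb pod"
fi
echo "$REPORT"
```

Re-run it after the services have been up for a few minutes and compare the counts.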
4 changes: 3 additions & 1 deletion docs/source/Upgrade/upgrade.rst
@@ -12,4 +12,6 @@ To execute the ``upgrade.yml`` playbook, run the following command: ::

Where inventory refers to the auto-generated inventory file in Omnia v1.6 format.

This is the final step. Once the ``upgrade.yml`` playbook has been executed successfully, the upgrade process is complete!

Optional - `Restore Telemetry database post upgrade <restore_telemetryDB.html>`_