=============================================
VOLTHA and ONOS software update procedures
=============================================

This document describes the software upgrade procedure for VOLTHA and ONOS in a deployed system.
A distinction is made between a `minor` software upgrade, which can be done in service, meaning with no
dataplane service interruption to existing customers, and a `major` software upgrade, which in turn
requires a full maintenance window during which service is impacted.

Changes to data structures in storage (ETCD for VOLTHA and Atomix for ONOS) are out of scope for in-service upgrades.
Such changes qualify as `major` software upgrades that require a maintenance window.

The KAFKA bus update has its own section, given that the procedure is different from the rest of the components.

The following procedures expect a fully working, provisioned VOLTHA and ONOS deployment on top of a Kubernetes cluster,
with exposed ONOS REST API ports.
It is also expected that new versions of the different components are available to the operator who performs the upgrade.

Minor Software Version Update
=============================

The `minor` software upgrade qualifier refers to an upgrade that does not involve API changes. In VOLTHA, an API
change is a change to either the protos or voltha-lib-go; in ONOS, it is a change to the Java interfaces,
CLI commands or REST APIs of either the apps or the platform.
A `minor` software update is intended for bug fixes and not for new features.
A `minor` software update is supported only for ONOS apps and VOLTHA components.
No in-service software update is supported for ETCD or Kafka.

VOLTHA services
---------------

The `minor` software upgrade of VOLTHA components leverages `helm` and `k8s`.
During this process it is expected that no provision-subscriber call is executed from the northbound.
In-progress calls will be completed thanks to the stored data and/or the persistence of messages over KAFKA.

After changes in the code are made and verified, the following steps are needed:

#. Update the minor version of the component.
#. Build a new version of the component to update.
#. Update the component's minor version in the helm chart.
#. | Issue the helm upgrade command. If the changes have already been upstreamed to ONF, the upstream chart
   | `onf/` can be used; otherwise a local copy of the chart is required.

Following is an example of the `helm` command to upgrade the openonu adapter.
Topics, kv store paths and kafka endpoints need to be adapted to the specific deployment.

.. code:: bash

    helm upgrade --install --create-namespace \
      -n voltha1 opeonu-adapter onf/voltha-adapter-openonu \
      --set global.stack_name=voltha1 \
      --set adapter_open_onu.kv_store_data_prefix=service/voltha/voltha1_voltha1 \
      --set adapter_open_onu.topics.core_topic=voltha1_voltha1_rwcore \
      --set adapter_open_onu.topics.adapter_open_onu_topic=voltha1_voltha1_brcm_openomci_onu \
      --set services.kafka.adapter.service=voltha-infra-kafka.infra.svc \
      --set services.kafka.cluster.service=voltha-infra-kafka.infra.svc \
      --set services.etcd.service=voltha-infra-etcd.infra.svc

ONOS apps
---------

A `minor` software update is also available for the following ONOS apps:
`sadis`, `olt`, `aaa`, `kafka`, `dhcpl2relay`, `mac-learning`, `igmpproxy`, and `mcast`.
These apps can thus be updated with no impact on the dataplane of provisioned subscribers.

The `minor` software update for the ONOS apps leverages existing ONOS REST APIs.

During this process it is expected that no provision-subscriber call is executed from the REST APIs.
In-progress calls will be completed thanks to the flows stored in Atomix.
Some metrics and/or packet processing might be lost during this procedure; the system relies on retry mechanisms
present in the services and the dataplane protocols to converge to a stable state (e.g. DHCP retry).

After changes in the code of ONOS apps are made and verified, the following steps are needed:

#. | Obtain the `.oar` of the app, either via a local build with `mvn clean install` or, if the code has been upstreamed,
   | by downloading it from Maven Central or Sonatype.
#. Delete the old version of the ONOS app.
#. Upload, install and activate the new `.oar` file.

Following is an example of the different `curl` commands to upgrade the olt app. The install command assumes the
`.oar` file is present in the directory from which the command is executed.

.. code:: bash

    # download the app
    curl --fail -sSL "https://oss.sonatype.org/content/groups/public/org/opencord/olt-app/4.5.0-SNAPSHOT/olt-app-4.5.0-20210504.162620-3.oar" > org.opencord.olt-4.5.0.SNAPSHOT.oar
    # delete the app
    curl --fail -sSL -X DELETE "http://karaf:karaf@127.0.0.1:8181/onos/v1/applications/org.opencord.olt"
    # install and activate the new version of the app (URL quoted so the shell does not interpret the '?')
    curl --fail -sSL -H Content-Type:application/octet-stream -X POST "http://karaf:karaf@127.0.0.1:8181/onos/v1/applications?activate=true" --data-binary @org.opencord.olt-4.5.0.SNAPSHOT.oar 2>&1

Minor Software Version Rollback Due To Failure
----------------------------------------------

A `minor` software upgrade can result in failures and broken functionality. There are two possible cases:

1. the container does not start
2. functionality is broken during operations

VOLTHA Component updated container does not start
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is automatically handled by Kubernetes. An old version of the pod does not get terminated unless the new one is
running and ready according to its readiness probe. No system or data-plane functionality is impacted.
The operator will need to manually delete the failing pod, fix the issue in the new `minor` version and re-deploy it.

VOLTHA Component Broken functionality during operations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this case the container started and became `ready` in Kubernetes but functionality of the system or data-plane is broken, e.g.
a subscriber can't be provisioned or no traffic is flowing. In this case the operator needs to perform a manual
intervention, rolling back to the previous minor version of the container.
The rollback operation is the same as a `minor` software update via `helm`, but instead of increasing the version
number it decrements it to the last known working one.

ONOS app not starting or broken functionality
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For ONOS apps a manual intervention is always necessary, both if the app does not start and if functionality is broken.
The rollback of an ONOS application is done by following the same procedure as the update, using the previous, or last
known working, version of the `.oar` file.

Inter-dependency among changes submitted in different Components
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Even though a minor version upgrade is expected to be seamless, the changes that went into one component can
depend on changes in other components. In this case the operator needs to perform a manual intervention
and upgrade the components manually in the required order.

Major Software Version Update
=============================

A software update qualifies as `major` when there are changes in the APIs or in the format of the
data stored by a component.

A major software update in VOLTHA and ONOS currently requires a maintenance window
during which the dataplane for the subscribers is interrupted, thus no service is provided.
There are several cases and they can be handled differently.

VOLTHA services API or Data format changes
------------------------------------------

A `major` update is needed because the VOLTHA APIs between components have changed or because the format of the data
being stored is different, thus a complete wipe-out needs to be performed.
In such a scenario each stack can be updated independently, with no teardown required of the ONOS, ETCD and KAFKA
infrastructure. Different versions of VOLTHA can coexist over the same infrastructure.

The procedure is iterative on each stack and is performed as follows:

#. un-provision all the subscribers via ONOS REST API.
#. delete all the OLTs managed by the stack via VOLTHA gRPC API.
#. upgrade the stack version via the `helm` upgrade command and the correct version of the `voltha-stack` chart.

Details on the `helm` commands can be found in the `voltha-helm-charts README file <../voltha-helm-charts/README.md>`_

If the API change is between the `openolt adapter` and the `openolt agent` on the OLT hardware, please refer to section
:ref:`OpenOLT Agent Update <openolt-update>`.

ONOS, Atomix or ONOS apps
-------------------------

A `major` update is needed because changes have been made to the interfaces (Java APIs) or REST APIs of ONOS itself
or of one of the apps, rendering two subsequent implementations incompatible. A `major` software update is also
needed for changes made to the data stored in Atomix or for an update of the Atomix version itself.

In this scenario all the stacks connected to an ONOS instance need to be cleaned of data before moving them
over to a new ONOS cluster.

The procedure is as follows:

#. deploy a new ONOS cluster in a new namespace `infra1`
#. un-provision all the subscribers via ONOS REST API
#. delete the OLT device (not strictly required, but best to ensure clean state)
#. redeploy the of-agent with the new ONOS cluster endpoints
#. re-provision the OLT
#. re-provision the subscribers
#. iterate over steps 2,3,4,5,6 for each of the stacks connected to the ONOS you want to update.

Following is an example on how to deploy ONOS:

.. code:: bash

    helm install --create-namespace \
      --set replicas=3,atomix.replicas=3 \
      --set atomix.persistence.enabled=false \
      --set image.pullPolicy=Always,image.repository=voltha/voltha-onos,image.tag=5.0.0 \
      --namespace infra1 onos onos/onos-classic

Following is an example on how to re-deploy the of-agent, using the `voltha-stack` chart, pointing it to the new
controller endpoints. Only the `ofagent` pod will be restarted.

.. code:: bash

    helm upgrade --install --create-namespace \
      --set global.topics.core_topic=voltha1_voltha1_rwcore,defaults.kv_store_data_prefix=service/minimal \
      --set global.kv_store_data_prefix=service/voltha/voltha1_voltha1 \
      --set services.etcd.port=2379 --set services.etcd.address=etcd.default.svc:2379 \
      --set services.kafka.adapter.service=voltha-infra-kafka.infra.svc \
      --set services.kafka.cluster.service=voltha-infra-kafka.infra.svc \
      --set services.etcd.service=voltha-infra-etcd.infra.svc \
      --set 'voltha.services.controller[0].service=voltha-infra1-onos-classic-0.voltha-infra1-onos-classic-hs.infra1.svc' \
      --set 'voltha.services.controller[0].port=6653' \
      --set 'voltha.services.controller[0].address=voltha-infra1-onos-classic-0.voltha-infra1-onos-classic-hs.infra1.svc:6653' \
      --set 'voltha.services.controller[1].service=voltha-infra1-onos-classic-1.voltha-infra1-onos-classic-hs.infra1.svc' \
      --set 'voltha.services.controller[1].port=6653' \
      --set 'voltha.services.controller[1].address=voltha-infra1-onos-classic-1.voltha-infra1-onos-classic-hs.infra1.svc:6653' \
      --set 'voltha.services.controller[2].service=voltha-infra1-onos-classic-2.voltha-infra1-onos-classic-hs.infra1.svc' \
      --set 'voltha.services.controller[2].port=6653' \
      --set 'voltha.services.controller[2].address=voltha-infra1-onos-classic-2.voltha-infra1-onos-classic-hs.infra1.svc:6653' \
      --set global.log_level=WARN --namespace voltha voltha onf/voltha-stack

ETCD
----

A `major` update is needed because tearing down the ETCD cluster means deleting the data stored, thus requiring a rebuild by
the different components.

The procedure is as follows:

#. deploy a new ETCD cluster.
#. un-provision all the subscribers via ONOS REST API
#. delete the OLT device (not strictly required, but best to ensure clean state)
#. redeploy the voltha stack with the `voltha-stack` `helm` chart pointing it to the new ETCD endpoints.
#. re-provision the OLT
#. re-provision the subscribers
#. iterate over steps 2,3,4,5,6 for each stack connected to the ETCD cluster you want to update.

Details on the `helm` commands for the voltha stack can be found in the
`voltha-helm-charts README file <../voltha-helm-charts/README.md>`_

Following is an example on how to deploy a new 3 node ETCD cluster:

.. code:: bash

    helm install --create-namespace --set auth.rbac.enabled=false,persistence.enabled=false,statefulset.replicaCount=3 --namespace infra etcd bitnami/etcd

KAFKA Update
============

An update of Kafka is not considered to be a `major` software upgrade because it can be performed with no
service impact to the user.

Following is an example on how to deploy a new Kafka cluster:

.. code:: bash

    helm install --create-namespace --set global.log_level=WARN --namespace infra kafka bitnami/kafka

Following is an example on how to re-deploy the stack pods, using the `voltha-stack` chart, pointing them to the new
kafka (`voltha-infra-kafka-2.infra.svc`) endpoints. Each pod will be restarted, but without dataplane interruption,
because the procedure is equivalent to a pod restart and thus leverages the data stored in ETCD.

.. code:: bash

    helm upgrade --install --create-namespace \
      --set global.topics.core_topic=voltha1_voltha1_rwcore,defaults.kv_store_data_prefix=service/minimal \
      --set global.kv_store_data_prefix=service/voltha/voltha1_voltha1 \
      --set services.etcd.port=2379 --set services.etcd.address=etcd.default.svc:2379 \
      --set services.kafka.adapter.service=voltha-infra-kafka-2.infra.svc \
      --set services.kafka.cluster.service=voltha-infra-kafka-2.infra.svc \
      --set services.etcd.service=voltha-infra-etcd.infra.svc \
      --set 'voltha.services.controller[0].service=voltha-infra-onos-classic-0.voltha-infra-onos-classic-hs.infra.svc' \
      --set 'voltha.services.controller[0].port=6653' \
      --set 'voltha.services.controller[0].address=voltha-infra-onos-classic-0.voltha-infra-onos-classic-hs.infra.svc:6653' \
      --set 'voltha.services.controller[1].service=voltha-infra-onos-classic-1.voltha-infra-onos-classic-hs.infra.svc' \
      --set 'voltha.services.controller[1].port=6653' \
      --set 'voltha.services.controller[1].address=voltha-infra-onos-classic-1.voltha-infra-onos-classic-hs.infra.svc:6653' \
      --set 'voltha.services.controller[2].service=voltha-infra-onos-classic-2.voltha-infra-onos-classic-hs.infra.svc' \
      --set 'voltha.services.controller[2].port=6653' \
      --set 'voltha.services.controller[2].address=voltha-infra-onos-classic-2.voltha-infra-onos-classic-hs.infra.svc:6653' \
      --set global.log_level=WARN --namespace voltha voltha onf/voltha-stack

.. _openolt-update:

OpenOLT Agent Update
====================

The `openolt agent` on the box can be upgraded without having to tear down the whole VOLTHA stack to which the OLT
is connected. Here again a distinction is made between a minor update and a major update of the openolt agent.
A minor update happens when there is no API change between the `openolt agent` and the `openolt adapter`, meaning the
`openolt.proto` has not been updated in either of those components.
A major update is required when there are changes to the `openolt.proto` API.
Both updates of the OpenOLT agent are service impacting for the customer.

Minor Update
------------

A minor update will be seen from VOLTHA as a reboot of the OLT. During a minor update of the openolt agent no
northbound calls should be made; in-progress provision calls will reconcile upon OLT reboot.
Events, metrics and performance measurement data can be lost and should not be expected during this procedure.

The procedure is as follows:

#. place the new openolt agent `.deb` package on the desired OLT.
#. stop the running `openolt`, `dev_mgmt_daemon` and optionally the `watchdog` processes on the OLT.
#. run the new openolt packages
#. reboot the OLT hardware.

After these steps are done VOLTHA will re-receive the OLT connection and re-provision data accordingly.

Major update
------------

A major update will require the OLT to be deleted from VOLTHA to ensure no inconsistent data is stored.
During a major update of the openolt agent and adapter no northbound calls should be made, and in-progress calls
will fail. Events, metrics and performance measurement data will be lost.

The procedure is as follows:

#. Delete the OLT device from VOLTHA (e.g. `voltctl device delete`)
#. Upgrade the openolt-adapter to the new version via `helm upgrade`.
#. place the new openolt agent `.deb` package on the desired OLT.
#. stop the running `openolt`, `dev_mgmt_daemon` and optionally the `watchdog` processes on the OLT.
#. run the new openolt packages
#. reboot the OLT hardware.
#. re-provision the OLT (e.g. `voltctl device provision`)
#. re-enable the OLT (e.g. `voltctl device enable`)
#. re-provision the subscribers.

After these steps VOLTHA effectively treats the OLT as a brand new one of which it had no prior knowledge.
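
The OLT-side steps shared by the minor and major procedures (place the package, stop the processes, run the new
package, reboot) could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not a tested
procedure: it assumes the OLT is reachable over `ssh`, that the processes are managed through `service` under the
names `openolt` and `dev_mgmt_daemon`, and that the `.deb` is installed with `dpkg -i`. The host, the package paths
and the `update_olt_agent` and `run` helpers are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: OLT-side steps of an openolt agent update.
# Assumptions (not taken from this document): ssh access to the OLT,
# processes managed via `service`, package installed via `dpkg -i`.

# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# update_olt_agent <ssh target> <local .deb path>   (hypothetical helper)
update_olt_agent() {
  local olt_host="$1"   # e.g. root@10.0.0.1 (hypothetical)
  local deb_file="$2"   # local path to the new openolt agent package

  # 1. place the new package on the OLT
  run scp "$deb_file" "$olt_host:/tmp/openolt.deb"
  # 2. stop the running processes (service names are an assumption)
  run ssh "$olt_host" "service openolt stop; service dev_mgmt_daemon stop"
  # 3. install the new package
  run ssh "$olt_host" "dpkg -i /tmp/openolt.deb"
  # 4. reboot the OLT hardware
  run ssh "$olt_host" "reboot"
}
```

Calling `update_olt_agent root@10.0.0.1 ./openolt.deb` with the default `DRY_RUN=1` only prints the four commands, which
allows the sequence to be reviewed before running it with `DRY_RUN=0` inside the maintenance window.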