Kubernetes in the Department of Defense (DoD)

The DoD Enterprise DevSecOps reference design mandates the use of Cloud Native Computing Foundation-compliant Kubernetes clusters and other open-source technologies to achieve DoD-wide continuous Authority to Operate (ATO).

Modern software infrastructure is built on a microservices framework, which leverages containers to run software reliably when moved from one computing environment to another. With the growth of Artificial Intelligence (AI), Machine Learning (ML), and cybersecurity, a critical need has emerged for DevSecOps in the U.S. DoD to solve the problem of long software development and delivery cycles. A primary focus of the DoD's DevSecOps initiative is avoiding vendor lock-in. Therefore, the DoD mandated Open Container Initiative (OCI) containers, with no lock-in to specific container runtimes or builders. Because containers are immutable, this allows the DoD to harden and accredit them. The DoD also mandated a Cloud Native Computing Foundation (CNCF)-compliant Kubernetes cluster for container orchestration, with no lock-in to orchestration options, networking, or storage APIs.

Kubernetes brings the DoD many advantages:

  • Resiliency: When a container fails or crashes, it can be automatically restarted, providing a self-healing capability.
  • Baked-in security: The DoD's Sidecar Container Security Stack (SCSS) can be automatically injected into any Kubernetes cluster, supporting Zero Trust.
  • Adaptability: There is no downtime when swapping out modular containers.
  • Automation: The GitOps model and Infrastructure as Code (IaC) enable automation.
  • Auto-scaling: Kubernetes automatically scales based on compute/memory needs.
  • Abstraction layer: Since Kubernetes is governed by the CNCF, there is no fear of getting locked into cloud APIs or a specific platform.
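The resiliency and auto-scaling advantages above map directly onto standard Kubernetes objects. The sketch below (all names and the image are illustrative placeholders, not a DoD configuration) pairs a Deployment, whose failed pods are restarted automatically, with a HorizontalPodAutoscaler that scales replicas on CPU utilization:

```yaml
# Illustrative sketch: self-healing Deployment plus CPU-based autoscaling.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mission-app                 # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: mission-app
  template:
    metadata:
      labels:
        app: mission-app
    spec:
      containers:
      - name: app
        image: registry.example/mission-app:1.0   # placeholder image
        resources:
          requests:                 # requests drive both scheduling and
            cpu: "500m"             # the utilization math below
            memory: "256Mi"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mission-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mission-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70      # add pods when average CPU exceeds 70%
```

A rolling update on the same Deployment (replacing the image tag) is what delivers the zero-downtime container swap described under "Adaptability."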

The DoD is moving to cloud-native environments and microservices, with many systems currently being designed for a microservices framework from the start. Kubernetes is quickly becoming the foundation for all software in the DoD, from jets to bombers to ships. Kubernetes is running across systems throughout the DoD, which can reside on embedded systems, at the edge, and in the cloud. In 2019, a team at Hill Air Force Base in Utah successfully demonstrated Kubernetes on an F-16 jet. Currently, teams are working on building applications on top of Kubernetes for all facets of weapons systems, from space systems to nuclear systems to jets.

Tovi: Modernizing HPC Systems Throughout Academia, Industry, and Government

Modernizing HPC infrastructure allows all services to be run under one environment, which brings both increased efficiency to all services, not just HPC workloads, and decreased management overhead for system administrators.

High performance computing (HPC) systems are ubiquitous in academic, industrial, and government organizations and are used for a wide range of applications from basic research all the way to operational systems. Depending on the application, an HPC system can be composed of a single compute server (i.e., node) or can be composed of hundreds or even thousands of nodes that are networked together. HPC continues to experience rapid adoption as a game-changing technology and decision-making aid. In addition, with the ever-growing volume and velocity of data across the battlefield and the increasing use of Artificial Intelligence (AI) in theater, HPC systems are expected to also be deployed at the tactical edge. This will lead to even further adoption of HPC systems as they move beyond stationary emplacements to mobile, forward-deployable systems.

Whereas HPC systems leverage the latest software and hardware to achieve remarkable processing capabilities, there is a fundamental problem with current HPC infrastructure: HPC workload management solutions (e.g., SLURM, HTCondor, and IBM LSF) are not compatible with modern microservices architectures. Microservices enable applications to be built more easily by breaking them down into smaller components that work together collectively. Microservices have replaced monolithic architectures to support scalable software composed of smaller applications (i.e., containers) that communicate through language-independent interfaces. Indeed, containers are an increasingly popular method to enable software to run reliably when moved from one computing environment to another. The use of containers has exploded in recent years, and this trend is expected to continue for the foreseeable future. The majority of containers are orchestrated, with Kubernetes being the most popular container orchestration platform available. Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications. Managed Kubernetes services have seen significant growth in recent years and have become the standard for managing container environments in the cloud.

While Kubernetes was built to orchestrate applications of loosely coupled, containerized services, the types of applications that run on Kubernetes are very different from those that run on HPC systems. HPC applications are designed to run to completion while using resources optimally, whereas Kubernetes applications usually run continuously. Microservices users typically leverage containers for speed and modularity, whereas HPC users are more focused on portability and the ability to encapsulate software with containers. While containers are extremely valuable for a wide range of HPC applications, making the switch can be very difficult, and HPC applications are hard to deploy on Kubernetes. Tovi was designed to modernize HPC clusters so that HPC users can benefit from Kubernetes orchestration, providing a robust and reliable way to run HPC workloads with Kubernetes and enabling organizations to simultaneously reduce costs and maintain maximum flexibility. Because Tovi is built as a Kubernetes application, users gain Kubernetes container orchestration while keeping all the benefits of a reliable and robust HPC workload manager. Tovi is simple, easy to use, and designed with the demands of a wide range of HPC applications in mind. Tovi lets users request resources instead of requesting time, and it handles the scheduling to maximize resource utilization and minimize wait times. Moreover, since Tovi is built as a Kubernetes application, adding and removing resources, updating system settings, and customizing the deployment is painless and often requires little or no downtime.

Tovi Solution Brief: Run HPC Workloads with Kubernetes

Tovi is an innovative solution that enables organizations to deploy High Performance Computing (HPC) applications on Kubernetes. Tovi makes it easy for HPC sites to modernize their software infrastructure and switch to containers.

Popularity of Containers and Kubernetes

Modern software infrastructure is built on a microservices framework. Indeed, containers are an increasingly popular method to enable software to run reliably when moved from one computing environment to another. The use of containers has exploded in recent years, and this trend is expected to continue for the foreseeable future. The majority of containers are orchestrated, with Kubernetes being the most popular container orchestration platform available. Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications. Managed Kubernetes services have seen significant growth in recent years and have become the standard for managing container environments in the cloud.

Containers, Kubernetes, and HPC Applications

Containers are extremely valuable for a wide range of HPC applications; however, making the switch to containers can be very difficult. While Kubernetes is exceptional at orchestrating containers, it is very challenging to run HPC workloads with it. HPC applications are difficult to deploy on Kubernetes because Kubernetes workloads are typically long-running services, whereas HPC applications run to completion and often demand low-latency, high-throughput scheduling to execute jobs in parallel across many nodes, frequently with specialized resources such as GPUs or access to limited software licenses.
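The run-to-completion contrast can be made concrete with Kubernetes' own batch primitive. The sketch below (names and image are illustrative placeholders) is a batch/v1 Job that runs once, retries on failure, and requests a GPU; note that nothing here expresses HPC needs like gang scheduling across nodes, fair-share queues, or license-aware placement:

```yaml
# Illustrative sketch: a run-to-completion batch Job requesting one GPU.
apiVersion: batch/v1
kind: Job
metadata:
  name: hpc-solver                  # hypothetical name
spec:
  backoffLimit: 2                   # retry a failed run at most twice
  template:
    spec:
      restartPolicy: Never          # batch semantics: run once, then stop
      containers:
      - name: solver
        image: registry.example/solver:1.0   # placeholder image
        resources:
          limits:
            nvidia.com/gpu: 1       # requires the NVIDIA device plugin
```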

Running Kubernetes Orchestration and HPC Workloads

Organizations seeking to deploy HPC applications on Kubernetes typically fall back on one of the following limited approaches:

  • Support separate HPC and containerized infrastructures: This approach may have some utility for certain organizations that are already heavily invested in HPC infrastructure. However, this option increases infrastructure and management costs since it requires deploying new containerized applications on a separate cluster from the HPC cluster.
  • Use an existing HPC workload manager and run containerized workloads: This may be a viable option for organizations with simple requirements and a desire to maintain their existing HPC scheduler. However, such an approach will preclude access to native Kubernetes features and consequently may constrain flexibility in managing long-running services where Kubernetes excels.
  • Use Kubernetes native job scheduling features: This may be a viable option for organizations that have not invested much in HPC applications, but it is not practical for the majority of HPC users.
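For reference, the native job scheduling features mentioned in the last option amount largely to the Job API's `parallelism`, `completions`, and indexed-completion fields, sketched below with placeholder names. This handles embarrassingly parallel work, but offers none of the queueing, backfill, or topology-aware placement an HPC workload manager provides:

```yaml
# Illustrative sketch: a parameter sweep using only native Job scheduling.
apiVersion: batch/v1
kind: Job
metadata:
  name: param-sweep                 # hypothetical name
spec:
  completions: 100                  # total number of tasks to finish
  parallelism: 10                   # run ten pods at a time
  completionMode: Indexed           # each pod sees JOB_COMPLETION_INDEX
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: registry.example/sweep-worker:1.0   # placeholder image
```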

Solution: Tovi

Tovi was designed to address all of the shortcomings above, providing a robust and reliable solution for running HPC workloads with Kubernetes and enabling organizations to simultaneously reduce costs and maintain maximum flexibility. Tovi is built as a Kubernetes application, which means Tovi users benefit from Kubernetes container orchestration while keeping all the benefits of a reliable and robust HPC workload manager. Tovi is simple, easy to use, and designed with the demands of a wide range of HPC applications in mind. Tovi lets users request resources instead of requesting time, and it handles the scheduling to maximize resource utilization and minimize wait times. Moreover, since Tovi is built as a Kubernetes application, adding and removing resources, updating system settings, and customizing the deployment is painless and often requires little or no downtime.