
Introduction

Today we are focusing on a key shift in how vSphere administrators can interact with and empower developers (DevOps). Let’s expand on this shift and on why VMware Tanzu Kubernetes Grid (TKG) is a compelling solution.

The Traditional vSphere Admin/Developer Workflow (and its Pain Points):

The traditional model often involves a ticketing system. Developers need resources (VMs, storage, networking) and file a request. The vSphere admin then has to manually provision these resources. This leads to:

  • Slow Turnaround: Delays in provisioning can slow down development cycles.
  • Administrative Overhead: Managing individual requests is time-consuming for the vSphere admin.
  • Lack of Agility: Developers lack the ability to quickly experiment and iterate.
  • Configuration Drift: Manual configuration can lead to inconsistencies and errors.

Kubernetes and Tanzu Kubernetes Grid (TKG) as a Solution:

Kubernetes, and specifically TKG, offers a different paradigm: a self-service model for developers that still gives the vSphere admin control and visibility.

Why TKG?

  • Consistent Kubernetes: TKG delivers a consistent, conformant Kubernetes experience across vSphere environments (and even on public clouds). This means developers can use the same tools and workflows regardless of where their applications are deployed.
  • Integrated with vSphere: TKG is deeply integrated with vSphere. This allows you to leverage your existing vSphere infrastructure (compute, storage, networking) and management tools. You’re not replacing vSphere; you’re enhancing it.
  • Centralized Management: While developers gain self-service capabilities, you retain centralized control over the Kubernetes clusters. You can set resource quotas, limits, and security policies to ensure compliance and prevent resource abuse.
  • Simplified Operations: TKG simplifies the deployment and management of Kubernetes clusters. It automates many of the tasks that would otherwise be manual, reducing your operational overhead.
  • Developer Self-Service: Developers can use kubectl and YAML files to define their infrastructure needs. They can request and provision resources on demand, without having to go through a ticketing system (see the example manifest after this list).
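
For example, in vSphere with Tanzu a developer who has been granted access to a Supervisor namespace can request an entire Kubernetes cluster declaratively. The manifest below is a minimal sketch only: the API version depends on your vSphere/TKG release, and the cluster name, namespace, Kubernetes version, VM class, and storage class are placeholders for values published in your environment.

yaml
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: dev-cluster                        # placeholder cluster name
  namespace: dev-team-a                    # Supervisor namespace assigned by the vSphere admin
spec:
  distribution:
    version: v1.21                         # a Kubernetes release available in your content library
  topology:
    controlPlane:
      count: 1
      class: best-effort-small             # VM class published to the namespace
      storageClass: vsan-default-storage-policy
    workers:
      count: 3
      class: best-effort-small
      storageClass: vsan-default-storage-policy

The developer applies this with kubectl, and the Supervisor provisions the cluster within whatever quotas and storage policies the vSphere administrator has already attached to that namespace.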

Why is this Interesting for the vSphere Administrator?

  • Reduced Ticket Volume: By empowering developers with self-service capabilities, you can significantly reduce the number of resource requests you have to handle.
  • Increased Efficiency: You can focus on higher-level tasks, such as capacity planning, security, and infrastructure optimization.
  • Improved Developer Satisfaction: Developers are happier because they can get the resources they need quickly and easily.
  • Modernization of Your Skillset: Managing Kubernetes environments is a valuable skill in today’s IT landscape. TKG provides a way for you to expand your expertise and stay relevant.
  • Strategic Role: You become an enabler of innovation, rather than a bottleneck. You can help your organization adopt modern application development practices and accelerate its digital transformation.

In short, TKG allows you to provide a “paved road” for developers to consume infrastructure resources while maintaining governance and control. It’s a win-win for both developers and vSphere administrators.

Elephant in the Room (Permissions and Empowerment)

The conflict between platform stability and developer autonomy is more apparent than ever in today’s cloud-native environment. Businesses that move to shared Tanzu Application Platform (TAP) and Tanzu Kubernetes Grid (TKG) environments must strike a careful balance between giving developers the self-service tools they require and avoiding resource sprawl, which can throw operations and budgets off course.

An exciting path towards containerised microservices can easily turn into the “wild west” of Kubernetes resources, where storage volumes remain long after their usefulness has gone, namespaces proliferate unchecked, and CPU and memory requests far exceed what workloads actually use.


The ramifications are more than just hypothetical. Uncontrolled resource usage results in rapidly increasing cloud expenses, deteriorated performance for important workloads, and possible outages when platform limitations are suddenly reached.

However, tight limitations are also not the solution. Developers eventually find workarounds or, worse, give up on platform adoption completely when they encounter too much bureaucracy when allocating resources. When teams are unable to iterate rapidly, the very efficiencies Tanzu promised vanish.


Let’s discuss a problem that every DevOps team encounters: how can you allow developers the latitude they require on a shared Tanzu platform without allowing resource utilisation to get out of hand? Working with platform teams and following industry best practices, we have found governance strategies that preserve this important equilibrium. The following is a useful guide for setting up rules that safeguard your environment while maintaining the developer experience that first makes Tanzu worthwhile.

Resource Management – The Foundation

Namespaces are essential: they are the cornerstone of resource management in Kubernetes (and hence Tanzu). Every project or team ought to have its own namespace. This gives you isolation and lets you apply limits and quotas per team.

Resource quotas cap the total CPU, memory, persistent storage, and other resources that can be consumed in a namespace. Establish quotas that prevent excessive consumption while still meeting the legitimate needs of the team or project.

Namespace isolation is essential, not optional. Create dedicated namespaces for each team or project to establish clean boundaries for applying controls and maintaining accountability.

Resource quotas should be tailored to each team’s actual workflow patterns:

yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev-team-a
spec:
  hard:
    cpu: "4"
    memory: "8Gi"
    pods: "20"
    persistentvolumeclaims: "5"
    services: "10"
    configmaps: "30"
    secrets: "30"

Limit Ranges: These define default resource requests and limits for containers in a namespace. They also prevent users from creating pods or deployments without specifying resource requests and limits, which guarantees that every container has clearly defined resource boundaries.

Implement granular limit ranges that prevent resource hogging while still accommodating legitimate workload spikes:

yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: limits-dev
  namespace: dev-team-a
spec:
  limits:
    - default:
        cpu: 500m
        memory: 256Mi
      defaultRequest:
        cpu: 250m
        memory: 128Mi
      max:
        cpu: "2"
        memory: "2Gi"
      min:
        cpu: 50m
        memory: 64Mi
      type: Container

Beyond Resources – Security Boundaries

Network policies should be designed as a comprehensive mesh. Start with a default-deny policy, then explicitly allow only required communication paths between namespaces and external services.
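
As a starting point, a default-deny policy per namespace might look like the sketch below (the namespace name is illustrative); further policies then explicitly allow required traffic such as DNS and approved service-to-service paths.

yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: dev-team-a          # illustrative namespace
spec:
  podSelector: {}                # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
  # no ingress or egress rules are listed, so all traffic is denied
  # until additional policies explicitly allow it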

For Pod Security, embrace the shift to Pod Security Admission with customized profiles that match your security posture. Consider implementing:

  • Development namespaces: Baseline with select exceptions
  • Staging environments: Restricted with limited exceptions
  • Production: Fully restricted profiles with no exceptions

Network Policy:
Control communication: Network policies restrict traffic between pods and namespaces. This is essential for security and can limit the blast radius of a compromised or misbehaving application. For example, you can block traffic between the development and production namespaces.

Pod Security Policies:

Although Pod Security Policies (PSPs) have been removed from Kubernetes, it is still worth understanding the concept when migrating older clusters.

Security Contexts: The fundamental ideas behind PSPs still apply. You can set security contexts to restrict what a container can do (for example, running as root or accessing host filesystems). Enforcement of these controls is now handled by Pod Security Admission.

  • Pod Security Admission (PSA): This is the replacement for PSPs. PSA defines security profile levels (Privileged, Baseline, Restricted) that you apply to namespaces through labels, as shown in the sketch below. This is a more declarative and easier-to-manage way to enforce pod security.
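
A hedged sketch of applying PSA levels with namespace labels, matching the tiering suggested earlier (the namespace name and chosen levels are illustrative):

yaml
apiVersion: v1
kind: Namespace
metadata:
  name: dev-team-a
  labels:
    pod-security.kubernetes.io/enforce: baseline     # block anything below the Baseline profile
    pod-security.kubernetes.io/audit: restricted     # record violations of Restricted without blocking
    pod-security.kubernetes.io/warn: restricted      # warn developers at admission time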

Image Registries and Scanning:

  • Approved Images Only: Use a private image registry and only allow developers to deploy images that have been scanned for vulnerabilities and approved. This prevents the introduction of malicious or insecure software into the cluster.
  • Image Scanning: Integrate image scanning into your CI/CD pipeline. Reject images that fail the scan.

Monitoring and Alerting:

  • Real-time Visibility: Set up monitoring and alerting to track resource usage. Alert on namespaces that are approaching or exceeding their quotas. Tools like Prometheus and Grafana are excellent for this.
  • Cost Monitoring: If you’re using a cloud provider, use their cost monitoring tools to track spending by namespace or project.

Automation and GitOps:

  • Infrastructure as Code (IaC): Manage your Kubernetes resources (quotas, limits, network policies) as code using tools like Terraform or Flux (see the sketch after this list). This allows you to version control your configurations and automate their deployment.
  • GitOps: Use Git as the source of truth for your Kubernetes configurations. Changes are made through pull requests, reviewed, and then automatically deployed. This provides an audit trail and helps to prevent unauthorized changes.  
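
As a minimal sketch, assuming the Flux controllers are already installed in the cluster, a platform repository like the one below (the repository URL and path are hypothetical) could continuously reconcile quotas, limit ranges, and network policies from Git:

yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: platform-policies
  namespace: flux-system
spec:
  interval: 5m
  url: https://git.example.internal/platform/policies.git   # hypothetical repository
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: platform-policies
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: platform-policies
  path: ./clusters/production        # hypothetical path containing quotas, limits, and policies
  prune: true                        # remove objects that have been deleted from Git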

Developer Training and Guidelines:

  • Education is Key: Train developers on Kubernetes best practices, resource management, and the importance of adhering to quotas and limits.
  • Clear Guidelines: Establish clear guidelines for resource usage and application deployment. Make sure developers understand the consequences of over-provisioning.

Tiered Access Control (RBAC):

  • Principle of Least Privilege: Grant developers only the permissions they need to do their jobs. Avoid giving them cluster-admin privileges. Use RBAC to define roles and assign them to users or groups.

Example RBAC Setup (Simplified; the corresponding manifests are sketched after this list):

  • dev-role: Allows developers to create, update, and delete resources within their assigned namespace, but restricts them from creating namespaces or managing cluster-wide resources.
  • dev-team-a group: Developers in this group are assigned the dev-role for the dev-team-a namespace.
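
Under those assumptions (a dev-team-a namespace and a dev-team-a group surfaced by your identity provider), the description above translates roughly into the following manifests:

yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dev-role
  namespace: dev-team-a
rules:
  # full control over common workload and configuration objects, but only inside this namespace;
  # namespaces and other cluster-scoped resources are out of reach because this is a Role, not a ClusterRole
  - apiGroups: ["", "apps", "batch"]
    resources: ["pods", "services", "configmaps", "secrets", "persistentvolumeclaims", "deployments", "statefulsets", "jobs", "cronjobs"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-team-a-binding
  namespace: dev-team-a
subjects:
  - kind: Group
    name: dev-team-a               # group name as presented by your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: dev-role
  apiGroup: rbac.authorization.k8s.io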

Practical Controls

Image governance should be proactive. Implement Kyverno or OPA Gatekeeper policies that automatically reject images (a sample policy follows this list):

  • From unapproved registries
  • With critical CVEs
  • Without proper labeling
  • Lacking resource specifications
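
A hedged sketch of the registry rule as a Kyverno ClusterPolicy (Kyverno must already be installed in the cluster, and the internal registry hostname is a placeholder):

yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce     # reject non-compliant pods rather than only auditing them
  background: true
  rules:
    - name: allow-approved-registry-only
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must be pulled from the approved internal registry."
        pattern:
          spec:
            containers:
              - image: "registry.example.internal/*"   # placeholder registry hostname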

Enhance your monitoring with predictive analytics. Track resource consumption trends over time to identify potential issues before they become problems. Set up multi-level alerts (warning at 70% of quota, critical at 90%) that automatically notify both the DevOps and development teams, with the management team copied in for awareness.
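
Assuming Prometheus Operator and kube-state-metrics are available (for example via a kube-prometheus stack), a sketch of quota alerts at those thresholds could look like this:

yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: namespace-quota-alerts
  namespace: monitoring              # wherever your Prometheus Operator picks up rules
spec:
  groups:
    - name: resource-quotas
      rules:
        - alert: NamespaceQuotaWarning
          expr: kube_resourcequota{type="used"} / ignoring(type) kube_resourcequota{type="hard"} > 0.70
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Namespace {{ $labels.namespace }} is above 70% of its {{ $labels.resource }} quota"
        - alert: NamespaceQuotaCritical
          expr: kube_resourcequota{type="used"} / ignoring(type) kube_resourcequota{type="hard"} > 0.90
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Namespace {{ $labels.namespace }} is above 90% of its {{ $labels.resource }} quota"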

Process Matters

Evolve beyond basic GitOps to policy-as-code. Define organizational standards as enforceable policies that automatically validate changes against best practices. This creates guardrails that prevent problematic configurations from being applied.

Develop a comprehensive education program that includes:

  • Hands-on workshops for resource optimization
  • Peer review sessions for deployment configurations
  • Case studies from actual production incidents
  • Recognition for teams demonstrating resource efficiency

Tiered Access Model

Implement a sophisticated RBAC structure with contextual permissions:

  • Namespace-specific developer roles with granular permissions
  • Time-bound elevated access for debugging and deployments
  • Audit-focused roles for compliance and security teams
  • Pipeline service accounts with scoped permissions

For larger organizations, consider namespace federation where teams can request resources through a self-service portal with built-in governance checks.

Remember that effective governance requires continuous refinement based on actual usage patterns and feedback. The most successful Tanzu environments balance controls with developer agility through data-driven policies and transparent processes.

Conclusion

In summary, the adoption of DevOps and microservices demands a fundamental transformation in our approach to infrastructure management. The dynamic and fluid nature of modern application development is simply too much for the ticket-based, traditional method of a decade ago. We now orchestrate sophisticated microservices deployments that need scalability and rapid iteration, rather than deploying monolithic applications on static infrastructure. Empowering DevOps engineers is now a necessity for businesses to stay competitive, not just a “nice-to-have.” The provision of self-service infrastructure access, in conjunction with strong governance and control, is essential to this empowerment.

Tools like VMware Tanzu Kubernetes Grid (TKG) offer the platform to close this gap, enabling vSphere administrators to act as innovators rather than gatekeepers. Through the use of Kubernetes and an infrastructure-as-code methodology, we can simplify processes, cut down on administrative burdens, and allow developers to concentrate on what they do best: creating and implementing applications. In addition to increasing developer satisfaction and speeding up time-to-market, this change enables vSphere administrators to update their skill set and take on a more strategic role in the digital transformation of their company. Automation, self-service, and cooperation between development and operations teams are key components of the future of IT. In the era of microservices, embracing these changes is essential to maximising DevOps’ potential and achieving business success.

Sources:

https://blogs.vmware.com/vsphere/2020/03/vsphere-7-tanzu-kubernetes-clusters.html
