
June 14, 2023

Changing the focus of this blog: now... Observability

My previous post about Infrastructure as Code concludes the exploration of Data Center and Cloud solutions and of the methodologies related to the automation and optimization of IT processes.

I've been working in this area for 15 years, after spending the previous 15 in software development.
It's been an amazing adventure and I really enjoyed learning new things, exploring and challenging some limits - and sharing the experience with you.

Now I'm starting to focus on a new adventure... possibly for the next 15 years 😜

I have taken on a new professional role: technical lead for Full Stack Observability, EMEA sales, at Cisco AppDynamics. From now on, I will tell you stories about my experience with the ultimate evolution of monitoring: it's all about collecting telemetry from every single component of your business architecture, including digital services (= distributed software applications), computing infrastructure, network, cloud resources, IoT, etc.

It's not just about putting all those data together, but about correlating them to create insight: transforming raw data into information about the health of your processes, and matching business KPIs with the state of the infrastructure that supports the services.

To visualize that information and navigate it, you can subscribe to (or create your own) different domain models: views of the world built specifically for each stakeholder, from lines of business to application managers, from SREs to network operations and security teams...

A domain model is made of entities and their relationships, where entities represent what is relevant in your (business or technical) domain. They might be the usual entities of an APM domain (applications, services, business transactions...) or infrastructure entities (servers, VMs, clusters, K8s nodes, etc.).
You can also model entities in a business domain (e.g. trains, stations, passengers, tickets, etc.).
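
To make this concrete, here is a minimal sketch in Python of what a domain model could look like: entities plus typed relationships. The class and field names are invented for illustration; they are not the actual FSO data model.

```python
from dataclasses import dataclass, field

# Minimal sketch of a domain model: entities plus typed relationships.
# Names and fields are invented; real platforms use richer schemas.
@dataclass
class Entity:
    kind: str                      # e.g. "service", "k8s_node", "train"
    name: str
    attributes: dict = field(default_factory=dict)

@dataclass
class Relationship:
    source: Entity
    target: Entity
    kind: str                      # e.g. "runs_on", "calls", "stops_at"

# APM-style entities...
service = Entity("service", "checkout", {"language": "java"})
node = Entity("k8s_node", "node-7")
deployment = Relationship(service, node, "runs_on")

# ...and business-domain entities, in the same kind of model
train = Entity("train", "IC-501")
station = Entity("station", "Roma Termini")
stop = Relationship(train, station, "stops_at")
```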




Unlike Application Performance Monitoring (APM), where solutions like AppDynamics and its competitors excel at drilling down into the application architecture and its topology, with Full Stack Observability you really get end-to-end visibility and a context shared among all the teams that collaborate on building, running and operating the business ecosystem.

New standards like OpenTelemetry make it easy to convey Metrics, Events, Logs and Traces (MELT) to a single platform from every part of the system, possibly including robots in manufacturing, GPS tracking from your supply chain, etc.
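
As an illustration, emitting a metric with the OpenTelemetry Python SDK looks roughly like the sketch below. The meter name, instrument and attributes are invented for the example, and the console exporter is used for simplicity; a real deployment would send data to a collector via an OTLP exporter.

```python
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

# Print metrics to the console; a real deployment would use an OTLP
# exporter pointing at the platform's collector instead.
reader = PeriodicExportingMetricReader(ConsoleMetricExporter())
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("supply-chain")                # invented name
delivered = meter.create_counter("shipments.delivered")  # invented instrument

# Any component - a service, an IoT gateway, a GPS tracker - can emit
# the same kind of telemetry, tagged with attributes for correlation.
delivered.add(1, {"region": "EMEA", "carrier": "acme"})
```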

All these data are filtered according to the domain model; the relevant ones feed the databases containing the domain entities and their relationships, which in turn populate the dashboards.




Those data can be matched with any other source of information that is relevant in your business domain (CRM, sales, weather forecast, logistics...) so that you can analyse and forecast the health of the business and relate it to the technologies and processes behind it. You can remediate any problem immediately because you detect the root cause easily, and even be proactive in preventing problems before they occur (or before they are perceived by end users). At the same time, you are able to spot opportunities for optimising the performance and the cost efficiency of the system.

To see Cisco's official messaging about Full Stack Observability, check the page describing the FSO Platform.

Stay tuned, interesting news is coming...


April 28, 2022

Infrastructure as Code: what's the advantage

This post describes the value provided by managing the infrastructure the same way you manage the source code of software applications, applying standard tools and best practices to the automation. The reference to infrastructure, of course, includes all cloud services incorporated in your architecture.

I explore the following topics in this post. More posts will follow with a deeper investigation, and to show the link between Infrastructure as Code (IaC) and DevOps.

  • What does Infrastructure as Code mean?
  • Is IaC a product I can buy?
  • Most common use cases.
  • From where do I start?
  • Resources to practice with Infrastructure as Code.


What does Infrastructure as Code mean?

Infrastructure as code (IaC) is the process of managing and provisioning data center environments through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools. 

The IT infrastructure managed by this process includes both physical equipment (bare-metal servers, storage and network) and virtual machines, along with their associated configuration resources. The same concept applies to public cloud resources, i.e. IaaS and PaaS services.

The definition files for Infrastructure as Code are maintained in a version control system, just like the source code of software applications. Generally, these files describe the desired state of the system, rather than a sequence of commands to be executed. This implies that you trust a component in the infrastructure, called a controller (there might be more than one), delegating all the logic and the exception handling to it.


Descriptive model, not commands.

You don't configure the individual components of the system (e.g. 20 switches, or 5 servers, or 15 virtual machines and their virtual network) one by one, in the right order, handling any error conditions and manually verifying that everything works as expected.

You simply describe to a software controller what you expect the system to look like; the controller owns the configuration of all the individual components. It knows how to contact, provision and configure the elements, and makes sure your intent is realised. If any command fails, everything is rolled back to ensure a clean state. The APIC controller in the Cisco ACI architecture plays this role, but many examples can be found among Cisco products, other vendors' offerings and open-source solutions.

It is like ordering a slice of cake, versus preparing the cake yourself following the recipe from your grandma.
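
To make the declarative idea concrete, here is a toy sketch in Python (not any real controller's API): you state the desired configuration once, and a reconcile loop owned by the controller makes the actual state converge to it.

```python
# Toy illustration of the desired-state pattern; the switch object and its
# methods are hypothetical, not a real controller API.
desired_state = {"vlans": {10, 20, 30}}   # VLANs that must exist everywhere

def reconcile(switch, desired):
    """Compare actual vs desired state and apply only the difference."""
    actual = switch.get_vlans()                # hypothetical device call
    for vlan in desired["vlans"] - actual:
        switch.create_vlan(vlan)               # add what is missing
    for vlan in actual - desired["vlans"]:
        switch.delete_vlan(vlan)               # remove what should not exist

# The controller would run reconcile() for every managed device,
# rolling back or retrying on failure.
```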

In other architectures you don't have a centralised controller, but the programmability of the individual targets and the API they expose allow for remote, automated management that is still much better than using the command line interface or any GUI offered by the device. One script could update the configuration of dozens of devices at the same time, e.g. adding a VLAN to all the switches in the network, as sketched below.
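
Here is what such a script could look like in Python; the REST endpoint, payload and credentials are invented for the example, since every device family exposes its own API schema (e.g. RESTCONF/YANG models on modern network operating systems).

```python
import requests

SWITCHES = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # dozens, in real life
VLAN = {"id": 42, "name": "app-tier"}             # invented payload

for ip in SWITCHES:
    # Hypothetical endpoint: the actual URL and schema depend on the device.
    resp = requests.post(
        f"https://{ip}/api/v1/vlans",
        json=VLAN,
        auth=("admin", "secret"),
        verify=False,      # lab only; use proper certificates in production
        timeout=10,
    )
    resp.raise_for_status()
    print(f"{ip}: VLAN {VLAN['id']} configured")
```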


Treat infrastructure like software (source control, single source of truth).

The input files for this process are text files, in different formats depending on the tool you use. They might contain variables, whose values are defined externally to make the template reusable (e.g. via environment variables, databases, or systems designed to keep secrets, like Vault). I use the word template for Ansible playbooks, Terraform plans, etc.
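
As a tiny illustration of externalized variables (the variable names are invented), the same template fragment can be rendered for different environments just by changing the values passed in from outside:

```python
import os
from string import Template

# A fragment of a reusable template; values come from outside (environment
# variables here, but a database or a secrets manager would work the same).
template = Template("""\
tenant: $TENANT
vlan_id: $VLAN_ID
""")

rendered = template.substitute(
    TENANT=os.environ.get("TENANT", "dev"),     # invented variable names
    VLAN_ID=os.environ.get("VLAN_ID", "42"),
)
print(rendered)
```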

In any case these are text files, like the files that contain the source code of a software application. And they can be treated the same way: stored in a versioning system, edited collaboratively, subject to role-based access control, retrieved and built automatically by a pipeline orchestrator. 

When you adopt this approach, the latest validated version of the system configuration is stored in the versioning system. You can consider that the single source of truth, rather than the current configuration of the system (which might be corrupted by uncontrolled manual changes, made intentionally or by mistake, or consciously applied a long time ago for a reason that nobody remembers today). The last committed version in the repository is documented (including the tests that it passed) and ready to be applied again to reset the system, in case you need to fix a configuration drift, to clone the environment, or for other use cases that require consistency.


Provision and configure entire environments

One example is creating clones of a complete environment, including computing, network and storage resources, to deploy an application in the different phases of the release process. Even with different sizing, being generated from a single template (or blueprint) ensures they are identical in every configuration aspect that influences the behaviour of the deployed applications.
There will be no surprises due to the missing configuration of a firewall port, a datastore or a VLAN trunk: consistency is guaranteed, and troubleshooting is reduced.


Ensure idempotence

Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application (you can find the complete definition here). Well-designed APIs ensure that repeated calls with the same input will not alter the state of the system; this is good because, in case of a retry, you don't risk creating duplicate resources or other trouble.
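
A minimal sketch of the same idea at script level, with a hypothetical API client: an "ensure" operation checks the current state before creating anything, so running it twice leaves the system exactly as running it once.

```python
def ensure_vlan(client, vlan_id, name):
    """Idempotent operation: create the VLAN only if it does not exist yet.

    The client object and its methods are hypothetical. Calling this twice
    with the same input leaves the system in the same state as calling it
    once: no duplicate resources, no errors on retry.
    """
    if client.get_vlan(vlan_id) is None:
        client.create_vlan(vlan_id, name)
        return "created"
    return "already present"      # nothing to do, safe to call again
```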


Is Infrastructure as Code a product I can buy?

No: IaC is a methodology, not a product. It's a set of best practices that you can gradually adopt, and learn step by step. You can start with basic use cases, like creating a new tenant with a few associated resources, if that is a recurring activity and you want to make it faster and error-proof.

Then you will grow into more complex use cases, like creating an entire test environment on demand with all the needed resources, including services from a cloud platform. The adoption of Infrastructure as Code is not a big bang: you don't need to build complete automation, with no manual activity, in one week. You can target quick wins that validate the approach and generate momentum in the organization. The critical factor in the adoption is change management (i.e. the introduction of a new operational model and new ways of doing things), not the technology.


Of course you need supporting tools: automation frameworks like Ansible and/or Terraform (even plain scripting languages could do the job), versioning systems, collaboration systems. And your infrastructure needs to be programmable (public cloud is programmable by default), meaning that your servers, networks and storage should be managed via software controllers or, at least, expose well-documented APIs.


Common use cases for Infrastructure as Code


Environments on demand

IT admins and Operations teams receive a lot of requests from the application teams: they need a configuration change to fix a problem or to deploy a new service, they need a new test environment, they need a clone for a new tenant, etc. Most of these requests cannot be satisfied immediately because of higher priorities, or because they require the collaboration of different teams, which needs planning.

If the provisioning of the system - and some "day 2 operations" - were automated, with controlled execution of validated templates, both parties (Dev and Ops) would save time and be more satisfied... and efficient, for the benefit of the entire company.


Shared resource pool to increase efficiency

Some companies keep separate environments for each stage of a project: integration test, performance test, quality assurance, production, etc. Resources stay allocated even if they are only used - let's say - a week every three months (the interval may vary between one day and one year): only when they release a new version of the project, or have a maintenance window for deploying bug fixes.

Keeping resources allocated when they are not in use is a waste of capacity, hence a waste of money. Imagine multiplying that waste by the number of projects. But they cannot do otherwise, because of the complexity and the time required to build the different environments.

If they could - and with automation they can - recreate an identical environment, end to end, whenever required, they could dispose of each environment as soon as it's no longer in use. Knowing that they can recreate it in minutes, they would reuse the returned resources for another stage or another project.
Using a shared resource pool (computing, networking and storage) for many projects would be more efficient from a cost perspective. This applies to fixed capacity (less capex: you need to buy less hardware to satisfy all the requests) but also to pay-per-use scenarios (less opex: you dismiss resources when not needed).


Disaster Recovery

In case you need to repurpose existing or new resources to recover from a disaster, recreating a clone of the system from a single source of truth is much faster and safer. Generating the new infrastructure from the same blueprint that created the old one makes sure they are identical.


Blueprints and Compliance

Subject matter experts from every technology domain who collaborate to provision and maintain a system could, instead of being engaged every time, design and release Infrastructure as Code models once. Users (i.e. application teams or other operations teams) could then use the blueprints for self-service provisioning, without depending on the availability of the SMEs. The SMEs would save time, confident that the blueprints respect all the defined constraints and comply with the policies (no provisioning anarchy is allowed).


Auditing

Running automation scripts with standardised logging, or better, using a pipeline orchestrator for provisioning and configuring systems, would record which operations were performed, by whom, with what input and what outcome. Very useful audit information, with no effort.
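
A sketch of what such standardised logging could look like in a provisioning script; the record structure and field names are illustrative, not a specific product's audit format.

```python
import getpass
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("audit")

def run_audited(operation, inputs, action):
    """Run a provisioning step and record who did what, with which input
    and which outcome (illustrative structure)."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": getpass.getuser(),
        "operation": operation,
        "inputs": inputs,
    }
    try:
        record["outcome"] = action(**inputs)
        record["status"] = "success"
        return record["outcome"]
    except Exception as exc:
        record["status"] = "failed"
        record["error"] = str(exc)
        raise
    finally:
        audit.info(json.dumps(record, default=str))
```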


From where do I start?


Tools: Ansible and Terraform

These are the most widely used tools for automation (with or without an Infrastructure as Code approach). They are open-source, free and easy to use. Enterprise versions also exist, and in some cases you will find them very useful. But you can start practicing with the free tool and use it for years, with great advantage, if you are solely responsible for the infrastructure. In case of teamwork, you can still use the free version and dedicate some time to building your own operational model and additional tools, or you can switch to the enterprise version, which makes it easy to scale at the enterprise level.

You can download the software from the Ansible and Terraform websites, along with good documentation and reusable examples (see below). Good tutorials are also available.


Single operating tool: Cisco Intersight 

Cisco Intersight™ is a Software-as-a-Service (SaaS) hybrid cloud operations platform which delivers intelligent automation, observability, and optimization to customers for traditional and cloud-native applications and infrastructure. It supports Cisco Unified Computing System™ (Cisco UCS®) and Cisco HyperFlex™ hyperconverged infrastructure, other Intersight-connected devices, third-party Intersight-connected devices, cloud platforms and services, and other integration endpoints. Because it’s a SaaS-delivered platform, Intersight functionality increases and expands with weekly releases.

With Intersight, you get all of the benefits of SaaS delivery and full lifecycle management of distributed infrastructure and workloads across data centers, remote sites, branch offices, and edge environments. This empowers you to analyze, update, fix, and automate your environment in ways that were not previously possible. As a result, your organization can achieve significant TCO savings and deliver applications faster in support of new business initiatives.

See also Diving Deeper into Hybrid Cloud Operations with Intersight 


Resources to practice Infrastructure as Code

DevNet - Cisco's developer community, offering tutorials, sandboxes, labs and reusable assets. This is the Infrastructure as Code page at DevNet: https://developer.cisco.com/iac/

Terraform - documentation, download and tutorials at https://www.terraform.io/. The integration with Cisco Intersight is explained at https://www.hashicorp.com/resources/standardizing-hybrid-cloud-environments-with-hashicorpterraform-and-cisco-intersi  

Ansible - documentation, download and tutorials can be found at https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html

Cisco workshops on Infrastructure as Code - feel free to contact me if you're interested in participating in our free hands-on workshop (3 half-days).








October 23, 2017

Turn the lights on in your automated applications deployment - part 2

In the previous post we described the benefit of using Application Automation in conjunction with Network Analytics in the Data Centre, a Public Cloud or both. We described two solutions from Cisco that offer great value individually, and we also explained how they can multiply their power when used together in an integrated way.
This post describes a lab activity that we implemented to demonstrate the integration of Cisco Tetration (network analytics) with Cisco CloudCenter (application deployment and cloud brokerage), creating a solution that combines deep insight into the application architecture and into the network flows.
The Application Profile managed by CloudCenter is the blueprint that defines the automated deployment of a software application in the cloud (public and private). We add information to the Application Profile to automate the configuration of the Tetration Analytics components during the deployment of the application.

Deploy a new (or update an existing) Application Profile with Tetration support enabled

Intent of the lab:
To modify an existing Application Profile, or model a new one, so that Tetration is automatically configured to collect telemetry, also leveraging the automated installation of sensors.
Execution:
A Tetration Injector service is added to the application tiers to create a scope, define dedicated sensor profiles and intents, and automatically generate an application workspace to render the Application Dependency Mapping for each deployed application.
Step 1 – Edit an existing Application Profile
Cisco CloudCenter editor and the Tetration Injector service


Step 2 – Drag the Tetration Injector service into the Topology Modeler 


Cisco CloudCenter editor

Step 3 – Automate the deployment of the app: select a Tetration sensor type to be added
Tetration sensors come in two types: Deep Visibility, and Deep Visibility with Policy Enforcement. The Tetration Injector service allows you to select the type you want to deploy for this application. The deployment name will be reflected in the Tetration scope and application name.

Defining the Tetration sensor to be deployed

Two types of Tetration sensors

In addition to deploying the sensors, the Tetration Injector configures the target Tetration cluster and logs all configuration actions, leveraging the CloudCenter centralized logging capabilities.
The activity is executed by the CCO (CloudCenter Orchestrator):
CloudCenter Orchestrator

Step 4 – New resources created on the Tetration cluster
After the user has deployed the application from the CloudCenter self-service catalog, you can go to the Tetration user interface and verify that everything has been created to identify the packet flow that will come from the new application:
Tetration configuration

In addition, the software sensors (also called Agents) are recognized by the Tetration cluster:

Tetration agents

Tetration agents settings


Tetration Analytics – Application Dependency Mapping

An application workspace has been created automatically for the deployed application, through the Tetration API: it shows the communication among all the endpoints and the processes in the operating system that generate and receive the network flows.
The following interactive maps are generated as soon as the network packets, captured by the sensors when the application is used, are processed in the Tetration cluster.
The Cisco Tetration Analytics machine learning algorithms grouped the applications based on distinctive processes and flows.
The figure below shows what the distinctive process view looks like for the web tier:
Tetration Application Dependency Mapping



The distinctive process view for the database tier: 

Tetration Application Dependency Mapping


Flow search on the deployed application:  


Detail of a specific flow from the Web tier to the DB tier: 

 Tetration deep dive on network flow
 


Terminate the application: De-provisioning
When you de-provision the software application as part of the lifecycle managed by CloudCenter (with one click), the following cleanup actions will be managed by the orchestrator automatically:
  • Turn off and delete VMs in the cloud, including the software sensors
  • Delete the application information in Tetration
  • Clear all configuration items and scopes

Conclusion

The combined use of automation (deploying both the applications and the sensors, and configuring the context in the analytics platform) and of the telemetry data processed by Tetration helps in building a security model based on zero-trust policies.
The integrated deployment enables a powerful solution for the following use cases:
  • Get communication visibility across different application components
  • Implement consistent network segmentation
  • Migrate applications across data center infrastructure
  • Unify systems after mergers and acquisitions
  • Move applications to cloud-based services
Automation limits the manual tasks of configuring, collecting data, analyzing and investigating. It makes security more pervasive and predictive, and even improves your reaction capability when a problem is detected.
Both platforms are constantly evolving and the RESTful API approach enables extreme customization in order to accommodate your business needs and implement features as they get released.
The upcoming Cisco Tetration Analytics release – 2.1.1 – will bring new data ingestion modes like ERSPAN, Netflow and neighborhood graphs to validate and assure policy intent on software sensors.
You can learn more from the sources linked in this post, but feel free to add your comments here or contact us for direct support if you want to evaluate how this solution applies to your business and technical requirements.

Credits

This post is co-authored with a colleague of mine, Riccardo Tortorici.
He is the real geek: he created the excellent lab for the integration that we describe here; I just took notes from his work.


October 20, 2017

Turn the lights on in your automated applications deployment - part 1

A very common goal for software designers and security administrators is to get to a Secure Zero-Trust model in an Application-Centric world.
They absolutely need to avoid malicious or accidental outages, data leaks and performance degradation. However, this can sometimes be very difficult to achieve, due to the complexity of distributed architectures and the coexistence of many different software applications in the modern shared IT environment.
Two very important steps in the right direction can be Visibility and Automation. In this blog, we will see how the combination of two Cisco software solutions can contribute towards achieving this goal.
This is the description of a lab activity we implemented to show the advantage of integrating Cisco Tetration Analytics (providing network analytics) with Cisco CloudCenter (application deployment and cloud brokerage), creating a really powerful solution that combines deep insight into the application architecture and into the network flows.

Telemetry from the Data Center


Tetration provides telemetry data for your applications

Cisco Tetration Analytics captures telemetry from every packet and every flow, and delivers pervasive real-time and historical visibility across your data center, providing a deep understanding of application dependencies and interactions. You can learn more here: http://cs.co/9003BvtPB.
Main use cases for Tetration are:
  • Pervasive Visibility
  • Security
  • Forensics/Troubleshooting, Single Source of Truth
The architecture of Tetration Analytics is made of a big data analytics cluster and two types of sensors, hardware-based and software-based: sensors sit either in the switches (hw) or in the servers (sw).
Data is collected, stored and processed through a high-performance customized Hadoop cluster, which represents the very inner core of the architecture. The software sensors collect the metadata from each packet's header as packets leave or enter the hosts. In addition, they also collect process information, such as the user ID associated with the process and OS characteristics.

Tetration high level architecture



Tetration can be deployed today in the Data Center or in the cloud (AWS). The choice of the best placement depends on whether you have more deployments on cloud or on premises.
Thanks to the knowledge obtained from the data, you can create zero-trust policies based on white lists and enforce them across every physical and virtual environment. By observing the communication among all the endpoints, you can define exactly who is allowed to contact whom (a white list, where everything else is denied by default). This applies to both Virtual Machines and Physical Servers (bare metal), including your applications running in the public cloud.
As an example, one of your database servers should only be accessed by the application servers running the business logic for that specific application, and by the monitoring and backup tools: no one else. These policies can be enforced by Tetration itself or exported to generate policies in an existing environment (e.g. Cisco ACI).
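
Conceptually, the white list boils down to something like the toy Python model below (not Tetration's actual policy format): flows are denied by default and permitted only if explicitly listed.

```python
# Toy model of a zero-trust white list: everything not explicitly
# allowed is denied by default. Roles and ports are invented examples.
ALLOWED_FLOWS = {
    ("app-server", "db-server", 3306),   # business logic -> database
    ("monitoring", "db-server", 3306),   # monitoring tool
    ("backup",     "db-server", 3306),   # backup tool
}

def is_allowed(src_role, dst_role, port):
    """Default deny: a flow is permitted only if it is on the white list."""
    return (src_role, dst_role, port) in ALLOWED_FLOWS

assert is_allowed("app-server", "db-server", 3306)
assert not is_allowed("random-host", "db-server", 3306)  # denied by default
```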

Behavior analysis of workloads deployed everywhere

Another benefit of the network telemetry is that you have visibility on any packet and any flow at any time (you can keep up to 2 years of historical data, depending on your Tetration deployment and DC architecture) among two or more application tiers. You can detect performance degradation, i.e. increasing latency between two application tiers, and see the overall status of any complex application.

How to onboard applications in Tetration Analytics
When you start collecting information from the network and the servers into the analytics cluster, you need to give it a context. Every flow needs to be associated with applications, tenants, etc., so that you can give it business significance.
This can be done through the user interface or through the API of the Tetration cluster, matching the metadata that come associated with the packet flow. Based on this context, reports and drill-down inspection will give you insight into every breath the system takes.

Automation makes Deployment of Software Applications secure and compliant

The lifecycle of Software Applications generally spans different organizations within IT, spreading responsibility and making it hard to ensure quality (including security) and end-to-end visibility.
This is where Cisco CloudCenter comes in. It is a solution for two main use cases:
  • modeling the automated deployment of a software stack (creating a template or blueprint for deployments)
  • brokering cloud services for your applications (different resource pools offered from a single self-service catalog). You can consume IaaS and PaaS services from any private and public cloud, with a portable automation that frees you from lock-in to a specific cloud provider.
Cisco CloudCenter: one solution for all clouds



Integration of Automated Deployment and Network Analytics

It is important to note that both platforms are very open and come with significant support for integration APIs. Using them jointly means benefitting from the visibility and the automation capabilities of each product:
CloudCenter
  • Application architecture awareness (the blueprint for the deployment is created by the software architect)
  • Operating System visibility (version, patches, modules and monitoring)
  • Automation of all configuration actions, both local (in the server) and external (in the cloud environment)
Tetration Analytics
  • Application Dependency Mapping, driven by the observation of all communication flows
  • Awareness of Network nodes behavior, including defined policies and deviations from the baseline
  • Not just sampling: metadata for every single packet in the Data Center are stored and can be processed at any time

The table below shows how each engine provides additional value to the other one:
Leveraging the integration between the two solutions allows a feedback loop between application design and operations, providing compliance, continuous improvement and delivery of quality services to the business.

Consequently, all the following Tetration Analytics use cases are made easier if all the setup is automated by CloudCenter, with the advantage of being cloud agnostic:
Cisco Tetration use cases


Of course, one of the most tangible results of this end-to-end visibility and policy enforcement is security.
More details on the integration between CloudCenter and Tetration Analytics are described in the second part of this post, where we demonstrate how easy it is to automate the deployment of software sensors along with the application, as well as to prepare the analytics cluster to welcome the telemetry data.

Credits

This post is co-authored with a colleague of mine, Riccardo Tortorici.
He is the real geek: he created the excellent lab for the integration that we describe here; I just took notes from his work.
