Open-source News

Why organizations need site reliability engineers

opensource.com - Tue, 06/28/2022 - 15:00
By Robert Kimani

In this final article that concludes my series about best practices for effective site reliability engineering (SRE), I cover some of the practical applications of site reliability engineering.

There are some significant differences between software engineering and systems engineering.

Software engineering
  • Focuses on software development and engineering only.
  • Involves writing code to create useful functionality.
  • Time is spent on developing repeatable and reusable software that can be easily extended.
  • Has problem-solving orientation.
  • Software engineering aids the SRE.
Systems engineering
  • Focuses on the whole system, including software, hardware, and any associated technologies.
  • Time is spent on building, analyzing, and managing solutions.
  • Deals with defining characteristics of a system and feeds requirements to software engineering.
  • Has systems-thinking orientation.
  • Systems engineering enables the SRE.

The site reliability engineer (SRE) utilizes both software engineering and systems engineering skills, and in doing so adds value to an organization.

Because the SRE team runs production systems, an SRE is well placed to produce the most impactful tools for managing and automating manual processes. Software gets built faster when an SRE is involved, because most of the time the SRE creates software for their own use. And because most SRE tasks are automated, which entails a lot of coding, the role brings a healthy mix of development and operations, which is great for site reliability.

Finally, an SRE enables an organization to scale rapidly and automatically, whether scaling up or scaling down.
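The scaling decision itself is often a small piece of automated logic. As an illustrative sketch only (the function name, target utilization, and replica bounds below are hypothetical, not from this series), a proportional autoscaler can look like this in Python:

```python
import math

def desired_replicas(current, cpu_utilization, target=0.6, min_r=2, max_r=20):
    """Estimate the replica count that brings average CPU utilization
    back to the target, clamped to safe lower and upper bounds."""
    needed = math.ceil(current * cpu_utilization / target)
    return max(min_r, min(max_r, needed))

print(desired_replicas(4, 0.9))  # overloaded: scale up to 6
print(desired_replicas(4, 0.2))  # underused: scale down to the floor of 2
```

Clamping to `min_r` and `max_r` is what keeps automated scaling both safe and fast: the system reacts quickly but can never scale to zero or run away.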

SRE and DevSecOps

An SRE helps build effective end-to-end monitoring systems using logs, metrics, and traces. An SRE also enables fast, effective, and reliable rollbacks, as well as automatic scaling of infrastructure up or down as needed. These capabilities are especially valuable during a security breach.
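One concrete shape such monitoring can take is an automated SLO check over collected latency metrics. The sketch below is a hypothetical example (the 300 ms threshold and the p99 percentile are made-up values, not from the article):

```python
def p99_within_slo(latency_samples_ms, slo_ms=300):
    """Check whether the 99th-percentile latency stays within the SLO.
    A real system would pull these samples from a metrics backend."""
    samples = sorted(latency_samples_ms)
    index = max(0, int(len(samples) * 0.99) - 1)
    return samples[index] <= slo_ms

# 99 fast requests and one slow outlier: the p99 still meets the SLO.
print(p99_within_slo([100] * 99 + [500]))  # True
```

A check like this can gate an automated rollback: when it starts returning False after a release, the pipeline reverts to the previous version.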

With the advent of cloud and container-based architectures, data processing pipelines have become a prominent component in IT architectures. An SRE helps configure the most restrictive access to data processing pipelines.


Finally, an SRE helps develop tools and procedures for handling incidents. While most of these incidents concern IT operations and reliability, the same practices easily extend to security. For example, DevSecOps integrates development, security, and operations with a heavy emphasis on automation: it's a field where development, security, and operations teams work together to support and maintain an organization's applications and infrastructure.

Designing SRE and pre-production computing environments

A pre-production or non-production environment is an environment used by an SRE to develop, deploy, and test.

The non-production environment is the testing ground for automation. But it's not just application code that requires a non-production environment: any associated automated processes, particularly the ones an SRE develops, require one too. Most organizations have more than one pre-production environment, and at least one of them should resemble production as closely as possible, because the closer the resemblance, the greater the confidence in releases. In many cases it's not possible to replicate production data, but you should still make the non-production environments match production as closely as you can.

Pre-production computing and the SRE

An SRE helps spin up identical application-serving environments using automation and specialized tools. This is essential: with scripts and tools developed by SREs, you can spin up a non-production environment in a matter of seconds.
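One way such a tool can work is by deriving every non-production environment from a single production-like specification, so environments stay identical except for deliberate overrides. The function and the registry values below are hypothetical, purely for illustration:

```python
def make_env_spec(name, base_spec, overrides=None):
    """Derive an environment spec from a production-like base,
    changing only the fields that must differ."""
    spec = dict(base_spec)
    spec.update(overrides or {})
    spec["name"] = name
    return spec

# Hypothetical production spec; the image name and values are made up.
prod = {"image": "registry.example.com/app:1.4.2", "replicas": 6}
staging = make_env_spec("staging", prod, {"replicas": 2})
print(staging["image"], staging["replicas"])  # same image, fewer replicas
```

Because the staging spec is computed from the production spec rather than maintained by hand, drift between the two environments is far less likely.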

A smart SRE treats configuration as code to ensure fast implementation of testing and deployment. Through the use of automated CI/CD pipelines, application releases and hot fixes can be made seamlessly.
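Treating configuration as code also means configuration can be validated like code, before anything is deployed. Here is a minimal sketch of such a pipeline gate; the required keys are invented for the example, not taken from any particular tool:

```python
REQUIRED_KEYS = {"image", "replicas", "health_check_path"}

def validate_config(config):
    """Fail fast in the CI/CD pipeline if a deployment config is incomplete."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if config["replicas"] < 1:
        raise ValueError("replicas must be >= 1")
    return True

config = {"image": "app:2.0", "replicas": 3, "health_check_path": "/healthz"}
print(validate_config(config))  # True
```

Running a check like this as an early pipeline stage turns a broken deployment into a failed build, which is much cheaper to fix.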

Finally, by developing effective monitoring solutions, an SRE helps to ensure the reliability of a pre-production computing environment.

One field closely related to pre-production computing is inner loop development.

Executing on inner loop development

Picture two loops, an inner loop and an outer loop, forming the DevOps loop. In the inner loop, you code, build, run, and debug. This cycle mostly happens in a developer's workstation or some other non-production environment.

Once the code is ready, it is moved to the outer loop, where the process starts with code review, build, deploy, integration tests, security and compliance, and finally pre-production release.

Many of the processes in the outer loop and inner loop are automated by the SRE.

Image by: Robert Kimani, CC BY-SA 4.0

SRE and inner loop development

The SRE speeds up inner loop development by providing tools for fast, iterative, containerized development. Many of the tools an SRE develops revolve around container automation and container orchestration, using tools such as Podman, Docker, and Kubernetes, or platforms like OpenShift.

An SRE also develops debugging aids for crashes, such as Java heap dump and thread dump analysis tools.

Overall value of SRE

By utilizing both systems engineering and software engineering, an SRE organization delivers impactful solutions. An SRE helps to implement DevSecOps where development, security, and operations intersect with a primary focus on automation.

SRE principles help organizations get the most out of pre-production environments: with the tools and processes an SRE organization delivers, anyone can spin up a non-production environment in a matter of seconds. An SRE organization also enables efficient inner loop development by building and providing the necessary tools. In summary, SRE delivers:

  • Improved end user experience: It's all about ensuring that users of applications and services get the best experience possible. This includes uptime: applications and services should be up, running, and healthy at all times.
  • Fewer outages: Minimizing or eliminating outages is better for users and developers alike.
  • Automation: As the saying goes, you should always be trying to automate yourself out of the job that you are currently performing manually.
  • Scale: In the age of cloud-native applications and containerized services, massive automated scalability is critical for an SRE to scale up or down in a safe and fast manner.
  • Integrated: The principles and processes that the SRE organization embraces can be, and in many cases should be, extended to other parts of the organization, as with DevSecOps.

The SRE is a valuable component in an efficient organization. As demonstrated over the course of this series, the benefits of SRE affect many departments and processes.

Further reading

Below are some GitHub links to a few of my favorite SRE resources:



This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

Top PHP Hardening Security Tips for Linux Servers

Tecmint - Tue, 06/28/2022 - 13:03
The post Top PHP Hardening Security Tips for Linux Servers first appeared on Tecmint: Linux Howtos, Tutorials & Guides .

It’s a no-brainer that PHP is one of the most used server-side scripting languages. It makes sense for an attacker to find various ways by which they can manipulate PHP as it is


Intel Begins Publishing Habana Labs Gaudi2 Linux Driver Code

Phoronix - Tue, 06/28/2022 - 06:36
Last month for their Vision conference, Intel announced the Habana Labs Gaudi2 accelerator. Gaudi2 is the second-generation offering from Intel-owned Habana Labs for training and inference. The open-source Linux kernel driver support for Gaudi2 is now getting underway along with accompanying open-source user-space software stack support...

Git 2.37 Released With Sparse Index Feature Now Ready For Widespread Use

Phoronix - Tue, 06/28/2022 - 06:19
Git 2.37 is out today as the latest feature update to this widely-used, distributed revision control system...

GCC-Rust Feedback Sought - Possibly Aiming For Upstream In GCC 13

Phoronix - Tue, 06/28/2022 - 02:10
The Rust-GCC front-end that allows Rust code to be compiled with the GNU Compiler Collection (GCC) could possibly be upstreamed for next year's GCC 13 compiler release but not necessarily complete at that stage. In any case, it's good seeing progress on Rust-GCC as an alternative to Rust's official LLVM-based compiler...

Linux "RADV" Radeon Driver Gets A Big Speed-Up For 16-bit FidelityFX Super Resolution

Phoronix - Tue, 06/28/2022 - 01:30
Merged today to Mesa 22.2 was a four month old merge request for the open-source Radeon Vulkan driver "RADV" that significantly improves the 16-bit FidelityFX Super Resolution (FSR) performance...

Open Programmable Infrastructure: 1+1=3

The Linux Foundation - Mon, 06/27/2022 - 21:31

At last week’s Open Source Summit North America, Robin Ginn, Executive Director of the OpenJS Foundation, relayed a principle her mentor taught: “1+1=3”. No, this isn’t ‘new math,’ it is demonstrating the principle that, working together, we are more impactful than working apart. Or, as my wife and I say all of the time, teamwork makes the dream work.

This principle is really at the core of open source technology. Turns out it is also how I look at the Open Programmable Infrastructure project. 

Stepping back a bit, as “the new guy” around here, I am still constantly running across projects where I want to dig in more and understand what it does, how it does it, and why it is important. I had that very thought last week as we launched another new project, the Open Programmable Infrastructure Project. As I was reading up on it, they talked a lot about data processing units (DPUs) and infrastructure processing units (IPUs), and I thought, I need to know what these are and why they matter. In the timeless words of The Bobs, “What exactly is it you do here?” 

What are DPUs/IPUs? 

First – and this is important – they are basically the same thing, they just have different names. Here is my oversimplified explanation of what they do.

In most personal computers, you have a separate graphics processing unit (GPU) that helps the central processing unit (CPU) handle the tasks related to processing and displaying graphics. It offloads that work from the CPU, allowing the CPU to spend more time on the tasks it does best. So, working together, they can achieve more than each can separately.

Servers powering the cloud also have CPUs, but they run other tasks that can consume tremendous computing power, such as data encryption or network packet management. Offloading these tasks to separate processors enhances the performance of the whole system, as each processor focuses on what it does best.

In other words, 1+1=3.

DPUs/IPUs are highly customizable

While separate processing units have been around for some time, like your PC’s GPU, their functionality was primarily dedicated to a particular task. DPUs/IPUs, by contrast, combine multiple offload capabilities that are highly customizable through software. That means a hardware manufacturer can ship these units out and each organization can use software to configure them according to its specific needs. And they can do this on the fly.

Core to the cloud and its continued advancement and growth is the ability to quickly and easily create and dispose of the “hardware” you need. It wasn’t too long ago that if you wanted a server, you spent thousands of dollars on one, built all kinds of infrastructure around it, and hoped it was what you needed at the time. Now, pretty much anyone can quickly set up a virtual server in a matter of minutes for virtually no initial cost.

DPUs/IPUs bring this same type of flexibility to your own datacenter because they can be configured to be “specialized” with software rather than having to literally design and build a different server every time you need a different capability. 

What is Open Programmable Infrastructure (OPI)?

OPI is focused on utilizing open software and standards, as well as frameworks and toolkits, to allow for the rapid adoption and use of DPUs/IPUs. The OPI Project is both hardware and software companies coming together to establish and nurture an ecosystem to support these solutions. It “seeks to help define the architecture and frameworks for the DPU and IPU software stacks that can be applied to any vendor’s hardware offerings. The OPI Project also aims to foster a rich open source application ecosystem, leveraging existing open source projects, such as DPDK, SPDK, OvS, P4, etc., as appropriate.”

In other words, competitors are coming together to agree on a common, open ecosystem they can build together and innovate on top of, separately. They are living out 1+1=3.

I, for one, can’t wait to see the innovation.

A special thanks to Yan Fisher of Red Hat for helping me understand open programmable infrastructure concepts. He and his colleague, Kris Murphy, have a more technical blog post on Red Hat’s blog. Check it out. 

For more information on the OPI Project, visit their website and start contributing at https://github.com/opiproject/opi.  


The post Open Programmable Infrastructure: 1+1=3 appeared first on Linux Foundation.

Intel Releases OSPRay Studio v0.11 Visualization / Ray-Tracing App

Phoronix - Mon, 06/27/2022 - 21:30
One of Intel's many wonderful open-source projects for creators is OSPRay Studio that is their interactive visualization and ray-tracing application built atop their OSPRay portable ray-tracing engine. OSPRay Studio has been making steady progress since its 2020 debut and out today is their newest update...
