Open-source News

Open Programmable Infrastructure: 1+1=3

The Linux Foundation - Mon, 06/27/2022 - 21:31

At last week’s Open Source Summit North America, Robin Ginn, Executive Director of the OpenJS Foundation, relayed a principle her mentor taught her: “1+1=3”. No, this isn’t ‘new math’; it demonstrates the principle that, working together, we are more impactful than working apart. Or, as my wife and I say all the time, teamwork makes the dream work.

This principle is really at the core of open source technology. Turns out it is also how I look at the Open Programmable Infrastructure project. 

Stepping back a bit, as “the new guy” around here, I am still constantly running across projects where I want to dig in and understand what they do, how they do it, and why they are important. I had that very thought last week as we launched another new project, the Open Programmable Infrastructure Project. As I read up on it, I saw a lot of talk about data processing units (DPUs) and infrastructure processing units (IPUs), and I thought, I need to know what these are and why they matter. In the timeless words of The Bobs, “What exactly is it you do here?”

What are DPUs/IPUs? 

First – and this is important – they are basically the same thing; they just have different names. Here is my oversimplified explanation of what they do.

In most personal computers, you have one or more separate graphics processing units (GPUs) that help the central processing unit (CPU) handle the tasks related to processing and displaying graphics. They offload that work from the CPU, allowing it to spend more time on the tasks it does best. So, working together, they can achieve more than each can separately.

Servers powering the cloud also have CPUs, but they have other tasks that can consume tremendous computing power, such as data encryption or network packet management. Offloading these tasks to separate processors enhances the performance of the whole system, as each processor focuses on what it does best.

In other words, 1+1=3.

DPUs/IPUs are highly customizable

While separate processing units have been around for some time, like your PC’s GPU, their functionality was primarily dedicated to a particular task. DPUs/IPUs, by contrast, combine multiple offload capabilities that are highly customizable through software. That means a hardware manufacturer can ship these units out, and each organization can use software to configure them according to its specific needs. And they can do this on the fly.

Core to the cloud and its continued advancement and growth is the ability to quickly and easily create and dispose of the “hardware” you need. It wasn’t too long ago that if you wanted a server, you spent thousands of dollars on one, built all kinds of infrastructure around it, and hoped it was what you needed at the time. Now, pretty much anyone can set up a virtual server in a matter of minutes for virtually no initial cost.

DPUs/IPUs bring this same type of flexibility to your own datacenter because they can be configured to be “specialized” with software rather than having to literally design and build a different server every time you need a different capability. 

What is Open Programmable Infrastructure (OPI)?

OPI is focused on utilizing open software and standards, as well as frameworks and toolkits, to allow for the rapid adoption and use of DPUs/IPUs. The OPI Project is both hardware and software companies coming together to establish and nurture an ecosystem to support these solutions. It “seeks to help define the architecture and frameworks for the DPU and IPU software stacks that can be applied to any vendor’s hardware offerings. The OPI Project also aims to foster a rich open source application ecosystem, leveraging existing open source projects, such as DPDK, SPDK, OvS, P4, etc., as appropriate.”

In other words, competitors are coming together to agree on a common, open ecosystem they can build together and then innovate on top of, separately. They are living out 1+1=3.

I, for one, can’t wait to see the innovation.

A special thanks to Yan Fisher of Red Hat for helping me understand open programmable infrastructure concepts. He and his colleague, Kris Murphy, have a more technical blog post on Red Hat’s blog. Check it out. 

For more information on the OPI Project, visit their website and start contributing at https://github.com/opiproject/opi.  


The post Open Programmable Infrastructure: 1+1=3 appeared first on Linux Foundation.

Intel Releases OSPRay Studio v0.11 Visualization / Ray-Tracing App

Phoronix - Mon, 06/27/2022 - 21:30
One of Intel's many wonderful open-source projects for creators is OSPRay Studio, their interactive visualization and ray-tracing application built atop the OSPRay portable ray-tracing engine. OSPRay Studio has been making steady progress since its 2020 debut, and out today is its newest update...

Firefox 102 Available With Transform Streams, Geoclue On Linux

Phoronix - Mon, 06/27/2022 - 20:45
The Firefox 102.0 release is now available for download ahead of its stable release announcement tomorrow...

The Performance Cost To A Proposed Fedora 37 CFLAGS/CXXFLAGS Change

Phoronix - Mon, 06/27/2022 - 18:30
Last week brought a Fedora 37 change proposal to improve the profiling and debugging of Fedora packages, but with possible performance costs. The suggested change is about adding "-fno-omit-frame-pointer" to the default CFLAGS/CXXFLAGS when building packages so the frame pointer is always available, improving the debugging/profiling of the stock Fedora packages. Unfortunately, it can come with significant performance costs, as these benchmarks show.

F2FS File-System Driver Preparing A Low-Memory Mode

Phoronix - Mon, 06/27/2022 - 17:12
Google engineers are working on the notion of "memory modes" for the Flash-Friendly File-System (F2FS) with the intent of introducing a "low memory" mode for storage devices that would alter its behavior. Presumably Google is working on this new F2FS feature for low-end Android devices...

Make a temporary file on Linux with Bash

opensource.com - Mon, 06/27/2022 - 15:00
Seth Kenlon - Mon, 06/27/2022 - 03:00

When programming in the Bash scripting language, you sometimes need to create a temporary file. For instance, you might need to have an intermediary file you can commit to disk so you can process it with another command. It's easy to create a file such as temp or anything ending in .tmp. However, those names are just as likely to be generated by some other process, so you could accidentally overwrite an existing temporary file. And besides that, you shouldn't have to expend mental effort coming up with names that seem unique. The mktemp command on Fedora-based systems and tempfile on Debian-based systems are specially designed to alleviate that burden by making it easy to create, use, and remove unique files.

Create a temporary file

Both mktemp and tempfile create a temporary file as their default action and print the name and location of the file as output:

$ tempfile
/tmp/fileR5dt6r

$ mktemp
/tmp/tmp.ojEfvMaJEp

Unless you specify a different path, the system places temporary files in the /tmp directory. For mktemp, use the -p option to specify a path:

$ mktemp -p ~/Demo
/home/tux/Demo/tmp.i8NuhzbEJN

For tempfile, use the --directory or -d option:

$ tempfile --directory ~/Demo/
/home/sek/Demo/fileIhg9aX

Find your temporary file

The problem with using an auto-generated temporary file is that you have no way of knowing what its name is going to be. That's why both commands return the generated file name as output. When you're working interactively in a terminal such as Konsole, GNOME Terminal, or rxvt, you can simply read the filename displayed on your screen and use it to interact with the file.

However, if you're writing a script, there's no way for you to intervene by reading the name of the file and using it in the following commands.

The authors of mktemp and tempfile thought of that problem, and there's an easy fix. Both commands send the generated name to a stream called stdout. You can capture stdout by setting a variable to the result of a command launched in a subshell:

$ TMPFILE=$(mktemp -p ~/Demo)

$ echo $TMPFILE
/home/tux/Demo/tmp.PjP3g6lCq1

Use $TMPFILE when referring to the file, and it's the same as interacting directly with the file itself.
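For example, here's a minimal sketch of how a script might capture the name, work with the file, and clean it up on exit (the ~/Demo path and the processing steps are purely illustrative):

#!/usr/bin/env bash
# Minimal sketch: create a temporary file, use it, and remove it when the
# script exits. The directory and processing steps are only illustrations.

TMPFILE=$(mktemp -p "$HOME/Demo") || exit 1

# Remove the temporary file when the script exits, even on error.
trap 'rm -f "$TMPFILE"' EXIT

# Use the file like any other: write intermediate data to it...
printf '%s\n' "intermediate data" > "$TMPFILE"

# ...and read it back in a later step.
wc -l "$TMPFILE"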

Create a temporary directory with mktemp

You can also use the mktemp command to create a directory instead of a file:

$ mktemp --directory -p ~/Demo/
/home/tux/Demo/tmp.68ukbuluqI

$ file /home/tux/Demo/tmp.68ukbuluqI
/home/tux/Demo/tmp.68ukbuluqI: directory

Customize temporary names

Sometimes you might want an element of predictability in even your pseudo-randomly generated filenames. You can customize the names of your temporary files with both commands.

With mktemp, you can add a suffix to your filename:

$ mktemp -p ~/Demo/ --suffix .mine
/home/tux/Demo/tmp.dufLYfwJLO.mine

With tempfile, you can set a prefix and a suffix:

$ tempfile --directory ~/Demo/ \
--prefix tt_ --suffix .mine
/home/tux/Demo/tt_0dfu5q.mine

Tempfile as touch

You can also set a custom name with tempfile:

$ tempfile --name not_random
not_random

When you use the --name option, it's absolute, ignoring all other forms of customization. In fact, it even ignores the --directory option:

$ tempfile --directory ~/Demo \
--prefix this_is_ --suffix .all \
--name not_random_at
not_random_at

In a way, tempfile can be a substitute for touch and test because it refuses to create a file that already exists:

$ tempfile --name example.txt
open: file exists

The tempfile command isn't installed on all Linux distributions by default, so you must ensure that it exists before you use it as a hack around test in a script.
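If you do rely on it, a simple guard like the following sketch can fall back to mktemp when tempfile isn't available:

# Prefer tempfile if it's installed; otherwise fall back to mktemp.
if command -v tempfile >/dev/null 2>&1; then
    TMPFILE=$(tempfile)
else
    TMPFILE=$(mktemp)
fi
echo "Using $TMPFILE"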

Install mktemp and tempfile

GNU Coreutils includes the mktemp command. Major distributions include Coreutils by default (it's the same package that contains chmod, cut, du, and other essential commands).

The debianutils package includes the tempfile command and is installed by default on most Debian-based distributions and on Slackware Linux.

Wrap up

Temporary files are convenient because there's no confusion about whether they're safe to delete. They're temporary, meant to be used as needed and discarded without a second thought. Use them when you need them, and clear them out when you're done.



What is distributed consensus for site reliability engineering?

opensource.com - Mon, 06/27/2022 - 15:00
Robert Kimani - Mon, 06/27/2022 - 03:00

In my previous article, I discussed how to enforce best practices within your infrastructure. A site reliability engineer (SRE) is responsible for reliability, first and foremost, and enforcing policies that help keep things running is essential.

Distributed consensus

With microservices, containers, and cloud native architectures, almost every application today is going to be a distributed application. Distributed consensus is a core technology that powers distributed systems.

Distributed consensus is a protocol for building reliable distributed systems. You cannot rely on "heartbeats" (signals from your hardware or software to indicate that they're operating normally) because network failures are inevitable.

There are some inherent problems to highlight when it comes to distributed systems. Hardware will fail, and nodes in a distributed system can fail at random. This is one of the important assumptions you have to make before you design a distributed system. Network outages are also inevitable; you cannot guarantee 100% network connectivity. Finally, you need a consistent view of data from any node within a distributed system.

According to the CAP theorem, a distributed system cannot simultaneously have all of these three properties:

  1. Consistency: Every node sees the same view of the data. Without consistency, two different nodes in a distributed system may return different data.
  2. Availability: Refers to the availability of data at each node.
  3. Partition tolerance: Refers to tolerance to network failures (which results in network partitions).

Because network partitions cannot be avoided in practice, a distributed system ends up trading off between consistency and availability.

Over the years, several protocols have been developed in the area of distributed consensus, including Paxos, Raft, and Zab.

Paxos, for instance, was one of the original solutions to the distributed consensus problem. In the Paxos algorithm, nodes in a distributed system send a series of proposals with a unique sequence number. When the majority of processes in the distributed system accept the proposal, that proposal wins, and the sender generates a commit message. The key here is that the majority of the processes accept the proposal.

The strict sequence numbering of proposals is how Paxos avoids duplicating data and how it solves the problem of ordering.
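This is not Paxos itself, but a toy Bash sketch of the quorum rule it relies on: a proposal is committed only when a majority of acceptors accept it (the acceptor replies here are simulated):

#!/usr/bin/env bash
# Toy illustration of the majority (quorum) rule: commit a proposal only if
# more than half of the acceptors accept it. The replies are simulated.

responses=("accept" "accept" "reject" "accept" "reject")
total=${#responses[@]}
accepted=0

for r in "${responses[@]}"; do
    [ "$r" = "accept" ] && accepted=$((accepted + 1))
done

if [ "$accepted" -gt $((total / 2)) ]; then
    echo "Quorum reached ($accepted/$total): commit the proposal"
else
    echo "No quorum ($accepted/$total): proposal fails"
fi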

Open source distributed consensus

You don't have to reinvent the wheel by writing your own distributed consensus code. There are many open source implementations already available; ZooKeeper is one of the most popular, and others include Consul and etcd.
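For instance, with etcd's etcdctl client (assuming etcd is installed and an endpoint is reachable; exact output can vary by version), storing and reading a shared key looks roughly like this:

$ etcdctl put /config/feature-flag "on"
OK

$ etcdctl get /config/feature-flag
/config/feature-flag
on

Behind that simple interface, etcd uses the Raft protocol to replicate the key to a majority of its members before acknowledging the write.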

Designing autoscaling

Autoscaling is a process by which the number of servers in a server farm is automatically increased or decreased based on the load. The term "server farm" is used here to refer to any pool of servers in a distributed system. These servers are commonly behind a load balancer, as described in my previous article.

There are numerous benefits to autoscaling, but here are the 4 major ones:

  1. Reduce cost by running only the required servers. For instance, you can automatically remove servers from your pool when the load is relatively low.
  2. Flexibility to run less time-sensitive workloads during low-traffic periods, which is another variation of automatically reducing the number of servers.
  3. Automatically replace unhealthy servers (most cloud vendors provide this functionality).
  4. Increase reliability and uptime of your services.

While there are numerous benefits, there are some inherent problems with autoscaling:

  1. A dependent back-end server or a service can get overwhelmed when you automatically expand your pool of servers. The service that you depend on, for example, the remote service your application connects to, may not be aware of the autoscaling activity of your service.
  2. Software bugs can trigger the autoscaler to expand the server farm abruptly. This is a dangerous situation that can happen in production systems. A configuration error, for instance, can cause the autoscaler to uncontrollably start new instances.
  3. Load balancing may not be intelligent enough to consider new servers. For example, a server newly added to the pool usually requires a warm-up period before it can actually receive traffic from the load balancer. When the load balancer isn't fully aware of this, it can inundate the new server before it's ready.

Autoscaling best practices

Scaling down is more sensitive and dangerous than scaling up. You must fully test all scale-down scenarios.

Ensure the back-end systems, such as your database, remote web service, and so on, or any external systems that your applications depend on can handle the increased load. You may be automatically adding new servers to your pool to handle increased load, but the remote service that your application depends on may not be aware of this.

You must configure an upper limit on the number of servers. This is important. You don't want the autoscaler to uncontrollably start new instances.

Have a "kill switch" you can use to easily stop the autoscaling process. If you hit a bug or configuration error that causes the autoscaler to behave erratically, you need a way to stop it.

3 systems that act in concert for successful autoscaling

There are three systems to consider for successful implementation of autoscaling:

  1. Load balancing: One of the crucial benefits of load balancing is the ability to minimize latency by routing traffic to the location closest to the user.
  2. Load shedding: Rather than trying to accept every incoming request, you process only the ones you can and drop the excess traffic. Examples of load shedding systems are Netflix Zuul and Envoy.
  3. Autoscaling: Based on load, your infrastructure automatically scales up or down.

When you're designing your distributed applications, think through all the situations your applications might encounter. You should clearly document how load balancing, load shedding, and autoscaling work together to handle all situations.

Implementing effective health checks

The core job of a load balancer is to direct traffic to a set of back-end servers. A load balancer needs to know which servers are alive and healthy in order to successfully direct traffic to them. You can use health checks to determine which servers are healthy and can receive requests.

Here's what you need to learn about effective health checks:

  • Simple: Monitor for the availability of a back-end server.
  • Content verification: Send a small request to the back-end server and examine the response. For instance, you could look for a particular string or response code.
  • Failure: Your server may be up, but the application listening on a particular port may be down. Or the port may be listening, but it may not be accepting new connections. A health check must be intelligent enough to identify a problematic back-end server.

Health checks with sophisticated content verification can increase network traffic. Find the balance between a simple health check (a simple ping, for instance) and a sophisticated content-based health check.

In general, for a web application, hitting the home page of a web server and looking for a proper HTML response can serve as a reasonable health check. These kinds of checks can be automated using the curl command.
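For example, a minimal sketch of such a check might look like this (the URL and the expected string are placeholders for your own service):

#!/usr/bin/env bash
# Minimal content-verification health check: fetch the home page and look for
# an expected string. The URL and string are placeholders.

URL="http://backend.example.com/"
EXPECTED="Welcome"

if curl --silent --fail --max-time 5 "$URL" | grep -q "$EXPECTED"; then
    echo "healthy"
    exit 0
else
    echo "unhealthy"
    exit 1
fi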

Whenever you do a postmortem analysis of an outage, review how quickly your load balancer marked a server as up or down. This information is very useful for tuning your health check policies.

Stay healthy

Keeping your infrastructure healthy takes time and attention, but done correctly it's an automated process that keeps your systems running smoothly. There's yet more to an SRE's job to discuss, but those are topics for my next article.


