Open-source News

Linux 5.19 Heavy On Intel Power Management & Thermal Improvements

Phoronix - Wed, 05/25/2022 - 21:30
The power management, ACPI, and thermal control updates are ready for Linux 5.19. This cycle there is a lot of PM/thermal work on the Arm side as usual, while Intel also continues with many changes, from new hardware support to improved handling of laptops overheating while in S0ix...

Migrate databases to Kubernetes using Konveyor

opensource.com - Wed, 05/25/2022 - 20:21
By Yasu Katsuno

Kubernetes Database Operator is useful for building scalable database servers as a database (DB) cluster. But because you have to create new artifacts expressed as YAML files, migrating existing databases to Kubernetes requires a lot of manual effort. This article introduces a new open source tool named Konveyor Tackle-DiVA-DOA (Data-intensive Validity Analyzer-Database Operator Adaptation), which automatically generates deployment-ready artifacts for database operator migration through data-centric code analysis.

What is Tackle-DiVA-DOA?

Tackle-DiVA-DOA (DOA for short) is an open source data-centric database configuration analytics tool in Konveyor Tackle. It imports target database configuration files (such as SQL and XML) and generates a set of Kubernetes artifacts for migrating the database to operators such as the Zalando Postgres Operator.

[Image: Yasuharu Katsuno and Shin Saito, CC BY-SA 4.0]

DOA finds and analyzes the settings of an existing system that uses a database management system (DBMS). It then generates Kubernetes and Postgres operator manifests (YAML files) for deploying an equivalent DB cluster.

[Image: Yasuharu Katsuno and Shin Saito, CC BY-SA 4.0]
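
As a concrete illustration, the generated postgres.yaml plausibly looks something like the following minimal sketch. The values are inferred from the cluster output shown later in this article (four instances, PostgreSQL 13, a 1Gi volume); the exact manifest DOA emits may differ.

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: trading-app-db        # cluster name as shown by kubectl get below
spec:
  teamId: "trading-app"
  numberOfInstances: 4        # yields the four trading-app-db-* pods
  postgresql:
    version: "13"
  volume:
    size: 1Gi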


Database settings of an application consist of DBMS configurations, SQL files, DB initialization scripts, and program code that accesses the DB.

  • DBMS configurations include DBMS parameters, cluster configuration, and credentials. DOA stores the configuration in postgres.yaml and the secrets in secret-db.yaml if you need custom credentials.
     
  • SQL files are used to define and initialize tables, views, and other entities in the database. These are stored in the Kubernetes ConfigMap definition cm-sqls.yaml (a sketch appears after this list).
     
  • Database initialization scripts typically create databases and schema and grant users access to the DB entities so that SQL files work correctly. DOA tries to find initialization requirements from scripts and documents or guesses if it can't. The result will also be stored in a ConfigMap named cm-init-db.yaml.
     
  • Code to access the database, such as host and database name, is in some cases embedded in program code. These are rewritten to work with the migrated DB cluster.
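
To make the mapping above concrete, here is a minimal sketch of what cm-sqls.yaml could contain. The ConfigMap name matches the one created during deployment later in this article; the data key and SQL body are illustrative assumptions.

apiVersion: v1
kind: ConfigMap
metadata:
  name: trading-app-cm-sqls   # as created by create.sh below
data:
  create.sql: |               # key name is an assumption
    CREATE TABLE account ( /* columns omitted */ );
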
Tutorial

DOA is expected to run within a container and comes with a script to build its image. Make sure Docker and Bash are installed in your environment, and then run the build script as follows:

$ cd /tmp
$ git clone https://github.com/konveyor/tackle-diva.git
$ cd tackle-diva/doa
$ bash util/build.sh

$ docker image ls diva-doa
REPOSITORY   TAG       IMAGE ID       CREATED        SIZE
diva-doa     2.2.0     5f9dd8f9f0eb   14 hours ago   1.27GB
diva-doa     latest    5f9dd8f9f0eb   14 hours ago   1.27GB

This builds DOA and packages it as a container image. DOA is now ready to use.

Next, execute the bundled run-doa.sh wrapper script, which runs the DOA container. Specify the Git repository of the target database application. This example uses a Postgres database in the TradeApp application. Use the -o option to set the location of output files and the -i option to give the name of the database initialization script:

$ cd /tmp/tackle-diva/doa
$ bash run-doa.sh -o /tmp/out -i start_up.sh \
      https://github.com/saud-aslam/trading-app
[OK] successfully completed.

DOA creates the /tmp/out/ directory and, under it, /tmp/out/trading-app, a directory named after the target application. In this example, the application name is trading-app, the GitHub repository name. The generated artifacts (the YAML files) are placed under this application-name directory:

$ ls -FR /tmp/out/trading-app/
/tmp/out/trading-app/:
cm-init-db.yaml  cm-sqls.yaml  create.sh*  delete.sh*  job-init.yaml  postgres.yaml  test/

/tmp/out/trading-app/test:
pod-test.yaml

The prefix of each YAML file denotes the kind of resource that the file defines. For instance, each cm-*.yaml file defines a ConfigMap, and job-init.yaml defines a Job resource. At this point, secret-db.yaml is not created, and DOA uses credentials that the Postgres operator automatically generates.

Now you have the resource definitions required to deploy a PostgreSQL cluster on a Kubernetes instance. You can deploy them using the utility script create.sh, or apply them directly with kubectl:

$ cd /tmp/out/trading-app
$ bash create.sh  # or simply "kubectl apply -f ."

configmap/trading-app-cm-init-db created
configmap/trading-app-cm-sqls created
job.batch/trading-app-init created
postgresql.acid.zalan.do/diva-trading-app-db created

The Kubernetes resources are created, including postgresql (a resource of the database cluster created by the Postgres operator), service, rs, pod, job, cm, secret, pv, and pvc. For example, you can see four database pods named trading-app-db-*, because the number of database instances is defined as four in postgres.yaml.

$ kubectl get all,postgresql,cm,secret,pv,pvc
NAME                                        READY   STATUS      RESTARTS   AGE

pod/trading-app-db-0                        1/1     Running     0          7m11s
pod/trading-app-db-1                        1/1     Running     0          5m
pod/trading-app-db-2                        1/1     Running     0          4m14s
pod/trading-app-db-3                        1/1     Running     0          4m

NAME                                      TEAM          VERSION   PODS   VOLUME   CPU-REQUEST   MEMORY-REQUEST   AGE   STATUS
postgresql.acid.zalan.do/trading-app-db   trading-app   13        4      1Gi                                     15m   Running

NAME                            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/trading-app-db          ClusterIP   10.97.59.252    <none>        5432/TCP   15m
service/trading-app-db-repl     ClusterIP   10.108.49.133   <none>        5432/TCP   15m

NAME                         COMPLETIONS   DURATION   AGE
job.batch/trading-app-init   1/1           2m39s      15m

Note that the Postgres operator comes with a user interface (UI) where you can find the created cluster. To open the UI in a browser, you need to export the endpoint URL. If you use minikube, run:

$ minikube service postgres-operator-ui

A browser window then opens automatically and shows the UI.

[Image: Yasuharu Katsuno and Shin Saito, CC BY-SA 4.0]

Now you can access the database instances using a test pod. DOA also generated a pod definition for testing.
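
For reference, the generated test pod definition (test/pod-test.yaml) plausibly looks something like this sketch. The pod name and service hostname match the output in this article, but the image, command, and the operator-generated Secret name are assumptions:

apiVersion: v1
kind: Pod
metadata:
  name: trading-app-test
spec:
  containers:
  - name: test
    image: postgres:13                # assumption: any image shipping psql works
    command: ["sleep", "infinity"]    # keep the pod alive for interactive use
    env:
    - name: DB_HOST
      value: trading-app-db           # the cluster Service created above
    - name: PGPASSWORD
      valueFrom:
        secretKeyRef:
          # the Postgres operator generates this Secret; its exact name is an assumption
          name: postgres.trading-app-db.credentials.postgresql.acid.zalan.do
          key: password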

$ kubectl apply -f /tmp/out/trading-app/test/pod-test.yaml # creates a test Pod
pod/trading-app-test created
$ kubectl exec trading-app-test -it -- bash  # login to the pod

The database hostname and the credentials needed to access the DB are injected into the pod, so you can use them to connect. Execute psql meta-commands to show all tables and views in the database:

# printenv DB_HOST; printenv PGPASSWORD
(values of the variables are shown)

# psql -h ${DB_HOST} -U postgres -d jrvstrading -c '\dt'
             List of relations
 Schema |      Name      | Type  |  Owner  
--------+----------------+-------+----------
 public | account        | table | postgres
 public | quote          | table | postgres
 public | security_order | table | postgres
 public | trader         | table | postgres
(4 rows)

# psql -h ${DB_HOST} -U postgres -d jrvstrading -c '\dv'
                List of relations
 Schema |         Name          | Type |  Owner  
--------+-----------------------+------+----------
 public | pg_stat_kcache        | view | postgres
 public | pg_stat_kcache_detail | view | postgres
 public | pg_stat_statements    | view | postgres
 public | position              | view | postgres
(4 rows)

After the test is done, log out from the pod and remove the test pod:

# exit
$ kubectl delete -f /tmp/out/trading-app/test/pod-test.yaml

Finally, delete the created cluster using a script:

$ bash delete.sh

Welcome to Konveyor Tackle world!

To learn more about application refactoring, you can check out the Konveyor Tackle site, join the community, and access the source code on GitHub.

Konveyor Tackle-DiVA-DOA helps database engineers easily migrate database servers to Kubernetes.

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

TUXEDO Aura 15 Gen2 - AMD Ryzen 5000 Series Powered, Linux Laptop

Phoronix - Wed, 05/25/2022 - 19:00
Bavarian PC vendor TUXEDO Computers, which specializes in Linux pre-loaded notebooks and desktop computers, recently launched their Aura 15 Gen2 laptop, focused on being an "affordable business allrounder" and powered by AMD Ryzen 5000 series processors with integrated Vega graphics to make for a nice open-source driver experience. TUXEDO sent over the Aura 15 Gen2 for a round of testing and here's a look at this Ubuntu Linux laptop's performance and capabilities.

GCC 13 Compiler Finally Adds Support For AMD GFX90A "Aldebaran"

Phoronix - Wed, 05/25/2022 - 18:28
It was over a year ago that AMD added the "GFX90A" target to their LLVM AMDGPU compiler back-end. This week, GFX90A support for the GNU toolchain was added to the GNU Compiler Collection for the GCC 13 release, which is not due out until next year...

Nearly Half A Million Lines Of New Graphics Driver Code Sent In For Linux 5.19

Phoronix - Wed, 05/25/2022 - 18:02
David Airlie this morning sent in the Direct Rendering Manager (DRM) subsystem updates for the Linux 5.19 merge window. Most notable with the DRM display/graphics driver updates for this next kernel version is a lot of work on Intel Arc Graphics DG2/Alchemist in getting that support ready plus initial Raptor Lake enablement, as well as AMD preparing for next-generation CDNA Instinct products and RDNA3 Radeon RX 7000 series graphics cards...

Stratis 3.1 Released For Red Hat's Linux Storage Management Solution

Phoronix - Wed, 05/25/2022 - 17:20
It's been five years already since Red Hat started Stratis as a configuration daemon built atop LVM and XFS, aiming to provide advanced storage functionality in user-space akin to what is offered by the advanced Btrfs and ZFS file-systems...

ARMv9 Scalable Matrix Extension Support Lands In Linux 5.19

Phoronix - Wed, 05/25/2022 - 16:40
The 64-bit Arm (AArch64) architecture changes have been merged into the in-development Linux 5.19 kernel...

Linux's RNG Code Continues Modernization Effort With v5.19

Phoronix - Wed, 05/25/2022 - 16:35
Security researcher Jason Donenfeld, known as the founder of the WireGuard project, has recently been focused on modernizing the Linux kernel's random number generator (RNG/random) code. With the Linux 5.19 kernel there is yet more work landing...

Improve network performance with this open source framework

opensource.com - Wed, 05/25/2022 - 15:00
By Hifza Khalid

In the age of high-speed internet, most large information systems are structured as distributed systems with components running on different machines. The performance of these systems is generally assessed by their throughput and response time. When performance is poor, debugging these systems is challenging due to the complex interactions between different subcomponents and the possibility of the problem occurring at various places along the communication path.

On the fastest networks, the performance of a distributed system is limited by the host's ability to generate, transmit, process, and receive data, which is in turn dependent on its hardware and configuration. What if it were possible to tune the network performance of a distributed system using a repository of network benchmark runs and suggest a subset of hardware and OS parameters that are the most effective in improving network performance?

To answer this question, our team used Pbench, a benchmarking and performance analysis framework developed by the performance engineering team at Red Hat. This article will walk step by step through our process of determining the most effective methods and implementing them in a predictive performance tuning tool.

What is the proposed approach?

Given a dataset of network benchmark runs, we propose the following steps to solve this problem.

  1. Data preparation: Gather the configuration information, workload, and performance results for the network benchmark; clean the data; and store it in a format that is easy to work with
     
  2. Finding significant features: Choose an initial set of OS and hardware parameters and use various feature selection methods to identify the significant parameters
     
  3. Develop a predictive model: Develop a machine learning model that can predict network performance for a given client and server system and workload
     
  4. Recommend configurations: Given the user's desired network performance, suggest a configuration for the client and the server with the closest performance in the database, along with data showing the potential window of variation in results
     
  5. Evaluation: Determine the model's effectiveness using cross-validation, and suggest ways to quantify the improvement due to configuration recommendations

We collected the data for this project using Pbench. Pbench takes as input a benchmark type with its workload, performance tools to run, and hosts on which to execute the benchmark, as shown in the figure below. It outputs the benchmark results, tool results, and the system configuration information for all the hosts.

[Image: Hifza Khalid, CC BY-SA 4.0]

Out of the different benchmark scripts that Pbench runs, we used data collected using the uperf benchmark. Uperf is a network performance tool that takes the description of the workload as input and generates the load accordingly to measure system performance.

Data preparation

There are two disjoint sets of data generated by Pbench. The configuration data from the systems under test is stored in a file system. The performance results, along with the workload metadata, are indexed into an Elasticsearch instance. The mapping between the configuration data and the performance results is also stored in Elasticsearch. To interact with the data in Elasticsearch, we used Kibana. Using both of these datasets, we combined the workload metadata, configuration data, and performance results for each benchmark run.
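
As a sketch of what retrieving the indexed results can look like with the official Python Elasticsearch client (the URL, index name, and query fields here are illustrative assumptions, not Pbench's actual schema):

from elasticsearch import Elasticsearch

# Connect to the Elasticsearch instance that indexes the Pbench results
# (URL and index name are assumptions).
es = Elasticsearch("http://localhost:9200")
resp = es.search(index="pbench.uperf-results",
                 query={"match": {"benchmark.name": "uperf"}},
                 size=1000)
runs = [hit["_source"] for hit in resp["hits"]["hits"]]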

Finding significant features

To select an initial set of hardware specifications and operating system configurations, we used performance-tuning configuration guides and feedback from experts at Red Hat. The goal of this step was to start working with a small set of parameters and refine it with further analysis. The set was based on parameters from almost all major system subcomponents, including hardware, memory, disk, network, kernel, and CPU.

Once we selected the preliminary set of features, we used one of the most common dimensionality-reduction techniques to eliminate redundant parameters: removing parameters with constant values. While this step eliminated some of the parameters, given the complexity of the relationship between system information and performance, we resolved to use advanced feature selection methods.

Correlation-based feature selection

Correlation is a common measure used to find the association between two features. Two features have a high correlation if they are linearly dependent. If the two features increase together, their correlation approaches +1; if one increases while the other decreases, it approaches -1. If the two features are uncorrelated, their correlation is close to 0.

We used the correlation between the system configuration and the target variable to further cut down insignificant features. To do so, we calculated the correlation between each configuration parameter and the target variable and eliminated all parameters whose absolute correlation was less than 0.1, a commonly used threshold for identifying uncorrelated pairs.
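
A minimal sketch of this filtering step, assuming the combined runs live in a pandas DataFrame with one numeric column per configuration parameter plus a target column (column names are illustrative):

import pandas as pd

def correlated_features(df: pd.DataFrame, target: str, threshold: float = 0.1):
    """Keep parameters whose absolute correlation with the target is >= threshold."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    return corr[corr.abs() >= threshold].index.tolist()

# e.g.: keep = correlated_features(runs_df, target="throughput")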

Feature-selection methods

Since correlation does not imply causation, we needed additional feature-selection methods to extract the parameters affecting the target variables. We could choose between wrapper methods like recursive feature elimination and embedded methods like Lasso (Least Absolute Shrinkage and Selection Operator) and tree-based methods.

We chose to work with tree-based embedded methods for their simplicity, flexibility, and low computational cost compared to wrapper methods. These methods have feature selection built in. Among tree-based methods, we had three options: a classification and regression tree (CART), Random Forest, and XGBoost.
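
As a sketch of how this tree-based selection, combined by the union described next, might look with scikit-learn and XGBoost (default estimator settings and SelectFromModel's default mean-importance threshold are assumptions):

from sklearn.feature_selection import SelectFromModel
from sklearn.tree import DecisionTreeRegressor        # CART
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

def significant_features(X, y, feature_names):
    """Union of the features kept by CART, Random Forest, and XGBoost."""
    selected = set()
    for estimator in (DecisionTreeRegressor(), RandomForestRegressor(), XGBRegressor()):
        mask = SelectFromModel(estimator).fit(X, y).get_support()
        selected |= {name for name, keep in zip(feature_names, mask) if keep}
    return sorted(selected)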

We calculated our final set of significant features for the client and server systems by taking a union of the results received from the three tree-based methods, as shown in the following table.

Parameter                    Client/Server  Description
Advertised auto-negotiation  client         Whether the link advertised auto-negotiation
CPU(s)                       server         Number of logical cores on the machine
Network speed                server         Speed of the ethernet device
Model name                   client         Processor model
rx_dropped                   server         Packets dropped after entering the computer stack
Model name                   server         Processor model
System type                  server         Virtual or physical system

Develop predictive model

For this step, we used the Random Forest (RF) prediction model since it is known to perform better than CART and is also easier to visualize.

Random Forest (RF) builds multiple decision trees and merges them to get a more stable and accurate prediction. It builds the trees the same way CART does, but to ensure that the trees are uncorrelated to protect each other from their individual errors, it uses a technique known as bagging. Bagging uses random samples from the data with replacement to train the individual trees. Another difference between trees in a Random Forest and a CART decision tree is the choice of features considered for each split. CART considers every possible feature for each split. However, each tree in a Random Forest picks only from a random subset of features. This leads to even more variation among the Random Forest trees.

An RF model was constructed separately for each of the two target variables.
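
A sketch of this step, assuming a feature matrix X and the two measured targets as arrays (scikit-learn default hyperparameters are an assumption):

from sklearn.ensemble import RandomForestRegressor

def train_models(X, y_throughput, y_latency):
    """Train one Random Forest per target variable, as described above."""
    throughput_model = RandomForestRegressor(random_state=0).fit(X, y_throughput)
    latency_model = RandomForestRegressor(random_state=0).fit(X, y_latency)
    return throughput_model, latency_model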

Recommend configurations

For this step, given desired throughput and response time values, along with the workload of interest, our tool searches through the database of benchmark runs to return the configuration with the performance results closest to what the user requires. It also returns the standard deviation for various samples of that run, suggesting potential variation in the actual results.
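
A simplified sketch of this lookup over the database of runs (the column names and the unweighted Euclidean distance are assumptions; the real tool also filters by workload and reports the per-run standard deviation):

import pandas as pd

def closest_run(runs: pd.DataFrame, desired_throughput: float, desired_latency: float):
    """Return the benchmark run whose measured results are closest to the desired values."""
    distance = ((runs["throughput"] - desired_throughput) ** 2 +
                (runs["latency"] - desired_latency) ** 2) ** 0.5
    return runs.loc[distance.idxmin()]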

Evaluation

To evaluate our predictive model, we used repeated K-Fold cross-validation, a popular technique for obtaining an accurate estimate of a predictive model's effectiveness.

To evaluate the predictive model with a dataset of 9,048 points, we used k equal to 10 and repeated the cross-validation method three times. The accuracy was calculated using the two metrics given below; a sketch of this setup follows the list.

  • R2 score: The proportion of the variance in the dependent variable that is predictable from the independent variable(s). Its best possible value is 1, and it can be negative when a model performs worse than simply predicting the mean.
  • Root mean squared error (RMSE): The square root of the average squared difference between the estimated values and the actual values.
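
A sketch of this evaluation with scikit-learn, using the same k=10 and three repeats (the scoring setup is an assumption):

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score

def evaluate(X, y):
    """Repeated 10-fold cross-validation with 3 repeats; returns mean R2 and mean RMSE."""
    cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=0)
    r2 = cross_val_score(RandomForestRegressor(), X, y, scoring="r2", cv=cv)
    rmse = -cross_val_score(RandomForestRegressor(), X, y,
                            scoring="neg_root_mean_squared_error", cv=cv)
    return r2.mean(), rmse.mean()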

Based on the above two criteria, the results for the predictive model with throughput and latency as target variables are as follows:

  • Throughput (trans/sec):
    • R2 score: 0.984
    • RMSE: 0.012
  • Latency (usec):
    • R2 score: 0.930
    • RMSE: 0.025
What does the final tool look like?

We implemented our approach in the tool shown in the following figure. The tool is implemented in Python. It takes as input a CSV dataset containing information about benchmark runs, including client and server configuration, workload, and the desired values for latency and throughput. The tool uses this information to predict the latency and throughput results for the user's client-server system. It then searches through the database of benchmark runs to return the configuration with performance results closest to what the user requires, along with the standard deviation for that run. The standard deviation is part of the dataset and is calculated from repeated samples of one iteration or run.

[Image: Hifza Khalid, CC BY-SA 4.0]

What were the challenges with this approach?

While working on this problem, there were several challenges that we addressed. The first major challenge was gathering benchmark data, which required learning Elasticsearch and Kibana, the two industrial tools used by Red Hat to index, store, and interact with Pbench data. Another difficulty was dealing with the inconsistencies in data, missing data, and errors in the indexed data. For example, workload data for the benchmark runs was indexed in Elasticsearch, but one of the crucial workload parameters, runtime, was missing. For that, we had to write extra code to access it from the raw benchmark data stored on Red Hat servers.

Once we overcame the above challenges, we spent a large chunk of our effort trying out almost all the feature selection techniques available and figuring out a representative set of hardware and OS parameters for network performance. It was challenging to understand the inner workings of these techniques, their limitations, and their applications and analyze why most of them did not apply to our case. Because of space limitations and shortage of time, we did not discuss all of these methods in this article.

Use Pbench to predict throughput and latency for specific workloads.

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

SODA Foundation Prioritizes Backup and Restore for Containers, Introduces Object Data Management Across Cloud Providers

The Linux Foundation - Wed, 05/25/2022 - 13:00
Welcomes SoftBank Group to its member ranks

TOKYO, May 25, 2022 – The SODA Foundation, which hosts the SODA Open Data Framework (ODF) for data mobility from edge to core to cloud, today announced two new open source projects: Kahu and Como. Kahu streamlines data protection for Kubernetes and its application data, and Como is a virtual data lake project to enable seamless access to data stored in different clouds. The SODA Foundation also welcomes SoftBank Group as an end-user supporter and key collaboration partner on the Como project.

According to the 2021 SODA Data and Storage Trends Report, two of the top challenges in managing data in containers and cloud-native environments are availability (46%) and management tools (38%).  In direct response to the report findings, the SODA Foundation community collaborated to introduce new tooling options through the Kahu project to improve backup and restore practices critical to data availability.  Furthermore, as enterprises become more data-driven and data growth for some enterprises can exceed 10PB per year, object data management offered by the Como Project will play an important role in performance and scalability requirements for cloud-native environments.

“Data collection, management, and consumption is becoming the new competitive battlefield in IT”, said Steven Tan, chairman, SODA Foundation. “We’re excited to announce Kahu and Como as the latest advances in open source data management and storage. Our 28 members are also excited to welcome the engineers and open source community within SoftBank Group to the Foundation.” 

“Data is the fuel of our global digital economy and harnessing its power requires collaboration on a massive scale”, said Kuniyoshi Suzuki, Senior Director, Cloud Engineering, SoftBank Group. “SoftBank is excited to be joining a community of open source software developers focused on enabling improvements in data storage, recovery, and retention in cloud environments. We look forward to collaborating with the SODA Foundation and its members, while contributing to the future of this important community.”

New Open Source Releases

In addition to the announcement of Kahu and Como projects, the SODA Foundation also announced the:

  • Release of SODA Framework Madagascar v1.7.0: Formerly Open Data Framework (ODF), SODA Framework comprises independent projects initiated by the community to solve common data and storage problems faced by end users. It includes:
    • Terra: a universal SDS controller for connecting storage to Kubernetes, OpenStack, and VMware environments.
    • Delfin: a performance monitor for heterogeneous storage infrastructure in a single pane of glass.
    • Strato: a multi-cloud data controller using a common S3-compatible interface to connect to cloud storage.
    • Kahu: a new project to streamline data protection for Kubernetes and application data.
  • Expansion of its Eco Project Initiative with the introduction of more open source projects:
    • DAOS: a software-defined object store designed from the ground up for massively distributed Non-Volatile Memory (NVM), providing features such as transactional non-blocking I/O, advanced data protection with self-healing on top of commodity hardware, end-to-end data integrity, fine-grained data control, and elastic storage.
    • YIG: extends Minio backend storage by aggregating multiple Ceph clusters to form a massive storage resource pool that can easily scale up to exabyte (EB) levels with minimal performance disruption.
    • CubeFS: a cloud-native storage platform used as the underlying storage infrastructure for online applications, database or data processing services, and machine learning jobs orchestrated by Kubernetes.
    • Karmada: a Kubernetes management system that enables organizations to run cloud-native applications across multiple Kubernetes clusters and clouds, with no changes to their applications.
    • SBK: an open source software framework for performance benchmarking of any storage system.

Conferences and Survey

  • SODACODE: This week, developers from around the world will participate in SODACODE 2022, the Data & Storage Hackathon, on May 25-26. The first-of-its-kind coding event organized by the SODA Foundation is open to developers at all levels, from beginner to advanced. The hackathon will conclude with project demonstrations, presentation sessions, panel discussions, and an award ceremony for the hackathon winners.
  • Trend Survey: The SODA Foundation will release its second-annual Data and Storage Trends Survey on June 30, 2022.
  • SODACON: a technical conference held by the SODA Foundation, taking place this year in Yokohama, Japan on December 7, 2022. The conference will bring together industry leaders, developers, and end users to present and discuss the most recent innovations, trends, and concerns, as well as practical challenges and solutions, in the field of Data and Storage Management in the era of cloud-native, IoT, big data, machine learning, and more.

Additional Resources

  • Join the SODA Foundation
  • Attend SODACODE 2022 – The Data & Storage Hackathon
  • Read the 2021 Data and Storage Trends Report

About the SODA Foundation

Previously OpenSDS, the SODA Foundation is part of the Linux Foundation and includes both open source software and standards to support the increasing need for data autonomy. SODA Foundation Premiere members include China Unicom, Fujitsu, Huawei, NTT Communications and Toyota Motor Corporation. Other members include China Construction Bank Fintech, Click2Cloud, GMO Pepabo, IIJ, MayaData, LinBit, Scality, Sony, Wipro and Yahoo Japan.

Media Contact

info@sodafoundation.io

###

The Linux Foundation has registered trademarks and uses trademarks. For a list of trademarks of The Linux Foundation, please see its trademark usage page: www.linuxfoundation.org/trademark-usage. Linux is a registered trademark of Linus Torvalds.

The post SODA Foundation Prioritizes Backup and Restore for Containers, Introduces Object Data Management Across Cloud Providers appeared first on Linux Foundation.
