Open-source News

Linux 6.4 Looking To Drop The SLOB Memory Allocator

Phoronix - Wed, 03/15/2023 - 19:00
A patch series is proposing that the SLOB memory allocator be removed from the Linux 6.4 kernel this summer...

Open3D 0.17 Released For Open-Source 3D Data Processing

Phoronix - Wed, 03/15/2023 - 18:38
Open3D, an open-source library for 3D data processing spanning 3D machine learning tasks to adaptable viewing of 3D data, is out with its newest feature release...

SPECFEM3D 4.0 Released With AMD HIP GPU Support

Phoronix - Wed, 03/15/2023 - 18:19
The latest notable high performance computing (HPC) open-source project adding mainline support for AMD HIP with ROCm is SPECFEM3D...

GCC 13 Adds RISC-V T-Head Vendor Extension Collection

Phoronix - Wed, 03/15/2023 - 18:11
Being merged today into the GCC 13 compiler is the set of T-Head vendor extensions to the RISC-V ISA. This set of vendor extensions is designed to augment the RISC-V ISA and provide faster and more energy efficient capabilities...

The Qt Group Launches Qt Insight

Phoronix - Wed, 03/15/2023 - 17:51
The Qt Group, the company behind the Qt open-source toolkit, has launched Qt Insight as its newest software offering. However, Qt Insight does not appear to be open-source and is marketed as a SaaS product...

How to Install Firefox on RHEL and Debian Systems

Tecmint - Wed, 03/15/2023 - 15:20
The post How to Install Firefox on RHEL and Debian Systems first appeared on Tecmint: Linux Howtos, Tutorials & Guides.

In most modern Linux distributions, the latest version of Firefox comes preinstalled from the default distribution package manager and configured as the default browser. In this article, we will explain other ways...


How to set up your own open source DNS server

opensource.com - Wed, 03/15/2023 - 15:00
By Amar1723

The Domain Name System (DNS) associates a domain name (like example.com) with an IP address (like 93.184.216.34). This is how your web browser knows where in the world to look for data when you enter a URL or when a search engine returns a URL for you to visit. DNS is a great convenience for internet users, but it's not without drawbacks. For instance, paid advertisements appear on web pages because your browser naturally uses DNS to resolve where those ads "live" on the internet. Similarly, software that tracks your movement online is often enabled by services resolved over DNS. You don't want to turn off DNS entirely because it's very useful. But you can run your own DNS service so you have more control over how it's used.

I believe it's vital that you run your own DNS server so you can block advertisements and keep your browsing private, away from providers attempting to analyze your online interactions. I've used Pi-hole in the past and still recommend it today. However, lately, I've been running the open source project AdGuard Home on my network. I found that it has some unique features worth exploring.
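The blocking described above works because a DNS server you control can answer queries however it likes. Here is a minimal, purely illustrative sketch of the "sinkhole" idea behind Pi-hole and AdGuard Home; the blocklist entries and upstream table are invented stand-ins, not either project's actual data or code:

```python
# Sketch of DNS sinkholing: names on a blocklist resolve to an unroutable
# address, everything else is answered normally. The entries below are
# hypothetical examples for illustration only.

BLOCKLIST = {"ads.example.net", "tracker.example.org"}  # hypothetical blocklist
UPSTREAM = {"example.com": "93.184.216.34"}             # stand-in for a real upstream lookup

def resolve(name: str) -> str:
    """Return an IP for name, sinkholing anything on the blocklist."""
    if name in BLOCKLIST:
        return "0.0.0.0"  # unroutable: the ad or tracker simply fails to load
    return UPSTREAM.get(name, "NXDOMAIN")

print(resolve("ads.example.net"))  # → 0.0.0.0
print(resolve("example.com"))      # → 93.184.216.34
```

A real resolver speaks the DNS wire protocol and forwards unknown names to an upstream server, but the core decision, answer honestly or sinkhole, is this simple.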

AdGuard Home

Of the open source DNS options I've used, AdGuard Home is the easiest to set up and maintain. You get many DNS resolution solutions, such as DNS over TLS, DNS over HTTPS, and DNS over QUIC, within one single project.

You can set up AdGuard Home as a container or as a native service using a single script:

$ curl -s -S -L -o install.sh https://raw.githubusercontent.com/AdguardTeam/AdGuardHome/master/scripts/install.sh

Look at the script so you understand what it does. Once you're comfortable with the install process, run it:

$ sh ./install.sh


Some of my favorite features of AdGuard Home:

  • An easy admin interface

  • Ad and malware blocking with the AdGuard block list

  • Options to configure each device on your network individually

  • Forced safe search on specific devices

  • HTTPS for the admin interface, so your remote interactions with it are fully encrypted

I find that AdGuard Home saves me time. Its block lists are more robust than those on Pi-hole, and you can quickly and easily configure it to run DNS over HTTPS.

No more malware

Malware is unwanted content on your computer. It's not always directly dangerous to you, but it may enable dangerous activity by third parties. That's not what the internet was ever meant to do. I believe you should host your own DNS service to keep your internet history private and out of the hands of known trackers such as Microsoft, Google, and Amazon. Try AdGuard Home on your network.

Take control of your internet privacy by running your own DNS server with the open source AdGuard Home project.

Image by: Opensource.com

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

Synchronize databases more easily with open source tools

opensource.com - Wed, 03/15/2023 - 15:00
By Li Zongwen

Change Data Capture (CDC) uses Server Agents to record insert, update, and delete activity applied to database tables. CDC provides details on changes in an easy-to-use relational format. It captures the column information and metadata needed to apply the changes to the target environment for modified rows. A change table that mirrors the column structure of the tracked source table stores this information.
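A toy illustration of the change-table idea: each captured row records the operation plus the changed columns, and replaying the rows in order reproduces the source table's state on the target. The schema and field names here are invented for illustration, not any specific CDC tool's format:

```python
# Hypothetical change table: an ordered log of captured operations.
change_table = [
    {"op": "insert", "id": 1, "data": {"name": "alice", "balance": 100}},
    {"op": "update", "id": 1, "data": {"balance": 250}},
    {"op": "insert", "id": 2, "data": {"name": "bob", "balance": 50}},
    {"op": "delete", "id": 2, "data": {}},
]

def apply_changes(target: dict, changes: list) -> dict:
    """Replay captured changes, in order, against a target keyed by primary key."""
    for row in changes:
        if row["op"] == "insert":
            target[row["id"]] = dict(row["data"])
        elif row["op"] == "update":
            target[row["id"]].update(row["data"])
        elif row["op"] == "delete":
            target.pop(row["id"], None)
    return target

print(apply_changes({}, change_table))
# → {1: {'name': 'alice', 'balance': 250}}
```

Real CDC tools stream these records continuously from the database's transaction log rather than from an in-memory list, but the replay logic on the target follows this shape.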

Capturing change data is no easy feat. However, the open source Apache SeaTunnel project is a data integration platform that provides CDC functionality, with a design philosophy and feature set that make these captures possible, with features above and beyond existing solutions.

CDC usage scenarios

Classic use cases for CDC are data synchronization and backups between heterogeneous databases. In one scenario, you might synchronize data between MySQL, PostgreSQL, MariaDB, and similar databases. In another, you could synchronize the data to a full-text search engine. With CDC, you can create backups of data based on what CDC has captured.

When designed well, the data analysis system obtains data for processing by subscribing to changes in the target data tables. There's no need to embed the analysis process into the existing system.

Sharing data state between microservices

Microservices are popular, but sharing information between them is often complicated. CDC is a possible solution. Microservices can use CDC to obtain changes in other microservice databases, acquire data status updates, and execute the corresponding logic.

Update cache

The concept of Command Query Responsibility Segregation (CQRS) is the separation of command activity from query activity. The two are fundamentally different:

  • A command writes data to a data source.
  • A query reads data from a data source.

The problem is knowing when a read event happens relative to when a write event happened, and which component bears the burden of making those events occur.

It can be difficult to update a cache. You can use CDC to obtain data update events from a database and let that control the refresh or invalidation of the cache.
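One way to picture that cache-refresh pattern: a consumer reads change events from the database log and invalidates the matching cache keys, so the next read repopulates them with fresh data. The event format and cache below are simplified stand-ins, not any real CDC engine's API:

```python
# Simplified in-memory cache keyed by "table:id".
cache = {"user:1": {"name": "alice"}, "user:2": {"name": "bob"}}

def on_change_event(event: dict) -> None:
    """Invalidate the cache entry for a row that changed upstream."""
    key = f"{event['table']}:{event['id']}"
    if event["op"] in ("update", "delete"):
        cache.pop(key, None)  # drop the stale entry; the next read repopulates it

# A hypothetical update event arrives from the CDC stream:
on_change_event({"table": "user", "op": "update", "id": 1})
print("user:1" in cache)  # → False
print("user:2" in cache)  # → True
```

Invalidation (rather than rewriting the cached value in place) keeps the consumer simple: it never needs to know how to rebuild the cached representation, only which key went stale.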

CQRS designs usually use two different storage instances to support business query and change operations. Because two stores are involved, data consistency must be ensured. You can use distributed transactions to guarantee strong consistency, at the cost of availability, performance, and scalability. Alternatively, you can use CDC to ensure eventual consistency, which has better performance and scalability at the cost of data latency, which the industry can currently keep in the range of milliseconds.

For example, you could use CDC to synchronize MySQL data to your full-text search engine, such as ElasticSearch. In this architecture, ElasticSearch searches all queries, but when you want to modify data, you don't directly change ElasticSearch. Instead, you modify the upstream MySQL data, which generates a data update event. This event is consumed by the ElasticSearch system as it monitors the database, and the event prompts an update within ElasticSearch.

In some CQRS systems, a similar method can be used to update the query view.

Pain points

CDC isn't a new concept and various existing projects implement it. For many users, though, there are some disadvantages to the existing solutions.

Single table configuration

With some CDC software, you must configure each table separately. For example, to synchronize ten tables, you need to write ten source SQLs and Sink SQLs. To perform a transform, you also need to write the transform SQL.

Sometimes, a table's configuration can be written by hand, but only when the volume is small. When the volume is large, type-mapping or parameter-configuration errors may occur, resulting in high operation and maintenance costs.

Apache SeaTunnel is an easy-to-use data integration platform that aims to solve this problem.

Schema evolution is not supported

Some CDC solutions support capturing DDL events but cannot forward them to the Sink so that it can make synchronous schema changes. Even a CDC tool that captures a DDL event may be unable to propagate it through the engine, because it cannot update the transform's type information based on the event, so the Sink cannot follow the DDL change.

Too many links

On some CDC platforms, a separate link must be used for each table being synchronized. When there are many tables, many links are required. This puts pressure on the source JDBC database and opens too many Binlog connections, which may result in repeated log parsing.

SeaTunnel CDC architecture goals

Apache SeaTunnel is an open source high-performance, distributed, massive-data integration framework. To tackle the problems that existing data integration tools' CDC functions cannot solve, the community "reinvented the wheel" to develop a CDC platform with unique features. This architectural design is based on the strengths and weaknesses of existing CDC tools.

Apache SeaTunnel supports:

  • Lock-free parallel snapshots of historical data.
  • Log heartbeat detection and dynamic table addition.
  • Sub-database, sub-table, and multi-structure table reading.
  • Schema evolution.
  • All the basic CDC functions.

Apache SeaTunnel reduces operations and maintenance costs for users and can dynamically add tables.

For example, when you want to synchronize the entire database and add a new table later, you don't need to maintain it manually, change the job configuration, or stop and restart jobs.

Additionally, Apache SeaTunnel supports reading sub-databases, sub-tables, and multi-structure tables in parallel. It also allows schema evolution: DDL events are transmitted through the engine, where the corresponding changes can be applied to the Transform and Sink.

SeaTunnel CDC current status

Currently, CDC has the basic capabilities to support the incremental and snapshot phases. It also supports MySQL for real-time and offline use. The MySQL real-time test is complete, and the offline test is coming. Schema evolution is not supported yet because it involves changes to Transform and Sink. The dynamic discovery of new tables is not yet supported, and some interfaces have been reserved for multi-structure tables.

Project outlook

As an Apache incubation project, the Apache SeaTunnel community is developing rapidly. The next community planning session has these main directions:

1. Expand and improve connector and catalog ecology

We're working to enhance many connector and catalog features, including:

  • Supporting more connectors, including TiDB, Doris, and Stripe.
  • Improving existing connectors in terms of usability and performance.
  • Supporting CDC connectors for real-time, incremental synchronization scenarios.

Anyone interested in connectors can review Umbrella.

2. Support for more data integration scenarios (SeaTunnel Engine)

There are pain points that existing engines cannot solve, such as the synchronization of an entire database, the synchronization of table structure changes, and the large granularity of task failure.

We're working to solve those issues. Anyone interested in the CDC engine should look at issue 2272.

3. Easier to use (SeaTunnel Web)

We're working to provide a web interface to make operations easier and more intuitive. Through a web interface, we will make it possible to display Catalog, Connector, Job, and related information in the form of DAG/SQL. We're also giving users access to the scheduling platform to easily tackle task management.

Visit the web sub-project for more information on the web UI.

Wrap up

Database activity often must be carefully tracked to manage changes based on activities such as record updates, deletions, or insertions. Change Data Capture provides this capability. Apache SeaTunnel is an open source solution that addresses these needs and continues to evolve to offer more features. The project and community are active and your participation is welcome.

The open source Apache SeaTunnel project is a data integration platform that makes it easy to synchronize data.

Image by: Jason Baker. CC BY-SA 4.0.

Mice Play In VR - Hackaday

Google News - Wed, 03/15/2023 - 10:00

The supply chain optimization imperative

Red Hat News - Wed, 03/15/2023 - 08:00
The pandemic brought physical supply chain issues into focus and made organizations recognize dynamic supply assurance as a critical capability for their business. Over the next decade, consumer packaged goods (CPG) and retail businesses, among others dependent on supply chains, will invest in supply chain optimization to solve key industry challenges. The desire for convenience and personalization, awareness of the environmental impact of consumption, and fears about trade disruptions and fluctuations in cost are all challenges that can be solved with more connected, agile, s...
