Open-source News

F2FS Brings Minor Improvements With Linux 6.3

Phoronix - Tue, 02/28/2023 - 22:00
While recent days have seen much talk around the new, experimental, and currently out-of-tree SSDFS file-system for NVMe ZNS drives, when it comes to modern flash-optimized Linux file-systems today, the Flash-Friendly File-System (F2FS) continues to handle that space well and has been battle-tested via deployment on Android devices and more. With Linux 6.3, F2FS continues to be refined with more fixes and other minor enhancements...

HP Dev One Production Ends For One Of The Most Interesting Linux Developer Laptops

Phoronix - Tue, 02/28/2023 - 20:00
The HP Dev One Linux laptop is now sold out and its production has ended. The HP Dev One, which launched last year, was a very interesting collaboration between HP and System76 to deliver a Linux laptop catering to developers and running Pop!_OS...

EXT4 Scores A Nice Direct I/O Performance Improvement With Linux 6.3

Phoronix - Tue, 02/28/2023 - 19:38
With the EXT4 file-system being quite mature at this stage, many kernel cycles these days see this widely-used file-system receive just bug fixes and other minor work. But for the newly-opened Linux 6.3 cycle, EXT4 is seeing a nice performance boost under certain conditions with direct I/O...

FFmpeg 6.0 Released With NVIDIA NVENC AV1, VA-API Improvements

Phoronix - Tue, 02/28/2023 - 19:15
As was expected given the FFmpeg 6.0 FOSDEM presentation earlier this month in Brussels, this open-source multimedia project is now celebrating its latest major release...

Intel ISPC 1.19 Released With Sapphire Rapids Support, Data Center GPU Max

Phoronix - Tue, 02/28/2023 - 18:57
As the latest of its open-source projects to receive optimizations for 4th Gen Xeon Scalable "Sapphire Rapids" processors, Intel has rolled out ISPC 1.19, the newest version of its Implicit SPMD Program Compiler...

3 tips to manage large Postgres databases

opensource.com - Tue, 02/28/2023 - 16:00

The relational database PostgreSQL (also known as Postgres) has grown increasingly popular, and enterprises and public sectors use it across the globe. With this widespread adoption, databases have become larger than ever. At Crunchy Data, we regularly work with databases north of 20TB, and our existing databases continue to grow. My colleague David Christensen and I have gathered some tips about managing a database with huge tables.

Big tables

Production databases commonly consist of many tables with varying data, sizes, and schemas. It's common to end up with a single huge and unruly database table, far larger than any other table in your database. This table often stores activity logs or time-stamped events and is necessary for your application or users.

Really large tables can cause challenges for many reasons, but a common one is locks. Regular maintenance on a table often requires locks, but locks on your large table can take down your application or cause a traffic jam and many headaches. I have a few tips for doing basic maintenance, like adding columns or indexes, while avoiding long-running locks.
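
When a maintenance operation does get stuck behind (or in front of) other traffic, it helps to see who is waiting on whom. A minimal sketch of spotting lock waits, using the standard pg_stat_activity view:

-- Sessions currently waiting on a lock (Postgres 9.6+)
SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';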

Adding indexes

Problem: Index creation locks the table for the duration of the creation process. If you have a massive table, this can take hours.

CREATE INDEX ON customers (last_name)

Solution: Use the CREATE INDEX CONCURRENTLY feature. This approach splits index creation into two parts: a first step that takes only a brief lock to create the index and start tracking changes immediately, minimizing blockage of the application, followed by a full build-out of the index, after which queries can start using it.

CREATE INDEX CONCURRENTLY ON customers (last_name)
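
One caveat worth knowing: if a concurrent build fails or is cancelled partway through, it leaves behind an INVALID index that takes up space but is never used. A sketch of finding and dropping it; the index name below is the one Postgres would typically auto-generate for this example, so treat it as an assumption:

-- Find any indexes left invalid by a failed concurrent build
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;

-- Drop it (also without blocking) and retry; name assumed for this example
DROP INDEX CONCURRENTLY customers_last_name_idx;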

Adding columns

Adding a column is a common request during the life of a database, but with a huge table, it can be tricky, again, due to locking.

Problem: When you add a new column with a default that calls a function, Postgres needs to rewrite the table. For big tables, this can take several hours.

Solution: Split the operation into multiple steps that have the same total effect as the single statement, but let you retain control over the timing of locks.
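
A safeguard worth adding before each step (our suggestion, not part of the recipe itself): cap how long the DDL will wait for its lock, so a blocked ALTER fails fast instead of queuing up and stalling all the traffic behind it:

-- Make the following statements fail quickly if a lock can't be acquired;
-- the two-second value is illustrative
SET lock_timeout = '2s';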

Add the column:

ALTER TABLE all_my_exes ADD COLUMN location text

Add the default:

ALTER TABLE all_my_exes ALTER COLUMN location SET DEFAULT texas()

Use UPDATE to backfill existing rows with the default:

UPDATE all_my_exes SET location = DEFAULT
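
On a huge table, that single UPDATE is itself a long transaction that holds row locks and generates bloat. A hedged sketch of batching the backfill instead, assuming the table has a primary key named id:

-- Repeat until the UPDATE reports 0 rows; batch size is illustrative
UPDATE all_my_exes
SET location = DEFAULT
WHERE id IN (
    SELECT id FROM all_my_exes
    WHERE location IS NULL
    LIMIT 10000
);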

Adding constraints

Problem: You want to add a check constraint for data validation, but the straightforward approach locks the table while it validates all of the existing data. Also, if the validation fails at any point, the whole operation rolls back.

ALTER TABLE favorite_bands ADD CONSTRAINT name_check CHECK (name = 'Led Zeppelin')

Solution: Tell Postgres about the constraint but don't validate it right away. The first step takes only a short lock and ensures that all new or modified rows will satisfy the constraint; a second pass then validates the existing data.

Tell Postgres about the constraint but do not validate the existing data yet:

ALTER TABLE favorite_bands ADD CONSTRAINT name_check CHECK (name = 'Led Zeppelin') NOT VALID

Then VALIDATE it after it's created:

ALTER TABLE favorite_bands VALIDATE CONSTRAINT name_check
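
VALIDATE CONSTRAINT takes only a SHARE UPDATE EXCLUSIVE lock, so reads and writes continue while the existing rows are checked. To confirm the constraint is now fully validated, you can query the system catalog:

-- convalidated flips to true once validation succeeds
SELECT conname, convalidated
FROM pg_constraint
WHERE conname = 'name_check';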

Hungry for more?

David Christensen and I will be in Pasadena, CA, at SCaLE's Postgres Days, March 9-10. Lots of great folks from the Postgres community will be there too. Join us!

