Open-source News

3 tips to manage large Postgres databases

opensource.com - Tue, 02/28/2023 - 16:00
3 tips to manage large Postgres databases elizabeth.chri… Tue, 02/28/2023 - 03:00

The relational database PostgreSQL (also known as Postgres) has grown increasingly popular, and enterprises and public sectors use it across the globe. With this widespread adoption, databases have become larger than ever. At Crunchy Data, we regularly work with databases north of 20TB, and our existing databases continue to grow. My colleague David Christensen and I have gathered some tips about managing a database with huge tables.

Big tables

Production databases commonly consist of many tables with varying data, sizes, and schemas. It's common to end up with a single huge and unruly database table, far larger than any other table in your database. This table often stores activity logs or time-stamped events and is necessary for your application or users.

Really large tables can cause challenges for many reasons, but a common one is locks. Regular maintenance on a table often requires locks, but locks on your large table can take down your application or cause a traffic jam and many headaches. I have a few tips for doing basic maintenance, like adding columns or indexes, while avoiding long-running locks.

Adding indexes problem: Index creation locks the table for the duration of the creation process. If you have a massive table, this can take hours.

CREATE INDEX ON customers (last_name)

Solution: Use the CREATE INDEX CONCURRENTLY feature. This approach splits up index creation into two parts, one with a brief lock to create the index that starts tracking changes immediately but minimizes application blockage, followed by a full build-out of the index, after which queries can start using it.

CREATE INDEX CONCURRENTLY ON customers (last_name)Adding columns

Adding a column is a common request during the life of a database, but with a huge table, it can be tricky, again, due to locking.

Problem: When you add a new column with a default that calls a function, Postgres needs to rewrite the table. For big tables, this can take several hours.

Solution: Split up the operation into multiple steps with the total effect of the basic statement, but retain control of the timing of locks.

Add the column:

ALTER TABLE all_my_exes ADD COLUMN location text

Add the default:

ALTER TABLE all_my_exes ALTER COLUMN location SET DEFAULT texas()

Use UPDATE to add the default:

UPDATE all_my_exes SET location = DEFAULT Adding constraints

Problem: You want to add a check constraint for data validation. But if you use the straightforward approach to adding a constraint, it will lock the table while it validates all of the existing data in the table. Also, if there's an error at any point in the validation, it will roll back.

ALTER TABLE favorite_bands ADD CONSTRAINT name_check CHECK (name = 'Led Zeppelin')

Open source and data science What is data science? What is Python? Data scientist: A day in the life Try OpenShift Data Science MariaDB and MySQL cheat sheet Latest data science articles

Solution: Tell Postgres about the constraint but don't validate it. Validate in a second step. This will take a short lock in the first step, ensuring that all new/modified rows will fit the constraint, then validate in a separate pass to confirm all existing data passes the constraint.

Tell Postgres about the constraint but do not to enforce it:

ALTER TABLE favorite_bands ADD CONSTRAINT name_check CHECK (name = 'Led Zeppelin') NOT VALID

Then VALIDATE it after it's created:

ALTER TABLE favorite_bands VALIDATE CONSTRAINT name_check​​Hungry for more?

David Christensen and I will be in Pasadena, CA, at SCaLE's Postgres Days, March 9-10. Lots of great folks from the Postgres community will be there too. Join us!

Try these handy solutions to common problems when dealing with huge databases.

Image by:

Internet Archive Book Images. Modified by Opensource.com. CC BY-SA 4.0

Data Science Databases What to read next This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License. Register or Login to post a comment.

Delegate common tasks with an open source automation tool

opensource.com - Tue, 02/28/2023 - 16:00
Delegate common tasks with an open source automation tool lnxchk Tue, 02/28/2023 - 03:00

A key precept for site reliability engineering (SRE), platform engineering, and *Ops teams is automating non-essential toil. Think of toil as any tasks that must be done for systems to run smoothly but don't directly impact functionality. Tasks like running security updates and installing or upgrading packages fall into the toil category.

As teams become more proficient at creating automation, automating for other teams becomes more appealing. Delegating tasks to other technical teams so they can self-serve saves time on both sides of a ticket request: The requesting teams get tasks done when they need them, and the implementing teams aren't buried under a ticket queue full of requests that might be simple but need to be done correctly.

Automation should address three gaps to reap these benefits:

  • Knowledge: Tasks can have many steps and dozens of possible options.
  • Skills: Modern ecosystems consist of an enormous number of components with specific execution and configuration requirements.
  • Access: Environments are restricted to a few users for security and compliance reasons.

Powerful automation platforms give teams a solution for these three gaps, providing a centralized resource to create, maintain, and grant access to automation for many types of tasks. These tools shouldn't ask too much of the requestor, who might know, "I want a developer environment in our cloud account" and not, "I want a set of containers built with this image, networked in that way, with access to the test data."

Rundeck is an open source software used to quickly automate manual tasks. Use it to create workflows consisting of commands, scripts, and APIs. These workflows might include software management, configurations, and scheduled events. Rundeck logs these activities for transparency and troubleshooting. How can Rundeck address the example of a developer needing an environment in the organization's cloud? Consider the following points.

Capture and share what you know

Creating automation that "civilians" can use has some challenges. The folks in your organization who work on other teams have their own skill sets and expertise, and it's not uncommon for different teams to have diverse skill sets and even platforms or environments where they are comfortable.

Administrators can automate a wide variety of tasks to be consumed by other teams in the organization. Common targets include provisioning environments and deploying new software versions, but runtime tasks such as service restarts and capacity adjustments are also helpful. These tasks might contain multiple steps, require access to various systems, platforms, or dashboards, and need to adhere to institutional requirements.

A script or other tool represents the team's knowledge of those tasks and others. The solution might be a simple shell script or a more elaborate Python or Go tool; it could run a few commands or a series of requests to a third-party API. Sysadmins should be able to write automation tools in the languages and platforms that make the most sense for the job. Rundeck supports a wide variety of tools and functionality.

Delegate safely

Internal tools aren't usually meant for general consumption, so opening up access to other teams can be challenging. These tools were designed and developed with a specific persona in mind—that of the people who created them! If you want to delegate work to others, you must consider what knowledge they bring and what you can do with that knowledge.

Automation with open source Download now: The automated enterprise eBook Free online course: Ansible essentials Ansible cheat sheet eBook: A practical guide to home automation using open source tools A quickstart guide to Ansible eBook: 7 examples of automation on the edge More articles about open source automation

In the example of a developer who wants a new development environment, you already know things about the developer, like their team or department and the projects they can access. You also know how work gets done for those teams or projects. If the developer is on Team A, and you already have a script called build_dev_env_TeamA.sh, then you can delegate that job to the developer with an automation platform like Rundeck.

With Rundeck, the experts on your teams can use the scripts they already have to create jobs and plugins that others in the organization can utilize. When developers log into the Rundeck server, they only see the jobs and resources they've been granted access to. In this example, that might be a job named "Build a Developer Environment - Team A". The developer is satisfied, and your experts can continue their work uninterrupted. Rundeck records the activity and includes the execution of the job in the audit log. Should something go wrong, it can be found and fixed.

Give everyone what they want: Time

Teams reap the most rewards from automation that saves them time. A purpose-built automation platform like Rundeck gives time back on both sides of complex tasks: Requestors no longer wait for someone else to execute a task for them, and the experts keep their time working on other projects. The organization gains insight into the jobs run in the environment via the audit logs, and everything is secured using existing authorization solutions and Rundeck's access control lists features.

For more on the open source Rundeck platform, check out the tutorials and documentation and download it today.

Rundeck is an open source software used to quickly automate manual tasks. Use it to create workflows consisting of commands, scripts, and APIs.

Image by:

opensource.com

Automation Sysadmin DevOps What to read next This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License. Register or Login to post a comment.

KDE Plasma Development Branch Switches To Qt6-Only

Phoronix - Tue, 02/28/2023 - 13:00
As of today the KDE Plasma development branch is now made Qt6-only as development work on Plasma 6.0 heats up...

GIMP 2.10.34 Released With JPEG XL Export, Some Backports From GIMP 2.99 Series

Phoronix - Tue, 02/28/2023 - 07:25
While GIMP 3.0 will hopefully release this year after years of waiting, for those using the current GIMP 2.10 stable series the v2.10.34 release is now available to round out the day...

Apache OpenOffice 4.1.14 Brings A Handful Of Bug Fixes

Phoronix - Tue, 02/28/2023 - 04:00
LibreOffice 7.5 released earlier this month as just the latest six-month update to this cross-platform, open-source office suite while today the Apache Software Foundation released OpenOffice 4.1.14. While the prior release, Apache OpenOffice 4.1.13, happened all the way back in July, there isn't much to show for today's update...

Armbian 23.02 ARM/RISC-V OS Released With Linux 6.1 LTS Kernel

Phoronix - Tue, 02/28/2023 - 02:40
Armbian as the Ubuntu and Debian based Linux distribution that is optimized for single board computers primarily in the ARM/AArch64 and RISC-V space is out with its first major update of 2023...

Pages