Open-source News

Automate checking for flaws in Python with Thoth

opensource.com - Mon, 04/11/2022 - 15:00
Fridolin Pokorny - Mon, 04/11/2022 - 03:00

Most cyberattacks take advantage of publicly known vulnerabilities. Many programmers can automate builds using Continuous Integration/Continuous Deployment (CI/CD) or DevOps techniques. But how can we automate the checks for security flaws that turn up hourly in different free and open source libraries? Many methods now exist to ferret out buggy versions of libraries when building an application.

This article will focus on Python because it boasts some sophisticated tools for checking the security of dependencies. In particular, the article explores Project Thoth because it pulls together many of these tools to automate Python program builds with security checks as part of the resolution process. One of the authors, Fridolín, is a key contributor to Thoth.

Inputs to automated security efforts

This section lists efforts to provide the public with information about vulnerabilities. It focuses on tools related to the article's subject: reports of vulnerabilities in open source Python libraries.

Common Vulnerabilities and Exposures (CVE) program

Any discussion of software security has to start with the comprehensive CVE database, which pulls together flaws discovered by thousands of scattered researchers. The other projects in this article depend heavily on this database. The CVE list is operated by MITRE, a non-profit corporation supported by the U.S. government, and the U.S. National Institute of Standards and Technology (NIST) builds on it in its National Vulnerability Database (NVD). The CVE database feeds numerous related projects, such as the CVE Details statistics site.

A person or automated tool can find exact packages and versions associated with security vulnerabilities in a structured format, along with less structured text explaining the vulnerability, as seen below.

Image by: Fridolín Pokorný and Andy Oram, CC BY-SA 4.0

Security efforts by the Python Packaging Authority

The Python Packaging Authority (PyPA) is the major organization creating best practices for open source packages in the Python language. Volunteers from many companies support PyPA. Security-related initiatives by PyPA are significant advances in making Python robust.

PyPA's Advisory Database curates known vulnerabilities in Python packages in a machine-readable form. Yet another project, pip-audit, supported by PyPA, audits application requirements and reports any known vulnerabilities in the packages used. Output from pip-audit can be in both human-readable and structured formats such as JSON. Thus, automated tools can consult the Advisory Database or pip-audit to warn developers about the risks in their dependencies.
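As a rough illustration of how an automated tool might consume that structured output, the sketch below parses a JSON report such as the one produced by running `pip-audit -r requirements.txt -f json`. The report shape used here (a top-level `dependencies` list whose entries carry `name`, `version`, and `vulns`) is an assumption based on recent pip-audit releases and may differ in other versions; the package name and advisory ID in the sample are made up.

```python
import json


def vulnerable_packages(report_text):
    """Map 'name==version' to the advisory IDs reported for it.

    Assumes the report is a JSON object with a "dependencies" list;
    this shape is an assumption, so check your pip-audit version.
    """
    report = json.loads(report_text)
    findings = {}
    for dep in report.get("dependencies", []):
        ids = [vuln["id"] for vuln in dep.get("vulns", [])]
        if ids:  # skip packages with no known advisories
            findings[f'{dep["name"]}=={dep["version"]}'] = ids
    return findings


# Hypothetical sample report; names and IDs are placeholders.
SAMPLE_REPORT = json.dumps({
    "dependencies": [
        {"name": "examplepkg", "version": "1.0.0",
         "vulns": [{"id": "EXAMPLE-0001", "fix_versions": ["1.0.1"]}]},
        {"name": "safepkg", "version": "2.3.4", "vulns": []},
    ]
})

if __name__ == "__main__":
    for package, advisories in vulnerable_packages(SAMPLE_REPORT).items():
        print(package, "->", ", ".join(advisories))
```

A CI job could fail the build whenever this mapping is non-empty.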

A video by Dustin Ingram, a maintainer of PyPI, explains how these projects work.

Open Source Insights

An initiative called Open Source Insights tries to help open source developers by providing information in structured formats about dependencies in popular language ecosystems. Such information includes security advisories, license information, libraries' dependencies, etc.

To exercise Open Source Insights a bit, we looked up the popular TensorFlow data science library and discovered that (at the time of this writing) it has a security advisory on PyPI (see below). Clicking on the MORE DETAILS button shows links that can help research the advisory (second image).

Image by: Fridolín Pokorný and Andy Oram, CC BY-SA 4.0

Image by: Fridolín Pokorný and Andy Oram, CC BY-SA 4.0

Interestingly, the version of TensorFlow provided by the Node.js package manager (npm) had no security advisories at that time. The programming languages used in this case may be the reason for the difference. However, the apparent inconsistency reminds us that provenance can make a big difference, and we'll show how an automated process for resolving dependencies can adapt to such issues.

Open Source Insights obtains dependency information on Python packages by installing them into a clean environment. Python packages are installed by the pip resolver—the most popular installation tool for Python libraries—from PyPI, the most popular index listing open source Python libraries. Vulnerability information for each package is retrieved from the Open Source Vulnerability database (OSV). OSV acts as a triage service, grouping vulnerabilities across multiple language ecosystems.
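OSV itself can be queried programmatically. The following is a minimal sketch, assuming the public OSV query endpoint (`https://api.osv.dev/v1/query`) and its documented request body of a package name, an ecosystem, and a version. The helper names are illustrative, and the response handling assumes advisories arrive under a `vulns` key.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(name, version, ecosystem="PyPI"):
    """Build the JSON body for a single-package OSV query."""
    return {"package": {"name": name, "ecosystem": ecosystem}, "version": version}


def advisory_ids(osv_response):
    """Extract sorted advisory IDs; a response without "vulns" yields none."""
    return sorted(vuln["id"] for vuln in osv_response.get("vulns", []))


def query_osv(name, version):
    """Send the query over the network and return the decoded response."""
    body = json.dumps(build_osv_query(name, version)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires network access; the version is only an example.
    print(advisory_ids(query_osv("tensorflow", "2.8.0")))
```

Keeping the payload builder and the response parser separate from the network call makes both easy to test offline.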

Open Source Insights would be a really valuable resource if it had an API; we expect that the developers will add one at some point. Even though the information is currently available only as web pages, the structured format allows automated tools to scrape the pages and look for critical information such as security advisories.

Security Scorecards by the Open Source Security Foundation

Software quality—which is intimately tied to security—calls for basic practices such as conducting regression tests before checking changes into a repository, attaching cryptographic signatures to releases, and running static analysis. Some of these practices can be detected automatically, allowing security experts to rate the security of projects on a large scale.

An effort called Security Scorecards, launched in 2020 and backed by the Open Source Security Foundation (OpenSSF), currently lists a couple of dozen such automated checks. Most of these checks depend on GitHub services and can be run only on projects stored in GitHub. The project is still very useful, given the dominance of GitHub for open source projects, and represents a model for more general rating systems.

Project Thoth

Project Thoth is a cloud-based tool that helps Python programmers build robust applications, a task that includes security checking along with many other considerations. Red Hat started Thoth, and it runs in the Red Hat OpenShift cloud service, but its code is entirely open source. The project has built up a community among Python developers. Developers can copy the project's innovations in other programming languages.

A tool that helps programmers find libraries and build applications is called a resolver. The popular pip resolver generally picks the most recent version of each library, but is sophisticated enough to consider the dependencies of dependencies in a hierarchy called a dependency graph. pip can even backtrack and choose a different version of a library to handle version range specifications found by traversing the dependency graph.
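To make the idea of backtracking concrete, here is a toy depth-first resolver. It is a sketch only and not pip's actual algorithm: the index, the package names (`web`, `helper`), and the integer versions are all hypothetical, and version ranges are simplified to explicit sets of allowed versions.

```python
# Toy index: package -> version -> {dependency: allowed versions}.
INDEX = {
    "web": {2: {"helper": {1}}, 1: {"helper": {1, 2}}},
    "helper": {2: {}, 1: {}},
}


def resolve(pending, index, chosen=None):
    """Return {package: version} satisfying all constraints, or None.

    pending is a list of (package, allowed) pairs, where allowed is a
    set of acceptable versions or None for "any version".
    """
    chosen = dict(chosen or {})
    if not pending:
        return chosen
    (name, allowed), rest = pending[0], pending[1:]
    if name in chosen:
        # Already pinned: the pin must satisfy this constraint as well.
        if allowed is not None and chosen[name] not in allowed:
            return None
        return resolve(rest, index, chosen)
    for version in sorted(index[name], reverse=True):  # prefer the newest
        if allowed is not None and version not in allowed:
            continue
        deps = list(index[name][version].items())
        result = resolve(rest + deps, index, {**chosen, name: version})
        if result is not None:
            return result
    return None  # no candidate worked, so the caller tries its next version


if __name__ == "__main__":
    # The application needs any "web" but specifically "helper" version 2.
    print(resolve([("web", None), ("helper", {2})], INDEX))
```

In this example the resolver first tries `web` version 2, discovers that it only accepts `helper` version 1, and backtracks to `web` version 1, which is compatible with the required `helper` version 2.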

When it comes to choosing the best version of a dependency, Thoth can do much more than pip. Here is an overview of Thoth with a particular eye to how it helps with security.

Thoth overview

Thoth considers many elements of a program's environment when installing dependencies: the CPU and operating system on which the program will run, metadata about the application's container such as the ones extracted by Skopeo, and even information about the GPU that a machine learning application will use. Thoth can take into account several other variables, but you can probably guess from the preceding list that Thoth was developed first to support machine learning in containers. The developer provides Thoth with information about the application's environment in a configuration file.

What advantages does the environment information give? It lets Thoth exclude versions of libraries with known vulnerabilities in the specified environment. A developer who notices that a build fails or has problems during a run can store information about what versions of dependencies to use or avoid in a specification called a prescription, consulted by Thoth for future users.

Thoth can even run tests on programs and their environments. Currently, it uses Clair to run static testing over the content of container images and stores information about the vulnerabilities found. In the future, Thoth's developers plan to run actual applications with various combinations of library versions, using a project from the Python Code Quality Authority (PyCQA) named Bandit. Thoth will run Bandit on each package's source code separately and combine the results during the resolution process.

The different versions of the various libraries can cause a combinatorial explosion (too many possible combinations to test them all). Thoth, therefore, models dependency resolution as a Markov Decision Process (MDP) to decide on the most productive subset to run.

Sometimes security is not the primary concern. For instance, perhaps you plan to run a program in a private network isolated from the Internet. In that case, you can tell Thoth to prioritize some other benefit, such as performance or stability, over security.

Thoth stores its dependency choices in a lock file. Lock files "lock in" particular versions of particular dependencies. Without the lock files, subtle security vulnerabilities and other bugs can creep into the production application. In the worst case, without locking, users can be confronted with so-called "dependency confusion attacks".

For instance, a resolver might choose to get a library from an index with a buggy version because the index from which the resolver usually gets the dependency is temporarily unavailable.

Another risk is that an attacker might bump up a library's version number in an index, causing a resolver to pick that version because it is the most recent one. The desired version exists in a different index but is overlooked in favor of the one that seems more up-to-date.
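The version-bump scenario can be sketched in a few lines. The example below contrasts a naive "newest version across all indexes" choice with a lookup that honors a lock entry. The index contents, the package name `acme-utils`, and the lock structure are hypothetical; this is not the actual lock file format used by Thoth or any other tool.

```python
# Hypothetical indexes: the public one holds an attacker's inflated version.
INDEXES = {
    "internal": {"acme-utils": ["1.2.0", "1.3.0"]},
    "public": {"acme-utils": ["99.0.0"]},
}

# Hypothetical lock entry pinning both the version and its source index.
LOCK = {"acme-utils": {"index": "internal", "version": "1.3.0"}}


def version_key(version):
    """Turn '1.3.0' into (1, 3, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def naive_pick(name, indexes):
    """Pick the highest version found on any index (the risky behavior)."""
    candidates = [
        (version_key(version), index_name, version)
        for index_name, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    _, index_name, version = max(candidates)
    return index_name, version


def locked_pick(name, lock, indexes):
    """Return the pinned (index, version), verifying it is still available."""
    entry = lock[name]
    if entry["version"] not in indexes[entry["index"]].get(name, []):
        raise LookupError(f"{name} {entry['version']} missing from {entry['index']}")
    return entry["index"], entry["version"]


if __name__ == "__main__":
    print("naive: ", naive_pick("acme-utils", INDEXES))
    print("locked:", locked_pick("acme-utils", LOCK, INDEXES))
```

The locked lookup fails loudly when the pinned release is unavailable instead of silently falling back to another index, which is the behavior the first scenario above warns against.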

Wrap-up

Thoth is a complicated and growing collection of open source tools. The basic principles behind its dependency resolutions can be an inspiration for other projects. Those principles are:

  1. A resolver should routinely check for vulnerabilities by scraping websites such as the CVE database, running static checks, and consulting any other sources of information. The results must be stored in a database.
  2. The resolver has to look through the dependencies of dependencies and backtrack when it finds that some bug or security flaw calls for changing a decision that the resolver made earlier.
  3. The resolver's findings and information passed back by the developers using the resolver should be stored and used in future decisions.

In short, with the wealth of information about security vulnerabilities available these days, we can automate dependency resolution and produce safer applications.

Project Thoth pulls together many open source tools to automate program builds with security checks as part of the resolution process.

Image by: opensource.com

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

Andy is a writer and editor in the computer field. His editorial projects at O'Reilly Media ranged from a legal guide covering intellectual property to a graphic novel about teenage hackers. Andy also writes often on health IT, on policy issues related to the Internet, and on trends affecting technical innovation and its effects on society. Print publications where his work has appeared include The Economist, Communications of the ACM, Copyright World, the Journal of Information Technology & Politics, Vanguardia Dossier, and Internet Law and Business. Conferences where he has presented talks include O'Reilly's Open Source Convention, FISL (Brazil), FOSDEM (Brussels), DebConf, and LibrePlanet. Andy participates in the Association for Computing Machinery's policy organization, named USTPC, and is on the editorial board of the Linux Professional Institute.


Red Hat named to Fortune’s 100 Best Companies to Work For list for the fourth year in a row

Red Hat News - Mon, 04/11/2022 - 12:00

We’re proud to share that Red Hat has been included on the Fortune 100 Best Companies to Work For 2022 list, produced by Great Place to Work and published by Fortune, for the fourth consecutive year. Red Hat is ranked No. 34.

Installation and Review of Qubes Linux [Lightweight Distro]

Tecmint - Mon, 04/11/2022 - 11:44
The post Installation and Review of Qubes Linux [Lightweight Distro] first appeared on Tecmint: Linux Howtos, Tutorials & Guides.

This article will talk about the installation and setup process of Qubes Linux. It will also talk about how to test and evaluate the security features of Qubes Linux. Finally, it will offer a


Linux 5.18-rc2 Released With The Kernel So Far Looking "Fairly Normal"

Phoronix - Mon, 04/11/2022 - 08:58
Following last week's first release candidate of Linux 5.18 that capped off the two week merge window, Linux 5.18-rc2 was just issued as the newest weekly release candidate...

Reiser5 Issues New Development Release, Performance Numbers For Scaling Out

Phoronix - Mon, 04/11/2022 - 00:28
While Reiser4 never made it to mainline and has lacked any major corporate backing while Linux 5.18 is deprecating the older ReiserFS driver for removal later on, former Namesys developer Edward Shishkin continues progressing development on "Reiser5" as the evolution of Reiser4. Out today is the newest Reiser5 snapshot and some performance numbers from Shishkin...

Initial Intel TDX Enablement Positioned For Linux 5.19

Phoronix - Sun, 04/10/2022 - 19:42
It looks like the initial Linux kernel enablement code around Trust Domain Extensions (TDX) will be mainlined for the Linux 5.19 cycle this summer...

Updated AMD Zen 1 Through Zen 3 CPU Microcode Published

Phoronix - Sun, 04/10/2022 - 17:45
On Friday AMD published new CPU microcode files for both Family 17h and Family 19h for Zen 1/2/3 processors. At the moment there isn't any public insight into the changes with this updated microcode but it may be significant...

DisplayLink USB Display Driver 5.5 Supports Newer Linux Kernel Versions, Fixes

Phoronix - Sun, 04/10/2022 - 17:26
While early on DisplayLink's USB2-based devices were friendly with Linux and had upstream open-source driver support, their newer USB3-based display hardware has relied on a binary driver focused on just supporting Ubuntu. Last month DisplayLink released an updated version of that binary blob ahead of Ubuntu 22.04 LTS...

Fedora 37 Planning To Use RPM 4.18 For Better Security

Phoronix - Sun, 04/10/2022 - 17:08
In addition to removing legacy X.Org drivers, deprecating legacy BIOS support, and signing RPM contents, another Fedora 37 change proposal submitted this past week is for upgrading to RPM 4.18...

New book teaches readers how to tell data stories with PostgreSQL

opensource.com - Sun, 04/10/2022 - 15:00
Joshua Allen Holm - Sun, 04/10/2022 - 03:00

SQL databases can be daunting but can also be very fun if you know how to use them. The information contained in a database can provide many insights to someone who knows how to properly query and manipulate the data. Practical SQL, 2nd Edition: A Beginner's Guide to Storytelling with Data by Anthony DeBarros teaches readers how to do just that.

The content

DeBarros, currently a data editor at the Wall Street Journal, pulls from his practical experience in journalism to teach readers how to tell stories with data. The book consists of an introduction, 20 chapters, and several appendices. The introduction sets the tone for the book, explaining what the book is about and who it is for, and the 20 chapters teach lessons about various database topics. Chapter 1 is the traditional "how to set up your environment" chapter and covers how to install PostgreSQL on Windows, macOS, or Linux (specifically, Ubuntu). The following chapters cover the basics of working with SQL databases, like creating databases and tables, performing basic queries, understanding data types, importing and exporting data, and basic math and stats functions. The chapters then progress to more complex topics like joining tables and extracting, inspecting, and modifying data. By the time the reader reaches the book's midpoint, they should have a solid understanding of how databases work.

The chapters in the second half of the book, starting with chapter 11, explore advanced topics.

  • Chapter 11 covers statistical functions.
  • Chapter 12 explains how to work with dates and time.
  • Chapter 13 teaches advanced query techniques.
  • Chapter 14 explores text mining features.
  • Chapter 15 looks at analyzing spatial data using PostGIS.
  • Chapter 16 explains how to work with JSON data.
  • Chapter 17 shows how to use views, functions, and triggers.
  • Chapter 18 discusses using PostgreSQL from the command line.
  • Chapter 19 covers database maintenance.

The final chapter, Chapter 20: Telling Your Data's Story, shifts away from the practical aspects of chapters 1 through 19 toward providing advice about telling stories using data. Again, DeBarros pulls from his experience as a journalist to offer lessons about the whys, hows, and best practices of doing data journalism or data storytelling. If chapters 1 through 19 are the tools in the toolbox, chapter 20 is a sample blueprint that will inspire the reader to create their own project.

The exercises

There are SQL files and other supplemental resources for the exercises in each chapter in the book's GitHub Repository, except for chapter 20, which has no activities. The repository also contains a file with solutions for each of the "try it yourself" end-of-chapter exercises.

The exercises throughout the book are all very interesting. While the earliest chapters are understandably basic (there are only so many ways to teach CREATE DATABASE and CREATE TABLE), they provide an excellent foundation for the more advanced topics later in the book. The advanced exercises use real-world data to give verisimilitude to the learning experience. The database of choice for Practical SQL, 2nd Edition is PostgreSQL, but the book makes some mentions of different databases when things might work differently. However, it is very much a PostgreSQL book, so that is something to keep in mind.

Final thoughts

Practical SQL, 2nd Edition is a well-written and informative book that can help someone begin to master SQL. Even more importantly, it is an extremely enjoyable book that will keep the reader engaged with interesting, thought-provoking exercises. Anyone interested in learning the ins and outs of PostgreSQL should consider picking up this book. The book's only drawback is that it is a PostgreSQL book, not a database-agnostic book, so anyone trying to learn MySQL, MariaDB, or some other SQL-based database might want to choose a book that focuses on that particular database. The overall "Guide to Storytelling with Data" lessons are something a moderately experienced user of MySQL, MariaDB, or a similar database can apply to their database of choice, but this book is not the ideal first book for learning a non-PostgreSQL database. That one caveat aside, I highly recommend Practical SQL, 2nd Edition to anyone wanting to learn PostgreSQL and how to tell stories with data.

Practical SQL, 2nd Edition: A Beginner's Guide to Storytelling with Data by Anthony DeBarros offers an informative and enjoyable way to learn SQL.

Image by: Opensource.com

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

1 Comment

madtom1999 | April 12, 2022

Worth noting that PostgreSQL allows stored procedure overloading, which can make development a nightmare if you are unaware of it.
