
How to use modern Python packaging and setuptools plugins together

Moshe Zadka | Thu, 09/08/2022

Python packaging has evolved a lot. The latest ("beta") approach uses one file, pyproject.toml, to control the package.

A minimal pyproject.toml might look like this:

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "cool_project"
version = "1.0.0"The project section

The project section is the data about the Python project itself, including fields like name and version, which are required.

Other fields are often used, including:

  • description: A one-line description.
  • readme: Location of a README file.
  • authors: Author names and e-mails.
  • dependencies: Other packages used by this project.
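For example, a project section using these fields might look like this (a sketch with placeholder values, not taken from a real project):

[project]
name = "cool_project"
version = "1.0.0"
description = "A project for doing cool things"
readme = "README.md"
authors = [
    { name = "Some Author", email = "author@example.com" },
]
dependencies = [
    "requests",
]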
The build-system section

Though it does not have to be first, the build-system section usually goes at the top of pyproject.toml, because it is the most important one.

The build-backend key points to a module that knows how to build source distributions and wheels from the project. The requires field allows specifying build time dependencies.

Many projects are built with setuptools. There are some new alternatives, like flit and hatch.

Plugins

One benefit of the requires section in build-system is that it can be used to install plugins. The setuptools package, in particular, can use plugins to modify its behavior.

One thing that plugins can do is to set the version automatically. This is a popular need because version management can often be painful.

Segue

Before continuing, it is worth reflecting on the nature of "parodies". A parody of X is an instance of X which exaggerates some aspects, often to the point of humor.

For example, a "parody of a spy movie" is a spy movie, even as it riffs on the genre.

A parody of setuptools plugins

With this in mind, what would a parody of a setuptools plugin look like? By the rule above, it has to be a plugin.

The plugin, called onedotoh, sets the version to... 1.0.0. In order to be a plugin, it first has to be a package.

A package should have a pyproject.toml:

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

[project]
name = "onedotoh"
version = "1.0.0"

[project.entry-points."setuptools.finalize_distribution_options"]
setuptools_scm = "onedotoh:guess_version"

There is a new section: project.entry-points. This means that the function guess_version will be called as setuptools is ready to finalize the distribution options.
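To see how such entry points are discovered, here is a sketch using the standard library (the group-filtering call shown requires Python 3.10 or later; setuptools' internal mechanism is equivalent in spirit but not identical):

from importlib.metadata import entry_points

# List every plugin registered for setuptools' distribution-finalization hook.
for ep in entry_points(group="setuptools.finalize_distribution_options"):
    finalize = ep.load()  # e.g., onedotoh's guess_version function
    print(ep.name, finalize)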

The code of guess_version is one line:

def guess_version(dist):
    dist.metadata.version = "1.0.0"

Version to 1.0.0

Using onedotoh is subtle. One pitfall is writing the pyproject.toml project section like this:

[project]
name = "a_pyproject"
version = "0.1.2"

The version in the pyproject.toml will override the version from the plugin.

The obvious solution is to remove the version field:

[project]
name = "a_pyproject"

This fails in a different way: without a version, the project section is invalid, so even the name will not be used.

The right way to do it is as follows:

[project]
name = "a_pyproject"
dynamic = ["version"]

This approach explicitly declares the version field as dynamic.

A full example will look like this:

[build-system]
requires = [
    "setuptools",
    "onedotoh",
]
build-backend = "setuptools.build_meta"

[project]
name = "a_pyproject"
dynamic = ["version"]

Finally, the version is automatically set to 1.0.0.

Wrap up

Setuptools plugins can still be used with modern Python packaging, as long as the relevant fields are explicitly declared as "dynamic." This makes the field ripe for further experimentation with automation.

For example, what if, in addition to guessing the version from external metadata, a plugin guessed the name as well, using the git remote URL?
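A sketch of that idea (hypothetical code; guess_name would be registered through the same entry-point hook as guess_version):

import re
import subprocess

def guess_name(dist):
    # Ask git for the remote URL, e.g. "git@github.com:user/cool_project.git"
    url = subprocess.run(
        ["git", "remote", "get-url", "origin"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    # Keep the last path component and drop a trailing ".git"
    name = re.sub(r"\.git$", "", url).split("/")[-1].split(":")[-1]
    dist.metadata.name = name

Whether that is a good idea is left as an experiment for the reader.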


Open source events: 4 goals to set and how to measure them

Shaun McCance | Wed, 09/07/2022

Events are an essential component of open source community health. A positive event experience can inspire current contributors and encourage new ones. But how can you tell whether your events are successful?

We at the CHAOSS (Community Health Analytics Open Source Software) App Ecosystem Working Group have considered this question for our events to maintain the health of the communities involved. The CHAOSS App Ecosystem includes several projects focused on developing applications for the Linux platform. While currently dominated by the GNOME and KDE communities, it is not defined by them. The app ecosystem, as it stands today, is primarily driven by volunteers with altruistic goals organized around free software principles. The work we share in this article is an update from our November 2020 article about success metrics for open source events.

We follow the goal-question-metric approach. We first identify several goals that a person or organization might have. We then identify questions we need to answer to achieve those goals. Finally, for each question, we consider quantifiable metrics that provide answers. For event organizers, we established four goals.
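As a sketch, the structure of that approach can be pictured as a nested mapping (the entries below paraphrase this article; this is not an official CHAOSS artifact):

# Goal -> questions -> candidate metrics, following the goal-question-metric approach.
gqm = {
    "Retain and attract contributors": {
        "How long do event attendees stay with the community?": [
            "Contribution history correlated with event attendance",
        ],
        "What roles do event attendees have in our projects?": [
            "Share of non-code contributors among attendees",
        ],
    },
}

for goal, questions in gqm.items():
    print(goal)
    for question, metrics in questions.items():
        print(" -", question, "->", ", ".join(metrics))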

Goal 1: Retain and attract contributors

Contributors are the lifeblood of any open source project or ecosystem, and events are a core part of the contributor experience. They are an opportunity to attract new contributors, and they can build and reaffirm relationships that help retain existing contributors. Events should be geared toward creating a harmonious and enthusiastic atmosphere that builds relationships with the project, ecosystem, and community.

One question to consider, then, is how long people who attend events stay with the community. If the value of events is to strengthen the relationships that retain contributors, we should be able to measure that. Answering this question requires long-term data on contributions and event attendance. Fortunately, many projects will already have this historical data.
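A minimal sketch of that measurement (the data shapes are assumptions for the example; this is not a CHAOSS tool) compares each attendee's event date against their later contribution dates:

from datetime import date

# Hypothetical records: when each person attended an event,
# and the dates of their contributions.
attendance = {"alice": date(2021, 5, 1), "bob": date(2021, 5, 1)}
contributions = {
    "alice": [date(2021, 4, 2), date(2022, 3, 10)],
    "bob": [date(2021, 5, 3)],
}

def days_active_after_event(person):
    """Days between the event and the person's latest contribution after it."""
    later = [d for d in contributions.get(person, []) if d >= attendance[person]]
    return (max(later) - attendance[person]).days if later else 0

for person in attendance:
    print(person, days_active_after_event(person))  # alice 313, bob 2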

Another question is what roles event attendees have in our open source projects. Events can often be highly technical and may not attract people who want to contribute in non-coding roles, such as design, documentation, and marketing. If projects have the means to track non-code contributions, which is often tricky, you can correlate with event attendee lists. This information can help event organizers create events that attract a wider audience.

[ Also read Attract contributors to your open source project with authenticity ]

Goal 2: Have engaging events

Events are more than presentations. They create a space for community members to come together. If they don't engage with each other, the event is a lost opportunity. The importance of interactions between members is also evident in the so-called hallway track, the name given to engagement that occurs in the hallways and outside the regular event schedule. Some conferences even have discounted tickets for people who don't plan to attend any sessions but want to engage with others in the hallway track. The question for us is how to measure an event's success in terms of how it fosters engagements between event participants.

To measure engagement during regularly scheduled sessions, we could count the number of questions asked during a Q&A session. While the individual speakers directly influence this metric, there are things event organizers can do to encourage Q&A participation. For example, event organizers can use their morning keynote to encourage everyone to ask questions as a form of showing appreciation.

We also considered engagement during the hallway track. One way to measure this is observing how people behave when not in sessions. Another way to measure this is to ask participants in a post-event survey about their experience outside of scheduled sessions.

Finally, we considered engagement in virtual spaces. One option for measuring this is through the number of social media messages that use the event's hashtag or from people we know are at the event. Another possibility for online or even in-person sessions is to have an emoji that could be collected as an easy one-click reaction to sessions, keynotes, and general conference experience.

Goal 3: Understand company contributions

Companies and other organizations are essential contributors to events, even community events. They may provide sponsorship, sponsor employees to help organize, or simply send employees to the event. It's important to ensure companies find value in their contributions, but we must do this in a way that doesn't alienate community contributions. We looked at what companies expect from community events and what we can measure to improve company contributions.

Some open source events have a reputation for being more corporate than others, so one question you might ask is what percentage of attendees are sent by their employer rather than attending as volunteers. Sometimes, the difference is built into ticket pricing. Otherwise, it's an easy question to ask on a registration survey. Although this information is helpful, it is an imperfect metric because paid open source contributors often attend events outside their job responsibilities.

We also considered several metrics around which companies are attending events, what companies are doing apart from sending employees (for example, financially sponsoring the event), and whether companies repeat their involvement. It's important to recognize this to cultivate long-term relationships.

[ Related read 8 ways your company can support and sustain open source ]

Finally, we considered how competitive the landscape of similarly scoped events is. Companies only have so much budget to spend on events. As much as we enjoy similar events, they draw from the same pool of potential sponsors. Understanding which other events are seeking sponsorship can help organizers better differentiate their own events.

Goal 4: Address diversity and skill gaps

The international communities built and developed across the globe, consisting of people from diverse backgrounds, are an essential component of open source projects. These communities contribute their ideas, collaborate on common goals, and help the project expand and scale toward new directions.

In-person events are a unique chance for projects and organizations to bring their contributors together and allow them to interact and exchange diverse experiences, boosting innovation and accelerating growth from different perspectives. They are also an opportunity to cultivate and reinforce a common culture, focusing on increasing diversity and inclusion. These metrics should measure how well an event contributes to fostering this diversity and closing skill gaps in the community.

We started by considering which skill programs we have at our events and how wide a breadth of skills these programs represent. Skill programs could include tutorials, hands-on workshops, hackathons, or many other formats. Do we have skill programs around skills other than coding? There are many skills and roles valuable to open source projects, such as the skills required to organize events.

It's then helpful to look at which skills we need in our community and which skills are lacking. Project leads and people involved in onboarding are often a good source of this information. By comparing the skills programs we have with the skills our community needs, we can better design the programs for future events.

Get involved

The CHAOSS App Ecosystem working group is interested in working with event organizers to continually refine and implement metrics. The KDE and GNOME event organizers have already discussed changing their events to better capture some of these metrics. Our work for event organizers is also available as a PDF.

The CHAOSS App Ecosystem working group is also taking up the challenge of defining metrics for the marketing and communications functions within OSS App Ecosystems. The first step towards this goal was a conversation with folks from KDE and GNOME fulfilling this role. The conversation is available as CHAOSScast Episode #31: Marketing Metrics for OSS Foundations and Projects. The lessons from that conversation will next be translated into goals, questions, and metrics, as described above for the event organizer metrics.

After our work on promotions and communications teams, we plan to address metrics for finance teams, community managers, release managers, cross-project coordinators, and mentors. Our work has only begun, and we welcome feedback and new contributors.

You are invited to join the work of the CHAOSS App Ecosystem WG. Find more information on our GitHub page.



Applying open organization principles to save factory energy

Ron McFarland | Tue, 09/06/2022

The problem with energy costs is that most people don't think about them. They just look at their overall bill without considering how the energy was used. By monitoring and measuring energy use through sensors in very specific locations, energy waste can be made transparent and reduced.

That's the premise of the book Reinventing Fire: Bold Business Solutions for the New Energy Era, which offers methods to eliminate the use of fossil fuels by 2050 through energy waste reduction. It's also the premise of a story I began in an earlier article about two retired utilities salespeople-turned-consultants who started an open organization community to reduce energy waste in their region, which they call the "Reinventing Fire Community." Their work depends on the open organization principles of Community, Transparency, Collaboration, Inclusivity, and Adaptability.

In that article, I talked about actions one of my imaginary consultants recommended for home and commercial building owners. In this article, I discuss the other fictional consultant, who sold electricity to factories.

It may be surprising how much energy is wasted in factories. The fact is, there aren't enough energy specialists available right now, and most production managers have their attention on other things, particularly production volume. Too many top managers look at sales, profit margins, and gross profit and miss "boring" expenses. In addition, small savings are hard to see until they're multiplied hundreds of times.

In addition, most operations managers look at each process separately and individually. They can't see the overall energy waste picture until it's pointed out. This is where transparency and collaboration between people in all plant processes are essential. There might be countless duplications of processes where cooperation could be very beneficial. Real rewards may take 3-5 years to materialize, and unfortunately, many people only look at short-term gains.

Factory energy consultant

In this hypothetical case study, the factory energy consultant goes to factories, starting with their previous electricity customers. They offer suggestions on how to reduce yearly energy expenses by 10-50%. If equipment investment is required, the consultant promises that the equipment will pay for itself within 3-5 years of savings. The arithmetic behind that promise is sketched below.
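Here is the payback calculation with invented, illustrative figures (a sketch; these numbers are not from the book):

# Simple payback-period arithmetic with invented figures.
investment = 60_000           # equipment cost, in dollars
annual_energy_bill = 100_000  # current yearly energy spend
savings_rate = 0.25           # a 25% reduction, inside the 10-50% range

annual_savings = annual_energy_bill * savings_rate   # 25,000 per year
payback_years = investment / annual_savings
print(f"Payback in {payback_years:.1f} years")       # Payback in 2.4 years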

The consultant sets the following conditions for working together:

  1. The factory operator must be as transparent as possible about current energy use and expenses.
  2. The consultant will not give anyone this information except trusted partners they are working with to generate the very best proposals.
  3. There is no charge for the consulting, as they only want to reduce energy use in the community, and at this stage the consultants don't know how valuable they will be.
  4. The consultant would like a performance report and a general testimonial letter at the end of 2 years explaining how much the energy user's bill has been reduced.
Industrial energy use

How can this consultant use all the principles of Community, Transparency, Collaboration, Inclusion, and Adaptability?

First, the factory energy consultant makes an appointment with their most trusted electricity-using manufacturing company and explains that they can help the company save money, with a target of saving 27% of primary energy by 2050 while maintaining current production levels. This will save the company money and reduce the impact of volatile price changes or power failures.

The consultant explains the chart below to provide perspective on where energy waste can be avoided.

[Chart from Reinventing Fire, by Amory Lovins (Ron McFarland, CC BY-SA 4.0)]

In manufacturing, 40% of the primary energy (energy that comes directly from the source, such as coal, gas, oil, and so on) is used for heating, and another 40% is used for rotating shafts on machines. The remaining 20% goes to a wide range of processes like smelting, cutting, grinding, boring, lighting, and space conditioning. Those are the areas the consultant wants to explore.

Primary energy is converted to electricity (secondary energy) and delivered. There is loss during this conversion, as illustrated in the bar chart below. Electricity-system losses use nearly as much primary energy as process heat does.

[Chart from Reinventing Fire, by Amory Lovins (Ron McFarland, CC BY-SA 4.0)]

The consultant explains that there are two main areas to look at regarding energy-use reduction: heat and pressure. Where is extremely high temperature applied in the manufacturing process? Where is high pressure applied? Also, how can waste in one process become a raw material input for another? This includes excess heat and excess pressure. Sometimes these two are produced and used jointly, called combined heat and power (CHP). Those two can sometimes generate electricity from primary high heat and pressure and other times from leftover lower heat and pressure from a given process.

Reducing energy consumption in factories

The consultant explains four areas where energy is used and can be reduced in any factory:

  • Level 1: Reduce manufacturing processing energy use. Sometimes this can be accomplished by using sensors and controls to stop equipment from running when not needed. Fine-tuning pressure controls and giving notifications when maintenance is needed to maintain ideal performance are other means of reducing energy use at this level.
     
  • Level 2: Reduce energy delivery losses. This can be achieved by insulating pipes, maintaining steam traps and air filters, and so on. Changing wiring can also reduce electrical energy loss. Old wires were developed only to prevent fires, but new versions are designed to save energy, cutting waste significantly. The consultant mentions that they should review the age and energy delivery efficiency of all wiring. Air compressors could be checked for energy-wasting leaks. These changes alone could pay for themselves in six months.
     
  • Level 3: Get better at generating in-house energy through better devices. Motors run drill presses, chillers, grinders, mixers, and blowers. The cost of running them for only a few weeks could be the same as entirely replacing the power motor itself. The consultant asks several questions: When are these motors running? Are they the right size and specifications for their use right now? Are there devices on the machines that measure energy use? Can variable-speed control be installed to reduce wasted RPMs? What is the age of the motors? These questions also apply to pumps, fans, furnaces, compressors, chillers, blowers, and boilers. The consultant explains that in most factories today, almost all boilers lack smart controls that reduce start-up and shutdown energy waste.
     
  • Level 4: Put energy waste to use. Waste heat and excess pressure can be either useless or a valuable energy source. Repeated cycles of heating and cooling take energy. In most factories, fuel is converted one-third into electricity and two-thirds into heat. That heat very often is thrown away. The amount of energy loss in US factories is equal to the amount of energy Japan uses for everything. That is a huge waste. Could that heat be transported to another process? The consultant might have to include a specialist from the Rocky Mountain Institute to address this topic.

The consultant adds that the four levels above can play off each other. Super-efficient boilers, ultrasound-aided drying systems, and other energy-saving devices are now available. The US Department of Energy's Industrial Assessment Centers are a great resource: there, US companies can learn for free where to reduce energy. The US Energy Information System could also be helpful.

Another thing to look at is industrial waste. Could what they throw away be converted into a byproduct?

[ Related read 4 reasons IT leaders should champion sustainability ]

Analyzing waste

When analyzing energy waste, start where the energy is applied and work backward to the power generation location. Reinventing Fire gives an example of a datacenter in the diagram below:

[Diagram from Reinventing Fire, by Amory Lovins (Ron McFarland, CC BY-SA 4.0)]

Notice it starts on the left with the energy needed for computing, then reflects the process of energy loss through the datacenter, server, applications, and business process. Is that computation needed? Is the software efficient? Next, look at the IT equipment. Is it suitable for that type of work? Next, look at the cooling and power supply. Is it appropriate? This same type of questioning should be done for each process in the factory operations manager's facility.

The consultant mentions that starting at the use point is also helpful for piping and air/fluid flow. Generally, smaller pipes require more energy to pump fluid, air, or gas through them, and therefore require larger pumps. The same goes for curves in the piping: the gentler the curve, the better. First, design the piping size and curves, then select the pump required. The fatter and straighter the piping, the smaller the pump requirement. Doing this right can save over half the energy required; the sketch below shows the scaling behind that claim.
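As a rough sketch of why pipe diameter matters so much (idealized Darcy-Weisbach scaling with a constant friction factor and a fixed flow rate; real systems vary):

# For a fixed flow rate, friction losses (and thus pumping power)
# scale roughly as 1/d^5 with pipe diameter d.
def relative_pumping_power(diameter_ratio):
    """Pumping power relative to baseline when the diameter is scaled."""
    return 1 / diameter_ratio**5

print(f"{relative_pumping_power(1.5):.2f}")  # ~0.13: a 50% fatter pipe needs ~87% less power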

Many factories were designed when fuel was cheap, so these cost savings were not considered. The consultant gives the below example:

[Diagram from Reinventing Fire, by Amory Lovins (Ron McFarland, CC BY-SA 4.0)]

Notice that most energy is lost right in the power station where the electricity is generated. Then there is loss in getting the electricity to the location, operating the motor, running the drivetrain, pump, and valves, and finally moving the fluid through the pipes. All those losses should be examined. Therefore, the consultant asks the operations manager to explain all factory piping and air/fluid flow systems.

Assume they find areas that waste a lot of energy and retrofit the entire flow system. What about the size of the pumping equipment? Is it now too large for the process? Equipment is usually purchased to have the capacity to supply maximum demand. When that maximum demand is no longer needed, the motor should also be downsized, even if it has life in it. Motor choice, life, sizing, controls, maintenance, and associated electrical and mechanical elements interact intricately. Now, with a reduction in the requirement, the motors could be replaced with more efficient and smaller ones.

Over and above reduced pumping demand, equipment refurbishing can sometimes pay for itself within a few years. For example, bearing quality has improved over the years. That alone could generate significant savings.

The consultant asks about any furnaces in the factory, a major energy user. They mention that reverberatory furnaces to melt aluminum can be replaced by electric heaters with ceramic coatings. This is called isothermal melting, which saves energy and floor space. Solar heating is another major power source for industry, with falling prices and higher efficiency.

Production equipment lifecycle

Items used for production go through a lifecycle, and old equipment need not be thrown away. Before being declared a total waste, resources could be:

  • Recovered (brought out of retirement and used again)
  • Reused (moved from a main process to a backup process)
  • Repaired (just fixed with upgraded parts)
  • Remanufactured (reshaped for continued use)
  • Recycled (broken down into raw materials for further use)

The consultant asks about the age and life of all the production equipment. How many years of life does each piece of equipment have? What is the recycling program?

Energy sourcing

Whether for buildings, homes, or factories, energy sourcing will change greatly from centralized supply to distributed supply in the years ahead. The consultant recommends they look over the entire factory and ask where solar panels and mini wind-power generators might be installed. Additionally, they ask about any community energy grid projects in the region.

Furthermore, they ask about the vehicle fleet (including delivery trucks and forklifts) and internal-combustion-powered generators, cranes, and any material-moving equipment. Could they be powered by electricity? If so, they might be a potential power source as well.

Company incentives

After reviewing all these energy-saving measures, the consultant mentions that looking at energy waste isn't just about processes. It's about employee attention too. Issues of energy waste, transparency, collaboration, and adaptation must be inclusive of all front-line workers. They should be included and encouraged to form their own "Reinventing Fire Communities" so they can see waste firsthand. The consultant mentions that many companies have drawn employees' attention to energy waste by introducing competitions between divisions, comparing waste reduction percentages against current energy consumption benchmarks. A wide range of incentives, raffles, games, contests, and awards may also attract employee interest.

Community expansion

To ensure expertise develops as quickly and efficiently as possible, the factory energy efficiency consultant and the building consultant from the previous article discuss and collaborate on how their meetings with energy users went. They also keep the Rocky Mountain Institute in the loop. That way, everyone can learn from each other's experiences. For total transparency, their meeting reporting systems are interconnected.

With many successes in reducing energy waste in factories, buildings, and homes, they gather many testimonial letters about how they have helped individual families and companies. With those letters and experiences, the two fictional consultants decide to carefully expand their community and purpose, consulting for warehouses, stores, shopping malls, industrial park developers and operators (not involved in manufacturing), and distribution centers. For this expansion, they start looking for consultants as dedicated to reducing energy waste as they are.

Those projects are equally successful, so the consultants discuss whether the work could be monetized and open discussions about establishing a start-up company in the area.

This has been a theoretical case study largely informed by Reinventing Fire: Bold Business Solutions for the New Energy Era. But you can see the critical role open organization principles have played in this scenario. Energy waste is reduced. Companies and families save a great deal of money. Lifestyles have not changed, but quality of life has improved. And lastly, a potentially viable business model is created that could spread both nationally and possibly globally.


A beginner's guide to making a dark theme for a website

Sachin Samal | Tue, 09/06/2022

Having a dark theme for your website is a common feature these days. There are various ways to add a dark theme to your website, and in this article, I demonstrate a beginner-friendly way of programming a dark theme for the web. Feel free to explore, make mistakes, and, more importantly, learn by manipulating the code in your own way.

[Image: Sachin Samal, CC BY-SA 4.0]

Icons

I like to provide a visual way for my users to discover the dark mode feature. You can include an easy set of icons by inserting the Font Awesome link in the <head> element of your HTML.


<html lang="en">
<head>
  <title>Toggle - Dark Theme</title>
  <link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.0.7/css/all.css">
</head>

Inside your <body> tag, create a Font Awesome moon icon class, which you will switch to the Font Awesome sun icon class later using JavaScript. This icon allows users to switch from the default light theme to the dark theme and back again. In this case, you're changing from fa-moon while in the light theme to fa-sun while in the dark theme. In other words, the icon is always the opposite of the current mode.

<body>
  <div id="theme-btn" class="far fa-moon"></div>
</body>

Next, create a CSS class in your stylesheet. You'll append this using the JavaScript add() method to toggle between themes. The toggle() function adds or removes a class name from an element with JavaScript. This CSS code creates a changeTheme class, setting the background color to dark gray and the foreground color (that's the text) to light gray.

.changeTheme {
  background: #1D1E22;
  color: #eee;
}

Toggle between themes

To toggle the appearance of the theme button and to apply the changeTheme class, use the onclick(), toggle(), contains(), add(), and remove() JavaScript methods inside your <script> tag:

<script>
    //gets the button by ID from your HTML element
    const themeBtn = document.getElementById("theme-btn");
    //when you click that button
    themeBtn.onclick = () => {
    //the default class "fa-moon" switches to "fa-sun" on toggle
      themeBtn.classList.toggle("fa-sun");
    //after the switch on toggle, if your button contains "fa-sun" class
      if (themeBtn.classList.contains("fa-sun")) {
    //onclicking themeBtn, the changeTheme styling will be applied to the body of your HTML
        document.body.classList.add("changeTheme");
      } else {
    // onclicking themeBtn, applied changeTheme styling will be removed
        document.body.classList.remove("changeTheme");
      }
    }
</script>

The complete code:


<html lang="en">
<head>
  <title>Toggle - Dark Theme</title>
  <link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.0.7/css/all.css">
</head>
<style>
/* Toggle Dark Theme */
#theme-btn {
  font-size: 1.5rem;
  cursor: pointer;
  color: #596AFF;
}
#theme-btn:hover {
  color: #BB86FC;
}
.changeTheme {
  background: #1D1E22;
  color: #eee;
}
</style>
<body>
  <div id="theme-btn" class="far fa-moon"></div>
<script>
const themeBtn = document.getElementById("theme-btn");
themeBtn.onclick = () => {
  themeBtn.classList.toggle("fa-sun");
  if (themeBtn.classList.contains("fa-sun")) {
    document.body.classList.add("changeTheme");
  } else {
    document.body.classList.remove("changeTheme");
  }
}
</script>
</body>
</html>

Complete themes

The code so far may not fully switch the theme of a complex website. For instance, your site might have a header, a main, and a footer, each with multiple divs and other elements. In that case, you could create a standard dark theme CSS class and append it to the desired web parts.

Get familiar with your browser's console

To inspect your browser's console, on the webpage where you run this code, press Ctrl+Shift+I or right-click and select the Inspect option.

When you select Elements in the console and toggle your theme button, the browser gives you an indication of whether or not your JavaScript is working. In the console, you can see that the CSS class you appended using JavaScript is added and removed as you toggle.

[Image: Sachin Samal, CC BY-SA 4.0]

Add a navigation and card section to see how adding the CSS class name on an HTML element with JavaScript works.

Example code for a dark theme

Here's some example code:


<html lang="en">
<head>
  <title>Toggle - Dark Theme</title>
  <link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.0.7/css/all.css">
  <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css">
</head>

<style>
  #header {
    width: 100%;
    box-shadow: 0px 4px 10px rgba(52, 72, 115, 0.35);
  }
  nav {
    width: 100%;
    display: flex;
    flex-wrap: wrap;
    justify-content: space-between;
    align-items: center;
  }
  .nav-menu {
    max-width: 100%;
    display: flex;
    align-items: center;
    justify-content: space-between;
  }
  .nav-menu li {
    margin: 0 0.5rem;
    list-style: none;
    text-decoration: none;
  }
  .nav-menu li a {
    text-decoration: none;
    color: inherit;
  }
  h1 {
    text-align: center;
  }

  /* Toggle Dark Theme */
  #theme-btn {
    font-size: 1.5rem;
    cursor: pointer;
    color: #596AFF;
  }

  #theme-btn:hover {
    color: #BB86FC;
  }
  .changeTheme {
    background: #1D1E22;
    color: #eee;
  }
  /*-- Card Section --*/
  .card-section .card {
    border: none;
  }
  .card-body {
    box-shadow: rgba(50, 50, 93, 0.25) 0px 2px 5px -1px,
      rgba(0, 0, 0, 0.3) 0px 1px 3px -1px;
  }
  .card-body ul {
    margin: 0;
    padding: 0;
  }
  .card-body ul li {
    text-decoration: none;
    list-style-type: none;
    text-align: center;
    font-size: 14px;
  }
</style>

<body>
  <header id="header">
    <nav>
      <div id="theme-btn" class="far fa-moon"></div>
      <div class="navbar">
        <ul class="nav-menu">
          <li class="nav-item">
            <a class="nav-link fas fa-home" href="# "> Home
            </a>
          </li>
        </ul>
      </div>
    </nav>
  </header>
  <section>
    <h1>Beginner Friendly Dark Theme</h1>
  </section>

  <section class="card-section mt-3">
    <div class="container text-center">
      <div class="row">

        <div class="col-md-4 col-sm-12">
          <div class="card p-1 dark-theme">
            <div class="card-body">
              <h6>What is Lorem Ipsum?</h6>
              <ul>
                <li>Sed sit amet felis tellus.</li>
                <li>Sed sit amet felis tellus.</li>
              </ul>
            </div>
          </div>
        </div>

        <div class="col-md-4 col-sm-12">
          <div class="card p-1 dark-theme">
            <div class="card-body">
              <h6>What is Lorem Ipsum?</h6>
              <ul>
                <li>Sed sit amet felis tellus.</li>
                <li>Sed sit amet felis tellus.</li>
              </ul>
            </div>
          </div>
        </div>

        <div class="col-md-4 col-sm-12">
          <div class="card p-1 dark-theme">
            <div class="card-body">
              <h6>What is Lorem Ipsum?</h6>
              <ul>
                <li>Sed sit amet felis tellus.</li>
                <li>Sed sit amet felis tellus.</li>
              </ul>
            </div>
          </div>
        </div>

      </div>
    </div>
  </section>
  <script>
    const themeBtn = document.getElementById("theme-btn");
    const darkTheme = document.querySelectorAll(".dark-theme");
    themeBtn.onclick = () => {
      themeBtn.classList.toggle("fa-sun");
      if (themeBtn.classList.contains("fa-sun")) {
        document.body.classList.add("changeTheme");
        for (const theme of darkTheme) {
          theme.style.backgroundColor = "#1D1E22";
          theme.style.color = "#eee";
        }
      } else {
        document.body.classList.remove("changeTheme");
        for (const theme of darkTheme) {
          theme.style.backgroundColor = "#eee";
          theme.style.color = "#1D1E22";
        }
      }
    }
  </script>
  </body>
</html>

The JavaScript for...of loop applies the .dark-theme class styling properties to each card on the page, regardless of its position. It applies the theme to all web parts selected with querySelectorAll() in the <script> tag. Without this, the font and background color of the cards remain unchanged on toggle.

for (const theme of darkTheme) {
  theme.style.backgroundColor = "#eee";
  theme.style.color = "#1D1E22";
}

If you set background-color or any color property on the page itself, your dark theme will not work as expected, because that preset CSS style overrides the class appended later. If you set the HTML font color to black, it stays black in the dark theme, which you don't want. For example:

* {
  color: #000;
}

By adding this code in your CSS or in a <style> tag, you set the font color to black for all the HTML on the page. When you toggle the theme, the font stays black, which doesn't contrast with the dark background. The same goes for the background color. This is common if you've used, for instance, Bootstrap to create the card sections in the code above. With Bootstrap, the styling for each card comes from Bootstrap's card.scss, which sets background-color to white; that rule is more specific than a general rule for the entire page, so it takes precedence.

You must target such cases by selecting the elements with JavaScript and setting your desired background-color or color through each element's style.backgroundColor and style.color properties.

Dark theme with multiple web parts

Here's an example of a personal portfolio with a dark theme feature that may help you understand how to enable dark mode when you have multiple web parts.

These steps are a beginner-friendly approach to programming dark themes. Whether you target your HTML element in JavaScript by using getElementById() or getElementsByClassName(), or with querySelector() or querySelectorAll(), depends on the content of your website and how you want to program as a developer. The same is true for the choice of JavaScript if…else statements and loops. Feel free to choose your own method. As always, consider the constraints and flexibility of your programming approach. Good luck!


Saving home energy using open organization principles

Ron McFarland | Mon, 09/05/2022

There are (at least) two ways to look at the move to more green power generation: the supply side and the demand side. The supply side emphasizes a transition from fossil fuels to renewable energy and carbon-free energy. The demand side, however, looks at how to reduce energy use, preferably without hurting the current standard of living.

This article will focus on making changes from the demand side.

The business opportunities in reducing energy demand

Imagine two salespeople who sell electricity for a utility company. They decide they don't want to promote or sell electricity use anymore because of its negative environmental and climate change impact. They retire and then decide they want to work together to assemble an Open Organization Community to reduce energy consumption in a way that does not adversely impact people's standard of living. Simply put, they want to consult and collaborate with energy users on reducing waste with the indirect goal of eliminating the use of fossil fuels as an energy source by 2050.

These retired salespeople happened to read my article "A 5-step plan to encourage your team to make changes on your project" and see my presentation Creating change. After thinking about that content, they decide to work together quietly at first, as they know many of their friends at the utility company might resist and create barriers to their project. The project is very fragile initially, and the utility company is very powerful and wealthy, so they decide to start by visiting a few trusted major electricity users and expand from there.

The imaginary energy efficiency consultants also read the book Reinventing Fire: Bold Business Solutions for the New Energy Era by Amory B. Lovins. The book offers methods to eliminate the use of fossil fuels by 2050, thus "reinventing fire": moving from old power sources to new ones.

Reinventing Fire doesn't consider environmental approaches and presents very few legislative approaches. Instead, the authors believe there are significant business opportunities in the energy use and generation sectors, and businesses can make huge profits through energy waste reduction and efficient energy use. Applying open organization principles to how energy is consumed and produced, now and in the future, could be very useful to any organization.

Although my imaginary consultants are interested in regional energy efficiency more than profits, they decide to include and commission Amory Lovins and his organization, Rocky Mountain Institute, as collaborators and advisors for this community on reducing energy waste. They call their project The Reinventing Fire Community.

One of these two imaginary experts specialized in selling electricity to factories. The other specialized in selling electricity to large and small buildings and single-unit homes. They know that one of the big problems with energy costs is that most people don't think about them. And that's what they want to change.

[ Related read How open organizations can harness energy disruptions ]

In many companies, energy is considered a fixed overhead cost, so it isn't tracked. Individuals often consider other expenses, like house or car payments, far more important. However, if monitoring of all their energy use in every specific location were available, instances of waste in each room and function would be transparent and easier to reduce.

In this article, I will discuss how the consultant who sold electricity to buildings and homes uses open organization principles. In a future article, I will discuss the factory consultant's activities.

Building and home energy consultant

This consultant wants to go to large and small non-industrial businesses, homes, and housing communities (starting with their previous electricity users) to offer suggestions on reducing yearly energy expenses by at least 10-50%. If equipment investment is required, they promise that the equipment is sure to pay for itself in 3-5 years of energy expense savings.

There are some conditions, however, for working together:

  • The facility owners must be as open as possible about their current energy use and expenses.
  • The consultant will not give anyone the information they receive except for trusted consultants they work with to generate the very best proposals.
  • There's no charge for the consulting, as they only want to reduce energy use in the community, and at this stage, they don't know how valuable that will be.
  • The consultant would like a performance report and a general testimonial letter from the user at the end of 2 years explaining how much the energy user's bill has been reduced.
Energy use and waste

How can the home and building consultant put all the open organization principles of Community, Transparency, Collaboration, Inclusion, and Adaptability to further use?

First, the building energy consultant makes an appointment with their most trusted electricity user and explains that they want the user to save money. To make the problem transparent, the consultant presents the amount of primary energy use in US buildings now and points out that if nothing changes, it will get worse in the years ahead:

[Chart from the book Reinventing Fire by Amory B. Lovins (Ron McFarland, CC BY-SA 4.0)]

The above chart, from Reinventing Fire, shows that between 1980 and 2010, US residential and commercial building energy use grew by 54%. Between 2010 and 2050, it is projected to grow by another 33%. After looking at this chart, the electricity user is convinced that they should study energy waste and agrees to the terms of the consultancy.

The consultant then explains that building energy use falls into six groups, with the first four using the most energy:

  1. Space heating
  2. Water heating
  3. Space cooling
  4. Lighting
  5. Electronics
  6. Appliances

On average, improvement in existing buildings can lead to a reduced energy use bill of 30-40%, with payback in 3-4 years. That's the goal they are targeting, depending on where the electricity user's building uses the most energy. Each function's energy use must be measured to determine where the greatest waste is. The problems must be made transparent; then, they can be addressed and controlled.

Using Reinventing Fire as a model, they explain just how much loss and waste there is in energy from power utility to user. Much of building energy is wasted, so providing reduction methods could save a lot of money.

[Chart from the book Reinventing Fire by Amory B. Lovins (Ron McFarland, CC BY-SA 4.0)]

The authors explain:

"The six key energy uses are different in residential and commercial buildings. Electrical uses entail far more primary than delivered energy because of the roughly threefold losses in converting power-plant fuel to electricity and the 7% average losses in delivering electricity to your meter (plus a few percentage points more lost in the building's wiring). Primary energy use drives your building's climate impact; delivered energy drives its operating costs."

The consultant also shows the electricity user the data above from Reinventing Fire, which confirms that it's worth the investment in energy-saving measures. In addition to direct cost reduction, users also benefit from increased resiliency and adaptability to sudden power failures or disruption—not to mention the reduction of greenhouse gases. If the investment amount is a concern, the consultant could help the user find financing with favorable incentives that promote climate change mitigation projects.

The consultant wants to collaborate in several areas, including wall insulation, window sealing and material, air moisture controls, energy use monitoring and controls for comparisons with similar structures, and electrification of both housing and transportation. One avenue for this assessment is to use current smartphone apps that can diagnose building energy use and recommend what to install to mitigate waste.

Repairing lighting and air conditioner controls, installing weather seals on all doors, and adding occupancy sensors can quickly generate electricity savings. Lighting occupancy sensors reduce waste from lights being on when no one is in a room. These savings could make a business extremely profitable, even if overall business is slow.

Finding the waste

In the interest of transparency, the consultant presents new building technologies now available. The electricity user might not want or need them all, but this exposure to information leads to collaboration and adaptability where appropriate. They review these categories:

  • Smart windows: Windows that darken in response to a small electric current or heat, such as Pleotint, RavenBrick, "AdaptivE," or thermochromic windows, let in as much as five times more solar heat in the winter and less on hot summer days, which can reduce heating and cooling expenses by 30%.
     
  • Enhanced evaporation: Spraying water into hot air or drying humid air enhances cooling.
     
  • Insulation: Extraordinarily light and fluffy silica-based gels are available and are six times better than simple plastic foam. They’re also thinner. This insulation could be placed on window frames or thin walls and even painted on surfaces to act as a thermal barrier.
     
  • Phase-change materials: Heat-absorbing materials that melt and then harden with temperature changes influence outside-to-inside heat differences. They slow temperature increases during the day by absorbing heat and storing it. Then, the heat is slowly released at night, so indoor daytime and nighttime temperatures are closer together.
     
  • Light-emitting diodes (LEDs): Next-generation LEDs will replace semiconductors with organic compounds, called OLEDs. They are already in cellphones and cameras. Users can also consider where the light is shining. Is it exactly where and when it is needed? Can excess lighting be eliminated? Could reflection devices be applied to direct light where needed without increasing electricity use? Can it be turned off automatically when not in use?
     
  • Efficient rotors:  Pumps, fans, blowers, turbines, and propellers are a significant source of energy use. Improving their shape dramatically reduces energy use for the same function. For example, these improvements can reduce energy use by 30% in computer cooling fans.
     
  • Appliances: Stove surfaces that hold their shape for better heat transfer, smarter controls, and heat-trapping pots are being developed to use less electricity for the same function.
     
  • Heat pumps: Miniature heat pumps that both heat and cool are in development. These are almost three times more effective than standard water heaters and water-heating heat pumps.
     
  • Sensors: The consultant can look around the electricity user's buildings to see where sensors could be located for energy-use measuring and control.
     
  • Power source: Should other energy sources be explored? Options could include electricity from a mini power grid in the neighborhood, solar panels on a roof or an overhang, or trading in an internal combustion engine vehicle for an electric car. These actions could make the user more of an electricity supplier than just a consumer.

As our consultant reviews these suggestions while looking around the electricity user's buildings, there may be cases that require specialist support. This is where inclusivity comes in. They want to install the best and most cost-effective equipment, which might require further technical support. In this case, collaboration with the Rocky Mountain Institute will be important.

Right-sizing

Other reductions can be explored by looking at the total energy requirements after installing advanced materials. With the advantages of advanced insulation and superwindows, the term "passive heating" comes up. It means that no energy is required to keep a room comfortable. The reductions in heating equipment costs may balance the cost of any superwindow, superinsulation, or ventilation installations. Together, these measures could save 99% of space- and water-heating energy.

Superwindows let in more light but block heat from entering, keeping the heat in during the winter and keeping sunlight heat out in the summer. That reduces the need for powerful energy-consuming air conditioners, so a smaller cooling and heating system can be installed.

In this theoretical example, the energy user could then be certified as having an advanced passive heating system. Internationally, there is a passive structure certification system called Passivhaus, and they could also explore US LEED Certification. In Japan, appliance makers have "Top Runner" policies for their products' efficiency, which is promoted with their sales advertising.

The consultant mentions that even existing and functioning buildings and equipment should be replaced if the replacement can greatly reduce the costs of electricity and natural gas at the current prices within 3-5 years. Astutely timed energy retrofits can piggyback on other modifications and changes to reduce upfront capital costs. Even simple improvements like light-colored roofs and paving, plus shade trees and revegetation to help bounce solar heat away, could be energy saving.

After these recommendations are implemented, the building owners are so happy with the savings over 2 years that they write a detailed report and letter of recommendation about the consultant's value. They also recommend their work to anyone who uses a lot of electricity.

How do open organization principles apply?
  • The consultant looked at all the electricity user's building inefficiencies with transdisciplinary insight. They looked for inclusive support when needed.
  • They made energy use more transparent through sensors, monitors, and telecommunications to speed up diagnoses and improve execution.
  • They gave suggestions on how to obtain easy financing to adapt and reduce upfront capital costs if needed.
  • They collaborated on and helped train the electricity user on energy productivity. They introduced a certification system that might help find further opportunities for savings.
  • They mentioned that the buildings could not only reduce energy use but also become energy generators through solar and possibly wind technology installation, benefiting their community.

My next article will consider how the factory consultant's activities might look using open organization principles.

The move to more green power generation requires making changes to demand by looking at how to reduce energy use, preferably without hurting the current standard of living.

Image by:

27707 via Pixabay, CC0. Modified by Jen Wike Huger.

The Open Organization Sustainability What to read next This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License. Register or Login to post a comment.

3 things to know about planning for OTA updates in your homelab

Mon, 09/05/2022 - 15:00
Alan Smithee

Updates to a system used to be relatively straightforward. When a developer needed to revise something they'd already distributed to the public, they released an updater for people to run. Users would run the updater, allowing old files to be replaced by new files and new files to be added. Even with these "relatively straightforward" updates, though, there was a catch. What happens when a user's installation is in an unexpected state? What happens when an upgrade is interrupted?

These questions are just as relevant now, when all kinds of devices are online and sometimes in need of important security updates. Many updates today are delivered wirelessly, over-the-air (OTA), and a poor connection, sudden loss of signal, or loss of power can be disastrous to what should be a minor update. These are the top three strategies you need to consider when planning to deliver over-the-air updates.

1. Verification

The TCP protocol has a lot of verification built in, so it's usually true that when you send packets to a device, you can be confident that each packet has been received intact. However, TCP can't report errors on something it doesn't know about, so it's up to you to verify things like:

  • Have you sent all files required for the update? A device can't receive what wasn't sent in the first place.

  • Are the files received the same as the files you sent? At the very least, check SHA sums to verify file integrity (a minimal checksum sketch follows this list).

  • When possible, use digital signing to ensure that a file is from a trusted source.

  • You must verify that the device is able to apply an update before you allow the update to begin. Check permissions and battery state before committing to an update, and ensure that your update process overrides any unexpected user events, like a scheduled reboot or hibernation.

  • Finally, you must verify that an update that claims to have completed successfully has actually completed. Check file locations and integrity on the target device before allowing the update to officially be marked as resolved by the system.
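
To make the checksum step concrete, here's a minimal Groovy sketch of the integrity check. It assumes the update and its published SHA-256 checksum have already been downloaded; the file names and paths here are hypothetical:

import java.security.MessageDigest

// Compute the SHA-256 digest of a file, 4 KB at a time
String sha256(File f) {
    def digest = MessageDigest.getInstance('SHA-256')
    f.eachByte(4096) { buf, bytesRead ->
        digest.update(buf, 0, bytesRead)
    }
    return digest.digest().encodeHex().toString()
}

def update   = new File('/tmp/update.bin')               // hypothetical path
def expected = new File('/tmp/update.bin.sha256').text.trim()

if (sha256(update) == expected) {
    println 'Checksum matches; the update may proceed.'
} else {
    println 'Checksum mismatch; refusing to apply the update.'
}

The same shape works for digital signatures: verify first, and only then hand the files to the updater.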

2. Fallback and failstates

The worst-case scenario for an update is that a device is left in a broken state, such that it can't even be used to continue an aborted update. In that scenario, the updater files exist on the target device, but the process has been interrupted. This can leave a device in an unknown state, where some files have been replaced with updated versions, while others haven't been touched. In the worst case, files that have been updated are incompatible with files that haven't yet been updated, and so the device cannot function as expected.

There are a few strategies to handle this. The initial update step could be to install a special boot image or environment dedicated to completing the update, and to set a "flag" on the system establishing that an update is in progress. This ensures that even when a device suddenly loses power in the middle of an update, the update process starts fresh during the next boot. The flag is removed, signaling a successful update, only once the update has been verified.
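
Here's a minimal Groovy sketch of that flag mechanism, assuming a hypothetical applyUpdate() routine that replaces and verifies files; the flag path is equally hypothetical:

def flag = new File('/var/lib/updater/in-progress')               // hypothetical flag location
def applyUpdate = { println 'replacing and verifying files...' }  // stand-in for the real work

if (flag.exists()) {
    println 'Interrupted update detected; starting it over from scratch.'
}

flag.parentFile.mkdirs()
flag.text = new Date().toString()   // raise the flag before touching any files

try {
    applyUpdate()
    flag.delete()                   // lower the flag only after verification succeeds
} catch (Exception e) {
    // Leave the flag in place so the next boot retries the update
    println "Update failed and will be retried: ${e.message}"
}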

A special boot image may not be feasible or necessary, depending on the security policy of the target device and what you're updating. The principle remains the same, though. Once it has been started, an update must establish an environment in which the pending update is the only way forward until it's resolved.

Up until an update has been granted permission to start, though, a user (when there is one) should have the ability to delay or ignore the update.

3. Additive

In many edge and IoT devices, the foundation of the target device is immutable. Updates only add to a known state of a system. Projects like Fedora Silverblue are demonstrating that this model can work across many markets, so that luxury might become commonplace. Until then, though, part of successfully applying an update is understanding the environment you're about to affect.

You don't need an immutable core to apply additive updates, though. You may be able to architect a system around the same concept, treating an update as a way to add libraries or packages alongside the old versions rather than revising them. As the final step of such an update, an executable with updated paths is swapped in, which is the only actual revision you make.
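
As a sketch of that idea, the following Groovy snippet installs a new release next to the old one and then atomically repoints a "current" symlink; the directory layout is hypothetical:

import java.nio.file.Files
import java.nio.file.Paths
import static java.nio.file.StandardCopyOption.ATOMIC_MOVE

def releases   = Paths.get('/opt/app/releases')          // hypothetical layout
def newRelease = releases.resolve('2.0.0')
Files.createDirectories(newRelease)                      // new files land here; old ones stay put

def current = Paths.get('/opt/app/current')
def tmpLink = Paths.get('/opt/app/current.tmp')

Files.deleteIfExists(tmpLink)
Files.createSymbolicLink(tmpLink, newRelease)            // build the new pointer off to the side
Files.move(tmpLink, current, ATOMIC_MOVE)                // the swap is the single real revision

Until that final move, nothing the running system depends on has changed, so an interruption leaves the device on the old, working release.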

OTA updates

The world is increasingly wireless. For mobile phones, IoT devices, and edge computing, over-the-air updates are often the only option. Implementing an OTA update policy takes careful planning and careful accounting for improbable scenarios. You know your target devices best, so map out your update schema well before you begin coding so that your initial architecture is designed for robust and safe OTA.

Define your over-the-air update plan for mobile phones, IoT devices, and edge computing before you even start coding your app.

Image by:

Image from Unsplash.com, Creative Commons Zero 

Edge computing Internet of Things (IoT) What to read next This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License. Register or Login to post a comment.

Infuse your awk scripts with Groovy

Sat, 09/03/2022 - 15:00
Chris Hermansen

Recently I wrote a series on using Groovy scripts to clean up the tags in my music files. I developed a framework that recognized the structure of my music directory and used it to iterate over the content files. In the final article of that series, I separated this framework into a utility class that my scripts could use to process the content files.

This separate framework reminded me a lot of the way awk works. For those of you unfamiliar with awk, you might benefit from Opensource.com's eBook, A practical guide to learning awk.

I have used awk extensively since 1984, when our little company bought its first "real" computer, which ran System V Unix. For me, awk was a revelation: It had associative memory (think arrays indexed by strings instead of numbers). It had regular expressions built in, seemed designed to deal with data, especially in columns, and was compact and easy to learn. Finally, it was designed to work in Unix pipelines, reading its data from standard input or files and writing to output, with no ceremony required; data just appeared in the input stream.

To say that awk has been an essential part of my day-to-day computing toolkit is an understatement. And yet there are a few things about how I use awk that leave me unsatisfied.

Probably the main issue is that awk is good at dealing with data presented in delimited fields but curiously bad at handling comma-separated-value (CSV) files, where a quoted field can contain the field delimiter. Also, regular expressions have moved on since awk was invented, and needing to remember two sets of regular expression syntax rules is not conducive to bug-free code. One set of such rules is bad enough.

Because awk is a small language, it's missing some things that I sometimes find useful, like a richer assortment of base types, structures, switch statements, and so on.

In contrast, Groovy has all of these good things: access to the OpenCSV library, which facilitates dealing with CSV files, Java regular expressions and great matching operators, a rich assortment of base types, classes, switch statements, and more.

What Groovy lacks is the simple pipeline-oriented view of data as an incoming stream and processed data as an outgoing stream.

But my music directory processing framework made me think, maybe I can create a Groovy version of awk's "engine". That's my objective for this article.

Install Java and Groovy

Groovy is based on Java and requires a Java installation. Recent, decent versions of both Java and Groovy might be in your Linux distribution's repositories. Groovy can also be installed following the instructions on the Groovy homepage. A nice alternative for Linux users is SDKMan, which can be used to get multiple versions of Java, Groovy, and many other related tools. For this article, I'm using SDKMan's releases of:

  • Java: version 11.0.12-open of OpenJDK 11;
  • Groovy: version 3.0.8.
Creating awk with Groovy

The basic idea here is to encapsulate the complexities of opening a file or files for processing, splitting the line into fields, and providing access to the stream of data in three parts:

  • Before any data is processed
  • On each line of data
  • After all data is processed

I'm not going for the general case of replacing awk with Groovy. Instead, I'm working toward my typical use case, which is:

  • Use a script file rather than having the code on the command line
  • Process one or more input files
  • Set my default field delimiter to | and split lines read on that delimiter
  • Use OpenCSV to do the splitting (what I can't do in awk)
The framework class

Here's the "awk engine" in a Groovy class:

 1 @Grab('com.opencsv:opencsv:5.6')
 2 import com.opencsv.CSVReader
 3 public class AwkEngine {
 4 // With admiration and respect for
 5 //     Alfred Aho
 6 //     Peter Weinberger
 7 //     Brian Kernighan
 8 // Thank you for the enormous value
 9 // brought to my job by the awk
10 // programming language
11 Closure onBegin
12 Closure onEachLine
13 Closure onEnd

14 private String fieldSeparator
15 private boolean isFirstLineHeader
16 private ArrayList<String> fileNameList
   
17 public AwkEngine(args) {
18     this.fileNameList = args
19     this.fieldSeparator = "|"
20     this.isFirstLineHeader = false
21 }
   
22 public AwkEngine(args, fieldSeparator) {
23     this.fileNameList = args
24     this.fieldSeparator = fieldSeparator
25     this.isFirstLineHeader = false
26 }
   
27 public AwkEngine(args, fieldSeparator, isFirstLineHeader) {
28     this.fileNameList = args
29     this.fieldSeparator = fieldSeparator
30     this.isFirstLineHeader = isFirstLineHeader
31 }
   
32 public void go() {
33     this.onBegin()
34     int recordNumber = 0
35     fileNameList.each { fileName ->
36         int fileRecordNumber = 0
37         new File(fileName).withReader { reader ->
38             def csvReader = new CSVReader(reader,
39                 this.fieldSeparator.charAt(0))
40             if (isFirstLineHeader) {
41                 def csvFieldNames = csvReader.readNext() as
42                     ArrayList<String>
43                 csvReader.each { fieldsByNumber ->
44                     def fieldsByName = csvFieldNames.
45                         withIndex().
46                         collectEntries { name, index ->
47                             [name, fieldsByNumber[index]]
48                         }
49                     this.onEachLine(fieldsByName,
50                             recordNumber, fileName,
51                             fileRecordNumber)
52                     recordNumber++
53                     fileRecordNumber++
54                 }
55             } else {
56                 csvReader.each { fieldsByNumber ->
57                     this.onEachLine(fieldsByNumber,
58                         recordNumber, fileName,
59                         fileRecordNumber)
60                     recordNumber++
61                     fileRecordNumber++
62                 }
63             }
64         }
65     }
66     this.onEnd()
67 }
68 }

While this looks like a fair bit of code, many of the lines are continuations of longer lines that have been split (for example, normally you would combine lines 38 and 39, lines 41 and 42, and so on). Let's look at this line by line.

Line 1 uses the @Grab annotation to fetch the OpenCSV library version 5.6 from Maven Central. No XML required.

In line 2, I import OpenCSV's CSVReader class.

In line 3, just as with Java, I declare a public utility class, AwkEngine.

Lines 11-13 define the Groovy Closure instances used by the script as hooks into this class. These are "public by default," as with any Groovy class, but under the hood Groovy makes the fields private and routes external references through the getters and setters it generates. I'll explain that further in the sample scripts below.

Lines 14-16 declare the private fields: the field separator, a flag to indicate whether the first line of a file is a header, and a list of the file names.

Lines 17-31 define three constructors. The first receives just the command-line arguments. The second also receives the field separator character, and the third additionally receives the flag indicating whether the first line is a header.

Lines 32-67 define the engine itself, as the go() method.

Line 33 calls the onBegin() closure (equivalent to the awk BEGIN {} statement).

Line 34 initializes the recordNumber for the stream (equivalent to the awk NR variable) to 0 (note I am doing 0-origin here rather than the awk 1-origin).

Lines 35-65 use each {} to loop over the list of files to be processed.

Line 36 initializes the fileRecordNumber for the file (equivalent to the awk FNR variable) to 0 (0-origin, not 1-origin).

Lines 37-64 get a Reader instance for the file and process it.

Lines 38-39 get a CSVReader instance.

Line 40 checks to see whether the first line is being treated as a header.

If the first line is being treated as a header, then lines 41-42 get the list of field header names from the first record.

Lines 43-54 process the rest of the records.

Lines 44-48 copy the field values into the map of name:value.

Lines 49-51 call the onEachLine() closure (equivalent to what appears in an awk program between BEGIN {} and END {}, though no pattern can be attached to make the execution conditional), passing in the map of name:value, the stream record number, the file name and the file record number.

Lines 52-53 increment the stream record number and file record number.

Otherwise:

Lines 56-62 process the records.

Lines 57-59 call the onEachLine() closure, passing in the array of field values, the stream record number, the file name and the file record number.

Lines 60-61 increment the stream record number and file record number.

Line 66 calls the onEnd() closure (equivalent to the awk END {}).

That's it for the framework. Now you can compile it:

$ groovyc AwkEngine.groovy

A couple of comments:

If an argument is passed in that is not a file, the code fails with a standard Groovy stack trace, which looks something like this:

Caught: java.io.FileNotFoundException: not-a-file (No such file or directory)
java.io.FileNotFoundException: not-a-file (No such file or directory)
at AwkEngine$_go_closure1.doCall(AwkEngine.groovy:46)

OpenCSV tends to return String[] values, which are not as convenient as List values in Groovy (for example there is no each {} defined for an array). Lines 41-42 convert the header field value array into a list, so perhaps fieldsByNumber in line 57 should also be converted into a list.
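
The coercion itself is a one-liner. Here's a quick, self-contained demonstration using a made-up row:

String[] row = ['alpha', 'beta', 'gamma']
def fields = row as List<String>
fields.each { println it }    // each {} and the other List methods are now available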

Using the framework in scripts

Here's a very simple script using AwkEngine to examine a file like /etc/group, which is colon-delimited and has no header:

1 def ae = new AwkEngine(args, ':')
2 int lineCount = 0

3 ae.onBegin = {
4    println "in begin"
5 }

6 ae.onEachLine = { fields, recordNumber, fileName, fileRecordNumber ->
7    if (lineCount < 10)
8       println "fileName $fileName fields $fields"
9       lineCount++
10 }

11 ae.onEnd = {
12    println "in end"
13    println "$lineCount line(s) read"
14 }

15 ae.go()

Line 1 calls the two-argument constructor, passing in the argument list and the colon as delimiter.

Line 2 defines a script top-level variable, lineCount, used to record the count of lines read (note that Groovy closures don't require variables defined external to the closure to be final).

Lines 3-5 define the onBegin() closure, which just prints the string "in begin" on standard output.

Lines 6-10 define the onEachLine() closure, which prints the file name and the fields for the first 10 lines and in any case increments the line count.

Lines 11-14 define the onEnd() closure, which prints the string "in end" and the count of the number of lines read.

Line 15 runs the script using the AwkEngine.

Run this script as follows:

$ groovy Test1Awk.groovy /etc/group
in begin
fileName /etc/group fields [root, x, 0, ]
fileName /etc/group fields [daemon, x, 1, ]
fileName /etc/group fields [bin, x, 2, ]
fileName /etc/group fields [sys, x, 3, ]
fileName /etc/group fields [adm, x, 4, syslog,clh]
fileName /etc/group fields [tty, x, 5, ]
fileName /etc/group fields [disk, x, 6, ]
fileName /etc/group fields [lp, x, 7, ]
fileName /etc/group fields [mail, x, 8, ]
fileName /etc/group fields [news, x, 9, ]
in end
78 line(s) read
$

Of course the .class files created by compiling the framework class must be on the classpath for this to work. Naturally, you could use jar to package up those class files.

I really like Groovy's support for the delegation of behavior, which requires various shenanigans in other languages. For many years, Java required anonymous classes and quite a bit of extra code. Lambdas have gone a long way toward fixing this, but they still cannot refer to non-final variables outside their scope.
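
A tiny example makes the difference clear. Unlike a Java lambda, a Groovy closure can update a local variable defined outside itself:

int count = 0
def bump = { count++ }    // the closure captures count by reference
5.times(bump)
println count             // prints 5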

Here's another, more interesting script that is very reminiscent of my typical use of awk:

1 def ae = new AwkEngine(args, ';', true)
2 ae.onBegin = {
3    // nothing to do here
4 }

5 def regionCount = [:]
6 ae.onEachLine = { fields, recordNumber, fileName, fileRecordNumber ->
7    regionCount[fields.REGION] =
8       (regionCount.containsKey(fields.REGION) ?
9       regionCount[fields.REGION] : 0) +
10      (fields.PERSONAS as Integer)
11 }

12 ae.onEnd = {
13    regionCount.each { region, population ->
14       println "Region $region population $population"
15    }
16 }

17 ae.go()

Line 1 calls the three-argument constructor, recognizing that this is a "true CSV" file with the header on the first line. Because it's a Spanish-language file, where the comma is used as the decimal "point," the standard delimiter is the semicolon.

Lines 2-4 define the onBegin() closure which in this case doesn't do anything.

Line 5 defines an (empty) LinkedHashMap, which you will fill with String keys and Integer values. The data file comes from Chile's most recent census, and in this script you calculate the number of people in each region of Chile.

Lines 6-11 process the lines in the file (there are 180,500, including the header). Note that in this case, because you are defining line 1 as the CSV column headers, the fields parameter is going to be an instance of LinkedHashMap.

Lines 7-10 increment the regionCount map, using the value in the field REGION as the key and the value in the field PERSONAS as the value. Note that, unlike awk, Groovy won't let you refer to a non-existent map entry on the right-hand side and expect a blank or zero value to materialize.
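
If you'd rather have awk-like behavior, Groovy's Map.withDefault offers one way to get it. Here's a small sketch with made-up data:

def regionCount = [:].withDefault { 0 }
regionCount['R1'] += 10    // the missing entry materializes as 0 first
regionCount['R1'] += 5
println regionCount        // [R1:15]

With that change, the body of onEachLine() could shrink to a single statement along the lines of regionCount[fields.REGION] += (fields.PERSONAS as Integer).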

Lines 12-16 print out the population by region.

Line 17 runs the script on the AwkEngine instance.

Run this script as follows:

$ groovy Test2Awk.groovy ~/Downloads/Censo2017/ManzanaEntidad_CSV/Censo*csv
Region 1 population 330558
Region 2 population 607534
Region 3 population 286168
Region 4 population 757586
Region 5 population 1815902
Region 6 population 914555
Region 7 population 1044950
Region 8 population 1556805
Region 16 population 480609
Region 9 population 957224
Region 10 population 828708
Region 11 population 103158
Region 12 population 166533
Region 13 population 7112808
Region 14 population 384837
Region 15 population 226068
$

That's it. For those of you who love awk and yet would like a little more, I hope you enjoy this Groovy approach.

Awk and Groovy complement each other to create robust, useful scripts.

Image by:

Opensource.com

Java What to read next This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License. Register or Login to post a comment.

How to display the presence and absence of nth-highest group-wise values in SQL

Fri, 09/02/2022 - 15:00
Mohammed Kamil Khan

While skimming through SQL to prepare for interviews, I often come across this question: Find the employee with the highest (or second-highest) salary by joining a table containing employee information with another that contains department information. This raises a further question: What about finding the employee who earns the nth-highest salary department-wide?


Now I want to pose a more complex scenario: What will happen when a department doesn't have an employee earning the nth-highest salary? For example, a department with only two employees will not have an employee earning the third-highest salary.

Here's my approach to this question:

Create department and employee tables

I create a table that includes fields such as dept_id and dept_name.

CREATE TABLE department (
    dept_id INT,
    dept_name VARCHAR(60)
);

Now I insert various departments into the new table.

INSERT INTO department (dept_id,dept_name)
VALUES (780,'HR');
INSERT INTO department (dept_id,dept_name)
VALUES (781,'Marketing');
INSERT INTO department (dept_id,dept_name)
VALUES (782,'Sales');
INSERT INTO department (dept_id,dept_name)
VALUES (783,'Web Dev');

Image by:

Figure 1. The department table (Mohammed Kamil Khan, CC BY-SA 4.0)

Next, I create another table incorporating the fields first_name, last_name, dept_id, and salary.

CREATE TABLE employee (
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    dept_id INT,
    salary INT
);

Then I insert values into the table:

INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Sam','Burton',781,80000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Peter','Mellark',780,90000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Happy','Hogan',782,110000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Steve','Palmer',782,120000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Christopher','Walker',783,140000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Richard','Freeman',781,85000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Alex','Wilson',782,115000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Harry','Simmons',781,90000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Thomas','Henderson',780,95000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Ronald','Thompson',783,130000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('James','Martin',783,135000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Laurent','Fisher',780,100000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Tom','Brooks',780,85000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Tom','Bennington',783,140000);

Image by:

Figure 2. A table of employees ordered by department ID (Mohammed Kamil Khan, CC BY-SA 4.0)

I can infer the number of employees in each department using this table (department ID:number of employees):

  • 780:4
  • 781:3
  • 782:3
  • 783:4

If I want to view the second-highest-earning employees from different departments, along with their department's name (using DENSE_RANK), the table will be as follows:

Image by:

Figure 3. The second-highest-earning employee in each department (Mohammed Kamil Khan, CC BY-SA 4.0)

If I apply the same query to find the fourth-highest-earning employees, the output will be only Tom Brooks of department 780 (HR), with a salary of $85,000.

Image by:

Figure 4. The fourth-highest-earning employee (Mohammed Kamil Khan, CC BY-SA 4.0)

Though department 783 (Web Dev) has four employees, its top two earners have the same salary, so James Martin and Ronald Thompson rank as its second- and third-highest-earning employees, and no one holds the fourth rank.

Finding the nth highest

Now, to the main question: What if I want to display the dept_ID and dept_name with null values for employee-related fields for departments that do not have an nth-highest-earning employee?

Image by:

Figure 5. All departments listed, whether or not they have an nth-highest-earning employee (Mohammed Kamil Khan, CC BY-SA 4.0)

The table displayed in Figure 5 is what I am aiming to obtain when specific departments do not have an nth-highest-earning employee: The marketing, sales, and web dev departments are listed, but the name and salary fields contain a null value.

The ultimate query that helps obtain the table in Figure 5 is as follows:

SELECT * FROM (WITH null1 AS (SELECT A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
FROM (SELECT * FROM (
SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
FROM employee INNER JOIN department
ON employee.dept_id=department.dept_id) AS k
WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT NULL)),
nulled AS(SELECT
CASE WHEN null1.dept_id IS NULL THEN full1.dept_id ELSE null1.dept_id END,
CASE WHEN null1.dept_name IS NULL THEN full1.dept_name ELSE null1.dept_name END,
first_name,last_name,salary
FROM null1 RIGHT JOIN full1 ON null1.dept_id=full1.dept_id)
SELECT * FROM null1
UNION
SELECT * FROM nulled
ORDER BY dept_id)
B;

Breakdown of the query

I will break down the query to make it less overwhelming.

First, use DENSE_RANK() to display employee and department information, without yet producing null rows for departments that lack an nth-highest-earning member:

SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4

Output:

Image by:

Figure 6. The fourth-highest earner (Mohammed Kamil Khan, CC BY-SA 4.0)

Next, exclude the rank1 column from the table in Figure 6. That table identifies only one employee with a fourth-highest salary, even though another department also has four employees.

SELECT A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
    FROM (SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4)A

Output:

Image by:

Figure 7. The fourth-highest earner table without the rank 1 column (Mohammed Kamil Khan, CC BY-SA 4.0)

Point out the departments from the department table that do not have an nth-highest-earning employee:

SELECT * FROM (WITH null1 AS (SELECT A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
    FROM (SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT NULL))
SELECT * FROM full1)B

Output:

Image by:

Figure 8. The full1 table listing the departments without a fourth-highest earner (Mohammed Kamil Khan, CC BY-SA 4.0)

Replace full1 in the last line of the above code with null1:

SELECT * FROM (WITH null1 AS (SELECT A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
    FROM (SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT NULL))
SELECT * FROM null1)B

Image by:

Figure 9. The null1 table listing all departments, with null values for those without a fourth-highest earner (Mohammed Kamil Khan, CC BY-SA 4.0)

Now, I fill the null values of the dept_id and dept_name fields in Figure 9 with the corresponding values from Figure 8.

SELECT * FROM (WITH null1 AS (SELECT A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
    FROM (SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT NULL)),
nulled AS(SELECT
CASE WHEN null1.dept_id IS NULL THEN full1.dept_id ELSE null1.dept_id END,
CASE WHEN null1.dept_name IS NULL THEN full1.dept_name ELSE null1.dept_name END,
first_name,last_name,salary
FROM null1 RIGHT JOIN full1 ON null1.dept_id=full1.dept_id)
SELECT * FROM nulled) B;

Image by:

Figure 10. The result of the nulled query (Mohammed Kamil Khan, CC BY-SA 4.0)

The nulled query uses CASE WHEN on the nulls encountered in the dept_id and dept_name columns of the null1 table and replaces them with the corresponding values in the full1 table. Now all I need to do is apply UNION to the tables obtained in Figure 7 and Figure 10. This can be accomplished by declaring the last query in the previous code using WITH and then UNION-izing it with null1.

SELECT * FROM (WITH null1 AS (SELECT A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
FROM (SELECT * FROM (
SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
FROM employee INNER JOIN department
ON employee.dept_id=department.dept_id) AS k
WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT NULL)),
nulled AS(SELECT
CASE WHEN null1.dept_id IS NULL THEN full1.dept_id ELSE null1.dept_id END,
CASE WHEN null1.dept_name IS NULL THEN full1.dept_name ELSE null1.dept_name END,
first_name,last_name,salary
FROM null1 RIGHT JOIN full1 ON null1.dept_id=full1.dept_id)
SELECT * FROM null1
UNION
SELECT * FROM nulled
ORDER BY dept_id)
B;

Image by:

Figure 11. The final result (Mohammed Kamil Khan, CC BY-SA 4.0)

Now I can infer from Figure 11 that marketing, sales, and web dev are the departments that do not have any employees earning the fourth-highest salary.

A step-by-step breakdown of the query.

Databases What to read next A hands-on tutorial of SQLite3 Improve your database knowledge with this MariaDB and MySQL cheat sheet This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License. Register or Login to post a comment.

Usability and accessibility starts with open communication

Thu, 09/01/2022 - 15:00
Klaatu

Amazing though it may seem, we each experience the world differently. That's one reality with over 6 billion interpretations. Many of us use computers to broaden our experience of the world, but a computer is part of reality, so if you experience reality without, for instance, vision or sound, then you also experience a computer without vision or sound (or whatever your unique experience might be). As humans, we don't quite have the power to experience the world the way somebody else does. We can mimic some of the surface-level things (I can close my eyes to mimic blindness, for example), but it's only an imitation, without history, context, or urgency.

As a result of this complexity, we humans design things primarily for ourselves, based on the way we experience the world. That can be frustrating from an engineering and design viewpoint, because even when you intend to be inclusive, you end up forgetting something "obvious" and essential, or the solution to one problem introduces a problem for someone else, and so on. What's an open source enthusiast, or programmer, or architect, or teacher, or just everyday hacker supposed to do to make software, communities, and processes accessible?

Don't miss the opportunities

A friend of mine, who lives with hearing loss, recently signed up for a webinar and contacted the host to request captioning or, failing that, a transcript of the lessons. It was a great disappointment when the host, who had specifically emailed all participants with an invitation for feedback, never even responded to the request. In the end, some mutual friends attended the webinar and took notes.

[ Also read My open source journey with C from a neurodiverse perspective ]

The webinar was a small event run by an individual, so it's possible that emails all around were going unanswered until the end of the multi-week event. However, this incident can serve as a valuable lesson: Accessibility starts with communication.

You can't know the unique needs of every single person interacting with the thing (website, software, podcast, article, and so on) you produce. You can't predict what small arbitrary choice you make might lead to the accidental exclusion of someone who would otherwise have engaged with you. What you can do, though, is look for opportunities to learn about them. When someone sends an email about how the 8-point, thin, 45% gray font on a white background makes your website hard to read, don't ignore it, and don't chalk it up to a difference in opinion. When someone files a bug that Orca or NVDA can't navigate your application, don't close it until it's fixed.

What to do when you can't help

Nobody knows everything, and that's true for each of us participating in open source. It's very likely that you'll get a comment from somebody with an issue in something you've designed, and you won't know how to fix it. Or you might know how to fix it but not have the time to implement the fix. That doesn't make you a bad person; it just reveals the one thing that's true for all of us: You have limited resources. But through open collaboration, there's more than likely an answer.

Open source is all about sharing, and this is as true for code as it is for community resources. Identifying a bug at the very least demonstrates what your project needs from potential future contributors. Possibly, the person making the request or filing the bug can help you find someone who knows how to fix the issue. Or maybe they have friends who help them find a work-around, and could at the very least document the round-about way they deal with the issue, which could be exactly the stop-gap you need while you upskill enough to find the "right" fix for the problem.

[ Related read A practical guide to light and dark mode in Jekyll ]

Answers to usability and accessibility aren't always as direct as you think they need to be. Sometimes, a simple work-around or accommodation is all that's needed. I contribute to a fairly technical podcast, and I was once asked whether I could release transcripts. It's beyond my means to produce those for every episode, but as a concession I have, ever since, included either existing reference documentation or new documentation I write on the podcast's website, so that even if a potential listener can't process what I say in the podcast, at least the information I impart isn't lost. It's not the best solution (although admittedly my podcasts aren't always as focused as they could be, so reference documentation is probably the better option anyway), but the "answer" to the problem is easy for me to provide, and it's something I hadn't thought to do until someone asked.


Sometimes the "right" answer is "no." I've gotten requests for visuals to accompany audio-only content before. While it was possible to do that, it would have required a completely different production and hosting infrastructure, and so the answer truly was "no." However, I was able to respond to the request with a list of resources that were providing similar content along with video. You can't be everything to all people. Knowing your project's, and your own, limitations is important, and it's equally important to respect them.

Open communication

Communication is the starting point for usability and accessibility. When someone reaches out to you because something you're doing isn't accessible to them, that is, strange though it may seem, a marketing success. Somebody wants to engage with your content or your project. That's exciting! Don't pass up those opportunities.

Use open source principles to make your project more accessible for your users.

Image by:

Monsterkoi. Modified by Opensource.com. CC BY-SA 4.0

Accessibility What to read next 8 accessible Linux distributions to try New open source tool catalogs African language resources This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License. Register or Login to post a comment.

Use Tracee to solve for missing BTF information

Thu, 09/01/2022 - 15:00
Alessio Greggi

Tracee is a project by Aqua Security for tracing processes at runtime. By tracing processes using Linux eBPF (Berkeley packet filter) technology, Tracee can correlate collected information and identify malicious behavioral patterns.

eBPF

BPF is a system originally designed to help with network traffic analysis. The later eBPF system extends classic BPF to improve the programmability of the Linux kernel in different areas, such as network filtering, function hooking, and so on. Thanks to its register-based virtual machine, which is embedded in the kernel, eBPF can execute programs written in a restricted C language without needing to recompile the kernel or load a module. Through eBPF, you can run your program in kernel context and hook various events in the kernel path. To do so, eBPF needs deep knowledge about the data structures the kernel is using.

eBPF CO-RE

eBPF interfaces with the Linux kernel ABI (application binary interface). Access to kernel structures from the eBPF VM depends on the specific Linux kernel release.

eBPF CO-RE (compile once, run everywhere) is the ability to write an eBPF program that will successfully compile, pass kernel verification, and work correctly across different kernel releases without the need to recompile it for each particular kernel.

Ingredients

CO-RE needs a precise synergism of these components:

  • BTF (BPF type format) information: Allows the capture of crucial pieces of information about kernel and BPF program types and code, enabling all the other parts of the BPF CO-RE puzzle.
     
  • Compiler (Clang): Records relocation information. For example, if you were going to access the task_struct->pid field, Clang would record that it was exactly a field named pid of type pid_t residing within a struct task_struct. This system ensures that even if a target kernel has a task_struct layout in which the pid field is moved to a different offset within a task_struct structure, you'll still be able to find it just by its name and type information.
     
  • BPF loader (libbpf): Ties BTFs from kernel and BPF programs together to adjust compiled BPF code to specific kernels on target hosts.

So how do these ingredients mix together for a successful recipe?

Development/build

To make the code portable, the following tricks come into play:

  • CO-RE helpers/macros
  • BTF-defined maps
  • #include "vmlinux.h" (the header file containing all the kernel types)
Run

The kernel must be built with the CONFIG_DEBUG_INFO_BTF=y option in order to provide the /sys/kernel/btf/vmlinux interface that exposes BTF-formatted kernel types. This allows libbpf to resolve and match all the types and fields and update necessary offsets and other relocatable data to make sure that the eBPF program is working properly for the specific kernel on the target host.

The problem

The problem arises when an eBPF program is written to be portable but the target kernel doesn't expose the /sys/kernel/btf/vmlinux interface. For more information, refer to this list of distributions that support BTF.

To load and run an eBPF object in different kernels, the libbpf loader uses the BTF information to calculate field offset relocations. Without the BTF interface, the loader doesn't have the necessary information to adjust the previously recorded types that the program tries to access after processing the object for the running kernel.

Is it possible to avoid this problem?

Use cases

This article explores Tracee, an Aqua Security open source project, that provides a possible solution.

Tracee provides different running modes to adapt itself to the environment conditions. It supports two eBPF integration modes:

  • CO-RE: A portable mode, which seamlessly runs on all supported environments
  • Non CO-RE: A kernel-specific mode, requiring the eBPF object to be built for the target host

Both of them are implemented in the eBPF C code (pkg/ebpf/c/tracee.bpf.c), where the pre-processing conditional directive takes place. This allows you to compile the eBPF binary as CO-RE by passing the -DCORE argument at build time with Clang (take a look at the bpf-core Make target).

In this article, we're going to cover a case of the portable mode when the eBPF binary is built CO-RE, but the target kernel has not been built with CONFIG_DEBUG_INFO_BTF=y option.

To better understand this scenario, it helps to understand what's possible when the kernel doesn't expose BTF-formatted types on sysfs.

No BTF support

If you want to run Tracee on a host without BTF support, there are two options:

  1. Build and install the eBPF object for your kernel. This depends on Clang and on the availability of a kernel version-specific kernel-headers package.
     
  2. Download the BTF files from BTFHUB for your kernel release and provide it to the tracee-ebpf's loader through the TRACEE_BTF_FILE environment variable.

The first option is not a CO-RE solution. It compiles the eBPF binary, including a long list of kernel headers. That means you need kernel development packages installed on the target system. It also requires Clang on the target machine. Clang is resource-heavy, so compiling eBPF code can consume significant resources, potentially affecting a carefully balanced production workload. That said, it's good practice to avoid the presence of a compiler in your production environment: It could help attackers build an exploit and perform a privilege escalation.

The second option is a CO-RE solution. The problem here is that you have to provide the BTF files in your system in order to make Tracee work. The entire archive is nearly 1.3 GB. Of course you can provide just the right BTF file for your kernel release, but that can be difficult when dealing with different kernel releases.

In the end, these possible solutions can also introduce problems, and that's where Tracee works its magic.

A portable solution

With a non-trivial build procedure, the Tracee project compiles a binary to be CO-RE even if the target environment doesn't provide BTF information. This is possible with the embed Go package, which provides runtime access to files embedded in the program. During the build, the continuous integration (CI) pipeline downloads, extracts, minimizes, and then embeds BTF files along with the eBPF object inside the resulting tracee-ebpf binary.

Tracee can extract the right BTF file and provide it to libbpf, which in turn loads the eBPF program to run across different kernels. But how can Tracee embed all these BTF files downloaded from BTFHub without weighing too much in the end?

It uses a feature recently introduced in bpftool by the Kinvolk team called BTFGen, available using the bpftool gen min_core_btf subcommand. Given an eBPF program, BTFGen generates reduced BTF files, collecting just what the eBPF code needs for its run. This reduction allows Tracee to embed all these files that are now lighter (just a few kilobytes) and support kernels that don't have the /sys/kernel/btf/vmlinux interface exposed.

Tracee build

Here's the execution flow of the Tracee build:

Image by:

(Alessio Greggi and Massimiliano Giovagnoli, CC BY-SA 4.0)

First, you must build the tracee-ebpf binary, the Go program that loads the eBPF object. The Makefile provides the command make bpf-core to build the tracee.bpf.core.o object with BTF records.

Then STATIC=1 BTFHUB=1 make all builds tracee-ebpf, which has btfhub targeted as a dependency. This last target runs the script 3rdparty/btfhub.sh, which is responsible for downloading the BTFHub repositories:

  • btfhub
  • btfhub-archive

Once downloaded and placed in the 3rdparty directory, the procedure executes the downloaded script 3rdparty/btfhub/tools/btfgen.sh. This script generates reduced BTF files, tailored for the tracee.bpf.core.o eBPF binary.

The script collects *.tar.xz files from 3rdparty/btfhub-archive/ to uncompress them and finally process them with bpftool, using the following command:

for file in $(find ./archive -name '*.tar.xz'); do
    dir=$(dirname $file)
    base=$(basename $file)
    extracted=$(tar xvfJ $dir/$base)
    bpftool gen min_core_btf ${extracted} dist/btfhub/${extracted} tracee.bpf.core.o
done

This code has been simplified to make it easier to understand the scenario.

Now, you have all the ingredients available for the recipe:

  • tracee.bpf.core.o eBPF object
  • BTF reduced files (for all kernel releases)
  • tracee-ebpf Go source code

At this point, go build is invoked to do its job. Inside the embedded-ebpf.go file, you can find the following code:

//go:embed "dist/tracee.bpf.core.o"
//go:embed "dist/btfhub/*"

Here, the Go compiler is instructed to embed the eBPF CO-RE object with all the BTF-reduced files inside itself. Once compiled, these files will be available using the embed.FS file system. To have an idea of the current situation, you can imagine the binary with a file system structured like this:

dist
├── btfhub
│   ├── 4.19.0-17-amd64.btf
│   ├── 4.19.0-17-cloud-amd64.btf
│   ├── 4.19.0-17-rt-amd64.btf
│   ├── 4.19.0-18-amd64.btf
│   ├── 4.19.0-18-cloud-amd64.btf
│   ├── 4.19.0-18-rt-amd64.btf
│   ├── 4.19.0-20-amd64.btf
│   ├── 4.19.0-20-cloud-amd64.btf
│   ├── 4.19.0-20-rt-amd64.btf
│   └── ...
└── tracee.bpf.core.o

The Go binary is ready. Now to try it out!

Tracee run

Here's the execution flow of the Tracee run:

Image by:

(Alessio Greggi and Massimiliano Giovagnoli, CC BY-SA 4.0)

As the flow chart illustrates, one of the very first phases of tracee-ebpf execution is to discover the environment it's running in. The first branch is an abstraction of the cmd/tracee-ebpf/initialize/bpfobject.go file, specifically the BpfObject() function. The program performs some checks to understand the environment and makes decisions based on it:

  1. BPF file given and BTF (vmlinux or env) exists: always load BPF as CO-RE
  2. BPF file given but no BTF exists: it is a non CO-RE BPF
  3. No BPF file given and BTF (vmlinux or env) exists: load embedded BPF as CO-RE
  4. No BPF file given and no BTF available: check embedded BTF files
  5. No BPF file given and no BTF available and no embedded BTF: non CO-RE BPF

Here's the code extract:

func BpfObject(config *tracee.Config, kConfig *helpers.KernelConfig, OSInfo *helpers.OSInfo) error {
        ...
        bpfFilePath, err := checkEnvPath("TRACEE_BPF_FILE")
        ...
        btfFilePath, err := checkEnvPath("TRACEE_BTF_FILE")
        ...
        // Decision ordering:
        // (1) BPF file given & BTF (vmlinux or env) exists: always load BPF as CO-RE
        ...
        // (2) BPF file given & if no BTF exists: it is a non CO-RE BPF
        ...
        // (3) no BPF file given & BTF (vmlinux or env) exists: load embedded BPF as CO-RE
        ...
        // (4) no BPF file given & no BTF available: check embedded BTF files
        unpackBTFFile = filepath.Join(traceeInstallPath, "/tracee.btf")
        err = unpackBTFHub(unpackBTFFile, OSInfo)
       
        if err == nil {
                if debug {
                        fmt.Printf("BTF: using BTF file from embedded btfhub: %v\n", unpackBTFFile)
                }
                config.BTFObjPath = unpackBTFFile
                bpfFilePath = "embedded-core"
                bpfBytes, err = unpackCOREBinary()
                if err != nil {
                        return fmt.Errorf("could not unpack embedded CO-RE eBPF object: %v", err)
                }
       
                goto out
        }
        // (5) no BPF file given & no BTF available & no embedded BTF: non CO-RE BPF
        ...
out:
        config.KernelConfig = kConfig
        config.BPFObjPath = bpfFilePath
        config.BPFObjBytes = bpfBytes
       
        return nil
}

This analysis focuses on the fourth case, when neither an eBPF program nor a BTF file is provided to tracee-ebpf. At that point, tracee-ebpf tries to load the eBPF program by extracting all the necessary files from its embedded file system. tracee-ebpf is able to provide the files it needs to run, even in a hostile environment. It is a sort of high-resilience mode used when none of the other conditions have been satisfied.

As you see, BpfObject() calls these functions in the fourth case branch:

  • unpackBTFHub()
  • unpackCOREBinary()

They extract respectively:

  • The BTF file for the underlying kernel
  • The BPF CO-RE binary
Unpack the BTFHub

Now take a look starting from unpackBTFHub():

func unpackBTFHub(outFilePath string, OSInfo *helpers.OSInfo) error {
        var btfFilePath string

        osId := OSInfo.GetOSReleaseFieldValue(helpers.OS_ID)
        versionId := strings.Replace(OSInfo.GetOSReleaseFieldValue(helpers.OS_VERSION_ID), "\"", "", -1)
        kernelRelease := OSInfo.GetOSReleaseFieldValue(helpers.OS_KERNEL_RELEASE)
        arch := OSInfo.GetOSReleaseFieldValue(helpers.OS_ARCH)

        if err := os.MkdirAll(filepath.Dir(outFilePath), 0755); err != nil {
                return fmt.Errorf("could not create temp dir: %s", err.Error())
        }

        btfFilePath = fmt.Sprintf("dist/btfhub/%s/%s/%s/%s.btf", osId, versionId, arch, kernelRelease)
        btfFile, err := embed.BPFBundleInjected.Open(btfFilePath)
        if err != nil {
                return fmt.Errorf("error opening embedded btfhub file: %s", err.Error())
        }
        defer btfFile.Close()

        outFile, err := os.Create(outFilePath)
        if err != nil {
                return fmt.Errorf("could not create btf file: %s", err.Error())
        }
        defer outFile.Close()

        if _, err := io.Copy(outFile, btfFile); err != nil {
                return fmt.Errorf("error copying embedded btfhub file: %s", err.Error())

        }

        return nil
}

The function has a first phase where it collects information about the running kernel (osId, versionId, kernelRelease, etc). Then, it creates the directory that is going to host the BTF file (/tmp/tracee by default). It retrieves the right BTF file from the embed file system:

btfFile, err := embed.BPFBundleInjected.Open(btfFilePath)

Finally, it creates and fills the file.

Unpack the CORE Binary

The unpackCOREBinary() function does a similar thing:

func unpackCOREBinary() ([]byte, error) {
        b, err := embed.BPFBundleInjected.ReadFile("dist/tracee.bpf.core.o")
        if err != nil {
                return nil, err
        }

        if debug.Enabled() {
                fmt.Println("unpacked CO:RE bpf object file into memory")
        }

        return b, nil
}

Once the main function BpfObject() returns, tracee-ebpf is ready to load the eBPF binary through libbpfgo. This is done in the initBPF() function, inside pkg/ebpf/tracee.go. Here's the configuration of the program execution:

func (t *Tracee) initBPF() error {
        ...
        newModuleArgs := bpf.NewModuleArgs{
                KConfigFilePath: t.config.KernelConfig.GetKernelConfigFilePath(),
                BTFObjPath:      t.config.BTFObjPath,
                BPFObjBuff:      t.config.BPFObjBytes,
                BPFObjName:      t.config.BPFObjPath,
        }

        // Open the eBPF object file (create a new module)

        t.bpfModule, err = bpf.NewModuleFromBufferArgs(newModuleArgs)
        if err != nil {
                return err
        }
        ...
}

In this piece of code, we initialize the eBPF args by filling the libbpfgo structure NewModuleArgs{}. Through its BTFObjPath argument, we can instruct libbpf to use the BTF file previously extracted by the BpfObject() function.

At this point, tracee-ebpf is ready to run properly!

Image by:

(Alessio Greggi and Massimiliano Giovagnoli, CC BY-SA 4.0)

eBPF module initialization

Next, during the execution of the Tracee.Init() function, the configured arguments will be used to open the eBPF object file:

Tracee.bpfModule = libbpfgo.NewModuleFromBufferArgs(newModuleArgs)

Initialize the probes:

t.probes, err = probes.Init(t.bpfModule, netEnabled)

Load the eBPF object into kernel:

err = t.bpfModule.BPFLoadObject()

Populate eBPF maps with initial data:

err = t.populateBPFMaps()

And finally, attach eBPF programs to selected events' probes:

err = t.attachProbes()

Conclusion

Just as eBPF simplified the way to program the kernel, CO-RE is tackling another barrier. But leveraging such features has some requirements. Fortunately, with Tracee, the Aqua Security team found a way to take advantage of portability in case those requirements can't be satisfied.

At the same time, we're sure that this is only the beginning of a continuously evolving subsystem that will find ever-increasing support, even in different operating systems.

By tracing processes using Linux eBPF (Berkeley packet filter) technology, Tracee can correlate collected information and identify malicious behavioral patterns.

Linux Security and privacy What to read next Using eBPF for network observability in the cloud This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

Massimiliano Giovagnoli, having always been fascinated by mathematics and computers, began his career as a web developer; needing to dive into how things work, he moved his interests and experience to infrastructure design and management.
With growing awareness and experience in operations and site reliability, he's now a cloud solution architect at Clastix, where he is responsible for cloud native infrastructures and for supporting the development of cloud native products.
Having fallen in love with Linux, Kubernetes, and the OSS ecosystem, he started contributing to Falco.
Now he's a maintainer of Falco's open infrastructure.
He loves his wife, his cats, and the mountains. When he doesn't think about software, he loves to work out.


Make your own music tagging framework with Groovy

Wed, 08/31/2022 - 15:00
Make your own music tagging framework with Groovy Chris Hermansen Wed, 08/31/2022 - 03:00

In this series, I'm developing several scripts to help in cleaning up my music collection. In the last article I wrote and tested a Groovy script to clean up the motley assembly of tag fields. In this article, I'll separate the framework I've been using into a separate class and then write a test program to exercise it.

Install Java and Groovy

Groovy is based on Java and requires a Java installation. Recent and decent versions of both Java and Groovy may be in your Linux distribution's repositories. Groovy can also be installed following the instructions on the Groovy homepage. A nice alternative for Linux users is SDKMan, which can be used to get multiple versions of Java, Groovy, and many other related tools. For this article, I'm using SDKMan's releases of:

  • Java: version 11.0.12-open of OpenJDK 11;
  • Groovy: version 3.0.8.
Back to the problem

If you haven't read parts 1-5 of this series, do that now so you understand the intended structure of my music directory, the framework created in those articles, and how we pick up FLAC, MP3, and OGG files.

The framework class

As I have mentioned a number of times, because of the music directory structure, we have a standard framework to read the artist subdirectories, the album sub-subdirectories, the music, and other files contained within. Rather than copying that code into each script, you should create a Groovy class that encapsulates the general framework behavior and delegates the application-specific behavior to scripts that call it.


Here's the framework, moved into a Groovy class:

 1    public class TagAnalyzerFramework {
   
 2        // called before any data is processed
 3        Closure atBeginning
   
 4        // called for each file to be processed
 5        Closure onEachLine
   
 6        // called after all data is processed
 7        Closure atEnd
   
 8        // the full path name to the music library
 9        String musicLibraryDirName
   
10        public void processMusicLibrary() {
11            // Before we start processing...
12            atBeginning()
13            // Iterate over each dir in music library
14            // These are assumed to be artist directories
 
15            new File(musicLibraryDirName).eachDir { artistDir ->
   
16                // Iterate over each dir in artist dir
17                // These are assumed to be album directories
18                artistDir.eachDir { albumDir ->
19                    // Iterate over each file in the album directory
20                    // These are assumed to be content or related
21                    // (cover.jpg, PDFs with liner notes etc)
22                    albumDir.eachFile { contentFile ->
   
23                        // Then on each line...
24                        onEachLine(artistDir, albumDir, contentFile)
25                    }
26                }
27            }
28            // And before we finish...
29            atEnd()
30        }
31    }

Line 1 introduces the public class name.

Lines 2-7 declare the three closures that the application script uses to define the specifics of the processing needed. This is called delegation of behavior.

Lines 8-9 declare the string holding the music directory file name.

Lines 10-30 declare the method that actually handles the processing.

Line 12 calls the Closure that is run before any data is processed.

Lines 15-27 loop over the artist/album/content file structure.

Line 24 calls the Closure that processes each content file.

Line 29 calls the Closure that is run after all data is processed.

I want to compile this class before I use it, as follows:

$ groovyc TagAnalyzerFramework.groovy

That's it for the framework.

Using the framework in a script

Here's a simple script that prints out a bar-separated value listing of all the files in the music directory:

 1  int fileCount
 
 2  def myTagAnalyzer = new TagAnalyzerFramework()
 
 3  myTagAnalyzer.atBeginning = {
 4      // Print the CSV file header and initialize the file counter
 5      println "artistDir|albumDir|contentFile"
 6      fileCount = 0
 7  }
 
 8  myTagAnalyzer.onEachLine = { artistDir, albumDir, contentFile ->
 9      // Print the line for this file
10      println "$artistDir.name|$albumDir.name|$contentFile.name"
11      fileCount++
12  }
 
13  myTagAnalyzer.atEnd = {
14      // Print the file counter value
15      System.err.println "fileCount $fileCount"
16  }
 
17  myTagAnalyzer.musicLibraryDirName = '/home/clh/Test/Music'
 
18  myTagAnalyzer.processMusicLibrary()

Line 1 defines a local variable, fileCount, used to count the number of content files. Note that this variable doesn't need to be final.

Line 2 calls the constructor for the TagAnalyzerFramework class.

Line 3 does what looks like a mistake in Java: it appears to set a field in a foreign class. In Groovy, however, this is actually a call to the setter for that property, so it's acceptable, as long as the implementing class "remembers" that it has a contract to supply a setter for this property (a standalone sketch of this mechanism appears after these notes).

Lines 3-7 create a Closure that prints the bar-separated value header and initializes the fileCount variable.

Lines 8-12 similarly define the Closure that handles the logic for processing each line. In this case, it simply prints the artist, album, and content file names. Referring back to line 24 of TagAnalyzerFramework, I see that it calls this Closure with three arguments corresponding to the parameters shown here.

Lines 13-16 define the Closure that wraps up the processing once all the data is read. In this case, it prints a count of files to standard error.

Line 17 sets the music library directory name.

And line 18 calls the method to process the music library.
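Here is the property mechanism from line 3 in isolation, as a minimal, self-contained Groovy sketch (the class and property names are invented for illustration):

class Config {
    String name                // Groovy generates getName() and setName()
}

def c = new Config()
c.name = 'demo'                // compiles to c.setName('demo')
assert c.name == 'demo'        // compiles to c.getName()

Declaring the property is all it takes; the compiler fulfills the setter "contract" automatically.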

Run the script:

$ groovy MyTagAnalyzer.groovy
artistDir|albumDir|contentFile
Bombino|Azel|07_Igmayagh_Dum_1.6.16.mp3
Bombino|Azel|08_Ashuhada_1.6.16.mp3
Bombino|Azel|04_Tamiditine_Tarhanam_1.6.16.mp3
Bombino|Azel|10_Naqqim_Dagh_Timshar_1.6.16.mp3
[...]
St Germain|Tourist|04_-_St Germain_-_Land Of....flac
fileCount 55
$

Of course, the .class files created by compiling the framework class must be on the classpath for this to work. Naturally, I could use jar to package up those class files.

Those who are made queasy by what looks like setting fields in a foreign class could instead define closures locally and pass them as parameters, either to the constructor or to processMusicLibrary(), achieving the same effect, as sketched below.
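Here is a minimal sketch of that parameter-passing variant (the class name and closure bodies are invented for illustration, and the error handling a real framework would need is omitted):

class TagAnalyzerFramework2 {
    String musicLibraryDirName

    void processMusicLibrary(Closure atBeginning, Closure onEachLine, Closure atEnd) {
        atBeginning()
        new File(musicLibraryDirName).eachDir { artistDir ->
            artistDir.eachDir { albumDir ->
                albumDir.eachFile { contentFile ->
                    onEachLine(artistDir, albumDir, contentFile)
                }
            }
        }
        atEnd()
    }
}

def analyzer = new TagAnalyzerFramework2(musicLibraryDirName: '/home/clh/Test/Music')
analyzer.processMusicLibrary(
    { println 'starting' },
    { artistDir, albumDir, contentFile -> println contentFile.name },
    { println 'done' }
)

No fields are set from outside the class; the closures arrive as ordinary method arguments.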

I could go back to the code samples provided in the earlier articles to retrofit this framework class. I'll leave that exercise to the reader.

Delegation of behavior

To me, the coolest thing happening here is the delegation of behavior, which requires various shenanigans in other languages. For many years, Java required anonymous classes and quite a bit of extra code. Lambdas have gone a long way to fixing this, but they still cannot refer to non-final variables outside their scope.
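That difference is easy to demonstrate. In this little sketch, a Groovy closure freely updates a local variable defined outside it, which is exactly what fileCount relies on in the script above:

def count = 0
def bump = { count++ }         // the closure captures and mutates a non-final local
3.times { bump() }
assert count == 3

The equivalent Java lambda would be rejected at compile time because count is not effectively final.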

That's it for this series on using Groovy to manage the tags in my music library. There will be more Groovy articles in the future.




How we track the community health of our open source project

Wed, 08/31/2022 - 15:00
How we track the community health of our open source project Ruth Cheesley Wed, 08/31/2022 - 03:00

To be an effective leader in an open source community, you need a lot of information. How do I know who the most active members in my community are? Which companies are making the most contributions? Which contributors are drifting away and becoming inactive? Who in the community is knowledgeable about a specific topic?

These were just a few of the questions I had when I started leading the Mautic community at Acquia. But the problem was not a shortage of information. On the contrary, there were so many places our community interacted and so many things to track that I was drowning in data. I could access plenty of data sources, but they were not helping me manage the community effectively or answering my questions.

Tracking all the places

I needed to find a tool to bring all of this together and give the community leadership team a centralized place to see activity everywhere we have people discussing Mautic. More importantly, we needed a tool that could accurately track who was contributing in every way that we defined contributions.

I tried several tools, but the most promising was the open source Community Relationship Manager Savannah CRM, a relative newcomer to the market. What stood out to me in Savannah was its focus on contribution as well as community health. Other tools I reviewed either did not have a clear concept of contributions or did not cover all the places we wanted to track.

I started working locally by checking out the GitHub repository for the Django-based application and quickly began to see the power of bringing all of my metrics into one place. Straight away, I could see a list of new contributors, most active community members, organizations, and even an interactive display allowing me to see how contributors were connected with each other and across the different channels we use.

(Image by Michael Hall, CC BY-SA 4.0)

In the early days of using Savannah, this function helped identify potential leaders for teams and initiatives. The tagging feature also meant I could quickly find out who was talking about a specific topic and where those conversations were happening in the community.

As the community matured, notifications alerting me to contributors becoming inactive started to be really helpful in prompting a personal check-in with them. Using projects to track activity and contributor funnels in specific areas of our community has helped us spot where contributions are dropping off. Having the ability to "watch" community members who previously breached the code of conduct made it much easier to keep track of their future conduct and act swiftly if there were more incidents.

Over time we have moved to a hosted plan (mainly because we don't have the contributors to manage our own infrastructure at this time) and have continued to extend how we are using this tool.

It's really at the heart of everything we do in our community, and it helps me proactively manage our community. It supports everything from my monthly recognition shout-outs to determining whether an organization has a sustained history of contributing that would entitle them to become—and remain—a Community Partner.

Tracking all the open source contributions

Over the last two years, we have expanded what we track as a contribution in Mautic. Currently, the list includes:

  • Authoring a blog post on mautic.org
  • Creating a community-focused podcast episode
  • Making a pull request (PR) on any of our GitHub repositories
  • Reviewing a PR on any of our GitHub repositories
  • Completing a Jira issue on any of our Jira projects
  • Providing help or feedback on Slack
  • Having an answer accepted as a solution on the Discourse forums
  • Giving help on a Reddit thread
  • Organizing or speaking at an official Mautic event
  • Organizing or speaking at a Meetup
  • Having an answer to a question accepted on Stack Exchange

Most of these are available out of the box with Savannah, but we implemented some, such as reviewing a PR or completing a Jira issue, with the application programming interface (API) and integrations with automation tools.

We also track and highlight the folks who support and engage with others before they contribute, since this often helps the individual make that contribution in the future.

Tracking progress over time

We have several publicly shared reports.

Any report in Savannah and any screen can be shared publicly, making it a really easy way to share things with others.

(Image by Ruth Cheesley, CC BY-SA 4.0)

For us, it allows folks to see what is happening within the community and also offers a public way to recognize the organizations and individuals who are consistently contributing or engaging in the community.

New features in Savannah

We have experimented with some of the newer features in Savannah, such as tracking when we send swag to contributors and whether it affects future contributions. Another feature I am excited to look into allows us to flag a potential contributor opportunity—for example, if we come across someone we would like to support with writing for the blog, creating a meetup group, or submitting a new feature. Savannah then allows us to track the nurturing of that contributor.

There are often new features being added, which is great to see. Because it is an open source project, you can, of course, make your own PR to implement new features or fix bugs you come across.

So far, Savannah has been an excellent tool for tracking our community health in the Mautic community, and it has really helped us both track and recognize contributions across our far-reaching community. I hope that you find it useful in your communities too!

Mautic chose Savannah CRM to support community building and recognition efforts.



Clean up music tags with a Groovy script

Tue, 08/30/2022 - 15:00
Clean up music tags with a Groovy script Chris Hermansen Tue, 08/30/2022 - 03:00

Lately, I've been looking at how Groovy streamlines Java. In this series, I'm developing several scripts to help in cleaning up my music collection. In my last article, I used the framework developed previously to create a list of unique file names and counts of occurrences of those file names in the music collection directory. I then used the Linux find command to get rid of files I didn't want.

In this article, I demonstrate a Groovy script to clean up the motley assembly of tag fields.

WARNING: This script alters music tags, so it is vital that you make a backup of the music collection you test your code on.

Back to the problem

If you haven't read the previous articles in this series, do that now before continuing so you understand the intended structure of the music directory, the framework I've created, and how to detect and use FLAC, MP3, and OGG files.

Vorbis and ID3 tags

I don't have many MP3 music files. Generally, I prefer to use FLAC. But sometimes only MP3 versions are available, or a free MP3 download comes with a vinyl purchase. So in this script, I have to be able to handle both. One thing I've learned as I have become familiar with JAudiotagger is what ID3 tags (used by MP3) look like, and I discovered that some of those "unwanted" field tag IDs I uncovered in part 2 of this series are actually very useful.

Now it's time to use this framework to get a list of all the tag field IDs in a music collection, with their counts, to begin deciding what belongs and what doesn't:

1        @Grab('net.jthink:jaudiotagger:3.0.1')
2        import org.jaudiotagger.audio.*
3        import org.jaudiotagger.tag.*
4        def logger = java.util.logging.Logger.getLogger('org.jaudiotagger');
5        logger.setLevel(java.util.logging.Level.OFF);
6        // Define the music library directory
7        def musicLibraryDirName = '/var/lib/mpd/music'
8        // Define the tag field id accumulation map
9        def tagFieldIdCounts = [:]
10        // Print the CSV file header
11        println "tagFieldId|count"
12        // Iterate over each directory in the music library directory
13        // These are assumed to be artist directories
14        new File(musicLibraryDirName).eachDir { artistDir ->
15            // Iterate over each directory in the artist directory
16            // These are assumed to be album directories
17            artistDir.eachDir { albumDir ->
18                // Iterate over each file in the album directory
19                // These are assumed to be content or related
20                // (cover.jpg, PDFs with liner notes etc)
21                albumDir.eachFile { contentFile ->
22                    // Analyze the file and print the analysis
23                    if (contentFile.name ==~ /.*\.(flac|mp3|ogg)/) {
24                        def af = AudioFileIO.read(contentFile)
25                        af.tag.fields.each { tagField ->
26                            tagFieldIdCounts[tagField.id] = tagFieldIdCounts.containsKey(tagField.id) ? tagFieldIdCounts[tagField.id] + 1 : 1
27                        }
28                    }
29                }
30            }
31        }
32        tagFieldIdCounts.each { key, value ->
33            println "$key|$value"
34        }

Lines 1-7 originally appeared in part 2 of this series.

Lines 8-9 define a map for accumulating tag field IDs and counts of occurrences.

Lines 10-21 also appeared in previous articles. They get down to the level of the individual content files.

Lines 23-28 ensure that the files being processed are FLAC, MP3, or OGG. Line 23 uses the Groovy match operator ==~ with a slashy regular expression to select only the wanted files (a short sketch of this operator appears after these notes).

Line 24 uses org.jaudiotagger.audio.AudioFileIO.read() to get the tag body from the content file.

Lines 25-27 use org.jaudiotagger.tag.Tag.getFields() to get all the TagField instances in the tag body and the Groovy each() method to iterate over that list of instances.

Line 27 accumulates the count of each tagField.id into the tagFieldIdCounts map.

Finally, lines 32-34 iterate over the tagFieldIdCounts map, printing out the keys (the tag field IDs found) and the values (the count of occurrences of each tag field ID).
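Incidentally, the match operator from line 23 is easy to try on its own. This small sketch (with invented file names) shows that ==~ must match the entire string, not just part of it:

assert 'track01.flac' ==~ /.*\.(flac|mp3|ogg)/
assert !('cover.jpg' ==~ /.*\.(flac|mp3|ogg)/)
assert !('notes.flac.txt' ==~ /.*\.(flac|mp3|ogg)/)   // no partial matches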

I run this script as follows:

$ groovy TagAnalyzer5b.groovy > tagAnalysis5b.csv

Then I load the results into a LibreOffice or OnlyOffice spreadsheet. In my case, this script takes quite a long time to run (several minutes) and the loaded data, sorted in descending order of the second column (count) looks like this:

(Image by Chris Hermansen, CC BY-SA 4.0)

On row 2, you can see that there are 8,696 occurrences of the TITLE field tag ID, which is the ID that FLAC files (and Vorbis files generally) use for a song title. Down on row 28, you also see 348 occurrences of the TIT2 field tag ID, which is the ID3 tag field that contains the "actual" name of the song. At this point, it's worth going away to look at the JavaDoc for org.jaudiotagger.tag.id3.framebody.FrameBodyTIT2 to learn more about this tag and the way in which JAudiotagger recognizes it. There, you also see the mechanisms to handle other ID3 tag fields.

In that list of field tag IDs, there are lots that I'm not interested in and that could affect the ability of various music players to display my music collection in what I consider to be a reasonable order.
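Before moving on, it's worth noting that JAudiotagger's FieldKey abstraction papers over the TITLE/TIT2 difference mentioned above. A short sketch of reading a title in a format-independent way (the file path here is invented):

@Grab('net.jthink:jaudiotagger:3.0.1')
import org.jaudiotagger.audio.AudioFileIO
import org.jaudiotagger.tag.FieldKey

// FieldKey.TITLE resolves to the TITLE Vorbis comment for FLAC and OGG
// files and to the TIT2 frame for MP3 files, so one call covers all three.
def af = AudioFileIO.read(new File('/tmp/example.mp3'))
println af.tag.getFirst(FieldKey.TITLE)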

The org.jaudiotagger.tag.Tag interface

I'm going to take a moment to explore the way JAudiotagger provides a generic mechanism to access tag fields. This mechanism is described in the JavaDocs for org.jaudiotagger.tag.Tag. There are two methods that would help clean up the tag field situation:

void setField(FieldKey genericKey, String value)

This is used to set the value for a particular tag field.

void deleteField(FieldKey fieldKey)

This is used to delete all instances of a particular tag field (it turns out some tag fields in some tagging schemes permit multiple occurrences).

However, this particular deleteField() method requires us to supply a FieldKey value, and as I have discovered, not all field key IDs in my music collection correspond to a known FieldKey value.

Looking around the JavaDocs, I see there's a FlacTag which "uses Vorbis Comment for most of its metadata," and declares its tag field to be of type VorbisCommentTag.

VorbisCommentTag itself extends org.jaudiotagger.audio.generic.AbstractTag, which offers:

protected void deleteField(String key)

As it turns out, this is accessible from the tag instance returned by AudioFileIO.read(f).getTag(), at least for FLAC and MP3 tag bodies.

In theory, it should be possible to do this:

  1. Get the tag body using

    def af = AudioFileIO.read(contentFile)
    def tagBody = af.tag
  2. Get the values of the (known) tag fields I want using:

    def album = tagBody.getFirst(FieldKey.ALBUM)
    def artist = tagBody.getFirst(FieldKey.ARTIST)
    // etc
  3. Delete all tag fields (both wanted and unwanted) using:

    def originalTagFieldIdList = tagBody.fields.collect { tagField ->
        tagField.id
    }
    originalTagFieldIdList.each { tagFieldId ->
        tagBody.deleteField(tagFieldId)
    }
  4. Put only the desired tag fields back:

    tagBody.setField(FieldKey.ALBUM, album)
    tagBody.setField(FieldKey.ARTIST, artist)
    // etc

Of course, there are a few wrinkles here.

First, notice the use of the originalTagFieldIdList. I can't use each() to iterate over the iterator returned by tagBody.getFields() while simultaneously modifying those fields, so I first gather the tag field IDs into a list using collect(), then iterate over that list of tag field IDs to do the deletions (a miniature version of this pattern appears after these notes).

Second, not all files are going to have all the tag fields I want. For example, some files might not have ALBUM_SORT_ORDER defined, and so on. I might not wish to write those tag fields in with empty values. Additionally, I can probably safely default some fields. For example, if ALBUM_ARTIST isn't defined, I can set it to ARTIST.

Third, and for me most obscure, is that Vorbis Comment tags always include a VENDOR field tag ID; if I try to delete it, I end up simply unsetting the value. Huh.
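The first wrinkle, the collect-then-delete pattern, can be seen in miniature with a plain Groovy map standing in for the tag body (all names here are invented):

def tags = [TITLE: 'Land Of...', VENDOR: 'reference libFLAC', FOO: 'bar']
def ids = tags.keySet().collect { it }       // snapshot the keys first
ids.each { id ->
    if (id != 'VENDOR') tags.remove(id)      // now it's safe to mutate the map
}
assert tags.keySet() == ['VENDOR'] as Set

Iterating directly over tags.keySet() while removing entries would throw a ConcurrentModificationException; the snapshot avoids that.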

Trying it all out

Considering these lessons, I decided to create a test music directory that contains just a few artists and their albums (because I don't want to wipe out my music collection.)

WARNING: Because this script will alter music tags it is very important to have a backup of the music collection so that when I discover I have deleted an essential tag, I can recover the backup, modify the script and rerun it.

Here's the script:

1        @Grab('net.jthink:jaudiotagger:3.0.1')
2        import org.jaudiotagger.audio.*
3        import org.jaudiotagger.tag.*
4        def logger = java.util.logging.Logger.getLogger('org.jaudiotagger');
5        logger.setLevel(java.util.logging.Level.OFF);
6        // Define the music library directory
7        def musicLibraryDirName = '/work/Test/Music'
8        // Print the CSV file header
9        println "artistDir|albumDir|contentFile|tagField.id|tagField.toString()"
10        // Iterate over each directory in the music library directory
11        // These are assumed to be artist directories
12        new File(musicLibraryDirName).eachDir { artistDir ->
13            // Iterate over each directory in the artist directory
14            // These are assumed to be album directories
15            artistDir.eachDir { albumDir ->
16                // Iterate over each file in the album directory
17                // These are assumed to be content or related
18                // (cover.jpg, PDFs with liner notes etc)
19                albumDir.eachFile { contentFile ->
20        // Analyze the file and print the analysis
21        if (contentFile.name ==~ /.*\.(flac|mp3|ogg)/) {
22            def af = AudioFileIO.read(contentFile)
23            def tagBody = af.tag
24            def album = tagBody.getFirst(FieldKey.ALBUM)
25            def albumArtist = tagBody.getFirst(FieldKey.ALBUM_ARTIST)
26            def albumArtistSort = tagBody.getFirst(FieldKey.ALBUM_ARTIST_SORT)
27            def artist = tagBody.getFirst(FieldKey.ARTIST)
28            def artistSort = tagBody.getFirst(FieldKey.ARTIST_SORT)
29            def composer = tagBody.getFirst(FieldKey.COMPOSER)
30            def composerSort = tagBody.getFirst(FieldKey.COMPOSER_SORT)
31            def genre = tagBody.getFirst(FieldKey.GENRE)
32            def title = tagBody.getFirst(FieldKey.TITLE)
33            def titleSort = tagBody.getFirst(FieldKey.TITLE_SORT)
34            def track = tagBody.getFirst(FieldKey.TRACK)
35            def trackTotal = tagBody.getFirst(FieldKey.TRACK_TOTAL)
36            def year = tagBody.getFirst(FieldKey.YEAR)
37            if (!albumArtist) albumArtist = artist
38            if (!albumArtistSort) albumArtistSort = albumArtist
39            if (!artistSort) artistSort = artist
40            if (!composerSort) composerSort = composer
41            if (!titleSort) titleSort = title
42            println "${artistDir.name}|${albumDir.name}|${contentFile.name}|FieldKey.ALBUM|${album}"
43            println "${artistDir.name}|${albumDir.name}|${contentFile.name}|FieldKey.ALBUM_ARTIST|${albumArtist}"
44            println "${artistDir.name}|${albumDir.name}|${contentFile.name}|FieldKey.ALBUM_ARTIST_SORT|${albumArtistSort}"
45            println "${artistDir.name}|${albumDir.name}|${contentFile.name}|FieldKey.ARTIST|${artist}"
46            println "${artistDir.name}|${albumDir.name}|${contentFile.name}|FieldKey.ARTIST_SORT|${artistSort}"
47            println "${artistDir.name}|${albumDir.name}|${contentFile.name}|FieldKey.COMPOSER|${composer}"
48            println "${artistDir.name}|${albumDir.name}|${contentFile.name}|FieldKey.COMPOSER_SORT|${composerSort}"
49            println "${artistDir.name}|${albumDir.name}|${contentFile.name}|FieldKey.GENRE|${genre}"
50            println "${artistDir.name}|${albumDir.name}|${contentFile.name}|FieldKey.TITLE|${title}"
51            println "${artistDir.name}|${albumDir.name}|${contentFile.name}|FieldKey.TITLE_SORT|${titleSort}"
52            println "${artistDir.name}|${albumDir.name}|${contentFile.name}|FieldKey.TRACK|${track}"
53            println "${artistDir.name}|${albumDir.name}|${contentFile.name}|FieldKey.TRACK_TOTAL|${trackTotal}"
54            println "${artistDir.name}|${albumDir.name}|${contentFile.name}|FieldKey.YEAR|${year}"
55            def originalTagIdList = tagBody.fields.collect {
56                tagField -> tagField.id
57            }
58            originalTagIdList.each { tagFieldId ->
59                println "${artistDir.name}|${albumDir.name}|${contentFile.name}|${tagFieldId}|XXX"
60                if (tagFieldId != 'VENDOR')
61                    tagBody.deleteField(tagFieldId)
62            }
63            if (album) tagBody.setField(FieldKey.ALBUM, album)
64            if (albumArtist) tagBody.setField(FieldKey.ALBUM_ARTIST, albumArtist)
65            if (albumArtistSort) tagBody.setField(FieldKey.ALBUM_ARTIST_SORT, albumArtistSort)
66            if (artist) tagBody.setField(FieldKey.ARTIST, artist)
67            if (artistSort) tagBody.setField(FieldKey.ARTIST_SORT, artistSort)
68            if (composer) tagBody.setField(FieldKey.COMPOSER, composer)
69            if (composerSort) tagBody.setField(FieldKey.COMPOSER_SORT, composerSort)
70            if (genre) tagBody.setField(FieldKey.GENRE, genre)
71            if (title) tagBody.setField(FieldKey.TITLE, title)
72            if (titleSort) tagBody.setField(FieldKey.TITLE_SORT, titleSort)
73            if (track) tagBody.setField(FieldKey.TRACK, track)
74            if (trackTotal) tagBody.setField(FieldKey.TRACK_TOTAL, trackTotal)
75            if (year) tagBody.setField(FieldKey.YEAR, year)
76            af.commit()
77        }
78      }
79    }
80  }

Lines 1-21 are already familiar. Note, though, that the music directory defined in line 7 refers to a test directory!

Lines 22-23 get the tag body.

Lines 24-36 get the fields of interest to me (but maybe not the fields of interest to you, so feel free to adjust for your own requirements!)

Lines 37-41 adjust some values for missing ALBUM_ARTIST and sort order.

Lines 42-54 print out each tag field key and adjusted value for posterity.

Lines 55-57 get the list of all tag field IDs.

Lines 58-62 print out each tag field ID and delete it, with the exception of the VENDOR tag field ID.

Lines 63-75 set the desired tag field values using the known tag field keys.

Finally, line 76 commits the changes to the file.

The script produces output that can be imported into a spreadsheet.

I'm just going to mention one more time that this script alters music tags! It is very important to have a backup of the music collection so that when you discover you've deleted an essential tag, or somehow otherwise trashed your music files, you can recover the backup, modify the script, and rerun it.

Check the results with this Groovy script

I have a handy little Groovy script to check the results:

1        @Grab('net.jthink:jaudiotagger:3.0.1')
2        import org.jaudiotagger.audio.*
3        import org.jaudiotagger.tag.*
 
4        def logger = java.util.logging.Logger.getLogger('org.jaudiotagger');
5        logger.setLevel(java.util.logging.Level.OFF);
 
6        // Define the music library directory
 
7        def musicLibraryDirName = '/work/Test/Music'
 
8        // Print the CSV file header
 
9        println "artistDir|albumDir|tagField.id|tagField.toString()"
 
10        // Iterate over each directory in the music library directory
11        // These are assumed to be artist directories
 
12        new File(musicLibraryDirName).eachDir { artistDir ->
 
13            // Iterate over each directory in the artist directory
14            // These are assumed to be album directories
 
15            artistDir.eachDir { albumDir ->
 
16                // Iterate over each file in the album directory
17                // These are assumed to be content or related
18                // (cover.jpg, PDFs with liner notes etc)
 
19                albumDir.eachFile { contentFile ->
 
20                    // Analyze the file and print the analysis
 
21                    if (contentFile.name ==~ /.*\.(flac|mp3|ogg)/) {
22                        def af = AudioFileIO.read(contentFile)
23                        af.tag.fields.each { tagField ->
24                            println "${artistDir.name}|${albumDir.name}|${tagField.id}|${tagField.toString()}"
25                        }
26                    }
 
27                }
28            }
29        }

This should look pretty familiar by now!

Running it produces results like this before running the fixer script in the previous section:

St Germain|Tourist|VENDOR|reference libFLAC 1.1.4 20070213
St Germain|Tourist|TITLE|Land Of...
St Germain|Tourist|ARTIST|St Germain
St Germain|Tourist|ALBUM|Tourist
St Germain|Tourist|TRACKNUMBER|04
St Germain|Tourist|TRACKTOTAL|09
St Germain|Tourist|GENRE|Electronica
St Germain|Tourist|DISCID|730e0809
St Germain|Tourist|MUSICBRAINZ_DISCID|jdWlcpnr5MSZE9H0eibpRfeZtt0-
St Germain|Tourist|MUSICBRAINZ_SORTNAME|St Germain

Once the fixer script is run, it produces results like this:

St Germain|Tourist|VENDOR|reference libFLAC 1.1.4 20070213
St Germain|Tourist|ALBUM|Tourist
St Germain|Tourist|ALBUMARTIST|St Germain
St Germain|Tourist|ALBUMARTISTSORT|St Germain
St Germain|Tourist|ARTIST|St Germain
St Germain|Tourist|ARTISTSORT|St Germain
St Germain|Tourist|GENRE|Electronica
St Germain|Tourist|TITLE|Land Of...
St Germain|Tourist|TITLESORT|Land Of...
St Germain|Tourist|TRACKNUMBER|04
St Germain|Tourist|TRACKTOTAL|09

That's it! Now I just have to work up the nerve to run my fixer script on my full music library…


(Image by WOCinTech Chat. Modified by Opensource.com. CC BY-SA 4.0)


Share screens on Linux with GNOME Connections

Tue, 08/30/2022 - 15:00
Share screens on Linux with GNOME Connections Seth Kenlon Tue, 08/30/2022 - 03:00

When someone needs to share their screen with you, or you need to share your screen with someone else, you have several options to choose from. Video conferencing software, like the open source Jitsi web app, is one, but while we call that "screen sharing," it's really presenting. You're presenting your screen to others, but they can't interact with it. Sometimes you actually want to share your screen and your mouse cursor with a trusted friend or colleague. The tool for that is VNC (Virtual Network Computing), and it's built into your Linux desktop.

In any screen sharing scenario, there are two computers and possibly two users. For that reason, this article has two parts. The first part is for the person setting up their computer to accept screen sharing requests, and the second part is for the person trying to connect to someone else's screen.

Share my screen on Linux

If you're reading this section, you're the person who needs technical help from a friend, and you want to allow your friend to connect to your screen. You need to configure your desktop to allow screen sharing.

On the GNOME desktop, open the Settings application from the Activities menu. In the Settings window, click on Sharing. In the Sharing window, click on Screen Sharing.

In the Screen Sharing window that appears, you have two choices.

You can set a password so the person connecting to your screen must enter a password to connect. This is convenient when you don't expect to be around the computer when your friend plans on viewing your screen.

You can require a notification so that when someone attempts to connect, you're prompted to let them in (or not.)

(Image by Seth Kenlon, CC BY-SA 4.0)

If you're on the KDE Plasma Desktop, the application for configuring screen sharing is called krfb (short for "Remote Frame Buffer," the protocol used by VNC). It's the exact same concept, just with a different layout.

(Image by Seth Kenlon, CC BY-SA 4.0)

Firewall

Normally, your computer's internal firewall keeps people out of your computer. It does that by indiscriminately blocking all incoming connections. In this case, though, you want to permit one kind of traffic, so you need to open a port in your firewall.

On Fedora, CentOS, Mageia, and many other Linux distributions, you have a firewall whether you know it or not. You may not yet have an app to help you configure your firewall, though. To install the default firewall configuration application, launch GNOME Software and search for firewall.

Once it's installed, launch the Firewall configuration application and scroll through the (very long) list of services to find and enable vnc-server.

(Image by Seth Kenlon, CC BY-SA 4.0)

After adding vnc-server, open the Options menu and select Runtime to permanent so your new rule persists even after you reboot.

On Debian, Ubuntu, Linux Mint, and others, you may be running a firewall called ufw, so install gufw instead. In gufw, click the plus (+) icon at the bottom of the Rules tab to add a new rule. In the Add a new firewall rule window that appears, search for vnc and click the Add button.

(Image by Seth Kenlon, CC BY-SA 4.0)

Your computer is now configured to accept VNC requests. You can skip down to the troubleshooting section.

Viewing a shared screen

If you're reading this section, you're the person providing technical help from afar. You need to connect to a friend or colleague's computer, view their screen, and even control their mouse and keyboard. There are many applications for that, including TigerVNC, KDE's krdc, and GNOME Connections.

GNOME Connections

On your local computer, install the GNOME Connections application from GNOME Software, or using your package manager:

$ sudo dnf install gnome-connections

In GNOME Connections, click the plus (+) icon in the top left to add a destination host. Select the VNC protocol, and enter the user name and host or IP address you want to connect to, and then click the Connect button.

(Image by Seth Kenlon, CC BY-SA 4.0)

If the user you're connecting to has had to create a new port for the purposes of port forwarding, then you must append the non-default port to the address. For instance, say your target user has created port 59001 to accept VNC traffic, and their home router address is 93.184.216.34. In this case, you enter username@93.184.216.34:59001 (where username is the user's actual user name.)

If the user of the remote system has required a password for VNC, then you're prompted for a password before the connection is made. Otherwise, the user on the remote machine receives an alert asking whether they want to allow you to share their screen. As long as they accept, the connection is made and you can view and even control the mouse and keyboard of the remote host.

Troubleshooting screen sharing on Linux

Outside of the work environment, it's common that the user wanting to share their screen and the person who needs to see it are on different networks. You're probably at home, with a router that connects you to the Internet (it's the box you get from your ISP when you pay your Internet bill). Your router, whether you realize it or not, is designed to keep unwanted visitors out. That's normally very good, but in this one special case, you want to let someone trusted through so they can connect to your screen.

To let someone into your network, you have to configure your router to allow traffic at a specific "port" (like a ship port, but for packets of data instead of cargo), and then configure that traffic to get forwarded on to your personal computer.

Unfortunately, there's no single way that this is done. Every router manufacturer does it a little differently. That means I can't guide you through the exact steps required, because I don't know what router you have, but I can tell you what information you need up front, and what to look for once you're poking around your router.

1. Get your local IP address

You need to know your computer's network IP address. To get that, open GNOME Settings and click on Wi-Fi in the left column (or Network if you're on a wired connection.) In the Wi-Fi panel, click the gear icon and find IPv4 Address in the Details window that appears. A local IP address starts with 192.168 or 10.

For example, my network IP address is 10.0.1.2. Write down your network IP address for later.

2. Get your public IP address

Click this link to obtain your public IP address: http://ifconfig.me

For example, my public IP address is 93.184.216.34. Write down your public IP address for later.

3. Configure your router

Router interfaces differ from manufacturer to manufacturer, but the idea is the same regardless of what brand of router you have in your home. First, log in to your router. The router's address and login information is often printed on the router itself, or in its documentation. I own a TP-Link GX90 router, and I log in to it by pointing my web browser to 10.0.1.1, but your router might be 192.168.0.1 or some other address.

My router calls port forwarding "Virtual servers," which is a category found in the router's NAT forwarding tab. Other routers may just call it Port forwarding or Firewall or even Applications. It may take a little clicking around to find the right category, or you may need to spend some time studying your router's documentation.

When you find the port forwarding setting (whatever it might be titled in your router), you need to add a new rule that identifies an external port (I use 59001) and sends traffic that arrives at it to an internal one (5900 is the standard VNC port.)

In step 1, you obtained your network IP address. Use it as the destination for traffic coming to port 59001 of your router. Here's an example of what my router configuration looks like, but yours is almost sure to be different:

(Image by Seth Kenlon, CC BY-SA 4.0)

This configuration sends traffic arriving at external port 59001 to 10.0.1.2 at port 5900, which is precisely what VNC requires.

Now you can tell the friend you're trying to share your screen with to enter your public IP address (in this example, that's 93.184.216.34) and port 59001.

Linux screen sharing and trust

Only share control of your screen with someone you trust. VNC can be complex to set up because there are security and privacy concerns around giving someone other than yourself access to your computer. However, once you've got it set up, you have instant and easy screen sharing for when you want to show off something cool you're working on, or get help with something that's been confusing you.

Discover the power of VNC for screen sharing on Linux.



Clean up unwanted files in your music directory using Groovy

Mon, 08/29/2022 - 15:00
Clean up unwanted files in your music directory using Groovy Chris Hermansen Mon, 08/29/2022 - 03:00

In this series, I'm developing several scripts to help in cleaning up my music collection. In the last article, we used the framework created for analyzing the directory and sub-directories of music files, checking to make sure each album has a cover.jpg file and recording any other files that aren't FLAC, MP3, or OGG.

I uncovered a few files that can obviously be deleted—I see the odd foo lying around—and a bunch of PDFs, PNGs, and JPGs that are album art. With that in mind, and thinking about the cruft removal task, I offer an improved script that uses a Groovy map to record file names and counts of their occurrences and print that in CSV format.

Get started analyzing with Groovy

If you haven't already, read the first three articles of this series before continuing:

They'll ensure you understand the intended structure of my music directory, the framework created in that article, and how to pick up FLAC, MP3, and OGG files. In this article, I facilitate removing unwanted files in the album directories.

The framework and the album files analysis bits

Start with the code. As before, I've incorporated comments in the script that reflect the (relatively abbreviated) "comment notes" that I typically leave for myself:

1        // Define the music library directory
2        def musicLibraryDirName = '/var/lib/mpd/music'
3        // Define the file name accumulation map
4        def fileNameCounts = [:]
5        // Print the CSV file header
6        println "filename|count"
7        // Iterate over each directory in the music library directory
8        // These are assumed to be artist directories
9        new File(musicLibraryDirName).eachDir { artistDir ->
10            // Iterate over each directory in the artist directory
11            // These are assumed to be album directories
12            artistDir.eachDir { albumDir ->
13                // Iterate over each file in the album directory
14                // These are assumed to be content or related
15                // (cover.jpg, PDFs with liner notes etc)
16                albumDir.eachFile { contentFile ->
17                    // Analyze the file
18                    if (contentFile.name ==~ /.*\.(flac|mp3|ogg)/) {
19                        // nothing to do here
20                    } else if (contentFile.name == 'cover.jpg') {
21                        // don't need to do anything with cover.jpg
22                    } else {
23                        def fn = contentFile.name
24                        if (contentFile.isDirectory())
25                            fn += '/'
26                        fileNameCounts[fn] = fileNameCounts.containsKey(fn) ?  fileNameCounts[fn] + 1 : 1
27                    }
28                }
29            }
30        }
31        // Print the file name counts
32        fileNameCounts.each { key, value ->
33            println "$key|$value"
34        }

This is a pretty straightforward set of modifications to the original framework.

Lines 3-4 define fileNameCounts, a map for recording file name counts.

Lines 17-27 analyze the file names. I avoid any files ending in .flac, .mp3 or .ogg as well as cover.jpg files.

Lines 23-26 record file names (as keys to fileNameCounts) and counts (as values). If the file is actually a directory, I append a / to help deal with it in the removal process. Note in line 26 that Groovy maps, like Java maps, need to be checked for the presence of the key before incrementing the value, unlike for example the awk programming language.
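As an aside, Groovy can hide that existence check: a map created with withDefault() supplies a starting value for missing keys automatically. A small standalone sketch (the file names are invented):

def fileNameCounts = [:].withDefault { 0 }

['playlist.m3u', 'notes.pdf', 'playlist.m3u'].each { fn ->
    fileNameCounts[fn] += 1    // missing keys start at 0; no containsKey() needed
}

assert fileNameCounts['playlist.m3u'] == 2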

That's it!

I run this as follows:

$ groovy TagAnalyzer4.groovy > tagAnalysis4.csv

Then I load the resulting CSV into a LibreOffice spreadsheet by navigating to the Sheet menu and selecting Insert sheet from file. I set the delimiter character to |.

(Image by Chris Hermansen, CC BY-SA 4.0)

I've sorted this in decreasing order of the count column to emphasize repeat offenders. Note as well, on rows 17-20, a bunch of M3U files that refer to the name of the album, probably created by some well-intentioned ripping program. Further down (not shown), I also see files like fix and fixtags.sh, evidence of prior efforts to clean up some problem that left other cruft lying around in the process. I use the find command-line utility to get rid of some of these files, along the lines of:

$ find . \( -name \*.m3u -o -name tags.txt -o -name foo -o -name .DS_Store \
-o -name fix -o -name fixtags.sh \) -exec rm {} \;

I suppose I could have used another Groovy script to do that as well. Maybe next time.




4 ways to use the Linux tar command

Mon, 08/29/2022 - 15:00
4 ways to use the Linux tar command AmyJune Hineline Mon, 08/29/2022 - 03:00

When you have a lot of related files, it's sometimes easier to treat them as a single object rather than 3 or 20 or 100 unique files. There are fewer clicks involved, for instance, when you email one file compared to the mouse work required to email 30 separate files. This quandary was solved decades ago when programmers invented a way to create an archive, and so the tar command was born (the name stands for tape archive because back then, files were saved to magnetic tape.) Today tar remains a useful way to bundle files together, whether it's to compress them so they take up less space on your drive, to make it easier to deal with lots of files, or to logically group files together as a convenience.

I asked Opensource.com authors how they used tar, and related tools like zip and gzip, in their daily work. Here's what they said.

Backups and logs

I use tar and zip whenever I need to make a backup or archive of an entire directory tree. For example, delivering a set of files to a client, or just making a quick backup of my web root directory before I make a major change on the website. If I need to share with others, I create a ZIP archive with zip -9r, where -9 uses best possible compression, and -r will recurse into subdirectories. For example, zip -9r client-delivery.zip client-dir makes a zip file of my work, which I can send to a client.

If the backup is just for me, I probably use tar instead. When I use tar, I usually use gzip to compress, and I do it all on one command line with tar czf, where c will create a new archive file, z compresses it with gzip, and f sets the archive filename. For example, tar czf web-backup.tar.gz html creates a compressed backup of my html directory.

I also have web applications that create log files. And to keep them from taking up too much space, I compress them using gzip. The gzip command is a great way to compress a single file. This can be a TAR archive file, or just any regular file like a log file. To make the gzipped file as small as possible, I compress the file with gzip -9, where -9 uses the best possible compression.

The great thing about using gzip to compress files is that I can use commands like zcat and zless to view them later, without having to uncompress them on the disk. So if I want to look at my log file from yesterday, I can use zless yesterday.log.gz and the zless command automatically uncompresses the data with gunzip and sends it to the less viewer. Recently, I wanted to look at how many log entries I had per day, and I got that with a zcat command like:

for f in *.log.gz; do echo -n "$f,"; zcat $f | wc -l; done

This generates a comma-separated list of log files and a line count, which I can easily import to a spreadsheet for analysis.

Jim Hall

Zcat

I introduced the zcat command in my article Getting started with the cat command. Maybe this can act as a stimulus for further discussion of "in-place" compressed data analysis.

Alan Formy-Duval

Zless and lzop

I love having zless to browse log files and archives. It really helps reduce the risk of leaving random old log files around that I haven't cleaned up.

When dealing with compressed archives, tar -zxf and tar -zcf are awesome, but don't forget about tar -j for those bzip2 files, or even tar -J for the highly compressed xz files.

If you're dealing with a platform with limited CPU resources, you could even consider a lower overhead solution like lzop. For example, on the source computer:

tar --lzop -cf - source_directory | nc destination-host 9999

On the destination computer:

nc -l 9999 | tar --lzop -xf -

I've often used that to compress data between systems where we have bandwidth limitations and need a low resource option.

Steven Ellis

Ark

I've found myself using the KDE application Ark lately. It's a GUI application, but it integrates so well with the Dolphin file manager that I've gotten into the habit of just updating files straight into an archive without even bothering to unarchive the whole thing. Of course, you can do the same thing with the tar command, but if you're browsing through files in Dolphin anyway, Ark makes it quick and easy to interact with an archive without interrupting your current workflow.

(Image by Seth Kenlon, CC BY-SA 4.0)

Archives used to feel a little like a forbidden vault to me. Once I put files into an archive, they were as good as forgotten because it just isn't always convenient to interact with an archive. But Ark lets you preview files without uncompressing them (technically they're being uncompressed, but it doesn't "feel" like they are because it all happens in place), remove a file from an archive, update files, rename files, and a lot more. It's a really nice and dynamic way to interact with archives, which encourages me to use them more often.

Seth Kenlon

How do you use the tar command? That's what I recently asked our community of writers. Here are some of their answers.

(Image by WOCinTech Chat. Modified by Opensource.com. CC BY-SA 4.0)


Steven is an Open Source Advocate and Technologist who has worked for Red Hat since they opened their New Zealand office in May 2011. Over the last 20+ years he's helped numerous businesses see the benefit of adopting a wide range of open source technologies, and he has spoken at a number of regional and international conferences including OSDC, linux.conf.au, OpenStack Summit, Linux World, and OSCON.

In his spare time he still hacks on MythTV and debugs random new bits of hardware that really should know better.

Follow him on Twitter at @StevensHat.


Jim Hall is an open source software advocate and developer, best known for usability testing in GNOME and as the founder + project coordinator of FreeDOS. At work, Jim is CEO of Hallmentum, an IT executive consulting company that provides hands-on IT Leadership training, workshops, and coaching.


Alan has 20 years of IT experience, mostly in the Government and Financial sectors. He started as a Value Added Reseller before moving into Systems Engineering. Alan's background is in high-availability clustered apps. He wrote the 'Users and Groups' and 'Apache and the Web Stack' chapters in the Oracle Press/McGraw Hill 'Oracle Solaris 11 System Administration' book. He earned his Master of Science in Information Systems from George Mason University. Alan is a long-time proponent of Open Source Software.


Seth Kenlon is a UNIX geek, free culture advocate, independent multimedia artist, and D&D nerd. He has worked in the film and computing industry, often at the same time. He is one of the maintainers of the Slackware-based multimedia production project Slackermedia.


How I use Groovy to analyze album art in my music directory

Sun, 08/28/2022 - 15:00
How I use Groovy to analyze album art in my music directory Chris Hermansen Sun, 08/28/2022 - 03:00

In this series, I'm developing several scripts to help in cleaning up my music collection. In the last article, I used the framework I created for analyzing the directory and sub-directories of music files, together with the fine open source JAudiotagger library, to analyze the tags of the music files in the music directory and subdirectories. In this article, I do a simpler job:

  1. Use the framework we created in Part 1
  2. Make sure each album directory has a cover.jpg file
  3. Make a note of any other files in the album directory that aren't FLAC, MP3 or OGG.
Music and metadata

If you haven't read part 1 and part 2 of this series, do that now so you understand the intended structure of my music directory, the framework created in that article, and how to pick up FLAC, MP3, and OGG files.

One more thing. Most audio ripping applications and many downloads:

  • Don't come with a useful cover.jpg file
  • Even if they do come with a useful cover.jpg file, they don't link the media files to it
  • Carry in all sorts of other files of dubious utility (for example, playlist.m3u, which gets created by a tagging utility I've used in the past)

As I mentioned in my last article, the ultimate goal of this series is to create a few useful scripts to help identify missing or unusual tags and facilitate the creation of a work plan to fix tagging problems. This particular script looks for missing cover.jpg files and unwanted non-media files, and creates a CSV file that you can load into LibreOffice or OnlyOffice to look for problems. It won't look at the media files themselves, nor does it look for extraneous files left in the artist subdirectories (those are exercises left for the reader).

The framework and album files analysis

Start with the code. As before, I've incorporated comments in the script that reflect the (relatively abbreviated) "comment notes" that I typically leave for myself:

     1  // Define the music library directory
       
     2  def musicLibraryDirName = '/var/lib/mpd/music'
       
     3  // Print the CSV file header
       
     4  println "artist|album|cover|unwanted files"
       
     5  // Iterate over each directory in the music library directory
     6  // These are assumed to be artist directories

     7  new File(musicLibraryDirName).eachDir { artistDir ->
       
     8      // Iterate over each directory in the artist directory
     9      // These are assumed to be album directories
       
    10      artistDir.eachDir { albumDir ->
       
    11          // Iterate over each file in the album directory
    12          // These are assumed to be content or related
    13          // (cover.jpg, PDFs with liner notes etc)
       
    14          // Initialize the counter for cover.jpg
    15          // and the list for unwanted file names
       
    16          def coverCounter = 0
    17          def unwantedFileNames = []
       
    18          albumDir.eachFile { contentFile ->
       
    19              // Analyze the file
       
    20              if (contentFile.name ==~ /.*\.(flac|mp3|ogg)/) {
    21                  // nothing to do here
    22              } else if (contentFile.name == 'cover.jpg') {
    23                  coverCounter++
    24              } else {
    25                  unwantedFileNames << contentFile.name
    26              }
       
    27          }
    28          println "${artistDir.name}|${albumDir.name}|$coverCounter|${unwantedFileNames.join(',')}"
    29      }
    30  }

Lines 1-2 define the name of the music file directory.

Lines 3-4 print the CSV file header.

Lines 5-13 come from the framework created in Part 1 of this series and get down to the album sub-subdirectories.

Lines 14-17 set up the cover.jpg counter (should only ever be zero or one) and the empty list in which we will accumulate unwanted file names.

Lines 18-27 analyze any files found in the album directories:

Lines 20-21 use the Groovy match operator ==~ and a "slashy" regular expression to check file name patterns. Nothing is done with the media files themselves (see Part 2 for that information).
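If the match operator is new to you, here is a minimal sketch of its behavior, using hypothetical file names. Note that ==~ only returns true when the pattern matches the whole string:

def pattern = /.*\.(flac|mp3|ogg)/
// Full-string matches succeed
assert '01 - Namania.flac' ==~ pattern
// Non-media files fail
assert !('cover.jpg' ==~ pattern)
// Partial matches fail too: the entire name must match
assert !('01 - Namania.flac.bak' ==~ pattern)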

Lines 22-23 count the instances of cover.jpg (it should only ever be zero or one).

Lines 24-26 record the names of any non-media, non-cover.jpg files to show potential cruft or who-knows-what in the album directories.

Line 28 prints out the artist name, album name, cover.jpg count, and the list of unwanted file names.

That’s it!

Running the code

Typically, I run this as follows:

$ groovy TagAnalyzer3.groovy > tagAnalysis3.csv

Then I load the resulting CSV into a spreadsheet. For example, with LibreOffice Calc, go to the Sheet menu and select Insert sheet from file. When prompted, set the delimiter character to |. In my case, the results look like this:

Image: the analysis results in a spreadsheet (Chris Hermansen, CC BY-SA 4.0)

I've sorted this in increasing order of the "cover" column to show album sub-subdirectories that don't have cover.jpg files. Note that some have cover.png instead. My experience with music players is that at least some don't play well with PNG format cover images.
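Those PNG-only albums are easy to fix if you have ImageMagick installed. This is a sketch, with hypothetical paths, that writes a cover.jpg next to each cover.png:

$ convert cover.png cover.jpg
$ # Or batch the conversion across the whole library:
$ find /var/lib/mpd/music -name cover.png -execdir convert cover.png cover.jpg \;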

Also, note that some of these have PDF liner notes, extra image files, M3U playlists, and so on. In my next article, I'll show you how to manage some of the cruft.

Here's how I use open source tools to analyze my music directory, including album cover files.

Image by: Opensource.com


My favorite open source library for analyzing music files

Sat, 08/27/2022 - 15:00
Chris Hermansen

In my previous article, I created a framework for analyzing the directories and subdirectories of music files, using Groovy's extensions to the java.io.File class, which streamline and simplify its use. In this article, I use the open source JAudiotagger library to analyze the tags of the music files in the music directory and subdirectories. Be sure to read the first article in this series if you intend to follow along.

Install Java and Groovy

Groovy is based on Java and requires a Java installation. Recent, decent versions of both Java and Groovy may be in your Linux distribution's repositories. Groovy can also be installed directly from the Apache Foundation website. A nice alternative for Linux users is SDKMan, which can be used to get multiple versions of Java, Groovy, and many other related tools. For this article, I use SDKMan's releases of:

  • Java: version 11.0.12-open of OpenJDK 11
  • Groovy: version 3.0.8
Back to the problem

In the 15 or so years that I've been carefully ripping my CD collection and increasingly buying digital downloads, I have found that ripping programs and digital music download vendors are all over the map when it comes to tagging music files. Sometimes my files are missing tags that music players find useful, such as ALBUMSORT. Other times my files are full of tags I don't care about, such as MUSICBRAINZ_DISCID, which cause some music players to change the order of presentation in obscure ways, so that one album appears to be many, or sorts in a strange order.
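If you want to see exactly which tags one of your own files carries, the metaflac utility from the FLAC tools will dump the Vorbis comments as TAG=value lines; the path below is hypothetical:

$ # Print every tag in the file to standard output
$ metaflac --export-tags-to=- '/var/lib/mpd/music/Artist/Album/01 - Track.flac'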

Given that I have nearly 10,000 tracks in nearly 700 albums, it's quite nice when my music player manages to display my collection in a reasonably understandable order. Therefore, the ultimate goal of this series is to create a few useful scripts to help identify missing or unusual tags and facilitate the creation of a work plan to fix tagging problems. This particular script analyzes the tags of music files and creates a CSV file that I can load into LibreOffice or OnlyOffice to look for problems. It won't look at missing cover.jpg files nor show album sub-subdirectories that contain other files, because this isn't relevant at the music file level.

My Groovy framework plus JAudiotagger

Once again, start with the code. As before, I've incorporated comments in the script that reflect the (relatively abbreviated) "comment notes" that I typically leave for myself:

     1  @Grab('net.jthink:jaudiotagger:3.0.1')
     2  import org.jaudiotagger.audio.*
       
     3  def logger = java.util.logging.Logger.getLogger('org.jaudiotagger');
     4  logger.setLevel(java.util.logging.Level.OFF);
       
     5  // Define the music library directory
       
     6  def musicLibraryDirName = '/var/lib/mpd/music'
       
     7  // These are the music file tags we are happy to see
     8  // Some tags can occur more than once in a given file
       
     9  def wantedFieldIdSet = ['ALBUM', 'ALBUMARTIST',
    10      'ALBUMARTISTSORT', 'ARTIST', 'ARTISTSORT',
    11      'COMPOSER', 'COMPOSERSORT', 'COVERART', 'DATE',
    12      'GENRE', 'TITLE', 'TITLESORT', 'TRACKNUMBER',
    13      'TRACKTOTAL', 'VENDOR', 'YEAR'] as LinkedHashSet
       
    14  // Print the CSV file header
       
    15  print "artistDir|albumDir|contentFile"
    16  print "|${wantedFieldIdSet*.toLowerCase().join('|')}"
    17  println "|other tags"
       
    18  // Iterate over each directory in the music library directory
    19  // These are assumed to be artist directories
       
    20  new File(musicLibraryDirName).eachDir { artistDir ->
       
    21      // Iterate over each directory in the artist directory
    22      // These are assumed to be album directories
       
    23      artistDir.eachDir { albumDir ->
       
    24          // Iterate over each file in the album directory
    25          // These are assumed to be content or related
    26          // (cover.jpg, PDFs with liner notes etc)
       
    27          albumDir.eachFile { contentFile ->
       
    28              // Initialize the counter map for tags we like
    29              // and the list for unwanted tags
       
    30              def fieldKeyCounters = wantedFieldIdSet.collectEntries { e ->
    31                  [(e): 0]
    32              }
    33              def unwantedFieldIds = []
       
    34              // Analyze the file and print the analysis
       
    35              if (contentFile.name ==~ /.*\.(flac|mp3|ogg)/) {
    36                  def af = AudioFileIO.read(contentFile)
    37                  af.tag.fields.each { tagField ->
    38                      if (tagField.id in wantedFieldIdSet)
    39                          fieldKeyCounters[tagField.id]++
    40                      else
    41                          unwantedFieldIds << tagField.id
    42                  }
    43                  print "${artistDir.name}|${albumDir.name}|${contentFile.name}"
    44                  wantedFieldIdSet.each { fieldId ->
    45                      print "|${fieldKeyCounters[fieldId]}"
    46                  }
    47                  println "|${unwantedFieldIds.join(',')}"
    48              }
       
    49          }
    50      }
    51  }


Line 1 is one of those awesomely lovely Groovy facilities that simplify life enormously. It turns out that the kind developer of JAudiotagger makes a compiled version available on the Maven central repository. In a plain Java project, using it would require some Maven XML ceremony and configuration. Using Groovy, I just use the @Grab annotation, and Groovy handles the rest behind the scenes.
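For reference, the short coordinate string on line 1 is equivalent to Grape's long form, which names the Maven coordinates explicitly:

// Short form: 'group:module:version' in a single string
@Grab('net.jthink:jaudiotagger:3.0.1')

// Long form: the same coordinates as named attributes
@Grab(group='net.jthink', module='jaudiotagger', version='3.0.1')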

Line 2 imports the relevant class files from the JAudiotagger library.

Lines 3-4 configure the JAudiotagger library to turn off logging. In my own experiments, the default level is quite verbose, and the output of any script using JAudiotagger is filled with logging information. Configuring the logger at the top of the script works well because Groovy compiles the script into a static main class, so the logger reference stays alive for the whole run. I'm sure I'm not the only one who has configured a logger in some instance method only to see the configuration garbage collected after the instance method returns.

Lines 5-6 are from the framework introduced in Part 1.

Lines 7-13 create a LinkedHashSet containing the list of tags that I hope will be in each file (or, at least, that I'm OK with having in each file). I use a LinkedHashSet here so that the tags keep their insertion order.

This is a good time to point out a discrepancy between the terminology I've been using up until now and the class definitions in the JAudiotagger library. What I have been calling "tags" are what JAudiotagger calls org.jaudiotagger.tag.TagField instances. These instances live within an instance of org.jaudiotagger.tag.Tag. So the "tag", from JAudiotagger's point of view, is the collection of "tag fields". I'm going to follow their naming convention for the rest of this article.

This collection of strings reflects a bit of prior digging with metaflac. Finally, it's worth mentioning that JAudiotagger's org.jaudiotagger.tag.FieldKey uses "_" to separate words in the field keys, which seems incompatible with the strings returned by org.jaudiotagger.tag.Tag.getFields(), so I don’t use FieldKey.
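To make the Tag versus TagField distinction concrete, here is a minimal sketch that reads one file (the path is hypothetical) and prints each tag field's id; what toString() renders varies by field type:

@Grab('net.jthink:jaudiotagger:3.0.1')
import org.jaudiotagger.audio.AudioFileIO

// One Tag per file, many TagField instances within it
def af = AudioFileIO.read(new File('/var/lib/mpd/music/Artist/Album/01 - Track.flac'))
af.tag.fields.each { tagField ->
    println "${tagField.id}: ${tagField}"
}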

Lines 14-17 print the CSV file header. Note the use of Groovy's *. spread operator to apply toLowerCase() to each (upper case) string element of wantedFieldIdSet.
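In isolation, the spread operator works like this:

def ids = ['ALBUM', 'ARTISTSORT', 'TITLE'] as LinkedHashSet
// *. applies the method to every element and collects the results
assert ids*.toLowerCase() == ['album', 'artistsort', 'title']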

Lines 18-27 are from the framework introduced in Part 1, descending into the sub-subdirectories where the music files are found.

Lines 28-32 initialize a map of counters for the desired fields. I use counters here because some tag fields can occur more than once in a given file. Note the use of wantedFieldIdSet.collectEntries to build a map using the set elements as keys (the key value e is in parentheses, as it must be evaluated). I explain this in more detail in this article about maps in Groovy.
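Here is collectEntries in miniature, building the same shape of counter map from a short list of field IDs:

def counters = ['ALBUM', 'ARTIST'].collectEntries { e -> [(e): 0] }
assert counters == [ALBUM: 0, ARTIST: 0]
// Counting an occurrence later is just an increment
counters['ALBUM']++
assert counters == [ALBUM: 1, ARTIST: 0]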

Line 33 initializes a list for accumulating unwanted tag field IDs.

Lines 34-48 analyze any FLAC, MP3, or OGG music files found:

  • Line 35 uses the Groovy match operator ==~ and a "slashy" regular expression to check file name patterns
  • Line 36 reads the music file metadata using org.jaudiotagger.AudioFileIO.read() into the variable af
  • Lines 37-48 loop over the tag fields found in the metadata:
    • Line 37 uses Groovy's each() method to iterate over the list of tag fields returned by af.tag.getFields(), which in Groovy can be abbreviated to af.tag.fields
    • Lines 38-39 count any occurrence of a wanted tag field ID
    • Lines 40-41 append any occurrence of an unwanted tag field ID to the unwanted list
    • Lines 43-47 print out the counts and unwanted fields (if any)

That's it!

Typically, I would run this as follows:

$ groovy TagAnalyzer2.groovy > tagAnalysis2.csv
$

And then I load the resulting CSV into a spreadsheet. For example, with LibreOffice Calc, I go to the Sheet menu and select Insert sheet from file. I set the delimiter character to |. In my case, the results look like this:

Image: tag field counts per music file in a spreadsheet (Chris Hermansen, CC BY-SA 4.0)

I like to have the ALBUMARTIST defined as well as the ARTIST for some music players so that the files in an album are grouped together when artists on individual tracks vary. This happens in compilation albums, but also in some albums with guest artists where the ARTIST field might say for example "Tony Bennett and Snoop Dogg" (I made that up. I think.) Lines 22 and onward in the spreadsheet shown above don't specify the album artist, so I might want to fix that going forward.

Here is what the last column, showing unwanted field IDs, looks like:

Image: the unwanted field IDs column (Chris Hermansen, CC BY-SA 4.0)

Note that some of these tags may be of interest after all, in which case the "wanted" list can be modified to include them. For the rest, I would set up some kind of script to delete field IDs such as BPM, ARTWORKGUID, CATALOGUENUMBER, ISRC, and PUBLISHER.
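I haven't written that cleanup script yet, but a minimal sketch might look like the following. It assumes that Tag.deleteField(String) accepts the raw field id strings returned by getFields() and that AudioFile.commit() writes the modified tag back; please experiment on scratch copies of your files first:

@Grab('net.jthink:jaudiotagger:3.0.1')
import org.jaudiotagger.audio.AudioFileIO

def unwantedFieldIds = ['BPM', 'ARTWORKGUID', 'CATALOGUENUMBER', 'ISRC', 'PUBLISHER']

// Hypothetical path; point this at a scratch copy, not your library
def af = AudioFileIO.read(new File('/tmp/scratch/01 - Track.flac'))
unwantedFieldIds.each { fieldId ->
    af.tag.deleteField(fieldId)   // assumed to remove all fields with this id
}
af.commit()                       // write the change back to the file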

Next steps

In the next article, I'll step back from tracks and check for cover.jpg and other non-music files lying around in artist subdirectories and album sub-subdirectories.

Here's how I use the JAudiotagger library with a Groovy script I created to analyze my music files.

Image by: Opensource.com


How I analyze my music directory with Groovy

Fri, 08/26/2022 - 15:00
Chris Hermansen

Lately, I’ve been looking at how Groovy streamlines the slight clunkiness of Java. In this article, I begin a short series to demonstrate Groovy scripting by creating a tool to analyze my music directory.

In this article, I demonstrate how Groovy extends and streamlines the java.io.File class and simplifies its use. This provides a framework for looking at the contents of a music folder to ensure that expected content (for example, a cover.jpg file) is in place. Later in the series, I use the JAudiotagger library to analyze the tags of the music files.

Install Java and Groovy

Groovy is based on Java and requires a Java installation. Recent, decent versions of both Java and Groovy may be in your Linux distribution's repositories. Groovy can also be installed directly from the Apache Foundation website. A nice alternative for Linux users is SDKMan, which can be used to get multiple versions of Java, Groovy, and many other related tools. For this article, I use SDKMan's releases of:

  • Java: version 11.0.12-open of OpenJDK 11
  • Groovy: version 3.0.8

Music metadata

Lately, I've consolidated my music consumption options. I've settled on using the excellent open source Cantata music player, which is a front end for the open source MPD music player daemon. All my computers have their music stored in the /var/lib/mpd/music directory. In that music directory are artist subdirectories, and in each artist subdirectory are album sub-subdirectories containing the music files, a cover.jpg, and occasionally PDFs of the liner notes.
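Concretely, the layout the scripts in this series expect looks something like this (the artist, album, and file names here are hypothetical):

/var/lib/mpd/music/
└── Iron Butterfly/
    └── In-A-Gadda-Da-Vida/
        ├── 01 - Track One.flac
        ├── 02 - Track Two.flac
        ├── cover.jpg
        └── liner-notes.pdf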

Almost all of my music files are in FLAC format, with a few in MP3 and maybe a small handful in OGG. One reason I chose the JAudiotagger library is because it handles the different tag formats transparently. Of course, JAudiotagger is open source!

So what's the point of looking at audio tags? In my experience, audio tags are extremely poorly managed. The word "careless" comes to mind. But that may be as much a reflection of my own pedantic tendencies as of real problems in the tags themselves. In any case, this is a non-trivial problem that can be solved with Groovy and JAudiotagger. It's not only applicable to music collections, though. Many other real-world problems involve the need to descend a directory tree in a filesystem to do something with the contents found there.

Using the Groovy script

Here's the basic code required for this task. I've incorporated comments in the script that reflect the (relatively abbreviated) "comment notes" I typically leave for myself:

     1  // Define the music library directory
     2  def musicLibraryDirName = '/var/lib/mpd/music'
     3  // Print the CSV file header
     4  println "artistDir|albumDir|contentFile"
     5  // Iterate over each directory in the music library directory
     6  // These are assumed to be artist directories
     7  new File(musicLibraryDirName).eachDir { artistDir ->
     8      // Iterate over each directory in the artist directory
     9      // These are assumed to be album directories
    10      artistDir.eachDir { albumDir ->
    11          // Iterate over each file in the album directory
    12          // These are assumed to be content or related
    13          // (cover.jpg, PDFs with liner notes etc)
    14          albumDir.eachFile { contentFile ->
    15              println "$artistDir.name|$albumDir.name|$contentFile.name"
    16          }
    17      }
    18  }

As noted above, I'm using Groovy's File extensions to move around the directory tree. Specifically:

Line 7 creates a new File object and calls Groovy's eachDir() method on it, with the code between the { on line 7 and the closing } on line 18 being a groovy.lang.Closure argument to eachDir().

What this means is that eachDir() executes that code for each subdirectory found in the directory. This is similar to a Java lambda (also called an "anonymous function"). The Groovy closure doesn't restrict access to the calling environment in the way a lambda does (in recent versions of Groovy, you can use Java lambdas if you want to). As noted above, subdirectories within the music library directory are expected to be artist directories (for example, "Iron Butterfly" or "Giacomo Puccini"), so the artistDir is the argument passed by eachDir() to the closure.
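One practical consequence of that difference: a closure can freely read and modify a local variable in the enclosing scope, where a Java lambda would require it to be effectively final. A tiny sketch:

def count = 0
['a', 'b', 'c'].each { item ->
    // Mutating an enclosing local is fine in a Groovy closure;
    // the equivalent Java lambda would not compile
    count++
}
assert count == 3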

Line 10 calls eachDir() on each artistDir, with the code between the { on line 10 and the } on line 17 forming another closure which processes the albumDir.

Line 14 calls eachFile() on each albumDir, with the code between the { on line 14 and the } on line 16 forming the third-level closure that processes the contents of the album.

For the scope of this article, the only thing I need to do with each file is begin to build the table of information, which I'm creating as a bar-delimited CSV file that can be imported into LibreOffice or OnlyOffice, or any other spreadsheet. Right now, the code writes out the first three columns: artist directory name, album directory name, and content file name (also, line 4 writes out the CSV header line).

Running this on my Linux laptop produces the following output:

$ groovy TagAnalyzer.groovy | head
artistDir|albumDir|contentFile
Habib Koite & Bamada|Afriki|02 - Ntesse.flac
Habib Koite & Bamada|Afriki|08 - NTeri.flac
Habib Koite & Bamada|Afriki|01 - Namania.flac
Habib Koite & Bamada|Afriki|07 - Barra.flac
Habib Koite & Bamada|Afriki|playlist.m3u
Habib Koite & Bamada|Afriki|04 - Fimani.flac
Habib Koite & Bamada|Afriki|10 - Massake.flac
Habib Koite & Bamada|Afriki|11 - Titati.flac
Habib Koite & Bamada|Afriki|03 – Africa.flac
[...]
Richard Crandell|Spring Steel|04-Japanese Lullaby [Richard Crandell].flac
Richard Crandell|Spring Steel|Spring Steel.pdf
Richard Crandell|Spring Steel|03-Zen Dagger [Richard Crandell].flac
Richard Crandell|Spring Steel|cover.jpg
$

In terms of performance:

$ time groovy TagAnalyzer.groovy | wc -l
9870

real        0m1.482s
user        0m4.392s
sys        0m0.230s
$

Nice and quick. It processes nearly 10,000 files in a second and a half! Plenty fast enough for me. Respectable performance, compact and readable code—what's not to like?

In my next article, I crack open the JAudiotagger interface and look at the tags in each file.

To simplify Java's clunkiness, I made a Groovy tool to analyze my music directory.

Image by: WOCinTech Chat. Modified by Opensource.com. CC BY-SA 4.0


My open source journey from user to contributor to CTO

Fri, 08/26/2022 - 15:00
Jesse White

When people ask me what I love most about open source, my answer is simple: It's the openness. With open source, the work that community developers and contributors do is out in the open for all to see and benefit from. I couldn't love that philosophy more.

How many people can say that about the fruits of their labor? How many, perhaps 50 years from now, can look back and say, "Check out the code I wrote that day that hundreds/thousands/tens of thousands benefited from." I find that infinitely more exciting than working on software that's hidden from most of the world.

I'm fortunate that my job puts me in the middle of an interesting area where open source and enterprise meet. Today, I'm Chief Technology Officer of The OpenNMS Group, the company that maintains the OpenNMS project. OpenNMS is a leading open source network monitoring and management platform.

While my current role has me firmly rooted in open source, I started as a user and contributor.

In 2007, I got my first real tech job as a network analyst at Datavalet Technologies, a Montreal, Canada-based telecommunications service provider. Within five years, I expanded to a solutions architect role, where I was tasked with helping to select a network management solution for the organization. We chose OpenNMS, and it was through that experience that I realized the true power of open source.

While onboarding the platform, we identified some missing features that would help optimize our experience. A representative from The OpenNMS Group was on site to help us with the deployment and suggested I attend the community's upcoming DevJam to work with the core developers on building the capabilities that we needed.

During that DevJam, I quickly settled in alongside the team and community. We rolled up our sleeves and started coding to create the enhancements Datavalet needed. Within days, the additional features were ready. It was amazing and transformative—this experience really opened my eyes to the power of open source.

I left my job a year later to study math full-time at Concordia University. It was there that I once again had the opportunity to collaborate with The OpenNMS Group, this time on a project for that year's Google Summer of Code, the annual program in which participants complete open source software development projects with mentoring organizations.

Summer of Code turned out to be a career-changing experience for me—two of the organization's leaders attended our project demo, and a year later, The OpenNMS Group team asked me to come on board as a full-stack developer.


I worked hard, quickly rose through the ranks, and was named CTO in 2015. I consider this a personal achievement and another validation of what makes the open source world so special—if you enjoy working with the community and love what you do, your contributions are quickly recognized.

The open source ethos also informed my evolution from individual contributor to CTO, where I now lead a product development organization of more than 50 people. The community is inherently egalitarian, and my experience working with community contributors has taught me to lead with context rather than control.

I've had an amazing open source ride, from user to contributor to an executive at an open source company. The open source approach goes beyond the tech, as the barriers to entry and growth often found in proprietary development environments can be overcome through collaboration, transparency, and community. For that reason, the possibilities are endless for anyone thinking about a career in open source. I'm proof of that.

We live in a time when people are deeply examining their lives and the impact they have on the world. Working in an open source company is especially rewarding because I can interact directly with and influence the user community. The typical guardrails between the end user and developer are broken down, and I can see exactly how my work can change someone's daily life or inspire someone to contribute to a project. Building community through a mutual love for a project creates connections that can last a lifetime.

I know this has all been true for me, and it's why I am so passionate about my work. I'm an open source geek to the core and proud of it.

The possibilities are endless for anyone thinking about a career in open source. Here's my story.

Image by: opensource.com

