Open-source News

How to display the presence and absence of nth-highest group-wise values in SQL

opensource.com - Fri, 09/02/2022 - 15:00
How to display the presence and absence of nth-highest group-wise values in SQL Mohammed Kamil Khan Fri, 09/02/2022 - 03:00 Register or Login to like Register or Login to like

While skimming through SQL to prepare for interviews, I often come across this question: Find the employee with the highest or (second-highest) salary by joining a table containing employee information with another that contains department information. This raises a further question: What about finding the employee who earns the nth-highest salary department-wide?

More great content Free online course: RHEL technical overview Learn advanced Linux commands Download cheat sheets Find an open source alternative Explore open source resources

Now I want to pose a more complex scenario: What will happen when a department doesn't have an employee earning the nth-highest salary? For example, a department with only two employees will not have an employee earning the third-highest salary.

Here's my approach to this question:

Create department and employee tables

I create a table that includes fields such as dept_id and dept_name.

CREATE TABLE department (
    dept_id INT,
    dept_name VARCHAR(60)
);

Now I insert various departments into the new table.

INSERT INTO department (dept_id,dept_name)
VALUES (780,'HR');
INSERT INTO department (dept_id,dept_name)
VALUES (781,'Marketing');
INSERT INTO department (dept_id,dept_name)
VALUES (782,'Sales');
INSERT INTO department (dept_id,dept_name)
VALUES (783,'Web Dev'); Image by:

Figure 1. The department table (Mohammed Kamil Khan, CC BY-SA 4.0)

Next, I create another table incorporating the fields first_name, last_name, dept_id, and salary.

CREATE TABLE employee (
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    dept_id INT,
    salary INT
);

Then I insert values into the table:

INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Sam','Burton',781,80000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Peter','Mellark',780,90000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Happy','Hogan',782,110000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Steve','Palmer',782,120000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Christopher','Walker',783,140000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Richard','Freeman',781,85000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Alex','Wilson',782,115000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Harry','Simmons',781,90000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Thomas','Henderson',780,95000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Ronald','Thompson',783,130000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('James','Martin',783,135000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Laurent','Fisher',780,100000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Tom','Brooks',780,85000);
INSERT INTO employee (first_name,last_name,dept_id,salary)
VALUES ('Tom','Bennington',783,140000); Image by:

Figure 2. A table of employees ordered by department ID (Mohammed Kamil Khan, CC BY-SA 4.0)

I can infer the number of employees in each department using this table (department ID:number of employees):

  • 780:4
  • 781:3
  • 782:3
  • 783:4

If I want the view the second-highest-earning employees from different departments, along with their department's name (using DENSE_RANK), the table will be as follows:

Image by:

Figure 3. The second-highest-earning employee in each department (Mohammed Kamil Khan, CC BY-SA 4.0)

If I apply the same query to find the fourth-highest-earning employees, the output will be only Tom Brooks of department 780 (HR), with a salary of $85,000.

Image by:

Figure 4. The fourth-highest-earning employee (Mohammed Kamil Khan, CC BY-SA 4.0)

Though department 783 (Web Dev) has four employees, two (James Martin and Ronald Thompson) will be classified as the third-highest-earning employees of that department, since the top two earners have the same salary.

Finding the nth highest

Now, to the main question: What if I want to display the dept_ID and dept_name with null values for employee-related fields for departments that do not have an nth-highest-earning employee?

Image by:

Figure 5. All departments listed, whether or not they have an nth-highest-earning employee (Mohammed Kamil Khan, CC BY-SA 4.0)

The table displayed in Figure 5 is what I am aiming to obtain when specific departments do not have an nth-highest-earning employee: The marketing, sales, and web dev departments are listed, but the name and salary fields contain a null value.

The ultimate query that helps obtain the table in Figure 5 is as follows:

SELECT * FROM (WITH null1 AS (SELECT A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
FROM (SELECT * FROM (
SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
FROM employee INNER JOIN department
ON employee.dept_id=department.dept_id) AS k
WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT NULL)),
nulled AS(SELECT
CASE WHEN null1.dept_id IS NULL THEN full1.dept_id ELSE null1.dept_id END,
CASE WHEN null1.dept_name IS NULL THEN full1.dept_name ELSE null1.dept_name END,
first_name,last_name,salary
FROM null1 RIGHT JOIN full1 ON null1.dept_id=full1.dept_id)
SELECT * FROM null1
UNION
SELECT * FROM nulled
ORDER BY dept_id)
B;Breakdown of the query

I will break down the query to make it less overwhelming.

Use DENSE_RANK() to display employee and department information (not involving null for the absence of the nth-highest-earning member):

SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4

Output:

Image by:

Figure 6. The fourth-highest earner (Mohammed Kamil Khan, CC BY-SA 4.0)

Exclude the rank1 column from the table in Figure 6, which identifies only one employee with a fourth-highest salary, even though there are four employees in another department.

SELECT A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
    FROM (SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4)A

Output:

Image by:

Figure 7. The fourth-highest earner table without the rank 1 column (Mohammed Kamil Khan, CC BY-SA 4.0)

Point out the departments from the department table that do not have an nth-highest-earning employee:

SELECT * FROM (WITH null1 AS (SELECT A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
    FROM (SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT NULL))
SELECT * FROM full1)B

Output:

Image by:

Figure 8. The full1 table listing the departments without a fourth-highest earner (Mohammed Kamil Khan, CC BY-SA 4.0)

Replace full1 in the last line of the above code with null1:

SELECT * FROM (WITH null1 AS (SELECT A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
    FROM (SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT NULL))
SELECT * FROM null1)B Image by:

Figure 9. The null1 table listing all departments, with null values for those without a fourth-highest earner (Mohammed Kamil Khan, CC BY-SA 4.0)

Now, I fill the null values of the dept_id and dept_name fields in Figure 9 with the corresponding values from Figure 8.

SELECT * FROM (WITH null1 AS (SELECT A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
    FROM (SELECT * FROM (
  SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
   employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
   FROM employee INNER JOIN department
   ON employee.dept_id=department.dept_id) AS k
   WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT NULL)),
nulled AS(SELECT
CASE WHEN null1.dept_id IS NULL THEN full1.dept_id ELSE null1.dept_id END,
CASE WHEN null1.dept_name IS NULL THEN full1.dept_name ELSE null1.dept_name END,
first_name,last_name,salary
FROM null1 RIGHT JOIN full1 ON null1.dept_id=full1.dept_id)
SELECT * FROM nulled) B; Image by:

Figure 10. The result of the nulled query (Mohammed Kamil Khan, CC BY-SA 4.0)

The nulled query uses CASE WHEN on the nulls encountered in the dept_id and dept_name columns of the null1 table and replaces them with the corresponding values in the full1 table. Now all I need to do is apply UNION to the tables obtained in Figure 7 and Figure 10. This can be accomplished by declaring the last query in the previous code using WITH and then UNION-izing it with null1.

SELECT * FROM (WITH null1 AS (SELECT A.dept_id, A.dept_name, A.first_name, A.last_name, A.salary
FROM (SELECT * FROM (
SELECT department.dept_id, department.dept_name, employee.first_name, employee.last_name,
employee.salary, DENSE_RANK() OVER (PARTITION BY employee.dept_id ORDER BY employee.salary DESC) AS Rank1
FROM employee INNER JOIN department
ON employee.dept_id=department.dept_id) AS k
WHERE rank1=4)A),
full1 AS (SELECT dept_id, dept_name FROM department WHERE dept_id NOT IN (SELECT dept_id FROM null1 WHERE dept_id IS NOT NULL)),
nulled AS(SELECT
CASE WHEN null1.dept_id IS NULL THEN full1.dept_id ELSE null1.dept_id END,
CASE WHEN null1.dept_name IS NULL THEN full1.dept_name ELSE null1.dept_name END,
first_name,last_name,salary
FROM null1 RIGHT JOIN full1 ON null1.dept_id=full1.dept_id)
SELECT * FROM null1
UNION
SELECT * FROM nulled
ORDER BY dept_id)
B; Image by:

Figure 11. The final result (Mohammed Kamil Khan, CC BY-SA 4.0)

Now I can infer from Figure 11 that marketing, sales, and web dev are the departments that do not have any employees earning the fourth-highest salary.

A step-by-step breakdown of the query.

Databases What to read next A hands-on tutorial of SQLite3 Improve your database knowledge with this MariaDB and MySQL cheat sheet This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License. Register or Login to post a comment.

Steam On Linux Marketshare Crawls Up To 1.27% For August 2022

Phoronix - Fri, 09/02/2022 - 08:40
Valve just published the Steam Survey results for August 2022 that show a slight increase to Linux gaming as part of the overall marketshare on a percentage basis...

Ubuntu 20.04.5 LTS Released With Hardware Enablement Stack Backported From 22.04

Phoronix - Fri, 09/02/2022 - 02:49
For those planning to stick to the Ubuntu 20.04 "Focal Fossa" Long-Term Support series still for some time before moving to the newer Ubuntu 22.04 LTS series, Ubuntu 20.04.5 LTS is available today as the newest point release in that older series...

Addressing Cybersecurity Challenges in Open Source Software: What you need to know

The Linux Foundation - Fri, 09/02/2022 - 01:16

by Ashwin Ramaswami

June 2022 saw the publication of Addressing Cybersecurity Challenges in Open Source Software, a joint research initiative launched by the Open Source Security Foundation in collaboration with Linux Foundation Research and Snyk. The research dives into security concerns in the open source ecosystem. If you haven’t read it, this article will give you the report’s who, what, and why, summarizing its key takeaways so that it can be relevant to you or your organization.

Who is the report for?

This report is for everyone whose work touches open source software. Whether you’re a user of open source, an OSS developer, or part of an OSS-related institution or foundation, you can benefit from a better understanding of the state of security in the ecosystem.

Open source consumers and users: It’s very likely that you rely on open source software as dependencies if you develop software. And if you do, one important consideration is the security of the software supply chain. Security incidents such as log4shell have shown how open source supply chain security touches nearly every industry. Even industries and organizations that have traditionally not focused on open source software now realize the importance of ensuring their OSS dependencies are secure. Understanding the state of OSS security can help you to manage your dependencies intelligently, choose them wisely, and keep them up to date.

Open source developers and maintainers: People and organizations that develop or maintain open source software need to ensure they use best practices and policies for security. For example, it can be valuable for large organizations to have open source security policies. Moreover, many OSS developers also use other open source software as dependencies, making understanding the OSS security landscape even more valuable. Developers have a unique role to play in leading the creation of high-quality code and the respective governance frameworks and best practices around it.

Institutions: Institutions such as open source foundations, funders, and policymaking groups can benefit from this report by understanding and implementing the key findings of the research and their respective roles in improving the current state of the OSS ecosystem. Funding and support can only go to the right areas if priorities are informed by the problems the community is facing now, which the research assists in identifying.

What are the major takeaways?

The data from this report was collected by conducting a worldwide survey of:

  • Individuals who contribute to, use, or administer OSS;
  • Maintainers, core contributors, and occasional contributors to OSS;
  • Developers of proprietary software who use OSS; and
  • Individuals with a strong focus on software supply chain security

The survey also included data collected from several major package ecosystems by using Snyk Open Source, a static code analysis (SCA) tool free to use for individuals and open source maintainers.

Here are the major takeaways and recommendations from the report:

  • Too many organizations are not prepared to address OSS security needs: At least 34% of organizations did not have an OSS security policy in place, suggesting these organizations may not be prepared to address OSS security needs.
  • Small organizations must prioritize developing an OSS security policy: Small organizations are significantly less likely to have an OSS security policy. Such organizations should prioritize developing this policy and having a CISO and OSPO (Open Source Program Office).
  • Using additional security tools is a leading way to improve OSS security: Security tooling is available for open source security across the software development lifecycle. Moreover, organizations with an OSS security policy have a higher frequency of security tool use than those without an OSS security policy.
  • Collaborate with vendors to create more intelligent security tools: Organizations consider that one of the most important ways to improve OSS security across the supply chain is adding greater intelligence to existing software security tools, making it easier to integrate OSS security into existing workflows and build systems.
  • Implementing best practices for secure software development is the other leading way to improve OSS security: Understanding best practices for secure software development, through courses such as the OpenSSF’s Secure Software Development Fundamentals Courses, has been identified repeatedly as a leading way to improve OSS supply chain security.
  • Use automation to reduce your attack surface: Infrastructure as Code (IaC) tools and scanners allow automating CI/CD activities to eliminate threat vectors around manual deployments.
  • Consumers of open source software should give back to the communities that support them: The use of open source software has often been a one-way street where users see significant benefits with minimal cost or investment. For larger open source projects to meet user expectations, organizations must give back and close the loop by financially supporting OSS projects they use.
Why is this important now?

Open source software is a boon: its collaborative and open nature has allowed society to benefit from various innovative, reliable, and free software tools. However, these benefits only last when users contribute back to open source software and when users and developers exercise due diligence around security. While the most successful open source projects have gotten such support, other projects have not – even as open source use has continued to be more ubiquitous.

Thus, it is more important than ever to be aware of the problems and issues everyone faces in the OSS ecosystem. Some organizations and open source maintainers have strong policies and procedures for handling these issues. But, as this report shows, other organizations are just facing these issues now.

Finally, we’ve seen the risks of not maintaining proper security practices around OSS dependencies. Failure to update open source dependencies has led to costs as high as $425 million. Given these risks, a little investment in strong security practices and awareness around open source – as outlined in the report’s recommendations – can go a long way.

We suggest you read the report – then see how you or your organization can take the next step to keep yourself secure!

Download Report

The post Addressing Cybersecurity Challenges in Open Source Software: What you need to know appeared first on Linux Foundation.

AMD Releases Orochi 1.0 For Dynamic Runtime Switching Between Radeon HIP & NVIDIA CUDA

Phoronix - Fri, 09/02/2022 - 00:12
Earlier this year AMD's GPUOpen team announced the Orochi project for dynamic HIP/CUDA run-time handling. Orochi makes it easier for application developers to ship AMD HIP and NVIDIA CUDA support within a single code-base / binary that is then selected at run-time based on the GPU in use...

Pages