
What's new in Apache ShardingSphere 5.3.0?

Tue, 01/17/2023 - 16:00

After 1.5 months of development, Apache ShardingSphere 5.3.0 has been released. Our community merged 687 PRs from contributors around the world.

The new release has been improved in terms of features, performance, testing, documentation, examples, etc.

The 5.3.0 release brings the following highlights:

  • Support fuzzy query for CipherColumn.
  • Support Datasource-level heterogeneous database.
  • Support checkpoint resume for data consistency check.
  • Automatically start a distributed transaction while executing DML statements across multiple shards.

Additionally, release 5.3.0 also brings the following adjustments:

  • Remove the Spring configuration.
  • Systematically refactor the DistSQL syntax.
  • Refactor the configuration format of ShardingSphere-Proxy.
4 highlights of the Apache ShardingSphere release

1. Support fuzzy query for CipherColumn

In previous versions, ShardingSphere's Encrypt feature didn't support using the LIKE operator in SQL.

For a while, users strongly requested adding LIKE support to the Encrypt feature. Encrypted fields are mostly strings, and executing LIKE queries against strings is common practice.

To minimize friction in accessing the Encrypt feature, our community has initiated a discussion about the implementation of encrypted LIKE.

Since then, we've received a lot of feedback.

Some community members even contributed their original encryption algorithm implementation supporting fuzzy queries after fully investigating conventional solutions.

  • The relevant issue can be found here.
  • For the algorithm design, please refer to the attachment within the issue.

The single-character digest algorithm contributed by community members is implemented as CHAR_DIGEST_LIKE in the ShardingSphere encryption algorithm SPI.
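As a configuration sketch (the table, column, and key values here are illustrative; consult the Encrypt documentation for the exact 5.3.0 schema), an encrypt rule might pair a regular encryptor with a CHAR_DIGEST_LIKE encryptor for the fuzzy-query column:

```yaml
rules:
- !ENCRYPT
  tables:
    t_user:                              # illustrative table name
      columns:
        user_name:
          cipherColumn: user_name_cipher
          encryptorName: aes_encryptor
          likeQueryColumn: user_name_like
          likeQueryEncryptorName: like_encryptor
  encryptors:
    aes_encryptor:
      type: AES
      props:
        aes-key-value: 123456abc         # illustrative key
    like_encryptor:
      type: CHAR_DIGEST_LIKE
```

With such a rule, LIKE predicates on user_name are rewritten against the like-query column instead of the cipher column.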

2. Support datasource-level heterogeneous database

ShardingSphere supports a database gateway, but in previous versions its heterogeneous capability was limited to the logical database level. This meant that all the data sources under a logical database had to be of the same database type.

This new release supports datasource-level heterogeneous databases at the kernel level. This means the datasources under a logical database can be different database types, allowing you to use various databases to store data.

Combined with ShardingSphere's SQL dialect conversion capability, this new feature significantly enhances ShardingSphere's heterogeneous data gateway capability.
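As a DistSQL sketch (the storage unit names, hosts, and credentials are illustrative), registering storage units of different database types under one logical database might look like this, using the REGISTER STORAGE UNIT syntax mentioned in the release notes:

```sql
REGISTER STORAGE UNIT ds_0 (
    URL="jdbc:mysql://127.0.0.1:3306/demo_ds_0",
    USER="root",
    PASSWORD="root"
), ds_1 (
    URL="jdbc:postgresql://127.0.0.1:5432/demo_ds_1",
    USER="postgres",
    PASSWORD="postgres"
);
```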

3. Data migration: support checkpoint resume for data consistency check

Data consistency checks happen at the later stage of data migration.

Previously, the data consistency check was triggered and stopped by DistSQL (Distributed SQL). If a large amount of data was migrated and the data consistency check was stopped for any reason, the check would've had to be started again—which is sub-optimal and affects user experience.

ShardingSphere 5.3.0 now supports checkpoint storage, which means data consistency checks can be resumed from the checkpoint.

For example, if data is being verified during data migration and the user stops the verification for some reason, with the verification progress (finished_percentage) being 5%, then:

mysql> STOP MIGRATION CHECK 'j0101395cd93b2cfc189f29958b8a0342e882';
Query OK, 0 rows affected (0.12 sec)

mysql> SHOW MIGRATION CHECK STATUS 'j0101395cd93b2cfc189f29958b8a0342e882';
+--------+--------+---------------------+-------------------+-------------------------+-------------------------+------------------+---------------+
| tables | result | finished_percentage | remaining_seconds | check_begin_time        | check_end_time          | duration_seconds | error_message |
+--------+--------+---------------------+-------------------+-------------------------+-------------------------+------------------+---------------+
| sbtest | false  | 5                   | 324               | 2022-11-10 19:27:15.919 | 2022-11-10 19:27:35.358 | 19               |               |
+--------+--------+---------------------+-------------------+-------------------------+-------------------------+------------------+---------------+
1 row in set (0.02 sec)

In this case, the user restarts the data verification. But the work does not have to restart from the beginning. The verification progress (finished_percentage) remains at 5%.

mysql> START MIGRATION CHECK 'j0101395cd93b2cfc189f29958b8a0342e882';
Query OK, 0 rows affected (0.35 sec)

mysql> SHOW MIGRATION CHECK STATUS 'j0101395cd93b2cfc189f29958b8a0342e882';
+--------+--------+---------------------+-------------------+-------------------------+----------------+------------------+---------------+
| tables | result | finished_percentage | remaining_seconds | check_begin_time        | check_end_time | duration_seconds | error_message |
+--------+--------+---------------------+-------------------+-------------------------+----------------+------------------+---------------+
| sbtest | false  | 5                   | 20                | 2022-11-10 19:28:49.422 |                | 1                |               |
+--------+--------+---------------------+-------------------+-------------------------+----------------+------------------+---------------+
1 row in set (0.02 sec)

Limitation: this new feature is unavailable with the CRC32_MATCH algorithm because the algorithm calculates all data at once.
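The resumable behavior can be sketched as a toy model in Python. This is purely illustrative (not ShardingSphere's implementation): the idea is simply that the check persists how many records it has verified, and a restart continues from that counter instead of from zero.

```python
class MigrationCheck:
    """Toy model of a resumable data consistency check (illustrative only)."""

    def __init__(self, total_records):
        self.total = total_records
        self.checked = 0  # the persisted checkpoint

    def run(self, batch):
        """Verify up to `batch` records, continuing from the saved checkpoint."""
        self.checked = min(self.total, self.checked + batch)

    @property
    def finished_percentage(self):
        return self.checked * 100 // self.total


check = MigrationCheck(total_records=1000)
check.run(batch=50)    # user stops here: finished_percentage is 5
check.run(batch=100)   # restart resumes from the checkpoint, not from zero
```

After the second run, finished_percentage reflects the cumulative progress rather than restarting at zero.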

4. Automatically start a distributed transaction while executing DML statements across multiple shards

Previously, even with XA and other distributed transactions configured, ShardingSphere could not guarantee the atomicity of DML statements that are routed to multiple shards—if users didn't manually enable the transaction.

Take the following SQL as an example:

insert into account(id, balance, transaction_id) values (1, 1, 1),(2, 2, 2),(3, 3, 3),(4, 4, 4), (5, 5, 5),(6, 6, 6),(7, 7, 7),(8, 8, 8);

When this SQL is sharded according to id mod 2, the ShardingSphere kernel layer will automatically split it into the following two SQLs and route them to different shards for execution:

insert into account(id, balance, transaction_id) values (1, 1, 1),(3, 3, 3),(5, 5, 5),(7, 7, 7);
insert into account(id, balance, transaction_id) values (2, 2, 2),(4, 4, 4),(6, 6, 6),(8, 8, 8);
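As a purely illustrative Python sketch (not ShardingSphere code), the id mod 2 routing that produces this kind of split can be modeled like so:

```python
def route_rows(rows, shard_count=2):
    """Bucket rows by `id mod shard_count`, mirroring how a sharding
    kernel splits a multi-value INSERT across shards. Illustrative only."""
    shards = {i: [] for i in range(shard_count)}
    for row in rows:
        shards[row[0] % shard_count].append(row)  # row[0] is the id column
    return shards


rows = [(i, i, i) for i in range(1, 9)]  # (id, balance, transaction_id)
shards = route_rows(rows)
# shard 1 receives ids 1, 3, 5, 7; shard 0 receives ids 2, 4, 6, 8
```

Each bucket then corresponds to one rewritten INSERT statement executed on its shard, which is why a failure on one bucket needs a transaction to roll back the others.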

If the user does not manually start the transaction and one of the sharded SQL statements fails to execute, atomicity cannot be guaranteed because the successful operation cannot be rolled back.

ShardingSphere 5.3.0 is optimized in terms of distributed transactions. If distributed transactions are configured in ShardingSphere, they can be automatically started when DML statements are routed to multiple shards. This way, we can ensure atomicity when executing DML statements.

3 improvements made in Apache ShardingSphere

1. Remove Spring configuration

In earlier versions, ShardingSphere-JDBC provided services in the format of DataSource. If you wanted to introduce ShardingSphere-JDBC without modifying the code in the Spring/Spring Boot project, you needed to use modules such as Spring/Spring Boot Starter provided by ShardingSphere.

Although ShardingSphere supports multiple configuration formats, this approach has the following problems:

  1. When the API changes, many config files need to be adjusted, which is a heavy workload.
  2. The community has to maintain multiple config files.
  3. The lifecycle management of Spring beans is susceptible to the project's other dependencies, which can cause problems such as PostProcessor failures.
  4. Spring Boot Starter and Spring NameSpace are affected by Spring, and their configuration styles differ from YAML.
  5. Spring Boot Starter and Spring NameSpace are affected by the Spring version. When users access them, the configuration may not be recognized, and dependency conflicts may occur. For example, Spring Boot versions 1.x and 2.x have different configuration styles.

ShardingSphere 5.1.2 first supported the introduction of ShardingSphere-JDBC in the form of JDBC Driver. That means applications only need to configure the Driver provided by ShardingSphere at the JDBC URL before accessing ShardingSphere-JDBC.

Removing the Spring configuration simplifies and unifies the configuration mode of ShardingSphere. This adjustment not only simplifies the configuration of ShardingSphere when using different configuration modes but also reduces maintenance work for the ShardingSphere community.
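As a sketch of the Driver-based approach (the YAML file name is illustrative; see the ShardingSphere-JDBC documentation for the full URL syntax), an application's JDBC settings might look like this:

```properties
# Illustrative JDBC settings for the ShardingSphere Driver
driverClassName=org.apache.shardingsphere.driver.ShardingSphereDriver
url=jdbc:shardingsphere:classpath:config.yaml
```

All sharding, encryption, and other rules then live in the referenced YAML file rather than in Spring configuration.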

2. Systematically refactor the DistSQL syntax

One of the characteristics of Apache ShardingSphere is its flexible rule configuration and resource control capability.

DistSQL is ShardingSphere's SQL-like operating language. It's used the same way as standard SQL and is designed to provide incremental SQL operation capability.

ShardingSphere 5.3.0 systematically refactors DistSQL. The community redesigned the syntax, semantics, and operating procedure of DistSQL. The new version is more consistent with ShardingSphere's design philosophy and focuses on a better user experience.

Please refer to the latest ShardingSphere documentation for details. A DistSQL roadmap will be available soon, and you're welcome to leave your feedback.

3. Refactor the configuration format of ShardingSphere-Proxy

In this update, ShardingSphere-Proxy has adjusted the configuration format and reduced the config files required for startup.

server.yaml before refactoring:

rules:
  - !AUTHORITY
    users:
      - root@%:root
      - sharding@:sharding
    provider:
      type: ALL_PERMITTED
  - !TRANSACTION
    defaultType: XA
    providerType: Atomikos
  - !SQL_PARSER
    sqlCommentParseEnabled: true
    sqlStatementCache:
      initialCapacity: 2000
      maximumSize: 65535
    parseTreeCache:
      initialCapacity: 128
      maximumSize: 1024

server.yaml after refactoring:

authority:
  users:
    - user: root@%
      password: root
    - user: sharding
      password: sharding
  privilege:
    type: ALL_PERMITTED
transaction:
  defaultType: XA
  providerType: Atomikos
sqlParser:
  sqlCommentParseEnabled: true
  sqlStatementCache:
    initialCapacity: 2000
    maximumSize: 65535
  parseTreeCache:
    initialCapacity: 128
    maximumSize: 1024

In ShardingSphere 5.3.0, server.yaml is no longer required to start Proxy. If no config file is provided, Proxy starts with the default account root/root.

ShardingSphere is completely committed to becoming cloud-native. Thanks to DistSQL, ShardingSphere-Proxy's config files can be further simplified, which is more friendly to container deployment.

Release Notes

API Changes
  1. DistSQL: refactor syntax API; please refer to the user manual
  2. Proxy: change the configuration style of the global rule, remove the exclamation mark
  3. Proxy: allow zero-configuration startup, enable the default account root/root when there is no Authority configuration
  4. Proxy: remove the default logback.xml and use API initialization
  5. JDBC: remove the Spring configuration and use Driver + YAML configuration instead
New Features
  1. DistSQL: new syntax REFRESH DATABASE METADATA, refresh logic database metadata
  2. Kernel: support DistSQL REFRESH DATABASE METADATA to load configuration from the governance center and rebuild MetaDataContext
  3. Support PostgreSQL/openGauss setting transaction isolation level
  4. Scaling: increase inventory task progress update frequency
  5. Scaling: DATA_MATCH consistency check support checkpoint resume
  6. Scaling: support drop consistency check job via DistSQL
  7. Scaling: rename column from sharding_total_count to job_item_count in job list DistSQL response
  8. Scaling: add a sharding column in incremental task SQL to avoid broadcast routing
  9. Scaling: sharding column could be updated when generating SQL
  10. Scaling: improve column value reader for DATA_MATCH consistency check
  11. DistSQL: encrypt DistSQL syntax optimization, support like query algorithm
  12. DistSQL: add properties value check when REGISTER STORAGE UNIT
  13. DistSQL: remove useless algorithms at the same time when DROP RULE
  14. DistSQL: EXPORT DATABASE CONFIGURATION supports broadcast tables
  15. DistSQL: REGISTER STORAGE UNIT supports heterogeneous data sources
  16. Encrypt: support Encrypt LIKE feature
  17. Automatically start distributed transactions when executing DML statements across multiple shards
  18. Kernel: support client \d for PostgreSQL and openGauss
  19. Kernel: support select group by, order by statement when a column contains null values
  20. Kernel: support parse RETURNING clause of PostgreSQL/openGauss Insert
  21. Kernel: SQL HINT performance improvement
  22. Kernel: support mysql case when then statement parse
  23. Kernel: support data source level heterogeneous database gateway
  24. (Experimental) Sharding: add sharding cache plugin
  25. Proxy: support more PostgreSQL datetime formats
  26. Proxy: support MySQL COM_RESET_CONNECTION
  27. Scaling: improve MySQLBinlogEventType.valueOf to support unknown event type
  28. Kernel: support case when for federation
Bug Fix
  1. Scaling: fix barrier node created at job deletion
  2. Scaling: fix part of columns value might be ignored in DATA_MATCH consistency check
  3. Scaling: fix jdbc url parameters are not updated in consistency check
  4. Scaling: fix tables sharding algorithm type INLINE is case-sensitive
  5. Scaling: fix incremental task on MySQL require mysql system database permission
  6. Proxy: fix the NPE when executing select SQL without storage node
  7. Proxy: support DATABASE_PERMITTED permission verification in unicast scenarios
  8. Kernel: fix the wrong value of worker-id in show compute nodes
  9. Kernel: fix route error when the number of readable data sources and weight configurations of the Weight algorithm are not equal
  10. Kernel: fix multiple groups of readwrite-splitting refer to the same load balancer name, and the load balancer fails problem
  11. Kernel: fix can not disable and enable compute node problem
  12. JDBC: fix data source is closed in ShardingSphereDriver cluster mode when startup problem
  13. Kernel: fix wrong rewrite result when part of logical table name of the binding table is consistent with the actual table name, and some are inconsistent
  14. Kernel: fix startup exception when use SpringBoot without configuring rules
  15. Encrypt: fix null pointer exception when Encrypt value is null
  16. Kernel: fix oracle parsing does not support varchar2 specified type
  17. Kernel: fix serial flag judgment error within the transaction
  18. Kernel: fix cursor fetch error caused by wasNull change
  19. Kernel: fix alter transaction rule error when refresh metadata
  20. Encrypt: fix EncryptRule cast to TransparentRule exception that occurs when the call procedure statement is executed in the Encrypt scenario
  21. Encrypt: fix exception which caused by ExpressionProjection in shorthand projection
  22. Proxy: fix PostgreSQL Proxy int2 negative value decoding incorrect
  23. Proxy: PostgreSQL/openGauss support describe insert returning clause
  24. Proxy: fix gsql 3.0 may be stuck when connecting Proxy
  25. Proxy: fix parameters are missed when checking SQL in Proxy backend
  26. Proxy: enable MySQL Proxy to encode large packets
  27. Kernel: fix oracle parse comment without whitespace error
  28. DistSQL: fix show create table for encrypt table
Refactor
  1. Scaling: reverse table name and column name when generating SQL if it's SQL keyword
  2. Scaling: improve incremental task failure handling
  3. Kernel: governance center node adjustment, unified hump to underscore
Community Contribution

This Apache ShardingSphere 5.3.0 release is the result of 687 merged PRs, committed by 49 contributors. Thank you for your efforts.

This article originally appeared on ShardingSphere 5.3.0 is released: new features and improvements and is republished with permission.

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

How open source is addressing food sovereignty

Tue, 01/17/2023 - 16:00

Our food system is broken. As with so many systems of the 21st century, power is concentrated in the hands of very few companies, often geared toward exploiting people and the planet, under the premise of maximizing profit. Under such a mindset, feeding people is a secondary goal. When it comes to something as important as food, we can and should aim for more than this as a society. What if the goal became getting high-quality, nutritious, and ecologically regenerative produce from farms to plates of every person in the world?

(Image by Open Food Network UK, CC BY-SA 3.0)

We believe getting food to everyone is not a radical concept. But getting there requires radical thinking. It requires putting the people most involved in the food system (producers, eaters, and communities) at the heart of the food system. It means putting collaboration above intellectual property. It means creating a commonwealth of knowledge and resources. This is the approach we are taking at the Open Food Network (OFN).

(Image by Open Food Network, CC BY-SA 4.0)

What is the Open Food Network?

OFN was founded in Australia 10 years ago by two farmers. It launched by creating a network of small-scale farmers and shops dedicated to improving how food is sold and delivered to people.

Many initiatives already existed at that time (community-supported agriculture (CSA) schemes, buying groups, local food stores), but none of them were working on the technology needed to leverage cooperation between the different actors in food supply chains. This was the main motivation and goal for the founders of OFN: to provide a web platform to help projects interact and collaborate as a thriving network.

(Image by Open Food Network Australia, CC BY-SA 3.0)

Why use open source?

Joining a network often means gaining access to peer knowledge and resources. Creating OFN also meant pooling the effort of maintaining infrastructure and tools. This was appealing to everyone who wanted to join, but it needed to happen without compromising each actor's sovereignty.

To illustrate, take a look at the following example.

If you decide to launch a local, organic tree farming business, there is a good chance you'll need to work on improving the land you've just bought. In some cases, this leads to years of hard work. During this time, it's risky to rely on the services of a young startup that could shut down its service early or decide to increase its fees. If that happens, you either have to change software or absorb the impact in your own business model. In the farming world, this can put many small businesses in jeopardy.

Yet relying only on software from established companies would recreate, on the software side, the same concentration of power you are trying to avoid on the food side.

That's why the OFN software platform was released under the AGPL v3 license from the very start. This license allows anyone to contribute to or reuse the software, as long as they share their work under the same license. This meant working on community guidelines for how contributions can be made. It also meant creating rules for how people work together. Yes, the local startup that offers OFN as a service can still fail and close its doors. But with community-led open source software, you can find other companies or organizations to run and improve the software for you. You are not tied to a single organization.

(Image by Open Food Network Australia, CC BY-SA 3.0)

However, free software does not protect you from an unexpected rise in service fees (unless you're actually deploying your own server). That's why OFN operates on an open and transparent governance model. We use collaborative decision-making within our global network of communities. We also work collaboratively with farmers and food enterprises in each country to design software and resources that support their needs.

So, as a software user, you can join the governance of your local provider through membership or shareholding. There, you can share your voice before changes occur in the way the service is provided to you (price, quality, availability).

What is OFN doing today?

Ten years after its launch, OFN is a community spreading across three continents with 15 live production instances. More than 6,000 farmers and 700 local shops are using the software on a day-to-day basis.

This was made possible because passionate, dedicated people from all around the world are putting in the time, great ideas, and hard work to build and maintain the software platform. They also create the resources needed to build a better food system.

We work together so that we're all helping each other reach each goal we set, which means that we get more than we could possibly create on our own!

(Image by Open Food Network Australia, CC BY-SA 3.0)

To help keep this awesome community moving forward, we recently decided to launch a GitHub sponsor page. While we want each local provider of the software to become sustainable and contribute to maintaining the main software, we are not at that stage yet. We rely heavily on grants and other public or private funding.

Feel free to join our community forum or check out our GitHub repo to learn more about us!


Rachel has over ten years' experience building and designing web platforms for social and ecological aims. She joined the Open Food Network four years ago as a product lead and works alongside it on various open source projects. You can connect with her on Mastodon.


Lynne Davis is a product lead working on various open source projects driven by community-led social and ecological transformation, including the Open Food Network and Land Explorer.


Recover from an unsuccessful git rebase with the git reflog command

Mon, 01/16/2023 - 16:00

The git rebase command allows you to adjust the history of your Git repository. It's a useful feature, but of course, mistakes can be made. As is usually the case with Git, you can repair your error and restore your repository to a former state. To recover from an unsuccessful rebase, use the git reflog command.

Git reflog

Suppose you perform this interactive rebase:

$ git rebase -i HEAD~20

In this context, ~20 means to rebase the last 20 commits.

Unfortunately, in this imaginary scenario, you mistakenly squashed or dropped some commits you didn't want to lose. You've already completed the rebase, but this is Git, so of course, you can recover your lost commits.

Review your history with reflog

Run the git reflog command to view a history of your interactions with your repository. This example is from my demonstration repository; the result will vary depending on your actual repository:

$ git reflog
222967b (HEAD -> main) HEAD@{0}: rebase (finish): returning to refs/heads/main
222967b (HEAD -> main) HEAD@{1}: rebase (squash): My big rebase
c388f0c HEAD@{2}: rebase (squash): # This is a combination of 20 commits
56ee04d HEAD@{3}: rebase (start): checkout HEAD~20
0a0f875 HEAD@{4}: commit: An old good commit
[...]

Find the last good commit

In this example, HEAD@{3} represents the start of your rebase. You can tell because its description is rebase (start).

The commit just under it, 0a0f875 HEAD@{4}, is the tip of your Git branch before you executed your incorrect rebase. Depending on how old and active your repository is, there are likely more lines below this one, but assume this is the commit you want to restore.

Restore the commit

To recover the commit you accidentally squashed and all of its parent commits, including those accidentally squashed or dropped, use git checkout. In this example, HEAD@{4} is the commit you need to restore, so that's the one to check out:

$ git checkout HEAD@{4}

With your good commit restored, you can create a new branch using git checkout -b followed by your desired branch name, such as test-branch.

Git version control

Git's purpose is to track versions, and its default setting is usually to preserve as much data about your work as feasible. Learning to use new Git commands makes many of its most powerful features available and safeguards your work.

(Image by kris krüg)

How to use the open source MQTT plug-in in JMeter

Mon, 01/16/2023 - 16:00

In a previous article, I described how JMeter has built-in support for HTTP, HTTPS, TCP, and other common protocols and has a plug-in extension mechanism.

Through plug-ins, you can support much more than just what's built-in, including MQTT.

MQTT is a mainstream protocol in the IoT world. Although it is not a protocol type that comes with JMeter, it is extremely common in IoT testing scenarios. In order to support the load testing of the MQTT protocol, EMQ developed a JMeter-based open source testing plug-in for the MQTT protocol.

This article introduces how to use the MQTT plug-in in JMeter.

Install the MQTT plug-in on JMeter

The installation of the MQTT plug-in is similar to other JMeter third-party plug-ins:

  1. Download the latest version of the plug-in mqtt-xmeter-2.0.2-jar-with-dependencies.jar from GitHub. The plug-in supports JMeter 3.2 and above.
  2. Copy the plug-in jar package to the plug-in directory of JMeter: $JMETER_HOME/lib/ext.
  3. Restart JMeter.

At the time of writing, the JMeter MQTT plug-in supports a variety of samplers, such as connection, message publish, and message subscription.

These can be combined to build more complex test scenarios.

MQTT connect sampler

The Connect Sampler simulates an IoT device and initiates an MQTT connection.

(Image by Chongyuan Yin, CC BY-SA 4.0)

Server name or IP: The address of the MQTT server being tested.

Port number: Taking the EMQX Broker as an example, the default ports are 1883 for TCP connections and 8883 for SSL connections. Please refer to the specific configuration of the server for the specific port.

MQTT version: Presently supports MQTT 3.1 and 3.1.1 versions.

Timeout: Connection timeout setting, in seconds.

Protocols: Supports TCP, SSL, WS, and WSS connections to MQTT servers. When selecting an SSL or WSS encrypted channel connection, a one-way or two-way authentication (Dual) can be selected. If two-way authentication is required, specify the appropriate client certificate (p12 certificate) and the corresponding file protection password (Secret).

User authentication: If the MQTT server is configured for user authentication, provide the corresponding Username and Password.

ClientId: The identity of the virtual user. If Add random suffix for ClientId is enabled, a UUID string is appended to each ClientId to form the complete virtual user identifier.

Keep alive: The interval for sending heartbeat signals. For example, 300 means that the client sends ping requests to the server every 300 seconds to keep the connection active.

Connect attempt max: The maximum number of reconnection attempts during the first connection. If this number is exceeded, the connection is considered failed. If the user wants to keep trying to reconnect, set this to -1.

Reconnect attempt max: The maximum number of reconnect attempts during subsequent connections. If this number is exceeded, the connection is considered failed. If the user wants to keep trying to reconnect, set this to -1.

Clean session: Set this option to false when the user wants to keep the session state between connections or true when the user does not want to keep the session state in new connections.

MQTT message publish sampler (MQTT Pub Sampler)

The message publish sampler reuses the MQTT connection established in the Connection Sampler to publish messages to the target MQTT server.


QoS Level: Quality of Service, with values 0, 1, and 2, representing AT_MOST_ONCE, AT_LEAST_ONCE, and EXACTLY_ONCE, respectively, in the MQTT protocol specification.

Retained messages: If you want to use retained messages, set this option to true to have the MQTT server store the retained messages published by the plug-in using the given QoS. When the subscription occurs on the corresponding topic, the last retained message is delivered directly to the subscriber. Therefore, the subscriber does not have to wait to get the latest status value of the publisher.

Topic name: The topic of the published message.

Add timestamp in payload: If enabled, the current timestamp is attached to the beginning of the published message body. Together with the Payload includes timestamp option of the message subscription sampler, this can calculate the delay time reached by the message at the message receiving end. If disabled, only the actual message body is sent.

Payloads and Message type: Three message types are presently supported:

  • String: Ordinary string.
  • Hex String: A string is presented as a hexadecimal value, such as Hello, which can be represented as 48656C6C6F (where 48 corresponds to the letter H in the ASCII table, and so on). Typically, hexadecimal strings are used to construct non-textual message bodies, such as describing certain private protocol interactions, control information, and so on.
  • Random string with a fixed length: A random string of a specified length (in bytes) is generated as a message body.
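The hexadecimal encoding described for Hex String payloads is easy to reproduce for illustration. In Python:

```python
payload = "Hello"

# Each ASCII byte becomes two hex digits: 48 is 'H', 65 is 'e',
# 6C is 'l' (twice), and 6F is 'o'.
hex_payload = payload.encode("ascii").hex().upper()
print(hex_payload)  # 48656C6C6F
```

Decoding with bytes.fromhex(hex_payload) recovers the original bytes, which is how a non-textual message body round-trips through the Hex String form.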
MQTT message subscription sampler (MQTT Sub Sampler)

The message subscription sampler reuses the MQTT connection established in the Connection Sampler to subscribe to messages from the target MQTT server.



QoS Level: Quality of Service, the meaning is the same as that for the Message Pub Sampler.

Topic name: The topic to which the subscribed message belongs. A single message subscription sampler can subscribe to multiple topics, separated by commas.

Payload includes timestamp: If enabled, the timestamp is parsed from the beginning of the message body, which can be used to calculate the receive delay of the message with the Add timestamp in the payload option of the message delivery sampler. If disabled, only the actual message body is parsed.
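To illustrate how the paired timestamp options work together, here is a hedged Java sketch of the idea. This is not the plug-in's actual wire format: it assumes the publisher prepends an 8-byte epoch-millisecond timestamp to the payload, and the class and method names are invented for this example:

```java
import java.nio.ByteBuffer;

public class DelayDemo {
    // Publisher side: prepend an 8-byte epoch-millis timestamp to the body.
    static byte[] wrap(byte[] body, long sendMillis) {
        return ByteBuffer.allocate(8 + body.length)
                .putLong(sendMillis)
                .put(body)
                .array();
    }

    // Subscriber side: parse the leading timestamp and compute the delay.
    static long delay(byte[] message, long recvMillis) {
        long sent = ByteBuffer.wrap(message).getLong(); // first 8 bytes
        return recvMillis - sent;
    }

    public static void main(String[] args) {
        byte[] msg = wrap("hello".getBytes(), 1_000L);
        System.out.println(delay(msg, 1_025L)); // prints 25
    }
}
```

The key design point is that both ends must agree on the timestamp format; if Add timestamp in payload is enabled on one side but Payload includes timestamp is disabled on the other, the subscriber would misread the leading bytes as message content.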

Sample on: The sampling method. The default is specified elapsed time (ms), such as sampling once per specified number of milliseconds. Number of received messages can also be selected, such as sampling once for every specified number of messages received.

Debug response: If checked, the message content is printed in the JMeter response. This option is mainly for debugging; leaving it checked during a formal test run isn't recommended, as it affects test efficiency.

MQTT disconnect sampler (MQTT DisConnect)

Disconnects the MQTT connection established in the connection sampler.


For flexibility, the property values in the above sampler can refer to JMeter's system or custom variables.

MQTT and JMeter

In this article, I've introduced the various test components of the JMeter MQTT plug-in. In another article, I'll discuss in detail how to build test scripts with the MQTT plug-in for different test scenarios.

This article originally appeared on How to Use the MQTT Plug-in in JMeter and is republished with permission.

In order to support the load testing of the MQTT protocol, EMQ developed a JMeter-based open source testing plug-in for the MQTT protocol.

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

A 4-minute guide to Java loops

Sat, 01/14/2023 - 16:00
A 4-minute guide to Java loops sethkenlon Sat, 01/14/2023 - 03:00

A while loop performs a set of tasks for as long as some predefined condition is true. This is considered a control structure that directs the flow of a program. It's a way for you to tell your code what to do by defining a condition that it can test, and take action based on what it finds. The two kinds of while loops in Java are while and do while.

Java while loop

A while loop is meant to iterate over data until some condition is satisfied. To create a while loop, you provide a condition that can be tested, followed by the code you want to run. Java has several built-in test functions, the simplest of which are mathematical operators (<, >, ==, and so on):

package com.opensource.example;

public class Example {
    public static void main(String[] args) {
        int count = 0;
        while (count < 5) {
            System.out.printf("%d ", count);
            count++;
        }
    }
}

In this simple example, the condition is that the variable count is less than 5. Because count is instantiated at 0, and then incremented by 1 in the code within the while loop, the program iterates a total of 5 times:

$ java ./
0 1 2 3 4

Before it can iterate a sixth time, the condition is no longer true, so the loop ends.

The conditional statement for a while loop is vital. Getting it wrong could mean that your loop never executes. For instance, suppose you had set count == 5 as the condition:

while (count == 5) {
    System.out.printf("%d ", count);
    count++;
}

When you run the code, it builds and runs successfully, but nothing happens:

$ java ./
$

The loop has been skipped because count was set to 0, and it's still 0 at the moment the while loop is first encountered. The loop never has a reason to start and count is never incremented.

The reverse of this is a condition that starts as true and can never become false, which results in an infinite loop.
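To see how easily this happens, here is a small illustrative sketch (not from the original article; the class name and the safety guard are added for this demo). The condition count != 5 looks reasonable, but because count steps by 2, it jumps from 4 to 6 and can never equal 5:

```java
public class InfiniteSketch {
    // Returns how many iterations ran before the safety guard tripped.
    public static int runWithGuard(int maxIterations) {
        int count = 0;
        int iterations = 0;
        // count goes 2, 4, 6, 8, ... and is never 5, so the
        // condition alone would never end the loop.
        while (count != 5) {
            count += 2;
            iterations++;
            if (iterations >= maxIterations) { // guard so this demo terminates
                break;
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        // Without the guard, this loop would never end.
        System.out.println("still looping after " + runWithGuard(1000) + " iterations");
    }
}
```

A safer habit is to use inequality tests (count < 5) rather than equality tests when the step size might skip the target value.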

Java do while loop

Similar to the while loop, a do while loop tests for the conditional at the end, not the beginning, of each iteration. With this, the code in your loop runs at least once because there's no gateway to entry, only a gateway to exit:

package com.opensource.example;

public class Example {
    public static void main(String[] args) {
        int count = 9;
        do {
            System.out.printf("%d ", count);
            count++;
        } while (count == 5);
    }
}

In this sample code, count is set to 9. The condition for the loop to repeat is that count is equal to 5. But 9 isn't equal to 5. That check isn't performed until the end of the first iteration, though:

$ java ./
9

Java infinite loops

An infinite loop, as its name suggests, never ends. Sometimes they're created by mistake, but an infinite loop does have a valid use case. Sometimes you want a process to continue indefinitely (that's functionally infinite because you can't guarantee when you need it to stop), and so you might set your condition to something impossible to meet.

Suppose you've written an application that counts the number of zombies remaining in your neighborhood during a zombie apocalypse. To simulate uncertainty over how many loops are required to get to 0 zombies, my demo code retrieves a timestamp from the operating system and sets the value of the counter (c) to some number derived from that timestamp. Because this is a simple example and you don't really want to get trapped in an infinite loop, this code counts down to zero and uses the break statement to force the loop to end:

package com.opensource.example;

public class Example {
    public static void main(String[] args) {
        long myTime = System.currentTimeMillis();
        int c;

        if (myTime % 2 == 0) {
            c = 128;
        } else {
            c = 1024;
        }

        while (true) {
            System.out.printf("%d Zombies\n", c);
            // break for convenience
            if (c <= 0) {
                break;
            }
            c--;
        }
    }
}

You may have to run it a few times to trigger a different total number of zombies, but sometimes your program iterates 128 times and other times 1,024 times:

$ java ./
1024 Zombies
1023 Zombies
[...]
0 Zombies

Can you tell why the loops end at 0 and not at -1?

Java loops

Loops give you control over the flow of your program's execution. Iteration is common in programming, and whether you use a while loop, a do while loop, or an infinite loop, understanding how loops work is vital.

Whether you use a while loop, a do while loop, or an infinite loop, understanding how loops work is vital to Java programming.


HPC containers at scale using Podman

Fri, 01/13/2023 - 16:00
HPC containers at scale using Podman lastephey Fri, 01/13/2023 - 03:00

This article describes recent work done at NERSC in collaboration with Red Hat to modify Podman (the pod manager tool) to run at a large scale, a key requirement for high-performance computing (HPC). Podman is an open source tool for developing, managing, and running containers on Linux systems. For more details about this work, please see our paper which will be published in the CANOPIE-HPC Supercomputing 2022 proceedings.

In the following demo video, we walk through pulling an image onto Perlmutter from the NERSC registry, generating a squashed version of the image using podman-hpc, and running the EXAALT benchmark at large scale (900 nodes, 3600 GPUs) via our podman-exec wrapper. NERSC's flagship supercomputing system is Perlmutter, currently number 7 on the Top 500 list. It has a GPU partition with over 6000 NVIDIA A100 GPUs and a CPU partition with over 6000 AMD Milan CPUs. All of the work described in this blog post has been performed on Perlmutter.

NERSC, the National Energy Research Scientific Computing Center, is the US Department of Energy's production mission computing facility that serves the DOE Office of Science, which funds a wide range of fundamental and applied research. In the first half of 2022, more than 700 unique users used Shifter, the current container solution at NERSC, and general user interest in containers is growing.

Although NERSC has demonstrated near bare metal performance with Shifter at large scales, several shortcomings have motivated us to explore Podman. The primary factor is that Shifter does not provide any build utilities. Users must build containers on their own local system and ship their images to NERSC via a registry. Another obstacle is that Shifter provides security by limiting the running container to the privileges of the user who launched it. Finally, Shifter is mostly an "in-house" solution, so users must learn a new technology, and NERSC staff have the additional burden of maintaining this software.

Podman provides a solution to all of these major pain points. Podman is an OCI-compliant framework that adheres to a set of community standards. It will feel familiar to users who have used other OCI-compliant tools like Docker. It also has a large user and developer community with more than 15k stars on GitHub as of October 2022. The major innovation that has drawn us to Podman is rootless containers. Rootless containers elegantly constrain privileges by using a subuid/subgid map to enable the container to run in the user namespace but with what feels like full root privileges. Podman also provides container build functionality that will allow users to build images directly on the Perlmutter login nodes, removing a major roadblock in their development workflows.

[ Check out the latest Podman articles on Enable Sysadmin. ]

Enabling Podman at a large scale on Perlmutter with near-native performance required us to address site integration, scalability, and performance. Additionally, we have developed two wrapper scripts to achieve two modes of operation: Podman container-per-process and podman-exec. Podman container-per-process mode describes the situation in which many processes are running on the node (usually in an MPI application), with one individual container running for each process. The podman-exec mode describes the situation in which there is a single container running per node, even if there are multiple MPI processes.

We ran several benchmarks with podman-hpc on Perlmutter to compare performance across bare metal, Shifter, Podman container-per-process, and podman-exec mode. The EXAALT benchmark runs the LAMMPS molecular dynamics application, the Pynamic benchmark simulates Python package imports and function invocations, and the DeepCAM benchmark is a climate data segmentation deep learning application. In general, the benchmarks suggest comparable performance between the bare metal, Shifter, and podman-exec cases. The startup overhead incurred in Podman container-per-process mode can be seen in the results of both Pynamic and DeepCAM. In general, podman-exec was our best-performing configuration, so this is the mode on which we will focus our future development efforts.

(Image: Laurie Stephey, CC BY-SA 4.0)

Results from our strong-scaling EXAALT benchmark at 32, 64, 128, and 256 nodes. The average of two bare metal run results are shown in red, Shifter run results are shown in blue, Podman container-per-process run results are shown in dark green, and podman-exec mode results are shown in light green with corresponding error bars.

(Image: Laurie Stephey, CC BY-SA 4.0)

The results of the Pynamic benchmark for bare metal (red), Shifter (blue), podman-exec mode (green), and Podman container-per-process mode (light-green) over two job sizes (128 and 256 nodes) using 64 tasks per node. All configurations were run three times.

(Image: Laurie Stephey, CC BY-SA 4.0)


The results of the MLPerf DeepCAM strong scaling benchmark for Shifter (blue), Podman container-per-process (light green), and podman-exec mode (dark green) over a range of job sizes (16, 32, 64, and 128 Perlmutter GPU nodes). We separate the timing data into container startup, training startup, and training runtime.

We are excited about the results we have seen so far, but we still have work to do before we can open Podman to all NERSC users. To improve the user experience, we aim to explore adding Slurm integration to remove some of the complexity of working with nested wrapper scripts, especially for the podman-exec case. We also aim to get our podman-hpc scripts and binaries into the Perlmutter boot images of all nodes, so staging these to each node will no longer be necessary. We hope to address some of the limitations of the OCI hook functionality (for example, the inability to set environment variables) with the OCI community. Finally, our goal is to get much of our work upstreamed into Podman itself so the larger Podman community can leverage our work.

Learn how Podman is being modified to run at a large scale for high-performance computing (HPC).


Andrew J. Younge is the R&D Manager of the Scalable Computer Architectures department at Sandia National Laboratories. His research interests include high performance computing, virtualization & containers, distributed systems, and energy efficient computing.


Dan Fulton earned his PhD from the University of California, Irvine, in computational plasma physics in 2015. He continued his plasma computation research as a scientist at Tri Alpha Energy (now TAE Technologies) from 2015-2018, and then made a leap to the Future innovation team at Adidas, where he developed software tools to aid in the engineering and design of 3D printed footwear components. He also introduced and deployed JupyterHub to Adidas Future. In 2021, Dan joined the NERSC Data & Analytics Services team at Berkeley National Lab as a Scientific Data Architect, with an ambition to bring lessons from private sector innovation to back to benefit the scientific research community.


Fork our open source onboarding program

Thu, 01/12/2023 - 16:00
Fork our open source onboarding program stackedsax Thu, 01/12/2023 - 03:00

Getting started as a contributor to an open source project shouldn't feel like getting bad customer service: "Please hold while we connect you with the first available representative," followed by mind-numbing elevator music on an infinite loop. Nor should new contributors feel they have to scale Mt. Annapurna and go before a wizened greybeard to get their first commit accepted. Too often, junior coders are scared away from open source altogether because everything they do is exposed for all to see.

When I was first starting in open source, after more than a decade of producing closed source, proprietary code for Fortune 500 software companies, I made some (ill-conceived) contribution suggestions to a widely known open source project, and I was taken aback by the abrupt nature of my interactions with the others involved. They were always too busy, or too uninterested, to look at what I was working on, let alone help me.

So what should starting in open source feel like?

We at G-Research are partnering with Major League Hacking (MLH) to bring more coders into open source. We aim to get them started with good, productive experiences so we can build a talent pipeline for the entire open source universe and keep it full for years to come.

MLH started in 2013 as a community for developers that runs hackathons and helps people secure employment. The MLH community is 600,000 strong and sees some 1,000 participants pass through its fellowship programs each year. The MLH Open Source Fellowship runs for 12 weeks and helps new coders get started with key concepts such as submitting pull requests, maintaining projects, and open source best practices.

Finding capable coders isn't easy for employers, and finding new employees empowered to execute on open source can be even more daunting. The Linux Foundation found in a recent survey that 93% of hiring managers had difficulty sourcing sufficient talent with open source experience. The need is particularly acute when it comes to welcoming traditionally underrepresented demographics in tech, such as women and minorities.

Creating good open source code is one thing, but we also see a strong need for maintainers to keep projects nourished and vital. Turnover for maintainers is high by any measure. Tidelift, a company that distributes funds to open source maintainers and connects those maintainers to the companies who use them, reports that 59% of project maintainers have considered quitting. That number is an indication that the experience for those who keep and improve the code needs to be better.

We have a fistful of open source projects ourselves: Armada, a multi-Kubernetes-cluster batch job meta-scheduler; Siembol, a scalable, advanced security analytics framework; and ILGPU, a just-in-time compiler for high-performance GPU programs. There's also the armload of projects we provide our employees time to support as maintainers: Thanos-remote-read, geras, ParquetShop, a Vault plugin database for Aerospike, Apache Ozone, and Fantomas.

These projects are integral components of what G-Research does, and their smooth operation makes our business stronger. Partnering with MLH will help ensure that the open source projects we rely on continue to attract and nurture top talent. We've already seen tremendous success directly within our team, having hired a number of the Fellows from the MLH program to continue working with us on our projects in some capacity.

Indeed, many contributors report that their time working with our engineers was one of the most positive experiences of their careers. And a few have gone on to work for us on our projects. "The best part of this program is getting to learn from the G-Research devs," says Victor Zeddys, an Apache Ozone Fellow. "They were really kind and informative, and I envied their capabilities to handle such a complex project."


And Victor isn't alone. "I had an amazing experience and met lots of great people," says Celina Cywinska, a DevOps Fellow. "I've learned so much about real-world teamwork, and the maintainers shared knowledge I wouldn't have gained in the classroom." Gaining knowledge from your coworkers is one of the most important aspects of employment and can make an especially big difference for those just starting out.

Just as important as the people we have brought to our team is the sense that we are helping seed the wider open source universe with capable, confident open source engineers. "This fellowship has helped me evolve in ways I couldn't imagine," says Christos Bisias, an Apache Ozone Fellow. "It has been an amazing experience working closely with such professionals and learning all kinds of new technologies and best practices."

We're just as proud of the fellows who have come through our program and applied their experience to other projects or employment. We don't have to hire everyone that comes out of the program to benefit from the things MLH teaches and the experiences it equips its graduates with. We know that somewhere down the road, we will all reap the benefit of having a healthy open source talent pipeline from the very beginnings of the open source journey—no mountaineering required.

This article originally appeared on Major League Hacking and is republished with permission.

The open source software team at G-Research is helping establish an easier on-ramp for getting started in open source.


Max Mizzi, Head of Growth
Major League Hacking (MLH)

Max started out his career as a software engineer, but quickly realised that whilst he loves writing code, there was an opportunity to make a lot of impact by enabling companies to better leverage technology and making developers' lives better. The journey first saw him setting up innovation labs, a chief data office, and a first tech strategy, before consulting on digital transformation and emerging tech adoption. The thread that ran through all of this was that the biggest success factor in technology is people. This realisation saw Max move into the EdTech (/Tech Ed!) world, spending several years at General Assembly running large-scale transformational learning engagements in tech and data, before joining MLH to lead Growth. In his role at MLH, he focuses on early-career technologists, and he has established dozens of partnerships to empower hundreds of emerging developers to launch their careers.

Pronouns: He/Him


Open source software is transforming healthcare

Thu, 01/12/2023 - 16:00
Open source software is transforming healthcare James Ahern Thu, 01/12/2023 - 03:00

In the summer of 2022, the UK government and NHS England published their Open Source Policy, stating that open source technology is:

Particularly suitable for use within the healthcare industry where, through active collaboration between IT suppliers and user/clinicians communities, solutions can be honed to maximise benefits to delivery of health and social care.

The public statement by NHS England is just the latest development in a broader trend: The wholehearted embrace of open source software by the healthcare sector. And no wonder; open source presents myriad opportunities for this most complex of industries, with potential solutions across various sub-sectors. Yes, open source is now powering everything from medical wearables to healthcare human resource management.

Health informatics

Information technology is playing an increasingly vital role in every sector of the economy, with the healthcare industry no exception. One of the most important developments in this field is the growth of health informatics—in other words, the acquisition and analysis of all types of patient data, including test results, scans, and electronic health records (EHR).

Informatics is all about providing better health outcomes for patients, but essential to this are standardization and interoperability—and this is where open source can make a big difference because of its truly collaborative and "open" nature.

Open source developers have created numerous software solutions for various businesses and organizations within the healthcare sector. First, it is probably worth distinguishing between healthcare software and medical software.

Open source medical software

Medical software refers specifically to medical devices and direct patient care/treatment. It also includes tools for monitoring, analyzing and interpreting data, and a range of other functions. Medical software can be designed for treatment, simulation, or medical training.

Some of the best applications of this are in smartphone apps, allowing patients to track vitals from home. Glucosio is an Android and iOS app enabling people with diabetes to monitor glucose levels while simultaneously supporting diabetes research.

[ Also read Automation: 5 ways it can change lives ]

Imaging and visualization are other fields where open source software provides solutions. One of these is Slicer, a free, open source package written in C++, Python, and Qt. Slicer includes image analysis and scientific visualization tools. Medical professionals use it for various medical applications as diverse as autism, multiple sclerosis, systemic lupus erythematosus, prostate cancer, lung cancer, breast cancer, schizophrenia, orthopedic biomechanics, COPD, cardiovascular disease, and neurosurgery. Studierfenster is another free, open source product. It is an online server-based framework for medical image processing that displays 2D and 3D images in a standard web browser.

Open source healthcare software

Healthcare software is a broader term covering any software developed for the healthcare industry. It encompasses medical solutions, including tools for diagnosis or treatment optimization, but also covers tools that aid with infrastructure services, patient information, public health, and other auxiliary requirements. It is possible here that open source software has the biggest impact on the sector.

Perhaps the most ubiquitous examples of open source software in the healthcare sector are those designed to manage patient records, with applications such as Open Hospital, Open EMR, and Open MRS all helping hospitals and surgeries to hold and manage electronic health records (EHR). Open Dental offers a similar service for dentistry providers and can also be utilized for practice management, including billing and electronic charting.

Hospital Run is another patient record application specifically designed to improve the accessibility of healthcare in developing countries, with an "offline-first" approach to managing healthcare records.

Governments also use open source software for applications in health system management, public health, and biosurveillance. The integrated Human Resource Information System (iHRIS) developed by IntraHealth International helps countries track data about their healthcare workforce and is already being used in over 20 countries worldwide.

Epi Info is statistical software for epidemiology, an open source, public domain tool developed by the USA's Centers for Disease Control and Prevention (CDC). Another tool used to model and visualize the spread of infectious diseases is the Spatiotemporal Epidemiological Modeler, originally developed at IBM Research but freely available through the Eclipse Foundation.

Unique solutions are also being developed for disease management, including collections of applications like Breathing Games—a series of research-backed games under Peer Production licenses created for the prevention, diagnosis, and treatment of chronic respiratory diseases—and Nightscout—a suite of software tools which allow continuous glucose monitoring from the home using cloud technology.

Benefits of open source in healthcare

Open source software has the potential to bring together a host of relevant healthcare stakeholders, including government agencies, medical equipment vendors, healthcare service providers, and research agencies, by facilitating standardization and interoperability in health informatics. But the benefits of open source go beyond this.

Because of the transparency of open source development, there is a large degree of flexibility, with many highly customizable solutions. Furthermore, the open source community has a clearly defined vision with strong motivation from users and developers alike to improve and maintain applications. There is also the potential for open source software to provide increased security due to the reduced reliance on third-party suppliers and the adoption of blockchain technologies—most of which remain open source.

And finally, for a sector struggling with many global crises, the improved cost-benefit ratio of open source software is not an insignificant consideration.

Open source—the answer for a sector under unprecedented pressure?

The healthcare sector is being forced to adapt rapidly due to multiple pressures, and as a result, technological solutions are becoming increasingly important. As open source software becomes more reliable, healthcare organizations realize tangible benefits from its transparency, security, and flexibility. It's clear that open source presents an array of benefits for the industry, aiding not just developed but also developing economies.

Healthcare organizations realize tangible benefits from using open source tools. Explore these examples.


7 interesting metrics about open source in sustainability

Wed, 01/11/2023 - 16:00
7 interesting metrics about open source in sustainability Tobias Augspurger Wed, 01/11/2023 - 03:00

Open source culture has demonstrated how transparent and collaborative innovation can support modern digital services, data, and infrastructure. Yet, despite its transformative impact and use within an estimated 97% of digital products, the potential of open source for developing environmentally sustainable technologies is not well understood.

Open source software (OSS) accelerates the transition to a sustainable economy by supporting traceable decision-making, building capacity for localization and customization of climate technologies, and, most importantly, helping to prevent greenwashing. This transition requires technological innovation and new opportunities for society to participate in developing and adopting technologies. A recently published study provides the first analysis of the health and vibrancy of OSS in sustainability and climate technology. The analysis covered multiple dimensions, including technical, social, and organizational. The report also highlights key risks and challenges for users, developers, and decision-makers, as well as opportunities for more systemic collaboration.

For the past two years, more than one thousand actively developed open source projects and organizations were collected and systematically analyzed using qualitative and quantitative methods as part of the Open Sustainable Technology project and the associated database.

Community engagement

We found 996 active project repositories on GitHub that had at least one commit or a closed issue in the last year. Although stars are not a perfect metric, we counted 127,038 stars across these projects. Still, a search on GitHub revealed 27 individual projects that each have more stars than all of the software in environmental sustainability combined! This is one indicator that open source still plays a minor role as a long-term transformation strategy in sustainability compared to other domains.


Additionally, half of all identified projects are in data-rich fields such as climate science, biosphere, energy system modeling, transportation, and buildings. Other topics, such as carbon offsets, battery technology, sustainable investment, emission observation, and integrated assessment modeling, show few notable developments. Based on popularity growth, we also identified newly emerging topics, such as green software. Moreover, most identified projects are relatively young, with a median age of 4.45 years.


Programming language

The analysis of the number and use of programming languages provided further insight into the coding skills required and the nature of the projects. For example, we found that Python dominates the OSS movement for sustainability and is used in 39.8% of all projects, followed by R (16.7%) and Jupyter Notebook (9.34%). This indicates a strong focus on analyzing large datasets, where Python and Jupyter Notebooks are increasingly dominant, and less of a focus on web applications.



Licenses

The use of various licenses revealed potential intellectual property issues related to the use of software packages, as well as the general openness of the projects. We found that permissive licenses like BSD, Apache, and MIT are the most popular in sustainability. The MIT license was the top choice, used in 26% of the projects, followed by the copyleft license GPLv3 (17.3% of all projects).

Project size

Analysis of knowledge, work, and project governance distribution revealed that small open source communities lead most of the development. On average, an open source software project relies heavily on a single developer responsible for approximately 70% of its contributions. This indicates a higher contributor risk, which may jeopardize the future of many of these projects.
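That concentration figure is straightforward to compute for any repository. A minimal Python sketch, assuming you already have per-author commit counts (the names and numbers below are made up for illustration):

```python
def top_contributor_share(commit_counts):
    """Return the largest single-author fraction of total commits."""
    total = sum(commit_counts.values())
    return max(commit_counts.values()) / total

# Hypothetical per-author commit counts for one project
counts = {"alice": 700, "bob": 200, "carol": 100}
print(top_contributor_share(counts))  # 0.7
```

A share near 0.7, as the study reports on average, suggests a bus factor of one.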


Most OSS projects (64%) are based in Europe and North America, with a small number of projects from the Global South. Despite having more GitHub users than Europe, Asia accounts for only 1.9% of organizations working in OSS for sustainability.

Image by: (Tobias Augspurger, CC BY-SA 4.0)


Academia and several government agencies contribute significantly to open source, while the lack of for-profit organizations and startups with open source business models is remarkable. 

Recommendations

Based on this analysis, the study proposes recommendations for those interested in supporting open source software in environmental sustainability more effectively through community building, policy development, and future investment.

  • Collaboration: strengthening the interconnectivity of the identified open source communities and connecting projects to local use cases is paramount for the long-term impact and stability of the ecosystem. It's also key to adapt and extend existing open source projects for underrepresented countries in the Global South.
  • Funding: further support is required in the form of an open earth intelligence incubator and other support programs for open source software in environmental sustainability, as well as ongoing, dedicated funds for development and maintenance.
  • Technical: the OSS community of users and developers should develop better technical interfaces between platforms, data, models, and open source tools across and within sectors to “stop reinventing the wheel”, and standardize environmental data exchange across different levels of government.
  • Advocacy: close the knowledge gap on the environmental impact of companies through open source principles, and transform financial institutions through transparent and scientific decision-making for sustainable investments.

Digital and sustainable transformation must converge as a digital public good if we are to achieve agreed environmental goals and create a safe and equitable corridor for people and the planet. Open sustainability principles can help governments, research institutes, nongovernmental organizations, and businesses move quickly toward science-based decarbonization and conservation of natural resources and ecosystems by providing critical transparency, traceable decision-making, and collaboration on innovation.

Everyone is invited to participate in future studies of this type. By contributing to the project in any way, you help us build future reports. Most importantly, you join us in promoting and encouraging open sustainable technology.

Sustainability What to read next Is sustainability still a thing in open source? 5 open source tips to reduce waste in web design How Linux rescues slow computers (and the planet) This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

Josh is an entrepreneur, product and data specialist with over a decade of experience in innovation and technology. His focus is on people-centred digital transformation, for the betterment of people and the planet.

With an immense passion for tech for good, Josh works on high-impact initiatives and has delivered award-winning solutions for leading organisations globally. He is the founding director of Open Corridor, a data-driven non-profit focused on integrated sustainability intelligence.

Josh is a TEDx speaker, an alumnus of the European Institute of Innovation and Technology (EIT), an associate at the Curtin University Sustainability Policy Institute (CUSP), and a passionate advocate for open source.


Why developers choose open source in the hybrid cloud

Wed, 01/11/2023 - 16:00
Why developers choose open source in the hybrid cloud JozefdeVries Wed, 01/11/2023 - 03:00

The biggest barrier to growth and innovation is a stagnant data platform. Legacy proprietary systems aren't technically viable for most enterprises anymore. The enhanced capabilities gained by adopting a cloud-centric, open source approach to database management systems are critical to thriving in today's business landscape.

I expect to see fully managed databases in the cloud, known as Database as a Service (DBaaS), become the standard in the next few years. In fact, the DBaaS market is expected to be worth over $24 billion by 2025 (up from just $12 billion in 2020). But while DBaaS is enabling innovation, like any new approach, it comes with its challenges.

More opportunity, more potential obstacles

Developers are key decision-makers in the cloud environment and are increasingly influencing businesses' technology choices. In the cloud, developers have more freedom to choose from a plethora of technologies across their stack. But their employers still need to ensure that corporate directives are met, whether it's cost, legal guidelines, or compliance.

[ Related read: Managed services vs. hosted services vs. cloud services: What's the difference? ]

The dynamic playing out across many industries that are embracing the cloud is one where developers influence technology choices, but business leaders still need to ensure those choices are made within the guardrails of their corporate governance policies. Unfortunately, these decision-makers often realize too late that their new cloud vendors impose the same kinds of dependencies as their legacy, proprietary systems did. And where's the benefit in that?

Open source as a solution

Fortunately, open source technologies are giving developers the flexibility and scalability they're seeking. Open source databases, like PostgreSQL, have endured for decades because of their flexibility and extensibility, and they continue to be a top choice for developers.

Postgres' robust and globally distributed community ensures constant innovation at an enterprise scale. Its ability to be audited, improved, and shared makes it a favorite tool for professional developers. Additionally, Postgres is a proven technology for overcoming obstacles to cloud and multi-cloud adoption. It regularly outperforms other databases in the most critical contexts, including technical performance and flexibility across the broadest range of mission-critical enterprise applications.

Developers have the opportunity to make their mark on Postgres. They are not dependent on what a traditional database vendor "thinks" should be in the code. They can actually shape and build the code into what they think it should be to meet the specific needs of their company.

The short of it

With open source technologies like Postgres, developers and business leaders alike can reap the benefits of capabilities such as database performance at scale, security, and reliability. Open source technology provides greater flexibility in hybrid and multi-cloud environments so that developers can focus on what's most important—building with greater speed and innovation, all while using a tool they enjoy.


Databases Cloud What to read next Why it's important to keep the cloud open Getting started with PostgreSQL

How to use methods in Java

Tue, 01/10/2023 - 23:25
How to use methods in Java sethkenlon Tue, 01/10/2023 - 10:25

A method in Java (called a "function" in many other programming languages) is a portion of code that's been grouped together and labeled for reuse. Methods are useful because they allow you to perform the same action or series of actions without rewriting the same code, which not only means less work for you, it means less code to maintain and debug when something goes wrong.

A method exists within a class, so the standard Java boilerplate code applies:

package com.opensource.example;

public class Example {
  // code here
}

A package definition isn't strictly necessary in a simple one-file application like this, but it's a good habit to get into, and most IDEs enforce it.

By default, Java looks for a main method to run in a class. Methods can be made public or private, and static or non-static, but the main method must be public and static for the Java compiler to recognize and utilize it. When a method is public, it's able to be executed from outside the class. To call the Example class upon start of the program, its main method must be accessible, so set it to public.

Here's a simple demonstration of two methods: one main method that gets executed by default when the Example class is invoked, and one report method that accepts input from main and performs a simple action.

To mimic arbitrary data input, I use an if-then statement that chooses between two strings, based on when you happen to start the application. In other words, the main method first sets up some data (in real life, this data could be from user input, or from some other method elsewhere in the application), and then "calls" the report method, providing the processed data as input:

package com.opensource.example;

public class Example {
  public static void main(String[] args) {
    // generate some data
    long myTime = System.currentTimeMillis();
    String weather;

    if ( myTime%2 == 0 ) {
      weather = "party";
    } else {
      weather = "apocalypse";
    }

    // call the other method
    report(weather);
  }

  private static void report(String day) {
    System.out.printf("Welcome to the zombie %s\n", day);
  }
}

Run the code:

$ java ./
Welcome to the zombie apocalypse
$ java ./
Welcome to the zombie party

Notice that there are two different results from the same report method. In this simple demonstration, of course, there's no need for a second method. The same result could have been generated from the if-then statement that mimics the data generation. But when a method performs a complex task, like resizing an image into a thumbnail and then generating a widget on screen using that resized image, then the "expense" of an additional component makes a lot of sense.

When to use a Java method

It can be difficult to know when to use a method and when to just send data into a Java Stream or loop. If you're faced with that decision, the answer is usually to use a method. Here's why:

  • Methods are cheap. They don't add processing overhead to your code.
  • Methods reduce the line count of your code.
  • Methods are specific. It's usually easier to find a method called resizeImage than it is to find code that's hidden in a loop somewhere in the function that loads images from the drive.
  • Methods are reusable. When you first write a method, you may think it's only useful for one task within your application. As your application grows, however, you may find yourself using a method you thought you were "done" with.
Functional vs. object-oriented programming

Functional programming utilizes methods as the primary construct for performing tasks. You create a method that accepts one kind of data, processes that data, and outputs new data. String lots of methods together, and you have a dynamic and capable application. Programming languages like C and Lua are examples of this style of coding.

The other way to think of accomplishing tasks with code is the object-oriented model, which Java uses. In object-oriented programming, methods are components of a template. Instead of sending data from method to method, you create objects with the option to alter them through the use of their methods.

Here's the same simple zombie apocalypse demo program from an object-oriented perspective. In the functional approach, I used one method to generate data and another to perform an action with that data. The object-oriented equivalent is to have a class that represents a work unit. This example application presents a message-of-the-day to the user, announcing that the day brings either a zombie party or a zombie apocalypse. It makes sense to program a "day" object, and then to query that day to learn about its characteristics. As an excuse to demonstrate different aspects of object-oriented construction, the new sample application will also count how many zombies have shown up to the party (or apocalypse).

Java uses one file for each class, so the first file to create is Day.java, which serves as the Day object:

package com.opensource.example;

import java.util.Random;

// Class
public class Day {
    public static String weather;
    public int count;

  // Constructor
  public Day() {
    long myTime = System.currentTimeMillis();

    if ( myTime%2 == 0 ) {
      weather = "party";
    } else {
      weather = "apocalypse";
    }
  }

  // Methods
  public String report() {
      return weather;
  }

  public int counter() {
    Random rand = new Random();
    count = count + rand.nextInt(100);
    return count;
  }
}

In the Class section, two fields are created: weather and count. Weather is static. Over the course of a day (in this imaginary situation), weather doesn't change. It's either a party or an apocalypse, and it lasts all day. The number of zombies, however, increases over the course of a day.

In the Constructor section, the day's weather is determined. It's done as a constructor because it's meant to only happen once, when the class is initially invoked.

In the Methods section, the report method only returns the weather report as determined and set by the constructor. The counter method, however, generates a random number and adds it to the current zombie count.

This class, in other words, does three very different things:

  • Represents a "day" as defined by the application.
  • Sets an unchanging weather report for the day.
  • Sets an ever-increasing zombie count for the day.

To put all of this to use, create a second file:

package com.opensource.example;

public class Example {
  public static void main(String[] args) {
    Day myDay = new Day();
    String foo = myDay.report();
    String bar = myDay.report();

    System.out.printf("Welcome to a zombie %s\n", foo);
    System.out.printf("Welcome to a zombie %s\n", bar);
    System.out.printf("There are %d zombies out today.\n", myDay.counter());
    System.out.printf("UPDATE: %d zombies. ", myDay.counter());
    System.out.printf("UPDATE: %d zombies. ", myDay.counter());
  }
}

Because there are now two files, it's easiest to use a Java IDE to run the code, but if you don't want to use an IDE, you can create your own JAR file. Run the code to see the results:

Welcome to a zombie apocalypse
Welcome to a zombie apocalypse
There are 35 zombies out today.
UPDATE: 67 zombies. UPDATE: 149 zombies.

The "weather" stays the same regardless of how many times the report method is called, but the number of zombies on the loose increases the more you call the counter method.

Java methods

Methods (or functions) are important constructs in programming. In Java, you can use them either as part of a single class for functional-style coding, or you can use them across classes for object-oriented code. Both styles of coding are different perspectives on solving the same problem, so there's no right or wrong decision. Through trial and error, and after a little experience, you learn which one suits a particular problem best.

Learn the definition of a method in Java, how to use methods, and when to use methods in this handy tutorial.

Image by: Pixabay. CC0.

Java

A guide to strings in MySQL

Tue, 01/10/2023 - 16:00
A guide to strings in MySQL HunterC Tue, 01/10/2023 - 03:00

Strings are one of the most common data types you will use in MySQL. Many users insert and read strings in their databases without thinking too much about them. This article aims to give you a bit of a deep dive into how MySQL stores and displays your string variables so that you can have better control over your data.

You can break strings into two categories: binary and nonbinary. You probably think about nonbinary strings most of the time. Nonbinary strings have character sets and collations. Binary strings, on the other hand, store things such as MP3 files or images. Even if you store a word in a binary string, such as song, it is not stored in the same way as in a nonbinary string.

I will focus on nonbinary strings. All nonbinary strings in MySQL are associated with a character set and a collation. A string's character set controls what characters can be stored in the string, and its collation controls how the strings are ordered when you display them.

Character sets

To view the character sets on your system, run the following command:

SHOW CHARACTER SET;
This command will output four columns of data, including the character set:

  • Name
  • Brief description
  • Default collation
  • Maximum size of each character in the character set

MySQL used to default to the latin1 character set, but since version 8.0, the default has been utf8mb4. The default collation is now utf8mb4_0900_ai_ci. The ai indicates that this collation is accent insensitive (á = a), and the ci specifies that it is case insensitive (a = A).
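As a rough cross-check of what accent- and case-insensitive matching means, here is a Python approximation using Unicode decomposition. This is not MySQL's actual collation algorithm, just a sketch of the ai and ci behavior:

```python
import unicodedata

def ai_ci_key(s):
    """Approximate an accent- and case-insensitive comparison key:
    decompose characters, drop combining marks, then casefold."""
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

print(ai_ci_key("á") == ai_ci_key("a"))  # True (accent insensitive)
print(ai_ci_key("a") == ai_ci_key("A"))  # True (case insensitive)
```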

Different character sets store their characters in various-sized chunks of memory. For example, as you can see from the above command, characters stored in utf8mb4 are stored in memory from one to four bytes in size. If you want to see if a string has multibyte characters, you can use the CHAR_LENGTH() and LENGTH() functions. CHAR_LENGTH() displays how many characters a string contains, whereas LENGTH() shows how many bytes a string has, which may or may not be the same as a string's length in characters, depending on the character set. Here is an example:

SET @a = CONVERT('data' USING latin1);

SELECT LENGTH(@a), CHAR_LENGTH(@a);

| LENGTH(@a) | CHAR_LENGTH(@a) |
|     4      |       4         |

This example shows that the latin1 character set stores characters in single-byte units. Other character sets, such as utf16, allow multibyte characters:

SET @b = CONVERT('data' USING utf16);

SELECT LENGTH(@b), CHAR_LENGTH(@b);

| LENGTH(@b) | CHAR_LENGTH(@b)  |
|       8    |        4         |
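The same byte-versus-character distinction shows up in any language. A quick Python analogue, with encodings chosen to mirror the MySQL character sets above:

```python
s = "data"
print(len(s))                      # 4 characters
print(len(s.encode("latin-1")))    # 4 bytes: one byte per character
print(len(s.encode("utf-16-be")))  # 8 bytes: two bytes per character
```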

A string's collation will determine how the values are displayed when you run a SQL statement with an ORDER BY clause. Your choice of collations is determined by the character set you select. When you ran the command SHOW CHARACTER SET above, you saw the default collations for each character set. You can easily see all the collations available for a particular character set. For example, if you want to see which collations are allowed by the utf8mb4 character set, run:

SHOW COLLATION LIKE 'utf8mb4%';
A collation can be case-insensitive, case-sensitive, or binary. Let's build a simple table, insert a few values into it, and then view the data using different collations to see how the output differs:

CREATE TABLE sample (s CHAR(5));

INSERT INTO sample (s) VALUES ('AAAAA'), ('ccccc'), ('bbbbb'), ('BBBBB'), ('aaaaa'), ('CCCCC');

SELECT * FROM sample;

| s         |
| AAAAA     |
| ccccc     |
| bbbbb     |
| BBBBB     |
| aaaaa     |
| CCCCC     |

With case-insensitive collations, your data is returned in alphabetical order, but there is no guarantee that capitalized words will come before lowercase words, as seen below:

SELECT * FROM sample ORDER BY s COLLATE utf8mb4_turkish_ci;

| s         |
| AAAAA     |
| aaaaa     |
| bbbbb     |
| BBBBB     |
| ccccc     |
| CCCCC     |

On the other hand, when MySQL sorts with a case-sensitive collation, lowercase comes before uppercase for each letter:

SELECT * FROM sample ORDER BY s COLLATE utf8mb4_0900_as_cs;

| s         |
| aaaaa     |
| AAAAA     |
| bbbbb     |
| BBBBB     |
| ccccc     |
| CCCCC     |

And binary collations will return all capitalized words before lowercase words:

SELECT * FROM sample ORDER BY s COLLATE utf8mb4_0900_bin;

| s         |
| AAAAA     |
| BBBBB     |
| CCCCC     |
| aaaaa     |
| bbbbb     |
| ccccc     |

If you want to know which character set and collation a string uses, you can use the aptly named charset and collation functions. A server running MySQL version 8.0 or higher will default to using the utf8mb4 character set and utf8mb4_0900_ai_ci collation:

SELECT charset('data');

| charset('data')   |
| utf8mb4           |

 SELECT collation('data');

| collation('data')  |
| utf8mb4_0900_ai_ci |

You can use the SET NAMES command to change the character set or collation used.

To change from the utf8mb4 character set to utf16, run this command:

SET NAMES 'utf16';

If you would also like to choose a collation other than the default, you can add a COLLATE clause to the SET NAMES command.

For example, say your database stores words in the Spanish language. The default collation for MySQL (utf8mb4_0900_ai_ci) sees ch and ll as two different characters and will sort them as such. But in Spanish, ch and ll are individual letters, so if you want them sorted in the proper order (following c and l, respectively), you need to use a different collation. One option is to use the utf8mb4_spanish2_ci collation.

SET NAMES 'utf8mb4' COLLATE 'utf8mb4_spanish2_ci';

Storing strings

MySQL allows you to choose between several data types for your string values, even more so than other popular databases such as PostgreSQL and MongoDB.

Here is a list of MySQL's binary string data types, their nonbinary equivalents, and their maximum length:

  • binary: char (255)
  • varbinary: varchar (65,535)
  • tinyblob: tinytext (255)
  • blob: text (65,535)
  • mediumblob: mediumtext (16,777,215)
  • longblob: longtext (4,294,967,295)

One important thing to remember is that unlike the varbinary, varchar, text, and blob types, which are stored in variable length fields (that is, using only as much space as needed), MySQL stores binary and char types in fixed length fields. So a value such as char(20) or binary(20) will always take up 20 bytes, even if you store less than 20 characters in them. MySQL pads the values with the ASCII NUL value (0x00) for binary types and spaces for char types.
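A small Python sketch of that padding behavior (a simplification for illustration, not MySQL's storage code):

```python
def store_binary(value: bytes, width: int = 20) -> bytes:
    """Pad a binary(width) value with NUL bytes (0x00)."""
    return value.ljust(width, b"\x00")

def store_char(value: str, width: int = 20) -> str:
    """Pad a char(width) value with spaces."""
    return value.ljust(width, " ")

print(len(store_binary(b"cat")))  # 20
print(len(store_char("cat")))     # 20
```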

Another thing to consider when choosing data types is whether you want trailing spaces to be preserved or stripped. When displaying data, MySQL strips trailing whitespace from data stored with the char data type, but not varchar.

CREATE TABLE sample2 (s1 CHAR(10), s2 VARCHAR(10));

INSERT INTO sample2 (s1, s2) VALUES ('cat       ', 'cat       ');

SELECT s1, s2, CHAR_LENGTH(s1), CHAR_LENGTH(s2) FROM sample2;

| s1      | s2      | CHAR_LENGTH(s1) | CHAR_LENGTH(s2) |
| cat     | cat     |        3        |       10        |
Wrap up

Strings are one of the most common data types used in databases, and MySQL remains one of the most popular database systems in use today. I hope that you have learned something new from this article and will be able to use your new knowledge to improve your database skills.

Learn how MySQL stores and displays your string variables so that you can have better control over your data.


Databases

Learn the Ada programming language by writing a simple game

Mon, 01/09/2023 - 16:00
Learn the Ada programming language by writing a simple game Moshe Zadka Mon, 01/09/2023 - 03:00

When you want to learn a new programming language, it's good to focus on the things programming languages have in common:

  • Variables
  • Expressions
  • Statements

These concepts are the basis of most programming languages. Once you understand them, you can start figuring out the rest. Because programming languages usually share similarities, once you know one language, you can learn the basics of another by understanding its differences.

A good way to learn new languages is practicing with a standard program. This allows you to focus on the language, not the program's logic. I'm doing that in this article series using a "guess the number" program, in which the computer picks a number between one and 100 and asks you to guess it. The program loops until you guess the number correctly.

This program exercises several concepts in programming languages:

  • Variables
  • Input
  • Output
  • Conditional evaluation
  • Loops

It's a great practical experiment to learn a new programming language.

Install Ada

The Ada programming language is a unique and highly structured language with a dedicated developer base. The toolchain for Ada is the GNU Ada Development Environment, better known as GNAT.

You can install GNAT on Linux using your distribution's package manager. On Fedora, CentOS, or similar:

$ sudo dnf install gcc-gnat

On Debian, Linux Mint, and derivatives:

$ sudo apt install gnat

On macOS and Windows, you can download an installer from the Adacore website (choose your platform from the drop-down menu).

Guess the number in Ada

Create a file called game.adb.

The two built-in Ada libraries this program uses are Text_IO and Numerics.Discrete_Random:

with Ada.Text_IO;

use Ada.Text_IO;

with Ada.Numerics.Discrete_Random;

Procedure head

The name of the procedure must match the name of the file. The first part is defining the variables.

Note that the discrete_random is specialized to a specific range. In this case, the range of numbers allowed:

procedure Game is
   type randRange is range 1..100;
   package Rand_Int is new ada.numerics.discrete_random(randRange);
   use Rand_Int;
   gen : Generator;
   num : randRange;
   incorrect: Boolean := True;
   guess: randRange;

Procedure logic

The logic starts by calling reset(gen). This initializes the random number generator, ensuring that the number, initialized with random(gen), is different each time you run the program.

The next step is to run the loop:

  • Output the instructions for a guess
  • Read the line
  • Convert it to randRange
  • Check it against the number

If the number matches, incorrect is set to False, causing the loop to exit on its next iteration.

Finally, the program prints a confirmation of the guess correctness before exiting:

begin
   reset(gen);
   num := random(gen);
   while incorrect loop
       Put_Line ("Guess a number between 1 and 100");
       declare
          guess_str : String := Get_Line (Current_Input);
       begin
          guess := randRange'Value (guess_str);
       end;
       if guess < num then
           Put_line("Too low");
       elsif guess > num then
           Put_line("Too high");
       else
           incorrect := False;
       end if;
   end loop;
   Put_line("That's right");
end Game;

Build the program

The easiest way to compile an Ada program is to use gnatmake:

$ gnatmake game.adb
aarch64-linux-gnu-gcc-10 -c game.adb
aarch64-linux-gnu-gnatbind-10 -x game.ali
aarch64-linux-gnu-gnatlink-10 game.ali

This generates a binary called game.

Run the program

Each run of the program will be a little different. This is one example:

$ ./game 
Guess a number between 1 and 100
Too low
Guess a number between 1 and 100
Too low
Guess a number between 1 and 100
Too low
Guess a number between 1 and 100
Too high
Guess a number between 1 and 100
Too low
Guess a number between 1 and 100
That's right

Learn Ada

This "guess the number" game is a great introductory program for learning a new programming language because it exercises several common programming concepts in a pretty straightforward way. By implementing this simple game in different programming languages, you can demonstrate some core concepts of the languages and compare their details.
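Since this series compares the same game across languages, here is a sketch of the equivalent logic in Python; the function names are my own, but the branches mirror the Ada version above:

```python
import random

def check(guess: int, target: int) -> str:
    """Mirror the Ada if/elsif/else branches."""
    if guess < target:
        return "Too low"
    elif guess > target:
        return "Too high"
    return "That's right"

def play(target: int, guesses) -> list:
    """Run the game loop over a sequence of guesses; return the responses."""
    responses = []
    for guess in guesses:
        responses.append(check(guess, target))
        if responses[-1] == "That's right":
            break
    return responses

# An interactive version would pick target = random.randint(1, 100)
print(play(42, [10, 90, 42]))  # ['Too low', 'Too high', "That's right"]
```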

Do you have a favorite programming language? How would you write the "guess the number" game in it? Follow this article series to see examples of other programming languages that might interest you!

This "guess the number" game is a great introductory program for learning a new programming language because it exercises several common programming concepts in a pretty straightforward way.


Programming

Use this open source API gateway to scale your API

Mon, 01/09/2023 - 16:00
Use this open source API gateway to scale your API iambobur Mon, 01/09/2023 - 03:00

An API gateway is a single point of entry for incoming calls to an application programming interface (API). The gateway aggregates the services being requested and then returns the appropriate response. To make your API gateway effective, it's vital for you to design a reliable, efficient, and simple API. This is an architectural puzzle, but it's one you can solve as long as you understand the most important components.

API-Led approach

An API-Led approach puts an API at the heart of communication between applications and the business capabilities they need to access in order to consistently deliver seamless functionality across all digital channels. API-Led connectivity refers to the technique of using a reusable and well-designed API to link data and applications.

API-Led architecture

API-Led architecture is an approach that looks at the best ways of reusing an API. It addresses things like:

  • Protecting an API from unauthorized access.
  • Ensuring that consuming applications can always find the right API endpoint.
  • Throttling or limiting the number of calls made to an API to ensure continuous availability.
  • Supporting continuous integration, testing, lifecycle management, monitoring, operations, and so on.
  • Preventing error propagation across the stack.
  • Real-time monitoring of an API with rich analytics and insight.
  • Implementing scalable and flexible business capabilities (for example, supporting a microservice architecture).
API resource routing

Implementing an API gateway as the single entry point to all services means that API consumers only have to be aware of one URL. It becomes the API gateway's responsibility to route traffic to the corresponding service endpoints, and to enforce policies.

(Image by: Bobur Umurzokov, CC BY-SA 4.0)

This reduces complexity on the API consumer side because the client applications don't need to consume functionality from multiple HTTP endpoints. There's also no need to implement a separate layer for authentication, authorization, throttling, and rate limiting for each service. Most API gateways, like the open source Apache APISIX project, already have these core features built in.

API content-based routing

A content-based routing mechanism also uses an API gateway to route calls based on the content of a request. For example, a request might be routed based on the HTTP header or message body instead of just its target URI.

Consider a scenario where database sharding is applied in order to distribute the load across multiple database instances. This technique is typically applied when the overall number of records stored is huge and a single instance struggles to manage the load.

The solution is to spread records across multiple database instances. You then implement multiple services, one for each unique datastore, and adopt an API gateway as the only entry point to all services. You can then configure your API gateway to route calls to the corresponding service based on a key obtained either from the HTTP header or the payload.

(Image by: Bobur Umurzokov, CC BY-SA 4.0)

In the above diagram, an API gateway is exposing a single /customers resource for multiple customer services, each with a different data store.
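As a rough sketch of this routing decision in Python (a conceptual illustration, not APISIX configuration; the backend URLs and the modulo-sharding rule are hypothetical), a gateway might pick a backend like this:

```python
import json

# Hypothetical mapping of shard indexes to sharded backend services.
SHARD_BACKENDS = {
    0: "http://customers-a.internal",
    1: "http://customers-b.internal",
}

def pick_backend(headers: dict, body: bytes) -> str:
    """Route /customers calls by a customer key found in the header or payload."""
    key = headers.get("X-Customer-Id")
    if key is None:
        # Fall back to the message body when the header is absent.
        key = json.loads(body).get("customer_id")
    shard = int(key) % len(SHARD_BACKENDS)  # simple modulo sharding
    return SHARD_BACKENDS[shard]
```

A request carrying `X-Customer-Id: 42` would be proxied to the first backend, while a payload-only request is routed by its `customer_id` field.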

API geo-routing

An API geo-routing solution routes an API call to the nearest API gateway based on its origin. In order to prevent latency issues due to distance (for example, a consuming application from Asia calling an API located in North America), you can deploy an API gateway in multiple regions across the world. You can use a different subdomain for each API gateway in each region, letting the consuming application determine the nearest gateway based on application logic. Then, an API gateway provides internal load balancing to make sure that incoming requests are distributed across available instances.

(Image by: Bobur Umurzokov, CC BY-SA 4.0)

It's common to use a DNS traffic management service and an API gateway to resolve each subdomain against the region's load balancer to target the nearest gateway.
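When the selection does happen in application logic rather than DNS, it can be as simple as a region-to-subdomain lookup. A minimal sketch, assuming hypothetical region names and subdomains:

```python
# Hypothetical region-to-subdomain map; real deployments usually push this
# decision into a DNS traffic management service instead.
REGION_GATEWAYS = {
    "asia": "api-ap.example.com",
    "europe": "api-eu.example.com",
    "north-america": "api-na.example.com",
}

def nearest_gateway(client_region: str) -> str:
    """Pick the gateway subdomain closest to the caller's region."""
    return REGION_GATEWAYS.get(client_region, "api.example.com")
```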

API aggregator

This technique performs operations (for example, queries) against multiple services, and returns the result to the client with a single HTTP response. Instead of having a client application make several calls to multiple APIs, an API aggregator uses an API gateway to do this on behalf of the consumer on the server side.

Suppose you have a mobile app that makes multiple calls to different APIs. This increases complexity in the client-side code, it causes over-utilization of network resources, and produces a poor user experience due to increased latency. An API gateway can accept all information required as input, and can request authentication and validation, and understand the data structures from each API it interacts with. It's also capable of transforming the response payloads so they can be sent back to the mobile app as a uniform payload needed for the consumer.

(Image by: Bobur Umurzokov, CC BY-SA 4.0)
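The fan-out can be sketched as follows (the per-service fetchers here are hypothetical stand-ins for the HTTP calls a gateway would make to the individual APIs):

```python
# Hypothetical per-service fetchers; in a real gateway these would be
# HTTP calls to the individual backend APIs.
def fetch_profile(user_id: str) -> dict:
    return {"name": "Ada"}

def fetch_orders(user_id: str) -> list:
    return [{"order": 1}]

def fetch_recommendations(user_id: str) -> list:
    return ["widget"]

def aggregate_user_view(user_id: str) -> dict:
    """One gateway call fans out to several services and merges the results
    into the single uniform payload the mobile app expects."""
    return {
        "profile": fetch_profile(user_id),
        "orders": fetch_orders(user_id),
        "recommendations": fetch_recommendations(user_id),
    }
```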

API centralized authentication

In this design, an API gateway acts as a centralized authentication gateway. As an authenticator, an API gateway looks for access credentials in the HTTP header (such as a bearer token.) It then implements business logic that validates those credentials with an identity provider.

(Image by: Bobur Umurzokov, CC BY-SA 4.0)

Centralized authentication with an API gateway can solve many problems. It completely offloads user management from an application, improving performance by responding quickly to authentication requests received from client applications. Apache APISIX offers a variety of plugins to enable different methods of API gateway authentication.

(Image by: Bobur Umurzokov, CC BY-SA 4.0)
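The credential check itself can be sketched like this (a hypothetical in-memory token set stands in for the call to a real identity provider):

```python
# Hypothetical token store; a real gateway would validate the token with
# an identity provider (for example, via OAuth2 token introspection).
VALID_TOKENS = {"secret-token-1"}

def authenticate(headers: dict) -> bool:
    """Look for a bearer token in the Authorization header and validate it."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return False
    token = auth[len("Bearer "):]
    return token in VALID_TOKENS
```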

API format conversion

API format conversion is the ability to convert payloads from one format to another over the same transport. For example, you can transfer from XML/SOAP over HTTPS to JSON over HTTPS, and back again. An API gateway offers capabilities in support of a REST API and can do payload conversions and transport conversions. For instance, a gateway can convert from a message queue telemetry transport (MQTT) over TCP (a very popular transport in IoT) to JSON over HTTPS.

(Image by: Bobur Umurzokov, CC BY-SA 4.0)

Apache APISIX is able to receive an HTTP request, transcode it, and then forward it to a gRPC service. It gets the response and returns it back to the client in HTTP format by means of its gRPC Transcode plug-in.
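As a sketch of a payload conversion (a deliberately simplified XML-to-JSON flattening, not the transformer a gateway actually ships):

```python
import json
import xml.etree.ElementTree as ET

def xml_to_json(xml_payload: str) -> str:
    """Flatten a simple XML document's child elements into a JSON object."""
    root = ET.fromstring(xml_payload)
    return json.dumps({child.tag: child.text for child in root})
```

For example, `xml_to_json("<user><name>Ada</name><id>7</id></user>")` yields a JSON object with `name` and `id` keys; a production converter would also handle attributes, nesting, and repeated elements.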

API observability

By now, you know that an API gateway offers a central control point for incoming traffic to a variety of destinations. But it can also be a central point for observation, because it's uniquely qualified to monitor all traffic moving between the client and service networks. You can adjust an API gateway so that the data (structured logs, metrics, and traces) can be collected for use with specialized monitoring tools.

Apache APISIX provides pre-built connectors so you can integrate with external monitoring tools. You can leverage these connectors to collect log data from your API gateway to further derive useful metrics and gain complete visibility into how your services are being used. You can also manage the performance and security of your API in your environment.

API caching

API caching is usually implemented inside the API gateway. It can reduce the number of calls made to your endpoint, and also improve the latency of requests to your API by caching a response from upstream. If the API gateway cache has a fresh copy of the requested resource, it uses that copy to satisfy the request directly instead of making a request to the endpoint. If the cached data is not found, the request travels to the intended upstream services.

(Image by: Bobur Umurzokov, CC BY-SA 4.0)
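The fresh-copy-or-upstream behavior described above can be sketched as a small TTL cache (a conceptual illustration, not a production gateway cache):

```python
import time

class GatewayCache:
    """Tiny TTL cache standing in for an API gateway's response cache."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.store = {}  # url -> (expiry_time, response)

    def get_or_fetch(self, url: str, fetch):
        expiry, response = self.store.get(url, (0.0, None))
        if time.monotonic() < expiry:
            return response          # fresh copy: serve from cache
        response = fetch(url)        # stale or missing: go upstream
        self.store[url] = (time.monotonic() + self.ttl, response)
        return response
```

The second request within the TTL never reaches the upstream endpoint, which is exactly the call-count reduction the article describes.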

API fault handling

API services may fail due to any number of reasons. In such scenarios, your API service must be resilient enough to deal with predictable failures. You also want to ensure that any resilience mechanisms you have in place work properly. This includes error handling code, circuit breakers, health checks, fallbacks, redundancy, and so on. Modern API gateways support all the most common error-handling features, including automatic retries and timeouts.

(Image by: Bobur Umurzokov, CC BY-SA 4.0)

An API gateway acts as an orchestrator: it can use health status to decide how to manage traffic, load-balance toward healthy nodes, and fail fast. It can also alert you when something goes wrong. An API gateway also ensures that routing and other network-level components work together to deliver a request to the API process. It helps you detect problems at an early stage and fix issues. A fault injection mechanism (like the one Apache APISIX uses) at the API gateway level can be used to test the resiliency of an application or microservices API against various forms of failure.
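The retry side of this can be sketched as follows (timeouts, circuit breakers, and backoff are omitted for brevity; the function names are hypothetical):

```python
import time

def call_with_retries(operation, retries: int = 3, delay_seconds: float = 0.0):
    """Retry a flaky upstream call a bounded number of times before failing."""
    last_error = None
    for attempt in range(retries):
        try:
            return operation()
        except Exception as error:  # a real gateway would also enforce a timeout
            last_error = error
            time.sleep(delay_seconds)
    raise last_error
```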

API versioning

This refers to having the ability to define and run multiple concurrent versions of an API. This is particularly important, because an API evolves over time. Having the ability to manage concurrent versions of an API enables API consumers to incrementally switch to newer versions of an API. This means older versions can be deprecated and ultimately retired. This is important because an API, just like any other software application, should be able to evolve either in support of new features or in response to bug fixes.

(Image by: Bobur Umurzokov, CC BY-SA 4.0)

You can use an API gateway to implement API versioning. The version can be expressed as a header, a query parameter, or part of the path.
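Resolving the version from those three sources can be sketched like this (the header name and the `v1` default are hypothetical):

```python
from urllib.parse import urlparse, parse_qs

def api_version(path: str, headers: dict) -> str:
    """Resolve the requested API version from path, header, or query string."""
    parsed = urlparse(path)
    first_segment = parsed.path.strip("/").split("/")[0]
    if first_segment.startswith("v") and first_segment[1:].isdigit():
        return first_segment                    # path style: /v2/users
    if "X-API-Version" in headers:
        return headers["X-API-Version"]         # header style
    query = parse_qs(parsed.query)
    return query.get("version", ["v1"])[0]      # query style, default v1
```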

Gateway to APISIX

If you want to scale your API services, you need an API gateway. The Apache APISIX project provides essential features for a robust entrypoint, and its benefits are clear. It aligns with an API-Led architecture, and is likely to transform the way your clients interact with your hosted services.

This article has been adapted and republished from the Apache APISIX blog with the author's permission.

Adopt an API-Led architecture with Apache APISIX.

Microservices Networking. This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

Unlock academic research with this open source open access tool for librarians

Sat, 01/07/2023 - 16:00
Unlock academic research with this open source open access tool for librarians jmpearce Sat, 01/07/2023 - 03:00

I have always liked librarians. Since ancient Greece, long before the first open source code was shared, librarians have been championing the democratization of knowledge. Modern librarians are no different, whether they are recommending good books to get kids hooked on reading or acting as the leaders of Open Access Week on campuses all over the world, librarians are heroes of open knowledge.

Universities still brag about the size of their libraries. But as digitization has taken hold, a new metric of knowledge dominance has emerged: How big is your institutional repository?

Universities are attempting to ensure that all of their research is publicly accessible. This is happening internally thanks to librarians, and externally because of the long and growing list of funding mandates (for instance, all NIH-funded research must be made freely available). Those that fund science (i.e. taxpayers) have finally clued into the absurdity that even though they've paid for the research, it's locked behind paywalls and they can't access it. One way to meet the funding mandates is to establish campus open access (OA) repositories.

Open access

The vast majority of academic publishers allow academics to legally post their preprints (papers with the same information, but not typeset by the publisher). For example, Western University has Scholarship@Western, where you can download tens of thousands of papers for free. My colleagues, however, have written many more papers than that. Librarians at all universities are struggling with how to upload millions of manuscripts under numerous license agreements while also linking metadata to make them discoverable. Doing this manually takes an experienced librarian around 15 minutes per manuscript. The time and cost to do this campus-wide are prohibitive even at wealthy schools, let alone every campus in North America.

To reduce the time and costs of this process and to harvest all past work, install this free and open source software: aperta-accessum. It sounds a bit like a magical enchantment, and like magic, aperta-accessum does seven things:

  • Harvests names and emails from a department's faculty webpage
  • Identifies scholars' Open Researcher and Contributor IDentifiers (ORCID iDs)
  • Obtains digital object identifiers (DOIs) of publications for each scholar
  • Checks for existing copies in an institution's OA repository
  • Identifies the legal opportunities to provide OA versions of all of the articles not already in the OA repository
  • Sends authors emails requesting a simple upload of author manuscripts
  • Adds link-harvested metadata from DOIs with uploaded preprints into a repository.
Get aperta-accessum

Western University chose to use the bepress repository but there are many other repositories including open source ones that are even easier to augment. This is where the open source community could really help. If all universities that already have repos use aperta-accessum on their own campuses, most academic papers will be free for anyone that wants to access them. That could be a powerful force for accelerating innovation.

(Image by: Emily BP, CC BY-SA 4.0)

The aperta-accessum source code housed on the Open Science Framework is released under the GNU General Public License (GPL) 3.0. It can be freely modified. You can learn more about it in the open-access study in the Journal of Librarianship and Scholarly Communication. In the article, we show that in the administrative time needed to make a single document OA manually, aperta-accessum can process approximately five entire departments' worth of peer-reviewed articles!

Set information free

Aperta-accessum is an open source OA harvester that enables institutional libraries' stewardship of OA knowledge on a mass scale. It radically reduces costs and could also improve science, since scientists would have access to the information that would push their work forward. So give your favorite university librarian the gift of aperta-accessum. If your school or alma mater has a different type of repo, please consider sharing a little of your time to customize aperta-accessum for their repo, too.

Aperta-accessum is an open source open access (OA) harvester that enables institutional libraries' stewardship of OA knowledge on a mass scale.

(Image by: ktchang16 via Flickr, CC BY 2.0)

Education. This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

Use time-series data to power your edge projects with open source tools

Fri, 01/06/2023 - 16:00
Use time-series data to power your edge projects with open source tools zoe_steinkamp Fri, 01/06/2023 - 03:00

Gathering data as it changes over time is known as time-series data. Today, it's part of every industry and ecosystem. It is a large part of the growing IoT sector and will become a larger part of everyday people's lives. But time-series data and its requirements are hard to work with, because general-purpose tools aren't built for it. In this article, I go into detail about those problems and how InfluxData has been working to solve them for the past 10 years.


InfluxData is an open source time-series database platform. You may know about the company through InfluxDB, but you may not have known that it specializes in time-series databases. This is significant, because when managing time-series data, you deal with two issues: storage lifecycle and queries.

When it comes to storage lifecycle, it's common for developers to initially collect and analyze highly detailed data. But developers want to store smaller, downsampled datasets that describe trends without taking up as much storage space.

When querying a database, you don't want to query your data based on IDs. You want to query based on time ranges. One of the most common things to do with time-series data is to summarize it over a large period of time. This kind of query is slow when storing data in a typical relational database that uses rows and columns to describe the relationships of different data points. A database designed to process time-series data can handle queries exponentially faster. InfluxDB has its own built-in querying language: Flux. This is specifically built to query on time-series data sets.

(Image by: Zoe Steinkamp, CC BY-SA 4.0)

Data acquisition

Data acquisition and data manipulation come out of the box with some awesome tools. InfluxData has over 12 client libraries that allow you to write and query data in the coding language of your choice. This is a great tool for custom use cases. The open source ingest agent, Telegraf, includes over 300 input and output plugins. If you're a developer, you can contribute your own plugin, as well.

InfluxDB can also accept a CSV upload for small historical data sets, as well as batch imports for large data sets.

import "math"

bicycles3 = from(bucket: "smartcity")
    |> range(start: 2021-03-01T00:00:00Z, stop: 2021-04-01T00:00:00Z)
    |> filter(fn: (r) => r._measurement == "city_IoT")
    |> filter(fn: (r) => r._field == "counter")
    |> filter(fn: (r) => r.source == "bicycle")
    |> filter(fn: (r) => r.neighborhood_id == "3")
    |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)

bicycles4 = from(bucket: "smartcity")
    |> range(start: 2021-03-01T00:00:00Z, stop: 2021-04-01T00:00:00Z)
    |> filter(fn: (r) => r._measurement == "city_IoT")
    |> filter(fn: (r) => r._field == "counter")
    |> filter(fn: (r) => r.source == "bicycle")
    |> filter(fn: (r) => r.neighborhood_id == "4")
    |> aggregateWindow(every: 1h, fn: mean, createEmpty: false)

join(tables: {neighborhood_3: bicycles3, neighborhood_4: bicycles4}, on: ["_time"], method: "inner")
    |> keep(columns: ["_time", "_value_neighborhood_3", "_value_neighborhood_4"])
    |> map(fn: (r) => ({
        r with
        difference_value: math.abs(x: r._value_neighborhood_3 - r._value_neighborhood_4)
    }))
Flux is our internal querying language built from the ground up to handle time-series data. It's also the underlying powerhouse for a few of our tools, including tasks, alerts, and notifications. To dissect the Flux query above, you need to define a few things. For starters, a "bucket" is what we call a database. You configure your buckets and then add your data stream into them. The query reads from the smartcity bucket over a specific time range (here, one month). You can get all the data from the bucket, but most users include a range. That's the most basic Flux query you can do.

Next, I add filters, which narrow the data down to something more exact and manageable. For example, I filter for the count of bicycles in the neighborhood assigned the id of 3. From there, I use aggregateWindow to get the mean for every hour, so I expect one row per hour in the range, each holding that hour's mean. I run the exact same query for neighborhood 4 as well. Finally, I join the two tables and get the differences between bike usage in these two neighborhoods.

This is great if you want to know which hours are high-traffic hours. Obviously, this is just a small example of the power of Flux queries, but it shows some of the tools Flux comes with. Flux also includes a large set of data analysis and statistics functions. For those, I suggest checking out the Flux documentation.

import "influxdata/influxdb/tasks"

option task = {name: "PB_downsample", every: 1h, offset: 10s}

from(bucket: "plantbuddy")
    |> range(start: tasks.lastSuccess(orTime: -task.every))
    |> filter(fn: (r) => r["_measurement"] == "sensor_data")
    |> aggregateWindow(every: 10m, fn: last, createEmpty: false)
    |> yield(name: "last")
    |> to(bucket: "downsampled")

Tasks

An InfluxDB task is a scheduled Flux script that takes a stream of input data and modifies or analyzes it in some way. It then stores the modified data in a new bucket or performs other actions. Storing a smaller data set in a new bucket is called "downsampling," a core feature of the database and a core part of the time-series data lifecycle.

You can see in the task example that I've downsampled the data. I'm getting the last value for every 10-minute increment and storing that value in the downsampled bucket. The original data set might have thousands of data points in those 10 minutes, but the downsampled bucket keeps just one value per window. One thing to note is that I'm also using the tasks.lastSuccess function in range. This tells InfluxDB to run the task from the last time it ran successfully, so if previous runs failed, the next run reaches back to the last successful one and no data is missed. This is great for built-in error handling.

(Image by: Zoe Steinkamp, CC BY-SA 4.0)


Checks and alerts

InfluxDB includes an alerting or checks and notification system. This system is very straightforward. You start with a check that periodically looks at the data for anomalies you've defined. Normally, these are defined with thresholds: for example, any temperature value between 0°F and 32°F gets assigned a value of WARN, anything above 32°F gets a value of OK, and anything below 0°F gets a value of CRITICAL. From there, your check can run as often as you deem necessary. There is a recorded history of your checks and the current status of each. You are not required to set up a notification when it's not needed; you can just reference your alert history as needed.

Many people choose to set up notifications. For that, you need to define a notification endpoint; for example, a chat application could make an HTTP call to receive your notifications. You then define when you would like to receive notifications: you might have checks run every hour but send notifications only every 24 hours. You can have your notification respond to a change in the value, for example WARN to CRITICAL, or fire whenever a value is CRITICAL, regardless of whether it changed from OK to WARN. This is a highly customizable system, and the Flux code it generates can also be edited.
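The threshold and notification rules described above can be sketched in plain Python (a conceptual illustration only; InfluxDB generates the equivalent Flux for you, and these function names are hypothetical):

```python
def check_temperature(value_f: float) -> str:
    """Assign a status level to a temperature reading, as a check would."""
    if value_f < 0:
        return "CRITICAL"
    if value_f < 32:
        return "WARN"
    return "OK"

def should_notify(previous: str, current: str) -> bool:
    """Notify on a change to a worse state, or whenever state is CRITICAL."""
    return current == "CRITICAL" or (current != previous and current != "OK")
```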

(Image by: Zoe Steinkamp, CC BY-SA 4.0)

Edge

To wrap up, I'd like to bring all the core features together, including a very special new feature that's recently been released. Edge to cloud is a very powerful tool that allows you to run the open source InfluxDB and locally store your data in case of connectivity issues. When connectivity is repaired, it streams the data to the InfluxData cloud platform.

This is significant for edge devices and important data where any loss of data is detrimental. You define that you want a bucket to be replicated to the cloud, and that bucket gets a disk-backed queue to store the data locally. Then you define which cloud bucket it should replicate into. The data is stored locally until it's connected to the cloud.

InfluxDB and the IoT Edge

Suppose you have a project where you want to monitor the health of household plants using IoT sensors attached to the plant. The project is set up using your laptop as the edge device. When your laptop is closed or otherwise off, it stores the data locally, and then streams it to your cloud bucket when reconnected.

(Image by: Zoe Steinkamp, CC BY-SA 4.0)

One thing to notice is that this downsamples data on the local device before storing it in the replication bucket. Your plant's sensors provide a data point for every second. But it condenses the data to be an average of one minute so you have less data to store. In the cloud account, you might add some alerts and notifications that let you know when the plant's moisture is below a certain level and needs to be watered. There could also be visuals you could use on a website to tell users about their plants' health.

Databases are the backbone of many applications. Working with time-stamped data in a time series database platform like InfluxDB saves developers time, and gives them access to a wide range of tools and services. The maintainers of InfluxDB love seeing what people are building within our open source community, so connect with us and share your projects and code with others!

InfluxData is an open source time-series database platform. Here's how it can be used for edge applications.

Edge computing, Internet of Things (IoT), Data Science. This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

An introduction to DocArray, an open source AI library

Fri, 01/06/2023 - 16:00
An introduction to DocArray, an open source AI library Jina_AI Fri, 01/06/2023 - 03:00

DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, and so on. It allows deep-learning engineers to efficiently process, embed, search, store, recommend, and transfer multi-modal data with a Pythonic API. Starting in November of 2022, DocArray is open source and hosted by the Linux Foundation AI & Data initiative so that there’s a neutral home for building and supporting an open AI and data community. This is the start of a new day for DocArray.

In the ten months since DocArray’s first release, its developers at Jina AI have seen more and more adoption and contributions from the open source community. Today, DocArray powers hundreds of multimodal AI applications.

Hosting an open source project with the Linux Foundation

Hosting a project with the Linux Foundation follows open governance, meaning there’s no one company or individual in control of a project. When maintainers of an open source project decide to host it at the Linux Foundation, they specifically transfer the project’s trademark ownership to the Linux Foundation.

In this article, I’ll review the history and future of DocArray. In particular, I’ll demonstrate some cool features that are already in development.

A brief history of DocArray

Jina AI introduced the concept of "DocArray" in Jina 0.8 in late 2020. It was the jina.types module, intended to complete neural search design patterns by clarifying low-level data representation in Jina. Rather than working with Protobuf directly, the new Document class offered a simpler and safer high-level API to represent multimodal data.

(Image by: Jina AI, CC BY-SA 4.0)

Over time, we extended jina.types and moved beyond a simple Pythonic interface to Protobuf. We added DocumentArray to ease batch operations on multiple Documents. Then we brought in IO and pre-processing functions for different data modalities, like text, image, video, audio, and 3D meshes. The Executor class started to use DocumentArray for input and output. In Jina 2.0 (released in mid-2021) the design became stronger still. Document, Executor, and Flow became Jina's three fundamental concepts:

• Document is the data IO in Jina
• Executor defines the logic of processing Documents
• Flow ties Executors together to accomplish a task.

The community loved the new design, as it greatly improved the developer experience by hiding unnecessary complexity. This lets developers focus on the things that really matter.

(Image by: Jina AI, CC BY-SA 4.0)

As jina.types grew, it became conceptually independent from Jina. While jina.types was more about building locally, the rest of Jina focused on service-ization. Trying to achieve two very different targets in one codebase created maintenance hurdles. On the one hand, jina.types had to evolve fast and keep adding features to meet the needs of the rapidly evolving AI community. On the other hand, Jina itself had to remain stable and robust as it served as infrastructure. The result? A slowdown in development.

We tackled this by decoupling jina.types from Jina in late 2021. This refactoring served as the foundation of the later DocArray. It was then that DocArray's mission crystallized for the team: to provide a data structure for AI engineers to easily represent, store, transmit, and embed multimodal data. DocArray focuses on local developer experience, optimized for fast prototyping. Jina scales things up and uplifts prototypes into services in production. With that in mind, Jina AI released DocArray 0.1 in parallel with Jina 3.0 in early 2022, independently as a new open source project.

We chose the name DocArray because we want to make something as fundamental and widely-used as NumPy's ndarray. Today, DocArray is the entrypoint of many multimodal AI applications, like the popular DALLE-Flow and DiscoArt. DocArray developers introduced new and powerful features, such as dataclass and document store to improve usability even more. DocArray has allied with open source partners like Weaviate, Qdrant, Redis, FastAPI, pydantic, and Jupyter for integration and most importantly for seeking a common standard.

In DocArray 0.19 (released on Nov. 15), you can easily represent and process 3D mesh data.

(Image by: Jina AI, CC BY-SA 4.0)

The future of DocArray

Donating DocArray to the Linux Foundation marks an important milestone where we share our commitment with the open source community openly, inclusively, and constructively.

The next release of DocArray focuses on four tasks:

Representing: support Python idioms for representing complicated, nested multimodal data with ease.
Embedding: provide smooth interfaces for mainstream deep learning models to embed data efficiently.
Storing: support multiple vector databases for efficient persistence and approximate nearest neighbor retrieval.
Transiting: allow fast (de)serialization and become a standard wire protocol on gRPC, HTTP, and WebSockets.

In the following sections, DocArray maintainers Sami Jaghouar and Johannes Messner give you a taste of the next release.


In DocArray, dataclass is a high-level API for representing a multimodal document. It follows the design and idiom of the standard Python dataclass, letting users represent complicated multimodal documents intuitively and process them easily with DocArray's API. The new release makes dataclass a first-class citizen and refactors its old implementation by using pydantic V2.

How to use dataclass

Here's how to use the new dataclass. First, you should know that a Document is a pydantic model with a random ID and the Protobuf interface:

from docarray import Document

To create your own multimodal data type, you just need to subclass from Document:

from docarray import Document
from docarray.typing import Tensor
import numpy as np

class Banner(Document):
    alt_text: str
    image: Tensor

banner = Banner(alt_text='DocArray is amazing', image=np.zeros((3, 224, 224)))

Once you've defined a Banner, you can use it as a building block to represent more complicated data:

from typing import List

class BlogPost(Document):
    title: str
    excerpt: str
    banner: Banner
    tags: List[str]
    content: str

Adding an embedding field to BlogPost is easy. You can use the predefined Document models Text and Image, which come with the embedding field baked in:

from typing import Optional
from docarray.typing import Embedding

class Image(Document):
    src: str
    embedding: Optional[Embedding]

class Text(Document):
    content: str
    embedding: Optional[Embedding]

Then you can represent your BlogPost:

class Banner(Document):
    alt_text: str
    image: Image

class BlogPost(Document):
    title: Text
    excerpt: Text
    banner: Banner
    tags: List[str]
    content: Text

This gives your multimodal BlogPost four embedding representations: title, excerpt, content, and banner.

Milvus support

Milvus is an open source vector database, hosted as a project under Linux Foundation AI & Data. It's highly flexible, reliable, and blazing fast, and supports adding, deleting, updating, and near real-time search of vectors on a trillion-byte scale. As the first step toward a more inclusive DocArray, developer Johannes Messner has been implementing Milvus integration.

As with other document stores, you can easily instantiate a DocumentArray with Milvus storage:

from docarray import DocumentArray
da = DocumentArray(storage='milvus', config={'n_dim': 10})

Here, config is the configuration for the new Milvus collection, and n_dim is a mandatory field that specifies the dimensionality of stored embeddings. The code below shows a minimum working example with a running Milvus server on localhost:

import numpy as np
from docarray import DocumentArray

N, D = 5, 128
da = DocumentArray.empty(
    N, storage='milvus', config={'n_dim': D, 'distance': 'IP'}
)  # init
with da:
    da.embeddings = np.random.random([N, D])
print(da.find(np.random.random(D), limit=10))

To access persisted data from another server, you need to specify collection_name, host, and port. This allows users to enjoy all the benefits that Milvus offers, through the familiar and unified API of DocArray.
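A hypothetical configuration for that remote case might look like the following; the collection name and host here are placeholders, not defaults, and 19530 is Milvus's standard gRPC port:

```python
# Placeholder connection settings for re-attaching to an existing
# Milvus collection from another server; substitute your own values.
remote_config = {
    'collection_name': 'my_docs',   # name the data was persisted under
    'host': 'milvus.example.com',   # server running Milvus
    'port': 19530,                  # Milvus's standard gRPC port
    'n_dim': 128,                   # must match the stored embedding size
}
# With a reachable server, this would re-attach to the stored data:
# da = DocumentArray(storage='milvus', config=remote_config)
```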

Embracing open governance

The term "open governance" refers to the way a project is governed — that is, how decisions are made, how the project is structured, and who is responsible for what. In the context of open source software, "open governance" means the project is governed openly and transparently, and anyone is welcome to participate in that governance.

Open governance for DocArray has many benefits:

• DocArray is now democratically run, ensuring that everyone has a say.
• DocArray is now more accessible and inclusive, because anyone can participate in governance.
• DocArray will be of higher quality, because decisions are being made in a transparent and open way.

The development team is taking actions to embrace open governance, including:

• Creating a DocArray technical steering committee (TSC) to help guide the project.
• Opening up the development process to more input and feedback from the community.
• Making DocArray development more inclusive and welcoming to new contributors.

Join the project

If you're interested in open source AI, Python, or big data, then you're invited to follow along with the DocArray project as it develops. If you think you have something to contribute to it, then join the project. It's a growing community, and one that's open to everyone.

This article was originally published on the Jina AI blog and has been republished with permission.

DocArray is hosted by the Linux Foundation to provide an inclusive and standard multimodal data model within the open source community and beyond.

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

What's your favorite Mastodon app?

Thu, 01/05/2023 - 16:00
What's your favorite Mastodon app? Don Watkins Thu, 01/05/2023 - 03:00

Mastodon is an open source social networking platform for microblogging. While it has a web-based interface, many users prefer to use a client to access Mastodon. Clients are programs or applications that allow users to access and interact with the Mastodon platform from a variety of devices, including computers, tablets, and phones.

I moved to Mastodon from Twitter back in 2019, and I've been experimenting with a number of different clients ever since. This is a list of my favorite Mastodon clients so far.

Web interface

Like most users, I got started with Mastodon's web app in the browser. I found an instance to join, created an account, and logged in. I used the web app to read, like, favorite, and reblog posts from my favorite Mastodon users. I also replied to posts without ever having to install anything locally. It was a familiar experience, much like other social media websites.

The disadvantage with the web app is that it inherently only logs in to a single instance at a time. If you manage more than one Mastodon account, either based on your own interests or because you're the social media representative for an organization, that can be problematic. While the browser is convenient, it doesn't offer the personalized experience for users that a Mastodon client can.

Mastodon is open source, though, so you have options. In addition to the web app, there are a number of Mastodon clients.

Clients also provide a more organized view of the Mastodon timeline, allowing users to easily find and follow conversations. They come in many shapes and sizes, and can be used on computers, phones, and tablets.

Clients

Each client app has its own unique features, UI design, and functionality. But they all ultimately provide access to the Mastodon platform:

  • Halcyon is particularly popular with users who want to access the platform from their computer. It has a modern, easy-to-use UI and supports multiple accounts, custom themes, and video embedding. Halcyon is open source and licensed under GPL v3.

  • Most of my day is spent with my mobile phone, and having a good app to access Mastodon is a must. I started using Mastodon on my iPhone with the official Mastodon app for iOS, which is openly licensed under GPL v3. I used it for several months.

  • Recently, I've switched to my personal favorite, Metatext, which comes with a GPL v3 license. Metatext allows me to easily share content from the browser to Fosstodon. It has become my favorite mobile app.

  • Tusky is a free and open source Android app with a simple and clean interface. It supports multiple accounts and has features like dark mode, muting, and blocking.

How is Mastodon changing your reading habits? What are your favorite apps? Be sure to let us know in the comments.

Mastodon is an open source social networking platform for microblogging. Here are my favorite Mastodon clients.


6 tips for building an effective DevOps culture

Thu, 01/05/2023 - 16:00
6 tips for building an effective DevOps culture Yauhen Zaremba Thu, 01/05/2023 - 03:00

Why would you want to build a DevOps culture? There are many benefits to the streamlined collaboration of the development and operations teams. A major goal is efficiency: Increasing the speed of new software deployments and reducing idle time for workers. Fostering trust between colleagues can improve employee satisfaction, produce new innovations, and positively impact profitability.

DevOps is a broad philosophy with a range of interpretations. In other words, you can visit 40 companies and find 40,000 different ideas about using DevOps effectively in the workplace. This diversity of opinion is actually a good thing: so many perspectives are useful for building stronger teams. This guide will look at the top tips for encouraging better collaboration between colleagues within a DevOps culture.

Each section offers a different aspect of DevOps culture and looks at ways to introduce it into your workforce.

Image by:

(Seth Kenlon, CC BY-SA 4.0)

Continuous development of processes

This core tenet of DevOps culture sets it apart from many other types of workplace ethos. The DevOps philosophy says that it is essential to make mistakes because it shows you are trying out new ideas.

The heart of DevOps culture is a commitment to evolving creativity. Practically, that means not yelling at your direct reports when test results show that things were better before the change. It means recognizing that progress is not linear and success is never a straight line.

DevOps expert Gene Kim advocates for risk-taking and experimentation. This implies letting your team work on unusual tasks to find new insights.

Should your organization be profit-driven? Can you allow your teams to try something new? I'm talking about something other than unrelated passion projects. Continuous process development means being open to upgrading present methods. Great sales leaders appreciate that results matter more than presenteeism, so it is always crucial to focus on how teams are working rather than how much.

Readily give feedback and actively seek it

Increased trust between individuals is another key feature of a thriving DevOps culture. Whether your staff is learning how to build affiliate network contacts or trying to design their next UX survey, everyone should be open to feedback on their work. But this will never happen until your teammates respect each other's opinions and trust that feedback is given in a spirit of good intention.

This culture may sound impossible to cultivate; indeed, some companies will struggle to achieve this more than others. Granted, a large part of the success of giving and receiving feedback depends on the personalities of your employees. It is possible to screen for this during the recruitment process.

Before you expect staff to readily offer feedback to colleagues and seek it in the first place, you should lead by example. Members of the C-suite should be modeling this behavior, openly asking members of the company to pose probing questions about their strategic decisions, and providing balanced feedback.

Image by:

(Seth Kenlon, CC BY-SA 4.0)

Always look for improvements

Building on increased intellectual trust between colleagues, your team should look for ways to improve its work. The nature of DevOps means the software development team will be producing deployments more rapidly than with traditional approaches.

However, this culture of openness to improvement can positively impact departments beyond development and operations. Ask yourself what other areas of your business could do with a burst of optimism.

Be on the lookout for training and upskilling opportunities. Even if a training course is less salient than advertised, the chance to network with industry professionals and build contacts for the future can only enhance the diversity of ideas within your organization.

Save ideas for later development

Part of your DevOps toolchain should be a heavily used account on Git. You can use Git as a common repository for scripts produced during software development and other related projects. Known as "version control," Git allows programmers to save iterations of their work and reuse or improve the work of others.

You're aiming for the ability to keep hold of good ideas for future use. A certain pathway did not work out for specific reasons. However, just because that set of ideas was wrong for the time it was conceived does not mean it can never become helpful in the future.

As the entire focus of DevOps rests on end-to-end ownership of software in production, saving iterations of developments truly supports this principle. You want to see an improved focus on and commitment to the software testing project at hand.

A simple way to incorporate this is to request that developers include ideas for future work in the developer contract and final project report. Make sure tech services managers know they should ask for examples of side-branching ideas that cropped up during the build. The more minds aware of these little innovations, the more likely someone will remember one when needed.

Sit close together (physically or virtually)

The goal is to share a common understanding of one another's job roles and how they interrelate. You can achieve this in a few simple ways, summarized by three words: Sit close together. Invite other teams to your meetings and share user feedback reports in their entirety. Have lunch together, plan virtual happy hours together, and generally make sure your colleagues are in close proximity. About 90% of teams with a mature DevOps protocol report a clear understanding of their responsibilities to other teams compared to only about 46% of workers in immature DevOps teams.

Although it can be tempting to form cliques with like-minded folk and only hang out with staff hired to carry out the same tasks as you, this is terrible for the business as a whole. Whether you like it or not, all humans are multi-faceted and capable of contributing their unique talents to a whole host of scenarios.

The idea of closer collaboration is to honor the ability of anyone to suggest improvements to the products or work processes going on around them. If you only ever sit at a distance from the other departments within the company, you will miss countless opportunities to share intelligent ideas. After all, you often learn best in the free flow of ideas during a conversation.

Commit to automation

You should be looking to automate mundane and repetitive tasks in the name of efficiency and process acceleration. Every industry has boring, and quite frankly silly, exercises carried out daily or weekly.

Whether this is manually copying data from one page to another or typing out audio transcripts by hand, staff at every level should insist that machines take on such burdens where possible. The reality is automation technology advances every single year, and operational processes should, too. Automation testing is so crucial to DevOps that it is the second principle of the CALMS framework (the "C" of which stands for "culture").

How can you make this happen? Invite staff to openly express which aspects of their job they feel could be automated, and then, here is the crucial part, support the facilities needed to automate them. That might mean a $600 annual subscription to a software program, a complete enterprise application modernization, or two days of developers' time to build a new tool to use in-house.

Either way, you should assess the benefits of automation and consider how much time you could save for everyone. DevOps statistics continually indicate just how much better off modern companies are by integrating these beneficial principles year after year.

Explore new ways of working successfully

A culture shift doesn't happen overnight. The sooner you start, though, the sooner you see results. In my experience, people embrace change when it's a genuine improvement on what has gone before. DevOps provides a framework for such improvements. Whether you're just getting started with DevOps in your organization or simply want to improve your existing culture, consider the above points and how they relate to your organization's future.

Whether you're just getting started with DevOps in your organization or simply want to improve your existing culture, consider these tips and how they relate to your organization's future.


Customize Apache ShardingSphere high availability with MySQL

Wed, 01/04/2023 - 16:00
Customize Apache ShardingSphere high availability with MySQL zhaojinchao Wed, 01/04/2023 - 03:00

Users have many options to customize and extend ShardingSphere's high availability (HA) solutions. Our team has completed two HA plans: a MySQL high availability solution based on MGR, and an openGauss database high availability solution contributed by community committers. The principles of the two solutions are the same.

Below is how and why ShardingSphere can achieve database high availability using MySQL as an example:

Image by:

(Zhao Jinchao, CC BY-SA 4.0)


ShardingSphere checks whether the underlying MySQL cluster environment is ready by executing the following SQL statements. ShardingSphere cannot start if any of the checks fail.

Check if MGR is installed:

SELECT * FROM information_schema.PLUGINS WHERE PLUGIN_NAME='group_replication'

View the MGR group member number. The underlying MGR cluster should consist of at least three nodes:

SELECT COUNT(*) FROM performance_schema.replication_group_members

Check whether the MGR cluster's group name is consistent with that in the configuration. The group name is the marker of an MGR group, and each group of an MGR cluster only has one group name:

SELECT * FROM performance_schema.global_variables WHERE VARIABLE_NAME='group_replication_group_name' 

Check if the current MGR is set as the single primary mode. Currently, ShardingSphere does not support dual-write or multi-write scenarios. It only supports single-write mode:

SELECT * FROM performance_schema.global_variables WHERE VARIABLE_NAME='group_replication_single_primary_mode'

Query all the node hosts, ports, and states in the MGR group cluster to check if the configured data source is correct:

SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE FROM performance_schema.replication_group_members

Dynamic primary database discovery

ShardingSphere finds the primary database URL according to the query master database SQL command provided by MySQL:

private String findPrimaryDataSourceURL(final Map<String, DataSource> dataSourceMap) {
    String result = "";
    String sql = "SELECT MEMBER_HOST, MEMBER_PORT FROM performance_schema.replication_group_members WHERE MEMBER_ID = "
            + "(SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = 'group_replication_primary_member')";
    for (DataSource each : dataSourceMap.values()) {
        try (Connection connection = each.getConnection();
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery(sql)) {
            if (resultSet.next()) {
                return String.format("%s:%s", resultSet.getString("MEMBER_HOST"), resultSet.getString("MEMBER_PORT"));
            }
        } catch (final SQLException ex) {
            log.error("An exception occurred while finding the primary data source URL", ex);
        }
    }
    return result;
}

Compare the primary database URLs found above one by one with the dataSources URLs configured. The matched data source is the primary database. It will be updated to the current ShardingSphere memory and be perpetuated to the registry center, through which it will be distributed to other compute nodes in the cluster.
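In plain-Python terms, that comparison step amounts to a substring match between the discovered host:port and each configured JDBC URL; the data source names and URLs below are illustrative, not ShardingSphere's actual code:

```python
def match_primary(primary_host_port, configured_urls):
    # Return the configured data source whose JDBC URL contains the
    # discovered primary's host:port, mirroring ShardingSphere's
    # comparison step. Illustrative sketch only.
    for name, url in configured_urls.items():
        if primary_host_port in url:
            return name
    return None

configured = {
    'ds_0': 'jdbc:mysql://127.0.0.1:3306/demo_ds',
    'ds_1': 'jdbc:mysql://127.0.0.1:3307/demo_ds',
}
print(match_primary('127.0.0.1:3307', configured))  # → ds_1
```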

Image by:

(Zhao Jinchao, CC BY-SA 4.0)

Dynamic secondary database discovery

There are two types of secondary database states in ShardingSphere: enable and disable. The secondary database state will be synchronized to the ShardingSphere memory to ensure that read traffic can be routed correctly.

Get all the nodes in the MGR group:

SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE FROM performance_schema.replication_group_members

Disable secondary databases:

private void determineDisabledDataSource(final String schemaName, final Map<String, DataSource> activeDataSourceMap,
                                         final List<String> memberDataSourceURLs, final Map<String, String> dataSourceURLs) {
    for (Entry<String, DataSource> entry : activeDataSourceMap.entrySet()) {
        boolean disable = true;
        String url = null;
        try (Connection connection = entry.getValue().getConnection()) {
            url = connection.getMetaData().getURL();
            for (String each : memberDataSourceURLs) {
                if (null != url && url.contains(each)) {
                    disable = false;
                    break;
                }
            }
        } catch (final SQLException ex) {
            log.error("An exception occurred while finding data source URLs", ex);
        }
        if (disable) {
            ShardingSphereEventBus.getInstance().post(new DataSourceDisabledEvent(schemaName, entry.getKey(), true));
        } else if (!url.isEmpty()) {
            dataSourceURLs.put(entry.getKey(), url);
        }
    }
}

Whether the secondary database is disabled is based on the data source configured and all the nodes in the MGR group.

ShardingSphere can check one by one whether the data source configured can obtain Connection properly and verify whether the data source URL contains nodes of the MGR group.

If Connection cannot be obtained or the verification fails, ShardingSphere will disable the data source by an event trigger and synchronize it to the registry center.

Enable secondary databases:

private void determineEnabledDataSource(final Map<String, DataSource> dataSourceMap, final String schemaName,
                                        final List<String> memberDataSourceURLs, final Map<String, String> dataSourceURLs) {
    for (String each : memberDataSourceURLs) {
        boolean enable = true;
        for (Entry<String, String> entry : dataSourceURLs.entrySet()) {
            if (entry.getValue().contains(each)) {
                enable = false;
                break;
            }
        }
        if (!enable) {
            continue;
        }
        for (Entry<String, DataSource> entry : dataSourceMap.entrySet()) {
            String url;
            try (Connection connection = entry.getValue().getConnection()) {
                url = connection.getMetaData().getURL();
                if (null != url && url.contains(each)) {
                    ShardingSphereEventBus.getInstance().post(new DataSourceDisabledEvent(schemaName, entry.getKey(), false));
                    break;
                }
            } catch (final SQLException ex) {
                log.error("An exception occurred while finding enabled data source URLs", ex);
            }
        }
    }
}

After the crashed secondary database is recovered and added to the MGR group, the configuration will be checked to see whether the recovered data source is used. If yes, the event trigger will tell ShardingSphere that the data source needs to be enabled.
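Condensed into plain Python, the enable/disable decision for secondary databases amounts to a URL-matching check against the current MGR membership; the data source names and addresses in this sketch are made up for illustration:

```python
def determine_states(configured_urls, mgr_members):
    # A configured data source whose URL matches no current MGR member
    # is disabled; one whose URL covers a member is enabled. Sketch of
    # the decision only, not ShardingSphere's implementation.
    return {
        name: 'enabled' if any(member in url for member in mgr_members) else 'disabled'
        for name, url in configured_urls.items()
    }

configured = {
    'ds_0': 'jdbc:mysql://10.0.0.1:3306/demo_ds',
    'ds_1': 'jdbc:mysql://10.0.0.9:3306/demo_ds',  # crashed, left the group
}
members = ['10.0.0.1:3306', '10.0.0.2:3306']
print(determine_states(configured, members))  # → {'ds_0': 'enabled', 'ds_1': 'disabled'}
```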

Heartbeat mechanism

The heartbeat mechanism is introduced to the HA module to ensure that the primary-secondary states are synchronized in real-time.
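Stripped of the scheduling machinery, a heartbeat is just a state check run on a fixed interval. The sketch below uses a stand-in check function; a real job would re-query the MGR member states each tick:

```python
import time

def run_heartbeat(check, interval, max_ticks):
    # Call the state check every `interval` seconds, `max_ticks` times.
    # A real deployment would run until shutdown; ShardingSphere delegates
    # this scheduling to ElasticJob cron jobs instead.
    results = []
    for _ in range(max_ticks):
        results.append(check())
        time.sleep(interval)
    return results

# Stand-in check; in practice it would query the primary/secondary states.
ticks = run_heartbeat(lambda: 'primary=ds_0', interval=0.01, max_ticks=3)
```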

By integrating the ShardingSphere sub-project ElasticJob, the above processes are executed by the ElasticJob scheduler framework in the form of Job when the HA module is initialized, thus achieving the separation of function development and job scheduling.

Even if developers need to extend the HA function, they do not need to care about how jobs are developed and operated:

private void initHeartBeatJobs(final String schemaName, final Map<String, DataSource> dataSourceMap) {
    Optional<ModeScheduleContext> modeScheduleContext = ModeScheduleContextFactory.getInstance().get();
    if (modeScheduleContext.isPresent()) {
        for (Entry<String, DatabaseDiscoveryDataSourceRule> entry : dataSourceRules.entrySet()) {
            Map<String, DataSource> dataSources = dataSourceMap.entrySet().stream().filter(dataSource -> !entry.getValue().getDisabledDataSourceNames().contains(dataSource.getKey()))
                    .collect(Collectors.toMap(Entry::getKey, Entry::getValue));
            CronJob job = new CronJob(entry.getValue().getDatabaseDiscoveryType().getType() + "-" + entry.getValue().getGroupName(),
                    each -> new HeartbeatJob(schemaName, dataSources, entry.getValue().getGroupName(), entry.getValue().getDatabaseDiscoveryType(), entry.getValue().getDisabledDataSourceNames())
                            .execute(null), entry.getValue().getHeartbeatProps().getProperty("keep-alive-cron"));
            modeScheduleContext.get().startCronJob(job);
        }
    }
}

Wrap up

So far, Apache ShardingSphere's HA feature has proven applicable for MySQL and openGauss HA solutions. Moving forward, it will integrate more MySQL HA products and support more database HA solutions.

As always, if you're interested, you're more than welcome to join us and contribute to the Apache ShardingSphere project.

Learn how and why ShardingSphere can achieve database high availability using MySQL as an example.
