Open-source News

Analyze web pages with Python requests and Beautiful Soup

opensource.com - Thu, 06/16/2022 - 15:00
Seth Kenlon - Thu, 06/16/2022 - 03:00

Browsing the web probably accounts for much of your day. But it's an awfully manual process, isn't it? You have to open a browser. Go to a website. Click buttons, move a mouse. It's a lot of work. Wouldn't it be nicer to interact with the Internet through code?

You can get data from the Internet using Python with the help of the Python module requests:

import requests

DATA = "https://opensource.com/article/22/5/document-source-code-doxygen-linux"
PAGE = requests.get(DATA)

print(PAGE.text)

In this code sample, you first import the module requests. Then you create two variables. The first, DATA, holds the URL you want to download. In later versions of this code, you'll be able to provide a different URL each time you run your application. For now, though, it's easiest to "hard code" a test URL for demonstration purposes.

The other variable is PAGE, which you set to the response of the requests.get function when it reads the URL stored in DATA. The requests module and its .get function are pre-programmed to "read" an Internet address (a URL), access the Internet, and download whatever is located at that address.

That's a lot of steps you don't have to figure out on your own, and that's exactly why Python modules exist. Finally, you tell Python to print everything that requests.get has stored in the .text field of the PAGE variable.
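The text above mentions that later versions could take a different URL on each run. Here is a minimal sketch of one way to do that, under some assumptions: the helper names choose_url and fetch are hypothetical, the 10-second timeout is an arbitrary choice, and the default is the test URL used in this article.

```python
import requests

DEFAULT_URL = "https://opensource.com/article/22/5/document-source-code-doxygen-linux"


def choose_url(argv, default=DEFAULT_URL):
    """Return the first command-line argument if there is one, else the default."""
    return argv[1] if len(argv) > 1 else default


def fetch(url):
    """Download a page and return its text; raise an exception on an HTTP error status."""
    response = requests.get(url, timeout=10)  # the timeout value is an assumption
    response.raise_for_status()
    return response.text
```

In a script, you could import sys and call print(fetch(choose_url(sys.argv))) so that the URL is read from the command line when one is given.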

Beautiful Soup

If you run the sample code above, you get the contents of the example URL dumped indiscriminately into your terminal. It does that because the only thing your code does with the data that requests has gathered is print it. It's more interesting to parse the text.

Python can "read" text with its most basic functions, but parsing text allows you to search for patterns, specific words, HTML tags, and so on. You could parse the text returned by requests yourself, but using a specialized module is much easier. For HTML and XML, there's the Beautiful Soup library.

The next sample accomplishes the same thing as the first one, but it uses Beautiful Soup to parse the downloaded text. Because Beautiful Soup recognizes HTML tags, you can use some of its built-in features to make the output a little easier for the human eye to parse.

For instance, instead of printing raw text at the end of your program, you can run the text through the .prettify function of Beautiful Soup:

from bs4 import BeautifulSoup
import requests

PAGE = requests.get("https://opensource.com/article/22/5/document-source-code-doxygen-linux")
SOUP = BeautifulSoup(PAGE.text, 'html.parser')

# Print the result only when this file is executed directly.
if __name__ == '__main__':
    # Print the parsed page with indentation that reflects tag nesting.
    print(SOUP.prettify())

The output of this version of your program ensures that every opening HTML tag starts on its own line, with indentation to help demonstrate which tag is a parent of another tag. Beautiful Soup is aware of HTML tags in more ways than just how it prints them out.
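To see the effect of .prettify without downloading anything, here is a small sketch that parses a hand-written snippet. The HTML string is an assumption that stands in for a real page.

```python
from bs4 import BeautifulSoup

# A tiny hand-written document standing in for a downloaded page (an assumption).
HTML = "<div><p>Hello</p><p>World</p></div>"

soup = BeautifulSoup(HTML, "html.parser")

# prettify() returns a string with each tag on its own line, indented by depth.
pretty = soup.prettify()
print(pretty)
```

The input is a single line, so any line breaks and indentation in the printed result come from Beautiful Soup.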

Instead of printing the whole page, you can single out a specific kind of tag. For instance, try changing the print statement from print(SOUP.prettify()) to this:

  print(SOUP.p)

This prints just a <p> tag. Specifically, it prints just the first <p> tag encountered. To print all <p> tags, you need a loop.

Looping

Create a for loop to cycle over the entire webpage contained in the SOUP variable, using the find_all function of Beautiful Soup. It's not unreasonable to want to use your loop for other tags besides just the <p> tag, so build it as a custom function, designated by the def keyword (for "define") in Python.

def loopit():
    for TAG in SOUP.find_all('p'):
        print(TAG)

The temporary variable TAG is arbitrary. You can use any term, such as ITEM or i or whatever you want. Each time the loop runs, TAG contains the next search result of the find_all function. In this code, the <p> tag is being searched.

A function doesn't run unless it's explicitly called. You can call your function at the end of your code:

# Call the function only when this file is executed directly.
if __name__ == '__main__':
    # Print every <p> tag found on the page.
    loopit()

Run your code to see all <p> tags and each one's contents.
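Since the loop may be useful for tags other than <p>, one possible variation passes the tag name in as a parameter. This is a sketch, not the article's code: the function name loop_tags and the sample HTML are assumptions.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup standing in for a downloaded page (an assumption).
HTML = "<div><h2>Title</h2><p>First</p><p>Second</p></div>"


def loop_tags(soup, name):
    """Print every tag called `name` and return the matches."""
    found = soup.find_all(name)
    for tag in found:
        print(tag)
    return found


soup = BeautifulSoup(HTML, "html.parser")
paragraphs = loop_tags(soup, "p")
headings = loop_tags(soup, "h2")
```

Passing the parsed document in as an argument also means the function no longer depends on a global SOUP variable.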

Getting just the content

You can exclude tags from being printed by specifying that you want just the "string" (programming lingo for "words").

def loopit():
    for TAG in SOUP.find_all('p'):
        print(TAG.string)
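One caveat is worth a short sketch: .string only holds text when a tag contains a single string. If a paragraph mixes text with nested tags, .string is None, and the get_text method is one way to collect the text anyway. The sample HTML here is an assumption.

```python
from bs4 import BeautifulSoup

# Two sample paragraphs: one plain, one with nested markup (an assumption).
HTML = "<p>Plain text only</p><p>Text with <b>bold</b> markup</p>"

soup = BeautifulSoup(HTML, "html.parser")
plain, mixed = soup.find_all("p")

plain_string = plain.string    # a single text child, so this is that text
mixed_string = mixed.string    # several children, so this is None
mixed_text = mixed.get_text()  # joins all the text inside the tag

print(plain_string, mixed_string, mixed_text)
```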

Of course, once you have the text of a webpage, you can parse it further with the standard Python string libraries. For instance, you can get a word count using len and split:

def loopit():
    for TAG in SOUP.find_all('p'):
        if TAG.string is not None:
            print(len(TAG.string.split()))

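This prints the number of words within each paragraph element, omitting those paragraphs for which .string is None (Beautiful Soup returns None when a tag holds no single string, for example an empty paragraph or one that mixes text with nested tags). To get a grand total, use a variable and some basic math: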

def loopit():
    NUM = 0
    for TAG in SOUP.find_all('p'):
        if TAG.string is not None:
            NUM = NUM + len(TAG.string.split())
    print("Grand total is ", NUM)

Python homework

There's a lot more information you can extract with Beautiful Soup and Python. Here are some ideas on how to improve your application:

  • Accept input so you can specify what URL to download and analyze when you launch your application.
  • Count the number of images (<img> tags) on a page.
  • Count the number of images (<img> tags) within another tag (for instance, only images that appear in a particular div, or only images that follow a particular tag).
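As a starting point for the image-counting ideas, here is a hedged sketch that works on a hand-written snippet. The HTML, the file names, and the id "gallery" are all assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup: one image outside the div and two inside it (an assumption).
HTML = """
<img src="logo.png">
<div id="gallery">
  <img src="one.png">
  <img src="two.png">
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Count every <img> tag in the document.
total_images = len(soup.find_all("img"))

# Count only the <img> tags nested inside one particular div.
gallery = soup.find("div", id="gallery")
gallery_images = len(gallery.find_all("img"))

print(total_images, gallery_images)
```

Calling find_all on a tag instead of on the whole document limits the search to that tag's descendants.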

Follow this Python tutorial to easily extract information about web pages.

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

Using habits to practice open organization principles

opensource.com - Thu, 06/16/2022 - 15:00
Ron McFarland - Thu, 06/16/2022 - 03:00

Habits are a long-term interest of mine. Several years ago, I gave a presentation on habits, both good and bad, and how to expand on good habits and change bad ones. Just recently, I read the habits-focused book Smart Thinking by Art Markman. You might ask what this has to do with open organization principles. There is a connection, and I'll explain it in this two-part article on managing habits.

In this first article, I talk about habits, how they work, and—most important—how you can start to change them. In the second article, I review Markman's thoughts as presented in his book.

The intersection of principles and habits

Suppose you learned about open organization principles and although you found them interesting and valuable, you just weren't in the habit of using them. Here's how that might look in practice.

Community: If you're faced with a significant challenge but think you can't address it alone, you're likely in the habit of just giving up. Wouldn't it be better to have the habit of building a community of like-minded people that collectively can solve the problem?

Collaboration: Suppose you don't think you're a good collaborator. You like to do things alone. You know that there are cases when collaboration is required, but you don't have a habit of engaging in it. To counteract that, you must build a habit of collaborating more.

Transparency: Say you like to keep most of what you do and know a secret. However, you know that if you don't share information, you're not likely to get good information from others. Therefore, you must create the habit of being more transparent.

Inclusivity: Imagine you are uncomfortable working with people you don't know and who are different from you, whether in personality, culture, or language. You know that if you want to be successful, you must work with a wide variety of people. How do you create a habit of being more inclusive?

Adaptability: Suppose you tend to resist change long after what you're doing is no longer achieving what you had hoped it would. You know you must adapt and redirect your efforts, but how can you create a habit of being adaptive?

What is a habit?

Before I give examples regarding the above principles, I'll explain some of the relevant characteristics of a habit.

  • A habit is a behavior performed repeatedly—so much so that it's now performed without thinking.
  • A habit is automatic and feels right at the time. The person is so used to it that it feels good when doing it, and doing something else would require effort and make them feel uncomfortable. They might have second thoughts afterward, though.
  • Some habits are good and extremely helpful by saving you a lot of energy. The brain is 2% of the body's weight but consumes 20% of your daily energy. Because thinking and concentration require a lot of energy, your mind is built to save it through developing unconscious habits.
  • Some habits are bad for you, so you desire to change them.
  • All habits offer some reward, even if it is only temporary.
  • Habits are formed around what you are familiar with and what you know, even habits you don’t necessarily like.

The three steps of a habit

  1. Cue (trigger): First, a cue or trigger tells the brain to go into automatic mode, using previously learned habitual behavior. Cues can be things like seeing a candy bar or a television commercial, being in a certain place at a certain time of day, or just seeing a particular person. Time pressure can trigger a routine. An overwhelming atmosphere can trigger a routine. Simply put, something reminds you to behave a certain way.
  2. Routine: The routine follows the trigger. A routine is a set of physical, mental, and/or emotional behaviors that can be incredibly complex or extremely simple. Some habits, such as those related to emotions, are measured in milliseconds.
  3. Reward: The final step is the reward, which helps your brain figure out whether a particular activity is worth remembering for the future. Rewards can range from food or drugs that cause physical sensations to joy, pride, praise, or personal self-esteem.

Bad habits in a business environment

Habits aren't just for individuals. All organizations have good and bad institutional habits. However, some organizations deliberately design their habits, while others just let them evolve without forethought, possibly through rivalries or fear. These are some organizational habit examples:

  • Always being late with reports
  • Working alone or working in groups when the opposite is appropriate
  • Being triggered by excess pressure from the boss
  • Not caring about declining sales
  • Not cooperating among a sales team because of excess competition
  • Allowing one talkative person to dominate a meeting

A step-by-step plan to change a habit

Habits don't have to last forever. You can change your own behavior. First, remember that many habits cannot be changed at the same time. Instead, find a keystone habit and work on it first. This produces small, quick rewards. Remember that one keystone habit can create a chain reaction.

Here is a four-step framework you can apply to changing any habit, including habits related to open organization principles.

Step one: identify the routine

Identify the habit loop and the routine in it (for example, when an important challenge comes up that you can't address alone). The routine (the behaviors you do) is the easiest to identify, so start there. For example: "In my organization, no one discusses problems with anyone. They just give up before starting." Determine the routine that you want to modify, change, or just study. For example: "Every time an important challenge comes up, I should discuss it with people and try to develop a community of like-minded people who have the skills to address it."

Step two: experiment with the rewards

Rewards are powerful because they satisfy cravings. But we're often not conscious of the cravings that drive our behavior. They are only evident afterward. For example, there may be times in meetings when you want nothing more than to get out of the room and avoid a subject of conversation, even though deep down you know you should figure out how to address the problem.

To learn what a craving is, you must experiment. That might take a few days, weeks, or longer. You must feel the triggering pressure when it occurs to identify it fully. For example, ask yourself how you feel when you try to escape responsibility.

Consider yourself a scientist, just doing experiments and gathering data. The steps in your investigation are:

  1. After the first routine, start adjusting the routines that follow to see whether there's a reward change. For example, if you give up every time you see a challenge you can't address by yourself, the reward is the relief of not taking responsibility. A better response might be to discuss the issue with at least one other person who is equally concerned about the issue. The point is to test different hypotheses to determine which craving drives your routine. Are you craving the avoidance of responsibility?
     
  2. After four or five different routines and rewards, write down the first three or four things that come to mind right after each reward is received. Instead of just giving up in the face of a challenge, for instance, you discuss the issue with one person. Then, you decide what can be done.
     
  3. After writing about your feeling or craving, set a timer for 15 minutes. When it rings, ask yourself whether you still have the craving. Before giving in to a craving, rest and think about the issue one or two more times. This forces you to be aware of the moment and helps you later recall what you were thinking about at that moment.
     
  4. Try to remember what you were thinking and feeling at that precise instant, and then 15 minutes after the routine. If the craving is gone, you have identified the reward.

Step three: isolate the cue or trigger

The cue is often hard to identify because there's usually too much information bombarding you as your behaviors unfold. To identify a cue amid other distractions, you can observe four factors the moment the urge hits you:

Location: Where did it occur? ("My biggest challenges come out in meetings.")

Time: When did it occur? ("Meetings in the afternoon, when I'm tired, are the worst time, because I'm not interested in putting forth any effort.")

Feelings: What was your emotional state? ("I feel overwhelmed and depressed when I hear the problem.")

People: Who or what type of people were around you at the time, or were you alone? ("In the meetings, most other people don't seem interested in the problem either. Others dominate the discussion.")

Step four: have a plan

Once you have confirmed the reward driving your behavior, the cues that trigger it, and the behavior itself, you can begin to shift your actions. Follow these three easy steps:

  1. First, plan for the cue. ("In meetings, I'm going to look for and focus my attention on important problems that come up.")
     
  2. Second, choose a behavior that delivers the same reward but without the penalties you suffer now. ("I'm going to explore a plan to address that problem and consider what resources and skills I need to succeed. I'm going to feel great when I create a community that's able to address the problem successfully.")
     
  3. Third, make the behavior a deliberate choice each and every time, until you no longer need to think about it. ("I'm going to consciously pay attention to major issues until I can do it without thinking. I might look at agendas of future meetings, so I know what to expect in advance. Before and during every meeting, I will ask why I should be here, to make sure I'm focused on what is important.")

Plan to avoid forgetting something that must be done

To successfully start doing something you often forget, follow this process:

  1. Plan what you want to do.
  2. Determine when you want to complete it.
  3. Break the project into small tasks as needed.
  4. With a timer or daily planner, set up cues to start each task.
  5. Complete each task on schedule.
  6. Reward yourself for staying on schedule.

Habit change

Change takes a long time. Sometimes a support group is required to help change a habit. Sometimes, a lot of practice and role play of a new and better routine in a low-stress environment is required. To find an effective reward, you need repeated experimentation.

Sometimes habits are only symptoms of a more significant, deeper problem. In these cases, professional help may be required. But if you have the desire to change and accept that there will be minor failures along the way, you can gain power over any habit.

In this article, I've used examples of community development using the cue-routine-reward process. It can equally be applied to the other open organization principles. I hope this article got you thinking about how to manage habits through knowing how habits work, taking steps to change habits, and making plans to avoid forgetting things you want done. Whether it's an open organization principle or anything else, you can now diagnose the cue, the routine, and the reward. That will lead you to a plan to change a habit when the cue presents itself.

In my next article, I'll look at habits through the lens of Art Markman's thoughts on Smart Thinking.

Follow these steps to implement habits that support open culture and get rid of those that don't.

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

How to Limit Network Bandwidth in NGINX Web Server

Tecmint - Thu, 06/16/2022 - 14:13
The post How to Limit Network Bandwidth in NGINX Web Server first appeared on Tecmint: Linux Howtos, Tutorials & Guides.

Previously, in our NGINX traffic management and security controls series, we have discussed how to limit the number of connections the same client can make to your web resources, using client identification parameters such

AlmaLinux 9 Running Well, Performance On Par With RHEL 9.0

Phoronix - Thu, 06/16/2022 - 02:30
Released at the end of May was AlmaLinux 9.0 as the first "community" distribution out of the gates based on Red Hat Enterprise Linux 9.0 that reached GA in mid-May. I've been running AlmaLinux 9.0 on a few Intel and AMD servers to great success. And, yes, as expected the performance matches that of upstream RHEL9.

Godot 4.0 Alpha 10 Brings Temporal AA

Phoronix - Thu, 06/16/2022 - 01:54
The tenth alpha release of the Godot 4.0 open-source game engine is now available for testing with some interesting additions...
