Job Post Fetcher App

This Python app fetches recent job posts from a website (here, the IUA job board) and sends a notification to a Telegram chat when new posts are available.

Introduction

The script scrapes the job board with requests and BeautifulSoup and reads each post's publication date. With days=1, it first checks for posts dated today or yesterday. If any are found, it sends the list of posts from the last 7 days to a Telegram chat through a bot.

How It Works

  • Web Scraping: The app uses requests to fetch the webpage and BeautifulSoup to parse the HTML content. It extracts article elements with job post information and their publication dates.
  • Date Handling: The dates are published in Spanish (e.g., 16 septiembre, 2024). The app extracts the date with a regular expression and replaces the Spanish month name with its English equivalent so that datetime.strptime can parse it and compare it with the current date.
  • Telegram Notification: If any posts dated today or yesterday are found (days=1), the app sends the list of posts from the last 7 days through a Telegram bot. The message is sent with a requests POST to the Bot API sendMessage method: TOKEN is part of the URL, and CHAT_ID and the message text go in the request body.
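
The date handling step can also be done without going through English month names. Parsing with the %B directive matches month names in the current locale, so the translate-then-parse approach assumes an English (or C) locale. The following is a minimal sketch of an alternative that maps Spanish month names straight to month numbers. The names SPANISH_MONTHS, DATE_PATTERN, and parse_spanish_date are hypothetical and not part of the original script.

```python
import re
from datetime import date

# Spanish month names mapped to month numbers (hypothetical helper table).
SPANISH_MONTHS = {
    'enero': 1, 'febrero': 2, 'marzo': 3, 'abril': 4,
    'mayo': 5, 'junio': 6, 'julio': 7, 'agosto': 8,
    'septiembre': 9, 'octubre': 10, 'noviembre': 11, 'diciembre': 12,
}

# Matches dates such as "16 septiembre, 2024" anywhere in the text.
DATE_PATTERN = re.compile(r'(\d{1,2}) (\w+), (\d{4})')


def parse_spanish_date(text):
    """Return a date for text like '16 septiembre, 2024', or None if it cannot be parsed."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    day, month_name, year = match.groups()
    month = SPANISH_MONTHS.get(month_name.lower())
    if month is None:
        return None
    try:
        return date(int(year), month, int(day))
    except ValueError:
        # Raised for impossible dates such as 31 febrero.
        return None
```

Because the function returns None instead of raising, the caller can skip posts whose date cannot be read and keep processing the rest of the page.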

Source Code

main.py
import requests
from bs4 import BeautifulSoup
from datetime import datetime, timedelta
import re
 
# Telegram bot details
TOKEN = "160707****************************Hympz6rI"
CHAT_ID = "-476*******"
TELEGRAM_URL = f"https://api.telegram.org/bot{TOKEN}/sendMessage"
 
# Map Spanish month names to their English equivalents
month_map = {
    'enero': 'January', 'febrero': 'February', 'marzo': 'March', 'abril': 'April',
    'mayo': 'May', 'junio': 'June', 'julio': 'July', 'agosto': 'August',
    'septiembre': 'September', 'octubre': 'October', 'noviembre': 'November', 'diciembre': 'December'
}
 
def convert_spanish_date(date_str):
    """Convert Spanish month names to English."""
    for spanish_month, english_month in month_map.items():
        if spanish_month in date_str:
            date_str = date_str.replace(spanish_month, english_month)
            break
    return date_str
 
def get_job_posts(url, days=1):
    """Fetch job posts from the URL within the given number of days."""
    # Use a timeout so the script cannot hang indefinitely on a stalled connection
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.content, 'html.parser')
 
    # Get today's date
    today = datetime.now().date()
    date_limit = today - timedelta(days=days)
 
    job_posts = []
 
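    for post in soup.find_all('article'):
        date_tag = post.find(class_='posted-on')
        if date_tag is None:
            # Skip articles that have no publication date element
            continue
        date_str = date_tag.text.strip()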
 
        # Use regex to extract the date
        match = re.search(r'\d{1,2} \w+?, \d{4}', date_str)
        if match:
            date_str = match.group(0)
 
            # Convert the Spanish date to English format
            date_str = convert_spanish_date(date_str)
 
            try:
                post_date = datetime.strptime(date_str, '%d %B, %Y').date()
 
                if post_date >= date_limit:
                    title = post.find('h2', class_='entry-title').text.strip()
                    job_posts.append(f"{title}; {post_date.strftime('%d %B, %Y')}")
            except ValueError:
                print(f"Error parsing date: {date_str}")
 
    return job_posts
 
def send_telegram_message(message):
    """Send a message using the Telegram bot."""
    data = {
        'chat_id': CHAT_ID,
        'text': message
    }
    response = requests.post(TELEGRAM_URL, data=data, timeout=10)
    if response.status_code == 200:
        print("Message sent successfully.")
    else:
        print(f"Failed to send message. Status code: {response.status_code}")
 
# URL of the job-post board
url = 'https://egresados.iua.edu.ar/?cat=3'
 
# Get job posts dated today or yesterday (days=1 sets the cutoff to yesterday)
job_posts_today = get_job_posts(url, days=1)
 
# If any such posts exist, notify via Telegram
if job_posts_today:
    # Get job posts from the last 7 days
    job_posts_week = get_job_posts(url, days=7)
 
    # Format the message with job posts
    message = "There are new posts:\n" + "\n".join(job_posts_week)
 
    # Send the message via Telegram
    send_telegram_message(message)
else:
    print("No new posts today.")

Code Highlights

Here are some key points of the script:

  • Date Parsing: The script extracts the date string with a regular expression, replaces the Spanish month name with its English equivalent using month_map, and parses the result with datetime.strptime so it can be compared with the cutoff date.
  • Telegram Bot Integration: The bot sends a message to a specified chat or group, containing the titles of the job posts and their respective dates.
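
One limit to keep in mind for the Telegram step: the Bot API documents a maximum of 4096 characters for the text of a sendMessage call, so a long weekly list could be rejected. The following is a minimal sketch of splitting the list into several messages. The names TELEGRAM_MAX_CHARS and split_message are hypothetical and not part of the original script, and the header is assumed to be shorter than the limit.

```python
TELEGRAM_MAX_CHARS = 4096  # documented limit for the text of one message


def split_message(lines, header="There are new posts:", limit=TELEGRAM_MAX_CHARS):
    """Group lines into messages that each stay within `limit` characters."""
    chunks = []
    current = header
    for line in lines:
        # A single overlong line is truncated so it can never exceed the limit.
        line = line[:limit]
        candidate = f"{current}\n{line}" if current else line
        if len(candidate) <= limit:
            current = candidate
        else:
            # The current message is full: store it and start a new one.
            chunks.append(current)
            current = line
    if current:
        chunks.append(current)
    return chunks
```

In the script, each element of split_message(job_posts_week) could then be passed to send_telegram_message in turn.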

Potential Modifications

  • Custom Date Range: You can modify the days argument to fetch posts from a different range of days (e.g., last 30 days).
  • Content Filtering: To improve the script, you could filter posts by category, author, or any other relevant metadata from the webpage.
  • Error Handling: Adding more robust error handling (for network issues or parsing failures) can make the script more reliable in production.
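
The modifications above can be sketched as two small helpers. This is an illustration under assumptions, not the script's own code: filter_posts expects posts as (title, post_date) pairs rather than the formatted strings the script currently builds, and with_retries is a generic wrapper. Both names are hypothetical.

```python
import time
from datetime import date, timedelta


def filter_posts(posts, today, days=7, keywords=()):
    """Keep (title, post_date) pairs from the last `days` days.

    If keywords are given, a title must contain at least one of them.
    """
    date_limit = today - timedelta(days=days)
    wanted = [k.lower() for k in keywords]
    selected = []
    for title, post_date in posts:
        if post_date < date_limit:
            continue
        if wanted and not any(k in title.lower() for k in wanted):
            continue
        selected.append((title, post_date))
    return selected


def with_retries(action, attempts=3, delay=2.0, exceptions=(Exception,)):
    """Call action() up to `attempts` times, sleeping `delay` seconds after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions as exc:
            if attempt == attempts:
                # Out of attempts: let the last error propagate.
                raise
            print(f"Attempt {attempt} failed: {exc}; retrying")
            time.sleep(delay)
```

For network calls, the wrapper could be used as with_retries(lambda: requests.get(url, timeout=10), exceptions=(requests.exceptions.RequestException,)), so that only network errors trigger a retry.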

Why It Works

The app leverages well-established libraries like requests for web requests and BeautifulSoup for HTML parsing. The combination of these with Python’s date manipulation tools makes it effective for scraping time-sensitive information. The Telegram bot integration adds a convenient notification feature, allowing immediate action when new posts are available.

Final Thoughts

This app can be easily adapted for other job boards or content websites. By modifying the URL and tweaking the HTML element selectors, you can use this as a template for a wide range of web scraping and notification tasks.

job_post_fetcher_app.1726515355.txt.gz · Last modified: 2024/10/17 21:42 (external edit)