Job Post Fetcher App
This Python app is designed to fetch recent job posts from a specific website (in this case, the IUA job board) and send notifications to a specified Telegram channel when new posts are available.
Introduction
The script retrieves job posts by requesting the page and parsing it with BeautifulSoup. It first looks for posts dated within the last day; because the cutoff is inclusive, that covers both today and yesterday. If any are found, it collects the posts from the last 7 days and sends that list to a Telegram chat through a bot.
How It Works
- Web Scraping: The app uses requests to fetch the webpage and BeautifulSoup to parse the HTML content. It extracts the article elements, each of which holds a job post's title and publication date.
- Date Handling: The dates are published in Spanish (e.g., 16 septiembre, 2024). The app replaces the Spanish month name with its English equivalent so the date can be parsed and compared with the current date.
- Telegram Notification: If job posts from the last day are found, the app sends a list of job posts from the last 7 days using a Telegram bot. The script posts the message to the Bot API's sendMessage endpoint using the requests library, with the bot TOKEN, the target CHAT_ID, and the message body.
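The date comparison described above can be sketched in isolation. This is a minimal illustration rather than the script's own code: the helper name is_recent is hypothetical, and the dates are fixed so the result does not depend on when it runs.

```python
from datetime import date, timedelta


def is_recent(post_date: date, today: date, days: int) -> bool:
    """Return True if post_date is on or after the cutoff (today minus `days`)."""
    cutoff = today - timedelta(days=days)
    return post_date >= cutoff


today = date(2024, 9, 16)
print(is_recent(date(2024, 9, 16), today, days=1))  # True: posted today
print(is_recent(date(2024, 9, 15), today, days=1))  # True: the cutoff is inclusive
print(is_recent(date(2024, 9, 14), today, days=1))  # False: older than the cutoff
print(is_recent(date(2024, 9, 9), today, days=7))   # True: exactly 7 days back
```

Because the comparison uses >=, a value of days=1 accepts posts from yesterday as well as today. Passing days=0 would restrict the check to today only.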
Code Highlights
- main.py
    import requests
    from bs4 import BeautifulSoup
    from datetime import datetime, timedelta
    import re

    # Telegram bot details
    TOKEN = "160707****************************Hympz6rI"
    CHAT_ID = "-476*******"
    TELEGRAM_URL = f"https://api.telegram.org/bot{TOKEN}/sendMessage"

    # Map Spanish month names to their English equivalents
    month_map = {
        'enero': 'January', 'febrero': 'February', 'marzo': 'March',
        'abril': 'April', 'mayo': 'May', 'junio': 'June',
        'julio': 'July', 'agosto': 'August', 'septiembre': 'September',
        'octubre': 'October', 'noviembre': 'November', 'diciembre': 'December'
    }

    def convert_spanish_date(date_str):
        """Convert Spanish month names to English."""
        for spanish_month, english_month in month_map.items():
            if spanish_month in date_str:
                date_str = date_str.replace(spanish_month, english_month)
                break
        return date_str

    def get_job_posts(url, days=1):
        """Fetch job posts from the URL within the given number of days."""
        response = requests.get(url)
        soup = BeautifulSoup(response.content, 'html.parser')

        # Compute the oldest date that still counts as recent (inclusive)
        today = datetime.now().date()
        date_limit = today - timedelta(days=days)

        job_posts = []
        for post in soup.find_all('article'):
            date_str = post.find(class_='posted-on').text.strip()

            # Use regex to extract the date, e.g. "16 septiembre, 2024"
            match = re.search(r'\d{1,2} \w+?, \d{4}', date_str)
            if match:
                date_str = match.group(0)

            # Convert the Spanish month name to English before parsing
            date_str = convert_spanish_date(date_str)

            try:
                post_date = datetime.strptime(date_str, '%d %B, %Y').date()
                if post_date >= date_limit:
                    title = post.find('h2', class_='entry-title').text.strip()
                    job_posts.append(f"{title}; {post_date.strftime('%d %B, %Y')}")
            except ValueError:
                print(f"Error parsing date: {date_str}")

        return job_posts

    def send_telegram_message(message):
        """Send a message using the Telegram bot."""
        data = {
            'chat_id': CHAT_ID,
            'text': message
        }
        response = requests.post(TELEGRAM_URL, data=data)
        if response.status_code == 200:
            print("Message sent successfully.")
        else:
            print(f"Failed to send message. Status code: {response.status_code}")

    # URL of the job-post board
    url = 'https://egresados.iua.edu.ar/?cat=3'

    # Get job posts from the last day (today or yesterday)
    job_posts_today = get_job_posts(url, days=1)

    # If there are recent job posts, notify via Telegram
    if job_posts_today:
        # Get job posts from the last 7 days
        job_posts_week = get_job_posts(url, days=7)

        # Format the message with job posts
        message = "There are new posts:\n" + "\n".join(job_posts_week)

        # Send the message via Telegram
        send_telegram_message(message)
    else:
        print("No new posts today.")
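One detail the script does not handle: the Telegram Bot API limits the text of a single sendMessage call to 4096 characters, so a busy week could produce a message that is rejected. The following is a rough sketch of one way to split the list across several messages. The function split_message and the constant TELEGRAM_MAX_LEN are hypothetical names, not part of the original script.

```python
TELEGRAM_MAX_LEN = 4096  # documented limit for the text of one sendMessage call


def split_message(lines, header="There are new posts:", limit=TELEGRAM_MAX_LEN):
    """Group lines into chunks that each fit within `limit` characters.

    Splits only at line boundaries; a single line longer than `limit`
    is kept whole and would still need truncating by the caller.
    """
    chunks = []
    current = header
    for line in lines:
        candidate = f"{current}\n{line}" if current else line
        if len(candidate) > limit and current:
            # Adding this line would overflow, so close the current chunk
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


posts = [f"Job {i}; 16 September, 2024" for i in range(300)]
for chunk in split_message(posts):
    print(len(chunk))  # each length stays at or below 4096
```

Each chunk could then be passed to send_telegram_message in turn.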
Code Highlights
Here are some key points of the script:
- Date Parsing: A regular expression extracts the date from each post, a lookup table replaces the Spanish month name with its English equivalent, and datetime.strptime parses the result so it can be compared with the current date.
- Telegram Bot Integration: The bot sends a message to a specified chat or group, containing the titles of the job posts and their respective dates.
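The script parses the English month name with strptime and %B, which depends on the active locale. A possible alternative, sketched here under that assumption, maps Spanish month names straight to month numbers and builds the date directly. The names MONTH_NUMBERS, DATE_PATTERN and parse_spanish_date are hypothetical and do not appear in the original script.

```python
import re
from datetime import date

# Spanish month names mapped to month numbers (assumes lowercase input)
MONTH_NUMBERS = {
    'enero': 1, 'febrero': 2, 'marzo': 3, 'abril': 4,
    'mayo': 5, 'junio': 6, 'julio': 7, 'agosto': 8,
    'septiembre': 9, 'octubre': 10, 'noviembre': 11, 'diciembre': 12,
}

# Matches dates such as "16 septiembre, 2024"
DATE_PATTERN = re.compile(r'(\d{1,2}) (\w+), (\d{4})')


def parse_spanish_date(text):
    """Return a date parsed from text, or None if no valid date is found."""
    match = DATE_PATTERN.search(text.lower())
    if match is None:
        return None
    day, month_name, year = match.groups()
    month = MONTH_NUMBERS.get(month_name)
    if month is None:
        return None
    try:
        return date(int(year), month, int(day))
    except ValueError:
        # Day out of range for the month, e.g. "31 febrero, 2024"
        return None


print(parse_spanish_date("16 septiembre, 2024"))  # 2024-09-16
print(parse_spanish_date("31 febrero, 2024"))     # None
```

Returning None instead of raising lets the caller skip a post with an unreadable date and continue with the rest.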
Potential Modifications
- Custom Date Range: You can modify the days argument to fetch posts from a different range of days (e.g., last 30 days).
- Content Filtering: To improve the script, you could filter posts by category, author, or any other relevant metadata from the webpage.
- Error Handling: Adding more robust error handling (for network issues or parsing failures) can make the script more reliable in production.
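As an illustration of the content-filtering idea, here is a minimal sketch that filters by keywords in the title instead of by category or author, since the script only collects titles and dates. It assumes the "title; date" string format produced by get_job_posts. The function filter_posts and the sample posts are hypothetical.

```python
def filter_posts(posts, keywords):
    """Keep posts whose title contains any keyword (case-insensitive).

    Each post is expected in the "title; date" format built by the script.
    """
    lowered = [keyword.lower() for keyword in keywords]
    matching = []
    for post in posts:
        # Split on the last "; " so a semicolon inside the title is preserved
        title = post.rsplit("; ", 1)[0].lower()
        if any(keyword in title for keyword in lowered):
            matching.append(post)
    return matching


sample_posts = [
    "Python Developer; 16 September, 2024",
    "Accountant; 15 September, 2024",
    "Data Engineer (Python); 14 September, 2024",
]
print(filter_posts(sample_posts, ["python"]))
# ['Python Developer; 16 September, 2024', 'Data Engineer (Python); 14 September, 2024']
```

Filtering on real category or author metadata would need extra selectors in the scraping loop, which depend on the page's HTML.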
Why It Works
The app leverages well-established libraries like requests for web requests and BeautifulSoup for HTML parsing. The combination of these with Python’s date manipulation tools makes it effective for scraping time-sensitive information. The Telegram bot integration adds a convenient notification feature, allowing immediate action when new posts are available.
Final Thoughts
This app can be easily adapted for other job boards or content websites. By modifying the URL and tweaking the HTML element selectors, you can use this as a template for a wide range of web scraping and notification tasks.
