In this post I’m going to show how well-structured HTML, Python, and a loyal Raspberry Pi can help you find a nice position at a company you like.
At the end, you’ll be able to receive this kind of message:
Overview
Crawling jobs portal
To crawl a website in Python, the suggested way to go is Beautiful Soup. To use it, put BeautifulSoup.py in the same folder as your script.
The code below is commented, but what it does is basically:
- Get html content from a page
- Find elements with a certain tag and class (in our case, li elements with the “job” class)
- Get the elements’ text and compare it with the keywords provided
- Do it for all URLs provided
- Format the text
- Send an e-mail when there are desired positions
Name it job_finder.py.
# -*- coding: utf-8 -*-
import urllib
from BeautifulSoup import *
from datetime import datetime
from send_me_msg import send_me_msg
def make_msg_from_positions(warning_email_positions, url):
    """
    Format a text block with the url and all the interesting
    positions from it.
    """
    msg = "Positions from " + url + ": \n"
    for i, position in enumerate(warning_email_positions):
        msg += str(i+1) + ") " + position + "\n"
    return msg
wanted_positions = ["Data Scientist",
                    "Machine",
                    "Data",
                    "Intelligence",
                    "Software"]
urls_list = ["https://nubank.workable.com/",
             "https://eduk.workable.com/",
             "https://brandbastion.workable.com/"]
final_msg = ""
for url in urls_list:
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html)
    ### Retrieve all of the <li class="job"> elements
    jobs = soup.findAll("li", { "class" : "job" })
    ### Keep the text of the available positions
    available_positions = []
    for job in jobs:
        for a_element in job.findAll("a"):
            available_positions.append(a_element.text)
    warning_email_positions = []
    ### Compare the available positions with all the desired keywords
    for position in available_positions:
        for wanted_position in wanted_positions:
            if wanted_position.lower() in position.lower():
                warning_email_positions.append(position)
    ### Make sure there are no duplicates
    warning_email_positions = set(warning_email_positions)
    ### If there are interesting positions, format them
    if len(warning_email_positions) != 0:
        msg = make_msg_from_positions(warning_email_positions, url)
        ### Concatenate to make a single final string
        final_msg += "\n" + msg
### If there are interesting positions, send an e-mail about them
if len(final_msg) > 0:
    send_me_msg(final_msg, "Interesting positions Spotted!")
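If you want to try the keyword matching without hitting any website, the comparison step can be pulled out into a small function. This is only a sketch: the name match_positions and the sample position titles are made up for illustration, and unlike the set used in the script it keeps the positions in the order they were found.

```python
def match_positions(available_positions, wanted_positions):
    """Return the positions containing any wanted keyword (case-insensitive).

    Duplicates are dropped and the original order is preserved.
    """
    matches = []
    for position in available_positions:
        lowered = position.lower()
        if any(wanted.lower() in lowered for wanted in wanted_positions):
            if position not in matches:
                matches.append(position)
    return matches


# Hypothetical titles, just to exercise the function
sample_positions = ["Data Scientist", "Office Manager", "Senior Software Engineer"]
sample_keywords = ["Data Scientist", "Data", "Software"]
print(match_positions(sample_positions, sample_keywords))
```

Note that "Data Scientist" is matched by both "Data Scientist" and "Data", which is why the script needs a de-duplication step at all.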
Sending an e-mail
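To make the above script work, we need another one to send an e-mail. Since job_finder.py imports it with from send_me_msg import send_me_msg, save it as send_me_msg.py in the same folder.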
from email.mime.text import MIMEText
from datetime import datetime
import smtplib
def send_me_msg(msg, subject):
    now = datetime.now()
    server = smtplib.SMTP('smtp.gmail.com', 587)
    server.starttls()
    server.login("your.email@gmail.com", "your_email_password")
    ### A list of e-mails to send it to, maybe you want to warn some friends!
    REPLY_TO_ADDRESSES = ["your.email@gmail.com", "friend.email@gmail.com"]
    watch = now.strftime("%H:%M")
    ### Just a custom message
    msg = MIMEText("Hi, Moneda, how things are going? It's " + watch + "\n\n" + msg )
    msg["Subject"] = subject + " " + str(now.day) + "/" + str(now.month)
    for ADDRESS in REPLY_TO_ADDRESSES:
        ### Drop the previous reply-to so the headers don't pile up
        del msg['reply-to']
        msg.add_header('reply-to', ADDRESS)
        server.sendmail("your.email@gmail.com", ADDRESS, msg.as_string())
    server.quit()
if __name__ == "__main__":
    pass
You need to enable less secure app access in your Gmail account. For that, use this Gmail settings page.
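Before trusting the script with a real login, it can help to build the message on its own and look at it. The sketch below is an assumption on my side, not part of the script above: build_message is a hypothetical helper, it sets From and To headers instead of reply-to, and the fixed date is there only so the output is predictable.

```python
from datetime import datetime
from email.mime.text import MIMEText


def build_message(body, subject, sender, recipients, now=None):
    """Build the e-mail without sending it, so it can be inspected offline."""
    now = now or datetime.now()
    watch = now.strftime("%H:%M")
    message = MIMEText("Hi, it's " + watch + "\n\n" + body)
    # Same subject format as the script: subject followed by day/month
    message["Subject"] = "{} {}/{}".format(subject, now.day, now.month)
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    return message


# Placeholder addresses and an arbitrary date, only for illustration
example = build_message(
    "Positions from https://nubank.workable.com/: \n1) Data Scientist\n",
    "Interesting positions Spotted!",
    "your.email@gmail.com",
    ["your.email@gmail.com", "friend.email@gmail.com"],
    now=datetime(2018, 3, 9, 0, 0),
)
print(example["Subject"])
print(example["To"])
```

Once the headers look right, the result could be handed to server.sendmail(sender, recipients, message.as_string()), since sendmail accepts a list of recipients.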
Configuring RPi to run a script periodically
From scratch:
- Download the OS from the Raspberry Pi website
- Write the image to the SD card (I’ve used Etcher on macOS)
- Enable ssh: run sudo raspi-config, then go to Interfacing Options -> SSH
- Connect it to your network via Ethernet cable or Wi-Fi
- Use the ifconfig command to check your Pi’s network address; it’s the inet field.
Now you can ssh into your Pi using the address you’ve checked with ifconfig. My Pi’s address on my network is 192.168.1.130; replace it with yours and run ssh pi@192.168.1.130.
Then you need to transfer the files you’ve tested locally to the Pi:
- Create a folder with mkdir job_finder using ssh;
- From your regular computer, in the scripts folder, run scp job_finder.py pi@192.168.1.130:/home/pi/job_finder and scp send_me_msg.py pi@192.168.1.130:/home/pi/job_finder. Copy BeautifulSoup.py the same way if it sits next to your scripts.
You can test whether the script works on your Raspberry Pi by running python job_finder.py. It should work out of the box.
OK, now we need to make the Raspberry Pi call this script periodically. For this, we’re going to use crontab, a Linux tool to schedule commands.
sudo crontab -e
Scroll down and insert a new line:
0 0 * * * python /home/pi/job_finder/job_finder.py
Basically, the five fields before the command you want to schedule define the minute, hour, day of the month, month, and day of the week (5 = Friday). An asterisk means the job runs for every value of that field. Here we’ve scheduled it to run every day: at minute 0 of hour 0, every day of every month.
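To get a feel for how the five fields are read, here is a toy matcher in Python. It is a simplified sketch, not how cron is implemented: cron_matches and field_matches are names I made up, only plain numbers and asterisks are supported (no ranges, lists or steps), and it ignores the special rule real cron applies when both the day of the month and the day of the week are restricted.

```python
from datetime import datetime


def field_matches(field, value):
    """A field matches if it is '*' or equals the value as a plain number."""
    return field == "*" or int(field) == value


def cron_matches(expression, moment):
    """Check whether a five-field cron expression would fire at `moment`."""
    minute, hour, day, month, weekday = expression.split()
    # cron counts weekdays from Sunday = 0, Python from Monday = 0
    cron_weekday = (moment.weekday() + 1) % 7
    return (field_matches(minute, moment.minute)
            and field_matches(hour, moment.hour)
            and field_matches(day, moment.day)
            and field_matches(month, moment.month)
            and field_matches(weekday, cron_weekday))


# 9 March 2018 was a Friday; the dates are arbitrary examples
print(cron_matches("0 0 * * *", datetime(2018, 3, 9, 0, 0)))    # midnight
print(cron_matches("0 0 * * *", datetime(2018, 3, 9, 12, 30)))  # half past noon
print(cron_matches("0 9 * * 5", datetime(2018, 3, 9, 9, 0)))    # Friday, 9:00
```

The first and third checks match, while the second does not, because 12:30 is neither minute 0 nor hour 0.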
If you want a more extensive guide to crontab, I suggest this post.