I was looking for projects to do since I’ve been practicing Python, and then I thought: why not automate Craigslist with Python?!
Situation/Problem :
Craigslist has a pretty cool “Free” section. People can post items they want to give away (maybe a store is closing, or someone is moving, etc.). Some of those items I may be interested in, but who has time to scroll through all of the pages? Let's automate!
Solution :
You can easily use Python and a few libraries to browse Craigslist, pull up posts with the items you want, and send them to your email.
Imports
To start, you will need to import some libraries (pre-written pieces of Python code with modules containing functions, classes, and methods you can use). Import these at the top of your file with the ‘from’ and ‘import’ statements below.
from bs4 import BeautifulSoup
from datetime import datetime
import requests
import time
import smtplib
from config import *
BeautifulSoup is a powerful library that lets you read HTML from the web and extract information, datetime is used to add timestamps, requests is used to connect to the web and pull down web pages, time lets the script pause between requests, and smtplib is used to compose and send emails. The last line tells the file to import all variables from another file called config.py (this file needs to be kept secure and accessible from your current file).
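For reference, here is a minimal sketch of what config.py could contain. The variable names are the ones the email step later in this post expects; the values are placeholders you would replace with your own:
# config.py - keep this file private and out of any public repo
sender_email = "you@example.com"         # account the report is sent from
reciver_email = "you@example.com"        # address the report is sent to
password = "your-email-or-app-password"  # credential for the sending account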
Variables
The first step is to define the starting page, which is Craigslist's free section. This will vary based on where you are, so update it accordingly.
Free_CL_URL = "https://newyork.craigslist.org/d/free-stuff/search/zip"
Function 1 : Navigating the "Free section" page by page
You will need to define a couple of functions. The first one, crawlFree, checks whether you're on the first page of the free section and returns the parsed HTML. If you are not, it appends a "page value" argument to the URL, which is how Craigslist tracks its pages, in multiples of 120. For example:
https://newyork.craigslist.org/search/zip?s=120 (page 1, counting the first page as page 0)
def crawlFree(pageval):
    # crawls the free items section and parses HTML
    if pageval == 0:
        r = requests.get(Free_CL_URL).text
        soup = BeautifulSoup(r, 'html.parser')
    else:
        r = requests.get(Free_CL_URL + "?s=" + str(pageval)).text
        time.sleep(1)
        soup = BeautifulSoup(r, 'html.parser')
    return soup
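If you want to sanity-check this function on its own, a quick test like the following (assuming the imports and Free_CL_URL above) should print the title of the first results page and a rough count of the links BeautifulSoup found on it:
soup = crawlFree(0)
print(soup.title.text)           # <title> of the first free-stuff results page
print(len(soup.find_all('a')))   # how many links were parsed from the page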
Function 2 : Parsing Output & Search
Now let's define another function called searchItems. It first creates a blank list called itemlist, then walks through its input (the links BeautifulSoup pulled from each page we just scraped), reads each posting's title, lowercases it, and compares it against our search strings, collecting the matching titles and URLs.
def searchItems(input):
    # in each page crawled from crawlFree, extract the titles, lowercase them
    # and compare against the search strings to append to a result list
    itemlist = []
    for i in input:
        TitleSplit = str(i.contents[0]).split()
        TitleSplit = str([TitleSplit.lower() for TitleSplit in TitleSplit])
        if "piano" in TitleSplit:
            print(str("\n" + i.contents[0]))
            itemlist.append(i.contents[0])
            print((i.attrs['href']))
            itemlist.append(i.attrs['href'])
        elif "tv" in TitleSplit:
            print(str("\n" + i.contents[0]))
            itemlist.append(i.contents[0])
            print((i.attrs['href']))
            itemlist.append(i.attrs['href'])
        elif "watch" in TitleSplit:
            print(str("\n" + i.contents[0]))
            itemlist.append(i.contents[0])
            print((i.attrs['href']))
            itemlist.append(i.attrs['href'])
    return itemlist
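The three elif branches are simply the search terms I happened to care about. To look for something else, add another branch inside the loop with your own keyword, for example (a hypothetical "bike" search):
        elif "bike" in TitleSplit:
            print(str("\n" + i.contents[0]))
            itemlist.append(i.contents[0])
            print((i.attrs['href']))
            itemlist.append(i.attrs['href'])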
pageval = 0
totalist = []
Main While Loop
Now we incorporate the first function in a while loop. Each iteration builds a soup object (the parsed HTML page) via the crawlFree() function, then checks the parsed page to see whether we've hit the last page (Craigslist shows that text when you browse to a results page that doesn't exist).
Next, it uses the soup.find_all function to pull out all of the post links on the page.
It puts these in a list, then searches the links for the items we want via searchItems and appends the result to a final list called totalist.
Finally, it adds 120 to the page value so that the next time the crawl function runs, it requests the next page.
while True:
    time.sleep(0.2)
    soup = crawlFree(pageval)
    # crawl pages until you hit a page with the following text, signifying the end of the category
    if "search and you will find" in soup.text and "the harvest moon wanes" in soup.text:
        print("\nEnd of Script")
        break
    else:
        print("\nSearching page " + str(int(pageval / 120)))
        links = soup.find_all('a', class_="result-title hdrlnk")
        itemlist = searchItems(links)
        totalist.append(itemlist)
        pageval += 120
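When the loop ends, totalist holds one sub-list per page searched, with each sub-list alternating matched titles and their links (and empty for pages with no matches). The values below are purely illustrative:
# illustrative structure only; real contents depend on what is posted
totalist = [
    ["free piano", "https://newyork.craigslist.org/<post-1>", "flat screen tv", "https://newyork.craigslist.org/<post-2>"],
    [],  # a page where nothing matched contributes an empty list
]
This nested shape is why the email step below walks the list with two for loops.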
Email the Final List
Lastly, we email the list. The script composes a message, then uses two nested for loops to walk through totalist and add each title and link to the message body.
It then connects to Office 365 (my provider) and emails the message. Make sure to keep your username and password in a separate, secure file (that's what config.py is for).
now = datetime.now()
current_time = now.strftime("%H:%M:%S")
# message compilation and delivery
message = "Subject:CL Free Bot Report - " + str(len(totalist)) + "\n\n"
for page in totalist:
    for item in page:
        message += str("\n" + str(item) + "\n")
s = smtplib.SMTP('smtp.office365.com', 587)
s.starttls()
s.login(sender_email, password)
s.sendmail(sender_email, reciver_email, message.encode("utf-8"))
print(message)
print("sent mail")
s.quit()
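If your provider isn't Office 365, only the connection line should need to change. For Gmail, for example, you would most likely connect with:
s = smtplib.SMTP('smtp.gmail.com', 587)   # Gmail's SMTP server; typically requires an app password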
Complete Script :
Here’s the complete script. Now you can automate Craigslist with Python! Try some of the other sections and adjust your variables. If you liked this, check out another script I wrote to automate finding delivery windows on AmazonFresh.
from bs4 import BeautifulSoup
from datetime import datetime
import requests
import time
import smtplib
from config import *

Free_CL_URL = "https://newyork.craigslist.org/d/free-stuff/search/zip"


def crawlFree(pageval):
    # crawls the free items section and parses HTML
    if pageval == 0:
        r = requests.get(Free_CL_URL).text
        soup = BeautifulSoup(r, 'html.parser')
    else:
        r = requests.get(Free_CL_URL + "?s=" + str(pageval)).text
        time.sleep(1)
        soup = BeautifulSoup(r, 'html.parser')
    return soup


def searchItems(input):
    # in each page crawled from crawlFree, extract the titles, lowercase them
    # and compare against the search strings to append to a result list
    itemlist = []
    for i in input:
        TitleSplit = str(i.contents[0]).split()
        TitleSplit = str([TitleSplit.lower() for TitleSplit in TitleSplit])
        if "piano" in TitleSplit:
            print(str("\n" + i.contents[0]))
            itemlist.append(i.contents[0])
            print((i.attrs['href']))
            itemlist.append(i.attrs['href'])
        elif "tv" in TitleSplit:
            print(str("\n" + i.contents[0]))
            itemlist.append(i.contents[0])
            print((i.attrs['href']))
            itemlist.append(i.attrs['href'])
        elif "watch" in TitleSplit:
            print(str("\n" + i.contents[0]))
            itemlist.append(i.contents[0])
            print((i.attrs['href']))
            itemlist.append(i.attrs['href'])
    return itemlist


pageval = 0
totalist = []

while True:
    time.sleep(0.2)
    soup = crawlFree(pageval)
    # crawl pages until you hit a page with the following text, signifying the end of the category
    if "search and you will find" in soup.text and "the harvest moon wanes" in soup.text:
        print("\nEnd of Script")
        break
    else:
        print("\nSearching page " + str(int(pageval / 120)))
        links = soup.find_all('a', class_="result-title hdrlnk")
        itemlist = searchItems(links)
        totalist.append(itemlist)
        pageval += 120

now = datetime.now()
current_time = now.strftime("%H:%M:%S")
# message compilation and delivery
message = "Subject:CL Free Bot Report - " + str(len(totalist)) + "\n\n"
for page in totalist:
    for item in page:
        message += str("\n" + str(item) + "\n")
s = smtplib.SMTP('smtp.office365.com', 587)
s.starttls()
s.login(sender_email, password)
s.sendmail(sender_email, reciver_email, message.encode("utf-8"))
print(message)
print("sent mail")
s.quit()