Obviously, the entry point of our application. Here we import our `collector` and `database` namespaces, plus `overtone.at-at` for scheduling.

```clojure
(ns alexa-monitor.core
  (:require [alexa-monitor.collector :as collector]
            [alexa-monitor.database :as database]
            [overtone.at-at :as at])
  (:gen-class))
```
Create a thread pool for at-at scheduling.

```clojure
(def thread-pool (at/mk-pool))
```
Crawl the page for each tracked domain and grab some useful data. This function calls `collector/main` on every domain from `database/domain-list`. Then we drop the `:domain` key and hand the result to `database/new-entry`.

```clojure
(defn update-rank
  [& args]
  (let [domains (database/domain-list)]
    (doall (map #(-> %
                     collector/main
                     (dissoc :domain)
                     database/new-entry)
                domains))))
```
Create a simple schedule and call `update-rank` every five minutes (300000 ms).

```clojure
(defn -main
  [& args]
  (at/every 300000 update-rank thread-pool :desc "updating ranks"))
```
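When experimenting at the REPL, it helps to inspect and reset the schedule. A minimal sketch using at-at's pool helpers (the `job` binding here is hypothetical, not part of the application):

```clojure
;; Schedule the job by hand and keep a handle to it.
(def job (at/every 300000 update-rank thread-pool :desc "updating ranks"))

;; List everything currently scheduled on the pool.
(at/show-schedule thread-pool)

;; Cancel this one job, or wipe the whole pool.
(at/stop job)
(at/stop-and-reset-pool! thread-pool)
```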
Here we handle everything related to crawling the webpage and extracting the parts we need. Let's choose cool function names; if it's about getting some HTML page, why not call it `web-grab`? We need three different things from the webpage, so we have one function for each! They look a lot alike, but I decided not to mix them into one function, so the result is easier to read and maintain. Maybe later we can find a way to remove the duplication without making it look messy.

```clojure
(ns alexa-monitor.collector
  (:require [hato.client :as web]
            [hickory.core :as hick]
            [hickory.select :as s])
  (:gen-class))
```
Trim a string and return its digit part. If it finds any digits in the provided string, it joins them and returns the result as a `bigint`; otherwise it returns `0`.

```clojure
(defn digitize
  [string]
  (->> string
       (re-seq #"[\d]+")      ; All digit runs in the string, or `nil` if there are none.
       (#(if (nil? %)
           0                  ; No digits found, return `0`.
           (bigint (clojure.string/join %))))))
```
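Joining the digit runs conveniently strips thousands separators along the way. A couple of REPL examples:

```clojure
(digitize "Global rank: 1,234,567")
;; => 1234567N

(digitize "no digits here")
;; => 0
```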
Retrieve the page and return its HTML output. Creates a simple HTTP GET request, ignores the status, and only returns the HTML part. The base URL is hard-coded here. For offline testing you can run a simple webserver serving saved copies of the pages and switch to the commented-out `localhost` URL. TODO: proper error handling; for now any exception yields an empty string.

```clojure
(defn web-grab
  [url]
  (println "Getting updates for: " url)
  (try (:body
        #_(web/get (str "http://localhost:8000/" url "/"))
        (web/get (str "https://www.alexa.com/minisiteinfo/" url)))
       (catch Exception e "")))
```
Generate hiccup from the HTML input. Let's have `hickory` handle everything.

```clojure
(defn hiccupize
  [html]
  (-> html
      hick/parse
      hick/as-hickory))
```
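Strictly speaking, `as-hickory` produces hickory's map representation rather than hiccup vectors, but that shape is exactly what `hickory.select` walks. Roughly:

```clojure
(hiccupize "<div class=\"nopaddingbottom\"><a>42</a></div>")
;; => a nested map along the lines of
;; {:type :document
;;  :content [{:type :element :tag :html ...
;;             :content [... {:tag :div
;;                            :attrs {:class "nopaddingbottom"}
;;                            :content [{:tag :a :content ["42"] ...}]} ...]}]}
```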
Process the hiccup and read the parts we need. The multimethod dispatches on the `:what` key of its first argument.

```clojure
(defmulti scrape :what)
```
Extract the rank of our domain from the webpage and `digitize` it.

```clojure
(defmethod scrape :rank
  [params hiccup]
  (-> (s/select (s/child (s/class "nopaddingbottom")
                         (s/tag :div)
                         (s/tag :a))
                hiccup)
      first
      :content
      second
      digitize))
```
Extract the backlink count, then `digitize` it.

```clojure
(defmethod scrape :backlink
  [domain-map hiccup]
  (-> (s/select (s/child (s/class "nopaddingbottom")
                         (s/tag :div)
                         (s/class "nomargin")
                         (s/tag :a))
                hiccup)
      first
      :content
      first
      digitize))
```
Entry point. Creates a nice data flow through every function of this namespace. First request the URL and `hiccupize` its HTML elements, then extract the rank and backlink count and merge them into the domain map.

```clojure
(defn main
  [domain-map]
  (let [hiccup (-> :domain
                   domain-map
                   web-grab
                   hiccupize)]
    (-> domain-map
        (conj {:rank (scrape {:what :rank} hiccup)})
        (conj {:backlink (scrape {:what :backlink} hiccup)}))))
```
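Called from the REPL it might look like this (the domain and the numbers are invented for illustration; the real values come from the live page):

```clojure
(main {:domain_id 1 :domain "example.com"})
;; => something like {:domain_id 1 :domain "example.com" :rank 1234N :backlink 56N}
```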
A small namespace with a few useful functions for working with the database.

```clojure
(ns alexa-monitor.database
  (:require [clojure.java.jdbc :refer :all])
  (:gen-class))
```
We currently use SQLite. Since it's a small project, even a CSV file could do the trick. But let's keep things professional! MySQL or a similar database server would eliminate some of the troubles we have here.

```clojure
(def db
  {:classname   "org.sqlite.JDBC"
   :subprotocol "sqlite"
   :subname     (.getAbsolutePath (new java.io.File "db/database.db"))})
```
Create the database and tables. For now, we create the `rank` and `domains` tables, printing (and otherwise ignoring) the "table already exists" error on later runs.

```clojure
(defn create-db
  []
  (try (db-do-commands
        db
        (create-table-ddl :rank
                          [[:id :integer :primary :key :asc]
                           [:domain_id :integer]
                           [:rank :integer]
                           [:backlink :integer]
                           [:timestamp :datetime :default :current_timestamp]
                           [:foreign :key "(domain_id)" :references :domains "(domain_id)"]]))
       (catch Exception e
         (.println *err* (.getMessage e))))
  (try (db-do-commands
        db
        (create-table-ddl :domains
                          [[:domain_id :integer :primary :key :asc]
                           [:domain :text]]))
       (catch Exception e
         (.println *err* (.getMessage e)))))
```
Get the list of tracked websites from the `domains` table.

```clojure
(defn domain-list
  []
  (-> db
      (query ["SELECT * FROM domains"])))
```
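Each row comes back as a map keyed by column name, which is exactly the shape `collector/main` expects. With the schema above, the result looks something like this (domains invented for illustration):

```clojure
(domain-list)
;; => ({:domain_id 1 :domain "example.com"}
;;     {:domain_id 2 :domain "example.org"})
```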
Get the last recorded rank for the given domain id. Returns the rank as a number, or `nil` when the domain has no records yet. Using a parameterized query also keeps us safe from SQL injection.

```clojure
(defn last-rank
  [domain-id]
  (-> db
      (query ["SELECT rank FROM rank WHERE domain_id = ? ORDER BY id DESC LIMIT 1" domain-id])
      first
      :rank))
```
Check if the current rank differs from the last recorded one, and insert a new row only when it does. A missing rank means the crawl failed, so we report it as a connection error instead.

```clojure
(defn new-entry
  [domain-map]
  (if-not (nil? (:rank domain-map))
    (let [domain      (:domain_id domain-map)
          last-record (last-rank domain)]
      (if (not= last-record (:rank domain-map))
        (do
          (insert! db "rank" domain-map)
          (println "New rank:" domain-map))
        (println "No change:" domain-map)))
    (.println *err* (str "Connection error: " domain-map))))
```