dependencies
| (this space intentionally left almost blank) | |||||||||||||||||||||
Obviously, the entry point of our application.
Here we import | (ns alexa-monitor.core (:require [alexa-monitor.collector :as collector] [alexa-monitor.database :as database] [overtone.at-at :as at]) (:gen-class)) | |||||||||||||||||||||
Create thread pool for at-at scheduling. | (def thread-pool (at/mk-pool)) | |||||||||||||||||||||
Crawl the page and grab some useful data via This function calls Then we | (defn update-rank [& args] (let [domains (database/domain-list)] (doall (map #(-> % collector/main (dissoc :domain) database/new-entry) domains)))) | |||||||||||||||||||||
Create a simple schedule and call | (defn -main [& args] (at/every 300000 update-rank thread-pool :desc "updating ranks")) | |||||||||||||||||||||
Here we handle everything relative to crawling the webpage and extracting
the parts we need. Let's choose cool function names; If it's about getting
some html page, why not call it Here we need 3 different things from the webpage, so we have one function for each! They look a lot alike, but I decided not to mix them into one function, so the result would be easier to read and maintain. Maybe later we can find a way to remove duplication without making it look messy. | (ns alexa-monitor.collector (:require [hato.client :as web] [hickory.core :as hick] [hickory.select :as s]) (:gen-class)) | |||||||||||||||||||||
Trim string and return its digit part. If it finds any digit in the provided string,
It returns | (defn digitize [string] (->> string (re-seq #"[\d]+") ; Get all digits in the string, (#(if (nil? %) ; Empty string returns `nil`. 0 ; If it was an empty string, return `0`. (bigint (clojure.string/join %)))))) | |||||||||||||||||||||
Retrieve the page and returns html output. Creates a simple HTTP GET request, ignores the status and only returns the HTML part. The url is hard-coded here. You need to run a simple webserver to serve
contents of You need to change this url to TODO: Error handling. | (defn web-grab [url] (println "Getting updates for: " url) (try (:body #_(web/get (str "http://localhost:8000/" url "/")) (web/get (str "https://www.alexa.com/minisiteinfo/" url))) (catch Exception e ""))) | |||||||||||||||||||||
Generate hiccup from html input. Let's have 'hickory' handle everything. | (defn hiccupize [html] (-> html hick/parse hick/as-hickory)) | |||||||||||||||||||||
Proces the hiccup and read | (defmulti scrape :what) | |||||||||||||||||||||
Extract the rank of our domain from the webpage and | (defmethod scrape :rank [params hiccup] (-> (s/select (s/child (s/class "nopaddingbottom") (s/tag :div) (s/tag :a)) hiccup) first :content second digitize)) | |||||||||||||||||||||
Extract the backlink count, then | (defmethod scrape :backlink [domain-map hiccup] (-> (s/select (s/child (s/class "nopaddingbottom") (s/tag :div) (s/class "nomargin") (s/tag :a)) hiccup) first :content first digitize)) | |||||||||||||||||||||
Entry point. Creates a nice data-flow through every function of this namespace.
First request the url and create a hicuup its HTML elements, Then extract
| (defn main [domain-map] (let [hiccup (-> :domain domain-map web-grab hiccupize)] (-> domain-map (conj {:rank (scrape {:what :rank} hiccup)}) (conj {:backlink (scrape {:what :backlink} hiccup)})))) | |||||||||||||||||||||
Small namespace with few useful functions to work with database. | (ns alexa-monitor.database (:require [clojure.java.jdbc :refer :all]) (:gen-class)) | |||||||||||||||||||||
We currently use sqlite. Since it's a small project, even a CSV file could
do the trick. But let's keep things professional!
MySQL or similar database would eliminate some troubles we have with
| (def db {:classname "org.sqlite.JDBC" :subprotocol "sqlite" :subname (.getAbsolutePath (new java.io.File "db/database.db"))}) | |||||||||||||||||||||
Create database and table. For now, we create the | (defn create-db [] (try (db-do-commands db (create-table-ddl :rank [[:id :integer :primary :key :asc] [:domain_id :integer] [:rank :integer] [:backlink :integer] [:timestamp :datetime :default :current_timestamp] [:foreign :key "(domain_id)" :references :domains "(domain_id)"]])) (catch Exception e (.println *err* (.getMessage e)))) (try (db-do-commands db (create-table-ddl :domains [[:domain_id :integer :primary :key :asc] [:domain :text]])) (catch Exception e (.println *err* (.getMessage e))))) | |||||||||||||||||||||
Get list of tracked websites from | (defn domain-list [] (-> db (query ["SELECT * from domains"]))) | |||||||||||||||||||||
Get the last recorded rank for given domain name. It returns a lazy sequence. | (defn last-rank [domain-id] (-> db (query [(str "SELECT rank FROM rank where domain_id='" domain-id "' order by id desc limit 1")]) first :rank)) | |||||||||||||||||||||
Check if current rank is | (defn new-entry [domain-map] (if-not (nil? (:rank domain-map)) (let [domain (:domain_id domain-map) last-record (last-rank domain)] (if (not= last-record (:rank domain-map)) (do (insert! db "rank" domain-map) (println "New rank:" domain-map)) (println "No change:" domain-map))) (.println *err* "Connection error:" domain-map))) | |||||||||||||||||||||