dnseen: simple DNS queries analyzer

dnseen is my new tiny open source project that eases collecting DNS requests coming from the local machine and aggregating statistics over a given period of time. dnseen is written in the Clojure programming language (with babashka) and works as a wrapper over tcpdump.

It’s written with the following goals in mind:

Goals

  1. Get a top-list of websites that I visit intentionally

This is mostly to reduce media consumption and remove distracting websites that make you a dopamine addict and give so little in return (yes, Hacker News is one of them!). For now I simply add the domains I want to block to my /etc/hosts like this:

0.0.0.0 domain

But as my next step I’d like to use dnsmasq and maintain an extended collection of hosts files, possibly using Steven Black’s hosts files collection (update: see my dnsmasq guide).

  2. Find anomalies related to tracking activities

Even software that claims to protect users’ privacy may track too much. So I want to collect DNS requests over a longer period of time to see if I can find any suspicious activity.
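For the first goal, the /etc/hosts blocking shown above can be scripted once the list of domains grows. The following is a minimal Python sketch, not part of dnseen: the function name `hosts_entries` and the example domains are made up for illustration, and it only prints lines that would still have to be added to /etc/hosts by hand.

```python
def hosts_entries(domains, sink_ip="0.0.0.0"):
    """Render one hosts-file line per unique domain, sorted for stable output."""
    # Normalise case and whitespace, drop a trailing root dot, skip empty entries.
    cleaned = {d.strip().lower().rstrip(".") for d in domains if d.strip()}
    return [f"{sink_ip} {d}" for d in sorted(cleaned)]


if __name__ == "__main__":
    # Placeholder domains, not a recommendation of what to block.
    for line in hosts_entries(["news.example.com", "Social.example.org"]):
        print(line)
```

Deduplicating and sorting keeps the generated block easy to diff when the list changes.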

Alternatives

Sure enough, one can use tcpdump, Wireshark or another packet analyzer to do the job. But they are relatively complex and designed for troubleshooting rather than long-term analytics based on aggregated data.

Or one can use tools developed specifically to track DNS queries, like dnstop. It’s nice, but again, it works more like the top tool, showing the queries happening at the moment rather than trends.

I needed something that allows me to collect data over longer time windows, apply filters and get some analysis done on the aggregated results.

Update 2024-01-07: the closest alternative is probably Pi-hole. It’s rich in features and visualisations, but probably a bit too “intrusive” to run on my local machine: too much code that I don’t have time to investigate, plus a web server with a PHP website for the stats dashboard. I’d prefer to use it on a dedicated machine (like a Raspberry Pi device) if needed.

dnseen

dnseen is a simple DNS queries analyzer that works on top of tcpdump logs. For now it works on Linux only (tested with Fedora 39) and makes use of systemd. Under the hood it’s just a systemd service running tcpdump and writing a simple plain-text log to the file system. The analyzer program parses the logs, applies filters from command-line options, and prints a statistics report to the terminal. Reports can be printed for humans in a pretty tabular format or as plain text to be used further in Unix pipelines. The program also comes with an installer script to make the installation quick and easy.
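The parse, filter and report steps can be sketched in a few lines. This is an illustration in Python rather than dnseen’s actual Clojure code. It assumes log lines in tcpdump’s default text output, where a query looks like `... > 192.168.1.1.53: 12345+ A? example.com. (29)`. The names `parse_query` and `top_domains` are hypothetical.

```python
import re
from collections import Counter

# Matches the question part of a tcpdump DNS query line, e.g. " A? example.com."
# Assumption: default tcpdump verbosity, where responses carry no "TYPE?" part.
QUERY_RE = re.compile(r"\s([A-Za-z0-9]+)\? (\S+)")


def parse_query(line):
    """Return (query_type, domain) for a query line, or None for anything else."""
    m = QUERY_RE.search(line)
    if m is None:
        return None
    qtype, name = m.groups()
    return qtype, name.rstrip(".").lower()


def top_domains(lines, qtypes=("A", "AAAA"), limit=10):
    """Count queries per domain, keeping only the given query types."""
    counts = Counter()
    for line in lines:
        parsed = parse_query(line)
        if parsed and parsed[0] in qtypes:
            counts[parsed[1]] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    import sys

    # Plain "count<TAB>domain" output, suitable for further piping.
    for domain, count in top_domains(sys.stdin):
        print(f"{count}\t{domain}")
```

Reading from standard input keeps the sketch independent of where the log file lives, so it can sit at the end of a pipeline such as `cat` of the log followed by this script.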

Findings

So far dnseen seems to be handy. When it comes to finding sources of excessive media consumption, it allowed me to block some news and social websites that I don’t really want to spend my time on.

Regarding detecting tracking activities, Fedora’s NetworkManager turned out to send regular requests to help with captive portal detection, which I don’t need on my home desktop at all. I also blocked quite a few Mozilla domains related to services I never allowed Firefox to use, like Mozilla Location Service.

Conclusion

All in all, the project seems to be useful. The tcpdump logs that the systemd service collects can also be used for further analysis with other tools, e.g. to determine DNS queries per domain broken down by hour.
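As an example of such further analysis, here is a minimal Python sketch of a per-domain, per-hour breakdown. It assumes the log lines start with a full date and time, as tcpdump prints them when run with `-tttt`. Whether the dnseen service captures with that flag is an assumption, and the name `queries_per_hour` is hypothetical.

```python
import re
from collections import Counter, defaultdict

# Captures "YYYY-MM-DD HH" from the timestamp and the queried name.
# Assumption: lines look like
# "2024-01-07 12:34:56.789012 IP host.port > resolver.53: 1+ A? example.com. (29)"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}\.\d+ .*?\s[A-Za-z0-9]+\? (\S+)"
)


def queries_per_hour(lines):
    """Map each domain to a Counter of {'YYYY-MM-DD HH': number of queries}."""
    table = defaultdict(Counter)
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # non-query lines (e.g. responses) do not match and are skipped
            hour, name = m.groups()
            table[name.rstrip(".").lower()][hour] += 1
    return table


if __name__ == "__main__":
    import sys

    for domain, hours in sorted(queries_per_hour(sys.stdin).items()):
        for hour, count in sorted(hours.items()):
            print(f"{domain}\t{hour}\t{count}")
```

The tab-separated output can be sorted or plotted with other tools.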

Also, doing data analysis with Clojure is such a pleasure! Threading macros are a game changer, as they bring the power of Unix pipelines to a functional programming language.

Using babashka feels good too. It’s not my first babashka project, but it is the first one using babashka.cli for command-line options parsing. It’s simple, yet powerful!

Developing with Clojure and babashka allows me to work in a state of flow, with almost no distractions from the outside world, as the language and its ecosystem are very well designed, self-sufficient, and come with nice tooling like Emacs CIDER. It feels like tinkering in the garage in my childhood!