this project is can be found on GitHub
The main purpose was to create a simple scraper that made use of the same functionality as a browser’s reader mode. It’s a feature in most modern browsers.
etc.
I make use of the readability library that was opensourced by Mozilla.
It simply uses axios
to get the content of a page then sanitize it and put
into a virtual DOM using jsdom
and DOMPurify
. Finally, return the
readability
payload as an API response.
It definitely isn’t the most optimized scraper ever and probably won’t work for most webpages too well, but it’s a cheap and dirty option for self-hosting or testing with other projects.