Creating an AO3 Web Scraper With Node

Howard Lee
Published in Geek Culture
4 min read · Apr 6, 2021


I was working on a personal project involving AO3 that needed the results from a user’s works page, and to my distress, there was no API I could easily make data-fetching requests to.

So I decided to create my own web scraper to (partly) remedy that. (I believe an API is on the AO3 team’s to-do list, but why wait?)

After doing some research and assessing my requirements, I opted for a Node.js CLI scraper. I understand that most web scrapers use Python, but my project involves data fetching for a Sapper app, so a web scraper in JavaScript made much more sense. (It’s a bit disappointing that the development team is not continuing Sapper, as it has rapidly become one of my favorite frameworks, but what can one do?)

Initial steps

To begin, you will need basic familiarity with Node.js/JavaScript in general and an understanding of HTML.

Set up your Node project as usual. If you have issues at this stage, here is a handy guide to help.

The dependencies I used are Axios and Cheerio, along with Node’s built-in fs module.

Axios here will be used to make requests to the URL. In this instance, we are only making simple GET requests. You are also free to use fetch or any other library you prefer.

Cheerio is used to parse the markup of the webpage we retrieve with Axios. Again, feel free to use whatever dependency you prefer if you have an alternative in mind.

I am using fs here to write the results we scrape to a .txt file. If you want, you can easily modify this app to write the results to an Excel or .csv file instead, or even to a database.

Install Axios and Cheerio before proceeding (fs ships with Node and needs no installation).

This is how your package.json should look after installing all the dependencies.

package.json after installing dependencies
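For reference, a minimal package.json might look roughly like this. The package name, entry file, and version numbers are illustrative assumptions; note that fs is built into Node, so it needs no entry here:

```json
{
  "name": "ao3-scraper",
  "version": "1.0.0",
  "main": "webscraper_util.js",
  "dependencies": {
    "axios": "^0.21.1",
    "cheerio": "^1.0.0-rc.5"
  }
}
```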

Retrieving the web page info

I created a webscraper_util.js file and used Axios to make HTTP requests to the AO3 user page that I want the information from.

We make a simple GET request using Axios; the result that will be returned will be the web page’s HTML.

Web scraping and formatting the results

Now comes the hard part. After we retrieve the results from AO3 (very important note: Axios will return the page’s HTML, not JSON or any other easy-to-parse data), we will need to use Cheerio to parse the HTML we have.

After retrieving the HTML, we use a Promise chain to build our dataset.

Once the HTML is parsed, we can shape the data into our preferred format. This part is a bit tricky, as we collect all the results into an array and then attach it to an object.

Cheerio’s syntax is basically the same as, or at least very similar to, jQuery’s, and it uses the same methodology to select elements of the DOM. Below is a sample:

In this specific instance, we grab the HTML elements that hold the titles of the AO3 user’s works and convert them into an array variable.

Writing to txt

I opted to write the results to a .txt file. This isn’t particularly useful, but as mentioned, this is only for simple demo purposes. For more effective use of this, you can opt to write the result (after formatting it appropriately) to any other files you want or even to a database for a backend server.

Below is the code snippet for writing to a .txt file.

Done!

There you have it! A simple web scraper for an AO3 user page.

You can run the CLI app in your terminal with node webscraper_util.js.

You can view the full repo here.

Thanks for reading! Let me know if you have any comments or questions. For me personally, I’m aiming to add more interactivity to the CLI.
