Scraping Southwest Airlines Over The Web

Crawling a website, that is, programmatically pulling data from a URL, has countless use cases. Whether you’re watching stock prices, analyzing data, or hunting for the best vacation deals, crawling websites can surface information that would be tedious to collect by hand.

Your Options When Crawling Websites

You can crawl the web in a few different ways:

  1. Utilize public APIs for straightforward data retrieval.
  2. Scrape HTML when dealing with pre-rendered website content.
  3. Navigate web applications via a browser, mimicking human interaction.

Each method has its place, depending on the complexity of the data and the website structure.
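
To make option 2 concrete, here is a minimal sketch in Python using only the standard library. It assumes a pre-rendered page where each price sits in an element with the class `fare-price`; that class name, the helper names, and the sample markup are all invented for illustration and do not come from any real site.

```python
from html.parser import HTMLParser


class ClassTextParser(HTMLParser):
    """Collects the text found directly inside elements with a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self._capturing = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._capturing = True
            self.values.append("")

    def handle_data(self, data):
        if self._capturing:
            self.values[-1] += data

    def handle_endtag(self, tag):
        # Simplification: stop at the first closing tag we see.
        self._capturing = False


def extract_text_by_class(html, target_class):
    """Return the stripped text of every element carrying target_class."""
    parser = ClassTextParser(target_class)
    parser.feed(html)
    parser.close()
    return [value.strip() for value in parser.values]


# Made-up markup standing in for a pre-rendered results page.
SAMPLE_HTML = """
<ul>
  <li><span class="flight">Flight A</span> <span class="fare-price">$129</span></li>
  <li><span class="flight">Flight B</span> <span class="fare-price">$184</span></li>
</ul>
"""

if __name__ == "__main__":
    print(extract_text_by_class(SAMPLE_HTML, "fare-price"))
```

This only works when the data is already in the HTML the server returns. Pages that build their content with JavaScript after loading need the browser-driven approach from option 3.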

Puppeteer Example With Southwest

In this example, we used Puppeteer to scrape Southwest through the browser.

scraper.gif

Our scraper, showcased above, automates the search for one-way tickets from Seattle to Austin.

Read more to learn about the different data gathering options and how we created this scraper.

Demos & Highlights

Carpet Geocoding

Analyzing GPS data in 2024 can be trivial. You can run queries on pollen levels, solar exposure, and air quality with the Google Maps API.

But looking at just one city’s data under a microscope isn’t that straightforward. Most city boundaries are irregular, and some border on chaotic (who said they look like “cartoon characters”?). We’ve found one approach that yields precision:

  1. Draw a square that encompasses the whole area. Save the coordinates of each corner of the square.
  2. Use Uber’s h3 library to generate evenly spaced coordinates within the square. We recommend a resolution of 9, which yields a few thousand coordinates for an average American city.
  3. Enumerate every coordinate and ask the Google Geocoding API whether the point is part of the area. Throw out the ones from neighboring cities.
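
The three steps above can be sketched in a few lines of Python. This is a simplified stand-in, not the code we ran: it swaps h3’s hexagonal cells for a plain latitude/longitude grid, and it replaces the Google Geocoding API call with a caller-supplied function, `is_in_area`. The function names and the toy L-shaped region are hypothetical.

```python
from typing import Callable, Iterator, List, Tuple

Point = Tuple[float, float]  # (latitude, longitude)
Bounds = Tuple[float, float, float, float]  # (south, west, north, east)


def grid_points(south: float, west: float, north: float, east: float,
                steps: int) -> Iterator[Point]:
    """Yield steps x steps points covering the bounding box, corners included.

    Points are evenly spaced in degrees, which is only approximately
    evenly spaced on the ground.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    for i in range(steps):
        lat = south + (north - south) * i / (steps - 1)
        for j in range(steps):
            lng = west + (east - west) * j / (steps - 1)
            yield (lat, lng)


def carpet(bounds: Bounds, steps: int,
           is_in_area: Callable[[Point], bool]) -> Tuple[List[Point], List[Point]]:
    """Split the grid into points inside the area and points outside it."""
    inside: List[Point] = []
    outside: List[Point] = []
    for point in grid_points(*bounds, steps):
        # In a real run, is_in_area would wrap a geocoding lookup.
        (inside if is_in_area(point) else outside).append(point)
    return inside, outside


if __name__ == "__main__":
    # Toy stand-in for the geocoder: an L-shaped region in the unit square.
    def toy_region(point: Point) -> bool:
        lat, lng = point
        return lat <= 0.5 or lng <= 0.5

    inside, outside = carpet((0.0, 0.0, 1.0, 1.0), 5, toy_region)
    print(len(inside), "inside,", len(outside), "outside")
```

Passing the membership check in as a function keeps the grid logic testable without network access, and it makes it easy to cache or rate-limit the real geocoding calls in one place.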

This approach can find the boundaries of even the most misshapen areas. Below are the official map of Aurora, IL and the map that we generated.

Official Aurora, IL map.
Generated map: Red is part of Aurora and blue is not.

What Caught Our Eye

  • U.S. Sues Apple: The Department of Justice claims Apple is maintaining an illegal monopoly. The lawsuit highlights several practices, such as obstructing “super apps”, blocking cloud-streaming services, degrading inter-platform messaging quality, and limiting third-party accessory compatibility.

  • Hacking Hotel Rooms: Security researchers revealed a hack that opens any room secured by the widely used Saflok hotel keycard locks, affecting 3 million doors globally. Despite remediation efforts, the fix is not yet complete.

  • Alarming MFA Attack Hits Apple Users: Cybercriminals are exploiting Apple’s password reset feature to spam users with system prompts, followed by phishing calls spoofing Apple support. Victims are pressured into sharing a one-time code that could compromise their Apple ID and personal devices.

  • YouTube to Disclose AI-Generated Content: YouTube introduced a new feature in Creator Studio that requires creators to disclose the use of altered or synthetic media in content that appears realistic. They claim this initiative is aimed at bolstering viewer trust and transparency.

  • How to Start Google: Paul Graham of Y Combinator gives advice for aspiring entrepreneurs. He breaks the journey to startup success into three critical steps: 1) mastering a technology through passion and practice, 2) identifying unmet needs, and 3) selecting the right co-founders.