Introducing GPT Crawler - Turn Any Site Into a Custom GPT With Just a URL

November 14, 2023

Written By Steve Sewell

Let's create a custom GPT in just two minutes using a new open-source project called GPT Crawler. This project lets us provide a site URL, which it will crawl and use as the knowledge base for the GPT.

You can either share this GPT or integrate it as a custom assistant into your sites and apps.

Why create a custom GPT from a site

I created my first custom GPT based on the Builder.io docs site, forum, and example projects on GitHub. It can now answer detailed questions, with code snippets, about integrating Builder.io into your site or app. You can try it here (currently requires a paid ChatGPT plan).

Our hope is that by making our docs site interactive, people can more easily find the answers they are looking for using a chat interface.

This helps not just with discoverability, saving people the time of digging through the docs to find the specific page they need, but also with personalization, so even the most esoteric questions can be answered.

This method can be applied to virtually anything to create custom bots with up-to-date information from any resource on the web.

First, we'll use the new GPT Crawler project that I've just open-sourced.

To get started, all we need to do is clone the repository, which we can do with a brief git clone command.

git clone https://github.com/builderio/gpt-crawler

After cloning, I'll cd into the repository and then install the dependencies with npm install.

cd gpt-crawler
npm install

Next, we open the config.ts file in the code and supply our configuration. Within this file, we specify a base URL as the starting point for the crawl and define the criteria for the links to crawl on subsequent pages. We can also set up a matching pattern; for example, I might want to crawl only 'docs' and exclude everything else.

export const config: Config = {
  // Start the crawl at this URL
  url: "https://www.builder.io/c/docs/developers",
  // Only crawl URLs matching this pattern
  match: "https://www.builder.io/c/docs/**",
  // Only grab the text from within this selector
  selector: `.docs-builder-container`,
  // Don't crawl more than 1000 pages
  maxPagesToCrawl: 1000,
  // The file name that our results will output to
  outputFileName: "output.json",
};

I recommend providing a selector as well. For the Builder docs, for example, I set it to scrape only a specific area and not the sidebar, navigation, or other elements.
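
To make the match option more concrete, here is a minimal sketch of how a glob pattern like the one in the config could be tested against a URL. The helper names globToRegExp and shouldCrawl are hypothetical, and the glob rules here ("**" matches anything, "*" stops at a slash) are an assumption for illustration; the crawler's own matching may behave differently.

```typescript
// Hypothetical helper, not part of GPT Crawler: converts a glob into a RegExp.
// Assumed rules: "**" matches any characters (including "/"),
// "*" matches any characters except "/".
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        pattern += ".*";
        i++; // skip the second star
      } else {
        pattern += "[^/]*";
      }
    } else {
      // Escape regex metacharacters so they match literally
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp("^" + pattern + "$");
}

// Hypothetical helper: should this URL be crawled under the given pattern?
function shouldCrawl(url: string, match: string): boolean {
  return globToRegExp(match).test(url);
}

const docsPattern = "https://www.builder.io/c/docs/**";
console.log(shouldCrawl("https://www.builder.io/c/docs/private-models", docsPattern)); // true
console.log(shouldCrawl("https://www.builder.io/blog/some-post", docsPattern)); // false
```

With a pattern like this, links to the blog or marketing pages found along the way are simply skipped, which keeps the knowledge file focused on the docs.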

Now, we can run npm start in our terminal and watch the crawler process our pages in real time.

npm start

This crawler uses a headless browser, so it can capture any markup, even markup that is purely client-side rendered. You can also customize the crawler to log into a site to crawl non-public information.
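
Conceptually, the crawl starts at the configured URL, follows only the links that pass the match check, and stops once it reaches the page limit. The sketch below illustrates that idea over a hypothetical in-memory link graph instead of a real browser; planCrawl, LinkGraph, and the example.com URLs are made up for illustration, and this is not the project's actual implementation.

```typescript
// Hypothetical in-memory "site": each URL maps to the links found on that page.
type LinkGraph = Record<string, string[]>;

// Breadth-first traversal with a link filter and a page cap.
function planCrawl(
  site: LinkGraph,
  startUrl: string,
  shouldFollow: (url: string) => boolean,
  maxPagesToCrawl: number,
): string[] {
  const visited = new Set<string>();
  const queue: string[] = [startUrl];
  const crawled: string[] = [];

  while (queue.length > 0 && crawled.length < maxPagesToCrawl) {
    const url = queue.shift() as string;
    if (visited.has(url)) continue; // already processed via another link
    visited.add(url);
    crawled.push(url);

    // Enqueue only unseen links that pass the filter
    for (const link of site[url] ?? []) {
      if (!visited.has(link) && shouldFollow(link)) queue.push(link);
    }
  }
  return crawled;
}

const exampleSite: LinkGraph = {
  "https://example.com/docs": ["https://example.com/docs/a", "https://example.com/blog"],
  "https://example.com/docs/a": ["https://example.com/docs"],
};
const isDocsUrl = (url: string) => url.startsWith("https://example.com/docs");
console.log(planCrawl(exampleSite, "https://example.com/docs", isDocsUrl, 1000));
// [ 'https://example.com/docs', 'https://example.com/docs/a' ]
```

The visited set is what keeps pages that link to each other from being processed twice, and the cap plays the same role as maxPagesToCrawl in the config.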

After the crawl is complete, we'll have a new output.json file, which includes the title, URL, and extracted text (stored under the html key) from all the crawled pages.

[
  {
    "title": "Creating a Private Model - Builder.io",
    "url": "https://www.builder.io/c/docs/private-models",
    "html": "..."
  },
  {
    "title": "Integrating Sections - Builder.io",
    "url": "https://www.builder.io/c/docs/integrate-section-building",
    "html": "..."
  },
  ...
]
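
Before uploading, it can be worth a quick sanity check on the file, for example to spot pages where the selector matched nothing. The following is a minimal sketch that assumes the entry shape shown in the sample above (title, url, html); summarizeCrawl and the two interfaces are hypothetical names, not part of the project.

```typescript
// Shape of one entry in output.json, as shown in the sample output.
interface CrawledPage {
  title: string;
  url: string;
  html: string;
}

interface CrawlSummary {
  pageCount: number;
  totalCharacters: number;
  emptyPages: string[]; // URLs whose extracted text is empty or whitespace
}

// Hypothetical helper: summarizes the crawl results from the raw JSON text.
function summarizeCrawl(json: string): CrawlSummary {
  const pages = JSON.parse(json) as CrawledPage[];
  const emptyPages = pages
    .filter((page) => page.html.trim().length === 0)
    .map((page) => page.url);
  const totalCharacters = pages.reduce((sum, page) => sum + page.html.length, 0);
  return { pageCount: pages.length, totalCharacters, emptyPages };
}

const sampleJson = JSON.stringify([
  { title: "Page A", url: "https://example.com/a", html: "hello" },
  { title: "Page B", url: "https://example.com/b", html: "" },
]);
console.log(summarizeCrawl(sampleJson));
```

In a real project, the JSON text would come from the generated file, for instance via readFileSync("output.json", "utf8") from Node's fs module. A long list of empty pages usually suggests the selector needs adjusting.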

We can now upload this directly to ChatGPT by creating a new GPT, configuring it, and then uploading the file we just generated for knowledge. Once uploaded, this GPT assistant will have all the information from those docs and be able to answer unlimited questions about them.

Alternatively, if you want to integrate this into your own products, you can go to the OpenAI API dashboard, create a new assistant, and upload the generated file in a similar manner.

This way, you can access the assistant over an API and provide custom-tailored assistance within your products, backed by specific knowledge drawn right from your docs or any other website, just by providing a URL and crawling it.

If you have a use case where you or others would value a custom GPT focused on a given topic or information set that can be crawled from a website, give this a try. I can't wait to see what you build!

And if you see ways to make this project better, send a PR!
