Furnishing your documentation with Q&A software using GPT-3, embeddings, and Datasette

Jan 16, 2023

If you’ve spent any amount of time using large language models, you may have considered the potential benefits of being able to utilize GPT-3 or ChatGPT with a specific, up-to-date collection of text or documentation to enhance its ability to answer questions. As it turns out, there is a clever method for doing just that. This article is based on research by Simon Willison. 

Through some blood, sweat and coding, Willison found a shortcut for this process through a utility he created called Datasette. He used Datasette as a tool for quickly testing new artificial intelligence techniques through the use of custom SQL functions.

Here’s how to do this using Simon’s Semantic Search Answers Pattern:

  • Conduct a search (either text or semantic) in your documentation to identify relevant information to the user's inquiry.
  • Collect relevant parts of that information and combine them into a single block of text.
  • Create a prompt by combining that text with the statement "Given the above content, answer the following question: " and the user's question.
  • Use the GPT-3 API to process the entire prompt and return a response.

In Simon’s words: ‘This is a form of prompt engineering (and hence is prone to prompt injection attacks, so bear that in mind). You need to fit your overall prompt into the token limit for GPT-3, which is currently 4,000 tokens. A token is more-or-less the same thing as a word.’

How to implement Q&A against your documentation with GPT3

To increase your understanding of Simon’s process, we’ve included a short example of the Q&A section he ran against his personal blog. 

Q: ‘What does shot scraper do?’

A: ‘Shot-scraper is a command-line tool that helps automate the process of taking screenshots of web pages and scraping data from them using JavaScript. It can be used to take one-off screenshots or take multiple screenshots in a repeatable way by defining them in a YAML file. It can also be used to execute JavaScript on a page and return the resulting value.’

Simon used his method to generate these two pieces of text. You can also institute a semantic search using embeddings. Here’s how Simon describes the process. 

‘An embedding is a list of floating point numbers. As an example, consider a latitude/longitude location: it’s a list of two floating point numbers. You can use those numbers to find other nearby points by calculating the distances between them. Add a third number and now you can plot locations in three-dimensional space—and still calculate distances between them to find the closest points.

This idea keeps on working even as we go beyond three dimensions: you can calculate distances between vectors of any length, no matter how many dimensions they have.

So if we can represent some text in a many-multi-dimensional vector space, we can calculate distances between those vectors to find the closest matches.’

Now, I’m not a coder, and the rest of Simon’s research goes to a place that I’m afraid to say makes my little journalist head spin. So, if you’d like more clarity on Simon is and what he does, you can find his blog here


Tutorials & Tips

Previous Post: «
Next Post: «


  1. bruh said on August 18, 2023 at 1:25 pm

    Uhh, this has already been possible – I am not sure how but remember my brother telling me about it. I’m not a whatsapp user so not sure of the specifics, but something about sending the image as a file and somehow bypassing the default compression settings that are applied to inbound photos.

    He has also used this to share movies to whatsapp groups, and files 1Gb+.

    Like I said, I never used whatsapp, but I know 100% this isn’t a “brand new feature”, my brother literally showed me him doing it, like… 5 months ago?

  2. 💥 said on August 18, 2023 at 3:55 pm

    Martin, what happened to those: 12 Comments (https://www.ghacks.net/chatgpt-gets-schooled-by-princeton-university/#comments). Is there a specific justifiable reason why they were deleted?

    Hmm, it looks like the gHacks website database is faulty, and not populating threads with their relevant cosponsoring posts.

  3. 45 RPM said on August 19, 2023 at 6:29 pm

    The page on ghacks this is on represents the best of why it has become so worthless, fill of click-bait junk that it’s about to be deleted from my ‘daily reads’.

    It’s really like “Press Release as re-written by some d*ck for clicks…poorly.” And the subjects are laughable. Can’t wait for “How to search for files on Windows”.

    1. owl said on August 20, 2023 at 12:51 am

      > The page on ghacks this is on represents the best of why it has become so worthless, fill of click-bait junk…

      Sadly, I have to agree.

      Only Martin and Ashwin are worth subscribing to.
      Especially Emre Çitak and Shaun are the worst ones.

      If ghacks.net intended “Clickbait”, it would mark the end of Ghacks Technology News.
      Ghacks doesn’t need crappy clickbaits. Clearly separate articles from newer authors (perhaps AIs and external sales person or external advertising man) as just “Advertisements”!

      We, the subscribers of Ghacks, urge Martin to make a decision.

  4. chessandonions said on August 20, 2023 at 12:40 am

    because nevermore wants to “monetize” on every aspect of human life…

  5. Frank Rizzo said on August 20, 2023 at 11:52 pm

    “Threads” is like the Walmart of Social Media.

  6. Ashray said on August 21, 2023 at 4:06 pm

    How hard can it be to clone a twitter version of that as well? They’re slow.

  7. Paul(us) said on August 21, 2023 at 5:16 pm

    Yes, why not mention how large the HD files can be?
    Why, not mention what version of WhatsApp is needed?
    These omissions make the article feel so bare. If not complete.

    1. Paul(us) said on August 21, 2023 at 5:18 pm

      Sorry posted on the wrong page.

  8. Marc said on August 21, 2023 at 6:00 pm

    such a long article for such a simple matter. Worthless article ! waste of time

  9. plusminus_ said on August 21, 2023 at 7:54 pm

    I already do this by attaching them via the ‘Document’ option.

  10. John G. said on August 21, 2023 at 11:43 pm

    I don’t know what’s going on here at Ghacks but it’s obvious that something is broken, comments are being mixed whatever the article, I am unable to find some of my later posts neither. :S

  11. Tom Hawack said on August 23, 2023 at 2:28 pm

    Quoting the article,
    “As users gain popularity, the value of their tokens may increase, allowing investors to reap rewards.”

    Besides, beyond the thrill and privacy risks or not, the point is to know how you gain popularity, be it on social sites as everywhere in life. Is it by being authentic, by remaining faithful to ourselves or is it to have this particular skill which is to understand what a majority likes, just like politicians, those who’d deny to the maximum extent compatible with their ideological partnership, in order to grab as many of the voters they can?

    I see the very concept of this Friend.tech as unhealthy, propagating what is already an increasing flaw : the quest for fame. I won’t be the only one to count himself out, definitely.

    1. Tom Hawack said on August 23, 2023 at 2:34 pm

      @John G. is right : my comment was posted on [https://www.ghacks.net/2023/08/23/what-is-friend-tech/] and it appears there but as well here at [https://www.ghacks.net/2023/07/08/how-to-follow-everyone-on-threads/]

      This has been lasting for several days. Fix it or at least provide some explanations if you don’t mind.

  12. Tom said on August 24, 2023 at 11:53 am

    > Google Chrome is following in Safari’s footsteps by introducing a new feature that allows users to move the Chrome address bar to the bottom of the screen, enhancing user accessibility and interaction.

    Firefox did this long before Safari.

  13. Mavoy said on September 16, 2023 at 2:17 pm

    Basically they’ll do anything except fair royalties.

Leave a Reply

Check the box to consent to your data being stored in line with the guidelines set out in our privacy policy

We love comments and welcome thoughtful and civilized discussion. Rudeness and personal attacks will not be tolerated. Please stay on-topic.
Please note that your comment may not appear immediately after you post it.