OpenAI reveals new web crawler named "GPTBot"

Aug 8, 2023

OpenAI has announced a new web crawler, GPTBot. The bot will collect publicly available data to train AI models, a process the company says will be carried out transparently and responsibly.

According to OpenAI's documentation, pages crawled by GPTBot will be filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII), or contain text that violates OpenAI's policies. The company behind the GPT models says that allowing the bot to access a site can help make future AI systems more accurate and more capable.

The move promises to improve the accuracy, capabilities, and safety of AI models, but it also feeds ongoing debates about data ethics, ownership, and use in the digital age. OpenAI admits that it scrapes the internet to train large language models such as GPT-4, yet this appears to be only a half-baked solution to the ethical issues around taking data from other people's websites.

GPTBot access can be limited

Because websites differ widely in what they publish, OpenAI lets webmasters decide the extent to which GPTBot interacts with their sites. By editing their robots.txt files, webmasters can block GPTBot entirely or specify which directories it may crawl.

For webmasters and content providers, the launch of GPTBot also makes the crawler's activity visible. OpenAI's documentation describes how GPTBot identifies itself, so webmasters can analyze its interactions with their sites and control access through the standard robots.txt protocol.
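
One way to see what GPTBot has requested is to filter the web server's access log by user agent. The sketch below is only an illustration under several assumptions: it assumes logs in the common "combined" format, it matches on the substring "GPTBot" in the user-agent field, and the sample lines (IP address, paths, and the user-agent strings) are made up for the demonstration. The function name `gptbot_hits` is hypothetical.

```python
import re
from collections import Counter

# Assumed "combined" log format:
# ip - - [time] "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def gptbot_hits(lines):
    """Count requests per path whose user agent mentions GPTBot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "gptbot" in match.group("agent").lower():
            hits[match.group("path")] += 1
    return hits


# Illustrative user-agent strings; check OpenAI's documentation for the real one.
GPTBOT_UA = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
             "compatible; GPTBot/1.0; +https://openai.com/gptbot)")
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) Firefox/116.0"


def sample_line(path, agent):
    """Build one made-up log line for the demonstration."""
    return (f'203.0.113.5 - - [08/Aug/2023:10:00:00 +0000] '
            f'"GET {path} HTTP/1.1" 200 5120 "-" "{agent}"')


SAMPLE = [
    sample_line("/blog/post-1", GPTBOT_UA),
    sample_line("/blog/post-1", BROWSER_UA),  # ordinary visitor, ignored
    sample_line("/about", GPTBOT_UA),
    sample_line("/blog/post-1", GPTBOT_UA),
]

if __name__ == "__main__":
    for path, count in gptbot_hits(SAMPLE).most_common():
        print(path, count)
```

A user-agent header can be spoofed, so a count like this shows which requests claim to be GPTBot rather than proving where they came from.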

Blocking GPTBot from an entire site takes two directives in robots.txt:

User-agent: GPTBot
Disallow: /

For more selective access, the following structure allows some directories and blocks others:

User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
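
Before publishing rules like these, it can help to check how a parser reads them. The following is a minimal sketch that uses Python's standard `urllib.robotparser` module; the helper name `can_fetch`, the constants, and the `example.com` URLs are assumptions made for illustration. It shows how one standard parser interprets the rules, not how GPTBot itself evaluates them.

```python
from urllib.robotparser import RobotFileParser

# The two rule sets from the article, one directive per line.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

SELECTIVE = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical URLs on a placeholder domain.
    checks = [
        (BLOCK_ALL, "GPTBot", "https://example.com/any/page"),
        (BLOCK_ALL, "Googlebot", "https://example.com/any/page"),
        (SELECTIVE, "GPTBot", "https://example.com/directory-1/page"),
        (SELECTIVE, "GPTBot", "https://example.com/directory-2/page"),
        (SELECTIVE, "GPTBot", "https://example.com/elsewhere"),
    ]
    for rules, agent, url in checks:
        print(agent, url, can_fetch(rules, agent, url))
```

Under this parser, the selective rules leave every path that is not listed open to GPTBot. A site that wants to allow only one directory would also need a `Disallow: /` line after the `Allow` line.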

Balancing Act: Legal, Ethical, and Ownership Considerations

Recently, OpenAI applied for a trademark for 'GPT-5', which suggests that the firm is working on a successor to GPT-4. According to various sources, that model will be close to artificial general intelligence (AGI), which has been the company's objective all along. GPTBot will likely help the company gather additional data from around the internet to train it. On the other hand, the company has also discontinued its AI Classifier, a tool intended to detect AI-generated text.
