OpenAI reveals new web crawler named "GPTBot"

OpenAI has recently announced its new web crawler, GPTBot. The bot will collect publicly available data for training AI models, which the company says will be done transparently and responsibly.
According to OpenAI's release documentation, the crawler filters out sources that require paywall access, that are known to gather personally identifiable information (PII), or that contain content violating OpenAI's policies. According to OpenAI, letting the bot access a site will help improve the accuracy and capabilities of AI systems in the future.
This step not only promises to improve the precision, capabilities, and safety of AI models, but also ignites deep debates about data ethics, ownership, and use in the digital age. Though OpenAI admits that it scrapes the internet to train large language models like GPT-4, this appears to be a half-baked solution to the ethical issues around taking data from other people's websites.

GPTBot access can be limited
Acknowledging that websites differ in what they want to share, OpenAI lets webmasters choose the extent to which GPTBot interacts with their sites. Webmasters can block GPTBot's access entirely or specify the directories it may crawl by editing their robots.txt files.
The launch of GPTBot gives webmasters and content providers more visibility into how their sites are crawled. Webmasters can analyze GPTBot's interactions with their websites thanks to OpenAI's documentation, and they can control access using the standard robots.txt protocol.
Blocking GPTBot entirely is simple and entails adding the following directives to robots.txt:

User-agent: GPTBot
Disallow: /

The following structure can be used for a more refined approach that allows selective access:

User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
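
Before deploying such rules, it can help to check how a standards-following crawler would interpret them. The following is a minimal sketch using Python's standard-library urllib.robotparser; the helper name gptbot_allowed, the example.com URLs, and the constant names are illustrative assumptions, and the result reflects Python's parser rather than GPTBot's own implementation, which may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser


def gptbot_allowed(robots_txt: str, url: str, agent: str = "GPTBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper for illustration; it parses the rules locally and
    never contacts the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Rule set that blocks GPTBot from the whole site.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

# Rule set that allows one directory and blocks another.
SELECTIVE = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""

if __name__ == "__main__":
    # example.com is a placeholder domain, not a real deployment.
    for label, rules, path in [
        ("block all", BLOCK_ALL, "/any/page.html"),
        ("selective", SELECTIVE, "/directory-1/page.html"),
        ("selective", SELECTIVE, "/directory-2/page.html"),
    ]:
        allowed = gptbot_allowed(rules, "https://example.com" + path)
        print(f"{label:10} {path:25} allowed={allowed}")
```

Paths that match no rule are treated as allowed by this parser, so a selective rule set only restricts the directories it names.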
Balancing Act: Legal, Ethical, and Ownership Considerations
OpenAI recently applied for a trademark for 'GPT-5', suggesting that the firm is training the successor to GPT-4, which, according to various sources, will be close to AGI, the company's objective all along. GPTBot will likely help the organization gather additional data from around the internet to train this model. On the other hand, the company also stopped offering its AI Classifier, a tool meant to recognize GPT-produced text.
Missing from the “story”: Ukraine’s agreement to never use Starlink for military purposes. This is why.
Ghacks quality is AI driven and very poor these days since AI is really artificial stupidity.
“Elon Musk biographer Walter Isaacson forced to ‘clarify’ book’s account of Starlink incident in Ukraine War
“To clarify on the Starlink issue: the Ukrainians THOUGHT coverage was enabled all the way to Crimea, but it was not. They asked Musk to enable it for their drone sub attack on the Russian fleet. Musk did not enable it, because he thought, probably correctly, that would cause a major war.”
https://nypost.com/2023/09/11/elon-musk-biographer-walter-isaacson-corrects-detail-about-starlink-in-ukraine/
I posted above comment to:
https://www.ghacks.net/2023/09/08/elon-musk-turned-off-starlink-during-ukranian-offence/
Not to the following article about Geforce where I currently also can see it published:
https://www.ghacks.net/2023/08/29/how-to-fix-geforce-experience-error-code-0x0003/
Well, using Brave, I can see Llama 2 being decent, but it is still not great?
All this AI stuff seems more like a ‘toy’ than anything special, I mean, it is good for some stuff like translations or asking quick questions but not for asking anything important.
The problem is Brave made it mostly for summarizing websites and all that, but all these Big tech controlled stuff, won’t summarize articles it doesn’t agree with, so it is also useless in many situations where you just want it to give you a quick summarization, and then it starts throwing you little ‘speeches’ about how it doesn’t agree with it and then it never summarizes anything, but give you all the 30 paragraphs reasons why the article is wrong, like if I am asking it what it thinks.
SO all this AI is mostly a toy, but Facebook with all the power they have will be able to get so much data from people, it can ‘train’ or better say, write algorithms that will get better with time.
But It is not intelligence, it is really not intelligence all these AI technology.
Article Title: Tech leaders meet to discuss regulation of AI
Article URL: [https://www.ghacks.net/2023/09/14/artificial-intelligence-regulation-tech-leaders/]
—
The eternal problematic of regulating, here applied to AI. Should regulations (interventionism) have interfered in the course of mankind ever since Adam and Eve where would we be now? Should spirituality, morality, ethics never have interfered where would we be now? I truly have always believed that the only possible consensus between ethics and freedom is that of individuals’ own consciousness.
Off-topic : Musk’s beard looks like a wound, AI-Human hand-shake is a quite nice pic :)
Haha, oh dear, Tom.
I thought that the comments system issue where comments show up under a totally different article was fixed. But seeing your comment here, the “error” is clearly still active. Hopefully it is sorted as soon as possible.
Article Title: Tech leaders meet to discuss regulation of AI
Article URL: [https://www.ghacks.net/2023/09/14/artificial-intelligence-regulation-tech-leaders/]
—
Hi Karl :) Well, let’s remain positive and see the good sides : one’s comment appearing within different articles (the one it was written from and for, another unrelated one) brings ubiquity to that comment : say it once and it’s published twice, double your pleasure and double your fun (“with double-mint, double-mint gum”, an old ad!). Let’s forget the complications and inherited misunderstandings it leads to. Not sure the fun is worth the complications though. Which is why, with a few others here, I include Article Title & URL with comment, to ease a bit the pain.
This said, I’m trying to find a logic key which would explain the mic-mac. One thing is sure : comments appearing twice keep the same comment number.
For instance my comment to which you replied just above is originally :
[https://www.ghacks.net/2023/09/14/artificial-intelligence-regulation-tech-leaders/#comment-4573676]
It then got duplicated to :
[https://www.ghacks.net/2023/08/29/how-to-fix-geforce-experience-error-code-0x0003/#comment-4573676]
Same comment number, which lets me imagine comments are defined by their number as before but now dissociated in a way from their full path : that’s where something is broken, as I see it.
First amused me, then bothered, annoyed (I took some holidays to lower the pressure), then triggered curiosity.
I’m putting our best detectives on the affair, stay tuned.
Hehe, yes indeed, staying positive is what we should do. Good comes for those who wait, as the old saying goes. Hopefully true for this as well.
Interesting that the comments number stays the same, I noted that one thing is added to the duplicated comment in the URL, an error code, the following: “error-code-0x0003”.
Not useful for us, but hopefully for the developers (if there are any?), that perhaps will be able to sort this comments error out. Or our detectives, I hope they work hard on this as we speak ;).
Cheers and have a great weekend!
Whoops, my bad. I just now realized that the error I saw in your example URL (error-code-0x0003) was part of the linked article title and generated by Geforce! Oh dear! Why did I try to make it more confusing than it already is lol!
Original comment:
https://www.ghacks.net/2023/09/08/elon-musk-turned-off-starlink-during-ukranian-offence/#comment-4573788
Duplicate:
https://www.ghacks.net/2023/09/14/iphone-12-radiation-levels-are-too-high/#comment-4573788
Article Title: Tech leaders meet to discuss regulation of AI
Article URL: [https://www.ghacks.net/2023/09/14/artificial-intelligence-regulation-tech-leaders/]
—
@Karl, you write,
“I noted that one thing is added to the duplicated comment in the URL, an error code, the following: “error-code-0x0003”.”
I haven’t noticed that up to now, but it indeed brings an element to those who are actually trying to resolve the issue.
I do hope that Softonic engineers are working on fixing this issue, which may be more complicated than we can imagine. Anything to do with databases can become a nightmare, especially when the database remains accessed while being repaired, so to say.
P.S. My comment about remaining positive was, in this context, sarcastic. Your literal interpretation could mean you are, factually, more inclined to positiveness than I am myself : maybe a lesson of life for me :)
Have a nice, happy, sunny weekend as well :)
Correct: AI is certainly overhyped, it’s also advertised by some shady individuals. It’s can also be misused to write poor quality articles or fake your homework.
https://wordpress.com/support/post-vs-page/
https://wordpress.com/support/restore/
16 September 2023, this website is still experiencing issues with posts erroneously appearing in the wrong threads. There are even duplicates of the exact same post ID within the same page in some places.
Clerical error “[It] can also be misused …” you just can’t get the staff nowadays.
Obviously [#comment-4573795] was originally posted within [/2023/09/14/artificial-intelligence-regulation-tech-leaders/]. However, it has appeared misplaced within several threads.
Including the following:
[/2023/09/15/redmi-note-13-specs-release-date-and-more/]
[/2023/08/29/how-to-fix-geforce-experience-error-code-0x0003]
“How much radiation is dangerous?
Ionizing radiation, such as X-rays and gamma rays, is more energetic and potentially harmful. Exposure to doses greater than 1,000 millisieverts (mSv) in a short period can increase the risk of immediate health effects.
Above about 100 mSv, the risk of long-term health effects, such as cancer, increases with the dose.”
This ban is about NON-ionizing radiation limits, because there is too much radio wave power from the iphone. This has nothing to do with the much more dangerous ionizing radiations like X-rays, that are obviously not emitted at all by mobile phones. I invite you to correct your article.
“Aaro.mil makes history as the first official UFO website”
I wonder if it’s just smelly crowdsourcing for the spotting of chinese balloons or whatever paranoia they’re trying to instigate, or if they are also intentionally trying to look stupid enough to look for alien spaceships, for whatever reason. Maybe trying to look cute, instead of among the worst butchers of history ?
“The tech titan’s defense”
“Whether he provides a clear explanation or justifies his actions”
“the moral compass”
You take it for granted that this company should agree being a military communications provider on a war zone, and so directly so that his network would be used to control armed drones charged with explosives rushing to their targets.
You don’t need to repeat here everything you read in the mainstream press without thinking twice about it. You’re not just pointing interestingly that his company is more involved in the war that one may think at first and that this power is worrying, you’re also declaring your own support for a side in an imperialist killfest, blaming him for not participating enough in the bloodshed.
Now your article is unclear on how this company could be aware that its network is used for such military actions at a given time, which has implications of its own.
Reading other sources on that quickly, it seems that the company was: explicitly asked ; to extend its network geographically ; for a military attack ; at a time when there was no war but with the purpose of triggering it, if I understood well. You have to be joking if you’re crying about that not happening at that time. But today you have your war, be happy.