Once-Public GitHub Repos Still Reachable Through Copilot After Being Made Private

Agencies Ghacks
Feb 26, 2025
Development, Misc

Security researchers have discovered that thousands of GitHub repositories that were once publicly accessible, but have since been made private, remain reachable through AI-powered tools such as GitHub Copilot. The issue highlights the persistent nature of data exposure on the internet: information that was public even briefly can be retained and surfaced by generative AI systems long after access to the original source has been restricted.

GitHub Copilot, developed by GitHub in collaboration with OpenAI and Microsoft, is an AI-based coding assistant that suggests code snippets and completions to developers. It has been trained on a vast corpus of publicly available code, enabling it to provide contextually relevant suggestions. However, this training data includes code from repositories that were public at the time of training but have since been made private. As a result, Copilot may still generate code suggestions based on content from these now-private repositories.

This situation raises significant concerns about data privacy and security. Developers who inadvertently exposed sensitive information in public repositories, even for a short duration, may find that this data has been ingested by AI models and can still be accessed indirectly through tools like Copilot. This underscores the importance of exercising caution when sharing code publicly and the challenges of completely retracting information once it has been exposed online.

In response to these concerns, GitHub has implemented features to enhance transparency and control over AI-generated code suggestions. For instance, Visual Studio now supports code referencing for GitHub Copilot completions, allowing developers to verify if suggestions are based on public code, which could have licensing implications. This feature provides detailed information on any public code matches found, enabling developers to make informed decisions about incorporating suggested code into their projects.

Despite these measures, the incident serves as a reminder of the enduring nature of data once it has been made public. Developers are advised to thoroughly review their code for sensitive information before making it public and to be aware that, even after making a repository private, previously exposed data may still be accessible through AI tools trained on prior public data.
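One practical way to follow that advice is to scan a repository for secret-like strings before it is ever pushed to a public remote. The sketch below is a minimal, illustrative example only: the regular expressions cover a handful of well-known token formats and are nowhere near exhaustive, and the function and pattern names are my own. Dedicated tools such as gitleaks or GitHub's own secret scanning use far larger rule sets and should be preferred in practice.

```python
import re
from pathlib import Path

# Illustrative patterns only -- real secret scanners maintain
# hundreds of rules covering many token formats.
SECRET_PATTERNS = {
    "AWS access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "GitHub token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "Private key header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "Suspicious assignment": re.compile(
        r"(?i)(password|secret|api_key)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_file(path: Path) -> list[tuple[str, int]]:
    """Return (pattern name, line number) for each suspected secret in a file."""
    findings = []
    try:
        text = path.read_text(errors="ignore")
    except OSError:
        return findings
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((name, lineno))
    return findings


def scan_repo(root: str) -> dict[str, list[tuple[str, int]]]:
    """Walk a repository directory (skipping .git) and report suspected secrets."""
    report = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and ".git" not in path.parts:
            findings = scan_file(path)
            if findings:
                report[str(path)] = findings
    return report
```

Running such a check as a pre-commit hook catches accidental exposure before the code is ever public, which matters precisely because, as the article notes, making a repository private afterwards does not undo what AI training pipelines may already have ingested.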

Source: TechCrunch
