Open-Source AI Is Increasingly Popular But Not Risk-Free
Open-source AI has become increasingly prominent, with many organizations adopting it into their workflows.

Open-source AI projects are seeing explosive growth, and are contributing to the estimated $15.7 trillion impact AI will have on the global economy by 2030, according to PwC.
However, some enterprises have been hesitant to fully embrace AI. While over 70% of companies were experimenting with AI in 2023, only 20% were willing and able to invest more, according to VentureBeat.
Open-source tooling gives businesses a cost-effective and accessible path to AI, with benefits like customization, transparency, and platform independence. That said, it also carries significant costs for the unprepared. Managing these risks becomes critical as enterprises expand their AI experimentation.
Risk #1: Training Data
Many AI tools rely on vast amounts of training data to develop models and generate outputs.
For example, OpenAI’s GPT-3 was reportedly trained on 570 gigabytes of online text, roughly 300 billion tokens. More advanced models typically require even larger and often less transparent datasets. Some open-source AI tools launch with little dataset disclosure, or with disclosures too voluminous to be useful, limiting meaningful model evaluation and posing potential risks.
A code generation AI tool, for instance, could be trained on proprietary, licensed datasets without permission, leading to unlicensed output and possible liability.
Open-source AI tools that use open datasets still face challenges, such as evaluating data quality. That evaluation helps ensure a dataset hasn’t been corrupted, is regularly maintained, and includes data suited to the tool’s intended purpose.
Regardless of the data’s origins, enterprises should carefully review training data sources and tailor future datasets to the use case where possible.
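As a starting point for that review, here is a minimal Python sketch of what checking a dataset’s integrity and basic quality might look like. The file name, expected checksum, and choice of checks are illustrative assumptions, not a complete data-quality process.

```python
import csv
import hashlib
from pathlib import Path

# Hypothetical dataset file and published checksum -- replace with real values.
DATASET_PATH = Path("training_data.csv")
EXPECTED_SHA256 = "0123456789abcdef..."  # checksum published by the dataset maintainer


def verify_integrity(path: Path, expected_sha256: str) -> bool:
    """Confirm the downloaded dataset matches the maintainer's published hash."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return digest == expected_sha256


def basic_quality_report(path: Path) -> dict:
    """Collect simple quality signals: row count, empty fields, duplicate rows."""
    rows, empty_fields, duplicates = 0, 0, 0
    seen = set()
    with path.open(newline="", encoding="utf-8") as f:
        for record in csv.reader(f):
            rows += 1
            empty_fields += sum(1 for field in record if not field.strip())
            key = tuple(record)
            duplicates += key in seen
            seen.add(key)
    return {"rows": rows, "empty_fields": empty_fields, "duplicate_rows": duplicates}


if __name__ == "__main__":
    if not verify_integrity(DATASET_PATH, EXPECTED_SHA256):
        raise SystemExit("Checksum mismatch: dataset may be corrupted or tampered with.")
    print(basic_quality_report(DATASET_PATH))
```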
Risk #2: Licensing
Proper licensing of data, models, and outputs raises difficult questions as AI proliferates.
The open-source community has been debating whether traditional open-source software licenses are suitable for AI models. Current licensing ranges from fully open to partially restricted use. Unclear criteria for what qualifies as “open source” can lead to licensing confusion. The licensing question can trickle downstream: if a model produces output derived from a source with a viral license, you may need to adhere to that license’s requirements.
With models and datasets constantly evolving, evaluate every AI tool’s licensing against your chosen use case. Legal teams should help you understand limitations, restrictions, and other requirements, like attribution or a flow-down of terms.
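One low-effort place to begin that evaluation, sketched below, is an inventory of the licenses declared by the Python packages your AI tooling pulls in, with copyleft-style terms flagged for legal review. The keyword list is illustrative only and is not legal guidance, and this sketch does not cover model or dataset licenses themselves.

```python
from importlib.metadata import distributions

# Illustrative keywords only -- your legal team defines what actually needs review.
REVIEW_KEYWORDS = ("GPL", "AGPL", "LGPL", "SSPL")


def license_inventory() -> dict:
    """Map each installed distribution to the license string it declares."""
    inventory = {}
    for dist in distributions():
        name = dist.metadata.get("Name", "unknown")
        license_field = dist.metadata.get("License") or dist.metadata.get("Classifier", "")
        inventory[name] = license_field or "not declared"
    return inventory


if __name__ == "__main__":
    for name, license_text in sorted(license_inventory().items()):
        flag = " <-- review" if any(k in license_text.upper() for k in REVIEW_KEYWORDS) else ""
        print(f"{name}: {license_text}{flag}")
```

In practice you would hand this inventory to your legal team alongside the licenses attached to the models and datasets you plan to use.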
Risk #3: Privacy
As global AI regulations emerge and discussions swirl around the misuse of open-source models, companies should assess regulatory and privacy concerns for AI tech stacks. Be comprehensive in your risk assessments. Ask AI vendors direct questions, such as:
- Does the tool use de-identification to remove personally identifiable information (PII), especially from training datasets and outputs? (See the redaction sketch after this list.)
- Where are training and fine-tuning data stored, copied, and processed?
- How does the vendor review and test accuracy and bias, and on what cadence?
- Is there a way to opt in or out of data collection?
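To make the first question concrete, the sketch below applies simple regex-based redaction to text before it reaches a model or a log. The patterns cover only email addresses and phone-number-like strings; they are an assumption-laden starting point, not a substitute for a full de-identification pipeline.

```python
import re

# Illustrative patterns only -- real de-identification needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholder tokens."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    prompt = "Contact Jane at jane.doe@example.com or +1 (555) 123-4567 about the renewal."
    print(redact_pii(prompt))
    # -> Contact Jane at [EMAIL] or [PHONE] about the renewal.
```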
Where possible, implement AI explainability and human review processes. Build trust in AI, and its business value, by understanding the model and datasets well enough to explain why the AI returned a given output.
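A lightweight way to support that kind of review is an audit trail that records each prompt, output, and model version so a human can later reconstruct why a given answer was returned. In the sketch below, call_model is a hypothetical stand-in for whatever model interface you actually use.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AuditRecord:
    """One reviewable interaction with the model."""
    timestamp: float
    model_version: str
    prompt: str
    output: str
    reviewer: Optional[str] = None   # filled in when a human reviews the record
    approved: Optional[bool] = None


def call_model(prompt: str) -> str:
    # Hypothetical stand-in for your actual model or API call.
    return f"(model output for: {prompt})"


def audited_call(prompt: str, model_version: str, log_path: str = "ai_audit_log.jsonl") -> str:
    """Invoke the model and append the interaction to a JSON Lines audit log."""
    output = call_model(prompt)
    record = AuditRecord(time.time(), model_version, prompt, output)
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(asdict(record)) + "\n")
    return output


if __name__ == "__main__":
    print(audited_call("Summarize our Q3 incident reports.", model_version="local-llm-v1"))
```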
Risk #4: Security
The same openness that gives open-source software its security benefits can also create security risks.
Many open-source models can be deployed in your own environment, giving you the benefit of your existing security controls. However, open-source models can also expose the unsuspecting to new threats, including output manipulation and harmful content introduced by bad actors.
AI tech startups offering tools built on open-source AI can lack adequate cybersecurity, dedicated security teams, or secure development and maintenance practices. Organizations evaluating these vendors should ask direct questions, such as:
- Does the open-source project address cybersecurity issues?
- Are the developers involved in the project demonstrating secure practices like those outlined by OWASP?
- Have vulnerabilities and bugs been promptly remediated by the community?
Enterprises experimenting with AI tooling should continue following internal policies, processes, standards, and legal requirements. Consider security best practices like:
- Keep the tool’s source code subject to vulnerability scanning.
- Enable branch protection for AI integrations.
- Encrypt interconnections in transit and databases at rest (see the TLS check sketched after this list).
- Establish boundary protection for the architecture and use cases.
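As one concrete example of the encryption item above, the sketch below verifies that each interconnection endpoint your AI tooling talks to presents a valid certificate and negotiates TLS. The endpoint list is hypothetical; substitute your own hosts.

```python
import socket
import ssl

# Hypothetical endpoints your AI integrations connect to -- replace with your own.
ENDPOINTS = [("api.example-model-host.com", 443), ("vector-db.internal.example", 443)]


def check_tls(host: str, port: int, timeout: float = 5.0) -> str:
    """Open a TLS connection and report the negotiated protocol version."""
    context = ssl.create_default_context()  # verifies certificates by default
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return f"{host}:{port} OK ({tls.version()})"
    except (ssl.SSLError, OSError) as exc:
        return f"{host}:{port} FAILED ({exc})"


if __name__ == "__main__":
    for host, port in ENDPOINTS:
        print(check_tls(host, port))
```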
A strong security posture will serve enterprises well in their AI explorations.
Risk #5: Integration and Performance
The integration and performance of AI tooling matter for both internal and external use cases at an organization.
Integration can affect many internal elements, like data pipelines, other models, and analytics tools, increasing risk exposure and hampering product performance. Tools can also introduce new dependencies upon integration, like open-source vector databases that support model functionality. Consider how those elements affect your tool integration and use cases, and determine what additional adjustments are needed.
Monitor AI’s impact on system performance after integration. AI vendors may not offer a performance warranty, leaving your organization to handle additional development if open-source AI does not meet expectations.
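A minimal sketch of that monitoring: wrap model calls with a timer and keep simple latency statistics so regressions surface early. As before, call_model is a hypothetical placeholder for your actual model call.

```python
import statistics
import time


def call_model(prompt: str) -> str:
    # Hypothetical stand-in for your model or API call.
    time.sleep(0.05)  # simulate inference latency
    return f"(model output for: {prompt})"


class LatencyMonitor:
    """Track wall-clock latency of model calls to spot performance regressions."""

    def __init__(self) -> None:
        self.samples = []  # per-call latency in seconds

    def timed_call(self, prompt: str) -> str:
        start = time.perf_counter()
        output = call_model(prompt)
        self.samples.append(time.perf_counter() - start)
        return output

    def report(self) -> str:
        p95 = statistics.quantiles(self.samples, n=20)[-1]  # 95th percentile
        return (f"calls={len(self.samples)} "
                f"mean={statistics.mean(self.samples):.3f}s p95={p95:.3f}s")


if __name__ == "__main__":
    monitor = LatencyMonitor()
    for i in range(20):
        monitor.timed_call(f"request {i}")
    print(monitor.report())
```

In production you would export these numbers to whatever observability stack you already run rather than printing them.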
The costs associated with maintaining and scaling AI functions, including data cleaning and subject-matter-expert time, can climb quickly.
Know Before You Go Open Source
Open-source AI tooling offers enterprises an accessible and affordable way to accelerate innovation.
Successful implementation requires scrutiny and a proactive compliance and security posture. An intentional strategy for evaluating the hidden costs and considerations of open-source AI will help ensure ethical and intelligent use.

About the Author
Jessica Hill
Principal Legal Counsel, Product and Privacy, New Relic
Jessica Hill is an experienced product and privacy attorney adept at navigating the intersection of law and technology. She has been at New Relic for over three years, where she specializes in cross-functional initiatives to drive compliance and innovation.