pip-license-checker: detect third-party dependencies with “wrong” licenses

Intro

A software project may have hundreds of third-party dependencies. Even when freely available over the Internet, these dependencies may be distributed under licenses with conditions that are more or less restrictive. There are over a hundred popular free software and open-source licenses in use today. All of them may be roughly divided into a few license types depending on their conditions for usage, attribution, modification and distribution. Third-party dependencies under licenses of some types are good for your project, while others may result in license violations. License violations can pose high risks of both financial and reputation losses. Understanding what license types are good for your project’s third-party dependencies is important. Once you know what you are looking for, check your dependencies regularly. pip-license-checker is a tool for automating license type checks that may help.

What is pip-license-checker?

pip-license-checker is a license compliance tool. It detects license types for your project’s third-party dependencies. The tool covers the majority of popular free/libre and open-source (FLOSS) licenses of all types: public domain, permissive, copyleft. It also detects proprietary licenses or EULA as well as technical problems obtaining the license for a dependency.

Originally developed with Python packages in mind, the tool operates in two modes:

  1. Detecting license names and types for the list of Python dependencies
  2. Detecting license types for any list of dependencies with their license names

The tool supports the output of some popular language-specific license detection plugins:

as well as universal CSV format.

The tool is distributed both as a:

so that it’s easy to use locally on your machine or incorporate into the project’s CI/CD pipeline.

Takeaways

pip-license-checker is a tool to check license types in your project’s third-party dependencies. It covers the majority of popular FLOSS licenses. It integrates with some popular license fetching plugins. You can easily incorporate the tool into a CI/CD pipeline to automate license compliance checks.

Why are license checks needed?

Nobody develops software from scratch today. Software development has evolved from an artisan shop’s craft to a heavy industry assembly line. We, software developers, rely heavily on third-party code: libraries, packages, platforms, operating systems. Building something new has become, to a large extent, assembling parts, applying some glue, and implementing custom logic on top of code written by someone else, someone outside of the project or the company. Most such third-party dependencies today are freely available over the internet, both as executable code that a machine can run and as human-readable code that we developers can use, copy and learn from.

But does being freely available over the network mean these third-party dependencies are really free? Free as in “free lunch”? Free as in “freedom”? Most of the dependencies are distributed under popular FLOSS licenses, e.g. Apache License, MIT License, Mozilla Public License or GNU General Public License to name a few. The licenses differ on the permissiveness/restrictiveness scale, ranging from the public-domain-like licenses with no conditions to the licenses requiring the authors of derivative works to give away the source code and use the same or compatible license under certain circumstances. Some dependencies are distributed as “freeware”, i.e. under proprietary end-user license agreements (EULA), usually with some permissions and conditions.

The conditions some licenses impose may be unacceptable for your personal project or your company. License violation is a legal risk. The court cases are real. Both financial and reputation losses may be high. In such cases preventing license violations is crucial to mitigate or avoid risks.

Sometimes choosing a “wrong” license for your FLOSS project may become an obstacle to its wide adoption (e.g. a copyleft license for a project in the presence of many high-quality alternatives, proprietary or free). Checking the licenses of the project’s dependencies may help you understand your options when choosing the license most suitable for your goals.

Takeaways

Third-party dependencies freely available over the internet are usually distributed under some terms and conditions of FLOSS licenses or EULA. The conditions vary in their permissiveness. Some license types may be too restrictive to be used in your project.

The legal implications and reputation losses of license violations are real. To mitigate or avoid the risks, regular license compliance checks are needed.

Why is automating license compliance checks needed?

The number of third-party dependencies (deps) in a project may be high: from tens in a tiny personal pet project to hundreds in a big one, to even thousands in the enterprise settings. As the project evolves the list of dependencies may change over time too.

The list of popular FLOSS licenses is also long. Over 100 licenses and counting! While some licenses are short and easy to understand, others are challenging reads that may require a lawyer’s qualification to grasp the conditions. Arguably, no average software developer has ever read all of the FLOSS licenses that exist in the wild.

Upgrading dependency versions (so-called version bumps) in the project may also pull in dependencies that have changed their licenses. This is most relevant for deps at the early stages of their development.

All in all, when managing a project’s third-party dependencies, we need not only to comply with their licenses but also to run these license checks regularly, in an automated way. That requires a tool to be incorporated into the project’s continuous integration and continuous delivery (CI/CD) pipeline.

This is exactly what pip-license-checker is for.

Takeaways

Modern projects rely on 10s, 100s or even 1000s of third-party dependencies. Their list is changing over time as projects evolve. This is too much to check licenses manually.

The Free Software Foundation and the Open Source Initiative, the two most respected non-profit organisations for free/libre and open-source software, have approved 100s of popular FLOSS licenses. Reading and understanding their conditions is a challenge for an average software developer. Expertise is needed.

Dependencies themselves may change their licenses over their lifetime too.

Automated regular license compliance checks are needed.

Who needs to check licenses?

Anyone who:

  • develops a proprietary project based on other FLOSS code
  • creates, contributes to, or maintains a FLOSS project

As an independent open-source developer, you may want to check your project deps’ licenses to make sure the project’s license itself complies with the libraries and packages it’s based on. That will help to avoid the risk of reputation losses or enable you to choose the right license for your project to make headway.

As a for-profit company, you may want to make sure your proprietary software product doesn’t violate its dependencies’ licenses to avoid legal risks and financial and reputation losses. As a startup, you may want to have license compliance in place before an investment round, valuation or acquisition.

Why pip-license-checker?

There are dozens of tools and plugins to detect license names for the third-party dependencies used in your project. But, as noted above, with a large number of dependencies and an ever-growing number of open-source licenses, a list of license names may bring little or no value unless legal expertise is applied to it.

pip-license-checker brings some of the expertise needed by detecting license types, so that users may decide what types are unacceptable in their project and get alerted whenever licenses of “wrong” types are found among the project’s third-party dependencies.

What are license types?

Despite a large number of licenses used by third-party dependencies, all of them can be split into a relatively small number of categories.

Public-domain-like licenses

Public-domain-equivalent licenses are the most permissive type of FLOSS licenses. Zero-Clause BSD is a great example of a public-domain-like license. This license type has no conditions, but may have a liability waiver.

Permissive licenses requiring attribution

Permissive licenses with an attribution clause usually allow modification and redistribution under the license of choice, with the only condition being that the original copyright notice is preserved. Some licenses are indiscriminate about how exactly the attribution is to be done, while others require a prominent notice, e.g. preserving the original notice in the source code and adding the copyright notice to the application’s legal section. The Apache License and the MIT License both fall under this license type.

Along with public-domain-like licenses, this license type is the most permissive of all FLOSS licenses. Code under such licenses is easy to absorb into a proprietary closed-source code base or to incorporate into a FLOSS project with no relicensing.

Copyleft

Copyleft is the next big category of licenses. The licenses of this type may also be called reciprocal as they require that modification and distribution of derivative works are done under the same or compatible license with the source code made available.

The copyleft licenses differ on the “permissiveness/restrictiveness” scale as well and can be broken down into the following fine-grained copyleft license types.

Weak copyleft

Weak copyleft licenses trigger the copyleft (the reciprocal share-alike mechanism that requires disclosing the derivative work’s source code and using the same or a compatible license) when an original file or library under the weak copyleft license is modified and redistributed. Simply using or incorporating a file or library under weak copyleft into a larger work (e.g. by dynamic linking) and distributing it does not require giving away the source code or using the corresponding weak copyleft license. The Mozilla Public License and the Eclipse Public License are good examples of the weak copyleft type of FLOSS licenses.

Weak copyleft licenses try to be business-friendly by allowing third-party deps to be incorporated into a larger work, while still caring about sharing the improvements made to the original work. Weak copyleft licenses may be unacceptable in some cases but very welcome in others.

Strong copyleft

The strong copyleft license type goes even further in its conditions, requiring disclosure of the source code of derivative works and use of the same strong copyleft license. Any of the following may be considered a work based on the software under strong copyleft (a derivative work): combining your proprietary modules with a module under strong copyleft into a single executable file; dynamically linking a library under a strong copyleft license in your permissive FLOSS project; or even exchanging data between your program and a standalone program under strong copyleft when the semantics of the communication are intimate enough, e.g. exchanging complex internal data structures. In case of distribution, the source code of such a work must be given away and the same strong copyleft license must be used for it. The GNU General Public License is the most illustrative example of the strong copyleft license type.

Third-party dependencies under strong copyleft licenses cannot be part of proprietary code delivered to outside users (unless the proprietor decides to give away the project’s source code and relicense it under the corresponding strong copyleft license). Private usage of a derivative work without distribution to outside users or customers (e.g. on your computer only, or on your company’s server only) is not considered distribution and does not trigger the copyleft.

Copyleft over a network

This is the most restrictive category of all copyleft licenses. The network copyleft license type picks up where strong copyleft leaves off: it expands what counts as distribution and narrows what counts as private usage by treating users’ interaction with a derivative work over a network as distribution of the work. The GNU Affero General Public License, a notable example of a copyleft-over-a-network license, aims at programs that are used on servers:

[…] if you run a modified program on a server and let other users communicate with it there, your server must also allow them to download the source code corresponding to the modified version running there.

The copyleft-over-a-network license type may be too restrictive in many cases, so detecting it in your project’s dependencies is important.
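The permissiveness/restrictiveness scale described in this section can be sketched as an ordered enumeration. This is only an illustration: the type names, their numeric order and the `too_restrictive` helper are my assumptions for this post, not the tool’s actual identifiers or logic.

```python
from enum import IntEnum


class LicenseType(IntEnum):
    """Hypothetical scale, most permissive first (names are illustrative)."""
    PUBLIC_DOMAIN = 0
    PERMISSIVE = 1
    WEAK_COPYLEFT = 2
    STRONG_COPYLEFT = 3
    NETWORK_COPYLEFT = 4


def too_restrictive(found: LicenseType, strictest_allowed: LicenseType) -> bool:
    """Return True when a dependency's license type exceeds the project's limit."""
    return found > strictest_allowed


# Example limit only: this project accepts weak copyleft at most.
limit = LicenseType.WEAK_COPYLEFT
print(too_restrictive(LicenseType.PERMISSIVE, limit))        # False
print(too_restrictive(LicenseType.NETWORK_COPYLEFT, limit))  # True
```

An ordered type keeps the policy down to a single threshold, though a real project may prefer an explicit list of forbidden types instead.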

License categorisation

Categorisation of popular FLOSS licenses into the few license types described above is the primary value of the pip-license-checker tool. Technically speaking, it’s just a mapping from regular expressions for license names to license types. But making up such a mapping requires reading and understanding the license texts.
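To give a feel for what such a mapping might look like, here is a minimal sketch in Python. The patterns, the type labels and the `classify` function are hypothetical and cover only the licenses named in this post; the tool’s real table is written in Clojure and is far more extensive.

```python
import re

# Hypothetical patterns and type labels, for illustration only.
# Order matters: the Affero pattern is tried before the plain GPL one.
LICENSE_PATTERNS = [
    (re.compile(r"affero|\bagpl", re.I), "network copyleft"),
    (re.compile(r"mozilla public|\bmpl\b|eclipse public|\bepl\b", re.I), "weak copyleft"),
    (re.compile(r"gnu general public|\bgpl", re.I), "strong copyleft"),
    (re.compile(r"\bmit\b|apache", re.I), "permissive"),
    (re.compile(r"zero-clause bsd|\b0bsd\b", re.I), "public domain"),
]


def classify(license_name: str) -> str:
    """Return the type of the first pattern matching the license name."""
    for pattern, license_type in LICENSE_PATTERNS:
        if pattern.search(license_name):
            return license_type
    return "unknown"


for name in ("MIT License", "GNU Affero General Public License v3", "GPLv3"):
    print(name, "->", classify(name))
```

The hard part is not the code but deciding which type each license belongs to.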

Some licenses are derived from others, e.g. a license ZYX version 3.0 is a successor of the deprecated license ZYX version 2.0. In such cases, instead of reading both texts, the categorisation could have relied on the latest version’s text along with the license’s official FAQ explaining the differences between the two versions.

The lack of lawyers’ support in the license categorisation process could also have resulted in some inaccuracies. Please report any issues regarding the license types.

Technical facts

pip-license-checker is a program written in Clojure. With quality in mind, the code has high test coverage and uses generative property-based testing. The tool’s own CI/CD pipeline includes a step for checking 3rd-party dependencies’ licenses with pip-license-checker.

Outro

Make sure you know what license types are safe/desirable for your project’s third-party dependencies. Is your project an open-source library for others to dynamically link in their software? Or a closed-source mobile app available over a popular digital distribution service as an executable? Or embedded software for a device for wide public use? A website’s frontend software delivered to users when the website is requested via a browser? Or backend software privately run in the background on your company’s server? In other words, what’s your project’s usage and distribution “profile”, and which obligations are not acceptable for the project: source code disclosure, an obligation to change the license to a reciprocal one, attribution of the dependency’s authors, etc.?

Once you know the answers, try to incorporate third-party deps license checks into your project’s CI/CD pipeline to detect licenses you are not safe to use. The pip-license-checker tool may help with that.
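As a rough idea of what such a pipeline step boils down to, here is a sketch of a gate that turns a list of detected license types into an exit code. Everything in it is an assumption made for illustration: the forbidden set, the dependency names and the function names are hypothetical, and in practice the list would come from the checker’s report rather than being hard-coded.

```python
import sys

# Hypothetical policy: license types this project does not accept.
FORBIDDEN_TYPES = {"strong copyleft", "network copyleft"}


def find_violations(deps, forbidden):
    """Return (name, license_type) pairs whose type is in the forbidden set."""
    return [(name, ltype) for name, ltype in deps if ltype in forbidden]


def gate(deps, forbidden=FORBIDDEN_TYPES):
    """Report violations on stderr and return a CI-style exit code (0 = pass)."""
    violations = find_violations(deps, forbidden)
    for name, ltype in violations:
        print(f"{name}: {ltype} license is not allowed", file=sys.stderr)
    return 1 if violations else 0


# Hypothetical dependency list; the names are made up.
sample = [("lib-a", "permissive"), ("lib-b", "strong copyleft")]
exit_code = gate(sample)
print("exit code:", exit_code)  # a CI step would pass this to sys.exit()
```

A non-zero exit code is what makes the CI job fail, so a “wrong” license blocks the build instead of slipping through unnoticed.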

P.S. See the tool’s documentation in the pip-license-checker GitHub repository and the corresponding GitHub Action. Your feedback is welcome via email or GitHub Issues.

Disclaimer

pip-license-checker and this blog post are provided on an “as-is” basis and make no warranties regarding any information provided through them and disclaim liability for damages resulting from using them. Using pip-license-checker and this blog post does not constitute legal advice nor does it create an attorney-client relationship.

Addendum

Interview 2021-09-15

I was recently interviewed by Yegor Bugayenko. We talked about pip-license-checker. As my very first public presentation of the project, the interview isn’t a perfect one. I’ve updated and extended this blog post in an attempt to fix all the flaws in my answers to the interviewer’s questions. Overall, the interview was useful for showing the project’s shortcomings and for giving some directions for further development.