Merge pull request clearlydefined#141 from clearlydefined/add-criteria-for-clearlydefined-sources

adds documentation on sources for license information
nellshamrell committed May 13, 2021
2 parents 05d2d78 + 1873d36 commit 176722b
Showing 2 changed files with 122 additions and 37 deletions.
89 changes: 52 additions & 37 deletions docs/_includes/sidebar.html
@@ -32,6 +32,9 @@ <h3>Get involved</h3>
<li>
<a href="{{ '/contributing-code' | relative_url }}">Contribute code</a>
</li>
<li>
<a href="{{ '/adding-sources' | relative_url }}">Adding a Harvest Source</a>
</li>
<li>
<a href="{{ '/adopting' | relative_url }}">Adopt ClearlyDefined</a>
</li>
@@ -49,48 +52,60 @@ <h3>Legal</h3>

</div>

<div id="tools-buttons" style="width: 100%; text-align: left">
{% if site.google_cse_token %}
<script>
(function() {
var cx = "{{site.google_cse_token}}";
var gcse = document.createElement('script');
gcse.type = 'text/javascript';
gcse.async = true;
gcse.src = 'https://cse.google.com/cse.js?cx=' + cx;
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(gcse, s);
})();
</script>
<gcse:search></gcse:search>
{% else %}
<form method="GET" action="{{ site.github.repository_url }}/search">
{% if site.use_github_wiki %}
<input type="hidden" name="type" value="Wikis">
{% endif %}
<input class="search-text" type="text" name="q" placeholder="Text to find"><input class="search-button" type="submit" value="Search">
</form>
{% endif %}
<div id="tools-buttons" style="width: 100%; text-align: left">
{% if site.google_cse_token %}
<script>
(function () {
var cx = "{{site.google_cse_token}}";
var gcse = document.createElement('script');
gcse.type = 'text/javascript';
gcse.async = true;
gcse.src = 'https://cse.google.com/cse.js?cx=' + cx;
var s = document.getElementsByTagName('script')[0];
s.parentNode.insertBefore(gcse, s);
})();
</script>
<gcse:search></gcse:search>
{% else %}
<form method="GET" action="{{ site.github.repository_url }}/search">
{% if site.use_github_wiki %}
<input type="hidden" name="type" value="Wikis">
{% endif %}
<input class="search-text" type="text" name="q" placeholder="Text to find"><input class="search-button"
type="submit" value="Search">
</form>
{% endif %}

{{page.relative_path}}
{% if site.use_github_wiki %}
<span class="tools-element"><a target="_blank" href="{{ site.github.repository_url }}/{{page.folder}}{{url | remove: '.html' | append: ''}}/_edit">Edit</a></span>
<span class="tools-element"><a target="_blank" href="{{ site.github.repository_url }}/{{page.folder}}{{url | remove: '.html' | append: ''}}/_history">History</a></span>
<span class="tools-element"><a target="_blank" href="{{ site.github.repository_url }}/{{page.folder}}{{url | remove: '.html' | append: '.md'}}/">Source</a></span>
{% else %}
<span class="tools-element"><a target="_blank" href="{{ site.github.repository_url }}/edit/{{site.git_branch}}{{page.folder}}{{url | remove: '.html' | append: '.md'}}">Edit Page</a></span><br/>
{% if site.use_prose_io %}
<span class="tools-element"><a target="_blank" href="http://prose.io/#{{site.github.repository_nwo}}/edit/{{site.git_branch}}{{page.folder}}{{url | remove: '.html' | append: '.md'}}">Edit Page with Prose.io</a></span><br/>
{% endif %}
<span class="tools-element"><a target="_blank" href="{{ site.github.repository_url }}/commits/{{site.git_branch}}{{page.folder}}{{url | remove: '.html' | append: '.md'}}">Page History</a></span><br/>
<span class="tools-element"><a target="_blank" href="{{ site.github.repository_url }}/blob/{{site.git_branch}}{{page.folder}}{{url | remove: '.html' | append: '.md'}}">Page Source</a></span><br/>
{% endif %}
</div>
{{page.relative_path}}
{% if site.use_github_wiki %}
<span class="tools-element"><a target="_blank"
href="{{ site.github.repository_url }}/{{page.folder}}{{url | remove: '.html' | append: ''}}/_edit">Edit</a></span>
<span class="tools-element"><a target="_blank"
href="{{ site.github.repository_url }}/{{page.folder}}{{url | remove: '.html' | append: ''}}/_history">History</a></span>
<span class="tools-element"><a target="_blank"
href="{{ site.github.repository_url }}/{{page.folder}}{{url | remove: '.html' | append: '.md'}}/">Source</a></span>
{% else %}
<span class="tools-element"><a target="_blank"
href="{{ site.github.repository_url }}/edit/{{site.git_branch}}{{page.folder}}{{url | remove: '.html' | append: '.md'}}">Edit
Page</a></span><br />
{% if site.use_prose_io %}
<span class="tools-element"><a target="_blank"
href="http://prose.io/#{{site.github.repository_nwo}}/edit/{{site.git_branch}}{{page.folder}}{{url | remove: '.html' | append: '.md'}}">Edit
Page with Prose.io</a></span><br />
{% endif %}
<span class="tools-element"><a target="_blank"
href="{{ site.github.repository_url }}/commits/{{site.git_branch}}{{page.folder}}{{url | remove: '.html' | append: '.md'}}">Page
History</a></span><br />
<span class="tools-element"><a target="_blank"
href="{{ site.github.repository_url }}/blob/{{site.git_branch}}{{page.folder}}{{url | remove: '.html' | append: '.md'}}">Page
Source</a></span><br />
{% endif %}
</div>

<div style="padding-top: 30px;">
<p xmlns:dct="http://purl.org/dc/terms/" xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#">
<a rel="license" href="http://creativecommons.org/publicdomain/zero/1.0/">
<img src="http://i.creativecommons.org/p/zero/1.0/88x31.png" style="border-style: none;" alt="CC0" />
</a>
</p>
</div>
</div>
70 changes: 70 additions & 0 deletions docs/adding-sources.md
@@ -0,0 +1,70 @@
# Adding a Harvest Source

ClearlyDefined currently harvests several types of packages. The full list is on the [ClearlyDefined Stats Page](https://clearlydefined.io/stats).

## Current Harvest Sources

### NPM

We pull NPM (Node.js) license information from https://www.npmjs.com/

### Gem

We pull Gem (Ruby) license information from https://rubygems.org/

### PyPI

We pull PyPI (Python) license information from https://pypi.org/

### Maven

We pull Maven (Java) license information from multiple sources including:

* https://mvnrepository.com/repos/central
* https://maven.google.com/

### NuGet

We pull NuGet (.NET) license information from https://www.nuget.org/

### Git

We pull Git license information from https://github.com

### Crate

We pull Crates (Rust) license information from https://crates.io/

### Deb

We pull Deb license information from http://ftp.debian.org/

### Debsrc

We pull Debsrc license information from http://ftp.debian.org/

### Composer

We pull Composer (PHP) license information from https://packagist.org/

### Pod

We pull Pod (Swift and Objective-C) license information from https://cocoapods.org/

## Adding a new Harvest Source

If you would like to add a new Harvest source to ClearlyDefined, consider these criteria:

**Discoverability** – how are the packages for this language discovered? Is the repository searched by the build tooling without the user having to customize their client?

**Primary Source** – is this the primary repository that the package is published to? Or is this repository a mirror of an existing repository? We should always harvest from primary sources.

**Reputability** – is this repository operated by a reputable organization? What is the purpose behind running this repository? Is there an identifiable team that can be reached in the event of any issues?

**Security** – how secure is the repository? Is there a team that is available to handle issues in a timely manner when they arise? How fast do they respond to issues, such as when a security vulnerability is planted as a backdoor in a package?

**Automation** – does the repository provide an API for pulling information? If not, is the package index organized in a schematized format that can be queried programmatically over HTTP(S) using the package name and version? When using HTTP to mine data, ClearlyDefined should check for a robots.txt file or robots headers that indicate such mining is unacceptable. How much effort would it take to automate the process?

**Relationship** – reach out to the organization that maintains the repository to indicate that ClearlyDefined wishes to harvest data from their repository, with an explanation of how harvesting is done, what the data is used for, and how much additional traffic this could result in. Identify and resolve any concerns, and provide a ClearlyDefined contact they can reach if an issue arises.

To add a new harvest source, open an issue on the [ClearlyDefined Service Repo](https://github.com/clearlydefined/service) for comment. Make sure to explain how you believe the source fits the above criteria, and the community will discuss it with you.
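
When evaluating the Automation criterion for a candidate source, the robots.txt check can be sketched with Python's standard `urllib.robotparser` module. This is only an illustration, not how ClearlyDefined's crawler is implemented: the `may_harvest` helper, the `example-harvester` user agent, the `registry.example` host, and the sample robots.txt content are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def may_harvest(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    agent = "example-harvester"  # hypothetical crawler name
    for candidate in (
        "https://registry.example/packages/foo/1.0.0",
        "https://registry.example/private/index",
    ):
        verdict = "allowed" if may_harvest(SAMPLE_ROBOTS_TXT, agent, candidate) else "disallowed"
        print(f"{candidate}: {verdict}")
```

Against a live repository, the same parser can fetch the file itself with `set_url()` followed by `read()`. A check like this only covers robots.txt; robots headers on individual responses would need to be inspected separately.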
