IP Clearance #128

justinmclean · 2020-01-19T00:43:49Z

All files developed at the ASF need to have an ASF header [1]; 3rd party headers for the most part need to be retained [2].

btashton · 2020-01-23T21:11:17Z

@justinmclean Can you help me understand our requirements here a little bit more with a couple of examples:

1. https://github.com/apache/incubator-nuttx/blob/master/arch/risc-v/include/arch.h
It would seem that this needs to keep the BSD header until Ken re-licenses it under Apache, and we need to call this file out in the LICENSE file as BSD-3. It would not need to be called out in the NOTICE file.

2. https://github.com/apache/incubator-nuttx/blob/master/arch/arm/src/arm/arm.h
This one we can put the Apache header on, and we do not need to make any additions to the NOTICE or LICENSE files beyond the boilerplate Apache. This is because Greg has agreed to re-license this code.

3. https://github.com/apache/incubator-nuttx/blob/master/arch/arm/src/imxrt/hardware/rt102x/imxrt102x_ccm.h
This one we can put the Apache header on, and we do not need to make any additions to the NOTICE or LICENSE files beyond the boilerplate Apache. This is because Greg has agreed to re-license this code, and while there are other authors listed he is the sole copyright holder listed.

4. https://github.com/apache/incubator-nuttx/blob/master/arch/arm/src/imxrt/imxrt_lcd.c
It would seem that this needs to keep the BSD header unless NXP is willing to re-license it under Apache, even though portions are copyrighted by Greg, and we need to call this file out in the LICENSE file as BSD-3. It would not need to be called out in the NOTICE file.

General Questions:
When do we need to add the "Based on source code originally developed by" text to the NOTICE file? In a couple of the files coming from FreeBSD I see entries like

Portions of this software were developed by David Chisnall
under sponsorship from the FreeBSD Foundation.

I know we have other files with other license or cases to go through, but this should cover the vast majority and can get us moving in the right direction.

btashton · 2020-01-26T02:07:49Z

@justinmclean any thoughts on these examples? I'm trying to be 100% sure I understand what we need to do here to move this forward in a meaningful way.

justinmclean · 2020-01-27T21:37:38Z

1. Correct
2. Correct
3. It would depend on the history of the file and changes made. In general, unless the changes are significant, the original license and header should be kept.
4. Would need to be discussed; in general 3rd party headers should not be changed without permission. Looks like they have been here, and it would be best to revert to the original header.

justinmclean · 2020-01-27T21:38:14Z

Note that with a WIP disclaimer none of this actually blocks a release.

adamfeuer · 2020-07-24T15:56:50Z

License clearing wiki page (with draft process and tools): https://cwiki.apache.org/confluence/display/NUTTX/License+Clearing

This was used in release 9.0.0 and 9.1.0.

xiaoxiang781216 · 2020-08-02T08:37:33Z

Add the related email thread:
https://lists.apache.org/thread.html/r0d30d8c95e861826a3027499fc43bc3851e19f89fdaf8606eada1818%40%3Cdev.nuttx.apache.org%3E
https://lists.apache.org/thread.html/r3149c844791bd0164a3016cbebc690edd9277905678cfb33526937cb%40%3Cdev.nuttx.apache.org%3E
https://lists.apache.org/thread.html/r897f825f1bfcd3501c132438acc9403a70d415652119d1e528f7349f%40%3Cdev.nuttx.apache.org%3E

xiaoxiang781216 · 2020-08-02T10:01:00Z

@adamfeuer do you have enough free time to collect the statistics information? My team leader has reserved a dedicated resource to help you improve the tools and generate the report: @PeterBee97.

adamfeuer · 2020-08-04T01:50:37Z

Thanks @xiaoxiang781216 – I should have enough time to do a high-level analysis this week or next, and I could definitely use the help!

@PeterBee97 are you able to help me do this? If so, reply here or send me an email (it's on my profile), and we'll work out what to do. 🙂

PeterBee97 · 2020-08-04T07:39:11Z

@adamfeuer Hi Adam, sure I'm here to help. BTW I spent some time yesterday on a script that doesn't modify anything yet but only tries to extract information. Hope this helps :)

adamfeuer · 2020-08-04T16:15:55Z

@PeterBee97 Great work with the script and database! I'll update my tools branch and post it here– would you be willing to do a PR to that, so we can have a single branch that we're working on? I'm hoping we can merge these tools to master so that others can help us or continue our work.

Here are a few questions:

1. Are you subscribed to the [email protected] email list? If not, would you be willing to subscribe?
2. What's your email address? Will you either post it here or send me an email at [email protected], so we can correspond with the NuttX email list if necessary?
3. What time zone are you in? I am in Seattle WA USA, Pacific Time Zone, UTC-7.
4. Have you seen the NuttX license clearing wiki page? The process we need to follow and improve is there, as well as a few tools.
5. The authors in the file are good to have, but not enough to clear the licenses– we need to look at the git log and get authors from that. There's a script on the wiki page above that can do that.
6. Would you be willing to make the script you wrote also emit a plain text file, ideally tab delimited CSV?

adamfeuer · 2020-08-04T17:56:01Z

@PeterBee97 I updated my license-clearing tools branch to upstream/master, here's where I've put my tools: https://github.com/starcat-io/incubator-nuttx/tree/feature/license-clearing-tools/tools/license-clearing

adamfeuer · 2020-08-04T18:08:37Z

@PeterBee97 Let's try running the process that we did on the sched/ module on either fs/ or mm/– only the estimation part, not the whole clearing process. They have 100-250 files each, so it's a smaller chunk. We need git authors as well as what is in the file headers. Once we have a way to get stats for that module and all files, then we can try to do it for the whole project.

You can see what we did on sched/ at this wiki subpage: https://cwiki.apache.org/confluence/display/NUTTX/Analysis+March+2020

PeterBee97 · 2020-08-05T02:57:12Z

> @PeterBee97 Great work with the script and database! I'll update my tools branch and post it here– would you be willing to do a PR to that, so we can have a single branch that we're working on? I'm hoping we can merge these tools to master so that others can help us or continue our work.
>
> Here are a few questions:
>
> 1. Are you subscribed to the [email protected] email list? If not, would you be willing to subscribe?
> 2. What's your email address? Will you either post it here or send me an email at [email protected], so we can correspond with the NuttX email list if necessary?
> 3. What time zone are you in? I am in Seattle WA USA, Pacific Time Zone, UTC-7.
> 4. Have you seen the NuttX license clearing wiki page? The process we need to follow and improve is there, as well as a few tools.
> 5. The authors in the file are good to have, but not enough to clear the licenses– we need to look at the git log and get authors from that. There's a script on the wiki page above that can do that.
> 6. Would you be willing to make the script you wrote also emit a plain text file, ideally tab delimited CSV?

1. Not yet, but sure, I'm willing to subscribe.
2. [email protected]
3. I'm in Beijing, UTC+8, so my work time will be about 7 pm to 7 am in your timezone :(
4. Yes, I browsed through the docs and mailing lists before making that tool.
5. Yeah, actually my tool is based on your script. author0~author2 are from git log.
6. Sure, exporting to a csv file is just one command in sqlite.
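
For reference, the same export can also be sketched with Python's standard library instead of the sqlite shell. This is only an illustration: the function name `export_table_to_tsv`, the table name `files`, and the column names in the usage note are assumptions, since the actual database schema is not shown in this thread.

```python
import csv
import sqlite3


def export_table_to_tsv(db_path, table, out_path):
    """Dump every row of `table` to a tab-delimited text file with a header row."""
    conn = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as SQL parameters, so quote the identifier.
        quoted = '"{}"'.format(table.replace('"', '""'))
        cursor = conn.execute("SELECT * FROM " + quoted)
        header = [column[0] for column in cursor.description]
        with open(out_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle, delimiter="\t")
            writer.writerow(header)
            writer.writerows(cursor)
    finally:
        conn.close()
```

A call such as `export_table_to_tsv("licenses.db", "files", "files.tsv")` (hypothetical file and table names) would produce one line per file, which is easy to diff and to load into a spreadsheet.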

> @PeterBee97 Let's try running the process that we did on the sched/ module on either fs/ or mm/– only the estimation part, not the whole clearing process. They have 100-250 files each, so it's a smaller chunk. We need git authors as well as what is in the file headers. Once we have a way to get stats for that module and all files, then we can try to do it for the whole project.
>
> You can see what we did on sched/ at this wiki subpage: https://cwiki.apache.org/confluence/display/NUTTX/Analysis+March+2020

By typing sched/ in the DB Browser filter I can see that these files either already have the Apache license or are copyrighted only by Greg or Xiaomi & Pinecone, who should have already approved the license change.

The CSV files are uploaded and a PR has been created: https://github.com/PeterBee97/incubator-nuttx/tree/feature/license-clearing-tools/tools/license-clearing

adamfeuer · 2020-08-05T04:18:11Z

@PeterBee97 Cool, thanks– I didn't realize the script already used git to find the authors, sorry for missing that. We will need all the authors, not just the top 3. I'll take a closer look tomorrow.

Re: Xiaomi and Pinecone already approving the license change, do you know if they have filed an Apache Software Grant Agreement (SGA)?

Would you be willing to run your tool on fs and mm directories, and see if you can extract a report of the authors for each section and file? That way we can see if we're dealing with 10 authors, 100 authors, etc.

I think another next step is to get you an account on the NuttX Fossology instance. At some point we'll need to get the data into there. I'll email Brennan and you on the list.

Thanks again for being willing to help with this!

PeterBee97 · 2020-08-05T04:31:11Z

> @PeterBee97 Cool, thanks– I didn't realize the script already used git to find the authors, sorry for missing that. We will need all the authors, not just the top 3. I'll take a closer look tomorrow.
>
> Re: Xiaomi and Pinecone already approving the license change, do you know if they have filed an Apache Software Grant Agreement (SGA)?
>
> Would you be willing to run your tool on fs and mm directories, and see if you can extract a report of the authors for each section and file? That way we can see if we're dealing with 10 authors, 100 authors, etc.
>
> I think another next step is to get you an account on the NuttX Fossology instance. At some point we'll need to get the data into there. I'll email Brennan and you on the list.
>
> Thanks again for being willing to help with this!

Top 3 was my idea, given that some one-commit contributors can be ignored (can't they?). As for the license issue, I don't know the exact details; @xiaoxiang781216 knows better. I already ran the tool on the whole project, so those two directories can just be filtered. I'll try to get a report for particular files.
You're welcome :)

patacongo · 2020-08-05T04:54:42Z

> @PeterBee97 Cool, thanks– I didn't realize the script already used git to find the authors, sorry for missing that. We will need all the authors, not just the top 3. I'll take a closer look tomorrow.

I mentioned this before, but it bears repeating. The NuttX project was 13 years old in February of 2020. For the first 6 to 6 and a half years, the project used CVS and SVN. You will find no authorship or contact information for the first half of the project's life in the current GIT authors. The log will show me as the sole author during that time. I did by far most of the changes in those days, but not all. Prior to GIT, contributors were noted only in commit comments. It should be possible to get the names, or in most cases just user handles, from the comments but with no contact information.

Github apparently does not even know how to parse that early activity. If you look at https://github.com/apache/incubator-nuttx/graphs/contributors you would conclude that the project has only existed since sometime in 2013. The project was actually created in February of 2007. This is clearer in the Bitbucket statistics [1]: https://bitbucket.org/nuttx/nuttx/addon/bitbucket-graphs/graphs-repo-page#!graph=contributors&uuid=4430abf9-a782-49ff-bd16-bc1df696048e&type=c&group=weeks which goes all the way back to the day the project was created. I think that is because prior to GIT, authors were NOT referenced by email address, but rather with some UUID.

[1] Note you have to be logged into Bitbucket to see the statistics there.

adamfeuer · 2020-08-05T04:57:15Z

@patacongo Are the original CVS and SVN archives saved anywhere?

patacongo · 2020-08-05T05:05:17Z

> @patacongo Are the original CVS and SVN archives saved anywhere?

No

adamfeuer · 2020-08-05T05:20:27Z

@patacongo Ok. I'll look through the commit messages to see if I can work out what's going on there.

I'm logged in to Bitbucket, but for some reason I can't view the graph link you posted. Maybe it's a permissions issue or I don't have access to the graphs addon?

xiaoxiang781216 · 2020-08-05T14:40:18Z

> > @PeterBee97 Cool, thanks– I didn't realize the script already used git to find the authors, sorry for missing that. We will need all the authors, not just the top 3. I'll take a closer look tomorrow.
>
> I mentioned this before, but it bears repeating. The NuttX project was 13 years old in February of 2020. For the first 6 to 6 and a half years, the project used CVS and SVN. You will find no authorship or contact information for the first half of the project's life in the current GIT authors. The log will show me as the sole author during that time. I did by far most of the changes in those days, but not all. Prior to GIT, contributors were noted only in commit comments. It should be possible to get the names, or in most cases just user handles, from the comments but with no contact information.
>
> Github apparently does not even know how to parse that early activity. If you look at https://github.com/apache/incubator-nuttx/graphs/contributors you would conclude that the project has only existed since sometime in 2013. The project was actually created in February of 2007. This is clearer in the Bitbucket statistics [1]: https://bitbucket.org/nuttx/nuttx/addon/bitbucket-graphs/graphs-repo-page#!graph=contributors&uuid=4430abf9-a782-49ff-bd16-bc1df696048e&type=c&group=weeks which goes all the way back to the day the project was created. I think that is because prior to GIT, authors were NOT referenced by email address, but rather with some UUID.
>
> [1] Note you have to be logged into Bitbucket to see the statistics there.

@PeterBee97 can we add a column in the database to indicate whether the source code existed before git was used? @patacongo, we need to gather the statistics first and convert the unambiguous code base automatically (of course we need to review the PR carefully) and then work on the rest case by case; otherwise NuttX can never become a TOP LEVEL PROJECT.

adamfeuer · 2020-08-05T16:14:07Z

@xiaoxiang781216 @patacongo @PeterBee97 I cloned the Bitbucket repo last night (https://bitbucket.org/nuttx/nuttx/src/master/), looked through the commit logs, and I can see what @patacongo is talking about. I didn't compare to the github log, but we should probably also do that. Then we can see if we can do anything with the information there.

It seems like we should be able to come up with a strategy for dealing with this:

If we can get names and contact info from the commit messages, then we can run the license clearing process we already have, maybe with some additional steps about that process.
At the very least, we can collect statistics about how many contributors we are talking about.
If we can't get names and contact info from the commit messages, then we need to get help to address what @xiaoxiang781216 is talking about, so NuttX can graduate from podling status. Surely other Apache projects have faced this same issue.

Let me know if you have other thoughts about this.

@PeterBee97 Will you clone the Bitbucket repo and look at the logs to see if you have some insight about it?

patacongo · 2020-08-05T16:22:59Z

This is also informative:

git log | grep author

This will produce over 30 thousand lines, but you can clearly see that the last several thousand commits have author:

patacongo patacongo@42af7a65-404d-4744-a932-0658087f49c3

That, I think, is a bogus email that was created when the SVN repository was converted to GIT.

Then there are several thousand with author:

Gregory Nutt [email protected]

That is GIT, but when I was still using GIT as though it were SVN with no authors.

The first author that is not me appears at:

commit b0507038494cd1ae9d14807db758d4e3ae98a1ef
Author: jeditekunum <[email protected]>
Date:   Sat Jan 24 14:31:35 2015 -0600

First step at porting to MoteinoMEGA.  LED shows assert failure at boot.  Appears to be short double blink, short off (~1sec), followed by 250ms toggle cycles.  Most of it derived from amber board.

So it appears that there is no authorship information for the first 8 years, only for the last 5 years.
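
The same tally can be sketched in Python, which also makes it easy to separate the SVN-conversion placeholder addresses from real ones. This is a minimal sketch, not an existing tool: the helper names `count_authors` and `is_svn_placeholder` are made up for illustration, and it assumes the default `git log` output, where each commit carries a line of the form `Author: name <email>`.

```python
import re
from collections import Counter

# Default `git log` output prints one "Author: name <email>" line per commit.
AUTHOR_LINE = re.compile(r"^Author:\s*(?P<name>.*?)\s*<(?P<email>[^>]*)>\s*$")

# Addresses created by the SVN conversion end in a repository UUID, not a domain.
SVN_UUID = re.compile(r"@[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def count_authors(log_text):
    """Return a Counter mapping (name, email) to the number of commits."""
    counts = Counter()
    for line in log_text.splitlines():
        match = AUTHOR_LINE.match(line)
        if match:
            counts[(match.group("name"), match.group("email"))] += 1
    return counts


def is_svn_placeholder(email):
    """True if the address looks like user@<svn-repository-uuid>."""
    return SVN_UUID.search(email) is not None
```

Feeding it the text of `git log` gives a per-author commit count; any author whose address passes `is_svn_placeholder` marks a commit from the SVN era, where the real contributor has to be recovered from the commit message instead.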

adamfeuer · 2020-08-05T16:31:59Z

@patacongo @PeterBee97 If I do git log --reverse and search for ' by ' I find commits like this:

commit f03cb0ff3ababdcc84245d75d795ab956d110e09
Author: patacongo <patacongo@42af7a65-404d-4744-a932-0658087f49c3>
Date:   Tue Mar 16 00:53:32 2010 +0000

    Bugfixes submitted by David Hewson


    git-svn-id: svn:https://svn.code.sf.net/p/nuttx/code/trunk@2543 42af7a65-404d-4744-a932-0658087f49c3

There are others. They seem to indicate patches or other code from contributors, committed by Greg.
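
A search like this could be automated with a regular expression. The sketch below is only a rough heuristic under stated assumptions: the function name `credited_names` is hypothetical, "from" is included as a second keyword on the assumption that some credits are phrased that way, and only capitalized names are recognized, so lowercase user handles would be missed and would still need a manual pass.

```python
import re

# "by" or "from" followed by one or more capitalized words on the same line.
ATTRIBUTION = re.compile(
    r"\b(?:by|from)[ \t]+([A-Z][\w.'-]*(?:[ \t]+[A-Z][\w.'-]*)*)"
)


def credited_names(message):
    """Return the capitalized names that follow 'by' or 'from' in a commit message."""
    return [match.group(1).rstrip(".") for match in ATTRIBUTION.finditer(message)]
```

For the commit shown above, `credited_names("Bugfixes submitted by David Hewson")` picks out the credited contributor, while a phrase such as "derived from amber board" is ignored because the word after "from" is not capitalized.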

adamfeuer · 2020-08-05T16:43:47Z

@patacongo Thanks for pointing this out again, I am sorry I didn't remember this.

patacongo · 2020-08-05T17:38:46Z

> Bugfixes submitted by David Hewson

David Hewson I know. We are connected on LinkedIn. He just started working for HPE. He did some of the LPC31 port in the 2010 timeframe but has not been involved significantly since.

patacongo · 2020-08-05T17:41:50Z

> If I do git log --reverse and search for ' by ' I find commits like this

"by" or "from" would both be good search keys. I also recorded the authors in the old ChangeLog files that were recently removed from the repositories because they are not used in the current workflow. That should be a complete list of authors except for a few trivial things like typo fixes that weren't normally included in the ChangeLog.

PeterBee97 · 2020-08-06T07:12:30Z

> @PeterBee97 Will you clone the Bitbucket repo and look at the logs to see if you have some insight about it?

I cloned the Bitbucket repo today, but the git log seems to be the same as that on GitHub...

So I found the latest ChangeLog from NuttX 9.0.0 RC0 and tried to filter out the names with keywords from|by and the help of some NLP library and put the results in names-changelog.txt. Also processed the git log in the same way and the result is names-gitlog.txt. Still the commit messages of earlier SVN commits are incomplete and many commits are authorless.

This may help cover some corner cases. Maybe we can open an issue and mention these users? But before that let's filter out the "safe" files first as @xiaoxiang781216 suggests.

patacongo · 2020-09-13T00:55:40Z

Any updates here? I think this is the only blocker issue preventing us from graduating; let's try to make progress.

It seems to me that there are people who have interest and good ideas, but there is no significant progress being made. The job is really too large for a couple of people to accomplish working now and then.

protobits · 2020-09-13T01:15:13Z

Could we start with the easy cases? I feel that reducing the size of the problem also makes it less intimidating to approach.
We are already manually changing headers from BSD to Apache for files whose authors are committers with ICLAs, so I think making an automated pass for this case should not be that hard: parse the header for authors, see if all are committers, replace with the Apache header. If that sounds right I can script that and give it a try.
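
A minimal sketch of that check, covering only the header-parsing step and stopping short of rewriting anything. It assumes the BSD headers carry lines of the form ` *   Author: Name <email>` near the top of the file; the function names and the `icla_names` set are hypothetical, and author lists that continue over several lines are not handled.

```python
import re

# Matches comment lines such as " *   Author: Some Name <user@example.org>".
HEADER_AUTHOR = re.compile(
    r"^[ \t]*\*[ \t]*Authors?:[ \t]*([^<\n]+?)[ \t]*(?:<[^>\n]*>)?[ \t]*$",
    re.MULTILINE,
)


def header_authors(source_text, header_lines=40):
    """Return author names listed in the first `header_lines` lines of a file."""
    head = "\n".join(source_text.splitlines()[:header_lines])
    return HEADER_AUTHOR.findall(head)


def can_auto_relicense(source_text, icla_names):
    """True only if the header lists authors and every one of them has an ICLA."""
    authors = header_authors(source_text)
    return bool(authors) and all(name in icla_names for name in authors)
```

Files with no recognizable author line are deliberately reported as not convertible, so they fall through to manual review rather than being changed silently.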

What confuses me though is that we're worrying about git authors whereas I believe that if someone contributes a file without listing themselves as the authors in the header (for the BSD case), didn't the author concede rights over the code by doing so? At least that was my understanding at the time when I submitted patches to existing files and I did not include an extra line to add me as author to every affected file. In case this is not the correct assumption, I agree that a "best effort" approach (by comparing git author to authors on header) is the only remaining possibility.

justinmclean · 2020-09-13T09:08:48Z

Hi

> What confuses me though is that we're worrying about git authors whereas I believe that if someone contributes a file without listing themselves as the authors in the header (for the BSD case), didn't the author concede rights over the code by doing so?

Without an ICLA (or an equivalent) this is not the case. Copyright automatically applies. They may not even own rights to the code they commit if their employment contract says otherwise. Thanks, Justin

justinmclean · 2020-09-13T09:30:12Z

Hi, BTW Apache doesn’t use author tags in any new code, doing so implies ownership by a person rather than the whole project. Thanks, Justin

xiaoxiang781216 · 2020-09-13T13:40:48Z

> Hi
>
> > What confuses me though is that we're worrying about git authors whereas I believe that if someone contributes a file without listing themselves as the authors in the header (for the BSD case), didn't the author concede rights over the code by doing so?
>
> Without an ICLA (or an equivalent) this is not the case. Copyright automatically applies. They may not even own rights to the code they commit if their employment contract says otherwise. Thanks, Justin

So @justinmclean is it safe for us to do the batch conversion if the source code meets all of the following criteria?
1. The source code isn't converted from SVN or CVS
2. All committers (or their companies) in the git log have signed an ICLA or SGA
3. The copyright holder in the source code has signed an ICLA or SGA
And I also have one question: do we need the contributor to sign an ICLA if he/she just modifies a small portion of code (e.g. ~10 lines)? Having a concrete number is also important for writing an automation tool.
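
To make the proposal concrete, the three criteria could be expressed as a small check that reports why a file is not eligible. This is a sketch of the proposal as worded above, not an agreed policy: the `FileRecord` fields, the `blocking_reasons` name, and the `signers` set are all hypothetical, and no size threshold for small contributions is encoded because none has been settled.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    path: str
    predates_git: bool                                 # criterion 1
    git_authors: set = field(default_factory=set)      # criterion 2
    header_holders: set = field(default_factory=set)   # criterion 3


def blocking_reasons(record, signers):
    """Return the reasons a file fails the criteria; an empty list means eligible."""
    reasons = []
    if record.predates_git:
        reasons.append("history predates git (converted from SVN or CVS)")
    unsigned_authors = sorted(record.git_authors - signers)
    if unsigned_authors:
        reasons.append("git authors without ICLA/SGA: " + ", ".join(unsigned_authors))
    unsigned_holders = sorted(record.header_holders - signers)
    if unsigned_holders:
        reasons.append("copyright holders without ICLA/SGA: " + ", ".join(unsigned_holders))
    return reasons
```

Returning the reasons rather than a bare yes/no keeps the output usable as a review aid: the same text could go into a report or a commit message for traceability.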

justinmclean · 2020-09-13T20:15:53Z

Hi,

> So @justinmclean is it safe for us to do the batch conversion if the source code meets all of the following criteria?
> 1. The source code isn't converted from SVN or CVS

I’m not sure what you mean by that.

> 2. All committers (or their companies) in the git log have signed an ICLA or SGA

Small contributions don’t have to have a CLA, but the person who committed that contribution takes responsibility for ensuring the code’s IP. If possible it's best to have one.

> 3. The copyright holder in the source code has signed an ICLA or SGA

Take care with this. The copyright holder in source may or may not be the correct one. Thanks, Justin

patacongo · 2020-09-13T22:10:37Z

> Take care with this. The copyright holder in source may or may not be the correct one.

Similarly, the author in GIT may not be the author of the file. Often the copyright holder in the source file header is the correct one, even though that person may not appear in GIT history.

Many people copy existing files into different locations (very often for new architectures and for new boards which are very similar to older architectures and boards). Very often, I am the author of the file in these cases.

Bottom line: There is no magic, automated way to correctly determine the author. It requires collecting data and then also applying human insight.

@justinmclean For many cases there are multiple contributors of changes to a file. There is an original author, the original committer (who might be a different person) and people who have made trivial changes (as trivial as a spelling fix) or who have made substantial enhancements or re-designs. The former would not be treated as authors or copyright holders, but the latter may be. Is there any rule of thumb for what constitutes a significant change warranting rights to the file? Or does this also require human insight?

There are thousands of files involved here. This is potentially multiple man years of effort. I don't see how we can ever accomplish this.

protobits · 2020-09-13T22:37:34Z

We can only operate on the information we have. If authorship information was lost from the CVS and SVN era (git author is Greg) and the header does not list anyone other than Greg, we can either "play safe" and leave the BSD header (we would be respecting the original author's license even if we don't know who it really was) or assume that, without further information, the original author cannot prove authorship either, in which case we are safe to change to Apache. For these "unknown" cases, I don't see any other way. We just need to decide and then act.

For other cases where there is indeed information I think we can script a header change based on various scenarios of git author/header author/author aliases where all have ICLAs. This change can be made to create one commit per file change and add the reason for the safety of the change to the commit message for traceability. Then, we can review each commit in a PR and decide if manual intervention is needed (throwing out unsafe changes, for example).

patacongo · 2020-09-13T22:44:36Z

> We can only operate on the information we have. If authorship information was lost from the CVS and SVN era (git author is Greg) and the header does not list anyone other than Greg, we can either "play safe" and leave the BSD header (we would be respecting the original author's license even if we don't know who it really was) or assume that, without further information, the original author cannot prove authorship either, in which case we are safe to change to Apache. For these "unknown" cases, I don't see any other way. We just need to decide and then act.

In the SVN/CVS days, I did always give credit to the contributor in comments. However, the task of reading all comments in those 15 thousand or so commits is a very onerous task. The information is there, just not easily accessible.

AFAIK there are no un-credited changes in the repositories.

protobits · 2020-09-13T22:56:23Z

We can try to see what wording you used in general and use some regular expression to try to match the attribution.

What I'm thinking is that in any case we will always need to analyze a file by looking at its complete git history to extract git author + header author + commit msg attribution, right? The "easy" cases would then be files only touched by current committers.

justinmclean · 2020-09-13T22:57:38Z

Hi,

> @justinmclean For many cases there are multiple contributors of changes to a file. There is an original author, the original committer (who might be a different person) and people who have made trivial changes (as trivial as a spelling fix) or who have made substantial enhancements or re-designs.

Ideally we would have CLAs for those who have made significant changes or who owned the IP on the original contribution, whose owner may or may not be the author.

> There are thousands of files involved here. This is potentially multiple man years of effort. I don't see how we can ever accomplish this.

I would try solving for the low hanging fruit, e.g. files you know that only people who currently have a CLA have contributed to, change those licenses to ALv2, and work from there. I think this has already been suggested. Other code is under a compatible license so that’s the fallback position. Thanks, Justin

Apache9 · 2020-09-14T03:01:27Z

Let's clear the license for the files we own first. I think it is OK to have some files under compatible licenses in an ASF project. You just need to mention them in the NOTICE file. Another possible solution is to rewrite these files so we can change the license. Anyway, this depends on the number of files whose license we cannot change.

Thanks.

patacongo · 2020-09-14T03:18:14Z

> Let's clear the license for the files we own first. I think it is OK to have some files under compatible licenses in an ASF project. You just need to mention them in the NOTICE file. Another possible solution is to rewrite these files so we can change the license. Anyway, this depends on the number of files whose license we cannot change.

I don't think anyone has committed to do that work. Adam and Peter have, I guess, but they don't apparently have the bandwidth required to do that effectively. I think that even the first baby steps would require a substantial, committed, full time effort.

Apache9 · 2020-09-14T03:21:16Z

I think @xiaoxiang781216 has already found someone who wishes to help here? But anyway, we need at least a committer to review the work...

protobits · 2020-09-17T18:19:04Z

I've been writing some scripts which convert the output of git log (over a given file) into JSON format, to obtain metadata for each revision of the file. The final JSON contains (among other information): commit author, commit message and blob hash for the file.
I then started writing a python script to parse the JSON and extract (using regular expressions) authors from commit message and file header, in each commit. It is working nicely so far.
The final goal would be to determine if a given file passes the previously discussed checks for the easy cases that can be moved to Apache header. The python script could also be used to make the header change and commit the result.

I will work a bit more on this and open a draft PR (to add the script inside tools/).
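
As an illustration of the approach, and not the script from the draft PR, the git-log-to-JSON step could look roughly like this. It relies on git's `%H`, `%an`, `%ae`, `%B` and `%x1f`/`%x1e` pretty-format placeholders to emit unambiguous field and record separators; the function names are hypothetical, and the blob hash mentioned above is left out to keep the sketch short.

```python
import json
import subprocess

FIELD_SEP = "\x1f"   # ASCII unit separator between fields
RECORD_SEP = "\x1e"  # ASCII record separator between commits
LOG_FORMAT = "%H%x1f%an%x1f%ae%x1f%B%x1e"


def parse_log(raw):
    """Turn `git log --format=LOG_FORMAT` output into a list of dicts."""
    commits = []
    for record in raw.split(RECORD_SEP):
        record = record.strip("\n")
        if not record:
            continue
        commit, name, email, message = record.split(FIELD_SEP, 3)
        commits.append({
            "commit": commit,
            "author": name,
            "email": email,
            "message": message.strip(),
        })
    return commits


def file_history_json(path, repo="."):
    """Return the history of one file as a JSON string (runs git)."""
    raw = subprocess.run(
        ["git", "-C", repo, "log", "--follow", "--format=" + LOG_FORMAT, "--", path],
        check=True, capture_output=True, encoding="utf-8", errors="replace",
    ).stdout
    return json.dumps(parse_log(raw), indent=2)
```

Using control characters as separators avoids most of the shell escaping that free-form commit messages would otherwise require.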

patacongo · 2020-09-17T18:42:01Z

> I've been writing some scripts which convert the output of git log (over a given file) into JSON format, to obtain metadata for each revision of the file. The final JSON contains (among other information): commit author, commit message and blob hash for the file.

People have been using Fossology to get historical information: https://www.fossology.org/

adamfeuer · 2020-09-17T18:51:31Z

Yeah, life intervened and I haven't been able to get back to this. I have less time for it than I thought.

@PeterBee97 made some progress in parsing out the list of contributors from the Git log messages. I will see if I can take his list and see if I can get a list of files and also number of lines of code for each contribution... anyway that seems to be the next steps:

get a list of people who contributed
get a list of the commits they were involved with
work out how many lines of code per person are involved
sort the list largest to smallest – this will give us an idea of how big the job is
try contacting people with the n largest contributions

There are several other approaches. This is just the one that seems most straightforward to me. If anyone wants to help, we could use help with:

writing a script that could take a list of commits and output the contribution size in lines
getting a list of names and commits from the git log (Peter's scripts are this, or very close I think)
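
A rough sketch of the first item, assuming the input is the text produced by `git log --numstat --format=@%an`, where each commit starts with an `@author` line followed by `added<TAB>deleted<TAB>path` lines. The function names are hypothetical, and the totals are keyed by git author, so for SVN-era commits the credited contributor would still have to be substituted from the commit message.

```python
from collections import Counter


def added_lines_by_author(numstat_log):
    """Sum added lines per author from `git log --numstat --format=@%an` output."""
    totals = Counter()
    author = None
    for line in numstat_log.splitlines():
        if line.startswith("@"):
            author = line[1:].strip()
            continue
        parts = line.split("\t")
        if author is None or len(parts) != 3:
            continue
        added = parts[0]
        if added.isdigit():  # binary files report "-" instead of a count
            totals[author] += int(added)
    return totals


def largest_first(totals):
    """Return (author, added_lines) pairs sorted from largest to smallest."""
    return totals.most_common()
```

Sorting the result largest to smallest gives the contact list order described above, starting with the n largest contributions.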

protobits · 2020-09-17T19:04:52Z

Please see #1834

I know @PeterBee97 started some of this work but to be honest it was quite difficult for me to take advantage of those, considering it was based on sqlite databases. I chose JSON format since it is quite easy to read and parse with different programming languages.

patacongo · 2020-09-17T19:15:25Z

> Please see #1834
>
> I know @PeterBee97 started some of this work but to be honest it was quite difficult for me to take advantage of those, considering it was based on sqlite databases. I chose JSON format since it is quite easy to read and parse with different programming languages.

I have to be in favor of anything that makes forward progress.

adamfeuer · 2020-09-17T19:46:37Z

@patacongo Re: anything that makes forward progress, me too.

@v01d yes, text-based json or csv/tsv formats would be great. The scripts in #1834 look cool. Maybe we combine them into one python script with the sh module. I'll try them out.

protobits · 2020-09-17T20:28:04Z

> @v01d yes, text-based json or csv/tsv formats would be great. The scripts in #1834 look cool. Maybe we combine them into one python script with the sh module. I'll try them out.

There's quite a bit of escaping going on in the bash script, so embedding it inside python would probably require some work. Not sure if it is worth it, but we can think about it.

protobits · 2020-09-17T20:44:40Z

Comment moved to #1834

protobits · 2020-09-17T21:05:41Z

Comment moved to #1834

protobits · 2020-09-17T21:06:08Z

Oops, thought I was on the PR, I'll move the comments there

yy-gu · 2020-10-30T08:16:04Z

@justinmclean @adamfeuer

Hi guys, we made some progress and posted it here: #1954

Basically, we collected the list of authors and companies that have not signed the agreement. The next step is to contact them via email and get them to sign it.

My questions are the following:

Is there an email template for contacting the authors?
Where do we return the signed ICLAs? Is there somebody from the Apache Software Foundation who collects and verifies them?

justinmclean · 2020-10-31T22:07:57Z

ICLAs are emailed to secretary@apache.org; see https://www.apache.org/licenses/contributor-agreements.html

yy-gu · 2020-11-02T07:32:04Z

@justinmclean Thanks! One more question: how would you normally contact companies to get their SGA signed? Do you contact people you know at the company to get introduced? Which department is normally responsible for this?

For the other authors, shall we just send automated emails to contact them?

yy-gu · 2020-11-02T07:39:20Z

@justinmclean One more question: shall we ask authors to send their ICLAs directly to secretary@apache.org? Will someone from the Apache Secretary's office process the emails, update the list of signers, and sync with us on the author list?

patacongo · 2023-04-29T13:57:59Z

I think this issue can be closed:

It is inactive. There have been no comments since 2020
NuttX has since graduated to a TLP so all IP clearance issues must have been resolved.

If there is something I am missing please just re-open.

btashton added the ASF (Apache Software Foundation related) and blocker (Release Blocker) labels Jan 19, 2020

xiaoxiang781216 changed the title ~~Files missing ASF headers~~ IP Clearance Aug 2, 2020

patacongo closed this as completed Apr 29, 2023
