Records and Reports

Meeting Record I

Date

1st June 2020 Monday

Participants

P+K+PMR

Key points

Discussion about the agenda behind openVirus: to build a system so that anybody can understand the science behind the current pandemic
To write and document everything for InternX
Installation of ferret to retrieve papers from medrxiv
To document a wiki about 'how to get started with ferret'
To create a 'hello mask' program for ferret

Meeting Record II

Date

4th June 2020 Thursday

Participants

PMR+G+P+K+Pruthiv

Key Points

Introduction to the "standup" routine followed by P+K
Introduction given to Pruthivrajan and brief overview of his role in project
Run getpapers to collect 200 papers on viral epidemics
Discussion on installation and running of ami
Search with ami using the country, disease and funders dictionaries
To read TIGR2ESS on using dictionaries
To read about Wikidata
The current team goal is to run ferret against medrxiv and retrieve papers from it

Agenda

welcome to Pruthivrajan and brief overview of his role in project
record of last meeting (June 1 Monday)
"standup" (2 mins each, P and K). See https://en.wikipedia.org/wiki/Stand-up_meeting#Software_development This is a no-blame experiment. Let's see how it goes. The current "goal" is to get Ferret running against medrxiv

Meeting Record III

Date

8th June 2020 Monday

Participants

P+K+PMR+PR+G+Ambreen

Agenda

Welcome to new members
Allocation of regular responsibilities
Standup (for those present last meeting): what did you do in the last 4 days that helped the team? What are you going to do in the next 3 days that will help the team? Are you blocked on anything?
Record of last meeting
Priorities:

installing and running ami
management of dictionaries (one per member)
documenting
potential miniprojects

Testing medrxiv on getpapers and comparing with ami download

Key Points

General instructions given to all: Gitanjali ma'am is the Personal and Academic Manager for the interns, and PMR is the Project Manager of openVirus.
Welcome new intern Ambreen
Standup by P+K+Pruthiv and introduction by Ambreen
We are going to have a 'hello mask' program for getpapers and ami on 100 papers
Work assigned to each intern
- Kareena- Meeting record and documentation
- Ambreen- beta testing
- Rajan- Technical support for organising biweekly meetings
- Priya- ????
All interns should document their 100 papers on viral epidemics on the wiki in a paragraph
Current goal is to install and run ami, followed by allocation of different project dictionaries to all, one per member.
Aim to start "miniprojects", where different viral epidemics will be given to all of us for documentation, a process called "SCOPING"
Retrieve papers from medrxiv using getpapers by creating a query on PMC so that only medrxiv papers are downloaded.
Everyone must run git, which is required for running ami. Install ami using git.
Clone the new ami repository
Dictionaries assigned: Create a Wiki of your dictionary.

countries (Ambreen)
diseases (Priya)
viruses (Kareena)
drugs (Rajan)
funders (Vaishali)

Meeting Record IV

Date

11th June 2020 Thursday

Participants

PMR+P+K+Pruthiv+Ambreen

Agenda

Allocation of regular responsibilities
Standup (for those present last meeting): what did you do in the last 4 days that helped the team? What are you going to do in the next 3 days that will help the team? Are you blocked on anything?
Record of last meeting
Priorities:

everyone able to install and run getpapers
installing and running amisearch with builtin dictionaries
running amisearch with local dictionaries
creating dictionaries with amidict from lists, wikipedia categories

Key Points

Welcome new intern- Vaishali
Work assigned to Priya- Support for new interns
Discussion on running getpapers to retrieve papers from medrxiv
Running maven to build ami
Issues faced by interns while running ami after building it successfully; documentation gap: "How to run ami on output of getpapers?" Report your issues well so that you can receive accurate guidance. How to report problems:

what were you trying to do?
what did you do?
what happened?
your assessment of the problem

Document your installation steps followed by usage and also issues, if any on the Wiki
CProject directory and its relation with getpapers FAQ taken up by Ambreen
Explanation of a dictionary given by PMR, using "Ocimum sanctum" in reference to TIGR2ESS. Introduction to the 'xml' markup language and its components such as 'elements' and 'attributes', and the Q number for Wikidata.
Next task

Go to the TIGR2ESS tutorials and read about dictionaries. "What do dictionaries do?"
Try and bring the tutorials across to viral epidemics in openVirus. Search briefly about your particular dictionary. Write your own idea about your dictionary. "What does your dictionary do?" "How do you use it?"
Install and run ami and search about your dictionary through it.

Meeting Record V

Date

15th June 2020 Monday

Participants

PMR+P+K+Pruthiv+Ambreen+Vaishali

Agenda

Allocation of regular responsibilities
Standup (for those present last meeting)

what did you do in the last 4 days that helped the team ?
what are you going to do in the next 3 days that will help the team?
are you blocked on anything?

Record of last meeting
Update on bringing Vaishali into synchronization for the project

Dictionary assigned - ????

Priorities:Each intern will have particular responsibility for:

a dictionary
an exploration / project

DICTIONARY Most of you will have a nearly correct dictionary, but it will need cleaning and updating. The tasks include:

checking title of dictionary is the same as filename (else it will fail)
for each entry:

checking that Wikipedia links are present
checking Wikidata links
checking that term is a useful noun or phrase. Much of this can be done automatically.
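The automatic part of these checks could look roughly like the sketch below. It is only an illustration: the sample dictionary and the attribute names (title, term, wikidataID, wikipediaURL) are assumptions made for this example and may not match the exact format that ami expects.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical sample dictionary; attribute names are assumptions for illustration.
SAMPLE = """<dictionary title="country">
  <entry term="India" name="India" wikidataID="Q668"
         wikipediaURL="https://en.wikipedia.org/wiki/India"/>
  <entry term="Atlantis" name="Atlantis"/>
</dictionary>"""


def check_dictionary(xml_text, filename):
    """Return a list of human-readable problems found in one dictionary."""
    problems = []
    root = ET.fromstring(xml_text)

    # The title of the dictionary should be the same as the filename (without extension).
    title = root.get("title")
    stem = Path(filename).stem
    if title != stem:
        problems.append(f"title '{title}' does not match filename '{stem}'")

    for entry in root.iter("entry"):
        term = entry.get("term", "")
        # A Wikipedia link should be present.
        if not entry.get("wikipediaURL"):
            problems.append(f"{term}: missing Wikipedia link")
        # A Wikidata ID should look like Q followed by digits.
        qid = entry.get("wikidataID", "")
        if not (qid.startswith("Q") and qid[1:].isdigit()):
            problems.append(f"{term}: missing or malformed Wikidata ID")
    return problems


report = check_dictionary(SAMPLE, "country.xml")
for line in report:
    print(line)
```

Whether a term is a useful noun or phrase still needs a human decision; the script only flags entries that are structurally incomplete.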

PROJECT This project is primarily to test the software. DO NOT ASSUME THE RESULTS ARE MORE GENERALLY USEFUL (i.e. don't tell the world you have made a medical breakthrough - we don't have enough data or knowledge.) The project consists of:

creating a query, running it, and refining the query iteratively.
downloading up to 1000 articles (your CProject)
searching them with 3-6 dictionaries for co-occurrence
manually evaluating how useful co-occurrence is
refining dictionaries
repeat
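As a rough picture of what "searching with dictionaries for co-occurrence" means, the sketch below counts how many papers mention a term from one dictionary together with a term from another. The dictionaries and paper texts are toy data invented for this example, and the matching is a naive single-word lookup; it is not how ami itself performs the search.

```python
from collections import Counter
from itertools import product

# Toy stand-ins for two dictionaries and a tiny CProject (all hypothetical).
countries = {"india", "brazil"}
diseases = {"zika", "dengue"}
papers = {
    "paper1": "Zika outbreak reported in Brazil during 2015",
    "paper2": "Dengue and Zika surveillance in India and Brazil",
    "paper3": "A study of crop yields",
}


def terms_found(text, dictionary):
    """Return the dictionary terms that occur as whole words in the text."""
    words = set(text.lower().split())
    return words & dictionary


def cooccurrence(papers, dict_a, dict_b):
    """Count, for each (term_a, term_b) pair, the papers that contain both."""
    counts = Counter()
    for text in papers.values():
        found_a = sorted(terms_found(text, dict_a))
        found_b = sorted(terms_found(text, dict_b))
        for pair in product(found_a, found_b):
            counts[pair] += 1
    return counts


table = cooccurrence(papers, countries, diseases)
for (country, disease), n in sorted(table.items()):
    print(country, disease, n)
```

A table like this is the thing to evaluate manually: a high count shows that two terms appear in the same papers, not that they are scientifically related.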

Key Points

Dictionary assigned to Vaishali- 'funders'
Discussion on INYAS and interns who will be joining us.
Review of the individual tasks. Interns to come up with their own dictionaries and projects.
Dictionaries:

Create your own dictionary and provide answers to "How many entries does your dictionary have?" "Where was it created from?" Each intern should have an AUTHORITY for their dictionary. For example: country - ISO
However, one issue we all may face is SYNONYMS: each term in the dictionary has potential for synonyms such as UK / England / United Kingdom of Great Britain and Northern Ireland. Wikidata may solve this issue.
All the dictionaries to be placed here https://github.com/petermr/openVirus/tree/master/dictionaries Everyone to create their dictionary folder in it. (One folder per dictionary, lower-case names.)
"Why are we creating dictionaries?" FAQ taken up by Vaishali. "How can we update the dictionaries?" FAQ by Ambreen. You can edit these in the FAQ page.

Projects

Each intern to think about and decide on a project which relates to viral epidemics (personal interest): https://github.com/petermr/openVirus/tree/master/miniproject For example: face masks in viral epidemics, drugs used in viral epidemics, vaccines and viral epidemics, organizations and funders in viral epidemics, timeline of usage of dictionary terms in scholpub (cf Google trends).
Create a new dictionary for your project
To create your project, you will need indexing, information retrieval and information extraction
Indexes to be used: solr, lucene, ami

Next tasks:

Create your dictionary folders in the link given above.
Download and mechanically upload your search query results. You can create, read, update, delete and tidy up your dictionaries.
Decide a particular project you would like to work on.

Meeting Record VI

Date

18th June 2020 Thursday

Participants

PMR+P+K+Pruthiv+Ambreen+Vaishali

Agenda

Allocation of regular responsibilities
Standup (for those present last meeting)

what did you do in the last 4 days that helped the team ?
what are you going to do in the next 3 days that will help the team?
are you blocked on anything?

Record of last meeting
KARYA students
Priorities:

Scientific Strategy.
General principles for what we hope to do. Systematic reviews, Discovery of hidden knowledge.
Projects. Formalize titles and project owners
Technology. amidict, SPARQL
Machine Learning.

Key Points

Discussion on KARYA students affiliated to DST Rajasthan who will be joining us from next week. Each intern will be assigned one student and the two will work on the project together. 2 interns from there will be working for the long term. 5 interns are from the Indian National Young Academy of Sciences (INYAS). We are taking 5 students for 1 month, and each intern will mentor one. The disciplines range over physics, chemistry, maths and bioscience.
Scientific strategy discussion over "Why are we doing these projects?" To build an organised system using technological tools in order to create an informative site related to viral epidemics.
Systematic review: an activity that brings research papers together and looks for common words and phrases to present a systematic review of our search. Done using ami.

4. Formalizing the Project-

Each project should have a scientific target. It must involve technology development.

5. How will you work on the project?

Creating a spreadsheet would be the first thing to begin with.
Mostly, manual work is involved and we will start with a limited number of papers, i.e. 50. You have to be able to go through all the papers, look for the word or phrase you are trying to find (it must not be a false positive), and produce your search results.
This can be made easier by SECTIONING your search as each paper contains an introduction, methods followed by conclusion.
Decide the tools which you will need. (An exercise for machine learning)

6. AIM- The aim is to come up with an appropriate project plan to achieve its target. Following are the targets for our interns:

AMBREEN: Determine the role of country in viral epidemics.
PRIYA: Which diseases co-occur in viral epidemics? (Whether the viral spread causes other diseases as well. Example: Spanish flu 1919 caused a number of bacterial infections too.)
RAJAN: Which drugs are regularly used for treating viral epidemics? Particularly what drugs are used to treat symptoms, and not the virus.
VAISHALI: What funders are the most active in funding research during viral epidemics? Papers contain a particular section about their funding.
KAREENA: Which viruses are reported as causing viral epidemics? Not all viruses cause a pandemic or an epidemic. To find out which viruses can cause or have caused an epidemic.
All projects have an element of machine classification ("learning") and natural language processing (NLP). The main uses are: is this paper really/mainly about viral epidemics?, does your concept (above) co-occur in the same sentence as the virus/disease - i.e. is it tightly coupled? For example is "India" related to "virus in India" or is it unrelated (e.g. the reagent came from an Indian supplier?)
The main packages will be: ami for sectioning in CProjects and dictionary searching, KNIME for workflow and analytical tools, R for workflow and analytical tools, Keras for machine learning, Jupyter for logging and reusable scripts
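The "tightly coupled" question (does the concept occur in the same sentence as the virus or disease?) can be pictured with the small sketch below. The sentence splitter is deliberately naive and the example text is invented; a real project would more likely rely on an NLP package for sentence splitting.

```python
import re


def sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tightly_coupled(text, concept, anchors):
    """Return the sentences that mention the concept and at least one anchor term."""
    hits = []
    for sentence in sentences(text):
        low = sentence.lower()
        if concept.lower() in low and any(a.lower() in low for a in anchors):
            hits.append(sentence)
    return hits


# Invented example: only the first sentence links "India" to the virus.
text = (
    "The Nipah virus outbreak in India was studied. "
    "Reagents were bought from a supplier in India. "
    "The virus spread quickly."
)
hits = tightly_coupled(text, "India", ["virus", "epidemic"])
print(hits)
```

The second sentence mentions India only as the source of a reagent, so it is not returned; this is the kind of false positive the projects need to filter out.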

7. PROJECT PROPOSAL

Each intern should come up with half a page project proposal on what you plan to do on your project. It should be believable and compact mentioning your strategies and goals. Prepare your own queries, plans and mention the tools which you might require. Basically, how you plan to work in order to achieve your target.
This project proposal will be presented to the fellow KARYA students so that they can choose what they would like to work on and with whom.

8. What are tools which we are going to use in these projects?

Firstly we are going to create a COMMUNAL PROJECT CORPUS related to viral epidemics called epidemic50: 50 papers on viral epidemics that allow us to test our software and ideas. Everyone will use this to get trained on software, and all software should be able to use it. There will be false positives in it and also problem files. Later we have to analyse it independently and come up with our own corpus.
ami based tools for retrieving documents and parts of documents
ami sectioning
beta testing
Then, we would need WORKFLOWS. Something like "workflows-->ami-->commandline(CLI)-->KNIME-->GUI"
Each of you should install and try KNIME, an alternative to ami that offers more tools to work with. Contact the expert, Clyde.
ENTITY EXTRACTION : finding particular words or phrases in papers, done using ami or KNIME
Natural Language Processing (NLP), but only a few aspects of it.
R : Contains tools for summarizing things
Jupyter
Keras
Excel
SPARQL

Create a Wiki page on Github for each of these technologies, giving a simple account of installation and usage.

10. Steps to do the project:

RETRIEVAL of papers
BINARY CLASSIFICATION
SECTIONING
IDENTIFY & EXTRACT your information

11. Prepare the following:

Spreadsheets
Data Displays for scoping review ( Histogram, Timeline, Pie charts etc )

12. NEXT TASKS:

Go to the TIGR2ESS tutorial on SPARQL & WikiData - an easy way to create dictionaries.
Create your project proposal for the volunteers.
Get up to speed with Binary classification using Python/Keras, KNIME and R. Create a wiki page for Binary Classification.

Meeting Record 7

Date

22nd June 2020

Participants

PMR+P+K+Rajan+Ambreen+Vaishali+New interns

Agenda

welcome new collaborators Zeyang Charles Li and Vanisha Arora
getting started. This is for interns to add documentation to (https://github.com/petermr/openVirus/wiki/GETTING-STARTED)
Standup (for those present last meeting)

what did you do in the last 4 days that helped the team ?
what are you going to do in the next 3 days that will help the team?
are you blocked on anything?

review of minutes
miniprojects review

Key Points

Welcome new interns Charles and Vanisha. Brief introduction given to both about getting started, projects and dictionaries. Projects assigned

Charles - Non pharmacological interventions
Vanisha - Testing and Tracing of Viral epidemics

Install and use KNIME followed by its documentation on Wiki

3. Common exercise given to all the interns (to be done individually):

Analyse the 50 papers on viral epidemics given here https://github.com/petermr/openVirus/tree/master/miniproject/epidemic50noCov
Create your own spreadsheet in csv format using Excel and extract valuable information from each paper. Tutorial given by PMR using screen sharing during the meet.
For each paper, create questions for analysis such as "Is the paper about Viral Epidemics?", "Does it mention the country where the epidemic took place?", "Does it talk about the drugs used?", "Does it mention the diseases which co-occur?", "Does it involve other viruses?", "Does it talk about the funders?"
These spreadsheets will be key for comparing results with others. Assess without dictionaries.
Create queries such as "Does the paper contain annotations (features)?" Features include diagrams, pictures or images.
Mention human blind annotations: (A) Viral epidemics - yes/no (B) 1 to 7 features present - yes/no (C) Metadata- year of publication (D) Type of paper- research article/abstract/review
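For anyone who prefers a script to Excel, the sketch below writes such an annotation sheet as csv and compares two annotators. The column names, paper IDs and answers are placeholders invented for this example; they are not an agreed schema.

```python
import csv
import io

# Hypothetical column names covering annotations (A) to (D).
FIELDS = ["paper_id", "viral_epidemic", "features_present", "year", "paper_type"]

# Placeholder rows: the IDs and answers are made up for illustration.
annotator_1 = [
    {"paper_id": "PMC0000001", "viral_epidemic": "yes", "features_present": "yes",
     "year": 2016, "paper_type": "research article"},
    {"paper_id": "PMC0000002", "viral_epidemic": "no", "features_present": "no",
     "year": 2018, "paper_type": "review"},
]
annotator_2 = [
    {"paper_id": "PMC0000001", "viral_epidemic": "yes", "features_present": "no",
     "year": 2016, "paper_type": "research article"},
    {"paper_id": "PMC0000002", "viral_epidemic": "yes", "features_present": "no",
     "year": 2018, "paper_type": "review"},
]


def to_csv(rows):
    """Serialise annotation rows as csv text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def agreement(rows_a, rows_b, field):
    """Fraction of shared papers on which two annotators gave the same answer."""
    answers_b = {row["paper_id"]: row[field] for row in rows_b}
    shared = [row for row in rows_a if row["paper_id"] in answers_b]
    if not shared:
        return 0.0
    same = sum(1 for row in shared if row[field] == answers_b[row["paper_id"]])
    return same / len(shared)


csv_text = to_csv(annotator_1)
print(csv_text)
print(agreement(annotator_1, annotator_2, "viral_epidemic"))
```

Keeping the same column names across interns is what makes the blind annotations comparable afterwards.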

Each intern to create a corpus of 950 articles individually using amisearch
Create wiki pages for machine learning, workflows and data analysis tools. Data formats to work on - R , Keras and spreadsheets.
All interns to publish their tool set in miniproject wiki. Create an inventory of tools (for the next meet).
Explanation by PMR about using xml and json.

7. Next tasks:

Create your own spreadsheets.
Install and run KNIME. Document your experience.
Devise your project plan and update it on wiki. Create your project tool set.
Install R , Keras

Meeting Record 8

Date

25th June 2020 Thursday

Participants

PMR+P+K+Rajan+Ambreen+Vaishali+Vanisha

Agenda

Allocation of regular responsibilities
Standup (for those present last meeting)

what did you do in the last 4 days that helped the team ?
what are you going to do in the next 3 days that will help the team?
are you blocked on anything?

Record of last meeting
Priorities:

nonCov50 data set.
review of (5) miniprojects (each person to report on their page, resources, progress)
widening projects to include 3 new participants
review of (5) dictionaries (each owner to report). Discussion of further dictionaries
workflow tools (KNIME, Jupyter, ami, etc.). inventory of experience.
sectioning (PMR) . ami section
review of strategy.

Key Points

Welcome new intern Sana. Brief introduction given about getting started.
Task assigned to Vanisha, to create a spreadsheet containing all intern names and their project details.

3. Review of 50 papers

discussion about false positives
BLIND assessment of papers
Types of papers we all came across- Scientific article/ Abstract only/ Review paper/ Case study, clinical trial or others.
For the miniproject, get 1000 papers and develop a classification scheme to divide papers into categories, so that readers know what type each paper is.

4. CLASSIFICATION:

For binary classification, we have to split the data set into "training", "testing" and "validation"
Find a tool like KNIME or Keras
Test your classifier and then improve your algorithm
Perform the classification for features (words, data, diagrams)
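A minimal sketch of the split step is given below. The 70/15/15 proportions, the fixed seed and the paper IDs are assumptions chosen for illustration; KNIME and Keras each have their own ways of partitioning data.

```python
import random


def split_corpus(paper_ids, train_pct=70, validation_pct=15, seed=42):
    """Shuffle the paper IDs reproducibly and split them into three disjoint sets.

    Whatever is left after training and validation becomes the testing set.
    """
    ids = list(paper_ids)
    random.Random(seed).shuffle(ids)
    n_train = len(ids) * train_pct // 100
    n_validation = len(ids) * validation_pct // 100
    return {
        "training": ids[:n_train],
        "validation": ids[n_train:n_train + n_validation],
        "testing": ids[n_train + n_validation:],
    }


# Hypothetical corpus of 100 paper IDs.
corpus = [f"paper{i:03d}" for i in range(100)]
parts = split_corpus(corpus)
for name, ids in parts.items():
    print(name, len(ids))
```

Fixing the seed means everyone who runs the script gets the same split, which makes results comparable between interns.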

Review of each intern's miniproject- goals, strategies, progress, queries.

6. Review of each intern's dictionary:

Create a dictionary of your miniproject.
create a communal dictionary (builtin)
Each dictionary should have "findability", "comprehensiveness", "syntax", "maintenance", "documentation"
Dictionaries can be created using 3 options: copy from an authority (such as ISO for country), copy from Wikidata using SPARQL, or use a list of terms.
Dictionaries can be found as: inbuilt, contentmine dictionary.
Every dictionary should have a wiki page (total 8), written by the intern, documenting how they created it.
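The "list of terms" option can be pictured with the sketch below, which turns a plain list into a small XML dictionary. This is not the amidict tool: the element and attribute names (dictionary, entry, title, term, name) are assumptions for illustration, and the terms are examples only.

```python
import xml.etree.ElementTree as ET


def build_dictionary(title, terms):
    """Build an XML dictionary element from a list of terms.

    Blank lines and case-insensitive duplicates are skipped.
    """
    root = ET.Element("dictionary", {"title": title})
    seen = set()
    for raw in terms:
        term = raw.strip()
        key = term.lower()
        if not term or key in seen:
            continue
        seen.add(key)
        # Assumed layout: lower-case search term plus the original display name.
        ET.SubElement(root, "entry", {"term": key, "name": term})
    return root


# Example list, as it might be read line by line from a text file.
terms = ["Zika virus", "Ebola virus", "", "zika virus "]
root = build_dictionary("virus", terms)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Entries made this way have no Wikidata or Wikipedia links yet, so they would still need the checks described for dictionary cleaning.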

7. SECTIONING:

Run by ami section (automatic); works on PMC papers. It converts JATS -> ami -> sections
3 sections of paper- front (bibliography, abstract, journal, title, author, DOI), body (intro, background, methods, experimental, discussion), back (funders, admin, ethics, references, citations, acknowledgment)
ami search annotates the body. We need to use a new approach called 'xpath', a way of navigating sections in a paper, e.g. front/article/title
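To make the xpath idea concrete, the sketch below navigates a tiny hand-written document that uses JATS-style element names. The document content is invented, and Python's built-in ElementTree supports only a subset of XPath; the exact paths inside sectioned ami output may differ.

```python
import xml.etree.ElementTree as ET

# Tiny invented document using JATS-style element names.
JATS = """<article>
  <front>
    <article-meta>
      <title-group><article-title>Zika virus in Brazil</article-title></title-group>
      <abstract><p>Short abstract.</p></abstract>
    </article-meta>
  </front>
  <body>
    <sec><title>Introduction</title><p>Background text.</p></sec>
    <sec><title>Methods</title><p>How it was done.</p></sec>
  </body>
  <back>
    <ack><p>Funded by an example agency.</p></ack>
  </back>
</article>"""

root = ET.fromstring(JATS)

# Paths are relative to the root <article> element.
title = root.findtext("front/article-meta/title-group/article-title")
section_titles = [t.text for t in root.findall("body/sec/title")]
acknowledgement = root.findtext("back/ack/p")

# A predicate selects the section whose <title> child has the text 'Methods'.
methods_text = root.find("body/sec[title='Methods']/p").text

print(title)
print(section_titles)
print(acknowledgement)
print(methods_text)
```

The same kind of path lets a search be restricted to one part of a paper, for example only the methods section or only the acknowledgements.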

8. Next tasks:

Vanisha to create spreadsheet
Charles and Sana to come up with an introduction for interns
Each intern to begin with building their dictionary and create a wiki page to document their methods and experience.
Each intern to update their miniproject wiki page: what did you do? What are your next steps? What are you blocked on?
Perform Binary classification using tools which you prefer
Create your corpora of 950 papers

Meeting Record 9

Date

29th June 2020

Participants

PMR+P+K+Rajan+Ambreen+Vaishali+Vanisha+Sana

Agenda

Allocation of regular responsibilities
record of last meeting
Standup by each intern
dictionaries. Please report on 7 public facing dictionaries, especially the core 5 (which INYAS will use) . country, disease, drugs, viruses, funders.
noncov50 dataset. Report any ongoing problems
miniprojects . Please report public facing project pages.
brief review of workflow
brief review of sectioning (PMR)
review of strategy.
multilingual dictionaries (Hindi?)

Key Points

Preparing introduction for INYAS interns (getting started)
To create wiki tool of ami dict for creating dictionaries
To create SPARQL wiki tool for extracting wikidata search attributes
Discussion on false positives and how they can be classified during binary classification
Review of each intern's dictionary. Each one to create their dictionary's own wiki page. Assigned Ambreen as maintainer of index of dictionaries.
Review of each intern's miniproject wiki page. Display your corpus of 950 articles on github. (to be continued.)

Meeting Record 10

Date

2nd July 2020

Participants

PMR+P+K+Rajan+Ambreen+Vaishali+Vanisha+Sana+INYAS interns

Agenda

Allocation of regular responsibilities
record of last meeting
Standup by each intern
welcome to INYAS interns. Review of induction and any problems. Getting_started materials
review of miniprojects, especially so INYAS can appreciate their roles.
review of dictionaries. INYAS can immediately have a role in checking dictionaries.
problems and debugging. (out of memory error)

Key Points

Welcome to INYAS interns: Pooja, Urja, Simranleen, Dheeraj and Jitu. Brief introduction by each intern and getting started.
Discussion on creating dictionary using a text file containing list of terms. Explained by Ambreen and PMR.
Allocating INYAS interns to their miniprojects.
Review of PMC papers to explain sectioning and the output xml files to the interns.

Meeting Record 11

Date

6th July 2020 Monday

HACKATHON over slack #coordination
Review of all miniprojects and progress made by mentors and their mentees

Summary

each INYAS student and their mentor should discuss how to create a single page, in Markdown, for Thursday which reports the work to their classmates. The INYAS student should create this but ask for help whenever they need it. It can address:
what is the aim of the miniproject?
what resources are you using (don't just give a list; try to write something they would understand).
what has been done so far (again in terms they would understand)
mentors now each have a miniproject and a dictionary (possibly two). These should all have the same format and be organized in a consistent directory structure. Work between yourselves to ensure this (i.e. look at each others' miniprojects and dictionaries).

Meeting Record 12

Date

9th July 2020 Thursday

Participants

PMR,Priya,Kareena,Rajan,Ambreen,Sana,Vaishali,Vanisha,Charles,INYAS interns,GY

Agenda

Record of last hackathon
Standup by each intern
INYAS presentations: 1-page summary of the project (by INYAS intern)
summary of progress (by mentor) on (miniproject page)

Priorities

KNIME
ami section/search/amidict
machine-learning

Key Points

Allocation of regular responsibilities: Kareena (records and reports), Priya (software management), Rajan (technical management), Ambreen (miniproject management), Vaishali (dictionary management), Sana (managing and coordinating INYAS interns), Vanisha (managing mini hackathon), Charles (???)
Presentation by each INYAS intern on Monday, presenting their review in such a way that their classmates are able to understand it.
Rajan to coordinate with people using different OS (Windows10, Mac/Unix, Windows7, Mobile etc). INYAS interns Dheeraj and Om to create a wiki page summarizing the mobile properties (Github and Slack)
Review of each miniproject (update your project pages with progress made, Create pages for tools which you use)
Everyone to commit their miniproject data on Github
Everyone to identify true negatives manually (papers not about viral epidemics)
Update your ami problems on PMR TODO for PMR to take action.

Meeting Record 13

Date

13th July 2020 Monday

Participants

PMR,Priya,Kareena,Rajan,Ambreen,Sana,Vaishali,Vanisha,Charles,INYAS students (6),Clyde

Agenda

Allocation of regular responsibilities: Kareena (records and reports), Priya (software management), Rajan (technical management), Ambreen (miniproject management), Vaishali (dictionary management), Sana (managing and coordinating INYAS interns), Vanisha (managing mini hackathon), Charles (???)
Record of last meeting
Standup by each intern

Key Points

Review of all dictionaries by core miniproject owners given together here https://github.com/petermr/openVirus/tree/master/dictionaries/test
Review of all miniprojects and the progress made (release of dictionary, release of corpus950, release of full.data.Tables using amisearch )
Discussion on bringing up a communal project for all INYAS students to create a new dictionary of Indian geo-names (states, cities, etc.). This will allow us to pinpoint papers describing viral epidemics in Indian regions. Vanisha and Sana will coordinate this. The resulting dictionary will have permanent value as it can support a wide range of projects (e.g. TIGR2ESS crops, climate change, etc.). INYAS students will also continue to be associated with their core project.

Meeting Record 14

Date

16th July 2020 Thursday

Participants

PMR,Priya,Kareena,Rajan,Sana,Vaishali,Vanisha,Charles,INYAS interns

Key Points

Allocation of regular responsibilities: Kareena (records and reports), Priya (software management), Rajan (technical management), Ambreen (miniproject management), Vaishali (dictionary management), Sana (managing and coordinating INYAS interns), Vanisha (managing mini hackathon), Charles (???)
Record of last meeting
Standup by each core intern + INYAS intern
Discussion on Wikidata volunteers, for starting communication with Wikipedia people and editing pages (that means getting a Wikipedia name)
Dheeraj, Jitu and Om - to create a wiki page on using software on mobile. Priya to coordinate with them.
SPARQL and Wikidata issues (if any) and progress made by interns
Use of wikibase language and label
Sparql tutorial to be created by all the interns who have used it.
any other technical issues faced by anyone

Midweek update (debug)

For all those who faced the empty co-occurrence bug after running ami search on a corpus: re-run git pull and re-install ami to get the desired co-occurrence output.

Meeting Record 15

Date

20th July 2020 Monday

Participants

PMR,Priya,Ambreen,Vaishali,Charles,Sana,INYAS interns(Urja,Dheeraj,Pooja,Simranleen)

Key points

Review of each mini-project by the interns => Project review (similar to Code review). The other attendees critiqued the wiki and the presentation.
Ambreen and PMR explained the importance and use of Smoke test and ML technique in mini-projects.
Vaishali and Priya did a smoke test for KNIME. Ambreen is pursuing an ML technique for her mini-project.
A separate project for the INYAS interns was called off; they were told to continue working with their mentors on their mini-projects.
Standups given by INYAS interns. They have evolved into "middle management" and the beta-testers in their mini-projects.
The issues regarding dictionaries were reported.
Needs for additional tools, especially links between (AMI, AMIDict) <---> toolBox (Jupyter, R, KNIME), will be submitted as feature requests in Issues.
The presentations are being considered for release as short video clips as part of the output.

Meeting Record 16

Date

23rd July 2020 Thursday

Participants

PMR,GY,Priya,Kareena,Rajan,Ambreen,Sana,Vaishali,Vanisha,INYAS interns(Dheeraj,Pooja,Jitu,Simranleen)

Key Points

Brief project review by PMR to GY regarding the progress made and upcoming tasks. Each intern to record a 2 minute clip on his/her miniproject and their learning experiences after they joined openVirus. To describe the work they did in this project and how it can help the world. Planning to conduct a live video con session on youtube.
Review by each core intern and INYAS intern about their experiences and ideas, if any
Discussion and solving ami issues faced by few
Discussion on SPARQL queries and downloading the .xml file

Meeting Record 17

Date

27th July 2020 Monday

Participants

PMR,Priya,Kareena,Rajan,Sana,Ambreen,Vaishali,Vanisha,Charles,INYAS interns (Dheeraj,Pooja,Urja,Jitu)

Agenda

Review of the 5 main projects:

country
disease
drug
funder
virus

Please be prepared to report:

analysis of corpus950 (or smaller)
manual classification
creation of dictionary
machine learning
notebooks

DICTIONARY

We should now put our dictionaries in one place, separate from ami3, and check out regularly. I have started this. It has a semi-structured directory of dictionaries in a repository, a symbolic reference that AMI can use, and (maybe) referencing of dictionaries through URLs. NOTE: I haven't finished mapping ami names to SPARQL names. We should discuss having default names in SPARQL. Also we should review progress on terms in Hindi, Tamil and other languages. We should now converge on the essential parts of a SPARQL query.

Key Points

Allocation of regular responsibilities
Standup by everyone
Review of five main mini projects PROGRESS AND PLANNING - country (Ambreen), disease (Priya), drug (Rajan), funder (Vaishali), virus (Kareena) followed by zoonosis (Sana), testing and tracing (Vanisha)
People to report on issues faced during uploading corpus or in getpapers so that PMR can fix it
Check whether everybody is able to use the new ami release
Review of each dictionary: whether each contains name, term, Wikidata ID, Wikidata label, description, Wikipedia URL. Specific entries include ISO3166 code (country), ICD10 code (disease), CrossRef ID (funder), ICTV virus ID (virus)
Review from each INYAS student about their learning experience, work, progress and any blockers; work on mobile for Jitu, Dheeraj and Om
Discussion on open access projects. Arianna Becerril Garria from Mexico.
Final video clips to be created by each intern and submitted to GY for review by the date 7th Aug
Discussion on machine learning tools and NLP. Hindi part of speech (POS) tagging to sentences
Extended discussion on dictionaries, editing the wikipage dict schema, different terms and elements of wikidata
ami commands to create a wikisparql dictionary

Meeting Record 18

Date

30th July 2020

Participants

PMR,GY,Priya,Kareena,Rajan,Ambreen,Vaishali,Vanisha,Charles,INYAS interns (Urja,Pooja,Dheeraj,Simranleen)

Key Points

Last official meeting with the INYAS students as they complete their four-week internship. Review by each INYAS student about their learning experiences in this project. GY told them that they can informally continue with their work and attend future meetings if they wish.
All interns to prepare for the live videocon meeting to be streamed on youtube on 6th August 2020 Thursday.
All interns to prepare a 2 minute video clip about their own experiences and review of the project followed by their work. (all videos to be compiled together and directed by Simranleen)
Discussion by PMR on Open Access, what scientists do and why, how search engines work, other resources such as Redalyc/MX, India and Indonesia rxiv, theses in repositories, data scrape/clean and lots more. We are iterating in the design <-> prototype <-> deployment chain. We have advanced designs for dictionaries and sectioned documents. We have built prototypes and are testing them. This means a small amount of redesign. We now try to share all development on the wiki. Interns to put queries in on the wiki and everyone can comment. This will be particularly important for NLP and machine learning. Most of the NLP and ML tasks can be supported by packages and libraries.

Meeting Record 19

Date

6th August 2020 Thursday

Participants

PMR,GY,Priya,Kareena,Rajan,Ambreen,Vaishali,Vanisha,Sana,Charles,Dheeraj

Key Points

Update on dictionaries by each intern (use of wikisparql) (amidict update). To create dictionaries containing synonyms of terms
LIVE streaming on youtube in the coming week: 1 minute intro/slides. Github wiki to be prepared by the interns to give an overview of their miniproject (INYAS youtube channel)
Discussion on retrieval of material/information from pre-existing literature. To clean and annotate the data using tools to analyse and display it, and later publish
Discussion on creating sparql queries for languages other than English: MULTILINGUALITY (taken up by Rajan in Tamil)
Everyone to re-run ami for updates
Discussion on creating a sparql dict that includes AltLabel and synonyms.
Overview by PMR on machine learning tools and progress
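
A SPARQL query that pulls synonyms (altLabel) in a chosen language could be generated along the following lines. This is only a sketch, not the query the team used: the function name build_synonym_query is hypothetical, and the class ID Q12136 (disease) is an example input. The prefixes wdt:, wd:, rdfs: and skos: are predefined on the Wikidata Query Service.

```python
def build_synonym_query(class_qid, language="en", limit=50):
    """Return a SPARQL query for labels and aliases of instances of class_qid.

    The function name and parameters are illustrative, not part of ami.
    """
    return f"""
SELECT ?item ?itemLabel ?altLabel WHERE {{
  ?item wdt:P31 wd:{class_qid} .
  ?item rdfs:label ?itemLabel .
  FILTER(LANG(?itemLabel) = "{language}")
  OPTIONAL {{
    ?item skos:altLabel ?altLabel .
    FILTER(LANG(?altLabel) = "{language}")
  }}
}}
LIMIT {limit}
""".strip()


# Example: Tamil labels and synonyms for instances of "disease" (Q12136).
query_ta = build_synonym_query("Q12136", language="ta", limit=10)
print(query_ta)
```

Changing the language argument is all that is needed to ask for another language, which is one simple way to approach multilinguality.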

Meeting Record 20

Date

10th August 2020 Monday

Participants

PMR, Priya, Kareena, Rajan, Ambreen, Vaishali, Vanisha, Sana, Charles, Dheeraj

Key Points

Running different SPARQL queries to create a dictionary (SPARQL to AMI)
UPDATE by each intern on miniproject (Google workbook created, taken up by Ambreen) "Our Progress so far" https://docs.google.com/spreadsheets/d/1DI3sJnLq7MntJElah-xD4crHVEF-gLpkAL_-Qp35qx0/edit#gid=0
Brief explanation of machine learning tools given by Ambreen
Discussion of data analysis tools by PMR
Discussion on update, progress, usage, blockers in ami section and ami summary

Meeting Record 21 (YOUTUBE LIVE STREAMING OF THE MEETING)

Date

12th August 2020 Wednesday

Participants (16)

PMR, GY, Priya, Kareena, Rajan, Ambreen, Vaishali, Vanisha, Sana, Charles, Dheeraj, Jitu Ram, Urja, Pooja, Om Prakash, Simranleen

Key Points

First live meeting session conducted by the openVirus team (INYAS-KARYA), including the INYAS interns, streamed on the INYAS YouTube channel: https://www.youtube.com/watch?v=XiTngk-POm8

Meeting Record 22

Date

17th August 2020 Monday

Participants

PMR, GY, Priya, Kareena, Rajan, Ambreen, Vaishali, Vanisha, Charles, Sana

Key Points

Updates on status
Standup
Debugging of dictionaries
Debugging of search
Progress on machine learning.
Invitation from COAR for 10th Sept 2020 (shared by PMR; everyone is welcome to attend, Ambreen to present)
See https://github.com/petermr/openVirus/wiki/Presentation-COAR

Meeting Record 23

Date

20th August 2020 Thursday

Participants

PMR, GY, Priya, Kareena, Rajan, Ambreen, Vaishali, Vanisha, Charles

Key Points

Standups
A brief review of internships and newcomers
Ideas from interns about the COAR presentation
Movie status to be updated by Simranleen soon
Technical issues that affect more than one person
Interns facing issues with ami search: use minicorpus10 for tutorials (testing the dictionary against a small corpus)
Discussed the ami summary tool
Discussed the importance of Open Science
PMR: Idea of adding tooltips to ami search tables in different languages.
Ambreen to present a workshop on Jupyter Notebook in the next meeting.
Thanks to Dheeraj for adding the concept of multilinguality to the dictionaries and for staying with us.

Meeting Record 24

Date

24th August 2020 Monday

Participants

PMR, Priya, Kareena, Rajan, Ambreen, Vaishali, Vanisha, Charles, Dheeraj

Key Points

Review by each intern: how helpful is the project corpus? Any useful information to mention?
ami update: --delete command to clear the edits (rather than clearing them manually)
Representing the results/info graphically, e.g. a logo for the name of a funder; use of statistics to display data
Important/commonest terms found in the corpus or dictionaries: SUMMARIZE these in pictorial form (use Wikipedia)
ami problems:

install ami
run acceptance tests
try tutorial for standard dictionaries
create own dict and validate
run standard corpus against standard dict
run standard corpus against own dict
Put in automated validation: ami search -> results.xml (test that results.xml exists; it is used by cooccurrence and data tables)
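
The automated check that results.xml exists could look roughly like the sketch below. The directory layout (article directory / results / search / dictionary name / results.xml) is an assumption made for illustration and should be checked against a real ami run; the function name missing_results and the PMC directory names are hypothetical.

```python
from pathlib import Path
import tempfile


def missing_results(cproject, dictionary):
    """Return names of article directories that lack results.xml for a dictionary.

    Assumed (not verified) layout:
    <cproject>/<article>/results/search/<dictionary>/results.xml
    """
    missing = []
    for ctree in sorted(p for p in Path(cproject).iterdir() if p.is_dir()):
        results = ctree / "results" / "search" / dictionary / "results.xml"
        if not results.is_file():
            missing.append(ctree.name)
    return missing


# Demonstration on a throwaway project: one article searched, one not.
with tempfile.TemporaryDirectory() as tmp:
    done = Path(tmp) / "PMC0000001" / "results" / "search" / "country"
    done.mkdir(parents=True)
    (done / "results.xml").write_text("<results/>", encoding="utf-8")
    (Path(tmp) / "PMC0000002").mkdir()
    demo_missing = missing_results(tmp, "country")

print(demo_missing)
```

An empty list means every article has a results.xml for that dictionary, so later steps such as cooccurrence and data tables have the input they need.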

Screen sharing by Ambreen: how to execute code in Jupyter (code, functions, cleantext, libraries).

Classify data in Excel (CSV)

Meeting Record 25

Date

27th August 2020 Thursday

Participants

PMR, GY, Priya, Kareena, Rajan, Ambreen, Vaishali, Vanisha, Charles

Key Points

Catchup by all interns and review of each miniproject
Discussion on COAR Presentation by Ambreen (reviews by others on ppt)
Discussion by PMR on the Cambridge Hackathon (online hackathon on genomics, bioinformatics, databases). Project: Cambridge-India "openVirus"
Put together software, databases and dictionaries so that people can see them in a virtual environment (virtual sensors)
PMR debugged people's problems using screen sharing

Meeting Record 26

Date

7th September 2020 Monday

Participants

PMR, Priya, Kareena, Rajan, Ambreen, Vaishali, Vanisha, Charles, Dheeraj, Anugrah, Shweata

Key Points

Welcome to new interns Anugrah and Shweata
Standup by interns (those who are present)
Review of the COAR presentation (10th) given by Ambreen. Each intern to register.
Changes made in COAR ppt
Debugging people's problems and resolving issues

Meeting Record 27

Date: 12th Oct. 2020

Participants:

PMR, Rajan, Vanisha, Shweata, Ambreen, Anugrah, Ayush, Mukul, Dheeraj

Key points:

Future directions for the project: labelling dictionaries with associated concepts (related terms, broader terms) and finding the main subject in relevant papers.
Preprint; introduction of Hypergraph.
Review of Ambreen's Jupyter Notebook (ML).
New members were allocated mini-projects. Ayush to work with Ambreen on Countries, and Mukul to work with Kareena on Virus.
The openVirus repository is becoming very large. A new GitHub repository will be dedicated to dictionaries and mini-corpora.

Meeting Record 28

Date: 19th Oct. 2020

Participants: PMR, GY, Rajan, Shweata, Ambreen, Ayush, Vanisha, Vaishali, Priya, Dheeraj

Key points

New Dictionary Manager - Rajan
New repository exclusively for dictionaries
Ayush went over the code he had written to display frequency from results.xml
Enhance the dashboard to include links to Wikidata, and make it multilingual.
Discussed the rough outline for Wikicite presentation. More information can be found here
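
Counting term frequencies from a results.xml file could be sketched as below. This is not Ayush's code: the sample XML, the result element and its exact attribute are assumptions about the file format made for illustration, and the function name term_frequencies is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample; check element and attribute names against a real results.xml.
SAMPLE_RESULTS = """<results title="country">
  <result pre="cases reported in " exact="India" post=" rose in June"/>
  <result pre="travellers from " exact="Brazil" post=" were tested"/>
  <result pre="lockdown across " exact="India" post=" was extended"/>
</results>"""


def term_frequencies(xml_text, attribute="exact"):
    """Count how often each matched term occurs among the <result> elements."""
    root = ET.fromstring(xml_text)
    return Counter(
        result.get(attribute)
        for result in root.iter("result")
        if result.get(attribute)
    )


freqs = term_frequencies(SAMPLE_RESULTS)
for term, count in freqs.most_common():
    print(term, count)
```

The resulting counts can be passed straight to a plotting library or a dashboard table.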

Meeting Record

Date: 2nd November 2020

Participants:

PMR, GY, Shweata, Aishwarya, Ambreen, Ayush, Vanisha, Vaishali, Dheeraj, Kareena, Rajan, Anugrah

Key Points:

Discussed the new and exciting directions for the project. Work on the mini-projects will continue. Alongside that, the project will also gain a Plant Science component in the future.
We will also have new software projects:

getpapers in Python
ami-search in Python
ami-words in Python
display in Python
containerisation using Docker
Dictionary testing

Each of these software projects will have an issue, and it will have to:

collect mini teams
specify goals in detail
propose an architecture
build proof-of-concept (PoC)
Test-driven development (TDD)

Briefly went over the test Jupyter Notebook that PMR had written. We then reviewed various libraries useful for our text-mining purposes. Link to the notebook discussed can be found here

Meeting Record

Date: 5th Nov. 2020

Participants

PMR, Ayush, Dheeraj, Vanisha, Vaishali, Shweata, Mukul, Rajan

Key Points

Getting to know each other's computational backgrounds.
New Repository for development purposes

Technical Discussion

Algorithms + Data Structures (ami dictionaries and CProject, in our case) = Programs
JATS (Journal Article Tag Suite); the link is to the Journal Archiving and Interchange Tag Library: https://jats.nlm.nih.gov/archiving/tag-library/1.3d1/element/arc-elem-sec-intro.html
What problems do we run into when we use terms (given by the dictionary) to search the papers?
- Synonyms: that's where Wikidata is helpful
- Not knowing the context in which the terms are used
- General concepts (like 'illness', or anything that can't be represented by a term) can't be retrieved from papers easily
EPMC (the source of our data, in JATS) -> clean, classify -> text mining
Text-mining tools in Python: nltk, textblob, glob
Went over PMR's Jupyter Notebook. https://github.com/petermr/ami3/blob/master/src/ipynb/text.ipynb
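
The synonym and context problems listed above can be illustrated with a small dictionary-based search. This is a minimal sketch using only the standard re module, not the ami search implementation; the function name search_terms, the sample text and the one-entry dictionary are hypothetical.

```python
import re


def search_terms(text, dictionary, window=20):
    """Find each term or synonym in text.

    Returns (canonical term, matched string, surrounding context) tuples, so a
    synonym hit is still counted under its canonical term.
    """
    hits = []
    for term, synonyms in dictionary.items():
        for name in [term] + synonyms:
            pattern = r"\b" + re.escape(name) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                start = max(0, match.start() - window)
                context = text[start:match.end() + window]
                hits.append((term, match.group(0), context))
    return hits


text = "Cases of COVID-19 rose sharply. The novel coronavirus disease spread fast."
dictionary = {"COVID-19": ["novel coronavirus disease"]}

hits = search_terms(text, dictionary)
for canonical, matched, context in hits:
    print(f"{canonical!r} matched as {matched!r} in: ...{context}...")
```

Keeping a window of surrounding text does not solve the context problem, but it lets a reader judge how each term was used.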

Meeting Record

Date: 09th Nov. 2020

Participants: PMR, GY, Anugrah, Ayush, Dheeraj, Shweata

Key Points

Presentation (MPhil Computational Biology Seminar Series) on Wednesday. PMR, Ambreen and Shweata to present.
Discussed the current status of each member's project

Technical Discussions

https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
https://docs.python.org/3/howto/unicode.html
Python has classes. Classes have attributes and methods.
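
A minimal example of a class with attributes and methods follows. The Entry class is hypothetical and was written for illustration only; it is not part of ami or of the project's code.

```python
class Entry:
    """One dictionary entry: a term plus optional synonyms (illustrative only)."""

    def __init__(self, term, synonyms=None):
        self.term = term                      # attribute
        self.synonyms = list(synonyms or [])  # attribute

    def all_names(self):
        """Method: return the term followed by its synonyms."""
        return [self.term] + self.synonyms

    def matches(self, word):
        """Method: case-insensitive test against the term and its synonyms."""
        return word.lower() in (name.lower() for name in self.all_names())


entry = Entry("COVID-19", ["coronavirus disease 2019"])
print(entry.all_names())
print(entry.matches("covid-19"))
```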

Meeting Record:

Date: 12th Nov. 2020

Participants: PMR, Aishwarya, Ayush, Dheeraj, Mukul, Rajan, Shweata

Key Points:

Everybody shared their ideas for new software tools and described each in a sentence. The following ideas came up.

New software tools proposed by each member:

Words tool (Keywords and Stopwords) - Ayush and Aishwarya
Dictionary Editor - Shweata
Sections Search - PMR
Summarizer - Mukul
Wikipedia links - Rajan
Enhanced Display - Dheeraj

Dictionary Editor, Sections Search and Words tool are the most relevant to the current requirements of the project.
We decided to work on the Dictionary Editor first.

Dictionary Editor: The members present at the meeting came up with the list below of items that the Dictionary Editor needs to cover.
- Remove unnecessary terms
- Duplicate terms
- Heteronyms
- Collaborative editor
- Context
- Versioning system

Tasks:

Ayush: To find more about Version Control in GitHub
Unit Test

Meeting Record

Date: 16th Nov. 2020

Participants: PMR, Ayush, Aishwarya, Anugrah, Dheeraj, Shweata

Key Points

Dictionary Editor- Opened an issue
Review of what a Dictionary is
- Dictionaries are in XML format
- The root element is dictionary, and it must have a title. It contains a number of entry elements. Each entry element can have a large number of attributes.
- Synonyms are child elements under entry
update.ipynb: a Jupyter Notebook to validate the dictionaries.
What we hope to do is to validate our dictionary against the openVirus schema in a Jupyter Notebook
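
The structure reviewed above can be checked with a few lines of Python. This is a sketch of the rules stated in these notes only (root dictionary element with a title, entry children, synonym children); it is neither the openVirus schema nor the contents of update.ipynb. The attribute names term, name and wikidataID in the sample are assumptions, and validate_dictionary is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample dictionary; attribute names are assumptions.
GOOD = """<dictionary title="country">
  <entry term="India" name="India" wikidataID="Q668">
    <synonym>Bharat</synonym>
  </entry>
</dictionary>"""

BAD = """<dictionary>
  <entry name="India"><synonym> </synonym></entry>
</dictionary>"""


def validate_dictionary(xml_text):
    """Return a list of problems found; an empty list means the checks passed."""
    errors = []
    root = ET.fromstring(xml_text)
    if root.tag != "dictionary":
        errors.append(f"root element is <{root.tag}>, expected <dictionary>")
    if not root.get("title"):
        errors.append("root element has no title attribute")
    entries = root.findall("entry")
    if not entries:
        errors.append("dictionary has no entry elements")
    for number, entry in enumerate(entries, start=1):
        if not entry.get("term"):
            errors.append(f"entry {number} has no term attribute")
        for synonym in entry.findall("synonym"):
            if not (synonym.text or "").strip():
                errors.append(f"entry {number} has an empty synonym")
    return errors


good_errors = validate_dictionary(GOOD)
bad_errors = validate_dictionary(BAD)
print(good_errors)
print(bad_errors)
```

Returning a list of messages rather than stopping at the first problem lets a dictionary author fix everything in one pass.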

Meeting Record

Participants: PMR, Shweata, Rajan, Ambreen, Vanisha, Dheeraj, Anugrah

Date: 19th Nov. 2020

Key Points

Found out where the latest dictionaries are and moved them to the dictionary repository.
We then checked and validated those dictionaries using the Jupyter Notebook Peter had written (available in our dictionary repository).

Meeting Record

Participants: PMR, Ayush, Vanisha, Dheeraj, Anugrah, Aishwarya

Date: 23rd Nov. 2020

Key Points:

Dictionary specification: PMR created an issue https://github.com/petermr/dictionary/issues/2
What a dictionary contains- Dictionary elements: Attributes, child elements, entry, etc.
How software is developed in practice: how "customers" provide formal requirements and the implementer creates and tests the code.
A brief discussion on Regular expressions (RegExp)
Miniprojects updates:
- Aishwarya to work with Dheeraj on the miniproject: "Diseases".
- Ayush to work with Vanisha on "Test and trace".
- Anugrah to work on "Non-pharmaceutical interventions".

Meeting Record

Participants: PMR, Ambreen, Dheeraj, Shweata, Vanisha, Anugrah

Date: 30th Nov. 2020

Key points:

Discussed the recent problem that we encountered with a SPARQL query, as reported by Dheeraj. More info here: https://www.wikidata.org/wiki/Wikidata:Request_a_query#Re-running_queries_on_earlier_versions_of_Wikidata
We should record our Wikidata Queries so that we don't encounter similar problems.
Reviewed Test and Trace dictionary created by Vanisha. Synonyms and language equivalents need to be added.
Reviewed Country, Organisation and Disease dictionary as well.
Related items need to be added in the organisation dictionary.
PMR raised several questions about Wikidata to the Wikimedia community.
Communal Tasks:
- Retrieve entries for the list of Q Ids to add synonyms, language equivalents, etc.
Ambreen to draft a list of ancillary files for creating and maintaining dictionaries.
Examples:
- Jupyter Notebook
- Markdown (MD) file explaining the files, names and purposes (converge on a communal naming scheme)
- SPARQL query
- SPARQL-XML output
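
For the communal task of retrieving entries for a list of Q IDs, one option is the Wikidata wbgetentities API action, which returns labels and aliases (synonyms) in the requested languages. The sketch below only builds the request URL so that it runs without network access; the helper name entities_url is hypothetical, and the two Q IDs are examples (Q668 is India, Q30 is the United States).

```python
from urllib.parse import urlencode, urlparse, parse_qs

API = "https://www.wikidata.org/w/api.php"


def entities_url(qids, languages=("en",)):
    """Build a wbgetentities URL asking for labels and aliases of the given Q IDs."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(qids),            # multiple IDs are separated by "|"
        "props": "labels|aliases",
        "languages": "|".join(languages),
        "format": "json",
    }
    return API + "?" + urlencode(params)


url = entities_url(["Q668", "Q30"], languages=("en", "hi", "ta"))
print(url)

# Decode the URL again to show what the server would receive.
sent = parse_qs(urlparse(url).query)
print(sent["ids"], sent["languages"])
```

The URL can then be fetched with any HTTP client, and the JSON response holds the labels and aliases per language for each ID.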

Meeting Record

Participants: PMR, Shweata, Dheeraj, Vanisha, Matthew Dunstan, Ambreen

Date: 3rd Dec. 2020

Key points:

We had a chat about the importance of AI in science. We are building some of the foundations of this revolution. We also discussed DeepMind's recent breakthrough on the protein-folding problem.
People were added to the dictionary repository.
github.com/petermr/dictionary/issues/3 We now have a way to save all our queries (with the help of a RESTful URL) in the dictionary itself. See the comments on this issue to know more.
We were joined by Matthew Dunstan today. Peter demonstrated the progress on the battery project so far.
We are starting to come up with a unified Dictionary Naming Scheme. Follow the link to know more. https://github.com/petermr/dictionary/wiki/Dictionary:-Naming-Scheme
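
Turning a SPARQL query into a single URL, and recovering it again, could work as sketched below. The endpoint https://query.wikidata.org/sparql is the public Wikidata Query Service; how the URL is actually stored in the dictionary is described in the issue above and is not reproduced here. The function names query_to_url and url_to_query are hypothetical.

```python
from urllib.parse import quote, urlsplit, parse_qs

ENDPOINT = "https://query.wikidata.org/sparql"


def query_to_url(sparql):
    """Percent-encode a SPARQL query into a URL that can be saved and re-run."""
    return f"{ENDPOINT}?query={quote(sparql, safe='')}"


def url_to_query(url):
    """Recover the original SPARQL text from a URL built by query_to_url."""
    return parse_qs(urlsplit(url).query)["query"][0]


sparql = 'SELECT ?item WHERE {\n  ?item wdt:P31 wd:Q12136 .\n}\nLIMIT 5'
saved_url = query_to_url(sparql)
print(saved_url)
print(url_to_query(saved_url) == sparql)
```

Because the encoding is reversible, the saved URL documents exactly which query produced a dictionary.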

Meeting Record

Date: 7th Dec. 2020

Participants: PMR, Ambreen, Aishwarya, Dheeraj, Shweata, Ayush

Key Points:

One person running a project is always fragile. Communal projects are the way to go.
Mini-Tech Project:
- Dictionary-based search
- Updating dictionaries
Divide ourselves into groups to work on the mini-tech project.

Updating Dictionary	Dictionary-Based Search
Shweata	Aishwarya
Dheeraj	Ayush
Ambreen	Anugrah
Rajan	Vanisha
New PlantScience Intern	Vaishali

Ayush, Ambreen: Tech Leads
Aishwarya, Shweata: Project Managers (keeping records and keeping things up to date: are the unit tests* passing? Tutorials? Where is everyone at right now? and so on)
*Unit tests are important: they are automated tests on an application to ensure that it meets the intended design.
alpha testing -> preliminary tests of the software.
beta testing -> find and report errors. The errors could involve keyboards, characters, time zones, dates or the file system.
REPORT PROBLEM. Don't try to fix it yourself.
Wiki pages for each of our two tech projects
- https://github.com/petermr/dictionary/wiki/Project:-update-dictionary
- https://github.com/petermr/dictionary/wiki/Project:-dictionary-based-search
Requirements for Search Mini-Project:
- a way of determining the type of an article (scientific article, review, comment, editorial ...).
- a way of identifying and (re)naming sections

Meeting Record

Date: 10th Dec. 2020

Participants: PMR, Ambreen, Anugrah, Aishwarya, Dheeraj, Shweata

Key Points:

PMR: We need to agree on data and not the code.
PMR suggested the use of PyCharm for writing code
Ambreen demonstrated Smart Sheep Breeder (Decision Support System developed by her), received feedback.
All new topics will be discussed in discussions rather than on Slack.
Discussed the use of unittests and PMR demonstrated with an example (test code)
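
A small unit test written with Python's standard unittest module is sketched below. It is not the example PMR demonstrated; the function normalise_term and its tests are hypothetical and serve only to show the pattern.

```python
import unittest


def normalise_term(term):
    """Lower-case a term and collapse runs of whitespace (illustrative helper)."""
    return " ".join(term.split()).lower()


class TestNormaliseTerm(unittest.TestCase):
    def test_collapses_whitespace_and_case(self):
        self.assertEqual(normalise_term("  Viral   Epidemic "), "viral epidemic")

    def test_empty_string(self):
        self.assertEqual(normalise_term(""), "")

    def test_is_idempotent(self):
        once = normalise_term("Test  AND Trace")
        self.assertEqual(normalise_term(once), once)


# Run the tests explicitly so the example also works inside a notebook.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNormaliseTerm)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all passed:", result.wasSuccessful())
```

In a project laid out as files rather than a notebook, the same test class would normally live in its own test module and be run with python -m unittest.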

Meeting Record

Date: 14th Dec. 2020

Participants: PMR, Ayush, Dheeraj, Shweata, Anugrah, Ambreen

Key Points:

Jupyter Notebook, though useful on several fronts, isn't scalable.
https://dev.to/codemouse92/series/290 -> We will follow the structures given in this series.
Structuring projects is really important and often not talked about.
PMR created a new project in dictionary repository.
unittest

Records for the meetings are now moved to https://github.com/petermr/dictionary/wiki/Records-of-Meetings
All further meetings are recorded in the dictionary repository.