* TODO file for ROSE    -*- outline -*-

Tell us if you feel like volunteering for any of these ideas, listed more
or less in decreasing order of priority. Some TODO items are implicit from
received email. Significant contributions require written assignments and
disclaimers.

-----------------
June 27, 2008
Clean up the ROSE source tree for the public release

SciDAC repository:
   ROSE (core)
   EDG Binaries
   Open Fortran Parser jar file

Local repository:
   EDG source

Testsuite: POP, OMP tests, developerscratch, Python example test, ROSEHPCT tests

Old projects:
   SimpleCallGraph
   CompassDist
   BinaryContextlookup
   DatalogAnalysis
   MOPS
   A++1999/2003
   OMPPreprocessor

Sub directories:
   keep (Y)
      GNU_HEADERS    Y
      qmsh           Y
      Papers/talks   Y
      TAU headers    Y
      A++            Y
      P++            Y
      OvertureCode   Y
      SLA            Y
      MSTL           Y
   remove (N)
      OpenAnalysis   N
      Open64         N
      TXT2HTML       N
      aterm_bundle   N
      proposals      N
      Boost test     N
      mpich          N
      MySQL          N
      COCO           N
   undecided
      checkpoint Lib ?
      PDFlib         ?
-----------------

DONE * Multidimensional support is not finished yet (mostly there)
     Could be made less dependent upon array objects (?)

DONE * Better unparser (in development by Gary Lee)

WORKING * Better support to automatic code generation (for grammar implementations)

* Need to implement the new transformation specification mechanism.

* Need more sophisticated examples.

DONE * Need to build Sage II in such a way that we can better maintain it and make it
     portable.

DONE (see below) * Need a way to display the AST (currently we can print out the AST
     associated with the sage representation) but we have no mechanism to print out
     what is in the EDG representation. This would be helpful.

* Need to consider the requirements of the SGI FORTRAN 95 open source front-end.
     I have read the WHIRL Intermediate Language Specification and this appears to
     be possible.

* Require an inliner mechanism for the transformation support. Great student project!

* Reference counting of the SAGE classes has not been implemented
     The referenceCount data member is there now, but it needs to be incremented and
     decremented to record references. The new and delete operators have not been
     implemented either (though this is not required for reference counting). This is
     not a serious problem, except that it takes purify a long time to process the
     memory leaks it reports (as a result I have turned off this test in purify
     (using -leaks-at-exit=no -inuse-at-exit=no)).

* It would be good to have some simple and moderately sophisticated tree traversal
  mechanisms within ROSE.
     Sage originally provided one, but it was difficult to use for anything
     meaningful. I don't know if we could design a better one, or perhaps the Sage
     version would be fine for limited sorts of operations. Using a flag that is
     reset but records whether graph vertices have been processed would be one
     approach. Likely a common technique.

     Reasons for traversing the program tree:
       1) display the program tree (this is problematic since this can amount to
          endless recursion, because the program tree (AST) is not really a tree
          but an arbitrary graph).
       2) Analysis:
          a) simple analysis
             * where const is cast away
          b) moderately sophisticated analysis
             * recording memory access patterns (?)
             * similar error checking
             This might require call graph analysis and dependence analysis
          c) complex analysis
             * seeing the context of how statements are used together
             * recognizing array statements
       3) Transformations:
          a) simple transformation might be possible using this approach
             * instrumentation
             * Converting the expanded CPP ASSERT macro back into a call to
               "ASSERT(expression)";
             * converting the expanded CPP NULL macro back into "NULL"
               (I think this is a bit harder than ASSERT).
          b) moderately sophisticated transformations
             * ?
          c) complex transformations, likely a traversal mechanism is not powerful
             enough to recognize where these transformations should be done.
             But likely once identified (using higher-level grammars) the
             transformations could be automated by a tree traversal mechanism on
             the higher-level grammar's AST.
             * array statement optimization

* DONE: As an alternative to writing out the AST as an ASCII file, it might be helpful
  to write it out in a hypertext format so that scopes/files could be collapsed.
     There must be software around that does this. Currently we output a very large
     file that uses indentation to show scope; something better would be helpful
     and could be folded into the SvPablo (an AST viewer?).

     The current mechanism uses PDF Bookmarks to display the hierarchy. Mechanisms
     are implemented to display both the EDG and SAGE ASTs. The implementation is
     still incomplete and should be filled in by anyone who wants to learn about
     the ASTs. The EDG AST is traversed by scope instead of by source_sequence_list
     entry. We really need both since the future connection of EDG to SAGE should
     likely be done using a source_sequence_list entry traversal of the EDG AST!

* The currently implemented mechanism for handling the use of extern "C" surrounding
  a #include directive is limited to one #include directive per extern "C" {} modifier.
     The mechanism is a bit of a hack and not very robust. A more detailed comment
     is in the ChangeLog about what was done and its limitations. A better method
     would build the include tree and search all branches and leaves of the include
     tree to verify that each declaration was really an 'extern "C" with braces'
     modifier. This would supplement the current approach, which uses a lex rule to
     find the 'extern "C" {' modifier (when directives are extracted) and just adds
     the closing brace "}" after the next #include directive (not very safe or
     smart). This problem needs to be revisited.

* DONE: Multi-file support.
     We will at some point need to input a collection of files for processing.
     This will (at least on small enough projects) permit a full call graph to be
     processed.
     Other mechanisms for building a project-wide call graph will have to be
     developed at some point (if required).

* Currently the column position can be fooled by tabs (which are counted as three
  columns).
     To fix this we should first make a pass (it can be optional) over the source
     code to expand out all tabs (to some user defined value). Then we can process
     the code through the EDG front-end and get more reliable column data for
     statements and expressions. This will permit us to do a more accurate job in
     the unparsing. This is not a high priority at the moment.

     I have verified that TABS in ROSE are currently interpreted to be 3 spaces
     (even though emacs interprets them to be 7 in my setup), and that the column
     information displayed in the AST is wrong as a result. This should be fixed
     before any effort at fixing other formatting problems, since it might be the
     cause of some of them and would get in the way of fixing the others.

* DONE: Formatting of the unparsed code is done fairly well.
     I think that there are some additional line feeds inserted in the unparsing
     of preprocessor declarations and comments. I don't know what tests of this
     should be done (or how picky we want to be).

* DONE: Call graph support is required for many other features to be added to ROSE
  (e.g. dependence analysis).
     This is a great student project. DOT could be used to visualize the graph
     (seems to work well for even large graphs in Doxygen).

* Our ROSE unparser writes directly to an output stream.
     I have noticed that other unparsers use a separate structure with several
     function pointers (e.g. EDG). Is this better than what we do currently? It
     might be that we could improve our unparser by putting the output string into
     the unparser class (then again I think this is exactly what we do). So maybe
     there is nothing to improve here?

* DONE: EDG records the location of "{" and "}" associated with statement blocks.
     This could permit a more accurate positioning of comments in the unparsing
     phase. This is not a high priority.

* It appears that the strings used to define template declarations might already be
  in the EDG AST, so we would not require Danny Thorne's modification of EDG to save
  them explicitly.
     Need to finish the PDF display of the EDG AST to figure this out.

* DONE: The implementation of SAGE mixes the use of definitions with declarations in
  ways that make the code confusing.
     For example a SgTypedefDeclaration has a tag TYPEDEF_STMT. This should be
     fixed at some point to make the naming self-consistent.

* It seems that declarations of functions are output with the "(...)" arguments
  while member function declarations are not (need to fix this).
     The temp fix for this is in ROSE/src/unparser/unparse_stmt.C (in the
     Unparser::unparseTypeDefStmt function).

* DONE: ROSE preprocessors don't correctly report when illegal options are used.
     For example, "--help" yields the message:
          "Assertion failed: ROSE::numberOfSourceFileNames == 1, file /home/dquinlan/ROSE/NEW_ROSE/src/command_line_options/buildCommandLine.C, line 876"
     This needs to be addressed at some point.

* Unparser bug (not too major)
     Functions are unparsed with function argument names missing
     (see TESTS/CompileTests/C++Code/test2001_11.C).
     Function parameters are not unparsed correctly with the variable name.
     With SUN CC we are just missing the variable name:
          original C++ code: void foo (int i);
          unparsed C++ code: extern void foo (int );

* Erin Parker's try_catch_test.C fails to pass EDG and SAGE II.
     The fix to EDG is to enable exceptions_enabled in EDG/src/cmd_line.h, and the
     error is in SAGE II. Within the implementation of
     SgTryStmt::replace_statement(SgStatement *o, SgStatement *n) there is a message:
          printf ("ERROR: STL us not fixed in code -- exiting in SgTryStmt::replace_statement \n");
          abort();
     These STL iterators have to be put into place in about 5 places in the AST
     Restructuring Tools source.
     I have modified the defaults for EDG so that EDG now accepts exception
     handling code. Then I modified the C++ grammar to correctly build the AST.
     The remaining problem is that the unparser does not generate the correct
     code. This remains as work left to do.
     dqDevelopmentDirectory/test2001_29.C demonstrates the error.

* DONE Support of multiple invocations of the SgFile
     EDG has a problem when EDG_MAIN is called more than once. The problem is that
     the command line processing fails. The best fix so far for this problem has
     been to return from the top of the EDG
     proc_command_line(int argc, char *argv[]) function when the value of
     option_descriptions_used is greater than 0 (which seems to indicate that it
     is part of the second invocation instead of the first; it is initialized to 0
     as a static file scope variable). To change the primary source filename I
     have modified EDG cmd_line.c to set the primary file name to a global
     variable that I have defined above the proc_command_line function in
     cmd_line.c. This variable is then set by the code in specification.C.
     As a result the command line cannot be changed after the first invocation,
     except to change the primary source file. This seems to be a good enough fix
     since we can assume that we only want to change the source file name between
     invocations. However, it does mean that the command line (except for the
     source file name) is ignored after the first invocation of the SgFile
     constructor (which is called by the SgProject constructor).

* Transformations:
  ** Need to define interfaces for transformations.
  ** Need to handle multiple transformations.
  ** Need more examples of transformations.
  ** How should targets be recognized?
     Much of this work is done but there remain a few details:
       1) Currently work uses a function which is called by the global transform
          function, but if we are to bury the global transform function within
          the specification mechanism then we need a better approach. A function
          pointer might work well here.
          Templates seem to greatly complicate the interface since then the global
          tree traversal mechanism would have to be templated.
       2) Our primary approach within ROSE is to use the terminals of a specific
          grammar as targets for transformations. The use of a function to
          identify the target of a transformation can hide this and even make it
          optionally more specific (C++ declarations of a specific type could be a
          target for example, even if just using the C++ grammar). We need
          examples of transformations that use higher-level grammars (however, we
          need more infrastructure in place for this).
  ** The current function
          TransformationSpecificationTypes::tripAwayWrapping( SgFile & file );
     does not get the statements representing the transformation out of the
     function into which they are placed. As a result the whole wrapper function
     is inserted into the application's AST.

* Markus Kowarschik is maintaining a separate todo file in this directory: TODO_MK
     It contains a list of bugs (with detailed descriptions) to be fixed.

* Things to remove in ROSE so that the development directory checked out from CVS is
  easy to understand and so that the distributions built by autoconf/automake are
  clear.
     remove ROSE/ROSETTA/old_simpleGrammar
     remove ROSE/grammar.old_dir
     remove ROSE/src/SAGE
     remove ROSE/src/Padding (make sure it is in the A++Preprocessor)
     remove ROSE/src/PaddingTrans (make sure it is in the A++Preprocessor)
     remove ROSE/src/TransformBaseClass (make sure it is in the A++Preprocessor)
     remove ROSE/src/Transform_2 (make sure it is in the A++Preprocessor)
     remove ROSE/src/Transform_3 (make sure it is in the A++Preprocessor)
     remove ROSE/src/Transform_4 (make sure it is in the A++Preprocessor)
     check ROSE/SAGE/grammarBaseClass.C (I think this is used by ROSETTA grammars
           or the A++ preprocessor)
     check ROSE/ROSETTA/MetaProgramExample.C (make sure this is required)
     check ROSE/ROSETTA/parser.C (does not seem to be used)
     check ROSE/ROSETTA/*.implementation (I think these can be removed)
     check ROSE/ROSETTA/*.include (I think these can be removed)
     remove ROSE/ROSETTA/grammarData.C (No longer used)
     Update all the README files in each directory, stating the purpose and
     organization of the directory

* Configuration issue:
     STL-1995 is a link to STL (and it should be the other way around). STL is a
     directory where the 1995 version of STL is located. I'm sure that this was
     some sort of mistake, but it needs to be fixed so that the configuration is
     sane! Currently, in order to switch to a new version of STL, I have made
     STL-1995 a link to the new STL located in
     ROSE/STLPort/STLport-4.5b8/stlport. I have fixed the makelinks file to
     correctly build this "misleading" link.

* Testing by different people
     Each person should have a directory containing a TestsDirectory and a
     Preprocessor directory. This would allow each person to build their own
     specialized preprocessor for their own testing and their own test codes to
     test the preprocessor.

* Current handling of true and false is spread over several locations (instead of
  centralized into bool.h).

* Better support for acmacros (not available to everyone doing ROSE development)
     Need to have configuration check if the acmacros directory is built and if
     not untar the binary acmacros.tar.gz file and use it. We will have to define
     a mechanism to keep this binary file up to date as well. Currently I have
     built the binary acmacros.tar.gz file and added it to the cvs repository (the
     simple part of the process). This is a problem for people who have access to
     the CVS repository but who are not in the casc group and so cannot access
     Brian Gunney's cvs repository for acmacros.

* Currently ROSE will not compile with KCC on the Sun or Linux
     Previously I had ROSE compiling with KCC on the SUN (so likely this is not
     too much to fix). Bobby says that the problem is the same for both SUN and
     Linux.

****************************************************
TODO: Retargetable Compiler Support (work by Bobby):
****************************************************

There is a little bit of work remaining to make ROSE portable to different
compilers. Basically a mechanism has been defined to permit ROSE to be made
specific to any back-end C++ compiler. However, an implementation showing how this
works has only been implemented for the g++ and KCC compilers (and the KCC example
has remaining problems (on both Sun and Linux) and g++ on SUN also currently
fails).

The process involves copying the compiler specific header files (surprise, all
compilers seem to have some!) to a location within the ROSE compile tree. For many
compilers these files must be edited to remove dependence upon "#include_next"
(which is spelled differently by each compiler) which has the same semantics for
all compilers (at least KCC and g++). The semantics of #include_next uses the list
of include paths and permits all targets (of the #include_next ) to consider only
include directories listed AFTER the directory of the header file where the
#include_next is read.
For all compilers (that Bobby has looked at; g++ and KCC) the target of the
#include_next is always located in "/usr/include" so "#include_next " is
equivalent to "#include ". We can't be sure that this is always the case. Our
modification of the compiler specific header files currently replaces all
instances of "#include_next " with "#include ". We will see if this is
sufficiently robust. Note that the semantics of #include_next does not force the
target file to have the same name as the file where the #include_next directive
appears, but this is the way it is always used and so we have taken advantage of
that as well in defining the automatic transformations of the compiler specific
files.

Bobby added files: compiler-defs.m4, create_system_headers, and dirincludes to the
ROSE/config directory and modified the ROSE/ROSETTA/Grammar/Support.code file to
add an optional compiler name to the SgProject::compileOutput function and
modified the construction of the -D list of options handed to the EDG frontend to
take the list from the macros defined in the config.h file.

* KCC currently fails to compile ROSE on both SUN and Linux (this used to work on
  the SUN so it should not be too much work to fix).
     Both platforms have the same problem (an STL problem). Also, Bobby reports
     that the configure scripts report that the sizes of float, int, double, etc.
     are all ZERO. So something about the autoconf test for the size of primitive
     types needs to be checked.
     LATER checks have confirmed that if KCC is used as the C compiler then the
     AC_SIZEOF_TYPE macro returns zero, but if the default C compiler is used then
     the sizes of all types appear to be correct. So don't use KCC as the C
     compiler!

* EDG seems to have a problem being compiled with the g++ compiler on the SUN.
     The problem is not that it will not compile (it will), but that the
     executable built will not run. Some sort of crash when it starts to use the
     EDG front-end.
     Not clear if this is fixable, since we don't want to work on the EDG
     front-end, so we will see. Debugging this might be a job for purify or
     Insure.

* Currently the create_system_headers shell script can take an optional parameter to
  specify the source directory of the target back-end compiler's compiler-specific
  header files.
     This should eventually be able to be specified as an option on the configure
     command line (perhaps the target directory should be specified on the
     configure command line as well).

* In the compiler-defs.m4 shell script the compiler should have its possible path
  stripped off using "compilerName=`basename $1`"

* ROSE/EDG/configure.in calls GET_COMPILER_SPECIFIC_DEFINES and so it sets the -D
  options and the compiler name; it might be that only the compiler name is really
  needed.

* ROSE/configure.in includes a line "rm -rf '$(CXX_TEMPLATE_REPOSITORY_PATH)'" to
  remove the template directory that is built when the C++ autoconf test codes are
  compiled before $(CXX_TEMPLATE_REPOSITORY_PATH) is defined.
     This avoids a directory called "$(CXX_TEMPLATE_REPOSITORY_PATH)" in ROSE
     after configure has been run (which was always sort of ugly). However, this
     only makes sense on the SUN using the sun C++ compiler. The fix should likely
     involve a test to see what compiler is being used so that the "ti_files"
     directory can be removed when using the KCC compiler. The current test, which
     just removes "$(CXX_TEMPLATE_REPOSITORY_PATH)", fails if using g++ or KCC,
     but it seems that the configure script goes on.

* CC is now the default back-end C++ compiler for development on the SUN.
     I have added a configure option to specify the backend directly.

*************************************************************
END of TODO section specific to Retargetable Compiler Support
*************************************************************

* Preprocessor macro definitions should perhaps not be unparsed in the final output.
     We might make this an option (or the default).
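
The "#include_next" replacement described in the Retargetable Compiler Support
section above can be sketched in a few lines of C++. This is only an illustration
of the textual rewrite, not the create_system_headers script itself; the function
names rewriteIncludeNext and rewriteHeader are hypothetical, and the sketch assumes
(as the notes above do) that turning the directive into a plain "#include" is
acceptable for the headers being copied.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper (not part of ROSE): if a line is an "#include_next"
// directive, return it rewritten as a plain "#include"; otherwise return the
// line unchanged. The header name that follows the directive is kept as-is.
std::string rewriteIncludeNext(const std::string& line)
{
    const std::string directive = "include_next";

    // A directive starts with '#', possibly preceded by blanks.
    std::string::size_type hash = line.find_first_not_of(" \t");
    if (hash == std::string::npos || line[hash] != '#')
        return line;

    // Blanks are allowed between '#' and the directive name.
    std::string::size_type name = line.find_first_not_of(" \t", hash + 1);
    if (name == std::string::npos ||
        line.compare(name, directive.size(), directive) != 0)
        return line;

    // Reject longer identifiers that merely begin with "include_next".
    std::string::size_type end = name + directive.size();
    if (end < line.size()) {
        unsigned char next = static_cast<unsigned char>(line[end]);
        if (std::isalnum(next) || next == '_')
            return line;
    }

    return line.substr(0, name) + "include" + line.substr(end);
}

// Apply the rewrite to every line of a header held in memory. Each output
// line is terminated with '\n', so a missing final newline is added.
std::string rewriteHeader(const std::string& text)
{
    std::istringstream in(text);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line))
        out << rewriteIncludeNext(line) << '\n';
    return out.str();
}
```

Calling rewriteHeader on the contents of a copied compiler header would leave every
other line untouched. Since the rewrite drops the "search only later directories"
semantics, it is only safe under the assumption stated above about where the target
header lives.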

*************************************************************
* Configuration TODO list:
*************************************************************

DONE * Put in the AC macro call to make sure that version 2.52 or higher of autoconf
     is used

* Change config.h (all four of them) to edg_config.h, sage_config.h,
  rosetta_config.h, and rose_config.h

DONE * Call AC_CONFIG_SUBDIR multiple times dependent upon conditionals instead of
     just once.

DONE * Remove references to:
          sla_PREFIX
          sla_INCLUDES
          sla_LIBS
     (now that we have removed sla as an option)

DONE * Remove references to subdirectory's Makefiles within distclean-local rules
     (I think in ROSE/Makefile.am)

* Need a way to make it clear when "make check" has passed all tests.

*************************************************************
* END Configuration TODO list
*************************************************************

* (4/8/2002) Program Statistics Gathering
     The purpose of gathering data about programs will become more clear in later
     research work when we try to optimize the performance of applications.
     Initially we should consider just gathering information about a target
     application. Such information could include:
          Number of inlined functions (to determine potential overuse of inlining
               which could affect compile times)
          Information about functions
               Length of functions
               Length of inlined vs. non-inlined functions
               Relative location of functions (optimizations can include
                    reordering to provide instruction cache optimizations)
          Data Structures used
               Complexity of data structures
          Number of uses of library abstractions
          Access Patterns

**************************************************************************
* (4/8/2002) Possible Summer Student Projects:
**************************************************************************

  1) Unparser built on query mechanism
  2) Program statistics gathering (involves development of a lot of specialized
     queries)
  3) Flesh out more interesting parts of the Name Query and Number Query Libraries
     (could generate call graph). Number of branch points (conditionals) as a
     metric for complexity.
  4) Visualization of program analysis (generate PDF file containing AST, source
     code, with links to call graph, dependence graph, etc.).
  5) Cache optimization for loop carried dependence optimization of array
     statements (merge the two loops associated with the case when the lhs operand
     appears on the rhs).
  6) Include the option to introduce a transformation which times the transformed
     and non-transformed code.
  7) Include the option to test a transformation (compute the transformed and
     non-transformed code and compare the results). This could be a general
     transformation into which many transformations could be put (as long as they
     have the property of preserving the semantics). Is semantics preservation a
     property in the classification of transformations?
  8) Where statement transformations in A++/P++ Preprocessor
  9) Indirect addressing in A++/P++ Preprocessor
 10) Scalar Indexing transformation in A++/P++ Preprocessor

* We need to classify abstractions and transformations (define a hierarchy of
  properties that describe transformations)
     Need to find some literature on this topic
     Common properties associated with semantics of abstractions (upon which we
     introduce transformations):
          Element semantics: operations on the collection are applied to the
               elements (with no reduction)
          Reduction semantics: operations on a collection generate a smaller
               collection
          Widening semantics: operations on a collection generate a larger
               collection
     Common properties of transformations:
          preserve the semantics: array transformations in A++/P++ Preprocessor
               (axx)
          change the semantics: e.g. Automatic differentiation

 11) The context should be represented in the inherited attribute; the code to
     represent the context of any libraries used within an application should be
     generated automatically. This should be possible for ROSETTA. This would be a
     project for anyone familiar with ROSETTA. How similar is this to the
     recognition of abstractions, or is it just the correct place to do it?
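
Project ideas 6) and 7) above (time and test a transformation by running the
transformed and non-transformed code on the same input) can be sketched with a
small stand-alone harness. This is only an illustration under stated assumptions:
the two versions are modelled as ordinary C++ functions rather than generated
code, and all names (Kernel, compareTransformation, twoPass, fused) are
hypothetical and not part of ROSE.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// A "kernel" stands in for one version of the code (original or transformed).
typedef std::function<std::vector<double>(const std::vector<double>&)> Kernel;

// Outcome of running both versions on the same input.
struct ComparisonResult {
    bool   sameOutput;          // true if both versions produced identical data
    double originalSeconds;     // wall-clock time of the original version
    double transformedSeconds;  // wall-clock time of the transformed version
};

// Run one kernel, store its result in 'out', and return the elapsed seconds.
double timeKernel(const Kernel& k, const std::vector<double>& in,
                  std::vector<double>& out)
{
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    out = k(in);
    std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Run both versions and compare. Exact equality is a deliberately strict
// check; a real test might need a tolerance for floating-point reordering.
ComparisonResult compareTransformation(const Kernel& original,
                                       const Kernel& transformed,
                                       const std::vector<double>& input)
{
    std::vector<double> a, b;
    ComparisonResult r;
    r.originalSeconds    = timeKernel(original, input, a);
    r.transformedSeconds = timeKernel(transformed, input, b);
    r.sameOutput         = (a == b);
    return r;
}

// Illustrative "original": two passes over the data with a temporary array.
std::vector<double> twoPass(const std::vector<double>& x)
{
    std::vector<double> tmp(x.size()), y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) tmp[i] = x[i] * 2.0;
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = tmp[i] + 1.0;
    return y;
}

// Illustrative "transformed": the same computation fused into one loop.
std::vector<double> fused(const std::vector<double>& x)
{
    std::vector<double> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] * 2.0 + 1.0;
    return y;
}
```

A call such as compareTransformation(twoPass, fused, input) reports whether the pair
preserves semantics on that input, along with both timings. A transformation that
intentionally changes semantics (the automatic differentiation case above) would
need a different check.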

**************************************************************************
* Documentation stuff to do
**************************************************************************

* Explain what files are generated and so should not be changed by the user
* Explain our testing process
* Explain the CVS checkin policy

**************************************************************************
* Member functions that would simplify use of Sage (future features)
**************************************************************************

* Member function in SgFunctionCallExp (and maybe SgFunctionType):
     // returns true if the function is a member function (false if not a member function)
     bool isMemberFunction();
     // returns name of class for which function is a member (or empty string if not a member function).
     string get_class_name();

**************************************************************************
* (8/11/03) Work to do before release of ROSE
**************************************************************************

SAGE work:
  1) Consistent class names and function names
  2) Consistent function interfaces (remove some and add others)
     a. Define new function interfaces
        1. getName functions at nodes containing names
        2. getList (or "list") functions at nodes containing lists
        3. virtual containsList function returning bool implemented on SgNode
        4. "insert" "replace" "remove" functions
           1. insert would be implemented as a non-virtual function only at the
              astnodes where it makes sense.
           2. virtual replace function implemented on SgNode (checks node variant
              and handles case replace on list of declarations vs. list of
              statements)
           3. remove function implemented on SgNode
        5. Discuss if NULL pointers are permitted in interface.
     b. Select existing functions to be removed
        1. append, prepend, different insert functions in scopes etc.
        2. getStatementList and getDeclarationList functions (in favor of
           "getList" or "list")
  3) Fix duplicate (twin) SgInitializedName nodes

*******************************************************************************
* (9/12/2003) Program Visualization (How to Use Nil's OpenGL AST visualization)
*******************************************************************************

Visualization Software:
     Visualization of Compiler Graphs (VCG):
          http://rw4.cs.uni-sb.de/~sander/html/gsvcg1.html#description
     Graph Drawing Tools and Related Work:
          http://rw4.cs.uni-sb.de/~sander/html/gstools.html
     Mkfunctmap (Call graph visualization, shows clustering by file):
          http://seclab.cs.ucdavis.edu/~hoagland/mkfunctmap.html

Types of graphs:
  1) Data Structure Graphs
     * Data structure visualization in support of automated Dump/Restart code
       generation
     * Diff of two data structures (useful for EDG updates)
     * Data Structure Dependence Graph
       Given two data structures A and B, A is dependent on B iff data from data
       structure B is used to define data in data structure A.
  2) Module Dependence Graphs (related to Data Structure Dependence Graph)
     * Automated Testing within massive program changes (such as occur in
       scientific computing)
  3) *********************

Example of how to run Nils' graphics display of dot files:
     0 bash
     1 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/thuerey1/local/lib/graphviz
     2 cd /home/thuerey1/rose/dot3d
     3 Process dot file for use with interactive dot viewer
          dot -Tdot test2004_11.C.dot > test2004_11.C.pdot
     4 ./dot3d pl/prelayout_small.dot
     5 ./dot3d pl/prelayout_xxxlarge.dot
     9 ./dot3d pl/prelayout_large.dot

**************************************************************************
* (9/26/2003) Updates To SAGE Interface (Draft Proposal)
**************************************************************************

  1) Rewrite mechanism level 1: insert(), replace(), remove() taking both single
     SgStatement pointers and a list of SgNode pointers
  2) Add member functions to SgNode to permit simple access to the SgProject,
     SgFile, and SgGlobal objects higher (toward the root) in the AST.
     a) SgProject* SgNode::getProject(SgNode*);
     b) SgFile* SgNode::getFile(SgNode*);
     c) SgGlobal* SgNode::getGlobalScope(SgNode*);
     d) Maybe also, SgDeclarationStatement SgNode::getDeclaration(SgNode*);
     e) SgForInitStatement* SgForStatement::getInitializerStatements();
  3) Naming mechanisms (member functions which return names)
     a) string SgType::getName() to return the name of a type (function currently
        in ROSE/src/transformationSupport)
     b) string SgFunctionDeclaration::getName()
     c) string SgClassDeclaration::getName()
     d) string SgFunctionDeclaration::getMangledName()
     e) string SgClassDeclaration::getMangledName()
  4) Getting next and previous statements (at least one is implemented in
     ROSE::getPreviousStatement())
     a) SgStatement* SgStatement::getPreviousStatement();
     b) SgStatement* SgStatement::getNextStatement();
  5) Generating subcollections (using the Query Library and replacing the
     SgSymbolTable)
     a) list SgScopeStatement::functionDeclarations()
     b) list SgScopeStatement::memberFunctionDeclarations()
     c) list SgScopeStatement::variableReferences(SgInitializedName*)
     d) others, etc.
  6) Symbol renaming (possible only after deleting the SgSymbolTable)
     a) rename variables
     b) rename types
     c) rename functions
  7) Things to remove:
     a) enum Cxx_GrammarVariants (use enum VariantT instead)
     b) SgC_PreprocessorDirectiveStatement and all derived classes
     c) SgCommentStatement and all derived classes
     d) SgFunctionTypeTable (not used by anyone???)
  8) Things to rename
     A. Change Class names
        a) SgEllipse --> SgEllipsis
        b) Statements named with suffix "Stmt" to be renamed with suffix
           "Statement"
        c) Statement names that don't contain the substring "statement" should be
           fixed to contain the suffix "statement"
        d) SgExprStatement --> SgExpressionStatement
        e) SgForInitStatement --> SgForInitializerDeclarationStatement (???)
f) Affected SgStatement classes: SgBreakStmt SgCaseOptionStmt SgCatchStatementSeq SgContinueStmt SgDeclarationStatement (see derived classes) SgAsmStmt SgClassDeclaration SgTemplateInstantiationDecl SgCtorInitializerList SgEnumDeclaration SgFunctionDeclaration SgMemberFunctionDeclaration SgFunctionParameterList SgTemplateDeclaration SgTypedefDeclaration SgVariableDeclaration SgVariableDefinition SgDefaultOptionStmt SgReturnStmt SgScopeStatement (see derived classes) SgBasicBlock SgCatchOptionStmt SgClassDefinition SgTemplateInstantiationDefn SgDoWhileStmt SgFunctionDefinition SgGlobal SgIfStmt SgWhileStmt SgSpawnStmt SgTryStmt B. Change Function Names a) Use of sage in function names virtual const char* sage_class_name() const; to virtual const string className() const = 0; b) Variant access function variantT() const; C. Change Enum Names a) VariantT -> D. Change SgBaseClassList to be a list of pointers to SgBaseClass (this detail is an inconsistancy in SAGE III, inherited from SAGE II) 8) Previously defined TODO list* ((8/11/03) Work to do before release of ROSE) 1) Consistant class names and function names 2) Consistant function interfaces (remove some and add others) a. Define new function interfaces 1. getName functions at nodes containing names 2. getList (or "list") functions at nodes containing lists 3. virtual containsList function function returning bool implemented on SgNode 4. "insert" "replace" "remove" functions 1. insert would be implemented as a non-virtual function only at the astnodes where it makes sense. 2. virtual replace function implemented on SgNode (checks node variant and handles case replace on list of declarations vs. list of statements) 3. remove function implemented on SgNode 5. Discuss if NULL pointeres are permited in interface. b. Select existing functions to be removed 1. append, prepend, differnet insert functions in scopes etc. 2. 
            getStatementList and getDeclarationList functions (in favor of
            "getList" or "list")
   3) Fix duplicate (twin) SgInitializedName nodes
9) Member function in SgFunctionCallExp (and maybe SgFunctionType):
   a) // returns true if the function is a member function (false if not a
      // member function)
      bool isMemberFunction();
   b) // returns name of class for which function is a member (or empty
      // string if not a member function).
      string get_class_name(); OR string getClassName();
10) Root directory reorganization
   The current root directory for ROSE has a lot of subdirectories which make
   the project confusing.
   a) src directory contains only a small part of the source code for ROSE
   b) Analysis directory (what analysis is this, and shouldn't several current
      subdirectories of the root directory go here?)
   c) ExamplePreprocessors (perhaps this should be renamed to
      ExampleTranslators?)
   d) ContainerParallelizer should not go into the root directory (I think).
      If we need a location for projects and the ExamplePreprocessors
      directory is not good enough, then let's build a PROJECTS (pick a
      better name) directory.
11) Things to Add (Possible things to add)
   a) SgFunctionForwardDeclaration
      Currently a forward declaration is just a normal SgFunctionDeclaration
      node with a null pointer to a SgFunctionDefinition. It has a valid
      pointer to a SgFunctionParameterList object but the scope of the
      SgFunctionParameterList is the SgFunctionDefinition, not the
      SgFunctionDeclaration (since a SgFunctionDeclaration is not a
      SgScopeStatement object). We could add a SgForwardFunctionDeclaration
      and a SgForwardFunctionParameterList which would handle these special
      cases. Then the SgForwardFunctionDeclaration would contain a
      SgForwardFunctionParameterList, but the get_scope() function would
      return NULL for a SgForwardFunctionParameterList (which is what it does
      now). It is not clear if this is a problem worth fixing or not.
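
      The forward-declaration case in 11a can be pictured with a small
      stand-alone sketch. Everything in it is hypothetical: the types
      FunctionDefinition and FunctionDeclaration and the helpers
      isForwardDeclaration and parameterListScope are stand-ins invented for
      illustration, not SAGE III classes. The sketch only models the two facts
      stated above: a forward declaration is an ordinary declaration whose
      definition pointer is null, and the parameter list's scope is the
      definition (hence NULL when there is none).

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for SgFunctionDefinition (not the SAGE III class).
struct FunctionDefinition {
    int statementCount;
};

// Hypothetical stand-in for SgFunctionDeclaration (not the SAGE III class).
struct FunctionDeclaration {
    std::string name;
    FunctionDefinition* definition;  // null for a forward declaration
};

// A declaration with no attached definition is treated as a forward declaration.
inline bool isForwardDeclaration(const FunctionDeclaration& decl) {
    return decl.definition == nullptr;
}

// Assumption taken from the text: the parameter list's scope is the function
// definition, so a forward declaration yields a null scope.
inline const FunctionDefinition* parameterListScope(const FunctionDeclaration& decl) {
    return decl.definition;
}

// Small usage example: one forward declaration and one defining declaration.
inline void demoForwardDeclaration() {
    FunctionDefinition body = {3};
    FunctionDeclaration forward = {"foo", nullptr};
    FunctionDeclaration defining = {"foo", &body};

    assert(isForwardDeclaration(forward));
    assert(!isForwardDeclaration(defining));
    assert(parameterListScope(forward) == nullptr);
    assert(parameterListScope(defining) == &body);
}
```

      A separate forward-declaration node type would replace the null check
      with a type distinction; whether that is worth the extra IR nodes is the
      open question raised above.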
********************************************************************************
* (10/11/2003) Things to add to SAGE III to support F90 (Initial Draft Proposal)
********************************************************************************
This draft proposal for new IR nodes for SAGE III should open up some
discussion. I have based it largely on the documented FORTRAN 90 support
within Sage++ (I had a copy of the old documentation at home). I have been
reviewing FORTRAN 90 (I have two books on F90) and so it has been quite
interesting. I have tried to separate out what might be new IR nodes and what
IR nodes might be reused to stand for both C++ and F90.

I am still not certain we want to have a union of IRs within SAGE III, so this
detail still has to be discussed. We could use ROSETTA to build us a separate
F90 IR so that the C++ IR is unchanged. Then the input to ROSETTA would have
to specify what is shared and what belongs separately to the IRs for each
language. Then the details would appear in ROSETTA (which might be where they
should be). Many details have to be discussed and sorted out. But independent
of that I think this draft details what needs to exist in the F90 IR.
Alternatively it represents what would have to be hidden in the C++ IR (via
Markus's suggested possible approach, if we somehow encoded the F90 into the
C++ (not certain that is possible, but we can talk about that)).

Since these would build on many existing SAGE III IR classes, I think that
they could be added without much trouble via ROSETTA. They contain no F90
semantics so they are not complicated. The trick is only to build these so
that all the Whirl IR elements have a place to map to within SAGE III. Since
these are similar to the F90 support in Sage++, they should represent a close
to complete set of required IR nodes.
And of course, as Beata pointed out, SAGE now starts to come full circle on
its original implementation (but hopefully more robust this time, leveraging
connections to EDG for C and C++ and a connection to Open64 for FORTRAN 90).

A. Possible new IR nodes for SAGE III (these likely match, in some way, F90
   constructs in the RICE version of Whirl (I would guess), if
   source-to-source processing is possible):

   SgProgramHeaderStatement : SgStatement   // Fortran Program Blocks
      SgSymbol* name
      SgBasicBlock* body
      string getName()
      SgBasicBlock* getBody()
   SgProcedureHeaderStatement : SgStatement   // Fortran subroutines
      SgSymbol* name
      SgBasicBlock* body
      string getName()
      SgBasicBlock* getBody()
   SgBlockDataStatement : SgStatement   // Fortran block data statements
      SgSymbol* name
      SgBasicBlock* body
      string getName()
      SgBasicBlock* getBody()
   SgModuleStatement : SgStatement   // Fortran Module statements
      SgSymbol* name
      SgBasicBlock* body
      string getName()
      SgBasicBlock* getBody()
   SgInterfaceStatement : SgStatement   // Fortran 90 operator interface statements
      SgSymbol* name
      SgBasicBlock* body
      SgStatement* scope
      string getName()
      SgBasicBlock* getBody()
      SgStatement* getScope()
   SgParameterStatement : SgDeclarationStatement   // Fortran constants
      SgExpression* constants
      SgExpression* values
   SgImplicitStatement : SgDeclarationStatement   // Fortran implicit type declarations
      SgExpression* implicitLists (???)
   SgStatementFunctionStatement   // Fortran statement function declarations
      // likely want an initialized name list instead of name and argument lists ???
      SgSymbol name
      SgExpressionList arguments
      SgBasicBlock* body
      SgInitializedNameList* arguments()
      SgBasicBlock* getBody()
   SgStructureDeclarationStatement   // Fortran 90 structure declarations
      SgSymbol name
      SgExpression* attributes
      SgBasicBlock* body
      SgExpression* getAttributes()
      SgBasicBlock* getBody()
      bool isPrivate()
      bool isPublic()
      bool isSequence()
   SgUseStatement   // Fortran 90 module usage statements
      SgSymbol moduleName
      SgExpression renameList
      SgStatement scope
   SgMiscellaniousStatement   // Fortran 90 "contains" statements, "private" statements, and "sequence" statements
   SgLogicalIfStatement   // Fortran logical if statement
   SgIfElseIfStatement   // Fortran if ... then ... elseif statements
   SgArithmeticIfStatement   // Fortran arithmetic if statements
   SgWhereStatement   // Fortran where statement
   SgWhereBlockStatement   // Fortran where ... elsewhere statement
   SgAssignmentStatement   // Fortran assignment statements
      SgExpression* lhs
      SgExpression* rhs
   SgPointerAssignmentStatement   // Fortran pointer assignment statement
   SgHeapStatement   // Fortran allocate and deallocate
   SgNullifyStatement   // Fortran pointer initialization statement
   SgLabelListStatement   // Fortran statements containing lists of labels
   SgAssignedGotoStatement   // Fortran assigned goto statement
   SgComputedGotoStatement   // Fortran computed goto statement
   SgStopOrPauseStatement   // Fortran stop statement
   SgCallStatement   // Fortran call statement
   SgIOStatement   // Fortran input/output and their control statements
   SgInputOutputStatement   // Fortran read, write, and print statements
   SgIOControlStatement   // Fortran open, close, inquire, backspace, rewind, endfile, and format statements
   SgCycleStatement   // Fortran cycle statement
   SgExitStatement   // Fortran exit statement
   SgSubscriptExpression   // Fortran array references of the form lowerBound:upperBound:stride
   SgKeywordValueExpression   // Fortran keyword values in I/O statements, etc.
   SgReferenceExpression   // Fortran const references, type references, and interface references
   SgVectorConstantExpression   // Fortran vector constants of the form [expr1,expr2,expr3]
   SgObjectListExpression   // Fortran equivalence, namelist, and common statements
   SgSpecPairExpression   // Fortran default control arguments to I/O statements
   SgIOAccessExpressions   // Fortran index variable bound instantiations and do loop range representations
   SgImplicitTypeExpression   // Fortran index variable bound instantiations
   SgTypeExpression   // Fortran type expressions
   SgSequenceStatement   // Fortran seq expressions
   SgStringLengthExpression   // Fortran string length expressions
   SgDefaultExpression   // Fortran default
   SgLabelReferenceExpression   // Fortran label reference
   SgConstantExpression   // Fortran array constants
   SgStructureConstructorExpression   // Fortran structure constructor
   SgAttributeExpression   // Fortran attributes
   SgKeywordArgumentExpression   // Fortran keyword arguments
   SgUseOnlyExpression   // Fortran ONLY attribute of USE statements
   SgUseRenameExpression   // Fortran USE statement renamings
   SgLabelVariableSymbol   // Fortran symbols for label variable for assigned goto statements
   SgExternalSymbol   // Fortran symbols for external functions
   SgContructSymbol   // Fortran symbols for construct names
   SgModuleSymbol   // Fortran symbols for module statements
   SgInterfaceSymbol   // Fortran symbols for module interface statements

B.
   Common IR nodes (between C, C++, F90)
   These nodes would be shared between C++ and F90 (the list is not yet
   complete):

   SgNode
   SgLocatedNode
   SgStatement
   SgDeclarationStatement
   SgVariableDeclarationStatement
   SgDoWhileStatement
   SgIfStatement
   SgSwitchStatement
   SgExpressionStatement
   SgContinueStatement
   SgReturnStatement
   SgGotoStatement
   SgExpression
   SgValueExpression
   SgFunctionReferenceExpression
   SgFunctionCallExpression
   SgExpressionListExpression
   SgVariableReferenceExpression
   SgArrayReferenceExpression
   SgInitializedNameList
   SgUnaryExpression
   SgBinaryExpression
   SgSymbol
   SgVariableSymbol
   SgConstantSymbol
   SgFunctionSymbol
   SgType
   SgArrayType
   SgClassType
   SgFunctionType
   SgSupport
   SgModifierNodes

********************************************************************************
* (10/23/2003) Data Structure Analysis
********************************************************************************
Many types of proposed projects within ROSE (or projects requiring compiler
infrastructure more generally) operate on user-defined data structures. It is
for this broad reason (and many specific ones) that we propose a mechanism to
handle user-defined data structures. Current work is to graph the data
structures (Andreas), but the requirements are much broader than
visualization.

1) Data structure visualization (if we can't visualize it then likely we
   can't expect to do much more, so this is the first milestone).
2) Data Structure Queries
   a) Where in the program is data accessed (there are different resolutions
      of access and different types of access)
         Resolution
            scope (is this another type of access?, e.g. global scope vs.
            local scope)
               read accesses
               write accesses
            object access
               read accesses
               write accesses
            data member access (field)
               read accesses
               write accesses
3) structural form of data
      class hierarchy
4) Transformations
      Grouping code that accesses the same data (for temporal/spatial locality)
5) Data references (local or non-local writes)

********************************************************************************
* (10/31/2003) Ways to extend DOT graphs and PDF output
********************************************************************************
1) Overloaded Operator Support
   Additional edges could be added to make clear (and debug) what the support
   for overloaded operators includes. Basically this reduces the expressions
   to a simpler expression tree (what you would expect) than what SAGE III
   provides based on the language constructs (which are rather complex for
   overloaded functions). The additional edges in the DOT graph could be
   drawn in either a different color or a different line type (dotted,
   dashed, etc.)
2) High-Level Grammars
   Coloring of the parts of an application's AST specific to high-level
   grammars generated from domain-specific libraries.
3) Additional node info in each node (context information).
   a) Variable names
   b) Function names
   c) Unparsed strings for nodes where the string's length would be defined
      a priori.
4) Need way to add options to the graph (not just nodes and edges).
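
A minimal sketch of items 1, 3 and 4 above, assuming nothing about the existing
ROSE DOT generator: DotGraph, DotEdge and buildOverloadedOperatorExample are
hypothetical names made up for this example. It shows graph-level options
(rankdir), extra context in node labels, and a second edge drawn dashed and
in a different color to mark the simplified operator view.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical edge record: style and color are plain DOT attribute values.
struct DotEdge {
    std::string from, to, style, color;
};

// Hypothetical minimal DOT writer (not the ROSE DOT generator).
class DotGraph {
public:
    explicit DotGraph(const std::string& name) : name_(name) {}

    // Graph-level option such as rankdir or size (item 4).
    void setGraphOption(const std::string& key, const std::string& value) {
        options_[key] = value;
    }
    // The label can carry context such as variable or function names (item 3).
    void addNode(const std::string& id, const std::string& label) {
        nodes_.push_back(std::make_pair(id, label));
    }
    // Edge with an explicit line style and color (item 1).
    void addEdge(const std::string& from, const std::string& to,
                 const std::string& style, const std::string& color) {
        assert(!from.empty() && !to.empty());
        edges_.push_back(DotEdge{from, to, style, color});
    }
    // Renders the graph as DOT text.
    std::string str() const {
        std::ostringstream out;
        out << "digraph " << quote(name_) << " {\n";
        for (const auto& opt : options_)
            out << "  " << opt.first << "=" << quote(opt.second) << ";\n";
        for (const auto& node : nodes_)
            out << "  " << quote(node.first) << " [label=" << quote(node.second) << "];\n";
        for (const auto& e : edges_)
            out << "  " << quote(e.from) << " -> " << quote(e.to)
                << " [style=" << quote(e.style) << ", color=" << quote(e.color) << "];\n";
        out << "}\n";
        return out.str();
    }

private:
    // Wraps text in double quotes, escaping embedded double quotes.
    static std::string quote(const std::string& text) {
        std::string result = "\"";
        for (char c : text) {
            if (c == '"') result += '\\';
            result += c;
        }
        result += '"';
        return result;
    }

    std::string name_;
    std::map<std::string, std::string> options_;
    std::vector<std::pair<std::string, std::string> > nodes_;
    std::vector<DotEdge> edges_;
};

// Usage example: a call to an overloaded operator+ with its two operands.
// Solid edges stand for the AST as stored; dashed blue edges stand for the
// simpler expression-tree view.
inline std::string buildOverloadedOperatorExample() {
    DotGraph graph("AST");
    graph.setGraphOption("rankdir", "TB");
    graph.addNode("call", "function call: operator+");
    graph.addNode("lhs", "variable: a");
    graph.addNode("rhs", "variable: b");
    graph.addEdge("call", "lhs", "solid", "black");
    graph.addEdge("call", "rhs", "solid", "black");
    graph.addEdge("lhs", "rhs", "dashed", "blue");
    return graph.str();
}
```

The text returned by buildOverloadedOperatorExample() can be written to a file
and rendered with the dot tool from GraphViz.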
********************************************************************************
* (11/21/2003) Current errors
********************************************************************************
Can't compile tutorial/database directory
Can't compile Projects/DataBase directory

********************************************************************************
* (12/9/2003) Add unparse_info object to control backend
********************************************************************************
Andreas's idea to handle graphing of data structures defined in libraries by
unparsing the header files with the application code. Talk to Andreas.

********************************************************************************
* (12/19/2003) Current errors with new version of EDG (v3.3)
********************************************************************************
* An empty file with comments loses the comments when unparsed

********************************************************************************
* (12/19/2003) Current errors in AST (reported by Nils)
********************************************************************************
get_scope used in replace basic block of for statement assumes that the
current scope is returned if called on a for statement. Need to update where
get_scope is used in such functions so that its new semantics is not a
problem.

"return 0" can't replace return statement in int main() { return 0; }
The intermediate file's function prefix defines the function to be void.

Explain why the synthesized attributes for the AST rewrite mechanism have to
define a copy constructor, operator=, and operator+=. Need to document this
and explain what happens if these rules are not followed.
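
One plausible explanation for the last item, stated as an assumption and not
as documented ROSE behavior: a bottom-up traversal returns synthesized
attributes by value and merges each child's attribute into the parent's, so the
attribute type needs value semantics (copy constructor, operator=) and an
accumulation operator (operator+=). The sketch below uses hypothetical
stand-ins (Node, SynthesizedAttribute, evaluate); it is not the AST rewrite
mechanism's real attribute class.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an AST node.
struct Node {
    std::string text;
    std::vector<Node> children;
};

// Hypothetical synthesized attribute that collects source fragments.
class SynthesizedAttribute {
public:
    SynthesizedAttribute() {}

    // Needed because attributes are passed and returned by value.
    SynthesizedAttribute(const SynthesizedAttribute& other)
        : fragments_(other.fragments_) {}

    // Needed when a traversal stores a child's result in an existing object.
    SynthesizedAttribute& operator=(const SynthesizedAttribute& other) {
        if (this != &other) fragments_ = other.fragments_;
        return *this;
    }

    // Needed to merge a child's attribute into the parent's attribute.
    SynthesizedAttribute& operator+=(const SynthesizedAttribute& child) {
        // Copy first so that "a += a" does not read a vector while growing it.
        const std::vector<std::string> incoming(child.fragments_);
        fragments_.insert(fragments_.end(), incoming.begin(), incoming.end());
        return *this;
    }

    void add(const std::string& fragment) { fragments_.push_back(fragment); }
    const std::vector<std::string>& fragments() const { return fragments_; }

private:
    std::vector<std::string> fragments_;
};

// Bottom-up evaluation: children first, then the node itself.
inline SynthesizedAttribute evaluate(const Node& node) {
    SynthesizedAttribute result;
    for (const Node& child : node.children)
        result += evaluate(child);  // uses operator+= on a by-value result
    result.add(node.text);
    return result;                  // returned by value
}

// Usage example: a root with two leaves yields the leaves, then the root.
inline void demoSynthesizedAttribute() {
    Node root;
    root.text = "root";
    Node left;
    left.text = "left";
    Node right;
    right.text = "right";
    root.children.push_back(left);
    root.children.push_back(right);

    SynthesizedAttribute attribute = evaluate(root);
    assert(attribute.fragments().size() == 3);
    assert(attribute.fragments().back() == "root");
}
```

If operator+= were missing, the merge step in evaluate() would not compile; if
the copy operations did a shallow copy of owned state, fragments collected in
the children could be lost or shared by accident on the way up.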
********************************************************************************
* (12/23/2003) Current errors in AST (reported by Andreas)
********************************************************************************
Anonymous typedefs in the unparser save state, which is problematic when
debugging. If unparseToString is called then the structure in an anonymous
typedef is output, but then it is not output in the final unparsed file
(rose_.C). This is because state is saved internally in the SgUnparse_Info
object.
   problematic field value: static SgTypePtrList p_structureTagProcessingList;
The fix should be to not have this be static, but I can't be certain. There
was some reason why it had to be static in the first place. Perhaps because
it would be copied by copy constructors and the state across copy constructor
calls had to be maintained (something like that).

********************************************************************************
* (1/12/2004) Documentation Requirements (reported by Beata)
********************************************************************************
DONE: Specify where to get DOT (GraphViz) and Doxygen software used by ROSE
developers (added where to find MySQL as well).

Add mechanism to pass command line options to the EDG frontend. Something
like -edg: