This is a design document; it details the design of rtop version 0.1, guided by these requirements. It is meant for any developer who wants to understand or modify the rtop code. For this document to be well understood, there are some prerequisites. It is assumed that
- the developer is familiar with system process monitors in general, whether on Windows (Task Manager) or Linux (htop, top, etc.)
- the developer knows what a Linux system API is and what we mean by POSIX or SUSv3 compliance
- the developer is familiar with C/C++ and with data structures including lists, hash tables and trees
This document covers two things: the architecture and the design of rtop. The architecture section gives a broad overview of rtop's main subsystems, with references to the important requirements about rtop's visual structure and functionality. It also lists the tooling used in this project (in some cases with rationale), so that a developer can jump-start changes with minimal tooling research, or replace some tools according to preference. The design section details the subsystem design: how each subsystem achieves its goals, and the rationale behind its decomposition into classes. Design alternatives are discussed wherever possible. The language is terse, and diagrams are used as a substitute for words.
If you want to try rtop's features to get a feel for things, see this README for installation instructions.
What is a text user interface? Let's break it down. It is a user interface: the user can interact with it (i.e. manipulate it and also receive visual feedback). The visual feedback is all text-based; all you can see on your screen is text. It may be tabulated or highlighted, but that is pretty much the extent of it. When interacting with it, the user will typically be navigating across the screen and selecting some options. However, unlike in a graphical user interface, navigation is restricted, in the sense that one can only move in quanta of horizontal characters and vertical lines on the screen. Text user interfaces (TUIs) are commonly used on today's terminal emulators, which emulate character-based visual displays that did not have pixel-level resolution. This navigation scheme may sound limited, but many text-based user interfaces are quite rich: using just text highlighting and structured layout, a text user interface can offer tables, hidden menus and so forth.
So what does it take to build such a text user interface? The following sections discuss the major points in building the text interface for rtop, which is deployable on Linux, specifically the Ubuntu 16.04 LTS Desktop version.
TODO??: should we relate our design to MVVM or MVC (don't even know what they are?)
Every text interface can be thought of as consisting of multiple building blocks, each of which performs a specific function. There are three major concerns to address when building a text user interface:
- How do we build a text interface on a Linux system? How do we display information, and how do we acquire input from the user's keyboard and mouse actions?
- What about the specifics of the application, commonly referred to as the business logic? How do we acquire and store information about processes, and how do we present that information?
- How do we design the system in a maintainable, extensible and clean manner?
We address these concerns by assigning them to different subsystems, which are described below.
There are many ways to obtain information about a process's properties. The Linux kernel maintains this information in kernel data structures, and Linux offers an API (see procfs in the design section) that allows users to access it by reading process-specific files. This information must be acquired at regular intervals or asynchronously (on user input), and stored in one place from where it can be efficiently accessed or manipulated (for example, sorted).
The system interfacing module's job is to provide a simple API (to be used by other modules) that abstracts away the details of how the process-specific information is acquired.
One good reason for hiding the OS-specific details of how process information is accessed is portability: different revisions or distributions of the Linux kernel store process information differently and make it accessible using different APIs (??citation). The other reason is that it makes the development work more modular, and therefore more efficient.
This module concerns itself with how the information is laid out (how many views? what is in each view?); this follows from the user interface requirements. It also concerns itself with error handling: how to manage wrong input and conditions that can potentially cause the system to crash, and how to recover from such inputs and crashes. Finally, it concerns itself with usability: which features are accessible to the user?
So the business logic module draws heavily on the functional requirements: what the user can and cannot do, what constitutes bad input, and how bad input is handled. This module also orchestrates the actions of the other modules to achieve the desired functionality.
The reason for the existence of this module is to separate the rules/logic from the objects on which they act. This module is similar to the model in the MVC design pattern.
This module is concerned with managing the mechanisms underlying the display of text user interfaces on Linux systems, and with acquiring input through the mouse and keyboard. It is also responsible for ensuring that input acquisition and output generation are sufficiently fast. For example, there should not be too much of a delay between a key press and its detection by the system, nor between an instruction to display information and it appearing on screen.
This module's job is to abstract away lower-level details, such as configuring and managing terminal emulators on Unix-like systems, and to provide a flexible API that allows one to build a sufficiently detailed, easily extensible layout for the interfaces. One important benefit of this module is that it simplifies the business logic's task of creating layouts and capturing user input, which reduces the overall complexity of the system. Second, it increases the portability of the business module, by allowing the business module to use the same API across different terminal emulators and different systems. If you want to display the text user interface on a different terminal emulator, you only need to generalize this module to handle it; the business logic remains the same. You can view this module as an amalgamation of the view and the controller in the MVC design pattern.
This subsystem is concerned with how to handle error conditions generated in the various subsystems and the overall operation of the program. It concerns itself with providing information to the developer/user to enable them to find the sources of errors and the actions that need to be taken to mitigate them. The error handling depends on the specifics of the function/class etc, but in general, this module will try to do one of three things:
- ignore those errors
- take some corrective measures
- log and crash so that some debug information is accessible to the developer
This module contains logic that recognizes error conditions, defines error messages, defines log formats and interacts with the logging framework. The development of this module progresses along with the design, as more error conditions are uncovered.
So this module's job is to define errors and handle them in appropriate ways, such as ignoring them, exiting the application, logging, etc. Logging (read more on logging in the design section) is a minimum requirement.
The most important benefit that error handling provides is proper logging, which helps in uncovering bugs during development as well as after shipping.
To better understand the working of rtop, it is important to understand the interactions between its different subsystems. The discussion is at a very high level, and the goal is ??
This is a somewhat passive activity where the user can view the periodically updated process information. Occasionally, the user may interact with the view by scrolling through the process list or clicking on a specific property list to change the sorting criteria.
The data flow during the three use cases is described below:
- During a periodic update, the business logic initiates a read of system information from the system interfacing module. It is estimated that ~25-100 KB of information travels between the system interfacing and business logic modules (25 property fields per process, 10 characters per field, and hundreds of such processes). The data is then transferred from the business logic to the user interface module for display. Information also travels between the application's system interfacing module and the kernel databases (not shown in Fig. 1).
Fig. 1 Inter-module data flow during a periodic graphical update
- Scrolling the process property list: When the user is scrolling through the process property list, the ↑/↓ key input information travels from the user interface module to the business logic module. The business logic module performs the required changes to its representation of displayed data and sends back the new position of the scroll cursor to the user interface module for display. There is no data flow into or out of the system interfacing module.
Fig. 2 Inter-module data flow during scrolling
- Sorting the process property list: When the user clicks a particular property to change the sorting criterion, the click information travels from the user interface module to the business logic module. The business logic module instructs the system interface module to acquire new process information (why read process information for a sort request?), resulting in data flow from the system interface module to the business logic. The business logic module sorts this information and passes it on to the user interface module.
Fig. 3 Inter-module data flow during sorting
The user can configure the user interface so that only the relevant process properties are viewable. Further, the user can also reorder the property lists for more suitable viewing. To accomplish these objectives, the user carries out a series of key presses as described in rtop demo in README
During all such activities, data flows only between the user interface module and the business logic module. As soon as the business logic receives a key input, it updates its internal database and sends fresh display information back to the user interface module. The data flow is similar to the diagram for the scrolling use case (Fig. 2).
TODO??: revise
We would like to build upon the existing version by enabling the viewing of additional information. Planned extensions:
- allow the user to add new process properties to the view
- add additional views for configuring other properties
- add additional views showing overall system resource usage information
- add additional functionality in the process view that allows the user to filter/signal processes, or to view them in an alternate format altogether
Interface layout can be specified in text files. This is standard practice in most user-interface technologies, whether .NET, Xamarin.Forms or anything else, and it makes specifying the layout of additional views easy. Of course, a new view does not exist in a vacuum; we must hook it up appropriately so that it connects to some business component or data source. Therefore we must develop a hooking infrastructure that allows us to easily accommodate new views.
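As an illustration, a view layout specified in XML might look like the following. This is a hypothetical sketch: the element names, attributes and the hooking mechanism shown here are assumptions, not the actual rtop configuration schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical layout sketch; names are illustrative only. -->
<view name="ProcessView">
  <panel name="MeterPanel" rows="5" cols="80" y="0" x="0"/>
  <panel name="ProcessPanel" rows="35" cols="80" y="5" x="0">
    <column property="PID" width="7"/>
    <column property="CPU%" width="5"/>
    <column property="COMMAND" width="30"/>
  </panel>
  <!-- hooking information: which business component feeds this panel -->
  <source panel="ProcessPanel" provider="ProcessDatabase"/>
</view>
```

The `<source>` element stands in for the hooking infrastructure described above: it tells the loader which business component to connect to the view.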
Any new view should allow some intra-view navigation: for example, can we scroll in it? How do we navigate between the different panels within it? There should be a framework according to which we build a view, that is, a definition of what can and cannot go into it.
The source code is implemented exclusively in C/C++. The terminal-interfacing code is written using the ncurses library API (which is in C) and is called from within C++ (how??, is this portable??). Build scripts are written in make syntax. Configuration files are written in XML (v1.0, UTF-8).
The reason for choosing C++ (as mentioned above) is to get familiar with it. An important advantage is that C++ has excellent support for object-oriented design, which is a must-have for building complex text user interfaces.
Since this project aims to deploy the application primarily on Linux-based systems, Linux system calls and GNU C library calls are used to read process information from the kernel's databases. The choice of specific API calls depends on whether they require special privileges, which would make them inaccessible either from user space or to a non-admin user. The other consideration is that their use should not jeopardize portability across different Linux systems and versions (near-future and past Linux distributions). To avoid such problems, an attempt will be made to follow best practices, such as using POSIX-compliant system calls.
TODO: need to add specific guidelines
Google's C++ style guidelines will be followed; some of the important points are listed below. The main purpose is to ensure good maintainability.
- google c++ style guidelines, source tree hierarchy
- namespace organization
- class naming, class declaration
- class public, private organization
- class method naming and font
- class method arguments naming and font conventions
- standalone functions, globals, static
- commenting and doxygen documenting: comments on classes, functions, globals, standalone functions
Git is used for revision control. Since this is a single-developer project, most features are added sequentially. Features are developed on the dev branch; this includes code implementation, testing and documentation. Once a feature is reasonably complete, meaning its code is working and well tested and its documentation is in good shape, it is merged into the master (release) branch. This merge commit is a new release, identifiable by a tag. Tags are named according to a numbering scheme of the form x.y, where x is the version number and y is the feature number. This process is clarified through the diagram below.
A not-uncommon scenario is that the developer gets stuck on a particularly tricky feature while progress can be made on a non-interfering one. In such a case, a temporary branch is created off the dev branch (not shown in the diagram). Once work on the temp branch is complete, it is merged back into the dev branch (NOT master).
A user or developer can access the release commits using tags that specify version numbers. Each release has its own README, REQUIREMENTS and DESIGN documents. The REQUIREMENTS and DESIGN documents corresponding to the same software version as the README are accessible through hyperlinks within the README.
Debugging is performed using the GNU debugger (gdb >= 7.11). Memory leaks are checked using the Valgrind tool suite. Logging is performed using the Boost.Log library. Further details on the logging system are provided here.
?? How have we done the debugging ?? lcurses_g option
TBD
Releases are available on the master branch of the rtop project repo. Each release contains the source code, installation/use instructions, the requirements, and the design document, provided under the MIT License. The source code organization is described below.
# source tree
.
├── bin
├── config
├── docs
├── include
├── libs
├── README.md
├── src
└── tests
Source Tree Object | Description
---|---
src | folder containing source code, i.e. the .cpp files
include | folder containing header files
src/Makefile | make recipe file
bin | top-level directory of the build tree
config | folder containing configuration files
tests | folder containing testing code
In the future, an appropriate package-distribution method will be used for each targeted Linux distribution.
TODO??:
- add license notices to source code
- also need to check if license is compatible with license of tooling used.
The major criteria for selecting a logging system are as follows:
- should support formatted output
- should print thread specific information
- should have a C++ API
- should be open source, ideally copyleft like GPL, but non-copyleft FOSS is also fine (a requirement for using 3rd-party code)
- should be reliable and well supported (as indicated by a large user base)
- should have a stable API (fewer problems for maintaining code base)
- compatible with Linux and other Unix-like systems
For this project, Boost.Log package has been chosen for logging.
The development was done on Ubuntu 16.04 LTS, with the following tools:
- gnu compiler collection: >= gcc 5.4.0
- vim editor: >= vim 7.4. This is a personal preference; any editor suitable to the developer could be used
TBD
- Ncurses: TBD
- Boost.Log: TBD
- Pugixml: TBD
This section dives into the design of rtop. It will serve as a guide for current or future developers, by helping them understand the source code, add new features and improve application performance. The discussion begins with a bird's eye view of the system and then dives deeper into implementation details (with rationale) of specific modules, classes, and functions. The system function is illustrated through studying the interactions amongst rtop's components while exercising important use cases.
From a systems perspective, at the topmost level, rtop interacts with three external entities (see diagram below). They are 1) a persistent database, 2) the process file system (procfs), and 3) the user. The rtop application loads information from the database during initialization and stores information into it before shutdown. procfs provides process information. The user consumes data by viewing the results displayed by rtop, and provides commands to rtop through its text-based user interface.
Fig. 5 Data flow between the external environment and the rtop application. User and rtop exchange data through the keyboard and screen. rtop interacts with persistent database on the file system to access configuration information. It reads process information residing in kernel databases using the Unix's /proc file system API
At a lower level of abstraction, the rtop application is decomposed into submodules (as mentioned in the architecture section). These modules are depicted in the diagram below.
Fig. 6 Data flow between the different submodules of rtop, and between the submodules and external entities. Business logic reads configuration information and initializes all other modules. It frequently interacts with both the system interface to read process information and the user interface to display that information. The user interface interacts with the user and the system interface reads information from the process file system, procfs
Before diving deeper into architecture and design, it will be helpful to have some perspective on the meta design goals - those that fall into good design practices.
Maintainability - Software modules should have a clearly defined purpose. The classes within these modules should do one thing and one thing only (the single-responsibility principle, SRP). Interfaces should be thin (few arguments). Code should be well commented.
Extensibility - rtop will be extended in many ways. One is making it work across different Unix derivatives. Another is the addition of new features or the modification of existing ones; examples include allowing the user to change key bindings, or to use rtop exclusively through a mouse/touchpad or a keyboard. Thirdly, rtop may undergo internal changes to its data structures and algorithms to tweak performance. The system should be easy to extend toward any of these objectives.
Clean design - rtop will lean towards simpler design - simple data structures and algorithms when there is no significant benefit of using a more elaborate one.
Performance - at this point in time, specific performance criteria have not been defined. In general, rtop's performance is deemed adequate if it is able to display information at regular intervals and uses memory within limits (say, a few megabytes).
The following are design decisions that sit at the boundary between architecture and design. They govern important ways in which we approach the various design problems we face.
The application has two major activity loops that run on independent threads. One loop is responsible for periodic graphical updates while the other handles the user's key inputs. Handling key input on a separate thread optimizes the user experience by ensuring that no key inputs are missed while the system is busy performing graphical updates. If the CPU switches between the two threads sufficiently fast, the key input loop will be able to capture all of the user's key inputs.
Fig. 7 Activity loops of rtop. Each loop runs on a separate thread.
It is important to be aware of the limitations of this scheme. Although key inputs are captured on a separate thread, a problem similar to running everything on one thread remains: if a key press results in a time-consuming graphical update, the key input thread will be blocked (inside the `resolveKey(int key)` function). Any key input that arrives during this time is lost, and if the user is pressing keys frequently, the interface will appear unresponsive. The question then is: how much time does the thread have before it must be back waiting for key input?
To answer this question, one must estimate the fastest rate at which key inputs can arrive. Reports (1, 2) suggest that for typical keyboards, key input speed ranges between 30 and 60 key presses per second, which translates to one key press every 15-30 ms. The key input speed was also measured on the development system, and found to be approximately 30 key presses per second.
Here is the reference code for doing that:

```cpp
// Measure key-repeat rate using the ncurses API
#include <ncurses.h>
#include <iostream>

int main() {
    // ncurses initialization
    initscr();
    raw();
    noecho();
    keypad(stdscr, TRUE);
    set_escdelay(100);

    int c;
    int count = 0;           // holds the number of key presses
    while (1) {
        c = getch();         // capture key input
        if (c == KEY_F(10))  // exit on F10
            break;
        count++;
    }
    endwin();
    std::cout << "key pressed " << count << " times\n";
    return 0;
}
```
Note: while the program was running, a key (not F10, since that exits the program) was pressed and held continuously for 10 seconds; then F10 was pressed to exit. On exit, the program prints the number of key presses registered during the 10-second window. The measurement loop spends almost all its time waiting for key input, so the measurements are not under-reported due to slowness of the loop. The computation performed per iteration (one comparison and one increment) takes only a few cycles, a few nanoseconds at most on a GHz CPU. Even accounting for thread switching, an iteration should take no more than a few milliseconds. So the measurement loop should correctly report key input speeds up to 100 Hz or more. Another assurance comes from the fact that the numbers align well with the above-mentioned reports.
To understand rtop's design, it is necessary to understand the somewhat abstract idea of a key's context. The application should be viewed as consisting of different contexts. Actions such as exiting the application or switching to a separate view (a view covers the entire terminal window) belong to the application context. Actions such as navigating within a view by accessing its different panels (a view is divided into several spatially separate regions called panels) belong to the view context. Finally, actions such as browsing or modifying an item list within a panel belong to the panel context.
Therefore, each key has a context in which it makes sense, and the context alone knows how to interpret the key. Also, note that the same key may have meaning in multiple contexts, therefore a single key can lead to actions in several contexts.
So how does rtop actually interpret a key? The information required to interpret a key resides in a UI object. A UI object may be required to resolve more than one key, and it may certainly be required to perform several actions on receiving a single key input.
One potential strategy is to devise a specialized UI object that handles a particular set of keys and carries out a particular set of actions. Adding a new key-actions pair requires the creation of a new type (of UI object). This would lead to a proliferation of the types of UI objects based on the number of key-action combinations (which can run into 100s). This is clearly not scalable for maintainability.
Another strategy is to let the UI object hold this information in another object, one similar to a dictionary. This dictionary object represents a key context, and it can be identified using a universally unique identifier (UUID; see its use later in the callbacks discussion). Importantly, it contains an actual dictionary (a hash table) holding key-value pairs, where each key represents a key input and each value is a list of actions. Note that more than one action can be associated with a single key. You'll find the implementation details in the discussion on the key dictionary. An example clarifies the idea: assume a key input arrives at a UI object, and five different actions need to be performed in response. The key dictionary associated with this UI object is asked to resolve the key (it is a little more complicated than that; see state machines), which it does by looking the key up in its lookup table. If the key is found, all the actions associated with it are invoked sequentially.
This strategy offers a major improvement over the first one. One can easily define a new key dictionary object that encodes a certain behavior. These objects are then used to initialize UI objects at application startup. There is no need to create new types of UI objects; a single type, holding different key dictionary objects, can resolve different sets of keys. For example, a `View1` object of type `View` resolves F1 a certain way, while a `View2` object of the same `View` type resolves F1 another way, because they hold different key dictionary objects.
The key dictionary mechanism described above has to be a little more general, to address the following problem: the association between a set of keys and actions is not fixed. This is a very common situation; for example, when one presses a specific button on a digital watch, the watch screen contents change, and now the same button has a different meaning; in fact, many different keys have their meanings changed.
This problem is handled using a state machine: instead of a single `KeyDict` associated with a UI object, we have a state machine of `KeyDict` objects associated with each UI object. The states in the state machine are the `KeyDict` object UUIDs.
State machines are also used for other purposes in the application. There are state machines for UI objects, specifically for the `View` and `Panel` types. These state machines help a UI object determine which of its child objects has focus.
A state machine's API does three basic things: it allows initialization of the state machine's transition table, it performs transitions based on some input (specifically, key input), and it allows other modules in the application to access state information.
TBD
Although performance optimization is not a priority at this time, there is one area where upfront optimization is well advised. To understand why, it is important to understand how ncurses performs graphical updates.
ncurses introduces some abstractions between the data to be put on screen and the actual physical screen. The crux of the matter is to use these abstractions properly, so as not to slow down graphical updates unnecessarily.
ncurses internally maintains a representation of the physical screen (the state of the physical screen when it was last updated) and a virtual screen (the screen state that the user wants to render). ncurses also has an abstraction called WINDOW, an object that represents a section of the screen; to be more accurate, a WINDOW represents a section of the virtual screen. A screen is usually subdivided into multiple windows; we call it a tiled screen when the windows are non-overlapping. rtop's screens are tiled screens. So, say you have a tiled screen, you modify a small region of the screen corresponding to one or more WINDOWs, and now you want to put that information onto the physical screen (top row, Fig. 8).
The function to perform this update has the following prototype:

```cpp
wrefresh(WINDOW* win);
```

Each `wrefresh` call runs two internal functions, `wnoutrefresh(WINDOW* win)` and `doupdate()`. The first puts the information in the WINDOW data structure onto the virtual screen; `doupdate()` then compares the physical and virtual screens and pushes only the difference onto the physical screen. So instead of everything being redrawn, only the region specific to the changed window is updated. This brings us to the first optimization.
Fig. 8 Data flow during graphical updates using NCurses API
Optimization 1: Divide the screen into separate windows such that separate activities map to different windows. This allows us to update only specific windows and reduces the cost of a physical-screen update. This sequence is represented in the middle row of Fig. 8.
It was mentioned above that each `wrefresh` corresponds to two actions. If two window updates happen in quick succession (fast enough that the user would not notice), one should avoid calling `wrefresh` for each window. This brings us to the second optimization.

Optimization 2: Wherever possible, instead of multiple `wrefresh` calls, do multiple `wnoutrefresh` calls followed by a single `doupdate`.
rtop runs on terminal emulators rather than real terminals, and thus, unlike with real terminals, the user can resize the screen. This user action is not handled by the terminal emulator: although the screen size is changed (by the window manager), only a signal (SIGWINCH) is sent to the client process. It is the job of the client process (in this case, rtop) to handle it appropriately. If rtop does not handle the signal, it will continue to assume the old window size and will try to render information in regions that no longer exist.
This behavior has been verified by running rtop in a debugger (gdb). When the debugger was configured to handle the signal and the program was run, the following output was generated:
> handle SIGWINCH print
> r CONFIG_FILE # run rtop_v0_1
# window resized
Thread 1 "rtop_v0_1" received signal SIGWINCH, Window size changed.
terminate called without an active exception
The solution to the above problem is to handle the signal. The signal handler scales and repositions the layout based on the new screen-size information. It is also important to keep usability in mind by preventing extreme resizing of the screen; a very small screen is not useful for reading information.
Process information is accessed using the proc file system, available on all Unix-based systems. On Linux, process information is distributed amongst multiple text files inside subdirectories of the `/proc` directory. There is a unique subdirectory per process, named with the process-id (PID) value. For example, to access information associated with the init process, which has PID 1, the files in the directory `/proc/1/` should be accessed.
rtop draws process information from 3 files: /proc/pid/stat, /proc/pid/cmdline and /proc/pid/status. The discussion below focuses on the race conditions that might arise when accessing the proc file system. For details on how to read a particular text file, refer to its format documentation.
The algorithm for accessing and parsing process files is straightforward. The contents of the /proc directory are read using the opendir and readdir library calls. The opendir call provides a pointer object that allows scanning through the directory's contents. This pointer is passed to readdir, which performs the scan through the /proc directory, visiting each process-specific subdirectory in turn, accessing the files inside it and parsing them. However, since processes are added and removed asynchronously, one has to be careful in using these calls and understand the guarantees they offer. The Linux Programming Interface by Kerrisk, 2nd edition, says
If the contents of a directory change while a program is scanning it with readdir(), the program might not see the changes. SUSv3 explicitly notes that it is unspecified whether readdir() will return a filename that has been added to or removed from the directory since the last call to opendir() or rewinddir(). All filenames that have been neither added nor removed since the last such call are guaranteed to be returned.
This forces us to add logic to handle situations where subdirectories are being added to or removed from the /proc directory. One must also understand that files inside the proc file system are not actual files; they are an abstraction. When a file's information is accessed, it is read immediately from the kernel's data structures. It is also implied that the directory /proc is populated when it is opened for reading, sort of like a snapshot. When the /proc directory is scanned using readdir, we inspect each process's pid directory; it is at this time that the /proc/pid information is updated again. If we try to access /proc/pid but the process with that pid no longer exists, the open call will return an error. A new process may also be added while scanning /proc with readdir; in that case the scan may miss it.
A summary of the different scenarios, their implications, and the decisions the algorithm makes to handle them is given below:
Table 1
Scenario | Analysis | Decision |
---|---|---|
after reading process files in /proc/pid/, but before completing the scan, the process is killed | the algorithm will print the information; it will be slightly outdated, but most likely the user won't notice, and it will certainly be removed at the next scan | no change, works as is |
after opening /proc and beginning the scan, a new process is added; readdir does not guarantee that it will show up in the scan | if the algorithm misses it, it will be read at the next scan; if it is picked up in this scan, that is a good outcome too | no change, works as is |
the process subdirectory was present when we opened /proc, but the process died before the scan reached it | if the algorithm tries to open a file in /proc/pid, the open will return an error; on error, the algorithm interprets the process as dead and ignores it | no change, the algorithm correctly ignores the dead process, works as is |
/proc/pid was read; before the scan finished, the process was killed, another process was assigned the same pid, and /proc/pid was read again before the scan completed | highly unlikely, because Linux does not reassign a recently deceased process's pid for some time, while the scan finishes much quicker, in tens of msec; if it does happen, the dead process's information is simply overwritten with the new (alive) process's information | no change, works as is |
a process dies during the reading of its /proc/pid files | if the algorithm is reading the last file, it will not see an error and will report the process as alive, same as the 1st case; if it is reading the 1st or 2nd file, it will complete that read, then get an error on the next file (2nd or 3rd) | only when the last file is being read do we report an alive process; it will be corrected at the next scan; no change, works as is |
Below is the algorithm pseudocode

read procfs:
    get pointer to /proc
    for each /proc/pid in /proc:
        read /proc/pid/status, /proc/pid/stat, /proc/pid/cmdline
        if no error in opening any of the files:
            store information in database
Any process that died after its files in /proc/pid were read is considered alive, and its information is displayed.
Before moving on to the description of the application's functioning, it is useful to get familiar with some important classes that make up the application. Knowing them helps: when the data flow in different use cases is explained, things fall into place easily. You can skip ahead to the use case analysis and refer back to this section later.
Three types of user interface classes cooperate to handle key inputs from the user and graphical update requests from the business logic modules. These classes are organized hierarchically. ScreenManager sits at the top of the hierarchy; it references one or more objects of the View type. At the lowest level is the SimplePanelData type (which contains the data and methods associated with a panel). A View class references one or more SimplePanelData class objects. There is also a specialized panel type called ProcViewPanel that references other, simpler panels such as BrowsePanelData.
These class relationships are depicted in the class diagrams below. Note that in these diagrams, references to other types of objects are also made; these will be discussed shortly
Fig. 9 UI Classes and their relationship to other UI classes and StateMachines
These classes functionally belong to the UI module and play a central role in key input processing. Each UI class contains key dictionary state machines (see below) that help it resolve key input. They also contain child UI object state machines (a View in the case of ScreenManager, or a SimplePanelData in the case of a View), which help the UI object determine which child UI object is in focus, and is thus the receiver of key information
As mentioned above, the raison d'être of a key dictionary is to hold keys and their associated actions. The KeyDict class implements the key dictionary. Any UI object is able to resolve keys because it consists of state machines of KeyDicts. The KeyDict object essentially consists of a hash map from keys to their associated lists of actions (implemented by the Action class). Thus a key dictionary encapsulates all possible actions associated with all keys active in a given state of a particular context (by context we mean the ScreenManager, View, or SimplePanelData). If the state changes, so does the key dictionary
Action is a data structure that holds the UUID of an object (the object can be of any type) and an associated method name, which is the name of some public method of the type to which the UUID-identified object belongs. These pieces of information are required to invoke a method on a specific object, as explained in the discussion on callbacks
The working of a key dictionary state machine is illustrated by the following example and the diagram below
Fig. 10 Transitions of a ScreenManager's KeyDict state machines. The key dictionaries associated with the states are shown at the left and right ends.
In the diagram we see how the key dictionary state machine works at the screen level. In the initial state, we have a key dictionary in which only two keys make sense: F2 and F10. F2 changes the view from the current one to a different one, but it also changes the key dictionary to one in which F10/F2 are not recognized. Pressing ESC returns to the start state. Pressing F10 in the start state simply exits the application.
All key-associated actions are eventually invoked by calling the invoke method of an object of type MemFuncPDict. This object contains hash maps that store two types of information: one kind has object UUIDs as keys and object pointers as values; the other has function names as keys and function pointers as values. There is one hash map of each kind per object type: one for the objects of that type, and one for the methods belonging to that type. The MemFuncPDict object is initialized at application startup by registering specific objects and their methods.
Fig. 11 Design of MemFuncPDict. The two types of dictionaries it holds are shown
When an Action object is passed to the invoke method, it uses Action.uuid to determine the object and Action.func_name to determine the function pointer, and combines the two to invoke the public method on the object. These method pointers are referred to as callbacks; the benefit of callbacks is that any UI object, via its key dictionary, can invoke any registered method on any registered object.
MemFuncPDict is part of the machinery responsible for resolving key inputs.
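The MemFuncPDict idea can be sketched in miniature. The toy below is not rtop's implementation: it uses string UUIDs, shows only a single registered type, and names (`Counter`, `ToyAction`, `ToyMemFuncPDict`) are invented for illustration. It keeps one map from UUID to object pointer and one from method name to member-function pointer, and invoke() combines them.

```cpp
#include <map>
#include <string>

class Counter {                       // example registered type
public:
    void increment() { ++n_; }
    void reset()     { n_ = 0; }
    int value() const { return n_; }
private:
    int n_ = 0;
};

struct ToyAction {                    // mirrors the Action data structure
    std::string uuid;                 // identifies the object
    std::string func_name;            // names a public method of its type
};

class ToyMemFuncPDict {
public:
    void registerObject(const std::string &uuid, Counter *obj) { objs_[uuid] = obj; }
    void registerMethod(const std::string &name, void (Counter::*m)()) { funcs_[name] = m; }

    void invoke(const ToyAction &a) {
        Counter *obj = objs_.at(a.uuid);                // resolve UUID -> object
        void (Counter::*m)() = funcs_.at(a.func_name);  // resolve name -> method
        (obj->*m)();                                    // combine: call method on object
    }

private:
    std::map<std::string, Counter *> objs_;             // UUID -> object pointer
    std::map<std::string, void (Counter::*)()> funcs_;  // name -> member fn pointer
};
```

Registration happens once at startup; afterwards any key dictionary can trigger any registered method purely through an Action's two strings.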
The rtop design envisions multiple threads, all of which may want to write to the internal process database. Such a scenario is a recipe for race conditions. Therefore the process database has been put under a lock: any thread that needs to write to the process database must acquire the lock and release it when done. Note that this locking mechanism has not been optimized.
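A minimal sketch of this coarse lock, with hypothetical names (rtop's actual database holds more than a pid-to-string map): every access takes the same mutex for its full duration, which is simple but, as noted above, unoptimized, since readers and writers all serialize.

```cpp
#include <map>
#include <mutex>
#include <string>

class LockedProcDb {
public:
    void store(long pid, const std::string &info) {
        std::lock_guard<std::mutex> guard(m_);  // acquire; released on scope exit
        data_[pid] = info;
    }
    std::size_t size() {
        std::lock_guard<std::mutex> guard(m_);  // readers take the same lock
        return data_.size();
    }
private:
    std::mutex m_;                              // one lock for the whole database
    std::map<long, std::string> data_;
};
```

Using std::lock_guard rather than manual lock()/unlock() guarantees release even if an update throws, which matters in a long-running monitor.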
For logging, Boost.Log is used. In the following discussion, its use as a debugging tool is described. To understand Boost.Log's API, refer to its documentation
The most fundamental piece of information that logging provides is function call nesting. To achieve that, entry into and exit from every function of every class is logged. To accomplish that one needs to add the following code at the beginning of the function body
log_spacer.addSpace();
BOOST_LOG_SEV(lg, debug)<<log_spacer<<"log_message";
Since calls are heavily nested, it often becomes difficult to read, for a deeply nested call stack, what belongs in the stack and what is outside it. Therefore, log messages were customized by passing a special object of type logSpacer to the logger. This object keeps track of white space added to/deleted from it. When a function is entered, space is added to this object; when the object is passed to the logger on the subsequent line, that space is prepended to the log message. This has the effect of indenting the message to the right. If calls are nested, their log messages appear clearly nested thanks to the indentation.
Two crucial pieces of information are necessary to debug rtop: first, the specific thread on which a particular action was carried out, and second, the temporal order of actions. Therefore we need both thread identity and timestamps. These facilities are provided by the Boost.Log library.
An important feature of logSpacer is that it keeps the white space (for indenting) associated with each thread separate, so function call nesting within threads is correctly indented in log messages.
For indenting to work properly, a log message added at the end of the function body must delete the white space. This is accomplished by writing the following code just before the exit point
BOOST_LOG_SEV(lg, debug)<<log_spacer<<"log_message";
log_spacer.delSpace();
If a function has multiple exit points, these exit code snippets must be added before each.
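The indentation mechanism can be illustrated without Boost.Log, using plain strings; the `ToyLogSpacer` below is a hypothetical stand-in for logSpacer (single-threaded, and streaming into a vector instead of `BOOST_LOG_SEV`), showing only the addSpace/delSpace pairing around entry and exit.

```cpp
#include <string>
#include <vector>

class ToyLogSpacer {
public:
    void addSpace() { depth_ += 2; }                   // on function entry
    void delSpace() { if (depth_ >= 2) depth_ -= 2; }  // before each exit point
    std::string prefix() const { return std::string(depth_, ' '); }
private:
    int depth_ = 0;
};

// Mimics entry/exit logging for outer() calling inner().
std::vector<std::string> demo_nested_logs() {
    ToyLogSpacer sp;
    std::vector<std::string> log;
    sp.addSpace();                                  // enter outer()
    log.push_back(sp.prefix() + "outer: enter");
    sp.addSpace();                                  //   enter inner()
    log.push_back(sp.prefix() + "inner: enter");
    log.push_back(sp.prefix() + "inner: exit");
    sp.delSpace();                                  //   leave inner()
    log.push_back(sp.prefix() + "outer: exit");
    sp.delSpace();                                  // leave outer()
    return log;
}
```

The inner call's messages end up indented two spaces deeper than the outer call's, so the nesting is visible at a glance in the log file.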
Boost.Log also allows setting logging levels, which means one can make logging more or less selective. If most debugging is complete and performance needs to be optimized, the logging level can be set to the most selective, i.e. FATAL; this logs minimal information. If more debugging information is desired, it is set to DEBUG. Of course, the developer must pass the appropriate severity to each log statement so that messages are filtered appropriately; using FATAL for all messages defeats the purpose of selective logging.
Configuration refers to two types of information: one refers to the system and the other to the application. System configuration includes such pieces of information as the number of CPUs on the system, the operating system API and capabilities, etc. This information must be known before application start. Application configuration includes such pieces of information as the key encodings used to initialize the key dictionaries.
The system configuration is managed in two ways
Some types (belonging to the business logic) are responsible for introspecting the system at start time. They parse system-specific information and use it to appropriately initialize the different class objects
Other types (also from the business logic) parse configuration files that describe the user interface layout and the key dictionary definitions used to initialize the UI objects. These configuration files are distributed with the application.
The benefit of handling system configuration this way is that any future change in hardware/platform will only require changes to the code responsible for introspection; the UI logic can be built independently of it. This improves the application's portability and maintainability. The second kind of configuration management allows us to tweak the UI layout without any code changes.
Currently there is a single file that holds UI related information, and its parsing logic can be much improved.
The case of the ESC key
To process each individual key press, the NCurses API is used to set the terminal in raw mode. In this mode, each key input is processed individually, as opposed to line mode, where one line (a series of characters followed by an end-of-line character) is processed at a time.
Ncurses defines easy-to-use macros for the key values corresponding to special keys, for example KEY_DC for the Delete key.
However, it should be noted that a special key is not a single character value; it is a sequence of values, the first of which is the value corresponding to the ESC key. So whenever an ESC character is encountered, the system is configured to wait for the following characters in order to interpret the special key correctly. If it times out while waiting, only the ESC key is reported. This makes processing of the ESC key somewhat delayed: the user experiences a perceptible delay between pressing ESC and the associated action. This delay is however tunable (in ncurses, via the ESCDELAY setting); one can reduce the waiting time to around 100 msec so that it is imperceptible to the user. One however has to make sure that under no circumstances a special key is misread, i.e. a special key is pressed, but the system does not wait long enough for the characters after ESC, and prematurely declares that a bare ESC key was detected.
Mouse click processing
Mouse click events are also handled by the key input thread. Ncurses has a special key code for mouse events; this key, called KEY_MOUSE, is defined as a constant. You capture it just like any other key: if you click any button on the mouse, the int getch() API returns key = KEY_MOUSE.
To recognize and process individual buttons on the mouse, you need to enable them using the mousemask API, like so
mousemask(BUTTON1_CLICKED, NULL);
Ncurses makes it easy to build these mouse masks by providing button constants in the library. You can OR several of them together so that the system reports multiple event types, e.g. mousemask(BUTTON1_CLICKED | BUTTON3_CLICKED, NULL);
After getting the key, you need to obtain the specific event associated with the button press. Ncurses stores the event internally; to access it, you pass an event object to the getmouse(MEVENT*) API. On successful return, the event can be found in the passed MEVENT object. You can then AND its bstate field with specific masks to determine which button was pressed.
case KEY_MOUSE:
    if (getmouse(&event) == OK)  // event data retrieved successfully
    {
        if (event.bstate & BUTTON1_PRESSED)
            printw("BUTTON1_PRESSED\n");
        if (event.bstate & BUTTON1_RELEASED)
            printw("BUTTON1_RELEASED\n");
        if (event.bstate & BUTTON1_CLICKED)
            printw("BUTTON1_CLICKED\n");
    }
    break;
Four use cases will be considered. The first tackles the periodic graphical update that happens continuously, every 100-10000 msec depending on the application configuration. The second describes the chain of events when the user provides a key input. The third deals with data flow between elements of the user interface, and the final one describes how the process database is accessed when special actions, such as sorting of the displayed information, are requested by the user. In each case, we describe the class APIs that are exercised and the flow of data between the classes involved in generating that behavior.
There is no user input. The application is showing the process view panel, which is being periodically refreshed. This activity is handled by the thread responsible for periodic updates. The periodic refresh is achieved through a two-step process: first, the Columns class object performs a database update, and second, the top-level UI object, i.e. the ScreenManager, is asked to perform a graphical update
pclms->read();
screen->refresh();
The dataflow that occurs during the database update is summarized in the diagram below
Fig. 12 Data flow during periodic graphical update
Columns::read calls ProcInfo::update. It passes a list of string values that are the names of the process properties that need to be updated. There are potentially tens of properties, but the method passes only the properties that are currently visible; this avoids unnecessarily increasing the computational load and slowing down the graphical update. The second argument will be discussed in a later use case.
The ProcInfo class has an internal reference to ProcDb, which is the type that holds the process database. When ProcInfo::update is called, it reads the information from procfs and stores it in ProcDb. It also updates the sorting information in ProcDb (discussed later). These are steps 3 and 4 in the dataflow diagram above.
At this point we have left out the discussion of the internals of the process database; these will be covered later, when sorting is discussed.
Performing the graphical update involves a call to ScreenManager::refresh(), which invokes the refresh method of its in-focus child UI object; similar calls are made recursively by the child (View) and its child (ProcViewPanelData) object. One might wonder: how does the data reach the UI panel object? ProcViewPanel has a reference to the process database, which it accesses when ProcViewPanel::refresh() is invoked by the parent View object. Therefore, data flows from ProcDb to the ProcViewPanel, and from the ProcViewPanel to its constituent panels, which actually display the data. These are steps 5, 6 & 7 in the dataflow diagram.
This is a generic use case and encapsulates all events where a user provides a key input. The myriad possibilities for specific keys are not considered here; instead, details are provided about the mechanisms: how keys are recognized, how the actions associated with them are invoked, and how certain key inputs change the context in a given UI object.
All key inputs are processed by the key input activity loop. Most of the time, this thread is waiting for key input. Once a key input is received, the received key is passed to the ScreenManager::resolveKey method which blocks until the key is resolved at all UI levels.
Each UI object handles the key in two ways:
- it resolves the key using its own KeyDictionary state machine. The key dictionary state machine resolves the key by looking it up in the current key dictionary and sequentially invoking all the Action objects using the MemFuncPDict class object. It should be mentioned that a few actions are compulsorily invoked: updating the state of the UI object's key dictionary and of its child object state machine.
- it passes the key to the in-focus child object. In the case of ScreenManager, the key is passed to the View object state machine, which passes it on to the current child object (the one that has the focus).
This set of calls is made recursively by the UI objects down the hierarchy until the actions at all UI levels are completed. If at any UI object level the key is not recognized, it is simply ignored, but it is still passed to the UI object one level below (if one exists).
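The key-forwarding behavior can be sketched with a toy hierarchy (names and key codes hypothetical, not rtop's classes): each level consults its own dictionary, and forwards the key to its in-focus child whether or not it recognized the key itself.

```cpp
#include <set>
#include <string>
#include <vector>

class ToyUiObject {
public:
    ToyUiObject(std::string name, std::set<int> known)
        : name_(std::move(name)), known_(std::move(known)) {}

    void setFocusChild(ToyUiObject *c) { focus_child_ = c; }

    // Records which levels had an action for the key; always forwards down.
    void resolveKey(int key, std::vector<std::string> &handled_by) {
        if (known_.count(key))
            handled_by.push_back(name_);    // this level resolved the key
        if (focus_child_)                   // unrecognized keys are ignored here
            focus_child_->resolveKey(key, handled_by);  // but still passed down
    }

private:
    std::string name_;
    std::set<int> known_;                   // keys this level's dictionary knows
    ToyUiObject *focus_child_ = nullptr;    // in-focus child, if any
};
```

A key unknown to the ScreenManager but known to a panel still reaches the panel, because forwarding does not depend on recognition.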
Fig. 13 Dataflow during passing of a key value down the UI hierarchy. ScreenManager::resolve (step 2) passes the key information to ScreenManager's key dictionary, which resolves it and passes the list of Actions to the callback dictionary MemFuncPDict (step 3), which invokes the functions associated with those Actions. ScreenManager::resolve also passes the key to its in-focus child UI object (step 4).
This use case is a specific instance of the more generic key input use case described above. Here the focus is on understanding how data is exchanged between two UI objects, such as two panels that hold property names. Specifically, we consider the case of adding new process properties to the Active Properties panel, so that they can be displayed on the ProcViewPanel
Fig. 14 Process Settings View. A white entry indicates the cursor position; a blue entry indicates that the panel is in focus, as well as the cursor position
As shown in Fig. 14 above, once the user has navigated to the Settings view and selected a property in the right column (the All Properties panel), pressing the ENTER key initiates this use case. Assuming steps 1-6 in Fig. 13 have taken place, the focus here is on the specifics of steps 7-8-9 in the context of this use case. In step 7, the key is passed to the All Properties panel. It resolves it by passing it to its key dictionary (step 8 in Fig. 13), which invokes insertIntoLeftNbr. This method accesses the reference to the left neighbor of the All Properties panel and passes the property string (of the selected entry) to the left neighbor using the neighbor's insert(string) API. The neighboring panel then dutifully inserts the passed string at its current cursor position. Note that the insert API is not stable and will probably be changed in the future.
Alternative implementation
At the View level (steps 4-5-6 of Fig. 13), read the in-focus panel's selected entry using some API, and then insert that entry into its left neighbor using some other API. In this scenario, the individual panels need not hold references to their left and right neighbors. This makes more sense: a panel should not have to worry about who its neighbor is; the View class that stores the panels is the one that knows how the panels within it are arranged. This implementation requires handling callbacks of the form void (*func)(string) instead of argument-less ones, void (*func)(), as in the current implementation.
This use case describes the sequence of events after the user forces the ProcViewPanel to display the property values in sorted order, based on the selection of a specific property (via a mouse click). The handling of mouse input has been described above.
In brief, the input processing logic recognizes the type of mouse button pressed; if it is a left-button single click, it determines its position on the ProcViewPanel and passes the button-click type and position to the ProcViewPanel. The ProcViewPanel then determines the column in which the click lies; this determines the property by which sorting must be done. The ProcViewPanel stores this property name, to be accessed at a later time. The sorting property name in ProcViewPanel is also initialized at application startup to a predefined value, so that even without user input there is a default sorting criterion.
Once the sorting property has been determined, ProcViewPanel calls the Columns::read method. You might recall that this method is also called during the periodic refresh use case. As it turns out, sorting and database update are intertwined, i.e. one cannot happen without the other. The reasons for this arrangement are as follows
Arguments for combining sorting and data reload
The design decision to be made is: when the user requests a change of sorting criterion, should we sort the existing data, or reload new data and then sort? It makes sense to reload so as to present the latest information to the user. Reloading would matter little if graphical updates happened frequently; however, if updates happen, say, every 5-10 seconds, the user will not be satisfied with stale information; they would much prefer information that is both sorted and fresh. This means that sorting implies data reload.
There is also a compelling argument for the opposite direction, i.e. whenever a data reload is performed, sorting must be done as well. Imagine that sorting is being done by PID. One must read all numeric subdirectories of the /proc directory, iterating over them using Linux API calls. However, these calls do not guarantee that the subdirectories will be read in sorted order, or even consistently in the same order. The Linux Programming Interface by Kerrisk, 2nd edition, says
The filenames returned by readdir() are not in sorted order, but rather in the order in which they happen to occur in the directory (this depends on the order in which the file system adds files to the directory and how it fills gaps in the directory list after files are removed). (The command ls –f lists files in the same unsorted order that they would be retrieved by readdir().)
Fig. 15 Invalidation of sorting after the 2nd read
Even if readdir returned subdirectories in the same order, processes are constantly being added and removed. As shown in Fig. 15, if one process were added and another removed, the 2nd read would return data whose order invalidates the previous sort. The only antidote is to sort every time a data reload is performed. This means data reload implies sorting.
Therefore, whenever we perform sorting or data reload, we perform the other activity as well.
Dataflow during sorting
The diagram below supplies the data flow details that were omitted from the periodic refresh discussion.
Fig. 16 Dataflow during sorting
ProcViewPanel passes the sortkey (the name of the property by which sorting is to be done) to Columns when it invokes the Columns::read() method. The Columns::read() method passes the sortkey (along with the vector of property strings) to ProcInfo::update. This results in fresh process data being acquired from procfs (step 3); this information is stored in ProcDb (step 4); then a sort is performed on the read values (step 5), and a sorted index vector is stored back in ProcDb (step 6). The sorted index vector is stored in ProcDb because the UI objects access ProcDb (step 6, Fig. 16) and use the vector to traverse the ProcDb data in sorted order.
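The sorted-index-vector technique can be sketched as follows. The struct, field names, and sortkey strings are hypothetical stand-ins for rtop's actual property columns; the point is that the process rows stay in read order, while a separate vector of indices, stored alongside them, tells the UI in which order to visit the rows.

```cpp
#include <algorithm>
#include <numeric>
#include <string>
#include <vector>

struct ProcRow {
    long pid;
    long rss_kb;   // e.g. resident memory, one of the sortable properties
};

// Return indices into `rows` such that visiting rows[idx[0]], rows[idx[1]], ...
// yields the rows sorted by the chosen key; `rows` itself is left untouched.
std::vector<std::size_t> sortedIndex(const std::vector<ProcRow> &rows,
                                     const std::string &sortkey) {
    std::vector<std::size_t> idx(rows.size());
    std::iota(idx.begin(), idx.end(), 0);           // 0, 1, 2, ...
    std::stable_sort(idx.begin(), idx.end(),
        [&](std::size_t a, std::size_t b) {
            if (sortkey == "rss")
                return rows[a].rss_kb > rows[b].rss_kb;  // memory: descending
            return rows[a].pid < rows[b].pid;            // default: ascending PID
        });
    return idx;
}
```

Sorting indices rather than the rows themselves means the database layout never changes; changing the sort criterion only requires recomputing (and storing) one small vector.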