The design - scons.1086193.n5.nabble.com



Parts Standard Streams designThis document is to help pass knowledge about the current design of how to deal with stream handling to help deal with various issues that happen within Scons during -j based build. There are pros and cons to the approach used, and there are areas that can be improved. I make no claim I have the perfect solution, only one that has shown value and can be improved. I tried to start a rework of it in the gtest.py code Parts used for gold testing, however I have not made much progress on that front as of yet.The designThe core of the design is for Parts to override the “SPAWN” object in SCons to allow us to use SubProcess to grab the stderr and stdout streams, as well as map the python sys.stdout and sys.stderr. Parts then take this data and map it back to the sys.__stdout and sys.__stderr. By doing this Parts can prevent a host of issues that happen, such as the mixing of text which prevents the output from a -j based build from providing any value due to parallel outputs getting interleaved. It also allows Parts to apply coloring, define virtual streams (such as “verbose” or “trace”), do logging at a global level as well as at per-component level (note that Parts add this idea of “components”, i.e. some kind of node grouping, so this might not map back to SCons in the current form the same way, as SCons does not allow this notion at the moment).Above is the general flow of how Parts deal with input to output. Parts have to deal with all possible output, including that from SCons print statements or general users using print inside there build files. To do this Parts define two simple Streamer objects that are mapped to sys.stdout or sys.stderr. This data is all mapped to the reporter object. I will explain this object later, but for now it is command and control object that all output goes through to get processed and mapped correctly. Other inputs come from Parts.api.output module. The functions here are designed to be Python 3.x print() like, and allow Parts to have a portable API between python 2.7 and 3.x, as well as calls to deal with various virtual streams (more on that later). The last way input comes in is from the most common form of inputs which are the build tasks. This has the most challenges as the tasks are running in parallel, and depending on the system the streams are processed in slight different and annoying ways in how the buffered and non-buffer data is handled. The other tricky issue here is that different tools output data in different ways. In general most tools output a whole line at a time, and in general most tools that have multiple lines that should be grouped together tend to have those lines start with some space character.This lead to the point of what the part_logger deals with. When we spawn a task Parts define a special object to allow us to redirect stdout and stderr for that command to the part_logger object for a given component. Given a component has its own environment, we have one Spawn object per component. This makes the mapping simpler. Each time we spawn an object we spawn two threads that read the input from stdout and stderr to the part logger. The part logger once it gets this data caches the data till it is at a point to output the data to reporter and/or the log file for that component. This is now where things can get odd. The part_logger currently has two different strategies on how it buffers data. Keep in mind the point of this buffering it help keep the data grouped so when the user sees it, it looks correct and is in the same order as if we did not touch it. To do this the part logger will buffer data in a blocking and less-blocking way. For each case there are slight different rules in which the part logger object will push data out of its buffer.The less-blocking way will push data when:Logger has incoming data from a different stream than was previously stored,iI.e. it will output any text we are storing from stdout when some data came in from stderr.We see a new line.The task is finished.The blocking way will push data when:Logger has incoming data from a different stream, i.e. it will output any text we are storing from stdout when some data came in from stderr.We have some data that came in on a new line that did not start with some space character. This means this line should not be grouped. So it should be fine output in a parallel environment, or so we hope. (this also means that if a new line comes that starts with some space character we’re just adding it to internal buffer unlike in “less-blocking way”)The task is finished.The less-blocking way works great on –j1 based builds which the blocking way works great on –j2 or higher based builds. We had an option that allowed the text to be cached until the task was finished then we would output data for the whole task. We removed this as it had two issues.Memory footprint for large build and large –J values was too high to store all the text. Commands that called stuff such as “make” to build something would often have large chucks of text that would take a long time to see. This delay was too long for developers to see anything. In the case of makefile for example an issue might happen, but the developer would not see it till the task was done, so they could not cancel the build but instead wasted time waiting.I should also point out the main con to the part logger as well. It is that some systems, for example testing tools, output data in real time, but not in a per-line way. This mean we don’t show the “animated text”. This issues is difficult to deal with when mapping text to log files, or more important when reading in data from stdin and stderr of a given task. Linux, Windows and Mac are all different enough to make it difficult to get constant (cross-platform) behavior. The other trick is that I have had a hard time getting anything but a readline() to work correctly on all platforms when reading in data from the streams. If that could be fixed, and I am sure it should be possible, then the more realtime case should be easy to get to work in the –j1 based builds.Last note on the part_logger: I wanted to refactor it to allow a command line switch to control the type of logfile used. Currently I only have a text logger, but I would like the options for other fancy stuff such as html logger. The current implementation allows this but you have to set the objects to certain Environment variables manually in the build files. Last but not least it was found we have to write files in an append mode. The reason is that we found with our really large build we would run out of allowed open file handles on Linux and Windows. We have been forced to open file, write, close and repeat as we write out data to prevent system failures from stopping the build.The reporter object allows us to connect, filter, configure and manage most of the objects we need to process the streams text. I choose for various reasons to define some new streams. The reason for that was to allow a consistent and controllable way to deal with all output. It also allows a nice way to separate text streams coming from the build system from text coming from the build tasks. Streams we have in Parts are:Stdout – this is the classic stdoutStderr – this is the classic stderr. It is also used for anything that we believe to be an error. In hindsight I would add a stderror stream, to separate this from stderr of a task, and that of any text we know is an error. This gets mapped to the color red in the common case, and some tools output normal text to stderr. This confuses people as they want any red they see to mean something bad has happened.stdwrn – this is for text that is a warning. stdmsg – this is any normal print-like text you might get from Parts or Scons. stdverbose – this is any extra text that might be helpful. In Parts this only shows up with verbose option being set. The API allows for a group value to be used with the text, which allows the reporter to filter different verbose stream from each other, which help when you want to limit the text being displayed for a given issue. Stdtrace – I could rename this to stddebug I think. It is basically for text that we want dumped for debugging or tracing behavior. This differs from verbose as it usually contains a lot of text, and is not user friendly. Ideally it’s used in cases like tracing taskmaster logic, or subst() mapping logic. Cases like these are not ideal to dump the full data to a verbose stream. Like verbose it can be grouped to some value to control which text is dumped via the trace option.Console – this is the direct console stream. The con device or the tty device depending on platform. This is used do stuff like progress indicators or animation, such as the spinner we show during the build stage. This stream is not logged via the logger. It can be remapped to a different stream to allow it to be logged.Each stream can have a color assigned via the color option. The reporter will try to remap incoming data from stderr or stdout to stdwrn, stderr (should be stderror), stdmsg streams based on some filters we have defined which are looking for certain patterns such as “scons:“, “parts:” for stdmsg or “warning:” for stdwrn. These filters are generally defined via a regular expression.The other thing the reporter does with the stream is it adds some formatting. When calls to PrintError() are made in Parts the reporter will add some text to help format the message as an Error message that adds file line information. This makes it easy to define structured messages without having to copy-paste the structure throughout the code base.As noted the Reporter handles most command line options related to the output of text and setups various objects and maps streams at startup and shutdown. Objects such as “all” logger (the log file of everything) are handled by it. When the reporter writes data to a virtual stream it will also write that same data to the logger object. This allows data to be written out the same way we would see it on the screen. One common issue on all systems, and Linux seems to be worse with this, is that if the user redirects the output to a file on the command line, it seems that the stderror text will be written in a very different way that it would be compared to stderror mapped to the screen. The log file is controlled via the log options and can be set to output in text, html, or rtf, or some other custom format. It should be noted the HTML and RTF cases should be cleaned up.When text is outputted it is given to the Console object which is the object that defines the different stream objects. The Console creates different stream writer object for each stream. Each of these objects can be assigned a color. The stream writer will add color via ANSI codes (in the case of Windows it will translate the codes to WinAPI to change the color on the console). This works great for Windows and Linux. However it was found that on Mac the ANSI codes don’t seem to work correctly and at times the text will “disappear”. The current work around is to turn coloring off manually with color option. I also have seen some odd hiccup on Windows Server 2012 at times, in which is seems the color was not set/reset correctly.I believe this covers the main stuff. The Parts document has more detail about a given command line options and other variables that control stuff related to the outputting of text. ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download