Overview
Programmers can prepare a program for deep profiling
by compiling it in a deep profiling grade such as asm_fast.gc.profdeep.
When a program compiled in a deep profiling grade is executed,
it builds a data structure containing profiling information,
and writes this out to a file called Deep.data at the end of execution.
Programmers can then browse the contents of these profiling data files
using the Mercury deep profiling tool, mdprof.
The structure of the profiling data file
The data structure written out to the Deep.data file, the profiling tree,
resembles a call graph.
It has four kinds of nodes:
CallSiteStatic, ProcStatic, CallSiteDynamic and ProcDynamic structures.
These four node types are consistently abbreviated as css, ps, csd and pd
throughout the deep profiling system,
including the names of structure fields.
The Deep.data file consists of a header and a sequence of nodes.
The header contains
- a string identifying the file as containing deep profiling data
- four integers giving the number of each kind of node in the data file
- an integer giving the number of profiling clock ticks per second
on the machine on which the profiling run was executed
- a count of the number of clock ticks
encountered within profiling instrumentation
- a count of the number of clock ticks
encountered outside profiling instrumentation
- the id of the ProcDynamic node representing the Mercury runtime system.
It has one callback call site, which has charged to it
the calls to library initialization procedure, to main/2
(or to other procedures if the top level of the program is in foreign code),
and to the library finalization procedure.
The following sequence contains nodes of all four types.
In the running program, these nodes refer to each other by pointers,
but in the process of writing them out we convert these pointers to node ids,
which are dense small integers starting at one.
- CallSiteStatic
-
CallSiteStatic structures are created by the compiler.
There is one CallSiteStatic structure for each call site in the source code.
CallSiteStatic structures contain the following fields:
-
MR_css_kind:
an indication of the nature of the call at that call site: first order call,
call through a procedure-valued variable, method call, or callback.
(Callback sites are invocations of foreign_proc goals
that may call back to Mercury.)
-
MR_css_callee_ptr_if_known:
if the call site is a first order call,
this field contains the id of the proc_static structure
of the procedure called from that call site.
Otherwise, it is not meaningful.
-
MR_css_type_subst_if_known:
if the call site is a first order call,
this field contains a text representation of the type variable binding(s)
required to unify the types of the actual parameters at the call site
and the formal parameters of the procedure called from that call site.
Otherwise, it is not meaningful.
-
MR_css_file_name:
the name of the file containing the call site,
or (for the sake of compactness) an empty string
if, as usual, this file name is the same as
the file name associated with the procedure containing the call site.
-
MR_css_line_number:
the line number on which the call site occurs.
-
MR_css_goal_path:
the goal path of the call goal within its procedure body.
- ProcStatic
-
ProcStatic structures are created by the compiler.
There is one ProcStatic structure for each procedure in the source code.
ProcStatic structures contain the following fields:
-
MR_ps_proc_id:
the identity of the procedure.
-
MR_ps_file_name:
the name of the file containing the procedure declaration.
-
MR_ps_num_call_sites:
the number of call sites within the procedure.
-
MR_ps_call_sites:
an array containing MR_ps_num_call_sites CallSiteStatic structures,
one for every call site within the procedure.
In the Deep.data file,
it contains an array of the node ids of those structures.
-
The MR_ps_outermost_activation_ptr and MR_ps_activation_count fields
are used only during runtime, and are omitted
when the ProcDynamic node is written out to the Deep.data file.
-
MR_ps_num_coverage_points:
the number of coverage points compiled into this procedure.
-
MR_ps_coverage_points_static:
an array containing static data information each of the coverage points.
-
MR_ps_coverage_points:
an array containing coverage point counts for each of the coverage points.
- CallSiteDynamic
-
CallSiteDynamic structures are created
by the instrumented program during a profiling run.
There will be one or more CallSiteDynamic structures
for each call site through which
the program actually performs a call during the profiling run.
For a given call site, there will be distinct CallSiteDynamic structures
for each distinct context in which those invocations take place.
-
MR_csd_callee:
the id of the ProcDynamic structure of the procedure called at that call site,
or zero if there were no calls through the given call site
in the context represented by this CallSiteDynamic structure
and all its ancestors.
-
MR_csd_own:
the measurements collected for the invocations of the called procedure
from the context represented by this CallSiteDynamic structure
and all its ancestors,
-
MR_csd_depth_count:
this field is used only during runtime, and is omitted
when the CallSiteDynamic node is written out to the Deep.data file.
- ProcDynamic
-
ProcDynamic structures are created
by the instrumented program during a profiling run.
There will be one or more ProcDynamic structures
for each procedure which is called during the profiling run.
For a given procedure, there will be distinct ProcDynamic structures
for each distinct context in which those calls take place.
-
MR_pd_proc_static:
gives the id of the ProcStatic structure of the procedure .
-
MR_pd_call_site_ptr_ptrs:
an array, whose size is given by the MR_ps_num_call_sites field
of the ProcStatic structure identified by the MR_pd_proc_static field.
Each element corresponds to a call site.
Elements corresponding to a first order call site
contain either the id of a CallSiteDynamic node
representing the call made from that call site
in the context represented by the ProcDynamic structure and its ancestors,
or zero if no such call was made.
Elements corresponding to other kinds of call sites
(higher order call, method call, callback)
have a list of the ids of zero or more CallSiteDynamic structures,
one for each different procedure that was called from that call site.
The Mercury deep profiling tool mdprof
The Mercury deep profiler consists of three programs.
One is the web browser of the user's choice:
this implements the user interface.
The other two are mdprof and mdprof_cgi.
- mdprof
-
This a simple shell script.
It is invoked by the web server in response to queries of the right form.
It does nothing more than set up the PATH environment variable
to contain the directory in which mdprof_cgi was installed,
and then invoke mdprof_cgi.
- mdprof_cgi
-
This is a Mercury program.
It is invoked once for every page displayed by the deep profiling system.
It is passed,
in the environment variable QUERY_STRING which is set by the web server,
an URL component containing the name of a profiling data file
and a query specifying which part of that data file is to be displayed.
mdprof_cgi checks whether a server process already exists
for the given profiling data file.
If the answer is yes, it passes the query to the server,
gets back the reply, gives it to the web server, and exits.
If the answer is no, it reads in the named profiling data file,
processes it to materialize information that is required by queries
but is stored in the profiling data file only implicitly,
and answers the query directly.
It then forks itself.
The parent exits to let the web server finish rendering the generated page.
The child process becomes a server process,
which goes into a loop awaiting queries.
When it gets a query from mdprof_cgi,
it answers the query and goes back to sleep.
It exits when it has not received a query for a set timeout period,
which by default is thirty minutes,
or when it receives a "query" telling it to shut down.
(Due to the timeout mechanism, shutting down the server explicitly
is not useful unless the profiling data file has changed,
the server has been recompiled,
or one wants to recover its space occupied by its virtual memory.)
-
The reason why we create the server process via a fork
instead of simply making the initial mdprof_cgi process the server process
is that the web server requires the program it invokes to exit
before it displays the page the program generates.
Doing without a server doesn't work
because we don't want have to read and process the deep profiling data file
for every page to be displayed,
since that takes a significant fraction of a minute.
The reason for the split between mdprof and mdprof_cgi
is to make it possible to specify some parameters of the deep profiler
without needing to recompile a Mercury program or even needing to know Mercury.
The elements of the interface between the client and server roles of mdprof_cgi
are documented in interface.m.
The client and server communicate via a pair of named pipes
whose names include a mangled form of the data file name.
(The mangling is required to replace any slashes in the name of the data file.)
The existence of these files serves as an approximation of a lock;
the idea is that they exist if and only if a server process
for that data file is alive and serving queries via those pipes.
mdprof_cgi creates a server process for the data file
if and only if these named pipes do not exist.
They are created only by mdprof_cgi transforms itself into a server,
and destroyed only when this server exits.
The two files are always created and destroyed together.
There are potential race conditions
both when the pipes are created and when they are destroyed.
It is possible for the web server to receive two requests
for a given data file in quick succession,
and it is possible that when the second invocation of mdprof_cgi
checks whether the pipes exist,
the first invocation of mdprof_cgi
has not yet forked itself off as a server process.
We avoid this by putting all code
that creates, destroys, or test the existence of the named pipes
inside a critical region protected by a lock on a mutex file.
Whichever invocation of mdprof_cgi gets the lock first will become the server;
any others will not be able to perform the test for the existence of a server
until after there is a server.
The other race condition involves a client arriving
between the time that the server gets the timeout signal
and the time that the server actually deletes the named pipes and exits.
To fix this, we make clients create a file indicating that they want a server
before they get the lock on the mutex file.
If the shutting-down server gets the mutex first,
it will abort the shutdown if it finds any want files around.
If it does not find any want files, it shuts down,
but because it holds the mutex lock throughout the process of shutdown,
no client can observe its decision process.
In the absence of the want files,
a server that got a timeout signal would have no decision to make.
It would therefore be possible for a client to arrive
and find that the named pipes exist,
without knowing that the server process is already committed to shutting down,
which can leave the client sending its query to a now nonexistent server.
Pipeline processing of deep profiler queries
As described above when the deep profiler starts it reads the deep
profiling data.
Processing is performed to make it easy to retrieve information from
this data, this results in a structure called 'deep'.
When a query arrives further processing is performed in several steps
before HTML is produced for the user.
First, create_report generates a report structure from the cmd and
deep structures.
This report reflects all the information that may be shown to
the user.
The report structure can also be used by other tools such as
mdprof_feedback to gather information to drive compiler optimisations.
The report structure can be used by the report_to_display predicate to
produce a display structure based on the user's display preferences.
The display structure is a format-neutral representation of the final
output.
Finally htmlize_display produces HTML output from the display structure.
To support a new report a developer should add that report to the
report type and the command for launching it to the cmd type.
They will need to add support to create_report, report_to_display,
string_to_maybe_cmd and query_to_string.
They may need to modify the display structure in order to support
displaying information of a different type (for instance, a
differently formatted number).
They should modify report_to_display for some existing reports to
create links or buttons that perform the new query.
The modules of the deep profiler
- mdprof_cgi.m
-
This file contains the program that is executed by the web server
to handle each web page request.
- mdprof_dump.m
-
This is the main module of a program
for dumping out the contents of Deep.data files,
for use in debugging.
- mdprof_test.m
-
This is the main module of a test program for checking that
all the web pages can be created without runtime aborts.
Its output can be gigabytes in size.
- mdprof_procrep.m
-
This is the main module of a test program used for reading and displaying the
byte-code representation of the procedures of a Mercury program.
- array_util.m
-
This module contains utility predicates for handling arrays.
- callgraph.m
-
This module constructs an explicit representation of the call graph,
so we can find its cliques.
- canonical.m
-
This module has code to canonicalize call graphs
(i.e. ensure that no clique contains
more than one ProcDynamic from a given procedure).
It also has code that uses canonicalization to merge two call graphs.
This module is not complete yet.
- cliques.m
-
This module allows you build a description of a directed graph (represented
as a set of arcs between nodes identified by dense small integers) and then
find the strongly connected components of that graph.
- conf.m
-
This module contains primitives whose parameters are decided by
the configure script. This module picks them up from the #defines
put into runtime/mercury_conf.h by the configure script.
- coverage.m
-
This module contains code that produces the coverage profiling reports. It also
infers coverage throughout a procedure based on partial information and
execution rules for Mercury programs.
- create_report.m
-
This module contains the create_report predicate which takes a command
and preprocessed deep profiling data and creates a report
data-structure.
- dense_bitset.m
-
This module provides an ADT for storing dense sets of small integers.
The sets are represented as bit vectors, which are implemented as arrays
of integers. This is used by cliques.m.
- display.m
-
This module defines the display structure.
This structure represents information to be displayed to the user.
The information in a display structure is format-neutral.
- display_report.m
-
This module contains the report_to_display predicate.
This predicate takes a report structure and produces a display
structure.
- dump.m
-
This module provides a mechanism for dumping out some of the deep profiler's
data structures for debugging.
- exclude.m
-
This module implements contour exclusion,
which is a mechanism for propagating measurements
from regions in the call graph below a given line (the contour) to that line.
- html_format.m
-
This module contains code that creates HTML output from a display
structure for use by mdprof_cgi.
- interface.m
-
This module defines the type of the commands sent from clients to servers,
as well as utility predicates for manipulating commands and responses.
- io_combinator.m
-
This module a set of I/O combinators for use by read_profile.m.
- measurements.m
-
This module defines the data structures
that store deep profiling measurements
and the operations on them.
- measurement_units.m
-
This module defines data types and predicates for various units of
measurement. Including percentages and time.
- profile.m
-
This file defines the main data structures of the server,
and predicates for accessing them.
- program_representation_utils.m
-
This module provides predicates that operate on program representation
structures, including formatting such structures as text.
- query.m
-
This module contains the top level predicates for servicing individual queries.
- read_profile.m
-
This module contains code for reading in a deep profiling data file.
- report.m
-
This module contains the report structure.
A sub-structure is defined for each type of report that may be
generated.
The report structure represents the information contained in a report
in a format that is easy to generate, and easy for a computer program
to analyse.
This module also contains common structures that multiple reports make
use of.
- startup.m
-
This module contains the code
for turning the raw list of nodes read in by read_profile.m
into the data structure that the server needs
to service requests for web pages.
- timeout.m
-
This module implements the timeouts that mdprof_sgi uses
to shut down after it hasn't received any queries for a while.
- top_procs.m
-
This module contains code to find the top procedures by several criteria.
- util.m
-
This module defines utility predicates for both mdprof_cgi and mdprof_server.
- var_use_analysis.m
-
This module contains predicates for analysing how soon or late a variable is
used (produced or consumed) by a procedure.