Linux Tools Project/TMF/CTF guide

Revision as of 18:44, 21 August 2012 by Unnamed Poltroon (Talk) (in progress)

This article is a guide about using the CTF component of Linux Tools (org.eclipse.linuxtools.ctf.core). It targets both users (of the existing code) and developers (intending to extend the existing code).

CTF general information

This section discusses the CTF format.

What is CTF?

CTF (Common Trace Format) is a generic binary trace format defined and standardized by EfficiOS. Although EfficiOS is the company maintaining LTTng (which uses CTF as its sole output format), CTF was designed as a general-purpose format able to accommodate virtually any tracer (software or hardware, embedded or server, etc.).

CTF was designed to be very efficient to produce, albeit rather difficult to decode, mostly due to the metadata parsing stage and dynamic scoping support.

CTF anatomy

This article does not cover the full specification of CTF; the official specification already does this.

Basically, the purpose of a trace is to record events. A CTF trace, like any other trace, is thus a collection of event data.

Here is a CTF trace:

Ctf structure.png

Stream files

A trace is divided into streams, which may span multiple stream files. A trace also includes a metadata file, which is covered later.

There can be any number of streams, as long as they have different IDs. However, in most cases (at least at the time of writing this article), there is only one stream, which is divided into one file per CPU. Since different CPUs can generate different events at the same time, LTTng splits its only stream into multiple files. Please note: a single file cannot contain multiple streams.

In the image above, we see 3 stream files: 2 for stream with ID 0 and a single one for stream with ID 1.

A stream "contains" packets. This relation can be seen the other way around: packets contain a stream ID. A stream file contains nothing but packets (no useful data before, between or after packets).
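The relation between stream files, packets and stream IDs can be sketched as follows. This is only an illustration: the file names and the helper `stream_of_file` are hypothetical, and each file is reduced to the list of stream IDs found in its packet headers.

```python
def stream_of_file(packet_stream_ids):
    """Return the single stream ID shared by all packets of one stream file.

    Hypothetical helper: the input is the list of stream IDs read from the
    packet headers of one file. A file mixing several IDs is rejected, since
    a single file cannot contain multiple streams.
    """
    ids = set(packet_stream_ids)
    if len(ids) != 1:
        raise ValueError("a stream file must hold packets of exactly one stream")
    return ids.pop()


# Same situation as in the image: two files for stream 0, one for stream 1.
# The file names are made up for this example.
files = {
    "channel0_0": [0, 0, 0],
    "channel0_1": [0, 0],
    "channel1_0": [1],
}

streams = {}
for name, packet_ids in files.items():
    streams.setdefault(stream_of_file(packet_ids), []).append(name)
```

After this runs, `streams` maps each stream ID to the files that carry its packets.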

Packets

A packet is the main container of events. Event data cannot reside outside packets. Sometimes a packet may contain only one event, but that event is still inside a packet.

Every packet starts with a small packet header, which contains fields such as its stream ID (which should be the same for all packets within the same file) and often a magic number. Immediately following is an optional packet context, which usually carries more information, such as the packet size and content size in bits and the time interval covered by its events.

Then come the events, one after the other. How do we know when we reach the end of the packet? We keep track of the current offset into the packet and stop when it equals the content size defined in the packet context.
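The end-of-packet rule can be sketched as a simple loop. This is a minimal illustration under strong assumptions: all events have the same byte-aligned size (real CTF event sizes are derived from the metadata), the content size counts bits from the start of the packet, and the names `read_events`, `header_bits` and `event_size_bits` are made up for this example.

```python
def read_events(packet, header_bits, content_size_bits, event_size_bits):
    """Collect events until the offset reaches the packet's content size.

    Assumption: every event has the same byte-aligned size. A real reader
    computes each event's size from the trace metadata instead.
    """
    events = []
    offset = header_bits  # events start right after the header and context
    while offset < content_size_bits:
        start = offset // 8
        end = (offset + event_size_bits) // 8
        events.append(packet[start:end])
        offset += event_size_bits
    return events


# A 64-bit header/context, three 16-bit events, then 16 bits of padding
# (the packet size is larger than the content size).
packet = bytes(8) + b"\x01\x02\x03\x04\x05\x06" + bytes(2)
events = read_events(packet, header_bits=64, content_size_bits=64 + 48,
                     event_size_bits=16)
```

The trailing padding bytes are never read, because the loop stops at the content size rather than at the end of the buffer.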

Events

An event isn't just a bunch of payload bits. We have to know what type of event it is, and sometimes other things. Here's the structure of an event:

Ctf event structure.png

The event header contains the time stamp of the event and its ID. Knowing its ID, we know the payload structure.

Both contexts are optional. The per-stream context exists in all events of a stream if enabled. The per-event context exists in all events with a given ID (within a certain stream) if enabled. Thus, the per-stream context is enabled per stream and the per-event context is enabled per (stream, event type) pair.
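How the two contexts are enabled can be illustrated with a small lookup. This is a sketch, not the reader's actual data model: the function `event_sections` and the two sets are hypothetical, and the section order follows the event structure image.

```python
def event_sections(stream_id, event_id, stream_contexts, event_contexts):
    """List the sections present in an event, in order.

    stream_contexts: set of stream IDs with a per-stream context enabled.
    event_contexts: set of (stream ID, event ID) pairs with a per-event
    context enabled. Both containers are assumptions of this example.
    """
    sections = ["header"]
    if stream_id in stream_contexts:
        sections.append("per-stream context")
    if (stream_id, event_id) in event_contexts:
        sections.append("per-event context")
    sections.append("payload")
    return sections


stream_contexts = {0}       # enabled for every event of stream 0
event_contexts = {(0, 7)}   # enabled only for event ID 7 of stream 0

full = event_sections(0, 7, stream_contexts, event_contexts)
partial = event_sections(0, 3, stream_contexts, event_contexts)
bare = event_sections(1, 7, stream_contexts, event_contexts)
```

Here `full` has all four sections, `partial` lacks the per-event context, and `bare` has only a header and a payload.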

Please note: there is no stream ID written anywhere in an event. This means that an event "outside" its packet is lost forever, since we cannot know anything about it. This is not the case for a packet: since it has a stream ID field in its header, a packet is independent and could be cut and pasted elsewhere.

Metadata file

The metadata file (which must be named exactly metadata if stored on a filesystem) describes the whole trace structure. This means the CTF format is self-describing.

When "packetized" (as in the CTF structure image above), a metadata packet contains a fixed metadata packet header (defined in the official specification) and no context. A metadata packet does not contain events: its whole payload is a single text string. Concatenating the payloads of all the packets yields the final metadata text.
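The concatenation step can be sketched as below. The header layout used here (magic, 16-byte UUID, checksum, content size and packet size in bits, then five one-byte fields, 37 bytes in total) is an assumption based on a reading of the CTF 1.8 specification and should be checked against it. The sketch also assumes little-endian packets, and `make_packet` is a hypothetical helper that only builds test data.

```python
import struct

METADATA_MAGIC = 0x75D11D57
# Assumed header layout, little-endian, no padding: 37 bytes.
HEADER = struct.Struct("<I16sIIIBBBBB")


def make_packet(text, padding=0):
    """Build one synthetic metadata packet (test data only)."""
    content_bits = (HEADER.size + len(text)) * 8
    packet_bits = content_bits + padding * 8
    header = HEADER.pack(METADATA_MAGIC, bytes(16), 0,
                         content_bits, packet_bits, 0, 0, 0, 1, 8)
    return header + text + bytes(padding)


def metadata_text(data):
    """Concatenate the payloads of all metadata packets in data."""
    parts = []
    offset = 0
    while offset < len(data):
        fields = HEADER.unpack_from(data, offset)
        magic, content_bits, packet_bits = fields[0], fields[3], fields[4]
        if magic != METADATA_MAGIC:
            raise ValueError("not a metadata packet")
        if packet_bits < content_bits or content_bits < HEADER.size * 8:
            raise ValueError("inconsistent packet sizes")
        # The payload runs from the end of the header to the content size.
        parts.append(data[offset + HEADER.size:offset + content_bits // 8])
        offset += packet_bits // 8  # skip any padding after the content
    return b"".join(parts).decode("utf-8")


blob = make_packet(b"trace { major = 1; ", padding=3) + make_packet(b"minor = 8; };")
text = metadata_text(blob)
```

The padding bytes of the first packet are skipped because the loop advances by the packet size, not the content size.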

In its simpler version, the metadata file can be a plain text file containing only the metadata text. This file is still named metadata. It is valid and recognized by CTF readers. The packetized version can be told apart from the plain text version because it starts with a magic number containing non-text bytes. This is the magic number field of the first packet's header; all metadata packets have this required magic number.
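A reader could tell the two versions apart with a check like the following. This is a sketch: the function name `is_packetized` is hypothetical, the magic value 0x75D11D57 is taken from the CTF specification, and both byte orders are tried because the sketch does not assume the trace's endianness.

```python
import struct

METADATA_MAGIC = 0x75D11D57


def is_packetized(data):
    """Return True if data starts with the metadata packet magic number."""
    if len(data) < 4:
        return False
    first = data[:4]
    little = struct.unpack("<I", first)[0]
    big = struct.unpack(">I", first)[0]
    return METADATA_MAGIC in (little, big)


packetized = struct.pack("<I", METADATA_MAGIC) + b"rest of the first packet"
plain_text = b"trace { major = 1; minor = 8; };"

packetized_detected = is_packetized(packetized)
plain_detected = is_packetized(plain_text)
```

Two of the four magic bytes (0xD1 and 0x1D) are not printable ASCII characters, which is why a plain text metadata file cannot start with them by accident.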

JSON CTF

As a means to keep test traces in a portable and versionable format, a specific JSON scheme was developed in summer 2012.

JSON is a lightweight text format used to define complex objects, with only a few data types that are common to most programming languages: objects (aka maps, hashes, dictionaries, property lists, key/value pairs) whose keys the JSON specification treats as unordered, arrays (aka lists), Unicode strings, integer and floating point numbers (no limitation on precision), booleans and null.

Here's a short example of a JSON object showing all the language features:

{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "male": true,
    "carType": null,
    "kids": [
        "Moby Dick",
        "Mireille Tremblay",
        "John Smith II"
    ],
    "infos": {
        "address": {
            "streetAddress": "21 2nd Street",
            "city": "New York",
            "state": "NY",
            "postalCode": "10021"
        },
        "phoneNumber": [
            {
                "type": "home",
                "number": "212 555-1234"
            },
            {
                "type": "fax",
                "number": "646 555-4567"
            }
        ]
    },
    "balance": 3482.15,
    "lovesXML": false
}

A JSON object always starts with {. The root object is unnamed. All keys are strings (and must be quoted). Numbers may be negative. You may build basically any structure, for example an array of arrays of objects containing objects and arrays.
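These rules can be checked with any JSON parser. The following sketch uses Python's standard json module; the document content is made up for the example and is not part of the JSON CTF scheme.

```python
import json

# An unnamed root object, a negative number, and an array of arrays of
# objects that themselves contain arrays.
doc = json.loads("""
{
    "name": "trace",
    "offset": -42,
    "grid": [[{"id": 1, "tags": ["a", "b"]}], []]
}
""")
first_cell = doc["grid"][0][0]

# Keys must be quoted strings: an unquoted key is rejected by the parser.
try:
    json.loads('{name: "trace"}')
    unquoted_accepted = True
except json.JSONDecodeError:
    unquoted_accepted = False
```

The parser maps JSON objects to dictionaries and JSON arrays to lists, so nested values are reached with ordinary indexing.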

Even big JSON files are easy to read, but a tree view can always be used for even more clarity.

Why not use XML, then? From the official JSON website:

  • Simplicity: JSON is way simpler than XML and is easier to read for humans, too. Also: JSON has no "attributes" belonging to nodes.
  • Extensibility: JSON is not extensible because it does not need to be. JSON is not a document markup language, so it is not necessary to define new tags or attributes to represent data in it.
  • Interoperability: JSON has the same interoperability potential as XML.
  • Openness: JSON is at least as open as XML, perhaps more so because it is not in the center of corporate/political standardization struggles.

In other words: why bother closing an XML tag by repeating its name when, in JSON, the name is written only once? Compare both:

<cart user="john">
    <item>asparagus</item>
    <item>pork</item>
    <item>bananas</item>
</cart>

{
    "user": "john",
    "items": [
       "asparagus",
       "pork",
       "bananas"
    ]
}
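Both documents carry the same data, which can be verified by parsing them. The sketch below uses only Python's standard library; the mapping from the XML attribute and child elements to a dictionary is written by hand here and is an assumption of the example, since XML has no built-in notion of lists or key/value pairs.

```python
import json
import xml.etree.ElementTree as ET

xml_text = """
<cart user="john">
    <item>asparagus</item>
    <item>pork</item>
    <item>bananas</item>
</cart>
"""

json_text = """
{
    "user": "john",
    "items": ["asparagus", "pork", "bananas"]
}
"""

# XML: the structure has to be rebuilt from an attribute and child elements.
root = ET.fromstring(xml_text.strip())
from_xml = {
    "user": root.get("user"),
    "items": [item.text for item in root.findall("item")],
}

# JSON: the parser directly returns the dictionary and the list.
from_json = json.loads(json_text)

same_data = from_xml == from_json
```

The JSON version needs no hand-written mapping step, which is the simplicity argument made above.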
