About

Chalk supports collecting a wide range of metadata. The type of data to collect, the directives that determine when collection happens, and the destinations the collected metadata is sent to are all defined in chalk configurations (configs). In this section, we cover the core components of a chalk config and how they fit together.

Reports

Reports are at the core of chalk, because reports are ultimately what users access. We ask chalk to collect the metadata we care about, and that metadata always ends up in a report, which is always in JSON format.

Think of a report as a document or binary object that is sent to an output destination: it can be embedded in an artifact (e.g., injected into an executable), sent to a web endpoint, stored on a local or remote filesystem, and so on.

Figure: Chalk Report

Reports can be emitted under different conditions. Most often, a report is emitted during a core chalk operation, such as insert or exec, but reports can also be configured to be emitted periodically or when a condition is met.

How to configure when reports are emitted, and what data ends up in a report, is discussed in the sections below.

Templates

The exact metadata included in a report is defined in templates, which are simply collections of metadata keys (with optional conditions on when that metadata should be emitted). The same template can be reused across many reports, and each report that uses it can have its own trigger conditions and its own destinations.
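As a minimal sketch of this reuse, the config fragment below defines one template and attaches it to two reports with different triggers and destinations. It borrows the sink_config and custom_report constructs described later in this guide. The names shared_template, ops_log, audit_log, ops_report, and audit_report are hypothetical, and the chosen keys, file paths, and operations are illustrative only.

# hypothetical template shared by two reports
report_template shared_template {
  key.CHALK_ID.use                            = true
  key.CHALK_VERSION.use                       = true
  key._OPERATION.use                          = true
  key._TIMESTAMP.use                          = true
}

# hypothetical local file sink
sink_config ops_log {
  sink:     "file"
  filename: "./ops_report.log"
}

# hypothetical rotating log sink
sink_config audit_log {
  sink:     "rotating_log"
  max:      <<10mb>>
  filename: "/tmp/chalk_audit"
}

# the same template, emitted on insert and build to one sink...
custom_report ops_report {
  report_template: "shared_template"
  sink_configs:    ["ops_log"]
  use_when:        ["insert", "build"]
}

# ...and emitted on exec to a different sink
custom_report audit_report {
  report_template: "shared_template"
  sink_configs:    ["audit_log"]
  use_when:        ["exec"]
}

The template only decides which keys appear; when a report fires and where it goes are decided by each custom_report.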

Here is an excerpt from the template used by default for metadata collected during a chalk insert operation:


report_template insertion_default {
  shortdoc: "The default template for insertion operations"
  ...
  if not in_container() {
    key._OP_ALL_PS_INFO.use                   = false
  }
  key.CHALK_VERSION.use                       = true
  key.DATE_CHALKED.use                        = false
  key.TIME_CHALKED.use                        = false
  key.TZ_OFFSET_WHEN_CHALKED.use              = false
  key.DATETIME_WHEN_CHALKED.use               = false
  key.EARLIEST_VERSION.use                    = false
  ...
  # Runtime host keys.
  key._ACTION_ID.use                          = true
  key._ARGV.use                               = true
  key._ENV.use                                = true
  key._TENANT_ID.use                          = true
  key._OPERATION.use                          = true
  key._TIMESTAMP.use                          = true
  ...
}

A report template is defined with the report_template keyword, followed by the template name (here, insertion_default). The template specifies which metadata keys to emit (set to true), which to omit (set to false), and under which conditions. For instance, when chalk is not running inside a container, the _OP_ALL_PS_INFO key is not emitted. For the purposes of this guide, you do not need to know what the individual metadata keys mean, or the differences between naming conventions (e.g., keys starting with _ vs. not). All you need to know is that templates let you choose which keys you care about.

You can read more about metadata keys and their semantics or restrictions in the metadata reference.

Chalkmarks

Chalkmarks are always embedded in an artifact (e.g., an ELF file or a Docker image). Unlike regular reports, chalkmarks are restricted in what metadata they can include. In particular, metadata collected at runtime (such as network connections) cannot be included in a chalkmark.

Templates that define which keys are included in a chalkmark use a special type, mark_template. For instance, here is the built-in "minimal" mark_template that ships with chalk:

mark_template minimal {
  shortdoc: "Used for minimal chalk marks."
  doc: """

This template is intended for when you're durably recording artifact
information, and want to keep just enough information in the mark to
facilitate other people being able to validate the mark.

This is the default for `docker` chalk marks.
"""
  key.DATETIME_WHEN_CHALKED.use               = true
  key.CHALK_PTR.use                           = true
  key.SIGNATURE.use                           = true
  key.INJECTOR_PUBLIC_KEY.use                 = true
  key.$CHALK_CONFIG.use                       = true
  key.$CHALK_IMPLEMENTATION_NAME.use          = true
  key.$CHALK_LOAD_COUNT.use                   = true
  key.$CHALK_PUBLIC_KEY.use                   = true
  key.$CHALK_ENCRYPTED_PRIVATE_KEY.use        = true
  key.$CHALK_ATTESTATION_TOKEN.use            = true
}

In chalk, metadata keys that start with _ are collected at runtime. For instance, _TIMESTAMP holds the timestamp of the chalk operation. This is why none of the keys in the mark template above start with _: runtime metadata cannot be stored in a chalkmark.
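For illustration, here is a sketch of a custom mark template written in the same syntax as the built-in minimal template. The template name tiny_mark is hypothetical, and the key selection is an assumption; which keys a mark must or may contain is covered in the metadata reference. Note that no _-prefixed keys appear, since runtime metadata cannot be part of a chalkmark.

# hypothetical mark template keeping only identifying, non-runtime keys
mark_template tiny_mark {
  shortdoc: "Hypothetical mark template with identifying keys only"
  key.CHALK_ID.use                            = true
  key.CHALK_VERSION.use                       = true
  key.METADATA_ID.use                         = true
  key.DATETIME_WHEN_CHALKED.use               = true
}

Such a template could then be selected for an operation through an outconf block, as described in "Updating the used templates".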

Chalk Configurations

A chalk configuration is a collection of specifications that define when reports are created (the conditions for publishing them) and where they are sent (the sinks for the reports). A configuration also specifies which templates are used for the different reports.

Sinks

A report can be sent to one or more destinations (sinks), such as the local filesystem or an S3 bucket. For instance, the following snippet defines a sink named log_file_sink; reports sent to it are stored on the local disk at "./test_sink.log":

sink_config log_file_sink {
  sink: "file"
  filename: "./test_sink.log"
}

For more information on the supported sink types, see the output configuration documentation.

Virtually all output in Chalk is handled through a publish-subscribe ("pub-sub") model. Chalk actions publish data to topics, and sinks listen on (subscribe to) those topics. For instance, to send all reports to your newly created log_file_sink, you can specify:

subscribe("report", "log_file_sink")
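Putting these pieces together, the fragment below sketches how reports might be routed only to log_file_sink. It repeats the sink definition so the fragment stands alone, subscribes that sink to the "report" topic, and unsubscribes the built-in default_out sink (the same call used in the custom config example later in this guide). Treat it as a sketch; the exact default subscriptions depend on your chalk version and are listed in the output configuration documentation.

sink_config log_file_sink {
  sink: "file"
  filename: "./test_sink.log"
}

# route every report published on the "report" topic to the file sink...
subscribe("report", "log_file_sink")

# ...and stop sending reports to the default output sink
unsubscribe("report", "default_out")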

Chalk comes with a set of sinks already configured for both chalkmarks and reports, and different chalk operations send data to different sinks by default.

For a full list of what sinks are active for the different chalk operations see here.

Writing a custom config with a custom template

Let's write a config that uses two templates to send data to two different sinks: one template sends data to an S3 bucket in AWS, and the other populates a rotating log file on the local filesystem. We only want to send data to S3 upon an exec, and we want to send all available information there (using the built-in report_all template), whereas we only want to send a select set of information to the local filesystem upon a build or insert operation. This is depicted in the following figure:

Figure: Custom Template

The configuration achieving the above is the following:

# suppress stdout logs unless there is an error
log_level: "error"

# disable terminal output
custom_report.terminal_chalk_time.enabled: false
custom_report.terminal_other_op.enabled: false

# disable writing to default log
unsubscribe("report", "default_out")

# minimal report template
report_template report_localdisk {
  key.CHALK_VERSION.use                       = true
  key.DATETIME_WHEN_CHALKED.use               = true
  key.HOSTINFO_WHEN_CHALKED.use               = true
  key.NODENAME_WHEN_CHALKED.use               = true

  key._DATETIME.use                           = true
  key._CHALKS.use                             = true
  key._OP_ERRORS.use                          = true

  key.CHALK_ID.use                            = true
  key.PATH_WHEN_CHALKED.use                   = true
  key.ARTIFACT_TYPE.use                       = true
  key.OLD_CHALK_METADATA_ID.use               = true
  key.EMBEDDED_CHALK.use                      = true
  key.METADATA_ID.use                         = true
  key.DOCKER_FILE.use                         = true
  key.DOCKERFILE_PATH.use                     = true
  key.DOCKER_LABELS.use                       = true
  key.DOCKER_TAGS.use                         = true
  key._CURRENT_HASH.use                       = true
  key._VIRTUAL.use                            = true
  key._IMAGE_ID.use                           = true
  key._INSTANCE_CONTAINER_ID.use              = true
  key._INSTANCE_CREATION_DATETIME.use         = true
  key._REPO_TAGS.use                          = true
}

sink_config s3_sink_config {
  enabled: true
  sink:    "s3"
  uri:     env("AWS_S3_BUCKET_URI")
  secret:  env("AWS_SECRET_ACCESS_KEY")
  uid:     env("AWS_ACCESS_KEY_ID")
}

# set up a rotating log sink for saving information locally
sink_config chalk_log_file {
  sink: "rotating_log"
  enabled: true
  max: <<10mb>>
  filename: "/tmp/chalk_insert_build"
}

custom_report chalk_localdisk_logger {
  report_template: "report_localdisk"
  sink_configs: ["chalk_log_file"]
  use_when: ["insert", "build"]
}

custom_report chalk_s3_logger {
  report_template: "report_all"
  sink_configs: ["s3_sink_config"]
  use_when: ["exec"]
}

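Notice that this config also suppresses terminal output: it sets log_level to "error" so that only errors are logged, disables the built-in terminal_chalk_time and terminal_other_op reports, and unsubscribes default_out from the "report" topic so that reports are no longer written to the default log.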

Updating the used templates

Often you won't need to write a custom config; instead, you can simply override parts of the built-in configuration, for example to change the default output for a given chalk operation or to update the templates it uses. This is easy in con4m, Chalk's configuration language. For instance, the default output configuration for insert is as follows:

outconf insert {
  mark_template:          "mark_default"
  report_template:        "insertion_default"
}

If you want to use the "minimal" mark template for chalkmarks inserted during an insert operation, all you need to specify in your config is:

outconf insert {
  mark_template:          "minimal"
  report_template:        "insertion_default"
}

and that will overwrite the defaults.

If you want to use a custom template that you defined in your config, you can reference it the same way. For instance, assuming we have the report_localdisk template from the previous section, we can specify:

outconf insert {
  mark_template: "minimal"
  report_template: "report_localdisk"
}

Related Documentation and References

Beyond this document, there's an extensive amount of reference material for users:

Name | What it is
Metadata Reference | Details what metadata Chalk can collect and report on, and in what circumstances
The Chalk Configuration Options Guide | Details properties you can set in Chalk's configuration file, if you choose to use it over our command-line configuration wizard
Output Configuration Reference | Shows how to set up sending reports wherever you like, using the config file
Config File Builtins | Shows the functions you can call from within a configuration file