elastic/ecs

WARNING: THIS IS WORK IN PROGRESS

Elastic Common Schema (ECS)

The Elastic Common Schema (ECS) defines a common set of fields for ingesting data into Elasticsearch. A common schema helps you correlate data from sources like logs and metrics or IT operations analytics and security analytics.

ECS is still under development and backward compatibility is not guaranteed. Any feedback on the general structure, missing fields, or existing fields is appreciated. For contributions please read the Contributing Guide.

The current version of ECS is 0.1.0.

Fields

ECS defines these fields.

Base fields

The base set contains all fields which are on the top level. These fields are common across all types of events.

Field Description Level Type Example
@timestamp Date/time when the event originated.
For log events this is the date/time when the event was generated, and not when it was read.
Required field for all events.
core date 2016-05-23T08:05:34.853Z
tags List of keywords used to tag each event. core keyword ["production", "env2"]
labels Key/value pairs.
Can be used to add meta information to events. Should not contain nested objects. All values are stored as keyword.
Example: docker and k8s labels.
core object {'application': 'foo-bar', 'env': 'production'}
message For log events the message field contains the log message.
In other use cases the message field can be used to concatenate different values which are then freely searchable. If multiple messages exist, they can be combined into one message.
core text Hello World
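
As a minimal sketch of how the base fields fit together, the following Python builds an event that carries only `@timestamp`, `tags`, `labels`, and `message`. The function name `build_base_event` is hypothetical and not part of ECS or any Elastic library; the label handling simply follows the description above (no nested objects, all values stored as keyword).

```python
from datetime import datetime, timezone


def build_base_event(message, tags=None, labels=None, timestamp=None):
    """Return a dict holding only the ECS base fields (illustrative helper)."""
    ts = timestamp or datetime.now(timezone.utc)
    iso = ts.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    event = {
        # ISO 8601 in UTC with millisecond precision, like the table example.
        "@timestamp": iso.replace("+00:00", "Z"),
        "message": message,
    }
    if tags:
        event["tags"] = list(tags)
    if labels:
        for key, value in labels.items():
            # labels should not contain nested objects.
            if isinstance(value, dict):
                raise ValueError(f"label {key!r} must not contain nested objects")
        # All label values are stored as keyword, so convert them to strings.
        event["labels"] = {key: str(value) for key, value in labels.items()}
    return event
```

Calling it with the example values from the table yields a flat document whose `labels` values are all strings.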

Agent fields

The agent fields contain the data about the agent/client/shipper that created the event.

Field Description Level Type Example
agent.version Version of the agent. core keyword 6.0.0-rc2
agent.name Name of the agent. core keyword filebeat
agent.id Unique identifier of this agent (if one exists).
Example: For Beats this would be beat.id.
core keyword 8a4f500d
agent.ephemeral_id Ephemeral identifier of this agent (if one exists).
This id normally changes across restarts, but agent.id does not.
extended keyword 8a4f500f

Examples: In the case of Beats for logs, the agent.name is filebeat. For APM, it is the agent running in the app/service. The agent information does not change if data is sent through queuing systems like Kafka, Redis, or processing systems such as Logstash or APM Server.

Cloud fields

Fields related to the cloud or infrastructure the events are coming from.

Field Description Level Type Example
cloud.provider Name of the cloud provider. Example values are ec2, gce, or digitalocean. extended keyword ec2
cloud.availability_zone Availability zone in which this host is running. extended keyword us-east-1c
cloud.region Region in which this host is running. extended keyword us-east-1
cloud.instance.id Instance ID of the host machine. extended keyword i-1234567890abcdef0
cloud.instance.name Instance name of the host machine. extended keyword
cloud.machine.type Machine type of the host machine. extended keyword t2.medium
cloud.account.id The cloud account or organization id used to identify different entities in a multi-tenant environment.
Examples: AWS account id, Google Cloud ORG Id, or other unique identifier.
extended keyword 666777888999

Examples: If Metricbeat is running on an EC2 host and fetches data from its host, the cloud info contains the data about this machine. If Metricbeat runs on a remote machine outside the cloud and fetches data from a service running in the cloud, the field contains cloud data from the machine the service is running on.

Container fields

Container fields are used for meta information about the specific container that is the source of information. These fields help correlate data based on containers from any runtime.

Field Description Level Type Example
container.runtime Runtime managing this container. extended keyword docker
container.id Unique container id. core keyword
container.image.name Name of the image the container was built on. extended keyword
container.image.tag Container image tag. extended keyword
container.name Container name. extended keyword
container.labels Image labels. extended object

Destination fields

Destination fields describe details about the destination of a packet/event.

Field Description Level Type Example
destination.ip IP address of the destination.
Can be one or multiple IPv4 or IPv6 addresses.
core ip
destination.hostname Hostname of the destination. core keyword
destination.port Port of the destination. core long
destination.mac MAC address of the destination. core keyword
destination.domain Destination domain. core keyword
destination.subdomain Destination subdomain. core keyword

Device fields

Device fields are used to provide additional information about the device that is the source of the information. This could be a firewall, network device, etc.

Field Description Level Type Example
device.mac MAC address of the device. core keyword
device.ip IP address of the device. core ip
device.hostname Hostname of the device. core keyword
device.vendor Device vendor information. core keyword
device.version Device version. core keyword
device.serial_number Device serial number. extended keyword
device.type The type of the device the data is coming from.
There is no predefined list of device types. Some examples are endpoint, firewall, ids, ips, proxy.
core keyword firewall

Error fields

These fields can represent errors of any kind. Use them for errors that happen while fetching events or in cases where the event itself contains an error.

Field Description Level Type Example
error.id Unique identifier for the error. core keyword
error.message Error message. core text
error.code Error code describing the error. core keyword

Event fields

The event fields are used for context information about the data itself.

Field Description Level Type Example
event.id Unique ID to describe the event. core keyword 8a4f500d
event.category Event category.
This can be a user defined category.
core keyword metrics
event.type A type given to this kind of event which can be used for grouping.
This is normally defined by the user.
core keyword nginx-stats-metrics
event.action The action captured by the event. The type of action will vary from system to system but is likely to include actions by security services, such as blocking or quarantining; as well as more generic actions such as login events, file i/o or proxy forwarding events.
The value is normally defined by the user.
core keyword reject
event.module Name of the module this data is coming from.
This information is coming from the modules used in Beats or Logstash.
core keyword mysql
event.dataset Name of the dataset.
The concept of a dataset (fileset / metricset) is used in Beats as a subset of modules. It contains the information which is currently stored in metricset.name and metricset.module or fileset.name.
core keyword stats
event.severity Severity describes the severity of the event. What the different severity values mean can vary between use cases. It's up to the implementer to make sure severities are consistent across events. core long 7
event.original Raw text message of entire event. Used to demonstrate log integrity.
This field is not indexed and doc_values are disabled. It cannot be searched, but it can be retrieved from _source.
core keyword Sep 19 08:26:10 host CEF:0|Security|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232
event.hash Hash (perhaps logstash fingerprint) of raw field to be able to demonstrate log integrity. extended keyword 123456789012345678901234567890ABCD
event.version The version of ECS that the event adheres to.
This field should be provided as part of each event to make it possible to detect to which ECS version an event belongs.
event.version is a required field and must exist in all events.
The current version is 0.1.0.
core keyword 0.1.0
event.duration Duration of the event in nanoseconds. core long
event.created event.created contains the date when the event was created.
This timestamp is distinct from @timestamp, which holds the time extracted from the event itself. For logs the two can differ, because the time recorded in the log line and the time the line is read (for example by Filebeat) are not identical: @timestamp must contain the timestamp extracted from the log line, and event.created the time when the log line was read. The same could apply to packet capturing, where @timestamp contains the timestamp extracted from the network packet and event.created the time when the event was created.
In case the two timestamps are identical, @timestamp should be used.
core date
event.risk_score Risk score or priority of the event (e.g. security solutions). Use your system's original value here. core float
event.risk_score_norm Normalized risk score or priority of the event, on a scale of 0 to 100.
This is mainly useful if you use more than one system that assigns risk scores, and you want to see a normalized value across all systems.
extended float
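
One way to fill `event.risk_score_norm` from `event.risk_score` is a linear rescaling onto the 0 to 100 range. The sketch below assumes you know the minimum and maximum score your source system can emit; the function name and the linear mapping are illustrative assumptions, since ECS only specifies the target scale.

```python
def normalize_risk_score(score, source_min, source_max):
    """Linearly map a source-specific risk score onto the 0-100 scale."""
    if source_max <= source_min:
        raise ValueError("source_max must be greater than source_min")
    # Clamp first so out-of-range inputs cannot leave the 0-100 interval.
    clamped = min(max(score, source_min), source_max)
    return (clamped - source_min) * 100.0 / (source_max - source_min)
```

For a system that scores from 0 to 10, a score of 7 maps to 70.0, so values from several systems become comparable on one scale.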

File fields

File fields provide details about each file.

Field Description Level Type Example
file.path Path to the file. extended keyword
file.target_path Target path for symlinks. extended keyword
file.extension File extension.
This should allow easy filtering by file extensions.
extended keyword png
file.type File type (file, dir, or symlink). extended keyword
file.device Device that is the source of the file. extended keyword
file.inode Inode representing the file in the filesystem. extended keyword
file.uid The user ID (UID) or security identifier (SID) of the file owner. extended keyword
file.owner File owner's username. extended keyword
file.gid Primary group ID (GID) of the file. extended keyword
file.group Primary group name of the file. extended keyword
file.mode Mode of the file in octal representation. extended keyword 0640
file.size File size in bytes (field is only added when type is file). extended long
file.mtime Last time file content was modified. extended date
file.ctime Last time file metadata changed. extended date
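
To illustrate how some of these fields could be populated, here is a rough Python sketch based on `os.lstat`. The helper name `file_fields` is hypothetical, the output uses flat dotted keys for brevity, and anything that is neither a directory nor a symlink is reported as `file`, which is a simplification.

```python
import os
import stat


def file_fields(path):
    """Map os.lstat() results onto a few ECS file.* fields (illustrative)."""
    info = os.lstat(path)  # lstat, so symlinks are reported as symlinks
    if stat.S_ISLNK(info.st_mode):
        file_type = "symlink"
    elif stat.S_ISDIR(info.st_mode):
        file_type = "dir"
    else:
        file_type = "file"
    fields = {
        "file.path": path,
        "file.extension": os.path.splitext(path)[1].lstrip("."),
        "file.type": file_type,
        # These are keyword fields, so numeric IDs are stored as strings.
        "file.inode": str(info.st_ino),
        "file.uid": str(info.st_uid),
        "file.gid": str(info.st_gid),
        # Permission bits as a four-digit octal string, for example "0640".
        "file.mode": format(stat.S_IMODE(info.st_mode), "04o"),
    }
    if file_type == "file":
        # file.size is only added when the type is file.
        fields["file.size"] = info.st_size
    return fields
```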

Geo fields

Geo fields can carry data about a specific location related to an event or geo information for an IP field.

Field Description Level Type Example
geo.continent_name Name of the continent. core keyword
geo.country_iso_code Country ISO code. core keyword
geo.location Longitude and latitude. core geo_point
geo.region_name Region name. core keyword
geo.city_name City name. core keyword

Host fields

Host fields provide information related to a host. A host can be a physical machine, a virtual machine, or a Docker container.

Normally the host information is related to the machine on which the event was generated/collected, but it can be used differently if needed.

Field Description Level Type Example
host.hostname Hostname of the host.
It can contain what hostname returns on Unix systems, the fully qualified domain name, or a name specified by the user. The sender decides which value to use.
core keyword
host.id Unique host id.
As hostname is not always unique, use values that are meaningful in your environment.
Example: The current usage of beat.name.
core keyword
host.ip Host IP address. core ip
host.mac Host MAC address. core keyword
host.type Type of host.
For Cloud providers this can be the machine type like t2.medium. If vm, this could be the container, for example, or other information meaningful in your environment.
core keyword
host.os.platform Operating system platform (centos, ubuntu, windows, etc.) extended keyword darwin
host.os.name Operating system name. extended keyword Mac OS X
host.os.family OS family (redhat, debian, freebsd, windows, etc.) extended keyword debian
host.os.version Operating system version. extended keyword 10.12.6
host.architecture Operating system architecture. core keyword x86_64

HTTP fields

Fields related to HTTP requests and responses.

Field Description Level Type Example
http.request.method HTTP request method. extended keyword GET, POST, PUT
http.response.status_code HTTP response status code. extended long 404
http.response.body The full HTTP response body. extended keyword Hello world
http.version HTTP version. extended keyword 1.1

Log fields

Fields which are specific to log events.

Field Description Level Type Example
log.level Log level of the log event.
Some examples are WARN, ERR, INFO.
core keyword ERR
log.original This is the original log message and contains the full log message before splitting it up in multiple parts.
In contrast to the message field, which can contain an extracted part of the log message, this field contains the original, full log message. Some modifications may already have been applied to clean up the log message, such as encoding changes or the removal of new lines.
This field is not indexed and doc_values are disabled so it can't be queried but the value can be retrieved from _source.
core keyword Sep 19 08:26:10 localhost My log

Network fields

Fields related to network data.

Field Description Level Type Example
network.name Name given by operators to sections of their network. extended keyword Guest Wifi
network.protocol Network protocol name. core keyword http
network.direction Direction of the network traffic.
Recommended values are:
* inbound
* outbound
* unknown
core keyword inbound
network.forwarded_ip Host IP address when the source IP address is the proxy. core ip 192.1.1.2
network.inbound.bytes Network inbound bytes. core long 184
network.inbound.packets Network inbound packets. core long 12
network.outbound.bytes Network outbound bytes. core long 184
network.outbound.packets Network outbound packets. core long 12
network.total.bytes Network total bytes. The sum of inbound.bytes + outbound.bytes. core long 368
network.total.packets Network total packets. The sum of inbound.packets + outbound.packets. core long 24
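
The totals are plain sums of the inbound and outbound counters. A small sketch, assuming the event is a nested dict and using a hypothetical helper name:

```python
def add_network_totals(event):
    """Fill network.total.* from network.inbound.* and network.outbound.*."""
    network = event.get("network", {})
    inbound = network.get("inbound", {})
    outbound = network.get("outbound", {})
    total = {}
    for counter in ("bytes", "packets"):
        # Only compute a total when both directions are known.
        if counter in inbound and counter in outbound:
            total[counter] = inbound[counter] + outbound[counter]
    if total:
        network["total"] = total
    return event
```

With the example values from the table (184 bytes and 12 packets in each direction), the totals come out as 368 bytes and 24 packets.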

Organization fields

The organization fields enrich data with information about the company or entity the data is associated with. These fields help you arrange or filter data stored in an index by one or multiple organizations.

Field Description Level Type Example
organization.name Organization name. extended keyword
organization.id Unique identifier for the organization. extended keyword

Operating System fields

The OS fields contain information about the operating system. These fields are often used inside other prefixes, such as host.os.* or user_agent.os.*.

Field Description Level Type Example
os.platform Operating system platform (such as centos, ubuntu, windows). extended keyword darwin
os.name Operating system name. extended keyword Mac OS X
os.family OS family (such as redhat, debian, freebsd, windows). extended keyword debian
os.version Operating system version as a raw string. extended keyword 10.12.6-rc2
os.kernel Operating system kernel version as a raw string. extended keyword 4.4.0-112-generic

Process fields

These fields contain information about a process. These fields can help you correlate metrics information with a process id/name from a log message. The process.pid often stays in the metric itself and is copied to the global field for correlation.

Field Description Level Type Example
process.args Process arguments.
May be filtered to protect sensitive information.
extended keyword ['-l', 'user', '10.0.0.16']
process.name Process name.
Sometimes called program name or similar.
extended keyword ssh
process.pid Process id. core long
process.ppid Process parent id. extended long
process.title Process title.
The proctitle, sometimes the same as the process name. It can also be different: for example, a browser setting its title to the web page currently opened.
extended keyword

Service fields

The service fields describe the service for or from which the data was collected. These fields help you find and correlate logs for a specific service and version.

Field Description Level Type Example
service.id Unique identifier of the running service.
This id should uniquely identify this service. This makes it possible to correlate logs and metrics for one specific service.
Example: If you are experiencing issues with one redis instance, you can filter on that id to see metrics and logs for that single instance.
core keyword d37e5ebfe0ae6c4972dbe9f0174a1637bb8247f6
service.name Name of the service data is collected from.
The name of the service is normally user given. This lets you differentiate two instances of the same service running on the same machine by their service.name.
It also lets you correlate the related instances of a distributed service that runs on multiple hosts, based on the name.
In the case of Elasticsearch the service.name could contain the cluster name. For Beats the service.name is by default a copy of the service.type field if no name is specified.
core keyword elasticsearch-metrics
service.type The type of the service data is collected from.
The type can be used to group and correlate logs and metrics from one service type.
Example: If logs or metrics are collected from Elasticsearch, service.type would be elasticsearch.
core keyword elasticsearch
service.state Current state of the service. core keyword
service.version Version of the service the data was collected from.
This lets you look at a data set for one specific version of a service only.
core keyword 3.2.4
service.ephemeral_id Ephemeral identifier of this service (if one exists).
This id normally changes across restarts, but service.id does not.
extended keyword 8a4f500f

Source fields

Source fields describe details about the source of the event.

Field Description Level Type Example
source.ip IP address of the source.
Can be one or multiple IPv4 or IPv6 addresses.
core ip
source.hostname Hostname of the source. core keyword
source.port Port of the source. core long
source.mac MAC address of the source. core keyword
source.domain Source domain. core keyword
source.subdomain Source subdomain. core keyword

URL fields

URL fields provide a complete URL, with scheme, host, and path. The URL object can be reused in other prefixes, such as host.url.* for example. Keep the structure consistent whenever you use URL fields.

Field Description Level Type Example
url.href Full url. The field is stored as keyword. extended keyword https://elastic.co:443/search?q=elasticsearch#top
url.scheme Scheme of the request, such as "https".
Note: The : is not part of the scheme.
extended keyword https
url.hostname Hostname of the request, such as "elastic.co".
In some cases a URL may refer to an IP and/or port directly, without a domain name. In this case, the IP address would go to the hostname field.
extended keyword elastic.co
url.port Port of the request, such as 443. extended integer 443
url.path Path of the request, such as "/search". extended keyword
url.query The query field describes the query string of the request, such as "q=elasticsearch".
The ? is excluded from the query string. If a URL contains no ?, there is no query field. If there is a ? but no query, the query field exists with an empty string. The exists query can be used to differentiate between the two cases.
extended keyword
url.fragment Portion of the url after the #, such as "top".
The # is not part of the fragment.
extended keyword
url.username Username of the request. extended keyword
url.password Password of the request. extended keyword
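
The splitting rules above can be sketched with Python's standard `urllib.parse.urlsplit`. The helper name `url_fields` and the flat dotted output keys are illustrative choices. Note that `urlsplit` returns an empty string both when there is no `?` and when the query is empty, so the sketch checks for the `?` explicitly to follow the rule for `url.query`; it also lowercases the hostname, which is `urlsplit` behavior rather than an ECS requirement.

```python
from urllib.parse import urlsplit


def url_fields(href):
    """Split a URL string into ECS url.* fields (illustrative helper)."""
    parts = urlsplit(href)
    fields = {"url.href": href, "url.scheme": parts.scheme}
    if parts.hostname is not None:
        fields["url.hostname"] = parts.hostname
    if parts.port is not None:
        fields["url.port"] = parts.port
    if parts.path:
        fields["url.path"] = parts.path
    # No "?" means no query field; "?" with nothing after it means an
    # empty-string query field. Look only at the part before any "#".
    if "?" in href.split("#", 1)[0]:
        fields["url.query"] = parts.query
    if parts.fragment:
        fields["url.fragment"] = parts.fragment
    if parts.username is not None:
        fields["url.username"] = parts.username
    if parts.password is not None:
        fields["url.password"] = parts.password
    return fields
```

Applied to the `url.href` example from the table, this yields the scheme `https`, hostname `elastic.co`, port `443`, path `/search`, query `q=elasticsearch`, and fragment `top`.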

User fields

The user fields describe information about the user that is relevant to the event. Fields can have one entry or multiple entries. If a user has more than one id, provide an array that includes all of them.

Field Description Level Type Example
user.id One or multiple unique identifiers of the user. core keyword
user.name Name of the user.
The field is a keyword, and will not be tokenized.
core keyword
user.email User email address. extended keyword
user.hash Unique user hash to correlate information for a user in anonymized form.
Useful if user.id or user.name contain confidential information and cannot be used.
extended keyword

User agent fields

The user_agent fields normally come from a browser request. They often show up in web service logs coming from the parsed user agent string.

Field Description Level Type Example
user_agent.original Unparsed version of the user_agent. extended keyword
user_agent.device Name of the physical device. extended keyword
user_agent.version Version of the physical device. extended keyword
user_agent.major Major version of the user agent. extended long
user_agent.minor Minor version of the user agent. extended long
user_agent.patch Patch version of the user agent. extended keyword
user_agent.name Name of the user agent. extended keyword Chrome
user_agent.os.name Name of the operating system. extended keyword
user_agent.os.version Version of the operating system. extended keyword
user_agent.os.major Major version of the operating system. extended long
user_agent.os.minor Minor version of the operating system. extended long

Use cases

These are examples of how ECS fields can be used in different use cases. Most use cases contain not only ECS fields but also additional fields, which are not in ECS, to describe the full use case. The fields that are not in ECS are in italics.

Contributions of additional use cases on top of ECS are welcome.

Implementing ECS

Guidelines

  • The document MUST have the @timestamp field.
  • The data type defined for an ECS field MUST be used.
  • It SHOULD have the field event.version to define which version of ECS it uses.
  • As many fields as possible should be mapped to ECS.
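
A rough sketch of how these guidelines could be checked in code follows. It assumes a flat event dict with dotted keys; the function name and the small `EXPECTED_TYPES` table are hypothetical and cover only a handful of fields from the tables above, where `long` is taken to mean a Python `int` and `text` a Python `str`.

```python
# Hypothetical subset of the types listed in the field tables.
EXPECTED_TYPES = {
    "message": str,
    "source.port": int,
    "destination.port": int,
    "http.response.status_code": int,
}


def check_guidelines(event):
    """Return (errors, warnings) for a flat, dot-notation event dict."""
    errors, warnings = [], []
    if "@timestamp" not in event:
        # MUST: every document needs @timestamp.
        errors.append("@timestamp is missing")
    if "event.version" not in event:
        # SHOULD: reported as a warning rather than an error.
        warnings.append("event.version is missing")
    for field, expected in EXPECTED_TYPES.items():
        # MUST: the data type defined for an ECS field has to be used.
        if field in event and not isinstance(event[field], expected):
            errors.append(f"{field} should be {expected.__name__}")
    return errors, warnings
```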

Writing fields

  • All fields must be lower case
  • Combine words using underscore
  • No special characters except _
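
These writing rules lend themselves to a simple automated check. The sketch below is one possible reading of them: it treats the dot as the separator between prefixes, and it assumes digits are allowed, which the rules above do not state explicitly. The function name is hypothetical. The base field `@timestamp` contains `@` and is therefore not covered by this check.

```python
import re

# One name segment: lowercase letters or digits, words joined by single
# underscores. Allowing digits is an assumption, not an ECS statement.
_SEGMENT = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)*$")


def follows_writing_rules(field_name):
    """Check every dot-separated segment of a field name."""
    return all(_SEGMENT.match(segment) for segment in field_name.split("."))
```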

Naming fields

  • Present tense. Use present tense unless field describes historical information.
  • Singular or plural. Use singular and plural names properly to reflect the field content. For example, use requests_per_sec rather than request_per_sec.
  • General to specific. Organise the prefixes from general to specific to allow grouping fields into objects with a prefix like host.*.
  • Avoid repetition. Avoid stuttering of words. If part of the field name is already in the prefix, do not repeat it. Example: host.host_ip should be host.ip.
  • Use prefixes. Fields must be prefixed except for the base fields. For example all host fields are prefixed with host.. See dot notation in FAQ for more details.
  • Do not use abbreviations. (A few exceptions like ip exist.)
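
The "avoid repetition" rule can be approximated with a small heuristic. The sketch below only compares each segment with its immediate parent prefix, so it is a starting point rather than a complete check, and the function name is hypothetical.

```python
def repeats_prefix(field_name):
    """Return True if a segment repeats its parent prefix, as in host.host_ip."""
    parts = field_name.split(".")
    return any(
        child == parent or child.startswith(parent + "_")
        for parent, child in zip(parts, parts[1:])
    )
```

With this heuristic `host.host_ip` is flagged, while `host.ip` and `host.hostname` pass.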

Understanding ECS conventions

Multi-fields text indexing

Elasticsearch can index text multiple ways:

By default, unless your index mapping or index template specifies otherwise (as the ECS index template does), Elasticsearch indexes a text field as text at the canonical field name, and indexes it a second time as keyword, nested in a multi-field.

Default Elasticsearch convention:

  • Canonical field: myfield is text
  • Multi-field: myfield.keyword is keyword

For monitoring use cases, keyword indexing is needed almost exclusively, with full text search on very few fields. Given this premise, ECS defaults all text indexing to keyword at the top level (with very few exceptions). Any use case that requires full text search indexing on additional fields can simply add a multi-field for full text search. Doing so does not conflict with ECS, as the canonical field name will remain keyword indexed.

ECS multi-field convention for text:

  • Canonical field: myfield is keyword
  • Multi-field: myfield.text is text
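
As an illustration of the ECS convention, the following Python builds a mapping body in which `myfield` is a keyword with a nested `text` multi-field. This is a sketch, not the actual ECS index template: the helper name is hypothetical, and the body uses the typeless `mappings.properties` layout of Elasticsearch 7 and later, which is an assumption about your cluster version.

```python
import json


def keyword_with_text(field_name):
    """Mapping for one field: keyword at the top level, text as a multi-field."""
    return {
        field_name: {
            "type": "keyword",
            "fields": {"text": {"type": "text"}},
        }
    }


# Body you could send when creating an index (typeless mapping layout).
mapping = {"mappings": {"properties": keyword_with_text("myfield")}}
print(json.dumps(mapping, indent=2))
```

Queries against `myfield` then match exact values, while `myfield.text` supports full text search.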

Exceptions

The only exceptions to this convention are the fields message and error.message, which are indexed for full text search only, with no multi-field. These two fields do not follow the new convention because changing them was deemed too big a breaking change, given how widely they are used in Beats.

Any future field that will be indexed for full text search in ECS will however follow the multi-field convention where text indexing is nested in the multi-field.

IDs are keywords not integers

Despite the fact that IDs are often integers in various systems, this is not always the case. Since we want to make it possible to map as many data sources to ECS as possible, we default to using the keyword type for IDs.

FAQ

What are the benefits of using ECS?

The benefits to a user adopting these fields and names in their clusters are:

  • Data correlation. Ability to easily correlate data from the same or different sources, including:
    • data from metrics, logs, and apm
    • data from the same machines/hosts
    • data from the same service
  • Ease of recall. Improved ability to remember commonly used field names (because there is a single set, not a set per data source)
  • Ease of deduction. Improved ability to deduce field names (because the field naming follows a small number of rules with few exceptions)
  • Reuse. Ability to re-use analysis content (searches, visualizations, dashboards, alerts, reports, and ML jobs) across multiple data sources
  • Future proofing. Ability to use any future Elastic-provided analysis content in your environment without modifications

What if I have fields that conflict with ECS?

The rename processor can help you resolve field conflicts. For example, imagine that you already have a field called "user", but ECS employs user as an object. You can use the rename processor at ingest time to rename your field to the matching ECS field. If your field does not match any ECS field, you can rename it to user.value instead.
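
The effect of such a rename can be shown in plain Python. This sketch does not call the rename processor itself; it only mimics the outcome on a dict, and the function name and the default key `value` (taken from the `user.value` example) are illustrative.

```python
def move_conflicting_scalar(event, field, target_key="value"):
    """If event[field] is a scalar, move it to event[field][target_key]."""
    if field in event and not isinstance(event[field], dict):
        # The scalar becomes a member of an object, so the field can also
        # hold ECS sub-fields such as user.name without a type conflict.
        event[field] = {target_key: event[field]}
    return event
```

An event `{"user": "nicolas ruflin"}` becomes `{"user": {"value": "nicolas ruflin"}}`, while an event where `user` is already an object is left unchanged.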

What if my events have additional fields?

Events may contain fields in addition to ECS fields. These fields can follow the ECS naming and writing rules, but this is not a requirement.

Why does ECS use a dot notation instead of an underline notation?

There are two common key formats for ingesting data into Elasticsearch:

  • Dot notation: user.firstname: Nicolas, user.lastname: Ruflin
  • Underline notation: user_firstname: Nicolas, user_lastname: Ruflin

For ECS we decided to use the dot notation. Here's some background on this decision.

What is the difference between the two notations?

Ingesting user.firstname: Nicolas and user.lastname: Ruflin is identical to ingesting the following JSON:

"user": {
  "firstname": "Nicolas",
  "lastname": "Ruflin"
}

In Elasticsearch, user is represented as an object datatype. In the case of the underline notation, both are just string datatypes.

NOTE: ECS does not use nested datatypes, which are arrays of objects.
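
The equivalence between dotted keys and nested objects can be sketched as a small expansion function. The name `expand_dotted` is hypothetical, and the sketch assumes the input has no conflicting keys (such as both `user` and `user.firstname`).

```python
def expand_dotted(flat):
    """Turn {"user.firstname": ...} into {"user": {"firstname": ...}}."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Each prefix becomes an object that holds the next level.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested
```

Dotted keys collapse into one `user` object, whereas underline keys such as `user_firstname` stay separate top-level strings.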

Advantages of dot notation

With dot notation, each prefix in Elasticsearch is an object. Each object can have parameters that control how fields inside the object are treated. In the context of ECS, for example, these parameters would allow you to disable dynamic property creation for certain prefixes.

Individual objects give you more flexibility on both the ingest and the event sides. In Elasticsearch, for example, you can use the remove processor to drop complete objects instead of selecting each key inside. You don't have to know ahead of time which keys will be in an object.
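
The difference is easy to see in plain Python, used here only as an analogy for what a processor would do. Both function names are hypothetical.

```python
def drop_object(event, prefix):
    """Dot notation: the whole object goes away with one operation."""
    event.pop(prefix, None)
    return event


def drop_underscore_keys(event, prefix):
    """Underline notation: every key has to be matched by name."""
    for key in [k for k in event if k.startswith(prefix + "_")]:
        del event[key]
    return event
```

Note that the underline variant called with `user` also removes keys such as `user_agent_name`, because nothing in the key marks where the prefix ends.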

In Beats, you can simplify the creation of events. For example, you can treat each object as an object (or struct in Golang), which makes constructing and modifying each part of the final event easier.

Disadvantage of dot notation

In Elasticsearch, each key can only have one type. For example, if user is an object, you can't use it as a keyword type in the same index, like {"user": "nicolas ruflin"}. This restriction can be an issue in certain datasets. For the ECS data itself, this is not an issue because all fields are predefined.
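
If you want to spot such conflicts before indexing, one approach is to look for names that are used both as a value and as a prefix of another field. A minimal sketch with a hypothetical function name, operating on a list of dotted field names:

```python
def find_type_conflicts(field_names):
    """Return names used both as a leaf value and as an object prefix."""
    names = set(field_names)
    conflicts = set()
    for name in names:
        parts = name.split(".")
        # Check every proper prefix of the dotted name.
        for i in range(1, len(parts)):
            prefix = ".".join(parts[:i])
            if prefix in names:
                conflicts.add(prefix)
    return sorted(conflicts)
```

For the names `user`, `user.name`, and `host.ip`, the result is `["user"]`, since `user` would have to be a keyword and an object at the same time.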

What if I already use the underline notation?

Mixing the underline notation with the ECS dot notation is not a problem. As long as there are no conflicts, they can coexist in the same document.