A Technical Guide to Splunk Data Models

Splunk Data Models organize and accelerate searches, provide structured data for Splunk’s Pivot interface, and support efficient dashboards. This guide walks through creating a Data Model, choosing a dataset type, validating the model, and requesting data from it.

Creating a Splunk Data Model

  1. Log in to Splunk: Access your Splunk instance through your web browser.
  2. Navigate to Data Models: Go to Settings > Data models.
  3. Create New Data Model:
    • Click on New Data Model.
    • Enter a Title and an optional ID and Description.
    • Choose the Permissions (private or shared in an app).
    • Click Create.
  4. Add Data Model Objects:
    • Click on Add Object.
    • Choose the Root Event Object or Root Search Object.
    • Provide an Object Name, Display Name, and Description.
    • Define the Constraint (base search).
      
      (`cim_Vulnerabilities_indexes`) tag=vulnerability tag=report
      
    • Add Fields (auto-extracted or manually defined).
  5. Save the Data Model: Click Save to store the Data Model.

  6. Use Accelerations (recommended after development):
    • Enable acceleration for faster search performance.
    • Navigate to the Data Model, click Edit > Edit Acceleration.
    • Check Accelerate and set the Summary Range.
    • Click Save.
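
Two searches can help sanity-check the steps above. Both are sketches that assume the Vulnerabilities constraint shown in step 4 and a Data Model named Vulnerabilities, so adjust the names to your own model. The first runs the constraint on its own and summarizes which fields the matching events actually carry, a quick way to see whether the fields you plan to add are populated:

```spl
`cim_Vulnerabilities_indexes` tag=vulnerability tag=report
| head 1000
| fieldsummary
| table field count distinct_count
```

Once acceleration is enabled, a tstats search restricted to summaries shows whether summary data exists yet. An empty result usually means the summaries have not been built for the selected time range:

```spl
| tstats summariesonly=true count FROM datamodel=Vulnerabilities BY _time span=1d
```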

Dataset Types

Event

Explanation: Event datasets focus on raw data, applying constraints to select specific events for analysis.

  • Fields: username, timestamp, ip_address
  • Constraints: status="success"
  • Example: You have a log source that records user login activities. An event dataset might filter these logs to include only successful login events.
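
As a rough illustration of how an event dataset constraint narrows the data, the search below generates four synthetic login events and keeps only the successful ones. The events come from makeresults and the field values are invented for the example; in a real Data Model the constraint would run against your indexed login data.

```spl
| makeresults count=4
| streamstats count AS n
| eval username="user".n, ip_address="10.0.0.".n, status=if(n%2==0, "success", "failure")
| search status="success"
| table _time username ip_address status
```

Only the two events with status="success" remain, which is the subset the dataset would expose to Pivot and tstats.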

Search

Explanation: Search datasets use the results of saved searches, allowing for complex queries and aggregations.

  • Search Query: index=web_logs | stats avg(response_time) as avg_response_time by endpoint
  • Fields: endpoint, avg_response_time
  • Example: You want to track the average response time of a web application. You could create a search dataset based on a saved search that calculates this metric.
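
The search query above can be tried without a web_logs index by feeding it synthetic data. This sketch assumes nothing about your environment: makeresults creates six fake requests with invented endpoints and response times, and the final stats line is the same aggregation the search dataset would be built on.

```spl
| makeresults count=6
| streamstats count AS n
| eval endpoint=if(n%2==0, "/cart", "/login"), response_time=n*100
| stats avg(response_time) AS avg_response_time BY endpoint
```

The result has one row per endpoint with the two fields the dataset exposes, endpoint and avg_response_time.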

Transaction

Explanation: Transaction datasets group related events, useful for analyzing multi-step processes or transactions.

  • Transaction Definition: Group events by session_id with a start event of action="add_to_cart" and an end event of action="checkout".
  • Fields: session_id, user_id, total_time, items_purchased
  • Example: You need to analyze a user’s journey through an e-commerce site, from adding items to the cart to checkout.
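
The transaction definition above corresponds roughly to the transaction search command. The following sketch builds two synthetic sessions (the session IDs, the intermediate view_cart action, and the one-minute spacing are invented for illustration) and groups each from add_to_cart to checkout. The sort is there because transaction expects events in descending time order.

```spl
| makeresults count=6
| streamstats count AS n
| eval session_id=if(n<=3, "s1", "s2")
| eval action=case(n%3==1, "add_to_cart", n%3==2, "view_cart", true(), "checkout")
| eval _time=_time + n*60
| sort 0 -_time
| transaction session_id startswith=eval(action=="add_to_cart") endswith=eval(action=="checkout")
| table session_id duration eventcount
```

The duration field that transaction adds plays the role of total_time in the field list above, and eventcount shows how many events were grouped into each session.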

Validating a Splunk Data Model

  1. Use the Data Model “CIM Validation (S.o.S.)”:
    This is only available for internal Data Models, that is, the ones shipped with the CIM add-on rather than custom ones.
    • Navigate to Settings > Data models.
    • Locate the CIM Validation (S.o.S.) Data Model and in the Actions column, click Pivot.
    • Click one of the following to create the Pivot:
      • Top-level dataset
      • Missing extractions
      • Untagged events
  2. Use the datamodelsimple Command:
    The datamodelsimple command, provided by the Common Information Model add-on, retrieves the structure of Data Models: it lists the available models, the objects within a model, and the attributes of a specific object.

    # List All Data Models
    | datamodelsimple type=models
    
    # List all objects in a specific Data Model
    | datamodelsimple type=objects datamodel=Authentication
    
    # List Attributes (Fields) for a Specific Object in a Data Model
    | datamodelsimple type=attributes datamodel=Authentication nodename=Authentication.Failed_Authentication
    
  3. Use the “CIM Vladiator” App:
    • Open the app “CIM Vladiator”.
    • Search Type: Datamodel.
    • Target Data Model: <your Model>.
    • Example Searches:
      
      | datamodel Vulnerabilities search
      index=vulnerabilities
      
  4. Search for Errors in Logs:
    
    index=_internal sourcetype=splunkd "DataModelAccelerator" OR "DataModel"
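
Field coverage can also be checked directly from the search bar. This is a minimal sketch, assuming the CIM Vulnerabilities Data Model is installed and that its root dataset is also named Vulnerabilities: it runs the dataset's search and summarizes the fields that come back.

```spl
| datamodel Vulnerabilities Vulnerabilities search
| head 1000
| fieldsummary
| table field count distinct_count
```

Fields whose count is low compared with the number of events sampled are candidates for missing extractions or missing tags.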
    

Requesting Data from a Splunk Data Model

# | from
## Relies on accelerated data
| from datamodel:<datamodel>

# | datamodel
## Relies on index data
| datamodel <DataModelName> <ObjectName> search

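# tstats
## Generic form; <Fields> is a placeholder
| tstats count FROM datamodel=Network WHERE ip=10.9.8.7 by <Fields>
## Concrete form; Data Model fields carry the dataset prefix, in WHERE as well as in BY
| tstats count 
   FROM datamodel=Network.Network_Traffic 
   WHERE Network_Traffic.src=10.9.8.7
   BY Network_Traffic.src, Network_Traffic.dest, Network_Traffic.action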
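
As a further sketch of the tstats pattern, the search below counts failed logins per user and source. It assumes the CIM Authentication Data Model is installed and accelerated; the choice of fields and the rename step are illustrative, not part of the original examples.

```spl
| tstats summariesonly=true count
   FROM datamodel=Authentication
   WHERE Authentication.action="failure"
   BY Authentication.user, Authentication.src
| rename Authentication.* AS *
| sort - count
```

With summariesonly=true, tstats reads only the accelerated summaries. Dropping that argument lets it fall back to unsummarized data as well, at the cost of speed. The rename strips the dataset prefix so that later commands can refer to user and src directly.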

Source

Splunk How to use the CIM data model
Splunk Use the CIM to validate your data
Splunk Common Information Model Add-on Manual
Splunk Use the CIM to normalize CPU performance metrics
Splunk Base - SA-cim_vladiator

This post is licensed under CC BY 4.0 by the author.