Parascope's OS collector bridges the gap between what your infrastructure platforms know (VMs, containers, bare metal) and what's actually running inside those machines. It connects via SSH, runs a self-contained collection script, and reports back operating system configuration, software inventory, security posture, and runtime metrics — without installing agents on your targets.

Why OS Collection Matters

Infrastructure platforms like Proxmox, OpenStack, and Kubernetes track resources from the outside — CPU allocation, memory limits, network attachments. But they can't tell you:

What Linux distribution and kernel version is actually running
Which packages are installed and whether security updates are pending
What services are listening on the network
Whether SSH is configured securely
Which TLS certificates are deployed and when they expire
What containers are running inside the VM
What physical network neighbors are connected via LLDP

OS collection answers these questions across your entire fleet, creating a rich layer of CI data that connects to infrastructure CIs you already track.

Architecture

Key design principles:

Agentless: No software installed on targets. A self-contained bash script is streamed via SSH and executed in memory
Minimal footprint: Only standard Linux tools required (bash, awk, grep, sed, df, uname, ss, ip). Optional tools like lldpctl and openssl enable additional sections
Safe execution: The script writes nothing to disk on the target outside /tmp, and missing tools or permissions produce empty fields rather than failures
Credential security: SSH keys and passwords stored securely in Parascope, never in config files or logs

CI Types Created

The OS collector creates four distinct CI types, each serving a specific purpose in your infrastructure model.

os.linux — Operating System Instance

The primary CI representing a Linux installation on a host. Contains comprehensive OS-level data organized into change-tracked configuration and point-in-time metrics.

Configuration (tracked in change history):

Category	Data Collected
Identity	Distribution, version, kernel, hostname, architecture, timezone
Hardware	CPU model and core count, total memory, swap, virtualization type, block devices, network devices
Network	Interfaces with IP addresses, routes, DNS resolvers, listening ports with process names, LLDP neighbors
Software	Full package inventory (dpkg/rpm), systemd services with states
Security	Pending security updates, SSH daemon config, user accounts with sudo status, SELinux/AppArmor status, SSH host key fingerprints
Parent link	Reference to hosting CI (VM, container, or bare metal node)

Metrics (point-in-time, not tracked):

Metric	Description
CPU usage	Current CPU utilization percentage
Memory usage	Used memory in MB
Swap usage	Used swap in MB
Filesystems	Mount points with capacity and usage
Uptime	Seconds since last boot
Load average	1, 5, and 15-minute load averages

Relationships: Each os.linux CI has a runs_on relationship to its parent infrastructure CI (e.g., proxmox.vm, openstack.instance).

os.software — Promoted Software

Not every installed package becomes a CI. The promotion engine identifies operationally significant software — services with listening ports, active daemons, and known infrastructure components — and promotes them to dedicated CIs.

Software CIs are deduplicated across the fleet: one CI per unique (software name, version, package type) combination, with per-host instance records tracking where it runs.

CI-level fields (shared identity):

Field	Example
Software name	`nginx`
Version	`1.24.0-2ubuntu1`
Package type	`dpkg`, `rpm`, `apk`
Promotion reasons	`listening_port`, `active_daemon`, `known_pattern`

Per-host instance fields:

Field	Example
Service state	`running`, `stopped`, `disabled`
Listening ports	`[{port: 80, protocol: tcp}, {port: 443, protocol: tcp}]`
Systemd units	`["nginx.service"]`
Memory RSS	`128.5 MB`
Available update	`1.24.1-2ubuntu1` (if security update pending)

This two-tier model means you can answer both "What version of nginx exists in our fleet?" (CI level) and "Which specific hosts run nginx, and is the service healthy?" (instance level).
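The two-tier model can be pictured as a dictionary keyed by the identity tuple, with instance records attached to each entry. The sketch below is illustrative only: `SoftwareCI`, `record_instance`, and the shape of the instance record are hypothetical names, not Parascope's internal schema.

```python
from dataclasses import dataclass, field

@dataclass
class SoftwareCI:
    """Fleet-wide identity: one object per (name, version, package type)."""
    name: str
    version: str
    package_type: str
    # Per-host instance records keyed by hostname (hypothetical shape).
    instances: dict = field(default_factory=dict)

def record_instance(fleet, name, version, package_type, host, service_state):
    """Attach a host instance to the matching CI, creating the CI on first sight."""
    key = (name, version, package_type)
    ci = fleet.setdefault(key, SoftwareCI(name, version, package_type))
    ci.instances[host] = {"service_state": service_state}
    return ci

fleet = {}
record_instance(fleet, "nginx", "1.24.0-2ubuntu1", "dpkg", "web-01", "running")
record_instance(fleet, "nginx", "1.24.0-2ubuntu1", "dpkg", "web-02", "stopped")
record_instance(fleet, "nginx", "1.18.0-6ubuntu14", "dpkg", "web-03", "running")

# CI level: which nginx versions exist in the fleet?
versions = sorted(ci.version for ci in fleet.values() if ci.name == "nginx")
# Instance level: which hosts run one specific version?
hosts = sorted(fleet[("nginx", "1.24.0-2ubuntu1", "dpkg")].instances)
```

Three host observations collapse into two CIs here, because two of the hosts share the same name, version, and package type.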

os.certificate — X.509 Certificates

TLS certificates discovered in standard paths (/etc/ssl, /etc/pki, etc.) are parsed and deduplicated by SHA-256 fingerprint. One CI per unique certificate, with per-host instance records tracking file locations.

CI-level fields:

Field	Description
Subject CN	Common name (e.g., `*.example.com`)
SANs	Subject alternative names
Issuer	Certificate authority
Validity period	Not before / not after dates
Fingerprint	SHA-256 fingerprint (deduplication key)
Key details	Key type (RSA, EC) and size (2048, 4096)

Per-host instance fields:

Field	Description
Host CI	Reference to the `os.linux` CI
File path	Where the certificate was found on disk
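Deduplication by fingerprint can be sketched with the Python standard library alone. This is a minimal illustration, assuming the fingerprint is the SHA-256 of the certificate's DER bytes; the helper names are hypothetical, and the payloads are placeholder bytes rather than real certificates, so no X.509 parsing takes place.

```python
import base64
import hashlib

def pem_fingerprint(pem_text):
    """Return the SHA-256 of the DER bytes inside one PEM block, as hex."""
    lines = [ln.strip() for ln in pem_text.splitlines()]
    body = "".join(ln for ln in lines if ln and not ln.startswith("-----"))
    return hashlib.sha256(base64.b64decode(body)).hexdigest()

def dedupe(found):
    """Group (host, path, pem_text) tuples by certificate fingerprint."""
    by_fingerprint = {}
    for host, path, pem_text in found:
        by_fingerprint.setdefault(pem_fingerprint(pem_text), []).append((host, path))
    return by_fingerprint

def fake_pem(payload):
    """Wrap placeholder bytes in PEM armor (NOT a real certificate)."""
    b64 = base64.b64encode(payload).decode()
    return f"-----BEGIN CERTIFICATE-----\n{b64}\n-----END CERTIFICATE-----\n"

wildcard = fake_pem(b"placeholder wildcard cert")
found = [
    ("web-01", "/etc/ssl/certs/wildcard.pem", wildcard),
    ("web-02", "/etc/pki/tls/certs/wildcard.pem", wildcard),
    ("db-01", "/etc/ssl/certs/db.pem", fake_pem(b"placeholder db cert")),
]
cis = dedupe(found)  # two fingerprints; the wildcard one has two instance records
```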

os.container — Container Instances

Docker and Podman containers discovered on hosts are tracked as individual CIs with a runs_on relationship to their host os.linux CI.

Key fields:

Field	Description
Container name	Raw name from the runtime
Stable name	Normalized name (Ceph FSID prefixes stripped)
Image and tag	Container image reference
State	`running`, `exited`, `paused`, etc.
Runtime	`docker` or `podman`
Port bindings	Host-to-container port mappings
Labels	Container labels/metadata
Network mode	`bridge`, `host`, `none`, etc.
Host CI	Reference to hosting `os.linux` CI

Ceph container stable naming: Cephadm containers include the cluster FSID in their name (e.g., ceph-28ea88f2-1234-5678-abcd-ef0123456789-osd-2). The collector strips this prefix to produce a stable name (osd-2@hostname) that persists across cluster rebuilds.
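The prefix stripping can be approximated with a regular expression. This is a sketch under two assumptions: the FSID is always a lowercase UUID, and containers without the Ceph prefix keep their raw name as the stable name. The function name `stable_name` is hypothetical.

```python
import re

# "ceph-" followed by a UUID-shaped FSID and a trailing "-";
# whatever remains is the daemon name (e.g. "osd-2").
_CEPH_PREFIX = re.compile(r"^ceph-[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}-")

def stable_name(container_name, hostname):
    """Strip the cephadm FSID prefix and qualify the daemon name with the host."""
    stripped, count = _CEPH_PREFIX.subn("", container_name)
    if count == 0:
        return container_name  # assumption: non-Ceph names are left untouched
    return f"{stripped}@{hostname}"

name = stable_name("ceph-28ea88f2-1234-5678-abcd-ef0123456789-osd-2", "hostname")
```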

What Gets Collected

The collection script runs 14 independent sections, each gathering a specific category of data. Sections can be individually enabled or disabled per ruleset.

Section	What It Collects	Tools Used
os_identity	Distribution, version, kernel, hostname, architecture, timezone	`/etc/os-release`, `uname`
packages	Full package inventory with names, versions, architectures	`dpkg-query` or `rpm -qa`
services	Systemd unit states (running, enabled, disabled)	`systemctl`
network	Interfaces, IP addresses, routes, DNS resolvers	`ip -j addr`, `ip -j route`, `/etc/resolv.conf`
filesystems	Mount points with capacity and usage	`df`
resource_usage	CPU model/count, memory, swap, uptime, load average	`/proc/cpuinfo`, `/proc/meminfo`
listeners	Listening TCP/UDP ports with owning process names	`ss -tlnp`, `ss -ulnp`
certificates	X.509 certs in standard paths	`openssl x509`
patch_status	Pending security and regular updates	`apt list --upgradable` or `dnf updateinfo`
security_baseline	SSH config, user accounts, sudo access, SELinux/AppArmor	`/etc/ssh/sshd_config`, `/etc/passwd`
hardware	Virtualization type, DMI info, CPU topology, block/network devices	`/sys/class/`, `systemd-detect-virt`
containers	Docker/Podman containers with config and state	`docker ps`/`podman ps` with JSON output
lldp	LLDP neighbors with remote system, port, and chassis IDs	`lldpctl -f json0`
ssh_host_keys	SSH host key fingerprints from `/etc/ssh/ssh_host_*_key.pub`	`python3` (SHA256 hash)

How Collection Works

Target discovery: The collector queries the Parascope API for CIs matching the ruleset filters (e.g., all running Proxmox VMs tagged "linux") and extracts IP addresses from each CI's network data
Credential resolution: SSH credentials are resolved from Parascope's credential store, with per-target overrides taking priority over ruleset-level defaults
SSH connection: The collector connects to each target via direct SSH or through a jump host, with configurable timeouts and host key verification (see SSH Host Key Verification below)
Script execution: The collection script is streamed to the target via stdin (bash -s) and executed. The script itself is never saved to a file on the target
Parsing: The script's JSON output is parsed into Parascope's structured format, separating config (change-tracked) from metrics (point-in-time)
Software promotion: The promotion engine evaluates packages against behavioral signals and known patterns
Publishing: Results are published for processing as separate messages per CI type (os.linux, os.software, os.certificate, os.container)
Processing: CIs are created or updated, relationships are established, and enrichment data is merged into parent CIs

Collection runs with bounded concurrency (default: 10 parallel SSH connections) and per-target error isolation — a failure on one host doesn't affect others.
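Bounded concurrency with per-target error isolation can be sketched with a thread pool. The source does not say which mechanism the collector uses internally, so `ThreadPoolExecutor` is a stand-in chosen for illustration, and `collect_all` and `fake_collect` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def collect_all(targets, collect_one, max_parallel=10):
    """Run collect_one(target) for every target, at most max_parallel at a time.

    Returns (results, errors). An exception on one target is recorded
    and does not stop collection on the others.
    """
    results, errors = {}, {}

    def guarded(target):
        try:
            return target, collect_one(target), None
        except Exception as exc:  # isolate the failure to this target
            return target, None, exc

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        for target, data, exc in pool.map(guarded, targets):
            if exc is None:
                results[target] = data
            else:
                errors[target] = str(exc)
    return results, errors

def fake_collect(target):
    """Stand-in for the SSH collection step; one target always fails."""
    if target == "10.0.1.51":
        raise TimeoutError("ssh connect timed out")
    return {"hostname": f"host-{target}"}

results, errors = collect_all(["10.0.1.50", "10.0.1.51", "10.0.1.52"], fake_collect)
```

The failing target ends up in `errors` while the other two still produce results.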

SSH Host Key Verification

The OS collector uses Trust-On-First-Use (TOFU) host key pinning to protect SSH connections from man-in-the-middle attacks. This ensures that SSH credentials are only sent to verified hosts.

How It Works

First contact: When connecting to a host for the first time (no stored keys), the collector accepts the host key, captures it, and stores the fingerprint and full public key on the os.linux CI
Subsequent connections: On every later collection cycle, the stored public key is checked against the key the host presents during the SSH handshake, before any credentials are sent. In the default reject mode, a mismatch causes the connection to be rejected and no credentials are transmitted
Cross-validation: The collection script also reads host keys from /etc/ssh/ssh_host_*_key.pub on the target, and the collector compares the handshake key against the filesystem key. A mismatch, where the handshake key differs from what the host itself reports, is a strong indicator of a MITM attack

Verification Modes

The verification mode is configurable per source via the host_key_verification setting:

Mode	Behavior	Credentials sent on mismatch?
reject (default)	Verify host key during handshake before authentication. Reject on mismatch	No — connection aborted pre-auth
warn	Connect normally, compare keys post-connect, log warning on mismatch	Yes — connection proceeds
disabled	Capture key for storage only, no verification	Yes — no comparison performed
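The outcome of each mode can be summarized as a small decision function. This sketch models only the result shown in the table. In the real reject mode the check happens inside the SSH handshake before authentication, which a plain function cannot show. `check_host_key` and its return shape are hypothetical.

```python
def check_host_key(mode, stored_key, presented_key):
    """Return (proceed, warning) for one connection attempt.

    stored_key is None on first contact, when no key has been pinned yet.
    """
    if mode not in ("reject", "warn", "disabled"):
        raise ValueError(f"unknown host_key_verification mode: {mode}")
    if mode == "disabled" or stored_key is None:
        return True, None  # capture only, or trust on first use
    if presented_key == stored_key:
        return True, None
    # Mismatch: warn mode proceeds (credentials are sent), reject mode aborts.
    return mode == "warn", "host key mismatch"

outcome_reject = check_host_key("reject", "ssh-ed25519 AAAA-old", "ssh-ed25519 AAAA-new")
outcome_warn = check_host_key("warn", "ssh-ed25519 AAAA-old", "ssh-ed25519 AAAA-new")
```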

Stored Data

Host key fingerprints are stored on each os.linux CI in the ssh_host_key_fingerprints field:

{
  "ssh-ed25519": {
    "fingerprint": "SHA256:rNo3mjXJZLLC6R0SNbhvBjTMWCuJhK3cFlZps8HI2rI",
    "public_key": "ssh-ed25519 AAAA..."
  },
  "ssh-rsa": {
    "fingerprint": "SHA256:WRRx5bPX71T3pa7r0JIaJmgLl90JoVMOj9w9d75yseI",
    "public_key": "ssh-rsa AAAA..."
  },
  "_connection_ip": "10.0.1.50"
}

Keys from both the SSH handshake and the target's filesystem are merged; where both supply a key for the same algorithm, the handshake key takes precedence. The _connection_ip field records which IP address was used for the SSH connection.
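The stored structure can be reproduced with a few lines of Python. OpenSSH's SHA256 fingerprint is the base64-encoded SHA-256 digest of the decoded key blob with the trailing `=` padding removed. The merge function below is a sketch of the precedence rule; `merge_keys` and `placeholder_key` are hypothetical names, and the key blobs are placeholder bytes rather than real host keys.

```python
import base64
import hashlib

def openssh_fingerprint(public_key_line):
    """Map 'ssh-ed25519 AAAA... [comment]' to (key_type, 'SHA256:...')."""
    key_type, blob_b64 = public_key_line.split()[:2]
    digest = hashlib.sha256(base64.b64decode(blob_b64)).digest()
    # OpenSSH prints the digest as base64 without '=' padding.
    return key_type, "SHA256:" + base64.b64encode(digest).decode().rstrip("=")

def merge_keys(filesystem_lines, handshake_line, connection_ip):
    """Merge filesystem and handshake keys; the handshake key is applied last,
    so it wins for its own algorithm."""
    merged = {}
    for line in list(filesystem_lines) + [handshake_line]:
        key_type, fingerprint = openssh_fingerprint(line)
        merged[key_type] = {"fingerprint": fingerprint, "public_key": line}
    merged["_connection_ip"] = connection_ip
    return merged

def placeholder_key(key_type, payload):
    """Build a syntactically valid key line from placeholder bytes."""
    return f"{key_type} {base64.b64encode(payload).decode()}"

fs_keys = [placeholder_key("ssh-ed25519", b"fs-ed25519"),
           placeholder_key("ssh-rsa", b"fs-rsa")]
handshake = placeholder_key("ssh-ed25519", b"handshake-ed25519")
stored = merge_keys(fs_keys, handshake, "10.0.1.50")
```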

Key Rotation

If a host's SSH keys are legitimately rotated (e.g., after a reinstall), the collector will reject the connection in reject mode. To reset:

Clear the ssh_host_key_fingerprints field on the os.linux CI (or delete and recreate the CI)
The next collection cycle will treat the host as first-contact and accept the new key

Alternatively, temporarily set host_key_verification to warn to allow the new key through while logging the change.

Software Promotion Engine

With hundreds of packages installed on a typical Linux host, promoting all of them to CIs would create noise. The promotion engine identifies the packages that actually matter — the ones running services, listening on the network, or matching known infrastructure patterns.

Promotion Signals

Signal	What It Detects	Example
Listening port	Package owns a process with a network socket	`nginx` listening on port 80
Active daemon	Package has a running systemd service	`postgresql` with active `postgresql.service`
Known pattern	Package name matches infrastructure patterns	`redis-server` matches the `redis` pattern
Force promote	Explicitly configured in the ruleset	Custom internal monitoring agent
Unpackaged listener	Process with a port but no matching package	Unauthorized daemon (security visibility)

The engine uses fuzzy name matching to connect packages to their processes. For example, postgresql-15 is matched to listener process postgres by stripping version suffixes and checking known aliases.
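A minimal sketch of that matching step follows. The alias table and the suffix-stripping rule are assumptions made for illustration, and the engine's real heuristics are likely broader.

```python
import re

# Hypothetical alias table: base package name -> process names it may run as.
ALIASES = {
    "postgresql": {"postgres"},
    "openssh-server": {"sshd"},
}

def base_name(package):
    """Strip a trailing version suffix: 'postgresql-15' -> 'postgresql'."""
    return re.sub(r"-?\d+(?:\.\d+)*$", "", package)

def matches(package, process):
    """True if the process name plausibly belongs to the package."""
    base = base_name(package)
    return process in {package, base} | ALIASES.get(base, set())

hit = matches("postgresql-15", "postgres")
```

Here `postgresql-15` is reduced to `postgresql`, whose alias set contains `postgres`, so the listener is attributed to the package.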

Known Infrastructure Patterns

The promotion engine recognizes over 100 infrastructure software patterns across these categories:

Category	Examples
Web servers	nginx, apache2, httpd, caddy, traefik, envoy
Databases	postgresql, mysql, mariadb, mongodb, redis, memcached, etcd
Message queues	rabbitmq, kafka, nats-server, mosquitto
Container runtimes	docker, containerd, cri-o, podman
Monitoring	prometheus, grafana, node-exporter, telegraf, zabbix, datadog
DNS	bind9, unbound, coredns, dnsmasq, powerdns
Security	fail2ban, crowdsec, certbot
Storage	ceph, minio, glusterfs, nfs-kernel-server
CI/CD	jenkins, gitlab-runner, drone, argo
Virtualization	qemu, libvirt, proxmox

Promotion Overrides

Each ruleset can configure:

Force promote — Package names to always promote, even without behavioral signals. Use for custom or internal software
Suppress — Package names to never promote, even if they match signals. Use to filter noisy or uninteresting software
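How overrides interact with the signals can be sketched as follows. The source does not say what happens when a package appears in both lists, so this example assumes suppression wins. The `force_promote` reason string and the function name are hypothetical as well.

```python
def promotion_reasons(package, signals, force_promote=(), suppress=()):
    """Return the reasons a package is promoted, or [] if it is not promoted.

    signals holds the behavioral reasons already detected for the package,
    e.g. ["listening_port", "active_daemon"].
    """
    if package in suppress:
        return []  # assumption: suppress takes priority over everything else
    reasons = list(signals)
    if package in force_promote:
        reasons.append("force_promote")  # hypothetical reason name
    return reasons

forced = promotion_reasons("acme-agent", [], force_promote={"acme-agent"})
muted = promotion_reasons("cups", ["listening_port"], suppress={"cups"})
```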

Real-World Use Cases

Security Incident Response: "Which hosts have vulnerable package X?"

When a CVE is announced for a critical package, you need to know your exposure immediately.

Scenario: A critical vulnerability is disclosed in OpenSSL 3.0.x (CVE-2024-XXXX). Your security team needs to identify all affected hosts within minutes.

With OS collection:

Search for os.software CIs with software_name = openssl — instantly see every version deployed across your fleet
Click into each software CI to see the per-host instance list — which specific machines run the vulnerable version
Check available_version on each instance to see if patches are already available in your package repositories
Use the runs_on relationship from os.linux to trace back to the hosting VM and the physical infrastructure beneath it

Without OS collection: You'd need to SSH into each machine manually, or maintain a separate inventory tool, or hope your vulnerability scanner has recent data.

Patch Compliance: "How many hosts have pending security updates?"

Scenario: Your compliance policy requires security patches within 30 days. You need a dashboard-ready view of patch status.

With OS collection:

Every os.linux CI tracks security_update_count — the number of pending security updates
The security_updates field lists each pending update with the available version
Filter the CI list for os.linux CIs where security_update_count > 0 to see non-compliant hosts
Track changes over time — Parascope's change history records when security updates appear and when they get applied

Certificate Expiration Monitoring: "Which certificates expire soon?"

Scenario: An expired TLS certificate causes a production outage. You need visibility into certificate lifetimes across your fleet.

With OS collection:

Every TLS certificate found on disk is tracked as an os.certificate CI with not_after (expiration date)
Certificates are deduplicated — a wildcard cert used on 10 hosts appears as one CI with 10 instance records showing the file paths
Sort certificates by expiration date to see what's expiring next
The instance records show exactly which hosts and file paths would be affected
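If certificate CIs are exported for reporting, the "expiring soon" view reduces to a filter and a sort on `not_after`. The record shape below is hypothetical, including the `subject_cn` key; only `not_after` is a field name taken from this page.

```python
from datetime import datetime, timedelta, timezone

def expiring_within(certs, days, now):
    """Return certificates whose not_after falls within the next `days` days,
    soonest first. Already-expired certificates are included."""
    cutoff = now + timedelta(days=days)
    soon = [c for c in certs if c["not_after"] <= cutoff]
    return sorted(soon, key=lambda c: c["not_after"])

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
certs = [
    {"subject_cn": "*.example.com", "not_after": datetime(2024, 6, 20, tzinfo=timezone.utc)},
    {"subject_cn": "db.example.com", "not_after": datetime(2025, 1, 1, tzinfo=timezone.utc)},
    {"subject_cn": "api.example.com", "not_after": datetime(2024, 6, 5, tzinfo=timezone.utc)},
]
soon = [c["subject_cn"] for c in expiring_within(certs, 30, now)]
```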

Fleet Software Inventory: "What's running across our infrastructure?"

Scenario: During an architecture review, you need to understand the software landscape across 50+ Linux hosts.

With OS collection:

Browse os.software CIs to see every promoted software component across the fleet
Each software CI shows how many hosts run it (via instance count)
Filter by promotion reason to focus on network services (listening_port), active daemons (active_daemon), or known infrastructure (known_pattern)
Identify version sprawl — the same software at different versions across hosts
Discover unexpected services — the unpackaged_listener signal catches processes listening on ports without corresponding packages

Infrastructure Drift Detection: "Is this VM configured as expected?"

Scenario: A VM was rebuilt from a template, but something isn't working. You need to compare its current state against known-good configuration.

With OS collection:

Compare the current os.linux CI's configuration with its change history to see what changed
Check kernel version, installed packages, network configuration, and service states
The IP mismatch detection in OS enrichment flags cases where the OS-reported IP addresses differ from what the infrastructure platform expects
Compare two hosts by viewing their os.linux CIs side by side — same distribution? Same kernel? Same key packages?

Container Visibility: "What containers run on this host?"

Scenario: A host managed by cephadm is having performance issues. You need to see what containers are running on it.

With OS collection:

Each os.container CI shows the container's image, state, port bindings, and resource configuration
The runs_on relationship connects containers to their host os.linux CI
Ceph containers use stable naming (osd-2@hostname instead of ceph-28ea88f2-...-osd-2), so CI identity persists across cluster rebuilds
Distinguish Docker vs Podman containers via the runtime field

Network Service Mapping: "What's listening on the network?"

Scenario: You're auditing network exposure across your fleet and need to know every listening port and what process owns it.

With OS collection:

The listeners section captures every TCP and UDP listening socket with the owning process name
Promoted software CIs include their listening ports in the instance data
unpackaged_listener promotions flag processes with network sockets but no corresponding package — potential unauthorized services
Combine with network data (interfaces, routes, DNS) to build a complete picture of each host's network posture

Physical Connectivity: "What switch port is this host connected to?"

Scenario: A host is experiencing network issues and you need to identify the physical switch and port it's connected to for troubleshooting.

With OS collection:

The lldp section collects LLDP neighbor data from hosts running lldpd, showing the directly connected network switch and port
Each LLDP neighbor record includes the remote system name, chassis ID, port ID, and management IP
Cross-reference with SNMP LLDP data collected from switches to verify both sides of the physical link agree
LLDP neighbor changes are tracked in change history — a neighbor change means a physical cabling change
Field names are aligned with the SNMP LLDP collector format, enabling future cross-source correlation

Note: LLDP data requires lldpd to be installed on the target host. Hosts without lldpd will simply report an empty neighbors list.

Configuration

Collection Rulesets

Rulesets define which targets to collect from, how to reach them, and which data to collect. You can create multiple rulesets for different parts of your infrastructure (e.g., production vs staging, different network zones).

Key ruleset settings:

Setting	Description	Default
Name	Unique friendly name for the ruleset	Required
Platform	Target OS platform	`linux`
Enabled	Whether collection runs on schedule	`true`
Target filters	CI types and filters to select targets	Required
Credential	Name of stored SSH credential	Required
Collection interval	Seconds between scheduled collections	`86400` (24 hours)
SSH port	Port for SSH connections	`22`
Become method	Privilege escalation	`sudo`
Reachability	Transport strategy (direct or jump host)	`direct`
Section toggles	Enable/disable individual collection sections	All enabled
Promotion overrides	Force-promote or suppress specific packages	Empty

Target Discovery

Targets are discovered dynamically by querying the Parascope API for CIs matching the ruleset's filter criteria. The collector extracts IP addresses from each CI's network data.

Example: Collect from all running Proxmox VMs

{
  "targets": {
    "mode": "whitelist",
    "ci_types": ["proxmox.vm"],
    "filters": [
      {"field": "config.status", "op": "eq", "value": "running"}
    ]
  }
}

Example: Collect from OpenStack instances in a specific project

{
  "targets": {
    "mode": "whitelist",
    "ci_types": ["openstack.instance"],
    "filters": [
      {"field": "config.status", "op": "eq", "value": "ACTIVE"},
      {"field": "scope_label", "op": "eq", "value": "production-cloud"}
    ]
  }
}
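To make the filter semantics concrete, here is a sketch of how such a target block could be evaluated against a list of CIs. It assumes that multiple filters are combined with AND, that `field` is a dotted path into the CI, and that `eq` is the only operator (the only one shown on this page). The CI dictionary shape and function names are hypothetical.

```python
def get_path(ci, dotted):
    """Resolve a dotted path such as 'config.status'; None if any part is missing."""
    value = ci
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value

def matches_filter(ci, flt):
    if flt["op"] != "eq":
        raise ValueError(f"unsupported op: {flt['op']}")
    return get_path(ci, flt["field"]) == flt["value"]

def select_targets(cis, targets):
    """Keep CIs of a listed type that satisfy every filter (assumed AND)."""
    return [
        ci for ci in cis
        if ci.get("ci_type") in targets["ci_types"]
        and all(matches_filter(ci, f) for f in targets["filters"])
    ]

targets = {
    "mode": "whitelist",
    "ci_types": ["proxmox.vm"],
    "filters": [{"field": "config.status", "op": "eq", "value": "running"}],
}
cis = [
    {"ci_type": "proxmox.vm", "name": "vm-101", "config": {"status": "running"}},
    {"ci_type": "proxmox.vm", "name": "vm-102", "config": {"status": "stopped"}},
    {"ci_type": "openstack.instance", "name": "os-1", "config": {"status": "running"}},
]
selected = [ci["name"] for ci in select_targets(cis, targets)]
```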

Credential Management

SSH credentials are stored securely in Parascope's credential store and referenced by name in rulesets.

Supported authentication methods:

SSH private key (RSA, Ed25519) — recommended
SSH password — for legacy systems

Resolution priority per target:

Target-specific credential (if configured)
Ruleset-level default credential
Error (no credentials available)
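The resolution order amounts to a short fallback chain. In this sketch the function name, the override mapping, and the choice of `LookupError` are all illustrative assumptions.

```python
def resolve_credential(target_id, target_overrides, ruleset_default=None):
    """Pick the credential name for a target.

    Order: target-specific override, then ruleset default, else an error.
    """
    credential = target_overrides.get(target_id) or ruleset_default
    if credential is None:
        raise LookupError(f"no SSH credential available for {target_id}")
    return credential

overrides = {"vm-101": "Legacy Root Password"}
specific = resolve_credential("vm-101", overrides, "Fleet SSH Key")
fallback = resolve_credential("vm-102", overrides, "Fleet SSH Key")
```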

Reachability Strategies

Strategy	Use Case	Configuration
Direct	Targets reachable from the collector's network	Default, no extra config needed
Jump host	Targets in isolated networks, behind a bastion	Configure gateway host and credentials

Jump host example:

{
  "reachability": {
    "strategy": "jump_host",
    "gateway": "10.20.0.1",
    "gateway_port": 22,
    "gateway_credential": "Bastion SSH Key"
  }
}

On-Demand Collection

In addition to scheduled collection, you can trigger immediate collection for a single target from the CI detail page. This is useful for:

Verifying a configuration change was applied
Getting fresh data before an incident investigation
Testing connectivity to a new target

OS Enrichment

When the OS collector gathers data from a host, it also enriches the parent infrastructure CI (the VM or bare metal node) with a summary of what's running inside. This enrichment appears in the parent CI's detail view.

Enrichment data includes:

Field	Description
OS summary	Distribution and kernel version (e.g., "Ubuntu 22.04.3 LTS (kernel 5.15.0-91)")
CPU utilization	Current CPU usage percentage
Memory utilization	Current memory usage percentage
IP mismatch	Whether OS-reported IPs differ from what the infrastructure platform expects
Last collected	When the OS data was last gathered

This gives you at-a-glance OS information directly on VM and node detail pages, without navigating to the os.linux CI.

Monitoring

The OS collector exposes Prometheus metrics for operational visibility:

Metric	Description
`os_targets_collected_total`	Successful collections per ruleset
`os_targets_failed_total`	Failed collections per ruleset
`os_targets_discovered`	Targets discovered per ruleset
`collection_run_duration_seconds`	Collection timing (P50/P95/P99)
`collector_source_health_state`	Circuit breaker state (0=healthy, 1=degraded, 2=unhealthy, 3=circuit_open)

Circuit Breaker

Each ruleset has an independent circuit breaker that prevents unhealthy rulesets from blocking healthy ones:

State	Meaning
Healthy	Normal operation
Degraded	High latency detected (collections taking longer than 10s)
Unhealthy	Failures detected, still retrying
Circuit open	3 consecutive failures — skips collection for 60s before retrying
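The open and retry behavior can be sketched as a small state holder. This is an illustration of the documented thresholds (3 consecutive failures, 60 s pause), not the collector's implementation. The degraded state based on latency is not modeled, and the class and method names are hypothetical.

```python
class CircuitBreaker:
    """Opens after N consecutive failures and skips work for a cool-down period."""

    def __init__(self, failure_threshold=3, open_seconds=60.0):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.failures = 0
        self.opened_at = None  # timestamp when the circuit opened, or None

    def allow(self, now):
        """Return True if a collection run may start at time `now` (seconds)."""
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.open_seconds:
            self.opened_at = None  # cool-down elapsed: allow a retry
            return True
        return False

    def record(self, ok, now):
        """Record the outcome of a collection run."""
        if ok:
            self.failures = 0
            self.opened_at = None
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now

breaker = CircuitBreaker()
for t in (0.0, 1.0, 2.0):
    breaker.record(ok=False, now=t)  # third failure opens the circuit at t=2
skipped = not breaker.allow(now=30.0)
retried = breaker.allow(now=62.0)
```

Because each ruleset would hold its own breaker instance, a ruleset whose circuit is open does not delay the others.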

Architecture — System design overview
Correlation Engine — Cross-system relationship discovery
CI Types — All CI types across sources
