OS Collection
Agentless operating system discovery across your Linux fleet via SSH
Parascope's OS collector bridges the gap between what your infrastructure platforms know (VMs, containers, bare metal) and what's actually running inside those machines. It connects via SSH, runs a self-contained collection script, and reports back operating system configuration, software inventory, security posture, and runtime metrics — without installing agents on your targets.
Why OS Collection Matters
Infrastructure platforms like Proxmox, OpenStack, and Kubernetes track resources from the outside — CPU allocation, memory limits, network attachments. But they can't tell you:
- What Linux distribution and kernel version is actually running
- Which packages are installed and whether security updates are pending
- What services are listening on the network
- Whether SSH is configured securely
- Which TLS certificates are deployed and when they expire
- What containers are running inside the VM
- What physical network neighbors are connected via LLDP
OS collection answers these questions across your entire fleet, creating a rich layer of CI data that connects to infrastructure CIs you already track.
Architecture
Key design principles:
- Agentless: No software installed on targets. A self-contained bash script is streamed via SSH and executed in memory
- Minimal footprint: Only standard Linux tools required (bash, awk, grep, sed, df, uname, ss, ip). Optional tools like lldpctl and openssl enable additional sections
- Safe execution: Script writes nothing to disk on the target (except /tmp), and missing tools or permissions result in empty fields rather than failures
- Credential security: SSH keys and passwords stored securely in Parascope, never in config files or logs
CI Types Created
The OS collector creates four distinct CI types, each serving a specific purpose in your infrastructure model.
os.linux — Operating System Instance
The primary CI representing a Linux installation on a host. Contains comprehensive OS-level data organized into change-tracked configuration and point-in-time metrics.
Configuration (tracked in change history):
| Category | Data Collected |
|---|---|
| Identity | Distribution, version, kernel, hostname, architecture, timezone |
| Hardware | CPU model and core count, total memory, swap, virtualization type, block devices, network devices |
| Network | Interfaces with IP addresses, routes, DNS resolvers, listening ports with process names, LLDP neighbors |
| Software | Full package inventory (dpkg/rpm), systemd services with states |
| Security | Pending security updates, SSH daemon config, user accounts with sudo status, SELinux/AppArmor status, SSH host key fingerprints |
| Parent link | Reference to hosting CI (VM, container, or bare metal node) |
Metrics (point-in-time, not tracked):
| Metric | Description |
|---|---|
| CPU usage | Current CPU utilization percentage |
| Memory usage | Used memory in MB |
| Swap usage | Used swap in MB |
| Filesystems | Mount points with capacity and usage |
| Uptime | Seconds since last boot |
| Load average | 1, 5, and 15-minute load averages |
Relationships: Each os.linux CI has a runs_on relationship to its parent infrastructure CI (e.g., proxmox.vm, openstack.instance).
os.software — Promoted Software
Not every installed package becomes a CI. The promotion engine identifies operationally significant software — services with listening ports, active daemons, and known infrastructure components — and promotes them to dedicated CIs.
Software CIs are deduplicated across the fleet: one CI per unique (software name, version, package type) combination, with per-host instance records tracking where it runs.
CI-level fields (shared identity):
| Field | Example |
|---|---|
| Software name | nginx |
| Version | 1.24.0-2ubuntu1 |
| Package type | dpkg, rpm, apk |
| Promotion reasons | listening_port, active_daemon, known_pattern |
Per-host instance fields:
| Field | Example |
|---|---|
| Service state | running, stopped, disabled |
| Listening ports | [{port: 80, protocol: tcp}, {port: 443, protocol: tcp}] |
| Systemd units | ["nginx.service"] |
| Memory RSS | 128.5 MB |
| Available update | 1.24.1-2ubuntu1 (if security update pending) |
This two-tier model means you can answer both "What version of nginx exists in our fleet?" (CI level) and "Which specific hosts run nginx, and is the service healthy?" (instance level).
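The two-tier model can be illustrated with a minimal sketch. The record shape and the function name build_software_cis are assumptions for illustration, not Parascope's actual schema; only the deduplication key (software name, version, package type) comes from the description above.

```python
from collections import defaultdict


def build_software_cis(observations):
    """Group per-host observations into one CI per (name, version, package type)."""
    cis = defaultdict(list)
    for obs in observations:
        key = (obs["software_name"], obs["version"], obs["package_type"])
        # Instance-level data stays attached to the host it was seen on.
        cis[key].append({"host": obs["host"], "service_state": obs["service_state"]})
    return dict(cis)


# Hypothetical observations from three hosts.
observations = [
    {"host": "web-01", "software_name": "nginx", "version": "1.24.0-2ubuntu1",
     "package_type": "dpkg", "service_state": "running"},
    {"host": "web-02", "software_name": "nginx", "version": "1.24.0-2ubuntu1",
     "package_type": "dpkg", "service_state": "stopped"},
    {"host": "web-03", "software_name": "nginx", "version": "1.18.0-6ubuntu14",
     "package_type": "dpkg", "service_state": "running"},
]

software_cis = build_software_cis(observations)
for (name, version, pkg_type), instances in software_cis.items():
    print(name, version, pkg_type, [i["host"] for i in instances])
```

The dictionary keys answer the fleet-level question (which versions exist), while each value list answers the host-level question (where a version runs and in what state).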
os.certificate — X.509 Certificates
TLS certificates discovered in standard paths (/etc/ssl, /etc/pki, etc.) are parsed and deduplicated by SHA-256 fingerprint. One CI per unique certificate, with per-host instance records tracking file locations.
CI-level fields:
| Field | Description |
|---|---|
| Subject CN | Common name (e.g., *.example.com) |
| SANs | Subject alternative names |
| Issuer | Certificate authority |
| Validity period | Not before / not after dates |
| Fingerprint | SHA-256 fingerprint (deduplication key) |
| Key details | Key type (RSA, EC) and size (2048, 4096) |
Per-host instance fields:
| Field | Description |
|---|---|
| Host CI | Reference to the os.linux CI |
| File path | Where the certificate was found on disk |
os.container — Container Instances
Docker and Podman containers discovered on hosts are tracked as individual CIs with a runs_on relationship to their host os.linux CI.
Key fields:
| Field | Description |
|---|---|
| Container name | Raw name from the runtime |
| Stable name | Normalized name (Ceph FSID prefixes stripped) |
| Image and tag | Container image reference |
| State | running, exited, paused, etc. |
| Runtime | docker or podman |
| Port bindings | Host-to-container port mappings |
| Labels | Container labels/metadata |
| Network mode | bridge, host, none, etc. |
| Host CI | Reference to hosting os.linux CI |
Ceph container stable naming: Cephadm containers include the cluster FSID in their name (e.g., ceph-28ea88f2-1234-5678-abcd-ef0123456789-osd-2). The collector strips this prefix to produce a stable name (osd-2@hostname) that persists across cluster rebuilds.
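As a rough sketch of the stable-naming rule, the prefix can be stripped with a regular expression. The function name and the choice to leave non-Ceph names unchanged are assumptions; the collector's real normalization may handle more cases.

```python
import re

# "ceph-", a UUID-shaped FSID, a dash, then the daemon name.
_CEPH_PREFIX = re.compile(
    r"^ceph-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}-(?P<daemon>.+)$"
)


def stable_container_name(raw_name: str, hostname: str) -> str:
    """Return daemon@hostname for cephadm containers."""
    match = _CEPH_PREFIX.match(raw_name)
    if match is None:
        # Assumption: names without a Ceph FSID prefix are kept as-is.
        return raw_name
    return f"{match.group('daemon')}@{hostname}"


print(stable_container_name("ceph-28ea88f2-1234-5678-abcd-ef0123456789-osd-2", "node1"))
```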
What Gets Collected
The collection script runs 14 independent sections, each gathering a specific category of data. Sections can be individually enabled or disabled per ruleset.
| Section | What It Collects | Tools Used |
|---|---|---|
| os_identity | Distribution, version, kernel, hostname, architecture, timezone | /etc/os-release, uname |
| packages | Full package inventory with names, versions, architectures | dpkg-query or rpm -qa |
| services | Systemd unit states (running, enabled, disabled) | systemctl |
| network | Interfaces, IP addresses, routes, DNS resolvers | ip -j addr, ip -j route, /etc/resolv.conf |
| filesystems | Mount points with capacity and usage | df |
| resource_usage | CPU model/count, memory, swap, uptime, load average | /proc/cpuinfo, /proc/meminfo |
| listeners | Listening TCP/UDP ports with owning process names | ss -tlnp, ss -ulnp |
| certificates | X.509 certs in standard paths | openssl x509 |
| patch_status | Pending security and regular updates | apt list --upgradable or dnf updateinfo |
| security_baseline | SSH config, user accounts, sudo access, SELinux/AppArmor | /etc/ssh/sshd_config, /etc/passwd |
| hardware | Virtualization type, DMI info, CPU topology, block/network devices | /sys/class/, systemd-detect-virt |
| containers | Docker/Podman containers with config and state | docker ps/podman ps with JSON output |
| lldp | LLDP neighbors with remote system, port, and chassis IDs | lldpctl -f json0 |
| ssh_host_keys | SSH host key fingerprints from /etc/ssh/ssh_host_*_key.pub | python3 (SHA256 hash) |
How Collection Works
- Target discovery: The collector queries the Parascope API for CIs matching the ruleset filters (e.g., all running Proxmox VMs tagged "linux"). It extracts IP addresses from each CI's network data
- Credential resolution: SSH credentials are resolved from Parascope's credential store, with per-target overrides taking priority over ruleset-level defaults
- SSH connection: The collector connects to each target via direct SSH or through a jump host, with configurable timeouts and host key verification (see below)
- Script execution: The collection script is streamed to the target via stdin (bash -s) and executed. No files are written to the target
- Parsing: The script's JSON output is parsed into Parascope's structured format, separating config (change-tracked) from metrics (point-in-time)
- Software promotion: The promotion engine evaluates packages against behavioral signals and known patterns
- Publishing: Results are published for processing as separate messages per CI type (os.linux, os.software, os.certificate, os.container)
- Processing: CIs are created or updated, relationships are established, and enrichment data is merged into parent CIs
Collection runs with bounded concurrency (default: 10 parallel SSH connections) and per-target error isolation — a failure on one host doesn't affect others.
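Bounded concurrency with per-target error isolation could look roughly like the sketch below. It uses asyncio with a semaphore; collect_all and fake_collect are hypothetical names, and the real collector's SSH handling is replaced by a stand-in coroutine.

```python
import asyncio


async def collect_all(targets, collect_one, max_parallel=10):
    """Run collect_one for every target, at most max_parallel at a time."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def guarded(target):
        async with semaphore:
            try:
                return target, await collect_one(target), None
            except Exception as exc:
                # A failure is recorded for this target only; others continue.
                return target, None, exc

    return await asyncio.gather(*(guarded(t) for t in targets))


async def fake_collect(target):
    """Stand-in for an SSH collection; one target fails on purpose."""
    await asyncio.sleep(0)
    if target == "10.0.1.51":
        raise ConnectionError("ssh timeout")
    return {"hostname": target}


results = asyncio.run(
    collect_all(["10.0.1.50", "10.0.1.51", "10.0.1.52"], fake_collect)
)
for target, data, error in results:
    print(target, "failed" if error else "ok")
```

Catching the exception inside the guarded wrapper is what keeps one unreachable host from cancelling the rest of the batch.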
SSH Host Key Verification
The OS collector uses Trust-On-First-Use (TOFU) host key pinning to protect SSH connections from man-in-the-middle attacks. Once a host's key has been pinned, the default reject mode ensures that SSH credentials are only sent to a host presenting that same key.
How It Works
- First contact: When connecting to a host for the first time (no stored keys), the collector accepts the host key, captures it, and stores the fingerprint and full public key on the os.linux CI
- Subsequent connections: On every following collection cycle, the stored public key is checked against the key the host presents during the SSH handshake, before any credentials are sent. If the key doesn't match, the connection is rejected (in the default reject mode) and no credentials are transmitted
- Cross-validation: The collection script also reads host keys from /etc/ssh/ssh_host_*_key.pub on the target. The collector compares the handshake key against the filesystem key. A mismatch is a strong indicator of a MITM attack (the handshake key differs from what the host itself reports)
Verification Modes
The verification mode is configurable per source via the host_key_verification setting:
| Mode | Behavior | Credentials sent on mismatch? |
|---|---|---|
| reject (default) | Verify host key during handshake before authentication. Reject on mismatch | No — connection aborted pre-auth |
| warn | Connect normally, compare keys post-connect, log warning on mismatch | Yes — connection proceeds |
| disabled | Capture key for storage only, no verification | Yes — no comparison performed |
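The table can be read as a small decision function. The sketch below models only the outcome of each mode; the function name and the action labels are assumptions, and in warn mode the real comparison happens after the connection is established rather than before.

```python
def check_host_key(mode, stored_key, presented_key):
    """Return (proceed, action) for a connection attempt.

    stored_key is None when the host has never been seen (first contact).
    """
    if mode not in ("reject", "warn", "disabled"):
        raise ValueError(f"unknown host_key_verification mode: {mode}")
    if mode == "disabled":
        return True, "store"   # key is captured, never compared
    if stored_key is None:
        return True, "pin"     # trust on first use
    if stored_key == presented_key:
        return True, "ok"
    if mode == "reject":
        return False, "abort"  # no credentials are sent
    return True, "warn"        # connection proceeds, mismatch is logged


print(check_host_key("reject", "ssh-ed25519 AAAA-old", "ssh-ed25519 AAAA-new"))
```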
Stored Data
Host key fingerprints are stored on each os.linux CI in the ssh_host_key_fingerprints field:
{
"ssh-ed25519": {
"fingerprint": "SHA256:rNo3mjXJZLLC6R0SNbhvBjTMWCuJhK3cFlZps8HI2rI",
"public_key": "ssh-ed25519 AAAA..."
},
"ssh-rsa": {
"fingerprint": "SHA256:WRRx5bPX71T3pa7r0JIaJmgLl90JoVMOj9w9d75yseI",
"public_key": "ssh-rsa AAAA..."
},
"_connection_ip": "10.0.1.50"
}
Keys from both the SSH handshake and the target's filesystem are merged. The handshake key takes precedence for its algorithm. The _connection_ip field records which IP address was used for the SSH connection.
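For reference, a fingerprint in the SHA256:... form used by OpenSSH is the unpadded base64 of the SHA-256 digest of the decoded key blob. The sketch below computes it and applies the merge rule described above; merge_host_keys and its arguments are hypothetical names, and the key material is dummy data rather than real host keys.

```python
import base64
import hashlib


def openssh_fingerprint(public_key_line: str) -> str:
    """Compute the SHA256 fingerprint of an 'algorithm base64-blob' key line."""
    key_blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.sha256(key_blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def merge_host_keys(filesystem_keys, handshake_key, connection_ip):
    """Merge keys per algorithm; the handshake key is applied last, so it wins."""
    merged = {}
    for line in list(filesystem_keys) + [handshake_key]:
        algorithm = line.split()[0]
        merged[algorithm] = {
            "fingerprint": openssh_fingerprint(line),
            "public_key": line,
        }
    merged["_connection_ip"] = connection_ip
    return merged


# Dummy key blobs for illustration only.
fs_keys = [
    "ssh-ed25519 " + base64.b64encode(b"filesystem-ed25519").decode(),
    "ssh-rsa " + base64.b64encode(b"filesystem-rsa").decode(),
]
handshake = "ssh-ed25519 " + base64.b64encode(b"handshake-ed25519").decode()
merged_keys = merge_host_keys(fs_keys, handshake, "10.0.1.50")
print(sorted(merged_keys))
```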
Key Rotation
If a host's SSH keys are legitimately rotated (e.g., after a reinstall), the collector will reject the connection in reject mode. To reset:
- Clear the ssh_host_key_fingerprints field on the os.linux CI (or delete and recreate the CI)
- The next collection cycle will treat the host as first-contact and accept the new key
Alternatively, temporarily set host_key_verification to warn to allow the new key through while logging the change.
Software Promotion Engine
With hundreds of packages installed on a typical Linux host, promoting all of them to CIs would create noise. The promotion engine identifies the packages that actually matter — the ones running services, listening on the network, or matching known infrastructure patterns.
Promotion Signals
| Signal | What It Detects | Example |
|---|---|---|
| Listening port | Package owns a process with a network socket | nginx listening on port 80 |
| Active daemon | Package has a running systemd service | postgresql with active postgresql.service |
| Known pattern | Package name matches infrastructure patterns | redis-server matches the redis pattern |
| Force promote | Explicitly configured in the ruleset | Custom internal monitoring agent |
| Unpackaged listener | Process with a port but no matching package | Unauthorized daemon (security visibility) |
The engine uses fuzzy name matching to connect packages to their processes. For example, postgresql-15 is matched to listener process postgres by stripping version suffixes and checking known aliases.
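One way such matching might work is sketched below. The alias table and the version-suffix pattern are illustrative assumptions; the engine's actual rules and alias list are not shown in this document.

```python
import re

# Hypothetical alias table (package base name -> process names).
KNOWN_ALIASES = {
    "postgresql": {"postgres"},
    "openssh-server": {"sshd"},
}

# Trailing version such as "-15", "_2", "3" or "3.11".
_VERSION_SUFFIX = re.compile(r"[-_]?\d+(\.\d+)*$")


def candidate_process_names(package_name):
    """Names a process owned by this package might plausibly have."""
    base = _VERSION_SUFFIX.sub("", package_name)
    names = {package_name, base}
    names |= KNOWN_ALIASES.get(package_name, set())
    names |= KNOWN_ALIASES.get(base, set())
    return names


def package_matches_process(package_name, process_name):
    return process_name in candidate_process_names(package_name)


print(package_matches_process("postgresql-15", "postgres"))
```

Keeping the unmodified package name in the candidate set avoids breaking packages whose process name legitimately ends in a digit, such as apache2.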
Known Infrastructure Patterns
The promotion engine recognizes over 100 infrastructure software patterns across these categories:
| Category | Examples |
|---|---|
| Web servers | nginx, apache2, httpd, caddy, traefik, envoy |
| Databases | postgresql, mysql, mariadb, mongodb, redis, memcached, etcd |
| Message queues | rabbitmq, kafka, nats-server, mosquitto |
| Container runtimes | docker, containerd, cri-o, podman |
| Monitoring | prometheus, grafana, node-exporter, telegraf, zabbix, datadog |
| DNS | bind9, unbound, coredns, dnsmasq, powerdns |
| Security | fail2ban, crowdsec, certbot |
| Storage | ceph, minio, glusterfs, nfs-kernel-server |
| CI/CD | jenkins, gitlab-runner, drone, argo |
| Virtualization | qemu, libvirt, proxmox |
Promotion Overrides
Each ruleset can configure:
- Force promote — Package names to always promote, even without behavioral signals. Use for custom or internal software
- Suppress — Package names to never promote, even if they match signals. Use to filter noisy or uninteresting software
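A simplified sketch of how signals and overrides could combine is shown below. The prefix-based pattern match, the force_promote reason label, and the rule that suppress wins over force promote are assumptions made for illustration; only the listening_port, active_daemon, and known_pattern labels appear in this document.

```python
# Small illustrative subset of known infrastructure patterns.
KNOWN_PATTERNS = ("nginx", "redis", "postgresql")


def promotion_reasons(package, has_listener, has_active_service,
                      force_promote=(), suppress=()):
    """Return the list of reasons to promote a package (empty means no CI)."""
    if package in suppress:
        return []  # suppressed packages are never promoted
    reasons = []
    if has_listener:
        reasons.append("listening_port")
    if has_active_service:
        reasons.append("active_daemon")
    if any(package.startswith(pattern) for pattern in KNOWN_PATTERNS):
        reasons.append("known_pattern")
    if package in force_promote:
        reasons.append("force_promote")  # hypothetical label
    return reasons


print(promotion_reasons("redis-server", has_listener=True, has_active_service=True))
```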
Real-World Use Cases
Security Incident Response: "Which hosts have vulnerable package X?"
When a CVE is announced for a critical package, you need to know your exposure immediately.
Scenario: A critical vulnerability is disclosed in OpenSSL 3.0.x (CVE-2024-XXXX). Your security team needs to identify all affected hosts within minutes.
With OS collection:
- Search for os.software CIs with software_name = openssl to instantly see every version deployed across your fleet
- Click into each software CI to see the per-host instance list, which shows the specific machines running the vulnerable version
- Check available_version on each instance to see if patches are already available in your package repositories
- Use the runs_on relationship from os.linux to trace back to the hosting VM and the physical infrastructure beneath it
Without OS collection: You'd need to SSH into each machine manually, or maintain a separate inventory tool, or hope your vulnerability scanner has recent data.
Patch Compliance: "How many hosts have pending security updates?"
Scenario: Your compliance policy requires security patches within 30 days. You need a dashboard-ready view of patch status.
With OS collection:
- Every os.linux CI tracks security_update_count, the number of pending security updates
- The security_updates field lists each pending update with the available version
- Filter the CI list for os.linux CIs where security_update_count > 0 to see non-compliant hosts
- Track changes over time: Parascope's change history records when security updates appear and when they get applied
Certificate Expiration Monitoring: "Which certificates expire soon?"
Scenario: An expired TLS certificate causes a production outage. You need visibility into certificate lifetimes across your fleet.
With OS collection:
- Every TLS certificate found on disk is tracked as an os.certificate CI with not_after (expiration date)
- Certificates are deduplicated: a wildcard cert used on 10 hosts appears as one CI with 10 instance records showing the file paths
- Sort certificates by expiration date to see what's expiring next
- The instance records show exactly which hosts and file paths would be affected
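If certificate CIs were exported as records, the expiration check could be sketched as follows. The record keys other than not_after, the 30-day window, and the sample data are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def expiring_soon(certificates, now, within_days=30):
    """Certificates expiring within the window (or already expired), soonest first."""
    cutoff = now + timedelta(days=within_days)
    soon = [c for c in certificates if c["not_after"] <= cutoff]
    return sorted(soon, key=lambda c: c["not_after"])


# Hypothetical exported certificate records.
certificates = [
    {"subject_cn": "*.example.com",
     "not_after": datetime(2025, 1, 15, tzinfo=timezone.utc),
     "instances": [{"host": "web-01", "path": "/etc/ssl/certs/wildcard.pem"}]},
    {"subject_cn": "api.example.com",
     "not_after": datetime(2025, 6, 1, tzinfo=timezone.utc),
     "instances": [{"host": "api-01", "path": "/etc/pki/tls/certs/api.pem"}]},
    {"subject_cn": "mail.example.com",
     "not_after": datetime(2025, 1, 5, tzinfo=timezone.utc),
     "instances": [{"host": "mail-01", "path": "/etc/ssl/certs/mail.pem"}]},
]

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
for cert in expiring_soon(certificates, now):
    hosts = [i["host"] for i in cert["instances"]]
    print(cert["subject_cn"], cert["not_after"].date(), hosts)
```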
Fleet Software Inventory: "What's running across our infrastructure?"
Scenario: During an architecture review, you need to understand the software landscape across 50+ Linux hosts.
With OS collection:
- Browse os.software CIs to see every promoted software component across the fleet
- Each software CI shows how many hosts run it (via instance count)
- Filter by promotion reason to focus on network services (listening_port), active daemons (active_daemon), or known infrastructure (known_pattern)
- Identify version sprawl: the same software at different versions across hosts
- Discover unexpected services: the unpackaged_listener signal catches processes listening on ports without corresponding packages
Infrastructure Drift Detection: "Is this VM configured as expected?"
Scenario: A VM was rebuilt from a template, but something isn't working. You need to compare its current state against known-good configuration.
With OS collection:
- Compare the current os.linux CI's configuration with its change history to see what changed
- Check kernel version, installed packages, network configuration, and service states
- The IP mismatch detection in OS enrichment flags cases where the OS-reported IP addresses differ from what the infrastructure platform expects
- Compare two hosts by viewing their os.linux CIs side by side: same distribution? Same kernel? Same key packages?
Container Visibility: "What containers run on this host?"
Scenario: A host managed by cephadm is having performance issues. You need to see what containers are running on it.
With OS collection:
- Each os.container CI shows the container's image, state, port bindings, and resource configuration
- The runs_on relationship connects containers to their host os.linux CI
- Ceph containers use stable naming (osd-2@hostname instead of ceph-28ea88f2-...-osd-2), so CI identity persists across cluster rebuilds
- Distinguish Docker vs Podman containers via the runtime field
Network Service Mapping: "What's listening on the network?"
Scenario: You're auditing network exposure across your fleet and need to know every listening port and what process owns it.
With OS collection:
- The listeners section captures every TCP and UDP listening socket with the owning process name
- Promoted software CIs include their listening ports in the instance data
- unpackaged_listener promotions flag processes with network sockets but no corresponding package, which are potential unauthorized services
- Combine with network data (interfaces, routes, DNS) to build a complete picture of each host's network posture
Physical Connectivity: "What switch port is this host connected to?"
Scenario: A host is experiencing network issues and you need to identify the physical switch and port it's connected to for troubleshooting.
With OS collection:
- The lldp section collects LLDP neighbor data from hosts running lldpd, showing the directly connected network switch and port
- Each LLDP neighbor record includes the remote system name, chassis ID, port ID, and management IP
- Cross-reference with SNMP LLDP data collected from switches to verify both sides of the physical link agree
- LLDP neighbor changes are tracked in change history — a neighbor change means a physical cabling change
- Field names are aligned with the SNMP LLDP collector format, enabling future cross-source correlation
Note: LLDP data requires lldpd to be installed on the target host. Hosts without lldpd will simply report an empty neighbors list.
Configuration
Collection Rulesets
Rulesets define what to collect from, how to reach it, and what to collect. You can create multiple rulesets for different parts of your infrastructure (e.g., production vs staging, different network zones).
Key ruleset settings:
| Setting | Description | Default |
|---|---|---|
| Name | Unique friendly name for the ruleset | Required |
| Platform | Target OS platform | linux |
| Enabled | Whether collection runs on schedule | true |
| Target filters | CI types and filters to select targets | Required |
| Credential | Name of stored SSH credential | Required |
| Collection interval | Seconds between scheduled collections | 86400 (24 hours) |
| SSH port | Port for SSH connections | 22 |
| Become method | Privilege escalation | sudo |
| Reachability | Transport strategy (direct or jump host) | direct |
| Section toggles | Enable/disable individual collection sections | All enabled |
| Promotion overrides | Force-promote or suppress specific packages | Empty |
Target Discovery
Targets are discovered dynamically by querying the Parascope API for CIs matching the ruleset's filter criteria. The collector extracts IP addresses from each CI's network data.
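To make the filter semantics concrete, here is a hedged sketch of how a targets block in the shape used by this section's JSON examples might be evaluated against a CI. The CI dictionary layout (a ci_type key and nested config), the AND-combination of filters, and support for only the eq operator are assumptions; the real API may behave differently.

```python
def get_field(ci, dotted_path):
    """Resolve a dotted path such as 'config.status' in a nested dict."""
    value = ci
    for part in dotted_path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches_targets(ci, targets):
    """True if the CI has an allowed type and satisfies every filter."""
    if ci.get("ci_type") not in targets["ci_types"]:
        return False
    for flt in targets.get("filters", []):
        if flt["op"] != "eq":
            # Only the operator shown in this document is sketched here.
            raise ValueError(f"unsupported operator: {flt['op']}")
        if get_field(ci, flt["field"]) != flt["value"]:
            return False
    return True


targets = {
    "mode": "whitelist",
    "ci_types": ["proxmox.vm"],
    "filters": [{"field": "config.status", "op": "eq", "value": "running"}],
}
running_vm = {"ci_type": "proxmox.vm", "config": {"status": "running"}}
stopped_vm = {"ci_type": "proxmox.vm", "config": {"status": "stopped"}}
print(matches_targets(running_vm, targets), matches_targets(stopped_vm, targets))
```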
Example: Collect from all running Proxmox VMs
{
"targets": {
"mode": "whitelist",
"ci_types": ["proxmox.vm"],
"filters": [
{"field": "config.status", "op": "eq", "value": "running"}
]
}
}
Example: Collect from OpenStack instances in a specific project
{
"targets": {
"mode": "whitelist",
"ci_types": ["openstack.instance"],
"filters": [
{"field": "config.status", "op": "eq", "value": "ACTIVE"},
{"field": "scope_label", "op": "eq", "value": "production-cloud"}
]
}
}
Credential Management
SSH credentials are stored securely in Parascope's credential store and referenced by name in rulesets.
Supported authentication methods:
- SSH private key (RSA, Ed25519) — recommended
- SSH password — for legacy systems
Resolution priority per target:
- Target-specific credential (if configured)
- Ruleset-level default credential
- Error (no credentials available)
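The resolution order above amounts to a simple fallback chain. The sketch below uses hypothetical names (resolve_credential, NoCredentialError) and plain strings in place of stored credential objects.

```python
class NoCredentialError(Exception):
    """Raised when neither a target override nor a ruleset default exists."""


def resolve_credential(target_id, target_overrides, ruleset_default):
    """Pick the credential name for one target, following the priority order."""
    if target_id in target_overrides:
        return target_overrides[target_id]   # 1. target-specific credential
    if ruleset_default is not None:
        return ruleset_default               # 2. ruleset-level default
    raise NoCredentialError(f"no credential available for {target_id}")  # 3. error


overrides = {"vm-legacy-01": "Legacy Password"}
print(resolve_credential("vm-legacy-01", overrides, "Fleet SSH Key"))
print(resolve_credential("vm-web-01", overrides, "Fleet SSH Key"))
```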
Reachability Strategies
| Strategy | Use Case | Configuration |
|---|---|---|
| Direct | Targets reachable from the collector's network | Default, no extra config needed |
| Jump host | Targets in isolated networks, behind a bastion | Configure gateway host and credentials |
Jump host example:
{
"reachability": {
"strategy": "jump_host",
"gateway": "10.20.0.1",
"gateway_port": 22,
"gateway_credential": "Bastion SSH Key"
}
}
On-Demand Collection
In addition to scheduled collection, you can trigger immediate collection for a single target from the CI detail page. This is useful for:
- Verifying a configuration change was applied
- Getting fresh data before an incident investigation
- Testing connectivity to a new target
OS Enrichment
When the OS collector gathers data from a host, it also enriches the parent infrastructure CI (the VM or bare metal node) with a summary of what's running inside. This enrichment appears in the parent CI's detail view.
Enrichment data includes:
| Field | Description |
|---|---|
| OS summary | Distribution and kernel version (e.g., "Ubuntu 22.04.3 LTS (kernel 5.15.0-91)") |
| CPU utilization | Current CPU usage percentage |
| Memory utilization | Current memory usage percentage |
| IP mismatch | Whether OS-reported IPs differ from what the infrastructure platform expects |
| Last collected | When the OS data was last gathered |
This gives you at-a-glance OS information directly on VM and node detail pages, without navigating to the os.linux CI.
Monitoring
The OS collector exposes Prometheus metrics for operational visibility:
| Metric | Description |
|---|---|
| os_targets_collected_total | Successful collections per ruleset |
| os_targets_failed_total | Failed collections per ruleset |
| os_targets_discovered | Targets discovered per ruleset |
| collection_run_duration_seconds | Collection timing (P50/P95/P99) |
| collector_source_health_state | Circuit breaker state (0=healthy, 1=degraded, 2=unhealthy, 3=circuit_open) |
Circuit Breaker
Each ruleset has an independent circuit breaker that prevents unhealthy rulesets from blocking healthy ones:
| State | Meaning |
|---|---|
| Healthy | Normal operation |
| Degraded | High latency detected (collections taking longer than 10s) |
| Unhealthy | Failures detected, still retrying |
| Circuit open | 3 consecutive failures — skips collection for 60s before retrying |
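The circuit-open behavior can be sketched as a small state holder. This is a minimal illustration of the thresholds in the table (3 consecutive failures, 60 seconds open), with the clock passed in explicitly; the class and method names are assumptions, and the degraded and unhealthy states are left out.

```python
class CircuitBreaker:
    """Open after N consecutive failures, then skip work for a cool-down period."""

    def __init__(self, failure_threshold=3, open_seconds=60.0):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.consecutive_failures = 0
        self.opened_at = None

    def allow(self, now):
        """Return True if a collection may run at time `now` (in seconds)."""
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.open_seconds:
            self.opened_at = None  # cool-down elapsed, allow a retry
            return True
        return False

    def record_success(self):
        self.consecutive_failures = 0
        self.opened_at = None

    def record_failure(self, now):
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.opened_at = now


breaker = CircuitBreaker()
for t in (0.0, 1.0, 2.0):
    breaker.record_failure(t)
print(breaker.allow(10.0), breaker.allow(62.0))
```

Because each ruleset would hold its own breaker instance, an open circuit on one ruleset does not delay collection for the others.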
Related Documentation
- Architecture — System design overview
- Correlation Engine — Cross-system relationship discovery
- CI Types — All CI types across sources