Parascope Docs

OS Collection

Agentless operating system discovery across your Linux fleet via SSH

Parascope's OS collector bridges the gap between what your infrastructure platforms know (VMs, containers, bare metal) and what's actually running inside those machines. It connects via SSH, runs a self-contained collection script, and reports back operating system configuration, software inventory, security posture, and runtime metrics — without installing agents on your targets.

Why OS Collection Matters

Infrastructure platforms like Proxmox, OpenStack, and Kubernetes track resources from the outside — CPU allocation, memory limits, network attachments. But they can't tell you:

  • What Linux distribution and kernel version is actually running
  • Which packages are installed and whether security updates are pending
  • What services are listening on the network
  • Whether SSH is configured securely
  • Which TLS certificates are deployed and when they expire
  • What containers are running inside the VM
  • What physical network neighbors are connected via LLDP

OS collection answers these questions across your entire fleet, creating a rich layer of CI data that connects to infrastructure CIs you already track.

Architecture

Key design principles:

  • Agentless: No software installed on targets. A self-contained bash script is streamed via SSH and executed in memory
  • Minimal footprint: Only standard Linux tools required (bash, awk, grep, sed, df, uname, ss, ip). Optional tools like lldpctl and openssl enable additional sections
  • Safe execution: Script writes nothing to disk on the target (except /tmp), and missing tools or permissions result in empty fields rather than failures
  • Credential security: SSH keys and passwords stored securely in Parascope, never in config files or logs

CI Types Created

The OS collector creates four distinct CI types, each serving a specific purpose in your infrastructure model.

os.linux — Operating System Instance

The primary CI representing a Linux installation on a host. Contains comprehensive OS-level data organized into change-tracked configuration and point-in-time metrics.

Configuration (tracked in change history):

Category | Data Collected
Identity | Distribution, version, kernel, hostname, architecture, timezone
Hardware | CPU model and core count, total memory, swap, virtualization type, block devices, network devices
Network | Interfaces with IP addresses, routes, DNS resolvers, listening ports with process names, LLDP neighbors
Software | Full package inventory (dpkg/rpm), systemd services with states
Security | Pending security updates, SSH daemon config, user accounts with sudo status, SELinux/AppArmor status, SSH host key fingerprints
Parent link | Reference to hosting CI (VM, container, or bare metal node)

Metrics (point-in-time, not tracked):

Metric | Description
CPU usage | Current CPU utilization percentage
Memory usage | Used memory in MB
Swap usage | Used swap in MB
Filesystems | Mount points with capacity and usage
Uptime | Seconds since last boot
Load average | 1, 5, and 15-minute load averages

Relationships: Each os.linux CI has a runs_on relationship to its parent infrastructure CI (e.g., proxmox.vm, openstack.instance).


os.software — Promoted Software

Not every installed package becomes a CI. The promotion engine identifies operationally significant software — services with listening ports, active daemons, and known infrastructure components — and promotes them to dedicated CIs.

Software CIs are deduplicated across the fleet: one CI per unique (software name, version, package type) combination, with per-host instance records tracking where it runs.

CI-level fields (shared identity):

Field | Example
Software name | nginx
Version | 1.24.0-2ubuntu1
Package type | dpkg, rpm, apk
Promotion reasons | listening_port, active_daemon, known_pattern

Per-host instance fields:

Field | Example
Service state | running, stopped, disabled
Listening ports | [{port: 80, protocol: tcp}, {port: 443, protocol: tcp}]
Systemd units | ["nginx.service"]
Memory RSS | 128.5 MB
Available update | 1.24.1-2ubuntu1 (if security update pending)

This two-tier model means you can answer both "What version of nginx exists in our fleet?" (CI level) and "Which specific hosts run nginx, and is the service healthy?" (instance level).
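
The two-tier model above can be sketched as a grouping step. This is a minimal illustration, not Parascope's implementation: the function name, the shape of the observation records, the hostnames, and the second nginx version are all assumptions made for the example. Only the deduplication key (software name, version, package type) comes from the text.

```python
from collections import defaultdict


def build_software_cis(observations):
    """Group per-host observations into one CI per (name, version, package type)."""
    instances_by_ci = defaultdict(list)
    for obs in observations:
        # Deduplication key as described in the text.
        key = (obs["software_name"], obs["version"], obs["package_type"])
        # Everything host-specific goes into the per-host instance record.
        instances_by_ci[key].append(
            {"host": obs["host"], "service_state": obs["service_state"]}
        )
    return dict(instances_by_ci)


# Illustrative data only (hypothetical hosts and versions).
observations = [
    {"host": "web-01", "software_name": "nginx", "version": "1.24.0-2ubuntu1",
     "package_type": "dpkg", "service_state": "running"},
    {"host": "web-02", "software_name": "nginx", "version": "1.24.0-2ubuntu1",
     "package_type": "dpkg", "service_state": "stopped"},
    {"host": "web-03", "software_name": "nginx", "version": "1.18.0-6ubuntu14",
     "package_type": "dpkg", "service_state": "running"},
]

software_cis = build_software_cis(observations)
for (name, version, pkg_type), instances in software_cis.items():
    print(name, version, pkg_type, [i["host"] for i in instances])
```

Three host observations collapse into two CIs here: the CI level answers the version question, and the instance list answers the per-host question.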


os.certificate — X.509 Certificates

TLS certificates discovered in standard paths (/etc/ssl, /etc/pki, etc.) are parsed and deduplicated by SHA-256 fingerprint. One CI per unique certificate, with per-host instance records tracking file locations.

CI-level fields:

Field | Description
Subject CN | Common name (e.g., *.example.com)
SANs | Subject alternative names
Issuer | Certificate authority
Validity period | Not before / not after dates
Fingerprint | SHA-256 fingerprint (deduplication key)
Key details | Key type (RSA, EC) and size (2048, 4096)

Per-host instance fields:

Field | Description
Host CI | Reference to the os.linux CI
File path | Where the certificate was found on disk
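
Fingerprint-based deduplication can be sketched as follows. A certificate's SHA-256 fingerprint is the SHA-256 digest of its DER encoding; everything else here (function names, record layout, the placeholder bytes standing in for real certificates, the hostnames and paths) is an assumption for illustration.

```python
import hashlib


def fingerprint(der_bytes):
    """SHA-256 digest over the certificate's DER encoding, as hex."""
    return hashlib.sha256(der_bytes).hexdigest()


def dedupe_certificates(found):
    """found: iterable of (host, file_path, der_bytes) tuples."""
    cis = {}
    for host, file_path, der_bytes in found:
        fp = fingerprint(der_bytes)
        # One CI per unique fingerprint; each sighting becomes an instance.
        ci = cis.setdefault(fp, {"fingerprint": fp, "instances": []})
        ci["instances"].append({"host": host, "file_path": file_path})
    return cis


# Placeholder bytes, not real certificates.
wildcard = b"placeholder-der-for-wildcard-cert"
internal = b"placeholder-der-for-internal-cert"

certificate_cis = dedupe_certificates([
    ("web-01", "/etc/ssl/certs/wildcard.pem", wildcard),
    ("web-02", "/etc/ssl/certs/wildcard.pem", wildcard),
    ("db-01", "/etc/pki/tls/certs/internal.pem", internal),
])
print(len(certificate_cis), "unique certificates")
```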

os.container — Container Instances

Docker and Podman containers discovered on hosts are tracked as individual CIs with a runs_on relationship to their host os.linux CI.

Key fields:

Field | Description
Container name | Raw name from the runtime
Stable name | Normalized name (Ceph FSID prefixes stripped)
Image and tag | Container image reference
State | running, exited, paused, etc.
Runtime | docker or podman
Port bindings | Host-to-container port mappings
Labels | Container labels/metadata
Network mode | bridge, host, none, etc.
Host CI | Reference to hosting os.linux CI

Ceph container stable naming: Cephadm containers include the cluster FSID in their name (e.g., ceph-28ea88f2-1234-5678-abcd-ef0123456789-osd-2). The collector strips this prefix to produce a stable name (osd-2@hostname) that persists across cluster rebuilds.
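
The prefix stripping described above could look roughly like this. The regular expression assumes the FSID is a standard lowercase UUID, and leaving non-Ceph names unchanged is also an assumption; the collector's actual rules may differ.

```python
import re

# "ceph-" + UUID-shaped FSID + "-" + daemon name (assumed pattern).
FSID_PREFIX = re.compile(
    r"^ceph-[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}-(?P<daemon>.+)$"
)


def stable_name(raw_name, hostname):
    """Return a name that does not depend on the cluster FSID."""
    match = FSID_PREFIX.match(raw_name)
    if match is None:
        # Assumption: containers without an FSID prefix keep their raw name.
        return raw_name
    return f"{match.group('daemon')}@{hostname}"


print(stable_name("ceph-28ea88f2-1234-5678-abcd-ef0123456789-osd-2", "node1"))
print(stable_name("nginx-proxy", "node1"))
```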


What Gets Collected

The collection script runs 14 independent sections, each gathering a specific category of data. Sections can be individually enabled or disabled per ruleset.

Section | What It Collects | Tools Used
os_identity | Distribution, version, kernel, hostname, architecture, timezone | /etc/os-release, uname
packages | Full package inventory with names, versions, architectures | dpkg-query or rpm -qa
services | Systemd unit states (running, enabled, disabled) | systemctl
network | Interfaces, IP addresses, routes, DNS resolvers | ip -j addr, ip -j route, /etc/resolv.conf
filesystems | Mount points with capacity and usage | df
resource_usage | CPU model/count, memory, swap, uptime, load average | /proc/cpuinfo, /proc/meminfo
listeners | Listening TCP/UDP ports with owning process names | ss -tlnp, ss -ulnp
certificates | X.509 certs in standard paths | openssl x509
patch_status | Pending security and regular updates | apt list --upgradable or dnf updateinfo
security_baseline | SSH config, user accounts, sudo access, SELinux/AppArmor | /etc/ssh/sshd_config, /etc/passwd
hardware | Virtualization type, DMI info, CPU topology, block/network devices | /sys/class/, systemd-detect-virt
containers | Docker/Podman containers with config and state | docker ps/podman ps with JSON output
lldp | LLDP neighbors with remote system, port, and chassis IDs | lldpctl -f json0
ssh_host_keys | SSH host key fingerprints from /etc/ssh/ssh_host_*_key.pub | python3 (SHA256 hash)

How Collection Works

  1. Target discovery — The collector queries the Parascope API for CIs matching the ruleset filters (e.g., all running Proxmox VMs tagged "linux"). It extracts IP addresses from each CI's network data
  2. Credential resolution — SSH credentials are resolved from Parascope's credential store, with per-target overrides taking priority over ruleset-level defaults
  3. SSH connection — The collector connects to each target via direct SSH or through a jump host, with configurable timeouts and host key verification (see below)
  4. Script execution — The collection script is streamed to the target via stdin (bash -s) and executed in memory, so the script itself is never written to the target's disk
  5. Parsing — The script's JSON output is parsed into Parascope's structured format, separating config (change-tracked) from metrics (point-in-time)
  6. Software promotion — The promotion engine evaluates packages against behavioral signals and known patterns
  7. Publishing — Results are published for processing as separate messages per CI type (os.linux, os.software, os.certificate, os.container)
  8. Processing — CIs are created or updated, relationships are established, and enrichment data is merged into parent CIs

Collection runs with bounded concurrency (default: 10 parallel SSH connections) and per-target error isolation — a failure on one host doesn't affect others.
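
Bounded concurrency with per-target error isolation can be sketched with an asyncio semaphore. This is only an illustration of the pattern: collect_one is a hypothetical stand-in for the real SSH collection, and the ".bad" suffix is a made-up way to simulate an unreachable host.

```python
import asyncio


async def collect_one(target):
    """Hypothetical stand-in for connecting to one target and running the script."""
    await asyncio.sleep(0)
    if target.endswith(".bad"):
        raise ConnectionError(f"cannot reach {target}")
    return {"target": target, "status": "ok"}


async def collect_all(targets, max_parallel=10):
    # At most max_parallel collections are in flight at any time.
    semaphore = asyncio.Semaphore(max_parallel)

    async def guarded(target):
        async with semaphore:
            try:
                return await collect_one(target)
            except Exception as exc:
                # Error isolation: a failure becomes a result for this target
                # only and never cancels the other collections.
                return {"target": target, "status": "failed", "error": str(exc)}

    return await asyncio.gather(*(guarded(t) for t in targets))


results = asyncio.run(collect_all(["10.0.1.50", "10.0.1.51.bad", "10.0.1.52"]))
for result in results:
    print(result)
```

Catching the exception inside the guarded wrapper, rather than around gather, is what keeps one bad host from aborting the whole run.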

SSH Host Key Verification

The OS collector uses Trust-On-First-Use (TOFU) host key pinning to protect SSH connections from man-in-the-middle attacks. This ensures that SSH credentials are only sent to verified hosts.

How It Works

  1. First contact — When connecting to a host for the first time (no stored keys), the collector accepts the host key, captures it, and stores the fingerprint and full public key on the os.linux CI
  2. Subsequent connections — On every following collection cycle, the stored public key is used to verify the host key during the SSH handshake, before any credentials are sent. If the key doesn't match, the connection is rejected and no credentials are transmitted
  3. Cross-validation — The collection script also reads host keys from /etc/ssh/ssh_host_*_key.pub on the target. The collector compares the handshake key against the filesystem key. A mismatch is a strong indicator of a MITM attack (the handshake key differs from what the host itself reports)

Verification Modes

The verification mode is configurable per source via the host_key_verification setting:

Mode | Behavior | Credentials sent on mismatch?
reject (default) | Verify host key during handshake before authentication. Reject on mismatch | No (connection aborted pre-auth)
warn | Connect normally, compare keys post-connect, log warning on mismatch | Yes (connection proceeds)
disabled | Capture key for storage only, no verification | Yes (no comparison performed)
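
The decision logic behind the three modes can be summarized in a small function. This is a simplified sketch: the function name and the returned labels are hypothetical, and it ignores when the comparison happens (during the handshake for reject, after connecting for warn).

```python
def check_host_key(mode, stored_key, presented_key):
    """Return (proceed, outcome) for one connection attempt."""
    if mode == "disabled":
        # Key is captured for storage, never compared.
        return True, "captured"
    if stored_key is None:
        # Trust on first use: nothing to compare against yet.
        return True, "first_contact"
    if stored_key == presented_key:
        return True, "verified"
    if mode == "warn":
        # Mismatch is logged, but the connection (and credentials) proceed.
        return True, "mismatch_warning"
    if mode == "reject":
        # Mismatch aborts before any credentials are sent.
        return False, "mismatch_rejected"
    raise ValueError(f"unknown host_key_verification mode: {mode}")


print(check_host_key("reject", None, "ssh-ed25519 AAAA-new"))
print(check_host_key("reject", "ssh-ed25519 AAAA-old", "ssh-ed25519 AAAA-new"))
print(check_host_key("warn", "ssh-ed25519 AAAA-old", "ssh-ed25519 AAAA-new"))
```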

Stored Data

Host key fingerprints are stored on each os.linux CI in the ssh_host_key_fingerprints field:

{
  "ssh-ed25519": {
    "fingerprint": "SHA256:rNo3mjXJZLLC6R0SNbhvBjTMWCuJhK3cFlZps8HI2rI",
    "public_key": "ssh-ed25519 AAAA..."
  },
  "ssh-rsa": {
    "fingerprint": "SHA256:WRRx5bPX71T3pa7r0JIaJmgLl90JoVMOj9w9d75yseI",
    "public_key": "ssh-rsa AAAA..."
  },
  "_connection_ip": "10.0.1.50"
}

Keys from both the SSH handshake and the target's filesystem are merged. The handshake key takes precedence for its algorithm. The _connection_ip records which IP address was used for the SSH connection.
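
As a sketch of how such a record might be assembled: an OpenSSH-style SHA256 fingerprint is the unpadded base64 of the SHA-256 digest of the decoded key blob. The merge function, its name, and the placeholder keys below are assumptions for illustration; only the precedence rule and the field names come from the text.

```python
import base64
import hashlib


def ssh_fingerprint(public_key_line):
    """Return (key_type, 'SHA256:...') for a '<type> <base64-blob>' line."""
    key_type, blob_b64 = public_key_line.split()[:2]
    digest = hashlib.sha256(base64.b64decode(blob_b64)).digest()
    return key_type, "SHA256:" + base64.b64encode(digest).decode().rstrip("=")


def merge_host_keys(filesystem_keys, handshake_key, connection_ip):
    merged = {}
    # The handshake key is processed last so it wins for its algorithm.
    for line in list(filesystem_keys) + [handshake_key]:
        key_type, fp = ssh_fingerprint(line)
        merged[key_type] = {"fingerprint": fp, "public_key": line}
    merged["_connection_ip"] = connection_ip
    return merged


def fake_key(key_type, seed):
    """Build a well-formed key line from placeholder bytes (not a real key)."""
    return f"{key_type} {base64.b64encode(seed).decode()}"


filesystem = [fake_key("ssh-ed25519", b"fs-ed25519"), fake_key("ssh-rsa", b"fs-rsa")]
handshake = fake_key("ssh-ed25519", b"handshake-ed25519")
stored = merge_host_keys(filesystem, handshake, "10.0.1.50")
print(stored["ssh-ed25519"]["fingerprint"])
```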

Key Rotation

If a host's SSH keys are legitimately rotated (e.g., after a reinstall), the collector will reject the connection in reject mode. To reset:

  1. Clear the ssh_host_key_fingerprints field on the os.linux CI (or delete and recreate the CI)
  2. The next collection cycle will treat the host as first-contact and accept the new key

Alternatively, temporarily set host_key_verification to warn to allow the new key through while logging the change.


Software Promotion Engine

With hundreds of packages installed on a typical Linux host, promoting all of them to CIs would create noise. The promotion engine identifies the packages that actually matter — the ones running services, listening on the network, or matching known infrastructure patterns.

Promotion Signals

Signal | What It Detects | Example
Listening port | Package owns a process with a network socket | nginx listening on port 80
Active daemon | Package has a running systemd service | postgresql with active postgresql.service
Known pattern | Package name matches infrastructure patterns | redis-server matches the redis pattern
Force promote | Explicitly configured in the ruleset | Custom internal monitoring agent
Unpackaged listener | Process with a port but no matching package | Unauthorized daemon (security visibility)

The engine uses fuzzy name matching to connect packages to their processes. For example, postgresql-15 is matched to listener process postgres by stripping version suffixes and checking known aliases.
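
A rough sketch of this kind of matching is shown below. The alias table, the suffix-stripping regular expression, and the function names are all assumptions; the engine's real rules are not documented here and are likely more involved.

```python
import re

# Hypothetical alias table: package base name -> process names it may run as.
ALIASES = {"postgresql": {"postgres"}}


def base_name(package):
    """Strip a trailing version suffix such as '-15' or '3.11'."""
    return re.sub(r"[-_]?\d+(?:\.\d+)*$", "", package)


def matches_process(package, process):
    base = base_name(package)
    # Try the exact package name, the stripped name, then known aliases.
    return process in {package, base} or process in ALIASES.get(base, set())


print(matches_process("postgresql-15", "postgres"))
print(matches_process("nginx", "nginx"))
print(matches_process("nginx", "postgres"))
```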

Known Infrastructure Patterns

The promotion engine recognizes over 100 infrastructure software patterns across these categories:

Category | Examples
Web servers | nginx, apache2, httpd, caddy, traefik, envoy
Databases | postgresql, mysql, mariadb, mongodb, redis, memcached, etcd
Message queues | rabbitmq, kafka, nats-server, mosquitto
Container runtimes | docker, containerd, cri-o, podman
Monitoring | prometheus, grafana, node-exporter, telegraf, zabbix, datadog
DNS | bind9, unbound, coredns, dnsmasq, powerdns
Security | fail2ban, crowdsec, certbot
Storage | ceph, minio, glusterfs, nfs-kernel-server
CI/CD | jenkins, gitlab-runner, drone, argo
Virtualization | qemu, libvirt, proxmox

Promotion Overrides

Each ruleset can configure:

  • Force promote — Package names to always promote, even without behavioral signals. Use for custom or internal software
  • Suppress — Package names to never promote, even if they match signals. Use to filter noisy or uninteresting software
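
How signals and overrides might combine is sketched below. Several points are assumptions: the function and parameter names, prefix matching for known patterns, the "force_promote" reason label, and suppress winning when a package appears in both override lists. The other reason names come from the promotion reasons listed earlier.

```python
def promotion_reasons(package, *, has_listener, has_active_unit, known_patterns,
                      force_promote=frozenset(), suppress=frozenset()):
    """Return the list of reasons to promote a package; empty means no CI."""
    if package in suppress:
        # Suppress overrides every signal (assumed to also beat force promote).
        return []
    reasons = []
    if has_listener:
        reasons.append("listening_port")
    if has_active_unit:
        reasons.append("active_daemon")
    # Assumption: a pattern matches when the package name starts with it.
    if any(package.startswith(pattern) for pattern in known_patterns):
        reasons.append("known_pattern")
    if package in force_promote:
        reasons.append("force_promote")  # hypothetical reason label
    return reasons


patterns = {"redis", "nginx"}
print(promotion_reasons("redis-server", has_listener=True,
                        has_active_unit=True, known_patterns=patterns))
print(promotion_reasons("acme-agent", has_listener=False, has_active_unit=False,
                        known_patterns=patterns, force_promote={"acme-agent"}))
print(promotion_reasons("nginx", has_listener=True, has_active_unit=True,
                        known_patterns=patterns, suppress={"nginx"}))
```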

Real-World Use Cases

Security Incident Response: "Which hosts have vulnerable package X?"

When a CVE is announced for a critical package, you need to know your exposure immediately.

Scenario: A critical vulnerability is disclosed in OpenSSL 3.0.x (CVE-2024-XXXX). Your security team needs to identify all affected hosts within minutes.

With OS collection:

  1. Search for os.software CIs with software_name = openssl — instantly see every version deployed across your fleet
  2. Click into each software CI to see the per-host instance list — which specific machines run the vulnerable version
  3. Check available_version on each instance to see if patches are already available in your package repositories
  4. Use the runs_on relationship from os.linux to trace back to the hosting VM and the physical infrastructure beneath it

Without OS collection: You'd need to SSH into each machine manually, or maintain a separate inventory tool, or hope your vulnerability scanner has recent data.


Patch Compliance: "How many hosts have pending security updates?"

Scenario: Your compliance policy requires security patches within 30 days. You need a dashboard-ready view of patch status.

With OS collection:

  • Every os.linux CI tracks security_update_count — the number of pending security updates
  • The security_updates field lists each pending update with the available version
  • Filter the CI list for os.linux CIs where security_update_count > 0 to see non-compliant hosts
  • Track changes over time — Parascope's change history records when security updates appear and when they get applied

Certificate Expiration Monitoring: "Which certificates expire soon?"

Scenario: An expired TLS certificate causes a production outage. You need visibility into certificate lifetimes across your fleet.

With OS collection:

  • Every TLS certificate found on disk is tracked as an os.certificate CI with not_after (expiration date)
  • Certificates are deduplicated — a wildcard cert used on 10 hosts appears as one CI with 10 instance records showing the file paths
  • Sort certificates by expiration date to see what's expiring next
  • The instance records show exactly which hosts and file paths would be affected
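
The "expiring soon" view amounts to a filter and sort on not_after. The sketch below assumes certificate records are dicts with a timezone-aware not_after value; the record layout, names, and dates are illustrative only.

```python
from datetime import datetime, timedelta, timezone


def expiring_within(certs, days, now):
    """Certificates whose not_after falls within the window, soonest first."""
    cutoff = now + timedelta(days=days)
    soon = [c for c in certs if c["not_after"] <= cutoff]
    return sorted(soon, key=lambda c: c["not_after"])


# Fixed reference time so the example is reproducible.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
certs = [
    {"subject_cn": "*.example.com", "not_after": datetime(2024, 6, 20, tzinfo=timezone.utc)},
    {"subject_cn": "db.internal", "not_after": datetime(2025, 1, 15, tzinfo=timezone.utc)},
    {"subject_cn": "mail.example.com", "not_after": datetime(2024, 6, 5, tzinfo=timezone.utc)},
]

for cert in expiring_within(certs, days=30, now=now):
    print(cert["subject_cn"], (cert["not_after"] - now).days, "days left")
```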

Fleet Software Inventory: "What's running across our infrastructure?"

Scenario: During an architecture review, you need to understand the software landscape across 50+ Linux hosts.

With OS collection:

  • Browse os.software CIs to see every promoted software component across the fleet
  • Each software CI shows how many hosts run it (via instance count)
  • Filter by promotion reason to focus on network services (listening_port), active daemons (active_daemon), or known infrastructure (known_pattern)
  • Identify version sprawl — the same software at different versions across hosts
  • Discover unexpected services — the unpackaged_listener signal catches processes listening on ports without corresponding packages

Infrastructure Drift Detection: "Is this VM configured as expected?"

Scenario: A VM was rebuilt from a template, but something isn't working. You need to compare its current state against known-good configuration.

With OS collection:

  • Compare the current os.linux CI's configuration with its change history to see what changed
  • Check kernel version, installed packages, network configuration, and service states
  • The IP mismatch detection in OS enrichment flags cases where the OS-reported IP addresses differ from what the infrastructure platform expects
  • Compare two hosts by viewing their os.linux CIs side by side — same distribution? Same kernel? Same key packages?

Container Visibility: "What containers run on this host?"

Scenario: A host managed by cephadm is having performance issues. You need to see what containers are running on it.

With OS collection:

  • Each os.container CI shows the container's image, state, port bindings, and resource configuration
  • The runs_on relationship connects containers to their host os.linux CI
  • Ceph containers use stable naming (osd-2@hostname instead of ceph-28ea88f2-...-osd-2), so CI identity persists across cluster rebuilds
  • Distinguish Docker vs Podman containers via the runtime field

Network Service Mapping: "What's listening on the network?"

Scenario: You're auditing network exposure across your fleet and need to know every listening port and what process owns it.

With OS collection:

  • The listeners section captures every TCP and UDP listening socket with the owning process name
  • Promoted software CIs include their listening ports in the instance data
  • unpackaged_listener promotions flag processes with network sockets but no corresponding package — potential unauthorized services
  • Combine with network data (interfaces, routes, DNS) to build a complete picture of each host's network posture

Physical Connectivity: "What switch port is this host connected to?"

Scenario: A host is experiencing network issues and you need to identify the physical switch and port it's connected to for troubleshooting.

With OS collection:

  • The lldp section collects LLDP neighbor data from hosts running lldpd, showing the directly connected network switch and port
  • Each LLDP neighbor record includes the remote system name, chassis ID, port ID, and management IP
  • Cross-reference with SNMP LLDP data collected from switches to verify both sides of the physical link agree
  • LLDP neighbor changes are tracked in change history — a neighbor change means a physical cabling change
  • Field names are aligned with the SNMP LLDP collector format, enabling future cross-source correlation

Note: LLDP data requires lldpd to be installed on the target host. Hosts without lldpd will simply report an empty neighbors list.


Configuration

Collection Rulesets

Rulesets define what to collect from, how to reach it, and what to collect. You can create multiple rulesets for different parts of your infrastructure (e.g., production vs staging, different network zones).

Key ruleset settings:

Setting | Description | Default
Name | Unique friendly name for the ruleset | Required
Platform | Target OS platform | linux
Enabled | Whether collection runs on schedule | true
Target filters | CI types and filters to select targets | Required
Credential | Name of stored SSH credential | Required
Collection interval | Seconds between scheduled collections | 86400 (24 hours)
SSH port | Port for SSH connections | 22
Become method | Privilege escalation | sudo
Reachability | Transport strategy (direct or jump host) | direct
Section toggles | Enable/disable individual collection sections | All enabled
Promotion overrides | Force-promote or suppress specific packages | Empty

Target Discovery

Targets are discovered dynamically by querying the Parascope API for CIs matching the ruleset's filter criteria. The collector extracts IP addresses from each CI's network data.

Example: Collect from all running Proxmox VMs

{
  "targets": {
    "mode": "whitelist",
    "ci_types": ["proxmox.vm"],
    "filters": [
      {"field": "config.status", "op": "eq", "value": "running"}
    ]
  }
}

Example: Collect from OpenStack instances in a specific project

{
  "targets": {
    "mode": "whitelist",
    "ci_types": ["openstack.instance"],
    "filters": [
      {"field": "config.status", "op": "eq", "value": "ACTIVE"},
      {"field": "scope_label", "op": "eq", "value": "production-cloud"}
    ]
  }
}
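
To make the filter semantics concrete, here is a sketch of how a targets block like the ones above could be evaluated against CI records. The CI record shape, the function names, and the assumption that all filters must match (logical AND) are illustrative; only the "eq" operator appears in the examples, so the sketch handles nothing else.

```python
def get_field(ci, dotted_path):
    """Follow a dotted path such as 'config.status' through nested dicts."""
    value = ci
    for part in dotted_path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def matches(ci, targets):
    if ci.get("ci_type") not in targets["ci_types"]:
        return False
    for flt in targets["filters"]:
        if flt["op"] != "eq":
            raise ValueError(f"operator not covered by this sketch: {flt['op']}")
        if get_field(ci, flt["field"]) != flt["value"]:
            return False
    return True


targets = {
    "mode": "whitelist",
    "ci_types": ["proxmox.vm"],
    "filters": [{"field": "config.status", "op": "eq", "value": "running"}],
}

# Hypothetical CI records.
cis = [
    {"name": "vm-101", "ci_type": "proxmox.vm", "config": {"status": "running"}},
    {"name": "vm-102", "ci_type": "proxmox.vm", "config": {"status": "stopped"}},
    {"name": "inst-1", "ci_type": "openstack.instance", "config": {"status": "running"}},
]

selected = [ci["name"] for ci in cis if matches(ci, targets)]
print(selected)
```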

Credential Management

SSH credentials are stored securely in Parascope's credential store and referenced by name in rulesets.

Supported authentication methods:

  • SSH private key (RSA, Ed25519) — recommended
  • SSH password — for legacy systems

Resolution priority per target:

  1. Target-specific credential (if configured)
  2. Ruleset-level default credential
  3. Error (no credentials available)
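
The resolution order above reduces to a short lookup. In this sketch the function name, the override mapping, the target IDs, and the exception type are assumptions; credentials are referenced by name, as in rulesets.

```python
def resolve_credential(target_id, target_overrides, ruleset_default):
    """Pick the credential name for one target, or fail if none is configured."""
    # 1. Target-specific credential takes priority.
    credential = target_overrides.get(target_id)
    if credential is None:
        # 2. Fall back to the ruleset-level default.
        credential = ruleset_default
    if credential is None:
        # 3. No credential at either level.
        raise LookupError(f"no credentials available for {target_id}")
    return credential


overrides = {"vm-101": "Legacy Root Password"}  # hypothetical names
print(resolve_credential("vm-101", overrides, "Fleet SSH Key"))
print(resolve_credential("vm-102", overrides, "Fleet SSH Key"))
```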

Reachability Strategies

Strategy | Use Case | Configuration
Direct | Targets reachable from the collector's network | Default, no extra config needed
Jump host | Targets in isolated networks, behind a bastion | Configure gateway host and credentials

Jump host example:

{
  "reachability": {
    "strategy": "jump_host",
    "gateway": "10.20.0.1",
    "gateway_port": 22,
    "gateway_credential": "Bastion SSH Key"
  }
}

On-Demand Collection

In addition to scheduled collection, you can trigger immediate collection for a single target from the CI detail page. This is useful for:

  • Verifying a configuration change was applied
  • Getting fresh data before an incident investigation
  • Testing connectivity to a new target

OS Enrichment

When the OS collector gathers data from a host, it also enriches the parent infrastructure CI (the VM or bare metal node) with a summary of what's running inside. This enrichment appears in the parent CI's detail view.

Enrichment data includes:

Field | Description
OS summary | Distribution and kernel version (e.g., "Ubuntu 22.04.3 LTS (kernel 5.15.0-91)")
CPU utilization | Current CPU usage percentage
Memory utilization | Current memory usage percentage
IP mismatch | Whether OS-reported IPs differ from what the infrastructure platform expects
Last collected | When the OS data was last gathered

This gives you at-a-glance OS information directly on VM and node detail pages, without navigating to the os.linux CI.


Monitoring

The OS collector exposes Prometheus metrics for operational visibility:

Metric | Description
os_targets_collected_total | Successful collections per ruleset
os_targets_failed_total | Failed collections per ruleset
os_targets_discovered | Targets discovered per ruleset
collection_run_duration_seconds | Collection timing (P50/P95/P99)
collector_source_health_state | Circuit breaker state (0=healthy, 1=degraded, 2=unhealthy, 3=circuit_open)

Circuit Breaker

Each ruleset has an independent circuit breaker that prevents unhealthy rulesets from blocking healthy ones:

State | Meaning
Healthy | Normal operation
Degraded | High latency detected (collections taking longer than 10s)
Unhealthy | Failures detected, still retrying
Circuit open | 3 consecutive failures; skips collection for 60s before retrying
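
The state table can be read as a small state machine. The sketch below uses the thresholds from the table (3 consecutive failures, 60s open period, 10s latency), but the class, its method names, and the exact transition rules are assumptions rather than the collector's actual code.

```python
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=3, open_seconds=60.0,
                 slow_seconds=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.slow_seconds = slow_seconds
        self.clock = clock  # injectable so the behavior can be tested
        self.consecutive_failures = 0
        self.opened_at = None
        self.last_duration = 0.0

    def record(self, success, duration):
        """Record the outcome and duration (seconds) of one collection run."""
        self.last_duration = duration
        if success:
            self.consecutive_failures = 0
            self.opened_at = None
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.opened_at = self.clock()  # (re)open the circuit

    def state(self):
        if (self.opened_at is not None
                and self.clock() - self.opened_at < self.open_seconds):
            return "circuit_open"
        if self.consecutive_failures > 0:
            return "unhealthy"
        if self.last_duration > self.slow_seconds:
            return "degraded"
        return "healthy"

    def allow_collection(self):
        """Collections are skipped only while the circuit is open."""
        return self.state() != "circuit_open"
```

Because each ruleset would hold its own breaker instance, a ruleset stuck in circuit_open only delays its own targets.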