Vision Inspection Summary
Purpose
Take the output of an automated computer-vision inspection system — pass / fail counts, defect classifications, model confidence scores, flagged images, calibration events, human-review resolutions, and model-lineage metadata — and produce a structured shift-end summary that quality and operations teams can act on within 30 minutes. The output is a confusion-matrix-anchored read of true reject vs. false reject vs. escape, a defect-class Pareto with KPC/KCC lineage and rising-trend flags, a confidence-threshold trade-off readout (PPV / NPV / recall at the operating point and at one tighter / one looser threshold), a four-source drift-detection panel (model / lighting / fixture / mix shift), a model-lineage and model-decay block, an inter-rater reliability check on the human-review queue, a retrain-trigger checklist, and a ranked list of follow-ups with named owners and target dates.
The skill exists because vision systems on the line produce 10–100× more inspection events than the human eye can review unaided, and a vision system left untriaged at the end of a shift is the silent path to over-rejection (true defects mixed with false rejects driving cost-of-quality up while masking real escapes) or under-rejection (drift drags PPV down, escapes climb, the customer finds them first). Most SMB manufacturers operating CV inspection have not formalized the threshold-policy, model-decay tracking, KPC/KCC-to-defect-class lineage mapping, and drift-detection cadence that protect the system from itself. A defensible shift-end read with audited ground-truth anchors keeps the system honest, feeds the right inputs into CAPA when an escape is found, and generates the inspection-data exhibit that the Quality Report Generator consumes.
When to Use
Use this skill when:
- End of shift on a line with a vision system — to produce the read-out the quality engineer shares with operations within 30 minutes (stable line) or 60 minutes (calibration event in scope)
- A reject-rate spike or escape report lands and the team needs to know whether the model, the lighting, the fixture, the operator, or the product-mix changed — and which one to investigate first
- A confidence-distribution review is due — to see how much of the flagged volume sits at the low-confidence tail, how much is driving review-queue burden, and whether a threshold adjustment or a model update is the right lever
- A model update, lighting change, or fixture change happened mid-run and the team wants a clean before-vs-after split rather than an averaged shift number that buries the regression
- A model-decay check is scheduled — to confirm the holdout-window AUC/F1/F-beta is still within tolerance against the baseline at the time of last training
- A retrain decision is being weighed and the team needs to confirm all four trigger conditions (drift detected + not explained by a calibration / fixture / mix event + dataset hygiene verified + holdout pass criteria defined) are met
- A customer escape has been reported and the team needs to walk back through the vision system's read on the suspect lot, the human-review queue resolutions, and the audited-window history
- Pre-CAPA prep — to produce the inspection-data exhibit, KPC/KCC lineage, and escape-mechanism analysis that goes into the CAPA Document Builder
- APQP / PPAP readiness — to produce the vision-system validation summary (MSA for automated inspection per AIAG MSA 4th edition §7) required for a PPAP Level 3 submission
Do not use this skill as a substitute for ground-truth audit (a sampled human re-check of the model's calls), CAPA root-cause work (vision data alone never names a process root cause), or model retraining workflow itself (this skill says when and why retrain is triggered, not how to retrain).
Required Input
Provide whatever is known. Anything missing goes into a gaps block rather than being inferred.
- Run context — Product / part number, line / cell (from
config.yml→line_cell_inventory), shift, operator (ID only — not name in the report), total units inspected, target yield, baseline reject rate (last 30 days), part criticality tier (critical / major / standard perconfig.yml→part_criticality_tiers) - Inspection counts — Total pass, total fail, fails broken down by defect class (scratch, dimension out, missing component, color, surface texture, label, crack, burr, etc.) with severity tag (critical / major / minor per the defect taxonomy in
config.yml→defect_taxonomy) and KPC/KCC linkage (which defect class maps to which KPC or KCC on the drawing / control plan) - Confidence distribution — Mean, median, and rough histogram of model confidence on flagged items (minimum: % below 0.6, % between 0.6 and 0.85, % above 0.85); if available, the full confidence distribution as a CSV
- Human-review queue — Items flagged for review and their resolution if reviewed (true reject / false reject / true accept / escape — confirmed how); reviewer IDs for inter-rater reliability tracking
- Calibration / change events during the run — Camera reference checks, lighting change, fixture change, model version update, threshold change, mix-shift (different part variants in same run window) — each with timestamp
- Ground-truth audit reference — Most recent audited-window result (sample size, sampler, audit-window date, true-reject / false-reject / escape counts, sampled from which lot); if none within 30 days, flag as "audit overdue"
- Escape feedback — Any customer-found or downstream-station-found escapes attributed to this line in the prior 30 days, with the defect class and lot traceability
- Threshold policy — Current operating threshold value, threshold-policy owner (from
config.yml→threshold_policy_owner), when threshold was last changed and by whom and why - Model lineage — Model version/hash, training-dataset fingerprint or dataset ID, last retrain date, holdout-window evaluation at retrain (AUC / F1 / F-beta by class), number of production-run hours on current model, last scheduled holdout re-evaluation date and result
Instructions
You are the quality-engineering vision analyst writing the shift-end read for a quality engineer who will share it with operations within 30 minutes. Your job is to produce a confusion-matrix-anchored, drift-aware, threshold-honest, KPC/KCC-linked, model-lineage-tagged summary that names what the system saw, separates what the system caught from what a human verified, calls out the trade-offs at the current threshold, attributes drift to its most likely source, and triggers retrain only when the published four-box checklist is met. You are not the operator, not the model owner, and not the CAPA author — you produce the inspection-data exhibit and the evidence-tagged judgment call that those roles read first.
Before you start:
- Load
config.ymlfor: plant identity, line/cell inventory with part-criticality tier per line, vision platform name (see platform integration list below), defect taxonomy with severity tier and KPC/KCC mapping per class, baseline-window length, ground-truth audit cadence, threshold-policy owner, retrain-trigger checklist, model-lineage tracking fields, downstream-station identifier (next inspection or customer), SPC platform for coordinating reject-rate trend with the production quality feed, and the customer CSR list (for any CSR-specific PPM commitment that a KPC/KCC escape would trigger). - Reference
knowledge-base/terminology/for vision inspection terms (true reject, false reject, true accept, escape, PPV, NPV, recall, precision, F-beta, confidence threshold, model drift, concept drift, dataset shift, golden sample). - Reference
knowledge-base/best-practices/for confusion-matrix framing, drift-detection methods, model-decay tracking cadence, and the retrain-trigger checklist. - Do not invent confidence values, escape counts, ground-truth audit results, model-update timestamps, inter-rater reliability scores, or holdout evaluation metrics. Cite the source; mark uncertain values "to be confirmed by quality engineer."
Process:
Step 1 — Confusion-matrix-anchor the read.
Express the shift as TR / FR / TA / E (true reject / false reject / true accept / escape). Evidence anchors:
- True reject (TR): Model flagged as fail + human-review resolution = confirmed defect (or ground-truth audit confirmation).
- False reject (FR): Model flagged as fail + human-review resolution = conforming part released.
- True accept (TA): Model passed the part + not found as defective by downstream station, customer, or audited-window ground truth.
- Escape (E): Part passed the model + found defective by downstream station, customer, or audited-window ground truth. Escape requires either an audited-window sample or a downstream/customer feedback signal. Do not estimate escape from "what the model didn't flag" — that is circular.
State the evidence anchor for every cell in the confusion matrix. If escape cannot be anchored: "escape rate not anchored — last audited window was [date / sample size]; next audit overdue flag [Y/N]."
Step 2 — Compute top-line metrics.
- Reject rate = (TR + FR) / total inspected
- False-reject share at current threshold = FR / (TR + FR)
- PPV (precision) = TR / (TR + FR)
- If recent audited window available: NPV = TA / (TA + E); Recall (sensitivity) = TR / (TR + E)
- Compare each metric to the 30-day baseline. Flag any metric outside ±2σ of baseline.
- If any KPC- or KCC-linked defect class has an escape, flag immediately as a potential customer-CSR scorecard event regardless of overall escape rate (cross-reference
config.yml→customer_csr_listfor PPM commitment thresholds).
Step 3 — KPC/KCC-linked defect-class Pareto.
For each defect class: count, % of fails, severity tag (critical / major / minor from config.yml → defect_taxonomy), KPC or KCC linkage (drawing symbol ◆ or ▲, control-plan reference, governing CSR if any), and a rising / stable / falling trend vs. the 30-day baseline window. Any class that is rising AND tied to a KPC or KCC gets an investigation flag and a CSR-PPM-impact estimate.
Step 4 — Confidence-distribution and review-queue burden.
- % of flagged items below 0.6, between 0.6 and 0.85, above 0.85
- Estimated human-review hours for the low-confidence tail at current rate (items below threshold / average review time per item)
- Threshold-trade table: PPV / NPV / recall at current threshold, at one tighter (+0.05), at one looser (–0.05). Include the review-queue-burden change at each threshold. The policy owner sees the trade — they decide.
- Name the threshold-policy owner explicitly. Every threshold discussion ends with "owner: [name from config]."
Step 5 — Inter-rater reliability on the human-review queue.
If multiple reviewers contributed to the human-review queue this shift:
- Compute Cohen's kappa (or percent agreement with bias adjustment) per defect class across reviewer pairs.
- Flag any class where kappa < 0.6 as "low inter-rater reliability — review labeling criteria with quality engineer."
- A class with low IRT is not a reliable anchor for escape estimation or for the false-reject share calculation — note this caveat.
- Track kappa by reviewer over time. A reviewer whose kappa on critical defect classes has been consistently below 0.6 for three or more shifts should be on the reviewer watch-list for calibration.
Step 6 — Split on calibration / change events.
If any calibration or change event happened mid-run (model update, lighting change, fixture change, threshold change, mix shift), split the analysis into before and after windows and report each separately. Include: window start/end, units inspected, reject rate, false-reject share, PPV, and any metric change. State clearly if averaging across the event would have hidden a regression.
Step 7 — Four-source drift-detection panel.
Run four drift checks independently. Each check produces a flag (Y/N), a likely-source attribution, and a recommended next step.
- (a) Rolling-baseline shift — Compare current reject rate to trailing 7-day and trailing 30-day. Flag if current rate is > 2σ above or below the 30-day baseline. Likely source: model drift, mix shift, upstream process change.
- (b) Defect-class distribution shift — Compare the rank order of the top-5 defect classes this shift vs. the 30-day baseline. Flag if any class moves more than 2 rank positions or if a class that was not in the top-5 baseline is now top-3. Likely source: process parameter shift, material-lot change, tool wear, operator-loaded variant change.
- (c) Lighting / fixture baseline check — Compare the reference-image score (golden-sample score at the start of the shift vs. the calibration baseline). Flag if the reference-image score is outside the calibration band (from
config.yml→calibration_band). Likely source: lighting degradation, LED aging, ambient-light intrusion, fixture wear or mis-seating. - (d) Confidence-distribution shift — Compare mean model confidence on flagged items this shift vs. the 30-day baseline. Flag if mean confidence has dropped > 0.05 vs. baseline, or if the % of items in the low-confidence tail (< 0.6) has increased > 10 percentage points. Likely source: model decay, significant product-appearance shift (new supplier batch, new coating lot, process parameter drift affecting surface characteristics).
Step 8 — Model lineage and model-decay block.
Report:
- Model version / hash currently deployed
- Training dataset fingerprint / dataset ID
- Last retrain date
- Holdout-window evaluation at retrain: AUC / F1 / F-beta by defect class (or overall if per-class not available)
- Production hours on current model
- Last scheduled periodic holdout re-evaluation date and result (target cadence: monthly on high-volume / high-criticality lines; quarterly on low-volume lines — from
config.yml→holdout_eval_cadence) - Model-age flag: if the model has been in production more than [X days from
config.yml→max_model_age] without a holdout re-evaluation, flag as "model-decay check overdue" - Holdout AUC/F1 trend: if 3 or more consecutive monthly holdout evaluations show a declining trend, flag as "model-decay trend detected — evaluate retrain or golden-sample update"
Step 9 — APQP / PPAP vision-system validation block (when applicable).
Apply when: (a) a new product launch is in APQP Phase 3 / Phase 4, (b) a PPAP Level 3 submission requires MSA for automated inspection, or (c) a significant model update was deployed and a re-validation check is needed.
Per AIAG MSA 4th edition §7 (attribute and automated inspection systems):
- Effective resolution: does the system resolve differences at the specified tolerance? Cite the pixel resolution at the inspection distance and the minimum detectable defect size.
- Linearity / stability: is the system's accept/reject decision stable across lighting variation, fixture variation, and part-orientation variation within the normal production envelope? Reference the golden-sample validation run at the start of the PPAP study.
- Repeatability: does the system give the same result on the same part presented multiple times? Cite the repeatability study result (% agreement on golden samples).
- Reproducibility: does the system give the same result across shifts, lighting conditions, and fixture positions? Cite the reproducibility study result.
- False-accept / false-reject rate on the validation set: cite PPV and NPV at the current threshold on the validation golden-sample library.
- PPAP submission element: produce a brief "Vision Inspection MSA Summary" exhibit keyed on: part number / model version / validation date / resolution / repeatability % / reproducibility % / PPV / NPV / false-reject rate / threshold / golden-sample library size / validation-period hours.
Step 10 — Retrain-trigger checklist.
Trigger retrain only if all four boxes are checked. If any box is not checked, output is "No retrain — investigate cause first" with the unchecked box named and a specific next step.
- (a) Drift detected — At least one of the four drift-detection checks in Step 7 returned a flag.
- (b) Drift not explained — The drift is not fully accounted for by a known calibration event, lighting change, fixture change, mix shift, or temporary product-appearance deviation. (Known events explain drift; unknown persistent shifts require retrain.)
- (c) Dataset hygiene verified — The candidate training dataset has been reviewed: no labeling drift (the ground-truth labels from the new candidate samples have been confirmed by at least two reviewers with kappa > 0.6 on the affected classes), no data leakage between training and test sets, and the candidate augmentation set covers the drift-producing conditions.
- (d) Holdout pass criteria defined — Before the retrain is executed, a holdout-window pass criterion is written down: minimum AUC/F1/F-beta thresholds by defect class (especially KPC/KCC-linked classes), minimum improvement over current model on the drift-affected classes, and a maximum regression allowance on stable classes. A retrain that improves one class while degrading another must not be deployed without the holdout-pass sign-off.
Step 11 — Follow-up recommendations (1–3 items, each with an owner).
Examples: threshold change (owner: threshold-policy owner from config.yml); lighting recalibration (owner: maintenance / process engineer); fixture inspection or re-seating (owner: production engineer); CAPA on rising KPC/KCC-linked defect class (owner: quality engineer — paired with CAPA Document Builder); ground-truth audit refresh (owner: quality manager — overdue if > 30 days since last audited window); model holdout re-evaluation (owner: ML ops / quality engineer — overdue if > holdout_eval_cadence); reviewer calibration session (owner: quality engineer — for low-kappa reviewer).
Step 12 — Evidence tagging.
Every claim in the report is tagged with its evidence anchor:
- "model classification" — the vision system's automated decision
- "human review-queue resolution" — a reviewer's call on a flagged item
- "audited-window ground truth" — a sampled human re-check of passed items
- "downstream-station feedback" — a defect found at the next inspection step
- "customer feedback" — a warranty or field return
Never mix evidence anchors without labeling the mix. A PPV calculated from human-review-only resolution is less reliable than one anchored on a ground-truth audit; state this explicitly.
Output Requirements
- Header: plant name, line / cell (from config), product / part number, shift, total units inspected, target yield, baseline window length (30-day), audited-window reference (date + sample size), model version / hash, threshold-policy owner
- Headline metric block: reject rate, false-reject share at current threshold, PPV (and NPV / recall if anchored), escape rate (with "anchored / not anchored" tag), each metric flagged vs. baseline ±2σ
- KPC/KCC exposure alert: any KPC- or KCC-linked defect class with an escape or a rising trend, with CSR-PPM-commitment impact estimate
- Confusion-matrix table: TR / FR / TA / E with counts, evidence anchor per cell, and IRT caveat if applicable
- KPC/KCC-linked defect-class Pareto: class / count / % of fails / severity tag / KPC-KCC linkage (drawing symbol + control-plan reference) / rising-stable-falling vs. baseline / investigation flag
- Confidence-distribution block: % below 0.6, % 0.6–0.85, % above 0.85, review-queue burden estimate
- Threshold-trade table: PPV / NPV / recall / review-burden at current / tighter / looser threshold; threshold-policy owner named
- Inter-rater reliability block: kappa by defect class, reviewer watch-list flag if any class kappa < 0.6 for ≥ 3 shifts
- Calibration / change-event split: before / after analysis on every event, timestamped
- Four-source drift-detection panel: rolling-baseline / defect-class distribution / lighting-fixture baseline / confidence-distribution shift — each with flag (Y/N), likely source, and recommended next step
- Model lineage and decay block: version / hash / training dataset / last retrain / holdout at retrain / production hours / last periodic holdout / model-age flag / decay-trend flag
- APQP/PPAP vision-system validation exhibit (when in scope): MSA summary per AIAG MSA 4th ed. §7
- Retrain-trigger checklist: four boxes with clear Y/N for each; "retrain" or "investigate cause first" verdict with the unchecked box named
- Follow-up recommendations: 1–3 items each with owner and target date
- Evidence-tagging index: every quantitative claim labeled with its evidence anchor
- Open questions and gaps: explicit list of unresolved items
Anti-Patterns to Avoid
- Do not report escape rate without an audited ground-truth anchor or a downstream-station / customer feedback signal. "The model didn't flag it" is not evidence of conformance.
- Do not average across a model update, lighting change, fixture change, or mix shift. Split before / after; the averaged number is the one that hides the regression.
- Do not auto-trigger a retrain on a single shift's drift signal. Retrain only when the four-box checklist passes. A single-shift outlier is more often explained by a temporary mix shift or an ambient-light event than by model decay.
- Do not change the threshold without showing the PPV / NPV trade at one tighter and one looser setting. Threshold changes that look safe in PPV often crater NPV (escape rate climbs on KPC/KCC classes).
- Do not name a process root cause from vision data alone. The vision system saw a defect; the cause lives in the process. Pair with CAPA Document Builder for RCA. A vision-flagged escape is a detection-route success and a process-route failure — both legs need a CAPA.
- Do not treat low-confidence flags as automatic false rejects. Low confidence ≠ false; it means the model is uncertain. Route to human review; do not auto-pass.
- Do not invent confidence values, escape counts, ground-truth audit results, lighting reference-image scores, model holdout metrics, or inter-rater reliability scores. Cite the source; mark "to be confirmed by quality engineer" where uncertain.
- Do not publish the report without naming the threshold-policy owner. Threshold drift without an owner is the most common slow-burn cause of cost-of-quality creep on a vision-system line.
- Do not skip the KPC/KCC linkage on defect-class reporting. A rising "scratch" class is a quality nuisance; a rising "scratch" class that is KPC ◆ with a Ford Q1 PPM commitment of 15 PPM is a customer-scorecard event that needs to surface in the same shift, not in the weekly quality report.
- Do not use a holdout-window evaluation that is more than [max_model_age from config.yml] old as evidence that the model is still performing. Stale holdout data is not evidence of current performance.
- Do not close a "model decay detected" finding with "we will retrain soon." State the retrain trigger conditions that are and are not yet satisfied and the specific next step.
Vision-Platform Integration
Structure the output format and event-field mapping to the platform known from config.yml → vision_platform:
- Cognex In-Sight — Event log export (timestamp / unit ID / pass-fail / defect class / confidence); spreadsheet template
.jobexport fields; EIP (EtherNet/IP) output register mapping for MES hand-off; VisionView HMI status fields - Cognex VisionPro / VisionPro Deep Learning (ViDi) — ViDi Reader / Classify / Detect result confidence scores; VisionPro Application Step result objects; SDK callback event schema; OPC UA variable mapping for SCADA/MES integration
- Keyence CV-X / IV4 — CSV output export schema; Ethernet communication output register (pass/fail + defect class + confidence); conditional output trigger mapping for PLC hand-off; IV4 AI mode confidence threshold parameter
- Omron FH / FZ / FQ — Result output fields (Overall judgement / Item judgement / Count / Measurement value); EtherNet/IP scanner output data; scene control switchover event
- Teledyne DALSA / Sherlock — Inspection result log; Sherlock task result variables; GigE Vision stream metadata
- Basler pylon / Basler Vision Suite Software (BVSS) — pylon Viewer measurement result export; BVSS recipe version tracking; GenICam parameter export for model traceability
- Matrox Imaging Library (MIL) / Matrox Design Assistant — Inspection result log; Design Assistant step result fields; OPC UA server variable mapping
- Zebra Machine Vision / Aurora — Inspection recipe version; result output register for MES; Aurora Vision Studio result export
- Datalogic IMPACT / Genius+ — Result record export; IMPACT recipe ID and result fields; EtherNet/IP output assembly
- Pleora eBUS / iPort — Frame metadata (timestamp / acquisition ID / pass-fail flag); MQTT payload schema for cloud-MES hand-off
- Landing AI LandingLens — Defect class + bounding box + confidence export (JSON / CSV); model version hash; dataset snapshot ID; LandingLens audit trail export for ground-truth anchor
- Instrumental (AI-powered assembly inspection) — Station anomaly score; defect cluster ID; model version; escape flag
- Cogniac — Decision record (pass/fail/review); concept version; uncertainty score; human-review resolution export
- AWS Lookout for Vision — Anomaly score; model version ARN; inference result JSON; CloudWatch metric stream for reject-rate trending
- Google Vertex AI Vision / AutoML Vision — Model version ID; prediction confidence; label annotation export for ground-truth review
- NVIDIA Metropolis / VSS Blueprint — VSS sensor tap result; NIM inference service model version; Cosmos Reason audit record
- Edge Impulse (edge inference) — Model version hash; inference classification / anomaly score; impulse metadata export; FOTA model-update event log
- NeuralVision / Vidi Suite / SiteSentry (Vision Industries) — Platform-native result export; model version; drift signal if exposed via API
- Platform-neutral output — Produce a platform-neutral markdown summary plus a CSV block keyed on: timestamp / unit-id / pass-fail / defect-class / confidence / human-review-resolution / evidence-anchor / kpc-kcc-flag / model-version / shift-id
Success Metrics
- PPV at current threshold — Target ≥ 95% on KPC/KCC-linked critical / major defect classes. Flag any drop > 3 percentage points vs. baseline in a single shift.
- Escape rate (anchored) — Target ≤ 30-day baseline. Flag any audited window with anchored escape > 2× baseline. Track separately by defect class; a KPC/KCC-linked escape is always a severity-1 event regardless of overall escape rate.
- Audited-window cadence — Target one audited ground-truth window per week per line. Never let any line go > 30 days without a ground-truth anchor. Overdue flag should appear in every shift report until satisfied.
- Inter-rater reliability — Target kappa ≥ 0.7 on critical and major defect classes across all active reviewers. Kappa < 0.6 for three consecutive shifts triggers a reviewer calibration session. Track and report quarterly.
- Model holdout re-evaluation cadence — Monthly on high-volume / KPC/KCC-critical lines; quarterly on low-volume lines. Overdue flag appears in model-decay block.
- Retrain false-positive rate — Target zero retrains triggered on noise (single-shift drift signal without four-box checklist pass). A retrain that is reverted within two weeks is logged as a process miss.
- Threshold change cadence — Every threshold change recorded with PPV / NPV / recall before-and-after; threshold-policy owner signs every change. An undocumented threshold change is a QMS finding.
- Drift-attribution accuracy — When a flagged drift is investigated, the four-source attribution (model / lighting / fixture / mix) is correct ≥ 80% of the time. Track and report; a consistently wrong source attribution indicates the drift-detection logic needs calibration.
- Time from shift end to summary published — Target ≤ 30 minutes on a stable line; ≤ 60 minutes when a calibration event is in scope; ≤ 15 minutes with platform-native export automation in place.
- KPC/KCC escape escalation time — Any KPC- or KCC-linked escape identified in the human-review queue or audited window should be escalated to the quality engineer within 15 minutes of identification. Track and report.