Machine Vision

Physics-Based Machine Vision

Our early research was motivated by three questions that challenged the technological and economic paradigms of the 1980s:

Cost & Accessibility: Why does a consumer-grade color video camera cost only US$1,500, while an industrial gray-level system costs US$50,000?
Computational Bottlenecks: Why does it take 0.5 seconds to compute the centroid of a small blob?
Human Emulation: Can machine vision emulate the human visual system’s reliability amid noise, occlusion, and lighting variations?

Breaking the RS-170 Barrier: From Analog Latency to Direct Digital Vision

Prior to the 1990s, machine vision was constrained by the RS-170 analog TV standard (30 fps), which required expensive, high-latency "frame-grabbers". Our research led to integrated architectures that eliminated these buffers, achieving frame-rates well beyond 30 fps and dramatically lowering costs.

As reported by Lee and Blenis (1994), this approach introduced total system integration, direct pixel processing, and decentralized real-time control, providing the first effective solution to sampling delays in vision-based motion control. This shifted the vision system's role from a qualitative picture-taker to a high-speed, measurement-grade quantitative sensor.

Direct Digital Vision: Integrated Sensing and Computation

To address the performance-cost disparity of the late 1980s, we developed a direct digital acquisition architecture that eliminated the need for PC-based frame grabbers and broke free from the restrictive RS-170 video standard.

Architectural Breakthroughs: This integrated approach (Lee and Blenis, 1994) introduced three key innovations that laid the foundation for modern mechatronic sensing:

Total System Integration: Imaging, lighting, digitizing, computing and control were combined into a single autonomous unit. This removed the overhead of TV-standard limitations and significantly lowered hardware and packaging costs.
Direct Pixel Processing: By processing pixel data computationally as they were acquired, we eliminated the requirement for full-frame buffering. This reduced acquisition latency and allowed the "brain" of the system to respond to visual cues in real-time.
Decentralized Real-Time Control: Controlled variables were computed directly at the sensor level, eliminating host-bus data transfer delays. This provided the first effective solution to the long sampling delays that historically limited vision-based motion control.
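The direct pixel processing idea can be illustrated with a minimal Python sketch, offered as an illustration only and not as the architecture of Lee and Blenis (1994). It accumulates the blob centroid from three running sums while rows arrive in readout order, so no full-frame buffer is ever held. The function names, the threshold value, and the hypothetical `sensor_readout` generator are assumptions introduced here.

```python
from typing import Iterable, Optional, Tuple


def streaming_centroid(
    rows: Iterable[Iterable[int]], threshold: int
) -> Optional[Tuple[float, float]]:
    """Centroid (x, y) of above-threshold pixels, accumulated row by row.

    Only three running sums are kept, so no full-frame buffer is needed.
    """
    count = 0   # zeroth moment: number of blob pixels
    sum_x = 0   # first moment along columns
    sum_y = 0   # first moment along rows
    for y, row in enumerate(rows):        # rows arrive in readout order
        for x, value in enumerate(row):
            if value > threshold:
                count += 1
                sum_x += x
                sum_y += y
    if count == 0:
        return None                       # no blob in this frame
    return sum_x / count, sum_y / count


def sensor_readout():
    """Hypothetical sensor: yields one row at a time, never a whole frame."""
    yield [0, 0, 0, 0, 0]
    yield [0, 200, 210, 0, 0]
    yield [0, 205, 215, 0, 0]
    yield [0, 0, 0, 0, 0]


centroid = streaming_centroid(sensor_readout(), threshold=128)
print(centroid)
```

Because the result is available as soon as the last row has been read, the controlled variable can be updated without waiting for a host to fetch and scan a stored frame.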

From Architecture to Measurement

This shift to direct digital acquisition was not merely an upgrade in speed; it was a shift in purpose. By bypassing analog conversion, we ensured photometric linearity and physics-consistent image formation. This architectural foundation allowed us to treat the vision system as a calibrated measurement instrument, setting the stage for the specific research thrusts in Retroreflective Sensing and Artificial Color Contrast.

Human vs. Machine Vision: The Fundamental Distinction

At the core, the difference between human visual perception and machine vision lies in purpose:

Human Vision (Qualitative): Supported by cognition, the eye–brain system excels at pattern recognition, contextual inference, and relative judgment. Consumer cameras produce visually pleasing images with color fidelity, contrast, and dynamic range tuned for human interpretation.
Machine Vision (Quantitative): Robots require metric, traceable measurements—sub-millimeter positions, dimensions, edges, and forces.

Machine vision must therefore behave as a calibrated measurement instrument, requiring photometric linearity, controlled illumination, and physics-consistent image formation. This fundamental distinction guided our development of a physics-based imaging framework, in which image formation, illumination, and computation are co-designed to transform visual data into reliable, actionable physical measurements.

Thrust 1: Physically-Accurate Machine Perception

We leverage the physical laws of illumination and reflectance to generate high-contrast, measurement-grade images for real-time robotic control. To achieve this, we developed two distinct yet complementary methods:

Retroreflective Sensing: This method exploits the physical laws of directional gain and coaxial reflectance to isolate engineered landmarks or components from ambient noise.
Artificial Color Contrast (ACC): This approach exploits the physical laws of spectral reflectance and human physiology to differentiate natural and biological subjects that overlap in conventional color spaces.

Experimental verification of these methods is presented as Case Studies to demonstrate the transition from theory to real-world significance.

1.1 Retroreflective Vision Sensing: Exploiting Directional Gain with Coaxial Optics

For engineered environments, we developed a retroreflective sensing framework that serves as a robust alternative to high-intensity strobes (Figure MV1 and Table MV1):

Extreme Directional Gain: Within a 0.2° observation angle, retroreflective intensity exceeds that of flat-white surfaces by 250×.
Ambient Noise Rejection: The signal is 250× stronger than diffuse reflections, making ambient light negligible.
Collocated Illumination: Placing the source and camera on the same axis (α ≈ 0) produces sharp, dark silhouettes of non-retroreflective objects.
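To see why a 250× directional gain makes ambient light nearly irrelevant, the following Python sketch evaluates a simple signal model. The model is an assumption made for illustration: both surfaces scatter ambient light diffusely with equal weight, and only the retroreflector returns the coaxial LED light with the stated gain. The function name and the ambient levels are hypothetical.

```python
RETRO_GAIN = 250.0  # from the text: retroreflector vs. flat white within ~0.2 degrees


def contrast_ratio(led: float, ambient: float, gain: float = RETRO_GAIN) -> float:
    """Ratio of retroreflective-background signal to flat-white-object signal.

    Assumed model: both surfaces scatter ambient light diffusely (weight 1),
    while only the retroreflector returns the coaxial LED light with `gain`.
    """
    retro_signal = gain * led + ambient
    white_signal = led + ambient
    return retro_signal / white_signal


for ambient in (0.0, 1.0, 10.0, 100.0):
    ratio = contrast_ratio(1.0, ambient)
    print(f"ambient = {ambient:6.1f} x LED -> contrast = {ratio:6.1f}")
```

Under this assumed model the ratio is 250 with no ambient light and remains about 23.6 when ambient light is ten times stronger than the LED return from a white surface, which is why a single fixed threshold can separate silhouette from background.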

Figure MV1. Retroreflective Vision Sensing: Principle and Demonstration

(a) Within an observation angle of approximately 0.2°, the reflected intensity from a retroreflective surface exceeds that of a flat white surface by 250×, demonstrating its extreme directional gain.

(b, c) For cameras equipped with CID imaging sensors exhibiting peak responsivity near 600 nm, LEDs centered at approximately 650 nm are selected to maximize retroreflective return while maintaining strong signal-to-noise characteristics.

(d) Two representative examples (Top) illustrate the contrast enhancement achieved through retroreflective sensing.

Middle: Grayscale images obtained using conventional illumination without structured retroreflective backgrounds.

Bottom: Images captured using retroreflective sensing technique.

Example 1 (left): Machine components.

Example 2 (right): Parts marked with retroreflective landmarks on low-cost totes, pallets, or kitting trays.

In both cases, retroreflective sensing produces high-contrast, physically accurate images even under low-cost imaging hardware.

Table MV1: Comparison of Conventional vs. Retroreflective Sensing

| Feature | Conventional Machine Vision | Retroreflective Vision Sensing |
| Illumination | High-intensity (e.g., Xenon flashtubes) | Low-intensity collocated LEDs |
| Surface Interaction | Object-dependent/diffuse reflectance | Structured surface reflectance |
| Image Contrast | Sensitive to ambient lighting/noise | Intrinsic high contrast; noise-invariant |
| Computation | High (complex edge/blob detection) | Low (simple thresholding) |
| Hardware Cost | High (specialized lighting and optics) | Low (simple hardware/packaging) |
| Data Role | Qualitative image interpretation | Quantitative spatial measurement |

1.2 Artificial Color Contrast (ACC): Exploiting Spectral Physics

In automated visual inspection, particularly within the food-processing sector, non-target features often overlap significantly with targets in conventional RGB color space. While human inspectors resolve these ambiguities through contextual perception (Question #3), machine vision systems frequently interpret this “unrelated overlap” as valid targets, leading to false detections.

For biological subjects where conventional RGB segmentation fails, we developed an ACC prefilter inspired by the Human Visual System (HVS). As illustrated in Figure CMV1, ACC integrates principles of trichromatic color representation and opponent-process theory to restructure color space in a physiologically meaningful manner:

Mechanism: Uses center–surround receptive fields to re-encode chromatic information and enhance inter-class separation.
Performance: ACC suppresses low-frequency background variations and biological texture noise.
Result: Supports robust “measurement-grade localization” without reliance on expensive hardware or heuristic threshold tuning.

Figure CMV1. Artificial Color Contrast (ACC) Inspired by the Human Visual System.

The theoretical and experimental foundations of ACC are illustrated through the following components:

(a) Opponent Color Theory: Schematic of Hering’s antagonistic color channels (Red-Green, Blue-Yellow) that form the basis of human color perception.

(b) Receptive Field Structure: Quantitative modeling of ON-center and OFF-center mechanisms (Kuffler, 1993) using a Difference-of-Gaussians (DoG) formulation to organize center–surround spatial responses.

(c) Edge Enhancement: Illustration of simultaneous edge sharpening and surround smoothing, consistent with cat retinal ganglion cell (RGC) responses that preserve global contrast (Enroth-Cugell & Robson, 1984).

(d) Feature Separation: Synthetic test images containing visually similar red features, demonstrating the dramatic redistribution of pixel clusters when transformed from RGB to ACC space.

(e, f) Experimental Validation: Application to poultry breast meat inspection. In RGB space, blood stains near the fan bone frequently cause false detections. In ACC space, the overlap between these pixel distributions is significantly reduced, increasing the cluster separation and improving detection robustness.

Why Artificial Color Contrast (ACC) Works:

ACC is effective because it emulates early stages of the human visual system, where center–surround receptive fields and opponent-color processing enhance contrast while suppressing irrelevant background variations.

By restructuring the color space through Difference-of-Gaussians–like transformations and antagonistic color channels, ACC increases separability between target and non-target features, reduces sensitivity to illumination and texture noise, and converts visually ambiguous overlaps in RGB space into distinct, quantifiable clusters.

This physiology-inspired, computation-efficient encoding supports robust, real-time feature localization without reliance on expensive hardware or heuristic threshold tuning.
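A minimal NumPy sketch of the two ingredients named above, opponent channels followed by a Difference-of-Gaussians (DoG) center–surround filter, is given here for illustration. It is not the published ACC formulation: the channel definitions, the sigma values, and the synthetic test image are assumptions chosen to keep the example short.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """1-D Gaussian, truncated at 3 sigma and normalized to unit sum."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def blur(channel: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with reflective padding (output keeps input shape)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(channel, radius, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def acc_prefilter(rgb: np.ndarray, sigma_center: float = 1.0,
                  sigma_surround: float = 4.0) -> dict:
    """Opponent channels followed by a center-minus-surround (DoG) response."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    opponent = {
        "red_green": r - g,                  # assumed R-G antagonism
        "blue_yellow": b - 0.5 * (r + g),    # assumed B-Y antagonism
    }
    return {name: blur(ch, sigma_center) - blur(ch, sigma_surround)
            for name, ch in opponent.items()}


# Synthetic scene: slow left-to-right drift in red plus a small red target.
size = 64
ramp = np.linspace(0.0, 0.3, size)
image = np.empty((size, size, 3))
image[..., 0] = 0.5 + ramp          # drift is broadcast along every row
image[..., 1] = 0.4
image[..., 2] = 0.3
image[28:36, 28:36, 0] += 0.3       # target patch

acc = acc_prefilter(image)
target_response = acc["red_green"][32, 32]       # inside the target
background_response = acc["red_green"][32, 14]   # background, away from borders
print(target_response, background_response)
```

In this synthetic case the linear drift cancels in the DoG response away from the image borders, while the target keeps a clearly positive response, which is the sense in which a center–surround prefilter suppresses low-frequency background variation.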

1.3 Experimental Verification and Industrial Significance

The technical frameworks developed in Research Thrust 1 have been experimentally validated across multiple industrial and agricultural scenarios, demonstrating that physics-based machine vision can deliver the stability and repeatability required for closed-loop robotic control rather than merely qualitative inspection.

Case Study A: Engineered Environments

Verified using machine and assembly components, retroreflective sensing transforms vision from a heuristic task into a predictable, measurement-grade modality.

Methodology: Low-cost, collocated LEDs (Table MV2) generate strong directional optical gain for rapid centroid and pose extraction.
Significance: This confirmed that vision-based motion control can be as stable as traditional encoder-based feedback.

Table MV2: Comparison of three light sources

| Sources | LED | Laser Diode | Xenon Flashtube |
| Wavelength (nm) | 650 | 790-840 | 830-1000 |
| Unit Cost (US$) | 1.00 | 200.00 | 10.00 |
| Life (hours) | 5,000,000 | 250,000 | 1,000,000 flashes (0.3-4 flashes/s) |
| Power (W) | 0.1 | 1.0 | 25.0 (500 V nominal) |

References:

Lee & Li (Journal of Robotic Systems, 1991): Provided foundational principles of collocated MV sensing for 2-D part localization and blob-based measurement.
Lee (ASME Journal of Engineering for Industry, 1994): Developed the theoretical framework for LED-based retroreflective vision architectures.
Parker & Lee (ASME Journal of Manufacturing Science and Engineering, 1999): Introduced physically accurate synthetic image generation methods for validating and tuning real-world vision systems.

Case Study B: Natural and Industrial Environments

Validated using the ACC and ACC–PCA framework across agricultural and food-processing sectors.

Outdoor Perception: Extended to autonomous tea-leaf harvesting, maintaining performance under highly variable natural lighting.
Industrial Processing: Applied to high-throughput poultry meat segmentation, mitigating spectral overlap between adjacent tissues.
Significance: Shifts machine vision from fragile RGB thresholding to a structured, physiology-inspired representation.

B1: Natural Environments — ACC for Pose-Robust Biological Feature Localization

As summarized in Figures CMV1 and CMV2, ACC transformation was evaluated on poultry imagery containing heavy background clutter and significant pose variation.

Figure CMV2. ACC effects on Color Space:

Top: original RGB images and corresponding color-space distributions.

Bottom: ACC-transformed images and color-space mappings.

(a, b) Red Comb Localization in Noisy Background.

(c, d) Forward- and Backward-Facing Birds.

Figure CMV2 demonstrates the effectiveness of Artificial Color Contrast (ACC) transformation in both color-space separation and pose-robust feature localization. In the original RGB representation, comb pixels significantly overlap with feathers and background textures, resulting in ambiguous thresholds and unstable segmentation; after ACC transformation, the comb pixels collapse into a compact, well-separated cluster that supports stable, noise-resilient “measurement-grade localization”. Moreover, even under substantial pose and lighting variations—such as forward- and backward-facing birds—ACC preserves discriminative color features while suppressing irrelevant variations, yielding consistent segmentation and improved orientation-invariant feature identification.

Methodology: ACC restructures RGB color space using opponent-process and center–surround principles, collapsing target pixels (e.g., the red comb) into compact clusters while dispersing irrelevant textures such as feathers and background noise. This transformation shifts segmentation from heuristic threshold tuning to structured color-space separation.

Significance:

Noise-Resilient Segmentation: Substantial reduction of pixel overlap between target and background improves threshold stability.
Pose and Lighting Robustness: Discriminative color features are preserved under forward- and backward-facing orientations and variable illumination.
Computational Efficiency: The prefilter reduces downstream classification complexity by enhancing intrinsic separability in the image domain.

Reference: Lee, Qiang & Daley, IEEE Transactions on Automation Science and Engineering, 2007.

B2: Outdoor Agricultural Perception — ACC–PCA for Tender-Leaf Localization

Figure CMV3 illustrates the combined use of ACC and principal component analysis (PCA) for detecting tender tea leaves in visually ambiguous foliage.

Figure CMV3. ACC–PCA Effects on Color-Space Separation for Locating Tender-Leaf:

Top: Original RGB and Transformed-ACC images for locating tender tea leaves.

Bottom: Corresponding sample distribution in RGB space and ACC/PCA-transformed color space.

(a) Background Dominance.

(b) RGB Color Space.

(d) ACC + PCA Characterization.

Figure CMV3 illustrates how ACC, followed by principal component analysis (PCA), enhances color-space separability for tender-leaf localization in visually ambiguous natural scenes. In the original RGB images, the field is dominated by mature leaves whose colors closely resemble tender leaves, producing strong visual ambiguity; correspondingly, sample distributions of 120 tender leaves (targets) and 360 old leaves (background noise) exhibit substantial overlaps that limit reliable threshold-based segmentation. After ACC transformation, the color distribution is reshaped to suppress background variability and increase inter-class separation in the image domain. Subsequent PCA further compacts the target clusters and enlarges class margins, as illustrated by Gaussian ellipses, yielding improved statistical separability and more robust “measurement-grade localization” and classification performance.

Methodology: ACC first reshapes the color distribution to suppress dominant background variability from mature leaves. PCA is then applied to compact target clusters and enlarge inter-class margins, enabling statistically separable representations even when raw RGB samples exhibit strong overlap.
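The PCA and Gaussian-ellipse steps can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the published pipeline: the samples are synthetic stand-ins for ACC-transformed colors (only the class sizes, 120 and 360, follow the text), and classification by smallest Mahalanobis distance is one simple way to use the fitted ellipses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for ACC-transformed color samples (NOT the published data).
tender = rng.normal([0.60, 0.30, 0.20], 0.03, size=(120, 3))
mature = rng.normal([0.35, 0.30, 0.25], 0.03, size=(360, 3))
samples = np.vstack([tender, mature])
labels = np.array([1] * 120 + [0] * 360)

# PCA: eigenvectors of the pooled covariance, largest variance first.
mean = samples.mean(axis=0)
cov = np.cov(samples - mean, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:2]]
projected = (samples - mean) @ components      # shape (480, 2)


def fit_gaussian(points: np.ndarray):
    """Mean and covariance that define a class's Gaussian ellipse."""
    return points.mean(axis=0), np.cov(points, rowvar=False)


def mahalanobis_sq(points: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> np.ndarray:
    """Squared Mahalanobis distance of each row of `points` to one ellipse."""
    diff = points - mu
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma), diff)


mu_t, cov_t = fit_gaussian(projected[labels == 1])
mu_m, cov_m = fit_gaussian(projected[labels == 0])

# Assign each sample to the class whose ellipse it lies closest to.
closer_to_tender = (mahalanobis_sq(projected, mu_t, cov_t)
                    < mahalanobis_sq(projected, mu_m, cov_m))
predicted = closer_to_tender.astype(int)
accuracy = (predicted == labels).mean()
print(f"accuracy on synthetic samples: {accuracy:.3f}")
```

The synthetic classes are deliberately well separated; with real foliage the quality of the result depends on how much overlap the ACC step has already removed.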

Significance:

Enhanced Class Separability: Target (tender leaves) and background (mature leaves) distributions become distinguishable without exhaustive threshold tuning.
Statistical Robustness: Gaussian-ellipse characterization in ACC–PCA space supports reliable classification under natural variability.
Field Deployability: The approach maintains performance in outdoor, unstructured agricultural settings where lighting and surface reflectance are inherently unstable.

Reference: Lu, Huang & Lee, International Journal of Intelligent Robotics and Applications, 2021.

B3: Industrial Processing — ACC–PCA for Poultry Product Segmentation

Figure CMV4 presents validation of ACC–PCA in high-throughput poultry processing environments characterized by complex biological textures and spectral overlap among tissues.

Figure CMV4. ACC–PCA Effects on Color-Space Segmentation for Locating Poultry Meat Products:

RGB imagery (Top) and corresponding RGB vs. ACC–PCA color-space distributions (Bottom) demonstrating improved segmentation and localization of poultry component products.

(a) Multiple segments against complex background requiring identification and localization.

(b) Sample distributions in RGB color space.

Figure CMV4 demonstrates how the combined ACC–PCA transformation improves color-space segmentation for poultry meat product localization. In the original RGB imagery, product components and surrounding tissues exhibit substantial spectral overlap, limiting the effectiveness of direct thresholding. The ACC transformation restructures color distributions to suppress biologically induced texture and illumination variability, while PCA further compacts target clusters and enlarges inter-class margins. The resulting representation yields statistically separable color clusters that support robust, high-throughput identification and spatial localization of component products in industrial processing environments.

Methodology: ACC suppresses biologically induced color and illumination variability, while PCA compacts product-specific clusters and increases inter-class margins in transformed color space. This dual transformation enables reliable segmentation and localization of multiple component products within a single frame.

Significance:

High-Throughput Reliability: Supports real-time segmentation and classification on fast production lines.
Reduced False Detection: Mitigates spectral overlap between adjacent tissues and background structures.
Scalable Industrial Integration: Enables stable, measurement-grade color segmentation without specialized lighting or extensive calibration.

Reference: Lu, Lee & Ji, IEEE/ASME Transactions on Mechatronics, 2023.

Together, these results establish a unified framework in which physics-guided image formation and physiology-inspired color encoding transform machine vision from heuristic image processing into a predictive measurement science.

Thrust 2: Model-Based Machine Vision Systems

Research Thrust 2 advances the AIMRL philosophy by integrating computer vision with rigorous physics-based models to systematically design vision systems for specific, well-defined problem classes.

Rather than treating vision as a purely data-driven task, this thrust embeds physical structure and geometric constraints directly into the perception pipeline for state estimation and reconstruction.

Two complementary approaches were developed:

Geometric Template Matching and Shape Reconstruction: Utilizing similar-triangle geometry and projective relationships to enforce geometric invariants for measurement-grade localization.
Thermal Physics-Based Infrared Machine Vision: Exploiting heat transfer laws to interpret thermal images as quantitative measurement fields rather than qualitative grayscale textures.

2.1 Geometric and Structural Shape Reconstruction

Our research demonstrates that the reconstruction of anatomical structures and the identification of industrial components are mathematically isomorphic tasks. Both are formulated as inverse geometric problems (Figure CMV5) where a parameterized physical model is optimized to match observed sensory data.

Figure CMV5. Flowchart illustrating a three-level CMV part-presentation strategy based on template-matching.

(Lu, Lee & Ji, 2023)

This framework allows us to replace appearance-driven correlation alone with geometry-constrained inference. In this method, correspondence is posed as an overdetermined least-squares problem using point-pairs to enforce geometric invariants. By embedding physical structure directly into the perception pipeline, we transform 2D visual data into measurement-grade 3D pose and contour estimates.

As illustrated in Figure TM1, the system minimizes the squared distances between matched point-pairs to estimate target pose parameters, including translation, rotation, and the target-to-template scale factor. This procedure yields a robust estimation of position and orientation suitable for high-precision model-based machine vision.
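The least-squares pose estimate described above can be sketched in NumPy as follows. The parameterization is a common one and is assumed here rather than taken from the cited papers: with a = s·cos θ and b = s·sin θ, each point-pair contributes two linear equations in (a, b, tx, ty), and the overdetermined system is solved with the pseudo-inverse. The function name and test points are hypothetical.

```python
import numpy as np


def fit_similarity(template: np.ndarray, target: np.ndarray):
    """Least-squares scale, rotation, and translation mapping template -> target.

    Unknowns are a = s*cos(theta), b = s*sin(theta), tx, ty, so that
        u = a*x - b*y + tx
        v = b*x + a*y + ty
    for every matched pair (x, y) -> (u, v).
    """
    x, y = template[:, 0], template[:, 1]
    ones, zeros = np.ones_like(x), np.zeros_like(x)
    A = np.vstack([
        np.column_stack([x, -y, ones, zeros]),   # equations for u
        np.column_stack([y, x, zeros, ones]),    # equations for v
    ])
    rhs = np.concatenate([target[:, 0], target[:, 1]])
    a, b, tx, ty = np.linalg.pinv(A) @ rhs       # pseudo-inverse solution
    return np.hypot(a, b), np.arctan2(b, a), np.array([tx, ty])


# Hypothetical template feature points and a known pose to recover.
template = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0]])
true_scale, true_angle, true_shift = 1.5, np.deg2rad(30.0), np.array([2.0, -1.0])
c, s = np.cos(true_angle), np.sin(true_angle)
rotation = np.array([[c, -s], [s, c]])
target = true_scale * template @ rotation.T + true_shift

scale, angle, shift = fit_similarity(template, target)
print(scale, np.rad2deg(angle), shift)
```

Writing the rotation and scale as the pair (a, b) keeps the problem linear, so no iterative optimization is needed; with noisy point-pairs the same call returns the best fit in the squared-distance sense.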

Figure TM1. Geometry-Constrained Triangle Matching for Model-based Machine Vision.

Top: Feature extraction based on constant curvature. ① Contour extracted from an ACC/PCA-filtered image. ② Curvature-based feature identification. ③ Centers of constant curvature arcs. ④ Template feature points and corresponding triangles.

Bottom left: Illustration of similarity-based template-matching using invariant triangle geometry.

Bottom right: Pseudo-inverse solution of the overdetermined least-squares system to estimate translation, rotation, and scale.

References

Engineered Objects: Lee & Janakiraman, SME Applied Machine Vision Conf., 1992.
Biological Products: Lu, Lee & Ji, IEEE/ASME Transactions on Mechatronics, 2023.

2.2 Thermal Physics-Based Infrared Machine Vision

We move beyond grayscale textures to treat IR imagery as a quantitative measurement field governed by the physical laws of heat generation, conduction, and dissipation.

Physics-Informed Normalization: Background thermal noise is removed, so that ambient temperatures are suppressed and heat sources are isolated.
Thermal-Field-Based Modeling: The system interprets thermal images through thermodynamic models, such as lumped-parameter representations, to identify dominant heat-transfer paths and orientation-dependent thermal gradients.
State Reconstruction: By embedding thermodynamic principles, we enable the quantitative estimation of motion, contact conditions, and tool health in environments where visible-light cues are unreliable.

In this framework, human motion detection and industrial tool monitoring are treated as isomorphic thermal field problems. By interpreting IR data through thermodynamic models, we enable the quantitative estimation of internal states—such as tool health or human motion—that remain invisible to traditional visible-light systems.

This method is illustrated in Figure IR1 in the context of Human Motion and Crowd Perception, where humans act as natural heat sources.

Figure IR1. Temperature Field-based Infrared Machine Vision: Principle and System Demonstration.

Top: Human head thermal model for locating face-orientation.

Left: IR image and corresponding head temperature field.
Middle: Lumped-parameter thermal model representing dominant heat-transfer paths.
Right: Computed isotherms and heat-flow streamlines illustrating orientation-dependent thermal gradients.

Bottom: System-level implementation for human motion and crowd perception. A public infrared camera captures temperature fields of surrounding humans; the normalized thermal data are processed and transmitted through an iSpace cloud-based server to support situational awareness for visually impaired (VIP) users.

Left: Illustrative RGB image of the scene.
Middle: Normalized IR image with background thermal components suppressed.
Right: Vertical projection profiles identify individual human subjects and motion states.
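The normalization and vertical-projection steps in the bottom row can be sketched in NumPy as follows. This is a simplified illustration, not the published method: the ambient temperature is assumed to be the frame median, and the margin, the level threshold, and the synthetic frame are hypothetical values.

```python
import numpy as np


def normalize_ir(frame: np.ndarray, margin: float = 2.0) -> np.ndarray:
    """Suppress background by subtracting the median (assumed ambient) temperature."""
    ambient = np.median(frame)
    return np.clip(frame - ambient - margin, 0.0, None)


def vertical_projection(normalized: np.ndarray) -> np.ndarray:
    """Sum each column, giving one profile value per image column."""
    return normalized.sum(axis=0)


def find_subjects(profile: np.ndarray, min_level: float):
    """Column ranges (start, end), end exclusive, where the profile exceeds min_level."""
    active = profile > min_level
    padded = np.concatenate([[False], active, [False]])
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[::2], changes[1::2])]


# Synthetic 20 x 40 frame at 22 degC with two warmer regions standing in for people.
frame = np.full((20, 40), 22.0)
frame[5:18, 5:10] = 33.0
frame[4:18, 25:32] = 34.0

profile = vertical_projection(normalize_ir(frame))
subjects = find_subjects(profile, min_level=10.0)
print(subjects)
```

Tracking how these column ranges shift between consecutive frames is one simple way to turn the profile into a motion state.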

References

Human Motion and Crowd Perception: Jiang, Lee & Ji, Int. J. of Intelligent Robotics and Applications, 2017.
3D Steady State Tool Temperature Reconstruction: Ji, Huang & Lee, IEEE/ASME Trans. on Mechatronics, 2018.

2.3 Experimental Verification and Representative Applications

This section demonstrates the transition from first principles to real-world significance through rigorous state reconstruction.

A. Anatomical Shape Reconstruction: Human Spine Contour

This work extends the model-based template-matching framework to anatomical shape recovery, treating the human spine as a subject-specific mechanical structure (Figure TM2). As in TM1, the reconstruction is formulated as an inverse geometric problem. A measured back contour—extracted from sagittal-plane imagery between the neck and pelvis—serves as the target shape.

Non-Radiative Estimation Method: By modeling the spine as a serial chain of pin-connected triangles, we estimate sagittal curvature directly from measured external back contours.
Significance: This inverse geometric framework allows for the accurate recovery of inter-vertebral alignment and sagittal curvature without assuming parallelism or relying on medical imaging for calibration.

Figure TM2. Model-based Template Matching for Human Spine Contour Reconstruction.

Left: Geometric formulation and inverse reconstruction pipeline.

(a) Flowchart: Model-based identification procedure, from measured back-profile extraction to spine-curve reconstruction.

(b) Template Model: Spine represented as a sequence of pin-connected triangles with edge lengths (lmi, lmi1, lmi2) scaled by the subject-specific spine length Lm. Each triangle corresponds to a vertebra-equivalent segment, preserving relative anatomical proportions.

Right: Experimental Results.

(c) Validation: Comparison with X-ray images during standing and flexion postures, demonstrating accurate recovery of sagittal curvature and inter-vertebral alignment.

Why Model-Based Matching Matters

Model-based matching replaces appearance-driven correlation alone with physics- and geometry-constrained inference, enabling measurement-grade estimation rather than visual similarity. Although illustrated here using human spine reconstruction (Figure TM2), the same inverse geometric framework underlies template-based perception for engineered parts and food-processing applications (Figure TM1), demonstrating a unified, extensible methodology applicable across biological and industrial domains.

Reference: Ding, Jiang & Lee, IEEE Transactions on Instrumentation and Measurement, 2024.

B. Thermal Physics-Based Infrared Machine Vision: 3D Steady State Tool Temperature Reconstruction

In metal cutting, accurate knowledge of the tool temperature distribution is essential for monitoring tool health, workpiece integrity, and process stability—particularly when machining hard-to-machine materials with low thermal conductivity. Owing to intense, localized heat generation at the micro-scale tool–chip interface, inferring the internal temperature field from surface measurements remains a well-known challenge.

To address this problem, we developed a hybrid macro–micro thermal modeling framework (Figure IR2) to infer internal temperature fields from surface-only measurements during machining.

Coupled Reconstruction: Micro-scale FEA simulations for heat generation at the tool-chip interface are coupled with macro-scale heat-transfer modeling guided by IR surface isotherms and temperature gradients.
Metrological Accuracy: This transforms infrared images from qualitative textures into quantitative temperature fields, enabling real-time assessment of tool health and thermal loading.

Figure IR2. Macro-Micro Modeling and Infrared Imaging for 3D Steady-State Tool Temperature Reconstruction.

Top: Physics-Based Modeling Framework.

(a) Flowchart: Coupled macro-micro modeling and infrared-guided temperature field reconstruction.

(b) Reconstruction Setup: Schematics illustrating sensor-based thermal field estimation from surface IR data.

Bottom: Reconstruction Results.

(d, e) Validation: Comparison of reconstructed temperature fields with experimentally observed macro- and micro-scale behavior during orthogonal cutting.

Why Thermal Physics Matters

In infrared machine vision, temperature is not merely an image intensity—it is a physical state governed by heat generation, conduction, and dissipation. By embedding thermodynamic principles into the interpretation of infrared imagery, thermal vision transitions from qualitative visualization to quantitative sensing. This physics-based formulation enables reliable inference of motion, contact conditions, and tool health in environments where visible-light cues are unreliable or unavailable, providing a complementary, physics-grounded perception modality for model-based machine vision.

References:

3D Steady State Tool Temperature Reconstruction: Ji, Huang & Lee, IEEE/ASME Trans. on Mechatronics, 2018.
Online Tool Temperature Monitoring: Lee, Huang, Ji & Lin, IEEE Trans. on Automation Science and Engineering, 2018.

Thrust 3: Intelligent Perception for Real-Time Autonomous Applications

Research Thrust 3 focuses on the "Integrated Intelligence" that autonomous systems require to solve complex inverse perception problems under stringent real-time constraints. The emphasis is on combining physical insight with learning-based methods to achieve fast, reliable, and interpretable perception for autonomous operation.

3.1 Physics-Guided Neural Networks for Process Monitoring

To address thermal loading in dynamic environments, we developed hybrid neural frameworks that embed established physical laws directly into computational architectures.

Inverse Real-Time Estimation: ANNs are integrated with physics-guided constraints to enable online monitoring of 3D temperature fields during active machining.
Intelligent Calibration: Training with validated physics-based models from Thrust 2 ensures computational speed without sacrificing metrological fidelity.

The hybrid framework is illustrated in Figure IR3, where the temperature field surrounding the tool-chip interface in the presence of chip occlusion is reconstructed from IR thermal imagery to enable online estimation of the internal peak temperature of the cutting tool. The tool temperature field is partitioned into two regions:

1) a far-field region, used to estimate the heat-transfer coefficient between the tool and ambient temperature, and

2) a near-field region, where an ANN is trained to capture unknown and highly nonlinear heat variations at the frictional contact interface.

Figure IR3. Online Monitoring the Internal Peak Tool Temperature.

Top: Hybrid analytical-experimental framework for modeling the tool temperature field.

Middle: Near-field ANN accounting for unknown heat variations at the frictional tool–chip interface.

Bottom: Experimental results and validation.

(a) Overall hybrid framework. (b) Occluded temperature interpolation from surface IR measurements.

(c, d) Physics-guided design and training flowcharts of the ANN architecture.

(e) Reconstructed temperature field showing FEA-simulated surface temperature and far-field isotherms.

(f) Experimental verification comparing ANN estimation, FEA results and physical measurements.

(g, h) IR image sequences captured during transient machining conditions.

This physics-guided learning paradigm enables autonomous systems to achieve real-time perception that is both computationally efficient and physically interpretable—bridging first-principles modeling and data-driven intelligence.

Why Physics-Guided ANN Matters

Physics-guided neural networks are most effective when the physical system is well understood, but key boundary conditions or heat sources are unknown, time-varying, or partially occluded. By embedding conservation laws and validated physical models into the network architecture and training process, the ANN learns only the unmodeled residual physics, enabling fast inverse estimation without sacrificing physical consistency. This makes physics-guided ANN particularly suitable for near-field regions, such as tool–chip interfaces, where contact conditions and frictional heat generation cannot be measured directly.
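The residual-learning idea can be sketched in NumPy as follows. Everything in this example is an assumption for illustration: the exponential baseline stands in for a validated far-field physics model, the Gaussian bump stands in for unmodeled near-field heating, and a small random-feature network with a least-squares readout stands in for the trained ANN. None of the numbers come from the cited work.

```python
import numpy as np

rng = np.random.default_rng(0)
T_AMB, T_BASE, DECAY = 25.0, 300.0, 2.0      # hypothetical values


def physics_model(x: np.ndarray) -> np.ndarray:
    """Assumed far-field baseline: exponential decay toward ambient."""
    return T_AMB + (T_BASE - T_AMB) * np.exp(-DECAY * x)


def true_temperature(x: np.ndarray) -> np.ndarray:
    """Synthetic 'measurement': baseline plus an unmodeled near-field term."""
    return physics_model(x) + 80.0 * np.exp(-(x / 0.1) ** 2)


# Hidden layer with fixed random weights; only the linear readout is fitted.
W = rng.normal(0.0, 10.0, size=(1, 30))
B = rng.normal(0.0, 1.0, size=30)


def features(x: np.ndarray) -> np.ndarray:
    """Hidden-layer activations plus a bias column, shape (len(x), 31)."""
    hidden = np.tanh(x[:, None] * W + B)
    return np.column_stack([hidden, np.ones_like(x)])


# Fit the network to the residual only, not to the full temperature.
x_train = np.linspace(0.0, 1.0, 200)
residual = true_temperature(x_train) - physics_model(x_train)
readout, *_ = np.linalg.lstsq(features(x_train), residual, rcond=1e-8)


def hybrid_model(x: np.ndarray) -> np.ndarray:
    """Physics baseline plus the learned residual correction."""
    return physics_model(x) + features(x) @ readout


x_test = np.linspace(0.0, 1.0, 101)
truth = true_temperature(x_test)
physics_rmse = np.sqrt(np.mean((physics_model(x_test) - truth) ** 2))
hybrid_rmse = np.sqrt(np.mean((hybrid_model(x_test) - truth) ** 2))
print(f"physics only: {physics_rmse:.2f}   hybrid: {hybrid_rmse:.2f}")
```

Because the network only has to represent the residual, it can stay small, and the prediction falls back toward the physics baseline wherever the learned correction is close to zero.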

Reference: Lee, Huang, Ji & Lin, IEEE Trans. on Automation Science and Engineering, 2018.

3.2 Modal Expansion for Dynamic Thermal Reconstruction

Addressing computational bottlenecks of high-fidelity thermal sensing, this research employs modal expansion for real-time reconstruction of complex temperature fields. Rather than directly solving high-dimensional heat-transfer problems, the approach leverages the underlying physics to reduce the problem to a compact, computationally efficient representation.

Formulation of Thermal States: The temperature field of thin-disk workpieces is formulated using a modal expansion derived from the governing heat-transfer equations. The field is expressed as a superposition of spatially distributed temperature mode shapes and time-varying modal coefficients.
Dimensionality Reduction: By retaining only dominant physical modes, the method transforms a continuous thermal field into a low-dimensional dynamical system.
Robustness: This physics-based formulation enables thermal reconstruction under realistic machining conditions involving occlusions and cutting fluids.

As illustrated in Figure IR4, a temperature field reconstruction (TFR) method is developed as a real-time, online approach for investigating the thermal dynamics of a thin, disk-like workpiece during turning. Using separation-of-variables and modal decomposition, the temperature field is decoupled into spatial mode shapes and temporal modal responses, allowing efficient reconstruction of transient thermal behavior throughout the machining process.
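A minimal sketch of modal reconstruction from partially occluded measurements is given below. The thin-disk mode shapes are not given here, so the example substitutes the cosine modes of a one-dimensional rod with insulated ends; the functions mode_shapes and reconstruct, the occluded interval, and the coefficient values are all hypothetical.

```python
import numpy as np

def mode_shapes(x, n_modes, length=1.0):
    """Cosine modes of a 1D rod with insulated ends.

    Illustrative stand-in for the thin-disk mode shapes, which are
    not reproduced in this text.
    """
    k = np.arange(n_modes)
    return np.cos(np.pi * np.outer(x, k) / length)   # (len(x), n_modes)

def reconstruct(T_visible, x_visible, x_full, n_modes):
    """Fit modal coefficients to visible pixels, then rebuild the full field."""
    Phi_vis = mode_shapes(x_visible, n_modes)
    coeffs, *_ = np.linalg.lstsq(Phi_vis, T_visible, rcond=None)
    return mode_shapes(x_full, n_modes) @ coeffs, coeffs

# Synthetic field built from three modes at one time instant.
x = np.linspace(0.0, 1.0, 101)
a_true = np.array([60.0, 15.0, -5.0])
T_true = mode_shapes(x, 3) @ a_true

# Pixels between 0.4 and 0.6 are treated as occluded (e.g. by a chip).
visible = (x < 0.4) | (x > 0.6)
T_rec, a_est = reconstruct(T_true[visible], x[visible], x, n_modes=3)
```

Repeating the fit frame by frame would yield the time-varying modal coefficients; only a handful of numbers per frame need to be estimated, regardless of image resolution.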

Figure IR4. Modal Expansion–Based Temperature Field Reconstruction for Thin-Disk Workpieces.

Top: Formulation of the thin-disk temperature field under rotational machining.

Middle: Transient response obtained through modal temperature field reconstruction.

Bottom: Experimental results and validation.

(a) Thermal dynamics of a rotating workpiece during lathe-machining.

(b) Field reconstruction procedure based on separation-of-variables and modal expansion.

(c) Transient temperature field reconstruction:

Left: measured thermal image and dominant temperature mode shape.

Right: reconstructed temperature fields at regions I–VI.

Verified using simulated measurements in finite-element analysis, the TFR method provides a simple, physically grounded, and computationally efficient means for analyzing the thermal dynamics of thin-wall disk-like workpieces during active machining.

Why Modal Expansion Matters

Modal expansion is most effective when the governing physics admits a compact, low-dimensional representation dominated by a small number of modes. By projecting high-dimensional thermal fields into physically meaningful mode shapes, the method reduces computational complexity while preserving interpretability and stability. Modal reconstruction is especially well suited for far-field or distributed systems, such as thin-disk or thin-wall workpieces, where temperature evolution is smooth, global, and governed by well-defined boundary conditions.

Reference: Elsheikh, Guo, Huang, Ji & Lee, Int. J. of Heat and Mass Transfer, 2018.

3.3 Physics-Based Calibration for Binocular Visuotactile Sensing

Addressing 3D deformation reconstruction under contact, we developed an inverse calibration framework for binocular visuotactile perception (Figure MV2). Extending the physically accurate imaging principles of Research Thrust 1 into contact mechanics, the approach ensures that tactile perception remains metrically correct and physically traceable.

Inverse Refraction Modeling: A rigorous optical refraction model compensates for light-path distortion introduced by transparent elastomeric tactile media.
Measurement-Grade 3D Reconstruction: By solving the inverse problem of binocular light refraction, the method reconstructs the true 3D displacement and deformation fields of the contact surface with high fidelity.
Accuracy: Experimental identification of refractive indices showed agreement within 3% of spectroscopic ellipsometry measurements.
Autonomous Significance: Physics-based calibration preserves metric accuracy in visuotactile sensing, enabling reliable force and deformation perception for robotic manipulation in unstructured environments.
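The core operation behind a refraction model of this kind is bending each camera ray at the interface according to Snell's law. The sketch below shows the standard vector form for a single ray; it is not the calibration procedure of the cited paper, and the flat interface and the elastomer index of 1.41 are hypothetical values chosen for illustration.

```python
import numpy as np

def refract(d, n, n1, n2):
    """Refract direction d at a surface with normal n (vector Snell's law).

    d  : incident ray direction
    n  : surface normal pointing toward the incident medium
    n1 : refractive index of the incident medium
    n2 : refractive index of the transmitting medium
    Returns the unit refracted direction, or None on total internal reflection.
    """
    d = np.asarray(d, dtype=float)
    n = np.asarray(n, dtype=float)
    d = d / np.linalg.norm(d)
    n = n / np.linalg.norm(n)
    eta = n1 / n2
    cos_i = -np.dot(n, d)
    sin2_t = eta**2 * (1.0 - cos_i**2)
    if sin2_t > 1.0:
        return None                      # total internal reflection
    cos_t = np.sqrt(1.0 - sin2_t)
    return eta * d + (eta * cos_i - cos_t) * n

# Ray travelling downward from air into a flat elastomer layer at 30 degrees.
theta_i = np.radians(30.0)
d_in = np.array([np.sin(theta_i), 0.0, -np.cos(theta_i)])
normal = np.array([0.0, 0.0, 1.0])
n_air, n_gel = 1.0, 1.41                 # n_gel is an assumed value
d_out = refract(d_in, normal, n_air, n_gel)
```

In a binocular setup, triangulating with the refracted rays from both cameras, rather than the straight-line rays, is what removes the depth bias introduced by the transparent medium.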

Figure MV2. Binocular Visuotactile Sensor Model for Reconstructing 3D Deformation Caused by Mechanical Contact on an Elastomeric Surface.

Left: Schematic of the binocular visuotactile sensor, illustrating the conversion of optical deformation cues into contact-induced displacement fields.

Middle: Comparison of depth-map reconstruction using the proposed physics-based ray-tracing model versus conventional stereo-vision and lumped-parameter calibration methods.

Right: Experimental validation, demonstrating accurate reconstruction of surface deformation.

Reference: Ma, Ji & Lee, IEEE Sensors J., 22(18), 2022.

3.4 Unifying Principle: Integrated Intelligence through Physics

Across color (ACC), geometry (template matching), temperature (thermal fields), and touch (visuotactile sensing), Integrated Intelligence is achieved by enforcing physics at the perception level so that learning accelerates inference without compromising physical truth. This unified approach provides a common foundation for advancing autonomous manufacturing, bio-inspired systems, and human-centered robotics.

Instead of treating perception as a data-driven pattern recognition problem, these studies formulate vision, temperature, and contact as inverse problems constrained by physics, ensuring that measurements remain metric, interpretable, and transferable across domains. By coupling first-principles modeling with computational efficiency—through analytical formulations, modal representations, and physics-guided learning—this framework enables real-time autonomous perception without sacrificing physical fidelity.

Collectively, the research across thrusts establishes Integrated Intelligence as a disciplined synthesis of sensing physics, model-based inference, and adaptive computation.

Bridging Physics-Guided Learning and Modal Intelligence

Figures IR3 and IR4 both address the same fundamental limitation of thermal sensing: the reliance on idealized, unobstructed visibility. Real-world processes often involve occlusions from cutting fluids, chips, or structural blockages.

Rather than depend on direct line-of-sight, these approaches reconstruct internal thermal states from partial, noisy, and indirect observations. This is achieved by embedding physical constraints—either through physics-guided artificial neural networks (ANNs) or modal decomposition techniques—directly into the perception process.

Together, they demonstrate how intelligent thermal perception can remain accurate and reliable even under practical constraints where classical sensing assumptions break down.

Choosing Between Physics-Guided ANN and Modal Expansion

Problem Characteristics | Modal Expansion | Physics-Guided ANN
Unknown or time-varying heat sources | ✖ Limited | ✔ Preferred
Strong occlusion (chips, fluids, blockage) | △ Depends on observability | ✔ Robust
Need for real-time inverse estimation | ✔ If mode count is small | ✔ High-speed inference
Well-defined geometry and boundary conditions | ✔ Ideal | △ Helpful but not required
Requirement for interpretability | ✔ Strong | △ Moderate
Near-field, contact-dominated physics | ✖ Less suitable | ✔ Ideal
Far-field, distributed thermal behavior | ✔ Ideal | △ Possible

Selected Publications

Thrust I: Physics-Based Imaging & Preprocessing

Focus: Eliminating noise and enhancing contrast at the source of image formation.

Lee & Li (1991): Established the principles of collocated optical sensing for measurement-grade 2D localization in robotic vision.

Lee (1994): Developed the theoretical framework for LED-based retroreflective imaging architectures enabling high-contrast, low-cost sensing.

Lee, Qiang & Daley (2007): Introduced the foundational theory of Artificial Color Contrast (ACC) for physics-guided color separation and robust feature detection.

Lu, Lee & Ji (2023): Demonstrated real-time, high-throughput ACC-based vision for segmentation and quality assessment in food-processing environments.

Thrust II: Model-Based State Reconstruction

Focus: Integrating geometric and thermodynamic laws into the vision pipeline.

Lee & Janakiraman (1992): Developed similarity-based geometric template matching for 2D pose estimation and contour reconstruction.

Jiang, Lee & Ji (2017): Combined infrared imaging with lumped-parameter thermal modeling for motion detection and human-state inference.

Ji, Huang & Lee (2018): Formulated a physics-based framework for reconstructing 3D steady-state tool temperature fields during machining.

Ding, Jiang & Lee (2024): Proposed a non-invasive inverse geometric method for reconstructing human spine curvature from surface measurements.

Thrust III: Intelligent Perception & Real-Time Inversion

Focus: Fusing physical insight with computation for autonomous applications.

Lee, Huang, Ji & Lin (2018): Physics-guided ANN for peak tool temperature monitoring.

Elsheikh, Guo, Huang, Ji & Lee (2018): Modal expansion for dynamic thermal field reconstruction.

Ma, Ji & Lee (2022): Physics-based calibration and refraction modeling for visuotactile sensing.