IMAGE DATA ENCODING AND COMPRESSION
The invention relates to image data encoding and compression.
In image processing systems raw image data input from an image source such as an electronic camera is preprocessed to provide a higher level representation of the image before the image is analysed. The raw image is said to be at the lowest level and the level of representation increases as the degree of abstraction increases.
In the art of artificial intelligence, machine vision and the like a great deal of work has been done in processing image data to identify sharp intensity changes or intensity discontinuities. One of the most notable developments in this field is the so-called primal sketch which resulted from work carried out by Marr and Hildreth as disclosed in their paper "Theory of Edge Detection", Proc. R. Soc. London, B 207, 187-217, 1980. However, very little work has been done in other areas such as the representation of shading in an image. Accordingly, processed image data has not necessarily contained the maximum amount of information about a scene and analyses have thus been carried out on incomplete data.
Stereo vision and depth perception are characterised by matching corresponding primitives across two or more disparate views of the same scene. Only if the corresponding primitives can be successfully matched can depth be perceived. It is recognised that correspondence matching is one of the major problems in stereo vision but, in spite of this major difficulty, stereo vision remains very attractive. This is because stereopsis derives absolute depth or 3-dimensional information through triangulation. In contrast, monocular techniques, in which the correspondence problem can be avoided, can only provide relative depth information, and thus there is insufficient information to describe the scene fully.
There are three major approaches to stereo matching, namely structural matching, feature matching and intensity based matching. The distinction between these approaches is the choice of primitive rather than the method of matching, since the method of matching is generally dependent upon the choice of primitive. Each of the three methods involves a differing degree of pre-processing in order to obtain a desirable primitive representation of the image. Intensity or area-based matching involves almost no pre-processing other than perhaps image smoothing. In contrast, deriving the structure of a scene involves a higher degree of abstraction.
Structural matching, also known as high level or relational matching, uses high-level features such as regions, bodies or the relationship between features as a matching primitive. High-level features have some kind of semantic description of the object and can be represented in various forms such as graphs, stars, networks and circuits. The distinctive characteristic of these representations is the existence of a hierarchical structure. For example, one known approach is to group high-level features into a hierarchical structure comprising bodies, surfaces, curves, junctions and edgels. The body feature is the highest level and is formed by several surfaces located in the hierarchy one level below. Surfaces are formed by curves and junctions. The lowest level consists of edgels which make up the curves and junctions. The highest level in a structure, in this example the body, has the most distinctive attributes and should result in a less ambiguous match. Matching is then traversed down the hierarchy until the lowest level is reached. A star structure approach can be used to define at a node the relationship with all neighbouring nodes including the node itself plus all the links to the neighbouring
nodes. The advantage of structural matching as a whole is the ability to avoid local mismatches, and this leads directly to a meaningful 3D description of the scene. Views with larger separations and transformations are more likely to be matched using structural matching than they are using other primitives.
Feature matching starts from the basis that correspondence cannot take place at all points in the image and can only be applied to those points which can be identified without ambiguity in the two images.
Features are usually detected by the application of
"interest" operators such as those proposed by Moravec or edge detectors such as those proposed by Marr and Hildreth. Stereo matching using features results in a huge reduction of matching candidates and the result obtained from matching can be further improved by extensive operations to remove ambiguities, such as figural continuity for edges. However, feature based stereo analysis produces only a sparse depth map and this is often regarded as the techniques main drawback.
Intensity based matching has, despite many arguments against it, enjoyed some success. One known method of intensity matching is an application of a statistical paradigm of combining independent
measurements. Many measurements are combined statistically to produce a more robust indication of correspondence than is possible with fewer measurements. In short, the improvement arises from the association of more attributes to each matching primitive. However, the intensity based method is generally more time-consuming due to the vast numbers of matching candidates and one of the major setbacks is its inability to handle homogeneous or uniform brightness areas. These areas do not have gray level variation which is essential to correlation measurement. Another disadvantage is the need to define a local area in which correspondence is sought. The size of the local area, usually in the form of a correlation mask, is crucial to the method and yet it is always chosen arbitrarily.
Methods have also been developed that match both edges and intensity. An example of one such method is that developed by Baker and Binford. The Baker and Binford method first matches edges and false matches are then made unambiguous by a global connectivity check. After having obtained the edge disparities, the disparities are then used as references to carry out the intensity correlation. This method produces dense results and adheres to two distinct steps of local matching followed by a global refinement step.
The present invention resides in the realisation that an image can be represented by combined edge and shading data. As such, the invention enables encoded data to be processed directly without the need first to reconstruct the image and thus offers significant advantages in terms of processing overheads.
According to one aspect of the invention there is provided an image processing system in which an acquired image is processed to identify edges in the image and to represent the intensity profile of image portions between detected edges as a respective mathematical expression, thereby to reduce the amount of data used to define the image.
According to another aspect of the invention there is provided an image processing system comprising an image source for supplying digital electronic image data, an edge detector for detecting edges in the supplied image and for creating an edge map therefrom, and an integrating processor for combining in the edge map data mathematical expressions representing the intensity of image portions between the detected edges, thereby to reduce the amount of data defining the image.
According to a further aspect of the invention there is provided a method of encoding data representing an image, the method comprising smoothing
initial image data to suppress noise and fitting a continuous equation to image intensity profile portions bounded by abrupt intensity changes.
Furthermore, the invention provides a multiple view vision system in which features in different images representing a scene viewed from different respective locations are matched by comparing one set of encoded data representing intensity profiles for image portions defined between abrupt intensity changes in one image with a similar set of data representing another image.
Moreover, the invention provides a system for processing image data, the system comprising acquiring means for acquiring at least one image, first storing means for temporarily storing data representing the acquired image, detecting means for detecting edges in the acquired image, and defining means for defining intensity profiles between the detected edges as respective mathematical expressions on a line by line basis.
Thus, the present invention aims to produce a signature which combines feature and non-feature points whenever possible in an image. In this respect, a feature point may be regarded as an active element representing an abruptly changing feature such
as an edge, and a non-feature point may be regarded as a picture element representing, together with other picture elements in its vicinity, a slowly changing feature such as a change in surface shade. An advantage of using feature points is that they provide, as it were, rigid terminals or anchors between which non-feature points may be defined as additional image attributes.
The above and further features of the invention are set forth with particularity in the appended claims and together with advantages thereof will become clearer from consideration of the following detailed description of an exemplary embodiment of the invention given with reference to the accompanying drawings.
In the drawings:
Figure 1 is a schematic view of a system according to the invention;
Figure 2 shows (a) an image portion and (b) an intensity profile associated with the image portion;
Figure 3 illustrates image geometry;
Figure 4 shows a weighting function for deemphasising edges;
Figure 5 shows corresponding intensity profiles in two differently viewed images of the same scene;
Figure 6 is a flow diagram of a multiple pass matching technique;
Figure 7 is an image restoration algorithm;
Figure 8 is an image edge decompression algorithm.
Before describing the embodiment, reference is made to the adaptive vision based controller disclosed in international patent application no. WO 89/01850 now assigned to us, the teachings of which are incorporated herein by reference. It should be noted that the embodiment to be described can be incorporated into said controller or indeed into any other suitable machine vision system as required.
Turning now to Figure 1 of the accompanying drawings there is shown an image data encoding and compression system 1. Image data from an image source 2 which may be an electronic camera for example is input to a smoothing circuit 3. Encoding is carried out in two stages. In the first of these stages the image data is smoothed to suppress noise and edges are detected by an edge detector 4, and in the second stage, as will be described in greater detail hereinafter, a polynomial is fitted to shading information between detected edges, which information is held in a shading store 5.
First, the image data is subjected to smoothing by the smoothing circuit 3. The smoothing circuit 3
uses a standard convolution such as a Gaussian convolution to suppress noise such as spikes or other glitches in the incoming image data. The smoothing circuit 3 delivers the smoothed image to the edge detector 4 which is arranged to detect edges as sharp intensity changes or discontinuities using any suitable known method of edge detection. Global or two dimensional edge detection is preferred, though linear scan line or one dimensional edge detection can instead be used. The reason why one dimensional edge detection can be used is that, as will become clearer from the description that follows, the encoding technique only preserves non-horizontal edges, assuming a horizontally scanned image raster, and does not preserve horizontal edges.
The edge detector 4 outputs an edge map which represents edges detected in the image and which is held in any suitable store 6. Once the edge map has been created it is used by an integrator 7 to define boundaries or anchor points in the image. These anchor points define positions in the image between which a polynomial function can be fitted to the shading profile of the image. The polynomial function is preferably obtained by least-square fitting to the shading profile.
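By way of illustration only, the two encoding stages described above might be sketched for a single scan line as follows. The kernel width, the gradient threshold and the function names are assumptions made purely for illustration; the embodiment itself may employ any suitable known smoothing and edge detection method, including two dimensional operators. The import below is assumed by the later sketches in this description.

import numpy as np

def smooth_line(line, sigma=2.0):
    # Gaussian convolution suppresses spikes and other glitches (circuit 3).
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(line, kernel, mode="same")

def detect_edges(line, threshold=10.0):
    # One dimensional edge detection: mark sharp intensity changes (detector 4).
    gradient = np.abs(np.diff(line))
    return np.nonzero(gradient > threshold)[0]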
Figure 2 of the accompanying drawings shows (a) an exemplary image 10 and (b) an exemplary intensity profile 11 along a horizontal scan line 12 in the image 10. The image 10 includes edges 13 which are detected by the edge detector 4 and areas of shading, ie. varying or constant intensity between the edges 13. As can be seen from Figure 2, points on the line 12 corresponding to edges 13 in the image are seen as discontinuities at X0, X1, X2, X3, X4 and X5 in the intensity profile 11. Between these points the intensity profile is constant or continuously and smoothly varying. The intensity profile portions between X0 and X1, between X1 and X2, and so on can each be represented by a polynomial equation, represented in Figure 2 as I1(x), I2(x), I3(x), I4(x) and I5(x).
There are several advantages in employing a polynomial function in the definition of intensity profile between two edges. Firstly, a polynomial can approximate to a large number of pixels using only a few parameters. Secondly, least-square fitting with a polynomial reduces noise, such as interference spikes and camera noise. Furthermore, very slight intensity variations due to surface texture, which are of course undesirable, are also removed. Thirdly, a polynomial fit is easily implemented by numerical algorithms on any suitable computer or image processor.
Nevertheless, the application of a polynomial least-square method is not without difficulties. The intensity profile along the entire length of a scanline is a complex curve, and this curve cannot be represented simply by a polynomial. However, the present embodiment overcomes this problem by segmenting the scan line into several portions, each represented by a low order polynomial; low order polynomials are preferred because of their stability. The joints between these segments correspond to edges in the image and therefore correspond also to discontinuities in the intensity profile of the image. This ensures that the low order fitted polynomial will be accurate because there will be no discontinuities within the segment of the intensity profile to which the polynomial function is being fitted. Since the polynomial function is fitted strictly to the profile in-between edge points, the condition of smoothness can be well satisfied.
Each intensity profile portion is approximated by a polynomial function Ii(x) as follows:
Ii(x) = a0 + a1x + a2x^2 + ... + anx^n

for the sample points x0 to xs in each portion.
Thus, each line in the image is expressed as a collection of edge coordinates x1 ... x5 for example
interleaved with polynomial or other continuous equations defining the intensity profile between consecutive edge coordinates. Once the image data has been reduced to this form it can be used in a wide range of different image processing applications.
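A minimal sketch of this per-line encoding, assuming the smoothing and edge detection sketches above and a fixed low polynomial order chosen for illustration, might read:

def encode_line(line, edges, order=2):
    # Reduce a scan line to edge coordinates interleaved with low order
    # polynomials least-square fitted to the shading between the edges.
    bounds = [0] + [int(e) for e in edges] + [len(line)]
    segments = []
    for x0, x1 in zip(bounds[:-1], bounds[1:]):
        if x1 <= x0:
            continue
        x = np.arange(x0, x1)
        deg = min(order, len(x) - 1)          # lower the order for short segments
        coeffs = np.polyfit(x, line[x], deg)  # least-square fit to the shading
        segments.append((x0, x1, coeffs))
    return segments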
In many if not all hitherto known encoding schemes it is necessary to reconstruct the image from the encoded data before further analysis can be done. The present encoding scheme makes it unnecessary in many cases to reconstruct the image and this is a significant advantage in terms of increased processing speed. For example, if an edge map for, say, a part of the image is required for use in, say, feature matching in stereo analysis, the map can quickly be constructed from the encoded data simply by reading the edge coordinate data x1 ... x5. Indeed, in some circumstances it will be possible to dispense with any form of image reconstruction and to work instead exclusively on the encoded data.
For example, consider a stereo matching system in which one view of a scene includes a profile portion

I1(xl) = a0 + a1xl + a2xl^2 + ... (1)

for sample points xl0 to xls, and another view of the scene includes a profile portion

I2(xr) = b0 + b1xr + b2xr^2 + ... (2)

for sample points xr0 to xrt.
The number of samples involved in generating I1 and I2 is usually different, ie. s ≠ t. The independent variables xl and xr denote the horizontal coordinates of the left and right images respectively.
Since the left and right segment profiles are largely the same for Lambertian surfaces,

I1(xl) = I2(xr), (4)

and the geometrical transformation relating the left and right images can be written as

xr = h(xl), (5)

which can also be represented by a polynomial.
Since most surfaces can be approximated by a quadratic function, a function of up to second order for h(xl) is sufficient.
Given different choices of h(xl), a match is established if the criterion function

Σ {I1(xl) - I2(h(xl))}^2

is minimised.
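A sketch of this criterion follows, here restricted for simplicity to an affine transform h(xl) = h0 + h1*xl searched over a coarse grid; the parameter ranges and names are assumptions made for illustration, and a full implementation might extend the search to second order as noted above.

def match_profiles(a, b, x_l, h0_range, h1_range):
    # a, b: coefficients of I1 and I2 (highest power first, as np.polyval expects).
    # Find the transform h minimising the summed squared intensity difference.
    best_err, best_h = np.inf, None
    for h0 in h0_range:
        for h1 in h1_range:
            x_r = h0 + h1 * x_l
            err = np.sum((np.polyval(a, x_l) - np.polyval(b, x_r)) ** 2)
            if err < best_err:
                best_err, best_h = err, (h0, h1)
    return best_h, best_err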
Another example of the use to which the encoded data can be put is qualitative shape analysis. It will be assumed that the objects in the scene have near Lambertian reflectance (ie. substantially diffuse reflection as opposed to specular reflection) and that the change in observed intensity is negligible with respect to the viewing angle. As shown in Figure 3 of the accompanying drawings, the intensity I of a Lambertian surface 30 under orthogonal projection is given by
I = ρS(N.L)
where ρ is the surface albedo, which is constant across a strip 31 on the surface 30 because any discontinuity in ρ would also appear as an intensity discontinuity and would be detected by the edge detector;
S is the intensity of the incident light whose variation across the strip is negligible;
and N and L are the space vectors of the surface normal and the direction of the incident light respectively. The dependency of the expression on image coordinate space is omitted for the sake of clarity. The intensity I and the surface normal N are different along the strip. The curvature of the strip is given by the derivative of the surface normal along the scanline direction and can be related to the derivative of the intensity as follows:
dI = ρS(dN.L) + ρS(N.dL)

Since L is from a distant light source, its variation across the strip is extremely small, and therefore dI can be approximated to

dI = ρS(dN.L)

Similarly, the second derivative of intensity d^2I can be approximated to

d^2I = ρS(d^2N.L)
It will therefore be appreciated that the intensity derivatives dI and d^2I correspond to the order of the curvature of the strip 31. If the strip 31 has a planar curvature, dN is constant and therefore dI is also constant. It follows that the second and higher intensity derivatives will be zero for a planar surface. If, however, a strip has a surface which is defined by a second order polynomial, then d^2I will not be zero. It follows that if d^2I is not zero the strip is non-planar.
Thus it is possible to classify the curvature of a strip as planar or non-planar based on the polynomial representation of the strip. Under normal lighting conditions, where the light source has both ambient and directional components, shading will be caused by the directional component of the source. Ambient light is uniform or diffuse and will not contribute to shading. With surface shading, the curvature of a surface along an axis, eg its x axis, can be estimated, and since the polynomial representation incorporates photometric information relating to the strip, the strip can be used directly for shape analysis. Knowledge about the shape of the surface enables the general disparity function to be applied with the appropriate order, and thus the most important use of this preprocessing step is to classify a strip as either planar or non-planar.
In order to use the polynomial representation of the combined profile for curvature classification, the polynomial representation is extracted from the image using a trapezoidal weighting function, as shown in Figure 4. The purpose of this weighting function is to suppress the influence of sudden changes in intensity at edges in the image. To this end, the weighting function is at a maximum in a central area between edges and tapers to a minimum in the vicinity of detected edges. This weighting function is first applied to the intensity profile and a suitably adjusted polynomial representing the profile is then calculated as previously described, or as a product of the weighting function and a previously calculated polynomial.
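By way of illustration, such a weighted fit might be sketched as follows; the taper width and the minimum weight level are assumptions chosen for illustration only.

def trapezoid_weights(n, taper=4, minimum=0.1):
    # Maximum weight in the central area, tapering to a minimum near the edges.
    taper = max(1, min(taper, n // 2))
    w = np.ones(n)
    ramp = np.linspace(minimum, 1.0, taper)
    w[:taper] = ramp
    w[n - taper:] = ramp[::-1]
    return w

def weighted_fit(x, intensities, order=2):
    # Weighted least-square fit de-emphasising sudden changes at the edges.
    return np.polyfit(x, intensities, order, w=trapezoid_weights(len(x)))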
Once the weighted polynomial has been extracted, the classification criterion for classifying a surface as planar or non-planar is very simple. Any strip with an intensity profile represented by a polynomial of order higher than one is non-planar, and a strip is planar if its polynomial order is less than or equal to one. Local shading analysis suggests that a planar curve is one whose second order derivative is zero or, equivalently, whose intensity is at most first order.
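In code, the criterion might be expressed as below; the tolerance absorbing residual fitting noise is an assumption for illustration.

def classify_strip(coeffs, tol=1e-3):
    # coeffs as returned by np.polyfit (highest power first). A strip is
    # planar if every coefficient of order two and above is negligible.
    higher = coeffs[:-2]
    return "planar" if np.all(np.abs(higher) <= tol) else "non-planar"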
This technique of matching planar strips can be extended to deal also with non-planar surfaces in a scene. It is possible to estimate the function relating to the foreshortening of a non-planar surface strip by replacing the assumption xr = cxl with the assumption
xr = h(xl)
It can be shown that
h(xp) = I2^-1(I1(xp))
This equation involves finding the inverse of I2, which is not a trivial exercise. In practice it is also necessary to consider the effect of noise, and a minimisation approach is therefore adopted instead, the condition

d/d(δxl) Σ {I2(xl + δxl(xl)) - I1(xl)}^2 = 0 (17)

being solved for the disparity function δxl.
This equation can be solved by breaking it down into several stages. First of all, xr is solved given xl by successive approximation within a small neighbourhood, in accordance with the equation

xr = xl + δxl(xl)

such that {I2(xl + δxl(xl)) - I1(xl)}^2 is a minimum for x0 ≤ xp ≤ xt.
The disparity at xp is then given by δxl(xp). Next, since the disparity function of the strip should also be smooth, a polynomial

Σj dj x^j

is used to fit δx(x) at different samples of xp.
There is one point to be considered while fitting this disparity function. Since the disparities of the edge points are more accurate than those of the non-feature points, they must be given more weight to constrain the least-square solution. An unbiased fit will not guarantee that the function passes through these end points. A weighting function in the form of an inverted trapezoid is therefore appropriate; it should be noted that this weighting function is the reverse of that used for curvature analysis shown in Figure 4 of the drawings.
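A sketch of this inverted trapezoid weighting, with the edge weight level assumed for illustration, might read:

def fit_disparity(x, dx, order=2, taper=4, edge_weight=10.0):
    # Edge point disparities are the most accurate, so they receive the
    # greatest weight, constraining the least-square solution at the ends.
    n = len(x)
    taper = max(1, min(taper, n // 2))
    w = np.ones(n)
    ramp = np.linspace(edge_weight, 1.0, taper)  # heavy at the edge points
    w[:taper] = ramp
    w[n - taper:] = ramp[::-1]
    return np.polyfit(x, dx, order, w=w)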
Another example of the use to which the encoded data can be put is quantitative shape analysis. When a surface in a scene is viewed from two different viewing positions there will be a difference in the geometry of the surface between the two views and this will result in a different intensity profile polynomial being defined for the same feature in the two different views. An example of this effect is shown in Figure 5 of the accompanying drawings. It is possible from the image geometry to calculate a function relating the planar surface between the two images and it can be shown that

xr = cxl + d ... (3)

where c = t/s, t = no. of pixels in xr and s = no. of pixels in xl.
From equations (1), (2) and (3) above it can be shown that the corresponding polynomial coefficients are related by powers of c, the coefficients of order n satisfying an = bnc^n (the offset d being neglected). It is unrealistic to expect all coefficient ratios to be identical exactly, and so instead the system is arranged to accept the two equations (1) and (2) for a planar surface to be matched if the following condition is satisfied:
|an - bnc^n| ≤ ε (4)
where ε is a preset threshold.
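Expressed as a sketch (the coefficient ordering and the names being assumptions for illustration):

def planar_match(a, b, c, eps):
    # a, b: coefficients a0..an and b0..bn, lowest power first.
    # Accept the match if |an - bn*c**n| <= eps for every order n.
    return all(abs(an - bn * c ** n) <= eps
               for n, (an, bn) in enumerate(zip(a, b)))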
The need for setting an appropriate value of the threshold ε for the matching criterion function as discussed in relation to equation (4) can be eliminated by using a multiple pass algorithm such as is shown in Figure 6 of the accompanying drawings.
As can be seen from Figure 6, a small threshold ε1 is first chosen for matching candidates along the scanline. This sets a very strict test to be passed and, under such a stringent criterion, very few pairs of matches will normally occur. The value of the threshold ε is then progressively relaxed for subsequent passes of matching for the same scanline until an upper limit ε2 is reached, or all candidates are matched. The reliability of the matching varies with the value of the threshold ε. For instance, a smaller threshold will produce a more reliable result; a reliability factor can thus be assigned to each matched result and this will facilitate further refinement if required.
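The multiple pass scheme might be sketched as follows, assuming each candidate is a (left, right, residual) triple and that the reliability factor is simply the reciprocal of the threshold at which the match was accepted; both assumptions are made for illustration only.

def multipass_match(candidates, eps1, eps2, passes=5):
    # Relax the threshold progressively from eps1 (strict) to eps2 (loose);
    # matches accepted at a smaller threshold receive a higher reliability.
    matched = {}
    for eps in np.linspace(eps1, eps2, passes):
        for left, right, residual in candidates:
            if left not in matched and residual <= eps:
                matched[left] = (right, 1.0 / eps)  # reliability factor
        if len(matched) == len(candidates):
            break
    return matched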
It is of course often desirable to be able to reconstitute the image in the form it takes after filtering by the smoothing circuit 3. The procedure for decoding the entire image is shown as an algorithm in Figure 7 of the accompanying drawings. The reconstruction represented by this algorithm proceeds line by line, with the function "polynomial" reconstructing the intensity profile for each point on the line between edge points.
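Matching the encode_line sketch above, such a line by line reconstruction might read as follows, the np.polyval call playing the role of the "polynomial" function of Figure 7.

def decode_line(segments, length):
    # Rebuild the smoothed intensity profile between consecutive edge points.
    line = np.zeros(length)
    for x0, x1, coeffs in segments:
        x = np.arange(x0, x1)
        line[x] = np.polyval(coeffs, x)
    return line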
Alternatively, the edge map alone may be reconstituted, by way of the algorithm shown in Figure 8 of the accompanying drawings.
Having thus described the present invention by reference to a preferred embodiment, it is to be understood that the embodiment in question is exemplary only and that modifications and variations such as will occur to those possessed of appropriate knowledge and skills may be made without departure from the spirit and scope of the invention as set forth in the appended claims and equivalents thereof.