(Stereoscopic Displays and Applications VII, San Jose, California, February 1996, Proceedings of the SPIE volume 2653A)

3D Video Standards Conversion

Andrew Woods 1, Tom Docherty 2 and Rolf Koch 3

1 Centre for Marine Science and Technology,
Curtin University of Technology,
GPO Box U1987, Perth 6845, AUSTRALIA
WWW: http://www.AndrewWoods3D.com

2 School of Electrical and Computer Engineering,
Curtin University of Technology
GPO Box U1987, Perth 6845, AUSTRALIA

3 School of Physical Sciences, Engineering & Technology,
Murdoch University,
South Street, Murdoch 6150, AUSTRALIA

ABSTRACT

This paper discusses the conversion of 3D video between the three world video standards of NTSC, PAL and SECAM. An overview is given of the five main methods of achieving 3D with consumer video and the principles of video standards conversion are discussed. A solution for converting field-sequential 3D video between standards is presented and a number of other advantages which the system offers are discussed.

Keywords: 3D video, stereoscopic video, field-sequential, standards conversion, PAL, NTSC, SECAM.

1. INTRODUCTION

The most commonly used format for 3D video is the field-sequential method. In what has become a de facto standard, the left and right images are stored in the even and odd fields of the video signal. The standard is popular because it uses relatively simple equipment to generate and display the 3D video signal and also because the 3D video signal can be stored on a single video tape.

Unfortunately, despite its simplicity, field-sequential 3D video cannot be converted between the 60Hz (NTSC) and 50Hz (PAL & SECAM) video standards using conventional standards converters. Most, if not all, video standards converters corrupt the 3D content of the 3D video signal by mixing the odd and even fields and the output signal is unviewable in 3D in the new standard.

2. BACKGROUND

There are three main video standards in use around the world today - NTSC, PAL and SECAM 1. NTSC is used extensively in North America and Japan, and PAL is used primarily in Europe and Australia. SECAM is a French-developed system. The remaining countries around the world are fairly evenly distributed between these three standards, mainly for political reasons.

The standards differ in three main respects: the field rate (number of fields per second), the number of lines per frame and the method of encoding colour. The parameters for each of the three standards are summarised in Table 1.

                  NTSC          PAL               SECAM
field rate        60Hz          50Hz              50Hz
lines/frame       525           625               625
colour encoding   QAM 3.58MHz   QAM PAL 4.43MHz   FM 4.25, 4.40MHz

Table 1: Differences between World Video Standards

All three standards use 2:1 interlacing. This means that each frame of 525 or 625 lines is scanned in two parts called fields. Firstly all the odd numbered lines are scanned (called the odd field) and then the even numbered lines are scanned (called the even field). Therefore there are 262.5 lines per field in the 60Hz standard (NTSC) and 312.5 lines per field in the 50Hz standards (PAL & SECAM). Interlacing is used because it allows a high vertical resolution with a low amount of flicker while keeping the signal bandwidth to a minimum.
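
The interlacing arithmetic above can be sketched in a few lines. This is only an illustration: the function name split_fields and the toy 10-line frame are assumptions made for the example, not part of any video equipment.

```python
def split_fields(frame_lines):
    """Split a frame (list of scan lines, line 1 first) into its two fields."""
    odd_field = frame_lines[0::2]   # lines 1, 3, 5, ...
    even_field = frame_lines[1::2]  # lines 2, 4, 6, ...
    return odd_field, even_field


# A toy frame of 10 numbered lines (hypothetical, far smaller than real video).
frame = list(range(1, 11))
odd, even = split_fields(frame)
print("odd field :", odd)
print("even field:", even)

# Lines per field for the real standards: half the lines per frame.
lines_per_field = {"NTSC": 525 / 2, "PAL/SECAM": 625 / 2}
for name, value in lines_per_field.items():
    print(name, value, "lines per field")
```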

3. 3D VIDEO STANDARDS

There are five main techniques by which 3D/stereoscopic imagery can be encoded onto a standard video signal. These are field-sequential, sidefields (side-by-side), subfields (over-under), separate channels and anaglyph. The field-sequential and sidefield methods are the most commonly used today.

3.1 Field-Sequential

In this system, video fields are alternately encoded with right or left information. The popularity of this system is a result of its simplicity. Field-sequential 3D Video is easily generated from a pair of genlocked video cameras by using a video multiplexer which selects odd fields from the right camera and even fields from the left camera. The 3D video signal can be recorded and played back with standard video cassette recorders (VCRs) and it can be viewed in 3D quite simply using a standard television, a pair of liquid crystal shutter glasses and a small device which synchronises the glasses with the left and right images being displayed on the screen.
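
The multiplexing step described above can be sketched as follows. The sketch assumes each camera frame is available as an (odd field, even field) pair and uses text labels in place of image data. The function name multiplex_field_sequential is hypothetical.

```python
def multiplex_field_sequential(left_frames, right_frames):
    """Build a field-sequential stream from two genlocked cameras.

    Each frame is an (odd_field, even_field) pair. The odd field is taken
    from the right camera and the even field from the left camera.
    """
    stream = []
    for (_l_odd, l_even), (r_odd, _r_even) in zip(left_frames, right_frames):
        stream.append(("odd", r_odd))
        stream.append(("even", l_even))
    return stream


# Labels stand in for image data: "L1o" = left camera, frame 1, odd field.
left_frames = [("L1o", "L1e"), ("L2o", "L2e")]
right_frames = [("R1o", "R1e"), ("R2o", "R2e")]
stream = multiplex_field_sequential(left_frames, right_frames)
print(stream)
```

Half of each camera's fields are discarded by this selection, which is why each eye receives only half the overall field rate.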

Field-sequential 3D video does have a problem with flicker when used with a standard television because each eye only receives half the overall field rate (25Hz for PAL and 30Hz for NTSC). The flicker problem can be overcome by using commercially available field doublers 2.

With field-sequential 3D video there are two polarities by which left and right images can be stored in the odd and even fields. Most companies have chosen to store right images in the odd fields and left images in the even fields (3DTV Corporation, VRex Inc, Virtual I/O, SOCS Research, etc). Some systems, however, use the opposite polarity, e.g. the Toshiba 3D camcorder. The result of this is that 3D video generated with one system cannot be viewed correctly on a system with the opposite polarity. The incorrect image will be sent to each eye and a pseudoscopic (reversed stereo) image will be seen by the viewer and incorrect depth information will be perceived. Some systems are, however, compatible with both polarities by changing an external switch.

3.2 Sidefields

The sidefield method (sometimes called the side-by-side method) stores the left and right images side-by-side on the left and right halves of the video signal. There are two ways in which this can be done: (a) with the left and right images squeezed in the horizontal direction by a factor of two or (b) without the 2:1 squeezing. The former is the main system in use today.

The squeezing method has the advantage that it allows the 4:3 aspect ratio (ratio of image width to image height) of the left and right images to be retained (after they are unsqueezed). Digital video electronics are used to squeeze the left and right images from a pair of video cameras to generate the sidefield 3D video signal. Digital video electronics are again used at the display to convert the sidefield format signal into a 120Hz field-sequential signal. The sidefield 3D video signal can be recorded and played back with a standard VCR. To our knowledge, the generation of 3D video in this format is only supported by equipment available from StereoGraphics Corporation 3 (San Rafael, California). 3D video in this format can also be displayed on some equipment available from 3DTV Corporation (San Rafael, California).
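
As a rough illustration of the 2:1 horizontal squeeze, the sketch below halves the width of one scan line from each camera and places the results side by side. Averaging adjacent pixel pairs is an assumption made for simplicity. The filtering used by actual sidefield equipment is not described here, and the function names are hypothetical.

```python
def squeeze_2to1(line):
    """Halve a scan line's width by averaging adjacent pixel pairs."""
    return [(line[i] + line[i + 1]) / 2 for i in range(0, len(line) - 1, 2)]


def make_sidefield_line(left_line, right_line):
    """Left image on the left half of the line, right image on the right half."""
    return squeeze_2to1(left_line) + squeeze_2to1(right_line)


def split_sidefield_line(line):
    """Recover the two squeezed halves at the display end."""
    half = len(line) // 2
    return line[:half], line[half:]


left_line = [10, 10, 20, 20]
right_line = [1, 3, 5, 7]
combined = make_sidefield_line(left_line, right_line)
print(combined)                        # same width as one original line
print(split_sidefield_line(combined))  # squeezed left and right halves
```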

The sidefield method without the 2:1 horizontal image squeeze is generally only used for amateur purposes because the images have a vertically narrow aspect ratio of 2:3. Sidefield 3D video in this format is generally produced by a single video camera fitted with an optical beam splitter. This is basically a device containing four mirrors and is quite commonly used in 3D still photography. The image is viewed in 3D either by free-viewing the stereo-pair or by viewing the display while using some optical aid (containing either mirrors or lenses).

3.3 Subfields

The subfield format (sometimes called the over-under format) is used extensively in computer graphics as the primary method of producing stereoscopic imagery from computers with standard computer graphics hardware. Basically the left and right images are stored in the top and bottom halves of the video signal. The left and right images are squeezed in the vertical direction by a factor of two.

We are only aware of one system which used this format with standard video. It was developed by StereoGraphics Corporation and implemented in the NTSC video standard 4. This system was discontinued several years ago when StereoGraphics' sidefield system was released 3. The use of the subfield format with standard video is not supported by any currently available equipment.

Since this system was invented before inexpensive digital video electronics were available, it required the use of specially modified video cameras to generate the subfield format 3D Video. The signal was displayed on a monitor whose vertical deflection scanned at twice the normal rate so that the left and right images were displayed overlapping each other. The image was then viewed through a pair of shutter glasses which were driven in synchronisation with the left and right images being displayed on the screen. This system had the big advantage that the 3D imagery was displayed flicker-free, however this advantage was offset by the complexity of the cameras. The subfield format 3D video signal can also be recorded and played back with a standard VCR.

3.4 Separate Channels

This system maintains two separate video signals - one containing the left images (from the left video camera) and one containing the right images (from the right video camera). There are two main problems associated with this system: recording/play-back and display. The two signals must be recorded and played back with a pair of synchronised video cassette recorders. The synchronisation capability is generally only found on professional level video cassette recorders. The signals must also be displayed on a dual channel stereoscopic display device (such as a pair of video projectors or a dual monitor stereo display). This system does, however, have the advantage of maintaining full video resolution from both cameras.

3.5 Anaglyph

This system encodes the left and right images by way of colour. The left and right images are stored as two different primary colour channels - usually red and blue. The 3D image is viewed by wearing a pair of glasses with appropriately coloured filters in the eye pieces. This system does not work particularly well with video because of the low bandwidth allocated to the colour signal and the way in which the colour is encoded in the signal. A full colour stereoscopic image cannot be displayed with this technique.
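
A minimal sketch of red/blue anaglyph encoding is given below. Pixels are (R, G, B) tuples in nested lists. Assigning red to the left image and blue to the right image is an assumption for the example, as is the function name make_anaglyph.

```python
def make_anaglyph(left_rgb, right_rgb):
    """Combine two images: red channel from the left, blue from the right.

    Assumption: left eye = red, right eye = blue. The green channel is unused.
    """
    return [
        [(l_px[0], 0, r_px[2]) for l_px, r_px in zip(l_row, r_row)]
        for l_row, r_row in zip(left_rgb, right_rgb)
    ]


left_image = [[(200, 150, 100), (10, 20, 30)]]
right_image = [[(90, 80, 70), (40, 50, 60)]]
anaglyph = make_anaglyph(left_image, right_image)
print(anaglyph)
```

Each output pixel keeps only one colour channel from each view, which is one way to see why a full colour stereoscopic image cannot be carried by this technique.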

4. STANDARDS CONVERSION

In order for a video signal to be converted to another standard, three aspects of the video signal may need to be changed - field rate, lines/frame and colour encoding. When converting PAL to SECAM, it is only necessary to change the colour encoding of the video signal (since the field rate and the number of lines per frame are the same). When converting from NTSC to PAL, however, it is necessary to change all three parameters.

Conversion of the colour encoding method is a fairly simple process and can be relatively easily achieved using linear analog electronics. Unfortunately, the process of changing the field rate and the number of lines per frame is more complicated and is generally performed using digital electronics. There are three main ways in which the number of fields per second and the number of lines per field are converted: Field/Line Omission/Duplication, Field/Line Interpolation and Motion Estimation.

4.1 Field/Line Omission/Duplication

This is the simplest process and requires the least complicated electronics. In what can be considered a two step process, the number of lines per field is first converted to the new number and then the number of fields per second is converted. In a PAL to NTSC conversion, firstly the number of lines per field is converted from 312.5 lines per field to 262.5 lines per field. This is done by omitting one line from every six. This is illustrated in Figure 1(a). The field rate is then converted from 50 fields per second to 60 fields per second. This is done by duplicating or repeating one field in every five. Note that because each field now consists of only 262.5 lines it is possible to display 60 fields per second. This is illustrated in Figure 1(b). With an NTSC to PAL conversion, it is necessary to repeat one in every five lines and omit one in every six fields to obtain 312.5 lines per field and 50 fields per second.

Figure 1: PAL to NTSC conversion. (a) Omission of lines when converting a 312.5 line field to a 262.5 line field (b) Duplication of fields when converting from 50Hz to 60Hz.

This is the simplest and lowest quality conversion technique. It introduces some conversion artefacts especially when motion is present in the scene. Subjectively the conversion is acceptable.
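
The omission and duplication steps for a PAL to NTSC conversion can be sketched as below. Lines and fields are represented by integers, and a whole number of lines (312) stands in for the 312.5 lines of a real field, so the line count is only approximate. The helper names are hypothetical.

```python
def omit_one_in(items, n):
    """Drop every n-th item (n=6 omits one line from every six)."""
    return [x for i, x in enumerate(items, start=1) if i % n != 0]


def repeat_one_in(items, n):
    """Repeat every n-th item (n=5 duplicates one field in every five)."""
    out = []
    for i, x in enumerate(items, start=1):
        out.append(x)
        if i % n == 0:
            out.append(x)
    return out


# Step 1: reduce the number of lines in one field.
pal_field_lines = list(range(1, 313))        # 312 whole lines (approximation)
ntsc_field_lines = omit_one_in(pal_field_lines, 6)
print(len(pal_field_lines), "->", len(ntsc_field_lines), "lines")

# Step 2: increase the number of fields in one second.
pal_fields = list(range(1, 51))              # 50 fields
ntsc_fields = repeat_one_in(pal_fields, 5)
print(len(pal_fields), "->", len(ntsc_fields), "fields")
print(ntsc_fields[:7])                       # field 5 appears twice
```

The NTSC to PAL direction uses the same two helpers with the roles swapped: repeat_one_in(lines, 5) and omit_one_in(fields, 6).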

4.2 Field/Line Interpolation

In this method, individual lines and fields in the output standard are a product of several lines or fields of the input standard. This is an extension of the previous scheme where individual lines and fields in the output standard were based on single lines or fields from the input standard.

In a simple implementation of such a system, a new line in the output standard is calculated as a linear interpolation between two lines from the input standard. The particular input lines from which the output line is calculated and the weightings used are determined from the position in the scan where the output line must be generated. This is illustrated in Figure 2(a) which shows a PAL to NTSC conversion. For example, line 5 in the output standard is calculated as 24% of line 5 and 76% of line 6 from the input standard. This calculation continues such that the correct number of output lines is generated from the input lines. The conversion of the number of fields per second is a similar process and is illustrated in Figure 2(b). For example, output field number 3 occurs at t=2/60 seconds. It is calculated from input field numbers 2 and 3 (which occur at t=1/50 and t=2/50 seconds) at a weighting of 33% of field 2 and 67% of field 3.

Figure 2: PAL to NTSC conversion. (a) interpolation of the new line rate and (b) interpolation of the new field rate.
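
The two worked examples above (output line 5 and output field 3) can be reproduced with a small calculation. The sketch assumes that output sample 1 coincides with input sample 1 and that samples are evenly spaced. It ignores the half-line offset between interlaced fields. The function name interpolation_weights is hypothetical.

```python
from fractions import Fraction


def interpolation_weights(out_index, in_per_out):
    """Locate a 1-based output sample between two input samples.

    in_per_out is the spacing of output samples measured in input samples.
    Returns (lower_input_index, weight_of_lower, weight_of_upper).
    """
    pos = (out_index - 1) * in_per_out  # offset from input sample 1
    lower = int(pos)                    # whole input samples passed
    frac = pos - lower                  # fraction of the way to the next one
    return lower + 1, 1 - frac, frac


# Lines: 312.5 input lines span the same height as 262.5 output lines.
line_index, w_lo, w_hi = interpolation_weights(5, Fraction(625, 525))
print(f"output line 5 = {float(w_lo):.0%} of line {line_index}"
      f" + {float(w_hi):.0%} of line {line_index + 1}")

# Fields: output fields are 1/60 s apart, input fields 1/50 s apart.
field_index, f_lo, f_hi = interpolation_weights(3, Fraction(50, 60))
print(f"output field 3 = {float(f_lo):.0%} of field {field_index}"
      f" + {float(f_hi):.0%} of field {field_index + 1}")
```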

Four line/four field converters are also available. They work in a similar way to the process explained above except that each individual output line is based on a weighted average of four input lines and each individual output field is based on the weighted average of four input fields.

The performance of this conversion method with standard video is much better than the previous method, however some conversion artefacts are still evident, particularly with scene motion. It should be noted that the details we have provided give only a brief description of the process. A full explanation is contained in Sandbank (1990).

4.3 Motion Estimation

In this method, a motion vector array is calculated between consecutive fields in the video stream. The motion vector array shows how the objects move and change in the video image from field to field. When an output field is to be generated at a time interval between two input fields, the motion vector array is used to calculate a new intermediate field with a percentage of the motion between the two fields.
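
The idea can be illustrated with a heavily simplified sketch: a single horizontal shift is estimated for a whole one-dimensional "field", and an intermediate field is produced by applying a fraction of that shift. Real converters calculate an array of vectors over two-dimensional image regions. The matching criterion (mean absolute difference) and all function names here are assumptions for illustration only.

```python
def estimate_shift(prev, curr, max_shift=3):
    """Find the integer shift s for which curr[i] best matches prev[i - s]."""
    n = len(prev)
    best_shift, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        diffs = [abs(curr[i] - prev[i - s]) for i in range(n) if 0 <= i - s < n]
        cost = sum(diffs) / len(diffs)  # mean absolute difference over overlap
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift


def shift_line(line, s, fill=0):
    """Move the content of a line s samples to the right (left if negative)."""
    n = len(line)
    return [line[i - s] if 0 <= i - s < n else fill for i in range(n)]


def intermediate_field(prev, curr, fraction, max_shift=3):
    """Synthesise a field part-way between prev and curr along the motion."""
    s = estimate_shift(prev, curr, max_shift)
    return shift_line(prev, round(s * fraction))


prev_field = [0, 0, 9, 9, 0, 0, 0, 0, 0, 0]
curr_field = [0, 0, 0, 0, 9, 9, 0, 0, 0, 0]  # the bright object moved 2 samples
shift = estimate_shift(prev_field, curr_field)
halfway = intermediate_field(prev_field, curr_field, 0.5)
print(shift)
print(halfway)
```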

Motion estimation is generally only found in broadcast quality standards converters. The quality of conversion will obviously vary with the quality of the algorithm which calculates the motion vector array.

5. STANDARDS CONVERSION OF 3D VIDEO

Of all the 3D video methods mentioned, field-sequential 3D video is the only method which has serious problems with video standards conversion. This arises because the odd and even fields are used to store the different left and right images. The three conversion techniques described either upset the order in which left and right images are presented in the output standard or mix the left and right images to generate fields in the output standard. The problem actually lies with the field rate conversion process - the conversion of the line rate and colour encoding does not corrupt the signal. Therefore, field-sequential 3D video is only corrupted when a standards conversion is performed which changes the field rate. For example, SECAM<->PAL conversions do not corrupt field-sequential 3D video whereas NTSC<->PAL and NTSC<->SECAM conversions do corrupt field-sequential 3D video.

Three different types of problems occur with each of the three methods of standards conversion. These problems are illustrated in Figure 3 for a PAL to NTSC conversion. Figure 3(a) shows the native PAL field-sequential 3D video signal. The black and white squares represent the odd and even fields which contain right and left images. The field/line omission/duplication method (illustrated in Figure 3(b)) does not mix fields, but the field polarity of the output signal changes every five or six fields. It can be seen that where a field is duplicated (the two consecutive white fields or the two consecutive black fields), the field polarity changes. This obviously destroys the 3D effect. The field/line interpolation method corrupts the 3D information because it produces the output fields from a mixture of odd and even fields. As can be seen in Figure 3(c), very rarely in the output field sequence does a complete left image or complete right image exist. Generally the output fields are a mixture of left and right input fields (represented by the different shades of grey). The motion estimation method corrupts the field-sequential 3D video signal because output fields are motion estimated between consecutive odd and even fields and therefore between a pair of right and left images. The corruption of the 3D information would not occur if even output fields were motion estimated from a consecutive pair of even fields from the input standard. Unfortunately, this is not the case with currently available equipment.

Figure 3: Conversion problems with field-sequential 3D Video. (a) Native 3D-PAL signal (b) converted to NTSC using field duplication and (c) converted to NTSC using field interpolation.
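
Two of the problems shown in Figure 3 can be reproduced numerically. The sketch below labels PAL input fields "R" (odd) and "L" (even), applies field duplication, and then computes how much of each interpolated output field comes from left images. It assumes the simple two-field linear interpolation described in Section 4.2. The function names are hypothetical.

```python
from fractions import Fraction


def repeat_one_in(items, n):
    """Repeat every n-th item (one field in every five for 50Hz to 60Hz)."""
    out = []
    for i, x in enumerate(items, start=1):
        out.append(x)
        if i % n == 0:
            out.append(x)
    return out


# (b) Field duplication: right images start on odd fields, then flip.
pal = ["R" if i % 2 == 0 else "L" for i in range(10)]
ntsc = repeat_one_in(pal, 5)
on_odd_fields = ntsc[0::2]  # what lands on the odd output fields
print("".join(ntsc))
print(on_odd_fields)


# (c) Field interpolation: share of each output field taken from left images.
def left_share(out_field):
    pos = (out_field - 1) * Fraction(50, 60)  # position in input fields
    lower = int(pos)                          # 0-based index of earlier field
    frac = pos - lower
    share = Fraction(0)
    if lower % 2 == 1:          # odd 0-based index = even field = left image
        share += 1 - frac
    if (lower + 1) % 2 == 1:
        share += frac
    return share


shares = [left_share(k) for k in range(1, 7)]
print([str(s) for s in shares])
```

In this model only one output field in every six is made from a single input image. All the others blend a left and a right image.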

The separate channels method of 3D video has slight problems with standards conversion. Care must be taken that a time shift is not generated between the two channels when the conversion takes place. If a time difference is introduced between the two channels, temporal stereoscopic effects would be introduced which could upset the stereoscopic information. Ideally both channels would be converted simultaneously with a pair of standards converters (with synchronised output timebase) and with the input video signal coming from a pair of synchronised VCRs.

The 3D information in the other three 3D video methods (sidefields, subfields and anaglyph) is not corrupted by the standards conversion process. The only conversion artefacts present are the same as those present when converting normal (non-3D) video between standards but this does not corrupt the 3D information.

6. DISCUSSION

In order for field-sequential 3D video to be successfully converted between standards, it is important to keep the left and right images separate during the conversion process. It is also equally important that the left and right images are stored in the even and odd fields of the output standard.

We have extended the capabilities of a commercially available video standards converter to allow the successful conversion of field-sequential 3D video between the PAL, NTSC and SECAM video standards. The converter allows field-sequential 3D-PAL, 3D-NTSC and 3D-SECAM material to be converted to field-sequential 3D-NTSC or 3D-PAL. Particular care is taken to keep the odd and even fields separate in the conversion process and to ensure that the left and right images from the input standard are stored on the even and odd fields of the output video standard. Figure 4 shows how the converter can be used to convert field-sequential 3D-NTSC to 3D-PAL.

Figure 4: A block diagram of how the converter can be used to convert field-sequential 3D-NTSC to 3D-PAL.
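
The two requirements stated above (keep left and right images separate, and place them on the correct fields of the output) can be illustrated with the following sketch of a 3D-NTSC to 3D-PAL field rate conversion. This is not a description of the converter's internal design. It assumes the simple omission method of Section 4.1 applied to each eye's image stream independently, and the function names are hypothetical.

```python
def omit_one_in(items, n):
    """Drop every n-th item."""
    return [x for i, x in enumerate(items, start=1) if i % n != 0]


def convert_3d_ntsc_to_pal(fields):
    """Convert one second of field-sequential 3D video from 60 to 50 fields.

    Assumes right images on odd fields and left images on even fields.
    """
    right = fields[0::2]               # odd fields, 30 per second
    left = fields[1::2]                # even fields, 30 per second
    right_out = omit_one_in(right, 6)  # 25 per second
    left_out = omit_one_in(left, 6)    # 25 per second, same images dropped
    out = []
    for r, l in zip(right_out, left_out):
        out += [r, l]                  # right back on odd, left back on even
    return out


ntsc = []
for n in range(1, 31):
    ntsc += [f"R{n}", f"L{n}"]
pal = convert_3d_ntsc_to_pal(ntsc)
print(len(pal))
print(pal[8:14])
```

Because the same image pairs are dropped from both streams, every output frame still holds a matched right and left image.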

The advantage of using digital video and digital frame store technology in the implementation of a standards converter is that it also allows a number of other functions to be achieved. The converter can be used for (a) the conversion of field-sequential 3D video to 2D and (b) the conversion of field-sequential 3D video to the opposite field polarity (field inversion). In the 3D to 2D conversion mode the output video signal consists of only the odd fields (right images in the de facto polarity) or only the even fields (left images) of the original field-sequential 3D video signal, as chosen by the user. This mode could be used to convert a 3D video sequence to 2D so that the footage could be shown to an audience without the need for 3D viewing apparatus. Another application of this mode is for 3D video projection. If two converters are used along with two video projectors, one converter could be configured to provide the first video projector with left images only and the other converter could be configured to provide the second video projector with right images only. If polarising filters are placed in front of each of the projectors and a silvered projection screen is used, a stereoscopic video projection display would be achieved. This configuration is illustrated in Figure 5.

Figure 5: The use of two converters for polarised projection of field-sequential 3D video.
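
The 3D to 2D mode can be sketched as a selection of every second field. How the converter fills the remaining output fields is not stated here. Repeating each selected field once, as below, is an assumption that keeps the field rate unchanged. The function name to_2d is hypothetical.

```python
def to_2d(fields, keep="odd"):
    """Keep only the odd or only the even fields of a field-sequential stream.

    Assumption: each kept field is shown twice to preserve the field rate.
    """
    start = 0 if keep == "odd" else 1
    out = []
    for field in fields[start::2]:
        out += [field, field]
    return out


stream_3d = ["odd1", "even1", "odd2", "even2", "odd3", "even3"]
projector_a = to_2d(stream_3d, keep="odd")   # feed for the first projector
projector_b = to_2d(stream_3d, keep="even")  # feed for the second projector
print(projector_a)
print(projector_b)
```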

The field-reversal mode swaps the field polarity of the incoming video signal. Images stored on the odd fields are shifted to the even fields and vice versa. For example, this mode could be used to convert field-sequential 3D video which has been recorded with the Toshiba 3D camcorder (which stores left images in the odd fields) to the de facto standard field polarity (left images stored in the even fields). This configuration is illustrated in Figure 6.

Figure 6: Conversion of the field-polarity of field-sequential 3D Video.
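
A minimal sketch of field reversal is shown below. It assumes the swap is done within each frame (each odd/even pair). A one-field delay would be another way to move images onto the opposite fields, and which approach the converter uses is not specified here. The function name swap_field_polarity is hypothetical.

```python
def swap_field_polarity(fields):
    """Exchange the odd and even field of every frame in the stream."""
    out = list(fields)
    for i in range(0, len(out) - 1, 2):
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


# Left images on odd fields, as recorded by the camcorder mentioned above.
recorded = ["L1", "R1", "L2", "R2"]
converted = swap_field_polarity(recorded)
print(converted)  # right images now on odd fields, left images on even fields
```

Applying the function twice returns the original stream, so the same mode converts in either direction.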

The converter also acts as a time base corrector to stabilise the timing of the video signal and clean up the synchronisation signals.

7. CONCLUSION

We have described the effects standards conversion has on the various forms of 3D video. Most notably, field-sequential 3D video is the only method which encounters serious problems with conventional standards conversion. To solve this problem, we have presented a system which is capable of converting field-sequential 3D video between standards without corrupting the 3D information. This will ease the difficulties encountered when collaborative work is performed between researchers from different countries.

8. ACKNOWLEDGMENTS

This work was inspired by the wish to share field-sequential 3D video with people around the world. Thanks to all those people that provided that motivation - particularly the attendees and organisers of the Stereoscopic Displays and Applications Conferences. The authors would also like to thank Woodside Offshore Petroleum for their continued support of the stereoscopic video research being undertaken at Curtin University.

9. REFERENCES

1. Keith Jack, "Video Demystified: A Handbook for the Digital Engineer", Hightext Publications, California, 1993.

2. Andrew Woods, Tom Docherty and Rolf Koch, "Field Trials of Stereoscopic Video with an Underwater Remotely Operated Vehicle", Stereoscopic Displays and Applications V, Stereoscopic Displays and Virtual Reality Systems, J. Merritt, S. Fisher, Editors, Proceedings of the SPIE volume 2177, pp. 203-210, 1994.

3. Lenny Lipton, "Stereoscopic Real-Time and Multiplexed Video System", Stereoscopic Displays and Applications IV, J. Merritt, S. Fisher, Editors, Proceedings of the SPIE volume 1915, pp. 6-11, 1993.

4. Lenny Lipton, Lhary Meyer, "A Time-Multiplexed Two Times Vertical Frequency Stereoscopic Video System", 1984 SID International Symposium, Society for Information Display.

5. C.P. Sandbank, "Digital Television", John Wiley and Sons, Ltd., West Sussex, 1990.




Copyright on this document is retained by Curtin University. This document is not public domain. Permission is hereby given to reprint this paper on a non-profit basis for scholarly purposes provided the document is unaltered and this notice is intact. This paper may not be reprinted for profit or in an anthology without prior written permission. If you wish to reprint this paper on this basis, please contact the primary author at the address shown on the first page of this document.



Last modified: 16th February, 1996.
Maintained by: Andrew Woods