|
INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO
ISO/IEC JTC1/SC29/WG11N4980
Title: MPEG-7 Overview (version 8)
MPEG-7 Overview Executive Overview MPEG-7, formally named Multimedia Content Description Interface, is a standard for describing the multimedia content data that supports some degree of interpretation of the informations meaning, which can be passed onto, or accessed by, a device or a computer code. MPEG-7 is not aimed at any one application in particular; rather, the elements that MPEG-7 standardizes support as broad a range of applications as possible. More information about MPEG-7 can be found at the MPEG home page (http://mpeg.tilab.comcselt.it/) and the MPEG-7 Alliance website (http://www.mpeg-industry.com/). These web pages contain links to a wealth of information about MPEG, including much about MPEG-7, many publicly available documents, several lists of Frequently Asked Questions and links to other MPEG-7 web pages. This document gives an overview of the MPEG-7 standard, explaining which pieces of technology it includes and what sort of applications are supported by this technology. Also the current work towards MPEG-7 version 2 is presented.
Table of Contents
1. Introduction..................................................................................................................................... 2. Major functionalities in MPEG-7................................................................................................... 3. Detailed technical description of the MPEG-7 Technologies........................................................ 4. MPEG-7 Profiling............................................................................................................
............... 5. Current developments......................................................................................................
.............. References............................................................................................................................................ Annexes.................................................................................................................................................
1. Introduction The transition between the second and third millennium abounds with new ways to produce, offer, filter, search, and manage digitized multimedia information. Broadband is being offered with increasing audio and video quality and speed of access. The trend is clear: in the next few years, users will be confronted with such a large number of contents provided by multiple sources that efficient and accurate access to this almost infinite amount of content seems unimaginable today. In spite of the fact that users have increasing access to these resources, identifying and managing them efficiently is becoming more difficult, because of the sheer volume. This applies to professional as well as end users. The question of identifying and managing content is not just restricted to database retrieval applications such as digital libraries, but extends to areas like broadcast channel selection, multimedia editing, and multimedia directory services. This challenging situation demands a timely solution to the problem. MPEG-7 is the answer to this need. MPEG-7 is an ISO/IEC standard developed by MPEG (Moving Picture Experts Group), the committee that also developed the successful standards known as MPEG-1 (1992) and MPEG-2 (1994), and the MPEG-4 standard (Version 1 in 1998, and version 2 in 1999). The MPEG-1 and MPEG-2 standards have enabled the production of widely adopted commercial products, such as Video CD, MP3, digital audio broadcasting (DAB), DVD, digital television (DVB and ATSC), and many video-on-demand trials and commercial services. MPEG-4 is the first real multimedia representation standard, allowing interactivity and a combination of natural and synthetic material, coded in the form of objects (it models audiovisual data as a composition of these objects). MPEG-4 provides the standardized technological elements enabling the integration of the production, distribution and content access paradigms of the fields of interactive multimedia, mobile multimedia, interactive graphics and enhanced digital television. The MPEG-7 standard, formally named Multimedia Content Description Interface, provides a rich set of standardized tools to describe multimedia content. Both human users and automatic systems that process audiovisual information are within the scope of MPEG-7. MPEG-7 offers a comprehensive set of audiovisual Description Tools (the metadata elements and their structure and relationships, that are defined by the standard in the form of Descriptors and Description Schemes) to create descriptions (i.e., a set of instantiated Description Schemes and their corresponding Descriptors at the users will), which will form the basis for applications enabling the needed effective and efficient access (search, filtering and browsing) to multimedia content. This is a challenging task given the broad spectrum of requirements and targeted multimedia applications, and the broad number of audiovisual features of importance in such context. MPEG-7 has been developed by experts representing broadcasters, electronics manufacturers, content creators and managers, publishers and intellectual property rights managers, telecommunication service providers and academia. More information about MPEG-7 can be found at the MPEG-7 website (mpeg.tilab.comcselt.it/) and the MPEG-7 Alliance website (http://www.mpeg-industry.com/). These web pages contain links to a wealth of information about MPEG, including much about MPEG-7, many publicly available documents, several lists of Frequently Asked Questions and links to other MPEG-7 web pages. 1.1 Context of MPEG-7 Audiovisual sources will play an increasingly pervasive role in our lives, and there will be a growing need to have these sources processed further. This makes it necessary to develop forms of audiovisual information representation that go beyond the simple waveform or sample-based, compression-based (such as MPEG-1 and MPEG-2) or even objects-based (such as MPEG-4) representations. Forms of representation that allow some degree of interpretation of the informations meaning are necessary. These forms can be passed onto, or accessed by, a device or a computer code. In the examples given above an image sensor may produce visual data not in the form of PCM samples (pixels values) but in the form of objects with associated physical measures and time information. These could then be stored and processed to verify if certain programmed conditions are met. A PVR could receive descriptions of the audiovisual information associated to a program that would enable it to record, for example, only news with the exclusion of sport. Products from a company could be described in such a way that a machine could respond to unstructured queries from customers making inquiries. MPEG-7 is a standard for describing the multimedia content data that will support these operational requirements. The requirements apply, in principle, to both real-time and non real-time as well as push and pull applications. MPEG-7 does not standardize or evaluate applications. In the development of the MPEG-7 standard applications have been used for understanding the requirements and evaluation of technology. It must be made clear that the requirements are derived from analyzing a wide range of potential applications that could use MPEG-7 descriptions. MPEG-7 is not aimed at any one application in particular; rather, the elements that MPEG-7 standardizes support as broad a range of applications as possible. 1.2 MPEG-7 Objectives Audiovisual data content that has MPEG-7 data associated with it, may include: still pictures, graphics, 3D models, audio, speech, video, and composition information about how these elements are combined in a multimedia presentation (scenarios). A special case of these general data types is facial characteristics. MPEG-7 Description Tools do, however, not depend on the ways the described content is coded or stored. It is possible to create an MPEG-7 description of an analogue movie or of a picture that is printed on paper, in the same way as of digitised content. MPEG-7, like the other members of the MPEG family, is a standard representation of audio-visual information satisfying particular requirements. The MPEG-7 standard builds on other (standard) representations such as analogue, PCM, MPEG-1, -2 and ?4. One functionality of the MPEG-7 standard is to provide references to suitable portions of them. For example, perhaps a shape descriptor used in MPEG-4 is useful in an MPEG-7 context as well, and the same may apply to motion vector fields used in MPEG-1 and MPEG-2. MPEG-7 allows different granularity in its descriptions, offering the possibility to have different levels of discrimination. Even though the MPEG-7 description does not depend on the (coded) representation of the material, MPEG-7 can exploit the advantages provided by MPEG-4 coded content. If the material is encoded using MPEG-4, which provides the means to encode audio-visual material as objects having certain relations in time (synchronisation) and space (on the screen for video, or in the room for audio), it will be possible to attach descriptions to elements (objects) within the scene, such as audio and visual objects. Because the descriptive features must be meaningful in the context of the application, they will be different for different user domains and different applications. This implies that the same material can be described using different types of features, tuned to the area of application. To take the example of visual material: a lower abstraction level would be a description of e.g. shape, size, texture, colour, movement (trajectory) and position (where in the scene can the object be found?); and for audio: key, mood, tempo, tempo changes, position in sound space. The highest level would give semantic information: This is a scene with a barking brown dog on the left and a blue ball that falls down on the right, with the sound of passing cars in the background. Intermediate levels of abstraction may also exist. The level of abstraction is related to the way the features can be extracted: many low-level features can be extracted in fully automatic ways, whereas high level features need (much) more human interaction. Next to having a description of what is depicted in the content, it is also required to include other types of information about the multimedia data: - The form - An example of the form is the coding format used
(e.g. JPEG, MPEG-2), or the overall data size. This information helps determining
whether the material can be read by the users terminal;
Therefore, MPEG-7 Description Tools allows to create descriptions (i.e., a set of instantiated Description Schemes and their corresponding Descriptors at the users will) of content that may include: - Information describing the creation and production processes
of the content (director, title, short feature movie). All these descriptions are of course coded in an efficient way for searching, filtering, etc. To accommodate this variety of complementary content descriptions, MPEG-7 approaches the description of content from several viewpoints. The sets of Description Tools developed on those viewpoints are presented here as separate entities. However, they are interrelated and can be combined in many ways. Depending on the application, some will present and others can be absent or only partly present. A description generated using MPEG-7 Description Tools will be associated with the content itself, to allow fast and efficient searching for, and filtering of material that is of interest to the user. MPEG-7 data may be physically located with the associated AV material, in the same data stream or on the same storage system, but the descriptions could also live somewhere else on the globe. When the content and its descriptions are not co-located, mechanisms that link the multimedia material and their MPEG-7 descriptions are needed; these links will have to work in both directions. MPEG-7 addresses many different applications in many different environments, which means that it needs to provide a flexible and extensible framework for describing audiovisual data. Therefore, MPEG-7 does not define a monolithic system for content description but rather a set of methods and tools for the different viewpoints of the description of audiovisual content. Having this in mind, MPEG-7 is designed to take into account all the viewpoints under consideration by other leading standards such as, among others, TV Anytime, Dublin Core, SMPTE Metadata Dictionary, and EBU P/Meta. These standardisation activities are focused to more specific applications or application domains, whilst MPEG-7 has been developed as generic as possible. MPEG-7 uses also XML as the language of choice for the textual representation of content description, as XML Schema has been the base for the DDL (Description Definition Language) that is used for the syntactic definition of MPEG-7 Description Tools and for allowing extensibility of Description Tools (either new MPEG-7 ones or application specific). Considering the popularity of XML, usage of it will facilitate interoperability with other metadata standards in the future. The main elements of the MPEG-7s standard are: - Description Tools: Descriptors (D), that define the syntax
and the semantics of each feature (metadata element); and Description Schemes
(DS), that specify the structure and semantics of the relationships between
their components, that may be both Descriptors and Description Schemes;
|