MPEG-7: Moving Picture Experts Group 7 full report
01-05-2010, 07:53 PM
Syed Shafeeq Ahmed
Electronics and Communication Engineering
Visvesvaraya Technological University
An immeasurable amount of multimedia information is available today in digital archives, on the Web, in broadcast data streams, and in personal and professional databases, and this amount continues to grow. Yet the value of that information depends on how easily we can manage, find, retrieve, access, and filter it. The transition between two millennia abounds with new ways to produce, offer, filter, search, and manage digitized multimedia information. Broadband is being offered with increasing audio and video quality and ever-improving access speeds on both fixed and mobile networks. As a result, users are confronted with numerous content sources, and wading through them to find what they need and like in this vast sea of content is becoming a daunting task.
MPEG-7, developed by the Moving Picture Experts Group (MPEG), addresses this content management challenge. (This same International Organization for Standardization [ISO] committee also developed the successful standards known as MPEG-1, MPEG-2, and MPEG-4 [version 1 in 1998 and version 2 in 1999].) The recently completed ISO/IEC International Standard 15938, formally called the Multimedia Content Description Interface but better known as MPEG-7, provides a rich set of tools for describing multimedia content. The standard wasn't designed only from a content management viewpoint (classical archival information); it also includes an innovative description of the media's content, which can be extracted via content analysis and processing. MPEG-7 also isn't aimed at any one application; rather, the elements that MPEG-7 standardizes support as broad a range of applications as possible. This is one of the key differences between MPEG-7 and other metadata standards: it aims to be generic, not targeted to a specific application or application domain. This report provides an overview of MPEG-7's motivation, objectives, scope, and components.
Established in 1988, the Moving Picture Experts Group (MPEG) has developed digital audiovisual compression standards that have changed the way audiovisual content is produced by manifold industries, delivered through all sorts of distribution channels and consumed by a variety of devices.
Accessing audio and video used to be a simple matter, simple because the access mechanisms were simple and the sources were few. Today an immeasurable amount of audiovisual information is becoming available in digital form: in digital archives, on the World Wide Web, in broadcast data streams, and in personal and professional databases, and this amount is only growing. The value of information often depends on how easily it can be found, retrieved, accessed, filtered, and managed.
The transition between the second and third millennia abounds with new ways to produce, offer, filter, search, and manage digitized multimedia information. Broadband is being offered with increasing audio and video quality and speed of access. The trend is clear: in the next few years, users will be confronted with so much content from so many sources that efficient and accurate access to it seems unimaginable today. Although users have increasing access to these resources, identifying and managing them efficiently is becoming more difficult because of the sheer volume. This applies to professional as well as end users. The problem of identifying and managing content is not restricted to database retrieval applications such as digital libraries; it extends to areas like broadcast channel selection, multimedia editing, and multimedia directory services.
This challenging situation demands a timely solution to the problem. MPEG-7 is the answer to this need.
MPEG-7 is an ISO/IEC standard developed by MPEG (Moving Picture Experts Group), the committee that also developed the successful standards known as MPEG-1 (1992) and MPEG-2 (1994), and the MPEG-4 standard (Version 1 in 1998, and version 2 in 1999). The MPEG-1 and MPEG-2 standards have enabled the production of widely adopted commercial products, such as Video CD, MP3, digital audio broadcasting (DAB), DVD, digital television (DVB and ATSC), and many video-on-demand trials and commercial services. MPEG-4 is the first real multimedia representation standard, allowing interactivity and a combination of natural and synthetic material coded in the form of objects (it models audiovisual data as a composition of these objects). MPEG-4 provides the standardized technological elements enabling the integration of the production, distribution and content access paradigms of the fields of interactive multimedia, mobile multimedia, interactive graphics and enhanced digital television.
The MPEG-7 standard, formally named "Multimedia Content Description Interface", provides a rich set of standardized tools to describe multimedia content. Both human users and automatic systems that process audiovisual information are within the scope of MPEG-7.
MPEG-7 offers a comprehensive set of audiovisual Description Tools (the metadata elements, their structure and relationships, defined by the standard in the form of Descriptors and Description Schemes) for creating descriptions (i.e., sets of instantiated Description Schemes and their corresponding Descriptors, chosen at the user's will). These descriptions form the basis for applications that provide the effective and efficient access (search, filtering, and browsing) to multimedia content that users need. This is a challenging task given the broad spectrum of requirements and targeted multimedia applications, and the large number of audiovisual features of importance in this context.
MPEG-7 has been developed by experts representing broadcasters, electronics manufacturers, content creators and managers, publishers, intellectual property rights managers, telecommunication service providers and academia.
The family of MPEG standards
MPEG is a working group of the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC), in charge of developing international standards for compression, decompression, processing, and coded representation of moving pictures, audio, and their combination. So far, MPEG has produced MPEG-1, MPEG-2, and MPEG-4 version 1, and it is currently working on MPEG-4 version 2 and MPEG-7.
2.1 MPEG-1: Storage and retrieval
MPEG-1 is the standard for storage and retrieval of moving pictures and audio on storage media. MPEG-1 provides a nominal data rate of about 1.2 Mbits per second (the typical CD-ROM data transfer rate) but can deliver data at up to 1,856,000 bps. MPEG-1 distinguishes four types of picture coding: I (intra-coded pictures), P (predictive-coded pictures), B (bidirectionally predictive-coded pictures), and D (DC-coded pictures, which encode only the DC coefficient of each block's discrete cosine transform). To allow audio compression in acceptable quality, MPEG-1 supports audio data rates between 32 and 448 Kbps. MPEG-1 explicitly considers other standards and functionalities, such as JPEG and H.261, suitable for symmetric and asymmetric compression. It also provides a system definition that specifies how several individual data streams are combined. Note that MPEG-1 doesn't prescribe real-time compression. Furthermore, although MPEG-1 defines the decoding process, it doesn't define the decoder itself. The quality of an MPEG-1 video without sound at roughly 1.2 Mbps (the single-speed CD-ROM transfer rate) is equivalent to that of a VHS recording.
We should mention that MPEG-1 provides a means for transmitting metadata. In general, two mechanisms exist: (1) the transmission of user data extensions within a video stream, or (2) the transmission of data in a separate private data stream that is multiplexed with the audio and video streams as part of the system stream. Since both methods add data to the MPEG-1 stream, they either increase the bandwidth needed for transmission or storage, or reduce the quality of the audiovisual streams for a given bandwidth. No format for coding these extra streams was defined, which led to proprietary solutions. This might explain why these mechanisms aren't widely adopted.
2.2 MPEG-2: Digital television
MPEG-2, the digital television standard, targets higher resolutions and bit rates (up to 100 Mbps), approaching the digital video studio standard CCIR 601 and the video quality needed for HDTV. As a compatible extension to MPEG-1, MPEG-2 supports interlaced video formats and a number of other advanced features, such as those needed for HDTV. As a generic standard, MPEG-2 was defined in terms of extensible profiles, each supporting the features required by an important application class. The Main Profile, for example, supports digital video transmission at 2 to 80 Mbps over cable, satellite, and other broadcast channels, as well as digital storage and other communications applications. An essential extension from MPEG-1 to MPEG-2 is scalability of the compressed video, which allows video to be encoded at different qualities (spatial-, rate-, and amplitude-based scaling [2]). MPEG-2 audio coding was developed for low-bit-rate coding of multichannel audio. MPEG-2 extends the MPEG-1 standard by providing five full-bandwidth channels (left, right, center, and two surround channels), an additional low-frequency enhancement channel, and/or seven multilingual channels, as well as coding of mono and stereo at lower sampling rates (16 kHz, 22.05 kHz, and 24 kHz). Nevertheless, MPEG-2 audio remains backward compatible with MPEG-1. The MPEG-2 Systems part defines how video, audio, and other data are combined into single or multiple streams suitable for storage and transmission. It also provides syntactic and semantic rules that synchronize the decoding and presentation of audio and video information.
With respect to transmission and storage of metadata, MPEG-2 inherits the mechanisms developed for MPEG-1. Additionally, some MPEG-2 headers contain a structured information block covering application-related information such as copyright and conditional access; the amount of this information is restricted to a limited number of bytes. Reimers [3] described an extensive structuring of the content, coding, and access of such metadata within MPEG-2. Originally, there were plans to specify MPEG-3 as a standard for HDTV. However, during the development of MPEG-2, researchers found that it scaled up adequately to meet HDTV requirements, so MPEG-3 was dropped.
2.3 MPEG-4: Multimedia production, distribution, and content access
Though MPEG-1 and MPEG-2 served well for wide-ranging developments in fields such as interactive video, CD-ROM, and digital TV, it soon became apparent that multimedia applications required more than these achievements. Thus, in 1993 MPEG started working to provide the standardized technological elements enabling the integration of the production, distribution, and content access paradigms of digital TV, interactive graphics applications (synthetic content), and interactive multimedia (distribution of and access to enhanced content on the Web). MPEG-4 version 1, formally called ISO/IEC 14496, has been available as an international standard since December 1998. The second version will be finished in December 1999. MPEG-4 aims to provide a set of technologies that satisfy the needs of authors, service providers, and end users, while avoiding the emergence of a multitude of proprietary, incompatible formats and players. The standard should allow the development of systems that can be configured for a vast number of applications (among others, real-time communications, surveillance, and mobile multimedia). Achieving this requires standardized ways to:
• Interact with the material, based on encoding units of aural, visual, or audiovisual content, called media objects. These media objects can be natural or synthetic, which means they could be recorded with a camera or microphone, or generated with a computer.
• Interact with the content, based on the
3.1 Introduction to MPEG-7
MPEG-7, formally named Multimedia Content Description Interface, is a standard for describing the features of multimedia content so that users can search, browse, and retrieve that content more efficiently and effectively than they could with today's mainly text-based search engines.
MPEG-7 provides a very rich set of audiovisual descriptions.
These descriptions are based on catalogue features (e.g., title, creator, rights), semantic features (e.g., the who, what, when, and where of objects and events), and structural features (e.g., the colour histogram, a measure of the amount of each colour in an image, or the timbre of a recorded instrument) of the AV content, and they build on the AV data representations defined by MPEG-1, MPEG-2, and MPEG-4.
Comprehensive Scope of Data Interoperability:
MPEG-7 uses XML (Extensible Markup Language) Schema as the language of choice for content description, and MPEG-7 will be interoperable with other leading standards such as the SMPTE Metadata Dictionary, Dublin Core, EBU P/Meta, and TV Anytime. MPEG-7 thus standardizes an XML language for audiovisual metadata, using XML to model this rich, structured data. However, XML was not designed for real-time, constrained, and streamed environments such as those of the multimedia or mobile industries. As long as structured documents (HTML, for instance) were composed of only a few embedded tags, the overhead of a textual representation was not critical. To overcome the inefficiency of textual XML, MPEG-7 Systems defines a generic framework that facilitates the carriage and processing of MPEG-7 descriptions: BiM (Binary Format for MPEG-7). It enables the streaming and compression of any XML document.
BiM coders and decoders can deal with any XML language. Technically, the schema definition (DTD or XML Schema) of the XML document is processed and used to generate a binary format. This binary format has two main properties. First, thanks to knowledge of the schema, structural redundancy (element names, attribute names, and so on) is removed from the document, so the document structure is highly compressed (98% on average). Second, element and attribute values are encoded with dedicated codecs. The specification provides a library of basic datatype codecs (IEEE 754, UTF-8, compact integers, VLC integers, lists of values, and so on), and other codecs can easily be plugged in using the type-codec mapping mechanism.
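To make the two properties concrete, here is a minimal Python sketch of the idea, not of BiM itself. It assumes a hypothetical schema, `HYPOTHETICAL_SCHEMA`, that both sides know in advance, so each element name is replaced by its index in that schema (structural redundancy removed), and each integer value is written with a variable-length integer code in 7-bit groups (one common VLC scheme, chosen for illustration; the actual BiM codecs are defined in the MPEG-7 Systems specification and are not reproduced here).

```python
from typing import List, Tuple

# Hypothetical schema: encoder and decoder share this list, so element names
# are never transmitted, only their indices.
HYPOTHETICAL_SCHEMA = ["Duration", "FrameCount", "Width", "Height"]


def encode_vlc(n: int) -> bytes:
    """Encode a non-negative integer in 7-bit groups; the high bit means 'more bytes follow'."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_vlc(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode one VLC integer starting at pos; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def encode_fields(fields: List[Tuple[str, int]]) -> bytes:
    """Write each (element name, integer value) pair as (schema index, value)."""
    out = bytearray()
    for name, value in fields:
        out += encode_vlc(HYPOTHETICAL_SCHEMA.index(name))
        out += encode_vlc(value)
    return bytes(out)


def decode_fields(data: bytes) -> List[Tuple[str, int]]:
    """Invert encode_fields using the shared schema."""
    fields, pos = [], 0
    while pos < len(data):
        index, pos = decode_vlc(data, pos)
        value, pos = decode_vlc(data, pos)
        fields.append((HYPOTHETICAL_SCHEMA[index], value))
    return fields


if __name__ == "__main__":
    fields = [("Width", 352), ("Height", 288)]
    binary = encode_fields(fields)
    text = "".join(f"<{n}>{v}</{n}>" for n, v in fields)
    print(len(text), "bytes as text,", len(binary), "bytes as binary")
    assert decode_fields(binary) == fields
```

Running it prints `38 bytes as text, 6 bytes as binary`: the tags disappear entirely because the schema already implies them, which is the same reasoning behind the structural compression figure quoted above.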
3.2 MPEG-7 Elements
In October 1996, MPEG started a new work item to address the problems described above. The new member of the MPEG family, named "Multimedia Content Description Interface" (MPEG-7 for short), provides standardized core technologies for describing audiovisual data content in multimedia environments. It extends the limited content-identification capabilities of today's proprietary solutions, notably by including more data types.
The main elements of the MPEG-7 standard are:
• Description Tools: Descriptors (D), which define the syntax and the semantics of each feature (metadata element), and Description Schemes (DS), which specify the structure and semantics of the relationships between their components, which may be both Descriptors and Description Schemes. For example, a color histogram can serve as a Descriptor of the color feature of an image, and a text string can serve as a Descriptor of a title. Other features that can be described in the same way include the director of a multimedia document or the texture of a single picture;
• A Description Definition Language (DDL) to define the syntax of the MPEG-7 Description Tools, to allow the creation of new Description Schemes and, possibly, Descriptors, and to allow the extension and modification of existing Description Schemes;
• System tools to support binary coded representation for efficient storage and transmission, transmission mechanisms (for both textual and binary formats), multiplexing of descriptions, synchronization of descriptions with content, management and protection of intellectual property in MPEG-7 descriptions, etc.
Figure 3.1: MPEG-7 main elements
3.2.1 Basic structures
There are five visual-related basic structures: the Grid layout, the Time series, the Multiple view, the Spatial 2D coordinates, and the Temporal interpolation.
3.2.1.1 Grid layout
The grid layout splits the image into a set of equally sized rectangular regions, so that each region can be described separately. Each region of the grid can be described in terms of other Descriptors such as color or texture. Furthermore, the descriptor allows the sub-Descriptors to be assigned to all rectangular regions or to an arbitrary subset of them.
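The grid layout idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the normative GridLayout syntax: the image is assumed to be a grayscale list of pixel rows whose size divides evenly by the grid, and `mean_intensity` is a hypothetical stand-in for a real sub-Descriptor such as color or texture.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel values
Cell = Tuple[int, int]   # (grid row, grid column)


def split_grid(image: Image, rows: int, cols: int) -> Dict[Cell, Image]:
    """Split an image into rows x cols equally sized rectangular regions."""
    height, width = len(image), len(image[0])
    if height % rows or width % cols:
        raise ValueError("image size must be divisible by the grid size")
    cell_h, cell_w = height // rows, width // cols
    cells: Dict[Cell, Image] = {}
    for r in range(rows):
        for c in range(cols):
            cells[(r, c)] = [row[c * cell_w:(c + 1) * cell_w]
                             for row in image[r * cell_h:(r + 1) * cell_h]]
    return cells


def mean_intensity(region: Image) -> float:
    """Hypothetical per-region sub-descriptor: the mean pixel value."""
    pixels = [p for row in region for p in row]
    return sum(pixels) / len(pixels)


def describe_grid(image: Image, rows: int, cols: int,
                  descriptor: Callable[[Image], float],
                  subset: Optional[Set[Cell]] = None) -> Dict[Cell, float]:
    """Apply the sub-descriptor to every grid cell, or only to the cells in subset."""
    cells = split_grid(image, rows, cols)
    return {pos: descriptor(region) for pos, region in cells.items()
            if subset is None or pos in subset}


image = [[0, 0, 10, 10],
         [0, 0, 10, 10],
         [20, 20, 30, 30],
         [20, 20, 30, 30]]
print(describe_grid(image, 2, 2, mean_intensity))
print(describe_grid(image, 2, 2, mean_intensity, subset={(1, 1)}))
```

The optional `subset` argument mirrors the point above that sub-Descriptors may be attached to all regions or only to some of them.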
3.2.1.2 Time Series
This descriptor defines a temporal series of Descriptors in a video segment and provides image-to-video-frame and video-frame-to-video-frame matching functionalities. Two types of TimeSeries are available: RegularTimeSeries and IrregularTimeSeries. In the former, Descriptors are located regularly (at constant intervals) within a given time span, which gives a simple representation for applications that require low complexity. In the latter, Descriptors are located irregularly (at varying intervals) within a given time span, which gives an efficient representation for applications with narrow transmission bandwidth or low storage capacity. Both types are particularly useful for building Descriptors that contain time series of Descriptors.
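The trade-off between the two TimeSeries types can be illustrated with a small Python sketch. The classes below are hypothetical, not the MPEG-7 schema types: a regular series stores one descriptor value per constant interval, while the irregular form stores each value together with the number of basic time units it covers, so runs of unchanged descriptors collapse into a single entry.

```python
from dataclasses import dataclass
from typing import Hashable, List, Tuple


@dataclass
class RegularTimeSeries:
    interval: float           # constant spacing between descriptors (e.g. seconds)
    values: List[Hashable]    # one descriptor value per interval


@dataclass
class IrregularTimeSeries:
    unit: float                           # basic time unit
    entries: List[Tuple[Hashable, int]]   # (descriptor value, number of units it covers)


def to_irregular(series: RegularTimeSeries) -> IrregularTimeSeries:
    """Merge runs of identical consecutive descriptors into variable-length entries."""
    entries: List[Tuple[Hashable, int]] = []
    for value in series.values:
        if entries and entries[-1][0] == value:
            entries[-1] = (value, entries[-1][1] + 1)
        else:
            entries.append((value, 1))
    return IrregularTimeSeries(series.interval, entries)


def value_at(series: IrregularTimeSeries, t: float) -> Hashable:
    """Return the descriptor in force at time t, measured from the start of the span."""
    elapsed = 0.0
    for value, count in series.entries:
        elapsed += count * series.unit
        if t < elapsed:
            return value
    raise ValueError("t lies outside the described time span")


regular = RegularTimeSeries(1.0, ["red", "red", "red", "blue", "blue", "red"])
irregular = to_irregular(regular)
print(irregular.entries)           # three entries instead of six values
print(value_at(irregular, 3.5))
```

The regular form needs no per-entry timing and is trivial to index, matching its "low complexity" role; the irregular form is smaller when descriptors change rarely, matching its "narrow bandwidth" role.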
3.2.1.3 2D-3D Multiple View
The 2D/3D Descriptor specifies a structure that combines 2D Descriptors representing a visual feature of a 3D object seen from different viewing angles. The descriptor forms a complete 3D view-based representation of the object. Any 2D visual descriptor, for example contour shape, region shape, colour, or texture, can be used. The 2D/3D descriptor supports the integration of the 2D Descriptors used in the image plane to describe features of 3D (real-world) objects. It allows 3D objects to be matched by comparing their views, and pure 2D views to be compared with 3D objects.
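One plausible way to "match 3D objects by comparing their views" is sketched below in Python. The matching rule is an assumption made for illustration, not the standard's: each object is a list of per-view descriptor vectors, and the distance between a query and a stored object is the average, over the query's views, of the distance to the closest stored view. A single 2D query view is then just a query with one view.

```python
import math
from typing import Dict, List, Sequence

View = Sequence[float]  # one 2D descriptor vector (e.g. shape or colour) for one viewing angle


def object_distance(query_views: List[View], model_views: List[View]) -> float:
    """Average, over query views, of the Euclidean distance to the best-matching model view."""
    return sum(min(math.dist(q, m) for m in model_views)
               for q in query_views) / len(query_views)


def best_match(query_views: List[View], database: Dict[str, List[View]]) -> str:
    """Name of the stored object whose views are closest to the query views."""
    return min(database, key=lambda name: object_distance(query_views, database[name]))


# Hypothetical database: two objects, each described by descriptors of several views.
database = {
    "cup": [[1.0, 0.0], [0.0, 1.0]],
    "ball": [[5.0, 5.0]],
}
print(best_match([[0.9, 0.1]], database))  # a pure 2D view compared with 3D objects
```

Taking the minimum over stored views makes the comparison tolerant to the query being seen from only one of the recorded angles.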
3.2.1.4 Spatial 2D Coordinates
This descriptor defines a 2D spatial coordinate system and a unit, to be used by reference in other Ds/DSs when relevant. The coordinate system is defined by a mapping between an image and the coordinate system. One advantage of using this descriptor is that MPEG-7 descriptions need not be modified even if the image size is changed or part of the image is clipped; in this case, only the description of the mapping from the original image to the edited image is required.
It supports two kinds of coordinate systems: "local" and "integrated" (see Figure 3.2). In a "local" coordinate system, the coordinates used to calculate the description are mapped to the currently applicable coordinate system. In an "integrated" coordinate system, each image (frame) of, e.g., a video may be mapped to different areas with respect to the first frame of a shot or video. The integrated coordinate system can, for instance, be used to represent coordinates on a mosaic of a video shot.
Figure 3.2: (a) "local" and (b) "integrated" coordinate systems
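The mapping idea of this subsection (descriptions stay fixed; only the image-to-coordinate mapping changes when the image is rescaled or clipped) can be sketched as follows. `ImageMapping` is a hypothetical scale-then-shift model chosen for illustration; it does not reproduce the standard's actual mapping syntax.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class ImageMapping:
    """Hypothetical mapping from original-image coordinates to an edited image: scale, then shift."""
    scale_x: float = 1.0
    scale_y: float = 1.0
    offset_x: float = 0.0   # e.g. minus the left edge of a crop window
    offset_y: float = 0.0   # e.g. minus the top edge of a crop window

    def apply(self, p: Point) -> Point:
        return (p[0] * self.scale_x + self.offset_x, p[1] * self.scale_y + self.offset_y)

    def then(self, other: "ImageMapping") -> "ImageMapping":
        """Compose two edits: apply self first, then other."""
        return ImageMapping(
            self.scale_x * other.scale_x,
            self.scale_y * other.scale_y,
            self.offset_x * other.scale_x + other.offset_x,
            self.offset_y * other.scale_y + other.offset_y,
        )


# A region corner stored once, in original-image coordinates; the description never changes.
corner = (200.0, 100.0)
half_size = ImageMapping(scale_x=0.5, scale_y=0.5)
crop = ImageMapping(offset_x=-40.0, offset_y=-10.0)
edit = half_size.then(crop)     # only this mapping is updated after editing
print(edit.apply(corner))
```

Downscaling by half and then cropping 40 pixels from the left and 10 from the top places the stored corner at (60.0, 40.0) in the edited image, without touching the stored description.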
3.2.1.5 Temporal Interpolation
The Temporal Interpolation descriptor describes a temporal interpolation using connected polynomials. It can be used to approximate multidimensional variable values that change with time, such as the position of an object in a video. The description size of the temporal interpolation is usually much smaller than that of describing all values. In Figure 3.3, the real values are represented by five linear interpolation functions and two quadratic interpolation functions. The beginning of the temporal interpolation is always aligned to time 0.
Figure 3.3: Real Data and Interpolation functions
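The following Python sketch shows how a few key points can stand in for many sampled values. The parametrization is an assumption for illustration, not the descriptor's normative syntax: segment i runs between consecutive key times, is linear when its second-order coefficient is 0 and quadratic otherwise, and its first-order coefficient is chosen so the segment still ends exactly on the next key value, which keeps the polynomials connected. As in the text, the interpolation starts at time 0.

```python
from bisect import bisect_right
from typing import List


class TemporalInterpolation:
    """Connected piecewise polynomials approximating one scalar over time (hypothetical layout)."""

    def __init__(self, key_times: List[float], key_values: List[float], quad: List[float]):
        if key_times[0] != 0:
            raise ValueError("the interpolation must start at time 0")
        if len(key_times) != len(key_values) or len(quad) != len(key_times) - 1:
            raise ValueError("inconsistent number of key points and segments")
        self.t, self.v, self.a = key_times, key_values, quad

    def __call__(self, time: float) -> float:
        if not self.t[0] <= time <= self.t[-1]:
            raise ValueError("time outside the described interval")
        i = min(bisect_right(self.t, time) - 1, len(self.a) - 1)  # segment containing time
        dt = time - self.t[i]
        span = self.t[i + 1] - self.t[i]
        a = self.a[i]
        b = (self.v[i + 1] - self.v[i]) / span - a * span  # makes the segment end at v[i+1]
        return self.v[i] + b * dt + a * dt * dt


# An object's x position over 20 frames, described by 3 key values instead of 21 samples.
x_position = TemporalInterpolation([0, 10, 20], [0.0, 50.0, 50.0], [0.0, 0.0])
print(x_position(5), x_position(15))

# A single quadratic segment (here x = t^2 on [0, 2]).
accelerating = TemporalInterpolation([0, 2], [0.0, 4.0], [1.0])
print(accelerating(1))
```

The first object moves linearly and then stops; the second segment set models acceleration, which is where quadratic functions like those in Figure 3.3 pay off.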
3.2.2 Encoding and Delivery
BiM (Binary Format for MPEG-7) coders and decoders can deal with any XML language. Technically, the schema definition (DTD or XML Schema) of the XML document is processed and used to generate a binary format. This binary format has two main properties. First, thanks to knowledge of the schema, structural redundancy (element names, attribute names) is removed from the document. Second, element and attribute values are encoded with dedicated codecs.
Technical description of MPEG-7
This section gives a detailed overview of the MPEG-7 technologies that are currently standardized. The MPEG-7 Multimedia Description Schemes (MDS) are described first, because the other Description Tools (the Visual and Audio ones) are always used wrapped in MDS descriptions. The Visual and Audio Description Tools are then described in detail. Next, the DDL is described, paving the way for the MPEG-7 formats, both textual (TeM) and binary (BiM). Then the MPEG-7 terminal architecture is presented, followed by the Reference Software. Finally, the MPEG-7 Conformance specification and the Extraction and Use of Descriptions Technical Report are explained.
4.1 MPEG-7 Multimedia Description Schemes
MPEG-7 Multimedia Description Schemes (DSs) are metadata structures for describing and annotating audio-visual (AV) content. The DSs provide a standardized way of describing in XML the important concepts related to AV content description and content management in order to facilitate searching, indexing, filtering, and access. The DSs are defined using the MPEG-7 Description Definition Language (DDL), which is based on the XML Schema Language, and are instantiated as documents or streams. The resulting descriptions can be expressed in a textual form (i.e., human-readable XML for editing, searching, and filtering) or in a compressed binary form (i.e., for storage or transmission). This section gives an overview of the MPEG-7 Multimedia DSs and describes their targeted functionality and use in multimedia applications.
The goal of the MPEG-7 standard is to allow interoperable searching, indexing, filtering and access of audio-visual (AV) content by enabling interoperability among devices and applications that deal with AV content description. MPEG-7 describes specific features of AV content as well as information related to AV content management. MPEG-7 descriptions take two possible forms: (1) a textual XML form suitable for editing, searching, and filtering, and (2) a binary form suitable for storage, transmission, and streaming delivery. Overall, the standard specifies four types of normative elements: Descriptors, Description Schemes (DSs), a Description Definition Language (DDL), and coding schemes.
The MPEG-7 Descriptors are designed primarily to describe low-level audio or visual features such as color, texture, motion, audio energy, and so forth, as well as attributes of AV content such as location, time, quality, and so forth. Most Descriptors for low-level features are expected to be extracted automatically in applications.
On the other hand, the MPEG-7 DSs are designed primarily to describe higher-level AV features such as regions, segments, objects, and events, and other immutable metadata related to creation and production, usage, and so forth. The DSs produce more complex descriptions by integrating multiple Descriptors and DSs and by declaring relationships among the description components. In MPEG-7, the DSs are categorized as pertaining to the multimedia, audio, or visual domain. Typically, the multimedia DSs describe content consisting of a combination of audio, visual, and possibly textual data, whereas the audio or visual DSs refer specifically to features unique to the audio or visual domain, respectively. In some cases, automatic tools can be used to instantiate the DSs, but in many cases instantiating DSs requires human-assisted extraction or authoring tools.
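As an example of a low-level feature that can be extracted automatically, here is a minimal Python sketch of a quantized RGB colour histogram with an L1 distance for comparing two images. It is only an illustration: MPEG-7's normative colour Descriptors use a different colour space and quantization, and `bins_per_channel` and the L1 metric are assumptions made here.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

RGB = Tuple[int, int, int]  # 8-bit red, green, blue


def color_histogram(pixels: Iterable[RGB], bins_per_channel: int = 4) -> List[float]:
    """Quantize each 8-bit channel into bins_per_channel levels; return a normalized histogram."""
    if 256 % bins_per_channel:
        raise ValueError("bins_per_channel must divide 256")
    step = 256 // bins_per_channel
    counts: Counter = Counter()
    total = 0
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        counts[index] += 1
        total += 1
    if total == 0:
        raise ValueError("no pixels")
    return [counts[i] / total for i in range(bins_per_channel ** 3)]


def l1_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of absolute bin differences: 0 for identical histograms, at most 2."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


mostly_red = color_histogram([(255, 0, 0), (250, 10, 5), (0, 0, 255)])
all_blue = color_histogram([(0, 0, 255)] * 3)
print(l1_distance(mostly_red, all_blue))
```

Because extraction is a fixed computation over the pixels, no human input is needed, which is exactly what distinguishes such Descriptors from the higher-level DSs discussed above.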
4.2 Organization of MDS tools
Figure 4.1 provides an overview of the organization of MPEG-7 Multimedia DSs into the following areas: Basic Elements, Content Management, Content Description, Content Organization, Navigation and Access, and User Interaction.
Figure 4.1 : Overview of the MPEG-7 Multimedia DSs
4.2.1 Basic Elements
MPEG-7 provides a number of Schema Tools that assist in the formation, packaging, and annotation of MPEG-7 descriptions. An MPEG-7 description begins with a root element that signifies whether the description is complete or partial. A complete description provides a complete, standalone description of AV content for an application. A description unit, on the other hand, carries only partial or incremental information that may add to an existing description. In the case of a complete description, an MPEG-7 top-level element follows the root element. The top-level element orients the description around a specific description task, such as the description of a particular type of AV content (for instance an image, video, audio, or multimedia) or a particular function related to content management, such as creation, usage, summarization, and so forth. The top-level types collect together the appropriate tools for carrying out the specific description task. In the case of description units, the root element can be followed by an arbitrary instance of an MPEG-7 DS or Descriptor. Unlike a complete description, which usually contains a "semantically complete" MPEG-7 description, a description unit can be used to send a partial description as required by an application, such as a description of a place, or a shape and texture descriptor. The Package DS describes a user-defined organization of MPEG-7 DSs and Ds into a package, which allows an organized selection of MPEG-7 tools to be communicated to a search engine or user. Furthermore, the Description Metadata DS describes metadata about the description, such as creation time, extraction instrument, version, confidence, and so forth.
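The difference between a complete description and a description unit can be sketched with Python's standard `xml.etree.ElementTree`. This is a simplified skeleton, not a schema-valid MPEG-7 document: it assumes the MPEG-7 2001 schema namespace and uses the element names `Mpeg7`, `Description`, and `DescriptionUnit` with `xsi:type` to select the top-level type or DS, while the type names passed in (`ContentEntityType`, `PlaceType`) and the `Name` child are illustrative and the output has not been validated against the standard's schema.

```python
import xml.etree.ElementTree as ET

MPEG7_NS = "urn:mpeg:mpeg7:schema:2001"   # assumed MPEG-7 schema namespace
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("", MPEG7_NS)
ET.register_namespace("xsi", XSI_NS)
XSI_TYPE = f"{{{XSI_NS}}}type"


def tag(name: str) -> str:
    """Qualify an element name with the (assumed) MPEG-7 namespace."""
    return f"{{{MPEG7_NS}}}{name}"


def complete_description(top_level_type: str) -> ET.Element:
    """Root element followed by a top-level Description that orients the whole description."""
    root = ET.Element(tag("Mpeg7"))
    ET.SubElement(root, tag("Description"), {XSI_TYPE: top_level_type})
    return root


def description_unit(ds_type: str, name: str) -> ET.Element:
    """Root element followed by an arbitrary DS instance carrying only partial information."""
    root = ET.Element(tag("Mpeg7"))
    unit = ET.SubElement(root, tag("DescriptionUnit"), {XSI_TYPE: ds_type})
    ET.SubElement(unit, tag("Name")).text = name
    return root


print(ET.tostring(complete_description("ContentEntityType"), encoding="unicode"))
print(ET.tostring(description_unit("PlaceType", "Geneva"), encoding="unicode"))
```

In a real complete description, the top-level element would go on to hold the content tools (segments, media information, and so on) collected for its task; the sketch only shows where each kind of description starts.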
A number of basic elements are used throughout the MDS specification as fundamental constructs in defining the MPEG-7 DSs. The basic data types provide a set of extended data types and mathematical structures such as vectors and matrices, which are needed by the DSs for describing AV content. The basic elements include also constructs for linking media files, localizing pieces of content, and describing time, places, persons, individuals, groups, organizations, and other textual annotation. We briefly discuss the MPEG-7 approaches for describing time and textual annotations.
Figure 4.2: Overview of the Time DSs
Temporal Information: the DSs for describing time are based on the ISO 8601 standard, which has also been adopted by the XML Schema language. The Time DS and Media Time DS describe time information in the real world and in media streams, respectively. Both follow the same strategy described in Figure 4.2. Figure 4.2 A illustrates the simplest way to describe a temporal instant and a temporal interval. A time instant, t1, can be described by a lexical representation using the Time Point. An interval, [t1, t2], can be described by its starting point, t1, (using the Time Point) and a Duration, t2 - t1. An alternative way to describe a time instant is shown in Figure 4.2 B. It relies on Relative Time Point. The instant, t1, is described by a temporal offset with respect to a reference, t0, called Time Base. Note that the goal of the Relative Time Point is to define a temporal instant, t1, and not an interval as the Duration in Figure 4.2 A. Finally, Figure 4.2 C illustrates the specification of time using a predefined interval called Time Unit and counting the number of intervals.
This specification is particularly efficient for periodic or sampled temporal signals. Since the strategy consists of counting Time Units, the specification of a time instant has to be done relative to a Time Base (or temporal origin). In Figure 4.2 C, t1 is defined with a Relative Incremental Time Point by counting 8 Time Units (starting from t0). An interval [t1, t2], can also be defined by counting Time Units. In Figure 4.2 C, Incremental Duration is used to count 13 Time Units to define the interval [t1, t2].
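The counting strategy of Figure 4.2 C reduces to simple arithmetic, sketched here in Python with `datetime`. The Time Base value and the 40 ms Time Unit (one frame at 25 frames/s) are hypothetical choices; only the counts, 8 Time Units to reach t1 and 13 Time Units for the interval, come from the figure description above.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Hypothetical values: a Time Base t0 and a Time Unit of one frame at 25 frames/s.
TIME_BASE = datetime(2001, 1, 1, 12, 0, 0)
TIME_UNIT = timedelta(milliseconds=40)


def incremental_time_point(unit_count: int) -> datetime:
    """Relative Incremental Time Point: t1 = t0 + n * TimeUnit."""
    return TIME_BASE + unit_count * TIME_UNIT


def incremental_interval(start_units: int, duration_units: int) -> Tuple[datetime, datetime]:
    """Interval [t1, t2]: a start point plus an Incremental Duration counted in Time Units."""
    t1 = incremental_time_point(start_units)
    return t1, t1 + duration_units * TIME_UNIT


t1, t2 = incremental_interval(8, 13)   # the counts used in Figure 4.2 C
print(t1.isoformat(), t2.isoformat())
```

With these assumptions, t1 lies 8 × 40 = 320 ms after t0 and t2 lies (8 + 13) × 40 = 840 ms after t0. Only two small integers are stored per interval, which is why the text calls this form efficient for periodic or sampled signals.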
Textual Annotation: text annotation is an important component of many DSs. MPEG-7 provides a number of different basic constructs for textual annotation. The most flexible text annotation construct is the data type for free text. Free text allows the formation of an arbitrary string of text, which optionally includes information about the language of the text. Moreover, more complex textual annotations can also be defined by describing explicitly the syntactic dependency between the grammatical elements forming sentences (for example, relation between a verb and a subject, etc.). This last type of textual annotation is particularly useful for applications where the annotation will be processed automatically. Lastly, MPEG-7 provides constructs for classification schemes and controlled terms. The classification schemes provide a language independent set of terms that form a vocabulary for a particular application or domain. Controlled terms are used in descriptions to make reference to the entries in the classification schemes. Allowing controlled terms to be described by classification schemes offers advantages over the standardization of fixed vocabularies for different applications and domains, since it is likely that the vocabularies for multimedia applications will evolve over time.
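The relationship between classification schemes and controlled terms can be modelled with a small Python data structure. Everything here is hypothetical, including the scheme URI `urn:example:cs:Genre` and the term IDs: a scheme maps language-independent term IDs to labels in several languages, and a controlled term in a description stores only a reference (scheme URI plus term ID) that is resolved when needed.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ClassificationScheme:
    """Hypothetical scheme: language-independent term IDs, each with labels in several languages."""
    uri: str
    terms: Dict[str, Dict[str, str]] = field(default_factory=dict)  # term ID -> {language: label}

    def label(self, term_id: str, lang: str) -> str:
        return self.terms[term_id][lang]


@dataclass(frozen=True)
class ControlledTerm:
    """What a description actually stores: a reference into a classification scheme."""
    scheme_uri: str
    term_id: str


def resolve(term: ControlledTerm, schemes: Dict[str, ClassificationScheme], lang: str) -> str:
    """Look up the label of a controlled term in the requested language."""
    return schemes[term.scheme_uri].label(term.term_id, lang)


genre = ClassificationScheme("urn:example:cs:Genre", {
    "1.1": {"en": "News", "de": "Nachrichten"},
    "1.2": {"en": "Sports", "de": "Sport"},
})
schemes = {genre.uri: genre}
annotation = ControlledTerm("urn:example:cs:Genre", "1.1")
print(resolve(annotation, schemes, "en"), resolve(annotation, schemes, "de"))
```

Because descriptions hold only references, a scheme can gain new terms or languages without any existing description changing, which is the advantage over fixed, standardized vocabularies noted above.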
4.2.2 Content Management
MPEG-7 provides DSs for AV content management. These tools describe the following information: (1) creation and production, (2) media coding, storage, and file formats, and (3) content usage. Many of the components of the content management DSs are optional; whether they are instantiated is often decided in view of the specific multimedia application. The tools are described in more detail below.
The Creation Information describes the creation and classification of the AV content and of other related materials. The creation information provides a title (which may itself be textual or another piece of AV content), textual annotation, and information such as creators, creation locations, and dates. The classification information describes how the AV material is classified into categories such as genre, subject, purpose, language, and so forth. It also provides review and guidance information such as age classification, parental guidance, and subjective review. Finally, the Related Material information describes whether other AV materials related to the content being described exist.
The Media Information describes the storage media, such as the format, compression, and coding of the AV content. The Media Information DS identifies the master media, which is the original source from which different instances of the AV content are produced. The instances of the AV content are referred to as Media Profiles, which are versions of the master obtained, for example, by using different encodings or storage and delivery formats. Each Media Profile is described individually in terms of its encoding parameters, storage media information, and location.
The Usage Information describes usage information related to the AV content, such as usage rights, usage records, and financial information.
The rights information is not explicitly included in the MPEG-7 description; instead, links are provided to the rights holders and to other information related to rights management and protection. The Rights DS provides these references in the form of unique identifiers that are managed by external authorities. The underlying strategy is to enable MPEG-7 descriptions to provide access to current rights owner information without dealing with that information and negotiation directly. The Usage Record DS and Availability DS provide information related to the use of the content, such as broadcasting, on-demand delivery, CD sales, and so forth. Finally, the Financial DS provides information related to the cost of production and the income resulting from content use. The Usage Information is typically dynamic, in that it is subject to change during the lifetime of the AV content.
The Content Management Description Tools allow the description of the life cycle of the content, from creation to consumption.
The content described by MPEG-7 descriptions can be available in different modalities, formats and Coding Schemes, and there can be several instances of each. For example, a concert can be recorded in two different modalities: audio and audio-visual. Each of these modalities can be encoded by different Coding Schemes. This creates several media profiles. Finally, several instances of the same encoded content may be available. These concepts of modality, profile and instance are illustrated in Figure 4.3.
Figure 4.3: Model for content, profile and instance
• Content: One reality, such as a concert, can be represented as several types of media, e.g., audio media or audio-visual media. A content entity is an entity that has a specific structure to represent that reality.
• Media Information: The physical format of a content entity is described by the Media Information DS. One description instance of the DS is attached to one content entity to describe it. The DS is centered around an identifier for the content entity, and it also has sets of Descriptors for the storage format of the entity.
• Media Profile: One content entity can have one or more media profiles that correspond to different Coding Schemes of the entity. One of the profiles, called the master profile, is the original one and corresponds to the initially created or recorded version. The others are transcoded from the master. If the content is encoded with the same encoding tool but with different parameters, different media profiles are created.
• Media Instance: A content entity can be instantiated as physical entities called media instances. An identifier and a locator specify the media instance.
• Creation Information: Information about the creation process of a content entity is described by the Creation Information DS. One description instance of the DS is attached to one content entity to describe it.
• Usage Information: Information about the usage of a content entity is described by the Usage Information DS. One description instance of the DS is attached to one content entity to describe it.
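The content/profile/instance model can be made concrete with a few nested records. The Python sketch below is a hypothetical simplification: class and field names are illustrative, not MPEG-7 element names, and the "exactly one master" check is an assumption drawn from the description above. It models the audio content entity of a concert with a master profile and a transcoded profile, each with its own instances; only the media part varies per profile, while the title standing in for the creation information is stored once per content entity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MediaInstance:
    """One physical copy, specified by an identifier and a locator."""
    instance_id: str
    locator: str  # e.g. a URL or file path (illustrative)


@dataclass
class MediaProfile:
    """One coding of the content; the master is the originally created/recorded one."""
    coding: str  # hypothetical free-form label such as "MP3, 128 kbit/s"
    is_master: bool = False
    instances: List[MediaInstance] = field(default_factory=list)


@dataclass
class MediaInformation:
    """Storage-format part of the description, attached to one content entity."""
    content_id: str
    profiles: List[MediaProfile] = field(default_factory=list)

    def master(self) -> MediaProfile:
        masters = [p for p in self.profiles if p.is_master]
        if len(masters) != 1:
            raise ValueError("expected exactly one master profile")
        return masters[0]

    def locators(self) -> List[str]:
        """All places where some copy of this content entity can be found."""
        return [i.locator for p in self.profiles for i in p.instances]


@dataclass
class ContentEntity:
    """One representation (modality) of a reality such as a concert. The
    format-independent parts are shared by all profiles and instances."""
    modality: str
    title: str  # stands in for the Creation Information DS
    media: MediaInformation


concert_audio = ContentEntity(
    modality="audio",
    title="Open-air concert",
    media=MediaInformation("concert-audio", [
        MediaProfile("PCM, 44.1 kHz", is_master=True,
                     instances=[MediaInstance("a1", "file:///archive/concert.wav")]),
        MediaProfile("MP3, 128 kbit/s",
                     instances=[MediaInstance("a2", "https://example.com/concert.mp3"),
                                MediaInstance("a3", "file:///cache/concert.mp3")]),
    ]),
)
```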
The only part of the description that depends on the storage media or the encoding format is the Media Information described in this section. The remaining part of the MPEG-7 description does not depend on the various profiles or instances and, as a result, can be used to describe jointly all possible copies of the content.
4.2.3 Media Description Tools
The description of the media involves a single top-level element, the Media Information DS. It is composed of an optional Media Identification D and one or several Media Profile Ds.
The Media Identification D contains Description Tools that are specific to the identification of the AV content, independently of the different available instances. The different media profiles of the content are described via their Media Profile D, and for each profile there can be different media instances available.
The Media Profile D contains the different Description Tools that allow the description of one profile of the AV content being described. The profile concept refers to the different variations that can be produced from an original or master media depending on the values chosen for the coding, storage format, etc. The profile corresponding to the original or master copy of the AV content is considered the master media profile. For each profile there can be one or more media instances.
The Media Profile D is composed of:
• Media Format D: contains Description Tools that are specific to the coding format of the media profile.
• Media Instance D: contains the Description Tools that identify and locate the different media instances (copies) available of a media profile.
• Media Transcoding Hints D: contains Description Tools that specify transcoding hints of the media being described. The purpose of this D is to improve quality and reduce complexity for transcoding applications. The transcoding hints can be used in video transcoding and motion estimation architectures to reduce the computational complexity.
• Media Quality D: represents quality rating information of audio or visual content. It can be used to represent both subjective and objective quality ratings.
Creation & production Description Tools
The creation and production information Description Tools describe author-generated information about the generation/production process of the AV content. This information cannot usually be extracted from the content itself. This information is related to the material but it is not explicitly depicted in the actual content.
The description of the creation and production information has as its top-level element the Creation Information DS, which is composed of one Creation D, zero or one Classification D, and zero or several Related Material Ds.
The Creation D contains the Description Tools related to the creation of the content, including places, dates, actions, materials, staff (technical and artistic) and organizations involved.
The Classification D contains the Description Tools that allow classifying the AV content. It allows searching and filtering based on user preferences regarding user-oriented classifications (e.g., language, style, genre, etc.) and service-oriented classifications (e.g., purpose, parental guidance, market segmentation, media review, etc.).
The Related Material D contains the Description Tools related to additional information about the AV content available in other materials.
Content usage Description Tools
The content usage information Description Tools describe information about the usage process of the AV content.
The description of the usage information is enabled by the Usage Information DS, which may include one Rights D, zero or one Financial D, and zero or several Availability Ds and Usage Record Ds.
It is important to note that the Usage Information DS description may incorporate new descriptions each time the content is used (e.g., Usage Record DS, Income in Financial datatype), or when there are new ways to access the content (e.g., Availability D).
The Rights datatype gives access to information about the rights holders of the annotated content (IPR) and the Access Rights.
The Financial datatype contains information related to the costs generated and income produced by AV content. The notions of partial costs and incomes allow different costs and incomes to be classified according to their type. Total and subtotal costs and incomes are to be calculated by the application from these partial values.
The Availability DS contains the Description Tools related to the availability for use of the content.
The Usage Record DS contains the Description Tools related to the past use of the content.
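Because totals and subtotals are left to the application, computing them from partial values is straightforward. The Python sketch below assumes a hypothetical flattened record for partial costs and incomes (kind, category, value, currency); these names are illustrative and are not the actual elements of the MPEG-7 Financial datatype. It refuses to add amounts in different currencies rather than guessing an exchange rate.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PartialAmount:
    """One partial cost or income. Field names are hypothetical."""
    kind: str       # "cost" or "income"
    category: str   # type label, e.g. "production" or "broadcast"
    value: float
    currency: str = "EUR"

    def __post_init__(self) -> None:
        if self.kind not in ("cost", "income"):
            raise ValueError(f"unknown kind: {self.kind}")


def _single_currency(items: List[PartialAmount]) -> None:
    if len({item.currency for item in items}) > 1:
        raise ValueError("mixed currencies: convert before summing")


def subtotals(items: List[PartialAmount]) -> Dict[Tuple[str, str], float]:
    """Subtotal per (kind, category), computed by the application."""
    _single_currency(items)
    sums: Dict[Tuple[str, str], float] = defaultdict(float)
    for item in items:
        sums[(item.kind, item.category)] += item.value
    return dict(sums)


def totals(items: List[PartialAmount]) -> Dict[str, float]:
    """Total cost and total income."""
    _single_currency(items)
    sums = {"cost": 0.0, "income": 0.0}
    for item in items:
        sums[item.kind] += item.value
    return sums


items = [
    PartialAmount("cost", "production", 1000.0),
    PartialAmount("cost", "production", 250.0),
    PartialAmount("cost", "marketing", 100.0),
    PartialAmount("income", "broadcast", 900.0),
    PartialAmount("income", "dvd", 600.0),
]
```

Since each new use of the content can append another partial income, the application simply recomputes the totals when the Usage Information grows.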
MPEG-7 provides DSs for description of the structure and semantics of AV content. The structural tools describe the structure of the AV content in terms of video segments, frames, still and moving regions and audio segments. The semantic tools describe the objects, events, and notions from the real world that are captured by the AV content.
The functionality of each of these classes of DSs is given as follows:
Structural aspects: describe the audio-visual content from the viewpoint of its structure. The Structure DSs are organized around a Segment DS that represents the spatial, temporal or spatio-temporal structure of the audio-visual content. The Segment DS can be organized into a hierarchical structure to produce a Table of Contents for accessing, or an Index for searching, the audio-visual content. The Segments can be further described on the basis of perceptual features, using MPEG-7 Descriptors for color, texture, shape, motion, audio features, and so forth, and on the basis of semantic information, using Textual Annotations.
Conceptual aspects: describe the audio-visual content from the viewpoint of real-world semantics and conceptual notions. The Semantic DSs involve entities such as objects, events, abstract concepts and relationships. The Structure DSs and Semantic DSs are related by a set of links, which allows the audio-visual content to be described on the basis of both content structure and semantics together. The links relate different Semantic concepts to the instances within the audio-visual content described by the Segments.
Most of the MPEG-7 content description and content management DSs are linked together, and in practice, the DSs are included within each other in MPEG-7 descriptions. For example, Usage information, Creation and Production, and Media information can be attached to individual Segments identified in the MPEG-7 description of audio-visual content structure. Depending on the application, some aspects of the audio-visual content description can be emphasized, such as Semantics or Creation description, while others can be minimized or ignored, such as Media or Structure description.
Navigation and Access
MPEG-7 also provides DSs for facilitating browsing and retrieval of audio-visual content by defining summaries, partitions and decompositions, and variations of the audio-visual material.
Summaries: provide compact summaries of the audio-visual content to enable discovery, browsing, navigation, visualization and sonification of audio-visual content. The Summary DSs involve two types of navigation modes: hierarchical and sequential. In the hierarchical mode, the information is organized into successive levels, each describing the audio-visual content at a different level of detail. In general, the levels closer to the root of the hierarchy provide coarser summaries, and levels further from the root provide more detailed summaries. The sequential summary provides a sequence of images or video frames, possibly synchronized with audio, which may compose a slide-show or audio-visual skim.
Partitions and Decompositions: describe different decompositions of the audio-visual signals in space, time and frequency. The partitions and decompositions can be used to describe different views of the audio-visual data, which is important for multi-resolution access and progressive retrieval.
Variations: provide information about different variations of audio-visual programs, such as summaries and abstracts; scaled, compressed and low-resolution versions; and versions with different languages and modalities (audio, video, image, text, and so forth). One of the targeted functionalities of the Variation DS is to allow the selection of the most suitable variation of an audio-visual program, which can replace the original, if necessary, to adapt to the different capabilities of terminal devices, network conditions or user preferences.
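The hierarchical summary mode can be sketched as a tree of key frames in which each level adds detail. In the hypothetical Python sketch below (node and function names are illustrative, not MPEG-7 element names), expanding the tree to a given depth returns the key frames visible at that level of detail; a leaf with no finer detail is carried down unchanged, so every level still covers the whole content.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SummaryNode:
    """One node of a hierarchical summary; keyframe is a hypothetical key-frame ID."""
    keyframe: str
    children: List["SummaryNode"] = field(default_factory=list)


def summary_at_level(root: SummaryNode, depth: int) -> List[str]:
    """Key frames visible when the summary is expanded to the given depth.
    Level 0 is the coarsest summary; deeper levels are more detailed."""
    nodes = [root]
    for _ in range(depth):
        # Replace each node by its children; leaves stand in for themselves.
        nodes = [child for node in nodes for child in (node.children or [node])]
    return [node.keyframe for node in nodes]


# Illustrative tree: two scenes, the first of which has two finer shots.
root = SummaryNode("k0", [
    SummaryNode("k1", [SummaryNode("k1a"), SummaryNode("k1b")]),
    SummaryNode("k2"),
])
```

Here summary_at_level(root, 1) gives the two scene key frames, and summary_at_level(root, 2) refines only the first scene.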
Content Organization
MPEG-7 also provides DSs for organizing and modeling collections of audio-visual content and of descriptions. The Collection DS organizes collections of audio-visual content, segments, events, and/or objects. This allows each collection to be described as a whole based on its common properties. In particular, different models and statistics may be specified for characterizing the attribute values of the collections.
User Interaction
Finally, the last set of MPEG-7 DSs deals with User Interaction. The User Interaction DSs describe user preferences and usage history pertaining to the consumption of the multimedia material. This allows, for example, MPEG-7 content descriptions to be matched against preference descriptions in order to select and personalize audio-visual content for more efficient and effective access, presentation and consumption. The User Preference DS describes preferences for different types of content and modes of browsing, including context dependency in terms of time and place. The User Preference DS also describes the weighting of the relative importance of different preferences, the privacy characteristics of the preferences, and whether preferences are subject to update, for example by an agent that automatically learns through interaction with the user. The Usage History DS describes the history of actions carried out by a user of a multimedia system. Usage history descriptions can be exchanged between consumers, their agents, content providers, and devices, and may in turn be used to determine the user's preferences with regard to AV content.
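MPEG-7 describes preferences but leaves the matching itself to applications. The Python sketch below shows one hypothetical way an application might use weighted preferences: it scores each content description by summing the weights of the preferences it satisfies (negative weights express dislikes), ranks a small catalogue, and includes a naive update rule standing in for an agent that learns from usage history. The data formats, the scoring rule and the update rule are all assumptions, not part of the standard.

```python
from typing import Dict, List, Tuple

# Hypothetical preference format: (attribute, value) -> weight.
# Positive weights express likes, negative weights dislikes.
Preferences = Dict[Tuple[str, str], float]
# Hypothetical flattened content description: attribute -> values.
Description = Dict[str, List[str]]


def match_score(prefs: Preferences, description: Description) -> float:
    """Sum the weights of all preferences that the description satisfies."""
    return sum(prefs.get((attr, value), 0.0)
               for attr, values in description.items() for value in values)


def rank(prefs: Preferences, catalogue: Dict[str, Description]) -> List[str]:
    """Item IDs ordered from best to worst match."""
    return sorted(catalogue, key=lambda item: match_score(prefs, catalogue[item]),
                  reverse=True)


def learn_from_usage(prefs: Preferences, consumed: Description, rate: float = 0.5) -> None:
    """Naive agent update: raise the weight of every attribute value the user consumed."""
    for attr, values in consumed.items():
        for value in values:
            prefs[(attr, value)] = prefs.get((attr, value), 0.0) + rate


prefs: Preferences = {("genre", "News"): 2.0, ("genre", "Sports"): 1.0,
                      ("language", "en"): 0.5, ("genre", "Horror"): -3.0}
catalogue: Dict[str, Description] = {
    "evening-news": {"genre": ["News"], "language": ["en"]},
    "match": {"genre": ["Sports"], "language": ["de"]},
    "film": {"genre": ["Horror"], "language": ["en"]},
}
```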
4.3 Color Descriptors
There are seven Color Descriptors: Color space, Color Quantization, Dominant Colors, Scalable Color, Color Layout, Color-Structure, and GoF/GoP Color.
4.3.1 Color space
The feature is the color space that is to be used in other color-based descriptions. In the current description, the following color spaces are supported:
• R, G, B
• Y, Cr, Cb
• H, S, V
• HMMD
• Linear transformation matrix with reference to R, G, B
• Monochrome
4.3.2 Color Quantization
This descriptor defines a uniform quantization of a color space. The number of bins which the quantizer produces is configurable, such that great flexibility is provided for a wide range of applications. For a meaningful application in the context of MPEG-7, this descriptor has to be combined with dominant color descriptors, e.g. to express the meaning of the values of dominant colors.
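To illustrate what a uniform quantization with a configurable number of bins means, the hypothetical Python helpers below quantize each component of an 8-bit color into a chosen number of levels, flatten the levels into one bin number, and map a bin back to the center of its interval. The RGB input, the per-component level counts and the function names are assumptions for the sketch; the last helper shows the role described above, namely giving stored dominant-color values a meaning as actual colors.

```python
from typing import Sequence, Tuple


def quantize(color: Sequence[int], bins: Sequence[int], max_value: int = 255) -> Tuple[int, ...]:
    """Uniformly map each component in [0, max_value] to one of bins[k] levels."""
    return tuple(min(c * b // (max_value + 1), b - 1) for c, b in zip(color, bins))


def bin_index(levels: Sequence[int], bins: Sequence[int]) -> int:
    """Flatten per-component levels into a single bin number (row-major order)."""
    index = 0
    for level, b in zip(levels, bins):
        index = index * b + level
    return index


def dequantize(levels: Sequence[int], bins: Sequence[int], max_value: int = 255) -> Tuple[int, ...]:
    """Center of each quantization interval, i.e. the color a stored level stands for."""
    return tuple((2 * v + 1) * (max_value + 1) // (2 * b) for v, b in zip(levels, bins))


levels = quantize((0, 128, 255), (4, 4, 4))
```

With 4 levels per component, (0, 128, 255) falls into levels (0, 2, 3), bin 11 of 64, whose center is (32, 160, 224).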
4.3.3 Dominant Color(s)
This color descriptor is most suitable for representing local (object or image region) features where a small number of colors is enough to characterize the color information in the region of interest. It can also be applied to whole images, for example flag images or color trademark images. Color quantization is used to extract a small number of representative colors in each region or image, and the percentage of each quantized color in the region is calculated correspondingly. A spatial coherency value for the entire descriptor is also defined and is used in similarity retrieval.
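A minimal sketch of the extraction idea, assuming 8-bit RGB pixels: pixels are grouped by a coarse uniform quantization, the largest groups are kept, and each contributes its mean color together with the fraction of the region it covers. This histogram-peak grouping is a simple stand-in chosen for illustration, not the extraction procedure of the MPEG-7 reference software, and the spatial coherency value is omitted.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def dominant_colors(pixels: Sequence[Pixel], levels: int = 4, max_colors: int = 3,
                    max_value: int = 255) -> List[Tuple[Tuple[float, ...], float]]:
    """Return up to max_colors (representative color, fraction of region) pairs."""
    step = (max_value + 1) // levels  # width of each uniform quantization cell
    groups: Dict[Tuple[int, ...], List[Pixel]] = {}
    for p in pixels:
        groups.setdefault(tuple(c // step for c in p), []).append(p)
    # Keep the most populated quantization cells.
    largest = sorted(groups.values(), key=len, reverse=True)[:max_colors]
    result = []
    for members in largest:
        mean = tuple(sum(channel) / len(members) for channel in zip(*members))
        result.append((mean, len(members) / len(pixels)))
    return result


# A mostly red region with some blue and a little green.
region = [(250, 0, 0)] * 6 + [(0, 0, 250)] * 3 + [(0, 250, 0)]
```

Fractions are relative to the whole region, so when colors are dropped (here green, with max_colors=2) the kept percentages sum to less than 1.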
4.3.4 Scalable Color
The Scalable Color Descriptor is a Color Histogram in HSV Color Space, which is encoded by a Haar transform. Its binary representation is scalable in terms of bin numbers and bit representation accuracy over a broad range of data rates. The Scalable Color Descriptor is useful for image-to-image matching and retrieval based on color feature. Retrieval accuracy increases with the number of bits used in the representation.
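The scalability comes from the Haar transform: coarse coefficients summarize the histogram, and finer ones can be dropped. The Python sketch below is a simplified 1-D Haar decomposition using unnormalized pairwise sums and differences on a tiny histogram; the standard's specific HSV bin layout and coefficient quantization are omitted, and the function names are hypothetical.

```python
from typing import List, Sequence


def haar_forward(hist: Sequence[float]) -> List[float]:
    """Full 1-D Haar decomposition using unnormalized pairwise sums and differences.
    Output order: [overall sum, coarsest difference, ..., finest differences]."""
    n = len(hist)
    if n == 0 or n & (n - 1):
        raise ValueError("histogram length must be a power of two")
    coeffs = list(hist)
    details: List[float] = []
    while len(coeffs) > 1:
        pairs = range(0, len(coeffs), 2)
        diffs = [coeffs[i] - coeffs[i + 1] for i in pairs]
        coeffs = [coeffs[i] + coeffs[i + 1] for i in pairs]
        details = diffs + details  # coarser levels end up first
    return coeffs + details


def haar_inverse(coeffs: Sequence[float]) -> List[float]:
    """Invert haar_forward level by level."""
    values = [coeffs[0]]
    pos = 1
    while len(values) < len(coeffs):
        diffs = coeffs[pos:pos + len(values)]
        pos += len(values)
        values = [half for s, d in zip(values, diffs) for half in ((s + d) / 2, (s - d) / 2)]
    return values


def truncate(coeffs: Sequence[float], keep: int) -> List[float]:
    """Scalability: keep only the first `keep` (coarsest) coefficients."""
    return list(coeffs[:keep]) + [0] * (len(coeffs) - keep)
```

For the histogram [4, 2, 5, 1] the coefficients are [12, 0, 2, 4]; keeping only the first two and inverting gives the flat approximation [3, 3, 3, 3], which shows how dropping fine coefficients trades accuracy for size.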
4.3.5 Color Layout
This descriptor effectively represents the spatial distribution of color of visual signals in a very compact form. This compactness allows visual signal matching with high retrieval efficiency at very small computational cost. It provides image-to-image matching as well as ultra-high-speed sequence-to-sequence matching, which requires a large number of repeated similarity calculations. It also enables a user-friendly interface based on hand-drawn sketch queries, since this descriptor captures the layout information of the color feature. Sketch queries are not supported by the other color descriptors.
The advantages of this descriptor are:
• There is no dependency on image/video format, resolution, or bit depth. The descriptor can be applied to any still picture or video frame, even if their resolutions differ. It can also be applied both to a whole image and to any connected or unconnected parts of an image with arbitrary shapes.
• The required hardware/software resources are very small. It needs as little as 8 bytes per image in the default video frame search, and the computational complexity of both extraction and matching is very low. This makes the descriptor feasible for mobile terminal applications, where the available resources are strictly limited by hardware constraints.
• The captured feature is represented in the frequency domain, so users can easily incorporate the perceptual sensitivity of the human visual system into the similarity calculation.
• It supports a scalable representation of the feature by controlling the number of coefficients enclosed in the descriptor. Users can choose any representation granularity depending on their objectives, without interoperability problems when measuring the similarity between descriptors of different granularity. The default number of coefficients is 12 for video frames, while 18 coefficients are recommended for still pictures to achieve higher accuracy.
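The steps behind these properties (partition into an 8x8 grid, average color per cell, frequency transform, keep a few low-frequency coefficients) can be sketched in pure Python. This is a hedged illustration, not the normative extraction: it assumes RGB input of at least 8x8 pixels, BT.601 YCbCr conversion, an orthonormal 2-D DCT, a JPEG-style zigzag scan, and a 6 + 3 + 3 split of the default 12 coefficients between Y, Cb and Cr; coefficient quantization is omitted and the distance is unweighted. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def to_ycbcr(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """ITU-R BT.601 RGB -> YCbCr (full range, chroma offset 128)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr


def representative_grid(image: Sequence[Sequence[Pixel]]) -> List[List[Tuple[float, ...]]]:
    """Average YCbCr color of each cell of an 8x8 partition (image must be >= 8x8)."""
    h, w = len(image), len(image[0])
    acc = [[[0.0, 0.0, 0.0, 0] for _ in range(8)] for _ in range(8)]
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            cell = acc[y * 8 // h][x * 8 // w]
            for k, value in enumerate(to_ycbcr(r, g, b)):
                cell[k] += value
            cell[3] += 1
    return [[(c[0] / c[3], c[1] / c[3], c[2] / c[3]) for c in row] for row in acc]


def dct_8x8(block: Sequence[Sequence[float]]) -> List[List[float]]:
    """Orthonormal 2-D DCT-II of an 8x8 block."""
    def scale(k: int) -> float:
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / 16)
                    * math.cos((2 * y + 1) * v * math.pi / 16)
                    for x in range(8) for y in range(8))
            out[u][v] = 0.25 * scale(u) * scale(v) * s
    return out


# Zigzag scan order (row, col), low frequencies first, as in JPEG.
ZIGZAG = sorted(((i, j) for i in range(8) for j in range(8)),
                key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))


def color_layout(image: Sequence[Sequence[Pixel]], n_y: int = 6, n_c: int = 3) -> List[List[float]]:
    """Return [Y coeffs, Cb coeffs, Cr coeffs]; 6 + 3 + 3 = 12 by default."""
    grid = representative_grid(image)
    descriptor = []
    for channel, n in ((0, n_y), (1, n_c), (2, n_c)):
        coeffs = dct_8x8([[grid[i][j][channel] for j in range(8)] for i in range(8)])
        descriptor.append([coeffs[i][j] for i, j in ZIGZAG[:n]])
    return descriptor


def layout_distance(a: List[List[float]], b: List[List[float]]) -> float:
    """Sum of per-channel Euclidean distances (unweighted). zip() compares only the
    coefficients both descriptors share, so different granularities still match."""
    return sum(math.sqrt(sum((p - q) ** 2 for p, q in zip(ch_a, ch_b)))
               for ch_a, ch_b in zip(a, b))
```

Because the grid is always 8x8, images of any resolution produce comparable descriptors, and passing n_c=6 yields an 18-coefficient variant that can still be compared with a 12-coefficient one.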
4.3.6 Color-Structure Descriptor
The Color structure descriptor is a color feature descriptor that captures both color content (similar to a color histogram) and information about the structure of this content. Its main functionality is image-to-image matching and its intended use is for still-image retrieval, where an image may consist of either a single rectangular frame or arbitrarily shaped, possibly disconnected, regions. The extraction method embeds color structure information into the descriptor by taking into account all colors in a structuring element of 8x8 pixels that slides over the image, instead of considering each pixel separately. Unlike the color histogram, this descriptor can distinguish between two images in which a given color is present in identical amounts but where the structure of the groups of pixels having that color is different in the two images. Color values are represented in the double-coned HMMD color space, which is quantized non-uniformly into 32, 64, 128 or 256 bins. Each bin amplitude value is represented by an 8-bit code. The Color Structure descriptor provides additional functionality and improved similarity-based image retrieval performance for natural images compared to the ordinary color histogram.
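The key idea, counting a color once per structuring-element position in which it occurs rather than once per pixel, can be sketched in a few lines. The Python sketch below assumes the image has already been mapped to quantized bin indices (the HMMD conversion and non-uniform quantization are omitted) and uses the fixed 8x8 element described above; function names are hypothetical.

```python
from typing import List, Sequence


def color_structure(indices: Sequence[Sequence[int]], n_bins: int, window: int = 8) -> List[float]:
    """Slide a window x window structuring element over an image of quantized color
    indices. Each bin is incremented once per window position in which that color
    occurs at least once; the result is normalized by the number of positions."""
    h, w = len(indices), len(indices[0])
    if h < window or w < window:
        raise ValueError("image smaller than the structuring element")
    hist = [0] * n_bins
    positions = 0
    for top in range(h - window + 1):
        for left in range(w - window + 1):
            present = {indices[y][x]
                       for y in range(top, top + window)
                       for x in range(left, left + window)}
            for b in present:
                hist[b] += 1
            positions += 1
    return [count / positions for count in hist]


def plain_histogram(indices: Sequence[Sequence[int]], n_bins: int) -> List[int]:
    """Ordinary per-pixel color histogram, for comparison."""
    hist = [0] * n_bins
    for row in indices:
        for b in row:
            hist[b] += 1
    return hist


# Same amount of color 1 (16 pixels) in both 16x16 images, different structure.
clustered = [[1 if y < 4 and x < 4 else 0 for x in range(16)] for y in range(16)]
scattered = [[1 if y % 4 == 0 and x % 4 == 0 else 0 for x in range(16)] for y in range(16)]
```

Both images have identical ordinary histograms, but the scattered image yields a much larger color-structure value for color 1: every one of the 81 window positions sees it, versus 16 positions for the clustered block.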
4.3.7 GoF/GoP Color
The Group of Frames/Group of Pictures color descriptor extends the Scalable Color descriptor, which is defined for a still image, to the color description of a video segment or a collection of still images. Two additional bits specify how the color histogram was calculated before the Haar transform is applied to it: by average, median or intersection. The average histogram, which refers to averaging the counter value of each bin across all frames or pictures, is equivalent to computing the aggregate color histogram of all frames and pictures with proper normalization. The median histogram refers to computing the median of the counter value of each bin across all frames or pictures. It is more robust to round-off errors and to the presence of outliers in image intensity values than the average histogram. The intersection histogram refers to computing the minimum of the counter value of each bin across all frames or pictures, to capture the "least common" color traits of a group of images. Note that it is different from histogram intersection, which is a scalar measure. The same similarity/distance measures that are used to compare Scalable Color descriptions can be employed to compare GoF/GoP Color Descriptors.
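The three aggregation modes are simple bin-wise operations over the per-frame histograms. The hypothetical Python helper below computes them; in the descriptor, the resulting group histogram would then be Haar-encoded in the same way as a Scalable Color histogram (that step, and the standard's bin layout, are omitted here).

```python
from statistics import median
from typing import List, Sequence


def aggregate_histograms(histograms: Sequence[Sequence[float]], mode: str = "average") -> List[float]:
    """Combine per-frame color histograms into one group histogram, bin by bin."""
    if not histograms:
        raise ValueError("need at least one histogram")
    if len({len(h) for h in histograms}) != 1:
        raise ValueError("all histograms must have the same number of bins")
    per_bin = list(zip(*histograms))  # per_bin[k] = values of bin k across frames
    if mode == "average":
        return [sum(values) / len(values) for values in per_bin]
    if mode == "median":
        return [median(values) for values in per_bin]
    if mode == "intersection":
        return [min(values) for values in per_bin]
    raise ValueError(f"unknown mode: {mode}")
```

With frames [4, 0, 6], [2, 2, 6] and [9, 1, 0], the median keeps the value 6 in the last bin even though one frame has an outlier of 0 there, the average pulls it down to 4.0, and the intersection drops it to 0.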
MPEG-7 APPLICATION DOMAINS
The elements that MPEG-7 standardizes will support as broad a range of applications as possible (for example, multimedia digital libraries, broadcast media selection, multimedia editing, home entertainment devices, etc.). MPEG-7 will also make the web as searchable for multimedia content as it is for text today. This applies especially to large content archives that are being made accessible to the public, as well as to multimedia catalogues that enable people to identify content for purchase. The information used for content retrieval may also be used by agents for the selection and filtering of broadcast "push" material or for personalized advertising. Additionally, MPEG-7 descriptions will allow fast and cost-effective usage of the underlying data by enabling semi-automatic multimedia presentation and editing. All domains making use of multimedia will benefit from MPEG-7, including:
Digital libraries (image catalogue, musical dictionary, bio-medical imaging catalogues, etc.)
Education
Multimedia editing (personalised electronic news service, media authoring)
Cultural services (history museums, art galleries, etc.)
Multimedia directory services (e.g. yellow pages, tourist information, geographical information systems)
Broadcast media selection (radio channel, TV channel, etc.)
Journalism (e.g. searching speeches of a certain politician using his name, his voice or his face)
E-Commerce (personalised advertising, on-line catalogues, directories of e-shops, etc.)
Surveillance (traffic control, surface transportation, non-destructive testing in hostile environments, etc.)
Investigation services (human characteristics recognition, forensics)
Home Entertainment (systems for the management of personal multimedia collections, including manipulation of content, e.g. home video editing, searching a game, karaoke, etc.)
Social (e.g. dating services)
5.1 Typical applications enabled by MPEG-7 technology include:
• Audio: I want to search for songs by humming or whistling a tune or, using an excerpt of Pavarotti's voice, get a list of Pavarotti's records and video clips in which Pavarotti sings or simply makes an appearance. Or play a few notes on a keyboard and retrieve a list of musical pieces similar to the required tune, or images matching the notes in a certain way, e.g. in terms of emotions.
• Graphics: Sketch a few lines on a screen and get a set of images containing similar graphics, logos, and ideograms.
• Image: Define objects, including color patches or textures, and get examples from which you select items to compose your image. Or check whether your company logo was advertised on a TV channel as contracted.
• Visual: Allow mobile phone access to video clips of goals scored in a soccer game, or automatically search and retrieve any unusual movements from surveillance videos.
• Multimedia: On a given set of multimedia objects, describe movements and relations between objects and then search for animations fulfilling the described temporal and spatial relations. Or describe actions and get a list of scenarios containing such actions.
This chapter has briefly introduced some potential application areas and real-world applications for MPEG-7. Basically, all application domains making use of multimedia can benefit from MPEG-7. The list below shows some application areas and examples that MPEG-7 is capable of boosting:
Broadcast media selection: media selection for radio and TV channels
E-commerce: personalized advertising, on-line catalogues
Home entertainment: systems for the management of personal multimedia collections
Multimedia editing: personalized electronic news services
Shopping: searching clothes that one likes
Surveillance: traffic control
MPEG-7 is an ambitious standardization effort from the Moving Picture Experts Group. A number of open questions still exist, but the established results point to a promising future. However, the most important question still needs to be answered: what is the right balance between flexibility and compatibility within MPEG-7?
The MPEG-7 working group has to decide whether to follow a specific, bottom-up approach for a few individual domains, or to let anyone create their own MPEG-7 solution. The group's decision will clearly influence whether only the DDL is standardized, or a DDL together with a core set of Descriptors and Description Schemes. MPEG-7 should make a strong showing in a few key applications by establishing Description Schemes and variants that serve the video, image, music, speech, and sound indexing communities well, allowing a number of initial products to target those basic standards. MPEG-7 should also provide a level of genericity (in the Descriptors) and power (in the DDL) that lets specialized communities (such as biomedical or remote sensing imaging) adapt the standard to their uses.
Furthermore, MPEG-7's core goal is to provide interoperability. At the end of MPEG-7, whether version 1 or 2, there should exist a single DDL, a generic set of Descriptors for audio and visual features, and specific Description Schemes that serve specific applications. However, even the authors are divided on the question of how to handle cases where a feature cannot be captured simply by structuring existing Descriptors into a novel Description Scheme. The problem is that a Descriptor built using the DDL might allow the novel Description Scheme to be perfectly parsable, but the newly defined Descriptor at the bottom of whatever structure might provide semantic information that other computers can't understand. On the other hand, introducing a registration body seems more problematic, especially since this might also lead to forced incompatibilities due to a variety of competing but incompatible Descriptors. Ultimately, struggling with these sorts of questions makes the MPEG-7 process intellectually stimulating and rewarding.
We have faith that we will see a standard that provides compatible content descriptions, allowing a given community to adopt it early. MPEG-7 should also offer the flexibility for that community to grow and include other special interests.