Representation bound for human facial mimic with the aid of principal component analysis
Umeå University, Faculty of Science and Technology, Department of Applied Physics and Electronics. (Digital Media Lab)
Umeå University, Faculty of Science and Technology, Department of Applied Physics and Electronics. (Digital Media Lab)
2010 (English). In: International Journal of Image and Graphics, ISSN 0219-4678, Vol. 10, no 3, 343-363 p. Article in journal (Refereed), Published
Abstract [en]

In this paper, we examine how much information is needed to represent the facial mimic, based on Paul Ekman's assumption that the facial mimic can be represented with a few basic emotions. Principal component analysis is used to compact the important facial expressions. Theoretical bounds for facial mimic representation are presented both for using a certain number of principal components and a certain number of bits. When 10 principal components are used to reconstruct color image video at a resolution of 240 × 176 pixels the representation bound is on average 36.8 dB, measured in peak signal-to-noise ratio. Practical confirmation of the theoretical bounds is demonstrated. Quantization of projection coefficients affects the representation, but a quantization with approximately 7-8 bits is found to match an exact representation, measured in mean square error.
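The relation between the number of principal components and the reconstruction quality can be illustrated with a minimal sketch. It uses synthetic data rather than facial video, and the function names, array sizes and noise levels are assumptions chosen for illustration; they are not taken from the paper. It relies on the standard PCA property that the mean square error of a k-component reconstruction of the training frames equals the sum of the discarded eigenvalues divided by the number of pixels.

```python
import numpy as np

def pca_basis(frames):
    """Return the mean frame, eigenimages (as rows) and covariance eigenvalues."""
    mean = frames.mean(axis=0)
    centered = frames - mean
    # SVD of the centered data: the rows of vt are orthonormal eigenimages.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigvals = s ** 2 / frames.shape[0]  # variance captured by each eigenimage
    return mean, vt, eigvals

def reconstruct(frames, mean, eigenimages, k):
    """Project onto the first k eigenimages and map back to pixel space."""
    coeffs = (frames - mean) @ eigenimages[:k].T
    return mean + coeffs @ eigenimages[:k]

def psnr(mse, peak=255.0):
    """Peak signal-to-noise ratio in dB for a given mean square error."""
    return 10.0 * np.log10(peak ** 2 / mse)

# Synthetic stand-in for video frames (assumption): a few strong modes plus noise.
rng = np.random.default_rng(0)
n_frames, n_pixels, k = 200, 24 * 16, 10
modes = rng.normal(size=(5, n_pixels))
weights = rng.normal(size=(n_frames, 5)) * 20.0
frames = 128.0 + weights @ modes + rng.normal(scale=2.0, size=(n_frames, n_pixels))

mean, eigenimages, eigvals = pca_basis(frames)
recon = reconstruct(frames, mean, eigenimages, k)

measured_mse = np.mean((frames - recon) ** 2)
bound_mse = eigvals[k:].sum() / n_pixels  # energy in the discarded components

print(f"measured MSE: {measured_mse:.4f}, PSNR: {psnr(measured_mse):.2f} dB")
print(f"eigenvalue-tail MSE: {bound_mse:.4f}")
```

For frames that were not part of the training set, the measured error can only be larger than this eigenvalue-tail value, which is why it acts as a bound.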

Place, publisher, year, edition, pages
World Scientific Publishing Company, 2010. Vol. 10, no 3, 343-363 p.
Keyword [en]
Distortion bound, rate-distortion bound, facial mimic, basic emotions, principal component analysis, PCA
National Category
Media Engineering
Identifiers
URN: urn:nbn:se:umu:diva-5428
DOI: 10.1142/S0219467810003810
OAI: oai:DiVA.org:umu-5428
DiVA: diva2:144938
Available from: 2011-09-19 Created: 2006-10-13 Last updated: 2011-09-19 Bibliographically approved
In thesis
1. Very low bitrate facial video coding: based on principal component analysis
2006 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis introduces a scheme for very low bitrate video coding based on principal component analysis. The principal information of a person's facial mimic can be extracted and stored in an Eigenspace. Entire video frames of this person's face can then be compressed with the Eigenspace to only a few projection coefficients. Principal component video coding encodes entire frames at once and, unlike standard coding schemes, an increased frame size does not increase the bitrate needed for encoding. This enables video communication with high frame rate, spatial resolution and visual quality at very low bitrates. No standard video coding technique provides these four features at the same time.

Theoretical bounds for using principal components to encode facial video sequences are presented. Two different bounds are derived: one that describes the minimum distortion when a certain number of Eigenimages are used, and one that describes the minimum distortion when a minimum number of bits are used.

We investigate how the reconstruction quality of the coding scheme is affected when the Eigenspace, mean image and coefficients are compressed to enable efficient transmission. The Eigenspace and mean image are compressed with JPEG compression, while the coefficients are quantized. We show that high compression ratios can be used with almost no decrease in reconstruction quality.
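The effect of coefficient quantization can be sketched as follows. This is an illustration on synthetic data, assuming a uniform scalar quantizer with a per-coefficient range; the quantizer design, the helper names and the data are assumptions and are not the thesis's actual method.

```python
import numpy as np

def quantize(coeffs, bits):
    """Uniform scalar quantization of each coefficient column to 2**bits levels."""
    lo = coeffs.min(axis=0)
    hi = coeffs.max(axis=0)
    step = (hi - lo) / (2 ** bits - 1)
    step = np.where(step == 0, 1.0, step)     # guard against constant columns
    indices = np.round((coeffs - lo) / step)  # integers that would be transmitted
    return lo + indices * step                # values recovered at the decoder

# Synthetic stand-in for video frames (assumption): a few strong modes plus noise.
rng = np.random.default_rng(1)
n_frames, n_pixels, k = 200, 24 * 16, 10
frames = 128.0 + (rng.normal(size=(n_frames, 5)) * 20.0) @ rng.normal(size=(5, n_pixels))
frames += rng.normal(scale=2.0, size=frames.shape)

mean = frames.mean(axis=0)
_, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
basis = vt[:k]                       # first k eigenimages, orthonormal rows
coeffs = (frames - mean) @ basis.T   # exact projection coefficients

def mse_for(c):
    """Mean square error when frames are rebuilt from the coefficients c."""
    return float(np.mean((frames - (mean + c @ basis)) ** 2))

exact_mse = mse_for(coeffs)
mse_by_bits = {b: mse_for(quantize(coeffs, b)) for b in range(2, 11)}
for b, m in mse_by_bits.items():
    print(f"{b:2d} bits: MSE = {m:.4f} (exact coefficients: {exact_mse:.4f})")
```

Because the eigenimages are orthonormal, the quantization error simply adds to the truncation error, so the quantized curve approaches the exact-coefficient error from above as the number of bits grows.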

Different ways of re-using the Eigenspace for a person extracted from one video sequence to encode other video sequences are examined. The most important factor is the positioning of the facial features in the video frames.

Through a user test we find that it is extremely important to consider secondary workloads and how users make use of video when experimental setups are designed.

Place, publisher, year, edition, pages
Umeå: Tillämpad fysik och elektronik, 2006. 54 p.
Series
Digital Media Lab, ISSN 1652-6295 ; 7
Keyword
Image processing, Video processing, Very low bitrate coding
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:umu:diva-895 (URN)
91-7264-172-x (ISBN)
Presentation
(English)
Available from: 2006-10-13 Created: 2006-10-13 Last updated: 2010-01-26 Bibliographically approved
2. Very Low Bitrate Video Communication: A Principal Component Analysis Approach
2008 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

A large amount of the information in conversations comes from non-verbal cues such as facial expressions and body gestures. These cues are lost when we don't communicate face-to-face. But face-to-face communication doesn't have to happen in person. With video communication we can at least deliver information about the facial mimic and some gestures. This thesis is about video communication over distances: communication that remains available over networks with low capacity, since the bitrate it needs is low.

A visual image needs to have high quality and resolution to be semantically meaningful for communication. Delivering such video over networks requires that the video is compressed. The standard way to compress video images, used by H.264 and MPEG-4, is to divide the image into blocks and represent each block with mathematical waveforms, usually frequency features. These waveforms are quite good at representing any kind of video since they do not resemble anything; they are just frequency features. But because they are completely arbitrary, they cannot compress video enough to enable use over networks with limited capacity, such as GSM and GPRS.

Another issue is that such codecs have high complexity because of the redundancy removal with positional shifts of the blocks. High complexity and bitrate mean that a device has to consume a large amount of energy for encoding, decoding and transmission of such video, and energy is a very important factor for battery-driven devices.

These drawbacks of standard video coding mean that video compressed with such codecs cannot be delivered anywhere and at any time. To resolve these issues we have developed a totally new type of video coding. Instead of using mathematical waveforms for representation, we use faces to represent faces. This makes the compression much more efficient than with waveforms, even though the faces are person-dependent.

By building a model of the changes in the face, the facial mimic, we can use this model to encode the images. The model consists of representative facial images, and we use a powerful mathematical tool to extract it, namely principal component analysis (PCA). This coding has very low complexity since encoding and decoding only consist of multiplication operations. The faces are treated as single encoding entities and all operations are performed on full images; no block processing is needed. These features mean that PCA coding can deliver high quality video at very low bitrates with low complexity for encoding and decoding.

With the use of asymmetrical PCA (aPCA) it is possible to use only semantically important areas for encoding while decoding full frames or a different part of the frames.
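One possible reading of this asymmetry can be sketched as follows: the eigenspace is computed from a region of interest only, and a second set of full-frame components is built from the same linear combinations of training frames. This is an assumption-laden illustration on synthetic data, not the thesis's implementation; the mask `fg`, the helper names and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_frames, n_pixels, k = 120, 300, 5

# Hypothetical "semantically important" region: pixels 100..199 of each frame.
fg = np.zeros(n_pixels, dtype=bool)
fg[100:200] = True

# Synthetic frames (assumption): all pixels driven by the same latent modes.
latent = rng.normal(size=(n_frames, 3))
frames = 128.0 + latent @ rng.normal(size=(3, n_pixels)) * 15.0
frames += rng.normal(scale=1.0, size=frames.shape)

mean_full = frames.mean(axis=0)
centered_full = frames - mean_full
centered_fg = centered_full[:, fg]

# Eigenspace of the important region only.
u, s, vt_fg = np.linalg.svd(centered_fg, full_matrices=False)
fg_basis = vt_fg[:k]  # orthonormal components used for encoding

# Full-frame components: the same linear combinations of training frames
# that produce fg_basis, applied to the complete frames instead.
full_basis = (u[:, :k] / s[:k]).T @ centered_full

def encode(frame):
    """Encode using only the pixels inside the important region."""
    return (frame[fg] - mean_full[fg]) @ fg_basis.T

def decode(coeffs):
    """Decode a complete frame from the region-based coefficients."""
    return mean_full + coeffs @ full_basis

recon = np.array([decode(encode(f)) for f in frames])
print("coefficients per frame:", encode(frames[0]).shape[0])
print(f"full-frame MSE: {np.mean((frames - recon) ** 2):.4f}")
```

Inside the important region the full-frame components coincide with the encoding components, so the encoder never has to touch the remaining pixels. The background is only reconstructed well to the extent that it varies together with the important region.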

We show that a codec based on PCA can compress facial video to a bitrate below 5 kbps and still provide high quality. This bitrate can be delivered on a GSM network. We also show the possibility of extending PCA coding to encoding of high definition video.
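A back-of-the-envelope calculation shows why such bitrates are plausible. The figures of 10 coefficients and 8 bits per coefficient echo the journal abstract above; the frame rate of 25 frames per second is an assumption, and the sketch ignores entropy coding, audio and the one-time cost of transmitting the Eigenspace and mean image.

```python
def coefficient_bitrate_kbps(n_coeffs, bits_per_coeff, frames_per_second):
    """Bitrate of the projection coefficients alone, in kbit/s."""
    return n_coeffs * bits_per_coeff * frames_per_second / 1000.0

# The frame size does not enter the calculation: only coefficients are sent.
for width, height in [(240, 176), (1280, 720)]:
    rate = coefficient_bitrate_kbps(n_coeffs=10, bits_per_coeff=8,
                                    frames_per_second=25)
    print(f"{width}x{height}: {rate} kbps")  # prints 2.0 kbps for both sizes
```

The frame size only affects the size of the Eigenspace that has to be available at the decoder, not the per-frame payload.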

Place, publisher, year, edition, pages
Umeå: Tillämpad fysik och elektronik, 2008. 90 p.
Series
Digital Media Lab, ISSN 1652-6295 ; 11
Keyword
Video compression, Very low bitrate, Principal component analysis, Complexity, Semantically important areas, Wearable video
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:umu:diva-1808 (URN)
978-91-7264-644-5 (ISBN)
Public defence
2008-09-26, N200, Naturvetarhuset, Umeå universitet, Umeå, 10:00 (English)
Available from: 2008-09-05 Created: 2008-09-05 Last updated: 2010-11-05 Bibliographically approved

Authority records

Söderström, Ulrik; Li, Haibo
