UTokyo Repository 東京大学
 

UTokyo Repository >
122 新領域創成科学研究科 >
14 基盤科学研究系 複雑理工学専攻 >
1221420 博士論文(基盤科学研究系複雑理工学専攻) >

このページ(論文)をリンクする場合は次のURLを使用してください: http://hdl.handle.net/2261/52457

タイトル: Analysis and Applications of the T-complexity
その他のタイトル: T-complexityの解析と応用
著者: Hamano, Kenji
著者(別言語): 濱野, 健二
キーワード: T-code
T-complexity
T-decomposition
Lempel-Ziv complexity
NIST SP 800-22
data compression
complexity profile
発行日: 2010年3月24日
抄録: T-codes are variable-length self-synchronizing codes introduced by Titchener in 1984. T-code codewords are constructed recursively from a finite alphabet using an algorithm called T-augmentation, resulting in excellent self-synchronization properties. An algorithm called T-decomposition parses a given sequence into a series of T-prefixes, and finds a T-code set in which the sequence is encoded to a longest codeword. There are similarities and differences between T-decomposition and the conventional LZ78 incremental parsing. The LZ78 incremental parsing algorithm parses a given sequence into consecutive distinct subsequences (words) sequentially in such a way that each word consists of the longest matching word parsed previously and a literal symbol. Then, the LZ-complexity is defined as the number of words. By contrast, T-decomposition parses a given sequence into a series of T-prefixes, each of which consists of the recursive concatenation of the longest matching T-prefix parsed previously and a literal symbol, and it has to access the whole sequence every time it determines a T-prefix. Alike to the LZ-complexity, the T-complexity of a sequence is defined as the number of T-prefixes, however, the T-complexity of a particular sequence in general tends to be smaller than the LZ-complexity. In the first part of the thesis, we deal with our contributions to the theory of T-codes. In order to realize a sequential determination of T-prefixes, we devise a new T-decomposition algorithm using forward parsing. Both the T-complexity profile obtained from the forward T-decomposition and the LZ-complexity profile can be derived in a unified way using a differential equation method. The method focuses on the increase of the average codeword length of a code tree. The obtained formulas are confirmed to coincide with those of previous studies. The magnitude of the T-complexity of a given sequence s in general indicates the degree of randomness. However, there exist interesting sequences that have much larger T-complexities than any random sequences. We investigate the maximum T-complexity sequences and the maximum LZ-complexity sequences using various techniques including those of the test suite released by the National Institute of Standards and Technology (NIST) of the U.S. government, and find that the maximum T-complexity sequences are less random than the maximum LZ-complexity sequences. In the second part of the thesis, we present our achievements in terms of application. We consider two applications -- data compression and randomness testing. First, we propose a new data compression scheme based on T-codes using a dictionary method such that all phrases added to a dictionary have a recursive structure similar to T-codes. Our data compression scheme can compress any of the files in the Calgary Corpus more efficiently than previous schemes based on T-codes and the UNIX compress, a variant of LZ78 (LZW). Next, we introduce a randomness test based on the T-complexity. Recently, the Lempel-Ziv (LZ) randomness test based on the LZ-complexity was officially excluded from the NIST test suite. This is because the distribution of P-values for random sequences of length 106, the most common length used, is strictly discrete in the case of the LZ-complexity. Our test solves this problem because the T-complexity features an almost ideal uniform continuous distribution of P-values for random sequences of length 106. The proposed test outperforms the NIST LZ test, a modified LZ test proposed by Doganaksoy and Göloglu, and all other tests included in the NIST test suite, in terms of the detection of undesirable pseudorandom sequences generated by a multiplicative congruential generator (MCG) and non-random byte sequences Y = Y0, Y1, Y2, ···, where Y3i and Y(3i+1) are random, but Y(3i+2) is given by Y3i + Y(3i+1) mod 28.
内容記述: 報告番号: 甲26141 ; 学位授与年月日: 2010-03-24 ; 学位の種別: 課程博士 ; 学位の種類: 博士(科学) ; 学位記番号: 博創域第558号 ; 研究科・専攻: 新領域創成科学研究科基盤科学研究系複雑理工学専攻
URI: http://hdl.handle.net/2261/52457
出現カテゴリ:021 博士論文
1221420 博士論文(基盤科学研究系複雑理工学専攻)

この論文のファイル:

ファイル 記述 サイズフォーマット
K-02429.pdf本文(fulltext)2.95 MBAdobe PDF見る/開く

本リポジトリに保管されているアイテムはすべて著作権により保護されています。

 

Valid XHTML 1.0! DSpace Software Copyright © 2002-2010  Duraspace - ご意見をお寄せください