
UTokyo Repository >
122 新領域創成科学研究科 >
14 基盤科学研究系 複雑理工学専攻 >
1221420 博士論文（基盤科学研究系複雑理工学専攻） >
このページ(論文)をリンクする場合は次のURLを使用してください:
http://hdl.handle.net/2261/52457

タイトル:  Analysis and Applications of the Tcomplexity 
その他のタイトル:  Tcomplexityの解析と応用 
著者:  Hamano, Kenji 
著者(別言語):  濱野, 健二 
キーワード:  Tcode Tcomplexity Tdecomposition LempelZiv complexity NIST SP 80022 data compression complexity profile 
発行日:  2010年3月24日 
抄録:  Tcodes are variablelength selfsynchronizing codes introduced by Titchener in 1984. Tcode codewords are constructed recursively from a finite alphabet using an algorithm called Taugmentation, resulting in excellent selfsynchronization properties. An algorithm called Tdecomposition parses a given sequence into a series of Tprefixes, and finds a Tcode set in which the sequence is encoded to a longest codeword. There are similarities and differences between Tdecomposition and the conventional LZ78 incremental parsing. The LZ78 incremental parsing algorithm parses a given sequence into consecutive distinct subsequences (words) sequentially in such a way that each word consists of the longest matching word parsed previously and a literal symbol. Then, the LZcomplexity is defined as the number of words. By contrast, Tdecomposition parses a given sequence into a series of Tprefixes, each of which consists of the recursive concatenation of the longest matching Tprefix parsed previously and a literal symbol, and it has to access the whole sequence every time it determines a Tprefix. Alike to the LZcomplexity, the Tcomplexity of a sequence is defined as the number of Tprefixes, however, the Tcomplexity of a particular sequence in general tends to be smaller than the LZcomplexity. In the first part of the thesis, we deal with our contributions to the theory of Tcodes. In order to realize a sequential determination of Tprefixes, we devise a new Tdecomposition algorithm using forward parsing. Both the Tcomplexity profile obtained from the forward Tdecomposition and the LZcomplexity profile can be derived in a unified way using a differential equation method. The method focuses on the increase of the average codeword length of a code tree. The obtained formulas are confirmed to coincide with those of previous studies. The magnitude of the Tcomplexity of a given sequence s in general indicates the degree of randomness. However, there exist interesting sequences that have much larger Tcomplexities than any random sequences. We investigate the maximum Tcomplexity sequences and the maximum LZcomplexity sequences using various techniques including those of the test suite released by the National Institute of Standards and Technology (NIST) of the U.S. government, and find that the maximum Tcomplexity sequences are less random than the maximum LZcomplexity sequences. In the second part of the thesis, we present our achievements in terms of application. We consider two applications  data compression and randomness testing. First, we propose a new data compression scheme based on Tcodes using a dictionary method such that all phrases added to a dictionary have a recursive structure similar to Tcodes. Our data compression scheme can compress any of the files in the Calgary Corpus more efficiently than previous schemes based on Tcodes and the UNIX compress, a variant of LZ78 (LZW). Next, we introduce a randomness test based on the Tcomplexity. Recently, the LempelZiv (LZ) randomness test based on the LZcomplexity was officially excluded from the NIST test suite. This is because the distribution of Pvalues for random sequences of length 106, the most common length used, is strictly discrete in the case of the LZcomplexity. Our test solves this problem because the Tcomplexity features an almost ideal uniform continuous distribution of Pvalues for random sequences of length 106. The proposed test outperforms the NIST LZ test, a modified LZ test proposed by Doganaksoy and Göloglu, and all other tests included in the NIST test suite, in terms of the detection of undesirable pseudorandom sequences generated by a multiplicative congruential generator (MCG) and nonrandom byte sequences Y = Y0, Y1, Y2, ···, where Y3i and Y(3i+1) are random, but Y(3i+2) is given by Y3i + Y(3i+1) mod 28. 
内容記述:  報告番号: 甲26141 ; 学位授与年月日: 20100324 ; 学位の種別: 課程博士 ; 学位の種類: 博士(科学) ; 学位記番号: 博創域第558号 ; 研究科・専攻: 新領域創成科学研究科基盤科学研究系複雑理工学専攻 
URI:  http://hdl.handle.net/2261/52457 
出現カテゴリ:  021 博士論文 1221420 博士論文（基盤科学研究系複雑理工学専攻）

本リポジトリに保管されているアイテムはすべて著作権により保護されています。
