Analysis and Applications of the T-complexity

Hamano, Kenji

WEKO3

lat lon distance

[[sub_check.contents]]

[[sub_radio.contents]]

Field does not validate

[[sub_attr.contents]]　

インデックスツリー

アイテム

Analysis and Applications of the T-complexity

https://doi.org/10.15083/00004060

名前 / ファイル	ライセンス	アクション
本文(fulltext) (3.0 MB)

Item type

学位論文 / Thesis or Dissertation(1)

公開日

2012-11-01

タイトル

Analysis and Applications of the T-complexity

言語

eng

キーワード

主題Scheme

Other

主題

T-code

キーワード

主題Scheme

Other

主題

T-complexity

キーワード

主題Scheme

Other

主題

T-decomposition

キーワード

主題Scheme

Other

主題

Lempel-Ziv complexity

キーワード

主題Scheme

Other

主題

NIST SP 800-22

キーワード

主題Scheme

Other

主題

data compression

キーワード

主題Scheme

Other

主題

complexity profile

資源タイプ

資源

http://purl.org/coar/resource_type/c_46ec

タイプ

thesis

ID登録

10.15083/00004060

ID登録タイプ

JaLC

その他のタイトル

T-complexityの解析と応用

著者

Hamano, Kenji

著者別名

識別子Scheme

WEKO

識別子

9382

姓名

濱野, 健二

著者所属

東京大学大学院新領域創成科学研究科基盤科学研究系複雑理工学専攻

著者所属

Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo

Abstract

内容記述タイプ

Abstract

内容記述

T-codes are variable-length self-synchronizing codes introduced by Titchener in 1984. T-code codewords are constructed recursively from a finite alphabet using an algorithm called T-augmentation, resulting in excellent self-synchronization properties. An algorithm called T-decomposition parses a given sequence into a series of T-prefixes, and finds a T-code set in which the sequence is encoded to a longest codeword. There are similarities and differences between T-decomposition and the conventional LZ78 incremental parsing. The LZ78 incremental parsing algorithm parses a given sequence into consecutive distinct subsequences (words) sequentially in such a way that each word consists of the longest matching word parsed previously and a literal symbol. Then, the LZ-complexity is defined as the number of words. By contrast, T-decomposition parses a given sequence into a series of T-prefixes, each of which consists of the recursive concatenation of the longest matching T-prefix parsed previously and a literal symbol, and it has to access the whole sequence every time it determines a T-prefix. Alike to the LZ-complexity, the T-complexity of a sequence is defined as the number of T-prefixes, however, the T-complexity of a particular sequence in general tends to be smaller than the LZ-complexity. In the first part of the thesis, we deal with our contributions to the theory of T-codes. In order to realize a sequential determination of T-prefixes, we devise a new T-decomposition algorithm using forward parsing. Both the T-complexity profile obtained from the forward T-decomposition and the LZ-complexity profile can be derived in a unified way using a differential equation method. The method focuses on the increase of the average codeword length of a code tree. The obtained formulas are confirmed to coincide with those of previous studies. The magnitude of the T-complexity of a given sequence s in general indicates the degree of randomness. However, there exist interesting sequences that have much larger T-complexities than any random sequences. We investigate the maximum T-complexity sequences and the maximum LZ-complexity sequences using various techniques including those of the test suite released by the National Institute of Standards and Technology (NIST) of the U.S. government, and find that the maximum T-complexity sequences are less random than the maximum LZ-complexity sequences. In the second part of the thesis, we present our achievements in terms of application. We consider two applications -- data compression and randomness testing. First, we propose a new data compression scheme based on T-codes using a dictionary method such that all phrases added to a dictionary have a recursive structure similar to T-codes. Our data compression scheme can compress any of the files in the Calgary Corpus more efficiently than previous schemes based on T-codes and the UNIX compress, a variant of LZ78 (LZW). Next, we introduce a randomness test based on the T-complexity. Recently, the Lempel-Ziv (LZ) randomness test based on the LZ-complexity was officially excluded from the NIST test suite. This is because the distribution of P-values for random sequences of length 106, the most common length used, is strictly discrete in the case of the LZ-complexity. Our test solves this problem because the T-complexity features an almost ideal uniform continuous distribution of P-values for random sequences of length 106. The proposed test outperforms the NIST LZ test, a modified LZ test proposed by Doganaksoy and Göloglu, and all other tests included in the NIST test suite, in terms of the detection of undesirable pseudorandom sequences generated by a multiplicative congruential generator (MCG) and non-random byte sequences Y = Y0, Y1, Y2, ···, where Y3i and Y(3i+1) are random, but Y(3i+2) is given by Y3i + Y(3i+1) mod 28.

書誌情報

発行日 2010-03-24

日本十進分類法

主題Scheme

NDC

主題

007

学位名

博士(科学)

学位

値

doctoral

学位分野

Science (Kagaku)(科学)

学位授与機関

学位授与機関名

University of Tokyo (東京大学)

研究科・専攻

Department of Complexity Science and Engineering, Graduate School of Frontier Sciences (新領域創成科学研究科複雑理工学専攻)

学位授与年月日

2010-03-24

学位授与番号

甲第26141号

学位記番号

博創域第558号

戻る

views

See details

	Views

Versions

Ver.1

2021-03-01 19:49:04.173417

Show All versions

Cite as

エクスポート

OAI-PMH

JPCOAR 2.0
JPCOAR 1.0
DublinCore
DDI

Other Formats

JSON
BIBTEX

インデックスリンク

インデックスツリー

アイテム

Analysis and Applications of the T-complexity

× Hamano, Kenji

Versions

Share

Cite as

エクスポート