Open Access Open Access  Restricted Access Subscription Access

Using Adaptive Automata in Grammar-Based Text Compression to Identify Frequent Substrings


Affiliations
1 Escola Politecnica da Universidade de Sao Paulo, Sao Paulo, Brazil
 

Compression techniques allow reduction in the data storage space required by applications dealing with large amount of data by increasing the information entropy of its representation. This paper presents an adaptive rule-driven device - the adaptive automata - as the device to identify recurring sequences of symbols to be compressed in a grammar-based lossless data compression scheme.

Keywords

Adaptive Automata, Grammar Based Data Compression.
User
Notifications
Font Size

  • Sipser, M. (2006) “Introduction to the theory of computation”, Thomson Course Technology.
  • Lohrey, M. (2012) “Algorithmics on SLP-compressed strings: A survey.” Groups Complexity Cryptology, vol. 4, no. 2, pp. 241–299.
  • Sakamoto, H. (2014) “Grammar compression: Grammatical inference by compression and its application to real data”, in Proceedings of the 12th International Conference on Grammatical Inference, ICGI 2014, Kyoto, Japan, September 17-19, 2014., pp. 3–20.
  • Tabei,Y., Saigo, H., Yamanishi, Y. & Puglisi, S. J. (2016) “Scalable partial least squares regression on grammar-compressed data matrices”, in 22nd KDD, pp. 1875—1884.
  • Maruyama, S., Tanaka, Y., Sakamoto, H., & Takeda, M. (2010) “Context-sensitive grammar transform: Compression and pattern matching”, IEICE Transactions on Information and Systems, vol. E93.D, no. 2, pp. 219–226.
  • Tabei, Y. (2016) “Recent development of grammar compression”, Information Processing Society of Japan Magazine, vol. 57, no. 2, pp. 172–178 (in Japanese).
  • Jez, A. (2016) “A really simple approximation of smallest grammar”, Theoretical Computer Science, vol. 616, pp. 141–150.
  • Lohrey, M. (2015) “Grammar-based tree compression”, in International Conference on Developments in Language Theory. Springer, pp. 46–57.
  • De la Higuera, C. (2010) Grammatical inference: learning automata and grammars. Cambridge University Press.
  • José Neto, J. (2001) “Adaptive rule-driven devices - general formulation and case study”, in CIAA 2001 6th International Conference on Implementation and Application of Automata, ser. Lecture Notes in Computer Science, B. W. Watson and D. Wood, Eds., vol. 2494. Pretoria, South Africa: Springer-Verlag, pp. 234–250.
  • Rocha, R. L. A. & José Neto, J. (2000) “Adaptive automaton, limits and complexity compared to the Turing machine”, in Proceedings of the I LAPTEC, pp. 33–48.
  • Fukunaga, S., Takabatake, Y., I, T. & Sakamoto, H. (2016) “Online grammar compression for frequent pattern discovery”, CoRR, vol. abs/1607.04446.
  • Takabatake, Y., Tabei, Y., & Sakamoto, H. (2015) Online Self-Indexed Grammar Compression. Springer International Publishing, pp. 258–269.
  • Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A. & Shelat, A. (2005) “The smallest grammar problem.” IEEE Trans. Inf. Theory, vol. 51, no. 7, pp. 2554–2576.
  • Nevill-Manning, C. G. & Witten, I. H. (1997) “Identifying hierarchical structure in sequences: A linear-time algorithm”, J. Artif. Intell. Res. (JAIR) vol. 7, pp. 67–82.
  • Larsson, N. J. & Moffat, A. (1999) “Offline dictionary-based compression”, in Data Compression Conference, 1999. Proceedings. DCC ’99, pp. 296–305.
  • Rytter, W. (2002) Application of Factorization to the Approximation of Grammar-Based Compression. Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 20–31.
  • Ziv, J. & Lempel, A. (2006) “A universal algorithm for sequential data compression”, IEEE Trans. Inf. Theor., vol. 23, no. 3, pp. 337–343.
  • Sakamoto, H., (2005) “A fully linear-time approximation algorithm for grammar-based compression”, Journal of Discrete Algorithms, vol. 3, no. 2–4, pp. 416–430.
  • Jez, A. (2015) “Approximation of grammar-based compression via recompression”, Theoretical Computer Science, vol. 592, pp. 115–134.
  • Bille, P. & Gørtz, I. L. & Prezza, N. (2016) “Space-efficient re-pair compression,” CoRR, vol.abs/1611.01479.
  • Maruyama, S. & Tabei, Y. (2014) “Fully online grammar compression in constant space.” in DCC, Bilgin, A., Marcellin, M. W., Serra-Sagristà, J. & Storer, J. A. Eds. IEEE, pp. 173–182.
  • Cormode, G. & Muthukrishnan, S. (2007) “The string edit distance matching problem with moves”, ACM Transactions on Algorithms (TALG) vol. 3, no. 1, pp. 2:1–2:19.
  • Jose Neto, J. (1994) “Adaptive automata for context-sensitive languages”, SIGPLAN Notices, vol. 29, no. 9, pp. 115–124.
  • Jose Neto, J. & Iwai, M. K. (1998) “Adaptive automata for syntax learning” in Anais da XXIV Conferência Latinoamericana de Informática - CLEI 98, pp. 135–149.
  • Matsuno, I. P. (2006) “Um estudo dos processos de inferência de gramáticas regulares e livres de contexto baseados em modelos adaptativos”, Master’s Thesis, Escola Politécnica da Universidade de São Paulo, São Paulo, Brazil (in Portuguese).
  • Miura, N. K. & José Neto, J. (2017) “Adaptive automata for grammar based compression” in Proceedings of the 4th International Conference on Computer Science and Information Technology: CoSIT 2017, pp. 173–183
  • Cereda, P. R. M. & José Neto, J. (2015) “A recommendation engine based on adaptive automata”, in Proceedings of the 17th International Conference on Enterprise Information Systems - Volume 2: ICEIS, pp. 594–601.
  • Cereda, P. R. M. & Jose Neto, J. (2016) “AA4J: uma biblioteca para implementação de autômatos adaptativos” in Memorias do X Workshop de Tecnologia Adaptativa – WTA 2016, 2016, pp. 16–26 (in Portuguese).
  • Ferragina, P. & Gonzalez, R. & Navarro, G. & Venturini, R. (2009) “Compressed text indexes: From theory to practice”, Journal of Experimental Algorithmics (JEA), vol. 13, pp. 12–31.
  • Iwai, M. K. (2000) “Um formalismo gramatical adaptativo para linguagens dependentes de contexto”, PhD Thesis, Escola Politecnica da Universidade de São Paulo, Sao Paulo, Brazil (in Portuguese).

Abstract Views: 273

PDF Views: 147




  • Using Adaptive Automata in Grammar-Based Text Compression to Identify Frequent Substrings

Abstract Views: 273  |  PDF Views: 147

Authors

Newton Kiyotaka Miura
Escola Politecnica da Universidade de Sao Paulo, Sao Paulo, Brazil
Joao Jose Neto
Escola Politecnica da Universidade de Sao Paulo, Sao Paulo, Brazil

Abstract


Compression techniques allow reduction in the data storage space required by applications dealing with large amount of data by increasing the information entropy of its representation. This paper presents an adaptive rule-driven device - the adaptive automata - as the device to identify recurring sequences of symbols to be compressed in a grammar-based lossless data compression scheme.

Keywords


Adaptive Automata, Grammar Based Data Compression.

References