
A Survey of Different Approaches for Overcoming the Processor-Memory Bottleneck


Affiliations
1 Department of Computer Science and Engineering, Skopje, Macedonia, the former Yugoslav Republic of
 

Rapid technology improvements have driven dramatic advances in processor performance, significantly increasing processor clock frequencies and the number of instructions that can be processed in parallel. These advances have improved the performance of computer systems, but not for all types of applications. The reason lies in the well-known Von Neumann bottleneck, which arises in the communication between the processor and the main memory in a standard processor-centric system. Many researchers have examined this problem and proposed different approaches for improving memory bandwidth and latency. This paper provides a brief review of these techniques and an in-depth analysis of various memory-centric systems that implement different approaches to merging the memory with the processing elements or placing it near them. Within this analysis we discuss the advantages, disadvantages, and applications of several well-known memory-centric systems.
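The claim that faster processors do not help all applications equally can be illustrated with a rough Amdahl-style estimate. The sketch below is not taken from the paper: the function name, the two-part split of run time, and the sample fractions are assumptions chosen only for illustration. It assumes that a fixed fraction of run time is spent waiting on main memory and does not shrink when the processor gets faster.

```python
def speedup_from_faster_processor(mem_fraction: float, cpu_speedup: float) -> float:
    """Overall speedup when only the non-memory part of run time scales.

    mem_fraction: share of the original run time spent waiting on memory
                  (hypothetical value, assumed unaffected by the processor).
    cpu_speedup:  factor by which the processor-bound part gets faster.
    """
    if not 0.0 <= mem_fraction <= 1.0:
        raise ValueError("mem_fraction must be between 0 and 1")
    if cpu_speedup <= 0.0:
        raise ValueError("cpu_speedup must be positive")
    # The memory-bound part stays the same; the rest is divided by cpu_speedup.
    return 1.0 / (mem_fraction + (1.0 - mem_fraction) / cpu_speedup)


if __name__ == "__main__":
    # Compare a compute-heavy and a memory-heavy workload (illustrative fractions).
    for mem_fraction in (0.1, 0.6):
        s = speedup_from_faster_processor(mem_fraction, cpu_speedup=10.0)
        print(f"memory-bound share {mem_fraction:.0%}: overall speedup {s:.2f}x")
```

Under these assumptions, a workload that spends most of its time waiting on memory gains far less from a tenfold faster processor than a compute-heavy one, which is the kind of imbalance that motivates the memory-centric approaches surveyed here.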

Keywords

Memory Latency Reduction and Tolerance, Memory-Centric Computing, Processing in/Near Memory, Processor-Centric Computing, Smart Memories, Von Neumann Bottleneck.



Authors

Danijela Efnusheva
Department of Computer Science and Engineering, Skopje, Macedonia, the former Yugoslav Republic of
Ana Cholakoska
Department of Computer Science and Engineering, Skopje, Macedonia, the former Yugoslav Republic of
Aristotel Tentov
Department of Computer Science and Engineering, Skopje, Macedonia, the former Yugoslav Republic of


References