# Analysis of Offset-Canceled DRAM Sense Amplifier

Giwoo Lee<sup>1\*</sup>, Donghwan Kim<sup>1\*</sup> and SeongHwan Cho<sup>a,1</sup>

<sup>1</sup>Department of Electrical Engineering, Korea Advanced Institute of Science and Technology. E-mail: <sup>1</sup>dget235@kaist.ac.kr. \*Equal Contribution Authorship

Abstract - This paper analyzes a DRAM sense amplifier (SA) architecture employing an offset cancellation technique using parasitic bit-line capacitance. The offset is stored through a diode-connected configuration without requiring additional calibration circuits, enabling a compact design. A prototype was fabricated in a TSMC 65 nm CMOS process and operated at 400 MHz. A built-in self-test (BIST) evaluates sensing accuracy by performing repeated write-and-read operations across 64 SAs. Measurements show that the standard deviation of input-referred offset decreases with longer offset cancellation (OC) time and saturates near 5 ns, indicating an optimal tradeoff between performance and power. The main sensing (MS) duration has minimal effect on offset characteristics, confirming that a 5 ns MS period ensures reliable operation. The architecture achieves effective offset cancellation with minimal overhead, making it well suited for scaled DRAM applications.

Keywords— DRAM, Sense Amplifier, Offset Cancellation Technique

#### I. INTRODUCTION

As DRAM technology advances toward higher bit densities, aggressive process scaling is essential. This scaling not only reduces the physical dimensions of the cell capacitor but also shrinks the footprint of peripheral circuits such as the sense amplifier (SA) [1]. However, these benefits come with significant challenges. The downscaling of the cell capacitor leads to a substantial decrease in cell capacitance, thereby tightening the sensing margin available to reliably distinguish between logical states, as shown in Fig. 1. At the same time, the scaling of the SA imposes stricter constraints on its design, particularly in terms of device matching [2]. More specifically, as the area of individual transistors continues to shrink, they become increasingly susceptible to process-induced variations, which can manifest as threshold voltage mismatches within the SA, as shown in Fig. 2. These mismatches degrade the sensing accuracy and, in severe cases, may lead to sensing failures due to incorrect bit decisions. To address this, offset-cancellation techniques have been introduced to restore sensing reliability [3]-[7].

a. Corresponding author; chosta@kaist.ac.kr

Manuscript Received Jul. 14, 2025, Revised Sep. 3, 2025, Accepted Sep. 5, 2025

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (<a href="http://creativecommons.org/licenses/by-nc/4.0">http://creativecommons.org/licenses/by-nc/4.0</a>) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.



Figure 1. Charge sharing and its magnitude with cell capacitance

Figure 2. Challenges of scaling in sense amplifier (panel title: Trade-Off Between Mismatch and Area)

By actively compensating for internal offsets, these techniques improve both sensing precision and yield in deeply scaled DRAM technologies. In this work, we focus on one such architecture presented in [7], which integrates an offset-cancellation mechanism within the SA. We present the full design implementation of this structure in silicon, followed by detailed measurement and analysis to evaluate its effectiveness in mitigating process-induced mismatches and maintaining robust sensing performance under scaled conditions. The remainder of this paper is organized as follows: Section I provides the background and motivation for this work. Section II describes the design methodology and implementation details. Section III presents the measurement results and analysis. Finally, Section IV concludes the paper with a summary of key findings.

#### II. DESIGN METHODOLOGY

#### A. Basic Concept of Offset-Cancellation Technique

In scaled DRAM technologies, the sense amplifier (SA) becomes increasingly vulnerable to input-referred offset due to device mismatch, which arises primarily from random variations in threshold voltage ($\Delta V_{TH}$), mobility, and gate dimensions of constituent transistors. These mismatches translate into imbalanced current drive between the pull-up and pull-down paths of the differential SA, thereby shifting the effective trip point and inducing a static offset voltage ($V_{OS}$). Given that the SA operates as a dynamic latched comparator, such offset voltage directly degrades sensing accuracy, especially when the signal amplitude is reduced due to shrinking cell capacitance.

To mitigate this issue, offset-cancellation techniques are employed to suppress the input-referred offset prior to final sensing. A common approach involves precharging the SA input nodes to a known reference level and storing the internal offset as a differential voltage across a capacitor. This stored offset is then subtracted, effectively centering the SA's decision threshold. Such offset-cancellation techniques are conventionally implemented by inserting a series capacitor between the SA input and its sensing path, allowing the internal offset to be stored as a voltage drop across the capacitor and subsequently subtracted. However, in DRAM applications, parasitic capacitors connected in parallel to the SA inputs can be utilized to store the offset. While offset information stored on a parallel capacitor can be overwritten by the input signal in conventional circuits, DRAM-specific charge-sharing mechanisms inherently allow this approach to function reliably. Since the bit-line signal is transferred through charge redistribution rather than direct driving, the stored offset on the parallel capacitor remains largely preserved, enabling effective cancellation even without series insertion.
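The charge-redistribution argument above can be illustrated numerically. The following is a minimal sketch assuming an ideal lumped model with a single cell capacitance and a single bit-line parasitic capacitance; the names `c_bl` and `c_cell` and all numeric values are illustrative assumptions, not parameters of the prototype.

```python
def charge_share(v_bl: float, v_cell: float, c_bl: float, c_cell: float) -> float:
    """Bit-line voltage after an ideal cell capacitor is connected to the bit line."""
    return (c_bl * v_bl + c_cell * v_cell) / (c_bl + c_cell)


# Hypothetical values, chosen only for illustration.
C_BL = 40e-15     # bit-line parasitic capacitance (F)
C_CELL = 10e-15   # cell capacitance (F)
V_P = 0.5         # precharge level (V)
V_CELL = 1.0      # stored cell voltage for a logic '1' (V)
V_OS = 0.02       # offset stored on the bit line during cancellation (V)

# Signal developed on a bit line that carries no stored offset.
signal = charge_share(V_P, V_CELL, C_BL, C_CELL) - V_P

# Fraction of the stored offset that survives charge sharing.
with_offset = charge_share(V_P + V_OS, V_CELL, C_BL, C_CELL)
without_offset = charge_share(V_P, V_CELL, C_BL, C_CELL)
retained = (with_offset - without_offset) / V_OS

print(f"signal = {signal * 1e3:.1f} mV")
print(f"retained offset fraction = {retained:.2f}")
```

In this simple model the offset stored on the shared bit line is scaled by `c_bl / (c_bl + c_cell)`, which approaches 1 when the bit-line capacitance dominates the cell capacitance.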

In actual circuits, the cancellation operation is performed using a diode-connected configuration rather than a conventional unity-gain feedback amplifier. In this approach, each NMOS input transistor is configured in a diode-connected manner, allowing its threshold voltage $V_{TH}$ to be adaptively sampled onto the capacitor during the offset storage phase. This method inherently captures the device-specific mismatch without requiring external references. However, the settling behavior during offset cancellation is governed by the RC time constant of the circuit, introducing a trade-off between accuracy and speed. This trade-off highlights the importance of carefully designed offset-cancellation schemes in achieving reliable sensing performance under process variation.
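The accuracy versus speed trade-off can be sketched with a first-order settling model. The code below assumes a single time constant `tau` and purely exponential settling, which is a simplification because a diode-connected transistor is nonlinear; `tau`, `v_os0`, and the numeric values are hypothetical and are not taken from the measurements.

```python
import math


def residual_offset(v_os0: float, t_oc: float, tau: float) -> float:
    """Uncancelled offset after an OC phase of length t_oc (first-order model)."""
    return v_os0 * math.exp(-t_oc / tau)


def oc_time_for(fraction: float, tau: float) -> float:
    """OC time needed for the residual to fall to `fraction` of its initial value."""
    return -tau * math.log(fraction)


TAU = 1.5e-9    # hypothetical settling time constant (s)
V_OS0 = 30e-3   # hypothetical initial offset (V)

for t_ns in (1, 2, 5, 10):
    v = residual_offset(V_OS0, t_ns * 1e-9, TAU)
    print(f"t_OC = {t_ns:2d} ns -> residual {v * 1e3:.2f} mV")
```

Under this model each additional time constant reduces the residual by the same factor, so the absolute improvement per nanosecond shrinks as the OC phase is lengthened.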

Despite this trade-off, offset-cancellation techniques play a critical role in modern DRAMs by enhancing sensing robustness and improving yield in the presence of process-induced mismatches. Their ability to accommodate shrinking cell capacitance and tighter noise margins makes them indispensable in high-density memory design.



Figure 3. Schematic and timing diagram of target SA

#### B. Structure of Sense Amplifier and its Timing Diagram

Fig. 3 shows the sense amplifier (SA) circuit with integrated offset-cancellation capability as proposed in [7] and the corresponding timing diagram. One of its key advantages is that it enables effective offset cancellation and sensing operation with a minimal number of transistors, making it area-efficient and well-suited for dense DRAM arrays. This architecture exploits parasitic capacitance at the bit-lines (BLT and BLB) to store offset information.

The overall SA operation proceeds through four distinct phases, as illustrated in the timing diagram of Fig. 3: precharge (PCG), offset cancellation (OC), charge sharing (CS), and main sensing (MS). During the PCG phase, M3 and M4 are turned on by asserting S<sub>PCG</sub>, so the internal nodes and BLs are precharged to V<sub>P</sub>. In the subsequent OC phase, the core NMOS transistors, M1 and M2, are diode-connected through M5 and M6, which are enabled by S<sub>OC</sub>. This allows the input-referred offset to be stored onto the parasitic capacitors at the BL nodes. After offset cancellation is complete, the wordline (WL) is activated during the CS phase through the control signal S<sub>WLT/WLB</sub>, allowing the charge stored in the memory cell to be shared with the charge on the BL parasitic capacitance.

This charge redistribution creates a small differential voltage between BLT and BLB, which reflects the stored data. In the subsequent MS phase, the SA is enabled by activating the control signals LA and LAB, which drive the SA operation and amplify the small input differential into a full-swing digital output. Through this sequence, the circuit achieves precise and reliable data sensing while minimizing area and design complexity.
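The four-phase sequence described above can be summarized as a small lookup structure. This is only an illustrative sketch: the phase order and signal names follow the text, while the `Phase` type and the `cycles` helper are hypothetical, and whether the controller actually counts phases in whole clock periods is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    signals: tuple[str, ...]   # control signals asserted in this phase
    action: str


# Order and signal names follow the description of Fig. 3.
SEQUENCE = (
    Phase("PCG", ("S_PCG",), "precharge internal nodes and bit lines to V_P"),
    Phase("OC", ("S_OC",), "diode-connect M1/M2 and store offset on bit lines"),
    Phase("CS", ("S_WLT/WLB",), "activate word line and share cell charge"),
    Phase("MS", ("LA", "LAB"), "enable latch and amplify to full swing"),
)


def cycles(duration_ns: float, f_clk_mhz: float = 400.0) -> float:
    """Number of clock periods spanned by a phase duration."""
    return duration_ns * f_clk_mhz / 1e3


for phase in SEQUENCE:
    print(f"{phase.name:>3}: {', '.join(phase.signals):<10} {phase.action}")
print(f"5 ns at 400 MHz = {cycles(5.0):g} clock periods")
```

At the reported 400 MHz clock, a 5 ns phase spans two clock periods.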

#### C. Overall Architecture

To validate the functionality and effectiveness of the SA with offset cancellation, a full DRAM prototype system is implemented, as illustrated in Fig. 4. The system is composed of a 1T1C cell array, a SA array, peripheral control logic, and a built-in self-test (BIST) structure for evaluation. Each component is co-designed to ensure full integration of the SA operation within a realistic DRAM environment.

The 1T1C cell array serves as the core memory array, where each memory bit is formed by a single transistor and a capacitor. Directly beneath the cell array, the SA array contains 64 instances of the offset-canceled SA. Each SA is connected to a pair of BLs from the cell array.

Peripheral timing and control signals are generated through a clock generation and driver block, which includes a row decoder and clock drivers. The row decoder activates the selected WL, while the clock driver distributes the required control signals across the SA array in synchronization with the global clock input $CK_{\rm IN}$. A dedicated controller orchestrates the entire access sequence, managing row activation, SA timing, and data validation steps.

At the bottom of the architecture, a column decoder and a BIST block are implemented to enable automated functionality and yield characterization. The BIST block performs repeated write-and-read operations across all 64 SAs. Specifically, 64-bit data is written to the memory array and subsequently read back from the same locations to verify correct sensing. The BIST logic compares the read data with the expected values and accumulates the number of correctly sensed bits. This enables statistical yield analysis across multiple test iterations, allowing for robust evaluation of offset cancellation performance under realistic operating conditions.
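The write, read, compare, and accumulate loop performed by the BIST can be sketched in software. This is a behavioral illustration only: `bist_run` and `noisy_array` are hypothetical names, the random bit-flip model stands in for the real cell array and SAs, and keeping one counter per SA (rather than a single total) is an assumption.

```python
import random


def bist_run(write_read, pattern: list[int], iterations: int) -> list[int]:
    """Count, per SA, how many reads returned the written bit."""
    correct = [0] * len(pattern)
    for _ in range(iterations):
        readback = write_read(pattern)
        for i, (expected, got) in enumerate(zip(pattern, readback)):
            correct[i] += int(expected == got)
    return correct


def noisy_array(error_rate: float, rng: random.Random):
    """Hypothetical stand-in for the cell array and SAs: flips bits at random."""
    def write_read(pattern: list[int]) -> list[int]:
        return [b ^ int(rng.random() < error_rate) for b in pattern]
    return write_read


rng = random.Random(0)
pattern = [i % 2 for i in range(64)]          # 64-bit test word, one bit per SA
counts = bist_run(noisy_array(0.05, rng), pattern, iterations=1000)
print(f"overall pass rate = {sum(counts) / (64 * 1000):.3f}")
```

Dividing each counter by the number of iterations gives a per-SA sensing probability, which is the quantity needed for the statistical analysis described above.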

This prototype architecture enables comprehensive testing of the SA under array-level constraints and allows quantitative assessment of sensing robustness and yield in the presence of circuit-level variations.

#### III. RESULTS AND DISCUSSIONS

The performance of the SA with offset cancellation was evaluated through silicon measurement, with a focus on its input-referred offset characteristics under varying operational conditions. Three key sets of results are presented in Figs. 5-7.

Fig. 5 illustrates the sensing probability as a function of the input cell voltage for various offset cancellation (OC) durations. The measurement was conducted with a fixed main sensing (MS) time of 5 ns. As the input voltage increases, the SA's likelihood of resolving to logic '1' also increases, forming a sigmoid-like transition curve. The steepness of this transition reflects the spread of the input-referred offset of the SA: a steeper curve corresponds to a smaller standard deviation, which can be estimated from the measurement results.

Fig. 6 presents the extracted standard deviation of the input-referred offset as a function of the OC period. As the OC duration increases, the standard deviation of the input-referred offset initially decreases due to more accurate storage of the internal mismatch on the cancellation capacitor. However, this improvement saturates near 5 ns, and extending the OC period beyond this point results in negligible additional benefit. Since the OC phase involves additional power consumption from static current, a longer OC duration leads to higher energy overhead without meaningful performance improvement. Therefore, a 5 ns OC duration provides an effective trade-off between offset reduction and power efficiency, making it a suitable design point.

Fig. 7 shows the standard deviation of the input-referred offset under varying MS durations, while keeping the OC period fixed at 5 ns. The results indicate that the standard deviation remains largely unaffected by the duration of the MS phase. This suggests that, once the SA is enabled, the sensing decision is determined within the early portion of the MS phase, and extending the cycle does not contribute to further offset averaging or error suppression. Therefore, a 5 ns MS duration is sufficient for full sensing resolution without compromising accuracy or offset characteristics.

The prototype was fabricated using a TSMC 65 nm CMOS process and operates at a clock frequency of 400 MHz. The SA consumes 56.8 fJ/cycle. A die photograph of the fabricated chip is shown in Fig. 8.

Figure 4. Overall architecture of DRAM prototype

Figure 5. Number of 1's over cell voltage

Figure 6. Standard deviation of SA offset over OC period

Figure 7. Standard deviation of SA offset over MS period

Figure 8. Die photograph

In summary, the measurements confirm that the SA architecture achieves robust offset cancellation within a short OC phase, and that its sensing performance remains stable across a range of MS durations, indicating that a 5 ns MS period is sufficient for reliable operation. These results validate the design's suitability for area-constrained DRAM systems.
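The estimation of offset standard deviation from a sensing-probability curve such as Fig. 5 can be sketched as follows. The paper does not state the extraction method, so this is one possible approach under stated assumptions: the offset is taken to be Gaussian, the fit is an unweighted least-squares line on probit-transformed probabilities, and the data are synthetic. The function name `fit_offset` and all numeric values are hypothetical.

```python
from statistics import NormalDist


def fit_offset(voltages, prob_one):
    """Estimate (mean, sigma) of the offset from P('1') versus input voltage.

    Uses a probit transform: if P = Phi((v - mu) / sigma), then
    Phi^-1(P) is a straight line in v with slope 1/sigma.
    """
    std = NormalDist()
    # Points at exactly 0 or 1 carry no slope information and are skipped.
    pts = [(v, std.inv_cdf(p)) for v, p in zip(voltages, prob_one) if 0.0 < p < 1.0]
    n = len(pts)
    mx = sum(v for v, _ in pts) / n
    mz = sum(z for _, z in pts) / n
    sxz = sum((v - mx) * (z - mz) for v, z in pts)
    sxx = sum((v - mx) ** 2 for v, _ in pts)
    sigma = sxx / sxz            # reciprocal of the fitted slope
    mu = mx - mz * sigma         # voltage at which P('1') = 0.5
    return mu, sigma


# Synthetic curve from a hypothetical offset distribution (not measured data).
true_dist = NormalDist(mu=0.0, sigma=8e-3)
voltages = [i * 2e-3 for i in range(-10, 11)]
prob_one = [true_dist.cdf(v) for v in voltages]

mu_hat, sigma_hat = fit_offset(voltages, prob_one)
print(f"mu = {mu_hat * 1e3:.3f} mV, sigma = {sigma_hat * 1e3:.3f} mV")
```

With measured counts, each probability would be the number of 1's divided by the number of trials at that voltage, and a weighted fit would be more appropriate near the tails where few events are observed.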

#### IV. CONCLUSION

This work analyzes a DRAM SA architecture incorporating an offset cancellation technique using parasitic bit-line capacitance. The design achieves accurate offset cancellation, enabling robust sensing performance with minimal circuit overhead. A full prototype was fabricated in a TSMC 65 nm CMOS process, and silicon measurements demonstrated that the input-referred offset decreases with increasing OC duration and saturates near 5 ns. A 5 ns OC period provides a favorable trade-off between offset reduction and energy efficiency. Furthermore, the sensing accuracy remains consistent across different MS durations, indicating that a 5 ns MS period is sufficient for reliable operation. The architecture thus offers a compact and scalable solution suitable for future high-density DRAM applications.

#### ACKNOWLEDGMENT

The chip fabrication and EDA tool were supported by the IC Design Education Center (IDEC), Korea. This work was supported by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2022-0-01172, DRAM PIM Design Base Technology Development).

#### REFERENCES

- [1] S. K. Park, "Technology scaling challenge and future prospects of DRAM and NAND flash memory." *2015 IEEE International Memory Workshop (IMW)*. IEEE, 2015.
- [2] J. Vollrath, "Signal margin analysis for DRAM sense amplifiers." *Proceedings First IEEE International Workshop on Electronic Design, Test and Applications '2002*. IEEE, 2002.
- [3] J. M. Yoon, et al., "A capacitor-coupled offset-canceled sense amplifier for DRAMs with reduced variation of decision threshold voltage." *IEEE Journal of Solid-State Circuits* 55.8 (2020): 2219-2227.
- [4] S. Hong, et al., "An offset cancellation bit-line sensing scheme for low-voltage DRAM applications." *2002 IEEE International Solid-State Circuits Conference. Digest of Technical Papers (Cat. No. 02CH37315)*. Vol. 1. IEEE, 2002.
- [5] P. Huang, et al., "Offset-compensation high-performance sense amplifier for low-voltage DRAM based on current mirror and switching point." *IEEE Transactions on Circuits and Systems II: Express Briefs* 69.4 (2022): 2011-2015.
- [6] K. Nam, et al., "An offset-compensated charge-transfer pre-sensing bit-line sense-amplifier for low-voltage DRAM." *2024 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits)*. IEEE, 2024.
- [7] Y. Seo, "Sensor amplifier, memory device comprising same, and related method of operation." U.S. Patent No. 9,202,531. 1 Dec. 2015.



Giwoo Lee received the B.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2021. He is currently pursuing the integrated master's and doctoral degree in electrical engineering at KAIST. His research interests include memory IC and PIM for low-power applications.



Donghwan Kim received the B.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2019. He is currently pursuing the integrated master's and doctoral degree in electrical engineering at KAIST. His research interests include memory IC and PIM for area-efficient applications.



SeongHwan Cho received the B.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 1995, and the M.S. and Ph.D. degrees in EECS from MIT, Cambridge, MA, USA, in 1997 and 2002, respectively.

In 2002, he joined Engim, Inc., Acton, MA, USA, where he was involved in data converters and phase-locked loop (PLL) design for IEEE 802.11abg WLANs. Since 2004, he has been with the School of EE, KAIST, where he is currently a Professor and the Department Head of semiconductor system engineering. He was with Marvell Inc., Santa Clara, CA, USA, from 2011 to 2012, and Google, London, U.K., from 2016 to 2017, as a Research Scientist. His research interests include analog and mixed-signal circuits for high-speed communication, low-power sensors, memory, and machine learning.

Prof. Cho was a co-recipient of the 2009 IEEE Circuits and Systems Society Guillemin-Cauer Best Paper Award and the 2012 ISSCC Takuo Sugano Award for Outstanding Far-East Paper. He has twice received the Outstanding Lecturer Award from KAIST. He has served on the Technical Program Committees of several IEEE conferences, including ISSCC, Symposium on VLSI and A-SSCC. He has served as an Associate Editor for IEEE JOURNAL OF SOLID-STATE CIRCUITS and IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, and as a Distinguished Lecturer of the IEEE Solid-State Circuits Society.