Technical Program - all times are in Eastern Time (ET)
Sunday, October 18, 2020
09:00-12:30 Tutorial 1: Developing HPC and ML Accelerators using Xilinx FPGAs
Session Chairs: Parimal Patel, Xilinx

This tutorial will introduce the Xilinx Vitis development environment for developing FPGA accelerators for HPC applications. Vitis supports OpenCL, C, and C++. RTL design flows are also supported for experienced hardware developers. Each of these flows will be discussed along with the open-source Xilinx Runtime Library and the Vitis open-source accelerated libraries. The latest available cloud and local hardware will be covered, including AWS-F1, Nimbix, and the range of Alveo accelerator boards. Topics to be covered: 1) Xilinx Vitis development framework, design flows, and use cases, 2) AWS, Nimbix, and Alveo boards for FPGA acceleration, 3) Demonstration and hands-on experience.
09:00-10:00 Special Session 1: Reliable Quantum Computing using Noisy Intermediate-Scale Quantum Systems
Session Chairs: Himanshu Thapliyal
334 (S) Daniel Volya and Prabhat Mishra. Impact of Noise on Quantum Algorithms in Noisy Intermediate-Scale Quantum Systems
332 (S) Himanshu Thapliyal, Edgard Munoz-Coreas and Vladislav Khalus. Quantum Carry Lookahead Adders for NISQ and Quantum Image Processing
335 (S) Susmita Sur-Kolay and Ritajit Majumdar. Quantum Error Correction in Near Term Systems
333 (S) Christopher Wood. Noise Characterization and Error Mitigation in Near-term Quantum Computers
10:15-11:15 Special Session 2: Revisiting Adiabatic Circuits in the Era of Energy-Efficiency and Security
Session Chairs: Emre Salman
338 (S) Krithika Dhananjay and Emre Salman. Adiabatic Circuits for Energy-Efficient and Secure IoT Systems
337 (S) Michael P. Frank, Robert W. Brocato, Thomas M. Conte, Alexander H. Hsia, Anirudh Jain, Nancy A. Missert, Karpur Shukla and Brian D. Tierney. Exploring the Ultimate Limits of Adiabatic Circuits
331 (S) Himanshu Thapliyal and S. Dinesh Kumar. A Novel Low-Power and Energy-Efficient Adiabatic Logic-In-Memory Architecture Using CMOS/MTJ
330 (S) Kohei Ogura and Yasuhiro Takahashi. An Adiabatic Logic Based Silicon Physical Unclonable Function
12:45-02:00 Tutorial 2: XTA: Open source eXtensible, scalable and adaptable Tensor Architecture for AI acceleration
Session Chairs: Sabya Das, Synopsys

Accelerator frameworks have gained prominence since the advent of AI applications. The limitation of current open-source accelerator solutions is that they were not designed to be scalable and adaptable for commercial MPSoC products that have different network requirements and higher performance goals. We have implemented a new AI accelerator framework, XTA, derived from TVM-VTA, which is a popular and the first known open-source backend AI accelerator for Xilinx MPSoC. XTA is scalable and adaptable to various network types and workloads of AI applications. XTA is a multi-core architecture that can dynamically scale and adapt to a given AI problem at both the hardware and software layers. XTA also supports parallel, pipelined processing and autotuning of subgraphs in an MPSoC environment.
11:30-12:45 Special Session 3: Secure and Trustworthy Bottom-Up Design Methods for Industrial Internet-of-Things in Cyber-Physical Systems
Session Chairs: Charalambos Konstantinou
344 (S) Dimitrios Tychalas and Michail Maniatakos. Potentially Leaky Controller: Examining Cache Side-Channel Attacks in Programmable Logic Controllers
343 (S) Solon Falas, Charalambos Konstantinou and Maria K. Michael. Physics-Informed Neural Networks for Securing Water Distribution Cyber-Physical Systems
339 (S) Feng Yu, Yaodan Hu, Teng Zhang and Yier Jin. Resilient Distributed Estimator with Information Consensus for CPS Security
341 (S) Anomadarshi Barua and Mohammad Abdullah Al Faruque. Noninvasive Sensor-Spoofing Attacks on Embedded and Cyber-Physical Systems
342 (S) Ioannis Zografopoulos, Juan Ospina and Charalambos Konstantinou. Harness the Power of DERs for Secure Communications in Electric Energy Systems
Monday, October 19, 2020
09:00-09:10 Welcome
09:10-10:40 Best Papers Session
Session Chair: Resit Sendag
248 (R) Jeongjun Lee and Peng Li. Reconfigurable Dataflow Optimization for Spatiotemporal Spiking Neural Computation on Systolic Array Accelerators.
179 (R) Bing Wu, Mengye Peng, Dan Feng and Wei Tong. DualFS: A Coordinative Flash File System with Flash Block Dual-mode Switching.
23 (R) Rashmi Agrawal, Lake Bu and Michel Kinsy. Quantum-Proof Lightweight McEliece Cryptosystem Co-processor Design.
209 (R) Jinwoo Kim, Venkata Chaitanya Krishna Chekuri, Nael Mizanur Rahman, Majid Ahadi Dolatsara, Hakki Torun, Madhavan Swaminathan, Saibal Mukhopadhyay and Sung Kyu Lim. Silicon vs. Organic Interposer: PPA and Reliability Tradeoffs in Heterogeneous 2.5D Chiplet Integration.
12 (R) Hengyu Zhao, Yubo Zhang, Pingfan Meng, Hui Shi, Li Erran Li, Tiancheng Lou and Jishen Zhao. Driving Scenario Perception-Aware Computing System Design in Autonomous Vehicles.
60 (R) Anuradha Ranasinghe and Sabih Gerez. MEPNTC: A Standard Cell Library Design Scheme, Extending the Minimum Energy Point Operation of Near-Threshold Computing.
96 (R) Heming Zeng, Chi Zhang, Chentao Wu, Gen Yang, Jie Li, Guangtao Xue and Minyi Guo. FAGR: An Efficient File-aware Graph Recovery Scheme for Erasure Coded Cloud Storage Systems.
10:50-11:40 Keynote: Vivienne Sze (Massachusetts Institute of Technology (MIT)), How to Evaluate Efficient Deep Neural Network Approaches
Session Chair: Maciej Ciesielski
11:50-12:40 1A: Novel Architectures
Session Chair: Callie Hao, Georgia Tech
43 (R) Ali Ebrahim and Jalal Khlaifat. An Efficient Hardware Architecture for Finding Frequent Items in Data Streams
94 (R) Ivan Fernandez, Ricardo Quislant, Christina Giannoula, Mohammed Alser, Juan Gómez-Luna, Eladio Gutiérrez, Oscar Plata and Onur Mutlu. NATSA: A Near-Data Processing Accelerator for Time Series Analysis
14 (R) Bahar Asgari, Ramyad Hadidi and Hyesoon Kim. MEISSA: Multiplying Matrices Efficiently in a Scalable Systolic Architecture
283 (R) Yao Sun, Yingxun Fu and Tao Li. QuPAA: Exploiting Parallel and Adaptive Architecture to Scale up Quantum Computing
11:50-12:40 1B: Storage Systems
Session Chair: Bingzhe Li, Oklahoma State University
115 (R) Yaobin Qin, Xianbo Zhang and David Lilja. PBCCF: Accelerated Deduplication by Prefetching Backup Content Correlated Fingerprints
39 (R) Suzhen Wu, Jindong Zhou, Weidong Zhu, Hong Jiang, Zhijie Huang, Zhirong Shen and Bo Mao. EAD: a Collision-free and High Performance ECC assisted Deduplication Scheme for Flash Storage
154 (S) Chunhua Xiao, Zipei Feng, Weichen Liu, Ting Wu, Lin Zhang and Dandan. COSMA: An efficient Concurrency-Oriented Space Management Scheme for In-memory File Systems.
257 (R) Tianqi Zhan, Dan Feng, Xianpeng Wang and Wei Tong. AetEC: Adaptive error-tolerant Erasure Coding Scheme Within SSDs
118 (S) Hui Chen, Yina Lv, Changlong Li, Shouzhen Gu and Liang Shi. An Empirical Study of Hybrid SSD with Optane and QLC Flash
Tuesday, October 20, 2020
09:00-10:05 2A: Reliability and Fault Tolerance
Session Chair: Ilia Polian (University of Stuttgart, Germany)
35 (R) Mehran Goli, Alireza Mahzoon and Rolf Drechsler. ASCHyRO: Automatic Fault Localization of SystemC HLS Designs Using a Hybrid Accurate Rank Ordering Technique
86 (R) Leonid Yavits, Lois Orosa, Suyash Mahar, João Dinis Ferreira, Ran Ginosar, Onur Mutlu and Mattan Erez. WoLFRaM: Enhancing Wear-Leveling and Fault Tolerance in Resistive Memories Using Programmable Address Decoders
4 (R) Xiaoming Du and Cong Li. DPCLS: Improving Partial Cache Line Sparing with Dynamics for Memory Error Prevention
99 (R) Romain Mercier, Cédric Killian, Angeliki Kritikakou, Youri Helen and Daniel Chillet. Multiple Permanent Faults Mitigation through Bit-Shuffling for Network-on-Chip Architecture
307 (R) Haoqiang Guo, Lu Peng, Jian Zhang, Qing Chen and Travis D LeCompte. ATT: A Fault-Tolerant ReRAM Accelerator for Attention-based Neural Networks
09:00-10:05 2B: Memory and Cache Optimizations
Session Chair: Mohamed Zahran, NYU
75 (R) Xiaoyang Lu, Rujia Wang and Xian-He Sun. APAC: An Accurate and Adaptive Prefetch Framework with Concurrent Memory Access Analysis
103 (R) Abhijit Das, Abhishek Kumar and John Jose. Reducing Off-Chip Miss Penalty by Exploiting Underutilised On-Chip Router Buffers
133 (R) Joe Augustine, Raghavendra Kanakagiri, John Jose and Madhu Mutyam. Router Buffer Caching for Managing Shared Cache Blocks in Tiled Multi-Core Processors
142 (R) Kyle Kuan and Tosiron Adegbija. A Study of Runtime Adaptive Prefetching for STTRAM L1 Caches
165 (R) Zhulin Ma, Yujuan Tan, Hong Jiang, Zhichao Yan, Duo Liu, Xianzhang Chen, Qingfeng Zhuge and Edwin Hsing-Mean Sha. Unified-TP: A Unified TLB and Page Table Cache Structure for Efficient Address Translation
10:15-11:10 3A: Neural Networks on Edge Systems
Session Chair: Yu Bi, University of Rhode Island
62 (R) Amir Erfan Eshratifar and Massoud Pedram. Runtime Deep Model Multiplexing for Reduced Latency and Energy Consumption Inference
132 (R) Md Jubaer Hossain Pantho, Pankaj Bhowmik and Christophe Bobda. Near-Sensor Inference Architecture with Region Aware Processing
159 (R) Jiangsu Du, Minghua Shen and Yunfei Du. A Distributed In-Situ CNN Inference System for IoT Applications
292 (R) Xiangzhong Luo, Di Liu, Hao Kong and Weichen Liu. EdgeNAS: Discovering Efficient Neural Architectures for Edge Systems
10:15-11:15 3B: Design Automation
Session Chair: Sabya Das, Synopsys
53 (R) Wentian Jin, Sheriff Sadiqbatcha, Zeyu Sun, Han Zhou and Sheldon Tan. EM-GAN: Fast Stress Analysis for Multi-Segment Interconnect Using Generative Adversarial Networks
158 (R) Xu He, Yipei Wang, Zhiyong Fu, Yao Wang and Yang Guo. Maximum Clique Based Method for Optimal Solution of Pattern Classification
185 (S) Vladimir Herdt, Daniel Grosse, Sören Tempel and Rolf Drechsler. Adaptive Simulation with Virtual Prototypes for RISC-V: Switching Between Fast and Accurate at Runtime
315 (R) Marcos T. Leipnitz and Gabriel Nazar. Throughput-Oriented Spatio-Temporal Optimization in Approximate High-Level Synthesis
324 (R) Zhuolun He, Yuzhe Ma, Lu Zhang, Peiyu Liao, Ngai Wong, Bei Yu and Martin D. F. Wong. Learn to Floorplan through Acquisition of Effective Local Search Heuristics
11:20-12:00 4A: Stochastic and Approximate Computing
Session Chair: Hassan Najafi, U of Louisiana
9 (S) Abdulqader Mahmoud, Frederic Vanderveken, Christoph Adelmann, Florin Ciubotaru, Said Hamdioui and Sorin Cotofana. 2-input 4-output Programmable Spin Wave Logic
72 (S) Rohit Sreekumar, Prattay Chowdhury and Benjamin Carrion Schafer. Bespoke Approximate Behavioral Processors
145 (R) Ponnanna Kelettira Muthappa, Florian Neugebauer, Ilia Polian and John Hayes. Hardware-based Fast Real-time Image Classification with Stochastic Computing
201 (R) Kunal Bharathi, Jiang Hu and Sunil P. Khatri. Scaled Population Subtraction for Approximate Computing
11:25-12:05 4B: Low Power and Energy-Efficient Computing
Session Chair: Lu Peng, Louisiana State University
258 (R) Joonas Multanen, Kari Hepola and Pekka Jääskeläinen. Programmable Dictionary Code Compression for Instruction Stream Energy Efficiency
190 (S) Giovanni Bambini, Robert Balas, Christian Conficoni, Andrea Tilli, Luca Benini, Simone Benatti and Andrea Bartolini. An Open-Source Scalable Thermal and Power Controller for HPC Processors
234 (R) Sandeep Krishna Thirumala, Arnab Raha, Vijay Raghunathan and Sumeet Kumar Gupta. IPS-CiM: Enhancing Energy Efficiency of Intermittently Powered Systems with Compute-in-Memory
319 (S) Ki-Dong Kang, Hyungwon Park, Gyeongseo Park and Daehoon Kim. Improving the Efficiency of Power Management via Dynamic Interrupt Management
12:10-12:50 5A: Test and Verification
Session Chair: Rathish Jayabharathi, Intel
70 (R) Cheng Tan, Chenhao Xie, Ang Li, Kevin Barker and Antonino Tumeo. OpenCGRA: An Open-Source Framework for Modeling, Testing, Evaluating CGRAs
110 (S) Jin Wu, Jian Dong, Ruili Fang, Wenwen Wang and Decheng Zuo. PerfDBT: Efficient Performance Regression Testing of Dynamic Binary Translation
113 (S) Rahul Krishnamurthy and Michael Hsiao. Transforming Natural Language Specifications to Logical Forms for Hardware Verification
123 (R) Yongjian Li, Taifeng Cao, David N. Jansen, Jun Pang and Xiaotao Wei. Accelerated Verification of Parametric Protocols with Decision Trees
12:15-12:50 5B: Hybrid Memory Systems
Session Chair: Alaa Alameldeen (Simon Fraser U)
63 (S) Wenpeng He, Fang Wang and Dan Feng. H²ORAM: Low Response Latency Optimized ORAM for Hybrid Memory Systems
131 (R) Rui Xu, Edwin Hsing-Mean Sha, Qingfeng Zhuge, Shouzhen Gu and Liang Shi. Optimizing Data Placement for Hybrid SPM with SRAM and Racetrack Memory
130 (S) Chao-Hsuan Huang and Ishan Thakkar. Mitigating the Latency-Area Tradeoffs for DRAM Design with Coarse-Grained Monolithic 3D (M3D) Integration
156 (S) Beomjun Kim, Prashant Nair and Seokin Hong. ADAM: Adaptive Block Placement with Metadata Embedding for Hybrid Caches
Wednesday, October 21, 2020
09:00-10:00 6A: Logic and Circuit Design
Session Chair: Hamed Tabkhivayghan, UNCC
49 (R) Anthony Agnesina, Da Eun Shim, James Yamaguchi, Christian Krutzik, John Carson, Daniel Nakamura and Sung Kyu Lim. A Fault-Tolerant and High-Speed Memory Controller Targeting 3D Flash Memory Cubes for Space Applications
108 (R) Ankit Wagle, Sunil Khatri and Sarma Vrudhula. A Configurable BNN ASIC using a network of Programmable Threshold Logic Standard Cells
178 (S) Raghda El Shehaby and Andreas Steininger. On the Effects of Permanent Faults in QDI Circuits - A Quantitative Perspective
183 (R) Ross Thompson and James Stine. A Novel Rounding Algorithm for a High Performance IEEE 754 Double-Precision Floating-Point Multiplier
198 (S) Farid Ahmed, Zarin Tasnim Sandhie and Masud H Chowdhury. An Implementation of External Capacitor-less Low-DropOut Voltage Regulator in 45nm Technology with Output Voltage Ranging from 0.4V-1.2V
302 (S) Wei Chu, Wei-Hao Chen and Shi-Yu Huang. Duty-Cycle Correction for a Clock Signal Supporting A Super-Wide Frequency Range from 10MHz to 1.2GHz
09:00-10:00 6B: Hardware Accelerators for Neural Networks
Session Chair: Sebastien Pillement, Université de Nantes, France
57 (S) Xiaowei Wang, Li Zhao and Pengcheng Li. High Throughput CNN Inference and Training with In-Cache Computation
59 (S) Hongxiang Fan, Martin Ferianc, Shuanglong Liu, Zhiqiang Que, Xinyu Niu and Wayne Luk. RNAS: Reconfigurable CNN Accelerator with Differentiable Neural Architecture Search
73 (R) Xinyi Zhang, Weiwen Jiang and Jingtong Hu. Achieving Full Parallelism in LSTM via a Unified Accelerator Design
184 (R) Dawen Xu, Cheng Chu, Cheng Liu, Qianlong Wang, Ying Wang, Lei Zhang, Huaguo Liang and Kwang-Ting Tim Cheng. A Hybrid Computing Architecture for Fault-tolerant Deep Learning Accelerators
237 (R) Jianhao Chen, Joseph Riad, Edgar Sánchez-Sinencio and Peng Li. Dynamic Heterogeneous Voltage Regulation for Systolic Array-Based DNN accelerators
281 (S) Rui Xu, Sheng Ma, Yaohua Wang and Yang Guo. CMSA: Configurable Multi-directional Systolic Array for Convolutional Neural Network Accelerators
10:10-11:00 7A: Smart Embedded Systems
Session Chair: Christian Pilato, Politecnico di Milano
74 (R) Ajinkya Bankar, Shi Sha, Vivek Chaturvedi and Gang Quan. Thermal Aware Lifetime Reliability Optimization for Automotive Distributed Computing Applications
221 (S) Md Toufiq Hasan Anik, Mohammad Ebrahimabadi, Hamed Pirsiavash, Jean-Luc Danger, Sylvain Guilley and Naghmeh Karimi. On-Chip Voltage and Temperature Digital Sensor for Security, Reliability, and Portability
276 (R) Zhe Jiang, Shuai Zhao, Dong Pan, Dawei Yang, Nan Guan, Neil Audsley and Ran Wei. Re-Thinking Mixed-Criticality Architecture for Automotive Industry
300 (R) Sangyoung Park, Swaminathan Swaminathan and Samarjit Chakraborty. Design-Time Optimization of Reconfigurable PV Architectures for Irregular Surfaces
10:10-11:00 7B: Security I
Session Chair: Jakub Szefer (Yale)
2 (S) Xin Wang and Wei Zhang. pacSCA: A Profiling-Assisted Correlation-based Side-Channel Attack on GPUs
104 (R) Md Hafizul Islam Chowdhuryy, Hang Liu and Fan Yao. BranchSpec: Information Leakage Attacks Exploiting Speculative Branch Instruction Executions
114 (R) Md Shohidul Islam, Abraham Kuruvila, Kanad Basu and Khaled N. Khasawneh. ND-HMDs: Non-Differentiable Hardware Malware Detectors against Evasive Transient Execution Attacks
191 (R) Yukui Luo, Cheng Gongye, Shaolei Ren, Yunsi Fei and Xiaolin Xu. Stealthy-Shutdown: Practical Remote Power Attacks in Multi-Tenant FPGAs
321 (R) Wei Yang, Hailong Zhang, Yansong Gao, Anmin Fu and Songjie Wei. Side-Channel Leakage Detection Based on Constant Parameter Channel Model
11:10-11:55 8A: Non-Volatile Memory
Session Chair: Zhe Wang, Intel
48 (S) Zhiyuan Lu, Jianhui Yue, Yifu Deng and Yifeng Zhu. Improving the Performance of NVM Crash Consistency under Multicore
295 (R) Ning Bao, Yunpeng Chai, Chuanwen Wang and Dafang Zhang. More Space may be Cheaper: Multi-Dimensional Resource Allocation for NVM-based Cloud Cache
297 (R) Chundong Wang and Sudipta Chattopadhyay. Isle-Tree: A B+-Tree with Intra-Cache Line Sorted Leaves for Non-volatile Memory
298 (R) Wei Li, Libing Wu, Mengting Yuan, Jason Xue, Jingling Xue and Qingan Li. Loop2Recursion: Compiler-Assisted Wear Leveling for Non-Volatile Memory
11:20-12:40 8B: Potpourri
Session Chair: Rasit Topaloglu, IBM
83 (R) Junichiro Kadomoto, Hidetsugu Irie and Shuichi Sakai. Design of Shape-Changeable Chiplet-Based Computers Using an Inductively Coupled Wireless Bus Interface
122 (S) Qiong Chang, Aolong Zha, Weimin Wang, Xin Liu, Masaki Onishi and Tsutomu Maruyama. Z^2-ZNCC: ZigZag Scanning based Zero-means Normalized Cross Correlation for Fast and Accurate Stereo Matching on Embedded GPU
124 (R) Haoran Zhao, Tian Xia, Chenyang Li, Wenzhe Zhao, Nanning Zheng and Pengju Ren. Exploring Better Speculation and Data Locality in Sparse Matrix-Vector Multiplication on Intel Xeon
146 (R) Wei Wang, Lei Cui, Zhiyu Hao, Haiqiang Fei and Chonghua Wang. pRnR: A Parallel Record-Replay Framework for Virtual Machines
147 (S) Tom Glint, Jitesh Sah, Manu Awasthi and Joycee Mekie. ANSim: A Fast and Versatile Asynchronous Network-On-Chip Simulator
277 (R) Xi Zeng, Tian Zhi, Zidong Du, Qi Guo, Ninghui Sun and Yunji Chen. ALT: Optimizing Tensor Compilation in Deep Learning Compilers with Active Learning
288 (R) Hyunjong Choi, Mohsen Karimi and Hyoseung Kim. Chain-Based Fixed-Priority Scheduling of Loosely-Dependent Tasks
12:05-12:50 9A: Security II
Session Chair: Nader Sehatbakhsh, UCLA
87 (R) Kalle Ngo, Elena Dubrova and Michail Moraitis. Attacking Trivium at the Bitstream Level
116 (R) Han Wang, Hossein Sayadi, Gaurav Kolhe, Avesta Sasan, Setareh Rafatirad and Houman Homayoun. Phased-Guard: Multi-Phase Machine Learning Framework for Detection and Identification of Zero-Day Microarchitectural Side-Channel Attacks
294 (R) Prashanth Mohan, Wen Wang, Bernhard Jungk, Ruben Niederhagen, Jakub Szefer and Ken Mai. ASIC Accelerator in 28 nm for the Post-Quantum Digital Signature Scheme XMSS
303 (S) Zhixin Pan, Jennifer Sheldon and Prabhat Mishra. Hardware-Assisted Malware Detection using Explainable Machine Learning
