EFRI-COPN_ Damage and Failure Tolerant Adaptive Flight Control ...

By Vernon Warren,2014-02-10 01:56

1. Vision and Goals

    The goal of the proposed activity is to develop new spiking adaptive critic theory and algorithms for the design and implementation of nanoscale neuromorphic chips that obey principles observed in biological neuronal networks, such as spike-timing dependent plasticity (STDP). Recently, Mazumder’s group

    demonstrated the experimental implementation of synaptic functions, such as STDP, in a hybrid nanoscale silicon circuit consisting of CMOS neurons and memristor synapses [1]-[4]. Because it can recreate the same type of synaptic plasticity, device density, scalability, and fault-tolerance observed in biological neuronal networks, this new generation of CMOS/memristor devices has the potential for enabling the development of nanoscale neuromorphic circuits that can adaptively interact with uncertain and changing environments [5,6]. The fundamental challenge that must first be overcome in this development is controlling neuron-level activity (spiking) and plasticity (STDP) in CMOS/memristor-devices, such that the subsequent system-level response achieves desired macroscopic behavioral goals. Recent studies have shown that, in biological sensorimotor systems, the neuron-level activity is modulated by dopaminergic (DA) neuronal activity in the basal ganglia that resembles that observed in the critic network, and is related to internal representations of macroscopic behavioral goals [7]-[21]. However, existing adaptive critic designs are not applicable to CMOS/memristor training, because they rely on the direct manipulation of synaptic weights which is unsupported in STDP memristor hardware. The proposed research will develop CMOS/memristor adaptive controllers that can be trained to solve sensorimotor tasks adaptively, using spike-based adaptive critic designs in which learning occurs by adapting a deterministic spike model of adjustable radial basis functions (RBFs). The proposed activity will develop mathematical models of state-of-the-art CMOS/memristor devices, and demonstrate through virtual experiments that they can be used to develop neuromorphic systems characterized by very-low power computing, large synaptic integration, and neuronal and synaptic densities unheard of in analog neuromorphic chips [22,23]. 
The neuromorphic systems developed in this research will be tested on aerial and ground robots that must perform complex sensorimotor tasks adaptively, such as processing sensory input to map an unknown and variable environment while simultaneously avoiding collisions with fixed and moving obstacles. The specific aims of the proposed research are:

    (i) Develop new theory and algorithms for CMOS/memristor adaptive critic control systems.
    (ii) Develop CMOS/memristor mathematical models and software, prototyped in a simulation environment using SPICE, Verilog, and Simulink, and interfaced with an overarching adaptive critic software.
    (iii) Demonstrate CMOS/memristor-based neuromorphic systems through simulations and experiments on robot sensorimotor learning and control, using 3D physics-based PSG robotic software.

    The outcome of the proposed activity will be CMOS/memristor adaptive critic software for robotic applications, which will be made available to the scientific community (see Data Management Plan).

2. Background and Introduction

    The sequential processing of fetch, decode, and execution of instructions through the classical von Neumann bottleneck of conventional digital computers has resulted in less efficient machines as their ecosystems have grown to be increasingly complex. Though current digital computers possess the computing speed and complexity to emulate elements of brain functionality from a spider, mouse, and cat, the associated energy dissipation grows exponentially along the hierarchy of animal intelligence [24]-[26]. Biological brains are configured quite differently from the von Neumann digital architecture. The keys to both their high efficiency and flexibility are the large connectivity between neurons, which offers highly parallel processing power, and the ability to precisely adjust the synaptic strength or weight between two neurons by means of precisely-timed changes in the ionic transport through them, known as action potentials or spikes [27]. A biological synapse can be viewed as a two-terminal device that can now be mimicked by fabricating electrical nanodevices named memristors (memory + resistor) [28]. Recently, Mazumder’s group has verified that the conductance of a memristor can be incrementally modified by precisely timing and controlling pulses of electric voltage applied to pre- and post-synaptic CMOS neurons, similarly to a biological synapse [1,4]. These and other recent results on memristor research provide direct experimental support for the future development of memristor-based neuromorphic systems comprised of very-large-scale integration (VLSI) systems containing mixed-mode analog/digital circuits and software that mimic the nervous system [5]-[6].

    The outstanding challenge in developing memristor-based neuromorphic systems is being able to induce STDP through programming voltages in such a way that the subsequent macroscopic response of the VLSI achieves the desired behavioral objectives. The same critical challenge has been identified in neuroscience research aimed at reverse engineering the brain [7]-[16], where one of the main stumbling blocks is understanding the relationship between observed rates and patterns of neuronal activity (spiking), and the resulting behavioral response. Due to this knowledge gap, even when a measure of adequate or desired behavior is available, it may not be easily utilized to stimulate a neural network at the cell level in order to produce the appropriate macroscopic behavior. The proposed research aims to help fill this knowledge gap by developing new spike-based training algorithms that adapt (or modulate) a new deterministic RBF spike-train model used to stimulate and induce plasticity in the actor network based on the feedback from the critic. In this novel paradigm, the critic learns a macroscopic representation of the behavioral goals, which are then optimized by adapting the microscopic neuron-level activity through the spike-train model. The paradigm will be tested and demonstrated on a CMOS/memristor-based adaptive critic hardware model applied to adaptive robot sensorimotor control.

    3. Proposed Research

    The proposed research effort aims to model and replicate in hardware the closures required to translate cell-level neuronal activity and plasticity into system-level sensorimotor control and learning abilities analogous to those exhibited by biological brains. Existing adaptive critic designs have been shown to be very effective for designing control systems that learn and adapt in real time, subject to nonlinear, non-convex plant dynamics, and uncertain and changing environments [29]-[42]. These designs rely on the use of artificial neural networks (ANNs) in which (i) each synaptic weight can be manipulated directly by the training algorithm, and (ii) the input/output relationship of the network can be modeled compactly in closed form. However, assumptions (i)-(ii) are both violated when dealing with CMOS/memristor hardware. The reasons are that, in CMOS/memristor devices, (I) the synaptic weights can only be adjusted by manipulating pre- and post-synaptic spike trains, and (II) the neural activity is modeled by nonlinear differential equations for which the solution is not available in closed form. Thus, the output response can only be obtained through numerical integration for a given set of inputs and initial conditions.


    The proposed research will develop new adaptive critic theory and algorithms for adaptive control systems comprised of CMOS/memristor actor and critic networks (Section 3.3). Each CMOS/memristor chip will be modeled as a spiking neural network (SNN) using the approach described in Section 3.2, and trained to represent the control law (actor), or the behavioral goals (critic). A spike-based adaptive critic approach, presented in Section 3.3.2, will be utilized to translate the behavioral goals learned by the critic chip into optimal stimulation patterns for the actor chip, ultimately training the adaptive critic via STDP to perform and optimize desired behavioral goals. As a result, the actor and the critic will learn the missing closures implicitly, without an explicit model of the chip output response. The approach will be demonstrated through sensorimotor applications involving autonomous robots that must perform sensing and control tasks simultaneously, such as navigating an unknown environment while performing simultaneous localization and mapping (SLAM), as explained in Section 3.4. The proposed activity will leverage ongoing NSF-funded research efforts by the PIs (Section 6), through which the students funded on this project would have access to experimental research and facilities on biological neuronal networks grown in vitro, a testbed of autonomous robots equipped with sophisticated sensor, communication, and gripper/manipulator devices, and state-of-the-art CMOS/memristor fabrication and development.

    3.1 CMOS/Memristor Device Technology

    The power and size required by conventional computers are a significant impediment to the development of intelligent robots, able to replicate the abilities of biological brains to solve complex sensorimotor control problems, and to adapt to complex, variable environments. Biological brains are configured dramatically differently from the von Neumann digital architecture, and function based on mechanisms that are far removed from classical ANNs. Recent developments in nanoscale memristor devices [1]-[43] have shown that a hybrid system composed of CMOS neurons and memristor synapses can support important synaptic functions, such as spike-timing dependent plasticity (STDP) and winner-take-all (WTA) neurons, that are believed to play a key role in the adaptation of biological synaptic strengths, also known as synaptic-level plasticity, which enables biological systems to learn and function as efficiently and effectively as they do. Previous CMOS Analog VLSI hardware implementations for mimicking STDP [43,44] have been characterized by several drawbacks, such as, excessive consumption of silicon area, lack of synaptic connection densities found in the biological world, and binary implementation of the synapse, that have suggestively limited their scope and applicability to VLSI neuromorphic systems that closely mimic biological nervous systems. The CMOS implementation in [43] employs weak inversion transistors and large area capacitors in order to obtain response times in the millisecond (ms) range. The CMOS implementation in [44] uses binary synapses in order to conserve area. From a density and processing power perspective, CMOS STDP disadvantages have won out over its advantages especially compared to DSP chips and digital circuit implementations.

    Mazumder, in collaboration with his colleague Wei Lu, has recently shown through fabricated devices and off-the-shelf components how STDP may be realized with the memristor (Fig. 1.a) [1]. They have developed programmable synapses (area ~ 100 nm x 100 nm) consisting of a co-sputtered Ag and Si layer with a properly designed mixture ratio gradient that forms a front between the Ag-rich (high conductivity) and Ag-poor (low conductivity) regions in the active switch layer [43]. Injection and depletion of ions in the memristor moves the front between the ion-rich region and ion-poor region, and causes a continuous change in conductance. The proof-of-concept neuron circuit consists of integrate-and-fire neurons, which provide the potentiation (LTP) and depression (LTD) timing data. The synaptic weight of the memristor synapse was measured, and its change was recorded against the relative spike timing of the pre- and post-synaptic neurons. The data (Fig. 1.c), fit by an exponential decay function, experimentally confirmed that STDP characteristics similar to those of biological synapses (Fig. 1.b) can be obtained with the Ag-aSi-cSi based memristor synapse (Fig. 1.a).

    Fig. 1. Schematic of concept using nanoscale memristors as synapses between neurons (a), and demonstration of how the measured change in excitatory postsynaptic current (EPSC) in a rat hippocampal neuron (b) can be replicated by STDP in the memristor synapse (c).

    The memristor technology that will be modeled in this research is that of the nanodevice developed in HP labs [28], with two layers of TiO$_2$ thin film material. One of the thin film layers is doped with oxygen, which reduces the resistivity of this layer, while the other layer is left undoped. The total resistance of the nanodevice depends on the resistance combination of both TiO$_2$ layers. The memristor’s resistance (memristance) can be modulated by electrically biasing (current or voltage) the device. The current through the device moves oxygen dopants laterally, thereby widening (narrowing) the doped region depending on bias direction [28]. The neuromorphic adaptive critic design will not depend on one memristor alone but on the combinatory effect of multiple memristors. The larger the number of dependencies, the less effect one device has on the target outcome. For a purely digital CMOS case, the memristor with low yield and untested lifetime drift is not very attractive. Digital CMOS depends on certainty and Boolean logic, hence just like 2+2 cannot equal 3, the deterministic nature of Boolean logic for digital computation cannot tolerate failure. However, similarly to biological brains, a neuromorphic system would be able to tolerate some errors [45].

    The memristor responds to the pulse in different ways depending on its current state. Since the device is a state-based device, the approach to pulse response is very different depending on its starting point. Following the linear-drift model result in [46], memristance as a function of flux is,

    $$M_T(t) = R_0 \sqrt{1 - \frac{2 \eta \, \Delta R \, \varphi(t)}{Q_0 R_0^2}} \qquad (1)$$

    where $M_T$ is the total memristance at time t, $R_0$ is the initial resistance of the memristor, $\eta$ is the sign of the applied bias (+1/−1 for positive/negative bias), $\Delta R$ is the memristor’s resistive range, defined as the difference between the maximum and the minimum resistance, and $\varphi(t)$ is the total flux through the device at time t. $Q_0$ is the charge required to pass through the memristor for the dopant boundary to move a distance

    comparable to the device width. Then, the memristance as a function of flux, plotted in Fig. 2.a, exhibits an analog behavior when flux values are closer to zero, and a digital behavior as flux increases. The continuum provides an operating region for memristors, and also presents “test” cases for failed states.
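    The flux dependence in (1) can be explored numerically with a minimal sketch such as the one below. The parameter values for $R_0$, $\Delta R$, and $Q_0$ are hypothetical and chosen only for illustration; they are not taken from the fabricated devices. The clamping to the interval [$R_0 - \Delta R$, $R_0$] is also an assumption, reflecting a device that starts at its maximum resistance.

```python
import math

# Hypothetical device parameters, for illustration only.
R0 = 20e6         # initial (assumed maximum) resistance, ohms
DELTA_R = 19.9e6  # resistive range R_max - R_min, ohms
Q0 = 3e-7         # charge needed to sweep the dopant boundary, coulombs


def memristance(flux, eta=1.0, r0=R0, delta_r=DELTA_R, q0=Q0):
    """Evaluate Eq. (1) and clamp the result to [r0 - delta_r, r0]."""
    arg = 1.0 - 2.0 * eta * delta_r * flux / (q0 * r0 ** 2)
    m = r0 * math.sqrt(max(arg, 0.0))
    return min(max(m, r0 - delta_r), r0)


if __name__ == "__main__":
    for flux in (0.0, 1.0, 2.0, 2.9):
        print(f"flux = {flux:.1f} Wb -> M = {memristance(flux) / 1e6:.2f} MOhm")
```

    With these assumed values the memristance changes gradually at low flux and much more steeply as the square-root argument approaches zero, which is the qualitative analog-to-digital transition described above.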

    Memristance can vary widely in a 20 MΩ span, but gradual percentage change is preferable to a sudden change for neuromorphic applications. Sudden change would be a “test” failed state; “test” because the device can be corrected to a low flux state which will put it back to an analog state. Proper use of pulses will try to traverse the memristance curve and keep the memristor in the analog mode of operation. While STDP and other neuron functionalities, such as WTA and LTP, can also be implemented in hybrid CMOS/memristor chips [4], the introduction of external reward signals that modify the synaptic weights directly is not easily implemented in the memristor. Therefore, the proposed research will develop a framework of hardware implementation that relies purely on STDP and neuron spiking, and an external overarching adaptive critic software for modulating the pre- and post-synaptic neuron firing in the memristor based on the high-level behavioral response and reward. In order to train the hardware

    implementation to solve complex control problems adaptively, this research will develop a novel spike-based adaptive critic approach, described in Section 3.3. Developing CMOS/memristor neuromorphic systems will also require implementing and integrating new methods for SNN modeling and simulation with CMOS/memristor circuits, which will be accomplished as explained in the next subsection.

    Figure 2. In the slowly-changing region of operation for the memristor, the magnitude of memristance change ranges from ~2 MΩ to 3 MΩ for every 1 Wb flux change, while when the flux exceeds 2.5 Wb the change in memristance increases drastically (a); circuit diagram for implementing Izhikevich neuron dynamics, with state variables representing the voltages across capacitors $C_V$ and $C_U$ (b).


    3.2 Computational Models of CMOS/Memristor Chips

    Computational models of prototyped CMOS/memristor devices will be developed using SPICE, Verilog, and Simulink software that can be easily integrated with adaptive critic and robotic simulation software (Section 3.4) on conventional digital computers. The CMOS/memristor computational models will then be used to develop, analyze, and test novel CMOS/memristor adaptive critic designs prior to implementing them in silico. At this time, only virtual CMOS/memristor implementation will be pursued and additional funding will be sought for fabrication of actual neuromorphic circuits. Physical tests and experiments conducted by Mazumder through ongoing funding will be used to refine and calibrate the chip computational models, until a faithful software model is obtained for the combined CMOS/memristor hardware and overarching adaptive critic architecture. Then, the software model will be demonstrated through the sensorimotor learning and control applications described in Section 3.4.

    Various models of SNNs have been proposed in recent years, motivated by biological studies that have shown complex spike patterns and dynamics to be an essential component of information processing and learning in the brain. The two crucial considerations involved in determining a suitable SNN model are the range of neuro-computational behaviors the model can reproduce, and its computational efficiency [47]. As can be expected, the implementation efficiency typically decreases as the number of features and behaviors that can be accurately reproduced increases [47], such that each model offers a tradeoff between these competing objectives. The leaky integrate-and-fire (LIF) model is the simplest model of a spiking neuron. It has the advantages that it displays the highest computational efficiency, and is amenable to mathematical analysis. 
While CMOS neurons are able to produce the firing dynamics of the LIF model [1], these dynamics only capture the behavior of Class-1 excitable neurons that fire tonic spikes. The computational neuron model that is most biophysically accurate is the well-known Hodgkin-Huxley (HH) model [48]. Due to its extremely low computational efficiency, however, using the HH model to simulate large networks of neurons can be computationally prohibitive [47]. Recently, bifurcation studies have been used to reduce the HH dynamics from four to two differential equations, referred to as the Izhikevich model, which are capable of reproducing a wide range of spiking patterns and behaviors with much higher efficiency than the HH model [47,49]. In the proposed research, CMOS neurons will be developed to replicate the Izhikevich model dynamics, by which the membrane potential, $v$, and the membrane recovery variable, $\upsilon$, are governed by a system of two ordinary differential equations (ODEs) obtained from the fast and slow nullclines of current balance equations in the HH model. While the HH model requires tens of parameters, the Izhikevich model relies on four dimensionless parameters, denoted by a, b, c, and d, which represent the time constant of $\upsilon$, the sensitivity of $\upsilon$ to subthreshold fluctuations in $v$, and the after-spike reset values of $v$ and $\upsilon$ caused by fast and slow high-threshold Na and K ion conductances, respectively. Proper choice of these four parameter values allows various firing patterns to be represented by the ODEs,

    $$\frac{dv}{dt} = 0.04 v^2 + 5 v + 140 - \upsilon + I \qquad (2)$$

    $$\frac{d\upsilon}{dt} = a (b v - \upsilon) \qquad (3)$$

    where, after the spike reaches its apex (+30 mV), $v$ and $\upsilon$ are reset according to the following,

    $$\text{if } v \ge 30 \text{ mV, then } v \leftarrow c \text{ and } \upsilon \leftarrow \upsilon + d. \qquad (4)$$

    Synaptic currents from connected pre-synaptic neurons or injected through dc-current stimulation are captured in I, and the firing threshold depends on the history of the membrane potential. Chattering, regular spiking, fast spiking, intrinsically bursting, and low-threshold spiking (LTS) can all be reproduced by this reduced-order dynamical model. From an analog CMOS standpoint, the LIF neuron structure is simple and amenable, whereas the second-order dynamics and nonlinearities of the Izhikevich model are difficult to replicate by using the linear operating region of CMOS hardware.
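    To make the reduced-order dynamics concrete, the following minimal sketch integrates (2)-(4) with a forward-Euler scheme for a constant input current. The parameter values a = 0.02, b = 0.2, c = −65, d = 8 are the commonly quoted regular-spiking settings of the Izhikevich model; the step size, duration, and input currents are assumptions made for illustration and do not describe the proposed CMOS circuit.

```python
def simulate_izhikevich(i_input, a=0.02, b=0.2, c=-65.0, d=8.0,
                        t_end=1000.0, dt=0.5):
    """Return spike times (ms) for a constant input current i_input."""
    v = c          # membrane potential, mV
    u = b * v      # recovery variable (written as upsilon in the text)
    spikes = []
    for k in range(int(t_end / dt)):
        if v >= 30.0:              # Eq. (4): apex reached, apply reset
            spikes.append(k * dt)
            v = c
            u += d
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + i_input  # Eq. (2)
        du = a * (b * v - u)                                # Eq. (3)
        v += dt * dv               # forward-Euler update
        u += dt * du
    return spikes


if __name__ == "__main__":
    print("spikes with I = 10:", len(simulate_izhikevich(10.0)))
    print("spikes with I = 0: ", len(simulate_izhikevich(0.0)))
```

    Changing a, b, c, and d in this sketch is how the other firing patterns listed above would be explored in software before attempting a circuit-level approximation.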

    Therefore, one goal of the proposed research is to investigate and simulate a CMOS neuron that can reproduce the complex spike patterns and behaviors of the Izhikevich model. The squared behavior exhibited by (2) can be achieved with the relationship between the drain current and gate voltage of the Metal-Oxide Semiconductor Field Effect Transistor (MOSFET), while the reset parameters in (4) can be obtained with comparators that reset $v$ and $\upsilon$ to their corresponding starting values. Equation (3) will be approximated by the CMOS neuron model in Fig. 2.b, where the quadratic relationship between the gate voltage and drain current is used not only to implement the squared function, but also to use the current integrative effect of capacitors to realize two state variables representing the voltage across the capacitors $C_V$ and $C_U$ (Fig. 2.b). The circuit diagram is divided into two parts: the left diagram, composed of M1-M3, determines the voltage across $C_V$, while the right diagram, composed of M4-M6, determines the voltage across $C_U$. A first-order analysis in saturation then yields the following results:

    dvk(W/L)k(W/L)k(W/L)22113344 (5) (vv)(vVv)thDthSATdt2Ck(W/L)2CVV22

    ?dk(W/L)k(W/L)225566 (6) (vVv)(?Vv)DthDDthSATdt2C2CUU

    where $v$ and $\upsilon$ are the voltages across the capacitors $C_V$ and $C_U$, respectively, $(W/L)_x$ is the width-to-length ratio of transistor x, $k_x$ is a constant parameter for transistor x that describes the effect of carrier mobility and oxide capacitance, $v_{th}$ is the threshold voltage of the transistors, $V_{DSAT}$ is the maximum drain voltage, and $V_{DD}$ is the supply voltage. By comparing (5) to (2) it can be seen that the only terms missing in (5) are the inputs I and $\upsilon$. These terms can be included by adding current mirrors that sum up at the other terminal of $C_V$, not connected to $V_{SS}$. Subsequently, (5) will approximate (2) fairly accurately, with deviations in the parameters that will depend on the transistor and capacitor sizing. Since (6) does not approximate (3) as accurately, the proposed research will investigate strengthening the coefficients of the linear terms after expansion by selecting proper values of $V_{DD}$ and $V_{DSAT}$.

    The CMOS/memristor synapses will be allowed to change over time according to the STDP rule, which, at every ODE-integration time step $\Delta t$, adapts the synaptic weight $w_{ij}$ between a pre-synaptic neuron j and a post-synaptic neuron i according to the relative timing of the pre- and post-synaptic spikes, denoted by $\hat{t}_j$ and $\hat{t}_i$, respectively, as follows,

    $$w_{ij}(t + \Delta t) = w_{ij}(t) + \Delta w_{ij}(t), \quad \text{where} \quad \Delta w_{ij} = \mathrm{sgn}(\hat{t}_i - \hat{t}_j) \exp(-|\hat{t}_i - \hat{t}_j| / \tau_d) \qquad (7)$$

    where sgn(·) is the signum function, |·| is the absolute value, and $\tau_d$ is a reference time delay. Then, as observed in biological neurons, if the presynaptic neuron fires before the postsynaptic neuron, the synapse is strengthened, and if it fires after, the synapse is weakened. Thus, $|\Delta w_{ij}|$ decreases exponentially with the time difference between the two spikes, such that when the neurons fire closer in time, the synaptic weight undergoes a greater change, as shown by the experimental results plotted in Figs. 1b and 1c. The number of CMOS neurons and their connectivity will be specified by the chip fabrication design constraints, i.e. chip area (affects cost), number of metal layers, and CMOS level density requirements [1]-[4].

    3.3 Neuromorphic Adaptive Critic Design
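    Before turning to the adaptive critic design, the STDP rule in (7) from Section 3.2 can be illustrated with a minimal sketch. The function names and the value of the reference delay `tau_d` (20 ms) are hypothetical choices made for illustration; the rule itself follows (7), with the sign set by the post-synaptic spike time minus the pre-synaptic spike time.

```python
import math


def stdp_increment(t_pre, t_post, tau_d=20.0):
    """Weight increment of Eq. (7) for one pre/post spike pair (times in ms)."""
    diff = t_post - t_pre
    if diff == 0.0:
        return 0.0                      # sgn(0) = 0
    sign = 1.0 if diff > 0.0 else -1.0  # pre before post -> potentiation
    return sign * math.exp(-abs(diff) / tau_d)


def update_weight(w, t_pre, t_post, tau_d=20.0):
    """Apply one STDP update: w(t + dt) = w(t) + delta_w."""
    return w + stdp_increment(t_pre, t_post, tau_d)


if __name__ == "__main__":
    print(stdp_increment(10.0, 15.0))   # pre fires first: positive change
    print(stdp_increment(15.0, 10.0))   # post fires first: negative change
```

    The weight can only be changed through the spike times passed to `update_weight`, which mirrors the hardware constraint that motivates the training approach developed in this section.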

    Several authors have demonstrated that adaptive critic controllers can solve complex, nonlinear control problems, and optimize performance in the face of unforeseen changes and uncertainty, such as modeling errors, unmodeled dynamics, plant damage, and control failures [29]-[42]. Recently, adaptive critics have also been linked to the brain by experimental studies suggesting that the activity of dopamine neurons in the basal ganglia acts as a reinforcement signal necessary to sensorimotor learning that resembles the signal exchanged by the actor and the critic in adaptive dynamic programming (ADP) algorithms [17]-[21]. So far, existing ADP algorithms have used conventional (e.g., sigmoidal) artificial neural networks to learn the control law and cost-to-go over time, in applications where these mappings cannot be determined a priori, or in order to overcome the curse of dimensionality [50]. The proposed research will extend and apply the adaptive critic formalism to learn the closures required to translate the cell-level neuronal activity into control law and cost-to-go mappings that meet the desired behavioral goals.

    3.3.1 Problem Formulation

    The neuromorphic systems developed in this research will consist of a CMOS/memristor critic (SNN$_c$) that learns the behavior goals from the plant’s output and trains a CMOS/memristor actor network (SNN$_a$) to control a plant with continuous inputs and outputs, such as a mobile robot or a robotic arm. In each network, three subsets of CMOS spiking neurons will be tagged as input, output, and training neurons (Fig. 3), based on the control problem formulation. Two SNNs, e.g. SNN$_c$ and SNN$_a$ in Fig. 3, are said to be connected when the output neurons of SNN$_c$ stimulate the input neurons of SNN$_a$ via spike trains. When the output of an SNN is fed to a functional block with continuous inputs and outputs (e.g. SNN$_a$ is used to control a sensor or robot), the spike trains of every SNN output neuron will be processed and integrated, or decoded, to produce a corresponding time-varying continuous-output signal. Continuous output signals generated from a functional block to an SNN (e.g. from the plant to SNN$_a$ in Fig. 3) will be converted into spike trains, or encoded, using an integrate-and-fire (IF) sampler to stimulate the SNN input neurons [51].

    Figure 3. Example of neuromorphic adaptive critic design comprised of nanoscale CMOS/memristor actor (SNN$_a$) and critic (SNN$_c$) chips trained by an overarching ADP software via STDP.

    Neural coding remains one of the fundamental issues in neuroscience [52,53], because while most methods rely on spike-count firing rates [54]-[58], it is well known that firing rates do not provide sufficient information for reconstructing the type of complex, time-varying signals processed by biological brains. This research will explore and implement state-of-the-art coding techniques that avoid binning and averaging to retain the temporal structure of the neuronal activity (e.g., see [59]-[61]).
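    As a rough illustration of the encoding step, the sketch below implements a simplified, non-leaky integrate-and-fire sampler that converts a sampled continuous signal into spike times. It is an assumption-laden stand-in rather than the sampler of [51]: the function name, the threshold, and the restriction to nonnegative signals whose per-step integral stays below the threshold are all hypothetical simplifications.

```python
def if_encode(signal, dt, threshold):
    """Encode a sampled nonnegative signal as a list of spike times.

    The signal is integrated over time; each time the running integral
    reaches the threshold, a spike is emitted and the threshold is
    subtracted from the integral.
    """
    spikes = []
    acc = 0.0
    for k, sample in enumerate(signal):
        acc += sample * dt              # integrate the input
        if acc >= threshold:            # fire
            spikes.append((k + 1) * dt)
            acc -= threshold            # reset by subtraction
    return spikes


if __name__ == "__main__":
    constant = [1.0] * 16
    print(if_encode(constant, dt=0.25, threshold=1.0))
```

    Because the spike times themselves carry the information, larger signal amplitudes yield shorter inter-spike intervals, and no binning or averaging is involved in the encoding.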

    At the hardware level, the SNN actor and critic will be adapted by a new STDP training algorithm supervised by an adaptive critic architecture through policy and value-iteration routines implemented at the overarching software level. Typically, ADP policy and value-iteration procedures are used to update the actor and critic weights by a synaptic weight increment $\Delta w_{ij}$ computed from the ANN gradient [42, p.359]. However, this approach is unsupported in CMOS/memristor devices, where changes in the synaptic weights can occur only as a result of pre- and post-synaptic neuronal activity. Therefore, this research will develop a new approach by which the policy and value-iteration procedures supervise stimulation patterns in the actor and critic SNNs, such that their synaptic strengths, and subsequently their dynamic mappings, are adapted according to STDP in (7). It is assumed that the plant consists of a dynamical system, such as a robot, which obeys a nonlinear ODE of the form,

    $$\dot{x}(t) = f[x(t), u(t), t], \quad \text{with } x \in X \subset \mathbb{R}^{n \times 1} \text{ and } u \in U \subset \mathbb{R}^{m \times 1}, \qquad (8)$$

    where x and u denote the plant state and control inputs, respectively, and $\dot{(\cdot)}$ denotes the derivative with respect to time. The model of the plant (8) may be unknown or imperfect, and may be improved upon over time by a system identification (ID) algorithm. The macroscopic behavioral goals of the plant can be expressed by the value function or cost-to-go,

    $$V[x(t_k)] = \sum_{t_j = t_k}^{t_f - 1} L[x(t_j), u(t_j), x(t_{j+1})], \quad \text{with } u(t_j) = \pi[x(t_j)], \qquad (9)$$

    which represents the future performance of the plant accrued from the present, $t_k$, up to the final time, $t_f$, if subject to the present control law π(·). The Lagrangian L[·] represents instantaneous behavioral goals as a function of x and u. Since the control law is adapted over time, and an accurate plant model may not be available, the cost-to-go typically is unknown, and, in this research, is learned by the CMOS/memristor critic, SNN$_c$. By Bellman’s principle of optimality [50,62], at any time $t_k$ the cost-to-go can be minimized online with respect to the present control law, based on observations of $x(t_k)$, and predictions of $x(t_{k+1})$.

    In [63], Howard showed that if iterative approximations of the control law and optimal cost-to-go, denoted by $\pi_\ell$ and $V_\ell$, respectively, are modified by a Policy-Improvement Routine and a Value-Determination Operation, respectively, they eventually converge to their optimal counterparts $\pi^*$ and $V^*$.

    The Policy-Improvement Routine states that, given a cost-to-go function $V(\cdot, \pi_\ell)$ corresponding to a control law $\pi_\ell$, an improved control law $\pi_{\ell+1}$ can be obtained as follows,

    $$\pi_{\ell+1}[x(t_k)] = \arg\min_{u(t_k)} \left\{ L[x(t_k), u(t_k), x(t_{k+1})] + V[x(t_{k+1}), \pi_\ell] \right\} \qquad (10)$$

    such that $V[x(t_k), \pi_{\ell+1}] \le V[x(t_k), \pi_\ell]$, for any $x(t_k)$. Furthermore, the sequence of functions $C = \{\pi_\ell \mid \ell = 0, 1, 2, \ldots\}$ converges to the optimal control law $\pi^*$. The Value-Determination Operation states that given a control law π(·), the cost-to-go can be updated according to the following rule,

    $$V_{\ell+1}[x(t_k), \pi] = L[x(t_k), u(t_k), x(t_{k+1})] + V_\ell[x(t_{k+1}), \pi] \qquad (11)$$

    such that the sequence of functions $V = \{V_\ell \mid \ell = 0, 1, 2, \ldots\}$ converges to $V^*$. In the proposed research, at every overarching software’s iteration cycle, denoted by $\ell$, the above policy and value-iteration procedures will be used in tandem to supervise training of the actor and critic,

    $$u(t_k) = \mathrm{SNN}_a[x(t_k)] \approx \pi_\ell[x(t_k)] \quad \text{and} \quad V[x(t_k)] = \mathrm{SNN}_c[x(t_k), \pi_\ell] \approx V_\ell[x(t_k), \pi_\ell], \qquad (12)$$

    respectively, letting $\ell = Mk$, where M is a positive constant chosen based on the desired frequency of the updates. It can be seen that, at $\ell + 1$, the desired mappings for SNN$_a$ and SNN$_c$, provided by (10) and (11), are based on the actor and critic outputs at $\ell$, and on the output of the plant model, $x(t_{k+1})$. Then, as shown in the next section, an error metric of the decoded SNN output and the desired output will be minimized using a novel STDP training approach realizable in CMOS/memristor chips.
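    The interplay between the Policy-Improvement Routine and the Value-Determination Operation can be illustrated on a toy problem. The sketch below is a conventional tabular policy iteration, not the spike-based method proposed here: the five-state chain, the unit step cost, the goal state, and the discount factor `GAMMA` (added only so that the cost-to-go of a poor initial policy stays finite) are all hypothetical assumptions.

```python
N_STATES, GOAL, GAMMA = 5, 4, 0.9
ACTIONS = (-1, +1)                      # move left or right along the chain


def step(x, u):
    """Deterministic plant model: the goal state is absorbing."""
    return x if x == GOAL else min(max(x + u, 0), N_STATES - 1)


def cost(x, u):
    """Instantaneous cost (Lagrangian): one unit per step until the goal."""
    return 0.0 if x == GOAL else 1.0


def value_determination(policy, sweeps=200):
    """Repeatedly apply V <- L + GAMMA * V(next state) for a fixed policy."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        v = [cost(x, policy[x]) + GAMMA * v[step(x, policy[x])]
             for x in range(N_STATES)]
    return v


def policy_improvement(v):
    """Pick, in each state, the action minimizing cost plus cost-to-go."""
    return [min(ACTIONS, key=lambda u: cost(x, u) + GAMMA * v[step(x, u)])
            for x in range(N_STATES)]


def policy_iteration(policy, max_cycles=20):
    v = value_determination(policy)
    for _ in range(max_cycles):
        new_policy = policy_improvement(v)
        if new_policy == policy:
            break                        # no further improvement possible
        policy = new_policy
        v = value_determination(policy)
    return policy, v


if __name__ == "__main__":
    final_policy, final_v = policy_iteration([-1] * N_STATES)
    print(final_policy, [round(val, 3) for val in final_v])
```

    Starting from a policy that always moves away from the goal, each cycle improves the control law in at least one state until the policy stops changing. In the proposed research the two tables would be replaced by the actor and critic SNNs, and the updates would be realized through stimulation patterns rather than direct assignment.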

    3.3.2 CMOS/Memristor Training via STDP

    Virtually all neural network training algorithms, including those recently developed for SNNs in [52]-[53], rely on manipulating the neural network weights directly and, as such, are unsupported in STDP CMOS/memristor chips. Furthermore, they require knowledge of which measures of neuronal activity are relevant, and of how these measures influence the functional plasticity of the network based on a known input/output mapping. In ANN-based control systems, for example, this information is available from ANN equations and gradient-based algorithms that describe, mathematically, how the high-level response can be optimized with respect to individual neurons’ outputs and synaptic weights. However, in spiking CMOS/memristor chips this relationship is not available in closed form and, thus, may not be used for training. The proposed research will therefore develop a new training approach that manipulates the neural activity in place of the synaptic weights, and that uses ADP policy- and value-iteration routines to determine what spike trains achieve the desired high-level response and plasticity.

    Consider a CMOS/memristor chip, denoted by SNN, that is comprised of multiple spiking neurons and STDP synapses, and that must represent a desired mapping between two vectors x and y:

    $$y(t_k) = g[x(t_k)] \approx \mathrm{SNN}[x(t_k), W(t_k)], \quad \text{where } y \in \mathbb{R}^{r \times 1},\ x \in \mathbb{R}^{n \times 1},\ W(t_k) \equiv \{w_{ij}(t_k)\}. \qquad (13)$$

    The desired mapping $g: \mathbb{R}^{n \times 1} \to \mathbb{R}^{r \times 1}$ can be assumed stationary during every iteration cycle $\ell$, and is updated by the policy/value-iteration routine. Let $W(t_k) = \{w_{ij}(t_k)\}$ denote a matrix containing the SNN synaptic weights at time $t_k$. Then, for proper values of $W$, if the SNN is provided with an observation of the input, $x^l$, encoded as $n$ spike train sequences $X_i^l = \{\hat{t}^{\,l}_{i,\kappa} : \kappa = 1, \ldots, N_i\}$, $i = 1, \ldots, n$, over a time interval $[t_k, t_k + \tau]$, the decoded output of the SNN must match the output $y^l = g[x^l]$. Since $\tau \ll (t_{k+1} - t_k)$, $X_i^l$ can be used to encode instantaneous values of $x$ at any $t_k$. The proposed training algorithm will then modify the synaptic weights of the SNN such that a figure of merit representing distance (e.g., see [64,65]) is optimized at the output space of the mapping in (13).
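    As an illustration of what a distance-based figure of merit between spike trains can look like, the sketch below computes the van Rossum distance, one commonly used spike-train metric. This choice is an assumption made purely for illustration; the measures referenced in [64,65] may differ, and the function name and the time constant `tau` are hypothetical.

```python
import math


def van_rossum_distance(train_a, train_b, tau=5.0):
    """Van Rossum distance between two spike trains (lists of spike times).

    Each spike is filtered with a causal exponential kernel exp(-t / tau).
    The squared distance is (1 / tau) times the integral of the squared
    difference of the two filtered traces, evaluated here in closed form.
    """
    def cross(u, v):
        # Sum of kernel overlaps between every pair of spikes in u and v.
        return sum(math.exp(-abs(s - t) / tau) for s in u for t in v)

    d2 = 0.5 * (cross(train_a, train_a) + cross(train_b, train_b)
                - 2.0 * cross(train_a, train_b))
    return math.sqrt(max(d2, 0.0))  # guard against tiny negative round-off
```

    The distance is zero for identical trains and grows as spikes are shifted apart in time, so it can serve as a scalar error between a decoded output spike train and a desired one.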

    Since the individual synaptic weights, $w_{ij}$, cannot be manipulated directly, the distance between $y$ and the decoded SNN's output will be optimized with respect to the parameters of a new, deterministic spike train model, as follows. From (13), we identify (tag) two sets of neurons $x_N$ and $y_N$, referred to as input and output neurons (Fig. 3), where each neuron in the set provides the response for one element of $x$ and $y$, respectively, in the form of a spike train. An additional set of $q$ training neurons is then tagged (Fig. 2) and used to train the SNN by simultaneously recording their response and inducing them to fire on command with very high precision over a training time interval $[t_k, t_k + T]$, $\tau < T \ll (t_{k+1} - t_k)$. The training neuron firings will be implemented using local programming voltages with controlled pulse width and height that are easily realizable in CMOS/memristor chips [1]. In order to induce STDP in a manner that will improve the SNN representation of the mapping in (13), the programming voltages will be delivered based on an optimized spiking sequence $S_i^\iota = \{\hat{t}^{\,\iota}_{i,\kappa} : \kappa = 1, \ldots, N_i\}$, $i = 1, \ldots, q$, during the $\iota$th time interval of the training algorithm, $[t_\iota, t_{\iota+1}]$. Existing spike models [52,53] cannot be used to generate $S_i^\iota$ because they are stochastic. Thus, they do not allow for precise timing of pre- and post-synaptic firings, and as a result may induce undesirable changes in the synaptic weights by virtue of the STDP rule (7). The proposed research will utilize a new deterministic spike model obtained by optimizing a continuous signal modeled by a superposition of Gaussian radial basis functions (RBFs),

    $$s_i(t) = \sum_{\kappa=1}^{N_i} \omega_{i,\kappa} \exp\left[-\beta_{i,\kappa} \left(\| t - \hat{t}_{i,\kappa} \|\right)^2\right], \quad i = 1, \ldots, q, \quad t \in [t_\iota, t_{\iota+1}] \subset [t_k, t_k + T], \qquad (14)$$

    where $\omega_{i,\kappa}$, $\beta_{i,\kappa}$, and $\hat{t}_{i,\kappa}$, for all $i, \kappa$, constitute the set of adjustable model parameters denoted by $P$. The spike times $\hat{t}_{i,\kappa}$ represent the centers, the biases $\beta_{i,\kappa}$ determine the widths, and the weights $\omega_{i,\kappa}$ determine the heights of the basis functions. The continuous signal (14) will be integrated against a suitable averaging function in a leaky integrator, and then compared to a positive threshold, by means of a so-called integrate-and-fire (IF) sampler [51]. As a result, a precise pulse function with a width that depends on $\beta_{i,\kappa}$, and an intensity that depends on $\omega_{i,\kappa}$, is generated at every time $\hat{t}_{i,\kappa}$ during the interval $[t_\iota, t_{\iota+1}]$. The optimal RBF parameters $P^*$ used to generate $S_i^\iota$ will be determined by minimizing a measure of the distance between the decoded SNN's output, $y_N$, and the desired output, $y$, computed by the policy/value iteration routine.
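    The deterministic spike model can be sketched numerically as follows. This is a minimal illustration under stated assumptions rather than the proposed implementation: the leaky integrator is a first-order forward-Euler model with reset, and the function names, leak rate, threshold, and step size are all hypothetical. The sketch only returns spike times; shaping the pulse width and intensity is not modeled.

```python
import math


def rbf_signal(t, weights, biases, centers):
    """Continuous signal: sum over kappa of omega * exp(-beta * (t - t_hat)^2)."""
    return sum(w * math.exp(-b * (t - c) ** 2)
               for w, b, c in zip(weights, biases, centers))


def if_sampler(signal, t_start, t_end, dt=0.01, leak=0.1, threshold=1.0):
    """Assumed leaky integrate-and-fire sampler (forward Euler).

    Integrates dv/dt = -leak * v + signal(t); whenever v reaches the
    threshold, a spike time is recorded and v is reset to zero.
    """
    spikes = []
    v = 0.0
    n_steps = int(round((t_end - t_start) / dt))
    for step in range(n_steps):
        t = t_start + step * dt
        v += dt * (-leak * v + signal(t))
        if v >= threshold:
            spikes.append(t + dt)
            v = 0.0
    return spikes


def spikes_from_rbf(weights, biases, centers, t_start, t_end):
    """Generate a deterministic spike sequence from the RBF parameters."""
    return if_sampler(lambda t: rbf_signal(t, weights, biases, centers),
                      t_start, t_end)
```

    With unit weights and biases and centers at 10, 30, and 50 (in arbitrary time units), the sampler emits one spike close to each center, which shows how the RBF centers set the firing times deterministically.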

    3.4 Applications: Artificial Sensorimotor Learning and Adaptation

    A fundamental feature of biological sensorimotor systems is that they can transform sensory inputs into appropriate motor outputs, and when the sensory inputs change, the motor outputs can adapt. Currently, robotic sensors equipped to perform both sensing and motor tasks, such as exploring an environment in order to localize, map, and avoid an obstacle, must first stop and process the sensor data, and then execute the motion. Also, their ability to adapt movements based on the sensor data, as would be required by a variable environment in which some obstacles (e.g., other non-cooperative robots) move at speeds similar to that of the robot, is far removed from that observed in biological systems. In the proposed research, the neuromorphic design approach described in Section 3.3 will be used to develop and evaluate CMOS/memristor circuits for the adaptive control of mobile robots instrumented with wireless sensors.

