US 20080244156 A1
In one embodiment, the invention provides a method for accessing memory. The method comprises sending memory transactions to a memory sub-system for a first processor to an intermediate second processor interposed on a communication path between the first processor and the memory sub-system; and controlling when the memory transactions are allowed to pass through the second processor to reach the memory sub-system.
1. A mobile device, comprising: a memory sub-system comprising at least one of a volatile and a non-volatile memory; an applications processor comprising at least one CPU;
a baseband processor;
a memory sub-system coupled to the applications processor;
a display device wherein the applications processor comprises an arbitration mechanism for controlling access to the memory sub-system by the baseband and the at least one CPU, and an interface to the display device.
2. The mobile device of
3. The mobile device of
4. The mobile device of
5. The mobile device of
6. The mobile device of
7. The mobile device of
8. The mobile device of
9. The mobile device of
10. The mobile device of
11. The mobile device of
12. The mobile device of
The present application is a continuation of U.S. application Ser. No. 10/405,600 filed Apr. 1, 2003.
This invention relates to memory access within a computer system. In particular, the invention relates to a method for accessing memory and to a computer system which implements the method.
In the last few years the use of wireless technologies, in particular wireless cellular telephones, has become very prevalent. Cellular telephones deploy various types of radio frequency baseband and base station modem protocols such as Code Division Multiple Access (CDMA), Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Personal Digital Cellular (PDC), etc.
In general, a cellular telephone has a baseband chip which provides the computing needs for voice communications. This baseband chip usually includes a Central Processing Unit (CPU); a memory interface for interfacing to non-volatile memories (FLASH type memories) or volatile memories (Pseudo Static Random Access Memory (pSRAM), Static Random Access Memory (SRAM), or Synchronous Dynamic Random Access Memory (SDRAM) type memories); a Liquid Crystal Display (LCD) controller; keyboard and audio devices or interfaces to them; and a mechanism to interface Radio Frequency (RF) components to establish a link to a base station.
Until recently, cellular telephones were used for voice-only communications, but with the Internet, various wireless carriers such as SKT, J-Phone, DoCoMo, Verizon, Vodafone, etc. have sought to provide data services to cellular telephone users in order to realize higher revenues per subscriber.
Such data services generally require a higher performance from the baseband chips. In some cases, in order to reduce the performance demands on the baseband chips, an application chip may also be provided to execute specific applications. The application chip and the baseband chip generally require a memory sub-system.
The memory sub-system of the cellular telephone represents one of the highest cost components of the cellular telephone, and thus the manner in which access to the memory sub-system is provided to the baseband chip and the application processor can have a significant effect on the cost and performance of the cellular telephone.
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention can be practiced without these specific details. In other instances, structures and devices are shown in block diagram form in order to avoid obscuring the invention.
Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not other embodiments.
The data services for which cellular telephones can be provisioned include location based services, real time or delayed news which can be world news or by geographic location, dating services which show pictures, streaming video from sporting events, two-dimensional (2D) and three-dimensional (3D) gaming, Moving Pictures Experts Group (MPEG4) or Joint Photographic Experts Group (JPEG) support or multimedia messaging with or without the use of cameras which may be built in to the cellular telephone, etc. Each of these data services is enabled by applications that require increased performance by the baseband chip. While it is feasible to incorporate the application functions in the baseband chip, it is also possible to separate the baseband modem function and at least some of the applications into two separate chips. The second chip would also include an interface to the memory sub-system of the cellular telephone.
The present invention provides a mechanism to share memory between the baseband device and application processors in a space and cost restricted cellular telephone environment. The invention provides a mechanism to deploy application processors in conjunction with baseband processors such that the memories required in a cellular telephone or other such system, e.g. Personal Digital Assistants (PDAs), two-way pagers, pocket PCs, notebook computers, etc. are shared by the baseband and the application processor, resulting in lower power consumption and lower cost. In particular, the mechanisms required to share the memory and the arbitration required for accessing the memory sub-system are such that a baseband without a WAIT or READY signal will have immediate access to the memory sub-system, and the application processor, if accessing memory, will prematurely terminate or abort its access and retry later.
The data bus (DB) between the baseband processor and memories in a typical cellular telephone is 16 bits wide but can be 32 bits or more. The control typically has signals such as OE (output enable), WE (write enable) and BEs (byte enable/s). There are also multiple CSB (Chip selects) for each memory device the baseband interfaces to and there may also be CSB to address other devices on the same baseband bus, such as LCD, audio or digital camera devices etc., which are not shown in the figures.
Some baseband processors also support a WAIT signal which when asserted by another device would cause the baseband processor to wait for the current read or write transaction or access cycle on the baseband bus. Alternatively a READY signal could be used where the assertion of the READY signal indicates to the baseband processor that the data is available on the baseband bus (DB) to be read or the data on the baseband bus has been written or can now be written. It is also possible to use the WAIT signal in the same manner as the READY signal and READY signal in the same manner as the WAIT signal.
In a typical cellular telephone, the baseband 101 has control of the baseband bus and the baseband initiates transactions on this bus. The transaction types or access types include normal reads and writes or burst reads and burst writes. In one embodiment, the burst access type transactions may have a clock sourced from the baseband 101 to the memory. It is possible to provide a mechanism for other devices to request control (i.e. multi-master functionality) of the baseband bus in order for other devices to initiate transactions on the baseband bus. While the mechanism for requesting such control using a request and grant type signaling protocol is not key to this invention, those skilled in the art will understand the various arbitration schemes and interconnects between the baseband processors and other devices wanting control of the baseband bus.
In cellular telephones where an application processor is used, the application processor also needs access to various memories (volatile and/or non-volatile). It is not practical to share the memory used by the baseband processor if the baseband has no multi-master functionality on the baseband bus and no WAIT or READY signaling available.
The application processors 103, 201, 203 and 301 have internally the necessary mechanisms to execute applications. Further, the application processors 103, 201, 203 and 301 may also implement the various communications buffers mentioned above. The mechanisms required to execute applications include an interface to the baseband processor, one or more CPUs and/or Digital Signal Processors (DSPs) including associated caches for instructions and data or a unified cache. The various processors additionally include one or more write buffers in order to enhance data write performance for the various processors, a memory controller to interface to the memory sub-system, a state machine for the internal bussing scheme and arbitration for the internal bus and memory sub-system access by the various devices, and state machine/s for the memory controller, etc. Additionally, the application processors 103, 201, 203 and 301 are capable of accelerating Java byte code execution or other platform independent intermediate language such as .NET. In one embodiment the application processor has a hardware accelerator for stack-based virtual machines such as Java or .NET.
In one embodiment the Java byte code accelerator is integrated within one of the CPUs and, when operational, shares the instruction and/or data caches or a unified cache. In another embodiment, the Java byte code accelerator is implemented as a stand-alone accelerator. The stand-alone accelerator also includes instruction and/or data caches or a unified cache. In one embodiment, not all Java byte codes are executed by the accelerator. The Java byte codes which are not executed by the accelerator are executed in software by the CPU in the application processor or by the baseband processor. In order to request the baseband processor to execute some of the Java byte codes in software, it is necessary to provide data as to which Java byte codes to execute in software, the location in memory or the Java program counter (Java PC) for the Java byte codes to execute in software, and other parameters to the baseband processor. For example, for the byte code ‘new’ or ‘newarray’, at least the Java PC and the object reference need to be provided. To accomplish this the application processors have a two-way communications buffer similar to the one described below, or the same communication method described below would be used. A signal and/or a status bit in a register is also required to indicate that the accelerator is requesting the baseband or the CPU in the application processor to execute the Java byte codes. This signal may be polled or used as an interrupt. Further, the application processors have other peripherals or accelerators such as for MPEG4, digital cameras, LCD controllers, Direct Memory Access (DMA) engines, video scalers, 2D/3D graphics accelerators, on chip frame buffers for graphics and/or video, IIC/S interfaces, Extensible Mark-up Language (XML) accelerators, communications ports such as Synchronous Data Link Control (SDLC)/High-Level Data Link Control (HDLC) etc.
The application processors may also be referred to as application accelerator chips. The application processors have various internal registers to configure the memory controller as well as other peripheral functions. Further, for the embodiment shown in
In order to establish full duplex communication between the baseband processor and the application processor, both processors would have to manage these semaphores. One example of establishing communication between the baseband and application processors uses the following protocol. Upon starting (or hardware or software initiated reset) the communications buffers are cleared (or assumed to be cleared) and the semaphores are cleared or indicate that there is no valid data in the communications buffers. The communications buffers are divided into two segments: the first segment is for the baseband processor to write and the second segment is for the baseband processor to read. The application processor reads the first segment and writes into the second segment. Additionally, both processors have their own semaphore registers (alternatively the semaphores may be in memory or known locations in the communications buffers) where the baseband processor and the application processor can read and/or write their respective semaphores for control and messaging.
The processor receiving a message can only clear the semaphore written by the processor writing the message. Instead of clearing the semaphore, an acknowledgement flag message can also be stored in the semaphore register. Upon leaving the reset state or startup, the respective processors write a query message into the communication buffer segments which they are allowed to write in, inquiring about the presence of the other processor, and write a message in their respective semaphore registers indicating that a valid message has been written into their respective communication buffers. One or both processors can initiate such a query.
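The semaphore handshake described here can be sketched in software. The following is a minimal illustrative model only, not the disclosed hardware: the class names `Mailbox` and `Link`, the message strings, and the use of a single integer as the semaphore are all assumptions made for this example. Each direction has one buffer segment and one semaphore; the writer sets the semaphore when it posts, the reader clears it after reading, and the writer does not post again until the semaphore has been cleared.

```python
class Mailbox:
    """One direction of the two-segment communications buffer (hypothetical model)."""

    def __init__(self):
        self.segment = None   # buffer segment that only one processor writes
        self.semaphore = 0    # 0 = no valid data, 1 = valid message posted

    def post(self, message):
        # The writer may post only if the reader has cleared the semaphore.
        if self.semaphore:
            return False
        self.segment = message
        self.semaphore = 1
        return True

    def receive(self):
        # The reader consumes the message, then clears the writer's semaphore.
        if not self.semaphore:
            return None
        message = self.segment
        self.semaphore = 0
        return message


class Link:
    """Full duplex link: one mailbox per direction; both start cleared (reset state)."""

    def __init__(self):
        self.baseband_to_app = Mailbox()  # first segment: baseband writes, application reads
        self.app_to_baseband = Mailbox()  # second segment: application writes, baseband reads


def demo():
    link = Link()
    log = []
    log.append(link.baseband_to_app.post("QUERY"))  # baseband asks whether the peer is present
    log.append(link.baseband_to_app.post("AGAIN"))  # refused: semaphore not yet cleared
    log.append(link.baseband_to_app.receive())      # application reads and clears the semaphore
    log.append(link.app_to_baseband.post("READY"))  # application answers in its own segment
    log.append(link.baseband_to_app.post("AGAIN"))  # accepted now that the semaphore is clear
    log.append(link.app_to_baseband.receive())      # baseband reads the answer
    return log
```

An acknowledgement flag could be modeled by storing a value other than 0 in `semaphore` instead of clearing it; that variant is omitted here for brevity.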
Both processors, after having read a message, may clear or write an acknowledge message in the semaphore register written by the other processor, indicating that the message has been read. Other examples of messages which may be written in the semaphore registers include ready, error, or retry messages, etc. Only after the semaphore has been cleared by the processor receiving the message is the processor sending the message able to respond to the query by writing the communication buffer again or posting another message. The application processor requires at least one CSB (chip select) to enable the baseband to select writing or reading from the application processor 103. The exemplary embodiment of
The pass-through of the baseband accessing memory can be accomplished by having a set of multiplexers (as shown in
In one embodiment, a synchronizer for detecting a baseband processor access uses both rising and falling edges of the clock within the application processor. This facilitates faster detection of the baseband access. The same clock is used by the memory controller of the application processor. Alternatively, a clock stepped up in frequency by a phase locked loop (PLL) or stepped down in frequency by clock divider circuitry may be used. It should be noted that truncating a volatile memory device's access could result in a loss of data in the memory core for memories based on dynamic RAM technology such as pseudo static RAMs. The six transistor (6T) based SRAMs also have this issue when writing the SRAM and a write cycle is terminated. In a 6T SRAM, if a write cycle is terminated while the row and/or column decoders have not finished decoding the address presented, the data may be lost or written to an unknown location in memory.
To avoid losing or corrupting data in an unknown location, in one embodiment, the application processor 103, 201, 203, 301 will assert the WE signal to the memory sub-system after the decoders in the memory have settled or a full access time has been met. After this decoder settling time, the location where data may get corrupted is known and is the address presented to the memory for writing. The memory controllers in the application processors described above are capable of producing the necessary types of memory cycles including burst and synchronous cycles. In one embodiment the application processor 103, 201, 203, or 301 will retain the address and data of the location that was corrupted and rewrite data to that address once the baseband processor is finished with its access. While this technique is possible with 6 transistor (6T) SRAM due to its structure, pseudo static RAMs (pSRAMs) have issues of losing data in multiple locations due to prematurely terminated cycles of the types shown in
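As a rough illustration of why sampling on both clock edges speeds up detection of a baseband access, the sketch below compares the worst-case delay from the arrival of an access to the first sampling edge. It is a simplified timing model and not the disclosed circuit: it assumes an even clock period in integer time units, a 50% duty cycle, and detection at the first sampling edge, and it ignores the extra flip-flop stages a real synchronizer would add. The function names are hypothetical.

```python
def first_sample_time(event_time, period, dual_edge):
    """Time of the first sampling edge at or after event_time.

    Rising edges fall on multiples of `period`; with dual_edge=True the
    falling edges (half a period later) are also used as sampling points.
    """
    step = period // 2 if dual_edge else period
    return -(-event_time // step) * step  # round up to the next multiple of step


def worst_case_delay(period, dual_edge):
    """Largest wait between an access arriving and its first sampling edge."""
    return max(
        first_sample_time(t, period, dual_edge) - t
        for t in range(period)
    )
```

With a period of 10 time units, `worst_case_delay(10, False)` evaluates to 9 and `worst_case_delay(10, True)` evaluates to 4, so under these assumptions the dual-edge scheme roughly halves the worst-case detection delay.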
As with DRAMs, pSRAMS also need to have periodic refresh which is produced internally. Typically, pSRAMS operate faster internally than the specified or advertised access speeds. In a typical pSRAM, there is a mechanism for producing a refresh request using a timer, where a row of data will be read and restored. There is also a row address counter which may increment or decrement to indicate which row to refresh. For each refresh request, a row of data is refreshed in this manner until all rows are refreshed. This refresh mechanism runs constantly so that all the rows in the pSRAM are regularly getting refreshed. If the refresh mechanism were to stop, the pSRAM would lose data after some finite time. It may be that a refresh cycle has just started and an external device wants access to the pSRAM. The pSRAM typically finishes the refresh cycle and then allows access to the external device. The external device would see a longer access time (approximately twice as long) since it has to allow the refresh cycle to finish. Since the refresh request is asynchronously produced internal to the pSRAM, it is not predictable as to when the refresh cycle occurs and so the access time specified for the pSRAM includes the refresh cycle time. When an internal refresh request coincides with an external access, there is logic to arbitrate and manage the access as well as the refresh.
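The internal refresh behavior described above can be approximated with a small model. This is a sketch under stated assumptions, not a description of any particular pSRAM: the class name `PsramRefreshModel`, the integer tick timebase, and the choice that a refresh starts exactly every `refresh_interval` ticks and takes one `cycle_time` are all illustrative. The model only covers an access that arrives while a refresh is already in progress.

```python
class PsramRefreshModel:
    """Toy model of pSRAM self-refresh: a timer plus a row address counter."""

    def __init__(self, rows, refresh_interval, cycle_time):
        self.rows = rows
        self.refresh_interval = refresh_interval  # ticks between refresh requests
        self.cycle_time = cycle_time              # ticks for one access or one row refresh
        self.row_counter = 0                      # next row to refresh

    def refresh_next_row(self):
        # Each refresh request reads and restores one row, then advances the counter.
        row = self.row_counter
        self.row_counter = (self.row_counter + 1) % self.rows
        return row

    def refresh_in_progress(self, now):
        # Assumption: a refresh starts at every multiple of refresh_interval.
        return now % self.refresh_interval < self.cycle_time

    def access_latency(self, now):
        # An external access arriving during a refresh waits for it to finish.
        if self.refresh_in_progress(now):
            remaining = self.cycle_time - (now % self.refresh_interval)
            return remaining + self.cycle_time
        return self.cycle_time
```

In this model an access that arrives just as a refresh begins sees `2 * cycle_time`, which matches the "approximately twice as long" figure in the text and is why the specified access time has to budget for the refresh cycle.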
Typically, pSRAMS and SDRAMS go through an address decode, internal row access, and pre-charge cycle, the pre-charge cycle replenishing the data since the access usually discharges the data held in the core for the particular row being accessed. One of the requirements of such pSRAMS is that at least the address has to be stable during the entire memory access cycle. If the address is not stable for the entire cycle, it is possible to lose a whole or partial row of data. This is because once a row access has started, if the address for the row changes before the pre-charge cycle of the DRAM cells, a new access will ensue for a different row and so the data for the previous row will be lost. In one embodiment, the pSRAM waits until the pre-charge cycle has completed before starting the access for the new address. In one embodiment it would be required that the pSRAM would have this characteristic for both address and chip selects.
If the application processor is accessing pSRAM when the baseband processor begins a pass-through access and a refresh request becomes pending, the pSRAM has to cope with three devices requiring access to the pSRAM, which can result in corrupted data. If the three accesses were to arrive at the same time and were allowed to finish one after another, the access time for one of the devices would be more than two times longer than the internal access time. This would cause a considerable slowdown in performance.
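The data-loss hazard and the wait-for-pre-charge remedy described in this paragraph can be illustrated with a toy model. It follows the paragraph's own description (the access discharges the row, the pre-charge cycle replenishes it) and is not a circuit-accurate DRAM model; the class name `DramRowModel`, the string row contents, and the `wait_for_precharge` flag are assumptions for illustration.

```python
class DramRowModel:
    """Toy model: an open row loses its data if a new row is accessed before pre-charge."""

    def __init__(self, rows):
        self.data = {r: "row%d" % r for r in range(rows)}
        self.open_row = None  # row whose access has started but not yet pre-charged

    def precharge(self):
        # The pre-charge cycle replenishes the open row and closes it.
        self.open_row = None

    def start_access(self, row, wait_for_precharge=False):
        if self.open_row is not None and self.open_row != row:
            if wait_for_precharge:
                # Embodiment from the text: finish pre-charge before the new access.
                self.precharge()
            else:
                # Address changed mid-cycle: the previous row is never replenished.
                self.data[self.open_row] = None
        self.open_row = row


def demo():
    unsafe = DramRowModel(2)
    unsafe.start_access(0)
    unsafe.start_access(1)  # address changes before pre-charge
    safe = DramRowModel(2)
    safe.start_access(0)
    safe.start_access(1, wait_for_precharge=True)
    return unsafe.data[0], safe.data[0]
```

Here `demo()` returns `(None, "row0")`: the row is lost when the address changes early and preserved when the device waits for the pre-charge cycle.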
To overcome this, in one embodiment, a mechanism that disables the internal refresh request of the pSRAM and generates an external refresh request signal indicating when to refresh, shown in
In one embodiment, when the application processor is accessing the memory sub-system and the baseband processor begins its access to the memory sub-system before the application processor has completed its access, the application processor asserts the WAIT signal to the baseband processor, or de-asserts the READY signal to the baseband processor, depending on which type of signal is supported by the baseband processor. These signals would stall the baseband processor while the application processor completes its access. In the case of the READY signal being asserted, the baseband processor may expect valid data for its read, or expect its write transaction to have completed. In this case, the application processor, after having finished its access, would keep the WAIT signal asserted or the READY signal de-asserted. The application processor would then enable the multiplexers to source the baseband addresses, data (in case of a baseband write access; otherwise the data is read), and control lines to the memory sub-system, thereby initiating an access on behalf of the baseband processor, for a time sufficient to make a full access to the memory sub-system. Thereafter, the application processor would de-assert the WAIT signal or assert the READY signal to the baseband processor. Thus, when the WAIT signal is de-asserted or the READY signal is asserted, the baseband processor would get valid data.
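The stall sequence described in this paragraph can be laid out as a cycle-by-cycle timeline. The sketch below is illustrative only: the function names, the cycle granularity, and the tuple layout `(cycle, bus_owner, wait_asserted)` are assumptions, and real hardware would be driven by the memory controller state machine rather than fixed cycle counts.

```python
def stall_sequence(app_cycles_remaining, full_access_cycles):
    """Return (cycle, bus_owner, wait_asserted) tuples for one stalled baseband access."""
    timeline = []
    cycle = 0
    # Phase 1: the application processor finishes its own access; baseband is stalled.
    for _ in range(app_cycles_remaining):
        timeline.append((cycle, "application", True))
        cycle += 1
    # Phase 2: multiplexers source baseband address/data/control to the memory,
    # with WAIT still asserted for a full memory access time.
    for _ in range(full_access_cycles):
        timeline.append((cycle, "baseband", True))
        cycle += 1
    # Phase 3: WAIT is de-asserted and the baseband sees valid data.
    timeline.append((cycle, "baseband", False))
    return timeline


def ready_level(wait_asserted):
    # For a baseband that supports READY instead of WAIT, the sense is inverted.
    return not wait_asserted
```

For example, `stall_sequence(2, 3)` yields two stalled cycles owned by the application processor, three pass-through cycles with WAIT held, and a final cycle in which WAIT is released.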
The application processors 103, 201, 203 and 301 include logic to detect the baseband processor requiring access while the application processors are accessing the memory sub-system. The application processors also include state machines to at least partially manage the memory access on behalf of the baseband.
In one embodiment, the memory sub-system has SDRAM with multiple internal banks, in addition to other types of memory. If the baseband processor presents asynchronous type timing, and the SDRAM memory expects synchronous timing along with a clock, the application processor detects the baseband access and, synchronously to the application processor's clock, accesses the SDRAM to accomplish a read, write, burst read or burst write into the SDRAM. The clocks for the baseband and application processor may be synchronous or asynchronous to each other.
In one embodiment, to avoid the loss of data in the SDRAM in a similar way to pSRAMs as explained above, when the application processor is accessing the memory sub-system and the baseband processor initiates an access to the memory sub-system (thus requiring a pass-through), and the application processor has not yet finished, the application processors 103, 201, 203 and 301 would access one or more of the SDRAM banks agreed upon or allocated to it, but not all the banks in the SDRAM. Additionally, the baseband processor would only access the banks which are not accessed by the application processor. This mode of operation would only be observed while the application processor is running; otherwise the baseband can access all the banks at any time. With this mode of operation, the SDRAM controller in the application processors 103, 201, 203 and 301 can leave at least one SDRAM bank open for the application processor when the baseband requires pass-through, and simply open and/or close the banks required for the baseband processor during and/or after pass-through.
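The bank allocation rule in this embodiment can be expressed as a simple access check. The following is a sketch under assumptions: the class name `BankArbiter`, the requester labels, and the four-bank example are illustrative and not taken from the disclosure, which does not specify how the allocation is agreed upon.

```python
class BankArbiter:
    """Toy check of which SDRAM banks each processor may touch."""

    def __init__(self, num_banks, app_banks):
        self.all_banks = set(range(num_banks))
        self.app_banks = set(app_banks)
        # The application processor owns one or more banks, but never all of them.
        if not self.app_banks or not self.app_banks < self.all_banks:
            raise ValueError("application processor must own some, but not all, banks")
        self.app_running = False

    def allowed(self, requester, bank):
        if bank not in self.all_banks:
            return False
        if requester == "application":
            return bank in self.app_banks
        if requester == "baseband":
            # Restricted only while the application processor is running.
            return (not self.app_running) or bank not in self.app_banks
        return False
```

Because the two processors never share a bank while the application processor runs, the controller can leave the application processor's banks open during pass-through and only open or close the banks the baseband needs.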
In another embodiment, the application processor 203 would have a split bus available to the memory sub-system where the memories are in two groups as illustrated in
To avoid any conflict on the data bus, the application processor 203 would de-assert the OE going to group 205 memories. In one embodiment, the application processor 203 would produce an internal stall or wait in its memory controller state machine while the baseband processor is accessing group 206. The address (AM1) to group 205 memories would remain asserted and stable during this time and while finishing the access thereafter. CSM1 and/or WE, BLE, BHE going to group 205 may be treated in the same manner as the addresses (AM1).
In another embodiment the application processor would latch the addresses (AM1) going to group 205 and keep them latched while de-asserting OE to group 205 memories to avoid data bus contention. The application processor 203 memory controller state machine would restart the memory access either from the beginning or part way through the memory controller state machine once the baseband processor is finished with its access. In order to accomplish this, the synchronizers mentioned above would be used to detect the baseband processor access whereby any CSB from the baseband processor targeted to group 206 would be synchronized to produce an indication of pass-through. If the application processor is reading data from the memories and the baseband processor attempts to write data to the memories there would be bus contention for a short period until the memories have their outputs disabled since the application processor would begin driving the data bus to the memories due to pass-through. The contention happens when the application processor is reading from one memory and the baseband processor attempts to read from or write to another memory in pass-through mode. In one embodiment, the application processor 103, 201, 203 and 301 would not drive the data bus toward the memories for a short period at the beginning of the baseband access and thus avoid any bus contention. Additionally the OE signals to all memories would be de-asserted for a similar short period even though the baseband processor may have its OE asserted. This short period is made programmable and would have a default value after reset. It should be noted that the memories in groups 205 and 206 may comprise any mix of volatile and/or non-volatile memories.
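The programmable guard period at the start of a baseband access can be sketched as a small combinational function. This is an illustrative model only: the function name, the cycle-count representation of the "short period", and the default of 2 cycles are assumptions, since the text states only that the period is programmable and has a default value after reset.

```python
DEFAULT_GUARD_CYCLES = 2  # hypothetical reset default; the actual value is not given


def passthrough_drive(cycles_since_start, baseband_oe, baseband_is_write,
                      guard_cycles=DEFAULT_GUARD_CYCLES):
    """Return (memory_oe, drive_data_to_memory) during a pass-through access."""
    in_guard = cycles_since_start < guard_cycles
    # OE to all memories stays de-asserted during the guard period,
    # even if the baseband already has its own OE asserted.
    memory_oe = baseband_oe and not in_guard
    # The application processor does not drive the data bus toward the
    # memories until the guard period has elapsed.
    drive_data_to_memory = baseband_is_write and not in_guard
    return memory_oe, drive_data_to_memory
```

During the guard period neither the memories nor the application processor drives the data bus, which gives a previously selected memory time to disable its outputs before a new driver takes over.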
In another embodiment, the application processor 301 has two complete and independent buses to two groups of memories 302 and 303 as shown in
As with application processors 103, 201 and 203, a mechanism to avoid bus contention may be incorporated within application processor 301. The multiplexers depicted in application processor 501 are used for pass-through. Multiplexers 502, 503, 504, 505 and 506 may have separate select signals for pass-through operation, such that they can be controlled separately or through a common select. Multiplexers 505, 506 are used to enable passing data read from the memory to the baseband processor when the baseband processor is performing a read access to the memory sub-system; otherwise the multiplexers 505, 506 are used to read data internal to the application processor 501. In one embodiment at least one of the multiplexers shown in 501 would also be in application processors 103, 201, 203 and 301. In one embodiment the application processors 103, 201, 203 and/or 301 and at least some or all of the memory sub-system/s are in a stack package where the application processor and memory dies are stacked in a package. In another embodiment of the stack package, some dies may be placed side by side in any one or more stack layers. In another embodiment, one of the stack layers is an interposer made from silicon to facilitate the routing of signals, buses and/or power to dies mounted on the interposer or to other dies in the stack or to the substrate of the package. This kind of packaging would provide space saving application processing to cellular telephones and other devices.
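The pass-through multiplexing described at the start of this paragraph can be modeled as a set of two-input selectors. The sketch below is a behavioral illustration, not the disclosed circuit: the function names, the dictionary keys, and the use of a single common select are assumptions (the text notes that separate selects are equally possible).

```python
def mux(select_passthrough, baseband_value, internal_value):
    """Two-input multiplexer: baseband input when pass-through is selected."""
    return baseband_value if select_passthrough else internal_value


def route_to_memory(select_passthrough, baseband, internal):
    """Choose the address, data and control signals presented to the memory."""
    return {
        name: mux(select_passthrough, baseband[name], internal[name])
        for name in ("address", "data", "control")
    }


def route_read_data(select_passthrough, memory_data):
    """Steer data read from memory to the baseband or to internal logic."""
    destination = "baseband" if select_passthrough else "internal"
    return destination, memory_data
```

For example, with `select_passthrough=True`, `route_to_memory` forwards the baseband's signals unchanged and `route_read_data` returns the read data tagged for the baseband, corresponding to the role the text gives multiplexers 505 and 506 on the read path.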
Since the functionality of the baseband processor is to provide communication with a base station for voice and/or data, the performance requirement for the baseband is not very high. For this reason there are many baseband processors in the market where the processor in the baseband does not have any cache memory. For baseband processors with no cache memory and a WAIT or READY signal,
In another embodiment, the frame buffer is incorporated in the application processor 701 and the data is sourced directly to the display device from 701. For both these embodiments it is necessary to have an indication of the number of elements in direction X of the display device and the number of elements in the Y direction of the display device and where in the memory the frame buffer resides.
Additionally application processor 701 is capable of receiving an updated image command from the baseband processor and is capable of loading the image on the display device. The application processor generates at least some of the addresses required to read the frame buffer from the memory internally. In one embodiment, counters would be maintained for X and Y to generate the memory addresses. In another embodiment there would be an indication that the application processor 701 is busy loading the image data to the display device and/or an indication that the loading of the image is done. In another embodiment if the baseband processor attempts to access the memory subsystem while the display device is being loaded with an image, the WAIT signal would be asserted or the READY signal de-asserted until the image is loaded. The image loading may be stopped in the middle to allow the baseband processor to pass-through.
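The counter-based address generation mentioned above can be sketched as follows. This is a minimal illustration under assumptions that the text does not state: a row-major, tightly packed frame buffer and a fixed element size in bytes (`bytes_per_element`, defaulting to 2 here). The function name and parameters are hypothetical; the inputs correspond to the three items the description says are needed, namely the X and Y element counts and the frame buffer location.

```python
def frame_buffer_addresses(base, width, height, bytes_per_element=2):
    """Yield the memory address of each display element in scan order.

    base   : address where the frame buffer resides in memory
    width  : number of elements in the X direction
    height : number of elements in the Y direction
    """
    for y in range(height):          # Y counter
        for x in range(width):       # X counter
            yield base + (y * width + x) * bytes_per_element
```

Because the generator keeps its X and Y position between calls, a load can be paused part way through (for instance to let the baseband pass through) and resumed later from the same element, which mirrors the option of stopping the image load in the middle.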
While various inventions have been shown referencing application processors 103, 201, 203, 301, 501, 601 and 701, one skilled in the art would realize that all the inventions disclosed here are applicable to all the application processors.
Although the present invention has been described with reference to specific exemplary embodiments, it will be evident that various modifications and changes can be made to these embodiments without departing from the broader spirit of the invention as set forth in the claims. Accordingly, the specification and drawings are to be regarded in an illustrative sense rather than in a restrictive sense.