Storage Options on GAE

2011-08-23 / Java Translation Google GAE

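GAE offers applications several ways to store data. Some, such as the Datastore, are well known; others are less familiar. This article lists the ways to store data on GAE as completely as it can and describes the pros and cons of each, so that developers can make a better-informed choice.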

Datastore

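The most familiar and most widely used way to store data is, of course, the Datastore. The Datastore is GAE's non-relational database. It provides robust, durable storage and the most flexible operations for storing and retrieving data.

Pros

Durable - data stored in the datastore is persistent.
Read-write - applications can both read and write datastore data, and the datastore provides transactions to guarantee integrity.
Globally consistent - all instances of an application share the same view of the datastore.
Flexible - queries and indexes provide many ways to retrieve data.

Cons

Latency - the datastore stores data on disk and provides reliability guarantees, so a write must not return until the data is confirmed to be stored (translator's note: what about the asynchronous API?), and reads often have to fetch data from disk (translator's note: how does the underlying layer cache data?).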

Memcache

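Memcache is the best known of the "secondary" storage mechanisms. It caches data to avoid repeating expensive operations (translator's note: for example, calls that cost a lot of CPU or API time). Memcache is often used as a caching layer for other APIs such as the datastore, or to cache result sets generated from some data source.

Pros

Fast - reads and writes complete in milliseconds (translator's note: 1-4 ms, which is still fairly slow -_-b).
Globally consistent - all instances of an application share the same view of memcache. Memcache provides atomic operations on data to guarantee integrity (translator's note: objects are serialized).

Cons

Unreliable - data may be evicted at any time (translator's note: by default GAE keeps cache entries for as long as it can).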
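
The "caching layer in front of the datastore" idea can be sketched as a read-through cache. This is only an illustration under stated assumptions: the class name ReadThroughCache is hypothetical, a plain in-process map stands in for memcache, and the slow loader stands in for a datastore read. None of this is the GAE API. The point is that every read must tolerate a cache miss, because entries may disappear at any time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in: the map plays the role of memcache,
// the loader plays the role of an expensive datastore read.
class ReadThroughCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> slowLoader;
    int loads = 0; // how many times the slow path ran (not thread-safe, demo only)

    ReadThroughCache(Function<String, String> slowLoader) {
        this.slowLoader = slowLoader;
    }

    String get(String key) {
        String value = cache.get(key);
        if (value == null) {
            // Cache miss: fall back to the durable source, then repopulate the cache.
            value = slowLoader.apply(key);
            loads++;
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    // Simulates the cache dropping an entry on its own.
    void evict(String key) {
        cache.remove(key);
    }
}
```

After an eviction the next get simply takes the slow path again, so the cache can never be the only copy of the data.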

Blobstore

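The Blobstore provides a convenient, efficient way to store large user-uploaded data.

Pros

Supports large files - up to 2GB (translator's note: I could not find support for this figure in the current official documentation. Besides, can 2GB really be uploaded within the 30-second request limit? -_-b).
No need to handle blobs yourself.
Provides high-performance serving of blobs, particularly images.
Applications can read blobs just as they would read local files.

Cons

Read-only - applications cannot create blobs themselves, nor modify blobs that have been uploaded.
Billing must be enabled to use the blobstore.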

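Instance memory

Application instances can also cache data in local memory (translator's note: the memory of that instance's JVM) through class members (translator's note: class fields in Java) or globals (translator's note: apparently a Python concept). This gives the fastest possible access, but it has some drawbacks.

Pros

Fast - as fast as it can possibly be, because the data lives in the same process that accesses it.
Convenient - no GAE APIs are needed.
Flexible - data can be stored in any format, with no serialization or deserialization constraints.

Cons

Unreliable - instances can be started or shut down at any time, so applications should use this approach only for caching (translator's note: JVM memory is not shared across instances, so data cached this way may be inconsistent; see the next point).
Not globally consistent - each application instance has its own runtime environment and its own local variables. Changing a variable in one instance does not affect other instances.
Limited capacity - the memory an instance may use is capped.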
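
In Java, "class members" usually means a static field. The following is a minimal sketch, not a recommended implementation: the class name InstanceCache and the cap of 1000 entries are arbitrary assumptions, and the eviction policy (drop everything when full) is deliberately crude. It shows the two properties listed above: values are stored by reference with no serialization, and the cache has to bound its own size.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class InstanceCache {
    // A static field lives only inside this JVM; every other instance has its own copy.
    private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();
    private static final int MAX_ENTRIES = 1000; // arbitrary cap chosen for this sketch

    static Object get(String key) {
        return CACHE.get(key);
    }

    static void put(String key, Object value) {
        // Crude bound: throw the whole cache away once it is full.
        if (CACHE.size() >= MAX_ENTRIES) {
            CACHE.clear();
        }
        CACHE.put(key, value);
    }

    static int size() {
        return CACHE.size();
    }
}
```

Because the data vanishes whenever the instance is shut down, callers should treat a null result from get as normal and reload from a durable source.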

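Local files

Applications can use standard filesystem APIs to read files that were uploaded with the application.

Pros

Fast - reading a local file only requires disk access on the machine the application instance runs on, so latency is close to that of Memcache.
Reliable - as long as the application is serving, the files can be read.
Flexible - files in any format can be used.

Cons

Read-only - applications cannot modify the files; their contents are fixed at deployment time.
Limited capacity - each file can be at most 10MB, and the total size of all files in the application must be under 150MB.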
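
Reading a bundled, read-only dataset needs nothing beyond the standard library. The sketch below assumes a runtime where java.nio.file is available; the class name LocalDataset is hypothetical, and the path passed in is whatever relative location the file was deployed under.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class LocalDataset {
    // Reads every line of a UTF-8 text file that was deployed with the application.
    static List<String> loadLines(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            // A missing bundled file is a deployment error, so surface it unchecked.
            throw new UncheckedIOException("cannot read bundled file " + file, e);
        }
    }
}
```

Since the file cannot change after deployment, the result is a natural candidate for keeping in instance memory after the first read.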

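Task queue payloads

Although this is not storage in the traditional sense, task queue tasks can carry a certain amount of data.

Pros

Fast - payloads are delivered to the task when it runs, so no extra API call is needed to fetch the data.
Used properly, this removes the need to store the task's data elsewhere (translator's note: ?).

Cons

Single-purpose - payloads are only useful as storage for data handed to task queue tasks.
Limited capacity - a task can be at most 10KB, including its payload data (translator's note: the limit is now 100KB for push queues and 1MB for pull queues).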
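
Because the size limit has changed over time, a cautious application can check a payload against a configurable limit before deciding whether to attach it to the task or to store it elsewhere and pass only a key. The helper below is a hypothetical sketch, not part of the Task Queue API. It also ignores the rest of the task (URL, headers, and so on), which counts toward the real limit, so it is an optimistic check.

```java
import java.nio.charset.StandardCharsets;

class TaskPayloads {
    // Returns true if the UTF-8 encoding of the payload is no larger than maxTaskBytes.
    // The caller supplies the limit, since it differs by queue type and has changed over time.
    static boolean fitsInTask(String payload, int maxTaskBytes) {
        return payload.getBytes(StandardCharsets.UTF_8).length <= maxTaskBytes;
    }
}
```

Note that the check counts encoded bytes, not characters, so text outside ASCII uses up the limit faster than its length suggests.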

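Conclusion

GAE provides a variety of data storage mechanisms, and we have to pick the one that suits the use case at hand. Often the better solution combines several of them, for example the datastore with memcache, or local files with instance memory.

Nick Johnson, 3 November 2010

Notes:

The links in the original article point to the GAE/Python documentation; the links in this translation point to the GAE/Java documentation.
The Chinese terms chosen for "Pros" and "Cons" ("优势"/"劣势") may not be the best translations.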

----

The original article (Storage options on App Engine) follows:

App Engine provides a number of ways for your app to store data. Some, such as the datastore, are well known, but others are less so, and all of them have different characteristics. This article is intended to enumerate the different options, and describe the pros and cons of each, so you can make more informed decisions about how to store your data.

Datastore

The best known, most widely used, and most versatile storage option is, of course, the datastore. The datastore is App Engine's non-relational database, and it provides robust, durable storage, as well as providing the most flexibility in how your data is stored, retrieved, and manipulated.

Pros

Durable - data stored in the datastore is permanent.
Read-write - apps can both read and write datastore data, and the datastore provides transaction mechanisms to enforce integrity.
Globally consistent - all instances of an app have the same view of the datastore.
Flexible - queries and indexing provide many ways to query and retrieve data.

Cons

Latency - because the datastore stores data on disk and provides reliability guarantees, writes need to wait until data is confirmed to be stored before returning, and reads often have to fetch data from disk.

Memcache

Memcache is the best known of the 'secondary' storage mechanisms. The memcache API provides a means for applications to optimistically cache data to avoid redoing expensive operations. Memcache is often used as a caching layer for other APIs, such as the datastore, or to cache generated results from any source.

Pros

Fast - memcache accesses typically take only a few milliseconds to complete.
Globally consistent - all instances of an app have the same view of memcache. Memcache provides atomic operations so applications can ensure the integrity of data stored in it.

Cons

Unreliable - data may be evicted from memcache at any time.

Blobstore

The blobstore offers a way to store and serve large amounts of user-uploaded data easily and efficiently.

Pros

Supports large files - up to 2GB per blob.
Removes the need for you to handle blobs yourself.
Provides mechanism for high-performance serving of blobs, particularly images.
Applications can read blob contents as they would local files.

Cons

Read-only - applications cannot modify uploaded blobs, or create new ones.
Billing must be enabled to use the blobstore.

Instance memory

Application instances may also cache data in local memory, through the use of globals or class members. This provides the ultimate in speed, but comes with several downsides.

Pros

Fast - literally as fast as it's possible to be, since data is stored in the same process that is accessing it.
Convenient - no API required, just store data in globals or class members.
Flexible - data can be stored in any format your program can manipulate. No serialization or deserialization is required.

Cons

Unreliable - instances can be started or stopped at any time, so applications should only use it to cache data.
Not globally consistent - each instance of your app has its own runtime environment, and hence its own local variables. Changes in one instance are not reflected in other instances.
Limited capacity - instances are limited in how much memory they can consume before they are terminated. This puts a hard limit on how much data you can cache in memory.

Local files

Applications may read from any file that was uploaded with the application and not marked as static content, using standard filesystem operations. This includes read-only datasets that the application may need.

Pros

Fast - reading local files requires only standard disk access on the machine the application instance is running on, so latency is almost as good as memcache.
Reliable - if your app is serving, your local files are always available
Flexible - you can use any format or mechanism for accessing local files that you wish.

Cons

Read-only - applications may not modify the contents of local files; they are fixed at deployment time.
Limited capacity - applications are limited to 10MB per file, and 150MB in total for the application.

Task queue payloads

While not storage in the traditional sense, task queue tasks can have payloads attached, which can obviate the need to use other storage systems.

Pros

Fast - payloads are sent to the task when it's run, so no additional API calls are required to fetch the data.
Used properly, allows you to avoid the need to store task data elsewhere.

Cons

Single-purpose - payloads are only useful as storage for data being provided to a task queue task.
Limited capacity - tasks are limited to 10KB in size, including their payload data.

Conclusion

App Engine provides more data storage mechanisms than is apparent at first glance. All of them have different tradeoffs, so it's likely that one - or more - of them will suit your application well. Often, the ideal solution involves a combination, such as the datastore and memcache, or local files and instance memory.

03 November, 2010