Iterate through pagination in the REST API
1 Preface
About 4 months ago, icewind1991 created an exciting PR that added Stream/Iterator based versions of methods with paginated results, which makes endpoints in Rspotify much more ergonomic to use, and Mario completed this PR.
In order to understand what this PR brought us, we have to go back to the original story: the paginated results in Spotify’s REST API.
2 Original Story
Take artist_albums as an example: it gets Spotify catalog information about an artist’s albums.
The HTTP response body for this endpoint contains an array of simplified album objects wrapped in a paging object. The endpoint uses the limit field to control the number of album objects to return and the offset field to set the index of the first album to return.
The endpoint as originally designed in Rspotify looks like this:
/// Paging object
///
/// [Reference](https://developer.spotify.com/documentation/web-api/reference/#object-pagingobject)
#[derive(Clone, Debug, Serialize, Deserialize, PartialEq, Eq)]
pub struct Page<T> {
    pub href: String,
    pub items: Vec<T>,
    pub limit: u32,
    pub next: Option<String>,
    pub offset: u32,
    pub previous: Option<String>,
    pub total: u32,
}
/// Get Spotify catalog information about an artist's albums.
///
/// Parameters:
/// - artist_id - the artist ID, URI or URL
/// - album_type - 'album', 'single', 'appears_on', 'compilation'
/// - market - limit the response to one particular country.
/// - limit - the number of albums to return
/// - offset - the index of the first album to return
/// [Reference](https://developer.spotify.com/documentation/web-api/reference/#endpoint-get-an-artists-albums)
pub fn artist_albums<'a>(
    &'a self,
    artist_id: &'a ArtistId,
    album_type: Option<&'a AlbumType>,
    market: Option<&'a Market>,
    limit: Option<u32>,
    offset: Option<u32>,
) -> ClientResult<Page<SimplifiedAlbum>>;
Suppose that you have fetched the first page of an artist’s albums and now want the data of the next page. To get it, you have to start from a URL like this:
{
"next": "https://api.spotify.com/v1/browse/categories?offset=2&limit=20"
}
You have to parse the URL, extract the limit and offset parameters, and call the artist_albums endpoint again with limit set to 20 and offset set to 2.
We have to fetch the data manually again and again until all of it has been consumed. It is not elegant, but it works.
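To make the manual step concrete, here is a minimal sketch of pulling offset and limit out of a next URL using only the standard library. The helper name parse_next is hypothetical and is not part of Rspotify; a real client would more likely use a URL-parsing crate.

```rust
/// Extract `offset` and `limit` from the query string of a `next` URL.
/// Returns `None` if either parameter is missing or is not a number.
/// (`parse_next` is a hypothetical helper, not an Rspotify function.)
fn parse_next(next: &str) -> Option<(u32, u32)> {
    // Everything after the first '?' is the query string.
    let query = next.split_once('?')?.1;
    let mut offset = None;
    let mut limit = None;
    for pair in query.split('&') {
        match pair.split_once('=') {
            Some(("offset", value)) => offset = value.parse().ok(),
            Some(("limit", value)) => limit = value.parse().ok(),
            _ => {} // ignore any other parameter
        }
    }
    Some((offset?, limit?))
}

fn main() {
    let next = "https://api.spotify.com/v1/browse/categories?offset=2&limit=20";
    let (offset, limit) = parse_next(next).expect("malformed `next` URL");
    println!("offset = {offset}, limit = {limit}");
}
```

Every caller would have to repeat this parse-and-call-again loop for each paginated endpoint, which is the boilerplate the iterator version removes.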
3 Iterator Story
Now that we have the background, let’s jump to the iterator version of the paginated endpoints.
First of all, the iterator pattern allows us to perform some task on a sequence of items in turn. An iterator is responsible for the logic of iterating over each item and determining when the sequence has finished.
If you want to know more about Iterator, Jon Gjengset has made a brilliant tutorial that demonstrates iterators in Rust.
All iterators implement a trait named Iterator that is defined in the standard library. The definition of the trait looks like this:
pub trait Iterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
    // methods with default implementations elided
}
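As a small illustration of the trait above, here is a minimal sketch of implementing it for a type of our own. The Countdown type is hypothetical and exists only for this example.

```rust
/// Hypothetical example type: counts down from `remaining` to 1.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    type Item = u32;

    fn next(&mut self) -> Option<Self::Item> {
        if self.remaining == 0 {
            // Returning `None` tells the caller the sequence has finished.
            return None;
        }
        let current = self.remaining;
        self.remaining -= 1;
        Some(current)
    }
}

fn main() {
    // `collect` comes for free: it is one of the default methods
    // built on top of `next`.
    let values: Vec<u32> = Countdown { remaining: 3 }.collect();
    println!("{values:?}");
}
```

Only next has to be written by hand; adapters such as map, filter and collect are provided by the trait's default methods.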
By implementing the Iterator trait on our own types, we can have iterators that do anything we want. The mechanism we want for iterating over paginated results is: request a page at the current offset, hand it to the caller, advance the offset by the number of items received, and stop once a page comes back empty.
Now let’s dive into the code. We need to implement Iterator for our own type; the pseudocode looks like this:
impl<T, Request> Iterator for PageIterator<Request>
{
    type Item = ClientResult<Page<T>>;
    fn next(&mut self) -> Option<Self::Item> {
        match /* call the endpoint with offset and limit */ {
            Ok(page) if page.items.is_empty() => {
                // we are done here
                None
            }
            Ok(page) => {
                offset += page.items.len() as u32;
                Some(Ok(page))
            }
            Err(e) => Some(Err(e)),
        }
    }
}
In order to iterate over paginated results from different endpoints, we need a generic type that can represent any of them. The Fn trait comes to mind: it is implemented by closures and plain functions, so each endpoint can be passed in as a callable that takes a limit and an offset and returns a page.
Then the next version of the pseudocode looks like this:
impl<T, Request> Iterator for PageIterator<Request>
where
    Request: Fn(u32, u32) -> ClientResult<Page<T>>,
{
    type Item = ClientResult<Page<T>>;
    fn next(&mut self) -> Option<Self::Item> {
        // `req`, `page_size` and `offset` are assumed to be fields of PageIterator
        match (self.req)(self.page_size, self.offset) {
            Ok(page) if page.items.is_empty() => {
                // we are done here
                None
            }
            Ok(page) => {
                self.offset += page.items.len() as u32;
                Some(Ok(page))
            }
            Err(e) => Some(Err(e)),
        }
    }
}
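The pseudocode can be turned into a runnable sketch by filling in the pieces it leaves out. Everything below that is not in the pseudocode is an assumption made for illustration: ClientResult and Page are simplified stand-ins for Rspotify's types, the struct fields (req, offset, page_size, done) are guesses at a plausible layout, and fake_artist_albums is a hypothetical in-memory endpoint.

```rust
// Simplified stand-ins for Rspotify's types, so the sketch compiles on its own.
type ClientResult<T> = Result<T, String>;

#[derive(Debug)]
struct Page<T> {
    items: Vec<T>,
}

struct PageIterator<Request> {
    req: Request,
    offset: u32,
    page_size: u32,
    done: bool,
}

impl<T, Request> Iterator for PageIterator<Request>
where
    Request: Fn(u32, u32) -> ClientResult<Page<T>>,
{
    type Item = ClientResult<Page<T>>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        match (self.req)(self.page_size, self.offset) {
            Ok(page) if page.items.is_empty() => {
                // An empty page means the data is exhausted.
                self.done = true;
                None
            }
            Ok(page) => {
                self.offset += page.items.len() as u32;
                Some(Ok(page))
            }
            Err(e) => {
                // Choice made in this sketch: stop after the first error
                // instead of retrying the same offset on the next call.
                self.done = true;
                Some(Err(e))
            }
        }
    }
}

/// Hypothetical endpoint: serves slices of an in-memory catalogue of 5 albums.
fn fake_artist_albums(limit: u32, offset: u32) -> ClientResult<Page<String>> {
    let items = (1..=5)
        .map(|n| format!("album-{n}"))
        .skip(offset as usize)
        .take(limit as usize)
        .collect();
    Ok(Page { items })
}

/// Walk every page and flatten the results into a single list.
fn collect_albums(page_size: u32) -> ClientResult<Vec<String>> {
    let pages = PageIterator { req: fake_artist_albums, offset: 0, page_size, done: false };
    let mut all = Vec::new();
    for page in pages {
        all.extend(page?.items);
    }
    Ok(all)
}

fn main() {
    let albums = collect_albums(2).expect("the fake endpoint never fails");
    println!("{albums:?}");
}
```

Because the endpoint is just a value implementing Fn(u32, u32), the same PageIterator works for any paginated call; a closure that captures an artist ID and a client would fit equally well.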
Now our iterator story has iterated to its end, and the next item is the full version of the current code, which is here; check it out if you are interested :)
4 Stream Story
Are we done? Not yet. Let’s turn to the stream story.
The stream story is mostly similar to the iterator story, except that an iterator is synchronous while a stream is asynchronous.
The Stream trait can yield multiple values before completing, similar to the Iterator trait.
trait Stream {
    /// The type of the value yielded by the stream.
    type Item;
    /// Attempt to resolve the next item in the stream.
    /// Returns `Poll::Pending` if not ready, `Poll::Ready(Some(x))` if a value
    /// is ready, and `Poll::Ready(None)` if the stream has completed.
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>)
        -> Poll<Option<Self::Item>>;
}
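To show what poll_next asks of an implementor, here is a minimal hand-written sketch that uses only the standard library. It redeclares the trait locally instead of depending on a crate, and the SlowCounter type, the NoopWaker and the drain helper are all hypothetical; a real program would use an async runtime rather than polling in a loop.

```rust
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Same shape as the trait shown above, redeclared locally so the sketch
/// needs no external crate.
trait Stream {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// Hypothetical stream: yields 1..=limit and reports `Pending` once before
/// every value, to imitate waiting on I/O.
struct SlowCounter {
    next: u32,
    limit: u32,
    ready: bool,
}

impl Stream for SlowCounter {
    type Item = u32;

    fn poll_next(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<u32>> {
        if self.next > self.limit {
            return Poll::Ready(None); // the stream has completed
        }
        if !self.ready {
            self.ready = true;
            // Ask to be polled again; a real stream would wake on I/O readiness.
            cx.waker().wake_by_ref();
            return Poll::Pending;
        }
        self.ready = false;
        let value = self.next;
        self.next += 1;
        Poll::Ready(Some(value))
    }
}

/// A waker that does nothing, which is enough for a busy-polling demo.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Drive a stream to completion by polling it in a loop.
/// A real executor would sleep until the waker fires instead of spinning.
fn drain<S: Stream + Unpin>(mut stream: S) -> Vec<S::Item> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut out = Vec::new();
    loop {
        match Pin::new(&mut stream).poll_next(&mut cx) {
            Poll::Ready(Some(item)) => out.push(item),
            Poll::Ready(None) => return out,
            Poll::Pending => continue,
        }
    }
}

fn main() {
    let values = drain(SlowCounter { next: 1, limit: 3, ready: false });
    println!("{values:?}");
}
```

Writing this state machine by hand for every paginated endpoint would be tedious, which is the kind of boilerplate a stream-generating macro takes away.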
Since we already know the iterator, let’s keep the stream story short. We leverage the async-stream crate, whose macros act as syntactic sugar that saves us from clumsy type declarations and notation.
We use the stream! macro to generate an anonymous type implementing the Stream trait; its Item associated type is the type of the values yielded from the stream, which is ClientResult<T> in this case.
The full stream version is shorter and clearer:
/// This is used to handle paginated requests automatically.
pub fn paginate<T, Fut, Request>(
    req: Request,
    page_size: u32,
) -> impl Stream<Item = ClientResult<T>>
where
    T: Unpin,
    Fut: Future<Output = ClientResult<Page<T>>>,
    Request: Fn(u32, u32) -> Fut,
{
    use async_stream::stream;
    let mut offset = 0;
    stream! {
        loop {
            let page = req(page_size, offset).await?;
            offset += page.items.len() as u32;
            for item in page.items {
                yield Ok(item);
            }
            if page.next.is_none() {
                break;
            }
        }
    }
}
5 Appendix
Whew! It took more than I expected. Iterators are one of the Rust features inspired by ideas from functional programming languages, and they contribute to Rust’s capability to clearly express high-level ideas at low-level performance.
It’s good to leverage iterators wherever possible. Now we can be thrilled to say that none of the paginated endpoints needs a manual loop anymore; they are all iterable and rusty.
Thanks again to Mario and icewind1991 for their work :)