Content Layer #946

matthewp · 2024-06-07T20:44:31Z

Accepted Date: 2024-06-07
Reference Issues/Discussions: Content Layer #935
Author: @FredKSchott
Champion(s):
Implementation PR:

Summary

Explore a new and improved content layer for Astro.
Improve the current experience of loading/defining data into content collections
Improve the current experience of querying data from content collections

Background & Motivation

Content Collections are a key primitive that brings people to Astro. Content Collections make it easy to work with local content (MD, MDX, Markdoc, etc) inside of your Astro project. They give you structure (src/content/[collection-name]/*), schema validation for frontmatter, and querying APIs.

Goals

Explore a new and improved content layer for Astro.
Improve the current experience of loading/defining data into content collections
Improve the current experience of querying data from content collections

Example

// A folder full of Markdown (MDX) files
defineCollection({
    name: 'blog',
    data: glob('./content/blog/*.mdx'),
});
// A single file containing an array of objects
defineCollection({
    name: 'authors',
    data: file('./content/authors.json'),
});
// Remote data, loaded with a custom npm package
defineCollection({
    name: 'articles',
    data: storyblokLoader({startsWith: 'articles/posts'}),
});
// Custom data, loaded from anywhere you'd like
defineCollection({
    name: 'my-custom-collection',
    data: () => { /* ... */ },
});


florian-lefebvre · 2024-06-08T06:30:36Z

I think it would be nice to support singletons, as suggested in #449 and #806. Many CMSs support those (e.g. Keystatic), and the content layer seems like it would be a good fit for them.

lloydjatkinson · 2024-06-12T15:53:18Z

Just wondering how this will work (if at all) for components, like in MDX? So the use case I'm thinking is when a CMS is used, they typically have a WYSIWYG rich text editor where custom components can be inserted as shown here: https://www.storyblok.com/tp/create-custom-components-in-storyblok-and-astro

Will this new API support this concept?

xavdid · 2024-06-12T17:26:59Z

This is cool!

I do a sort-of version of this for my review site (david.reviews), where all the data is stored in Airtable.

I load objects from the Airtable API based on a schema and cache responses in JSON locally (since loading and paging takes a while and is too slow for local development). It sort of feels like what an API-backed custom content collection could look like.

The whole thing is strongly typed, which is cool! I wrote about it in more detail here: https://xavd.id/blog/post/static-review-site-with-airtable/

JacobNWolf · 2024-06-12T22:20:05Z

Is this enacted somewhere yet/available as an experimental feature or on a beta version?

I've been trying to build collections from WordPress content fetched via GraphQL queries and I think this'll fix exactly what I want.

rambleraptor · 2024-06-12T22:37:51Z

How would this work with dependency files like images?

Right now, the src/content folder contains MDX files and can contain image files referenced from those MDX files.

I'm personally excited to separate my content folder and Astro theme into separate GitHub repos, so that's where my perspective comes from.

ashhitch · 2024-06-13T09:36:12Z

Coming from Gatsby and loving the data layer.

A few points that I always found hard:

Integrations with providers not maintained by the core team
Preview draft CMS content
Linked content, e.g. a blog listing or cross-linking component not updating when a new article is added (more to do with incremental updates)

brian-montgomery · 2024-06-13T13:47:16Z

I like the idea of being able to store content in a single file. I explored adding CSV support to the existing content collection APIs, and found there were too many assumptions around a directory of files. Providing a higher level of abstraction around how the data is accessed/retrieved while keeping the simple APIs and excellent typing would be ideal.

In essence, separating the client usage (the query APIs, generated schemas, metadata (frontmatter) and representation) from how the source is retrieved (single file, flat-file directory structure, database, remote service) would be really helpful.
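As a rough illustration of that separation, here is a minimal sketch in which the source side and the query side only meet at a small interface. Every name in it (`Loader`, `fromArray`, `fromRecord`, `createCollection`) is hypothetical and chosen for this example; none of them are Astro APIs.

```typescript
// Hypothetical sketch: the names below are illustrative, not Astro APIs.
interface Entry<T> {
  id: string;
  data: T;
}

// A loader only knows how to produce entries; it knows nothing about queries.
interface Loader<T> {
  load(): Entry<T>[];
}

// Source 1: a single file's worth of rows (e.g. a parsed JSON or CSV file).
function fromArray<T extends { id: string }>(rows: T[]): Loader<T> {
  return { load: () => rows.map((row) => ({ id: row.id, data: row })) };
}

// Source 2: a directory-like mapping of file name to parsed contents.
function fromRecord<T>(files: Record<string, T>): Loader<T> {
  return {
    load: () => Object.entries(files).map(([name, data]) => ({ id: name, data })),
  };
}

// The query side is identical regardless of where the entries came from.
function createCollection<T>(loader: Loader<T>) {
  const entries = loader.load();
  return {
    getCollection: (): Entry<T>[] => entries,
    getEntry: (id: string): Entry<T> | undefined => entries.find((e) => e.id === id),
  };
}

const authors = createCollection(fromArray([{ id: "ada", name: "Ada" }]));
const posts = createCollection(fromRecord({ "hello.md": { title: "Hello" } }));
```

Both collections expose the same `getCollection`/`getEntry` surface, so a database or remote service would only need another `Loader` implementation.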

ascorbic · 2024-06-13T15:02:45Z

> Is this enacted somewhere yet/available as an experimental feature or on a beta version?

No, still at the planning stage right now. We'll share experimental builds once they're available.

NuroDev · 2024-06-13T16:17:27Z

Love the initial look of this.

Will it still be possible to include a (zod) schema of some kind in the defineCollection function, so that if you are fetching data from a remote source it can be validated before being published?

ascorbic · 2024-06-13T16:29:21Z

> Love the initial look of this.
>
> Will it still be possible to include a (zod) schema of some kind in the defineCollection function so if you are fetching data from a remote source it can be validated before being published?

Yes, we'd want that to generate types too. Hopefully content sources could define this automatically in some scenarios.

stefanprobst · 2024-06-14T18:34:01Z

my wishlist for astro content layer:

support for more complex collection schemas, for example multiple richtext/mdx fields per collection entry (e.g. a "summary" and "content" field). specifically i would love support for everything i can express with keystatic's collection schema.
support for singletons (single-entry collections)
draft mode / preview mode (should work with "hybrid" rendering, and should work with node.js adapter)

louiss0 · 2024-06-14T19:39:35Z

Is Zod still going to be the official validator for Content Collections, or can people use something else?

reasonadmin · 2024-06-19T16:24:42Z

Love this concept!

In the current implementation entries from a collection have a render function (and an undocumented function in Astro addContentEntryType for adding new types).

Is there a spec yet for what the data property has to be? It would be nice to remove the limitation of only having a single render function that has to return a string of fully rendered contents. Being able to manipulate data here might bring in interesting UnifiedJS style options as any data can be converted into an AST and then any Unified plugin could run against it.

For example, you could load data from a CSV into a MDAST with a table structure - and then it would render the CSV as if it had been created as a table in markdown (only the data is much more manageable in the CSV for large data sets).
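A minimal sketch of that CSV idea, assuming the table node shapes from the mdast specification (`table` > `tableRow` > `tableCell` > `text`). The `csvToMdastTable` helper and the `Mdast*` type names are made up for this example, and the CSV parsing is deliberately naive: it assumes no quoted fields or embedded commas.

```typescript
// Minimal mdast node shapes for a table (a subset of the mdast spec).
interface MdastText { type: "text"; value: string }
interface MdastTableCell { type: "tableCell"; children: MdastText[] }
interface MdastTableRow { type: "tableRow"; children: MdastTableCell[] }
interface MdastTable { type: "table"; align: null[]; children: MdastTableRow[] }

// Hypothetical helper. Naive parsing: no quoted fields, no embedded commas or newlines.
function csvToMdastTable(csv: string): MdastTable {
  const rows = csv
    .trim()
    .split(/\r?\n/)
    .map((line) => line.split(",").map((cell) => cell.trim()));

  const columnCount = rows[0]?.length ?? 0;
  return {
    type: "table",
    // One alignment slot per column; null means "no explicit alignment".
    align: Array.from({ length: columnCount }, () => null),
    // In mdast the first row of a table is treated as the header row.
    children: rows.map((cells) => ({
      type: "tableRow",
      children: cells.map((value) => ({
        type: "tableCell",
        children: [{ type: "text", value }],
      })),
    })),
  };
}

const table = csvToMdastTable("id,balance\n2024-20,100\n2024-21,250");
```

The resulting tree could then be handed to any plugin that operates on mdast, which is the flexibility the comment above is asking for.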

ascorbic · 2024-06-20T08:36:44Z

@reasonadmin I think support for implementing custom renderers is a must, and it would then make sense to allow these to be more flexible than a single render() function. Perhaps allow it to accept arguments, which could specify different fields to render, or render options. For different filetypes I think it would make sense for them to be implemented as separate integrations, so something like:

defineCollection({
    name: 'accounts',
    data: csvLoader("data/accounts/**/*.csv"),
});

..and then:

import { getEntry, getEntries } from 'astro:content';

const account = await getEntry('accounts', '2024-20');

// Access the parsed CSV data
const { rows } = account;

// Render a table. Pass options to render, or maybe make them props for `<Content />`
const { Content } = account.render({ /* typesafe filter, sort options etc */ });

reasonadmin · 2024-06-20T11:18:36Z

@ascorbic Is it possible to join these two ideas together:

#763

For example:

defineCollection({
  name: 'my-data',
  data: async (db: DB, watcher: FileSystemWatcher) => {
    const hash = sha256(content);
    await db.addContent(hash, content);
    // Only build updates to files if the hash is different from the one in the DB
    // DB is accessible between static builds for incremental SSG rendering
  },
})

How about something like this for rendering:

defineCollection({
    name: 'accounts',
    data: Astro.autoDBWrapper(   [{page: 1},{page: 2}]   ),
    render: async(entry, options) => {
      // Return an object with at least the fields { metadata: Object, content: String }.
      // This content will not be loaded directly via an import statement
      // (e.g. const { Content } = import('myfile.md')), so we don't need a loader
      // and therefore don't need to stringify things as JS for the loader?
    }
});

/* -- */

import { getEntry } from 'astro:content';
const { entry, render } = await getEntry('accounts', '2024-20');

const { metadata: htmlMetadata, content: htmlContent } = await render(entry, { fileType: 'HTML' });
const { metadata: xmlMetadata, content: xmlContent } = await render(entry, { fileType: 'XML' });

ArmandPhilippot · 2024-06-22T16:28:13Z

In addition to the proposed singletons, here are my thoughts.

Rich query API

I like the ideas proposed in #574 or #518.

Here is another possible format:

const welcomePost = await queryEntry('posts', {where: {slug: "welcome", lang: "en"}});

const frenchPosts = await queryCollection("posts", {
  first: 10,
  where: { lang: "fr" },
  sort: {key: "publicationDate", order: "DESC"},
});
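To make the semantics of `where`, `sort` and `first` concrete, here is a minimal in-memory sketch. It is an assumption-laden stand-in rather than a proposed implementation: unlike the snippet above, this `queryCollection` is synchronous and takes the entries directly instead of a collection name, and `Row`, `QueryOptions` and `compareValues` are names invented for the example.

```typescript
// Hypothetical sketch: entries are flat records of primitive values.
type Row = Record<string, string | number | boolean>;

interface QueryOptions {
  first?: number;
  where?: Row;
  sort?: { key: string; order: "ASC" | "DESC" };
}

// Compare numbers numerically and everything else by its string form.
function compareValues(a: unknown, b: unknown): number {
  if (typeof a === "number" && typeof b === "number") return a - b;
  const left = String(a);
  const right = String(b);
  return left < right ? -1 : left > right ? 1 : 0;
}

function queryCollection<T extends Row>(entries: T[], options: QueryOptions = {}): T[] {
  const where: Row = options.where ?? {};
  const { sort, first } = options;
  const field = (row: Row, key: string) => row[key];

  // Keep entries whose fields strictly equal every `where` value.
  let result = entries.filter((entry) =>
    Object.entries(where).every(([key, value]) => field(entry, key) === value),
  );

  if (sort) {
    const direction = sort.order === "DESC" ? -1 : 1;
    // Copy before sorting so the filtered array is never mutated in place.
    result = [...result].sort(
      (a, b) => direction * compareValues(field(a, sort.key), field(b, sort.key)),
    );
  }

  return first === undefined ? result : result.slice(0, first);
}

const posts = [
  { slug: "welcome", lang: "en", publicationDate: "2024-01-05" },
  { slug: "bienvenue", lang: "fr", publicationDate: "2024-01-05" },
  { slug: "nouveautes", lang: "fr", publicationDate: "2024-03-01" },
];

const frenchPosts = queryCollection(posts, {
  first: 10,
  where: { lang: "fr" },
  sort: { key: "publicationDate", order: "DESC" },
});
```

Sorting works on ISO date strings here only because they order lexicographically; a real implementation would presumably compare the `Date` values produced by the schema.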

Sub-collections / Nested data-types

Idea

It would be nice to allow "sub-collections". The idea would be to improve content organization and to share some common data-types across a collection, while each sub-collection could define additional data-types of its own.

Then we could:

get the collection with a union of the different sub-collection types with await getCollection('collectionName')
get a sub-collection directly with await getSubCollection('collectionName', 'subCollectionName').

Example

Maybe an example will help describe my proposal, so imagine a collection named posts. A post could have different "formats", for example: "changelog" (new features on the website), "tutorials" (some tutorials about software) and "thoughts" (for everything else).

All the formats share common data-types:

a publication date
a status (draft/published)

Then each format could have additional data-types:

A "changelog" does not need anything else,
A "tutorial" could have those additional data-types:
- software
- difficulty
A "thought" could have:
- main subject/category

The collection could be defined as follows:

const posts = defineCollection({
  type: 'content',
  schema: z.object({
    isDraft: z.boolean(),
    publicationDate: z.string().transform((str) => new Date(str)),
  }),
  subCollections: {
    changelog: {
      type: 'content',
    },
    thought: {
      type: 'content',
      schema: z.object({
        subject: z.string()
      }),
    },
    tutorial: {
      type: 'content',
      schema: z.object({
        difficulty: z.enum(["easy", "medium", "hard"]),
        software: z.string(),
      }),
    },
  }
});

export const collections = {
	posts
}

The generated types would be:

type Changelog = {
	isDraft: boolean;
	publicationDate: Date;
	subCollection: "changelog";
}

type Tutorial = {
	isDraft: boolean;
	publicationDate: Date;
	subCollection: "tutorial";
	software: string;
	difficulty: "easy" | "medium" | "hard"
}

type Thought = {
	isDraft: boolean;
	publicationDate: Date;
	subCollection: "thought";
	subject: string;	
}

type Post = Changelog | Tutorial | Thought;

When validating data-types, an error would be thrown in cases such as the following:

when any sub-collection is missing one of the common data-types (isDraft and/or publicationDate)
if a "changelog" has unexpected keys like software or subject
if a "thought" is missing the subject key

If the subCollection key is not defined, Astro will behave like it does currently.

Then it would be possible to get all the posts (with mixed formats) using await getCollection("posts") to display them in a page (like a blog page). The consumer could then use a different presentation depending on the "format".

It would also be possible to query a sub-collection directly with, for example, await getSubCollection("posts", "tutorial") and display only the tutorials.
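On the type level, the narrowing described here can be sketched with a discriminated union and a type predicate. This is only an illustration: the `getSubCollection` below is a hypothetical, synchronous helper that filters an in-memory array rather than taking a collection name, and the union is copied from the generated types above so the block is self-contained.

```typescript
// The generated union from the proposal, reproduced so the sketch is self-contained.
type Changelog = { isDraft: boolean; publicationDate: Date; subCollection: "changelog" };
type Tutorial = {
  isDraft: boolean;
  publicationDate: Date;
  subCollection: "tutorial";
  software: string;
  difficulty: "easy" | "medium" | "hard";
};
type Thought = { isDraft: boolean; publicationDate: Date; subCollection: "thought"; subject: string };
type Post = Changelog | Tutorial | Thought;

// Hypothetical helper: filters by the discriminant and narrows the element type.
function getSubCollection<K extends Post["subCollection"]>(
  posts: Post[],
  name: K,
): Extract<Post, { subCollection: K }>[] {
  return posts.filter(
    (post): post is Extract<Post, { subCollection: K }> => post.subCollection === name,
  );
}

const posts: Post[] = [
  { isDraft: false, publicationDate: new Date("2024-06-01"), subCollection: "changelog" },
  {
    isDraft: false,
    publicationDate: new Date("2024-06-10"),
    subCollection: "tutorial",
    software: "Astro",
    difficulty: "easy",
  },
  { isDraft: true, publicationDate: new Date("2024-06-15"), subCollection: "thought", subject: "content" },
];

const tutorials = getSubCollection(posts, "tutorial");
// `software` is accessible without a cast because the element type narrowed to Tutorial.
const software = tutorials.map((tutorial) => tutorial.software);
```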

For the organization, I don't know what would be best:

a flat structure, the subCollection key in the frontmatter of Markdown files will be used to validate the current file
a nested structure with sub-collection name as subdirectory name (and an optional subCollection key)
or allow both but forbid mixed structures (the choice would be up to the consumer, and the compiler would check for files when there are no matching subdirectories)

src/content/
└── posts/
    ├── changelog/
    ├── thought/
    └── tutorial/

wassfila · 2024-06-23T07:24:51Z

I wonder if this will allow rendering pure .md with Astro components (my use case is pure Markdown, e.g. an existing GitHub repo, and not .mdx, whose parser is not as tolerant as the md parser). For example, I have a Heading.astro that takes props, a Code.astro, and so on. If so, what would that look like?
If so, I also wonder how this would scale to a "rich CMS content"-like renderer, which might require recursive nodes like the AST provided by md parsers.
For info, here's a link to my custom Markdown renderer with Astro: https://github.com/MicroWebStacks/astro-big-doc/blob/main/src/components/markdown/AstroMarkdown.astro
I hope this becomes possible with content 2.0, as it's the only way to make md a true headless CMS.

ascorbic · 2024-06-26T16:49:07Z

We have a preview release available, so I'd love if you can give it a try and share your feedback. Full details are in the PR, including changes in the API: withastro/astro#11334

lloydjatkinson · 2024-06-26T19:48:30Z

> We have a preview release available, so I'd love if you can give it a try and share your feedback. Full details are in the PR, including changes in the API: withastro/astro#11334

Will the new Content Layer support loading custom components?

ematipico · 2024-06-27T08:00:31Z

> > We have a preview release available, so I'd love if you can give it a try and share your feedback. Full details are in the PR, including changes in the API: withastro/astro#11334
>
> Will the new Content Layer support loading custom components?

What would they look like? Genuinely asking. We're considering various ways to render a new content collection, and components aren't off the table, but how would you run the schema against them?

lloydjatkinson · 2024-06-27T15:35:27Z

> > > We have a preview release available, so I'd love if you can give it a try and share your feedback. Full details are in the PR, including changes in the API: withastro/astro#11334
> >
> > Will the new Content Layer support loading custom components?
>
> What would they look like? Genuinely asking. We're considering various ways to render a new content collection, and components aren't off the table, but how would you run the schema against them?

Honestly I don't know, but this is surely an important thing to think about, as in the Storyblok example: content writers are likely to want to use them somehow. If a content loader doesn't support that, would that be a feature regression from what is available this way? https://www.storyblok.com/tp/create-custom-components-in-storyblok-and-astro

matthewp · 2024-06-27T15:52:31Z

@lloydjatkinson you'll still be able to use mdx with this new API, so yes you can still use components.

stefanprobst · 2024-06-28T08:54:14Z

> We have a preview release available, so I'd love if you can give it a try and share your feedback. Full details are in the PR, including changes in the API: withastro/astro#11334

will this support multiple markdown/richtext fields per collection? for example, would it be possible to express something like:

const schema = object({
  title: string(),
  sections: array(object({
    title: string(),
    content: mdx(),
  }))
})

jlengstorf · 2024-07-05T17:03:47Z

One thing that comes to mind that was a huge pain in the ass for Gatsby (and any other content aggregation abstraction I've worked with) is relationships between content.

A major motivator for using a content abstraction like this is centralizing data access. However, if there's no way to define relationships between the data, then teams are still defaulting to creating userland data merging and manipulation, which is (in my experience, at least) one of the key pain points that leads to wanting an aggregation layer in the first place.

I may have missed it in other discussion, but is there any plan or initial thoughts around how this would be managed?

Example Use Case

For example:

Content team is writing blogs in Contentful
Blog likes are a bespoke solution using Astro DB
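For context, the userland merge that teams end up writing for this use case tends to look something like the sketch below. The entry shapes (`BlogEntry`, `LikesEntry`) and the `withLikes` helper are assumptions made up for illustration; they are not the actual Contentful or Astro DB schemas.

```typescript
// Hypothetical entry shapes for the two sources.
interface BlogEntry {
  id: string;
  title: string;
}

interface LikesEntry {
  blogId: string;
  count: number;
}

// Index the "foreign" collection once, then attach the matching value to each post.
function withLikes(blogs: BlogEntry[], likes: LikesEntry[]): (BlogEntry & { likes: number })[] {
  const countByBlogId = new Map(likes.map((entry) => [entry.blogId, entry.count] as const));
  // Posts without a likes row default to 0.
  return blogs.map((blog) => ({ ...blog, likes: countByBlogId.get(blog.id) ?? 0 }));
}

const merged = withLikes(
  [
    { id: "a", title: "Hello" },
    { id: "b", title: "World" },
  ],
  [{ blogId: "a", count: 3 }],
);
```

Every page that needs both pieces of data has to repeat or import this kind of glue, which is the pain point a relationship API would remove.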

Idea 1: Explicit API for creating relationships

I don't know that I like this API, but for some pseudo-code to show how this might work:

import { defineCollection, file, z, createRelationship } from 'astro:content';
import { contentful, contentfulSchema } from '../loaders/contentful';
// loader imported under an alias so it doesn't clash with the `likes` collection below
import { likes as likesLoader, likesSchema } from '../loaders/likes';

const blog = defineCollection({
	type: "experimental_data",
	loader: contentful(/* some config */),
	schema: z.object({
		...contentfulSchema,
		likes: createRelationship({
			collection: 'likes',
			type: z.number(), // <-- (optional) type for the linked data (could be inferred?)
			key: 'id', // <-- the Contentful schema field to link on
			foreignKey: 'blogId', // <-- the 'likes' schema field to link on
			resolver: (entry) => entry.count, // <-- (optional) how to link data in (full entry if omitted),
		}),
	}),
});

const likes = defineCollection({
	type: "experimental_data",
	loader: likesLoader(),
	schema: z.object({
		...likesSchema,
		// reverse link: the 'likes' field `blogId` points at the blog entry's `id`
		blog: createRelationship({ collection: 'blog', key: 'blogId', foreignKey: 'id' }),
	}),
});

export const collections = { blog, likes };

Idea 2: Joins and an optional projection API

I like the way GraphQL and Sanity's GROQ allow you to dig into a referenced entry and get just the fields you need. Maybe something like that is possible?

import { defineCollection, file, z, reference } from 'astro:content';
import { contentful, contentfulSchema } from '../loaders/contentful';
// loaders imported under aliases so they don't clash with the collections below
import { comments as commentsLoader, commentsSchema } from '../loaders/comments';
import { likes as likesLoader, likesSchema } from '../loaders/likes';

const blog = defineCollection({
	type: "experimental_data",
	loader: contentful(/* some config */),
	references: {
		// an optional resolver allows for custom projections of linked content
		comments: reference(contentfulSchema.id, commentsSchema.blogId, (comment) => ({
			author: comment.author.displayName,
			content: comment.content,
			date: new Date(comment.date).toLocaleString(),
		})),

		// returning a single value is also possible
		likes: reference(contentfulSchema.id, likesSchema.blogId, (entry) => entry.count),
	},
});

const comments = defineCollection({
	type: "experimental_data",
	loader: commentsLoader(),
	references: {
		// by default the full blog post entry is added as the `blog` key value
		blog: reference(commentsSchema.blogId, contentfulSchema.id),
	},
});

const likes = defineCollection({
	type: "experimental_data",
	loader: likesLoader(),
	references: {
		blog: reference(likesSchema.blogId, contentfulSchema.id),
	},
});

export const collections = { blog, comments, likes };

I don't have strong opinions about the specifics, but I do think it's really important to talk through how cross-collection relationships fit into the content layer. Dropping this more to start the conversation than to try and assert a "right way" to do anything.

(Also, let me know if I should move this to a separate discussion.)

julrich · 2024-07-18T07:04:56Z

Really excited to see this, as a former Gatsby resolver addict 😅
And 👍 to everything @jlengstorf wrote! Those were topics we dealt with quite a bit.
Someone in the thread also mentioned images / assets, which were also a major topic in Gatsby content ingestion for us!

For background: what worked quite well for us with Gatsby was creating our own unifying content schema, based on our components and page types, and then just using resolvers to massage the incoming data into the format defined by those (e.g. Appearances pulled from Contentful):
https://github.com/kickstartDS/gatsby-theme-kickstartDS/blob/master/gatsby-theme-kickstartds/gatsby-node.js#L388
https://github.com/kickstartDS/gatsby-theme-kickstartDS/blob/master/gatsby-theme-kickstartds/src/schema/types/KickstartDsAppearancePageType.graphql
https://github.com/kickstartDS/gatsby-theme-kickstartDS/blob/master/gatsby-theme-kickstartds/create/createAppearanceList.js
https://github.com/kickstartDS/gatsby-theme-kickstartDS/blob/master/gatsby-transformer-kickstartds-contentful/gatsby-node.js

This is still a pretty simplified description of it. Page and component definitions can actually be generic here; everything gets generated based off JSON Schema definitions that describe said pages and components. But that whole setup ended up being way too complex, tbh!

This is also interesting to me / us btw, because we're currently thinking about adding Astro starters in addition to our existing Next.js based ones.

ascorbic · 2024-07-18T10:01:08Z

I think that a lot of what @jlengstorf and @julrich are suggesting in terms of projections and resolvers could be achieved with the zod schema, particularly with things like coerce and transform. Every entry should be resolved using the schema, so these sorts of transforms will be applied at that point. If you omit any fields from the schema then those will not be present in the data either.

julrich · 2024-07-18T10:32:36Z

I think you're right @ascorbic. Really need to further dig into Zod, but it sure looks like it could do a lot of the heavy lifting!

lorenzolewis · 2024-07-20T10:52:21Z

Are there any restrictions around the watcher mentioned in the original issue? In the implementation PR the mentioned type is an FSWatcher. But would it be possible to implement a watcher outside of the FS?

For example, if I'm using a future Astro Studio connector, I would like to be able to update values in the Astro Studio web UI and then have a local process watch for changes and trigger a local dev preview update without me needing to refresh the page.

This may be possible already with what's created or in mind, but didn't seem to be the case explicitly.

rambleraptor · 2024-07-26T21:06:53Z

Will file types still be determined by extensions? I've got a bunch of .md.j2 files that I would love to be recognized as basic md files.

It looks like Content Layer is taking in a glob of files, so the only way to determine file type would be extension.

ascorbic · 2024-07-28T08:38:22Z

@lorenzolewis The watcher is optional, and you can handle watching however you want. I'm hoping to expose hooks to integrations to enable them to do updates in their own way.

ascorbic · 2024-07-28T08:39:08Z

@rambleraptor yes, currently that's how it's handled. I'd be interested in ideas for how they could be specified otherwise.

dustinlacewell · 2024-07-28T20:37:37Z

Today content collection entries have a render method. This makes it annoying to simply pass entries to framework components. Instead the render method should just be a library function taking a content entry.

That's to say, collection entries should be serializable by default.
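A minimal sketch of what that could look like, under heavy assumptions: `PlainEntry` and the standalone `render` function are hypothetical names, and rendering to an HTML string (without escaping) is only a stand-in for whatever a real render helper would return.

```typescript
// Hypothetical shape: an entry that is plain data, with no methods attached.
interface PlainEntry {
  id: string;
  data: { title: string };
  body: string;
}

// Rendering lives in a library function, so the entry itself stays serializable.
// Stand-in implementation: no HTML escaping, string output only.
function render(entry: PlainEntry): string {
  return `<h1>${entry.data.title}</h1><p>${entry.body}</p>`;
}

const entry: PlainEntry = { id: "hello", data: { title: "Hello" }, body: "First post" };

// Plain data survives a JSON round trip, so it can be passed to framework components as props.
const copy: PlainEntry = JSON.parse(JSON.stringify(entry));
const html = render(copy);
```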

Chinoman10 · 2024-07-28T23:27:46Z

Pardon my question, and I don't mean to add any pressure, but do we have any estimate of when we might be able to see an experimental release?
My team and I are about to migrate a pretty large website with tens of thousands of pages from Jekyll to Astro, and being able to define a collection that loads from an API (Strapi, Instagram, etc.) would be absolutely amazing for us. We actually already started laying the groundwork, but I'm hoping that by the time we finish creating the components, sections, layouts and styles, there will already be an experimental build that we can use to populate the page/content generation...

Our current flow with Jekyll is: an Excel spreadsheet with hundreds of rows and dozens of columns each -> a macro that generates YAML and JSON files -> copies (& transforms) the images to the right folders too -> we ingest everything into the project and load it all up with a mix of Liquid (Jekyll's language) and JS. Since we started building this Jekyll site back in the day, we've already developed almost 10 other websites with Astro and it's been a blessing, so we can't wait for this migration...

Edit: I see we're gonna get a discussion going tomorrow! Exciting. I'll def. try and attend. I see there's a Draft PR already open too... so if I had to take a wild guess, in about 2-3 weeks or a month at most we should be seeing an experimental release 😍

rambleraptor · 2024-07-31T00:10:40Z

> @rambleraptor yes, currently that's how it's handled. I'd be interested in ideas for how they could be specified otherwise.

I haven't dug into the code at all to support this idea. Right now, it looks like data takes in an array of File objects. We could potentially create a Markdown object, JSON object, etc that has a reference to an underlying file. If an array object is a normal File, the extension check occurs. Otherwise, it defaults to whatever object type the user specified.

ascorbic · 2024-07-31T07:48:37Z

@rambleraptor the file list comes from the filesystem glob, so there's no realistic way to know automatically what type it is except via the file extension. It would need to be configured somewhere as an additional extension for that type.

rambleraptor · 2024-08-01T05:51:55Z

> @rambleraptor the file list comes from the filesystem glob, so there's no realistic way to know automatically what type it is except via the file extension. It would need to be configured somewhere as an additional extension for that type.

Got it! That makes a lot of sense. Something I've been thinking about is creating a loader that acts as a preprocessor (it could take a .md.j2 file, do some preprocessing, and return a .md file).

One way or another, I wouldn't call this a major use case. I think the existing API could allow for it.

ascorbic mentioned this issue Jun 26, 2024

feat: add Content Layer loader withastro/astro#11334

Merged

ascorbic mentioned this issue Jun 27, 2024

Content Layer withastro/astro#11360

Draft

ascorbic mentioned this issue Jul 23, 2024

Content Layer #982

Draft
