
Mapping a Star Schema with Spring Boot JPA

I’m struggling to identify the right annotations to use to map a star schema with Spring Boot JPA.

Let’s consider 3 tables:

DIM_One (1) ---- (1..n) FACT (1..n) ---- (1) DIM_Two

DIM_One and DIM_Two both have an id that is their primary key. FACT’s primary key is the combination of (DIM_One_pk, DIM_Two_pk)

For now, the annotations in my DIM tables are similar to:

@Table(name="DIM_One")
@Entity
@Getter
@ToString
public class One {

    @Id
    @Column(name = "dim_one_id")
    private UUID id;

    //...
}

As for the FACT table, I have:

@Entity
@Table(name = "FACT")
@ToString
@Getter
public class Fact {
    @EmbeddedId
    private FactId id;

    //...

}

with the corresponding FactId class:

@Embeddable
@Getter
@EqualsAndHashCode
public class FactId implements Serializable {
    private One one;
    private Two two;
}

I feel a bit lost as to which annotations I need to make the mapping match this cardinality:

DIM_One (1) ---- (1..n) FACT (1..n) ---- (1) DIM_Two

Furthermore, should it actually be mapped as @OneToMany or @OneToOne?


Answer

Your diagram shows a (1..n) to (1) relationship between FACT and each dimension, so from the fact's side each association is a @ManyToOne (seen from the dimension's side, a @OneToMany).
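A minimal sketch of such a mapping, assuming jakarta.persistence (older Spring Boot versions use javax.persistence) and the table/column names from the question. The key point is that the @EmbeddedId holds the dimension ids as plain UUIDs, not the entities themselves; @MapsId then ties each @ManyToOne to the matching id attribute:

```java
import jakarta.persistence.*;
import java.io.Serializable;
import java.util.UUID;

@Embeddable
class FactId implements Serializable {
    private UUID oneId;  // maps DIM_One's primary key
    private UUID twoId;  // maps DIM_Two's primary key

    protected FactId() { }  // no-arg constructor required by JPA

    public FactId(UUID oneId, UUID twoId) {
        this.oneId = oneId;
        this.twoId = twoId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FactId)) return false;
        FactId other = (FactId) o;
        return oneId.equals(other.oneId) && twoId.equals(other.twoId);
    }

    @Override
    public int hashCode() {
        return 31 * oneId.hashCode() + twoId.hashCode();
    }
}

@Entity
@Table(name = "FACT")
class Fact {
    @EmbeddedId
    private FactId id;

    // @MapsId("oneId") derives this part of the composite key from the
    // association, so the foreign-key column doubles as part of the PK.
    @ManyToOne(fetch = FetchType.LAZY)
    @MapsId("oneId")
    @JoinColumn(name = "dim_one_id")
    private One one;

    @ManyToOne(fetch = FetchType.LAZY)
    @MapsId("twoId")
    @JoinColumn(name = "dim_two_id")
    private Two two;
}
```

With this shape, the FactId in the question would need to be changed: an @Embeddable should contain basic types, not entity references.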

Other than that, you need to think about how you want to use this:

  • If loading a fact, do you want to load the associated dimension entries? This leads to the decision between eager and lazy loading.
  • Do you want to be able to navigate from fact to dimension or the other way round? Or both? This leads to the decision about directionality.
  • If you persist, delete, … a fact, should the dimensions join in that operation? This leads to the cascade configuration.
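For a star schema, the usual answers are lazy loading, fact-to-dimension navigation only, and no cascading, since dimension rows exist independently of facts. A sketch of what that looks like on one of the Fact associations, assuming the mapping from the question (jakarta.persistence annotations):

```java
// Inside the Fact entity:

@ManyToOne(fetch = FetchType.LAZY)  // lazy: don't fetch the dimension row with every fact
// no cascade attribute: persisting or deleting a fact never touches DIM_One
@MapsId("oneId")
@JoinColumn(name = "dim_one_id")
private One one;

// Unidirectional: navigation only from Fact to One. The inverse side would
// look like the snippet below, but a collection that can hold millions of
// fact rows is rarely a good idea:
//
// @OneToMany(mappedBy = "one")
// private List<Fact> facts;
```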

Note: While in principle this should work without major problems, since a star schema is still just a bunch of tables, using JPA for it sounds like a really bad idea.

Star schemas are used for large amounts of data and are highly denormalised in order to optimise for reads and aggregations. This means updates typically hit from a few hundred rows to many thousands, possibly millions. JPA is not built for this kind of operation and will perform horribly compared to specifically tailored SQL statements.

On the read side you’ll constantly operate with aggregate functions and probably window functions with non-trivial expressions. JPQL, the query language of JPA, again is not built for this and will severely limit your options.
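If you end up using JPA anyway, Spring Data JPA at least lets you fall back to native SQL for such queries. A sketch, assuming a FactRepository and a hypothetical `amount` column on FACT; JPQL has no window functions, hence `nativeQuery = true`:

```java
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import java.util.List;

public interface FactRepository extends JpaRepository<Fact, FactId> {

    // RANK() OVER (...) is not expressible in JPQL, so this must be native SQL.
    @Query(value = """
            SELECT dim_one_id,
                   SUM(amount) AS total,
                   RANK() OVER (ORDER BY SUM(amount) DESC) AS rnk
            FROM FACT
            GROUP BY dim_one_id
            """, nativeQuery = true)
    List<Object[]> totalsRankedByDimOne();
}
```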

User contributions licensed under: CC BY-SA