
@Narpimers

No description provided.

@yunchen4 self-assigned this on May 29, 2025

@yunchen4 left a comment


Hey Ilias,

There is one major point you need to rethink and rework: the relationship between authors and papers. Is it one-to-one, one-to-many, or many-to-many?
Please let me know on Slack when you finish the rework. Don't hesitate to contact me if you have any questions!

CREATE TABLE IF NOT EXISTS authors (
author_id INT AUTO_INCREMENT PRIMARY KEY,
author_name VARCHAR(50) NOT NULL,
university TEXT,
Suggestion: TEXT is meant for long text. VARCHAR should be enough for a university name.

date_of_birth DATE NOT NULL,
h_index INT NOT NULL,
paper_id INT,
Needs rework: if an author writes several papers, is it a good idea to store paper_id in the authors table?

Please rethink the relationship between papers and authors.
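A minimal sketch of the many-to-many shape, assuming a join table named author_papers (the name that appears later in this PR) with one row per (author_id, paper_id) pair. The DDL string is an illustrative MySQL definition, not the PR's actual schema. Plain arrays holding hypothetical sample data stand in for the tables so the example runs without a database.

```javascript
// Illustrative MySQL DDL (an assumption, not the PR's exact schema):
// one row per (author, paper) pair; the composite primary key stops
// the same pair from being stored twice.
const AUTHOR_PAPERS_DDL = `
CREATE TABLE IF NOT EXISTS author_papers (
  author_id INT NOT NULL,
  paper_id INT NOT NULL,
  PRIMARY KEY (author_id, paper_id),
  FOREIGN KEY (author_id) REFERENCES authors(author_id),
  FOREIGN KEY (paper_id) REFERENCES research_papers(paper_id)
)`;

// In-memory stand-ins for the three tables (hypothetical sample data).
const authors = [
  { author_id: 1, author_name: "Ada" },
  { author_id: 2, author_name: "Grace" },
];
const researchPapers = [
  { paper_id: 10, paper_title: "Paper A" },
  { paper_id: 11, paper_title: "Paper B" },
];
// Ada wrote both papers, and Paper A has two authors.
const authorPapers = [
  { author_id: 1, paper_id: 10 },
  { author_id: 1, paper_id: 11 },
  { author_id: 2, paper_id: 10 },
];

// Papers of one author: follow author_id -> paper_id through the join table.
function papersOf(authorId) {
  const ids = authorPapers
    .filter((r) => r.author_id === authorId)
    .map((r) => r.paper_id);
  return researchPapers
    .filter((p) => ids.includes(p.paper_id))
    .map((p) => p.paper_title);
}

// Authors of one paper: the same join table, read in the other direction.
function authorsOf(paperId) {
  const ids = authorPapers
    .filter((r) => r.paper_id === paperId)
    .map((r) => r.author_id);
  return authors
    .filter((a) => ids.includes(a.author_id))
    .map((a) => a.author_name);
}

console.log(papersOf(1)); // [ 'Paper A', 'Paper B' ]
console.log(authorsOf(10)); // [ 'Ada', 'Grace' ]
```

Because the relationship lives only in author_papers, neither authors nor research_papers needs a column that points at the other, and the same table answers the question in both directions.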

@Narpimers (author) commented

@yunchen4 Thank you for your feedback! I've made changes based on your suggestions and added a join table between authors and papers. I hope this is a good way to handle cases where one author has multiple research publications.


@yunchen4 left a comment


Hi Ilias,

Your new table schema is correct. However, you also need to update your join and aggregate queries to use the new relationship table. I have pointed out each one explicitly in the comments.
Please let me know if you have any questions!

paper_id INT AUTO_INCREMENT PRIMARY KEY,
paper_title VARCHAR(100) NOT NULL,
conference TEXT,
author_id INT,
Since you now have a separate table for the relationship between authors and papers, you don't need an author_id column in the research_papers table, as one paper can be written by several authors.


export const joins = async() => {
connection.query("SELECT author_name, mentor FROM authors");
connection.query("SELECT authors.author_name, research_papers.paper_title FROM authors LEFT JOIN research_papers ON authors.author_id = research_papers. author_id");
Needs rework: what if a paper has several different authors? You should use the relationship table.
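One possible shape for the reworked join, assuming the author_papers table: the query goes through the relationship table with two LEFT JOINs. The in-memory function only mimics what MySQL would return for the hypothetical sample data. It shows that a paper with two authors produces two rows and that an author with no papers still appears with a NULL title.

```javascript
// SQL that would be passed to connection.query() (a sketch, assuming author_papers).
const JOIN_SQL = `
SELECT a.author_name, rp.paper_title
FROM authors a
LEFT JOIN author_papers ap ON a.author_id = ap.author_id
LEFT JOIN research_papers rp ON ap.paper_id = rp.paper_id`;

// Hypothetical sample data: Linus has no papers yet.
const authors = [
  { author_id: 1, author_name: "Ada" },
  { author_id: 2, author_name: "Grace" },
  { author_id: 3, author_name: "Linus" },
];
const researchPapers = [{ paper_id: 10, paper_title: "Paper A" }];
const authorPapers = [
  { author_id: 1, paper_id: 10 },
  { author_id: 2, paper_id: 10 },
];

// LEFT JOIN semantics: every author appears at least once, and an author
// without a matching row in author_papers gets paper_title = null.
function authorsWithPapers() {
  const rows = [];
  for (const a of authors) {
    const links = authorPapers.filter((ap) => ap.author_id === a.author_id);
    if (links.length === 0) {
      rows.push({ author_name: a.author_name, paper_title: null });
      continue;
    }
    for (const ap of links) {
      const rp = researchPapers.find((p) => p.paper_id === ap.paper_id);
      rows.push({ author_name: a.author_name, paper_title: rp ? rp.paper_title : null });
    }
  }
  return rows;
}

console.table(authorsWithPapers());
```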



export const aggreg = async() => {
connection.query("SELECT paper_title, COUNT(author_id) AS author_count FROM research_papers GROUP BY paper_title");
Needs rework: with the current table schema, you always get one author per paper, because the author_id column in research_papers holds only a single integer. You need to use the relationship table.
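A sketch of the per-paper author count through author_papers. It groups by paper_id as well as the title, as the later revision in this PR does, so that two papers with the same title are not merged. The array-based function is a hypothetical stand-in for the query result. COUNT(ap.author_id) skips the NULL that the LEFT JOIN produces for a paper with no linked authors, so that paper reports 0; COUNT(*) would report 1.

```javascript
// SQL sketch, assuming the author_papers relationship table.
const AUTHOR_COUNT_SQL = `
SELECT rp.paper_id, rp.paper_title, COUNT(ap.author_id) AS author_count
FROM research_papers rp
LEFT JOIN author_papers ap ON rp.paper_id = ap.paper_id
GROUP BY rp.paper_id, rp.paper_title`;

// Hypothetical sample data: Paper B has no linked authors yet.
const researchPapers = [
  { paper_id: 10, paper_title: "Paper A" },
  { paper_id: 11, paper_title: "Paper B" },
];
const authorPapers = [
  { author_id: 1, paper_id: 10 },
  { author_id: 2, paper_id: 10 },
];

// One output row per paper (the GROUP BY), counting its non-NULL author links.
function authorCountPerPaper() {
  return researchPapers.map((rp) => ({
    paper_id: rp.paper_id,
    paper_title: rp.paper_title,
    // Like COUNT(ap.author_id): a paper with no links counts 0, not 1.
    author_count: authorPapers.filter((ap) => ap.paper_id === rp.paper_id).length,
  }));
}

console.log(authorCountPerPaper());
```

With this sample data, Paper A reports 2 authors and Paper B reports 0.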

connection.query("SELECT COUNT(research_papers.paper_id) FROM research_papers INNER JOIN authors on authors.author_id = research_papers.author_id WHERE authors.gender = 'female'");
connection.query("SELECT university, FROM authors GROUP BY university");
connection.query("SELECT university, COUNT(paper_id) FROM authors GROUP BY university");
Needs rework: you have changed your table schema to include the relationship table, so there is no longer a paper_id column in authors. You have to write this query using the relationship table. And be careful: what if several authors from the same university write the same paper? Will that paper be counted multiple times?
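To make the double-counting concrete, here is a sketch comparing COUNT(ap.paper_id) with COUNT(DISTINCT ap.paper_id) per university. The university names and links are hypothetical sample data, and the JavaScript only reproduces what each SQL variant would count. Two authors from University X on the same paper give a naive count of 2 but a distinct count of 1.

```javascript
// SQL sketch, assuming the author_papers relationship table.
const PAPERS_PER_UNI_SQL = `
SELECT a.university, COUNT(DISTINCT ap.paper_id) AS total_papers
FROM authors a
JOIN author_papers ap ON a.author_id = ap.author_id
GROUP BY a.university`;

// Hypothetical sample data: authors 1 and 2 co-wrote paper 10.
const authors = [
  { author_id: 1, university: "University X" },
  { author_id: 2, university: "University X" },
  { author_id: 3, university: "University Y" },
];
const authorPapers = [
  { author_id: 1, paper_id: 10 },
  { author_id: 2, paper_id: 10 },
  { author_id: 3, paper_id: 10 },
  { author_id: 3, paper_id: 11 },
];

function papersPerUniversity() {
  const byUni = new Map(); // university -> { naive: number, distinct: Set of paper_id }
  for (const ap of authorPapers) {
    const a = authors.find((x) => x.author_id === ap.author_id);
    if (!a) continue; // inner JOIN: links without a matching author are dropped
    if (!byUni.has(a.university)) byUni.set(a.university, { naive: 0, distinct: new Set() });
    const entry = byUni.get(a.university);
    entry.naive += 1; // what COUNT(ap.paper_id) would return
    entry.distinct.add(ap.paper_id); // what COUNT(DISTINCT ap.paper_id) would return
  }
  return [...byUni].map(([university, e]) => ({
    university,
    naive: e.naive,
    distinct: e.distinct.size,
  }));
}

console.table(papersPerUniversity());
```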


connection.query("SELECT COUNT(research_papers.paper_id) FROM research_papers INNER JOIN authors on authors.author_id = research_papers.author_id WHERE authors.gender = 'female'");
Needs rework: same here, you need to use the relationship table. And be careful: what if several female authors are writing the same paper? Will the same paper be counted multiple times?

Comment on lines 9 to 10
export const relationships = async() => {

I see that you export these functions, but I don't see them being used anywhere. Where are they called?
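One way to actually use the exported functions, sketched under assumptions. Each function receives the connection as a parameter and awaits a promise-based client whose query(sql) resolves to [rows, fields], as mysql2/promise does. The fakeConnection helper is hypothetical and only records the SQL so the sketch runs without a database. `export` is left off so the file runs as a plain script.

```javascript
// Each function awaits its query and returns the rows instead of discarding them.
const joins = async (connection) => {
  const [rows] = await connection.query(
    "SELECT a.author_name, rp.paper_title FROM authors a " +
      "LEFT JOIN author_papers ap ON a.author_id = ap.author_id " +
      "LEFT JOIN research_papers rp ON ap.paper_id = rp.paper_id"
  );
  return rows;
};

const aggreg = async (connection) => {
  const [rows] = await connection.query(
    "SELECT rp.paper_id, rp.paper_title, COUNT(ap.author_id) AS author_count " +
      "FROM research_papers rp LEFT JOIN author_papers ap ON rp.paper_id = ap.paper_id " +
      "GROUP BY rp.paper_id, rp.paper_title"
  );
  return rows;
};

// The missing entry point: call each function in order and use the results.
// In the PR's module layout, joins and aggreg would be imported from their file.
async function main(connection) {
  const joined = await joins(connection);
  const counts = await aggreg(connection);
  console.log(joined, counts);
  return { joined, counts };
}

// Hypothetical fake connection: records each SQL string and resolves to
// [rows, fields], mirroring the shape mysql2/promise returns.
function fakeConnection() {
  const seen = [];
  return {
    seen,
    query: async (sql) => {
      seen.push(sql);
      return [[], []];
    },
  };
}

const conn = fakeConnection();
main(conn).then(() => console.log(`${conn.seen.length} queries ran`));
```

With a real database, the connection would instead be created with mysql2/promise's createConnection() using your own connection settings, and closed with connection.end() after main finishes. Returning the rows lets the caller, or a test, check the results.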


@yunchen4 left a comment


Hi Ilias,
I left some comments on the aggregate queries. Overall you're doing them correctly, but there are a few points you should double-check and be aware of.
I will approve your PR.


export const aggreg = async() => {
connection.query("SELECT rp.paper_id,rp.paper_title, COUNT(ap.author_id) AS number_of_authors FROM research_papers rp LEFT JOIN author_papers ap ON rp.paper_id = ap.paper_id GROUP BY rp.paper_id, rp.paper_title");
connection.query("SELECT COUNT(ap.paper_id) AS total_papers_by_female_authors FROM authors a JOIN author_papers ap ON a.author_id = ap.author_id a.gender = 'female';");


Please note that this query has a syntax error: "SELECT COUNT(ap.paper_id) AS total_papers_by_female_authors FROM authors a JOIN author_papers ap ON a.author_id = ap.author_id a.gender = 'female';"

You're missing WHERE before a.gender = 'female'.

Also note that the same paper will be counted multiple times if several female authors wrote it. If you don't want to count duplicate papers, you need to use DISTINCT, i.e. COUNT(DISTINCT ap.paper_id).
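A sketch of the corrected query. WHERE goes after the ON condition, and COUNT(DISTINCT ap.paper_id) counts a paper co-written by several female authors only once. The arrays are hypothetical sample data, and the function reproduces both the naive and the distinct count for comparison.

```javascript
// Corrected SQL sketch: WHERE added, duplicates removed with DISTINCT.
const FEMALE_PAPERS_SQL = `
SELECT COUNT(DISTINCT ap.paper_id) AS total_papers_by_female_authors
FROM authors a
JOIN author_papers ap ON a.author_id = ap.author_id
WHERE a.gender = 'female'`;

// Hypothetical sample data: authors 1 and 2 co-wrote paper 10.
const authors = [
  { author_id: 1, gender: "female" },
  { author_id: 2, gender: "female" },
  { author_id: 3, gender: "male" },
];
const authorPapers = [
  { author_id: 1, paper_id: 10 },
  { author_id: 2, paper_id: 10 },
  { author_id: 1, paper_id: 11 },
  { author_id: 3, paper_id: 12 },
];

function femalePaperCounts() {
  const femaleIds = new Set(
    authors.filter((a) => a.gender === "female").map((a) => a.author_id)
  );
  // Rows that survive the JOIN plus the WHERE filter.
  const rows = authorPapers.filter((ap) => femaleIds.has(ap.author_id));
  return {
    naive: rows.length, // COUNT(ap.paper_id): one per (author, paper) row
    distinct: new Set(rows.map((ap) => ap.paper_id)).size, // COUNT(DISTINCT ap.paper_id)
  };
}

console.log(femalePaperCounts()); // { naive: 3, distinct: 2 }
```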

connection.query("SELECT AVG(h_index) AS avg_h_index FROM authors GROUP BY university");
connection.query("SELECT a.university, COUNT(ap.paper_id) AS total_papers FROM authors a JOIN author_papers ap ON a.author_id = ap.author_id GROUP BY a.university");


Same here: if several authors from the same university write the same paper, that paper will be counted multiple times.
