[Discussion] Migrate volcano-sh/scheduler into volcano-sh/volcano #241

Closed
k82cn opened this issue Jun 20, 2019 · 20 comments
Labels
area/controllers area/scheduling kind/feature Categorizes issue or PR as related to a new feature. priority/high

Comments

@k82cn
Member

k82cn commented Jun 20, 2019

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

Description:

I propose migrating volcano-sh/scheduler into volcano-sh/volcano, because:

  1. Several features require changes in both repos, e.g. DelayPodCreation.
  2. When there is a fix in volcano-sh/scheduler, it has to be cherry-picked into volcano-sh/volcano; if any are missed, there may be quality issues.
  3. We also need to re-submit the scheduler part to kube-batch, which means we have to maintain 3 repos :(

After the migration, we will

  1. add notes stating that the scheduler part is built on kube-batch, to follow the license requirement
  2. make sure the scheduler part uses the same API as kube-batch
  3. any others?
@k82cn
Member Author

k82cn commented Jun 20, 2019

If you have any comments or suggestions, please let me know :)

@k82cn k82cn added kind/feature Categorizes issue or PR as related to a new feature. priority/high area/controllers area/scheduling labels Jun 20, 2019
@gaocegege
Contributor

SGTM. I have a question here:

How will the scheduler be migrated into the repo? Are we using a git submodule, or moving the code in scheduler to volcano directly?

@k82cn
Member Author

k82cn commented Jun 21, 2019

Are we using a git submodule, or moving the code in scheduler to volcano directly?

Both are OK with me. My original idea was to move the code into volcano directly, which is easier to manage; and when we donate volcano to CNCF or Kubernetes, it will also be easier to migrate (but we may lose the commit history of scheduler). If we use git submodules, we might have volcano-sh/controllers, volcano-sh/schedulers, volcano-sh/apis and so on; we can work together on the volcano-sh/apis part to make sure related projects are on the same page :)

@gaocegege
Contributor

gaocegege commented Jun 21, 2019

@k82cn Right now scheduler is a fork of kube-batch. If we move it into this repo, it may be hard to rebase onto upstream, IMO. I am not sure about the relationship between the upstream kube-batch and the volcano scheduler. If we do not need to rebase onto upstream, then both ways work.

@k82cn
Member Author

k82cn commented Jun 21, 2019

I am not sure about the relationship between the upstream kube-batch and the volcano scheduler.

They are almost the same, with minor differences because of release cycles. Some features require interaction between the controller & the scheduler; the scheduler part will be migrated to kube-batch because of its scope, which also ensures the scheduler part is NOT bound to the volcano job.

If we do not need to rebase onto upstream, then both ways work.

We do rebase manually right now :)

@gaocegege
Contributor

Then both work for me

@hex108
Contributor

hex108 commented Jun 21, 2019

Some questions :)

Several features require changes in both repos, e.g. DelayPodCreation.

Right now we need to modify code in both volcano-sh/scheduler and volcano-sh/volcano. After migrating volcano-sh/scheduler into volcano-sh/volcano, we will still need to modify the scheduler code and the related volcano code, just in different directories instead of different repos. Is there any major difference?

When there is a fix in volcano-sh/scheduler, it has to be cherry-picked into volcano-sh/volcano; if any are missed, there may be quality issues.

Most of the code in volcano-sh/scheduler and volcano-sh/volcano is independent. Is there much code that needs to be cherry-picked?

@k82cn
Member Author

k82cn commented Jun 21, 2019

we will still need to modify the scheduler code and the related volcano code, just in different directories instead of different repos. Is there any major difference?

For now, volcano vendors the scheduler for releases and e2e tests, so every PR in scheduler is cherry-picked into volcano-sh/volcano. When a feature involves both parts, we need to review the PR in scheduler, bump the vendored copy in volcano, and then review the PR for the other part in volcano. Another option is git submodules, as Ce suggested, but the PRs would still have to be reviewed in different repos.

/cc @mrbobbytables @jeefy, who are familiar with the k8s process and may have some suggestions :)

@hex108
Contributor

hex108 commented Jun 21, 2019

LGTM

git submodules are a little tricky.

@jeefy
Contributor

jeefy commented Jun 25, 2019

I might be missing the full picture, so I'm sorry. :(

I feel like cherry-picking upstream commits into v/scheduler is the wrong choice. Also, I feel like until Volcano has a permanent home (i.e. a CNCF donation), the scheduler code should remain in k-sigs/kube-batch.

Is there a technical or a licensing issue with vendoring k-sigs/kube-batch?

@k82cn
Member Author

k82cn commented Jun 26, 2019

Is there a technical or a licensing issue with vendoring k-sigs/kube-batch?

The issue is that we're going to modify v/scheduler for the Volcano release, and the release cycles of volcano & kube-batch may be different.

For now, we fork kube-batch as v/scheduler, and it takes a lot of work to update the vendored copy (from v/scheduler to v/volcano); so I opened this discussion to see how to reduce that effort.

@kevin-wangzefeng
Member

Sorry for the late reply, I thought I had submitted my comments before.

I can see some benefits of hosting the scheduler code in-tree, but we need to make sure that code changes happening in the two places are trackable and easy to sync in both directions.

Manually copying files or cherry-picking commits into the tree would make the history massive, which is not recommended.

I've created an example PR (#264) to show how things would look if we decide to host the scheduler code in-tree. The code is checked in by scripts (check out this for details), and we can use similar commands to sync changes back to upstream if necessary.
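
As a rough illustration of what a scripted, history-preserving import could look like, here is a minimal Python sketch that only builds the command lines for a `git subtree` workflow. This is an assumption, not the script used in #264: the thread does not say which commands that PR used, and the `pkg/scheduler` prefix, the `scheduler-upstream` remote name, and the `master` ref are all hypothetical.

```python
from typing import List


def import_command(prefix: str, remote: str, ref: str, first_import: bool) -> List[str]:
    """Build the git command that brings upstream code into `prefix`.

    `git subtree add` is used for the very first import, `git subtree pull`
    for later syncs. Both keep the imported commits traceable in history.
    """
    action = "add" if first_import else "pull"
    return ["git", "subtree", action, f"--prefix={prefix}", remote, ref]


def push_back_command(prefix: str, remote: str, branch: str) -> List[str]:
    """Build the git command that sends in-tree changes under `prefix` back upstream."""
    return ["git", "subtree", "push", f"--prefix={prefix}", remote, branch]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    prefix, remote, ref = "pkg/scheduler", "scheduler-upstream", "master"
    print(" ".join(import_command(prefix, remote, ref, first_import=True)))
    print(" ".join(import_command(prefix, remote, ref, first_import=False)))
    print(" ".join(push_back_command(prefix, remote, ref)))
```

The sketch prints the commands instead of running them, so it can be reviewed before anything touches a repository. Whether subtree, a custom script, or another mechanism fits best depends on how much the in-tree copy is expected to diverge.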

@mrbobbytables

With regard to the kubernetes processes, the general goal is to establish a single source of truth. For items that aren't managed in their own repo, they tend to be handled via the staging directory. It serves as the source of truth for a slew of repos that are updated via the publishing bot.

The issue is that we're going to modify v/scheduler for the Volcano release, and the release cycles of volcano & kube-batch may be different.

I can't speak to the differences or upcoming changes between v/scheduler and kube-batch, but it seems like a good goal to try to bring those in line, so that kube-batch itself is referenced as the single source of truth (at least for scheduling-related items) and pulled in via vendor. As it's a sub-project, it doesn't have to adhere to the standard kubernetes release cycle and should generally be able to align with a cadence that is usable by volcano or other projects. If the releases are hard to manage, you could also reference a specific commit after the needed feature(s) are merged.

If v/scheduler is going to diverge a fair amount and become more tightly coupled to volcano, I'd lean towards @kevin-wangzefeng's suggestion, git submodules (@gaocegege's suggestion), or, if the in-tree code should be the source of truth, a publishing bot. Folks touching the code in the scheduler sub-directory should be cognizant that the code will (may) be pushed upstream, and they should stage their commits wisely for easier import.

@k82cn
Member Author

k82cn commented Jun 29, 2019

If v/scheduler is going to diverge a fair amount and become more tightly coupled to volcano

That's the reason I opened this discussion; and it seems @kevin-wangzefeng's suggestion is simpler for other contributors.

@k82cn
Member Author

k82cn commented Jul 3, 2019

If there are no objections, we'd like to follow @kevin-wangzefeng's suggestion to make the process simpler for other contributors :)

@kevin-wangzefeng
Member

kevin-wangzefeng commented Jul 4, 2019

To summarize:

  • To manage code in-tree for a better daily development experience:

  • To follow licensing compliance:

    • Add a description to the main repo README to clarify the scheduler code copyright (a major requirement of the Apache 2.0 License) -- we can do this once the scheduler code is in.
    • Integrate fossa into the project CI to make sure the whole project is compliant with the license requirements of its dependencies -- we can do this in parallel; it's actually not dependent on whether we decide to manage the scheduler code in-tree or not.

We can timebox lazy consensus to this Friday 23:59 Beijing Time.

@k82cn
Member Author

k82cn commented Jul 8, 2019

@asifdxtreme, please help with "Integrate fossa into the project CI".

@k82cn
Member Author

k82cn commented Jul 30, 2019

/close

All tasks are done.

@volcano-sh-bot
Contributor

@k82cn: Closing this issue.

In response to this:

/close

All tasks are done.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

7 participants