Coding Period Week 4 & 5

Posted Jun 30, 2024

By Swastik Sharma 5 min read

Week 4 & 5 progress (June 17 - June 30)

These two weeks were crucial for the project as the mid-term evaluation approached, and we needed to cover most of the work. We held a special meeting in addition to our weekly status call, where my mentors and I discussed the project in a more detailed and thorough manner.

After receiving valuable feedback from my mentors on my PR, I began working on resolving the issues.

Weekly Status Call (17th June)

Discussion

Following the implementation of the Post Scan Plugin, the agenda of this meeting focused on incorporating the license_clarity_score attribute into the codebase scan for each individual package.

I implemented this through changes on the plugin_package side.

First, I moved the --package-summary post-scan plugin from summarycode/package_summary.py into packagedcode/plugin_package.py, where the main scan plugins (package, system-package, package-only) are already defined as regular scan options.

Changes in setup.cfg

  
 scancode_post_scan =
   package_summary = packagedcode.plugin_package:PackageSummary
   summary = summarycode.summarizer:ScanSummary
   tallies = summarycode.tallies:Tallies
   tallies-with-details = summarycode.tallies:TalliesWithDetails
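
For readers unfamiliar with this section of setup.cfg: each line maps a plugin name to a `module:Class` reference. The sketch below is not how ScanCode itself discovers plugins; it only illustrates the `name = module:attribute` convention using the standard library's `importlib.metadata.EntryPoint`. The entry name `ordered` and its target are hypothetical, chosen so the example runs without ScanCode installed.

```python
from importlib.metadata import EntryPoint

# Hypothetical entry: a standard-library class stands in for a plugin class.
ep = EntryPoint(
    name="ordered",
    value="collections:OrderedDict",
    group="scancode_post_scan",
)

# load() imports the module before the colon and returns the attribute after it.
plugin_class = ep.load()
print(ep.name, plugin_class.__name__)
```

In the same way, the package_summary entry above points at the PackageSummary class in packagedcode.plugin_package, which is why the entry had to change when the class moved.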

Changes in plugin_package.py

  
   from commoncode.cliutils import SCAN_GROUP, POST_SCAN_GROUP
   from plugincode.post_scan import post_scan_impl
   from plugincode.post_scan import PostScanPlugin
   # ... previously existing imports ...

   @post_scan_impl
   class PackageSummary(PostScanPlugin):
       """
       Summary at the Package Level.
       """
       options = [
           PluggableCommandLineOption(('--package-summary',),
               is_flag=True, default=False,
               help='Generate Package Level summary',
               help_group=POST_SCAN_GROUP)
       ]

       def is_enabled(self, package_summary, **kwargs):
           return package_summary

       def process_codebase(self, codebase, package_summary, **kwargs):
           """
           Process the codebase.
           """
           # Do nothing unless --package-summary was passed.
           if not self.is_enabled(package_summary):
               return
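
To show how the is_enabled check gates the plugin, here is a minimal sketch that runs without ScanCode. `FakeCodebase` and `PackageSummarySketch` are hypothetical stand-ins, not the real Codebase or PostScanPlugin classes, and the `package_summary_ran` marker is invented purely for illustration.

```python
class FakeCodebase:
    """Hypothetical stand-in for a scanned codebase."""

    def __init__(self):
        self.attributes = {}


class PackageSummarySketch:
    """Hypothetical stand-in for a post-scan plugin."""

    def is_enabled(self, package_summary, **kwargs):
        # Active only when the --package-summary flag was passed.
        return package_summary

    def process_codebase(self, codebase, package_summary=False, **kwargs):
        if not self.is_enabled(package_summary):
            return
        # Placeholder for the real summary work.
        codebase.attributes["package_summary_ran"] = True


plugin = PackageSummarySketch()

disabled = FakeCodebase()
plugin.process_codebase(disabled, package_summary=False)

enabled = FakeCodebase()
plugin.process_codebase(enabled, package_summary=True)

print(disabled.attributes, enabled.attributes)
```

Only the codebase processed with the flag set is modified; the other one is returned untouched.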

I also renamed the attribute from license_clarity to license_clarity_score and regenerated the tests to match, so that the expected outputs stay consistent with the code.

Check out the commits: https://github.com/nexB/scancode-toolkit/pull/3792/commits

Attendees

GSoC Sync Call (21st June)

Discussion

This was a special call arranged by my mentors to gain more clarity on the progress and how the project should meet its milestones.

Additionally, we discussed how the license_clarity_score field would be included in the Package instance when the post-scan plugin is used.

The major implementation work was in PackageScanner, the class that scans a Resource for package data and reports it as “package_data” at the file level, then creates “packages” from this “package_data” at the top level.

The process_codebase method of this class needed to accept the package_summary flag as input, so that it can check whether the package-summary plugin is enabled.

  
 def process_codebase(self, codebase, strip_root=False, package_only=False, package_summary=False, **kwargs):
      """
      Populate the ``codebase`` top level ``packages`` and ``dependencies``
      with package and dependency instances, assembling parsed package data
      from one or more datafiles as needed.

      Also perform additional package license detection that depends on either
      file license detection or the package detections.
      """
      # If we only want purls, we want to skip both the package
      # assembly and the extra package license detection steps
      if package_only:
          return

      has_licenses = hasattr(codebase.root, 'license_detections')

      # These steps add proper license detections to package_data and hence
      # this is performed before top level packages creation
      for resource in codebase.walk(topdown=False):
          # populate `from_file` attribute in matches
          for package_data in resource.package_data:
              for detection in package_data['license_detections']:
                  populate_matches_with_path(
                      matches=detection['matches'],
                      path=resource.path,
                  )

              for detection in package_data['other_license_detections']:
                  populate_matches_with_path(
                      matches=detection['matches'],
                      path=resource.path,
                  )

The package_summary flag is then passed to the create_package_and_deps function, which creates and saves top-level Package and Dependency instances from the parsed package data present in the codebase.

  
     # Create codebase-level packages and dependencies
     create_package_and_deps(codebase, package_summary, strip_root=strip_root, **kwargs)
     #raise Exception()

The create_package_and_deps function was updated so that license_clarity_score is included in the package attributes when the plugin is used.

  
 def create_package_and_deps(codebase, package_summary=False, package_adder=add_to_package, strip_root=False, **kwargs):
     """
     Create and save top-level Package and Dependency from the parsed
     package data present in the codebase.
     """
     packages, dependencies = get_package_and_deps(
         codebase,
         package_adder=package_adder,
         strip_root=strip_root,
         **kwargs
     )
     codebase.attributes.packages.extend(
         package.to_dict(package_summary=package_summary)
         for package in packages
     )
     codebase.attributes.dependencies.extend(dep.to_dict() for dep in dependencies)

To make sure the Package model exposes this attribute, I updated models.py.

With package_summary defaulting to False, the to_dict method returns the package details with license_clarity_score only when the plugin is used.

  
 def to_dict(self, package_summary=False):
     data = super().to_dict(with_details=False)
     if not package_summary:
         data.pop("license_clarity_score")
     return data
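
The way the flag travels from package creation down to serialization can be sketched in a self-contained form. This is a simplified illustration, not the actual ScanCode models: `PackageSketch` and `create_packages` are hypothetical names, a plain dataclass replaces the real Package model, and the score value is a placeholder.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class PackageSketch:
    """Hypothetical, simplified stand-in for the Package model."""

    name: str
    license_clarity_score: dict = field(default_factory=dict)

    def to_dict(self, package_summary=False):
        data = asdict(self)
        if not package_summary:
            # Hide the field unless the summary was requested.
            data.pop("license_clarity_score", None)
        return data


def create_packages(packages, package_summary=False):
    """Serialize packages, forwarding the flag to each to_dict() call."""
    return [p.to_dict(package_summary=package_summary) for p in packages]


# Placeholder data, not taken from a real scan.
packages = [PackageSketch("change-case", {"score": 100})]

without_summary = create_packages(packages)
with_summary = create_packages(packages, package_summary=True)

print(without_summary)
print(with_summary)
```

Because the default is False, existing callers that never pass the flag keep producing the same output as before.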

Attendees

- [Ayan Sinha Mahapatra](https://github.com/AyanSinhaMahapatra)
- [Avishrant Sharma](https://github.com/AvishrantSsh)
- [Swastik Sharma](https://github.com/swastkk)

Weekly Status Call (24th June)

Discussion

The main point discussed in this meeting was merging the major changes from the scancode-toolkit develop branch into my pull request.

Some of the tests were failing after those commits and needed to be regenerated. I regenerated the tests and reviewed the git diff to ensure that only the changes related to the PR and commit were included.

I also received a suggestion to store license_clarity_score as a dict rather than a list.
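
The post does not show what the earlier list form looked like, so the shapes below are assumptions made only to illustrate the difference: a one-element list wrapping a mapping, versus the mapping itself. The helper `as_mapping` and the keys and values used are hypothetical.

```python
def as_mapping(license_clarity_score):
    """
    Return the score as a single mapping.

    Hypothetical helper: accepts either a mapping or a one-element
    list wrapping a mapping (the assumed earlier shape).
    """
    if isinstance(license_clarity_score, dict):
        return license_clarity_score
    if isinstance(license_clarity_score, list) and len(license_clarity_score) == 1:
        return dict(license_clarity_score[0])
    raise ValueError("expected a mapping or a one-element list")


# Assumed list form: callers must index into the list first.
as_list = [{"score": 100, "declared_license": True}]

# Dict form: fields are read directly by key.
as_dict = as_mapping(as_list)
print(as_dict["score"])
```

A dict fits here because a package has a single clarity score, so there is nothing to iterate over.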

Also this week, a small test was added to test_plugin_package:

  
    # just for test purpose, used change-case example
    def test_plugin_package_with_package_summary(self):
        test_dir = self.get_test_loc('package_summary/change-case-change-case-5.4.4.zip-extract') 
        result_file = self.get_temp_file('json')
        expected_file = self.get_test_loc('package_summary/expected.json')

        run_scan_click(['--package', '--strip-root', '--processes', '-2','--package-summary','--json-pp', result_file, test_dir])
        check_json_scan(expected_file, result_file, remove_uuid=True, remove_file_date=True, regen=REGEN_TEST_FIXTURES)

The directory structure is as follows:

.
└── change-case-change-case-5.4.4
    ├── LICENSE
    ├── README.md
    ├── SECURITY.md
    ├── package-lock.json
    ├── package.json
    ├── packages
    │   ├── change-case
    │   │   ├── README.md
    │   │   ├── package.json
    │   │   ├── src
    │   │   └── tsconfig.json
    │   ├── sponge-case
    │   │   ├── README.md
    │   │   ├── package.json
    │   │   ├── src
    │   │   └── tsconfig.json
    │   ├── swap-case
    │   │   ├── README.md
    │   │   ├── package.json
    │   │   ├── src
    │   │   └── tsconfig.json
    │   └── title-case
    │       ├── README.md
    │       ├── package.json
    │       ├── src
    │       └── tsconfig.json
    └── tsconfig.base.json

This example was chosen because it contains four packages, which makes it easy to test against and gives more comprehensive results.
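
The property this test is meant to protect can be sketched as a check that every top-level package in the scan output carries a license_clarity_score. The JSON below is a hypothetical, heavily trimmed stand-in with placeholder values, not output from an actual scan, and `packages_missing_score` is an illustrative helper rather than part of the test suite.

```python
import json


def packages_missing_score(scan):
    """Return names of top-level packages lacking license_clarity_score."""
    return [
        package["name"]
        for package in scan.get("packages", [])
        if "license_clarity_score" not in package
    ]


# Placeholder data only; real scan results contain many more fields.
scan_output = json.loads("""
{
  "packages": [
    {"name": "change-case", "license_clarity_score": {"score": 0}},
    {"name": "sponge-case", "license_clarity_score": {"score": 0}},
    {"name": "swap-case", "license_clarity_score": {"score": 0}},
    {"name": "title-case"}
  ]
}
""")

print(packages_missing_score(scan_output))
```

In the actual test, the comparison against expected.json plays this role: any package that loses the field would show up as a diff.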

Attendees


python, package, summary, tests, assembly

gsoc open-source

This post is licensed under CC BY 4.0 by the author.
